[ICML 2025] Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures
Chemical reactions happen when atoms rearrange themselves, turning molecules into something new, as when making medicines or new materials. But current AI methods still aren't good at predicting how reactions will turn out, including their conditions, yields, and reaction types.
We think this is because these AI methods can't easily see how atoms move around or clearly understand molecules' 3D shapes. To fix that, we propose Reaction Graph (RG). RG connects each atom before and after a reaction, clearly showing how they rearrange. It also highlights simple triangles to show the molecules' 3D shapes, which surprisingly helps a lot.
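The two ideas in this paragraph can be illustrated with a toy sketch. This is not the authors' implementation: the molecules are hand-written dictionaries rather than RDKit objects, the names `angular_edges` and `reaction_edges` are hypothetical, and it assumes that a "triangle" is formed by adding a third edge between two atoms bonded to the same neighbour, and that atoms are matched across the reaction by atom-map number.

```python
from itertools import combinations

# Toy molecules: atom index -> atom-map number, plus a bond list.
# These are hand-written illustrations, not output of any toolkit.
reactant = {
    "maps": {0: 1, 1: 2, 2: 3},   # a 3-atom chain 0-1-2
    "bonds": [(0, 1), (1, 2)],
}
product = {
    "maps": {0: 3, 1: 2, 2: 1},   # same atoms, indexed differently
    "bonds": [(0, 1)],            # the bond between map numbers 1 and 2 is broken
}

def angular_edges(bonds):
    """Close every pair of bonds that share an atom with a third edge (a triangle)."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    bonded = {frozenset(bond) for bond in bonds}
    edges = set()
    for nbrs in neighbours.values():
        for i, k in combinations(sorted(nbrs), 2):
            if frozenset((i, k)) not in bonded:
                edges.add((i, k))
    return sorted(edges)

def reaction_edges(reactant_maps, product_maps):
    """Link each reactant atom to the product atom carrying the same atom-map number."""
    by_map = {m: idx for idx, m in product_maps.items()}
    return sorted(
        (r_idx, by_map[m]) for r_idx, m in reactant_maps.items() if m in by_map
    )

print(angular_edges(reactant["bonds"]))
print(reaction_edges(reactant["maps"], product["maps"]))
```

With 3D coordinates, the length of the added third edge together with the two bond lengths fixes the bond angle, which is one plausible reading of why such triangles carry shape information.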
RG can potentially help chemists conduct better research and develop better products. To this end, we provide an Online Platform with GUI for reaction analysis — feel free to give it a try!
- Ubuntu & CUDA 11.3 & Conda (Windows / WSL require corresponding CUDA versions. Different CUDA versions may cause issues with DGL)
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.19.01_linux.run
sudo sh cuda_11.3.1_465.19.01_linux.run
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
- Setting up Python Environment (It is strongly recommended to follow the provided bash commands)
conda create -n rg python=3.9
conda activate rg
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install dgl-cu113 -f https://data.dgl.ai/wheels/repo.html
pip install rdkit rxnmapper scikit-learn pycm deepmerge
pip install transformers==4.34.0
pip install pandas==2.2.3
pip install numpy==1.26.3
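After running the commands above, it can help to confirm that the pinned versions actually landed in the environment. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the install commands (packages installed without a pin, such as rdkit, are not checked).

```python
from importlib import metadata

# Pins copied from the install commands above (local build tags such as +cu113 omitted).
PINS = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "transformers": "4.34.0",
    "pandas": "2.2.3",
    "numpy": "1.26.3",
}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu113" when comparing.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems

if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package matches. It does not verify that CUDA or DGL work at runtime.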
- (Not Required for Inference) Download the original and cooked Data from Hugging Face, and put the files under the datasets directory. Since the files are large, you can selectively download only the parts you need.
Reaction-Graph/
├── analysts
├── ......
├── dataloaders
├── datasets/
│   ├── hte/
│   │   ├── buchwald_hartwig
│   │   └── ......
│   ├── uspto_condition
│   ├── uspto_tpl
│   └── uspto_yield
├── encoders
└── ......
- Download the Model Parameters from Hugging Face, and put the files under the checkpoints directory.
Reaction-Graph/
├── analysts
├── checkpoints/
│   └── reaction_graph/
│       ├── hte
│       ├── uspto_condition
│       ├── uspto_tpl
│       └── uspto_yield
├── dataloaders
└── ......
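A quick way to check the layout before running anything is to look for the paths named in the two trees above. This is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and entries shown as "......" in the trees are deliberately left out because their names are not given.

```python
from pathlib import Path

# Paths taken from the two trees above; elided entries ("......") are not listed.
EXPECTED_PATHS = (
    "datasets/hte/buchwald_hartwig",
    "datasets/uspto_condition",
    "datasets/uspto_tpl",
    "datasets/uspto_yield",
    "checkpoints/reaction_graph/hte",
    "checkpoints/reaction_graph/uspto_condition",
    "checkpoints/reaction_graph/uspto_tpl",
    "checkpoints/reaction_graph/uspto_yield",
)

def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]

if __name__ == "__main__":
    # Run from the directory that contains Reaction-Graph/.
    for rel in missing_paths("Reaction-Graph"):
        print("missing:", rel)
```

If only inference is needed, the `datasets/...` entries can be dropped from the list, since the datasets are not required for inference.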
We provide a series of interfaces for predicting reaction conditions, yields, and reaction types.
- Condition Recommendation Example
python inference.py --dataset uspto_condition --reactions "C>>C" "CO>>OC" "CCO>>COC"
- Yield Prediction Example
python inference.py --dataset uspto_yield --experiment gram --reactions "C>>C" "CO>>OC" "CCO>>COC"
- Reaction Classification Example
python inference.py --dataset uspto_tpl --reactions "C>>C" "CO>>OC" "CCO>>COC"
To modify the device used during inference, please update the device and gpu settings in the corresponding config file. For example, the config file for Condition Recommendation is metadatas/reaction_graph_uspto_condition_config.py
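When the inference script is called from other Python code, the command line can be assembled as a list rather than a hand-quoted string. The sketch below is an illustration only: `build_inference_cmd` is a hypothetical helper, and it uses only the `--dataset`, `--experiment`, and `--reactions` flags that appear in the examples above.

```python
import shlex

def build_inference_cmd(dataset, reactions, experiment=None):
    """Assemble the argv list for inference.py using only the flags shown above."""
    if not reactions:
        raise ValueError("at least one reaction SMILES is required")
    cmd = ["python", "inference.py", "--dataset", dataset]
    if experiment is not None:
        cmd += ["--experiment", experiment]
    cmd += ["--reactions", *reactions]
    return cmd

cmd = build_inference_cmd("uspto_yield", ["C>>C", "CO>>OC"], experiment="gram")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell-quoting problems with the `>` characters in reaction SMILES.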
We provide scripts to run our model on the test dataset. The testing experiments are executed concurrently in the background, and the output results will be saved in the logs/test directory.
- USPTO Condition
python test.py --dataset uspto_condition
- HTE
python test.py --dataset hte
- USPTO Yield
python test.py --dataset uspto_yield
- USPTO TPL
python test.py --dataset uspto_tpl
To modify the experiments during testing, please update the experiments settings in the corresponding config file. For example, the config file for HTE is metadatas/reaction_graph_hte_config.py
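Because the test runs execute in the background, the results have to be picked up from logs/test afterwards. The log file names are not documented here, so the following sketch makes no assumption about them and simply lists the most recently modified files; `newest_files` is a hypothetical helper.

```python
from pathlib import Path

def newest_files(log_dir, limit=5):
    """Return up to `limit` files under log_dir, most recently modified first."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    files = [p for p in log_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]

# Run from the repository root after launching test.py.
for path in newest_files("logs/test"):
    print(path)
```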
- (Optional) We provide scripts for preprocessing the raw dataset into Reaction Graph data. The preprocessing is executed concurrently in the background, and the outputs will be saved in the logs/preprocess and the datasets folder. Before preprocessing, please ensure that the raw data files are placed according to the File Structure.
  - USPTO Condition
  python preprocess.py --dataset uspto_condition
  - HTE
  python preprocess.py --dataset hte
  - USPTO Yield
  python preprocess.py --dataset uspto_yield
  - USPTO TPL
  python preprocess.py --dataset uspto_tpl
- We provide scripts for training the model on the provided datasets. The training process is executed concurrently in the background, and the outputs will be saved in the logs/train and the checkpoints folder.
  - USPTO Condition
  python train.py --dataset uspto_condition
  - HTE
  python train.py --dataset hte
  - USPTO Yield
  python train.py --dataset uspto_yield
  - USPTO TPL
  python train.py --dataset uspto_tpl
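The preprocessing, training, and testing commands all share one shape, so the full sequence for a dataset can be generated rather than typed. This is a sketch only: `plan_commands` is a hypothetical helper that prints the commands without running them, since the scripts do their work in the background and this README does not describe a way to wait for them to finish.

```python
DATASETS = ("uspto_condition", "hte", "uspto_yield", "uspto_tpl")
STAGES = ("preprocess", "train", "test")  # script names without the .py suffix

def plan_commands(datasets=DATASETS, stages=STAGES):
    """List the commands for each dataset, stage by stage, without running them."""
    unknown = set(stages) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    return [
        f"python {stage}.py --dataset {dataset}"
        for dataset in datasets
        for stage in stages
    ]

for line in plan_commands(["hte"]):
    print(line)
```

Each later stage should only be started once the previous one has finished writing its outputs, which can be checked in the corresponding logs folder.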
- Analyze your dataset by passing all reactions into the analyst.py script. This step will return the metadata of the dataset. You can modify the data features as needed and save them in the metadatas folder (refer to metadatas/uspto_condition_metadata.py)
python analyst.py --reactions "C>>C" "CO>>CO" "CCO>>COC"
- To preprocess the data into Reaction Graph format, you can organize your dataset files according to the structure of existing datasets and perform preprocessing (refer to preprocess.py)
- Specify the model hyperparameters and the preprocessed data location in the config file (refer to metadatas/reaction_graph_uspto_condition_config.py), and use the corresponding model (Condition, Yield, or Type) for training (refer to train.py).
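Before passing a custom dataset through the steps above, it is worth screening the reaction strings for structural mistakes. The sketch below assumes the standard reaction SMILES layout `reactants>agents>products`, where the agents field may be empty (giving `>>`); whether analyst.py accepts a non-empty agents field is not stated here. The helper names are hypothetical, and the check is purely structural: it does not validate the chemistry.

```python
def split_reaction(smiles):
    """Split 'reactants>agents>products' (agents may be empty) into its three parts."""
    parts = smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got {smiles!r}")
    reactants, agents, products = parts
    if not reactants or not products:
        raise ValueError(f"reactants and products must be non-empty: {smiles!r}")
    return reactants, agents, products

def malformed(reactions):
    """Return the reaction strings that fail the structural check."""
    bad = []
    for rxn in reactions:
        try:
            split_reaction(rxn)
        except ValueError:
            bad.append(rxn)
    return bad

# A string with a single '>' separator is flagged; the other two pass.
print(malformed(["C>C", "CO>>CO", "CCO>>COC"]))
```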
The pipeline for customized data fitting is currently under construction. We provide several online fitting service interfaces on our Website.
- Release python library
- Customized data fitting
- Code refactoring - Jun 4, 2025
- Release model parameters - Jun 4, 2025
- Release inference code - May 23, 2025
- Release evaluation code - Jun 4, 2025
- Release training code - Jun 4, 2025
If our paper has inspired your research or our code has been helpful in your work, we would greatly appreciate it if you could kindly cite our paper!
@inproceedings{jianreaction,
title={Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures},
author={Jian, Yingzhao and Zhang, Yue and Wei, Ying and Fan, Hehe and Yang, Yi},
booktitle={Forty-second International Conference on Machine Learning},
year={2025}
}