If you find this work useful for your research, please consider citing it.
@misc{yue2025reqflowrectifiedquaternionflow,
title={ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation},
author={Angxiao Yue and Zichong Wang and Hongteng Xu},
year={2025},
eprint={2502.14637},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.14637},
}
2025/05/02
ReQFlow is accepted by ICML 2025!!
2025/02/24
Our model weights are hosted on Hugging Face and Google Drive now.
2025/02/20
We release our work ReQFlow for efficient and high-quality protein backbone generation!
ReQFlow achieves state-of-the-art (SOTA) performance in protein backbone generation while requiring significantly fewer sampling steps and substantially less inference time. For example, when generating a backbone of length 300, it is 37× faster than RFDiffusion and 62× faster than Genie2.
We recommend using mamba. If you do, use mamba in place of conda in the commands below.
conda env create -f reqflow-env.yml
conda activate reqflow-env
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
pip install --upgrade deepspeed
# Install local package.
# Current directory should be ReQFlow/
pip install -e .
Our model weights are available for download on Hugging Face or Google Drive. You can also use your own weights. If using ours, please organize the directory as follows:
ReQFlow
├── ckpts
│   ├── qflow_pdb
│   │   ├── config.yaml
│   │   └── qflow_pdb.ckpt
│   ├── qflow_scope
│   │   ├── config.yaml
│   │   └── qflow_scope.ckpt
│   ├── reqflow_pdb_rectify
│   │   ├── config.yaml
│   │   └── reqflow_pdb_rectify.ckpt
│   └── reqflow_scope_rectify
│       ├── config.yaml
│       └── reqflow_scope_rectify.ckpt
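Before running inference, it can help to confirm that the downloaded weights match the layout above. The following is a minimal sketch, not part of the ReQFlow codebase: the function name missing_checkpoint_files is hypothetical, and the expected file names are taken only from the directory tree shown here.

```python
from pathlib import Path

# Model names copied from the directory tree above.
MODEL_NAMES = [
    "qflow_pdb",
    "qflow_scope",
    "reqflow_pdb_rectify",
    "reqflow_scope_rectify",
]


def missing_checkpoint_files(root="."):
    """Return the expected files that are absent under <root>/ckpts.

    Hypothetical helper: it only checks that each model directory holds a
    config.yaml and a <model_name>.ckpt file; it does not load the weights.
    """
    ckpt_root = Path(root) / "ckpts"
    missing = []
    for name in MODEL_NAMES:
        for filename in ("config.yaml", f"{name}.ckpt"):
            path = ckpt_root / name / filename
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling missing_checkpoint_files(".") from the ReQFlow/ directory returns an empty list when all four models are in place. If you only downloaded some of the models, the remaining entries simply show up as missing.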
The inference configurations are available in configs/inference_unconditional.yaml, where you can conveniently specify the inference settings.
inference:
  task: unconditional
  ckpt_path: ./ckpts/reqflow_pdb_rectify/reqflow_pdb_rectify.ckpt # path to ckpts
  inference_subdir: ./inference_outputs/run_${now:%Y-%m-%d}_${now:%H-%M-%S} # path to inference outputs
  pmpnn_dir: ./ProteinMPNN
  pt_hub_dir: ./.cache/torch/ # path to ESMFold
  num_gpus: 4
  samples:
    min_length: 100
    max_length: 300 # We recommend < 500
    length_step: 50 # sampling on length (100,150,200,250,300)
    samples_per_length: 50
    seq_per_sample: 8 # num. of seq. generated by ProteinMPNN
  interpolant:
    sampling:
      num_timesteps: 500
      do_sde: False
    rots:
      sample_schedule: exp
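To get a feel for how much work a given samples block implies, the sketch below enumerates the sampled lengths and counts the backbones and ProteinMPNN sequences. It assumes, based only on the config comments above, that max_length is inclusive and that each backbone receives seq_per_sample sequences; the function names are hypothetical.

```python
def sampled_lengths(min_length, max_length, length_step):
    """Lengths from min_length to max_length (assumed inclusive), spaced by length_step."""
    return list(range(min_length, max_length + 1, length_step))


def workload(min_length, max_length, length_step, samples_per_length, seq_per_sample):
    """Return (lengths, number of backbones, number of ProteinMPNN sequences)."""
    lengths = sampled_lengths(min_length, max_length, length_step)
    num_backbones = len(lengths) * samples_per_length
    num_sequences = num_backbones * seq_per_sample
    return lengths, num_backbones, num_sequences


# Values from the example configuration above.
lengths, num_backbones, num_sequences = workload(100, 300, 50, 50, 8)
print(lengths)        # [100, 150, 200, 250, 300]
print(num_backbones)  # 250
print(num_sequences)  # 2000
```

Under these assumptions, the example configuration generates 250 backbones, each of which is then evaluated with 8 designed sequences.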
Once you have specified the configurations, you can run inference using the following command:
python -W ignore experiments/inference_se3_flows.py -cn inference_unconditional
During inference, we evaluate results using ProteinMPNN and ESMFold, following FrameDiff. The outputs are saved as follows:
inference_outputs
└── experiment_name                     # Default is date time of inference
    ├── config.yaml                     # Config used during inference
    └── length_100                      # Sampled length
        ├── sample_0                    # Sample ID for length
        │   ├── noise.pdb               # First sample, i.e., noise
        │   ├── sample.pdb              # Final sample
        │   ├── self_consistency        # Self consistency results
        │   │   ├── esmf                # ESMFold predictions using ProteinMPNN sequences
        │   │   │   ├── sample_0.pdb
        │   │   │   ├── ...
        │   │   │   └── sample_8.pdb
        │   │   ├── parsed_pdbs.jsonl   # Parsed chains for ProteinMPNN
        │   │   ├── sample.pdb
        │   │   ├── sc_results.csv      # Summary metrics CSV
        │   │   └── seqs
        │   │       └── sample.fa       # ProteinMPNN sequences
        │   └── x0_traj_1.pdb           # x_0 model prediction trajectory
        └── sample_1                    # Next sample
Based on the contents of inference_outputs, we can compute Designability, Diversity, and Novelty. More evaluation details to reproduce the paper results are here.
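As a starting point for such an evaluation, the per-sample summary files can be gathered into one list. This is a minimal sketch that relies only on the directory layout shown above. The column names of sc_results.csv are not documented here, so each row is kept as a plain dictionary rather than assuming specific metrics, and the function name collect_sc_results is hypothetical.

```python
import csv
from pathlib import Path


def collect_sc_results(run_dir):
    """Return (length, sample_id, row) tuples for every row of every sc_results.csv.

    run_dir is one experiment directory inside inference_outputs. Each row is a
    dict keyed by whatever column names the CSV header provides.
    """
    results = []
    pattern = "length_*/sample_*/self_consistency/sc_results.csv"
    for csv_path in sorted(Path(run_dir).glob(pattern)):
        sample_dir = csv_path.parent.parent
        length = int(sample_dir.parent.name.split("_")[1])  # "length_100" -> 100
        sample_id = int(sample_dir.name.split("_")[1])       # "sample_0" -> 0
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                results.append((length, sample_id, row))
    return results
```

The returned list can then be filtered or aggregated per length once you know which columns the CSV actually contains.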
We train our models on the Protein Data Bank (PDB) and SCOPe datasets separately. For the PDB dataset, we reprocessed the data from PDB following the steps described in FrameDiff; the detailed procedure is also available here. We also provide a demo PDB dataset in the data folder to help you test or debug. For SCOPe, we directly downloaded the dataset using the link provided by FrameFlow. The dataset paths are set in configs/_datasets.yaml.
As with inference, you can control your training settings using the yaml files in configs. For example, to train QFlow on the PDB dataset, we specify the configurations in configs/train_pdb_base.yaml:
data:
  dataset: pdb
  rectify: False
  sampler:
    # Setting for 80GB GPUs
    max_batch_size: 128
    max_num_res_squared: 1000000
experiment:
  is_training: True
  debug: False
  num_devices: 4
  warm_start: null # keep it null on first stage
  warm_start_cfg_override: True
  training:
    aux_loss_t_pass: 0.50
  wandb:
    name: reqflow_train_pdb_base
    project: reqflow
  checkpointer: # where to save checkpoints
    dirpath: ./ckpts/${experiment.wandb.project}/${experiment.wandb.name}/${now:%Y-%m-%d}_${now:%H-%M-%S}
    save_last: True
    save_top_k: -1
Also make sure the settings in _datasets.yaml are set following the instructions here.
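If you need to adapt the sampler settings to smaller GPUs, it may help to reason about how the two limits interact. The sketch below shows one plausible reading, which is an assumption and not confirmed by this README: the batch size for proteins of a given length is capped both by max_batch_size and by a budget on batch_size × length². The function name and the exact rule are hypothetical, and the actual sampler may differ.

```python
def batch_size_for_length(seq_len, max_batch_size=128, max_num_res_squared=1_000_000):
    """Assumed rule: keep batch_size * seq_len**2 within max_num_res_squared.

    Hypothetical illustration only; always returns at least 1 so that very
    long proteins still form a batch of a single example.
    """
    budget_limited = max_num_res_squared // (seq_len ** 2)
    return max(1, min(max_batch_size, budget_limited))


for seq_len in (64, 128, 256, 512):
    print(seq_len, batch_size_for_length(seq_len))
```

Under this reading, short proteins are limited by max_batch_size while long proteins are limited by max_num_res_squared, so lowering the latter is the more direct way to reduce memory use on long sequences.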
The corresponding training command is:
python -W ignore experiments/train_se3_flows.py -cn train_pdb_base
One of our key contributions is rectifying the SE(3) generation trajectories in Euclidean/quaternion space to accelerate inference and enhance the designability of the generated protein backbones. We rectify the QFlow model with the generated noise-sample pairs (see noise.pdb and sample.pdb in inference_outputs).
We construct the rectify dataset by converting the generated .pdb files into a compatible format. You can follow the instructions here to do so.
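Before converting, you may want to list which noise-sample pairs a run actually produced. The sketch below only locates the paired files using the inference_outputs layout described earlier. It does not perform the format conversion, which follows the linked instructions, and the function name collect_noise_sample_pairs is hypothetical.

```python
from pathlib import Path


def collect_noise_sample_pairs(run_dir):
    """Return (noise.pdb, sample.pdb) path pairs found under one experiment directory.

    Sample directories that lack either file are skipped, so incomplete runs
    do not contribute half-pairs.
    """
    pairs = []
    for sample_dir in sorted(Path(run_dir).glob("length_*/sample_*")):
        noise = sample_dir / "noise.pdb"
        sample = sample_dir / "sample.pdb"
        if noise.is_file() and sample.is_file():
            pairs.append((noise, sample))
    return pairs
```

The number of returned pairs gives a quick check on how much data is available for the rectification stage.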
Once the rectify dataset is obtained, the training pipeline remains the same as for QFlow. The configurations can be found in configs/train_pdb_rectify.yaml; make sure experiment.warm_start is set to the checkpoint obtained from first-stage training. The command to run it is:
python -W ignore experiments/train_se3_flows.py -cn train_pdb_rectify
Training on the SCOPe dataset follows the same procedure as for the PDB dataset.
Thanks to FrameFlow, FrameDiff, and FoldFlow for their great work and codebases, which served as the foundation for developing ReQFlow.
If you have any questions, please feel free to contact us via angxiaoyue@ruc.edu.cn or zichongwang@ruc.edu.cn.