VibeGen: End-to-end de novo protein generation targeting normal mode vibrations using a language diffusion model duo
Bo Ni1,2, Markus J. Buehler1,3,4*
1 Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology
2 Department of Materials Science and Engineering, Carnegie Mellon University
3 Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology
4 Lead contact
* Correspondence: mbuehler@MIT.EDU
Natural proteins are rarely static; they often rely on dynamic motions to achieve key biological functions, such as enzymatic activity, signal transduction, and structural regulation. However, it remains challenging to grasp the direct link between the sequences and dynamics of natural proteins, or to design proteins beyond nature based on their dynamical signatures. Here, we report a generative duo of protein language diffusion models that generate proteins to meet specified normal modes of vibration as design goals. Consisting of a protein designer and a protein predictor, the duo designs an ensemble of diverse protein sequences conditioned on the given normal mode and predicts their vibrations to select the most accurate candidates, targeting both diversity and accuracy. Via full-atom molecular simulations for direct validation, we demonstrate that the generated proteins are mostly de novo and fulfill the targeted vibrational mode across the residues of the backbone. Our models provide end-to-end connections between protein sequences and vibrational motions in both directions, offer efficient pathways to navigate the broad protein sequence space unconstrained by biological synthesis, and enable the discovery of flexible proteins with desired dynamic properties and biological functions.
Create a virtual environment
conda create --prefix=./VibeGen_env
conda activate ./VibeGen_env
Install:
pip install git+https://github.com/lamm-mit/ModeShapeDiffusionDesign.git
If you want to create an editable installation, clone the repository using git:
git clone https://github.com/lamm-mit/ModeShapeDiffusionDesign.git
cd ModeShapeDiffusionDesign
Then, install:
pip install -r requirements.txt
pip install -e .
ModeShapeDiffusionDesign/
│
├── VibeGen/ # Source code directory
│ ├── DataSetPack.py
│ ├── ModelPack.py
│ ├── TrainerPack.py
│ ├── UtilityPack.py
│ ├── JointSamplingPack.py
│ └── ...
│
├── demo_1_Inferrence_with_trained_duo.ipynb # demo 1: make an inference
│
├── colab_demo/ # demos for colab
│ ├── Inference_demo.ipynb # demo 1: make an inference
│ └── ...
│
├── setup.py # The setup file for packaging
├── requirements.txt # List of dependencies
├── README.md # Documentation
├── assets/ # Support materials
└── ...
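Once installed (via pip or in editable mode), the modules listed in the source directory should be importable as a Python package. The snippet below is only a quick sanity check; it assumes the package is importable as VibeGen, which follows from the setup.py packaging but is not spelled out in this README.

# Sanity check: confirm the installed package exposes the modules listed above.
from VibeGen import DataSetPack, ModelPack, TrainerPack, UtilityPack, JointSamplingPack

for module in (DataSetPack, ModelPack, TrainerPack, UtilityPack, JointSamplingPack):
    print(module.__name__)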
In the following example, for each input normal mode shape condition, we use the trained ProteinDesigner to propose 20 candidates. The trained ProteinPredictor then picks the two best and two worst candidates based on its predictions. The chosen sequences are then folded using OmegaFold and their secondary structure is analyzed (a minimal sketch of this workflow follows the notebook reference below).
demo_1_Inferrence_with_trained_duo.ipynb
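The notebook implements the design-then-rank workflow outlined above. As a rough illustration, the sketch below mirrors that loop; the class names ProteinDesigner and ProteinPredictor come from the paper, but their Python interfaces (sample, predict, etc.) are placeholders here and may differ from the actual code in VibeGen/ModelPack.py and VibeGen/JointSamplingPack.py.

import numpy as np

def design_and_rank(designer, predictor, target_mode, n_candidates=20, n_keep=2):
    # Propose candidate sequences conditioned on the target normal-mode shape,
    # then rank them by how closely the predicted vibration matches the target.
    # designer.sample and predictor.predict are hypothetical method names.
    candidates = [designer.sample(target_mode) for _ in range(n_candidates)]
    errors = [np.mean((predictor.predict(seq) - np.asarray(target_mode)) ** 2)
              for seq in candidates]
    order = np.argsort(errors)
    best = [candidates[i] for i in order[:n_keep]]    # closest to the target mode
    worst = [candidates[i] for i in order[-n_keep:]]  # farthest from the target mode
    return best, worst

The selected sequences are then folded (for example with OmegaFold) and their secondary structure analyzed, as done in the notebook.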
Alternatively, a similar demo can be run on Colab.
The checkpoints of the pretrained models that make up the agentic system are hosted in a repository on Hugging Face.
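A common way to fetch such checkpoints programmatically is through the huggingface_hub client. In the sketch below, the repository ID and file name are placeholders; substitute the values listed on the Hugging Face model page.

from huggingface_hub import hf_hub_download

# repo_id and filename are placeholders; use the actual entries from the
# VibeGen checkpoint repository on Hugging Face.
ckpt_path = hf_hub_download(repo_id="lamm-mit/VibeGen", filename="ProteinDesigner.pt")
print("Checkpoint downloaded to:", ckpt_path)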
@article{BoBuehler2025VibeGen,
title={Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model},
author={Bo Ni and Markus J. Buehler},
year={2025},
eprint={2502.10173},
archivePrefix={arXiv},
primaryClass={q-bio.BM},
url={https://arxiv.org/abs/2502.10173},
}
Our implementation is inspired by the imagen-pytorch repository by Phil Wang.