FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking

📰 Media Coverage

Science: Built-in safeguards might stop AI from designing bioweapons
Nature Biotechnology: Watermarking generative AI for protein structure
Princeton AI Lab: Deep Dive Series: Building Biosecurity Safeguards into AI for Science

🌟 Try Our Demo!

We've created an interactive demo on Hugging Face Spaces where you can:

Input protein sequences and get watermarked structure predictions
Compare watermarked vs. non-watermarked structures
Visualize the differences in 3D
Access pretrained checkpoints and inference code

Try the Demo →

🚀 Overview

FoldMark is a first-of-its-kind watermarking strategy designed to provide essential biosecurity safeguards for generative protein models against dual-use risks. It:

Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
High Bit Accuracy: Achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural integrity (maintaining >0.9 scTM scores).
Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
Robust User Tracing: Traces a generated protein back to its source among up to 1 million users.
Wet Lab Validated: Redesigned EGFP and CRISPR-Cas13 showed wildtype-level function (98% fluorescence and 95% editing efficiency, respectively) with >90% watermark detection, demonstrating practical utility.
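
To make the bit-accuracy and user-tracing figures above concrete, here is a minimal sketch of how the two quantities could be computed from a decoded 32-bit code. It is an illustration only and not FoldMark's actual decoder: the nearest-code matching rule, the random user registry, and the helper names `bit_accuracy`, `trace_user`, and `flip_bits` are all assumptions made for this example.

```python
import random

NUM_BITS = 32  # watermark length quoted in the overview


def bit_accuracy(embedded: int, decoded: int, num_bits: int = NUM_BITS) -> float:
    """Fraction of bit positions where the decoded code matches the embedded one."""
    mask = (1 << num_bits) - 1
    mismatches = bin((embedded ^ decoded) & mask).count("1")
    return 1.0 - mismatches / num_bits


def trace_user(decoded: int, user_codes: dict) -> str:
    """Return the user whose registered code agrees with the decoded bits most.

    Assumption: tracing is done by nearest-code (Hamming) matching.
    """
    return max(user_codes, key=lambda user: bit_accuracy(user_codes[user], decoded))


def flip_bits(code: int, positions) -> int:
    """Simulate decoding errors by flipping the given bit positions."""
    for pos in positions:
        code ^= 1 << pos
    return code


rng = random.Random(0)
# Hypothetical registry: each user gets a distinct random 32-bit code.
codes = rng.sample(range(1 << NUM_BITS), 1000)
user_codes = {f"user_{i}": code for i, code in enumerate(codes)}

# Flip one bit of user_42's code, leaving 31 of 32 bits correct.
noisy = flip_bits(user_codes["user_42"], [5])
print(bit_accuracy(user_codes["user_42"], noisy))  # 0.96875
print(trace_user(noisy, user_codes))
```

A 32-bit code has 2^32 (about 4.3 billion) possible values, so a registry of 1 million users occupies only a small fraction of the code space. That spacing is what lets nearest-code matching tolerate a few bit errors in this sketch.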

📊 Results

Structure Prediction with Watermarking

De Novo Protein Structure Design with Watermarking

🛠️ Installation

# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm
# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
# Install local package
pip install -e .
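
After the steps above, a quick check that the key dependencies resolve can save a failed training run. The sketch below assumes the import names `torch` and `torch_scatter`; the helper `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for PyTorch and the torch-scatter wheel installed above.
required = ["torch", "torch_scatter"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it inside the activated conda environment; an empty "missing" list only confirms the modules can be located, not that the CUDA build matches your driver.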

📊 Training Pipeline

Data Setup

Download preprocessed SCOPe dataset (~280MB): Download Link

Extract the data:

tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz

Training Steps

Pretrain the model:

python -W ignore experiments/pretrain.py

Finetune with watermarking:

python -W ignore experiments/finetune.py
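
The two stages can also be chained from one script so that fine-tuning starts only if pretraining exits cleanly. This is a minimal sketch, assuming both scripts run without extra arguments as in the commands above; `build_command` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script paths as given in the training steps above.
STAGES = ["experiments/pretrain.py", "experiments/finetune.py"]


def build_command(script, python=sys.executable):
    """Build the command line for one stage, keeping the -W ignore flag."""
    return [python, "-W", "ignore", script]


def run_pipeline(stages=STAGES, dry_run=False):
    """Run each stage in order; with dry_run=True only print the commands."""
    commands = [build_command(script) for script in stages]
    for command in commands:
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, which stops later stages.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)  # switch to dry_run=False to launch training
```

Run it from the repository root so the relative script paths resolve.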

🔬 Wet Lab Verifications on GFP and Cas13 Redesign

📝 Citation

If you find this work helpful, please cite our paper:

@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

🙏 Acknowledgments

We thank the following open-source projects for their valuable contributions:

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
