FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking
- Science: Built-in safeguards might stop AI from designing bioweapons
- Nature Biotechnology: Watermarking generative AI for protein structure
- Princeton AI Lab: Deep Dive Series: Building Biosecurity Safeguards into AI for Science
We've created an interactive demo on Hugging Face Spaces where you can:
- Input protein sequences and get watermarked structure predictions
- Compare watermarked vs. non-watermarked structures
- Visualize the differences in 3D
- Pretrained Checkpoints and Inference code
FoldMark is a first-of-its-kind watermarking strategy that provides biosecurity safeguards for generative protein models against dual-use risks. Key features:
- Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
- High Bit Accuracy: Achieves over 95% bit accuracy with 32-bit watermarks while preserving structural quality (self-consistency TM-scores, scTM, above 0.9).
- Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
- Robust User Tracing: Traces a generated protein back to its source among up to 1 million users.
- Wet Lab Validated: Successfully tested on redesigned EGFP and CRISPR-Cas13, which showed wildtype-level function (98% fluorescence, 95% editing efficiency) and >90% watermark detection, proving its practical utility.
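The bit-accuracy and user-tracing figures above can be made concrete with a small sketch. This is not FoldMark's decoder, and it rests on hypothetical assumptions: each user is assigned a 32-bit codeword, a decoder (not shown) recovers 32 bits from a generated structure, bit accuracy is the fraction of matching bits, and tracing returns the registered user at the smallest Hamming distance, or nobody if even the closest codeword is too far away. The names `hamming`, `bit_accuracy`, `trace_user`, and the `max_distance` threshold are illustrative only. For scale, 32 bits allow 2^32 (about 4.3 billion) distinct codewords, while telling 1 million users apart needs at least 20 bits, since 2^20 = 1,048,576.

```python
from typing import Dict, Optional

NUM_BITS = 32  # watermark length quoted in the feature list


def hamming(a: int, b: int, num_bits: int = NUM_BITS) -> int:
    """Number of differing bits among the lowest num_bits bits of a and b."""
    mask = (1 << num_bits) - 1
    return bin((a ^ b) & mask).count("1")


def bit_accuracy(embedded: int, decoded: int, num_bits: int = NUM_BITS) -> float:
    """Fraction of watermark bits recovered correctly."""
    return 1.0 - hamming(embedded, decoded, num_bits) / num_bits


def trace_user(decoded: int, user_codes: Dict[str, int], max_distance: int = 4,
               num_bits: int = NUM_BITS) -> Optional[str]:
    """Hypothetical tracing rule: nearest codeword by Hamming distance, or None if none is within max_distance."""
    best_user, best_dist = None, num_bits + 1
    for user, code in user_codes.items():
        dist = hamming(code, decoded, num_bits)
        if dist < best_dist:
            best_user, best_dist = user, dist
    return best_user if best_dist <= max_distance else None


users = {"alice": 0x00000001, "bob": 0xFFFF0000}
decoded = 0xFFFF0003  # bob's codeword with its two lowest bits flipped
print(bit_accuracy(users["bob"], decoded))  # 0.9375 (30 of 32 bits correct)
print(trace_user(decoded, users))  # bob
```

The linear scan keeps the example short; with a million registered users, an indexed nearest-neighbour search over the codewords would be the more practical choice.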
```bash
# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm
# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
# Install local package
pip install -e .
```
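The torch-scatter wheel index above targets torch 2.0.0 built for CUDA 11.7, so it can help to confirm what actually got installed. This is a standard-library sketch, not a script shipped with FoldMark: it looks up the pip distribution names used in the commands above and, if torch imports, reports the CUDA version torch was built with. The local package from `pip install -e .` is not checked because its distribution name is not given here; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# pip distribution names taken from the install commands above
PACKAGES = ("torch", "torch-scatter")


def installed_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
    try:
        import torch
        # torch.version.cuda is None for CPU-only builds
        print("torch CUDA build:", torch.version.cuda, "| GPU available:", torch.cuda.is_available())
    except ImportError:
        print("torch could not be imported")
```

Run it inside the activated `fm` environment; a torch version other than 2.0.x or a CUDA build other than 11.7 suggests the torch-scatter wheel URL should be adjusted to match.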
- Download the preprocessed SCOPe dataset (~280 MB): Download Link
- Extract the data:
```bash
tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz
```
- Pretrain the model:
```bash
python -W ignore experiments/pretrain.py
```
- Finetune with watermarking:
```bash
python -W ignore experiments/finetune.py
```
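The data and training steps above run in a fixed order: extract the archive, pretrain, then finetune with watermarking, which presumably starts from the pretrained model. The following is a hypothetical driver, not part of the repository, that reproduces those steps from Python. It assumes it is launched from the repository root inside the `fm` environment and passes no configuration overrides, since the scripts' options are not documented here. The path-traversal check during extraction is an added precaution; `-W ignore` simply suppresses Python warnings, as in the original commands.

```python
import os
import subprocess
import sys
import tarfile
from pathlib import Path
from typing import List, Sequence

# Training stages in the order listed above: pretraining first, then watermark finetuning.
STAGES = ("experiments/pretrain.py", "experiments/finetune.py")


def extract_and_remove(archive: str, dest: str = ".") -> List[str]:
    """Extract a .tar.gz into dest, then delete it (like `tar -xvzf` followed by `rm`)."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would land outside dest (path traversal).
        for member in members:
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the stdlib "data" extraction filter where this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_path, **extra)
    os.remove(archive)
    return [member.name for member in members]


def stage_command(script: str) -> List[str]:
    """Equivalent of `python -W ignore <script>` using the current interpreter."""
    return [sys.executable, "-W", "ignore", script]


def run_pipeline(stages: Sequence[str] = STAGES, dry_run: bool = False) -> List[List[str]]:
    """Run each stage in order; check=True raises CalledProcessError, so a failed pretrain stops the pipeline."""
    commands = [stage_command(script) for script in stages]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `extract_and_remove("preprocessed_scope.tar.gz")` followed by `run_pipeline()` mirrors the list above; `run_pipeline(dry_run=True)` only prints the commands without running them.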
If you find this work helpful, please cite our paper:
```bibtex
@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
```
We thank the following open-source projects for their valuable contributions:
This project is licensed under the MIT License - see the LICENSE file for details.