CodonMPNN is similar to ProteinMPNN, but it produces a codon sequence instead of an amino acid sequence and can optionally be conditioned on a specific organism.
The current version has been trained on monomers from the AlphaFold database (AFDB).
Conda environment
Here is one version of a conda environment that works for running the code. Keep in mind that the following assumes that you have a CUDA-compatible GPU.
If that is not the case, follow the PyTorch installation instructions here instead: https://pytorch.org/get-started/locally/
A toy dataset for running the code is provided in afdb_small. CodonMPNN was trained on an MMseqs-clustered version of AFDB.
Here is a link to download the .csv file for the whole training, validation, and test data.
It contains AFDB ids for representatives of MMseqs clusters with a maximum of 30% sequence identity. The train, val, and test split is a random split.
Unfortunately, I cannot upload all the .pdb files somewhere due to the dataset size. You can download the .pdb files from AFDB using their API and code similar to this:
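import requests

# afdb_id must be set to one AFDB id taken from the .csv file
url = f"https://alphafold.ebi.ac.uk/files/{afdb_id}.pdb"
response = requests.get(url)
if response.status_code == 200:
    # Write the raw response bytes to <afdb_id>.pdb in the current directory
    with open(f'{afdb_id}.pdb', 'wb') as pdb_file:
        pdb_file.write(response.content)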
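
To fetch many structures, the single-file download above can be wrapped in a loop over the ids in the .csv file. The following is only a rough sketch under assumptions, not the code used to build the dataset: the column name "afdb_id", the output directory, and the helper names read_ids, fetch_pdb, and download_all are all hypothetical, and it uses urllib from the Python standard library rather than requests so that it has no extra dependency. Check the header of the .csv file and adjust the column name accordingly.

```python
import csv
import urllib.request
from pathlib import Path

# Same URL pattern as in the single-file example
AFDB_URL = "https://alphafold.ebi.ac.uk/files/{afdb_id}.pdb"


def read_ids(csv_path, column="afdb_id"):
    """Return all values of one column. The column name is an assumption."""
    with open(csv_path, newline="") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def fetch_pdb(afdb_id, timeout=30):
    """Return the .pdb bytes for one id, or None if the request fails."""
    url = AFDB_URL.format(afdb_id=afdb_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # Covers HTTP errors (e.g. 404), connection errors, and timeouts
        return None


def download_all(ids, out_dir, fetch=fetch_pdb):
    """Save each id as <out_dir>/<id>.pdb; return the ids that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for afdb_id in ids:
        target = out_dir / f"{afdb_id}.pdb"
        if target.exists():
            continue  # already downloaded, so an interrupted run can resume
        content = fetch(afdb_id)
        if content is None:
            failed.append(afdb_id)
        else:
            target.write_bytes(content)
    return failed
```

A call such as download_all(read_ids("splits.csv"), "pdbs") would then fill the pdbs directory, where "splits.csv" is a placeholder for the downloaded .csv file. The returned list of failed ids can be passed to download_all again to retry them.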