EvoLlama

This is the official repository for the paper EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations.

[Dataset] | [Model] | [Preprint]

Quickstart

Environment Setup

We recommend Python >= 3.9. Install the required packages with pip:

```bash
pip install -r requirements.txt
```

Download Model Weights

EvoLlama model weights are publicly available on 🤗 Hugging Face. The Llama-3 parameters are frozen during training, so to initialize EvoLlama you need to download the LLM weights separately from meta-llama/Meta-Llama-3-8B-Instruct.

For projection-tuned EvoLlama, only the projection layers are trainable. You therefore also need to download the structure encoder weights (ProteinMPNN or GearNet) and the ESM-2 weights separately.

The table below summarizes the EvoLlama model family and links to the corresponding model weights on 🤗 Hugging Face.

| Model | Training Stage | Dataset | PDB Structures From | Weights |
| --- | --- | --- | --- | --- |
| EvoLlama (ProteinMPNN + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | Download |
| EvoLlama (ProteinMPNN + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | Download |
| EvoLlama (GearNet + ESM-2) | Projection Tuning | SwissProt | AlphaFold-2 | Download |
| EvoLlama (GearNet + ESM-2) | Supervised Fine-tuning | PMol + PEER | ESMFold | Download |
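Because the EvoLlama checkpoint and the LLM weights come from separate downloads, it can help to check the directory layout before calling init_evo_llama(). The sketch below is a hypothetical helper, not part of this repository. It assumes the relative paths used in the Inference example (structure_encoder_weights, sequence_encoder, projection_weights.bin), and it only heuristically checks the LLM directory by looking for the config.json file that Hugging Face checkpoints ship with.

```python
import os
from typing import List

# Assumption: relative paths copied from the init_evo_llama() call in the
# Inference example; adjust them if your checkpoint is organized differently.
EXPECTED_ENTRIES = {
    "structure_encoder_weights": "structure encoder (ProteinMPNN or GearNet)",
    "sequence_encoder": "sequence encoder (ESM-2)",
    "projection_weights.bin": "projection layers",
}


def missing_weight_entries(model_weights_path: str) -> List[str]:
    """Return the expected files/directories that are absent (hypothetical helper)."""
    missing = []
    for name, description in EXPECTED_ENTRIES.items():
        # os.path.exists accepts both files and directories.
        if not os.path.exists(os.path.join(model_weights_path, name)):
            missing.append(f"{name} ({description})")
    return missing


def llm_weights_look_present(llm_weights_path: str) -> bool:
    """Heuristic check: a Hugging Face checkpoint directory contains config.json."""
    return os.path.isfile(os.path.join(llm_weights_path, "config.json"))
```

For example, `missing_weight_entries('/path/to/EvoLlama')` returns an empty list when all three entries are in place. Otherwise it names each missing component, so a wrong path fails early instead of inside model initialization.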

Inference

Helper functions for initializing EvoLlama and generating responses are provided in src/infer/infer.py. The function infer() takes a list of lists of PDB files, a list of lists of sequences, and a list of arbitrary prompts as inputs. To use EvoLlama without a structure or sequence encoder, set the corresponding parameter to None.

```python
import os
from src.infer.infer import init_evo_llama, infer

# 1. Initialize EvoLlama
model_weights_path = '/path/to/EvoLlama'
llm_weights_path = '/path/to/llm'  # meta-llama/Meta-Llama-3-8B-Instruct
evo_llama = init_evo_llama(
    structure_encoder_path=os.path.join(model_weights_path, 'structure_encoder_weights'),
    structure_encoder_name='ProteinMPNN',
    sequence_encoder_path=os.path.join(model_weights_path, 'sequence_encoder'),
    llm_path=llm_weights_path,
    projection_path=os.path.join(model_weights_path, 'projection_weights.bin'),
    projection_fusion=True,
    is_inference=True
)

# 2. Inference with EvoLlama
pdb_files = ['examples/ea91f233142ab1a17749be765a461255.pdb']  # We use the MD5 hash of the protein sequence as the filename.
sequences = ['MANHKSTQKSIRQDQKRNLINKSRKSNVKTFLKRVTLAINAGDKKVASEALSAAHSKLAKAANKGIYKLNTVSRKVSRLSRKIKQLEDKI']
prompt = 'Analyze the given amino acid sequence, and determine the function of the resulting protein, its subcellular localization, and any biological processes it may be part of.'
# Each prompt is paired with one list of PDB files and one list of sequences.
responses = infer(evo_llama, [pdb_files], [sequences], [prompt])
```
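When batching several prompts, the nested-list arguments of infer() can be built programmatically. The sketch below is a hypothetical helper, not part of this repository. It assumes PDB files are named with the MD5 hex digest of the raw sequence string (UTF-8, no whitespace), following the comment in the example above. It also assumes that passing None for the PDB lists is how the "set the corresponding parameter to None" note applies to infer() when no structure encoder is used. Check src/infer/infer.py before relying on either assumption.

```python
import hashlib
import os
from typing import List, Optional, Sequence, Tuple


def pdb_filename(sequence: str) -> str:
    """Name a PDB file after the MD5 hex digest of its amino acid sequence."""
    # Assumption: the sequence string is hashed as-is, encoded as UTF-8.
    return hashlib.md5(sequence.encode("utf-8")).hexdigest() + ".pdb"


def build_infer_inputs(
    examples: Sequence[Tuple[List[str], str]],
    pdb_dir: Optional[str] = "examples",
) -> Tuple[Optional[List[List[str]]], List[List[str]], List[str]]:
    """Turn (sequences, prompt) pairs into the parallel lists passed to infer().

    Each example may hold several sequences; its PDB files are listed in the
    same order. With pdb_dir=None (no structure encoder) the PDB lists are
    replaced by None, which is an assumption about infer()'s interface.
    """
    pdb_batch: List[List[str]] = []
    seq_batch: List[List[str]] = []
    prompts: List[str] = []
    for sequences, prompt in examples:
        seq_batch.append(list(sequences))
        if pdb_dir is not None:
            pdb_batch.append([os.path.join(pdb_dir, pdb_filename(s)) for s in sequences])
        prompts.append(prompt)
    return (pdb_batch if pdb_dir is not None else None), seq_batch, prompts
```

With the variables from the example above, `pdb_batch, seq_batch, prompts = build_infer_inputs([(sequences, prompt)])` followed by `infer(evo_llama, pdb_batch, seq_batch, prompts)` reproduces the single-prompt call, provided the MD5 naming convention holds for the example file.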

To evaluate EvoLlama on the protein understanding and protein property prediction tasks, run scripts/eval_molinst.sh and scripts/eval_peer.sh, respectively:

```bash
# Evaluate EvoLlama on the protein understanding tasks.
bash scripts/eval_molinst.sh
# Evaluate EvoLlama on the protein property prediction tasks.
bash scripts/eval_peer.sh
```

Training

Coming soon ...

Citation

```bibtex
@misc{liu2024evollama,
    title={EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations},
    author={Nuowei Liu and Changzhi Sun and Tao Ji and Junfeng Tian and Jianxin Tang and Yuanbin Wu and Man Lan},
    year={2024},
    eprint={2412.11618},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11618},
}
```
