Workflow
├── docker-compose.yml
├── Dockerfile
├── image
│ └── workflow.png
├── README.md
├── requirements.txt
└── src
├── config.json
├── dataset
│ ├── Inhibitor
│ │ ├── testing.csv
│ │ └── training.csv
│ └── sglt2
│ └── sglt2.csv
├── logs
├── machine_model.py
├── main.py
├── ML_logs
├── model_save_path_reward
├── model_save_path_zinc20
│ └── checkpoint-4468000
├── output
│ └── reward_epoch
├── reward.py
├── trainer.py
├── visualize.ipynb
└── zinc20M_gpt2_tokenizer
17 directories, 36 files
CompoundGPT currently supports Python 3.10 or later.
To install PyTorch, visit the official PyTorch website and follow the instructions to install the PyTorch build that matches your CUDA version.
Requirements can be installed with pip:
pip install -r requirements.txt
or with conda:
conda install --file requirements.txt
If you want to install CompoundGPT using Docker, we have provided an image for your use.
First, check that nvidia-docker (the NVIDIA Container Toolkit) is installed for GPU support; if it is not, please visit the NVIDIA website.
Build the image
docker build -t compound:py310-torch211-cuda121 .
Start the container
docker compose up -d
Enter the container
docker exec -it compound bash
Format the Data: Prepare your dataset in the specified CSV format and place it under the /src/dataset/ directory. The expected format for the CSV should be as follows:
smiles, label
For an example of the file format, refer to the provided sample:
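As an illustration, a minimal check that a dataset file follows the `smiles,label` layout could look like this (the `check_dataset` helper is an assumption for the example, not part of the repository):

```python
import csv

def check_dataset(path):
    """Verify that a CSV file starts with 'smiles,label' columns and has rows."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fields = [name.strip() for name in (reader.fieldnames or [])]
        if fields[:2] != ["smiles", "label"]:
            raise ValueError(f"expected header 'smiles,label', got {fields}")
        rows = list(reader)
        if not rows:
            raise ValueError("dataset is empty")
        return len(rows)  # number of data rows
```

Running such a check before training can catch a misnamed column early instead of failing mid-run.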
Update Config File: Once your dataset is prepared and placed in the correct directory, navigate to the config.json file. Update the path in the config file to point to your newly processed dataset.
"train_data_path": "./dataset/your_path"
"kinase_name": "Name"
python machine_model.py
After the expert system finishes training, its performance will be displayed.
Model Evaluation:
Sensitivity (Sn): 1.000
Specificity (Sp): 0.997
Accuracy (Acc): 0.999
Matthews Correlation Coefficient (MCC): 0.997
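For reference, the four metrics reported above can be computed from a binary confusion matrix as follows (a standalone sketch; machine_model.py may compute them differently internally):

```python
import math

def evaluate(tp, fp, tn, fn):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return sn, sp, acc, mcc
```

MCC is the most informative of the four on imbalanced data, since it only reaches 1.0 when both classes are predicted well.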
After completing the initial training of your model, the next step is to fine-tune your LLM.
Execute the following command to start the fine-tuning process.
CUDA_VISIBLE_DEVICES=1 python reward.py
During the training process, the model checkpoints for each epoch are saved in a specific directory. Here is how the saving mechanism is set up:
- Model Checkpoints : /src/model_save_path_reward
- Training Outputs : /src/output/reward_epoch
You can inspect the generated outputs and compare the differences between epochs in these two locations.
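To see which epoch checkpoints have been written so far, a quick directory scan is enough (the `list_checkpoints` helper below is an illustration, not part of the codebase):

```python
import os

def list_checkpoints(save_dir):
    """Return checkpoint subdirectories of save_dir, sorted by name."""
    if not os.path.isdir(save_dir):
        return []
    return sorted(
        name for name in os.listdir(save_dir)
        if os.path.isdir(os.path.join(save_dir, name))
    )
```

Pointing it at /src/model_save_path_reward would list one entry per saved epoch, which is handy for picking a checkpoint to evaluate.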