Workflow
├── docker-compose.yml
├── Dockerfile
├── image
│ └── workflow.png
├── README.md
├── requirements.txt
└── src
├── config.json
├── dataset
│ ├── Inhibitor
│ │ ├── testing.csv
│ │ └── training.csv
│ └── sglt2
│ └── sglt2.csv
├── logs
├── machine_model.py
├── main.py
├── ML_logs
├── model_save_path_reward
├── model_save_path_zinc20
│ └── checkpoint-4468000
├── output
│ └── reward_epoch
├── reward.py
├── trainer.py
├── visualize.ipynb
└── zinc20M_gpt2_tokenizer
17 directories, 36 files
CompoundGPT currently supports Python 3.10 or later.
To install PyTorch, visit the official PyTorch website and follow the instructions to install the PyTorch build that matches your CUDA version.
Requirements can be installed with pip:
pip install -r requirements.txt
or with conda:
conda install --file requirements.txt
If you want to install CompoundGPT using Docker, we have provided an image for your use.
First, check that nvidia-docker (the NVIDIA Container Toolkit) is installed for GPU support; if it is not, please visit the NVIDIA website.
Build the image
docker build -t compound:py310-torch211-cuda121 .
Start the container
docker compose up -d
Enter the container
docker exec -it compound bash
Format the Data: Prepare your dataset in the specified CSV format and place it under the /src/dataset/ directory. The expected format for the CSV should be as follows:
smiles, label
For an example of the file format, refer to the provided sample:
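As an illustration, a minimal check that a dataset file follows the `smiles,label` layout could look like this (the `check_dataset` helper is an assumption for the example, not part of the repository):

```python
import csv

def check_dataset(path):
    """Verify that a CSV file starts with 'smiles,label' columns and has rows."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fields = [name.strip() for name in (reader.fieldnames or [])]
        if fields[:2] != ["smiles", "label"]:
            raise ValueError(f"expected header 'smiles,label', got {fields}")
        rows = list(reader)
        if not rows:
            raise ValueError("dataset is empty")
        return len(rows)  # number of data rows
```

Running such a check before training can catch a misnamed column early instead of failing mid-run.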
Update Config File: Once your dataset is prepared and placed in the correct directory, navigate to the config.json file. Update the path in the config file to point to your newly processed dataset.
"train_data_path": "./dataset/your_path"
"kinase_name": "Name"
python machine_model.py
After the expert system finishes training, its performance will be displayed.
Model Evaluation:
Sensitivity (Sn): 1.000
Specificity (Sp): 0.997
Accuracy (Acc): 0.999
Matthews Correlation Coefficient (MCC): 0.997
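For reference, the four metrics reported above can be computed from a binary confusion matrix as follows (a standalone sketch; machine_model.py may compute them differently internally):

```python
import math

def evaluate(tp, fp, tn, fn):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return sn, sp, acc, mcc
```

MCC is the most informative of the four on imbalanced data, since it only reaches 1.0 when both classes are predicted well.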
After completing the initial training of your model, the next step is to fine-tune your LLM.
Execute the following command to start the fine-tuning process.
CUDA_VISIBLE_DEVICES=1 python reward.py
During the training process, the model checkpoints for each epoch are saved in a specific directory. Here is how the saving mechanism is set up:
- Model Checkpoints : /src/model_save_path_reward
- Training Outputs : /src/output/reward_epoch
You can inspect the generated outputs and compare the differences between epochs in these two locations.
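To see which epoch checkpoints have been written so far, a quick directory scan is enough (the `list_checkpoints` helper below is an illustration, not part of the codebase):

```python
import os

def list_checkpoints(save_dir):
    """Return checkpoint subdirectories of save_dir, sorted by name."""
    if not os.path.isdir(save_dir):
        return []
    return sorted(
        name for name in os.listdir(save_dir)
        if os.path.isdir(os.path.join(save_dir, name))
    )
```

Pointing it at /src/model_save_path_reward would list one entry per saved epoch, which is handy for picking a checkpoint to evaluate.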