MolScore: A scoring, evaluation and benchmarking framework for de novo drug design

Overview

MolScore contains code to score de novo compounds in the context of generative de novo design by generative models via the subpackage molscore, as well as, facilitate downstream evaluation via the subpackage moleval. An objective is defined via a JSON file which can be shared to propose new multi-parameter objectives for drug design. MolScore can be used in several ways:

To implement a multi-parameter objective to for prospective drug design.
To benchmark objectives/generative models/optimization using benchmark mode (MolScoreBenchmark).
To implement a sequence of objectives using curriculum mode (MolScoreCurriculum).

Contributions and/or ideas for added functionality are welcomed!

Installation

Install MolScore with PyPI (recommended):

pip install molscore --upgrade

or directly from GitHub:

git clone https://github.com/MorganCThomas/MolScore.git
cd MolScore ; pip install -e .

Note: I recommend mamba for environment handling

Scoring

Simplest integration of MolScore requires a config file, for example:

from molscore import MolScore
ms = MolScore(
    model_name='test',
    task_config='molscore/configs/QED.json',
    budget=10000
)
while not ms.finished:
    scores = ms.score(SMILES)

Note: see tutorial for more detail

A GUI exists to help write the config file, which can be run with the following command.

molscore_config

Note: see tutorial for more detail

Scoring functionality

Scoring functions

Descriptors: RDKit, Maximum consecutive rotatable bonds, Penalized LogP, LinkerDescriptors (Fragment linking),
- MolSkill: Extracting medicinal chemistry intuition via preference machine learning as available on Nature Communications.
Synthesizability: RAscore, AiZynthFinder, SAscore, ReactionFilters (Scaffold decoration)
2D Similarity: Fingerprint similarity (any RDKit fingerprint and similarity measure), substructure match/filter, Applicability domain
3D Similarity: ROCS, Open3DAlign
QSAR: Scikit-learn (classification/regression), ChemProp
- PIDGINv5: Pre-trained RF classifiers for ~2,300 ChEMBL31 targets at different activity thresholds of 0.1 uM, 1 uM, 10 uM & 100 uM.
- ADMET-AI: Pre-trained predictive models of various ADMET endpoints.
Docking: Glide^a, Smina, OpenEye^a, GOLD^a, PLANTS, rDock, Vina, Gnina
- Ligand preparation: RDKit->Epik, Moka->Corina, Ligprep, Gypsum-DL

^a Requires a license

Transformation functions (transform values to [0-1])

Linear
Linear threshold
Step
Step threshold
Gaussian

Aggregation functions (combine multiple scores into 1)

Arithmetic mean
Geometric mean
Weighted sum
Weighted product
Auto-weighted sum/product
Pareto front

Filters (applied to final aggregated score)

Any scoring function as a filter
Diversity filters
- Unique
- Occurence
- Memory assisted
  - ScaffoldSimilarityECFP

Benchmarking

Benchmarks are lists of objectives (configuration files) with metrics calculated upon exit. Re-implementations of existing benchmarks are available as presets.

from molscore import MolScoreBenchmark
msb = MolScoreBenchmark(
    model_name='test',
    output_dir='./',
    benchmark='GuacaMol',
    budget=10000
)
for task in msb:
    while not task.finished:
        scores = task.score(SMILES)

Current benchmarks available include: GuacaMol, GuacaMol_Scaffold, MolOpt, 5HT2A_PhysChem, 5HT2A_Selectivity, '5HT2A_Docking', LibINVENT Exp1&3, MolExp(L)

Note: inspect preset benchmarks with MolScoreBenchmark.presets.keys()

Note: see tutorial for more detail

Evaluation

The moleval subpackage can be used to calculate metrics for an arbitrary set of molecules.

from moleval.metrics.metrics import GetMetrics
MetricEngine = GetMetrics(
    test=TEST_SMILES, # Model training data subset
    train=TRAIN_SMILES, # Model training data
    target=TARGET_SMILES, # Exemplary target data
)
metrics = MetricEngine.calculate(
    GEN_SMILES, # Generated data
)

Note: see tutorial for more detail

Metrics available

Intrinsice metrics (generated molecules only)

Validity, Uniqueness, Scaffold uniqueness, Internal diversity (1 & 2), Scaffold diversity
Sphere exclusion diversity: Measure of chemical space coverage at a specific Tanimoto similarity threshold. I.e., A score 0.5 indicates 50% of the sample size sufficiently describes the chemical space, therefore the higher the metric the more diverse the sample. Also see here
Solow Polasky diversity
Functional group diversity
Ring system diversity
Filters: Passing of a set of drug-like filters (MolWt, Rotatable bonds, LogP etc.), Medicinal Chemistry substructures and PAINS substructures.
Purchasability: Molbloom prediction of presence in ZINC20

Extrinsic metrics (comparison to reference molecules)

Novelty
FCD
Analogue similarity: Proportion of generated molecules that are analogues to molecules in reference data.
Analogue coverage: Proportion of reference data that are analogues to generated data.
Functional group similarity
Ring system similarity
Single nearest neighbour similarity
Fragment similarity
Scaffold similarity
Outlier bits (Silliness): Average proportion of fingerprint bits (atomic environments) present in a generated molecule, not present anywhere in the reference data. The lower the silliness the better.
Wasserstein distance (LogP, SA Score, NP score, QED, Weight)

Additional functionality

Curriculum learning (see tutorial)
Experience replay buffers (see tutorial)
Parallelisation (see tutorial)
A GUI for monitoring generated molecules (see below)

molscore_monitor

Citation & Publications

If you use this software, please cite it as below.

@article{thomas2024molscore,
title={MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design},
author={Thomas, Morgan and O’Boyle, Noel M and Bender, Andreas and De Graaf, Chris},
journal={Journal of Cheminformatics},
volume={16},
year={2024},
publisher={BMC}
}

This software was also utilised in the following publications:

Thomas, M., Smith, R.T., O’Boyle, N.M. et al. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J Cheminform 13, 39 (2021). https://doi.org/10.1186/s13321-021-00516-0
Thomas M, O'Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 14, 68 (2022). https://doi.org/10.1186/s13321-022-00646-z
Handa K, Thomas M, Kageyama M, Iijima T, Bender A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J Cheminform 15, 112 (2023). https://doi.org/10.1186/s13321-023-00781-1
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 16, 77 (2024). https://doi.org/10.1186/s13321-024-00866-5
Bou A, Thomas M, Dittert S, Ramírez CN, Majewski M, Wang Y, Patel S, Tresadern G, Ahmad M, Moens V, Sherman W. ACEGEN: Reinforcement learning of generative chemical agents for drug discovery. J Chem Inf Model 64, 15 (2024). https://doi.org/10.1021/acs.jcim.4c00895
Thomas M, Matricon PG, Gillespie RJ, Napiórkowska M, Neale H, Mason JS, Brown J, Fieldhouse C, Swain NA, Geng T, O'Boyle NM. Identification of nanomolar adenosine A2A receptor ligands using reinforcement learning and structure-based drug design. Nat Commun 16, 5485 (2025). https://doi.org/10.1038/s41467-025-60629-0

Name		Name	Last commit message	Last commit date
Latest commit History 867 Commits
datasets/molexp		datasets/molexp
moleval		moleval
molscore		molscore
tutorials		tutorials
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.ruff.toml		.ruff.toml
CITATION.rff		CITATION.rff
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
environment.yml		environment.yml
environment_notorch.yml		environment_notorch.yml
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

MolScore: A scoring, evaluation and benchmarking framework for de novo drug design

Overview

Installation

Scoring

Benchmarking

Evaluation

Additional functionality

Citation & Publications

About

Uh oh!

Releases 25

Packages

Contributors 3

Uh oh!

Languages

License

MorganCThomas/MolScore

Folders and files

Latest commit

History

Repository files navigation

MolScore: A scoring, evaluation and benchmarking framework for de novo drug design

Overview

Installation

Scoring

Benchmarking

Evaluation

Additional functionality

Citation & Publications

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 25

Packages 0

Contributors 3

Uh oh!

Languages

Packages