cdkbook
Groovy Cheminformatics with the Chemistry Development Kit
Edition 2.11-2
Egon L. Willighagen PhD
Long-time CDK developer
© E.L. Willighagen 2011-2025
License: CC-BY-SA 4.0 International
Warning
This book is being open-sourced [1]. This involves transforming the LaTeX source into Markdown and updating all scripts to ensure all the automation works well. I have made good progress, but it will take some time to iron things out. If you find an issue, please report it here. If you like this book, please give the GitHub repository a star.
Most code snippets in this book are actually Groovy scripts, but this repository has some Jupyter notebook examples. If you want to know how any of those examples translates to Python, please file a request here.
Contents
- Introduction
- Writing CDK Applications
2.1. A (Very) Basic Java Application
2.2. Groovy
2.2.1. Closures
2.2.2. Grabbing dependencies
2.3. Python
2.4. Other environments
2.4.1. Bioclipse
2.4.2. Cinfony
2.4.3. R
- Cheminformatics
3.1. Molecular Representations
3.2. Chemical Graphs
3.3. Quantum Chemistry
3.4. Numerical Representations
3.5. Chemometrics
- Atoms, Bonds and Molecules
4.1. Atoms
4.1.1. IElement
4.1.2. IIsotope
4.1.3. IAtomType
4.1.4. Coordinates
4.2. Bonds
4.2.1. Electron counts
4.2.2. Bond stereochemistry
4.3. Molecules
4.3.1. Iterating over atoms and bonds
4.3.2. Neighboring atoms and bonds
4.4. Molecular Formula
4.5. Implicit and Explicit Hydrogens
4.6. Chemical Objects
4.7. Rings
- Stereochemistry
5.1. Stereochemistry in a flat world
5.2. Tetrahedral chirality
- Salts and other disconnected structures
6.1. Salts
6.2. Crystals
- Paired and unpaired electrons
7.1. Lone Pairs
7.2. Unpaired electrons
- Protein and DNA
8.1. Protein From File
8.2. Protein From Sequence
8.3. Strands and Monomers
- Reactions
9.1. A single reaction
9.2. Reaction from File
9.2.1. MDL RXN files
9.3. CMLReact files
- From IChemObject to IChemFile
10.1. IAtomContainerSet
10.2. IReactionSet and IRingSet
10.3. IChemModel
10.4. IChemSequence
10.5. IChemFile
- IChemObjectBuilders
11.1. Implementations
11.1.1. The Default Builder
11.1.2. The Debug Builder
11.1.3. The Silent Builder
- Input/Output
12.1. File Format Detection
12.1.1. Custom format matchers
12.2. Reading from Readers and InputStreams
12.2.1. Example: Downloading Domoic Acid from PubChem
12.3. Input Validation
12.3.1. Reading modes
12.3.2. Validation
12.4. Gzipped files
12.5. Iterating Readers
12.5.1. MDL SD files
12.5.2. PubChem Compounds XML files
12.6. Customizing the Output
12.6.1. Setting Properties
12.7. Example: creating unit tests for atom type perception
12.8. Line Notations
12.8.1. SMILES
12.9. Recipes
12.9.1. MDL molfile (V2000)
12.9.2. SDF files with properties
- Atom types
13.1. The CDK atom type model
13.1.1. Hybridization Types
13.2. Atom type perception
13.2.1. Single atoms
13.2.2. Full molecules
13.2.3. Configuring the Atom
13.2.4. No atom type perceived?!
13.3. Sybyl atom types
- Graph Properties
14.1. Partitioning
14.2. Spanning Tree
14.3. Ring counts
14.3.1. Smallest Rings
14.3.2. All Rings
14.4. Graph matrices
14.4.1. Adjacency matrix
14.4.2. Distance matrix
14.5. Atom Numbers
14.5.1. Morgan Atom Numbers
14.5.2. InChI Atom Numbers
- Missing Information
15.1. Element and Isotope information
15.1.1. Elements
15.1.2. Isotopes
15.2. Reconnecting Atoms
15.3. Missing Bond Orders
15.4. Missing Hydrogens
15.4.1. Implicit Hydrogens
15.4.2. Explicit Hydrogens
15.5. 2D Coordinates
15.6. Unknown Molecular Formula
- Depiction
16.1. Molecules
16.1.1. Scalable Vector Graphics
16.2. Background color
16.3. Coloring selections
16.4. Parameters
16.5. Reactions
- Substructure Searching
17.1. Exact Search
17.2. Substructures
17.3. Matching Substructures
17.4. SMARTS matching
17.4.1. Unique matches
17.5. Fingerprints
17.5.1. MACCS Fingerprints
17.5.2. ECFP and FCFP Fingerprints
- Molecular Properties
18.1. Molecular Mass
18.1.1. Implicit Hydrogens
18.2. LogP
18.3. Total Polar Surface Area
18.4. Van der Waals Volume
18.5. Aromaticity
- Molecular Descriptors
19.1. Descriptors and Specifications
19.1.1. IImplementationSpecification
19.2. IDescriptor
19.3. IMolecularDescriptor
19.4. IDescriptorResult
19.5. Counting Nitrogens and Oxygens
- InChI
20.1. Layers
20.1.1. Fixed Hydrogens
20.1.2. Stereoisomerism
- Chemistry Toolkit Rosetta
21.1. Heavy atom counts from an SD file
21.2. Depict a compound as an image
21.3. Working with SD tag data
- Migration
22.1. CDK 2.9 to 2.10
22.1.1. IChemObjectReaderErrorHandler
22.2. CDK 2.6 to 2.7
22.2.1. InChI functionality
22.3. CDK 2.0 to 2.3
22.4. CDK 1.4 to 2.0
22.4.1. Removed classes
22.4.2. Renamed classes and methods
22.4.3. Changed behavior
22.4.4. Constructors that now require a builder
22.4.5. Static methods that are no longer
22.4.6. IsotopeFactory
22.4.7. IFingerPrinter
22.4.8. SMILESGenerator
22.4.9. Aromaticity calculations
Appendix A
A.1 CDK Atom Types
A.2 Sybyl Atom Types
Appendix B
B.1 Isotope List
Appendix C
C.1 Molecular Descriptors
C.2 Atomic Descriptors
C.3 Atom-Pair Descriptors
C.4 Bond Descriptors
C.5 Protein Descriptors
Appendix D
D.1 Readers and Writers
References
- Willighagen E. Edition 1.4.1-0 of Groovy Cheminformatics with the Chemistry Development Kit. 2015. doi:10.6084/M9.FIGSHARE.2057790.V1 (Scholia)