THRUST 1
In MMLI 2.0, the overall goal of Thrust 1 is to develop foundational AI agents for discovery and synthesis of functional molecules, with a focus on three key areas:
- A function- and synthesis-aware modular chemical language model (mCLM) that predicts new molecules with emergent functions derived from their modules, with all AI-generated molecules accessible to non-specialists via automated assembly.
- Knowledge-augmented theme-specific LLMs (themeLLM) for hypothesis generation and experimental design.
- Innovative AI agents with critical thinking to enable the development of other frontier AI tools, such as generative AI, within Thrusts 2–4.
THRUST 2
The researchers in Thrust 2 focus on AI-enabled catalyst discovery and development. Led by Prof. Scott Denmark, this thrust aims to pressure test the foundational AI agents described in Thrust 1, using catalyst discovery and development as a testbed to drive further development of foundational AI tools. In addition, we will develop new generative AI models for catalyst design, focusing on chemical and enzymatic catalysts with new properties or functions.
THRUST 3
The overarching goal of Thrust 3 is to pressure test and advance the foundational AI agents developed in Thrust 1 by applying them to drug discovery and synthesis. Drug discovery represents an ideal testbed because it requires simultaneous reasoning over chemical function, synthetic feasibility, and emergent molecular properties.
In this thrust, we will deploy the modular Chemical Language Model (mCLM) to discover new kinase inhibitors (KIs) and validate that the mCLM can reason over function-infused molecular modules that are compatible with automated modular synthesis. This approach complements other CLMs that do not account for synthesis compatibility a priori [1–10] and other modular discovery strategies such as DNA-encoded libraries (which do not consider function a priori) and fragment-based drug discovery (which does not consider iterative automated synthesis a priori).
Additionally, we will use AI-enabled synthesis planning tools to design and experimentally validate highly efficient chemoenzymatic routes to three FDA-approved drugs. Finally, we will develop new LLM-enabled models to generate robust retrosynthetic pathways, experimental procedures, and predictions of enzyme selectivity.
THRUST 4
The overall goal of Thrust 4 is to pressure test the proposed paradigm of closed-loop experimentation guided by critical-thinking AI, using materials discovery and understanding as a testbed to drive further development of the foundational AI tools outlined in Thrust 1.
In Phase I, we established a Closed-Loop Transfer (CLT) paradigm that transfers from the closed-loop discovery regime into the hypothesis-driven discovery regime to yield new knowledge of how molecular structure encodes photostability in light-harvesting small molecules, a crucial bottleneck impeding solar cell technologies. Key to the success we achieved in our first five years is a modular chemical synthesis platform that is friendly to both automation and AI. By putting synthesis considerations at the beginning, rather than at the end, of the AI-guided discovery process, we have eliminated the longstanding synthesis bottleneck that previously precluded the robust integration of AI with C-C bond-based materials discovery. Although some recent work has demonstrated closed-loop paradigms in molecular discovery, including work from some of our labs partially funded by NSF-MMLI, none, prior to our recently published work, has discovered new chemical knowledge. Central to the success of CLT in making this leap to new knowledge discovery is the fusion of physical modeling with AI.
In Phase II, CLT will take on a radically new dimension by pairing with a multimodal AI agent (Thrust 1) that can autonomously propose hypotheses, request data from closed-loop experimentation to test those hypotheses, learn from the results, and ultimately deliver new knowledge that does not exist in the literature (CLT 2.0). In this way, our work will advance multimodal language models from the “undergraduate” to the “graduate” level by infusing critical thinking and learning through closed-loop experimentation.