Despite much interest in expanding chemical space, diverse, billion-molecule libraries remain inaccessible. In principle, docking a virtual library could access some of this missing chemical space. Until now, this idea has been vitiated by two key problems: (1) predicting readily synthesized molecules has been challenging without resorting to strategies that collapse diversity; and (2) docking is notoriously inaccurate. Two recent advances have made virtual library docking screens seem less fanciful. First, our collaborators at Enamine, a widely used fine chemicals supplier, have defined a 0.7 billion-molecule make-on-demand library based on >100 reactions that they have under good control; >650,000 of these molecules have been successfully synthesized. Second, while docking retains serious errors, it has made pragmatic progress and has found genuinely novel ligands for >100 targets.
The specific aims are:
Aim 1. A robust, searchable, and dockable database of 3 billion diverse lead-like molecules. We will A. Enumerate 3 billion vetted products from two- and three-component reactions. B. Measure the diversity and novelty of this library and how they differ from those of the world's in-stock molecules. C. Develop a community-accessible database and chemoinformatics infrastructure that can store, similarity-search, and rapidly retrieve molecules from this library. D. Convert these molecules into biologically relevant 3D forms, enumerating low-energy conformers and calculating partial atomic charges, van der Waals parameters, solvation energies, and other parameters for all library molecules, enabling their use in docking screens.
Aim 2. Dock and experimentally test the library against two targets. A. Screen the library against the dopamine D4 and kappa-opioid receptors, seeking novel ligands. We will test 250 to 500 library molecules per screen, itself a 10-fold increase. A key question is whether we find novel, potent ligands or are overwhelmed by false positives. B. As the library grows, do we continue to find ever more novel, in some sense ever more perfect, high-affinity ligands, or does discovery saturate? C. How does hit rate vary with docking score? Because we will be testing hundreds of molecules, we can afford to investigate not only those with the highest docking ranks but also those with mediocre and poor ranks. This has not been explored previously, certainly not at scale. If successful, this project will increase the number of molecules available to the community by 1000-fold and demonstrate their utility for ligand discovery. Extensive preliminary results support its feasibility.
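The question in Aim 2C, how hit rate varies with docking score, can be illustrated with a minimal sketch that groups tested molecules into rank bins and reports the fraction of confirmed hits in each. This is an illustration only, not the proposal's analysis method: the function name `hit_rate_by_rank_bin`, the bin edges, and the toy data are all hypothetical.

```python
from collections import defaultdict


def hit_rate_by_rank_bin(results, bin_edges):
    """Return {bin_upper_edge: hit_rate} for tested molecules.

    results   -- iterable of (docking_rank, is_hit) pairs (hypothetical format)
    bin_edges -- ascending, inclusive upper bounds on docking rank;
                 ranks above the last edge are ignored
    """
    tested = defaultdict(int)
    hits = defaultdict(int)
    for rank, is_hit in results:
        # Assign the molecule to the first bin whose upper edge covers its rank.
        for edge in bin_edges:
            if rank <= edge:
                tested[edge] += 1
                hits[edge] += int(is_hit)
                break
    # Report only bins in which at least one molecule was tested.
    return {edge: hits[edge] / tested[edge] for edge in bin_edges if tested[edge]}


# Toy, made-up data: (docking rank, confirmed active in the assay?)
toy_results = [
    (10, True), (50, True), (80, False),   # top-ranked molecules
    (500, True), (900, False),             # mediocre ranks
    (5000, False), (8000, False),          # poor ranks
]
rates = hit_rate_by_rank_bin(toy_results, bin_edges=[100, 1000, 10000])
for edge, rate in rates.items():
    print(f"rank <= {edge}: hit rate {rate:.2f}")
```

Testing molecules from mediocre and poor rank bins, as well as the top bin, is what makes such a curve measurable; with only top-ranked molecules tested, every lower bin would be empty.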

Public Health Relevance

This proposal develops a method to discover new-to-the-planet small molecules to modulate biology. It leverages recent innovations in low-cost chemical synthesis and molecular docking methods to screen orders of magnitude more chemicals in the computer, testing these predictions against two important targets: opioid and dopamine receptors. A practical outcome will be new tools that enable the community to find new compounds for their own targets.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Lyster, Peter
University of California San Francisco
Schools of Pharmacy
San Francisco
United States