High-throughput screening remains the most common method for identifying primary chemical hits in chemical biology and drug discovery projects. However, large-scale screens are often financially out of reach for many academic labs. The small screens of small chemical libraries that these labs can afford often identify few confirmed hits, and most of those hits are unsuitable for medicinal chemistry follow-up. There is a critical need for computational tools that can guide experimental screening at a reasonable cost while exploring many more compounds. This proposal will develop a computational platform that can be applied to any target or phenotypic system to select compounds for experimental screening from virtual chemical libraries containing a billion or more compounds. New machine learning and high-throughput computing methods will navigate this large chemical space strategically and efficiently, improving the quality of the identified compounds. Computationally guided screening can ultimately improve the chemical diversity, target specificity, and stability of the identified active compounds, accelerating and enabling drug discovery across NIH Institutes. The long-term goal of this research is to develop technologies that make chemical screening accessible to more research groups and enable drug discovery in any therapeutic area. Toward this goal, data science techniques can take advantage of existing public bioactivity data in PubChem to help prioritize chemical screening for a new target. In this technology development proposal, the researchers will 1) determine a general initial screening compound set, 2) develop a computationally guided iterative screening system, and 3) create and experimentally validate an end-to-end high-throughput computing workflow that directs drug discovery for any new target.
At each step, rigorous comparisons to experimental results and baseline models will ensure that the methods work as expected and offer capabilities absent from current state-of-the-art computational methods. The resulting computational platform will make chemical screening more efficient and lower its barriers. Ultimately, it will make screening for new compounds available to academic research groups in ways and in places where it was previously inaccessible.
The proposed project is broadly relevant to public health because it is a critical technological advance that will enable basic biomedical research across therapeutic areas. This effort creates a computational platform to reduce the overall cost and improve the quality of high-throughput screening in any chemical biology or drug discovery project. The platform will be rigorously compared with existing experimental techniques. Modern data science tools will lower the barriers to high-throughput screening and make screening for new compounds available to academic research groups in ways and in places where it was previously inaccessible.