Collaborative Drug Discovery (CDD) proposes to develop technology that will vastly simplify and integrate all the processes required to exploit predictive models for drug discovery. The software will make it easy for scientists without specialized training in informatics to create, train, apply, evaluate, share, and archive models with minimal effort, and to leverage a large library of pre-computed models with no effort at all. The software will also enable scientists working in different organizations to collectively build models from their aggregated data and share these models without sharing the underlying training data. Our goal is to democratize the use of computational models in drug discovery, which has historically been restricted to computational experts, and to allow models to become routine aids to the discovery workflow in academia, foundations, government laboratories, and small companies that do not have the resources to employ them today. In Phase 2 we implemented modified Bayesian model building directly within CDD's web-based CDD Vault platform, which securely hosts structure-activity relationship (SAR) data; any user can now easily train a Bayesian model with experimental data stored in her private Vault, then apply the model to predict activity for untested compounds. In Phase 2B we propose to generalize this capability with the following new Specific Aims, which are needed to achieve a widespread scientific and commercial impact:
Aim 1: Integrate a suite of diverse computational techniques (such as QSAR, Neural Networks, Support Vector Machines, Random Forest, k-Nearest Neighbors, and possibly others) into a single framework, to allow direct side-by-side comparison.
Aim 2: Develop and validate a universal metric that ranks the predictive strength of each method as applied to a particular dataset.
Aim 3: Apply the metric to automatically generate thousands of models from high-quality, public-access structure-activity and ADME/Tox datasets and present key results to the user.
Aim 4: Develop a novel capability to build models collaboratively, by aggregating multiple datasets, and share the models without revealing the compounds and data in the training sets.
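
To make the side-by-side comparison in Aims 1 and 2 concrete, the following is a minimal sketch of ranking several methods on one dataset with a single score. It assumes ROC AUC as a stand-in metric; the abstract does not say what the proposed universal metric is. The function names, method names, labels, and prediction scores are all hypothetical illustrations and are not taken from CDD Vault.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a correct pair.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    correct = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                correct += 1.0
            elif a == i:
                correct += 0.5
    return correct / (len(actives) * len(inactives))


def rank_methods(
    labels: Sequence[int], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every method on the same held-out labels, best first."""
    scored = [(name, roc_auc(labels, s)) for name, s in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical held-out labels (1 = active, 0 = inactive) and the scores
# that three hypothetical methods assigned to the same six compounds.
held_out_labels = [1, 1, 0, 0, 1, 0]
held_out_predictions = {
    "bayesian": [0.9, 0.8, 0.3, 0.4, 0.7, 0.2],
    "random_forest": [0.6, 0.9, 0.5, 0.7, 0.4, 0.1],
    "knn": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

ranking = rank_methods(held_out_labels, held_out_predictions)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.3f}")
```

Because every method is scored against the same held-out compounds, the resulting numbers are directly comparable for that dataset. A real metric would also need to account for dataset size, class imbalance, and the validation scheme.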

Public Health Relevance

The proposed project will create novel computational tools that will help researchers to accelerate the discovery of new and improved drugs against a wide range of diseases. These tools will particularly benefit researchers working on diseases that leading pharmaceutical companies have largely ignored because they are not perceived as highly profitable opportunities, despite the fact that in many cases they afflict millions of people.

National Institutes of Health (NIH)
National Center for Advancing Translational Sciences (NCATS)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Colvis, Christine
Collaborative Drug Discovery, Inc.
United States
Lane, Thomas; Russo, Daniel P; Zorn, Kimberley M et al. (2018) Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 15:4346-4360
Ekins, Sean; Clark, Alex M; Dole, Krishna et al. (2018) Data Mining and Computational Modeling of High-Throughput Screening Datasets. Methods Mol Biol 1755:197-221
Ekins, Sean; Spektor, Anna Coulon; Clark, Alex M et al. (2017) Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today 22:555-565
Stratton, Thomas P; Perryman, Alexander L; Vilchèze, Catherine et al. (2017) Addressing the Metabolic Stability of Antituberculars through Machine Learning. ACS Med Chem Lett 8:1099-1104
Mikušová, Katarína; Ekins, Sean (2017) Learning from the past for TB drug discovery in the future. Drug Discov Today 22:534-545
Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean et al. (2016) Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 33:433-49
Clark, Alex M; Dole, Krishna; Ekins, Sean (2016) Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 56:275-85
Ekins, Sean; Perryman, Alexander L; Clark, Alex M et al. (2016) Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 56:1332-43
Ekins, Sean (2016) The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 33:2594-603
Clark, Alex M; Ekins, Sean (2015) Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 55:1246-60

Showing the most recent 10 out of 17 publications