Collaborative Drug Discovery, Inc. (CDD) will create a novel web-based software platform that enables scientists to work together effectively to discover and improve new drug leads by sharing computational predictions based on open-source descriptors and models, for the first time without needing to reveal the underlying chemical structures and biodata. It will create the first practical system for biocomputational analysis across distributed datasets with different owners, while respecting data privacy. By lowering this key barrier to collaboration, the platform will accelerate the pre-clinical drug discovery pipeline.
Research aimed at neglected diseases and orphan indications will especially benefit, because it often relies on the loosely affiliated efforts of academic investigators, non-profit foundations, government laboratories, and small biotechnology firms (extra-pharma entities). Such efforts typically lack not only the resources but also the integrated workflows of discovery projects conducted at large pharmaceutical companies (within which data can be shared freely across departments). The project will for the first time enable researchers focused on neglected diseases and orphan indications to effectively exploit biocomputational tools such as virtual screening and ADME/Tox predictions, which are now considered standard and indispensable components of early discovery workflows within large pharma. It will also make it easier for these extra-pharma researchers to collaborate with large pharma and benefit from large pharma's significant investment in accumulating large, high-quality datasets.
In Phase II of this SBIR project, CDD will:
1. Create a stand-alone platform, based entirely on open-source technologies, that enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous QSAR data - all without needing to divulge the underlying training sets.
2. Develop approaches that enable scientists who are not computational chemists to exploit the technology. A series of user interfaces will automate and intelligently guide the creation and use of models, and will help the user visualize domains of applicability, interpret results, and understand their limitations. The integrated platforms will enable scientists to seamlessly create, share and execute computational models leveraging private data vaults, with or without sharing the underlying training data.
3. Validate the platform by (a) developing a suite of at least five ADME/Tox and physicochemical property models based on open-source descriptors and data obtained from commercial ADME vendors, as well as public data from PubChem, ChEMBL and other sources, (b) securely making available a series of sophisticated pre-competitive ADME/Tox models provided by large pharmaceutical companies, and (c) demonstrating that collaborators can utilize the platform on their own (without relying on a computational chemist) to discover and advance TB drug leads with good ADME/Tox properties.
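The idea of sharing a model without divulging its training set can be illustrated with a minimal sketch. It assumes a Laplacian-corrected naive Bayes estimator over hashed structural features, which is one common form of Bayesian model used with open-source descriptors; the abstract does not specify the platform's actual algorithm. The function names (`train_model`, `predict`) and the integer feature identifiers are hypothetical, standing in for real fingerprint bits. The sketch only shows that the shared object holds per-feature weights rather than training records; it makes no claim of a formal privacy guarantee.

```python
import math
from collections import Counter


def train_model(training_set):
    """Build per-feature weights from privately held data.

    training_set: list of (feature_set, is_active) pairs. The feature
    identifiers here are hypothetical stand-ins for hashed fingerprint bits.
    """
    total = len(training_set)
    actives = sum(1 for _, is_active in training_set if is_active)
    base_rate = actives / total  # overall fraction of active compounds

    seen = Counter()         # molecules containing each feature
    seen_active = Counter()  # active molecules containing each feature
    for features, is_active in training_set:
        for feature in features:
            seen[feature] += 1
            if is_active:
                seen_active[feature] += 1

    # Laplacian-corrected contribution per feature (assumed estimator).
    # Only this dictionary is shared; structures and assay values stay private.
    return {
        feature: math.log((seen_active[feature] + 1) / (seen[feature] * base_rate + 1))
        for feature in seen
    }


def predict(model, features):
    """Score a new compound; features unseen in training contribute zero."""
    return sum(model.get(feature, 0.0) for feature in features)


# Toy private dataset held by the data owner (illustrative values only).
private_data = [
    ({101, 202, 303}, True),
    ({101, 202}, True),
    ({303, 404}, False),
    ({404, 505}, False),
]

shared_model = train_model(private_data)

# A collaborator receives only shared_model and scores their own compounds.
likely_active = predict(shared_model, {101, 202})
likely_inactive = predict(shared_model, {404, 505})
```

A higher score indicates that a compound carries features enriched among the owner's active compounds; the collaborator never sees which molecules produced those weights.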

Public Health Relevance

The proposed project will create novel computational tools that will help researchers accelerate the discovery of new and improved drugs against a wide range of diseases. These tools will particularly benefit researchers working on diseases that leading pharmaceutical companies have largely ignored because they are not perceived as highly profitable opportunities, even though in many cases they afflict millions of people.

National Institutes of Health (NIH)
National Center for Advancing Translational Sciences (NCATS)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section
Special Emphasis Panel (ZRG1-IMST-G (10))
Program Officer
Colvis, Christine
Collaborative Drug Discovery, Inc.
United States
Lane, Thomas; Russo, Daniel P; Zorn, Kimberley M et al. (2018) Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 15:4346-4360
Ekins, Sean; Clark, Alex M; Dole, Krishna et al. (2018) Data Mining and Computational Modeling of High-Throughput Screening Datasets. Methods Mol Biol 1755:197-221
Ekins, Sean; Spektor, Anna Coulon; Clark, Alex M et al. (2017) Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today 22:555-565
Stratton, Thomas P; Perryman, Alexander L; Vilchèze, Catherine et al. (2017) Addressing the Metabolic Stability of Antituberculars through Machine Learning. ACS Med Chem Lett 8:1099-1104
Mikušová, Katarína; Ekins, Sean (2017) Learning from the past for TB drug discovery in the future. Drug Discov Today 22:534-545
Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean et al. (2016) Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 33:433-49
Clark, Alex M; Dole, Krishna; Ekins, Sean (2016) Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 56:275-85
Ekins, Sean; Perryman, Alexander L; Clark, Alex M et al. (2016) Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 56:1332-43
Ekins, Sean (2016) The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 33:2594-603
Clark, Alex M; Ekins, Sean (2015) Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 55:1246-60
