Collaborative Drug Discovery, Inc. (CDD) will create a novel web-based software platform that enables scientists to work together effectively to discover and improve new drug leads by sharing computational predictions based on open-source descriptors and models, for the first time without needing to reveal underlying chemical structures and biodata. It will create the first practical system for biocomputational analysis across distributed datasets with different owners, while respecting data privacy. By lowering this key barrier to collaboration, the platform will accelerate the pre-clinical drug discovery pipeline.
Research aimed at neglected diseases and orphan indications will especially benefit, because it often relies on the loosely affiliated efforts of academic investigators, non-profit foundations, government laboratories, and small biotechnology firms ("extra-pharma" entities). Such efforts typically lack not only the resources but also the integrated workflows of discovery projects conducted at large pharmaceutical companies (within which data can be shared freely across departments). The project will for the first time enable researchers focused on neglected diseases and orphan indications to effectively exploit biocomputational tools such as virtual screening and ADME/Tox predictions, which are now considered standard and indispensable components of early discovery workflows within large pharma. It will also make it easier for these extra-pharma researchers to collaborate with large pharma and benefit from large pharma's significant investment in accumulating large, high-quality datasets.
In Phase II of this SBIR project, CDD will:
1. Create a stand-alone platform, based entirely on open-source technologies, that enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous QSAR data, all without needing to divulge the underlying training sets.
2. Develop approaches that enable scientists who are not computational chemists to exploit the technology. A series of user interfaces will automate model creation and use, intelligently guiding the user and helping them to visualize domains of applicability, interpret results, and understand their limitations. The integrated platforms will enable scientists to seamlessly create, share, and execute computational models leveraging private data vaults, with or without sharing the underlying training data.
3. Validate the platform by (a) developing a suite of at least five ADME/Tox and physicochemical property models based on open-source descriptors and data obtained from commercial ADME vendors, as well as public data from PubChem, ChEMBL, and other sources, (b) securely making available a series of sophisticated pre-competitive ADME/Tox models provided by large pharmaceutical companies, and (c) demonstrating that collaborators can utilize the platform on their own (without relying on a computational chemist) to discover and advance TB drug leads with good ADME/Tox properties.

Public Health Relevance

The proposed project will create novel computational tools that will help researchers accelerate the discovery of new and improved drugs against a wide range of diseases. These tools will particularly benefit researchers working on diseases that leading pharmaceutical companies have largely ignored because they are not perceived as highly profitable opportunities, even though in many cases they afflict millions of people.

National Institutes of Health (NIH)
National Center for Advancing Translational Sciences (NCATS)
Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Study Section
Special Emphasis Panel (ZRG1-IMST-G (10))
Program Officer
Colvis, Christine
Collaborative Drug Discovery, Inc.
United States
Ekins, Sean (2016) The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 33:2594-603
Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean et al. (2016) Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 33:433-49
Mikušová, Katarína; Ekins, Sean (2016) Learning from the past for TB drug discovery in the future. Drug Discov Today :
Ekins, Sean; Spektor, Anna Coulon; Clark, Alex M et al. (2016) Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today :
Clark, Alex M; Dole, Krishna; Ekins, Sean (2016) Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 56:275-85
Ekins, Sean; Perryman, Alexander L; Clark, Alex M et al. (2016) Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 56:1332-43
Li, Shao-Gang; Vilchèze, Catherine; Chakraborty, Sumit et al. (2015) Evolution of a thienopyrimidine antitubercular relying on medicinal chemistry and metabolomics insights. Tetrahedron Lett 56:3246-3250
Clark, Alex M; Ekins, Sean (2015) Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 55:1246-60
Ekins, Sean; Clark, Alex M; Wright, Stephen H (2015) Making Transporter Models for Drug-Drug Interaction Prediction Mobile. Drug Metab Dispos 43:1642-5
Ekins, Sean; Clark, Alex M; Swamidass, S Joshua et al. (2014) Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 28:997-1008

Showing the most recent 10 out of 14 publications