Collaborative Drug Discovery, Inc. (CDD) proposes to create a novel web-based software platform that enables scientists to work together effectively to discover and improve new drug leads, yet with the option not to reveal chemical structures to each other. It will create the first practical system of biocomputational analysis across distributed datasets with different owners, while respecting data privacy. By lowering this key barrier to collaboration, the platform will accelerate the pre-clinical drug discovery pipeline.
Research aimed at neglected diseases and orphan indications will especially benefit, because it often relies on the loosely affiliated efforts of academic investigators, non-profit foundations, government laboratories, and small biotechnology firms ("extra-pharma" entities). Such efforts typically lack not only the resources but also the integrated workflows of discovery projects conducted at large pharmaceutical companies (within which data can be shared freely across departments). The project will for the first time enable researchers focused on neglected diseases and orphan indications to effectively exploit biocomputational tools such as virtual screening and ADME/Tox predictions, which are now considered standard and indispensable components of early discovery workflows within large pharma. It will also make it easier for these extra-pharma researchers to collaborate with large pharma and benefit from large pharma's significant investment in accumulating large, high-quality datasets. In Phase 1 of the proposed SBIR, CDD will leverage ongoing collaborations to prove the feasibility and value of the approach with prospective potency predictions made in advance of experimental confirmation. Key collaborators include Prof. Carl Nathan at Weill Cornell Medical College, Dr. Clifton Barry, III, at NIAID, and Allen Casey at the Infectious Disease Research Institute (IDRI). Their groups, all of which have ongoing screening programs to discover compounds active against tuberculosis (TB), will serve as an experimental test bed for the project.
Specific aims for Phase 1 include:
1. Demonstrate the value to the collaborating screening centers of creating computational TB screening models derived from distributed, heterogeneous collections of data and exploiting the models prospectively to filter and prioritize the molecules scheduled to be screened. Validate the hypothesis that by selecting subsets enriched with active compounds, the centers can efficiently explore more of chemical space than would otherwise be possible with limited resources.
2. Develop initial standards for specifying models (including purpose, inputs, outputs, algorithms, descriptor types, domain of applicability, and other parameters necessary for presentation, interpretation, and exchange) that will form the outline for more comprehensive software prototypes that CDD will iteratively develop, deploy, test, and validate in Phase 2.
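
The hypothesis in the first aim, that model-selected subsets are enriched with active compounds, is commonly quantified with an enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate of the whole collection. The sketch below is illustrative only and is not taken from the proposal; the function name `enrichment_factor`, its parameters, and the toy scores are assumptions made for this example.

```python
def enrichment_factor(scores, is_active, fraction):
    """Hit rate in the top-scoring fraction divided by the overall hit rate.

    scores    -- model scores, higher meaning "more likely active" (assumed convention)
    is_active -- experimental outcomes (True/False), same order as scores
    fraction  -- share of the collection that would actually be screened
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")

    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("enrichment is undefined when there are no actives")

    # Number of compounds the screening center can afford to test.
    n_selected = max(1, round(len(scores) * fraction))

    # Rank compounds by descending model score and keep the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    selected_actives = sum(active for _, active in ranked[:n_selected])

    selected_hit_rate = selected_actives / n_selected
    overall_hit_rate = total_actives / len(scores)
    return selected_hit_rate / overall_hit_rate


# Toy data (hypothetical): 10 compounds, 3 of them active.
toy_scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
toy_active = [True, True, False, False, True, False, False, False, False, False]

ef_top20 = enrichment_factor(toy_scores, toy_active, 0.2)
ef_all = enrichment_factor(toy_scores, toy_active, 1.0)
```

A value above 1 for a given fraction means that screening only the model-prioritized subset would find actives at a higher rate than screening at random; screening the entire collection always gives exactly 1.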

Public Health Relevance

The proposed project will create novel computational tools that will help researchers to accelerate the discovery of new and improved drugs against a wide range of diseases. These tools will particularly benefit researchers working on diseases that leading pharmaceutical companies have largely ignored because they are not perceived as highly profitable opportunities, despite the fact that in many cases they afflict millions of people.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Study Section: Special Emphasis Panel (ZRG1-IMST-K (11))
Program Officer: Ye, Jane
Collaborative Drug Discovery, Inc.
United States
Litterman, Nadia K; Ekins, Sean (2015) Databases and collaboration require standards for human stem cell research. Drug Discov Today 20:247-54
Ekins, Sean; Freundlich, Joel S; Hobrath, Judith V et al. (2014) Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery. Pharm Res 31:414-35
Ekins, Sean; Clark, Alex M; Swamidass, S Joshua et al. (2014) Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 28:997-1008
Ekins, Sean; Pottorf, Richard; Reynolds, Robert C et al. (2014) Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis. J Chem Inf Model 54:1070-82
Ekins, Sean; Casey, Allen C; Roberts, David et al. (2014) Bayesian models for screening and TB Mobile for target inference with Mycobacterium tuberculosis. Tuberculosis (Edinb) 94:162-9
Ekins, Sean; Freundlich, Joel S; Reynolds, Robert C (2014) Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 54:2157-65
Ekins, Sean; Reynolds, Robert C; Kim, Hiyun et al. (2013) Bayesian models leveraging bioactivity and cytotoxicity information for drug discovery. Chem Biol 20:370-8
Ekins, Sean; Freundlich, Joel S; Reynolds, Robert C (2013) Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 53:3054-63
Ekins, Sean; Reynolds, Robert C; Franzblau, Scott G et al. (2013) Enhancing hit identification in Mycobacterium tuberculosis drug discovery using validated dual-event Bayesian models. PLoS One 8:e63240