Biocomputation across distributed private datasets to enhance drug discovery

Bunin, Barry

Abstract

Collaborative Drug Discovery (CDD) proposes to develop technology that will vastly simplify and integrate all the processes required to exploit predictive models for drug discovery. The software will make it easy for scientists without specialized training in informatics to create, train, apply, evaluate, share, and archive models with minimal effort, and also leverage a large library of pre-computed models with zero effort. The software will also enable scientists working in different organizations to collectively build models from their aggregated data and share these models, without sharing the underlying training data. Our goal is to democratize the role in drug discovery of computational models (which have historically been restricted to computational experts) and allow models to become routine aids to the discovery workflow in academia, foundations, government laboratories, and small companies that do not have the resources to employ them today. In Phase 2 we implemented modified Bayesian model building directly within CDD's web-based CDD Vault platform, which securely hosts structure-activity relationship (SAR) data; any user can now easily train a Bayesian model with experimental data stored in her private Vault, then apply the model to predict activity for untested compounds. In Phase 2B we propose to generalize this capability with the following new Specific Aims, which are needed to achieve a widespread scientific and commercial impact:
Aim 1: Integrate a suite of diverse computational techniques (such as QSAR, Neural Networks, Support Vector Machines, Random Forest, k-Nearest Neighbors, and possibly others) into a single framework, to allow direct side-by-side comparison.
Aim 2: Develop and validate a universal metric that ranks the predictive strength of each method as applied to a particular dataset.
Aim 3: Apply the metric to automatically generate thousands of models from high-quality, public-access structure-activity and ADME/Tox datasets and present key results to the user.
Aim 4: Develop a novel capability to build models collaboratively, by aggregating multiple datasets, and share the models without revealing the compounds and data in the training sets.
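
To make the idea behind Aim 4 concrete, the following is a minimal sketch of one way a Bayesian activity model could be built from several private datasets by pooling only per-feature counts, never the compounds themselves. It is an illustration under stated assumptions, not CDD's implementation: the abstract does not specify the "modified Bayesian" method, so the sketch uses a commonly described Laplacian-corrected naive Bayes score, log((A + 1) / (T * P + 1)), where T is the number of compounds containing a feature, A is the number of those that are active, and P is the overall active fraction. The function names and the integer feature IDs (standing in for hashed fingerprint features) are hypothetical.

```python
import math
from collections import Counter


def site_counts(records):
    """Summarize one site's private data as counts only (assumed protocol).

    records: list of (features, is_active) pairs, where features is a set of
    hypothetical integer IDs standing in for hashed fingerprint features.
    Returns (n_total, n_active, feature_total, feature_active).
    """
    n_total = n_active = 0
    feature_total, feature_active = Counter(), Counter()
    for features, is_active in records:
        n_total += 1
        n_active += int(is_active)
        for f in features:
            feature_total[f] += 1
            if is_active:
                feature_active[f] += 1
    return n_total, n_active, feature_total, feature_active


def merge_counts(summaries):
    """Add up the count summaries contributed by each site."""
    n_total = n_active = 0
    feature_total, feature_active = Counter(), Counter()
    for t, a, ft, fa in summaries:
        n_total += t
        n_active += a
        feature_total.update(ft)   # Counter.update adds counts
        feature_active.update(fa)
    return n_total, n_active, feature_total, feature_active


def build_model(summary):
    """Laplacian-corrected contribution per feature: log((A+1)/(T*P+1))."""
    n_total, n_active, feature_total, feature_active = summary
    base_rate = n_active / n_total  # P: overall fraction of actives
    return {
        f: math.log((feature_active[f] + 1) / (t * base_rate + 1))
        for f, t in feature_total.items()
    }


def score(model, features):
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in features)


# Two sites with toy data; only the count summaries leave each site.
site_a = [({1, 2}, True), ({2, 3}, False)]
site_b = [({1, 4}, True), ({3, 4}, False)]
model = build_model(merge_counts([site_counts(site_a), site_counts(site_b)]))
print(score(model, {1, 2}), score(model, {3, 4}))
```

Because the counts are additive, the merged model is identical to one trained on the pooled records. Note that sharing counts instead of structures is not by itself a privacy guarantee; whether features or counts could leak information about the training compounds would need separate analysis.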

Public Health Relevance

The proposed project will create novel computational tools that will help researchers to accelerate the discovery of new and improved drugs against a wide range of diseases. These tools will particularly benefit researchers working on diseases that leading pharmaceutical companies have largely ignored because they are not perceived as highly profitable opportunities, despite the fact that in many cases they afflict millions of people.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Advancing Translational Sciences (NCATS)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 2R44TR000942-05
Application #: 9345057
Study Section: Special Emphasis Panel (ZRG1-IMST-K (14)B)
Program Officer: Colvis, Christine

Project Start: 2013-08-16
Project End: 2019-03-31
Budget Start: 2017-04-01
Budget End: 2018-03-31
Support Year: 5
Fiscal Year: 2017
Total Cost: $750,457

Institution

Name: Collaborative Drug Discovery, Inc.
Type: Domestic for-Profits
DUNS #: 149823846

City: Burlingame
State: CA
Country: United States
Zip Code: 94010

Related projects


NIH 2018 R44 TR	Biocomputation across distributed private datasets to enhance drug discovery Bunin, Barry A. / Collaborative Drug Discovery, Inc.
NIH 2017 R44 TR	Biocomputation across distributed private datasets to enhance drug discovery Bunin, Barry A. / Collaborative Drug Discovery, Inc.	$750,457
NIH 2015 R44 TR	Biocomputation across distributed private datasets to enhance drug discovery Ekins, Sean / Collaborative Drug Discovery, Inc.	$379,158
NIH 2014 R44 TR	Biocomputation across distributed private datasets to enhance drug discovery Ekins, Sean / Collaborative Drug Discovery, Inc.	$562,173
NIH 2013 R44 TR	Biocomputation across distributed private datasets to enhance drug discovery Ekins, Sean / Collaborative Drug Discovery, Inc.	$462,944

Publications

Lane, Thomas; Russo, Daniel P; Zorn, Kimberley M et al. (2018) Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 15:4346-4360

Ekins, Sean; Clark, Alex M; Dole, Krishna et al. (2018) Data Mining and Computational Modeling of High-Throughput Screening Datasets. Methods Mol Biol 1755:197-221

Stratton, Thomas P; Perryman, Alexander L; Vilchèze, Catherine et al. (2017) Addressing the Metabolic Stability of Antituberculars through Machine Learning. ACS Med Chem Lett 8:1099-1104

Mikušová, Katarína; Ekins, Sean (2017) Learning from the past for TB drug discovery in the future. Drug Discov Today 22:534-545

Ekins, Sean; Spektor, Anna Coulon; Clark, Alex M et al. (2017) Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today 22:555-565

Perryman, Alexander L; Stratton, Thomas P; Ekins, Sean et al. (2016) Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 33:433-49

Clark, Alex M; Dole, Krishna; Ekins, Sean (2016) Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 56:275-85

Ekins, Sean; Perryman, Alexander L; Clark, Alex M et al. (2016) Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 56:1332-43

Ekins, Sean (2016) The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 33:2594-603

Clark, Alex M; Ekins, Sean (2015) Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 55:1246-60

Showing the most recent 10 out of 17 publications
