The interactions of small molecules with proteins are not only omnipresent throughout cellular processes but also of fundamental importance to drug design and disease treatment. As with protein-protein and protein-RNA interactions, high-throughput (HTP) experimental methods have generated enormous volumes of protein-small molecule and related data. However, beyond their sheer scale, the high heterogeneity of these data, combined with proprietary ownership concerns when the data originate within industry, presents significant challenges to sharing and fully using them in basic research and therapeutic development. This proposal aims to develop new mathematical methods that address not only the interpretation of the data but also the collaborative and generative process through which researchers work: new cryptographic tools can enable unprecedented forms of secure sharing and collaboration between industry and the public, and deep learning of structural features can reduce researchers' dependence on prior assumptions about which predictors of drug-target interactions (DTI) are important. In our previous granting period, we successfully developed methods for structure-based prediction and HTP data analysis of protein-protein and protein-RNA interactions, uncovering novel biology (e.g., for neurodegenerative diseases).
In this renewal, we aim to: 1) develop scalable methods for multi-party computation and differential privacy to enable the secure sharing of large proprietary drug-target interaction databases among industry and public researchers; 2) develop novel integrative machine learning approaches for identifying drug-target interactions based on interactome, molecular structure, and chemogenomic data (in collaboration with co-I Jian Peng); and 3) establish innovative collaborations with industry, academia, and the scientific community to drive use and adoption of these computational tools and technologies among practicing biomedical researchers. Successful completion of these aims will provide both public and private research communities with scalable access to technologies for secure sharing of proprietary drug screening data as well as flexible, accurate tools for predicting drug-target interactions. All developed software will be made available via publicly accessible web-based portals under open-source software licenses. Collaborations with research partners will validate the relevance of these tools to human health and disease, while the dissemination aim will ensure that research communities have convenient and ongoing access to these innovations.

Public Health Relevance

The interactions of small molecules with proteins are omnipresent throughout cellular processes and of fundamental importance to drug design and disease treatment, yet predicting these interactions poses major challenges because of the heterogeneity and proprietary nature of the data. In this proposal, we develop new mathematical methods and software that address not only the interpretation of the data but also the collaborative and generative process through which researchers work: new cryptographic tools can enable unprecedented forms of secure sharing and collaboration between industry and the public, and deep learning can accelerate the drug discovery process.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM081871-12
Application #: 9935079
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2008-04-01
Project End: 2021-05-31
Budget Start: 2020-06-01
Budget End: 2021-05-31
Support Year: 12
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: Massachusetts Institute of Technology
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 001425594
City: Cambridge
State: MA
Country: United States
Zip Code: 02142
Cho, Hyunghoon; Berger, Bonnie; Peng, Jian (2018) Generalizable visualization of mega-scale single-cell data. Res Comput Mol Biol 10812:251-253
Liu, Yang; Palmedo, Perry; Ye, Qing et al. (2018) Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst 6:65-74.e3
Ordovas-Montanes, Jose; Dwyer, Daniel F; Nyquist, Sarah K et al. (2018) Allergic inflammatory memory in human respiratory epithelial progenitor cells. Nature 560:649-654
Bepler, Tristan; Morin, Andrew; Noble, Alex J et al. (2018) Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Res Comput Mol Biol 10812:245-247
Hie, Brian; Cho, Hyunghoon; Berger, Bonnie (2018) Realizing private and practical pharmacological collaboration. Science 362:347-350
Cho, Hyunghoon; Berger, Bonnie; Peng, Jian (2018) Generalizable and Scalable Visualization of Single-Cell Data Using Neural Networks. Cell Syst 7:185-191.e4
Orenstein, Yaron; Ohler, Uwe; Berger, Bonnie (2018) Finding RNA structure in the unstructured RBPome. BMC Genomics 19:154
Orenstein, Yaron; Kim, Ryan; Fordyce, Polly et al. (2017) Joker de Bruijn: Sequence Libraries to Cover All k-mers Using Joker Characters. Res Comput Mol Biol 10229:389-390
Orenstein, Yaron; Puccinelli, Robert; Kim, Ryan et al. (2017) Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping. Cell Syst 5:230-236.e5
Luo, Yunan; Zhao, Xinbin; Zhou, Jingtian et al. (2017) A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 8:573
