The interactions of small molecules with proteins are not only omnipresent throughout cellular processes, but also of fundamental importance to drug design and disease treatment. As with protein-protein and protein-RNA interactions, high-throughput (HTP) experimental methods have generated enormous volumes of protein-small molecule and related data. However, in addition to their sheer scale, the high degree of heterogeneity in these data, combined with proprietary ownership concerns when they originate within industry, presents significant challenges to the sharing and full use of these data in basic research and therapeutic development. This proposal aims to develop new mathematical methods that address not only the interpretation of the data themselves, but also the collaborative and generative process through which researchers work: new cryptographic tools can enable unprecedented forms of secure sharing and collaboration between industry and the public, and deep learning of structural features can reduce researchers' dependence on prior assumptions about which predictors of drug-target interactions (DTI) are important. In our previous granting period, we successfully developed methods for structure-based prediction and HTP data analysis of protein-protein and protein-RNA interactions, uncovering novel biology (e.g., for neurodegenerative diseases).
In this renewal, we aim to: 1) develop scalable methods for multi-party computation and differential privacy to enable the secure sharing of large proprietary drug-target interaction databases among industry and public researchers; 2) develop novel integrative machine learning approaches for identifying drug-target interactions based on interactome, molecular structure, and chemogenomic data (in collaboration with co-I Jian Peng); and 3) establish innovative collaborations with industry, academia, and the scientific community to drive use and adoption of these computational tools and technologies among practicing biomedical researchers. Successful completion of these aims will provide both public and private research communities with scalable access to technologies for secure sharing of proprietary drug screening data, as well as flexible, accurate tools for predicting drug-target interactions. All developed software will be made available via publicly accessible web-based portals under open source software licenses. Collaborations with research partners will validate the relevance of these tools to human health and disease, while the dissemination aim will ensure that research communities have convenient and ongoing access to these innovations.
The interactions of small molecules with proteins are omnipresent throughout cellular processes and of fundamental importance to drug design and disease treatment, yet predicting these interactions poses major challenges because of the heterogeneity and proprietary nature of the data. In this proposal, we develop new mathematical methods and software that address not only the interpretation of the data themselves, but also the collaborative and generative process through which researchers work: new cryptographic tools can enable unprecedented forms of secure sharing and collaboration between industry and the public, and deep learning can accelerate the drug discovery process.