The understudied protein targets that are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project need to be placed in the contexts of gene-sets/pathways, drugs/small-molecules, diseases/phenotypes, and cells/tissues. By extending our previous methods, we will use machine learning strategies to impute knowledge about the understudied potential target protein kinases, GPCRs, and ion channels listed in the RFA. To establish such a classification system, we will organize data from many omics- and literature-based resources into attribute tables where genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we will process and abstract data from over 100 resources. We will then predict target functions, target association with pathways, small-molecules/drugs that modulate the activity and expression of the target, and target relevance to human disease. To further validate these predictions, we will employ text mining to identify knowledge that corroborates the data mining predictions, perform molecular docking of predicted small molecules using homology modeling, and seek associations between variants and human diseases by mining electronic medical records (EMR) together with genomic profiling of thousands of patients. In addition, we will develop innovative data visualization tools to allow users to interact with all the collected data, and develop social networking software to build communities centered around proteins/genes/targets as well as biological topics including pathways, cell types, drugs/small-molecules, and diseases.
Overall, we will develop an invaluable resource that will accelerate target and drug discovery.
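
To make the attribute-table idea concrete, the following is a minimal sketch of how a label might be imputed for an understudied gene from the genes that share its attributes. It is an illustration only and not the project's actual method: the similarity-weighted vote stands in for the unspecified machine learning strategies, and every gene name, attribute, pathway label, and function name in it is a hypothetical placeholder.

```python
from collections import defaultdict

# Toy binary attribute table: genes are the rows, attributes are the columns.
# Every gene, attribute, and label here is a made-up placeholder.
ATTRIBUTE_TABLE = {
    "GENE_A": {"tissue:brain", "tf:TF1", "drug_up:cmpd1"},
    "GENE_B": {"tissue:brain", "tf:TF1", "ko:phenotype1"},
    "GENE_C": {"tissue:liver", "tf:TF2", "drug_up:cmpd2"},
    "GENE_D": {"tissue:liver", "tf:TF2", "ko:phenotype2"},
    # GENE_X plays the role of an understudied target with no known label.
    "GENE_X": {"tissue:brain", "tf:TF1", "drug_up:cmpd1", "ko:phenotype1"},
}

# Known annotations for the well-studied genes (hypothetical pathway labels).
KNOWN_LABELS = {
    "GENE_A": "pathway_1",
    "GENE_B": "pathway_1",
    "GENE_C": "pathway_2",
    "GENE_D": "pathway_2",
}


def jaccard(a, b):
    """Jaccard similarity between two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute_label_scores(gene, table, labels):
    """Score each label by the summed similarity of annotated genes to `gene`.

    Scores are normalized to sum to 1; an empty dict means no annotated
    gene shares any attribute with the query gene.
    """
    scores = defaultdict(float)
    total = 0.0
    for other, label in labels.items():
        if other == gene:
            continue
        sim = jaccard(table[gene], table[other])
        scores[label] += sim
        total += sim
    if total == 0.0:
        return {}
    return {label: score / total for label, score in scores.items()}


def predict_label(gene, table, labels):
    """Return the highest-scoring label for `gene`, or None if there is no evidence."""
    scores = impute_label_scores(gene, table, labels)
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(predict_label("GENE_X", ATTRIBUTE_TABLE, KNOWN_LABELS))
```

In practice each resource would contribute its own attribute table, and a trained classifier would likely replace the simple vote shown here; the sketch only shows how a genes-by-attributes layout lets knowledge transfer from annotated genes to unannotated ones.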

Public Health Relevance

The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project will facilitate translational research by integrating and mining data about understudied druggable targets from numerous repositories and other resources. The KMC for IDG team will develop novel tools to analyze these data and find connections between genes/proteins/targets and diseases/phenotypes, cells/tissues, pathways/gene-sets, and drugs/small-molecules, in order to identify potential therapeutic applications and other biological contexts of clinical relevance.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Zenklusen, Jean C
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States
Torre, Denis; Lachmann, Alexander; Ma'ayan, Avi (2018) BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Syst 7:556-561.e3
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M et al. (2018) Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Sci Data 5:180023
Oprea, Tudor I; Bologa, Cristian G; Brunak, Søren et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat Rev Drug Discov 17:317-332
Guo, Yiqing; Pace, Jesse; Li, Zhengzhe et al. (2018) Podocyte-Specific Induction of Krüppel-Like Factor 15 Restores Differentiation Markers and Attenuates Kidney Injury in Proteinuric Kidney Disease. J Am Soc Nephrol 29:2529-2545
Wang, Zichen; Lachmann, Alexander; Keenan, Alexandra B et al. (2018) L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics 34:2150-2152
Lachmann, Alexander; Torre, Denis; Keenan, Alexandra B et al. (2018) Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun 9:1366
Clarke, Daniel J B; Kuleshov, Maxim V; Schilder, Brian M et al. (2018) eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Res 46:W171-W179