The understudied protein targets that are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project need to be placed in the contexts of gene-sets/pathways, drugs/small-molecules, diseases/phenotypes, and cells/tissues. By extending our previous methods, we will use machine learning strategies to impute knowledge about the understudied protein kinases, GPCRs, and ion channels listed in the RFA as potential targets. To provide input for these machine learning models, we will organize data from many omics- and literature-based resources into attribute tables in which genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we will process and abstract data from over 100 resources. We will then predict target functions, target associations with pathways, small-molecules/drugs that modulate the activity and expression of each target, and target relevance to human disease. To further validate these predictions, we will employ text mining to identify knowledge that corroborates the data-mining predictions, perform molecular docking of predicted small molecules against homology models of the targets, and seek associations between variants and human diseases by mining electronic medical records (EMR) together with genomic profiles of thousands of patients. In addition, we will develop innovative data visualization tools that allow users to interact with all of the collected data, and social networking software to build communities centered on proteins/genes/targets as well as on biological topics including pathways, cell types, drugs/small-molecules, and diseases.
Overall, we will develop an invaluable resource that will accelerate target and drug discovery.
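The attribute-table approach described above can be illustrated with a small sketch. It assumes each resource has already been reduced to (gene, attribute) pairs, stores each gene's row of the binary gene × attribute table as a set, and ranks genes outside a known gene set by their mean Jaccard similarity to the k most similar known members. This nearest-neighbor scoring is a deliberately simple stand-in chosen for illustration, not the machine learning method used by the project; the function names, the parameter `k`, and the gene and attribute names are all hypothetical.

```python
from collections import defaultdict


def build_attribute_table(associations):
    """Collect (gene, attribute) pairs into a gene -> set-of-attributes table.

    Each set holds the nonzero columns of that gene's row in a binary
    gene x attribute matrix.
    """
    table = defaultdict(set)
    for gene, attribute in associations:
        table[gene].add(attribute)
    return dict(table)


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(table, gene_set, k=3):
    """Rank genes outside gene_set by similarity to its known members.

    A candidate's score is the mean Jaccard similarity to its k most similar
    members; results are sorted by descending score, then by gene name.
    """
    members = [g for g in gene_set if g in table]
    ranked = []
    for gene, attrs in table.items():
        if gene in gene_set:
            continue
        sims = sorted((jaccard(attrs, table[m]) for m in members), reverse=True)[:k]
        score = sum(sims) / len(sims) if sims else 0.0
        ranked.append((gene, score))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical toy data: gene and attribute names are placeholders, not real annotations.
TOY_ASSOCIATIONS = [
    ("G1", "tissue_A"), ("G1", "tf_B"), ("G1", "phenotype_C"),
    ("G2", "tissue_A"), ("G2", "tf_B"),
    ("G3", "tissue_A"), ("G3", "tf_B"), ("G3", "phenotype_C"),
    ("G4", "tissue_D"), ("G4", "tf_E"),
]
KNOWN_MEMBERS = {"G1", "G2"}  # known members of a hypothetical pathway

if __name__ == "__main__":
    table = build_attribute_table(TOY_ASSOCIATIONS)
    for gene, score in rank_candidates(table, KNOWN_MEMBERS):
        print(f"{gene}\t{score:.3f}")
```

On the toy data, G3, which shares all of G1's attributes and two of G2's three, scores about 0.833, while G4, which shares none, scores 0.0. Storing rows as sets rather than a dense matrix is a choice made here on the assumption that most such attribute tables are sparse.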
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project will facilitate translational research by integrating and mining data about understudied druggable targets from numerous repositories and other resources. The KMC team will develop novel tools to analyze these data in order to find connections between genes/proteins/targets and diseases/phenotypes, cells/tissues, pathways/gene-sets, and drugs/small-molecules, with the goal of identifying potential therapeutic applications and other clinically relevant biological contexts.
Lachmann, Alexander; Torre, Denis; Keenan, Alexandra B et al. (2018) Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun 9:1366
Clarke, Daniel J B; Kuleshov, Maxim V; Schilder, Brian M et al. (2018) eXpression2Kinases (X2K) Web: linking expression signatures to upstream cell signaling networks. Nucleic Acids Res 46:W171-W179
Torre, Denis; Lachmann, Alexander; Ma'ayan, Avi (2018) BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Syst 7:556-561.e3
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M et al. (2018) Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Sci Data 5:180023
Oprea, Tudor I; Bologa, Cristian G; Brunak, Søren et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat Rev Drug Discov 17:317-332
Guo, Yiqing; Pace, Jesse; Li, Zhengzhe et al. (2018) Podocyte-Specific Induction of Krüppel-Like Factor 15 Restores Differentiation Markers and Attenuates Kidney Injury in Proteinuric Kidney Disease. J Am Soc Nephrol 29:2529-2545
Wang, Zichen; Lachmann, Alexander; Keenan, Alexandra B et al. (2018) L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics 34:2150-2152