The understudied protein targets that are the focus of the implementation phase of the Illuminating the Druggable Genome (IDG) project need to be placed in the contexts of gene-sets/pathways, drugs/small-molecules, diseases/phenotypes, and cells/tissues. By extending our previous methods, we will use machine learning strategies to impute knowledge about the understudied potential target protein kinases, GPCRs, and ion channels listed in the RFA. To support these machine learning predictions, we will organize data from many omics- and literature-based resources into attribute tables where genes are the rows and their attributes are the columns. Examples of such attribute tables include gene or protein expression in cancer cell lines (CCLE) or human tissues (GTEx), changes in expression in response to drug perturbations or single-gene knockdowns (LINCS), regulation by transcription factors based on ChIP-seq data (ENCODE), and phenotypes observed in mice when single genes are knocked out (KOMP). In total, we will process and abstract data from over 100 resources. We will then predict target functions, target association with pathways, small-molecules/drugs that modulate the activity and expression of the target, and target relevance to human disease. To further validate such predictions, we will employ text mining to identify knowledge that corroborates the data mining predictions, perform molecular docking of predicted small molecules using homology modeling, and seek associations between variants and human diseases by mining electronic medical records (EMR) together with genomic profiling of thousands of patients. In addition, we will develop innovative data visualization tools to allow users to interact with all the collected data, and develop social networking software to build communities centered around proteins/genes/targets as well as biological topics including pathways, cell types, drugs/small-molecules, and diseases.
Overall, we will develop an invaluable resource that will accelerate target and drug discovery.
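As an illustration of the attribute-table idea described above, the following minimal sketch builds a gene-by-attribute table from (gene, attribute) pairs and scores an understudied gene against a known gene set by its mean Jaccard similarity to the set's members. This is a simple guilt-by-association stand-in for illustration only, not the machine learning method the project will use; the gene names, attribute labels, and function names are all hypothetical.

```python
from collections import defaultdict


def build_attribute_table(associations):
    """Collect (gene, attribute) pairs into one row per gene.

    Each row is stored as the set of attributes observed for that gene,
    which is equivalent to a sparse binary gene-by-attribute matrix.
    """
    table = defaultdict(set)
    for gene, attribute in associations:
        table[gene].add(attribute)
    return dict(table)


def jaccard(a, b):
    """Jaccard similarity between two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_gene_for_set(table, gene, known_members):
    """Mean Jaccard similarity between `gene` and the known members of a gene set."""
    members = [m for m in known_members if m in table and m != gene]
    if gene not in table or not members:
        return 0.0
    return sum(jaccard(table[gene], table[m]) for m in members) / len(members)


# Hypothetical toy data: placeholder genes and attribute labels.
associations = [
    ("GENE_A", "expr:liver"), ("GENE_A", "tf:TF1"),
    ("GENE_B", "expr:liver"), ("GENE_B", "tf:TF1"), ("GENE_B", "ko:phenotype1"),
    ("GENE_C", "expr:brain"),
    ("GENE_X", "expr:liver"), ("GENE_X", "tf:TF1"),  # the understudied gene
]
table = build_attribute_table(associations)

# GENE_X shares its attributes with GENE_A and GENE_B but not with GENE_C.
score_shared = score_gene_for_set(table, "GENE_X", ["GENE_A", "GENE_B"])
score_unrelated = score_gene_for_set(table, "GENE_X", ["GENE_C"])
```

In this toy example the understudied gene scores higher against the set whose members share its expression and regulation attributes. A real pipeline would combine many such tables and would need held-out validation before any prediction is trusted.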
The Knowledge Management Center (KMC) for the Illuminating the Druggable Genome (IDG) project will facilitate translational research by integrating and mining data about understudied druggable targets from numerous repositories and other resources. The KMC for IDG team will develop novel tools to analyze these data and find connections between genes/proteins/targets and diseases/phenotypes, cells/tissues, pathways/gene-sets, and drugs/small-molecules, with the goal of identifying potential therapeutic applications and other clinically relevant biological contexts.