The main goal of the Data Organizing Core (DOC) of the Illuminating the Druggable Genome Knowledge Management Center (IDG KMC) is to evaluate, organize and rank all prospective disease-linked proteins for four protein superfamilies: G-protein-coupled receptors (GPCRs), nuclear receptors (NRs), ion channels (ICs) and kinases. As the main knowledge repository, the DOC will develop the "Target Central" Resource Database (TCRD) by combining data extracted from multiple sources, linking disease, pathway, protein, chemical, gene, bioactivity, drug discovery and clinical information elements from databases, literature, patents, drug labels and other documents. TCRD will serve as the central source for the IDG Query Platform, which is being developed by the KMC's User Interface Portal (UIP) core. The DOC will develop tools for algorithmic processing and prediction to improve disease-protein associations, supported by human curation. Four External Target Panels will curate emerging associations and rank appropriate proteins. The DOC will stratify proteins into four classes (Tclin - clinical; Tchem - manipulated by chemicals; Tmacro - manipulated by macromolecules; and Tdark - the genomic "dark matter"), supported by tissue and cellular localization data for proteins (TTL) and diseases. Oprea at UNM will lead the DOC, supported by team leaders Brunak and Jensen (Center for Protein Research, Denmark), Overington (European Bioinformatics Institute) and Schurer (University of Miami).
Specific Aims: 1. Develop tools for the automated extraction and processing of data to be deposited into TCRD; 2. Develop tools for semi-automated data extraction for pathways, diseases and associated ontologies, which will support TTL stratification; 3. Develop tools for expert curation of literature and patent data, approved drug labels and clinical trials; 4. Develop analytics, modeling and visualization tools for disease-based target prioritization. Preliminary stratification (e.g., Tclin 22%, Tdark 30%) of disease-protein associations was performed for each protein superfamily using automated tools. Within 12 months, the TCRD-based IDG Query Platform will be operational, improving target prioritization for the research community at large and for the IDG Consortium in exploring the "dark matter" of GPCRs, NRs, ICs and kinases.
The Data Organizing Core will combine unrelated informational elements from biology, chemistry and clinical sciences and distil them into knowledge that associates diseases with proteins, in order to rank proteins for druggability using facts, inferences and predictions. The results, captured in the Target Central repository, will help IDG Consortium members and other scientists focus on the less-studied, "dark" area of the genome.
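
As an illustration only, the four-class stratification named above (Tclin, Tchem, Tmacro, Tdark) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the DOC's actual method: the evidence flags, the `ProteinEvidence` record, the function names and the precedence order (clinical, then chemical, then macromolecule, then dark) are all hypothetical and do not reflect the TCRD schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TargetClass(Enum):
    TCLIN = "Tclin"    # clinical
    TCHEM = "Tchem"    # manipulated by chemicals
    TMACRO = "Tmacro"  # manipulated by macromolecules
    TDARK = "Tdark"    # the genomic "dark matter"


@dataclass(frozen=True)
class ProteinEvidence:
    # Hypothetical evidence flags for one protein; not actual TCRD fields.
    name: str
    has_clinical_evidence: bool = False
    has_chemical_modulator: bool = False
    has_macromolecule_modulator: bool = False


def stratify(protein: ProteinEvidence) -> TargetClass:
    # Assumed precedence: the strongest available evidence decides the class.
    if protein.has_clinical_evidence:
        return TargetClass.TCLIN
    if protein.has_chemical_modulator:
        return TargetClass.TCHEM
    if protein.has_macromolecule_modulator:
        return TargetClass.TMACRO
    return TargetClass.TDARK


def class_fractions(proteins):
    # Fraction of proteins falling into each class (0.0 for an empty input).
    counts = Counter(stratify(p) for p in proteins)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in TargetClass}
```

Calling `class_fractions` on the proteins of one superfamily would yield per-class shares comparable in form to the preliminary figures quoted in the aims, although the real criteria for each class would come from curated evidence rather than simple boolean flags.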