Protein post-translational modification (PTM) plays a critical role in many diseases; however, critical gaps remain in research infrastructure for global analysis of PTMs. Key PTM information concerning enzyme- substrate relationships, regulation of PTM enzymes, PTM cross-talk, and functional consequences of PTM remains buried in the scientific literature. Meanwhile, while high-throughput panomics (genomic, transcriptomic, proteomic, PTM proteomic) data offer an unprecedented opportunity for the discovery of PTM-disease relationships, the data must be analyzed in an integrated and easily accessible knowledge framework in order for researchers and clinicians to gain a molecular understanding of disease. The goal of this application is to develop a collaborative knowledge environment for semantic annotation of scientific literature and integrative panomics analysis for PTM-disease knowledge discovery in precision medicine. We propose to connect PTM information from literature mining and curated databases in a knowledge resource on an ontological framework that supports analysis of panomics data in the context of PTM networks. To broaden impact and foster collaborative development, our resource will be FAIR (Findable, Accessible, Interoperable, Reusable) and interoperable with community standards.
The specific aims are: (i) develop a novel NLP (natural language processing) system for full-scale literature mining and PTM-disease knowledge extraction; (ii) develop a PTM knowledge resource for integrative panomics analysis and network discovery; and (iii) provide a FAIR collaborative environment for scalable semantic annotation and knowledge integration. The proposed system will build upon the NLP technologies and text mining tools already developed by our team and the bioinformatics infrastructure at the Protein Information Resource (PIR). The iPTMnet web portal will allow searching, browsing, visualization and analysis of PTM networks and PTM-related mutations in conjunction with user-supplied omics data, including panomics data from major national initiatives. Use scenarios will include identification of disease-driving genetic variants and analysis of cellular responses to kinase inhibitors. Our PTM knowledgebase will be disseminated with an RDF triple-store and a SPARQL endpoint for semantic queries, while our text mining tools and full-scale literature mining results will be disseminated in the BioC community standard for seamless integration to other text mining pipelines. To engage the community semantic annotation of scientific literature, we will host a hackathon to develop tools to expose BioC-annotated literature corpora to the semantic web, as well as an annotation jamboree to explore tagging of scientific text with precise ontological terms. This project will thus offer a unique research resource for PTM-disease network discovery as well as an integrable collaborative knowledge framework to support Big Data to Knowledge in precision medicine.
Precision medicine requires a detailed understanding of the molecular events that are disrupted in disease, including changes in protein post-translational modifications (PTM) that are hallmarks of many diseases. The proposed resource will support analysis of genomic-scale data for exploring PTM-disease networks and PTM-related mutations, as well as knowledge dissemination on the semantic web. These combined efforts will accelerate basic understanding of disease processes and discovery of diagnostic targets and more effective individualized therapies.