Protein post-translational modification (PTM) plays a critical role in many diseases; however, critical gaps remain in research infrastructure for global analysis of PTMs. Key PTM information concerning enzyme- substrate relationships, regulation of PTM enzymes, PTM cross-talk, and functional consequences of PTM remains buried in the scientific literature. Meanwhile, while high-throughput panomics (genomic, transcriptomic, proteomic, PTM proteomic) data offer an unprecedented opportunity for the discovery of PTM-disease relationships, the data must be analyzed in an integrated and easily accessible knowledge framework in order for researchers and clinicians to gain a molecular understanding of disease. The goal of this application is to develop a collaborative knowledge environment for semantic annotation of scientific literature and integrative panomics analysis for PTM-disease knowledge discovery in precision medicine. We propose to connect PTM information from literature mining and curated databases in a knowledge resource on an ontological framework that supports analysis of panomics data in the context of PTM networks. To broaden impact and foster collaborative development, our resource will be FAIR (Findable, Accessible, Interoperable, Reusable) and interoperable with community standards.
The specific aims are: (i) develop a novel NLP (natural language processing) system for full-scale literature mining and PTM-disease knowledge extraction; (ii) develop a PTM knowledge resource for integrative panomics analysis and network discovery; and (iii) provide a FAIR collaborative environment for scalable semantic annotation and knowledge integration. The proposed system will build upon the NLP technologies and text mining tools already developed by our team and the bioinformatics infrastructure at the Protein Information Resource (PIR). The iPTMnet web portal will allow searching, browsing, visualization and analysis of PTM networks and PTM-related mutations in conjunction with user-supplied omics data, including panomics data from major national initiatives. Use scenarios will include identification of disease-driving genetic variants and analysis of cellular responses to kinase inhibitors. Our PTM knowledgebase will be disseminated with an RDF triple-store and a SPARQL endpoint for semantic queries, while our text mining tools and full-scale literature mining results will be disseminated in the BioC community standard for seamless integration to other text mining pipelines. To engage the community semantic annotation of scientific literature, we will host a hackathon to develop tools to expose BioC-annotated literature corpora to the semantic web, as well as an annotation jamboree to explore tagging of scientific text with precise ontological terms. This project will thus offer a unique research resource for PTM-disease network discovery as well as an integrable collaborative knowledge framework to support Big Data to Knowledge in precision medicine.

Public Health Relevance

Precision medicine requires a detailed understanding of the molecular events that are disrupted in disease, including changes in protein post-translational modifications (PTM) that are hallmarks of many diseases. The proposed resource will support analysis of genomic-scale data for exploring PTM-disease networks and PTM-related mutations, as well as knowledge dissemination on the semantic web. These combined efforts will accelerate basic understanding of disease processes and discovery of diagnostic targets and more effective individualized therapies.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01GM120953-02
Application #
9326315
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2016-08-05
Project End
2019-07-31
Budget Start
2017-08-01
Budget End
2018-07-31
Support Year
2
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of Delaware
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
059007500
City
Newark
State
DE
Country
United States
Zip Code
19716
Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550
Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E et al. (2018) DEXTER: Disease-Expression Relation Extraction from Text. Database (Oxford) 2018:
Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52
Ding, Ruoyao; Boutet, Emmanuel; Lieberherr, Damien et al. (2017) eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality. Database (Oxford) 2017:
Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169
Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515
Wang, Qinghua; Ross, Karen E; Huang, Hongzhan et al. (2017) Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 1558:213-232
Ross, Karen E; Huang, Hongzhan; Ren, Jia et al. (2017) iPTMnet: Integrative Bioinformatics for Studying PTM Networks. Methods Mol Biol 1558:333-353