Semantic Literature Annotation and Integrative Panomics Analysis for PTM-Disease Knowledge Network Discovery

Wu, Cathy; Shanker, Vijay

Abstract

Protein post-translational modification (PTM) plays a critical role in many diseases; however, critical gaps remain in the research infrastructure for global analysis of PTMs. Key PTM information concerning enzyme-substrate relationships, regulation of PTM enzymes, PTM cross-talk, and functional consequences of PTM remains buried in the scientific literature. Meanwhile, although high-throughput panomics (genomic, transcriptomic, proteomic, PTM proteomic) data offer an unprecedented opportunity for the discovery of PTM-disease relationships, the data must be analyzed in an integrated and easily accessible knowledge framework for researchers and clinicians to gain a molecular understanding of disease. The goal of this application is to develop a collaborative knowledge environment for semantic annotation of scientific literature and integrative panomics analysis for PTM-disease knowledge discovery in precision medicine. We propose to connect PTM information from literature mining and curated databases in a knowledge resource on an ontological framework that supports analysis of panomics data in the context of PTM networks. To broaden impact and foster collaborative development, our resource will be FAIR (Findable, Accessible, Interoperable, Reusable) and interoperable with community standards.
The specific aims are: (i) develop a novel NLP (natural language processing) system for full-scale literature mining and PTM-disease knowledge extraction; (ii) develop a PTM knowledge resource for integrative panomics analysis and network discovery; and (iii) provide a FAIR collaborative environment for scalable semantic annotation and knowledge integration. The proposed system will build upon the NLP technologies and text mining tools already developed by our team and the bioinformatics infrastructure at the Protein Information Resource (PIR). The iPTMnet web portal will allow searching, browsing, visualization and analysis of PTM networks and PTM-related mutations in conjunction with user-supplied omics data, including panomics data from major national initiatives. Use scenarios will include identification of disease-driving genetic variants and analysis of cellular responses to kinase inhibitors. Our PTM knowledgebase will be disseminated with an RDF triple-store and a SPARQL endpoint for semantic queries, while our text mining tools and full-scale literature mining results will be disseminated in the BioC community standard for seamless integration with other text mining pipelines. To engage the community in semantic annotation of scientific literature, we will host a hackathon to develop tools to expose BioC-annotated literature corpora to the semantic web, as well as an annotation jamboree to explore tagging of scientific text with precise ontological terms. This project will thus offer a unique research resource for PTM-disease network discovery as well as an integrable collaborative knowledge framework to support Big Data to Knowledge in precision medicine.

Public Health Relevance

Precision medicine requires a detailed understanding of the molecular events that are disrupted in disease, including changes in protein post-translational modifications (PTM) that are hallmarks of many diseases. The proposed resource will support analysis of genomic-scale data for exploring PTM-disease networks and PTM-related mutations, as well as knowledge dissemination on the semantic web. These combined efforts will accelerate basic understanding of disease processes and discovery of diagnostic targets and more effective individualized therapies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01GM120953-01
Application #: 9195864
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2016-08-05
Project End: 2019-07-31
Budget Start: 2016-08-05
Budget End: 2017-07-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $374,400
Indirect Cost: $134,400

Institution

Name: University of Delaware
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 059007500

City: Newark
State: DE
Country: United States
Zip Code: 19716

Related projects


NIH 2018 U01 GM	Semantic Literature Annotation and Integrative Panomics Analysis for PTM-Disease Knowledge Network Discovery Wu, Cathy H.; Shanker, Vijay K. / University of Delaware
NIH 2017 U01 GM	Semantic Literature Annotation and Integrative Panomics Analysis for PTM-Disease Knowledge Network Discovery Wu, Cathy H.; Shanker, Vijay K. / University of Delaware
NIH 2016 U01 GM	Semantic Literature Annotation and Integrative Panomics Analysis for PTM-Disease Knowledge Network Discovery Wu, Cathy H.; Shanker, Vijay K. / University of Delaware	$374,400

Publications

Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550

Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518

Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E et al. (2018) DEXTER: Disease-Expression Relation Extraction from Text. Database (Oxford) 2018:

Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52

Wang, Qinghua; Ross, Karen E; Huang, Hongzhan et al. (2017) Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 1558:213-232

Ross, Karen E; Huang, Hongzhan; Ren, Jia et al. (2017) iPTMnet: Integrative Bioinformatics for Studying PTM Networks. Methods Mol Biol 1558:333-353

Ding, Ruoyao; Boutet, Emmanuel; Lieberherr, Damien et al. (2017) eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality. Database (Oxford) 2017:

Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460

The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169

Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515
