PRO: A Protein Ontology in OBO Foundry for Scalable Integration of Biomedical Knowledge

Wu, Cathy

Abstract

Biomedical ontologies are critical to the accurate representation and integration of genome-scale data in biomedical and clinical research. The Protein Ontology (PRO), the reference ontology for protein entities in the OBO (Open Biological and Biomedical Ontologies) Foundry, represents protein families, multiple protein forms (proteoforms) arising from single genes, and protein complexes. This competitive renewal grant application will further establish PRO as a scalable, flexible, collaborative research infrastructure for protein-centric semantic integration of biomedical data of increasing volume and complexity.
Specific aims are to: (i) enable scalable and dynamic representation of protein types; (ii) provide comprehensive coverage of human proteoforms in their biological context; (iii) develop collaborative use cases and support an expanding community of users to advance protein-disease understanding; and (iv) broaden dissemination to support semantic computing, dynamic term mapping, and interoperability and reusability. We will increase PRO coverage by semi-automated import of proteoforms and complexes from curated databases and via established text mining approaches. Expert curation will focus on human variant forms, post-translational modification (PTM) forms and complexes critical to disease processes, along with homologous mouse forms. We will develop a new PRO sub-ontology of protein sites (amino acid positions of significance) to enable automatic and dynamic definition of combinatoric proteoforms. We will establish PRO OWL and RDF versions and provide a SPARQL query endpoint to support semantic computing. We will organize annual workshops and annotation jamborees to develop use cases and promote PRO co-development with community collaborators to address specific disciplinary needs.
PRO has several unique features. For knowledge representation, PRO defines precise protein entities to support accurate annotation at the appropriate level of granularity and provides the ontological framework to connect PTM and variant proteoforms and complexes necessary to model human health and disease. For semantic data integration, PRO provides the ontological structure to connect, via specified relations, the vast amounts of proteomics data and biomedical knowledge to support hypothesis generation and testing. PRO therefore addresses the gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully complementing existing knowledge sources.
The proposed research will allow the PRO Consortium to deepen and broaden PRO for scalable semantic integration of biomedical data, facilitating protein-disease knowledge discovery and clinical applications by an expanding community of biomedical, clinical and computational users.

Public Health Relevance

The Protein Ontology (PRO) will allow researchers to capture and accurately represent scientific knowledge of proteins, thus providing a research infrastructure for modeling biological systems and for protein-centric integration of existing and emerging experimental and clinical data. As a component of the global informatics resource network, the PRO resource will be instrumental in computational analysis and knowledge discovery of genome-scale data and will aid in improving our understanding of human disease and in identifying potential diagnostic and therapeutic targets.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM080646-11
Application #: 9120920
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2007-05-01
Project End: 2019-08-31
Budget Start: 2016-09-01
Budget End: 2017-08-31
Support Year: 11
Fiscal Year: 2016

Institution

Name: University of Delaware
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 059007500

City: Newark
State: DE
Country: United States
Zip Code: 19716

Publications

Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550

Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518

Bhattacharya, Sanchita; Dunn, Patrick; Thomas, Cristel G et al. (2018) ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5:180015

Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52

Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460

Gurcan, Metin N; Tomaszewski, John; Overton, James A et al. (2017) Developing the Quantitative Histopathology Image Ontology (QHIO): A case study using the hot spot detection problem. J Biomed Inform 66:129-135

Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R et al. (2017) Tutorial on Protein Ontology Resources. Methods Mol Biol 1558:57-78

Natale, Darren A; Arighi, Cecilia N; Blake, Judith A et al. (2017) Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 45:D339-D346

The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169

Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515

Showing the most recent 10 out of 56 publications
