Biomedical ontologies are critical tools for the accurate representation and integration of genome-scale data in biomedical and translational research. The OBO (Open Biological and Biomedical Ontologies) Foundry is a community effort to develop a systematic and coordinated framework for evidence-based ontology development on the basis of an evolving set of best practice principles. The Protein Ontology (PRO) is the reference ontology for proteins within the OBO Foundry, and is, with the Gene Ontology, one of the first six ontologies recommended by the Foundry as preferred targets for community convergence. To provide a basic ontological framework to capture protein knowledge in a systems biology context, PRO encompasses three sub-ontologies to represent (1) proteins from homologous genes based on evolutionary relatedness (ProEvo);(2) protein forms produced from a given gene, including splice isoforms, mutation variants, and co- or post-translationally modified forms (ProForm);and (3) protein-containing complexes (ProComp). This competitive renewal grant application aims to further develop PRO in order to facilitate its semantic and computational use by the biomedical research community and thereby broaden its scientific impact for discovery and reasoning in the health sciences.
The specific aims are: (i) to enhance the PRO ontological framework;(ii) to broaden the coverage of protein objects;(iii) to enhance the PRO curation platform, website and visual representation;(iv) to develop driving clinical projects;and (v) to expand the scientific impact, adoption and dissemination of PRO. The ontological framework will capture new types of protein objects and relations and connect to semantic resources and reasoning tools. PRO will broaden coverage through mappings and definitions of relations to connect protein objects in existing knowledge bases, and via semi-automated import of protein forms and complexes from curated databases. A graphical network representation will seamlessly connect protein forms and complexes across tax in biological context for disease modeling. Use cases and two specific Driving Clinical Projects-one for reasoning and hypothesis generation for Alzheimer's disease, and one for flow cytometry data representation and immune system modeling-will demonstrate knowledge integration in the OBO Foundry framework as an enabling research infrastructure for reasoning and modeling in the health sciences. We will host annual PRO Scientific Dissemination Meetings addressing the protein-related needs of the bio- and clinical informatics research communities. PRO will be disseminated via multiple websites and ontological services, as well as through reciprocal links with major knowledge resources. ID management for protein objects will include mapping of PRO terms to common database identifiers with well-defined relations and expedited creation of requested PRO terms and UIDs. The PRO research has several unique features and its significance is multi-fold. For knowledge representation, PRO defines precise protein objects to support accurate annotation at the appropriate granularity and provides the ontological framework to connect all protein types necessary to model biology, in particular linking specific protein forms to particular complexes in biological context. For semantic data integration, PRO provides the ontological structure to connect-via specified relations- the vast amounts of protein knowledge contained in databases to support new hypothesis generation and testing. PRO therefore addresses the current gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully leveraging and complementing existing knowledge sources. The proposed research will allow the PRO Consortium to bring together the resources and expertise from several collaborating institutions to deepen and broaden PRO as a mature research infrastructure for biomedical knowledge discovery and translational science.

Public Health Relevance

The PRO ontology will allow researchers to capture and accurately represent scientific knowledge of proteins, providing a research infrastructure for modeling biological systems, improving the understanding of human disease, and aiding in the identification of potential diagnostic and therapeutic targets.

National Institute of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Swain, Amy L
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Delaware
Biostatistics & Other Math Sci
Schools of Engineering
United States
Zip Code
Natale, Darren A; Arighi, Cecilia N; Blake, Judith A et al. (2017) Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 45:D339-D346
Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R et al. (2017) Tutorial on Protein Ontology Resources. Methods Mol Biol 1558:57-78
Gurcan, Metin N; Tomaszewski, John; Overton, James A et al. (2017) Developing the Quantitative Histopathology Image Ontology (QHIO): A case study using the hot spot detection problem. J Biomed Inform 66:129-135
Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515
Wang, Qinghua; Ross, Karen E; Huang, Hongzhan et al. (2017) Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 1558:213-232
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169
Ross, Karen E; Natale, Darren A; Arighi, Cecilia et al. (2016) Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology. CEUR Workshop Proc 1747:
Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J et al. (2016) OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data. J Biomed Semantics 7:25
Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael et al. (2016) UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 1374:23-54
Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias et al. (2016) The Ontology for Biomedical Investigations. PLoS One 11:e0154556

Showing the most recent 10 out of 49 publications