Biomedical ontologies are critical tools for the accurate representation and integration of genome-scale data in biomedical and translational research. The OBO (Open Biological and Biomedical Ontologies) Foundry is a community effort to develop a systematic and coordinated framework for evidence-based ontology development on the basis of an evolving set of best practice principles. The Protein Ontology (PRO) is the reference ontology for proteins within the OBO Foundry, and is, with the Gene Ontology, one of the first six ontologies recommended by the Foundry as preferred targets for community convergence. To provide a basic ontological framework to capture protein knowledge in a systems biology context, PRO encompasses three sub-ontologies to represent (1) proteins from homologous genes based on evolutionary relatedness (ProEvo); (2) protein forms produced from a given gene, including splice isoforms, mutation variants, and co- or post-translationally modified forms (ProForm); and (3) protein-containing complexes (ProComp). This competitive renewal grant application aims to further develop PRO in order to facilitate its semantic and computational use by the biomedical research community and thereby broaden its scientific impact for discovery and reasoning in the health sciences.
The specific aims are: (i) to enhance the PRO ontological framework; (ii) to broaden the coverage of protein objects; (iii) to enhance the PRO curation platform, website and visual representation; (iv) to develop driving clinical projects; and (v) to expand the scientific impact, adoption and dissemination of PRO. The ontological framework will capture new types of protein objects and relations and connect to semantic resources and reasoning tools. PRO will broaden coverage through mappings and definitions of relations to connect protein objects in existing knowledge bases, and via semi-automated import of protein forms and complexes from curated databases. A graphical network representation will seamlessly connect protein forms and complexes across taxa in biological context for disease modeling. Use cases and two specific Driving Clinical Projects (one for reasoning and hypothesis generation for Alzheimer's disease, and one for flow cytometry data representation and immune system modeling) will demonstrate knowledge integration in the OBO Foundry framework as an enabling research infrastructure for reasoning and modeling in the health sciences. We will host annual PRO Scientific Dissemination Meetings addressing the protein-related needs of the bio- and clinical informatics research communities. PRO will be disseminated via multiple websites and ontological services, as well as through reciprocal links with major knowledge resources. ID management for protein objects will include mapping of PRO terms to common database identifiers with well-defined relations and expedited creation of requested PRO terms and UIDs. The PRO research has several unique features and its significance is multifold.
For knowledge representation, PRO defines precise protein objects to support accurate annotation at the appropriate granularity and provides the ontological framework to connect all protein types necessary to model biology, in particular linking specific protein forms to particular complexes in biological context. For semantic data integration, PRO provides the ontological structure to connect, via specified relations, the vast amounts of protein knowledge contained in databases to support new hypothesis generation and testing. PRO therefore addresses the current gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully leveraging and complementing existing knowledge sources. The proposed research will allow the PRO Consortium to bring together the resources and expertise from several collaborating institutions to deepen and broaden PRO as a mature research infrastructure for biomedical knowledge discovery and translational science.

Public Health Relevance

The PRO ontology will allow researchers to capture and accurately represent scientific knowledge of proteins, providing a research infrastructure for modeling biological systems, improving the understanding of human disease, and aiding in the identification of potential diagnostic and therapeutic targets.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
University of Delaware
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States
Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550
Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518
Bhattacharya, Sanchita; Dunn, Patrick; Thomas, Cristel G et al. (2018) ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5:180015
Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52
Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460
Gurcan, Metin N; Tomaszewski, John; Overton, James A et al. (2017) Developing the Quantitative Histopathology Image Ontology (QHIO): A case study using the hot spot detection problem. J Biomed Inform 66:129-135
Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R et al. (2017) Tutorial on Protein Ontology Resources. Methods Mol Biol 1558:57-78
Natale, Darren A; Arighi, Cecilia N; Blake, Judith A et al. (2017) Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 45:D339-D346
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169
Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515