Biomedical ontologies are critical tools for the accurate representation and integration of genome-scale data in biomedical and translational research. The OBO (Open Biological and Biomedical Ontologies) Foundry is a community effort to develop a systematic and coordinated framework for evidence-based ontology development on the basis of an evolving set of best practice principles. The Protein Ontology (PRO) is the reference ontology for proteins within the OBO Foundry, and is, with the Gene Ontology, one of the first six ontologies recommended by the Foundry as preferred targets for community convergence. To provide a basic ontological framework to capture protein knowledge in a systems biology context, PRO encompasses three sub-ontologies to represent (1) proteins from homologous genes based on evolutionary relatedness (ProEvo); (2) protein forms produced from a given gene, including splice isoforms, mutation variants, and co- or post-translationally modified forms (ProForm); and (3) protein-containing complexes (ProComp). This competitive renewal grant application aims to further develop PRO in order to facilitate its semantic and computational use by the biomedical research community and thereby broaden its scientific impact for discovery and reasoning in the health sciences.
The specific aims are: (i) to enhance the PRO ontological framework; (ii) to broaden the coverage of protein objects; (iii) to enhance the PRO curation platform, website and visual representation; (iv) to develop driving clinical projects; and (v) to expand the scientific impact, adoption and dissemination of PRO. The ontological framework will capture new types of protein objects and relations and connect to semantic resources and reasoning tools. PRO will broaden coverage through mappings and definitions of relations to connect protein objects in existing knowledge bases, and via semi-automated import of protein forms and complexes from curated databases. A graphical network representation will seamlessly connect protein forms and complexes across taxa in biological context for disease modeling. Use cases and two specific Driving Clinical Projects (one for reasoning and hypothesis generation for Alzheimer's disease, and one for flow cytometry data representation and immune system modeling) will demonstrate knowledge integration in the OBO Foundry framework as an enabling research infrastructure for reasoning and modeling in the health sciences. We will host annual PRO Scientific Dissemination Meetings addressing the protein-related needs of the bio- and clinical informatics research communities. PRO will be disseminated via multiple websites and ontological services, as well as through reciprocal links with major knowledge resources. ID management for protein objects will include mapping of PRO terms to common database identifiers with well-defined relations and expedited creation of requested PRO terms and UIDs. The PRO research has several unique features, and its significance is multifold.
For knowledge representation, PRO defines precise protein objects to support accurate annotation at the appropriate granularity and provides the ontological framework to connect all protein types necessary to model biology, in particular linking specific protein forms to particular complexes in biological context. For semantic data integration, PRO provides the ontological structure to connect, via specified relations, the vast amounts of protein knowledge contained in databases to support new hypothesis generation and testing. PRO therefore addresses the current gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully leveraging and complementing existing knowledge sources. The proposed research will allow the PRO Consortium to bring together the resources and expertise from several collaborating institutions to deepen and broaden PRO as a mature research infrastructure for biomedical knowledge discovery and translational science.

Public Health Relevance

The PRO ontology will allow researchers to capture and accurately represent scientific knowledge of proteins, providing a research infrastructure for modeling biological systems, improving the understanding of human disease, and aiding in the identification of potential diagnostic and therapeutic targets.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
University of Delaware
Biostatistics & Other Math Sci
Biomed Engr/Col Engr/Engr Sta
United States
Poux, Sylvain; Magrane, Michele; Arighi, Cecilia N et al. (2014) Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016
Famiglietti, Maria Livia; Estreicher, Anne; Gos, Arnaud et al. (2014) Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 35:927-35
UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191-8
Natale, Darren A; Arighi, Cecilia N; Blake, Judith A et al. (2014) Protein Ontology: a controlled structured network of protein entities. Nucleic Acids Res 42:D415-21
Ross, Karen E; Arighi, Cecilia N; Ren, Jia et al. (2013) Use of the protein ontology for multi-faceted analysis of biological processes: a case study of the spindle checkpoint. Front Genet 4:62
UniProt Consortium (2013) Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 41:D43-7
Ross, Karen E; Arighi, Cecilia N; Ren, Jia et al. (2013) Construction of protein phosphorylation networks by data mining, text mining and ontology integration: analysis of the spindle checkpoint. Database (Oxford) 2013:bat038
Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H et al. (2013) HAMAP in 2013, new developments in the protein family classification and annotation system. Nucleic Acids Res 41:D584-9
UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40:D71-5
Arighi, Cecilia N (2011) A tutorial on protein ontology resources for proteomic studies. Methods Mol Biol 694:77-90

Showing the most recent 10 out of 23 publications