Biomedical ontologies are increasingly important in genomic and proteomic research where complex data in disparate resources need to be integrated. In particular, the Gene Ontology (GO) has become a standard for genome annotation and the Open Biomedical Ontologies (OBO) is an umbrella for ontologies shared across biomedical domains. There is, however, a gap in the current OBO library: an ontology of protein classes and their relationships. The goal of the proposed project is to develop a PRotein Ontology (PRO) to facilitate protein annotation and functional discovery. PRO will be developed within the OBO Foundry, adopting principles specifying best practices in ontology development.
The specific aims of this project are to: (i) develop a Protein Evolution (ProEvo) ontology for the description of proteins based on evolutionary relationships, (ii) develop an ontology for Protein Modified Forms (ProMod) for the representation of multiple protein forms of a gene, (iii) specify relationships between PRO, GO and other OBO ontologies, and (iv) disseminate the PRO ontology and develop scientific case studies. ProEvo will be developed based on manually-curated families of full-length proteins in PIRSF and PANTHER and their constituent domains in SCOP and Pfam, initially for human protein-containing classes. The ProEvo classes and their relationships with GO terms will formalize the relationships between phylogeny and function, to allow more consistent and accurate inference of function based on experimental evidence. ProMod will define protein products generated by genetic variation, alternative splicing, proteolytic cleavage, and post-translational modification, initially for human and mouse proteins using fully-curated entries in UniProtKB/Swiss-Prot and Mouse Genome Informatics, to support specific annotation of proteomes at the precise levels of variants, isoforms, and modified products. Defined based on scientific case studies of human and mouse disease proteins, the relations between PRO, GO, and other OBO ontologies, such as Disease Ontology, will capture the relationships required for disease understanding. For PRO dissemination to the community, the ontology will be integrated into OBO, new relations will be added to the OBO Relations Ontology, and an annual protein ontology workshop will be organized. PRO will also be accessible from the PIR web site for integrative protein analysis. Through scientific meetings and collaborations, the PRO consortium will interact with the wider scientific community to ensure that PRO is useful and widely adopted.
The PRO ontology will allow researchers to explore functional and evolutionary relationships of proteins to improve understanding of disease and identify potential diagnostic and therapeutic targets.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Remington, Karin A
University of Delaware
Biostatistics & Other Math Sci
Schools of Engineering
United States
Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550
Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518
Bhattacharya, Sanchita; Dunn, Patrick; Thomas, Cristel G et al. (2018) ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5:180015
Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169
Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515
Wang, Qinghua; Ross, Karen E; Huang, Hongzhan et al. (2017) Analysis of Protein Phosphorylation and Its Functional Impact on Protein-Protein Interactions via Text Mining of the Scientific Literature. Methods Mol Biol 1558:213-232
Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460
Gurcan, Metin N; Tomaszewski, John; Overton, James A et al. (2017) Developing the Quantitative Histopathology Image Ontology (QHIO): A case study using the hot spot detection problem. J Biomed Inform 66:129-135
Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R et al. (2017) Tutorial on Protein Ontology Resources. Methods Mol Biol 1558:57-78

Showing the most recent 10 out of 56 publications