Biomedical ontologies are critical to the accurate representation and integration of genome-scale data in biomedical and clinical research. The Protein Ontology (PRO)-the reference ontology for protein entities in the OBO (Open Biological and Biomedical Ontologies) Foundry-represents protein families, multiple protein forms (proteoforms) arising from single genes, and protein complexes. This competitive renewal grant application will further establish PRO as a scalable, flexible, collaborative research infrastructure for protein-centric semantic integration of biomedical data of increasing volume and complexity.
Specific aims are to: (i) enable scalable and dynamic representation of protein types; (ii) provide comprehensive coverage of human proteoforms in their biological context; (iii) develop collaborative use cases and support an expanding community of users to advance protein-disease understanding; and (iv) broaden dissemination to support semantic computing, dynamic term mapping, and interoperability and reusability. We will increase PRO coverage by semi-automated import of proteoforms and complexes from curated databases and via established text mining approaches. Expert curation will focus on human variant forms, post-translational modification (PTM) forms and complexes critical to disease processes, along with homologous mouse forms. We will develop a new PRO sub-ontology of protein sites-amino acid positions of significance-to enable automatic and dynamic definition of combinatoric proteoforms. We will establish PRO OWL and RDF versions and provide a SPARQL query endpoint to support semantic computing. We will organize annual workshops and annotation jamborees to develop use cases and promote PRO co-development with community collaborators to address specific disciplinary needs. PRO has several unique features. For knowledge representation, PRO defines precise protein entities to support accurate annotation at the appropriate level of granularity and provides the ontological framework to connect PTM and variant proteoforms and complexes necessary to model human health and disease. For semantic data integration, PRO provides the ontological structure to connect-via specified relations-the vast amounts of proteomics data and biomedical knowledge to support hypothesis generation and testing. PRO therefore addresses the gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully complementing existing knowledge sources. The proposed research will allow the PRO Consortium to deepen and broaden PRO for scalable semantic integration of biomedical data, facilitating protein-disease knowledge discovery and clinical applications by an expanding community of biomedical, clinical and computational users.

Public Health Relevance

The Protein Ontology (PRO) will allow researchers to capture and accurately represent scientific knowledge of proteins thus providing a research infrastructure for modeling biological systems and for protein-centric integration of existing and emerging experimental and clinical data. As a component of the global informatics resource network, the PRO resource will be instrumental in computational analysis and knowledge discovery of genome-scale data and will aid in improving our understanding of human disease, and in the identification of potential diagnostic and therapeutic targets.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM080646-11
Application #
9120920
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2007-05-01
Project End
2019-08-31
Budget Start
2016-09-01
Budget End
2017-08-31
Support Year
11
Fiscal Year
2016
Total Cost
Indirect Cost
Name
University of Delaware
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
059007500
City
Newark
State
DE
Country
United States
Zip Code
19716
Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E et al. (2018) iPTMnet: an integrated resource for protein post-translational modification network discovery. Nucleic Acids Res 46:D542-D550
Huang, Liang-Chin; Ross, Karen E; Baffi, Timothy R et al. (2018) Integrative annotation and knowledge discovery of kinase post-translational modifications and cancer-associated mutations through federated protein ontologies and resources. Sci Rep 8:6518
Bhattacharya, Sanchita; Dunn, Patrick; Thomas, Cristel G et al. (2018) ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5:180015
Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52
Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele et al. (2017) On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics 33:3454-3460
Gurcan, Metin N; Tomaszewski, John; Overton, James A et al. (2017) Developing the Quantitative Histopathology Image Ontology (QHIO): A case study using the hot spot detection problem. J Biomed Inform 66:129-135
Arighi, Cecilia N; Drabkin, Harold; Christie, Karen R et al. (2017) Tutorial on Protein Ontology Resources. Methods Mol Biol 1558:57-78
Natale, Darren A; Arighi, Cecilia N; Blake, Judith A et al. (2017) Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res 45:D339-D346
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169
Zaru, Rossana; Magrane, Michele; O'Donovan, Claire et al. (2017) From the research laboratory to the database: the Caenorhabditis elegans kinome in UniProtKB. Biochem J 474:493-515

Showing the most recent 10 out of 56 publications