Biomedical ontologies are critical to the accurate representation and integration of genome-scale data in biomedical and clinical research. The Protein Ontology (PRO), the reference ontology for protein entities in the OBO (Open Biological and Biomedical Ontologies) Foundry, represents protein families, multiple protein forms (proteoforms) arising from single genes, and protein complexes. This competitive renewal grant application will further establish PRO as a scalable, flexible, collaborative research infrastructure for protein-centric semantic integration of biomedical data of increasing volume and complexity.
Specific aims are to: (i) enable scalable and dynamic representation of protein types; (ii) provide comprehensive coverage of human proteoforms in their biological context; (iii) develop collaborative use cases and support an expanding community of users to advance protein-disease understanding; and (iv) broaden dissemination to support semantic computing, dynamic term mapping, and interoperability and reusability. We will increase PRO coverage by semi-automated import of proteoforms and complexes from curated databases and via established text mining approaches. Expert curation will focus on human variant forms, post-translational modification (PTM) forms and complexes critical to disease processes, along with homologous mouse forms. We will develop a new PRO sub-ontology of protein sites (amino acid positions of significance) to enable automatic and dynamic definition of combinatoric proteoforms. We will establish PRO OWL and RDF versions and provide a SPARQL query endpoint to support semantic computing. We will organize annual workshops and annotation jamborees to develop use cases and promote PRO co-development with community collaborators to address specific disciplinary needs. PRO has several unique features. For knowledge representation, PRO defines precise protein entities to support accurate annotation at the appropriate level of granularity and provides the ontological framework to connect PTM and variant proteoforms and complexes necessary to model human health and disease. For semantic data integration, PRO provides the ontological structure to connect, via specified relations, the vast amounts of proteomics data and biomedical knowledge to support hypothesis generation and testing. PRO therefore addresses the gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully complementing existing knowledge sources.
The proposed research will allow the PRO Consortium to deepen and broaden PRO for scalable semantic integration of biomedical data, facilitating protein-disease knowledge discovery and clinical applications by an expanding community of biomedical, clinical and computational users.
The Protein Ontology (PRO) will allow researchers to capture and accurately represent scientific knowledge of proteins, thus providing a research infrastructure for modeling biological systems and for protein-centric integration of existing and emerging experimental and clinical data. As a component of the global informatics resource network, the PRO resource will be instrumental in computational analysis and knowledge discovery of genome-scale data and will aid in improving our understanding of human disease and in identifying potential diagnostic and therapeutic targets.