Biomedical ontologies are critical to the accurate representation and integration of genome-scale data in biomedical and clinical research. The Protein Ontology (PRO), the reference ontology for protein entities in the OBO (Open Biological and Biomedical Ontologies) Foundry, represents protein families, multiple protein forms (proteoforms) arising from single genes, and protein complexes. This competitive renewal grant application will further establish PRO as a scalable, flexible, collaborative research infrastructure for protein-centric semantic integration of biomedical data of increasing volume and complexity.
Specific aims are to: (i) enable scalable and dynamic representation of protein types; (ii) provide comprehensive coverage of human proteoforms in their biological context; (iii) develop collaborative use cases and support an expanding community of users to advance protein-disease understanding; and (iv) broaden dissemination to support semantic computing, dynamic term mapping, interoperability, and reusability. We will increase PRO coverage by semi-automated import of proteoforms and complexes from curated databases and via established text mining approaches. Expert curation will focus on human variant forms, post-translational modification (PTM) forms, and complexes critical to disease processes, along with homologous mouse forms. We will develop a new PRO sub-ontology of protein sites (amino acid positions of significance) to enable automatic and dynamic definition of combinatorial proteoforms. We will establish PRO OWL and RDF versions and provide a SPARQL query endpoint to support semantic computing. We will organize annual workshops and annotation jamborees to develop use cases and promote PRO co-development with community collaborators to address specific disciplinary needs.
PRO has several unique features. For knowledge representation, PRO defines precise protein entities to support accurate annotation at the appropriate level of granularity, and it provides the ontological framework to connect the PTM and variant proteoforms and complexes necessary to model human health and disease. For semantic data integration, PRO provides the ontological structure to connect, via specified relations, the vast amounts of proteomics data and biomedical knowledge to support hypothesis generation and testing. PRO therefore addresses the gaps in the bioinformatics infrastructure for protein representations in a way that makes knowledge about proteins more accessible to computational reasoning, fully complementing existing knowledge sources.
The proposed research will allow the PRO Consortium to deepen and broaden PRO for scalable semantic integration of biomedical data, facilitating protein-disease knowledge discovery and clinical applications by an expanding community of biomedical, clinical and computational users.
The Protein Ontology (PRO) will allow researchers to capture and accurately represent scientific knowledge of proteins, thus providing a research infrastructure for modeling biological systems and for protein-centric integration of existing and emerging experimental and clinical data. As a component of the global informatics resource network, the PRO resource will be instrumental in computational analysis and knowledge discovery of genome-scale data and will aid in improving our understanding of human disease and in identifying potential diagnostic and therapeutic targets.