Because of the staggering complexity of biological systems, biomedical research is becoming increasingly dependent on knowledge stored in a computable form. The Gene Ontology (GO) is by far the largest knowledgebase of how genes function, and it has become a critical component of the computational infrastructure enabling the genomic revolution. It is now nearly indispensable for interpreting large-scale molecular measurements in biological research. Crucially for human health research, GO is also one of a suite of complementary ontologies constructed so as to maximize the interoperability and comparability of data sets. It represents the gene functions and biological processes that are perturbed in human disease. For example, the Human Phenotype Ontology (HPO) class abnormality of lipid metabolism is defined in relation to the GO class lipid metabolic process (GO_0006629); by following this link, researchers or clinicians can find the set of genes known to be involved in that process. GO is a knowledge resource that can be statistically mined, either on its own or in combination with data from other knowledge resources, which enables experts to discover connections and form new hypotheses from the biological networks GO represents. All knowledge in GO is represented using semantic web technologies and so is amenable to computational integration and consistency checking. The proposed GO knowledge environment will enable a wider community of scientists to contribute to, and to utilize, a common, computable representation of biology. To ensure the knowledge environment meets the requirements of biomedical researchers, we will: a) deliver a comprehensive, detailed, computable knowledgebase of gene function, encoded in the Gene Ontology and annotations (computer-readable statements about how specific genes function), focusing on human biology; b) provide a "hub" for a broad community of scientists to collaboratively extend, correct and improve the knowledgebase; c) ensure the GO knowledge resource is of the highest quality with regard to depth, breadth and accuracy; d) facilitate the transfer of insights obtained from studies of non-human organisms, such as the mouse and zebrafish, to human biology; and e) enable the scientific community to use the knowledgebase in analyses of large-scale genetic and -omics data.
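The HPO-to-GO lookup described above amounts to a graph traversal: GO classes form a directed acyclic graph, and a gene annotated to a more specific class (a subtype of lipid metabolic process, say) is implicitly involved in the more general class as well. The Python sketch below illustrates that idea under simplifying assumptions. It follows only is_a edges (real GO propagation also uses relations such as part_of), the HPO link is reduced to a plain dictionary, and every term ID other than GO_0006629, as well as every gene name and annotation, is a hypothetical placeholder rather than real GO content.

```python
from collections import defaultdict, deque

# Toy is_a hierarchy: child term -> set of parent terms.
# Only GO_0006629 (lipid metabolic process) comes from the text; the TOY:* terms
# are hypothetical placeholders, not real GO classes.
IS_A = {
    "TOY:fatty_acid_metabolism": {"GO_0006629"},
    "TOY:steroid_metabolism": {"GO_0006629"},
    "TOY:sterol_metabolism": {"TOY:steroid_metabolism"},
    "TOY:unrelated_process": set(),
}

# Direct gene annotations: gene -> set of terms (hypothetical genes and annotations).
ANNOTATIONS = {
    "GENE1": {"GO_0006629"},
    "GENE2": {"TOY:fatty_acid_metabolism"},
    "GENE3": {"TOY:sterol_metabolism"},
    "GENE4": {"TOY:unrelated_process"},
}

# Simplified stand-in for an HPO class whose definition refers to a GO class.
HPO_TO_GO = {"abnormality of lipid metabolism": "GO_0006629"}


def descendants(term, is_a):
    """Return `term` plus every term that reaches it through is_a edges."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    seen, queue = {term}, deque([term])
    while queue:  # breadth-first walk down the DAG
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_for_term(term, is_a, annotations):
    """Genes annotated to `term` directly or to any of its descendants."""
    terms = descendants(term, is_a)
    return {gene for gene, gos in annotations.items() if gos & terms}


go_term = HPO_TO_GO["abnormality of lipid metabolism"]
print(sorted(genes_for_term(go_term, IS_A, ANNOTATIONS)))
# → ['GENE1', 'GENE2', 'GENE3']
```

Storing only direct annotations and expanding the query class into its descendants at lookup time keeps the annotation table small; GENE4 is excluded because its class does not lie below GO_0006629.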
Our aims reflect the essential requirements for realizing the overarching objectives of a biomedical data resource: efficiently capturing and integrating biological knowledge while adhering to the highest possible standards of accuracy and detail; constructing and providing a robust, flexible, powerful, and extensible technological infrastructure that is available not only for internal use but equally to the wider community; and, lastly, leveraging state-of-the-art social media, web services and other technologies to disseminate the GO resource to the entire biomedical research community.

Public Health Relevance

This project aims to provide a complete and integrated picture of what every single gene in a human being does, thus allowing us to better understand the genetic and cellular workings of human health and disease. We do this by developing the Gene Ontology, a computational resource that collects biological knowledge into a large network structure that connects genes with the roles they play. Researchers, clinicians, and sophisticated computer programs use this network to interpret the massive amounts of biomedical and genomic data being generated in experiments and in studies designed to gain key insights into human health.
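One common way to use GO to interpret experimental data, as described above, is term enrichment analysis: given the genes highlighted by an experiment, ask whether the genes annotated to a GO class are over-represented relative to a background set. The sketch below uses a one-sided hypergeometric test, a standard choice for this question, written with only Python's standard library. It is an illustration, not the method of any particular GO tool; the gene identifiers are hypothetical, and it omits the multiple-testing correction that a real analysis over many GO classes would need.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def term_enrichment(study_genes, background_genes, term_genes):
    """Overlap count and one-sided over-representation p-value for one GO class."""
    background = set(background_genes)
    study = set(study_genes) & background      # restrict to the background
    annotated = set(term_genes) & background   # genes carrying the GO class
    k = len(study & annotated)
    return k, hypergeom_sf(k, len(background), len(annotated), len(study))


# Hypothetical data: 20 background genes, 5 annotated to the class of interest,
# and an experiment that highlights 5 genes, 4 of which carry the annotation.
background = [f"G{i}" for i in range(1, 21)]
term_genes = [f"G{i}" for i in range(1, 6)]
study = ["G1", "G2", "G3", "G4", "G10"]

k, p = term_enrichment(study, background, term_genes)
print(k, f"{p:.4f}")  # → 4 0.0049
```

Restricting both the study set and the annotated set to the background matters: genes outside the background would otherwise inflate the counts and distort the p-value.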

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 2U41HG002273-17
Application #: 9209989
Study Section: Genome Research Review Committee (GNOM-G)
Program Officer: Di Francesco, Valentina
Project Start: 2001-01-19
Project End: 2022-02-28
Budget Start: 2017-03-02
Budget End: 2018-02-28
Support Year: 17
Fiscal Year: 2017
Total Cost: $2,772,772
Indirect Cost: $429,358

Name: University of Southern California
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90032
Christie, Karen R; Blake, Judith A (2018) Sensing the cilium, digital capture of ciliary data for comparative genomics investigations. Cilia 7:3
Müller, H-M; Van Auken, K M; Li, Y et al. (2018) Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature. BMC Bioinformatics 19:94
Pichler, Klemens; Warner, Kate; Magrane, Michele et al. (2018) SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 62:e52
Denaxas, Spiros C (2017) Integrating Bio-ontologies and Controlled Clinical Terminologies: From Base Pairs to Bedside Phenotypes. Methods Mol Biol 1446:275-287
Ruch, Patrick (2017) Text Mining to Support Gene Ontology Curation and Vice Versa. Methods Mol Biol 1446:69-84
Pesquita, Catia (2017) Semantic Similarity in the Gene Ontology. Methods Mol Biol 1446:161-173
Friedberg, Iddo; Radivojac, Predrag (2017) Community-Wide Evaluation of Computational Function Prediction. Methods Mol Biol 1446:133-146
Vesztrocy, Alex Warwick; Dessimoz, Christophe (2017) A Gene Ontology Tutorial in Python. Methods Mol Biol 1446:221-229
Poux, Sylvain; Gaudet, Pascale (2017) Best Practices in Manual Annotation with the Gene Ontology. Methods Mol Biol 1446:41-54
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158-D169

The 10 most recent of 88 publications are listed above.