Genome annotations combine sequence, the results of bioinformatics analyses, and the knowledge of human curators into models of gene structure. These annotations provide a basic resource for investigations into the genetic causes of human disease. Despite their potential as a resource for such studies, genome annotations have proven difficult to use. A major reason for this has been the lack of community standards for describing them, which has resulted in the proliferation of arbitrary file formats and database schemas. To solve this problem, the Gene Ontology Consortium has developed the Sequence Ontology (SO). The purpose of SO is to unify the description of genome annotations. Many model organism databases, such as SGD, WormBase and FlyBase, have now adopted SO and release their annotations in SO-compliant formats. Many other genome databases are attempting to follow suit, but are finding it difficult to do so. One reason for their difficulties is the lack of publicly available software for managing and distributing SO-compliant genome annotations. The goal of this proposal is to further develop, improve and consolidate existing software tools that will help the broader genomics community use the Sequence Ontology to produce, manage, and disseminate SO-compliant genome annotations. Our proposed data adapters and converters will help bring old annotation data and software forward; our SO-based quality control pipelines will ensure that the data produced by different databases are indeed interoperable; and our navigation and database search tools will help human curators to produce higher quality SO-compliant genome annotations.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG004341-03
Application #
7656645
Study Section
Special Emphasis Panel (ZRG1-BST-D (51))
Program Officer
Good, Peter J
Project Start
2007-08-15
Project End
2011-06-30
Budget Start
2009-07-01
Budget End
2010-06-30
Support Year
3
Fiscal Year
2009
Total Cost
$178,010
Indirect Cost
Name
University of Utah
Department
Genetics
Type
Schools of Medicine
DUNS #
009095365
City
Salt Lake City
State
UT
Country
United States
Zip Code
84112
Desvignes, T; Batzel, P; Berezikov, E et al. (2015) miRNA Nomenclature: A View Incorporating Genetic Origins, Biosynthetic Pathways, and Sequence Variants. Trends Genet 31:613-626
Cunningham, Fiona; Moore, Barry; Ruiz-Schultz, Nicole et al. (2015) Improving the Sequence Ontology terminology for genomic variant annotation. J Biomed Semantics 6:32
Welch, Brandon M; Eilbeck, Karen; Del Fiol, Guilherme et al. (2014) Technical desiderata for the integration of genomic data with clinical decision support. J Biomed Inform 51:3-7
Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen et al. (2014) A proposed clinical decision support architecture capable of supporting whole genome sequence information. J Pers Med 4:176-99
Singleton, Marc V; Guthery, Stephen L; Voelkerding, Karl V et al. (2014) Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet 94:599-610
Mungall, Christopher J; Batchelor, Colin; Eilbeck, Karen (2011) Evolution of the Sequence Ontology terms and relationships. J Biomed Inform 44:87-93
Gene Ontology Consortium (2010) The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res 38:D331-5
Moore, Barry; Fan, Guozhen; Eilbeck, Karen (2010) SOBA: sequence ontology bioinformatics analysis. Nucleic Acids Res 38:W161-4
Reese, Martin G; Moore, Barry; Batchelor, Colin et al. (2010) A standard variation file format for human genome sequences. Genome Biol 11:R88
Eilbeck, Karen; Moore, Barry; Holt, Carson et al. (2009) Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10:67