Integrated human genome annotation: generation of a reference gene set

Hubbard, Timothy

Abstract

The specific aim of this proposal is to annotate, at high accuracy, all evidence-based gene features on the human genome reference sequence. This includes identifying all protein-coding loci with their alternative variants, non-coding loci that have transcript evidence in the public nucleotide databases (NCBI/EMBL/DDBJ), and pseudogenes. To achieve this goal we will integrate computational approaches (including recent comparative methods), expert manual annotation that can incorporate information from the literature, and targeted experimental approaches. Based on the exhaustive experimental and computational investigation of our initial GENCODE annotation of the ENCODE regions, we are confident that we can deliver a gene set with high specificity and sensitivity that will provide critical information to biologists and to other ENCODE groups. As part of this process we will clearly label all apparent gene loci, classifying them according to their likely current functional status, so that users are informed where gene-like regions are most likely pseudogenes or where transcript evidence is most likely artefactual. Several groups are actively working on defining the protein-coding genes of the human genome. This proposal includes most of these groups and coordinates with the other key groups. Critically, all of the groups bring extensive experience in data integration and evaluation, which allows annotation discrepancies to be resolved by multiple approaches. This gives us confidence that through this integrated project we will be able to eliminate many of the remaining uncertainties about the precise location of genes, their component exons, and their transcript structures in the human genome.

Genome-wide, highly accurate transcript definition will be of enormous value to the many researchers working on the human genome. It will yield large cost savings worldwide through more specific reagent design, and it will provide a more complete view of human genes, in particular those associated with disease. From this foundation, the genetic causes of disease can be described more accurately.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 1U54HG004555-01
Application #: 7391864
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O1))
Program Officer: Good, Peter J

Project Start: 2007-09-27
Project End: 2011-06-30
Budget Start: 2007-09-27
Budget End: 2008-06-30
Support Year: 1
Fiscal Year: 2007
Total Cost: $2,837,894

Institution

Name: Sanger Institute
DUNS #: 346013253

City: Cambridge
Country: United Kingdom

Related projects


NIH 2012 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$1,153,519
NIH 2011 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$1
NIH 2010 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$2,338,425
NIH 2009 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$2,362,045
NIH 2009 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$115,981
NIH 2008 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$2,362,045
NIH 2007 U54 HG	Integrated human genome annotation: generation of a reference gene set Hubbard, Timothy John / Sanger Institute	$2,837,894

Publications

Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel et al. (2016) The Ensembl gene annotation system. Database (Oxford) 2016:

Pervouchine, Dmitri D; Djebali, Sarah; Breschi, Alessandra et al. (2015) Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression. Nat Commun 6:5903

Nguyen, Ngan; Hickey, Glenn; Zerbino, Daniel R et al. (2015) Building a pan-genome reference for a population. J Comput Biol 22:387-401

Washietl, Stefan; Kellis, Manolis; Garber, Manuel (2014) Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 24:616-28

Harrow, Jennifer L; Steward, Charles A; Frankish, Adam et al. (2014) The Vertebrate Genome Annotation browser 10 years on. Nucleic Acids Res 42:D771-9

Flicek, Paul; Amode, M Ridwan; Barrell, Daniel et al. (2014) Ensembl 2014. Nucleic Acids Res 42:D749-55

Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu et al. (2014) Comparative analysis of the transcriptome across distant species. Nature 512:445-8

Pervouchine, Dmitri D (2014) IRBIS: a systematic search for conserved complementarity. RNA 20:1519-31

Farrell, Catherine M; O'Leary, Nuala A; Harte, Rachel A et al. (2014) Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res 42:D865-72

Chan, Clara S; Jungreis, Irwin; Kellis, Manolis (2013) Heterologous stop codon readthrough of metazoan readthrough candidates in yeast. PLoS One 8:e59450

Showing the most recent 10 out of 67 publications
