GENCODE: comprehensive gene annotation for human and mouse

Hubbard, Timothy

Abstract

The aim of the GENCODE consortium is to annotate all evidence-based gene features in the human genome with high accuracy, including protein-coding loci with alternatively spliced variants, non-coding loci and pseudogenes. With this proposal we aim to extend GENCODE to the mouse genome and to use the comparison of corresponding human and mouse loci to improve both sets of annotation. Despite the tremendous progress of the current GENCODE production project, and the outstanding quality of at least the protein-coding gene set, the annotation of all human genes is far from complete. For example, it has recently become clear that the number of non-coding RNA genes is far greater than previously supposed. It is also recognized that substantial numbers of alternative transcripts remain to be discovered from transcriptomics studies of additional cell types.
Our first aim is therefore to continue to improve the coverage and accuracy of the GENCODE human gene set.
Our second aim is to apply to the mouse genome the same annotation approaches that we applied to generate the human GENCODE gene set. To achieve both goals we will integrate computational approaches, expert manual annotation and targeted experimental approaches, as we have done for human. We will also use comparative approaches so that the resulting mouse annotation informs and improves the human GENCODE gene set. A comprehensive knowledge of the location and structure of genes in the human genome is central to our understanding of human biology and the mechanisms of disease. Similarly for mouse, a comprehensive, high-quality gene set will aid in the design of experiments and in the interpretation of the effects of gene knockouts and the resulting phenotypes. Also, since mouse is used as a model of human, knowledge of its genes and their relationship to human genes will help inform human gene function. The regular releases of GENCODE gene sets will therefore benefit the entire community of human and mouse researchers.

Public Health Relevance

A comprehensive knowledge of the location and structure of genes in the human genome is central to our understanding of human biology and the mechanisms of disease. Since mouse is used as a model of human, knowledge of its genes and their relationship to human genes also helps inform human gene function.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 1U41HG007234-01
Application #: 8503762
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J2))
Program Officer: Feingold, Elise A

Project Start: 2013-04-01
Project End: 2017-03-31
Budget Start: 2013-04-01
Budget End: 2014-03-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $2,582,206
Indirect Cost: $172,682

Institution

Name: Sanger Institute
Department
Type
DUNS #: 346013253

City: Cambridge
State
Country: United Kingdom
Zip Code

Publications

Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879

Schlaffner, Christoph N; Pirklbauer, Georg J; Bender, Andreas et al. (2018) A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp :

Garg, Shilpa; Rautiainen, Mikko; Novak, Adam M et al. (2018) A graph-based approach to diploid genome assembly. Bioinformatics 34:i105-i114

Lagarde, Julien; Johnson, Rory (2018) Capturing a Long Look at Our Genetic Library. Cell Syst 6:153-155

Lilue, Jingtao; Doran, Anthony G; Fiddes, Ian T et al. (2018) Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet 50:1574-1583

Schoeler, Natasha E; Leu, Costin; Balestrini, Simona et al. (2018) Genome-wide association study: Exploring the genetic basis for responsiveness to ketogenic dietary therapies for drug-resistant epilepsy. Epilepsia 59:1557-1566

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M et al. (2018) Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 46:D221-D228

Newman, Victoria; Moore, Benjamin; Sparrow, Helen et al. (2018) The Ensembl Genome Browser: Strategies for Accessing Eukaryotic Genome Data. Methods Mol Biol 1757:115-139

Wang, Junling; Pejaver, Vikas Rao; Dann, Geoffrey P et al. (2018) Target site specificity and in vivo complexity of the mammalian arginylome. Sci Rep 8:16177

Zerbino, Daniel R; Achuthan, Premanand; Akanni, Wasiu et al. (2018) Ensembl 2018. Nucleic Acids Res 46:D754-D761

Showing the most recent 10 out of 88 publications
