The aim of the GENCODE consortium is to annotate all evidence-based gene features in the human genome with high accuracy, including protein-coding loci with alternatively spliced variants, non-coding loci and pseudogenes. With this proposal we aim to extend GENCODE to the mouse genome and to use the comparison of corresponding human and mouse loci to improve both sets of annotation. Despite the tremendous progress of the current GENCODE production project, and the outstanding quality of at least the protein-coding gene set, the annotation of all human genes is far from complete. For example, it has recently become clear that the number of non-coding RNA genes is far greater than previously supposed. It is also recognized that substantial numbers of alternative transcripts remain to be discovered from transcriptomics studies of additional cell types.
Our first aim is therefore to continue to improve the coverage and accuracy of the GENCODE human gene set.
Our second aim is to apply to the mouse genome the same annotation approaches that we used to generate the human GENCODE gene set. To achieve both goals we will integrate computational approaches, expert manual annotation and targeted experimental approaches, as we have done for human. We will also use comparative approaches so that the resulting mouse annotation can inform and improve the human GENCODE gene set. A comprehensive knowledge of the location and structure of genes in the human genome is central to our understanding of human biology and the mechanisms of disease. Similarly for mouse, a comprehensive, high-quality gene set will aid in the design of experiments and in the interpretation of the effects of gene knockouts and the resulting phenotypes. Also, since mouse is used as a model of human, knowledge of its genes and their relationship to human genes will help inform human gene function. The regular releases of GENCODE gene sets will therefore benefit the entire community of human and mouse researchers.

Public Health Relevance

A comprehensive knowledge of the location and structure of genes in the human genome is central to our understanding of human biology and the mechanisms of disease. Since mouse is used as a model of human, knowledge of its genes and their relationship to human genes also helps inform human gene function.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG007234-02
Application #: 8642205
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J2))
Program Officer: Feingold, Elise A
Project Start: 2013-04-01
Project End: 2017-03-31
Budget Start: 2014-04-01
Budget End: 2015-03-31
Support Year: 2
Fiscal Year: 2014
Total Cost: $2,530,561
Indirect Cost: $187,449
Name: Sanger Institute
Department:
Type:
DUNS #: 346013253
City: Cambridge
State:
Country: United Kingdom
Zip Code: CB10 1SA
Gordon, David; Huddleston, John; Chaisson, Mark J P et al. (2016) Long-read sequence assembly of the gorilla genome. Science 352:aae0344
Jungreis, Irwin; Chan, Clara S; Waterhouse, Robert M et al. (2016) Evolutionary Dynamics of Abundant Stop Codon Readthrough. Mol Biol Evol 33:3108-3132
Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Santoyo-Lopez, Javier et al. (2016) Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq). Nat Commun 7:12339
Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R et al. (2016) The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 44:D717-25
Petryszak, Robert; Keays, Maria; Tang, Y Amy et al. (2016) Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res 44:D746-52
Breschi, Alessandra; Djebali, Sarah; Gillis, Jesse et al. (2016) Gene-specific patterns of expression variation across organs and species. Genome Biol 17:151
Pignatelli, Miguel; Vilella, Albert J; Muffato, Matthieu et al. (2016) ncRNA orthologies in the vertebrate lineage. Database (Oxford) 2016:
Ma, Jiao; Diedrich, Jolene K; Jungreis, Irwin et al. (2016) Improved Identification and Analysis of Small Open Reading Frame Encoded Polypeptides. Anal Chem 88:3967-75
Zerbino, Daniel R; Johnson, Nathan; Juetteman, Thomas et al. (2016) Ensembl regulation resources. Database (Oxford) 2016:
Skinner, Benjamin M; Sargent, Carole A; Churcher, Carol et al. (2016) The pig X and Y Chromosomes: structure, sequence, and evolution. Genome Res 26:130-9

Showing the most recent 10 out of 57 publications