GENCODE Resource Project

Flicek, Paul

Abstract

A comprehensive knowledge of the location, structure, and expression of genes in the human genome is central to understanding human biology and the mechanisms of disease. Similarly, for mouse, a comprehensive, high-quality gene set will aid the design of experiments and the interpretation of gene knockouts and their resulting phenotypes. Because the mouse is a model for human disease, such a gene set will also help inform human gene function. The GENCODE consortium has assembled a team of world experts in a variety of fields related to gene annotation to create and distribute this gold standard. GENCODE's wide expertise covers gene and transcript isoform identification, pseudogene evolution, sequence conservation, gene expression, proteomics and post-translational modifications, gene regulatory elements, development and maintenance of the infrastructure required to create genome annotation at scale, and demonstrated community engagement and leadership. Complete and accurate annotation of the human and mouse genomes is necessary because many protein-coding genes are still incomplete or misannotated. GENCODE also aims to include all non-protein-coding genes, which remain poorly understood and for which many loci are still missing. Beyond coding and non-coding genes, GENCODE creates reference pseudogene annotation, as recent studies indicate that pseudogenes can play key regulatory roles. Completing the full first-pass manual annotation of the reference mouse genome assembly will be one of the main objectives of GENCODE. Efforts are underway in the Genome Reference Consortium (GRC) to expand the definition of the reference human genome to include genomic sequence for all haplotypes and gene alleles. The GRC has already committed to supporting the genomes of a collection of 16 representative mouse strains, effectively replacing the linear genome with a "graph-like" structure of 16 separate haplotypes.
GENCODE already annotates the full reference genomes for human and mouse, including all available alternate sequences, and will continue to provide annotation appropriate to these new genomic sequences. In addition to genomic mutations that impair gene product function, many phenotypes are caused or moderated by the regulation of gene products. GENCODE's complete annotation of all transcript isoforms therefore logically includes the key regulatory regions that are fundamentally part of each gene, and GENCODE will pilot the annotation of these tissue-specific regions.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 2U41HG007234-05
Application #: 9277661
Study Section: Special Emphasis Panel (ZHG1)

Budget Start: 2017-05-15
Budget End: 2018-04-30
Support Year: 5
Fiscal Year: 2017

Institution

Name: European Molecular Biology Laboratory
DUNS #: 321691735

City: Heidelberg
Country: Germany
Zip Code: 69117

Publications

Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879

Schlaffner, Christoph N; Pirklbauer, Georg J; Bender, Andreas et al. (2018) A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp :

Garg, Shilpa; Rautiainen, Mikko; Novak, Adam M et al. (2018) A graph-based approach to diploid genome assembly. Bioinformatics 34:i105-i114

Lagarde, Julien; Johnson, Rory (2018) Capturing a Long Look at Our Genetic Library. Cell Syst 6:153-155

Lilue, Jingtao; Doran, Anthony G; Fiddes, Ian T et al. (2018) Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet 50:1574-1583

Schoeler, Natasha E; Leu, Costin; Balestrini, Simona et al. (2018) Genome-wide association study: Exploring the genetic basis for responsiveness to ketogenic dietary therapies for drug-resistant epilepsy. Epilepsia 59:1557-1566

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M et al. (2018) Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 46:D221-D228

Newman, Victoria; Moore, Benjamin; Sparrow, Helen et al. (2018) The Ensembl Genome Browser: Strategies for Accessing Eukaryotic Genome Data. Methods Mol Biol 1757:115-139

Wang, Junling; Pejaver, Vikas Rao; Dann, Geoffrey P et al. (2018) Target site specificity and in vivo complexity of the mammalian arginylome. Sci Rep 8:16177

Zerbino, Daniel R; Achuthan, Premanand; Akanni, Wasiu et al. (2018) Ensembl 2018. Nucleic Acids Res 46:D754-D761

Showing the most recent 10 out of 88 publications
