GENCODE: comprehensive genome annotation for human and mouse

Flicek, Paul

Abstract

The objective of the GENCODE consortium is to create a foundational reference genome annotation, in which all gene features in the human and mouse genomes are identified and classified with high accuracy based on biological evidence, and then to release these annotations for the benefit of biomedical research and genome interpretation. GENCODE aims for a better understanding of a "normal" human genome; using genome sequences of the most commonly used mouse strains will facilitate the most effective use of these key models for large-scale knockout analysis and disease-specific research. To produce regular annotation releases of high accuracy, GENCODE will continue to follow its well-established and conservative research design, supplemented by targeted investigations into the value of new technologies, new data and new sources of evidence. GENCODE focuses on protein-coding and non-coding loci, including their alternatively spliced isoforms and pseudogenes. Over the course of this proposal GENCODE will follow major directions in genomics, including graph-based genome representations, long-read transcriptome sequencing, connecting genes and the associated regulatory regions that affect their transcription, and identifying genes that are not present on the current reference assembly. The GENCODE consortium has four fundamental components: (1) a comprehensive gene annotation pipeline leveraging manual annotation; (2) an integrated approach to pseudogene identification and classification; (3) a set of computational methods to evaluate and enhance gene annotation; and (4) complementary experimental pipelines for validation and functional annotation.
More specifically, in the next four years GENCODE aims to (1) extend the human and mouse GENCODE gene sets to as near completion as possible given current experimental technology; (2) deploy population-based genome annotation to ensure that any transcript isoform expressed in an individual human will be present in the reference annotation set; (3) extend the gene annotation to include core regulatory regions and tissue-specific enhancers from selected datasets; and (4) distribute GENCODE annotations and engage with community annotation efforts. Current popular distribution channels for GENCODE data, including the GENCODE web site and the Ensembl and UCSC Genome Browsers, will be maintained. Finally, new mechanisms for prioritizing genes for manual annotation with community input will be established, with the long-term aim of establishing GENCODE as the standard annotation set for research and clinical genomics efforts.

Public Health Relevance

The GENCODE project produces reference gene annotation for the human and mouse genomes. The annotation provides a reference from which to conduct clinical and genomics research in the short term; in the long term it informs all research that will contribute fundamental knowledge to benefit public health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG007234-06
Application #: 9534716
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Gilchrist, Daniel A

Project Start: 2013-04-01
Project End: 2021-05-31
Budget Start: 2018-06-01
Budget End: 2019-05-31
Support Year: 6
Fiscal Year: 2018

Institution

Name: European Molecular Biology Laboratory
DUNS #: 321691735

City: Heidelberg
Country: Germany
Zip Code: 69117

Publications

Casper, Jonathan; Zweig, Ann S; Villarreal, Chris et al. (2018) The UCSC Genome Browser database: 2018 update. Nucleic Acids Res 46:D762-D769

Rodriguez, Jose Manuel; Rodriguez-Rivas, Juan; Di Domenico, Tomás et al. (2018) APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res 46:D213-D217

Loughran, Gary; Jungreis, Irwin; Tzani, Ioanna et al. (2018) Stop codon readthrough generates a C-terminally extended variant of the human vitamin D receptor with reduced calcitriol response. J Biol Chem 293:4434-4444

Tardaguila, Manuel; de la Fuente, Lorena; Marti, Cristina et al. (2018) SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res :

Jain, Miten; Koren, Sergey; Miga, Karen H et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338-345

Kolmogorov, Mikhail; Armstrong, Joel; Raney, Brian J et al. (2018) Chromosome assembly of large and complex genomes using multiple references. Genome Res 28:1720-1732

Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879

Schlaffner, Christoph N; Pirklbauer, Georg J; Bender, Andreas et al. (2018) A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp :

Garg, Shilpa; Rautiainen, Mikko; Novak, Adam M et al. (2018) A graph-based approach to diploid genome assembly. Bioinformatics 34:i105-i114

Lagarde, Julien; Johnson, Rory (2018) Capturing a Long Look at Our Genetic Library. Cell Syst 6:153-155

Showing the most recent 10 out of 88 publications
