The creation, advancement and maintenance of the GENCODE resource require both adherence to and optimization of defined processes that ensure the genome annotation created now and in the future will always meet or exceed the standard of what has already been created. The GENCODE resource must also be attuned to the new technologies and opportunities that arise as the field of genomics evolves. A primary objective of the GENCODE resource is to ensure quality control (QC) and data validation of annotations. Ensembl will compare the GENCODE gene set to other gene sets (e.g. UniProt) to check for missing genes or transcripts; CNIO will validate the coding genes; the CNIO/CNIC proteomics pipeline will validate the gene models; and CNIO/CNIC will perform manual verification for QC of proteomics data. Project stability will be ensured through a well-maintained computational infrastructure, adequate QC processes that will ensure the highest possible quality, and regular releases of freely available annotation in high-value formats. The annotation curation for human and mouse will be completed: in particular, the existing partial human transcript models will be extended to full length, the human lncRNA annotation will be expanded, and the initial full pass of the mouse annotation will be completed. GENCODE will incorporate individual genome representation and population data represented by available human variation data at both the sequence level (e.g. 1000 Genomes) and the transcriptomic level (e.g. GTEx), and by the 16 mouse strain genomes produced by the Mouse Genomes Project led by the WTSI. Data from individuals and populations will be annotated. A personal genome resource will be developed, which will produce an accurate representation of an individual's gene set. Two pilot projects will help to define the most effective way to support future GENCODE annotations.
The first pilot project will use GENCODE's experience in developing population reference genome graphs to pilot a scalable and potentially universal approach to population-based genome annotation. The second pilot project will focus on connecting regulatory regions to regulated genes. GENCODE will enhance the current annotation of genes with their regulatory elements so that the annotation is dependent on tissue and cell type. The demand for manual annotation of transcripts across strains and species may outstrip GENCODE's ability to provide such services via existing mechanisms; therefore, a system to enable the submission of annotated data will be developed. The described measures will ensure that GENCODE in 2020 will be significantly more valuable for research and clinical applications in genomics than it is today.
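
The gene-set comparison mentioned in the abstract (checking the GENCODE gene set against an external set for missing genes or transcripts) can be illustrated with a minimal sketch. This is not the project's actual QC pipeline: the function name `missing_entries` and the identifiers below are hypothetical, and a real comparison would first need to map identifiers between resources.

```python
def missing_entries(reference_ids, annotation_ids):
    """Return identifiers present in the reference set but absent from the annotation."""
    return sorted(set(reference_ids) - set(annotation_ids))


# Hypothetical identifiers, for illustration only (not real gene IDs).
external_genes = ["GENE_A", "GENE_B", "GENE_C"]
gencode_genes = ["GENE_A", "GENE_C", "GENE_D"]

# Entries the external set has that the annotation lacks are candidates for review.
missing = missing_entries(external_genes, gencode_genes)
print(missing)
```

A plain set difference only flags candidates; whether a flagged entry is truly missing would still need the manual or pipeline-based validation the abstract describes.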

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 2U41HG007234-05
Application #: 9277660
Study Section: Special Emphasis Panel (ZHG1)
Project Start:
Project End:
Budget Start: 2017-05-15
Budget End: 2018-04-30
Support Year: 5
Fiscal Year: 2017
Total Cost:
Indirect Cost:
Name: European Molecular Biology Laboratory
Department:
Type:
DUNS #: 321691735
City: Heidelberg
State:
Country: Germany
Zip Code: 69117
Garrison, Erik; Sirén, Jouni; Novak, Adam M et al. (2018) Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 36:875-879
Schlaffner, Christoph N; Pirklbauer, Georg J; Bender, Andreas et al. (2018) A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp :
Garg, Shilpa; Rautiainen, Mikko; Novak, Adam M et al. (2018) A graph-based approach to diploid genome assembly. Bioinformatics 34:i105-i114
Lagarde, Julien; Johnson, Rory (2018) Capturing a Long Look at Our Genetic Library. Cell Syst 6:153-155
Lilue, Jingtao; Doran, Anthony G; Fiddes, Ian T et al. (2018) Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet 50:1574-1583
Schoeler, Natasha E; Leu, Costin; Balestrini, Simona et al. (2018) Genome-wide association study: Exploring the genetic basis for responsiveness to ketogenic dietary therapies for drug-resistant epilepsy. Epilepsia 59:1557-1566
Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M et al. (2018) Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation. Nucleic Acids Res 46:D221-D228
Newman, Victoria; Moore, Benjamin; Sparrow, Helen et al. (2018) The Ensembl Genome Browser: Strategies for Accessing Eukaryotic Genome Data. Methods Mol Biol 1757:115-139
Wang, Junling; Pejaver, Vikas Rao; Dann, Geoffrey P et al. (2018) Target site specificity and in vivo complexity of the mammalian arginylome. Sci Rep 8:16177
Zerbino, Daniel R; Achuthan, Premanand; Akanni, Wasiu et al. (2018) Ensembl 2018. Nucleic Acids Res 46:D754-D761

Showing the most recent 10 out of 88 publications