The Encyclopedia of Genes and Gene Variants

Guigo, Roderic

Abstract

The goal of this proposal is to characterize the gene content of the ENCODE regions. This means delineating one complete mRNA sequence for at least one splice isoform of each protein-coding gene in the ENCODE regions, and inferring a number of additional alternative splice forms, either complete or partial. The proposal builds on the complementary strengths of a team with unique expertise in computational gene prediction, experimental verification of DNA functional domains, and genome annotation systems, a team that has already proven successful in designing efficient high-throughput mammalian gene identification systems. Complementary to other undirected large-scale gene characterization projects, our proposal emphasizes a targeted approach in which computational gene predictions guide the subsequent experimental verification. In this way, genes and exonic variants likely to be underrepresented in the current catalog of human genes can be specifically targeted. These include: short and intronless genes; genes undergoing non-canonical splicing; selenoprotein genes (genes in which the TGA stop codon is translated into a selenocysteine residue); genes with unusual codon composition that may be expressed at very low levels or with a very restricted pattern; and human-specific genes and genes evolving very rapidly, whose homologues either do not exist in other species or are difficult to identify. Our strategy uses a variety of existing computational and experimental techniques, often applied in novel ways. Among these techniques, those that take advantage of the conservation of characteristic features between human genes and their orthologs in other vertebrate species will play an essential role. By the end of the ENCODE project, we expect our strategy to be implemented in a largely automated pipeline that can be efficiently applied to the analysis of the entire human genome sequence.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01HG003150-03
Application #: 6932035
Study Section: Special Emphasis Panel (ZHG1-HGR-P (02))
Program Officer: Good, Peter J

Project Start: 2003-09-30
Project End: 2007-07-31
Budget Start: 2005-09-19
Budget End: 2007-07-31
Support Year: 3
Fiscal Year: 2005
Total Cost: $489,400

Institution

Name: Municipal Institute of Medical Research

City: Barcelona
Country: Spain

Related projects


NIH 2006 U01 HG	The Encyclopedia of Genes and Gene Variants Guigo, Roderic / Municipal Institute of Medical Research	$489,340
NIH 2005 U01 HG	The Encyclopedia of Genes and Gene Variants Guigo, Roderic / Municipal Institute of Medical Research	$489,400
NIH 2005 U01 HG	The Encyclopedia of Genes and Gene Variants Guigo, Roderic / Municipal Institute of Medical Research	$28,572
NIH 2004 U01 HG	The Encyclopedia of Genes and Gene Variants Guigo, Roderic / Municipal Institute of Medical Research	$478,877
NIH 2003 U01 HG	The Encyclopedia of Genes and Gene Variants Guigo, Roderic / Municipal Institute of Medical Research	$567,058

Publications

Djebali, Sarah; Lagarde, Julien; Kapranov, Philipp et al. (2012) Evidence for transcript networks composed of chimeric RNAs in human cells. PLoS One 7:e28213

Harrow, Jennifer; Nagy, Alinda; Reymond, Alexandre et al. (2009) Identifying protein-coding genes in genomic sequences. Genome Biol 10:201

Djebali, Sarah; Kapranov, Philipp; Foissac, Sylvain et al. (2008) Efficient targeted transcript discovery via array-based normalization of RACE libraries. Nat Methods 5:629-35

Keibler, Evan; Arumugam, Manimozhiyan; Brent, Michael R (2007) The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs. Bioinformatics 23:545-54

ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816

Denoeud, France; Kapranov, Philipp; Ucla, Catherine et al. (2007) Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res 17:746-59

Zheng, Deyou; Frankish, Adam; Baertsch, Robert et al. (2007) Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 17:839-51

Chatterji, Sourav; Pachter, Lior (2007) Patterns of gene duplication and intron loss in the ENCODE regions suggest a confounding factor. Genomics 90:44-8

Washietl, Stefan; Pedersen, Jakob S; Korbel, Jan O et al. (2007) Structured RNAs in the ENCODE selected regions of the human genome. Genome Res 17:852-64

Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H et al. (2006) Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. Genome Biol 7 Suppl 1:S5.1-10

Showing the most recent 10 out of 16 publications
