Cross-species gene finding and annotation

Pachter, Lior

Abstract

One of the major challenges in genomics is to organize and classify the huge amount of available sequence data. This motivates the development of computational methods that can infer biological information from sequence alone. A number of computer programs have been designed for computational gene annotation, with varying degrees of success. The first class of programs, based on Hidden Markov Models (HMMs), locates translational and transcriptional features of the genome, such as coding regions, splice sites, and initiation and termination signals. These signals are then used to predict gene structures. The second class of gene-finding programs builds on sequence similarity and either aligns a new sequence to a known protein or aligns two syntenic sequences. The success of such homology-based methods comes from the fact that coding regions are generally well conserved between species that diverged as far back as 450 million years ago. At evolutionary distances of around 50-100 million years, as between human and mouse, the conservation also extends to other functional regions important for gene expression, such as promoters, UTRs, and other regulatory domains. In this project we intend to construct an annotation tool that combines and generalizes the HMM and sequence-alignment approaches described above. The actual prediction of genes and other functionally related elements will be carried out by a generalized form of HMM called a generalized pair HMM (GPHMM). The computational complexity of the problem is greatly reduced by the use of what we call an approximate alignment.
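As a rough illustration of the HMM idea in the abstract, the sketch below labels each base of a single sequence as coding or noncoding using Viterbi decoding. It is a plain two-state HMM, not the GPHMM proposed in the project, and it involves no alignment. The state names, the probabilities, and the `viterbi` function are all hypothetical values chosen for illustration, not parameters from the grant.

```python
import math

# Toy two-state HMM. All numbers below are illustrative assumptions.
STATES = ("N", "C")  # N = noncoding, C = coding
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # AT-rich background
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # GC-rich "coding" state
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of N/C labels."""
    if not seq:
        return ""
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AAAAGGGG"))
```

A pair HMM extends this scheme by emitting aligned pairs of bases from two sequences, and a generalized HMM lets each state emit a variable-length segment such as a whole exon; the sketch above does neither.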

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG002362-03
Application #: 6753504
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J

Project Start: 2002-06-01
Project End: 2006-05-31
Budget Start: 2004-07-01
Budget End: 2006-05-31
Support Year: 3
Fiscal Year: 2004
Total Cost: $310,532
Indirect Cost

Institution

Name: University of California Berkeley
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2004 R01 HG	Cross-species gene finding and annotation Pachter, Lior S. / University of California Berkeley	$310,532
NIH 2003 R01 HG	Cross-species gene finding and annotation Pachter, Lior S. / University of California Berkeley	$308,937
NIH 2002 R01 HG	Cross-species gene finding and annotation Pachter, Lior S. / University of California Berkeley	$309,259

Publications

Snir, Sagi; Rao, Satish (2010) Quartets MaxCut: a divide and conquer quartets algorithm. IEEE/ACM Trans Comput Biol Bioinform 7:704-18

Snir, Sagi; Warnow, Tandy; Rao, Satish (2008) Short quartet puzzling: a new quartet-based phylogeny reconstruction algorithm. J Comput Biol 15:91-103

Begun, David J; Holloway, Alisha K; Stevens, Kristian et al. (2007) Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol 5:e310

Schwartz, Ariel S; Pachter, Lior (2007) Multiple alignment by sequence annealing. Bioinformatics 23:e24-9

Chatterji, Sourav; Pachter, Lior (2007) Patterns of gene duplication and intron loss in the ENCODE regions suggest a confounding factor. Genomics 90:44-8

Beerenwinkel, Niko; Drton, Mathias (2007) A mutagenetic tree hidden Markov model for longitudinal clonal HIV sequence data. Biostatistics 8:53-71

Chen, K; Rajewsky, N (2006) Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb Symp Quant Biol 71:149-56

Snir, Sagi; Rao, Satish (2006) Using max cut to enhance rooted trees consistency. IEEE/ACM Trans Comput Biol Bioinform 3:323-33

Dewey, Colin N; Pachter, Lior (2006) Evolution at the nucleotide level: the problem of multiple whole-genome alignment. Hum Mol Genet 15 Spec No 1:R51-6

Dewey, Colin N; Huggins, Peter M; Woods, Kevin et al. (2006) Parametric alignment of Drosophila genomes. PLoS Comput Biol 2:e73

Showing the most recent 10 out of 26 publications
