The technological achievements of the past 20 years have made sequencing a genome a relatively simple task. Decoding this information, however, has proved much more difficult and is one of the great challenges of this century. Advancing our understanding of how genes are structured and regulated will eventually lead to novel therapeutics for combating cancer and other diseases, to cheaper and more nutritious food, to less wasteful materials and energy sources, and to a greater understanding of ourselves. One of the most important and enduring products of the genome era will be a catalog of genes for each organism. Producing these catalogs is difficult even under the best of circumstances. The pace of genome sequencing continues to increase, and each new genome represents a wealth of information, provided we can interpret it. This proposal seeks to improve our knowledge of genomes by advancing the state of the art in computational gene finding. Our algorithms draw on new and previously untapped sources of information, and are expected to improve our ability to find both novel genes and genes with known homologs. Our specific plans include (a) automated training of gene prediction programs for any genome, (b) developing the first algorithm that merges a generalized hidden Markov model for gene structure with a profile hidden Markov model for protein family structure, (c) creating the first gene finder that incorporates information about DNA duplex stability under superhelical stress, (d) building new algorithms that take advantage of high-throughput transcript profiling technologies such as whole-genome expression arrays and massively parallel sequencing methods, and (e) providing web-based applications and support via the Internet.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG004348-04
Application #
7870495
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Good, Peter J
Project Start
2007-09-01
Project End
2012-06-30
Budget Start
2010-07-01
Budget End
2011-06-30
Support Year
4
Fiscal Year
2010
Total Cost
$307,762
Indirect Cost
Name
University of California Davis
Department
Biochemistry
Type
Schools of Medicine
DUNS #
047120084
City
Davis
State
CA
Country
United States
Zip Code
95618
Georges, Arthur; Li, Qiye; Lian, Jinmin et al. (2015) High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps. Gigascience 4:45
Lott, Paul C; Korf, Ian (2014) StochHMM: a flexible hidden Markov model tool and C++ library. Bioinformatics 30:1625-6
Bradnam, Keith R; Fass, Joseph N; Alexandrov, Anton et al. (2013) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2:10
Zhabinskaya, Dina; Benham, Craig J (2012) Theoretical analysis of competing conformational transitions in superhelical DNA. PLoS Comput Biol 8:e1002484
Ginno, Paul A; Lott, Paul L; Christensen, Holly C et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol Cell 45:814-25
Parra, Genis; Bradnam, Keith; Rose, Alan B et al. (2011) Comparative and functional analysis of intron-mediated enhancement signals reveals conserved features among plants. Nucleic Acids Res 39:5328-37
Zhabinskaya, Dina; Benham, Craig J (2011) Theoretical analysis of the stress induced B-Z transition in superhelical DNA. PLoS Comput Biol 7:e1001051
Blahnik, Kimberly R; Dou, Lei; Echipare, Lorigail et al. (2011) Characterization of the contradictory chromatin signatures at the 3' exons of zinc finger genes. PLoS One 6:e17121
Parra, Genis; Bradnam, Keith; Ning, Zemin et al. (2009) Assessing the gene space in draft genomes. Nucleic Acids Res 37:289-97