Projects to sequence vertebrate genomes are proceeding more rapidly than was imagined a few years ago. New computational tools for comparative genome analysis at the nucleotide level are needed to more effectively identify functional but non-protein-coding segments and to dissect mammalian evolution. The Penn State group has excelled at developing such tools. Our PipMaker and MultiPipMaker Web servers set the standard for alignment of user-specified genomic sequences, and our Blastz program was chosen to produce high-sensitivity alignments for the Mouse Genome Analysis Consortium. We were also a significant source of biological and statistical expertise within the Consortium, particularly with respect to functional non-coding segments and evolution. We will raise comparative genome studies to a higher level by developing software that accurately identifies the full spectrum of mutational events. Current multiple alignment procedures use one sequence as the reference and hence give an asymmetric and incomplete view of sequence relationships. Our new Generalized Multiple Alignments will provide symmetric and complete views of the alignments, accurately identify kilobase-scale insertions and deletions, and permit any of the species to be used as a reference in subsequent analysis. The results obtained by our new alignment programs will be analyzed using new statistical procedures to more accurately predict the locations of elements that regulate gene transcription, and to measure variation and co-variation of mutational rates along the genome. These computational studies will guide experimental work to confirm regulatory sites and to identify the biological mechanisms that underlie rate variations in neutral evolution.
An integral part of this effort will be our continued collaborations with the NISC Comparative Sequencing Program, with the group headed by David Haussler and Jim Kent at the University of California at Santa Cruz, and with the Comparative Chloroplast Genomics Project.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG002238-15
Application #: 6681646
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J
Project Start: 2000-08-15
Project End: 2007-07-31
Budget Start: 2003-09-30
Budget End: 2004-07-31
Support Year: 15
Fiscal Year: 2003
Total Cost: $807,700
Indirect Cost:
Name: Pennsylvania State University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 003403953
City: University Park
State: PA
Country: United States
Zip Code: 16802
Giardine, Belinda; Borg, Joseph; Viennas, Emmanouil et al. (2014) Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. Nucleic Acids Res 42:D1063-9
Song, Giltae; Riemer, Cathy; Dickins, Benjamin et al. (2012) Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 4:586-601
Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Conversion events in gene clusters. BMC Evol Biol 11:226
Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Evaluation of methods for detecting conversion events in gene clusters. BMC Bioinformatics 12 Suppl 1:S45
Wu, Weisheng; Cheng, Yong; Keller, Cheryl A et al. (2011) Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res 21:1659-71
Locke, Devin P; Hillier, LaDeana W; Warren, Wesley C et al. (2011) Comparative and demographic analysis of orang-utan genomes. Nature 469:529-33
Miller, Webb; Wright, Stephen J; Zhang, Yu et al. (2010) Optimization methods for selecting founder individuals for captive breeding or reintroduction of endangered species. Pac Symp Biocomput :43-53
Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross C et al. (2010) An effective method for detecting gene conversion events in whole genomes. J Comput Biol 17:1281-97
Chen, Kuan-Bei; Zhang, Yu (2010) A varying threshold method for ChIP peak-calling using multiple sources of information. Bioinformatics 26:i504-10
Ratan, Aakrosh; Zhang, Yu; Hayes, Vanessa M et al. (2010) Calling SNPs without a reference sequence. BMC Bioinformatics 11:130

Showing the most recent 10 out of 79 publications