Projects to sequence vertebrate genomes are proceeding more rapidly than was imagined a few years ago. New computational tools for nucleotide-level comparative genome analysis are needed to identify functional non-protein-coding segments more effectively and to dissect mammalian evolution. The Penn State group has excelled at developing such tools. Our PipMaker and MultiPipMaker Web servers set the standard for aligning user-specified genomic sequences, and our Blastz program was chosen to produce high-sensitivity alignments for the Mouse Genome Analysis Consortium. We were also a significant source of biological and statistical expertise within the Consortium, particularly with respect to functional non-coding segments and evolution. We will raise comparative genome studies to a higher level by developing software that accurately identifies the full spectrum of mutational events. Current multiple alignment procedures use one sequence as the reference and hence give an asymmetric and incomplete view of sequence relationships. Our new Generalized Multiple Alignments will provide a symmetric and complete view of these relationships, accurately identify kilobase-scale insertions and deletions, and permit any of the species to be used as the reference in subsequent analyses. The output of our new alignment programs will be analyzed with new statistical procedures to predict more accurately the locations of elements that regulate gene transcription, and to measure variation and co-variation of mutation rates along the genome. These computational studies will guide experimental work to confirm regulatory sites and to identify the biological mechanisms underlying rate variation in neutral evolution.
An integral part of this effort will be our continued collaborations with the NISC Comparative Sequencing Program, with the group headed by David Haussler and Jim Kent at the University of California at Santa Cruz, and with the Comparative Chloroplast Genomics Project.
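The abstract proposes measuring variation and co-variation of mutation rates along the genome, but it does not describe the statistical procedures. The Python sketch below is a rough illustration only and rests on several assumptions:

- The input is a text alignment whose rows have equal length, with '-' marking gaps.
- The per-window mismatch fraction (p-distance) serves as a naive stand-in for a substitution rate.
- Pearson correlation between two lineages' window profiles serves as a simple measure of co-variation.

The function names, the window size, and the example sequences are all hypothetical. This is not the project's own method.

```python
from math import sqrt
from statistics import mean
from typing import List, Optional

GAP = "-"  # assumed gap character in the hypothetical text alignment


def window_divergence(ref: str, other: str, window: int) -> List[Optional[float]]:
    """Per-window fraction of mismatched columns in a pairwise alignment.

    Hypothetical sketch: columns with a gap in either row are skipped, so this
    is a crude p-distance, not a model-corrected substitution rate. Windows
    with no ungapped columns yield None.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    rates: List[Optional[float]] = []
    for start in range(0, len(ref), window):
        # Keep only columns where both rows carry a base.
        cols = [
            (a, b)
            for a, b in zip(ref[start:start + window], other[start:start + window])
            if a != GAP and b != GAP
        ]
        rates.append(sum(a != b for a, b in cols) / len(cols) if cols else None)
    return rates


def pearson(xs: List[Optional[float]], ys: List[Optional[float]]) -> float:
    """Pearson correlation over windows where both profiles are defined."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two windows defined in both profiles")
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = sqrt(sum((y - my) ** 2 for _, y in pairs))
    if sx == 0 or sy == 0:
        raise ValueError("a profile is constant; correlation is undefined")
    return cov / (sx * sy)


# Hypothetical three-row alignment; sequences and species labels are made up.
human = "AAAACCCCGGGG"
mouse = "AAAACCCTGTTT"
dog = "AAAACCTTTTTT"

hm = window_divergence(human, mouse, 4)  # human-mouse divergence per window
hd = window_divergence(human, dog, 4)    # human-dog divergence per window
print(hm, hd, round(pearson(hm, hd), 3))
```

Windows made up entirely of gaps return None and are dropped from the correlation. A real analysis would use model-corrected rates that account for multiple substitutions at a site, far larger windows, and an alignment in which every species can serve as the reference.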

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J
Pennsylvania State University
Schools of Arts and Sciences
University Park
United States
Giardine, Belinda; Borg, Joseph; Viennas, Emmanouil et al. (2014) Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. Nucleic Acids Res 42:D1063-9
Song, Giltae; Riemer, Cathy; Dickins, Benjamin et al. (2012) Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 4:586-601
Locke, Devin P; Hillier, LaDeana W; Warren, Wesley C et al. (2011) Comparative and demographic analysis of orang-utan genomes. Nature 469:529-33
Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Conversion events in gene clusters. BMC Evol Biol 11:226
Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Evaluation of methods for detecting conversion events in gene clusters. BMC Bioinformatics 12 Suppl 1:S45
Wu, Weisheng; Cheng, Yong; Keller, Cheryl A et al. (2011) Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res 21:1659-71
Miller, Webb; Wright, Stephen J; Zhang, Yu et al. (2010) Optimization methods for selecting founder individuals for captive breeding or reintroduction of endangered species. Pac Symp Biocomput :43-53
Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross C et al. (2010) An effective method for detecting gene conversion events in whole genomes. J Comput Biol 17:1281-97
Chen, Kuan-Bei; Zhang, Yu (2010) A varying threshold method for ChIP peak-calling using multiple sources of information. Bioinformatics 26:i504-10
Ratan, Aakrosh; Zhang, Yu; Hayes, Vanessa M et al. (2010) Calling SNPs without a reference sequence. BMC Bioinformatics 11:130

Showing the most recent 10 out of 79 publications