Software for Analyzing Biosequence Data

Miller, Webb

Abstract

Projects to sequence vertebrate genomes are proceeding more rapidly than was imagined a few years ago. New computational tools for comparative genome analysis at the nucleotide level are needed to more effectively identify functional but non-protein-coding segments and to dissect mammalian evolution. The Penn State group has excelled at developing such tools. Our PipMaker and MultiPipMaker Web servers set the standard for alignment of user-specified genomic sequences, and our Blastz program was chosen to produce high-sensitivity alignments for the Mouse Genome Analysis Consortium. We were also a significant source of biological and statistical expertise within the Consortium, particularly with respect to functional non-coding segments and evolution. We will raise comparative genome studies to a higher level by developing software that accurately identifies the full spectrum of mutational events. Current multiple alignment procedures use one sequence as the reference and hence give an asymmetric and incomplete view of sequence relationships. Our new Generalized Multiple Alignments will provide symmetric and complete views of the alignments, accurately identify kilobase-scale insertions and deletions, and permit any of the species to be used as a reference in subsequent analysis. The results obtained by our new alignment programs will be analyzed using new statistical procedures to more accurately predict the locations of elements that regulate gene transcription, and to measure variation and co-variation of mutational rates along the genome. These computational studies will guide experimental work to confirm regulatory sites and to identify the biological mechanisms that underlie rate variations in neutral evolution.
An integral part of this effort will be our continued collaborations with the NISC Comparative Sequencing Program, with the group headed by David Haussler and Jim Kent at the University of California at Santa Cruz, and with the Comparative Chloroplast Genomics Project.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG002238-15
Application #: 6681646
Study Section: Genome Study Section (GNM)
Program Officer: Good, Peter J

Project Start: 2000-08-15
Project End: 2007-07-31
Budget Start: 2003-09-30
Budget End: 2004-07-31
Support Year: 15
Fiscal Year: 2003
Total Cost: $807,700

Institution

Name: Pennsylvania State University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 003403953

City: University Park
State: PA
Country: United States
Zip Code: 16802

Publications

Giardine, Belinda; Borg, Joseph; Viennas, Emmanouil et al. (2014) Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. Nucleic Acids Res 42:D1063-9

Song, Giltae; Riemer, Cathy; Dickins, Benjamin et al. (2012) Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 4:586-601

Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Conversion events in gene clusters. BMC Evol Biol 11:226

Song, Giltae; Hsu, Chih-Hao; Riemer, Cathy et al. (2011) Evaluation of methods for detecting conversion events in gene clusters. BMC Bioinformatics 12 Suppl 1:S45

Wu, Weisheng; Cheng, Yong; Keller, Cheryl A et al. (2011) Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res 21:1659-71

Locke, Devin P; Hillier, LaDeana W; Warren, Wesley C et al. (2011) Comparative and demographic analysis of orang-utan genomes. Nature 469:529-33

Miller, Webb; Wright, Stephen J; Zhang, Yu et al. (2010) Optimization methods for selecting founder individuals for captive breeding or reintroduction of endangered species. Pac Symp Biocomput :43-53

Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross C et al. (2010) An effective method for detecting gene conversion events in whole genomes. J Comput Biol 17:1281-97

Chen, Kuan-Bei; Zhang, Yu (2010) A varying threshold method for ChIP peak-calling using multiple sources of information. Bioinformatics 26:i504-10

Ratan, Aakrosh; Zhang, Yu; Hayes, Vanessa M et al. (2010) Calling SNPs without a reference sequence. BMC Bioinformatics 11:130

Showing the most recent 10 out of 79 publications
