Multiple-species alignment is one of the techniques for deciphering the functions of regions within the human genome. Alignment of genomic sequence data from various species indicates which regions are most highly conserved, allowing identification of regions that perform some essential function, such as regulation of gene expression. Novel computational challenges must be overcome to produce the most informative alignments of multiple-species genomic data. For example, genomic alignments can be much longer than any alignment of protein sequences, and one must deal with several common families of repetitive elements and, perhaps, with differences in mode of evolution between coding and noncoding DNA. Full utility of alignment-generating programs requires that they be integrated into a software system that includes alignment analysis tools and various data management capabilities. We propose the following extensions to our ongoing development of strategies and software tools for multiple-species alignment of genomic sequence data. Our work with the beta-like globin gene cluster will be extended to additional genomic loci. Laboratory experiments will be conducted to provide guidelines for choosing additional species and for evaluating the biological correctness of genomic alignments. Our existing alignment tools will be enhanced to more fully utilize biological knowledge, and a variety of supporting software tools will be produced. Finally, we will explore other sequence alignment problems that may be synergistically related to our genomic alignment research.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM005110-10
Application #
2750698
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Bean, Carol A
Project Start
1989-08-01
Project End
2000-07-31
Budget Start
1998-08-01
Budget End
1999-07-31
Support Year
10
Fiscal Year
1998
Total Cost
Indirect Cost
Name
Pennsylvania State University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
City
University Park
State
PA
Country
United States
Zip Code
16802
Berman, Piotr; Bertone, Paul; Dasgupta, Bhaskar et al. (2004) Fast optimal genome tiling with applications to microarray design and homology search. J Comput Biol 11:766-85
Molete, J M; Petrykowska, H; Bouhassira, E E et al. (2001) Sequences flanking hypersensitive sites of the beta-globin locus control region are required for synergistic enhancement. Mol Cell Biol 21:2969-80
Elnitski, L; Li, J; Noguchi, C T et al. (2001) A negative cis-element regulates the level of enhancement by hypersensitive site 2 of the beta-globin locus control region. J Biol Chem 276:6289-98
Hardison, R C; Chui, D H; Riemer, C et al. (2001) Databases of human hemoglobin variants and other resources at the globin gene server. Hemoglobin 25:183-93
Wilson, M D; Riemer, C; Martindale, D W et al. (2001) Comparative analysis of the gene-dense ACHE/TFR2 region on human chromosome 7q22 with the orthologous region on mouse chromosome 5. Nucleic Acids Res 29:1352-65
Hardison, R C (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet 16:369-72
Yung Yu, C; Yang, Z; Blanchong, C A et al. (2000) The human and mouse MHC class III region: a parade of 21 genes at the centromeric segment. Immunol Today 21:320-8
McClelland, M; Florea, L; Sanderson, K et al. (2000) Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res 28:4974-86
Schwartz, S; Zhang, Z; Frazer, K A et al. (2000) PipMaker--a web server for aligning two genomic DNA sequences. Genome Res 10:577-86
Doyle, J L; DeSilva, U; Miller, W et al. (2000) Divergent human and mouse orthologs of a novel gene (WBSCR15/Wbscr15) reside within the genomic interval commonly deleted in Williams syndrome. Cytogenet Cell Genet 90:285-90

Showing the most recent 10 out of 66 publications