High-throughput analysis has become an essential tool in biomedical informatics and genomic research. One aspect of this research paradigm that has received little scrutiny is how errors introduced in the early stages of an analysis propagate through the downstream stages. For many complex, high-throughput analyses of genome data, sequence alignment is among the very first steps. An alignment is a hypothesized set of site homologies; however, these hypotheses are often taken as fact and used in all subsequent analyses as if they were correct. It is well known that alignments become less accurate as evolutionary divergence increases; how this error affects genome analyses in biomedical informatics is poorly understood. This proposal is a follow-up to a pilot study that used computational sequence simulation to examine the effects and propagation of alignment error in comparative and functional genomic sequence analysis.
The specific aims of this study are (1) to examine more thoroughly the effect of alignment errors on phylogenetic accuracy; (2) to examine the accuracy of ancestral sequence reconstruction methods and determine how errors in both the alignment and the phylogenetic hypothesis affect these reconstructions; (3) to explore a novel approach for determining the composition of gene families that makes use of evolutionary principles, including phylogeny and ancestral sequence reconstruction; and (4) to improve a sequence simulation program by (a) adding better insertion/deletion models, including models based on both theory and empirical data; (b) adding more realistic simulation options for RNA, coding DNA and protein sequences; (c) allowing full-genome simulations, including segmental variation of mutation parameters and genome-scale processes; and (d) extending the program for use on all major computational platforms, including Windows, Linux and Macintosh.
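
The reason simulation suits the first aim is that a simulator knows which sites are truly homologous, so an inferred alignment can be scored against the truth. The following is a minimal Python sketch of that idea and is not the simulation program described in the proposal. Every name in it (`make_root`, `evolve`, `true_pairs`, `aligned_pairs`, `sum_of_pairs_score`) and every rate value is a hypothetical choice made for illustration, and the per-site substitution and single-base insertion/deletion model is an assumption that is far simpler than the models the aims call for.

```python
import itertools
import random

ALPHABET = "ACGT"

def make_root(length, rng, ids):
    """Build a random ancestral sequence as a list of (site_id, base) tuples."""
    return [(next(ids), rng.choice(ALPHABET)) for _ in range(length)]

def evolve(seq, rng, ids, sub_rate=0.1, ins_rate=0.02, del_rate=0.02):
    """Evolve a sequence along one branch (illustrative rates, assumed).

    Surviving sites keep their site_id, so homology stays traceable;
    inserted sites receive a fresh id and are homologous to nothing else.
    """
    child = []
    for site_id, base in seq:
        if rng.random() < del_rate:
            continue  # deletion: the site is lost in the descendant
        if rng.random() < sub_rate:
            base = rng.choice([b for b in ALPHABET if b != base])
        child.append((site_id, base))
        if rng.random() < ins_rate:
            child.append((next(ids), rng.choice(ALPHABET)))  # single-base insertion
    return child

def true_pairs(a, b):
    """Index pairs (i, j) where a[i] and b[j] descend from the same site."""
    pos_b = {site_id: j for j, (site_id, _) in enumerate(b)}
    return {(i, pos_b[sid]) for i, (sid, _) in enumerate(a) if sid in pos_b}

def aligned_pairs(row_a, row_b, gap="-"):
    """Index pairs implied by two gapped rows of a pairwise alignment."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        i += x != gap  # advance only past non-gap characters
        j += y != gap
    return pairs

def sum_of_pairs_score(inferred, truth):
    """Fraction of truly homologous pairs that the inferred alignment recovers."""
    return len(inferred & truth) / len(truth) if truth else 1.0

if __name__ == "__main__":
    rng = random.Random(0)
    ids = itertools.count()
    root = make_root(50, rng, ids)
    a = evolve(root, rng, ids)
    b = evolve(root, rng, ids)
    seq_a = "".join(base for _, base in a)
    seq_b = "".join(base for _, base in b)
    # Stand-in for a real aligner: an ungapped alignment padded on the right.
    width = max(len(seq_a), len(seq_b))
    naive = aligned_pairs(seq_a.ljust(width, "-"), seq_b.ljust(width, "-"))
    print(sum_of_pairs_score(naive, true_pairs(a, b)))
```

In a fuller experiment the naive padded alignment would be replaced by the output of an actual alignment program, and the score would be tracked as the rates (and hence the divergence) increase.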

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 1R01LM009505-01A1
Application #: 7372186
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2009-08-01
Project End: 2011-07-31
Budget Start: 2009-08-01
Budget End: 2010-07-31
Support Year: 1
Fiscal Year: 2009
Total Cost: $314,446
Indirect Cost:

Name: Arizona State University-Tempe Campus
Department: Genetics
Type: Organized Research Units
DUNS #: 943360412
City: Tempe
State: AZ
Country: United States
Zip Code: 85287