An alignment is a hypothesis about the evolutionary correspondence between DNA or protein sequences. Sequence alignments are crucial for the study of molecular evolution. In addition, they can be used to detect highly conserved functional domains and are a component of some strategies for searching sequence databases. Widely used alignment techniques rely on evolutionary assumptions that are both unstated and unclear. The ad hoc basis of these techniques casts doubt on the alignments that they produce. The proposed research will develop methods for alignment inference that are based upon explicit evolutionary models. Special attention will be paid to making these models as realistic as possible. Regional heterogeneity of nucleotide composition, heterogeneity of evolutionary rates, and relatively general distributions for the lengths of insertion-deletion events will be allowed. The evolutionary models will provide an underlying framework for applications that include the inference of protein secondary structure, the estimation of evolutionary parameters, and the determination of reliable regions in alignments. In addition, a method for simultaneously aligning multiple sequences and reconstructing phylogenies will be introduced. The sequence analysis techniques that arise from this research will all have a likelihood basis and can therefore take advantage of the fact that likelihood approaches to statistical inference have been intensively studied. The evolutionary perspective and probabilistic nature of these techniques should combine to make them more accurate than those previously proposed.
Thorne, J L; Goldman, N; Jones, D T (1996) Combining protein evolution and secondary structure. Mol Biol Evol 13:666-73
Goldman, N; Thorne, J L; Jones, D T (1996) Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses. J Mol Biol 263:196-208