Protein tertiary structure changes slowly during evolution. Nucleotide substitution rates are expected to be low when the substitutions result in an amino acid replacement that disrupts protein structure. The effect of an amino acid replacement on tertiary structure is determined not only by the residues involved in the replacement but also by the residues that are spatially near the site that experiences the replacement. This relationship between protein structure and protein change induces an evolutionary dependence among the positions in protein-coding genes. Unfortunately, widely used models for the evolution of protein-coding genes ignore this dependence. The research in this project will build upon a newly developed statistical technique for making evolutionary inferences from sequence pairs. This new technique incorporates dependence among codons due to pairwise amino acid interactions that are imposed by the protein tertiary structure. The initial focus will be to extend this model-based approach to the analysis of more than two phylogenetically related sequences. The resulting method will be a powerful tool for characterizing the impact of protein structure on protein evolution. The Pandit database of aligned protein-coding DNA sequences will be mined to assess which protein families evolve under the most and least influence of tertiary structure. Evidence of positive selection in this database will be identified, and the question of whether the strength of the relationship between protein structure and protein evolution varies among taxonomic groups will be addressed. To complement the empirical studies and to further evaluate the new methodology, simulations will be performed. In addition, the possibility of allowing protein tertiary structure to change over time will be explored. Ancestral sequence inference that accounts for covarying positions, and the potential for applying such inference to vaccine design, are further topics of interest.
Although the emphasis of this project is evolutionary dependence among codons due to protein structure, the statistical approach is quite general and could be applied to diverse cases of evolutionary dependence where surrogates for sequence fitness can be measured or modeled.