The objective of this project, jointly supported by Molecular Biophysics in the Division of Molecular and Cellular Biosciences and the Theoretical and Computational Chemistry Program in the Chemistry Division, is to develop evolutionary profiles that best preserve the phylogenetic tree topology of homologous proteins. Using a multidimensional QR algorithm recently developed in a preliminary study, non-redundant sets of sequences and structures that best span the evolutionary space of the proteins will be constructed from combined sequence and structural alignment information. The targets of these studies will primarily be the proteins of the translational machinery, namely ribosomal proteins and aminoacyl-tRNA synthetases (AARS), which are present in all domains of life. The goal is to apply these evolutionary profiles to identify structural words important for folding and to analyze evolutionary changes in patterns of structures and sequences governing protein/RNA. With data available from over 25,000 entries in the structural databases and the complete sequences of over 200 genomes, major evolutionary events can now be correlated with changes in the shape and function of the proteins. Features extracted from these profiles will also be instrumental in developing new assays for improving gene annotation. The profiles will be implemented and tested in a protein structure prediction algorithm. Using hybrid molecular dynamics simulations, the folding of the structural words and the interactions of the AARS with tRNA will be investigated for organisms from several domains of life, and the results will be compared with studies carried out in collaboration with an experimental group. The information obtained from this work will enhance the understanding of the evolutionary and physical principles underlying the interplay between structure, function, and folding.
The concepts and methods developed in this project will be incorporated into the course "Computational Chemical Biology," which the PI has recently developed in the UIUC Chemistry Department. This course is aimed at undergraduates and first-year graduate students interested in research at the interface of chemistry, biophysics, and computer science. The preliminary work on this evolutionary approach to bioinformatics and the QR algorithm has been presented at two NSF-sponsored workshops on computational biology as well as several other recent workshops in the US and Australia. More hands-on workshops are being planned, and copies of the tutorials are available on the web. The QR algorithms and evolutionary profiles developed in this project will be made available to the broader scientific community as part of a sequence and structure bioinformatics plug-in for VMD, the popular and freely distributed visualization program.