Protein structure comparison is important for understanding the evolutionary relationships between proteins and for predicting protein structures and functions. Despite many years of study, it remains a challenging and open problem. Proteins are flexible molecules, and the rigid structure matching used by most current methods has difficulty recognizing relatively distant but functionally important similarities. Another well-known issue in structure comparison is the lack of a comprehensive statistical framework for assessing the significance of similarities between individual protein structures and between classes of protein structures. In this proposal, we develop methods based on elastic shape analysis (ESA) for protein structure comparison and alignment. ESA allows flexible matching through a combination of stretching and bending of two protein structures, and the amount of deformation is quantified by a formal distance, the geodesic distance. The minimum geodesic distance, which corresponds to the best matching between two structures, can be obtained with an efficient dynamic programming algorithm. With this distance, the mean and covariance of a population of structures can be calculated, and a rigorous statistical framework can be developed for structure comparison and classification. Under this framework, similarities between two structures can be assessed; family-specific structural variations within a protein family can be characterized; and hypothesis tests for structure classification can be conducted.
Based on this framework, we propose to 1) develop a unified statistical framework for classifying protein structures using probability distributions built from families of protein structures; 2) develop a multiple structure alignment method based on the mean structure calculated for a group of protein structures; and 3) develop a method for aligning protein structures in the joint sequence-structure space, so that both backbone geometry and sequence information are incorporated into structure alignment.
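As a rough illustration of the geodesic distance described above, the following sketch computes a shape distance between two backbone curves. It assumes the square-root velocity representation commonly used in elastic shape analysis; the proposal does not state which representation it uses. It also assumes both curves are sampled at the same number of points (for example, C-alpha coordinates resampled to a common length). The sketch removes translation, scale and rotation only. The elastic reparameterization step that the proposal solves by dynamic programming is deliberately omitted, so the value returned is an upper bound on the minimum geodesic distance. All function names are hypothetical.

```python
import numpy as np

def inner(q1, q2):
    """Discrete approximation of the L2 inner product on [0, 1]."""
    return float(np.sum(q1 * q2)) / q1.shape[0]

def srvf(curve):
    """Unit-norm square-root velocity function of an (n, 3) curve."""
    n = curve.shape[0]
    t = np.linspace(0.0, 1.0, n)
    velocity = np.gradient(curve, t, axis=0)      # removes translation
    speed = np.linalg.norm(velocity, axis=1)
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    return q / np.sqrt(inner(q, q))               # removes scale

def optimal_rotation(q1, q2):
    """Rotation R maximizing the inner product of q1 with rotated q2."""
    b = q2.T @ q1
    u, _, vt = np.linalg.svd(b)
    d = np.eye(3)
    d[2, 2] = np.sign(np.linalg.det(vt.T @ u.T))  # exclude reflections
    return vt.T @ d @ u.T

def geodesic_distance(curve1, curve2):
    """Arc length on the unit sphere of SRVFs after rotational alignment.

    No reparameterization is performed (assumption of this sketch).
    """
    q1, q2 = srvf(curve1), srvf(curve2)
    r = optimal_rotation(q1, q2)
    q2_rot = q2 @ r.T
    return float(np.arccos(np.clip(inner(q1, q2_rot), -1.0, 1.0)))
```

Calling `geodesic_distance(a, b)` on two `(n, 3)` coordinate arrays returns a value in radians between 0 and pi. A rigidly moved or uniformly rescaled copy of a curve gives a distance close to zero.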
Although it is a long-standing problem, protein structure alignment remains challenging and open. In this proposal, we develop a comprehensive mathematical framework for protein structure alignment and address several unsettled issues, including (1) flexible structure alignment; (2) a formal distance between any two protein structures; (3) probability distributions for families of protein structures and their use in the automatic classification of protein structures; and (4) alignment of protein structures that incorporates both backbone geometry and sequence information.