Comparing protein structures is important for understanding evolutionary relationships between proteins and for predicting protein structure and function. Despite many years of study, it remains a challenging, open problem. Proteins are flexible molecules, and the rigid structure matching used by most current methods has difficulty recognizing relatively distant but functionally important similarities. Another well-known issue in structure comparison is the lack of a comprehensive statistical framework for assessing the significance of similarities between individual protein structures and classes of protein structures. In this proposal, we develop methods based on elastic shape analysis (ESA) for protein structure comparison and alignment. ESA allows flexible matching through a combination of stretching and bending of two protein structures, and the amount of deformation is quantified by a formal distance, the geodesic distance. The minimum geodesic distance, which corresponds to the best matching between two structures, can be obtained with an efficient dynamic programming algorithm. The mean and covariance of a population of structures can be calculated, which allows a rigorous statistical framework to be developed for structure comparison and classification. Under this framework, similarities between two structures can be assessed; family-specific structural variation within a protein family can be characterized; and hypothesis tests for structure classification can be conducted.
Based on this framework, we propose to 1) develop a unified statistical framework for classifying protein structures using probability distributions built from families of protein structures; 2) develop a multiple structure alignment method based on the mean structure calculated for a group of protein structures; and 3) develop a method for aligning protein structures in the joint sequence-structure space, so that both backbone geometry and sequence information are incorporated into structure alignment.
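
As a rough illustration of the kind of formal distance described above, the sketch below uses the square-root velocity function (SRVF) representation that is commonly used in elastic shape analysis. It is an assumption-laden toy, not the proposal's method: the function names `srvf` and `preshape_distance` are hypothetical, each backbone is treated as a polyline of 3D points (for example C-alpha coordinates) with the same number of points, and the optimization over rotations and over re-parameterizations (the dynamic programming step) is omitted entirely. What remains is the arc-length distance between two unit-norm SRVFs, which is already invariant to translation and scale.

```python
import math


def srvf(points):
    """Discrete SRVF of a polyline sampled uniformly on [0, 1], scaled to unit L2 norm.

    Assumes consecutive points are distinct enough that the curve has nonzero length.
    """
    n = len(points) - 1  # number of segments, so dt = 1 / n
    q = []
    for a, b in zip(points, points[1:]):
        # Finite-difference velocity on this segment.
        v = [(bj - aj) * n for aj, bj in zip(a, b)]
        speed = math.sqrt(sum(c * c for c in v))
        # SRVF: velocity divided by the square root of its speed.
        q.append([c / math.sqrt(speed) if speed > 0 else 0.0 for c in v])
    # The squared L2 norm of q equals the curve length; dividing it out removes scale.
    norm = math.sqrt(sum(c * c for row in q for c in row) / n)
    return [[c / norm for c in row] for row in q]


def preshape_distance(points1, points2):
    """Arc length on the unit sphere between two SRVFs (no rotation or warping)."""
    if len(points1) != len(points2):
        raise ValueError("this sketch assumes equally sampled curves")
    q1, q2 = srvf(points1), srvf(points2)
    n = len(q1)
    inner = sum(a * b for r1, r2 in zip(q1, q2) for a, b in zip(r1, r2)) / n
    # Clamp to guard against round-off before taking the arc cosine.
    return math.acos(max(-1.0, min(1.0, inner)))


if __name__ == "__main__":
    straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
    print(preshape_distance(straight, bent))
```

A full elastic comparison would additionally minimize this quantity over rotations of one curve and over re-parameterizations of its sampling, which is where the dynamic programming search mentioned above would enter.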

Public Health Relevance

Despite being an old problem, protein structure alignment remains challenging and open. In this proposal, we develop a comprehensive mathematical framework for protein structure alignment and address several unsettled issues, including (1) flexible structure alignment; (2) a formal distance between any two protein structures; (3) probability distributions for families of protein structures and their use in automatic classification of protein structures; and (4) alignment of protein structures that incorporates both backbone geometry and sequence information.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Exploratory/Developmental Grants (R21)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wehrle, Janna P
Florida State University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Tang, Ke; Wong, Samuel W K; Liu, Jun S et al. (2015) Conformational sampling and structure prediction of multiple interacting loops in soluble and β-barrel membrane proteins using multi-loop distance-guided chain-growth Monte Carlo method. Bioinformatics 31:2646-52
Tang, Ke; Zhang, Jinfeng; Liang, Jie (2014) Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method. PLoS Comput Biol 10:e1003539
He, Gewen; Steppi, Albert; Laborde, Jose et al. (2014) RASS: a web server for RNA alignment in the joint sequence-structure space. Nucleic Acids Res 42:W377-81
Laborde, Jose; Robinson, Daniel; Srivastava, Anuj et al. (2013) RNA global alignment in the joint sequence-structure space using elastic shape analysis. Nucleic Acids Res 41:e114
Ellingson, Leif; Zhang, Jinfeng (2012) Protein surface matching by combining local and global geometric information. PLoS One 7:e40540