Despite rapid progress in structural bioinformatics, a rigorous and unifying mathematical and statistical framework is missing from the current toolbox for the analysis, classification, and organization of individual biomolecules as well as groups of biomolecules. We have recently developed such a framework, based on elastic shape analysis (ESA), for the comparison of protein and RNA structures. Under this framework, the formal geodesic distance between any two protein/RNA structures can be computed rapidly. Probability distributions can also be built for families of protein/RNA structures and used to classify structures in a principled way through statistical hypothesis testing. In addition, sequence information can be naturally incorporated so that structures can be compared in the joint sequence-structure space. We have also developed novel algorithms for matching and analyzing protein surfaces. We propose to develop these methodologies significantly further for important applications in structural biology, including the study of chromosome structures by combining 3D structure and sequence-level information. The proposed research will make significant contributions in the following areas: (1) This proposal will fill an important gap in structural biology: the lack of a rigorous mathematical and statistical framework for biomolecular structure comparison; (2) Our proposed unifying framework will allow natural incorporation of sequence information into structure comparison; (3) Our approach can uncover distinct clusters at the deepest level of the current classification scheme (i.e., the SCOP family), enabling a finer classification of biomolecular structures. Preliminary results indicate that by using carefully measured structural similarity, we will obtain representative sets of proteins of higher quality than those obtained by current sequence-similarity-based methods; (4) The probabilistic models designed for protein/RNA backbone structures and surfaces will capture the flexible nature of protein structures through the use of ensembles of conformations, while maintaining high computational efficiency. These models will also enable effective characterization of family-specific variations among proteins, an important task that none of the existing methods performs well; (5) Protein/RNA structures will be organized into network-based data structures using probabilistic approaches. This new organization will effectively integrate sequence, backbone structure, and surface information, facilitating the discovery of novel insights; and (6) These new developments will be rapidly generalized to the study of chromosome structures. The proposed research will allow the development of tools that will also be applicable in other areas of shape analysis, including medical image analysis, computer vision, and pattern recognition. Our work will help to increase communication between the field of protein structure analysis and the field of shape analysis, stimulate more crossover development in methodology, and transform research activities in both fields.

Public Health Relevance

Analysis, classification and organization of biomolecules are fundamental tasks essential for understanding the sequence-structure-function relationships of biomolecules. In this project, we aim to develop rigorous and unifying mathematical and statistical frameworks for such tasks and apply them to study proteins, RNAs and chromosomes.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM126558-02
Application #
9520279
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Wehrle, Janna P
Project Start
2017-07-01
Project End
2022-06-30
Budget Start
2018-07-01
Budget End
2019-06-30
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Florida State University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
790877419
City
Tallahassee
State
FL
Country
United States
Zip Code
32306
Perez-Rathke, Alan; Fahie, Monifa A; Chisholm, Christina et al. (2018) Mechanism of OmpG pH-Dependent Gating from Loop Ensemble and Single Channel Studies. J Am Chem Soc 140:1105-1115
Girimurugan, Senthil B; Liu, Yuhang; Lung, Pei-Yau et al. (2018) iSeg: an efficient algorithm for segmentation of genomic and epigenomic data. BMC Bioinformatics 19:131
Yang, Yiqing; Guo, Ruiqiong; Gaffney, Kristen et al. (2018) Folding-Degradation Relationship of a Membrane Protein Mediated by the Universally Conserved ATP-Dependent Protease FtsH. J Am Chem Soc 140:4656-4665
Tian, Wei; Lin, Meishan; Tang, Ke et al. (2018) High-resolution structure prediction of β-barrel membrane proteins. Proc Natl Acad Sci U S A 115:1511-1516
Turpin, Zachary M; Vera, Daniel L; Savadel, Savannah D et al. (2018) Chromatin structure profile data from DNS-seq: Differential nuclease sensitivity mapping of four reference tissues of B73 maize (Zea mays L). Data Brief 20:358-363
Bou-Dargham, Mayassa J; Liu, Yuhang; Sang, Qing-Xiang Amy et al. (2018) Subgrouping breast cancer patients based on immune evasion mechanisms unravels a high involvement of transforming growth factor-beta and decoy receptor 3. PLoS One 13:e0207799
Gürsoy, Gamze; Xu, Yun; Liang, Jie (2017) Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model. PLoS Comput Biol 13:e1005658
Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing (2017) Super-delta: a new differential gene expression analysis procedure with robust data normalization. BMC Bioinformatics 18:582