This proposal's objective is to develop a new class of statistical models to advance scientific knowledge of protein tertiary structure and to extend template-based modeling to protein loop regions. As an advance in basic science, the improved modeling of protein structure will broadly impact biomedical fields. The following specific aims will be accomplished.
The first aim (Random Partition Models Indexed by Pairwise Information) is to develop probability models for partitions that are explicitly non-exchangeable, using available pairwise information to influence the clustering of data. Four distributions are proposed, each using the pairwise information by modifying identities from the Chinese Restaurant Process, a popular probability model for clustering. Hierarchical clustering uses pairwise distances, but current methods for protein structure modeling do not. The proposed method provides a means to incorporate this type of information into Bayesian nonparametric models for protein structure.
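To make the idea of a partition model indexed by pairwise information concrete, the following is a minimal sketch and not necessarily any of the four proposed distributions. It assumes a Chinese Restaurant Process seating rule in which the weight for joining an existing cluster is redistributed according to the new item's similarity to that cluster's members. The names `seating_weights`, `sample_partition`, `similarity`, and `alpha`, and the choice of `exp(-distance)` as a similarity, are illustrative assumptions.

```python
import math
import random


def seating_weights(i, clusters, similarity, alpha):
    """Unnormalized weights for placing item i, given clusters of items 0..i-1.

    Assumption: each existing cluster receives a share of the total mass i
    (the number of items already seated) proportional to the summed
    similarity between item i and the cluster's members. A new cluster
    receives mass alpha, as in the standard Chinese Restaurant Process.
    All similarities are assumed to be strictly positive.
    """
    total = sum(similarity[i][j] for j in range(i))
    weights = [i * sum(similarity[i][j] for j in c) / total for c in clusters]
    weights.append(alpha)  # last entry: open a new cluster
    return weights


def sample_partition(similarity, alpha=1.0, rng=None):
    """Draw one partition of items 0..n-1 by sequential allocation."""
    rng = rng or random.Random()
    n = len(similarity)
    clusters = [[0]]  # the first item always starts its own cluster
    for i in range(1, n):
        weights = seating_weights(i, clusters, similarity, alpha)
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(clusters):
            clusters.append([i])
        else:
            clusters[k].append(i)
    return clusters


if __name__ == "__main__":
    # Hypothetical pairwise distances among four items: 0 and 1 are close,
    # 2 and 3 are close, and the two pairs are far apart.
    dist = [
        [0.0, 0.5, 4.0, 4.0],
        [0.5, 0.0, 4.0, 4.0],
        [4.0, 4.0, 0.0, 0.5],
        [4.0, 4.0, 0.5, 0.0],
    ]
    sim = [[math.exp(-d) for d in row] for row in dist]
    print(sample_partition(sim, alpha=1.0, rng=random.Random(1)))
```

When all similarities are equal, each existing cluster's weight reduces to its size, which recovers the usual Chinese Restaurant Process. With unequal similarities the result depends on the order in which items are seated, so the sketch is non-exchangeable in the sense described above.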
The second aim (Template-Based Modeling of Loop Conformation Space Using Partition Models) applies the proposed random partition models to loop modeling. This proposal will improve our previous estimation approach by accounting for the influences of individual amino acids as well as for influences from neighboring residues. New methods based on the random partition models will provide rigorous statistical modeling at and between residue positions, allowing one to limit and precisely sample the conformational space. This will in turn allow for a clearer understanding of the roles of loops in catalytic sites and protein signaling.
The final aim (New Paradigm for Protein Packing and Higher-Order Structure Using Partition Models) applies the statistical modeling to estimate the propensities of a new model of protein packing called the ball/socket model. Statistical modeling of the amino acid propensities within the ball/socket motifs and between patterns of motifs will provide insight into the rules governing packing, filling a substantial gap in the current understanding of protein structure. The statistical model estimating these propensities will exploit the known pairwise information by using the proposed random partition models. Such analysis is currently not available to the scientific community.

Public Health Relevance

More accurate modeling of protein structure from sequence will greatly aid the biomedical community in better understanding disease states. Moreover, producing accurate models of protein structure directly from sequence leverages the vast amounts of genetic information produced by the many genome projects. Accurate protein structure modeling also informs drug discovery by prioritizing targets.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM104972-04
Application #: 8839256
Study Section: Special Emphasis Panel (ZGM1-CBCB-5 (BM))
Program Officer: Wehrle, Janna P
Project Start: 2012-07-01
Project End: 2016-04-30
Budget Start: 2015-05-01
Budget End: 2016-04-30
Support Year: 4
Fiscal Year: 2015
Total Cost: $350,928
Indirect Cost: $31,548

Name: Brigham Young University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 009094012
City: Provo
State: UT
Country: United States
Zip Code: 84602

Dahl, David B; Day, Ryan; Tsai, Jerry W (2017) Random Partition Distribution Indexed by Pairwise Information. J Am Stat Assoc 112:721-732
Li, Qiwei; Dahl, David B; Vannucci, Marina et al. (2016) KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 32:3774-3781
Fraga, Keith J; Joo, Hyun; Tsai, Jerry (2016) An amino acid code to define a protein's tertiary packing surface. Proteins 84:201-16
Joo, Hyun; Chavan, Archana G; Fraga, Keith J et al. (2015) An amino acid code for irregular and mixed protein packing. Proteins 83:2147-61
Joo, Hyun; Tsai, Jerry (2014) An amino acid code for β-sheet packing structure. Proteins 82:2128-40
Li, Qiwei; Dahl, David B; Vannucci, Marina et al. (2014) Bayesian model of protein primary sequence for secondary structure prediction. PLoS One 9:e109832
Day, Ryan; Joo, Hyun; Chavan, Archana C et al. (2013) Understanding the general packing rearrangements required for successful template based modeling of protein structure from a CASP experiment. Comput Biol Chem 42:40-8