New tools based on graph theory have revolutionized genome analyses, providing better ways to identify and classify rearrangements of genomic fragments. The same tools have recently also provided a major breakthrough in multiple sequence alignment. Here we propose to apply these tools to protein structure analysis and to use the resulting insights into protein structure evolution to increase model quality in comparative modeling. Structure comparisons between distant homologs show clearly that the dominant paradigm in structure comparison, namely that a protein structure can be divided into an invariant core and flexible loops, breaks down below a sequence identity threshold of 40%-50%. Instead, significant rearrangements can happen anywhere in the structure, with secondary structure elements undergoing large shifts and movements. As a result, the standard protocol in comparative modeling, which mounts the target sequence on the rigid core of a template structure, must fail for such homologs. Structural differences between homologs are driven, as is the entire folding process, by the free energy of the system, but because of serious deficiencies in current force fields and computational approaches, energy-based predictions of such changes have not been successful. In this grant we propose to improve the quality of comparative modeling by first discovering and then applying empirical rules of protein structure change. The rapid growth in the number of known protein structures, fueled in part by technical advances in high-throughput structure determination spearheaded by the Protein Structure Initiative, has resulted in increasingly dense coverage of the structural space of many folds. This provides a rich learning base for discovering such empirical rules, provided that an appropriate formalism for describing protein structure changes can be developed.
In preliminary analyses we have shown that, at the next level of approximation beyond the invariant core/flexible loops model, a protein structure can be described as built from rigid subdomains, and that simple rearrangements of these subdomains account for almost half of the structural differences between distant homologs. Moreover, proteins can adopt only structures lying in a specific low-dimensional subspace of the entire conformational space. To improve the quality of models from comparative modeling, we plan to identify conserved subdomains for all known folds and to describe the allowed subspaces by analyzing already known structures from these folds. In the next step we will use this information to generate possible variants of the template structure and use model evaluation tools to identify the one most similar to the [sic].

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Exploratory Grants (P20)
Project #: 3P20GM076221-03S1
Application #: 7780895
Study Section: Special Emphasis Panel (ZGM1-CBB-3 (HM))
Program Officer: Smith, Ward
Project Start: 2006-04-01
Project End: 2010-10-31
Budget Start: 2009-04-01
Budget End: 2010-10-31
Support Year: 3
Fiscal Year: 2009
Total Cost: $651,131
Organization Name: Sanford-Burnham Medical Research Institute
DUNS #: 020520466
City: La Jolla
State: CA
Country: United States
Zip Code: 92037

Cai, Xiao-Hui; Jaroszewski, Lukasz; Wooley, John et al. (2011) Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering. Proteins 79:2389-402
Zhang, Qing; Zmasek, Christian M; Cai, Xiaohui et al. (2011) TIR domain-containing adaptor SARM is a late addition to the ongoing microbe-host dialog. Dev Comp Immunol 35:461-8
Zakharov, Mikhail N; Pillai, Biju K; Bhasin, Shalender et al. (2011) Dynamics of coregulator-induced conformational perturbations in androgen receptor ligand binding domain. Mol Cell Endocrinol 341:1-8
Zmasek, Christian M; Godzik, Adam (2011) Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol 12:R4
Weekes, Dana; Krishna, S Sri; Bakolitsa, Constantina et al. (2010) TOPSAN: a collaborative annotation environment for structural genomics. BMC Bioinformatics 11:426
Zhang, Qing; Zmasek, Christian M; Godzik, Adam (2010) Domain architecture evolution of pattern-recognition receptors. Immunogenetics 62:263-72
Ellrott, Kyle; Jaroszewski, Lukasz; Li, Weizhong et al. (2010) Expansion of the protein repertoire in newly explored environments: human gut microbiome specific protein families. PLoS Comput Biol 6:e1000798
Burra, Prasad V; Zhang, Ying; Godzik, Adam et al. (2009) Global distribution of conformational states derived from redundant models in the PDB points to non-uniqueness of the protein structure. Proc Natl Acad Sci U S A 106:10505-10
Jaroszewski, Lukasz; Li, Zhanwen; Krishna, S Sri et al. (2009) Exploration of uncharted regions of the protein universe. PLoS Biol 7:e1000205
Zhang, Ying; Thiele, Ines; Weekes, Dana et al. (2009) Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science 325:1544-9

The 10 most recent of the project's 21 publications are listed above.