The "twilight zone" is a term coined by Burkhard Rost for remote protein homologs whose sequence similarity to proteins of known structure is so low that computational detection of the homology becomes very difficult. We propose to construct new threading methods that extend protein structure prediction and sequence/structure alignment further into the twilight zone. We attack the problem on two fronts.

First, we intend to extend the "wrapping" methods we designed, which successfully attacked two hard special cases of this problem, to other SCOP superfamilies. This involves casting the pairwise dependencies in the wrapping portion of the energy function into the general framework of Markov Random Fields, while still allowing the methods to wrap multiple aligned sequences to narrow the search space, and then using a more sophisticated energy function based on a backbone-dependent rotamer library for sidechain packing.

Second, our programs for the beta-helix and trefoil folds relied on human intervention to construct the core structural templates onto which we wrap sequences to predict whether they could fold into these structures. To build a general threading program with reasonable fold-library coverage of the PDB, we must automate the construction of a structural template from a set of proteins that, for example, all belong to the same SCOP superfamily. Most current threading programs train too closely to the backbone of one particular structure to capture remote homologs. We propose a novel multiple structure alignment method that adds geometric flexibility to capture similarities between more distant homologs, from which more general core templates can be abstracted. Better computational protein structure prediction has well-known applications in speeding up medical discovery.
BetaWrap, our first beta-helix prediction program, has already uncovered a previously unknown relationship between the beta-helix fold and the virulence of microbial pathogens. Strikingly, BetaWrap predicts the beta-helix fold for many surface adhesins, toxins, and other recognition/penetration proteins of human pathogens. Our prediction that a major pollen allergen adopts the beta-helix fold has recently been confirmed experimentally.
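
To make the idea of wrapping several aligned sequences onto a template with pairwise (Markov Random Field) dependencies more concrete, here is a minimal sketch under loudly simplifying assumptions: a fixed-length, gap-free template; toy singleton and pairwise score tables with made-up values; and an exhaustive scan over offsets in place of a real search. The names `placement_energy` and `best_shared_offset` and all score values are hypothetical. This is not the BetaWrap or MRFy energy function, which must also handle variable-length loops (MRFy, per its title, uses stochastic search), and it omits the rotamer-based sidechain packing term proposed above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy score tables (illustrative values only).
# Singleton[(k, aa)]: preference of template position k for residue aa.
# Pairwise[(aa, bb)]: score for residues aa and bb occupying two template
# positions joined by an MRF edge (e.g., positions stacked on adjacent rungs).
Singleton = Dict[Tuple[int, str], float]
Pairwise = Dict[Tuple[str, str], float]


def placement_energy(seq: str, offset: int, length: int,
                     singleton: Singleton,
                     edges: Sequence[Tuple[int, int]],
                     pairwise: Pairwise) -> float:
    """Energy (lower is better) of mapping template positions 0..length-1
    onto seq[offset:offset + length]; missing table entries score 0."""
    window = seq[offset:offset + length]
    # Singleton (node) terms of the MRF.
    energy = sum(singleton.get((k, aa), 0.0) for k, aa in enumerate(window))
    # Pairwise (edge) terms; the lookup is symmetric in residue order.
    for k, m in edges:
        a, b = window[k], window[m]
        energy += pairwise.get((a, b), pairwise.get((b, a), 0.0))
    return energy


def best_shared_offset(msa: List[str], length: int,
                       singleton: Singleton,
                       edges: Sequence[Tuple[int, int]],
                       pairwise: Pairwise) -> Tuple[int, float]:
    """Choose one offset for every row of a gap-free alignment, minimizing
    the mean energy across rows (exhaustive scan, not a real search)."""
    best: Optional[Tuple[int, float]] = None
    for offset in range(len(msa[0]) - length + 1):
        mean_energy = sum(
            placement_energy(row, offset, length, singleton, edges, pairwise)
            for row in msa
        ) / len(msa)
        if best is None or mean_energy < best[1]:
            best = (offset, mean_energy)
    assert best is not None, "template is longer than the alignment"
    return best


# Toy usage: template position 0 prefers V, position 2 prefers I,
# and positions 0 and 2 are coupled by one edge favoring the V-I pair.
singleton = {(0, "V"): -1.0, (2, "I"): -1.0}
edges = [(0, 2)]
pairwise = {("V", "I"): -0.5}
msa = ["GGVAIGG", "GAVSIGA"]
print(best_shared_offset(msa, 3, singleton, edges, pairwise))  # (2, -2.5)
```

Because every row of the alignment must share one placement, a candidate offset scores well only if it suits all of the homologs at once, which is one simple way that wrapping aligned sequences can narrow the search space.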

Public Health Relevance

Advances in computational protein structure prediction can help guide prediction of protein function, and thus speed medical discovery. This proposal is especially targeted at improving prediction of beta-structural motifs, which include many protein families that are important for bacterial pathogenesis, ranging from whooping cough toxin (beta-helices) to botulinum toxin (beta-trefoils).

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM080330-01A1
Application #: 7460514
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Remington, Karin A
Project Start: 2008-06-01
Project End: 2012-05-31
Budget Start: 2008-06-01
Budget End: 2009-05-31
Support Year: 1
Fiscal Year: 2008
Total Cost: $263,008
Indirect Cost:

Name: Tufts University
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 073134835
City: Medford
State: MA
Country: United States
Zip Code: 02155

Daniels, Noah M; Gallant, Andrew; Ramsey, Norman et al. (2015) MRFy: Remote Homology Detection for Beta-Structural Proteins Using Markov Random Fields and Stochastic Search. IEEE/ACM Trans Comput Biol Bioinform 12:4-16
Cao, Mengfei; Zhang, Hao; Park, Jisoo et al. (2013) Going the distance for protein function prediction: a new distance metric for protein interaction networks. PLoS One 8:e76339
Gallant, Andrew; Leiserson, Mark D M; Kachalov, Maxim et al. (2013) Genecentric: a package to uncover graph-theoretic structure in high-throughput epistasis data. BMC Bioinformatics 14:23
Daniels, Noah M; Gallant, Andrew; Peng, Jian et al. (2013) Compressive genomics for protein databases. Bioinformatics 29:i283-90
Daniels, Noah M; Hosur, Raghavendra; Berger, Bonnie et al. (2012) SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone. Bioinformatics 28:1216-22
Daniels, Noah M; Nadimpalli, Shilpa; Cowen, Lenore J (2012) Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment. BMC Bioinformatics 13:259
Daniels, Noah M; Kumar, Anoop; Cowen, Lenore J et al. (2012) Touring protein space with Matt. IEEE/ACM Trans Comput Biol Bioinform 9:286-93
Leiserson, Mark D M; Tatar, Diana; Cowen, Lenore J et al. (2011) Inferring mechanisms of compensation from E-MAP and SGA data using local search algorithms for max cut. J Comput Biol 18:1399-409
Kumar, Anoop; Cowen, Lenore (2010) Recognition of beta-structural motifs using hidden Markov models trained with simulated evolution. Bioinformatics 26:i287-93
Menke, Matt; Berger, Bonnie; Cowen, Lenore (2010) Markov random fields reveal an N-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system. Proc Natl Acad Sci U S A 107:4069-74

Showing the most recent 10 out of 14 publications