Determining protein structure and function from genomic sequences, and classifying proteins more broadly, remain among the most significant challenges in modern computational biology. This project proposes to significantly enhance the capacity of algorithms to predict protein shapes from sequences by focusing on major bottlenecks, e.g., the folding energy and the ability to make approximate matches.

Algorithms that determine protein shapes from sequences have two major components. The first component (sampling) generates a set of plausible protein shapes; at least one of the sampled shapes is expected to be similar to the correct fold. The second component (scoring) evaluates the different structures and selects the best model. The radius of convergence of the energy function must be large enough that approximate matches are detected as well (in threading, approximate matches may include deletions and insertions). Poor scoring functions (or energies), which cannot identify the correct fold, are therefore likely to diminish the capacity of the folding algorithm. At present, it is easy to generate a set of decoy (wrong) structures that confuse existing energy functions.

Mathematical programming and machine learning techniques (Support Vector Machines) will be used to design enhanced folding and threading potentials. Training with these methods is automated and will lead to a monotonic improvement in recognition as a function of data size. To cover protein space more effectively, the goal is to learn 100 million data points in a single consistent potential with at most 10,000 parameters. Automated large-scale learning is crucial at a time when information on sequences and structures is growing rapidly. A threading prediction server, based on the old and the new potentials, is available, and will remain available, to the community at http://ser-loopp.tc.cornell.edu/Ioopp.html.
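One common way to frame the energy-design problem described above is as a system of linear inequalities: the energy of every decoy must exceed the energy of the native structure. The sketch below illustrates that idea under stated assumptions and is not the project's method. It assumes an energy that is linear in its parameters, E = sum_k p_k * n_k, where n_k counts contacts of a hypothetical type k; the three contact types and all counts are made up. It also swaps in a simple margin perceptron as a stand-in for the linear-programming and SVM solvers named in the abstract. The function names (`energy`, `difference_vectors`, `train_potential`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def energy(p: Sequence[float], counts: Sequence[float]) -> float:
    """Assumed energy, linear in the parameters: E = sum_k p_k * n_k."""
    return dot(p, counts)


def difference_vectors(native: Sequence[float],
                       decoys: Sequence[Sequence[float]]) -> List[Vector]:
    """One row per decoy: d = n(decoy) - n(native).

    E(decoy) > E(native) is equivalent to p . d > 0 for each row d.
    """
    return [[float(dk - nk) for dk, nk in zip(decoy, native)] for decoy in decoys]


def train_potential(diffs: Sequence[Sequence[float]],
                    margin: float = 1.0,
                    max_epochs: int = 1000) -> Vector:
    """Margin perceptron: find p with p . d >= margin for every row d.

    Stand-in for an LP or SVM solver; it stops after finitely many
    updates when the inequalities are strictly feasible.
    """
    p = [0.0] * len(diffs[0])
    for _ in range(max_epochs):
        updated = False
        for d in diffs:
            if dot(p, d) < margin:  # inequality violated or margin too small
                p = [pk + dk for pk, dk in zip(p, d)]
                updated = True
        if not updated:  # every inequality holds with the required margin
            return p
    raise RuntimeError("no potential found; the inequalities may be infeasible")


# Hypothetical contact counts for three contact types
# (hydrophobic-hydrophobic, hydrophobic-polar, polar-polar).
native = [6, 2, 3]
decoys = [[3, 5, 3], [4, 4, 2], [2, 3, 6]]

diffs = difference_vectors(native, decoys)
p = train_potential(diffs)
print(p)  # [-3.0, 3.0, 0.0]
```

With these parameters the native scores -12 and the decoys score 6, 0, and 3, so the native is recognized as the lowest-energy structure. At the scale the abstract targets (on the order of 100 million inequalities and up to 10,000 parameters), a dedicated LP or SVM solver would replace this loop. Such a solver also chooses among the many feasible p (for example, the maximum-margin one), which the perceptron does not do.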

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 7R01GM067823-05
Application #: 7486680
Study Section: Molecular and Cellular Biophysics Study Section (BBCA)
Program Officer: Wehrle, Janna P
Project Start: 2004-02-01
Project End: 2008-05-31
Budget Start: 2007-07-01
Budget End: 2008-05-31
Support Year: 5
Fiscal Year: 2007
Total Cost: $132,856
Indirect Cost:
Organization Name: University of Texas Austin
Department: Engineering (All Types)
Organization Type: Schools of Engineering
DUNS #: 170230239
City: Austin
State: TX
Country: United States
Zip Code: 78712
Viswanath, Shruthi; Dominguez, Laura; Foster, Leigh S et al. (2015) Extension of a protein docking algorithm to membranes and applications to amyloid precursor protein dimerization. Proteins 83:2170-85
Elber, Ron (2015) From an SNP to a Disease: A Comprehensive Statistical Analysis. Structure 23:1155
Burke, Sean; Elber, Ron (2012) Super folds, networks, and barriers. Proteins 80:463-70
Ravikant, D V S; Elber, Ron (2011) Energy design for protein-protein interactions. J Chem Phys 135:065102
Adamczak, Rafal; Pillardy, Jaroslaw; Vallat, Brinda K et al. (2011) Fast geometric consensus approach for protein model quality assessment. J Comput Biol 18:1807-18
Phatak, Mukta; Adamczak, Rafał; Cao, Baoqiang et al. (2011) Solvent and lipid accessibility prediction as a basis for model quality assessment in soluble and membrane proteins. Curr Protein Pept Sci 12:563-73
Swaminathan, Karthikeyan; Adamczak, Rafal; Porollo, Aleksey et al. (2010) Enhanced prediction of conformational flexibility and phosphorylation in proteins. Adv Exp Med Biol 680:307-19
Ravikant, D V S; Elber, Ron (2010) PIE-efficient filters and coarse grained potentials for unbound protein-protein docking. Proteins 78:400-19
Cao, Baoqiang; Elber, Ron (2010) Computational exploration of the network of sequence flow between protein structures. Proteins 78:985-1003
Lam, Ying Wai; Yuan, Yong; Isaac, Jared et al. (2010) Comprehensive identification and modified-site mapping of S-nitrosylated targets in prostate epithelial cells. PLoS One 5:e9075

The 10 most recent of the project's 24 publications are listed above.