The goal of this R21/R33 proposal is to develop novel models, scoring schemes, and techniques based on the mini-threading approach for protein structure prediction. During the R21 phase, we will focus on proof-of-principle development of our new methods. First, we will develop new statistical models and computational methods to identify useful fragments in the PDB for a query protein. In particular, we will identify protein fragments of variable length in the PDB according to statistically significant matches, instead of limiting the fragments to 9-mers as existing methods do. Second, in addition to the angular restraints used in current threading methods, we will formulate spatial restraints in Cartesian coordinates, derived from the alignments between a query sequence and its fragment hits of known structure. Third, we will investigate new optimization problem formulations to build coarse-grained structural models. Specifically, we will tailor advanced optimization techniques, such as semidefinite programming and evolutionary algorithms, to find efficient methods for assembling local structures. Fourth, we will evaluate the confidence of predicted protein structures through clustering of sampled conformations, correlated mutation analysis, and neural networks. Fifth, we will build all-atom structural models for selected coarse-grained models, and further evaluate the models using properties of atomic structures under perturbation (e.g., high temperature or force). During the R33 phase, we will focus on the evaluation, refinement, extension, and application of the methods developed during the R21 phase. First, we will perform large-scale evaluations of the methods and refine them based on the results. Second, we will implement the methods as a stand-alone software package for public distribution and as a publicly accessible Web server. Third, we will extend our methods to structure prediction of membrane proteins.
Finally, we will apply the methods to selected proteins that have a significant impact on human health, such as CFTR channels, proteins encoded in the SARS genome, the strabismus (stbm)/van Gogh (Vang) protein, and the ARC superfamily. The new techniques may significantly increase the accuracy of protein structure prediction while saving computing time. They will extend to membrane proteins, whose structures are understudied even though they are major drug targets for many diseases. Our studies will shed light on the structures and functions of a set of key human proteins, which may help researchers characterize disease genes and develop new treatments with substantial savings of resources.