The objective of this CAREER project is to develop rapid, computationally efficient methods for extracting structural information from unassigned NMR data. More specifically, Probability Density Profile Analysis (PDPA) will be developed. PDPA will utilize Residual Dipolar Coupling (RDC) data, which can be acquired very rapidly and accurately by Nuclear Magnetic Resonance (NMR) spectroscopy. This project will address the main obstacles to realizing the full potential of computational protein structure prediction. At present, at least two major problems prevent the transition from a purely experimental to a purely computational mode of protein structure determination: first, the discovery of novel protein folds is slow (20-50 years to complete a vital library of folds), and second, current computational tools provide little to no benefit to the community of structural biologists. PDPA has the potential to address these two disparate problems. Such a method can increase the efficiency with which structural genomics centers select novel protein targets, thereby reducing the time required to complete an exhaustive library of protein folds. In addition, such a method will provide a means of experimentally validating computed protein structures, and therefore a means of protein structure determination other than NMR and X-ray crystallography. This application of PDPA should be of great interest to the wider community of structural biologists of the twenty-first century.

Students' current lack of familiarity with computational biology and bioinformatics is a major impediment to attracting them to research in this field and, consequently, to the advancement of the field. This project will integrate graduate and undergraduate researchers from various disciplines to alleviate this problem. Moreover, the combination of the proposed courses, annual competitions, summer short courses, and student exchange programs is expected to increase students' awareness of, and participation in, research related to the field. Furthermore, these programs are anticipated to facilitate intercollegiate collaboration and the recruitment of underrepresented students and faculty into this research.

Project Report

Computational modeling of protein structures has advanced significantly in recent years. Recent CASP competitions have demonstrated both high-resolution and de novo structure determination of proteins. Despite these advances, computational modeling tools such as I-TASSER and ROSETTA are still not used by structural biologists or in pharmaceutical endeavors as widely as anticipated. This limited adoption can be attributed to the fact that confidence in modeled structures still falls well short of confidence in experimental structures. Methods that rely on a minimal set of experimental data to confirm or reject computationally hypothesized structures can increase the practical utility of computational tools and potentially reduce the cost (time and money) of protein structure determination. nD-PDPA (n-Dimensional Probability Density Profile Analysis), combined with the nD-RDC analysis approach, has provided one demonstrated path to structure validation and refinement from unassigned RDC data.

nD-RDC analysis. This approach rests primarily on the observation that some information about the anisotropic properties of the molecular alignment can be obtained by studying the statistical properties of unassigned RDC data. Figure 1 illustrates the distribution of RDC data from a large number of vectors in two alignment media. The highlighted convex hull that encapsulates this distribution (also named the λ-map) parameterizes seven of the ten parameters needed to fully describe two order tensors, including the orientational components of the anisotropy [1]. The same principle has been demonstrated in higher dimensions, where the additional data provide additional features, as one would intuitively expect [2].
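To make the convex-hull idea concrete, the following is a minimal sketch rather than the method of [1]. It assumes that the RDC of a unit vector v under an order tensor S is proportional to vᵀSv, uses two made-up traceless symmetric tensors (`S1`, `S2`, in arbitrary units), and computes the hull of the resulting two-dimensional RDC distribution with a standard monotone-chain algorithm. All function names are hypothetical, and estimating tensor parameters from the hull is not attempted here.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)

def rdc(v, s):
    """RDC of unit vector v under order tensor s, up to a constant factor."""
    return sum(v[i] * s[i][j] * v[j] for i in range(3) for j in range(3))

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Hypothetical order tensors (traceless, symmetric, arbitrary units).
S1 = [[3.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, -8.0]]
S2 = [[-6.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 5.0]]

rng = random.Random(1)
vectors = [random_unit_vector(rng) for _ in range(2000)]

# One (medium 1, medium 2) RDC pair per vector; no residue labels are kept.
points = [(rdc(v, S1), rdc(v, S2)) for v in vectors]
hull = convex_hull(points)
print(len(points), "RDC pairs;", len(hull), "hull vertices")
```

Because the hull depends only on the set of RDC pairs, it can be computed without knowing which residue produced which pair, which is the sense in which the data are unassigned.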
Accurate estimation of seven of the ten parameters needed to fully describe two alignment tensors can be invaluable in a number of endeavors, such as structure refinement from RDC data or nD-PDPA.

nD-PDPA. This method has been introduced for the rapid classification of an unknown protein into a fold family using unassigned RDC data. Although nD-PDPA is flexible enough to utilize a variety of unassigned NMR data, such as different types of RDCs or pseudocontact shifts (PCS), in practice backbone N-H RDC data are the ones most frequently used. PDPA extends the basic observation of nD-RDC analysis by stating that the distribution of RDC data in two dimensions (two alignment media) is entirely a function of the protein's tertiary structure (example shown in Figure 2). Therefore, any two structures that exhibit similar distributions of RDC data must exhibit some structural similarity. Applications of nD-PDPA have demonstrated this observation with controlled simulated data [3] and with experimental data [4].

In the interest of brevity, we discuss only one application of nD-PDPA. In this exercise, unassigned backbone N-H RDC data for a 79-residue structural genomics target protein were acquired from two alignment media. A total of 51 pairs of RDC data (only 64% of the total possible data) were obtained from the entire protein. In parallel with data acquisition, the ROSETTA modeling program was used to obtain ten plausible models for this protein while its structure was being determined by X-ray crystallography. 2D-PDPA identified models 8, 5, 4 and 1 as the top structural homologues of the actual structure. Once the actual structure became available, these models were found to differ from it by 3.1, 3.29, 3.88 and 3.43 Å, respectively, measured over the backbone atoms. The remaining six models exhibited backbone RMSD (bb-rmsd) values between 4.3 and 7.5 Å.
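The comparison of RDC distributions described above can be illustrated with a small sketch. It is not the published nD-PDPA scoring: the Gaussian kernel, the grid, the bandwidth, the L1 score, the two order tensors, and every function name are assumptions chosen for illustration, and the order tensors are taken as known. Synthetic "experimental" data are an unordered subset of 51 RDC pairs from 79 vectors, and two hypothetical models are ranked by how closely their back-computed RDC distributions match it.

```python
import math
import random

def unit_vectors(n, rng, cone_deg=None):
    """Random unit vectors; if cone_deg is set, keep only those within that angle of +z."""
    out = []
    while len(out) < n:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm < 1e-12:
            continue
        v = (x / norm, y / norm, z / norm)
        if cone_deg is None or v[2] >= math.cos(math.radians(cone_deg)):
            out.append(v)
    return out

def rdc_pairs(vectors, s1, s2):
    """Back-compute one (medium 1, medium 2) RDC pair per vector, up to a constant."""
    def d(v, s):
        return sum(v[i] * s[i][j] * v[j] for i in range(3) for j in range(3))
    return [(d(v, s1), d(v, s2)) for v in vectors]

def density_map(pairs, lo=-10.0, hi=10.0, step=0.5, bandwidth=0.75):
    """Gaussian kernel density of 2D RDC pairs on a square grid, normalized to sum to 1."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + k * step for k in range(n)]
    grid = [0.0] * (n * n)
    for a, b in pairs:
        wa = [math.exp(-0.5 * ((g - a) / bandwidth) ** 2) for g in axis]
        wb = [math.exp(-0.5 * ((g - b) / bandwidth) ** 2) for g in axis]
        for i in range(n):
            for j in range(n):
                grid[i * n + j] += wa[i] * wb[j]
    total = sum(grid)
    return [g / total for g in grid]

def l1_score(p, q):
    """Sum of absolute differences between two density maps (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical order tensors (traceless, symmetric, arbitrary units).
S1 = [[3.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, -8.0]]
S2 = [[-6.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 5.0]]

rng = random.Random(0)
true_vectors = unit_vectors(79, rng)

# "Unassigned" data: an unordered subset of pairs with no residue labels.
experimental = rng.sample(rdc_pairs(true_vectors, S1, S2), 51)

models = {
    "matching": true_vectors,                             # same vector orientations
    "mismatched": unit_vectors(79, rng, cone_deg=15.0),   # very different orientations
}

exp_map = density_map(experimental)
scores = {name: l1_score(exp_map, density_map(rdc_pairs(vecs, S1, S2)))
          for name, vecs in models.items()}
ranking = sorted(scores, key=scores.get)
print(ranking, scores)
```

A lower score means a closer match between the measured and back-computed distributions, so the first entry of `ranking` is the model this toy score prefers. In practice the order tensors are not known in advance and would have to be estimated, for example from the nD-RDC analysis.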
Figure 3 illustrates the X-ray structure of the target protein (in red) superimposed on the best modeled structure selected by 2D-PDPA (in green). Our latest results related to the analysis of the T12 protein can be found in our most recent publication [5].

[1] Mukhopadhyay R, Miao X, Shealy P & Valafar H. Efficient and accurate estimation of relative order tensors from lambda-maps. J Magn Reson (2009) 198: pp. 236-247.
[2] Miao X, Mukhopadhyay R & Valafar H. Estimation of Relative Order Tensors, and Reconstruction of Vectors in Space using Unassigned RDC Data and its Application. J Magn Reson (2008).
[3] Valafar H & Prestegard JH. Rapid classification of a protein fold family using a statistical analysis of dipolar couplings. Bioinformatics (2003) 19: pp. 1549-1555.
[4] Bansal S, Miao X, Adams MWW, Prestegard JH & Valafar H. Rapid classification of protein structure models using unassigned backbone RDCs and probability density profile analysis (PDPA). J Magn Reson (2008) 192: pp. 60-68.
[5] Fahim A, Mukhopadhyay R, Yandle R, Prestegard JH & Valafar H. Protein Structure Validation and Identification from Unassigned Residual Dipolar Coupling Data Using 2D-PDPA. Molecules (2013) 18(9): pp. 10162–88. doi:10.3390/molecules180910162

Agency: National Science Foundation (NSF)
Institute: Division of Molecular and Cellular Biosciences (MCB)
Application #: 0644195
Program Officer: Kamal Shukla
Budget Start: 2007-01-01
Budget End: 2012-12-31
Fiscal Year: 2006
Total Cost: $606,942
Name: University South Carolina Research Foundation
City: Columbia
State: SC
Country: United States
Zip Code: 29208