Protein structure mediates protein function and, ultimately, organismal behavior. A complement of computational and experimental approaches is necessary to determine structures for the large number of protein sequences available from whole-genome sequencing projects. We propose a novel approach that integrates easily obtained data from Nuclear Magnetic Resonance (NMR) experiments on proteins with our prediction methodologies to model structures rapidly and accurately. Specifically, our aims are to:
1. Automate secondary structure assignment using chemical shift, J-coupling, and unassigned nuclear Overhauser effect (NOE) data together with sequence-based algorithms. We will use neural networks to combine the different datasets efficiently and accurately.
2. Sample protein conformational space by translating secondary structure, chemical shift, J-coupling, and database tendencies into backbone angle probability distributions. These distributions, generated using neural networks, will be used to bias the sample space explored by our de novo methods for a given protein sequence, so that a large proportion of the conformations encountered are native-like and consistent with the input data.
3. Select the most native-like conformations by combining NMR data with existing statistical and physical functions. NMR scoring functions will be based on the similarity of backbone angles to the calculated probability distributions and of simulated NOE spectra to the input NOE data.
4. Refine the quality of the conformational ensemble by automatically assigning the NOE data to obtain non-local constraints. The simulated spectra from the best-scoring conformations will be used to obtain an initial subset of constraints, which will be incorporated into the generation of new conformations. The NOE data will thus be assigned iteratively, improving the quality of the conformations until a final set of structures fitting the input data is obtained.
5. Test the methods developed in a robust and unbiased manner. We will set up internal testing mechanisms that avoid bias toward particular classes of proteins; evaluate components of predictions separately from whole predictions to identify those that work well and those that need further improvement; and perform continuous benchmarking of our methods.
6. Enable NMR experimentalists to submit sequences for which we will make predictions using the methods described above. We will publish the software produced, and the information obtained, using database-driven interfaces on the World Wide Web.
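As a rough illustration of aims 2 and 3, the sketch below shows one way per-residue backbone angle probability distributions could bias sampling and then score the sampled conformations. It is a minimal sketch under stated assumptions, not the proposed method: the three coarse (phi, psi) bins, the function names, and the example probabilities are all hypothetical, and in the proposal the distributions would come from neural networks trained on NMR and sequence data rather than being written by hand.

```python
import math
import random

# Hypothetical coarse (phi, psi) bins in degrees; illustrative values only.
BINS = {
    "helix": (-60.0, -45.0),
    "strand": (-120.0, 130.0),
    "left": (60.0, 45.0),
}


def sample_conformation(dists, rng):
    """Draw one bin label per residue from that residue's distribution."""
    labels = []
    for dist in dists:
        names = list(dist)
        weights = [dist[name] for name in names]
        labels.append(rng.choices(names, weights=weights, k=1)[0])
    return labels


def log_likelihood(labels, dists, floor=1e-6):
    """Score a conformation by its agreement with the distributions.

    Probabilities are floored so that a bin absent from a distribution
    is penalized heavily instead of raising a math domain error.
    """
    return sum(
        math.log(max(dist.get(label, 0.0), floor))
        for label, dist in zip(labels, dists)
    )


def to_angles(labels):
    """Map bin labels to representative (phi, psi) angle pairs."""
    return [BINS[label] for label in labels]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible run
    # Assumed per-residue distributions for a three-residue fragment.
    dists = [
        {"helix": 0.80, "strand": 0.15, "left": 0.05},
        {"helix": 0.70, "strand": 0.25, "left": 0.05},
        {"helix": 0.10, "strand": 0.85, "left": 0.05},
    ]
    candidates = [sample_conformation(dists, rng) for _ in range(100)]
    best = max(candidates, key=lambda c: log_likelihood(c, dists))
    print(best, to_angles(best))
```

Sampling directly from the distributions concentrates effort on conformations consistent with the input data, and the same distributions are reused as a simple scoring term. A full pipeline would combine this term with the statistical, physical, and NOE-based functions described in aim 3.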