While automation is revolutionizing many aspects of biology, the determination of three-dimensional protein structure remains a long, hard, and expensive task. High-throughput structural genomics is required in order to apply modern techniques such as structure-based drug design on a much larger scale. Traditional (semi-)automated approaches to protein structure determination through nuclear magnetic resonance (NMR) spectroscopy require dozens of experiments and months of spectrometer time, making them unsuitable for high-throughput automation. One of the main bottlenecks in the determination of three-dimensional protein structures by NMR is the assignment of chemical shifts to atoms in a biopolymer. Therefore, high-throughput structure determination using NMR requires a systematic attack on the assignment problem. Novel algorithmic techniques are proposed for automated assignment and protein structure determination from sparse, unassigned NMR data, based on an approach called Jigsaw. The proposed research aims to minimize the number and types of NMR experiments that must be performed and the amount of human effort required to interpret the experimental results, while still producing an accurate analysis of the protein structure. To enable high-throughput data collection, the proposed methods utilize only a few fast, cheap NMR experiments. The research will build on Jigsaw to develop a minimalist approach, demonstrating the large amount of information available in a few key spectra and how it can be extracted using a combination of combinatorial and geometric algorithms. New algorithms and computer systems will be developed for determining protein structure from only four NMR spectra.
The system will use algorithms similar to, and adapted from, those of physical geometric algorithms, pattern recognition and machine vision, signal processing, and robotics, in order to analyze spectra, assign spectral peaks to atom interactions, compute secondary structure, and estimate the global fold. Jigsaw will be extended to work on larger proteins and tested on experimental NMR data. A novel probabilistic framework will be implemented to handle the increased spectral complexity and sparser information content encountered both for larger proteins and in high-throughput NMR protocols.