One of the ultimate goals of structural biology is a comprehensive molecular understanding of the biological processes in an organism. A fundamental step toward this goal is to elucidate the structures of, and relationships between, the large numbers of proteins involved in these processes. Such large-scale studies are necessary to uncover, for example, the genetic and molecular basis of diseases, as well as to design effective treatments for them. A holistic understanding of these biological systems therefore requires high-throughput computational methods that can process and interpret experimental data recorded on proteins, for example from X-ray crystallography or nuclear magnetic resonance (NMR), and synthesize it with other empirically available data (e.g., a repository of protein structures such as the Protein Data Bank).

Currently, practitioners in structural biology rely on manual or ad hoc approaches for interpreting and synthesizing data, approaches that lack guarantees on solution quality, running time, or both. To enable large-scale studies in structural biology, we propose to study several basic problems from a computational point of view and to provide formulations and algorithms that are compatible with the goals of biologists and biochemists. Practical structure determination methods that incorporate experimental data, for example, are typically based on techniques such as simulated annealing or molecular dynamics, which offer no guarantees on solution quality or running time. Because of their heuristic nature, even automated methods require manual intervention and can be quite time-consuming (e.g., months to years, depending on the desired accuracy and the difficulty of processing and interpreting the experimental data). Understanding biological systems at a molecular level requires the study of tens to hundreds of proteins, so traditional approaches, both manual and automated, are a bottleneck to proteomics research, and efficient computational approaches must be developed to enable large-scale proteomics efforts.

Using a combination of experimental and modeling data, the research spans three problem areas: protein structure, protein-protein interactions, and allosteric regulation. A key question when designing formulations for biological problems is: how much input (and how much experimental uncertainty or noise) admits efficient and accurate algorithms? The high-level objectives in each of the proposed problem areas are to:

1. Consider available experimental and modeling data, and define realistic optimization criteria that are compatible with the goals of biologists and biochemists.

2. Design algorithms that efficiently compute optimal or near-optimal solutions, and provide worst-case analyses of solution quality and running time. Where such algorithms are not possible, provide lower-bound proofs that show the extent to which the input data are inadequate.

The work will be evaluated through collaboration with biologists. The research actively engages graduate and undergraduate students in both the interdisciplinary work and the more theoretical computer science. A summer project has been developed for a high school student that will provide the basis for a set of local high school outreach activities.
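For illustration only (this sketch is not part of the award abstract): the proposal contrasts heuristic techniques such as simulated annealing, which offer no guarantees on solution quality or running time, with algorithms that do. A minimal, generic simulated-annealing loop in Python makes that point concrete; the toy energy function, neighbor move, and geometric cooling schedule below are assumptions chosen for this example, not the methods under study.

```python
import math
import random

def simulated_annealing(energy, neighbor, x0, t0=1.0, cooling=0.995, steps=10_000):
    """Generic simulated annealing: a heuristic search with no worst-case
    guarantee on how close the result is to optimal, or on how many steps
    a good solution will take to appear."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        cand = neighbor(x)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-(increase in energy) / temperature).
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # geometric cooling schedule (a common heuristic choice)
    return best_x, best_e

if __name__ == "__main__":
    # Toy usage: minimize a 1-D "energy" landscape with many local minima.
    f = lambda x: x * x + 10 * math.sin(3 * x)
    step = lambda x: x + random.uniform(-0.5, 0.5)
    print(simulated_annealing(f, step, x0=random.uniform(-5.0, 5.0)))
```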

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0643768
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2007-02-01
Budget End: 2012-10-31
Support Year:
Fiscal Year: 2006
Total Cost: $518,888
Indirect Cost:
Name: University of Massachusetts Amherst
Department:
Type:
DUNS #:
City: Amherst
State: MA
Country: United States
Zip Code: 01003