The research involves the design and analysis of efficient algorithms for fundamental problems that arise in studies of the three-dimensional structures of proteins. Graph-theoretic problems underlie these studies, since protein structures are naturally (and sufficiently) represented by graphs that have vertices for the individual amino acid residues and edges between close pairs. However, graph-theoretic formalisms lead to computationally hard optimization problems, further complicated by extensive amounts of noise in experimental data. Motivated by specific challenges in nuclear magnetic resonance spectroscopy and other protein structure studies, the project addresses two significant algorithmic problems: identifying correspondences between a pair of graphs where one is a significantly corrupted version of the other, and determining three-dimensional coordinates for the vertices of a graph, given approximate, noisy distance measurements for its edges.

The first algorithmic problem is a form of graph matching, and the project focuses on developing efficient search algorithms to uncover correspondences, with random graph models to rigorously analyze the algorithms and study threshold phenomena characterizing robustness to noise. In an application to analysis of NMR data, one of the graphs represents the protein and the other the data, a noisy, ambiguous set of atomic interactions; the goal is to match the NMR-identified interactions with specific atomic interactions in the protein. The second algorithmic problem is Euclidean embedding for sparse geometric graphs, and the research involves development of algorithms to render such graphs amenable to low rank distance matrix reconstruction methods, generalizing the reconstruction methods to exploit the underlying geometric structure and account for the confounding noise structure. In the NMR setting, the graph represents NMR-probed through-space atomic interactions, and the goal is to compute structures consistent with the experimental data and biophysical constraints. Both problems are fundamental to numerous other significant applications in protein structure studies.

Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Dartmouth College
United States
Zip Code