Proteins are the molecular engines that perform biological functions essential to life. A major milestone in the understanding of protein behavior came with the advent of the "new view" of protein folding. This perspective holds that protein structure, stability, and dynamics are governed by a molecular-level landscape, much like a relief map of the Earth's surface. Each point on the landscape, analogous to a latitude and longitude, corresponds to a particular spatial arrangement of the protein's atoms. The altitude of each point defines the stability of that arrangement: unstable conformations lie on mountaintops and stable conformations lie on valley floors. Determining these landscapes is a key goal of protein biology because they aid both the understanding of natural proteins and the design of synthetic proteins as drugs, enzymes, and molecular machines. It is relatively straightforward to calculate these landscapes for small proteins using computer simulations, but it has not been possible to determine them from experimental measurements. The primary objective of this research is to combine mathematical tools from the modeling of dynamical systems with machine learning approaches for analyzing high-dimensional datasets, in order to determine approximate protein folding landscapes directly from experimental data. The approach will first be validated in computer simulations of small proteins for which the folding landscape is known. Theoretical analyses will bound how closely the approximate landscapes match the true landscapes and will establish the conditions that the experimental data must satisfy for the landscapes to be determined. Ultimately, the approach will be applied to experimental measurements of a tuberculosis protein. The computational analysis tool will be released as user-friendly software, freely available for public download. Positive research experiences greatly benefit undergraduate success and retention, and this award will support summer and academic-year research opportunities.
New educational outreach materials will be developed for the University of Illinois "Engineering Open House" to promote awareness of materials science and engineering among middle- and high-school students.

The aim of this work is to integrate nonlinear manifold learning with dynamical systems theory to reconstruct protein folding landscapes from experimental time series of a single system observable. The "new view" of protein folding revolutionized the understanding of folding as a conformational search over rugged, funneled free energy landscapes parameterized by a small number of emergent collective variables, with transformative implications for the understanding and design of proteins as drugs, enzymes, and molecular machines. It is now relatively routine to determine multidimensional folding landscapes from computer simulations in which all atomic coordinates are known, but it has not been possible to do so from experimental measurements of protein dynamics, which are restricted to small numbers of coarse-grained observables. This research project integrates Takens' delay embeddings with nonlinear manifold learning using diffusion maps. The delay embedding first projects a univariate time series of an experimentally measurable observable into a high-dimensional space in which the dynamics are C1-equivalent to those in real space; diffusion maps then extract from this space a reconstruction of the folding funnel that is topologically and geometrically equivalent to the one that would be determined from knowledge of all atomic coordinates. The reconstructed landscape preserves the topology of the true funnel (the metastable configurations and folding pathways), but its topography (the heights and depths of the free energy peaks and valleys) may be perturbed.
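A minimal sketch of the two-stage pipeline described above, in Python with NumPy. It is not the project's software: the function names and the parameters `dim` (embedding dimension), `lag` (delay), `eps` (kernel bandwidth), and `alpha` (density normalization) are illustrative assumptions. `delay_embed` builds Takens delay vectors from a scalar series, `diffusion_map` computes the leading nontrivial diffusion coordinates with a dense Gaussian kernel, and `free_energy_surface` estimates a free energy surface over two of those coordinates by Boltzmann inversion of a histogram, F = -kT ln p (up to an additive constant). That estimator is a common choice, not necessarily the one used in this work.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Takens delay embedding of a scalar time series.

    Row t of the result is [x[t], x[t + lag], ..., x[t + (dim - 1) * lag]].
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("time series too short for this dim and lag")
    return np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)


def diffusion_map(Y, eps, n_coords=2, alpha=1.0):
    """Leading nontrivial diffusion-map coordinates of the rows of Y.

    Dense implementation (O(N^2) memory); fine for a few thousand points.
    """
    # Squared Euclidean distances between all pairs of delay vectors
    sq = np.sum(Y**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    K = np.exp(-d2 / eps)

    # alpha-normalization: alpha = 1 removes the effect of sampling density
    q = K.sum(axis=1)
    K = K / np.outer(q**alpha, q**alpha)

    # Symmetric conjugate S = D^-1/2 K D^-1/2 of the Markov matrix P = D^-1 K
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P; index 0 is the trivial constant vector (eigenvalue 1)
    psi = evecs / np.sqrt(d)[:, None]
    lam = evals[1 : n_coords + 1]
    return lam, psi[:, 1 : n_coords + 1] * lam


def free_energy_surface(coords, bins=30, kT=1.0):
    """Boltzmann inversion F = -kT ln p of a 2-D histogram, shifted so min(F) = 0."""
    H, xedges, yedges = np.histogram2d(coords[:, 0], coords[:, 1], bins=bins, density=True)
    with np.errstate(divide="ignore"):
        F = -kT * np.log(H)  # empty bins become +inf
    F -= F[np.isfinite(F)].min()
    return F, xedges, yedges


if __name__ == "__main__":
    # Toy signal standing in for a measured observable (not real data)
    t = np.linspace(0.0, 40.0 * np.pi, 2000)
    x = np.sin(t) + 0.5 * np.sin(0.37 * t)
    Y = delay_embed(x, dim=4, lag=10)[::4]
    lam, psi = diffusion_map(Y, eps=0.5, n_coords=2)
    F, _, _ = free_energy_surface(psi, bins=25)
    print("diffusion eigenvalues:", lam)
```

The dense kernel scales quadratically in the number of frames, so for long experimental traces one would subsample or switch to a sparse nearest-neighbor kernel; in practice `dim`, `lag`, and `eps` must also be tuned to the data.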
The three primary objectives of this work are to (i) validate the approach in molecular dynamics simulations of small proteins for which the true landscape is known; (ii) establish conditions on the sampling resolution and signal-to-noise ratio of experimental measurements required for robust landscape recovery, and derive theoretical bounds on the induced topographical perturbations; and (iii) apply the approach to experimental single-molecule Förster resonance energy transfer (smFRET) measurements of the lid-opening and -closing dynamics of Mycobacterium tuberculosis protein tyrosine phosphatase B (Mtb-PtpB).
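Objectives (i) through (iii) connect through the observable. To test the pipeline on simulations the way it would be used on smFRET data, one could convert a simulated donor-acceptor distance trace into FRET efficiency with the standard Förster relation E = 1 / (1 + (r/R0)^6), then degrade it to a chosen sampling resolution and signal-to-noise ratio, the two quantities named in objective (ii). The sketch below is a hypothetical setup, not the project's protocol: the function names are illustrative, the Förster radius is an assumed value, noise is assumed white and Gaussian, and SNR is assumed to mean the ratio of clean-signal to noise standard deviation. Real smFRET traces also carry photon shot noise and other artifacts that this ignores.

```python
import numpy as np


def fret_efficiency(r, R0):
    """Förster relation E = 1 / (1 + (r / R0)**6); r and R0 in the same units."""
    r = np.asarray(r, dtype=float)
    return 1.0 / (1.0 + (r / R0) ** 6)


def degrade(signal, snr, stride, rng=None):
    """Keep every `stride`-th frame and add white Gaussian noise.

    Assumption: SNR = std(clean subsampled signal) / std(noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(signal, dtype=float)[::stride]
    sigma = s.std() / snr
    return s + rng.normal(0.0, sigma, size=s.shape)


if __name__ == "__main__":
    # Toy two-state distance trace in nm standing in for an MD trajectory (not real data)
    r = 4.5 + 1.0 * np.sign(np.sin(np.linspace(0.0, 30.0, 5000)))
    E = fret_efficiency(r, R0=5.0)  # hypothetical Förster radius of 5 nm
    for stride in (1, 10, 50):
        for snr in (20.0, 5.0, 1.0):
            obs = degrade(E, snr, stride, rng=np.random.default_rng(0))
            # obs would then go through delay embedding and diffusion maps
            print(stride, snr, obs.shape)
```

Sweeping `stride` and `snr` on a system whose true landscape is known is one way to probe empirically where recovery breaks down, complementing the theoretical conditions the objectives call for.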

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1841810
Program Officer: Junping Wang
Budget Start: 2018-07-01
Budget End: 2021-07-31
Fiscal Year: 2018
Total Cost: $162,000
Organization Name: University of Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60637