Advances in biomolecular modeling and simulation have yielded massive amounts of biomolecular data, exposing the myriad forms and structures that a biomolecule assumes and leverages to modulate its biological activities in the cell. These structures are embedded in the energy landscape, which organizes structures by their energetics and underscores the inherent nature of biomolecules as dynamic systems that interconvert between structures of varying energies. In principle, the landscape contains all the information needed to expose and characterize biomolecular dynamics and link it to (dys)function, molecular mechanisms, and our biology. The objective of this project is to advance research on statistical inference of geometric features of protein energy landscapes as an essential means of understanding and predicting the phenotypic and functional impact of protein sequence variations on dynamics and function.

Because geometric features can take flexible shapes, the proposed inferential framework adopts nonparametric approaches and addresses the resulting challenges by integrating statistics with differential geometry and Morse theory. Novel methodology will be investigated to test the statistical significance of stable structural states (anti-modes) on molecular landscapes and to study the asymptotic behavior of estimators of optimal paths (integral curves) between stable states, in support of stochastic optimization research on constructing molecular landscapes. A computationally feasible goodness-of-fit test will be developed for basins (level sets) on landscapes. The project will also establish asymptotic distributional results for surface integrals on the boundaries of basins; these integrals are quantitative descriptors of the topological and geometric information in the landscapes and support in silico discoveries on the functional impact of protein sequence variations. The proposed activities will contribute directly to modern statistics, molecular biology, and molecular modeling by linking the statistical inference of energy landscapes to (altered) molecular dynamics and (dys)function. The activities will additionally support fields where geometric features in spatial data are of interest, such as geology, cosmology, neuroscience, remote sensing, and atmospheric science.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Program Officer: Yong Zeng
George Mason University
United States