A major feature of biological science in the 21st century is its transition from a qualitative and descriptive discipline to a quantitative and analytical one. Experimental exploration of self-organizing biomolecular systems, such as viruses, molecular motors, and the proteins implicated in Alzheimer's disease, has been a dominant driving force of scientific discovery and innovation over the past few decades. Unfortunately, quantitative understanding of biomolecular structure, function, and dynamics lags severely behind the pace of experimental progress. The fundamental challenges that hinder quantitative understanding of biomolecular systems are their tremendous complexity and their excessively large number of degrees of freedom. Most biological processes occur in water, which constitutes 65-90 percent of human cell mass. An average human protein has about 5,500 atoms, which, together with the surrounding water molecules, involve about 100,000 degrees of freedom. The dimensionality increases dramatically for subcellular organelles and multiprotein complexes. At present, real-time structure optimization, dynamic simulation, and function prediction of molecular motors and/or viruses in human cells are intractable with full-atom models. A crucial question is therefore how to reduce the number of degrees of freedom while retaining the fundamental physics of complex biological systems. This project addresses grand challenges, arising from exceptionally massive data sets, in the structure, function, and dynamics of self-organizing biomolecular systems. These challenges are tackled through the introduction of new mathematical models, together with advanced computational methods for handling excessively large biomolecular data sets. The proposal offers innovative approaches to important areas of massive data analysis, dimensionality reduction, computational mathematics, and mathematical modeling.
The project addresses these challenges through a number of geometric and topological approaches. First, a multiscale framework is proposed that reduces the dimensionality and number of degrees of freedom by combining a macroscopic continuum description of the aqueous environment with a microscopic discrete description of biomolecules. Additionally, an adaptive coarse-grained approach based on persistently stable manifolds is introduced to further reduce the dimensionality of excessively large biomolecular systems. A total free energy functional is introduced to put the macroscopic surface tension and the microscopic potential interactions on an equal footing. The differential geometry theory of surfaces is used to describe the interface between the macroscopic and microscopic domains, and potential-driven geometric flows are constructed to minimize the total free energy functional. Furthermore, evolutionary topology and total curvature are introduced to analyze the topology-function relationship of biomolecules. Frenet frames are used to characterize the local geometry and the associated stable manifolds in dynamical data of biomolecular systems, and machine learning algorithms are proposed to extract these stable manifolds. Finally, a perturbation strategy is introduced to explore the persistence of the stable manifolds, which provides assurance of the reliability of the coarse-grained model. In addition to promising preliminary results illustrating the power of this approach, extensive validation and applications are proposed to ensure that the methodology yields robust and powerful tools for biomolecular structure optimization, function prediction, and dynamical simulation.
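The abstract does not say how Frenet frames or total curvature are computed from biomolecular data, so the following is only a minimal sketch under stated assumptions. It assumes the input is an ordered set of 3D coordinates, for example a hypothetical C-alpha backbone trace taken from one simulation snapshot, with distinct consecutive points. It uses one common discrete convention: the tangent is the unit outgoing edge, the binormal is the normalized cross product of the incoming and outgoing tangents, and the normal completes the right-handed frame. "Total curvature" here means the total curvature of a polyline (the sum of turning angles), which may differ from the surface-based notion used in the project. The function names `discrete_frenet` and `total_curvature` are illustrative, not taken from the project.

```python
import numpy as np


def discrete_frenet(points, eps=1e-12):
    """Discrete Frenet frames and turning angles of an open polyline.

    Assumption: `points` is an (n, 3) array of ordered coordinates with no
    repeated consecutive points (e.g. a hypothetical C-alpha trace).
    Returns (T, N, B, theta) for the interior vertices 1..n-2:
      T[k]     unit tangent of the outgoing edge at vertex k+1
      B[k]     unit binormal: normalized cross product of incoming and outgoing tangents
      N[k]     unit normal: B[k] x T[k], pointing toward the side the curve turns
      theta[k] turning angle (discrete curvature) at vertex k+1
    At straight (collinear) vertices the frame is undefined and N, B are NaN.
    """
    p = np.asarray(points, dtype=float)
    edges = np.diff(p, axis=0)                          # (n-1, 3) edge vectors
    t = edges / np.linalg.norm(edges, axis=1, keepdims=True)
    t_in, t_out = t[:-1], t[1:]                         # tangents meeting at each interior vertex

    # Turning angle between consecutive tangents; clip guards against round-off.
    cos_theta = np.clip(np.sum(t_in * t_out, axis=1), -1.0, 1.0)
    theta = np.arccos(cos_theta)

    # Binormal from the plane spanned by the two tangents; NaN where they are parallel.
    b = np.cross(t_in, t_out)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        b = np.where(b_norm > eps, b / b_norm, np.nan)

    n = np.cross(b, t_out)                              # completes the frame (T, N, B)
    return t_out, n, b, theta


def total_curvature(points):
    """Total curvature of a polyline: the sum of turning angles at interior vertices."""
    return float(np.sum(discrete_frenet(points)[3]))


if __name__ == "__main__":
    # Usage on a sampled helix, a crude stand-in for an alpha-helical backbone.
    s = np.linspace(0.0, 4.0 * np.pi, 50)
    helix = np.column_stack([np.cos(s), np.sin(s), 0.3 * s])
    T, N, B, theta = discrete_frenet(helix)
    print("turning angle per vertex:", theta[0])
    print("total curvature:", total_curvature(helix))
```

Because a uniformly sampled helix is screw-symmetric, every interior vertex has the same turning angle, so local changes in the frames or in the turning angles along a trajectory would, under these assumptions, flag where the geometry departs from a regular helix.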
This project is funded by the Division of Mathematical Sciences with cofunding from the Division of Molecular and Cellular Biosciences.