Understanding the rules of life is the major mission of the biological sciences in the 21st century. The availability of massive biological data and recent advances in computational algorithms have paved the way for the biological sciences to transition from a qualitative, phenomenological, and descriptive approach to a quantitative, analytical, and predictive one. However, this transition is hindered by the tremendous structural complexity of biomolecules and by excessively large datasets. For example, even all of the world's computers put together do not have enough power to design drugs automatically, because of the structural complexity of protein-drug interactions, the excessively large datasets associated with drug configurations, and the high dimensionality of the molecular simulations and/or machine learning involved. The present project will address these challenges with innovative mathematical strategies. A team of mathematicians and computer scientists at Michigan State University will turn sophisticated mathematics into computational algorithms that create simplified representations of complex biomolecules and their interactions. As a result, deep learning and other types of machine learning can be carried out efficiently to extract structure-function relationships from massive and diverse biomolecular datasets. This information will be extremely valuable for revealing the rules of life and for designing new biomolecules, including biomedicines; such design ultimately tests our understanding of the biomolecular world and brings a direct benefit to human health. Additionally, this project will support the development of undergraduate and graduate-level courses on computational biophysics and machine learning at Michigan State University. Finally, this research will facilitate the cross-disciplinary training of the next generation of researchers who are experts in advanced mathematics, computer algorithms, and molecular-level biology.

The objective of the present project is to develop novel approaches based on de Rham-Hodge theory to revolutionize current practice in biomolecular data analysis and modeling. De Rham-Hodge theory is a hallmark of 20th-century mathematics and has had a great impact on modern mathematics, quantum physics, and computer science. The investigators will, for the first time, introduce de Rham-Hodge theory to reduce the structural complexity of biomolecules. Additionally, the research team will propose persistent de Rham-Hodge theory and element-specific de Rham-Hodge theory, also for the first time, to properly encode chemical and biological information in biomolecular data representations. These methods will be carefully integrated with advanced machine learning and deep learning algorithms to reveal biomolecular structure-function relationships. Moreover, the investigators will extensively validate the proposed methods on a variety of datasets, covering protein binding to other proteins, ligands, DNA, and RNA; changes in protein folding stability upon mutation; drug toxicity; solvation; solubility; and partition coefficients. Finally, user-friendly software packages and online servers exploiting parallel and GPU architectures will be developed for researchers who are not formally trained in advanced mathematics or sophisticated machine learning.
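The abstract does not describe the investigators' formulation, which concerns de Rham-Hodge theory itself. As a loose illustration only, the sketch below uses the discrete (combinatorial) analogue of Hodge theory: a structure is modeled as a tiny simplicial complex, the Hodge Laplacians L0 = B1 B1^T and L1 = B1^T B1 + B2 B2^T are assembled from boundary matrices, and the dimension of each Laplacian's kernel (its harmonic space) equals the corresponding Betti number. The example complexes and the helper names `boundary_1`, `boundary_2`, and `harmonic_dims` are hypothetical and chosen for illustration; this is not the project's method.

```python
import numpy as np

# Hypothetical helpers illustrating combinatorial Hodge theory on a tiny
# simplicial complex. Edges are sorted pairs (i < j) and triangles are
# sorted triples (i < j < k). Not the project's actual method.

def boundary_1(vertices, edges):
    """Vertex-by-edge boundary matrix: edge (i, j) maps to j - i."""
    index = {v: n for n, v in enumerate(vertices)}
    B = np.zeros((len(vertices), len(edges)))
    for col, (i, j) in enumerate(edges):
        B[index[i], col] = -1.0
        B[index[j], col] = 1.0
    return B

def boundary_2(edges, triangles):
    """Edge-by-triangle boundary matrix: (i, j, k) maps to (j,k) - (i,k) + (i,j)."""
    index = {e: n for n, e in enumerate(edges)}
    B = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        B[index[(j, k)], col] = 1.0
        B[index[(i, k)], col] = -1.0
        B[index[(i, j)], col] = 1.0
    return B

def harmonic_dims(vertices, edges, triangles):
    """Return (dim ker L0, dim ker L1), i.e. the Betti numbers b0 and b1."""
    B1 = boundary_1(vertices, edges)
    B2 = boundary_2(edges, triangles)
    L0 = B1 @ B1.T                  # graph Laplacian acting on vertex values
    L1 = B1.T @ B1 + B2 @ B2.T      # Hodge Laplacian acting on edge values
    b0 = len(vertices) - np.linalg.matrix_rank(L0)
    b1 = len(edges) - np.linalg.matrix_rank(L1)
    return int(b0), int(b1)

# Sanity check of the chain-complex property B1 @ B2 == 0.
edges = [(0, 1), (0, 2), (1, 2)]
assert np.allclose(boundary_1([0, 1, 2], edges) @ boundary_2(edges, [(0, 1, 2)]), 0)

print(harmonic_dims([0, 1, 2], edges, []))           # hollow triangle: (1, 1)
print(harmonic_dims([0, 1, 2], edges, [(0, 1, 2)]))  # filled triangle: (1, 0)
```

For the hollow triangle the harmonic spaces report one connected component and one loop; filling the triangle removes the loop. Summarizing a structure by a few such topological invariants is one simple way to picture the kind of complexity reduction the abstract refers to.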

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Program Officer: Sylvia Spengler
Institution: Michigan State University, East Lansing, United States