CDS&E: Fast Search of Growing High-Dimensional Big Data to Enable Accurate Semiclassical Molecular Dynamics Studies of Large Molecular Systems

Zhuang, Yu

Abstract

Quantum effects are inherent factors in material properties and chemical processes. By capturing quantum effects with good quantitative accuracy, ab initio semiclassical molecular dynamics simulation is a generally applicable investigative tool for a broad range of chemical and materials science studies, including studies of pollutant effects on lung health, enzyme catalysis, ozone depletion, spacecraft surface coatings, and solar cells, as well as many other studies that promise to advance national health and pharmaceutical sciences, material design for national defense, and energy and environmental protection research. However, the computational cost of semiclassical dynamics simulations is enormously high, making such simulations highly challenging, and even infeasible in many cases, for large molecular systems. This project proposes methods for reducing computational cost while maintaining simulation accuracy, which will extend the reach of semiclassical dynamics to a broader range of studies of national and scientific importance.

The enormous computational cost of ab initio semiclassical molecular dynamics simulation arises in calculating ab initio Hessians from quantum mechanical electronic structure theories. Hessian modeling that uses the training data closest in time, drawn from a set of saved ab initio data, has been successful in reducing the cost of Hessian calculations while maintaining simulation accuracy. Opportunities exist for further cost reduction by using the training data closest in spatial distance, which offers more chances for Hessian modeling to replace ab initio Hessian calculations. Because new ab initio data arrive frequently, the ab initio dataset is constantly growing. Searching a frequently updated, growing dataset poses a challenge: the algorithms must not only achieve high search efficiency but also reorganize the dataset efficiently as new data are inserted. Existing search algorithms offer good search efficiency but weaker data-organizing efficiency, since they were designed for static or infrequently updated datasets. This project develops search algorithms that will be the first to leverage the growth process of datasets to deliver high efficiency in both searching and data organizing. Hessian modeling that uses the spatially closest training data returned by the new search algorithms has the potential to reduce computational cost further, promising to speed up dynamics simulations and to enable simulations of larger molecular systems and/or the use of higher-accuracy electronic structure theories that capture finer details of the molecular systems.
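
To make the distinction between time-nearest and space-nearest training data concrete, the following is a minimal Python sketch, not the project's method: the abstract does not describe the actual algorithms. The class name `GrowingPointStore` and its methods are hypothetical, each stored point stands in for a molecular geometry at which an ab initio Hessian was saved, and the spatial search is a plain brute-force scan with NumPy. Insertion into this baseline is cheap, but every query scans all stored points; that cost is the kind of trade-off the abstract describes for growing datasets.

```python
import numpy as np


class GrowingPointStore:
    """Hypothetical append-only store of geometries (illustration only).

    Baseline sketch: brute-force search over a buffer that doubles
    when full. It is not the search algorithm developed in the project.
    """

    def __init__(self, dim, capacity=16):
        self._points = np.empty((capacity, dim), dtype=float)
        self._size = 0

    def __len__(self):
        return self._size

    def insert(self, point):
        """Append one point and return its index (its insertion order)."""
        point = np.asarray(point, dtype=float)
        dim = self._points.shape[1]
        if point.shape != (dim,):
            raise ValueError(f"expected a point of shape ({dim},)")
        if self._size == self._points.shape[0]:
            # Double the buffer so that appends are amortized O(1).
            grown = np.empty((2 * self._points.shape[0], dim), dtype=float)
            grown[: self._size] = self._points[: self._size]
            self._points = grown
        self._points[self._size] = point
        self._size += 1
        return self._size - 1

    def nearest_in_time(self, k):
        """Indices of the k most recently inserted points, newest first."""
        k = min(k, self._size)
        return np.arange(self._size - k, self._size)[::-1]

    def nearest_in_space(self, query, k):
        """Indices of the k points closest to `query`, nearest first.

        Brute force: O(n * dim) work per query over all n stored points.
        """
        k = min(k, self._size)
        if k == 0:
            return np.empty(0, dtype=int)
        diffs = self._points[: self._size] - np.asarray(query, dtype=float)
        dist2 = np.einsum("ij,ij->i", diffs, diffs)  # squared distances
        idx = np.argpartition(dist2, k - 1)[:k]      # unordered top-k
        return idx[np.argsort(dist2[idx])]           # order by distance


store = GrowingPointStore(dim=2, capacity=2)
for p in [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0], [9.0, 9.0]]:
    store.insert(p)

print(store.nearest_in_time(2))               # the two latest insertions
print(store.nearest_in_space([0.2, 0.2], 2))  # the two closest geometries
```

In this toy trajectory the two most recent points lie far from the query, while the two spatially closest points were inserted earlier, which illustrates why spatial selection can find usable training data that time-based selection misses.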

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 2103563
Program Officer: Tevfik Kosar

Budget Start: 2021-06-01
Budget End: 2024-05-31
Fiscal Year: 2021
Total Cost: $278,348

Texas Tech University, Lubbock, TX, United States
