Adrian Roitberg of the University of Florida and Olexandr Isayev of the University of North Carolina at Chapel Hill are supported by an award from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry. Empirical potentials---also known as force fields---play an essential role in simulating atomic-scale interactions between molecules. They are used in the computational design of materials and pharmaceuticals. However, current potentials have been designed to be fast or accurate, but rarely both. This presents a critical bottleneck for the next phase of predictive chemical computer models. In this project, Professors Roitberg and Isayev are leveraging state-of-the-art artificial intelligence (AI) to "learn" potentials from ultra-large datasets of molecular energies and chemical reactions. The project is creating a new force field, ANI, that is accurate and fast, while also applicable to a broad range of systems in chemistry. This research has the potential to benefit materials design, renewable energy research, and drug design. The project is first step toward the use of artificial intelligence techniques to create new materials and molecules beyond what the human imagination can do alone. The research team is engaged in outreach through workshops on molecular simulations, "Talk science to me" science for the general public, and the involvement of high school students from the North Carolina School of Science and Math (NCSSM) in the research.
The objective of this project is to develop a chemically-accurate, extensible, and universal neural network potential, ANI, for use in "in silico" organic chemistry experimentation. The range of possible applications for ANI is very broad, from conformational searches to chemical reactions and ligand binding. Through intelligent sampling of new regions of chemical space, the researchers are expanding use cases for ANI to include arbitrary systems containing H, C, N O, F, S, P, Cl and Br atoms. The new design strategy is based on the ANAKIN-ME method, used in implementing the earlier ANI-1 potential. To train ANI-1, a database of wB97x/6-31G* DFT energies for 22 million structural conformations from 60,000 distinct organic molecules was computed through exhaustive, stochastic sampling of conformational and chemical space. Through rigorous benchmarks for organic molecules, biomolecules, and peptides, ANI-1 predicted total and relative energies with RMS errors under 1 kcal/mol, when compared to DFT reference values. The enhancements being made to ANAKIN-ME are aimed at improving computational efficiency, expanding the range of systems that can be simulated, and achieving < 1 kcal/mol RMS error in comparison to high quality CCSD(T)/CBS quantum chemical energies. These enhancements include reducing the required dataset size for a given set of atom types, to enable inclusion of additional elements and chemistries; expanding training datasets to include data on atomic charges and forces, in addition to energies, and data for charged molecules; and implementing "query-by-committee" active learning approaches to facilitate learning of addition, substitution, and elimination chemical reactions by ANI. The ANI potential is being disseminated through user-friendly open access mechanisms. The implementation of the ANI potential takes advantage of graphics processing unit (GPU) acceleration to run on GPU-enabled workstations and parallel supercomputers. The ANI software library has a simple Python API and is being integrated with popular molecular modeling and simulation packages such as AMBER, OpenMM and Avogadro.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.