The Division of Chemistry supports Albert DePrince of the Georgia Institute of Technology as an American Competitiveness in Chemistry Fellow. Dr. DePrince will develop new ab initio computational chemistry methods and software for newer computer architectures, taking advantage of the computing power of graphics processing units (GPUs). The resulting software will be incorporated into open-access software. The PI will collaborate with scientists at Oak Ridge National Laboratory to implement his work on the next-generation supercomputers (the Keeneland and Titan systems) that will be housed there. The ultimate goal of this research is to develop improved methods for computational chemistry. For his plan for broadening participation, Dr. DePrince will develop IntercalationDiscovery@home to allow the public to engage in computational chemistry experiments that help screen potential anti-cancer drugs. In addition, Dr. DePrince will develop computational chemistry lab experiments that high-school chemistry students can use to illustrate chemistry concepts (e.g., VSEPR theory).

Research like that of Dr. DePrince is aimed at developing new methods for computing the chemical and physical properties of matter. Work like this enables scientists to do experiments "in silico" to help guide scientific research. The results of Dr. DePrince's project will be disseminated through a widely used open-access software package and incorporated into the software of next-generation supercomputers that will be housed at Oak Ridge National Laboratory, a Department of Energy laboratory. Dr. DePrince's efforts at broadening participation are aimed at giving a broad population of students and the public exposure to an emerging area of the chemical sciences.

Project Report

The coupled-cluster through perturbative triples [CCSD(T)] method is generally considered to be the "gold standard" in quantum chemistry because, in many cases, it can describe reaction energies, intermolecular interactions, and molecular properties with predictive accuracy. However, the steep computational scaling of the method precludes its application to large molecular systems. Our goal is to develop an efficient implementation of the CCSD(T) method for use in computing environments consisting of a many-core CPU and at least one graphics processing unit (GPU). The majority of the computational cost of coupled-cluster methods takes the form of tensor contractions, which can be implemented on a computer as matrix multiplications using efficient linear algebra libraries. The cuBLAS (GPU) implementation of matrix multiplication is quite efficient, so developing a coupled-cluster code for GPUs may seem straightforward. The most naive approach would involve copying data to the device (GPU), performing a tensor contraction, and copying the result back to the host (CPU). This strategy, however, is limited by the high cost of data transfers between the host and the device and by the limited global memory available on the device. Hence, our strategy in developing a GPU-accelerated CCSD(T) algorithm is to reduce the amount of data required for a typical CCSD(T) computation. To this end, we have developed an efficient implementation of the CCSD(T) method that exploits density fitting (DF) / Cholesky decomposition (CD) and frozen natural orbital (FNO) approximations [J. Chem. Theory Comput. 9, 293 (2013); J. Chem. Theory Comput. 9, 2687 (2013)]. In general, we found that DF/CD errors in coupled-cluster theory are negligible, and that the DF/CD approximations can be combined with FNO approximations without sacrificing the quality of the method.
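As an illustration of the two ideas above (a tensor contraction recast as one matrix multiplication, and three-index DF/CD factors standing in for four-index integrals), here is a minimal NumPy sketch. It is not the Psi4 implementation: the function names, the index ordering, and the random test data are assumptions made for illustration, and the contraction shown is a generic ladder-type term r[i,j,a,b] = sum over c,d of v[a,b,c,d] * t2[i,j,c,d].

```python
import numpy as np


def build_integrals_from_df(b):
    """Assemble v[a,b,c,d] = sum_Q b[Q,a,c] * b[Q,b,d] from three-index factors.

    `b` has shape (n_aux, n_virt, n_virt); the layout is an assumption.
    """
    return np.einsum("qac,qbd->abcd", b, b)


def ladder_as_gemm(t2, v):
    """Evaluate r[i,j,a,b] = sum_cd v[a,b,c,d] * t2[i,j,c,d] with one GEMM.

    The pair indices (i,j), (a,b), and (c,d) are flattened so that the
    four-index contraction becomes a plain matrix product.
    """
    n_occ, n_virt = t2.shape[0], t2.shape[2]
    t_mat = t2.reshape(n_occ * n_occ, n_virt * n_virt)    # rows (ij), cols (cd)
    v_mat = v.reshape(n_virt * n_virt, n_virt * n_virt)   # rows (ab), cols (cd)
    r_mat = t_mat @ v_mat.T                               # rows (ij), cols (ab)
    return r_mat.reshape(n_occ, n_occ, n_virt, n_virt)


def stored_elements(n_virt, n_aux):
    """Element counts for the four-index tensor and for its three-index factor."""
    return n_virt ** 4, n_aux * n_virt ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_occ, n_virt, n_aux = 3, 5, 7            # toy dimensions, chosen arbitrarily
    b = rng.standard_normal((n_aux, n_virt, n_virt))
    t2 = rng.standard_normal((n_occ, n_occ, n_virt, n_virt))

    v = build_integrals_from_df(b)
    r_gemm = ladder_as_gemm(t2, v)
    r_ref = np.einsum("abcd,ijcd->ijab", v, t2)  # direct contraction for comparison
    print("GEMM matches direct contraction:", np.allclose(r_gemm, r_ref))
    print("four-index vs. three-index storage:", stored_elements(n_virt, n_aux))
```

On a GPU, the `@` product would be the step handed to a library GEMM routine; keeping only the three-index factor `b` is what shrinks the amount of data that has to cross the host-device boundary.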
The algorithm we developed is competitive with commercial implementations of CCSD(T), and in some cases, the DF/CD and FNO approximations lead to significant computational savings. For example, this algorithm facilitated a definitive study of the contribution of three-body dispersion effects to the lattice energy of crystalline benzene [J. Chem. Phys. 140, 121104 (2014)]. We have made these codes available to the public in the free and open-source Psi4 electronic structure package.

FNO and DF/CD approximations can drastically reduce the number of coupled-cluster amplitudes and electron-repulsion integrals required for CCSD(T) computations. This reduction mitigates the cost of data transfers between the host and device in a GPU-accelerated CCSD(T) algorithm. We developed an efficient GPU-accelerated FNO-DF/CD-CCSD algorithm as a plugin to the Psi4 electronic structure package. The algorithm maximizes efficiency by using CPU and GPU resources simultaneously. The distribution of work between the CPU and GPU is straightforward: the most computationally demanding diagram (the double-particle ladder diagram) is evaluated on the GPU, while all other diagrams are evaluated on the CPU. If the GPU finishes its work before the CPU, the GPU can steal tasks from the list of diagrams designated as CPU work. We have generalized our algorithm to use multiple GPUs.

We assessed the performance of this algorithm for systems with up to 822 basis functions (a uracil dimer described by the aug-cc-pVTZ basis set, using a conservative 10^-5 FNO threshold). When using either one NVIDIA Kepler K20c GPU or two NVIDIA Fermi C2070 GPUs, we observe a 2.5x acceleration over our optimized CPU implementation running on 6 cores of an Intel Core i7-3930K CPU [Mol. Phys. 112, 844 (2014)]. This implementation of GPU-accelerated FNO-DF/CD-CCSD(T) is freely available for download from GitHub (https://github.com/edeprince3/gpu_dfcc).
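The CPU/GPU work-sharing scheme described above can be sketched with two threads and a shared task list. This is only a schematic under stated assumptions, not the gpu_dfcc code: the worker labels, the diagram names, and the `evaluate` callback are hypothetical placeholders, and Python threads stand in for the real host and device execution streams.

```python
import queue
import threading


def run_hybrid(cpu_diagrams, gpu_diagrams, evaluate):
    """Evaluate every diagram once, letting the GPU worker steal CPU work.

    `evaluate(worker, diagram)` is a hypothetical callback that performs the
    actual contraction; here it only needs to be callable.
    Returns a list of (worker, diagram) pairs in completion order.
    """
    cpu_work = queue.Queue()          # thread-safe list of CPU-designated tasks
    for diagram in cpu_diagrams:
        cpu_work.put(diagram)

    completed = []
    lock = threading.Lock()

    def run_task(worker, diagram):
        evaluate(worker, diagram)
        with lock:
            completed.append((worker, diagram))

    def drain_cpu_list(worker):
        # Pull tasks until the shared list is empty; each task is taken once.
        while True:
            try:
                diagram = cpu_work.get_nowait()
            except queue.Empty:
                return
            run_task(worker, diagram)

    def gpu_worker():
        for diagram in gpu_diagrams:  # designated GPU work comes first
            run_task("gpu", diagram)
        drain_cpu_list("gpu")         # then steal whatever the CPU has not started

    def cpu_worker():
        drain_cpu_list("cpu")

    threads = [threading.Thread(target=gpu_worker),
               threading.Thread(target=cpu_worker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


if __name__ == "__main__":
    # Placeholder diagram labels; the real algorithm has its own task list.
    log = run_hybrid(
        cpu_diagrams=["diagram_a", "diagram_b", "diagram_c"],
        gpu_diagrams=["particle_particle_ladder"],
        evaluate=lambda worker, diagram: None,
    )
    for worker, diagram in log:
        print(worker, diagram)
```

Which worker ends up evaluating each CPU-designated diagram depends on timing, so the printed assignment can differ from run to run; the only guarantees in this sketch are that the ladder term runs on the GPU worker and that every diagram is evaluated exactly once.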

Agency: National Science Foundation (NSF)
Institute: Division of Chemistry (CHE)
Type: Standard Grant (Standard)
Application #: 1137288
Program Officer: Katharine Covert
Project Start:
Project End:
Budget Start: 2011-10-01
Budget End: 2014-09-30
Support Year:
Fiscal Year: 2011
Total Cost: $200,000
Indirect Cost:

Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332