This EAGER award supports research and education involving a new collaboration kindled at the MATDAT18 Datathon event focused on developing artificial intelligence methods to discover new materials or identify specific materials with desired properties for an application. Methods involving computation, materials data, and the tools of data science offer the potential to find or design a material with desired properties much faster and at lower cost than traditional methods used by materials scientists and engineers.

In this project, the research team will develop novel machine learning techniques to mine knowledge from publicly available databases of materials and their properties. The knowledge gained can then guide material selection and design. The team will first apply data science methods to a large community repository of computed data, in which theory and computer simulation are used to calculate the energy needed to form a material from its constituent elements.

All techniques developed in this project will be implemented as software for a range of computing platforms and released as open-source code to the materials science and data science communities through several channels, including GitHub. The project will also give graduate and undergraduate students educational opportunities and first-hand research experience in data analysis for materials science, and its results will be incorporated into appropriate undergraduate and graduate courses. Strong efforts will be made to include women and members of minority groups. Results will be disseminated widely through journal publications and presentations at international conferences.

Technical Abstract

This EAGER award supports research and education involving a new collaboration kindled at the MATDAT18 Datathon event focused on developing deep learning predictors for formation energies and other materials properties. Large databases of computed material properties, such as the Materials Project and AFLOWLIB, developed under the Materials Genome Initiative, contain the properties of tens of thousands of materials. They are primarily used to screen materials for target applications such as photocatalysis and batteries. Such databases can also be used to train deep learning-based predictors of materials properties, whose predictions are expected to be more accurate than those made with traditional machine learning (ML) techniques.
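The abstract does not specify the network architecture or the input representation. As a minimal sketch, assuming a composition-only input (atomic fractions over a hypothetical six-element vocabulary) and a small fully connected network trained by full-batch gradient descent in NumPy, a property regressor of this kind could look like the following. The names `ELEMENTS`, `composition_to_fractions`, and `MLPRegressor` are illustrative, and the training targets are synthetic, not database values.

```python
import numpy as np

# Hypothetical element vocabulary (assumption); a real model would span the periodic table.
ELEMENTS = ["H", "Li", "O", "Na", "Cl", "Fe"]
INDEX = {el: i for i, el in enumerate(ELEMENTS)}

def composition_to_fractions(composition):
    """Map e.g. {"Li": 2, "O": 1} to a vector of atomic fractions over ELEMENTS."""
    vec = np.zeros(len(ELEMENTS))
    for el, count in composition.items():
        vec[INDEX[el]] += count
    return vec / vec.sum()

class MLPRegressor:
    """Hypothetical two-layer network (one ReLU hidden layer) trained on mean squared error.
    A deep model would stack more hidden layers in the same way."""

    def __init__(self, n_in, n_hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def _forward(self, X):
        h = np.maximum(0.0, X @ self.W1 + self.b1)  # ReLU hidden activations
        return h, (h @ self.W2 + self.b2).ravel()

    def predict(self, X):
        return self._forward(X)[1]

    def step(self, X, y):
        """One gradient-descent step on MSE; returns the loss before the update."""
        h, pred = self._forward(X)
        err = pred - y
        loss = float(np.mean(err ** 2))
        d_out = (2.0 / len(y)) * err[:, None]      # dLoss/dpred, shape (n, 1)
        grad_W2 = h.T @ d_out
        grad_b2 = d_out.sum(axis=0)
        d_h = (d_out @ self.W2.T) * (h > 0)        # backpropagate through ReLU
        grad_W1 = X.T @ d_h
        grad_b1 = d_h.sum(axis=0)
        # Update only after all gradients are computed from the current weights
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_b1
        self.W2 -= self.lr * grad_W2
        self.b2 -= self.lr * grad_b2
        return loss

# Usage on synthetic data (NOT real formation energies)
rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(len(ELEMENTS)), size=200)      # random atomic-fraction vectors
y = X @ np.array([-0.5, -1.0, -2.0, -0.8, -1.5, 0.3])    # synthetic target
model = MLPRegressor(n_in=len(ELEMENTS))
losses = [model.step(X, y) for _ in range(500)]
```

Atomic fractions are only one simple composition-based encoding; structure-aware representations are another option, and the choice here is an assumption made to keep the example short.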

Even state-of-the-art conventional ML methods, such as gradient boosting and random forests, have limited capacity (that is, limited ability to learn complex relationships) compared with the multi-layer deep artificial neural networks that deep learning uses to mine large datasets. In this project, the research team aims to develop a deep learning predictor for the formation energy of crystals, and the investigators also propose to develop other relevant combinatorial algorithms for this problem. Formation energy, the energy difference between a crystal and its constituent elements in their elemental form, is one of the most reliable properties available from these databases. The project focuses on fast, highly accurate prediction of the formation energies and stability of materials, exploiting the greater capacity of deep learning systems and other algorithms to learn from big data.
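To make the definition concrete: under a common convention in computed-materials databases, formation energy per atom is the compound's total energy minus the energies its atoms would have in their elemental reference states, divided by the number of atoms. The sketch below assumes that convention; the function name is hypothetical and the numbers in the usage line are made up for arithmetic only.

```python
from typing import Dict

def formation_energy_per_atom(
    total_energy: float,
    composition: Dict[str, int],
    elemental_refs: Dict[str, float],
) -> float:
    """Formation energy per atom (eV/atom) of a compound.

    total_energy: computed total energy of the compound's cell (eV)
    composition: number of atoms of each element in that cell
    elemental_refs: energy per atom (eV/atom) of each element in its reference state
    """
    n_atoms = sum(composition.values())
    if n_atoms == 0:
        raise ValueError("composition must contain at least one atom")
    # Energy the same atoms would have if separated into their elemental reference states
    reference_energy = sum(n * elemental_refs[el] for el, n in composition.items())
    return (total_energy - reference_energy) / n_atoms

# Hypothetical AB compound with made-up energies
print(formation_energy_per_atom(-7.0, {"A": 1, "B": 1}, {"A": -2.0, "B": -3.0}))  # -1.0
```

A negative value means the compound is lower in energy than its separated elements; judging thermodynamic stability fully also requires comparing against competing phases (the convex hull), which this function does not do.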

The project will deliver two main products: a publicly accessible cyberinfrastructure implementing a deep learning system that predicts the formation energies of inorganic materials with accuracy vastly superior to that of predictors built with traditional ML models, and new chemical representations of materials that can be reused to predict other material properties. A key challenge in applying deep learning is the long training time these algorithms require. The research team plans to address this challenge with a variety of algorithmic innovations, including novel parallel training algorithms, and the investigators plan to use several parallel architectures, including CPU clusters and GPUs.
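The abstract does not describe the planned parallel training algorithms. A common baseline such work builds on is synchronous data-parallel training: each worker computes the gradient on its shard of the batch, and the shard gradients are combined before every update. The sketch below assumes a linear model with squared-error loss and uses threads on one machine as stand-ins for cluster nodes or GPUs; the helper names are hypothetical and the data are synthetic.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def mse_gradient(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return (2.0 / len(y)) * X.T @ (X @ w - y)

def data_parallel_gradient(w, X, y, n_workers):
    """Split the batch into shards, compute shard gradients concurrently,
    then combine them (the role an all-reduce plays on a real cluster)."""
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(lambda shard: mse_gradient(w, shard[0], shard[1]),
                              zip(X_shards, y_shards)))
    # Weighting by shard size reproduces the full-batch gradient even for uneven shards
    sizes = np.array([len(s) for s in y_shards], dtype=float)
    return np.average(grads, axis=0, weights=sizes)

# Synchronous data-parallel gradient descent on synthetic, noise-free data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
true_w = rng.normal(size=8)
y = X @ true_w
w = np.zeros(8)
for _ in range(200):
    w -= 0.1 * data_parallel_gradient(w, X, y, n_workers=4)
```

Because the combined gradient equals the full-batch gradient, this scheme changes only where the work is done, not the result; the speedups targeted by novel parallel algorithms come from reducing communication and synchronization costs that this sketch ignores.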

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Materials Research (DMR)
Standard Grant
Program Officer: Daryl W. Hess
University of Connecticut
United States