Successes in accelerated materials design, made possible in part through the Materials Genome Initiative, have shifted the bottleneck in materials development towards the synthesis of novel compounds. Existing databases do not contain the synthesis recipes needed to make compounds that computational design has identified as having promising properties. As a result, much of the momentum and efficiency gained in the design process is gated by trial-and-error synthesis techniques. This delay in going from promising materials concept to validation, optimization, and scale-up is a significant burden on the commercialization of novel materials. This Designing Materials to Revolutionize and Engineer our Future (DMREF) research will build predictive tools for synthesis so that synthesis routes for chemical compounds with interesting properties can be developed in a matter of days, rather than months or years. The research activities include using natural language processing techniques to automatically extract information from the published literature and patents on how solid inorganic materials have been made in the past. After this text extraction, the project will generate a "cookbook" of materials synthesis recipes. This cookbook can be mined through machine learning approaches for suggestions on how to make new materials by looking for patterns and similarities among previously made materials. One project outcome will be a data set of materials synthesis methods, to be made available to the community. Another key project outcome is the use of machine learning to predict novel or optimized recipes for materials. These predictions will be accompanied by experimental confirmation for zeolites, a class of materials used in catalysis. The major objective of the outreach component of this research is to enable the use of the database by non-experts. This will be accomplished through both online tutorials and in-person workshops. The online tutorials will teach the basic knowledge required to use the online tools and functionalities, while the workshops will be addressed to students and researchers who want to make use of the database itself.

The approach to automatic extraction of information from the literature will be semi-supervised from a machine learning perspective. First, unsupervised methods will be used, including word embeddings that capture the context of words within a scientific corpus. Then downstream supervised methods will be used to classify words by their type and their relationship to other words. This forms the basis of the recipe database. The extracted information will then be mined using machine learning tools from the materials informatics community. Because the recipe classification (described subsequently) draws on expertise from the NLP perspective and the target material classification draws on expertise from the materials perspective, there is significant leverage to be had from this interdisciplinary approach, a partnership not previously pursued to further materials design. This approach builds on established synthesis knowledge and combines it with modern data extraction, materials informatics, text mining, and machine learning techniques, and with the availability of high-throughput ab initio thermochemical data. The integration of these different fields will provide a direct route towards more rational design of synthesis methods and thereby significantly accelerate the deployment and testing of new materials concepts.
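
As an illustration only, the two-stage idea described above (unsupervised context vectors, followed by supervised classification of words by type) can be sketched in Python. Everything in this sketch is an assumption made for the example: the toy sentences, the seed labels, the "operation" and "material" word types, and the function names are hypothetical and do not come from the project. Simple co-occurrence counts stand in for learned word embeddings, and a nearest-centroid rule stands in for the downstream supervised model.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus standing in for synthesis paragraphs (invented text).
CORPUS = [
    "the powder was heated at 900 C for 12 h",
    "the gel was heated at 180 C for 48 h",
    "the mixture was calcined at 550 C for 6 h",
    "the precursor was calcined at 700 C for 4 h",
    "the powder was dried at 100 C for 2 h",
    "the gel was stirred at 25 C for 1 h",
]

# Hypothetical seed labels: the small supervised signal.
SEEDS = {
    "heated": "operation",
    "calcined": "operation",
    "powder": "material",
    "gel": "material",
}


def embed(corpus, window=2):
    """Unsupervised step: represent each word by counts of its neighbours."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(vectors, seeds):
    """Supervised step: sum the context vectors of the labelled words per type."""
    centroids = defaultdict(Counter)
    for word, label in seeds.items():
        centroids[label].update(vectors[word])
    return centroids


def classify(word, vectors, centroids):
    """Assign the word type whose centroid is most similar to the word's vector."""
    return max(centroids, key=lambda label: cosine(vectors[word], centroids[label]))


vectors = embed(CORPUS)
centroids = train_centroids(vectors, SEEDS)
for unlabeled in ("dried", "mixture"):
    print(unlabeled, classify(unlabeled, vectors, centroids))
```

In this toy setting the unlabeled words are typed purely from the contexts they share with the seed words, which is the sense in which the pipeline is semi-supervised. A real system would replace both stand-ins with trained models and a much larger corpus.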

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Materials Research (DMR)
Type: Standard Grant (Standard)
Application #: 1922372
Program Officer: Peter Anderson
Budget Start: 2019-10-01
Budget End: 2023-09-30
Fiscal Year: 2019
Total Cost: $560,000
Name: University of California Berkeley
City: Berkeley
State: CA
Country: United States
Zip Code: 94710