This proposal supports novel computer science research that provides foundations for new technologies. The research objective is to design new robust data mining and machine learning algorithms that address the computational challenges of mining complex materials genome data. The Materials Genome Initiative was launched by the U.S. government to discover, manufacture, and deploy advanced materials faster and at lower cost, and it offers great opportunities to address challenges in clean energy, national security, and human welfare. However, the unprecedented scale and complexity of materials genome data make computation the bottleneck for comprehensive analysis. There is a critical need for new data mining and machine learning strategies to bridge this gap and facilitate the discovery of new materials. To solve the key problems in mining such comprehensive, heterogeneous materials genome data, the PIs propose to develop a novel robust data mining framework and to explore ways to integrate features from multiple data sources. The PIs will make the developed computational methods and tools available to the public online. These methods and tools are expected to impact other materials genome and biochemistry research and to enable investigators working on new material design to effectively test performance prediction hypotheses. The proposed algorithms and tools are also expected to help knowledge extraction in broader scientific domains with massive, high-dimensional, and heterogeneous data sets. This project will facilitate the development of novel educational tools to enhance several current courses.

The PIs propose to develop a novel robust data mining framework that addresses the following research tasks. First, the PIs will develop new computational tools to automate materials genome data processing, including missing-value imputation by a new robust rank-k matrix completion method, a robust tensor-factorization-based feature extraction approach, and informative nanoparticle selection using a robust active learning model. Second, the PIs will investigate a new sparse multi-task multi-view learning model that integrates heterogeneous material characterizations to predict catalytic capabilities and their associations with theoretical modeling measurements. Third, to predict the catalytic capabilities of newly synthesized nanoparticles, the PIs will design novel robust semi-supervised learning models by investigating elastic embedding, adaptive loss, L1-norm graph, and directed graph models. The proposed sparse multi-view feature learning and robust semi-supervised learning models meet the critical needs of large-scale data analysis and integration. Such unique capabilities will enable new computational applications in a large number of research areas. The project thus advances and extends the relationship between engineering innovation and computational analysis.
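
To make the first task more concrete, the following is a minimal sketch of missing-value imputation by rank-k matrix completion. It is not the PIs' robust method, which the abstract does not specify; it is a plain, non-robust baseline that alternates between a truncated SVD and re-inserting the observed entries. The function name `impute_rank_k`, its parameters, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def impute_rank_k(X, mask, k, n_iters=200, tol=1e-8):
    """Fill unobserved entries of X with a rank-k estimate (illustrative baseline).

    X    : 2-D array; values where mask is False are ignored (may be NaN).
    mask : boolean array, True where the entry of X is observed.
    k    : target rank of the low-rank approximation.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Initialize the missing entries with the mean of the observed entries.
    filled = np.where(mask, X, X[mask].mean())

    for _ in range(n_iters):
        # Best rank-k approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k]

        # Keep observed entries fixed; update only the missing ones.
        new_filled = np.where(mask, X, low_rank)

        # Stop once the relative change between iterations is negligible.
        change = np.linalg.norm(new_filled - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = new_filled
        if change < tol:
            break

    return filled
```

In a hypothetical use, `X` would be a nanoparticle-by-measurement matrix and `mask` would mark which measurements were actually recorded. A robust variant, as the proposal describes, would presumably replace the squared-error fit implied by the SVD with a loss that is less sensitive to outliers.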

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1423056
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2014-08-15
Budget End
2018-07-31
Support Year
Fiscal Year
2014
Total Cost
$250,000
Indirect Cost
Name
University of Texas at Arlington
Department
Type
DUNS #
City
Arlington
State
TX
Country
United States
Zip Code
76019