Many high-impact data mining applications exhibit the co-existence of multiple types of heterogeneity, such as different classification tasks, different data sources, and different labeling oracles. Such settings arise in a broad range of domains, including fraud detection, manufacturing, transportation, and healthcare. This project aims to answer two fundamental questions: (Q1) How can multiple types of heterogeneity be modeled jointly? (Q2) How can the resulting model's generalization performance be characterized theoretically? The project is expected to advance the algorithmic and theoretical foundations of data mining in two ways. Current state-of-the-art techniques are limited to a single type of heterogeneity, and the existing literature on modeling dual heterogeneity is sparse. The resulting algorithms and theories will be incorporated into new curriculum development and multiple K-12 outreach activities. They could also benefit a variety of real-world applications in which multiple types of heterogeneity co-exist. Close collaboration with industrial partners promises timely and measurable impact in two application domains: security and manufacturing.

In particular, this project strives to develop a unified, overarching data mining framework with three complementary research thrusts. The first thrust creates a suite of effective and efficient algorithms for modeling the co-existence of multiple types of heterogeneity. Its key idea is to introduce a joint regularizer on the parameter space that captures the interplay among different types of heterogeneity. The second thrust theoretically characterizes the model's generalization performance, in particular how it is affected by (1) the co-existence of multiple types of heterogeneity and (2) violations of the underlying assumptions behind each type of heterogeneity. The third thrust systematically evaluates the algorithms and theories from the first two thrusts on real applications. More details can be found at: http://faculty.engineering.asu.edu/jingruihe/lab-2/projects/heterogeneous-learning/.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1947203
Program Officer: Sylvia Spengler
Budget Start: 2019-08-15
Budget End: 2022-01-31
Fiscal Year: 2019
Total Cost: $415,836
Organization: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820