Many high-impact data mining applications exhibit multiple co-existing types of heterogeneity, such as different classification tasks, different data sources, and different labeling oracles. Such applications arise in a broad range of domains, including fraud detection, manufacturing, transportation, and healthcare. This project aims to answer two fundamental questions: (Q1) How can multiple types of heterogeneity be modeled jointly? (Q2) How can the generalization performance of the resulting models be characterized theoretically? The project is expected to advance the algorithmic and theoretical foundations of data mining beyond state-of-the-art techniques, which are limited to a single type of heterogeneity, and to extend the sparse literature on modeling dual heterogeneity. The resulting algorithms and theories will be incorporated into new curriculum development and multiple K-12 outreach activities, and they could benefit a variety of real applications in which multiple types of heterogeneity co-exist. Close collaboration with industrial partners promises timely and measurable impact in two application domains: security and manufacturing.
Specifically, this project aims to develop a unified data mining framework organized around three complementary research thrusts. The first thrust develops a suite of effective and efficient algorithms for modeling the co-existence of multiple types of heterogeneity. The key idea is to introduce a joint regularizer on the parameter space that models the interplay among the different types of heterogeneity. The second thrust theoretically characterizes the generalization performance of the models, in particular how it is affected by (1) the co-existence of multiple types of heterogeneity and (2) violations of the assumptions underlying each type of heterogeneity. The third thrust systematically evaluates the algorithms and theories from the first two thrusts on real applications. More details can be found at: http://faculty.engineering.asu.edu/jingruihe/lab-2/projects/heterogeneous-learning/.
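As one hypothetical illustration of a joint regularizer for dual heterogeneity (several tasks, each observed through several data sources or views), the sketch below combines two standard penalties on the parameters. Mean-regularized multi-task learning pulls each task's weights toward the across-task average within a view, and co-regularization penalizes disagreement between the views' predictions on unlabeled examples. The function names `joint_regularizer` and `objective`, the linear per-view models, the squared loss, and the weights `lam_task` and `lam_view` are all assumptions made for illustration. This is not the project's actual formulation, and it does not model labeling-oracle heterogeneity.

```python
from itertools import combinations


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def joint_regularizer(W, unlabeled, lam_task, lam_view):
    """Hypothetical joint regularizer for T tasks x V views.

    W[t][v]: weight vector of task t on view v (views may differ in dimension).
    unlabeled[v][i]: view-v features of unlabeled example i (same examples in every view).
    """
    n_tasks, n_views = len(W), len(W[0])

    # Task heterogeneity: within each view, pull every task's weights toward
    # the across-task mean (mean-regularized multi-task learning).
    task_term = 0.0
    for v in range(n_views):
        dim = len(W[0][v])
        mean_v = [sum(W[t][v][j] for t in range(n_tasks)) / n_tasks for j in range(dim)]
        task_term += sum(sq_dist(W[t][v], mean_v) for t in range(n_tasks))

    # View heterogeneity: within each task, penalize disagreement between the
    # predictions of different views on the same unlabeled examples (co-regularization).
    view_term = 0.0
    for t in range(n_tasks):
        for v1, v2 in combinations(range(n_views), 2):
            for x1, x2 in zip(unlabeled[v1], unlabeled[v2]):
                view_term += (dot(W[t][v1], x1) - dot(W[t][v2], x2)) ** 2

    return lam_task * task_term + lam_view * view_term


def objective(W, labeled, unlabeled, lam_task, lam_view):
    """Squared loss on labeled data plus the hypothetical joint regularizer.

    labeled[t] is a list of (views, y) pairs for task t, where views[v] is the
    view-v feature vector; the prediction averages the per-view scores.
    """
    loss = 0.0
    for t, examples in enumerate(labeled):
        for views, y in examples:
            pred = sum(dot(W[t][v], views[v]) for v in range(len(views))) / len(views)
            loss += (pred - y) ** 2
    return loss + joint_regularizer(W, unlabeled, lam_task, lam_view)


# Usage: 2 tasks x 2 views with 1-D weights, and one unlabeled example seen in both views.
W = [[[1.0], [2.0]], [[3.0], [2.0]]]
U = [[[1.0]], [[1.0]]]
print(joint_regularizer(W, U, lam_task=0.5, lam_view=0.25))  # 1.5
```

In this sketch, the parameters `W` would be fit by minimizing `objective` with a gradient-based solver. The two penalty weights trade off how strongly tasks share parameters against how strongly the views are forced to agree, which is one simple way a single regularizer can couple two types of heterogeneity.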