This project studies knowledge transfer oriented data mining (KTDM). Given two data sets, the idea of KTDM is to discover models that are common to both data sets, as well as models that are unique to one data set. These common and unique models provide a tool for leveraging the already-understood properties of one data set to understand the other, probably less understood, data set. This EAGER project concentrates on models in the form of a diversified set of classification trees. The KTDM approach is useful for real-world applications in part because it allows users to narrow down to particular models, guided by existing knowledge from another data set. It will help realize the transfer of knowledge and learning in various domains. The project will support a graduate student and will seek collaboration with experts in the medical domain, which will increase the impact of the project. For more information, please see www.cs.wright.edu/~gdong/projects.html.

Project Report

This project studied knowledge transfer oriented data mining (KTDM). KTDM refers to data mining aimed at helping scientific and medical researchers better understand challenging domains by using mined knowledge about the similarities between those domains and well-understood domains. KTDM can help researchers use similarities shared by multiple domains to transfer their understanding of known domains to challenging domains, to take the research-by-analogy approach using those similarities in general, and to form high-potential research hypotheses on the challenging domain in particular. This project focused on mining decision trees shared by two domains. For two given domains, the project produced algorithms for mining single shared decision trees and for mining small diversified sets of shared decision trees (since a single shared knowledge structure may present only a limited view of similar behaviors across multiple domains). It also produced methods for evaluating the quality of shared decision trees for the purpose of supporting research by analogy. The project conducted experiments on a set of six microarray gene expression datasets for cancers and cancer treatment outcomes to evaluate the developed algorithms and the shared decision tree quality measures. Some of the mined shared decision trees between cancers are available at the project website.
Two scientific articles have been published. One appeared in ACM SIGKDD Explorations and concerns the importance of, and various research issues in, cross-domain similarity mining with applications to supporting research by analogy. The other appeared in the Journal of Bioinformatics and Computational Biology and concerns building accurate decision tree committees for microarray data by using attribute behavior diversity. The second paper confirmed that using attribute usage diversity can lead to highly diversified sets of (shared) decision trees from a classifier-based perspective. The researchers of this project will continue to finish and publish several papers on shared decision tree mining that are in progress. The experience gained from this project also increased the participants' confidence in the usefulness of KTDM.
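
The idea of picking a small diversified set of trees by attribute usage can be illustrated with a minimal sketch. It is not the project's algorithm. It assumes that each candidate tree is summarized by the set of attributes it splits on plus a quality score, that diversity is measured by Jaccard distance between attribute sets, and that selection is greedy. The function names, the threshold, the gene labels, and the scores are all hypothetical and are not project data.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical summary of a candidate tree: (name, attributes used, quality score).
Tree = Tuple[str, FrozenSet[str], float]


def usage_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance between the attribute sets of two trees (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse(trees: List[Tree], k: int, min_distance: float = 0.6) -> List[Tree]:
    """Greedily keep the best-scoring trees that differ enough from those already kept."""
    ranked = sorted(trees, key=lambda t: t[2], reverse=True)
    chosen: List[Tree] = []
    for tree in ranked:
        if len(chosen) == k:
            break
        # Accept a tree only if it is far enough from every tree chosen so far.
        if all(usage_distance(tree[1], c[1]) >= min_distance for c in chosen):
            chosen.append(tree)
    return chosen


# Illustrative candidates only; attribute names and scores are made up.
candidates: List[Tree] = [
    ("T1", frozenset({"g1", "g2", "g3"}), 0.92),
    ("T2", frozenset({"g1", "g2", "g4"}), 0.90),
    ("T3", frozenset({"g5", "g6"}), 0.85),
    ("T4", frozenset({"g3", "g7", "g8"}), 0.80),
]
selected = select_diverse(candidates, k=3)
print([name for name, _, _ in selected])
```

In this toy input, T2 shares two of its three attributes with T1, so it is skipped even though it scores well. That is the trade-off a diversity criterion is meant to make.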

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1044634
Program Officer: Sylvia Spengler
Budget Start: 2010-08-15
Budget End: 2012-07-31
Fiscal Year: 2010
Total Cost: $100,000

Name: Wright State University
City: Dayton
State: OH
Country: United States
Zip Code: 45435