Data mining, also known as Knowledge Discovery in Databases (KDD), is a procedure for extracting previously unknown and potentially useful information or patterns from very large data sets. KDD is usually a multiphase process involving steps such as data preparation, data preprocessing, feature selection, rule induction, knowledge evaluation, and deployment. Many novel data mining and learning algorithms have been developed at a vigorous pace, but under rather ad hoc and vague concepts. These algorithms are, in most cases, the individual creations of different researchers and share little common methodological or foundational framework. In other words, the great majority of work in data mining has focused on algorithm development while neglecting fundamental theoretical issues concerning data, inter-data relationships, and the quality of the implicit information hidden in the data or in data redundancies. As a result, it is not easy to fully understand and evaluate how the individual phases influence one another or how each phase affects the knowledge discovery process as a whole. Further development and breakthroughs in data mining and learning algorithms require a deep examination of their foundations. The central goal of the proposed research is to develop a unified rough set based data mining framework for exploring fundamental issues of data mining and learning algorithms. It aims to present the analytical capabilities of rough set methodology in the context of data mining methodologies, techniques, and applications, and to provide a unified framework that helps to better understand the whole KDD process.

Intellectual merit: Rough set theory is particularly suited to reasoning about imprecise or incomplete data and to discovering relationships in the data. The simplicity and mathematical clarity of rough set theory make it attractive to both theoreticians and application-oriented researchers. Its main advantage is that it does not require any preliminary or additional information about the data, such as probability distributions in statistics, basic probability assignments in Dempster-Shafer theory, or membership values in fuzzy set theory. Rough set theory constitutes a sound basis for KDD and can be used in different phases of the KDD process. In particular, its formal techniques lead to many novel and promising methods and algorithms for the discovery, analysis, and characterization of functional or partial functional dependencies among attributes, as well as for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules), all of which are fundamental issues of the KDD process. Rough set theory represents an innovative approach that can lead to new learning algorithms and to novel uses of data mining techniques.
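The core rough set notions referred to above can be illustrated with a minimal sketch. It computes the lower and upper approximations of a target set and the degree of dependency of a decision attribute on a set of condition attributes, using only the data itself. The toy decision table, its attribute names, and the function names are hypothetical and chosen purely for illustration; they are not taken from the proposed framework.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set target."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def dependency_degree(rows, cond_attrs, decision_attr):
    """Fraction of rows in the positive region of the decision attribute."""
    positive = set()
    for decision_block in partition(rows, [decision_attr]):
        lower, _ = approximations(rows, cond_attrs, decision_block)
        positive |= lower
    return len(positive) / len(rows)


# Hypothetical toy decision table (illustration only).
rows = [
    {"headache": "yes", "temp": "high",   "flu": "yes"},
    {"headache": "yes", "temp": "high",   "flu": "no"},
    {"headache": "no",  "temp": "high",   "flu": "yes"},
    {"headache": "no",  "temp": "normal", "flu": "no"},
    {"headache": "yes", "temp": "normal", "flu": "no"},
]

flu_yes = {i for i, r in enumerate(rows) if r["flu"] == "yes"}
lower, upper = approximations(rows, ["headache", "temp"], flu_yes)
gamma = dependency_degree(rows, ["headache", "temp"], "flu")
```

In this table, rows 0 and 1 agree on both condition attributes but disagree on the decision, so they fall in the boundary region (upper minus lower approximation). A dependency degree below 1 therefore indicates that the decision is only partially determined by the chosen condition attributes.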

Broader impacts: The proposed collaborative project is interdisciplinary in nature. It will synthesize often-disparate work in data mining, rough set theory, and high performance computing. The PIs' strong experience in multidisciplinary research collaboration will help build widespread awareness of the proposed research, and extend its impact, across the rough set, data mining, and high performance computing communities. The project will design and develop a wide range of novel data mining algorithms and methods, including data reduction, rule induction, and classification ensembles, within one unified framework to better understand the whole KDD process. These algorithms and methods will significantly extend the application scope of data mining techniques and rough set theory, and will improve the understanding of the issues involved in designing efficient and innovative data mining and learning algorithms. The proposed research will be tightly integrated with teaching activities: the research results will be developed into undergraduate and graduate courses and research projects. Part of this approach is the development of new cross-disciplinary courses that bring together computer science and mathematics to teach the principles and methods underlying the theoretical foundations of data mining and rough set theory. This integration will help train students in rough set theory, in the design and implementation of novel data mining methods and algorithms, and in high performance computing. The active participation of students will give them significant exposure to the latest research in data mining.

Budget Start: 2005-07-15
Budget End: 2008-12-31
Fiscal Year: 2005
Total Cost: $102,230
Name: Drexel University
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104