Current mining technology typically applies to centrally stored data (i.e., data held in a single repository under central administration). However, real-life datasets are often decentralized (i.e., they consist of several tables, perhaps obtained via normalization or partitioning/allocation, stored in several repositories). The goal of this research project is to develop mining techniques for decentralized data. The key idea is that, in contrast to traditional techniques (where the data is first joined to form a single table), the decentralized approach concurrently generates partial results on the separate tables and then uses the foreign key relationships to merge these results. A similar approach is examined for classification in decentralized datasets; the techniques chosen are those most amenable to decentralization or those indicated by the applications. Efficiency analysis is used to assess the techniques, which are empirically validated on available synthetic and real datasets. The effect of different database design choices on the decentralized mining algorithms is also considered. Systematic techniques are developed that use the details of the database design (e.g., normalization and partitioning/allocation information), together with the catalog statistics, to optimize for efficient execution. Furthermore, the techniques can be applied to mining distributed relational metadata for public information repositories; e.g., they can support simple web searching as well as the programming of advanced applications, such as information mining, for the datasets referenced, initially for students and thereafter for users of any Web-based datasets. This research also benefits educational activities and provides research experience for the graduate students involved in the project.
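The key idea can be illustrated with a minimal sketch. The abstract does not specify the project's algorithms, so this is only an assumed example: it counts the support of an itemset spanning two tables by computing a partial result on each table and merging the results through the foreign key, then compares against the traditional join-first count. The tables `customers` and `orders`, their contents, and all function names are hypothetical, and the sketch assumes the two tables use disjoint item names.

```python
from collections import Counter

# Hypothetical dimension table: primary key -> attribute values.
customers = {
    1: {"region=north", "tier=gold"},
    2: {"region=south", "tier=gold"},
    3: {"region=north", "tier=basic"},
}

# Hypothetical fact table: (foreign key into customers, items in the row).
orders = [
    (1, {"milk", "bread"}),
    (1, {"milk"}),
    (2, {"bread"}),
    (3, {"milk", "bread"}),
    (3, {"milk", "eggs"}),
]


def keys_matching(itemset):
    """Partial result on customers: keys whose row contains the itemset."""
    return {k for k, attrs in customers.items() if itemset <= attrs}


def fk_counts_matching(itemset):
    """Partial result on orders: per foreign key, rows containing the itemset."""
    return Counter(fk for fk, items in orders if itemset <= items)


def support_decentralized(cust_itemset, order_itemset):
    """Merge the two partial results through the foreign key."""
    keys = keys_matching(cust_itemset)
    counts = fk_counts_matching(order_itemset)
    # A missing key in a Counter counts as zero.
    return sum(counts[k] for k in keys)


def support_joined(cust_itemset, order_itemset):
    """Baseline: materialize the join first, then count matching rows."""
    joined = [customers[fk] | items for fk, items in orders]
    wanted = cust_itemset | order_itemset
    return sum(1 for row in joined if wanted <= row)
```

In this sketch the two partial results are independent of each other, so they could be computed where each table is stored; only the key set and the per-key counts need to be exchanged, rather than the joined rows.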

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9978510
Program Officer: Maria Zemankova
Budget Start: 1999-10-01
Budget End: 2002-09-30
Fiscal Year: 1999
Total Cost: $211,000
Name: University of Michigan Ann Arbor
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109