The main objective of this project is to improve the scalability and effectiveness of knowledge discovery and data mining systems so that they can handle large structural databases (i.e., databases composed of parts and relations among parts). First, a state-of-the-art structural discovery system called Subdue will be integrated with one or more existing non-structural discovery systems. Parallel and distributed versions of both Subdue and the integrated discovery system will then be developed. Distributing both the data and the processing across several machines will afford the greatest scalability for the discovery systems, and distribution is essential for handling large databases. The integrated discovery system will then be applied to several large scientific databases. The results will be evaluated by domain experts and disseminated to the scientific community along with source code releases of all software. This research addresses a critical need to improve the scalability of existing knowledge discovery and data mining methods, especially in domains with richer data representations. It will also provide scalable discovery systems to the scientific community and advance the state of knowledge in the design of parallel and distributed intelligent systems.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9615272
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 1997-03-01
Budget End: 2000-08-31
Support Year:
Fiscal Year: 1996
Total Cost: $305,389
Indirect Cost:
Name: University of Texas at Arlington
Department:
Type:
DUNS #:
City: Arlington
State: TX
Country: United States
Zip Code: 76019