The goal of this project is to develop a large-scale parallel data mining system (PDMS) that can manipulate very large scientific databases. The research takes an application-oriented approach, with input from three scientific domains: bioinformatics (protein structure prediction), astronomy (rare object identification), and materials informatics ("virtual" material design). The system must satisfy two conflicting objectives, genericity and specificity: the PDMS toolkit must be generic enough to support a range of common data mining tasks, such as associations, sequences, classification, and clustering, yet to be usable it must also support domain-specific customization.

PDMS is based on a novel three-tiered architecture: a front-end interface and query tool; a middle layer of common high-level mining algorithms; and a back-end system consisting of a core set of data mining "primitive operations" that is tightly integrated with a database system and delivers peak parallel or distributed performance.

The application-oriented approach creates excellent opportunities to advance interdisciplinary educational efforts and encourages the cross-fertilization of ideas and algorithms across these areas. New courses will be offered on the design of large-scale data mining systems and on applications of data mining in scientific domains. The results of this project will aid research in two directions: developing more generic data mining tools that can leverage high-performance parallel and distributed techniques in all phases of the knowledge discovery process, and developing customized tools for important scientific applications such as bioinformatics, astronomy, and materials science.
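
The three-tiered architecture described in the abstract can be illustrated with a small sketch. This is not the PDMS design or code; every name here (`count_support`, `frequent_itemsets`, `query`) is hypothetical, and the example assumes that support counting is one of the back-end "primitive operations" and that association mining is one of the middle-layer algorithms. The middle layer uses a simplified level-wise (Apriori-style) search without subset pruning, and it runs serially, with no database integration or parallelism.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[str]


# Back end (assumed): a "primitive operation" that counts how many
# transactions contain each candidate itemset. In a real system this is
# the step that would be pushed into the database and parallelized.
def count_support(transactions: List[Itemset],
                  candidates: Iterable[Itemset]) -> Dict[Itemset, int]:
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in counts:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


# Middle layer (assumed): a generic mining algorithm written only in
# terms of the primitive. Level-wise search: size-1 itemsets first, then
# larger candidates built by joining the frequent sets of the last level.
def frequent_itemsets(transactions: List[Itemset],
                      min_support: int) -> Dict[Itemset, int]:
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    result: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        counts = count_support(transactions, candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of two frequent sets that have exactly k items.
        candidates = list({a | b
                           for a, b in combinations(list(frequent), 2)
                           if len(a | b) == k})
    return result


# Front end (assumed): a query entry point that dispatches a named task
# to the matching middle-layer algorithm.
def query(transactions: List[Itemset], task: str, **params):
    if task == "associations":
        return frequent_itemsets(transactions, params["min_support"])
    raise ValueError(f"unsupported task: {task}")
```

In this arrangement a domain-specific tool would customize only the front end and the choice of algorithm, while the generic primitives underneath stay unchanged, which is one way to read the genericity versus specificity trade-off stated above.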

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0092978
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2001-09-15
Budget End: 2008-08-31
Support Year:
Fiscal Year: 2000
Total Cost: $300,000
Indirect Cost:
Name: Rensselaer Polytechnic Institute
Department:
Type:
DUNS #:
City: Troy
State: NY
Country: United States
Zip Code: 12180