This project develops methods to scale data mining tasks to large, inherently distributed databases. Meta-learning computes base classifiers by applying learning programs in parallel to database partitions, and then integrates them through another learning process that outputs a `meta-classifier`. Meta-learning the correlations among the base classifiers boosts overall predictive accuracy. Efficiency is achieved through parallel processing and reduced serial time per learning process. Meta-learning scales through data reduction and parallel processing in hierarchical organizations. We explore, analyze, and validate a number of approaches to meta-learning, and deliver demonstrable systems that allow other researchers to use our results directly. We develop field tests of the resulting technology by launching `learning agents` over the internet and combining their collective knowledge with `meta-learning agents`. Many defense and commercial applications will benefit from learning new knowledge by integrating and analyzing very large amounts of widely distributed data. One organization collaborating on this project is the Financial Services Technology Consortium (FSTC). The FSTC has defined a realistic application amenable to meta-learning techniques and is providing real datasets and learning tasks. The project participants will demonstrate the efficacy of the proposed approach for detecting fraud in electronic transactions in electronic commerce applications on the internet.
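
The two-stage idea described in the abstract (base classifiers trained on separate partitions, then a second learning step over their predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the project: the nearest-centroid base learner, the lookup-table meta-learner, the function names `train_centroid`, `train_meta`, and `meta_learn`, and the toy "ok"/"fraud" data. The base learners are trained sequentially for simplicity; each call is independent, so in principle they could run in parallel, one per partition.

```python
from collections import Counter, defaultdict

def train_centroid(rows):
    """Hypothetical base learner: nearest-centroid classifier.
    rows is a list of (feature_tuple, label) pairs from ONE partition."""
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )
    return predict

def train_meta(base_classifiers, validation_rows):
    """Hypothetical meta-learner: for each combination of base predictions
    seen on validation data, remember the most frequent true label."""
    table = defaultdict(Counter)
    for x, y in validation_rows:
        key = tuple(clf(x) for clf in base_classifiers)
        table[key][y] += 1

    def predict(x):
        key = tuple(clf(x) for clf in base_classifiers)
        if key in table:
            return table[key].most_common(1)[0][0]
        # Unseen combination: fall back to a majority vote of the base classifiers.
        return Counter(key).most_common(1)[0][0]
    return predict

def meta_learn(partitions, validation_rows):
    # Each base learner sees only its own partition; these calls are independent.
    base = [train_centroid(p) for p in partitions]
    return train_meta(base, validation_rows)

# Toy data (invented for illustration): two partitions and a small validation set.
partitions = [
    [((1.0, 1.0), "ok"), ((9.0, 9.0), "fraud")],
    [((2.0, 1.0), "ok"), ((8.0, 9.0), "fraud")],
]
validation = [((1.5, 1.5), "ok"), ((8.5, 8.5), "fraud")]
meta = meta_learn(partitions, validation)
print(meta((0.0, 0.0)), meta((10.0, 10.0)))
```

The meta-learner only ever sees base predictions and true labels, not the raw partitions, which is one way to read the abstract's point about data reduction: the second stage works on a much smaller summary of the distributed data.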

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9632225
Program Officer: Ephraim P. Glinert
Project Start:
Project End:
Budget Start: 1996-08-01
Budget End: 1999-07-31
Support Year:
Fiscal Year: 1996
Total Cost: $153,650
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027