Over the last two decades there has been explosive growth in online data storage of various forms. These large data sets have motivated the rapid development of data mining methods. Until now, however, researchers have lacked an online repository of large data sets on which to evaluate and compare their methods. In this project, an online repository of large and difficult data sets is being gathered that is representative of the diverse character of many important scientific and business domains. The repository includes high-dimensional data sets as well as data sets of different data types (time series, spatial data, transaction data, and so forth). Its primary role is that of a benchmark testbed that enables researchers in data mining (including computer scientists, statisticians, engineers, and mathematicians) to scale existing and future data analysis algorithms to very large data sets. Each data set in the repository is accompanied by online documentation, metadata, and links to relevant background domain information such as prior published work. The availability of a standard set of large benchmark data sets will directly stimulate and foster systematic progress in data mining research, similar to the effect that the UCI Machine Learning Data Repository has had on machine learning research. The repository will play a substantial role in bridging the gap between research-oriented algorithm development in the laboratory and the real-world practicalities and challenges of very large data sets. www.ics.uci.edu/~mlearn/MLRepository.html

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
9813584
Program Officer
Maria Zemankova
Project Start
Project End
Budget Start
1998-08-15
Budget End
2001-01-31
Support Year
Fiscal Year
1998
Total Cost
$99,737
Indirect Cost
Name
University of California Irvine
Department
Type
DUNS #
City
Irvine
State
CA
Country
United States
Zip Code
92697