SGER: An Online Repository of Large Data Sets for Data Mining Research and Experimentation

Smyth, Padhraic; Kibler, Dennis; Pazzani, Michael

Abstract

Over the last two decades there has been explosive growth in online data storage of various forms. These large data sets have motivated the rapid development of data mining methods. Until now, however, there has been no online repository of large data sets with which researchers can evaluate and compare their methods. In this project, an online repository of large and difficult data sets is being gathered that is representative of the diverse character of many important scientific and business domains. This repository includes high-dimensional data sets as well as data sets of different data types (time series, spatial data, transaction data, and so forth). The primary role of the repository is that of a benchmark testbed to enable researchers in data mining (including computer scientists, statisticians, engineers, and mathematicians) to scale existing and future data analysis algorithms to very large data sets. Each data set in the repository contains online documentation, metadata, and links to relevant background domain information such as prior published work. Availability of a standard set of large benchmark data sets will directly stimulate and foster systematic progress in data mining related research, similar to the effect that the UCI Machine Learning Data Repository (www.ics.uci.edu/~mlearn/MLRepository.html) has had on machine learning research. This repository will play a substantial role in bridging the gap between research-oriented algorithm development in the laboratory and the real-world practicalities and challenges of very large data sets.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 9813584
Program Officer: Maria Zemankova

Budget Start: 1998-08-15
Budget End: 2001-01-31
Fiscal Year: 1998
Total Cost: $99,737

Institution

University of California Irvine, Irvine, CA, United States
