This collaborative project, developing a broad suite of data mining benchmarks, defines benchmark data sets and efficient algorithms for important data mining kernels, establishing a comprehensive benchmark suite for data mining applications. Applications using data mining algorithms now make up a large enough share of applications overall to warrant research into a data mining benchmark that can be used to evaluate new processor architectures and to compare new data mining algorithms. Taking an initial and significant step towards developing benchmarks, test suites, and datasets that can drive the design, implementation, and growth of systems from the processor level to the application level, the project specifically pursues the following goals:
-Develop a benchmarking suite that will be used to understand the bottlenecks in high performance data mining and to guide the development of next-generation processors.
-Devise data mining kernels that can be efficiently executed on existing and future processors.
Benchmarks play a major role in advancing architectures, software scalability, networks, and other IT disciplines. They not only measure the relative performance of different systems, but also aid research and development, from architectures to applications, in terms of quality, scalability, cost, execution time, and other measures. By establishing a benchmark and accompanying tools for data access and usage, performing a detailed analysis of the applications in the suite, and developing a testbed for these analyses, the work contributes a community resource that can help in evaluating, comparing, and improving designs for processor architectures, algorithms, and scalable systems.
Broader Impact: While providing a standardized way of evaluating and comparing algorithms, applications, designs, and products, the results from this project have the potential to directly impact the advancement of various fields, including data mining algorithms and applications, newer architectures, and system design for data-intensive computing. The project opens the way to the development of a new industry segment addressing data-intensive computing, similar to what resulted from media, networking, and signal processing applications. Moreover, the resource contributes to education by providing the community with software, tools, and data that can be used in the classroom.