THOR: A New Programming Model for Data Analysis and Mining

Vuduc, Richard; Gray, Alexander

Abstract

Virtually every human endeavor in a broad variety of disciplines in science, engineering, and finance, is encountering the need to make new discoveries through the analysis of massive datasets. Yet, the task of creating the necessary scalable analysis tools for modern high-performance parallel hardware is a daunting task, due to the complexity of developing efficient algorithms and subsequently having to parallelize and tune these algorithms for real systems. This research project aims to address this problem by developing a new domain-specific programming model, called the Tree-based High-Order Reduce (THOR) model, that enables rapid automatic implementation of customized parallel data analysis and mining tasks with minimal coding effort.

The key enabling insight behind THOR is the generalized n-body problem (GNP) theory, a mathematical formalism that unifies the expression of seemingly disparate statistical data analysis tasks, including n-point correlation, hierarchical clustering, k-nearest neighbors classification, and kernel density estimation, among numerous others. As its name suggests, a GNP elegantly generalizes the classical n-body problem from physics to a much broader class of problems. Most importantly, the GNP form permits the development of asymptotically fast solutions, e.g., generalized versions of the fast multipole method. The THOR model enables the data analyst to specify a GNP, from which the THOR program generator can automatically produce a highly tuned parallel implementation. In short, this project aims to show how a programming model, which is bound to an appropriately high-level mathematical formalism while having the simplicity of a model like MapReduce, can lead to both scalable data analysis algorithms and their efficient implementation.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0833136
Program Officer: Almadena Y. Chtchelkanova

Project Start
Project End
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year
Fiscal Year: 2008
Total Cost: $686,647
Indirect Cost

THOR: A New Programming Model for Data Analysis and Mining
Vuduc, Richard Gray, Alexander
Georgia Tech Research Corporation, Atlanta, GA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments