Virtually every human endeavor in a broad variety of disciplines in science, engineering, and finance, is encountering the need to make new discoveries through the analysis of massive datasets. Yet, the task of creating the necessary scalable analysis tools for modern high-performance parallel hardware is a daunting task, due to the complexity of developing efficient algorithms and subsequently having to parallelize and tune these algorithms for real systems. This research project aims to address this problem by developing a new domain-specific programming model, called the Tree-based High-Order Reduce (THOR) model, that enables rapid automatic implementation of customized parallel data analysis and mining tasks with minimal coding effort.

The key enabling insight behind THOR is the generalized n-body problem (GNP) theory, a mathematical formalism that unifies the expression of seemingly disparate statistical data analysis tasks, including n-point correlation, hierarchical clustering, k-nearest neighbors classification, and kernel density estimation, among numerous others. As its name suggests, a GNP elegantly generalizes the classical n-body problem from physics to a much broader class of problems. Most importantly, the GNP form permits the development of asymptotically fast solutions, e.g., generalized versions of the fast multipole method. The THOR model enables the data analyst to specify a GNP, from which the THOR program generator can automatically produce a highly tuned parallel implementation. In short, this project aims to show how a programming model, which is bound to an appropriately high-level mathematical formalism while having the simplicity of a model like MapReduce, can lead to both scalable data analysis algorithms and their efficient implementation.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
0833136
Program Officer
Almadena Y. Chtchelkanova
Project Start
Project End
Budget Start
2008-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2008
Total Cost
$686,647
Indirect Cost
Name
Georgia Tech Research Corporation
Department
Type
DUNS #
City
Atlanta
State
GA
Country
United States
Zip Code
30332