Modern multicore architectures, which deliver raw performance in the gigaflop-to-teraflop range, have deep memory hierarchies and low-overhead threading capabilities. Lack of support for directly exploiting these capabilities leads to severe under-utilization, especially for data-intensive applications. This project aims to develop methods that use the available computational power efficiently and thereby reduce the cost of large-scale data processing systems.

This project will develop a highly efficient computation framework called GLADE that supports a large class of data-intensive applications and is based on a novel computational model called Generalized Linear Aggregates. The commutative and associative properties of Generalized Linear Aggregates facilitate highly efficient parallel and distributed computation as well as exploitation of deep memory hierarchies, especially when multiple queries are executed simultaneously, as is typical in many data-processing tasks. The resulting improvement in computational efficiency of one to two orders of magnitude can be expected to yield a corresponding reduction in the cost and energy requirements of data-processing tasks, which in turn will make it feasible to analyze much larger data sets than is currently possible.
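To illustrate why commutativity and associativity matter here, the following is a minimal sketch, not GLADE's actual interface. It assumes an aggregate can be expressed as a small state object with hypothetical `accumulate`, `merge`, and `terminate` methods; the class name `AverageAggregate` and the helper functions are invented for this example. Because merging partial states is commutative and associative, each data chunk can be processed independently (for instance on a separate core or node) and the partial results combined in any order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class AverageAggregate:
    """Hypothetical aggregate state: a running sum and a count."""
    total: float = 0.0
    count: int = 0

    def accumulate(self, value: float) -> None:
        # Fold a single input value into the local state.
        self.total += value
        self.count += 1

    def merge(self, other: "AverageAggregate") -> "AverageAggregate":
        # Combine two partial states. Addition is commutative and
        # associative, so the merge order does not affect the result.
        return AverageAggregate(self.total + other.total,
                                self.count + other.count)

    def terminate(self) -> float:
        # Produce the final answer from the merged state.
        return self.total / self.count if self.count else float("nan")


def aggregate_chunk(chunk):
    # Process one chunk in isolation, as a single worker would.
    state = AverageAggregate()
    for value in chunk:
        state.accumulate(value)
    return state


def aggregate_partitioned(chunks):
    # Build one partial state per chunk, then merge all partial states.
    partials = [aggregate_chunk(chunk) for chunk in chunks]
    return reduce(lambda a, b: a.merge(b), partials, AverageAggregate())


if __name__ == "__main__":
    data = list(range(1, 101))
    chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
    print(aggregate_partitioned(chunks).terminate())
```

In this sketch the chunks are processed sequentially for simplicity; the same `merge` step would apply unchanged if the per-chunk work were dispatched to threads or remote machines, which is the property the paragraph above relies on.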

The proposed work will make the synergistic combination of high-performance computing and large-scale data analysis widely available to researchers and other interested groups in government, industry, and education. Enabling a large number of data-intensive applications on inexpensive computers that cost in the low tens of thousands of dollars will broaden the use of data analysis, exploration, and mining for a wide variety of existing and emerging applications. Examples of such applications include network intrusion detection, social network analysis, climate data analysis, ecosystem analysis, and customer relationship management. Additional information about the project can be found at: http://sites.google.com/site/sanjayranka/glade.

Project Report

The main goal of this project was to add advanced data processing and mining capabilities to DataPath in the form of an add-on set of libraries called GLADE. Secondary goals included the use of GLADE for general database research to advance the state of the art in exact and approximate query processing.

Intellectual Merit

GLADE significantly enhanced the data processing capabilities of DataPath. It is now possible to combine database processing, linear algebra, and data mining using a sophisticated set of abstractions such as Generalized Linear Aggregates, Generalized Transformers, and Generalized Iterative State Transformation. In terms of impact on database research, GLADE allowed us to pursue significant work on large Markov Chain Monte Carlo (MCMC) systems and sampling-based approximate query processing.

Broader Impact

GLADE, together with DataPath, the framework in which GLADE is implemented, forms the basis of GrokIt, a data processing framework developed by Tera Insights, LLC, a company founded by the PI (Dobra). GrokIt is already being used at the University of Florida to allow students to process large amounts of stock market data (a detailed history of the last 10 years of stock transactions containing 56.8 billion tuples) and at Infinite Energy for energy usage prediction. Through its commercial incarnation as GrokIt, GLADE already has a significant impact on education and is used in several classes at the University of Florida (Database System Implementation, Advanced Data Science, Independent Study).

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1144985
Program Officer: Sylvia Spengler
Budget Start: 2011-09-01
Budget End: 2014-08-31
Fiscal Year: 2011
Total Cost: $100,000

Name: University of Florida
City: Gainesville
State: FL
Country: United States
Zip Code: 32611