Many scientific domains have entered a data-driven era in which scientific discovery depends heavily on effective and efficient analysis of large-scale data generated by wet-bench experiments or computer simulations. Current database management systems (DBMSs), while very popular in the business world, fall short of the high-throughput data processing that scientific applications require. The goal of this project is to design and implement a novel data management software architecture that enables high-throughput data management services for general scientific communities. The project achieves this goal via (1) a novel one-scan-fits-all data processing framework based on repetitive scans of large data sources; (2) a query engine that leverages the massive computing power of modern Graphics Processing Unit (GPU) hardware; and (3) the design and implementation of algorithms for popular analytics in three scientific domains on top of the query engine, to demonstrate the effectiveness and efficiency of the proposed architecture. The project also aims to build a software prototype and evaluate it with real-world scientific datasets and query workloads.

The project is expected to provide a highly efficient solution to the data management needs of a wide range of scientific fields. To deliver comparable performance, the proposed architecture requires only a fraction of the hardware and energy costs of existing systems. As a result, it has the potential to make feasible scientific studies that are currently regarded as difficult or impossible. The planned broader-impact activities also integrate the proposed research into educational endeavors that broaden the influence of computer science, nurture the next generation of multidisciplinary scientists, and boost the success of minority and women students in computer science and engineering.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1253980
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-06-01
Budget End: 2019-05-31
Support Year:
Fiscal Year: 2012
Total Cost: $499,882
Indirect Cost:
Name: University of South Florida
Department:
Type:
DUNS #:
City: Tampa
State: FL
Country: United States
Zip Code: 33617