Transformative research is conducted via computational analyses of large data sets in the terabyte and petabyte range. These analyses are often enabled by scientific workflows, which provide automation and efficient and reliable execution on campus and national cyberinfrastructure resources. Workflows face many issues related to data management such as locating input data, finding necessary storage co-located with computing capabilities, and efficiently staging data so that the computation progresses but storage resources do not fill up. Such data placement decisions need to be made within the context of individual workflows and across multiple concurrent workflows. Scientific collaborations also need to perform data placement operations to disseminate and replicate key data sets. Additional challenges arise when multiple scientific collaborations share cyberinfrastructure and compete for limited storage and compute resources. This project will explore the interplay between data management and computation management for these scenarios. The project will include the design of algorithms and methodologies that support large-scale data management for efficient workflow-based computations composed of individual analyses and workflow ensembles while preserving policies governing data storage and access. The algorithms will be evaluated regarding their impact on performance of synthetic and real-world workflows running in simulated and physical cyberinfrastructures. New approaches to data and computation management can potentially transform how scientific analyses are conducted at the petascale. Besides advancing computer science, this work will have direct impact on data and computation management for a range of scientific disciplines that manage large data sets and use them in complex analyses running on cyberinfrastructure.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0905032
Program Officer
Mohamed G. Gouda
Project Start
Project End
Budget Start
2009-09-01
Budget End
2013-08-31
Support Year
Fiscal Year
2009
Total Cost
$810,000
Indirect Cost
Name
University of Southern California
Department
Type
DUNS #
City
Los Angeles
State
CA
Country
United States
Zip Code
90089