Many geophysical simulations and nearly all climate models archive their results in hundreds to thousands of files spanning gigabytes to terabytes of storage. Before a geophysicist can interpret these data, they must be reduced to a manageable size (e.g., by averaging) and/or transported to a specified location (e.g., a desktop computer). Currently, researchers often copy raw data over the Internet from multiple locations before they can begin their analysis. This project is developing the Scientific Data Operators (SDO): efficient software that lets researchers perform typical data reduction and analysis remotely and in parallel, without wasting time and network bandwidth.

The SDO software combines distributed and shared memory programming, client-server architecture, and Open Source development techniques. As a proof of concept, the project will perform a distributed analysis of multiple NCAR CCSM IPCC climate assessment simulations (each about a terabyte in size) within and across national boundaries. This project also provides support for a graduate student to carry out dissertation research in distributed climate analysis. Outside climate modeling and analysis, SDO will have three main impacts: (1) increase the value of large geophysical datasets by decreasing the time needed to analyze, discover, and publish new results; (2) reveal any critical bandwidth, I/O, and client/server bottlenecks in processing distributed geophysical data; and (3) improve analysis of growing bioinformatics data sets, especially gene expression data, in ways similar to the geophysics domain. SDO is free software based on the internationally successful netCDF Operator (NCO) software. The project results, including this software, will be accessible via the project web site.

Budget Start: 2004-09-01
Budget End: 2008-08-31
Fiscal Year: 2004
Total Cost: $594,386
Name: University of California Irvine
City: Irvine
State: CA
Country: United States
Zip Code: 92697