This collaborative project brings together the expertise of five research teams at Brown University (IIS-1111423), the University of Washington (IIS-1110370), the Massachusetts Institute of Technology (IIS-1111371), Portland State University (IIS-1110917), and the University of Wisconsin-Madison (IIS-1111423). Scientific data management has traditionally relied on the file system, at best using files structured according to a low-level data format. Higher-level data management infrastructure has been task-specific and not reusable across domains, resulting in millions of dollars of duplicated implementation effort by scientists to manage their data. The goal of this project is to develop a scientific database (SciDB), a system designed and optimized for scientific applications. The aim of SciDB is to do for science what relational databases did for the business world: provide a high-performance, commercial-quality, scalable data management system appropriate for many science domains.

In contrast to existing database systems, SciDB is based on a multidimensional array data model and includes multiple features that are specific and critical to science: provenance, uncertainty, versions, time travel, science-specific operations, and in situ data processing. No existing system offers all these features in a single, highly scalable engine. SciDB thus significantly advances the state of the art in data management, in addition to supporting domain scientists in data-driven knowledge discovery. The intellectual merit of SciDB lies in exploring novel, high-performance solutions to nested array storage, parallel array query optimization and execution, array language design, and time travel.

The primary broader impact of SciDB is on the community of scientists who benefit from the tool. By keeping scientists "in the loop" in the design of the system from the outset, the project delivers software that is broadly usable by the community. The proposal also funds participation in a series of workshops that seek to engage even more of the science community. SciDB is an open-source effort, with an initial prototype (www.scidb.org/) already downloaded by hundreds of users. Finally, the PIs have a strong track record of delivering robust, widely used data management software and of involving students in the process, including students from under-represented groups. Further information can be found on the project web page (http://database.cs.brown.edu/projects/scidb).

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1110370
Program Officer: Maria Zemankova
Budget Start: 2011-09-01
Budget End: 2016-08-31
Fiscal Year: 2011
Total Cost: $370,781
Name: University of Washington
City: Seattle
State: WA
Country: United States
Zip Code: 98195