With tremendous amounts of data existing in scientific applications, database management becomes a critical issue, but database technology is not keeping pace. This problem is especially acute in the long tail of science: the large number of relatively small labs and individual researchers who collectively produce the majority of scientific results. These researchers lack the IT staff and specialized skills to deploy technology at scale, but have begun to routinely access hundreds of files and potentially terabytes of data to answer a scientific question. This project develops the architecture for a database-as-a-service platform for science. It explores techniques to automate the remaining barriers to use: ingesting data from native sources and automatically bootstrapping an initial set of queries and visualizations, in part by aggressively mining a shared corpus of data, queries, and user activity. It investigates methods to extract global knowledge and patterns while offering scientists access control over their data, and some formal privacy guarantees. The Intellectual Merit of this proposal consists of automating non-trivial cognitive tasks associated with data work: information extraction from unstructured data sources, data cleaning, logical schema design, privacy control, visualization, and application-building. As Broader Impacts, the project helps scientists reduce the proportion of time spent "handling data" rather than "doing science." All software resulting from this project are open source, and all findings are disseminated broadly through publications and workshops. Sustainable support for science users of the software is coordinated through the University of Washington eScience Institute. The research is incorporated in both undergraduate and graduate computer science courses, and the software is also incorporated into domain science courses as well. The project's outreach activities include advising students through special programs geared toward under-represented groups such as the CRA-W DREU. More information about this project is found at http://escience.washington.edu/dbaas.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1064606
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2011-08-01
Budget End
2016-07-31
Support Year
Fiscal Year
2010
Total Cost
$231,977
Indirect Cost
Name
Regents of the University of Michigan - Ann Arbor
Department
Type
DUNS #
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109