This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
The scientific data management landscape is changing. Improvements in instrumentation and simulation software are giving scientists access to data at an unprecedented scale. This data is increasingly being stored in data centers running thousands of commodity servers. This new environment creates significant data management challenges. In addition to efficient query processing, the magnitude of data and queries calls for new query management techniques such as runtime query control, intra-query fault tolerance, query composition support, and seamless query sharing.
This project is developing a series of techniques to support the above query management tasks. To achieve this goal, the project includes the design and implementation of a prototype massively parallel database management system that serves as the platform for the development of various query management schemes. The new algorithms are evaluated on both synthetic and real data from the scientific domain.
The expected results of this project include a variety of runtime query management algorithms, including parallel query progress indicators, distributed intra-query fault tolerance, and the ability to suspend and resume queries as needed. The expected results also include tools for searching previously executed queries, annotating them, and sharing them with others. Together, these tools promise to significantly improve data analysis at massive scale, making it an interactive and collaborative process.
Through the above contributions, this project will have a significant impact on the scientific community, which is currently limited by its ability to analyze data. The software and technical papers resulting from this project will be disseminated through the project website (http://nuage.cs.washington.edu/).