The increasing complexity, scale, and dynamics of networked computing systems make it hard for users and system administrators to understand and control these systems. A significant fraction of time and money is spent tackling unexpected performance problems or tuning large systems with many components whose performance depends on thousands of dependencies and parameters. The Ques project tackles this problem using innovative data management techniques. Ques treats a computing system as a rich source of data about system configuration and activity, typically available as continuous, rapid, and time-varying data streams. System administrators are given the ability to pose a broad range of system management queries over this data. Ques addresses the challenges of developing simple and intuitive ways to express these queries, processing the queries automatically and efficiently using query execution plans, and controlling systems based on statistical and performance models learned from system data. A fully functional prototype of Ques is developed and deployed in a real-world setting. The ideas from Ques are incorporated into two new courses for graduate and undergraduate students at Duke. Automated plan generation algorithms for complex system management queries will have a major impact on making systems more manageable by human administrators. The source code of Ques will be released publicly, and the technology may be migrated to industrial-strength system management products. Results from Ques will be disseminated via the project Web site (www.cs.duke.edu/~shivnath/ques.html).

Project Report

Business-critical systems often have hundreds of components (e.g., applications, databases, and storage area networks) whose performance depends on thousands of intricate and time-varying dependencies and parameters. The Ques project used innovative data management techniques to address the dangerous spiral towards unwieldy systems, high system administration costs, and frustrated users. Ques treats a computing system as a rich source of data about system configuration and activity, typically available as continuous, rapid, and time-varying data streams. Ques has enabled users and system administrators to pose a broad range of system-management queries over this data. Important query types in Ques include forecasting (e.g., how long will this application take to complete?), diagnosis (e.g., why is my application two times slower today compared to last week?), and recommendation (e.g., what memory configuration settings should I use for my database server in order to get the lowest response time for requests?).

Analytical as well as empirical studies were done to identify the challenges in answering such system-management queries automatically and efficiently. These studies considered various types of systems, including relational database servers such as MySQL and PostgreSQL, Web servers such as the Apache Web server, application servers such as Apache Tomcat, storage area networks such as IBM DS6000, and systems for distributed data processing such as the Hadoop MapReduce system.

A number of algorithms were developed in Ques to answer system-management queries automatically. These algorithms contain two important types of innovations. First, they combine techniques from analytical system modeling and machine-learning analysis, often in conjunction with domain knowledge that system administrators possess. Second, many of these algorithms can identify data that is currently missing; collecting such missing data enables the algorithms to provide more accurate query results. Empirical studies were done to show the benefits of these algorithms by implementing them in a prototype of Ques.

Features of the Ques prototype have been demonstrated at a number of venues. The demonstrated features include: (a) a query specification interface and automated processing algorithms for diagnosis and anomaly-detection queries in the context of database systems and storage area networks, (b) recommendation queries arising from resource provisioning for virtual machines running data-intensive applications on clusters, and (c) recommendation queries arising from query execution plan and configuration parameter tuning for relational database servers, the Hadoop MapReduce system, and continuous query processing systems. A demonstration of the Ques prototype won the best system demonstration award at the ACM SIGMOD Conference in 2010.

An educational version of this software was also developed for use in classes that teach the design and implementation of data-intensive computing systems. This software makes it easy for students to deploy and run distributed systems such as Hadoop on cloud platforms such as Amazon Web Services. In addition, the software has deep introspection and visualization capabilities that help students better understand distributed query processing techniques.

Many research publications, Ph.D. dissertations, M.S. projects, undergraduate research projects, and high-school senior projects have emerged from the Ques project. Students working on the Ques project have won awards such as the Doctoral Dissertation Award and Doctoral Candidacy Award from the Department of Computer Science at Duke University. The project has also benefited from research contributions by students from underrepresented minorities through the Mellon Mays Undergraduate Fellowship.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0644106
Program Officer: Frank Olken
Project Start:
Project End:
Budget Start: 2007-02-01
Budget End: 2013-01-31
Support Year:
Fiscal Year: 2006
Total Cost: $535,995
Indirect Cost:
Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705