Database management systems (DBMSes) are now an essential component of a vibrant information technology industry. Despite research and development efforts over several decades, DBMSes are not well understood. Surprisingly little is known about quite basic questions, such as: How often does the optimizer pick the wrong plan for a query? Does adding a physical operator for an algebraic operation always improve the effectiveness of query optimization, or is there a limit to the number of operators that can be practically accommodated? How do throughput and disk utilization depend on multiprogramming level? When does thrashing occur?
This project extends an existing laboratory information management system to develop and thoroughly test predictive models of centralized DBMSes. These models concern the effects of schema complexity, effective operator set, and cardinality estimation errors on the plan chosen by the optimizer; the structure of the optimizer search space; and the interaction of multiprogramming level with throughput, disk utilization, and response time in predicting thrashing. The models predict important characteristics of DBMSes that share a common architecture, quantify the relative contributions of identified causal factors, and determine fundamental limits of that architecture.
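As a purely illustrative sketch of the first of these effects (not drawn from the project's own models), the following example uses simplified, made-up textbook cost formulas to show how a cardinality estimation error can flip the optimizer's choice between two join plans; the cost constants and row counts are hypothetical.

```python
# Hypothetical illustration: a cardinality misestimate flips the plan choice.
# The cost formulas below are simplified textbook approximations, not a real
# optimizer's cost model.

def index_nl_cost(outer_rows, probe_cost=10):
    # Index nested-loop join: one index probe on the inner table per outer row.
    return outer_rows * probe_cost

def hash_join_cost(outer_rows, inner_rows):
    # Hash join: build plus probe, roughly linear in both input cardinalities.
    return 3 * (outer_rows + inner_rows)

def choose_plan(outer_rows, inner_rows):
    # Pick the plan with the lower estimated cost.
    costs = {
        "index-nested-loop": index_nl_cost(outer_rows),
        "hash-join": hash_join_cost(outer_rows, inner_rows),
    }
    return min(costs, key=costs.get)

inner_rows = 10_000
print(choose_plan(5, inner_rows))      # estimated 5 outer rows -> index-nested-loop
print(choose_plan(5_000, inner_rows))  # actual 5,000 outer rows -> hash-join
```

When the predicate is estimated to qualify 5 outer rows but 5,000 actually qualify at run time, the plan chosen from the estimate (index nested-loop) is no longer the cheaper one. Predictive models of the kind described above would characterize, for a given architecture, how large such estimation errors must be before the chosen plan changes.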
These models can be used to further improve DBMSes through engineering efforts that benefit from the fundamental understanding this perspective provides. Additionally, this novel research infrastructure, made available to the community and to students via a web portal, encourages a culture of empirical generalization and the sharing of experimental results: www.cs.arizona.edu/projects/soc/sodb/