From key-value stores to distributed file systems to distributed databases, networked storage underpins modern Internet services. Networked storage allows programmers to separate logic from data, enables high-throughput scale-out, and takes advantage of increasingly fast datacenter networks. However, in the big data era, networked storage faces a new challenge: the amount of data accessed per user request is growing rapidly, outpacing processor speeds and DRAM capacity. Increasingly, user-perceived response times are dominated by the slowest storage accesses, i.e., the 99th-percentile tail. Networked storage is notorious for fat-tailed response times.
We are developing networked storage systems that are 1) always fast and 2) cost-efficient. A key approach is to understand and selectively use replication for predictability (also known as cloning). In this approach, clients issue redundant storage accesses against independent hardware resources, and the first response to arrive provides the result. Replication for predictability reduces client-perceived variability, leading to always-fast response times. Our implementations are especially cost-effective at scale. To lower costs, we study the root causes of slow response times and save resources by focusing on the common causes. We also trade quality (e.g., slightly degraded search engine results) for lower hardware costs when appropriate. For broader impact, the PI will work to transfer the technology to national and local companies.
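As an illustration of replication for predictability, the following is a minimal sketch of a client that issues the same read to several replicas and returns the first successful response. It is not the project's implementation: the names `cloned_read` and `make_replica`, the simulated delays, and the use of threads are all assumptions made for this example.

```python
import concurrent.futures as cf
import time


def cloned_read(key, replicas, timeout=None):
    """Send the same read to every replica; return the first successful result.

    `replicas` is a list of callables taking a key (hypothetical stand-ins
    for storage nodes backed by independent hardware).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, key) for replica in replicas]
    try:
        # as_completed yields futures in the order they finish.
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return fut.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Do not block on stragglers; cancel any request not yet started.
        # (cancel_futures requires Python 3.9 or later.)
        pool.shutdown(wait=False, cancel_futures=True)


def make_replica(name, delay_s):
    """Build a simulated replica that answers after `delay_s` seconds."""
    def read(key):
        time.sleep(delay_s)
        return (name, f"value-for-{key}")
    return read


if __name__ == "__main__":
    fast = make_replica("replica-a", 0.01)
    slow = make_replica("replica-b", 0.5)  # simulated tail-latency outlier
    start = time.monotonic()
    winner, value = cloned_read("user:42", [slow, fast])
    print(winner, value, f"{time.monotonic() - start:.3f}s")
```

In this sketch the client's latency is governed by the fastest replica rather than the slowest, which is the source of the reduced variability. The cost is the redundant work performed by the losing replicas, which is why the text stresses using the technique selectively.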