Businesses are moving data and applications into the cloud, where they are consolidated in one place on fewer servers. Cloud storage services must keep the data of thousands of customers separated while still allowing each customer to operate on its own data efficiently. Safely intermixing customer-provided operations over data is problematic. Historically, processor hardware has isolated programs from one another, but increasing data access rates make that isolation costly. This project develops a new approach to storage that uses recent advances in programming languages to allow safe operation on data without hardware protection.
The approach reduces data movement between disaggregated storage and compute nodes by pushing untrusted tenant extensions into Sandstorm, a new cloud storage system. Sandstorm's insight is that storage extensions can use language-level isolation to eliminate hardware isolation overheads that cannot be avoided today with virtual machines, containers, or serverless Lambdas. Sandstorm also eliminates the copying of data for safety, so extensions benefit from low-level hardware functionality like zero-copy network transmission. The project will develop multitenant benchmarks, low-cost performance-isolated concurrency mechanisms for multicores, techniques to minimize data movement within servers, storage extensions that demonstrate the benefits, and distributed extensions over clusters.
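The idea of running a tenant extension next to the data, rather than shipping records to a compute node, can be illustrated with a minimal sketch. Everything named here (Store, TenantView, invoke, total_bytes) is hypothetical and is not Sandstorm's interface. Python does not itself enforce the tenant boundary; it only stands in for the language-level safety guarantees the project relies on. The read-only memoryview models handing an extension a reference to stored bytes without copying them.

```python
from typing import Callable, Dict, Iterator


class TenantView:
    """Hypothetical read-only window onto a single tenant's records."""

    def __init__(self, records: Dict[bytes, bytes]) -> None:
        self._records = records

    def keys(self) -> Iterator[bytes]:
        return iter(self._records)

    def get(self, key: bytes) -> memoryview:
        # A memoryview over a bytes object is read-only and shares the
        # underlying buffer, so no payload copy is made here.
        return memoryview(self._records[key])


class Store:
    """Hypothetical multitenant store that runs extensions beside the data."""

    def __init__(self) -> None:
        self._tenants: Dict[str, Dict[bytes, bytes]] = {}

    def put(self, tenant: str, key: bytes, value: bytes) -> None:
        self._tenants.setdefault(tenant, {})[key] = value

    def invoke(self, tenant: str, extension: Callable[[TenantView], int]) -> int:
        # The extension only ever receives a view of its own tenant's data;
        # only the (small) result leaves the store.
        view = TenantView(self._tenants.get(tenant, {}))
        return extension(view)


def total_bytes(view: TenantView) -> int:
    """Example tenant extension: aggregate the size of all stored values."""
    return sum(len(view.get(key)) for key in view.keys())


if __name__ == "__main__":
    store = Store()
    store.put("alice", b"a", b"hello")
    store.put("alice", b"b", b"world!")
    store.put("bob", b"x", b"secret")
    print(store.invoke("alice", total_bytes))
```

In this sketch the aggregate is computed inside the store and only an integer is returned, which is the data-movement saving the paragraph above describes. A production system would additionally need the isolation to be enforced by the language and runtime rather than by convention.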
As power limits data center scale, minimizing data movement out of storage becomes crucial. Sandstorm enables any cloud developer to accelerate data-intensive applications such as real-time social network and natural graph analysis and fine-grained coordination of hundreds of thousands of autonomous vehicles. All artifacts will be developed openly under a permissive MIT license for academic and industrial use. The project includes development of a new education platform for teaching students about distributed systems and cloud computing at the graduate, undergraduate, and high school levels. The platform includes a set of serverless computing labs targeted toward University of Utah students and summer camp attendees.
All data, code, experiments, and benchmarks will be made publicly available through http://github.com/utah-scs/ and at http://utah.systems/ and retained for a minimum of three years beyond the project award period. For long-term retention, all data, code, benchmarks, and experiments associated with published results will also be hosted in the Harvard Dataverse at http://dataverse.harvard.edu/dataverse/utah-scs.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.