The increasing popularity of cloud storage and cloud computing is leading organizations to consider moving data and computation out of their own data centers and into the cloud. However, success for cloud providers can present a significant risk to customers; namely, it becomes very difficult and expensive to switch providers. This research agenda will explore methods that allow cloud customers to diversify the set of cloud storage and cloud computing providers they use. Further, it will model, measure, and optimize the resulting more diverse systems.

To diversify cloud storage, the research agenda includes investigating how to apply RAID-like techniques used by disks and file systems, but at the cloud storage level. By striping user data across multiple providers, customers can avoid vendor lock-in, reduce the cost of switching providers, and better tolerate provider outages or failures. A redundant array of cloud storage providers (RACS) acts as a proxy that transparently spreads the storage load over many providers. To diversify cloud computations, the research agenda investigates a data model and structure that allows computation to be sent to where the data is stored and performed directly on local data. Together, the diversified storage cloud and diversified compute cloud have the potential to return control to the user for assurance on the integrity of data and computation, while still benefiting from the whole cloud paradigm.
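The RAID-like striping idea can be illustrated with a minimal sketch. It assumes a single XOR parity share in the style of RAID 5, which is a simplification chosen for brevity and not necessarily the coding scheme RACS itself uses. The function names `stripe` and `reassemble` and the provider names in the demo are hypothetical, and plain in-memory values stand in for real provider APIs.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, n_data: int) -> Tuple[List[bytes], int]:
    """Split data into n_data equal shares plus one XOR parity share.

    Returns the shares (parity last) and the original length, which is
    needed to strip the zero padding on reassembly.
    """
    share_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(share_len * n_data, b"\0")
    shares = [padded[i * share_len:(i + 1) * share_len] for i in range(n_data)]
    parity = reduce(xor_bytes, shares)
    return shares + [parity], len(data)


def reassemble(shares: List[Optional[bytes]], orig_len: int) -> bytes:
    """Rebuild the data when at most one share (one provider) is missing."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing share")
    shares = list(shares)
    if missing:
        # XOR of all surviving shares equals the missing one.
        present = [s for s in shares if s is not None]
        shares[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity share, join the data shares, and strip the padding.
    return b"".join(shares[:-1])[:orig_len]


if __name__ == "__main__":
    # Hypothetical provider names; each would hold exactly one share.
    providers = ["provider_a", "provider_b", "provider_c", "provider_d"]
    payload = b"customer object payload"
    shares, length = stripe(payload, n_data=3)
    stored = dict(zip(providers, shares))
    stored["provider_b"] = None  # simulate an outage or a dropped provider
    assert reassemble([stored[p] for p in providers], length) == payload
```

In this sketch, three data shares plus one parity share store roughly 4/3 of the original bytes, and any single provider can fail or be replaced without losing data. Leaving a provider then requires rebuilding only that provider's share rather than moving the whole data set.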

Project Report

Cloud computing is often compared to utility models such as electricity. Similar to power suppliers, cloud providers offer massive amounts of computing resources, and you pay for what you use. Unlike the power grid, cloud resources are tightly coupled to a provider’s infrastructure. For example, the infrastructure and interfaces that the "Amazon cloud" provides are different from the "Microsoft cloud", and different still from the "Google cloud". In particular, a cloud user develops an application for a particular provider. This tight coupling entails significant risk to the cloud user. Cloud users do not have physical control over the security and integrity of their computation or data. That is, users must trust cloud providers. Furthermore, once a cloud provider is selected, the user cannot easily switch to a different cloud provider, a problem known as vendor lock-in. If a cloud provider were to go out of business, cloud users could lose access to critical data (see the recent closing of Nirvanix).

A key goal and outcome of the research associated with this award was to "unshackle the cloud" and make cloud computing a commodity instead of locking users into a single provider. A key research question needed to be answered: How does one build and secure a cloud without owning the underlying infrastructure? Progress toward this research goal would be significant. It would allow a cloud user to control the location and migration of their computation, networking, and storage without owning the underlying infrastructure. Research results from Prof. Hakim Weatherspoon’s group supported by this grant make significant progress toward this goal: total cloud provider independence and resilience. In particular, Weatherspoon’s group created clouds that are not bound to any provider or physical resources, called Superclouds. Similar to the cloud today, Supercloud users see a collection of computing resources. Unlike clouds today, these resources can exist anywhere and can be cloned, mirrored, combined, and constantly moved, without the user and provider being aware of each other’s identity. A Supercloud can be considered a complete infrastructure abstraction layer that sits between the cloud provider and the user. A Supercloud gives its users the illusion of their own homogenized private cloud. Under the hood, the Supercloud includes different hypervisors, hardware architectures, storage subsystems, network fabrics, etc. Similar to the power grid, they can be separately managed to provide resiliency under varying degrees of trust and resource availability models. What is particularly important about a Supercloud is that it does not require modification or cooperation from the underlying providers. Indeed, Weatherspoon’s research results include migrating live computational instances between Cornell, IBM, Amazon’s Elastic Compute Cloud (EC2), Google’s Compute Engine, and HP’s cloud. This work appeared in the ACM European Conference on Computer Systems (EuroSys), the USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), and IEEE Internet Computing.

Even if a Supercloud is decoupled from the virtualization infrastructure as described above, it remains tightly coupled to the network and storage infrastructures of the underlying cloud provider datacenters. Weatherspoon’s research results demonstrated a further severing of ties between users and providers, making cloud storage and networking independent of the underlying cloud infrastructure provider. The storage approach used a redundant array of cloud storage providers (RACS). This work appeared in the ACM Symposium on Cloud Computing (SoCC). The networking approach created a user-controlled overlay in which the entire network could be migrated, split, or merged between the underlying cloud infrastructure providers while maintaining the same network topology. This work appeared in an IBM technical journal.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1151268
Program Officer: M. Mimi McClure
Project Start:
Project End:
Budget Start: 2011-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2011
Total Cost: $200,000
Indirect Cost:

Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850