A large class of distributed data-rich applications, including distributed data mining, distributed workflows, and Web 2.0 Mashups, increasingly relies on cloud services to meet its data storage and computing demands. Today, however, each application is individually responsible for combining data and results from different specialized cloud services. Because appropriate resources connecting applications to multiple clouds are lacking, this can lead to significant performance and reliability bottlenecks that impede successful deployment. This project proposes a cloud proxy network that allows optimized and reliable data-centric operations to be performed at strategic network locations. In this model, proxies may take on several data-centric roles: interacting with cloud services, routing data to each other, caching data for later use, and invoking compute-intensive data operators for intermediate processing. The proposed solution will enable an efficient coupling of cloud services to yield improved end-to-end performance and reliability for newly emerging data-intensive applications. This project will explore the potential of the proxy network architecture by evaluating its merits using a volunteer approach, focusing on four main research challenges: Proxy Performance, Proxy Reliability, Proxy Information Sparsity, and Proxy Selection.

The broader impact of this project is to amplify the effectiveness and productivity of diverse scientific, social, and engineering communities by enabling data-driven scientific inquiry in a performance-efficient and reliable manner. Proxy middleware will be released to the wider community to this end. Educating a new generation of students in data-centric computing through major curriculum innovation is also proposed.

Project Report

This project (proxy.cs.umn.edu) explores the use of edge nodes (proxies) to boost the performance and reliability of distributed applications, e.g., mobile and multi-cloud applications. A proxy can provide increased performance for communication or computation based on its resource capacity and network position. We designed and deployed a network dashboard tool (netstat.cs.umn.edu) to support proxy selection; it can provide accurate predictions of network parameters, including latency, bandwidth, and jitter, for both UDP and TCP connections. Below we depict the output of the dashboard, illustrating the identification of proxies that can improve communication. Such proxies can then be recruited by applications to boost performance. As a proof of concept, we deployed the HPC application Montage, which builds image mosaics of the sky, to show how proxies can improve performance. The top figure indicates where proxies can improve performance in fetching images from the NASA website (left figure), in moving data through the distributed workflow stages (second figure), and in delivering the output products to the end user. We also use proxies to deliver custom output to different classes of end users. On a mobile Android device, the proxy compressed and distilled the image (see screenshot). We also used the proxy concept to develop a wide-area distributed cloud system called Nebula. We deployed a wide-area version of MapReduce that runs solely on edge nodes in Nebula. We showed that when the underlying data is widely distributed, wide-area MapReduce can outperform standard Hadoop.
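The proxy selection step described above can be illustrated with a small sketch. It is not the project's actual algorithm or the dashboard's interface: the `Link` type, the function names, and the cost model (latency plus size divided by bandwidth, with a relayed path limited by its slower hop) are all assumptions made for illustration. Given predicted latency and bandwidth for the direct path and for each candidate two-hop path, it picks the proxy with the lowest estimated transfer time, or no proxy if the direct path is already best.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Link:
    """Predicted properties of one network path (hypothetical type)."""
    latency_s: float       # predicted latency in seconds
    bandwidth_bps: float   # predicted bandwidth in bytes per second


def transfer_time(link: Link, size_bytes: float) -> float:
    # Assumed cost model: latency plus serialization time.
    return link.latency_s + size_bytes / link.bandwidth_bps


def relay(first_hop: Link, second_hop: Link) -> Link:
    # Assumption: latencies add up, and the slower hop bounds the bandwidth.
    return Link(
        latency_s=first_hop.latency_s + second_hop.latency_s,
        bandwidth_bps=min(first_hop.bandwidth_bps, second_hop.bandwidth_bps),
    )


def select_proxy(
    direct: Link,
    proxies: Dict[str, Tuple[Link, Link]],
    size_bytes: float,
) -> Tuple[Optional[str], float]:
    """Return (proxy name or None, estimated transfer time in seconds).

    `proxies` maps a proxy name to its (source->proxy, proxy->destination)
    predictions. None means the direct path is estimated to be fastest.
    """
    best_name: Optional[str] = None
    best_time = transfer_time(direct, size_bytes)
    for name, (to_proxy, from_proxy) in proxies.items():
        candidate = transfer_time(relay(to_proxy, from_proxy), size_bytes)
        if candidate < best_time:
            best_name, best_time = name, candidate
    return best_name, best_time


if __name__ == "__main__":
    # Made-up predictions, for illustration only.
    direct_path = Link(latency_s=0.10, bandwidth_bps=1e6)
    candidates = {
        "proxy-a": (Link(0.05, 5e6), Link(0.08, 4e6)),
        "proxy-b": (Link(0.02, 5e5), Link(0.02, 1e7)),
    }
    print(select_proxy(direct_path, candidates, size_bytes=10e6))
```

A real selector would likely also weigh jitter, proxy load, and prediction uncertainty, which this sketch ignores.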

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0916425
Program Officer
Vasant G. Honavar
Project Start
Project End
Budget Start
2009-09-15
Budget End
2013-08-31
Support Year
Fiscal Year
2009
Total Cost
$482,000
Indirect Cost
Name
University of Minnesota Twin Cities
Department
Type
DUNS #
City
Minneapolis
State
MN
Country
United States
Zip Code
55455