BigData: Small: DCM: OpenFlow-Enabled Hadoop over Local and Wide Area Clusters
Robert L. Grossman, University of Chicago
In recent years, data intensive programming using Hadoop and MapReduce has become increasingly important. As normally deployed, Hadoop's implementation of MapReduce in a multi-rack cluster depends upon the top-of-rack switches and the aggregation switches connecting multiple racks. Using Hadoop effectively at scale across many racks therefore requires expensive network switches and routers that are complex to configure and maintain.
Software-defined networks using OpenFlow have proven in many cases to offer good performance at lower cost and to be simpler to manage. The first goal of this proposal is to contribute to the development of a new version of Hadoop called Hadoop-OFP. The basic idea of Hadoop-OFP is to integrate OpenFlow-enabled switches with Hadoop in order to: i) improve performance; ii) lower the cost of the required hardware; and iii) simplify the management of the cluster.
Large data flows are also a critical component of data intensive computing. Unfortunately, configuring networks to manage large data flows can be challenging. A second goal of this proposal is to develop a tool that can configure OpenFlow-enabled networks to handle the large data flows that arise in data intensive computing more efficiently.
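The shuffle phase of MapReduce typically produces a small number of very large ("elephant") flows among many short ones, and a controller-side tool of the kind proposed could, for example, spread those elephants across parallel paths. The following is a minimal illustrative sketch of that idea only; the function name, threshold, and greedy load-balancing strategy are assumptions for illustration, not part of the proposed Hadoop-OFP design:

```python
# Illustrative sketch: steer "elephant" flows onto the least-loaded of
# several parallel paths, leaving small ("mice") flows on a default path.
# All names and the threshold below are assumptions, not the proposed tool.

ELEPHANT_BYTES = 10 * 1024 * 1024  # flows above 10 MB count as elephants


def assign_paths(flows, paths):
    """flows: dict of flow_id -> bytes transferred.
    paths: list of parallel path names.
    Returns a dict of flow_id -> assigned path."""
    load = {p: 0 for p in paths}
    assignment = {}
    # Place the largest flows first so they spread evenly across paths.
    for fid, nbytes in sorted(flows.items(), key=lambda kv: -kv[1]):
        if nbytes >= ELEPHANT_BYTES:
            best = min(paths, key=lambda p: load[p])
            load[best] += nbytes
            assignment[fid] = best
        else:
            assignment[fid] = "default"
    return assignment
```

In an actual OpenFlow deployment, each assignment would be realized by installing a flow-table entry on the relevant switches; the sketch only captures the path-selection logic.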