MapReduce is widely used as a programming model for parallel processing of Big Data, and a growing number of systems are deployed to serve data-intensive analytics applications written in MapReduce. On such a system, many jobs with conflicting resource requirements can arrive at the same time. This project investigates three sets of techniques: cross-layer cooperation techniques to achieve system efficiency; cross-phase techniques to enhance job fairness and system throughput; and cross-job task co-scheduling techniques that exploit the temporal relationship among jobs to deliver better throughput and better service for analytic queries composed of multiple MapReduce jobs.
This project has broader impacts in several areas: (1) strengthening computer science courses at Auburn University and enhancing instruction with student research projects on MapReduce; (2) recruiting and cultivating students of diverse backgrounds, particularly under-represented minority and female students, for careers in computing; (3) disseminating research results through publications, presentations, conference tutorials, and demonstrations, releasing open-source software, and eventually submitting it for integration into the official Hadoop code base; and (4) collaborating with industry, strengthening research partnerships, and cultivating opportunities for technology transfer.