Over the last decade, with the growth of video-on-demand applications and the explosion of data collected via sensors and other devices, computation over massive data sets has become the norm. This development calls for a deeper understanding of several issues specific to such contexts. The first issue is data placement: when data is located closer to the demand, the need for network resources is reduced. The second issue is energy minimization: how can one develop algorithmic methods to make data processing more efficient? Both issues lead to a host of interesting questions in the vein of scheduling and facility-location problems. To address them, this project focuses on developing algorithms that manage data storage for processing, with energy efficiency as the primary consideration.
Much of the prior scheduling literature assumes a job-centric perspective, in which algorithms are developed to optimize tardiness, completion time, makespan, and similar measures. In contrast, this work is motivated by a system-centric view in which using resources in an "efficient" way is the top priority, subject to individual jobs being completed in a "satisfactory" manner. Such efficiencies are primarily manifested in the energy cost incurred by the system. These problems are particularly prominent in the context of large-scale storage devices and data centers. The main focus is on data of all types, ranging from multimedia data stored on a collection of disks to data collected and stored in a distributed storage system. The amount of data to be stored and efficiently accessed is increasing at an unsustainable rate, and the costs of managing this data are in turn expected to grow significantly. The main question is how one can develop scheduling algorithms to manage this data effectively and efficiently.
Data centers are fast becoming integral to society and have transformed everything from social networking to human communication to scientific collaboration, computation, and data exchange. This research will lead to increased efficiencies in this critical infrastructure. The project will train graduate students in conducting research both at universities and through internships at industrial research labs during the summer. Extensive mentoring and involvement of undergraduate students and women is expected. Over the last few years, the PI has developed a new course on "Science behind Computing" and is working on a book for this course, the primary purpose of which is to educate the general public about important scientific concepts related to computing in the 21st century.