Desktop computing remains indispensable in scientific exploration, largely because it provides people with devices for human interaction and environments for interactive job execution. In fact, the proliferation of supercomputers and clusters has been driving the need for more efficient desktop processing, to assist high-end computing in tool validation, data analysis, and visualization. However, with rapidly growing data volume and task complexity, it is increasingly hard for individual workstations to meet the demands of interactive scientific data processing. The increasing cost of such interactive processing is hindering the productivity of end-to-end scientific computing workflows. The project will develop a novel desktop parallel computing framework to speed up scientific data processing tasks routinely executed on desktop machines. It will allow users to preserve the interactiveness and convenience of desktop processing, without making explicit resource requests or waiting for batch execution, while their computation is seamlessly accelerated by aggregating idle computing and storage resources in local-area networks.
This framework comprises several closely coupled innovative techniques to be developed in this research: data processing semantics specification based on relational algebra, on top of which we design application interfaces for automatic and flexible program parallelization; integrated computing resource aggregation and storage resource aggregation, on top of which we develop parallel I/O for desktop parallel computing; asymmetric task scheduling that exploits the central role of the client workstation, for guaranteed interactive execution, better fault tolerance, and diverse self-configuration opportunities; and quantitative and explicit performance impact control, based on impact benchmarking and real-time workload monitoring, that protects the performance of resource donors' native workloads in the presence of aggressive and persistent resource stealing. The proposed framework possesses three key properties that distinguish it significantly from existing parallel execution platforms: interactiveness (immediate execution regardless of the availability of external resources), transparency (hiding the availability of and fluctuation in external resources from both application developers and users), and customized performance impact control (throttling a parallel job's resource consumption according to the resource usage measured from the native workload of individual resource owners). Our research takes the initial steps towards a new parallel computing paradigm suitable for heterogeneous and opportunistic environments. With this new paradigm, users do not have to specify the amount of resources needed, nor do they have to wait for such resources to become available.
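The idea of customized performance impact control can be illustrated with a minimal sketch. It assumes a single resource (CPU) and a simple rule: the guest job may use only what the owner's measured native workload leaves free, minus a reserved headroom, and never more than a fixed cap. The names `DonorPolicy` and `guest_cpu_share`, and the default values, are hypothetical; the project's actual control relies on impact benchmarking and real-time monitoring, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorPolicy:
    """Hypothetical per-donor limits (illustrative defaults, not from the project)."""
    headroom: float = 0.10         # CPU fraction always left free for the owner
    max_guest_share: float = 0.50  # hard cap on the guest job's CPU fraction


def guest_cpu_share(native_util: float, policy: DonorPolicy = DonorPolicy()) -> float:
    """Return the CPU fraction a guest job may use on one donor machine.

    native_util is the measured CPU utilization (0.0 to 1.0) of the
    owner's own workload on that machine.
    """
    if not 0.0 <= native_util <= 1.0:
        raise ValueError("native_util must be between 0.0 and 1.0")
    spare = 1.0 - native_util - policy.headroom
    # Never go below zero, never exceed the donor's cap.
    return max(0.0, min(policy.max_guest_share, spare))


if __name__ == "__main__":
    for util in (0.2, 0.6, 0.95):
        print(f"native={util:.2f} -> guest share={guest_cpu_share(util):.2f}")
```

Because the share is recomputed from each donor's own measurements, a busy machine throttles the guest job to zero while a lightly loaded one contributes up to its cap.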
NSF Award 0546301. In this project, we explored novel data processing technologies that bring the benefit of parallel computing to users who rely on desktop environments to analyze or visualize their scientific data. Some of these users may have access to traditional parallel platforms (such as supercomputers and clusters). However, the lack of interactive devices (such as a display and keyboard/mouse), together with the lack of support for interactive runs (as opposed to submitting a job to a queue), prevents such traditional platforms from being useful for interactive data processing. In this work, we have the following major findings through original research and development: 1. We have found that there is a significant amount of task parallelism within common data processing programs written in scripting languages (such as MATLAB and R). This means that there are independent data processing tasks that can be automatically detected and parallelized using existing compiler techniques. With today's multicore processors and web-based parallel servers, an automated script parallelization tool will greatly improve data processing performance while retaining the ease and interactiveness of desktop computing. 2. We have found that "active volunteer computing", which aggressively utilizes the "residual resources" (leftover resources available on non-idle servers running foreground workloads), is able to deliver significant energy savings. This is because the "incremental energy cost" of piggybacking a background workload onto the existing foreground ones is significantly smaller than that of firing up new servers. This result supports the aggregation of residual resources on interconnected machines for performing parallel data processing. 3. We have found that utilizing idle nodes on campus cloud systems for parallel data processing suffers from significant reliability issues due to correlated node unavailability.
For example, a large group of nodes may become unavailable due to class activities. In this work, we proposed and demonstrated the benefit of intelligent data and task replication design to greatly improve the QoS of parallel data processing. In particular, our proposed system, MOON, takes advantage of a hybrid resource architecture, where a large group of volatile, volunteer resources is supplemented by a small set of dedicated nodes. 4. We have found that there is great potential to explore the capacity/performance tradeoff of using the small yet fast shared memory on GPUs, another powerful resource for parallel computing on today's average servers. Also, we have found that with region-based memory management, we can simultaneously improve the ease of programming and the program's performance, by providing virtual memory management facilities between the CPU host memory and the GPU device memory. The above explorations have resulted in the following outcomes of this multi-year research: 1. Two PhD theses. Mainly supported by this research, two PhD students (Jiangtian Li and Feng Ji) have completed their PhD thesis research. Li joined Microsoft in 2009 while Ji is joining VMware in summer 2013, working on projects closely related to their PhD training. 2. This project has produced 13 conference publications, 4 journal papers, and 2 posters, all in well-established venues, plus one book chapter. Among the publications, one received a Supercomputing (SC 06) Best Paper Nomination, while another, published in 2010, received the HPDC 20 Year Best Papers Award (one of 22 papers selected from all those published in the conference's 20-year history, and the only selection published after 2006). All publications are openly accessible. We have been contacted by companies such as Google and by other academic researchers, who found our findings and approaches useful. 3.
This research has resulted in several software tools, such as pR (for automatic R script parallelization), MOON, and gpuMR (for running MapReduce jobs on GPUs). 4. Partially supported by this grant, the PI served, between 2006 and 2010, as the faculty mentor for Women in Computer Science (WICS) at NC State University, which actively performs outreach and diversity promotion activities to attract and retain female participation in computer science and engineering. 5. This research helped the students involved, at both the PhD and MS levels, to participate in summer research at national laboratories and in industry, better preparing them for R&D jobs after graduation.
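The first finding above, that scripts contain independent tasks detectable with existing compiler techniques, can be illustrated with a minimal dependence-analysis sketch. It assumes each script statement has already been reduced to the set of variables it reads and the set it writes; statements with no read-after-write, write-after-read, or write-after-write conflict are placed in the same "wave" and could run in parallel. The function `parallel_waves` and the example statements are hypothetical and are not taken from pR.

```python
def parallel_waves(stmts):
    """Group statements into waves that can run in parallel.

    stmts is a list of (name, reads, writes) tuples in program order,
    where reads and writes are sets of variable names. A statement is
    placed one wave after the latest earlier statement it conflicts with.
    """
    levels = []
    for i, (_, reads_i, writes_i) in enumerate(stmts):
        level = 0
        for j in range(i):
            _, reads_j, writes_j = stmts[j]
            conflict = (
                writes_j & reads_i      # read-after-write
                or reads_j & writes_i   # write-after-read
                or writes_j & writes_i  # write-after-write
            )
            if conflict:
                level = max(level, levels[j] + 1)
        levels.append(level)

    waves = [[] for _ in range(max(levels, default=-1) + 1)]
    for (name, _, _), level in zip(stmts, levels):
        waves[level].append(name)
    return waves


if __name__ == "__main__":
    # Models a script such as: a <- load(x); b <- load(y);
    # c <- f(a); d <- g(b); e <- h(c, d)
    script = [
        ("load_a", set(), {"a"}),
        ("load_b", set(), {"b"}),
        ("f_a", {"a"}, {"c"}),
        ("g_b", {"b"}, {"d"}),
        ("h_cd", {"c", "d"}, {"e"}),
    ]
    print(parallel_waves(script))
```

In this example the two loads are independent of each other, as are the two transformations that follow, so only the final combining step has to wait for both branches.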