This project is an investigation into algorithms and mechanisms that will allow sophisticated staging of input and output data in desktop grids. Data staging allows the system to store data semi-permanently in the underlying peer-to-peer structure, and to run multi-node jobs (applications) whether they be tightly-coupled parallel applications, arbitrary work-flows, or anything between. We are building support for these application types by extending our current desktop grid infrastructure in three distinct areas. We are developing cluster identification techniques that can define arbitrarily-sized virtual clusters through both passive and active network measurement. We are incorporating virtual cluster descriptions into the underlying peer-to-peer infrastructure to allow the scheduling algorithms to map multi-node jobs to the clusters. Finally, we are incorporating data placement into the underlying infrastructure; data is placed according to use and process binding. This work will impact several research areas, including that of distributed and decentralized scheduling, application description, network characterization, and storage networks. In all of these areas our work will explore the tension between local autonomy and global, aggregate objectives. The algorithms and techniques will have broad applicability across a wide range of emerging distributed and collaborative applications. However, the work described here will also explicitly and immediately impact the quality of research conducted by our collaborators in astronomy and elsewhere. The ability to run parallel applications, and those with more complicated inter-relationships, will enable whole new classes of scientific applications to be run on top of ad-hoc grid-like systems.
This project explored decentralized approaches to scheduling of sequential and parallel jobs across multiple clusters, both those defined explicitly and ad-hoc clusters identified automatically. We have demonstrated a number of contributions. First, we have described a decentralized approach to matching parallel jobs to appropriate resources. By publishing parallel resources as the number of near nodes within a latency radius of each system node, we can route to parallel resources without the need for explicit clustering. Second, we have shown that we can use soft-state techniques to coordinate running parallel jobs in a distributed, decentralized system. Third, we demonstrate the utility of these algorithms with experiments that use actual job traces rather than synthetic inputs. Fourth, we have demonstrated a completely decentralized scheduler that performs comparably to a globally load balanced scheduler. The experiments that show this use actual job traces rather than synthetic inputs. Fifth, we have also established criteria for successful and efficient application of this scheduler: clusters must be as large as possible, and we must limit certain pathological overlap in clusters. Finally, our collaborator in Astronomy has used the system to perform numerical experiments testing the behavior of cohesionless gravitational aggregates experiencing a gradual increase of angular momentum. The test bodies used in the numerical simulations are gravitational aggregates of different construction, distinguished by the size distribution of the particles constituting them, parameterized in terms of the angle of friction. Shape change and mass loss are found to depend strongly on the friction angle, with results ranging from oblate spheroids forming binary systems to near-fluid behavior characterized by mass shedding bursts and no binary formation.