The Lark project provides network-aware scheduling for distributed computing resources: it allows workload management schedulers to make network-aware provisioning decisions and provides mechanisms for individual batch jobs to describe and request the network topology they require. Specifically, Lark will integrate the capabilities of perfSONAR, a network performance monitoring tool, and DYNES, a cyber-instrument for allocating bandwidth guarantees, into distributed computing grid middleware such as Condor and glideinWMS. By using a multi-site IPv6 testbed and collaborating with a Nebraska project to provide small starter clusters to local colleges, the project aims to produce production-quality technology that could be deployed into the Open Science Grid (OSG).
This work will enable better trade-offs among the selection of a site to execute jobs, data location decisions, performance, and security in large distributed computing environments, and thus more effective use of limited resources. The project addresses a clear and immediate need in the Large Hadron Collider (LHC) computing environment, as network-aware scheduling will increase the amount of analysis the LHC experiment collaborations can perform. By integrating this work into the widely used Condor high-throughput computing software and into the OSG stack, the benefits will become available not only to LHC researchers but also to a number of science and industry projects that rely on distributed high-throughput computing.