The stated goal of distributed Grid Computing is to create a network of interconnected computers that can act as one. This proposal for the "Open Science Grid" (OSG) is a component of a U.S. effort to create a truly seamless system where scientists and students distributed nationwide and worldwide can effectively collaborate.

Physicists' demand for computing power is being spurred by the flood of data that will pour out of the Large Hadron Collider (LHC), the next-generation particle collider at CERN, the European particle physics laboratory near Geneva, as well as LIGO, the Laser Interferometer Gravitational-Wave Observatory in the U.S. These projects will produce dozens of petabytes (millions of billions of bytes) of data a year, the equivalent of millions of DVDs, which physicists will store and sift through for at least a couple of decades in search of new phenomena. To put this in perspective, current estimates of the annual production of information on the planet are on the order of a few thousand petabytes, so these projects will be producing nearly 1% of that total. Some 100,000 of today's fastest personal computers, with accompanying tape and disk storage and high-speed networking equipment, will need to work together to analyze all of this data.
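As a back-of-the-envelope check on these figures, the conversions can be reproduced in a few lines of Python; reading "dozens of petabytes" as 20 PB/year, taking 4.7 GB as the capacity of a single-layer DVD, and taking ~3,000 PB/year as the global estimate are illustrative assumptions consistent with the text above, not measured values.

# Rough plausibility check of the data-volume claims above
# (all inputs are illustrative assumptions).
PB = 10**15                      # bytes in a petabyte
annual_output = 20 * PB          # "dozens of petabytes" read as 20 PB/year
dvd_capacity = 4.7 * 10**9       # bytes on a single-layer DVD

dvds_per_year = annual_output / dvd_capacity
print(f"DVD equivalents per year: {dvds_per_year:,.0f}")   # ~4.3 million

global_production = 3000 * PB    # "a few thousand petabytes" per year
print(f"Share of global output: {annual_output / global_production:.1%}")  # ~0.7%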

A goal of the OSG is to enable dozens of projects in other sciences to reap the benefits of Grid Computing.

Project Report

Project Goals

The Open Science Grid (OSG) project's goal was to stimulate new discoveries by providing scientists with effective and dependable access to an unprecedented national distributed computational facility. We proposed to achieve this through the work of the OSG Consortium: a unique, hands-on, multi-disciplinary collaboration of scientists, software developers and providers of computing resources.

Project Outcomes & Findings

The OSG has evolved into an internationally recognized key element of the U.S. national cyberinfrastructure. The OSG has been expanding the reach of distributed high throughput computing (DHTC) to a growing number of science communities. The largest OSG science stakeholder has been the Large Hadron Collider (LHC) program at the European Organization for Nuclear Research (CERN), comprising the U.S. ATLAS, U.S. CMS and ALICE-USA contributions to the Worldwide LHC Computing Grid (WLCG). The global shared computing infrastructure enabled by the OSG services has facilitated a transformation in the delivery of results from the LHC and other advanced experimental facilities, enabling the public presentation of results days or weeks after the data is acquired, rather than the months or years it took before. The U.S. LHC scientific program embraces OSG as a major strategic partner in developing, deploying and operating its novel and cost-effective DHTC infrastructure.

The OSG provides an intellectual hub for the entire DHTC community: it drives the development of novel frameworks, educates and trains, forms new collaborative efforts, supports the development of new tools, and serves as a testing and evaluation laboratory. We contribute to the NSF eXtreme Digital (XD) program as a Service Provider (SP) to the Extreme Science and Engineering Discovery Environment (XSEDE) project, and to the DOE Scientific Discovery through Advanced Computing (SciDAC) program as a promoter and adopter of advanced computational technologies and methods.

[Figure 1: OSG's Fabric of Services & Community Focused Architecture]

Today, the OSG fabric of services comprises three groups: software services; support services such as education, training, and consulting in the best practices of DHTC; and an infrastructure of DHTC services (referred to as production services) for those who would like to join the OSG DHTC environment (Figure 1). Services in the first two groups serve the broader community that builds and operates its own DHTC environments (e.g. LIGO), as well as supporting the DHTC environment of the OSG.

High throughput computing (HTC) technology created and incorporated by the OSG and its contributing partners has now advanced to the point that scientific user communities, organized as virtual organizations (VOs), are simultaneously utilizing more geographically distributed HTC resources than ever before. Typical VOs now utilize ~20 resources, with some routinely using as many as ~40 simultaneously. The overall usage of OSG has grown steadily over the life of this project and now reaches ~70M hours per month, as shown in Figure 2.

[Figure 2: OSG Usage (hours/month) from July 2007 to August 2014]

Key results from this project include: an effective DHTC infrastructure providing single sign-on for use of its services; round-the-clock, dependable services that facilitate effective and secure sharing of resources; a high-quality DHTC software stack; and a "home" for the DHTC community, which pioneered the concept of federated national grids and distributed resource management overlays.
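To put the usage figure in concrete terms, ~70M wall-clock hours per month is roughly equivalent to keeping 100,000 processor cores busy around the clock. A minimal sketch of that conversion follows; the 30-day month is the only assumption.

# Convert reported OSG usage (Figure 2) into always-busy core equivalents.
hours_per_month = 70_000_000      # ~70M CPU-hours delivered per month
hours_per_core = 30 * 24          # wall-clock hours per core in a 30-day month

print(f"Equivalent always-on cores: {hours_per_month / hours_per_core:,.0f}")  # ~97,000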
Other accomplishments include joint activities with science groups, engagement with end-users, and training of students through the residential summer school. In 2012, 474 scientific papers were published that depended on use of OSG services and software; many of these are LHC results, and 20% are from outside physics. The number of users of the OSG has risen substantially over the past five years, with more than 2,000 end-users accessing the OSG computing resources. More than 160 students and 80 system administrators have attended technical training and education, and the number of university resources accessible through the OSG has risen from 40 to over 100.

The U.S. ATLAS and U.S. CMS communities have invested heavily in this fabric of services through collaboration and direct contribution. In a recent document the U.S. LHC management stated: "It is vital to the LHC program that the present level of service continue uninterrupted for the foreseeable future, and that all of the services and support structures upon which the LHC program relies today have a clear transition or continuation strategy." LIGO benefited from OSG software infrastructure to operate the LIGO Data Grid and from OSG services to share LIGO Data Grid computing resources with other communities. The Tevatron Run II experiments have increasingly adapted their legacy infrastructure to integrate with and depend on the services provided by the OSG. They are joined by several intensity- and cosmic-frontier particle physics experiments that have adopted OSG services. A number of additional science communities from physics, biology, chemistry, mathematics, medicine, computer science, and engineering have benefited from the services and software provided by the OSG. The structural biology community (SBGrid) at Harvard Medical School actively leverages the OSG.

Agency: National Science Foundation (NSF)
Institute: Division of Physics (PHY)
Type: Cooperative Agreement (Coop)
Application #: 0621704
Program Officer: Saul Gonzalez
Budget Start: 2006-09-01
Budget End: 2014-08-31
Fiscal Year: 2006
Total Cost: $15,692,445
Name: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715