The Extensible Terascale Facility (ETF) is the next stage in the evolution of NSF large-scale cyberinfrastructure for enabling high-end computational research. The ETF enables researchers to address the most challenging computational problems by utilizing the integrated resources, data collections, instruments, and visualization capabilities of nine resource partners. On October 1, 2004, the ETF concluded a three-year construction effort to create this distributed environment, called the TeraGrid (TG), and we are now entering the production operations phase.

The TeraGrid resource partners include: the University of Chicago/Argonne National Laboratory, the San Diego Supercomputer Center at UCSD, the Texas Advanced Computing Center at UT-Austin, the National Center for Supercomputing Applications at UIUC, Indiana University, Purdue University, Oak Ridge National Laboratory, and the Pittsburgh Supercomputing Center.

A separate proposal was submitted to NSF on October 19, 2004 for the TeraGrid Grid Infrastructure Group (GIG). Under the direction of Charlie Catlett at UC/ANL, the GIG will be responsible for coordinating development activities for the TeraGrid, with subcontracts to the partner sites. The resource partners (RP) will each have independent cooperative agreements with NSF but will work closely with the GIG to implement the vision of the TeraGrid.

This proposal outlines the Grid Infrastructure Group's (GIG) plans to participate as a System Management and Integration Group within the TeraGrid team, providing the Resource Partners and the scientific community with ongoing access to this computational science facility. This proposal covers the period November 1, 2004 through October 31, 2009.

TeraGrid, a world-class networking, computing, and storage infrastructure, has been built and deployed. This initiative now faces the challenge of further engaging the science and engineering community to guide the tailoring of this generic infrastructure to better support their needs, catalyzing new discoveries and broadening the base of computational science. TeraGrid integrates some of the nation's most powerful resources to provide high-capability production services to the scientific community. In addition, NSF supports common software through the NSF Middleware Initiative (NMI) and community-specific infrastructure through its Information Technology Research (ITR) projects.

The TeraGrid Grid Infrastructure Group (GIG) will build on these foundations to broaden the community benefiting from cyberinfrastructure and to harden and deepen TeraGrid's unique capabilities. Collaborating with 16 science partners, the GIG has developed infrastructure priorities to simplify research modalities that remain difficult (or infeasible) even with today's cyberinfrastructure. For example, the TeraGrid aims to make routine the following frequently requested but currently difficult tasks:

1. Drive complex workflows with multiple computational and data access steps across TeraGrid and smaller scale resources in other Grids in an integrated and automatic manner.

2. Harness TeraGrid resources in an on-demand mode, to provide computational decision-support for time-critical events ranging from weather to medical treatment.

3. Optimize turnaround, costs, and utilization by creating resource brokers that present a single point of access for scheduling computational and data management tasks across all TeraGrid resources, based on resource availability information (see the sketch below).

A five-year roadmap has been presented; recognizing that user needs continue to evolve in response to scientific opportunities, this roadmap will be reevaluated annually based on a widening set of science partner discussions.
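To make the broker concept in item 3 concrete, the following is a minimal sketch under stated assumptions: the resource names, the fields describing each site, and the load-based selection policy are all hypothetical, and no actual TeraGrid scheduler or grid middleware interface is modeled.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """Hypothetical description of one compute resource, built from the
    availability information a site might publish."""
    name: str
    total_cpus: int
    free_cpus: int          # currently unoccupied CPUs
    queue_wait_min: float   # estimated queue wait in minutes

class SimpleBroker:
    """Toy single point of access: choose the resource expected to give
    the best turnaround for a job of a given size."""

    def __init__(self, resources):
        self.resources = resources

    def select(self, cpus_needed):
        candidates = [r for r in self.resources if r.free_cpus >= cpus_needed]
        if not candidates:
            raise RuntimeError("no resource can currently satisfy the request")
        # Prefer the shortest estimated wait; break ties by spare capacity.
        return min(candidates, key=lambda r: (r.queue_wait_min, -r.free_cpus))

if __name__ == "__main__":
    grid = [
        Resource("site-a", total_cpus=1024, free_cpus=96,  queue_wait_min=40.0),
        Resource("site-b", total_cpus=2048, free_cpus=512, queue_wait_min=5.0),
        Resource("site-c", total_cpus=512,  free_cpus=0,   queue_wait_min=0.0),
    ]
    broker = SimpleBroker(grid)
    chosen = broker.select(cpus_needed=128)
    print(f"submit 128-CPU job to {chosen.name}")   # -> site-b in this example
```

A production broker would also weigh cost and data locality and would submit through the sites' actual job management services; the sketch only illustrates the "single point of access driven by availability information" idea.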

TeraGrid will encourage the scientific community to leverage this resource to tackle the most important computational problems in virtually every scientific discipline. The infrastructure and community-driven grid service bridges and portals, called science gateways, will bring increased productivity to a large number of scientists who have not heretofore used NSF's high-performance computing resources. The problems targeted by current and planned TeraGrid users are among the most computationally intensive areas of modern science and represent a class of problems that cannot be addressed effectively by either smaller-scale grid environments or stand-alone supercomputer centers. Leveraging software and infrastructure partners, the TeraGrid will develop the policies for software, security, and resource sharing necessary to underpin international cyberinfrastructure.
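As an illustration of the gateway idea, here is a minimal sketch assuming a hypothetical web-form request and a generic batch script: the form fields, directives, and function names are invented for this example, and the real submission, allocation, and community-account machinery of a production science gateway is not modeled.

```python
def build_batch_script(app, input_file, cpus, walltime):
    """Translate a domain-level request into a generic batch script.
    The #BATCH directives are illustrative, not any specific
    scheduler's syntax."""
    return "\n".join([
        "#!/bin/sh",
        "# job generated by a hypothetical science gateway",
        f"#BATCH --cpus={cpus}",
        f"#BATCH --walltime={walltime}",
        f"{app} --input {input_file}",
    ])

def submit_from_gateway(form):
    """A gateway user fills in a web form; the portal maps it to an HPC
    job so the user never sees queues, schedulers, or login nodes."""
    script = build_batch_script(
        app=form["application"],
        input_file=form["dataset"],
        cpus=form.get("cpus", 64),
        walltime=form.get("walltime", "02:00:00"),
    )
    print(script)  # a real gateway would hand this to a remote job service

if __name__ == "__main__":
    submit_from_gateway({"application": "weather-model",
                         "dataset": "storm_case.nc"})
```

The design point the sketch conveys is the one made above: the gateway layer absorbs the HPC-specific details so that domain scientists interact only with concepts from their own field.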

TeraGrid, NMI, ITR projects, and discipline-specific infrastructure projects will be integrated, forming a coherent cyberinfrastructure. This cyberinfrastructure will provide common software components and use the TeraGrid network as a national grid resource backplane, reaching thousands of scientists through science gateways and collaboration with other grids. Working with its software partners, the Grid Infrastructure Group intends to develop a set of policies and software that will be widely used by other grid projects, with an eye toward sustaining infrastructure beyond the end of this decade. The GIG will coordinate education, outreach, and training (EOT) initiatives across the nine TeraGrid resource provider sites to support a broad EOT program for cyberinfrastructure. We have set quantitative objectives for growing the TeraGrid user community by an order of magnitude: to 5,000 users by FY09.

To empower all TeraGrid users, the GIG has addressed heterogeneity and the policy requirements for unique national resources and high-availability production services, developing a coordinated software environment across these heterogeneous resources and a powerful verification and validation system. In a complementary approach to TeraGrid, community-oriented ITR projects such as the Grid Physics Network (GriPhyN) and Linked Environments for Atmospheric Discovery (LEAD) are addressing the scaling and software packaging capabilities necessary for aggregating many departmental-scale resources. Similarly, computational science projects at DOE, via the SciDAC initiative, and at NIH, via the NIH Roadmap, are also important components of the cyberinfrastructure landscape. Moreover, our collaborators in Europe, Asia-Pacific, and elsewhere are building scientific grid infrastructure in projects such as the UK e-Science Programme, Enabling Grids for E-Science in Europe (EGEE), and Japan's National Research Grid Initiative (NAREGI). The TeraGrid will partner with these and other grid projects, NSF's core centers program, and software providers such as the NMI GRIDS Center to catalyze an integrated NSF cyberinfrastructure program with cross-agency and international impact.

Project Report

Submitted 11/03/2011

The TeraGrid was an open cyberinfrastructure that enabled and supported leading-edge scientific discovery and promoted science and technology education. The TeraGrid comprised supercomputing and massive storage systems, visualization resources, data collections, and science gateways, connected by high-bandwidth networks, integrated by coordinated policies and operations, and supported by computational science and technology experts. The TeraGrid's objectives were accomplished via a three-pronged strategy: to support the most advanced computational science in multiple domains (deep impact), to empower new communities of users (wide impact), and to provide resources and services that can be extended to a broader cyberinfrastructure (open infrastructure). This "deep, wide, and open" strategy guided the development, deployment, operations, and support activities, thereby ensuring maximum impact on research and education across scientific communities.

At the project's conclusion, the TeraGrid had achieved an integrated, national-scale computational science infrastructure operated in a partnership comprising the Grid Infrastructure Group (GIG), eleven Resource Provider (RP) institutions, and six Software Integration partners, with funding from the National Science Foundation's (NSF) Office of Cyberinfrastructure (OCI). Initially created as the Distributed Terascale Facility (with four partners) through a Major Research Equipment (MRE) award in 2001, the TeraGrid began providing production computing, storage, and visualization services to the national community in October 2004. In August 2005, the NSF funded a five-year program to operate, enhance, and expand the capacity and capabilities of the TeraGrid to meet the growing needs of the science and engineering community through 2010. At the conclusion of the initial five years, a one-year extension was added, carrying the TeraGrid into 2011 and serving as an extended planning phase in preparation for the anticipated TeraGrid Phase III, eXtreme Digital (XD). Accomplishing this vision was crucial for the advancement of many areas of scientific discovery, for ensuring US scientific leadership, and, increasingly, for addressing important societal issues.

The TeraGrid achieved its purpose and fulfilled its mission through its three-pronged strategy:

Deep: ensure profound impact for the most experienced users through provision of the most powerful computational resources and advanced computational expertise, enabling transformational scientific discovery through leadership in high-performance computing (HPC) for high-end computational research.

Wide: enable scientific discovery by broader and more diverse communities of researchers and educators who can leverage the TeraGrid's high-end resources, portals, and science gateways, and increase the overall impact of the TeraGrid's advanced computational resources on larger and more diverse research and education communities through user interfaces and portals, domain-specific gateways, and enhanced support that facilitate scientific discovery without requiring users to become high-performance computing experts.

Open: facilitate simple integration with the broader cyberinfrastructure through the use of open interfaces, partnerships with other grids, and collaborations with other science research groups delivering and supporting open cyberinfrastructure facilities.
The TeraGrid's integrated resource portfolio evolved over the life of the project from an initial single integrated but distributed cluster to more than 20 HPC systems, several massive storage systems, and remote visualization resources, all supported by a dedicated interconnection network. This infrastructure was integrated at several levels: policy and planning, operational and user support, and software and services. The national and global user community that relied on the TeraGrid grew tremendously to more than 10,000 total lifetime users. To support the great diversity of research activities and their wide range of resource needs, the user support and operations teams leveraged expertise across all of the TeraGrid Resource Providers. In addition, users benefited greatly from our coordinated education, outreach, and training activities. The TeraGrid's diverse set of HPC resources provided a rich computational science environment, and these resources were available to the national academic community via a central allocations and accounting process. The project saw many varied resources come and go throughout its duration and, as it drew to a close, made way for the transition to the follow-on XD program.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Cooperative Agreement (Coop)
Application #: 0503697
Program Officer: Barry I. Schneider
Budget Start: 2005-08-01
Budget End: 2011-07-31
Fiscal Year: 2005
Total Cost: $60,387,459
Name: University of Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60637