The proposal requests funding to establish the participation of the Texas Advanced Computing Center (TACC) as a Resource Provider (RP) in the Extensible Terascale Facility, also referred to as the ETF or the TeraGrid. ETF is an integrated heterogeneous computing-communication-information system designed to provide the national science and engineering community with unparalleled access to secure state-of-the-art cyberinfrastructure resources and services. ETF seamlessly integrates the highest-end computing resources available to the open science community, including powerful and innovative systems at TACC, sophisticated scientific instruments, and diverse data collections, using software tools and services that enable their effective use.
Among large-scale comprehensive cyberinfrastructure projects, ETF pioneers the integration of state-of-the-art software services with the policies and procedures of autonomous open national computing centers and universities. As ETF moves into its five-year operations phase, its partner institutions will seek to: deliver the promise of convenient, reliable, secure, persistent computing, data storage, data collection, and real-time instrument capabilities; support user priorities that include new software services such as co-scheduling, meta-scheduling, parameter sweep tools, and advanced data management and handling; and implement and support science gateway services that engage a much larger number of the nation's scientists and engineers in computing-enabled research and education.
Working alongside TACC are eight other ETF partner organizations: Argonne National Laboratory, Indiana University, the National Center for Supercomputing Applications (NCSA), Oak Ridge National Laboratory, Pittsburgh Supercomputing Center (PSC), Purdue University, and the San Diego Supercomputer Center (SDSC).
Over a period of more than eight years, this award funded two distinct projects. The original award, which consumed most of the funding, supported the Texas Advanced Computing Center’s (TACC) participation in the TeraGrid project. The award funded several large-scale production resources to support scientific research, including part of an HPC system, a remote visualization platform, and high-speed networking hardware and connections. In addition to the hardware resources, the project funded support staff to operate these high-end resources and to develop novel interfaces that make it easier for researchers to use the NSF-funded large-scale systems, including the first TeraGrid User Portal. In the second phase of the project, additional funding was provided and the nature of the work shifted to a small team of investigators conducting research in computational analysis and visual analytics on large archival data sets from the National Archives and Records Administration. This report covers both phases of the project.
The TeraGrid portion of the award supported a large High Performance Computing (HPC) system, Lonestar 3, which was one of the first x86-based 64-bit platforms and used the then-new low-latency, high-bandwidth InfiniBand interconnect; both are now common in HPC cluster systems. The system ran for over four years, supported thousands of researchers in hundreds of projects, and delivered tens of millions of CPU hours to the NSF community. Maverick, the first large-scale system designed primarily for visualization for NSF researchers, was also supported by this grant. This visualization system provided a 512 GB shared-memory platform with 16 high-performance GPUs for large-scale rendering and remote visualization. Work on this system led to remote visualization tools and techniques that have since been refined and are still in use on recent visualization resources.
This project also included 10 Gigabit Ethernet networking to connect all of the TeraGrid-funded sites with high-bandwidth links that facilitate large data transfers between the systems. An existing Force10 switch was supported from the project funds and acted as the core switch connecting all of the TACC large-scale resources and the archive library to the other TeraGrid sites. This connectivity ran across the National Lambda Rail (NLR) network, and funds from this project were used to cover a portion of the costs of connecting to NLR.
Several software products also resulted from this funding, most importantly the TeraGrid User Portal. This was the first portal that allowed researchers with NSF allocations to manage resource allocations across all of the TeraGrid sites. The portal has been greatly enhanced over the years with additional functionality and now operates as the XSEDE User Portal.
From 2008 until December 2013, the project took on a new focus with new funding, and a multidisciplinary team at TACC conducted research in computational analysis and visual analytics for big archives processing with support from the National Archives and Records Administration. This research is particularly relevant today, in the era of big data, because the possibility of making unprecedented discoveries through data-intensive science in all fields of knowledge depends on the existence of organized, readily available, and documented data and records collections. This research is also relevant to government accountability and to the public's ability to find the documents and data produced by the federal and state governments. These are the topics that occupy archivists and records managers, and our work addressed solutions to this pressing problem by imagining and exploring the next generation of methods and tools to tackle big digital archives.
To address this problem, we built a visual analytics framework through which we tested different archives and data analysis functions, including collection content description, functional analysis, preservation assessment, authenticity and integrity, and organizational structure and context evaluation. This last phase of the project produced 12 publications in highly rated journals and conference proceedings, and was presented in more than 30 venues both nationally and internationally. Currently, research teams use the tool we developed to make sense of and organize their data collections, and the display system that we designed with TACC’s visualization group is being replicated in other labs in the country. Most importantly, this research is shaping new professionals in the field of archives and libraries, who are visiting TACC, learning from us, and seeking our advice on new ways of addressing big archives.