Large-scale science applications are expected to generate exabytes of data over the next 5 to 10 years. With scientific data collected at unprecedented volumes and rates, the success of large scientific collaborations will require that they provide distributed data access with reduced data access latencies and increased reliability to a large user community. To meet these requirements, scientific collaborations are increasingly replicating large datasets over high-speed networks to multiple sites. The main objective of this work is to develop and deploy a general-purpose data access framework for scientific collaborations that provides lightweight performance monitoring and estimation, fine-grained and adaptive data transfer management, and enforcement of site and Virtual Organization policies for resource sharing. Lightweight mechanisms will collect monitoring information from data movement tools without placing additional load on shared resources. Data transfer management mechanisms will select transfer properties based on each transfer's performance estimation and will adapt those properties when observed performance changes due to the dynamic load on storage, network, and other resources. Finally, policy-driven resource management using Virtual Organization policies for replication and resource allocation will balance user requirements for data freshness against the load on resources.

Intellectual merit: The team will produce a software framework that will improve the ability of distributed scientific collaborations to provide efficient access to replicated datasets for a large community of users; this framework will combine fine-grained transfer management, transfer advice from policy-driven resource management, and lightweight monitoring.

Broader impact: The proposed development will facilitate scientific advances in the many domains that increasingly depend on the replication and sharing of ever-growing datasets.

Project Report

The overall goal of the Adaptive Data Access and Policy-driven Transfer (ADAPT) project is to improve the performance of large data transfers between two sites connected by the Internet. Such transfers are common in distributed scientific collaborations. For example, a scientist may want to analyze a data set that is stored at one site, but the analysis itself must take place at a second site that has the required computational resources, such as a supercomputer or a large computing cluster. In such cases, the data set must first be copied to the computational site before the analysis can begin. With current data transfer technologies, these large transfers are often inefficient. The data set is frequently stored in many (hundreds or thousands of) files that must all be sent to the target site, and if the resources at the source and destination sites and on the network between them are not carefully managed, it is easy to overwhelm them. For example, the source site may try to send the data faster than the network or the destination site can transfer or store it. The result is poor transfer performance, in terms of latency (how long it takes for data to arrive) or throughput (the rate at which data is transferred).

The ADAPT project studied two techniques for improving the performance of these large data transfers. First, the project studied how to manage the resources available for transferring data more effectively. This resource management is based on the knowledge of the distributed scientific collaboration about how it wants to manage its data transfers (for example, that transfers between certain sites should get priority over other transfers) as well as any limitations known to individual sites (for example, that a site's storage systems can only transmit or receive data at a certain rate). Such preferences and limitations can be considered the policies of the underlying systems. The ADAPT project implemented a Policy Service that is responsible for allocating system resources based on these preferences.

Second, the ADAPT project improved the performance of individual data transfers over time by adapting their transfer characteristics based on how well the transfers are currently performing. One such characteristic is the number of parallel streams used to transfer a file. The ADAPT team enhanced existing data transfer software with capabilities to check recent performance and adapt transfer characteristics accordingly: if recent performance is good, the software increases its use of streams to achieve even greater bandwidth for the transfer; if recent performance is poor, it decreases its use of streams to avoid overwhelming the available resources.

The project developed open source software that is freely available, including (1) the Policy Service, which allocates resources based on the preferences of the sites and the distributed collaboration, and (2) the enhanced data transfer software, which adapts transfer characteristics based on recent performance. Figure 1 shows how these software components interact during data transfers, with each transfer client contacting the Policy Service to get an allocation of resources and then adapting the characteristics of the transfers that it performs within that allocation; a minimal sketch of this interaction appears below.
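To make the interaction in Figure 1 concrete, the following Python sketch illustrates the two mechanisms together. It is not the project's released code: the PolicyService and AdaptiveTransfer classes, the stream budget, and the adaptation thresholds are all illustrative assumptions, whereas the released software integrates with real data transfer tools and richer site and Virtual Organization policies.

    from dataclasses import dataclass

    @dataclass
    class Allocation:
        max_streams: int  # per-transfer cap granted under site/VO policy

    class PolicyService:
        # Toy stand-in: a real Policy Service would weigh collaboration
        # priorities and current load before granting an allocation.
        def __init__(self, site_stream_budget):
            self.budget = site_stream_budget

        def request_allocation(self):
            return Allocation(max_streams=self.budget)

    class AdaptiveTransfer:
        # Adjusts the parallel-stream count from recent throughput
        # samples, staying within the allocation the Policy Service gave.
        def __init__(self, allocation, min_streams=1):
            self.cap = allocation.max_streams
            self.floor = min_streams
            self.streams = min_streams
            self.best = 0.0  # best throughput observed so far (Mb/s)

        def on_interval(self, observed_mbps):
            # Additive increase while throughput holds up; multiplicative
            # decrease when it drops, to avoid overwhelming shared resources.
            if observed_mbps >= 0.95 * self.best:
                self.best = max(self.best, observed_mbps)
                self.streams = min(self.streams + 1, self.cap)
            else:
                self.streams = max(self.streams // 2, self.floor)
            return self.streams

    # Usage: one adaptation decision per monitoring interval.
    policy = PolicyService(site_stream_budget=8)
    transfer = AdaptiveTransfer(policy.request_allocation())
    for mbps in [200.0, 350.0, 420.0, 150.0, 300.0]:  # observed throughput
        print(transfer.on_interval(mbps))  # -> 2, 3, 4, 2, 1

The halve-on-degradation rule in this sketch mirrors the additive-increase, multiplicative-decrease pattern familiar from TCP congestion control: it probes cautiously for more bandwidth and backs off quickly when shared resources show signs of saturation.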
The ADAPT team ran experiments with these two techniques and demonstrated that they are very effective in improving the overall performance of large data transfers between two sites. The techniques work particularly well in environments where the available resources, either at the source and destination sites or on the network between them, are limited. Such limitations frequently occur in real systems: site resources may be shared by multiple users who are all performing competing data transfers and computations, and Internet bandwidth between two sites may be limited because multiple transfers compete for the available network resources. The ADAPT techniques efficiently allocate resources and use transfer adaptation to avoid overwhelming the resources that are available.

Figure 2 shows the results of one such experiment. In this scenario, a scientist transfers a large data set from a site in Korea over the Internet to a site in Oakland, CA, where an analysis will be performed on it. The red line shows the throughput (data transferred per second) over time using the adaptive techniques; the black line shows a conventional data transfer. The ADAPT techniques yield a 20% improvement in the overall time to complete the large transfer between the sites.

The results of the ADAPT project will have broad impact on the many scientific domains that need to transfer large amounts of data efficiently between sites over the Internet using limited, shared resources. These techniques are also relevant to other areas, including Software Defined Networking, where the resource allocation and adaptive transfer techniques could help improve the routing of network traffic.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant
Application #: 1127101
Program Officer: Kevin L. Thompson
Budget Start: 2011-08-01
Budget End: 2015-01-31
Fiscal Year: 2011
Total Cost: $600,000
Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089