Humankind's knowledge of the world, and its ability through science, technology, engineering, and mathematics (STEM) to manipulate that world for better quality of life and deeper understanding, increasingly depend on the ability to store, access, and manage extremely large persistent data sets representing scientific and process measurements, results from science and engineering simulations, and long-term knowledge. Supercomputers conventionally operate in two separate modes: one to perform computations in their temporary (ephemeral) main memory, and the other to supervise the use of large persistent data storage. As supercomputers grow larger, perhaps reaching exaflops scale by the end of this decade, keeping mass storage comparably scalable and easy to use becomes a severe challenge. This research will address the problems of efficiency and scalability of data migration through the vertical memory hierarchy and will unify the way both main-memory data objects and persistent storage data are named, creating a single, easy-to-use programming model. This will revolutionize data-intensive supercomputing and establish a new path toward future exascale system design and programming. The research is conducted in collaboration with Clemson University to provide a proof-of-concept system for evaluating the new concepts.

The semantic and performance barriers between computing in main memory and manipulating persistent data in mass storage have imposed significant limitations on performance and programmability. Because access latencies are uncertain, overheads are high, and data-access parallelism must be exploited for high throughput, a new relationship between ephemeral storage and persistent objects is needed, one that unifies their association and manages the asynchrony of operation while achieving high efficiency. This research is deriving an innovative execution model and developing a proof-of-concept experimental system to test and evaluate its underlying concepts for a new generation of persistent mass storage at extreme scale. It will address these challenges by unifying the semantics of ephemeral and mass storage through a single abstraction of data manipulation, and by integrating metadata and synchronization to manage asynchrony, uncertain response times, and logically conflicting accesses while automatically hiding latency. The new model will support dynamic data-path management for the asynchronous vertical storage hierarchy, exploiting adaptive, event-driven runtime techniques for enhanced efficiency and scalability, including management of the vertical transport of data, which demands an innovative strategy of dynamic control over the entire data path.

Project Report

The outcomes of the project fall into three primary categories: development of the foundational concepts governing the definition and manipulation of persistent objects, their implementation in the HPX (High-Performance ParalleX) runtime system layer, and performance evaluation using both synthetic and actual scientific applications. Each is briefly discussed below.

1. Principal Concepts

IU has developed innovative concepts to guide the unification of main memory and mass storage through the introduction of persistent storage objects, together with the software technologies to implement them. PXFS is a proof-of-concept system that unifies the name spaces and semantics of information manipulation in main memory and in disk mass storage, providing a single information storage framework. A persistent storage object is conceived to incorporate three critical properties: 1) structured naming consistent with that of regular variables, permitting similar access; 2) elements that may be distributed vertically across the storage layers; and 3) metadata integrating control constructs for event-driven management of computation to mitigate asynchronous behavior. Migration of persistent object elements across vertical layers is declarative, expressed in terms of defined behavioral properties, rather than imperative under programmer control; this eliminates programmer burden and exploits runtime information for efficient lateral and vertical distribution. The framework adds significant enhancements to the ParalleX execution model and to its implementation within the HPX-3 runtime system and the OrangeFS mass storage management system.

Unification of the name space is one of the two major contributions of this research: objects in main memory and objects in secondary storage are accessed through the same semantic constructs. The motivations are productivity and portability. Productivity is gained because the user need not micromanage the intimate details of data movement; portability is gained because the meaning of an operation is preserved across system scales, system types, and system generations.

The use of mass storage imposes three or more orders of magnitude of additional latency on data access. Worse, it creates uncertainty about what that latency will be, because previously used data may be cached in main memory buffers or in an arbitrary layer of the storage hierarchy below them. The result is a level of asynchrony comparable to that of a large distributed system (cloud). The persistent storage concept being developed addresses this asynchrony transparently: access requests presume the automatic use of the futures construct. A future, as opposed to the variable value for which it is a surrogate, immediately returns the equivalent of an IOU to the requesting agent when an access occurs. One of three things then happens: 1) the value of the persistent variable is already in main memory and is delivered; 2) the reference link is made available for use, such as in the creation of metadata for complex data structures where the value itself is not to be operated upon immediately; or 3) the value is needed and the calling sequence is suspended. In the last case, the future behaves as a Local Control Object whose state can be altered by external incident events, under certain circumstances resulting in the creation or renewal of a thread. For a persistent object, the future restarts the requesting thread when the value of the accessed variable becomes available from the mass storage subsystem.
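As an illustration of this access pattern, the following is a minimal C++ sketch of how a future-mediated read of a persistent object might look to the programmer. It is an assumption-laden sketch, not the project's actual interface: pxfs::async_read and its payload type are hypothetical names introduced here for illustration, and the example uses HPX's general-purpose async/future facilities to stand in for the PXFS machinery.

```cpp
// Hypothetical sketch of future-mediated access to a persistent object.
// pxfs::async_read is an illustrative stand-in, not the real PXFS API.
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>
#include <cstdio>
#include <string>
#include <vector>

namespace pxfs {
// Returns immediately with a future (the "IOU"); the value arrives later,
// once the storage subsystem has satisfied the request.
hpx::future<std::vector<char>> async_read(std::string const& name)
{
    return hpx::async([name] {
        // A real system would offload this to dedicated I/O threads and
        // fetch from the storage hierarchy; here we fabricate a payload.
        return std::vector<char>(name.begin(), name.end());
    });
}
}

int main()
{
    // The IOU is returned at once; the requesting thread keeps running
    // (cases 1 and 2: use the value if present, or just hold the reference).
    hpx::future<std::vector<char>> f = pxfs::async_read("dataset/particles");

    // ... other computation overlaps the (possibly long) storage access ...

    // Case 3: the value is finally needed. The calling thread suspends on
    // get() and is restarted by the runtime when the value becomes available.
    std::vector<char> data = f.get();
    std::printf("read %zu bytes\n", data.size());
    return 0;
}
```

The essential property is that suspension is per-thread and event-driven: the scheduler runs other work while the request is outstanding, which is how the model hides mass storage latency automatically.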
2. Implementation

The architecture of the implemented I/O subsystem is shown in Figure 1. The approach is fully integrated with the HPX runtime system and supports multiple file storage backends, in particular OrangeFS. The implementation permits user-level threads to issue what would ordinarily be blocking I/O calls; blocking is avoided by offloading the processing of these requests to dedicated OS-level I/O threads. This allows the scheduler to activate other runnable threads while the I/O operation executes, effectively overlapping computation with I/O.

Another important implementation aspect is support for metadata management based on a TupleSpace analogous to that introduced by the Linda language (Figure 2). The TupleSpace provides three operations, out, in, and rd, which respectively insert, retrieve (with removal), and read (non-destructively) tuple contents; a sketch of this interface appears after the performance discussion below. A special C++ construct, HPX::any, has been added to the runtime definitions to support building tuples that store objects of arbitrary types. The TupleSpace takes advantage of the Active Global Address Space to achieve transparent access to metadata objects, and indirectly to the objects they refer to, at system scale.

3. Performance

The performance of the implementation has been successfully tested using sequential access benchmarks that demonstrate significant advantages of the asynchronous HPX implementation over direct invocation of POSIX file calls (Figures 3 and 4) and over NFS file system access (Figure 5). Rewriting the checkpoint routines of existing applications such as GTC to adopt the new model also yields substantial performance benefits (Figure 6).
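To make the TupleSpace interface concrete, here is a minimal, single-process sketch of the Linda-style out/in/rd operations. It is illustrative only and makes simplifying assumptions: it uses std::any in place of the HPX any construct, matches tuples on a leading string key only, and omits the distribution over the Active Global Address Space that the actual implementation provides.

```cpp
// Minimal single-process sketch of a Linda-style TupleSpace (out/in/rd).
// Simplifications: std::any instead of HPX's any, matching on a string key,
// and no distribution across localities.
#include <any>
#include <condition_variable>
#include <iostream>
#include <list>
#include <mutex>
#include <string>
#include <vector>

using tuple_t = std::vector<std::any>;

class tuple_space {
    std::list<tuple_t> tuples_;
    std::mutex mtx_;
    std::condition_variable cv_;

    // Match on the first field (a string key) to keep the sketch short;
    // real Linda matching is structural over all fields.
    static bool matches(tuple_t const& t, std::string const& key) {
        return !t.empty() && t[0].type() == typeid(std::string)
            && std::any_cast<std::string const&>(t[0]) == key;
    }

public:
    // out: insert a tuple into the space.
    void out(tuple_t t) {
        { std::lock_guard<std::mutex> l(mtx_); tuples_.push_back(std::move(t)); }
        cv_.notify_all();
    }
    // in: retrieve a matching tuple, removing it; blocks until one exists.
    tuple_t in(std::string const& key) {
        std::unique_lock<std::mutex> l(mtx_);
        for (;;) {
            for (auto it = tuples_.begin(); it != tuples_.end(); ++it) {
                if (matches(*it, key)) {
                    tuple_t t = std::move(*it);
                    tuples_.erase(it);
                    return t;
                }
            }
            cv_.wait(l);  // suspend until out() adds a tuple
        }
    }
    // rd: read a matching tuple non-destructively; blocks until one exists.
    tuple_t rd(std::string const& key) {
        std::unique_lock<std::mutex> l(mtx_);
        for (;;) {
            for (auto const& t : tuples_)
                if (matches(t, key)) return t;  // returns a copy
            cv_.wait(l);
        }
    }
};

int main() {
    tuple_space ts;
    ts.out({std::string("blocksize"), 4096});
    std::cout << std::any_cast<int>(ts.rd("blocksize")[1]) << '\n';  // read
    ts.in("blocksize");  // destructive retrieval empties the space
    return 0;
}
```

In the actual system the same three operations are issued against globally addressable metadata objects, so a tuple deposited on one locality can be retrieved or read from any other.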

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1252358
Program Officer
Sylvia Spengler
Budget Start
2012-06-01
Budget End
2014-08-31
Fiscal Year
2012
Total Cost
$248,130
Name
Indiana University
City
Bloomington
State
IN
Country
United States
Zip Code
47401