Clemson University and the University of Alabama at Birmingham (UAB) jointly propose transformative computer science research on file systems for High-End Computing (HEC). The project fits the mission of NSF's EAGER program because its outcomes aim to replace current parallel file systems with a substantially more scalable Peer-to-Peer (P2P)-based storage system that will enable forthcoming exascale computing systems.
Combining P2P-based file sharing concepts with HEC requirements and concerns is a novel approach to large-scale computing under strict performance requirements. Studying exascale storage from first principles is an important class of research, in contrast to refactoring the existing file system approaches deployed on terascale and petascale systems.
Because the investigators seek to validate alternative designs that offer higher scalability, availability, integrity, and robustness than the logical evolution of existing HEC file systems and their instantiations on petascale architectures, this EAGER project will seek to produce a preliminary research prototype of a radically different file system. The two institutions will jointly study, design, and build this prototype as a distributed software infrastructure, with related techniques, that supports scalable and reliable file storage and retrieval for HEC and relies on a structured P2P network.
If this project is successful, the design, architecture, and strategy of exascale storage systems will depart substantially from the logical evolutionary extensions of existing file systems, and the key opportunities and barriers to creating practical exascale storage systems will be more clearly understood. Exascale co-design approaches will be considered and compared with the outcomes of this work, thereby informing other researchers of the relative merits of co-design for exascale when file systems are studied from first principles.
EAGER: Collaborative Research: A Peer-to-Peer based Storage System for High-End Computing
This project created a preliminary prototype of a Peer-to-Peer (P2P)-based file storage system designed to scale to upcoming exascale computing systems. Existing file system deployments suffer from poor scalability and from reliability problems, including long recovery times after corruption and long rebuild times. As these extreme-scale deployments grow larger, these problems will only worsen. Our prototype is based on a structured P2P network, a class of network well known for self-organization, scalability, reliability, and resilience to dynamic membership. It supports scalable and reliable file storage and retrieval for High-End Computing (HEC), and it provides functions for storing, retrieving, and removing files as well as for node join and departure.
We also studied network and file system security to discover possible performance improvements. We surveyed security approaches in existing high-performance file systems, analyzed the security properties and requirements of the P2P file system architecture, and tested the security and performance of existing authentication/authorization protocols.
Graduate and undergraduate students were trained to conduct research in distributed file storage systems and information security. Through extensive experimental studies and analysis, we obtained a good understanding of the factors affecting the performance of each project component. We implemented and evaluated the prototypes and deployed them on PlanetLab, and we measured how well the proposed mechanisms improve scalability and reliability in file storage systems for HEC.
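To make the prototype's listed functions (file store, retrieve, and remove; node join and departure) more concrete, the following is a minimal, single-process sketch of how a structured P2P overlay can map files to nodes. It assumes a consistent-hashing ring in the style of common structured overlays; the report does not say which overlay the prototype uses, and the names `Ring`, `ring_id`, and `RING_BITS` are hypothetical and not taken from the project. Routing, replication, and security are left out.

```python
import hashlib
from bisect import bisect_left, insort

RING_BITS = 32  # assumed size of the identifier space (illustrative only)


def ring_id(key: str) -> int:
    """Hash a node name or file name onto the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[: RING_BITS // 8], "big")


class Ring:
    """Toy in-memory model of a structured overlay (hypothetical, not the project's code)."""

    def __init__(self):
        self.node_ids = []  # node identifiers, kept sorted
        self.names = {}     # node identifier -> node name
        self.stores = {}    # node identifier -> {file name: data}

    def _owner(self, key_id: int) -> int:
        # A key belongs to the first node at or after it on the ring, wrapping around.
        if not self.node_ids:
            raise RuntimeError("ring has no nodes")
        i = bisect_left(self.node_ids, key_id)
        return self.node_ids[i % len(self.node_ids)]

    def join(self, name: str) -> None:
        nid = ring_id(name)
        if nid in self.stores:
            raise ValueError(f"identifier collision for {name!r}")
        insort(self.node_ids, nid)
        self.names[nid] = name
        self.stores[nid] = {}
        if len(self.node_ids) > 1:
            # Only the successor can hold files that now belong to the new node.
            succ = self.node_ids[(self.node_ids.index(nid) + 1) % len(self.node_ids)]
            for fname in list(self.stores[succ]):
                if self._owner(ring_id(fname)) == nid:
                    self.stores[nid][fname] = self.stores[succ].pop(fname)

    def leave(self, name: str) -> None:
        nid = ring_id(name)
        if nid not in self.stores:
            raise KeyError(name)
        if len(self.node_ids) == 1 and self.stores[nid]:
            raise RuntimeError("last node cannot leave while it holds files")
        self.node_ids.remove(nid)
        del self.names[nid]
        # Hand every file of the departing node to its new owner (the successor).
        for fname, data in self.stores.pop(nid).items():
            self.stores[self._owner(ring_id(fname))][fname] = data

    def locate(self, fname: str) -> str:
        return self.names[self._owner(ring_id(fname))]

    def store(self, fname: str, data: bytes) -> None:
        self.stores[self._owner(ring_id(fname))][fname] = data

    def retrieve(self, fname: str) -> bytes:
        return self.stores[self._owner(ring_id(fname))][fname]

    def remove(self, fname: str) -> None:
        del self.stores[self._owner(ring_id(fname))][fname]
```

In this sketch a join or departure moves only the files held by one neighboring node, which is the property that lets such overlays tolerate changing membership without global reorganization. For example, after `ring.join("node-a")` and `ring.store("results.dat", b"...")`, the call `ring.locate("results.dat")` returns the name of the node currently responsible for that file.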