This project aims to design, implement, and evaluate Platypus, a scalable parallel storage system for compute clusters. Platypus provides bandwidth guarantees to individual applications when several of them run on the same cluster simultaneously, and it supports a decoupled file prefetching mechanism that effectively overlaps CPU processing with disk access and reduces the burstiness of disk access streams. Platypus's disk QoS guarantee algorithm offers both long-term and short-term performance isolation among concurrently running parallel applications, and maximizes overall disk utilization by exploiting the slack in the QoS enforcement process. Platypus's file prefetching mechanism applies the concept of decoupled architecture, originally proposed to bridge the performance gap between CPU and memory, to achieve near-perfect disk prefetching, and can effectively mask both the disk I/O delay and the network delay associated with file accesses in parallel applications. To evaluate the effectiveness and efficiency of Platypus, the PIs propose to build a parallel file access trace player that allows researchers to evaluate parallel file systems against pre-collected traces in a scalable way.