For data-intensive applications, the I/O and storage layers are critical yet often overlooked. They can become a bottleneck not only for scalable performance but also for system utilization and the productivity of application scientists. Alongside computational capabilities, scalable I/O software and storage with the required capacity and performance must be developed to address the data-intensive nature of applications and to reap the performance and productivity benefits of high-end systems. This proposal entails research and development to address several parallel I/O problems in the HECURA initiative. In particular, its main goals are to design and implement novel I/O middleware techniques and optimizations, parallel file system techniques that scale to ultra-scale systems, techniques that efficiently enable newer APIs, including suggested extensions to POSIX for parallelism, and flexible I/O benchmarks that mimic the real, dynamic I/O behavior of science and engineering applications. The PIs propose innovative techniques that optimize data accesses by capturing high-level access patterns and passing that information through the middleware and file system layers to enable optimizations.
Specifically, the objectives are to (1) design and develop middleware I/O optimizations and a cache system that capture small, unaligned, irregular I/O accesses from a large number of processors and use access pattern information to optimize I/O; (2) incorporate these optimizations into MPICH2's MPI-IO implementation to make them available to a large number of users; (3) design scalable parallel file system techniques and optimizations, including a versioning parallel file system, programmable and adaptable consistency semantics, layout optimizations, and self-tuning capabilities; (4) design and evaluate enhanced APIs for file system scalability, particularly the recently proposed enhancements to the POSIX interface (API) that enable high-performance parallel I/O; and (5) develop flexible, execution-oriented, and scalable I/O benchmarks that mimic the I/O behavior of real science, engineering, and bioinformatics applications.
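
As a rough illustration of the idea behind objective (1), the sketch below shows one way a middleware cache might turn many small, unaligned requests into a few large, block-aligned ones. It is not the proposed design: the function names, the `(offset, length)` request representation, and the 4096-byte block size are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a request: (byte offset, length in bytes).
Extent = Tuple[int, int]


def align_to_blocks(offset: int, length: int, block_size: int) -> Extent:
    """Expand a request so it starts and ends on block boundaries."""
    start = (offset // block_size) * block_size
    # Ceiling division to round the end of the request up to a boundary.
    end = -(-(offset + length) // block_size) * block_size
    return start, end - start


def coalesce_requests(requests: Iterable[Extent], block_size: int) -> List[Extent]:
    """Merge small, unaligned requests into sorted, block-aligned extents.

    Requests that overlap or touch after alignment are combined, so the
    file system sees fewer and larger accesses.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    aligned = sorted(
        align_to_blocks(offset, length, block_size)
        for offset, length in requests
        if length > 0  # zero-length requests touch no data
    )

    merged: List[Extent] = []
    for start, length in aligned:
        end = start + length
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Overlapping or adjacent: extend the previous extent.
            prev_start, prev_length = merged[-1]
            new_end = max(prev_start + prev_length, end)
            merged[-1] = (prev_start, new_end - prev_start)
        else:
            merged.append((start, length))
    return merged


if __name__ == "__main__":
    # Four small requests, as might arrive from different processes.
    small = [(10, 20), (4100, 8), (35, 100), (9000, 50)]
    print(coalesce_requests(small, block_size=4096))
```

With these inputs the four requests fall in three consecutive 4096-byte blocks, so they collapse into a single aligned extent. A real middleware layer would also have to handle write ordering, cache coherence across processes, and the cost of reading the extra bytes introduced by alignment, none of which this sketch attempts.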