This project, developing an experimental high performance computing/storage platform based on GPGPU technologies (CUDA, CTM), integrates storage more closely with computation, while achieving higher performance and availability than current commercial high-end controllers. Emphasizing design, performance-analysis, integration, optimization, and construction, the work seeks to enable the construction of massive storage with high reliability, while remaining cost effective and high performing. A computation/storage hierarchy will be designed, studied with diverse components at small scale, optimized further, and finally constructed at full scale (reaching 250+ terabytes). GPU performance proves to be a ?disruptive technology? enabling the architecture to deliver I/O at full performance, something not possible with multicore x86 systems alone. High performance, high reliability parallel I/O will coexist with computation and be supported through pNSF, rather than being presented only as add-on SAN architecture or as limited direct attached storage. Introducing simpler controllers (JBOD plus failover) and using GPUs for offloading RAID computation and reconstruction, systemic abstraction barriers of separate controller and CPU subsystems should weaken. Since failures occur in practice at non-trivial rates in large installations, the system will be configurable for high performance with RAID configurations (e.g., three-disk-or-higher failure resiliency). The constructed large-scale system will be used to test reliability and performance. The ability to move computation closer to storage drives the experimental architecture that will support scalable scientific algorithms as well as modern Internet application back-ends. If successful, the architecture of the clusters and grid nodes might shift from x86-64 servers plus local RAID or SAN storage to this heterogeneous architecture while conserving programming paradigms (e.g., MPI-2 + Pthreads). Muticore x86-64 CPUs alone might be insufficient for achieving high availability, high performance storage and computation based on current and near-term COTS components. Furthermore, silicon optimizations that may converge GPUs with CPUs might not significantly diminish the long-term value of the proposed cluster architecture and associated systems integration work.