Future HPC systems must match the needs of users, which are not adequately defined by single measures such as floating-point operations per second. This is particularly true in the life and medical sciences, where the data sets are immense and computations are frequently not floating-point intensive. We will deploy a different computer architecture and different surrounding infrastructure (hardware, software, personnel) to explicitly address the emerging needs of the non-PDE-solving community, leveraging recent advances made on a prototype system, Shadowfax, at Virginia Tech. We intend to create an extensible FPGA-based cluster, as an expansion of existing FPGA modules, that is balanced with a data storage system to address the needs typical of life/medical science applications. We will optimize and tune a diverse sampling of critical life science implementations on three commercially available processor types (microprocessors, FPGAs and GPGPUs) to confirm the performance and general applicability of this approach. We will then make the system and all components available to the community, both as a resource on which to run computations and via an outreach program.
The specific aims are: 1) To optimize and evaluate the effectiveness of a Hybrid-core based cluster relative to standard microprocessors and GPGPUs, specifically for life and medical science applications that are non-floating-point and data- and memory-intensive, and to validate its utility and scalability for a set of the most in-demand compute-intensive applications that samples a diverse algorithm/application space. 2) To create a secure web-based portal through which life/medical science users can analyze their data. 3) To propagate our knowledge and experience via a training and internship program that will enable users, developers and systems personnel to use the prototype system or to locally replicate and support a similar system.
The novel and generalizable HPC approaches to be investigated in this project will potentially demonstrate which types of computations and data manipulations are the best matches for certain architectures, and will thus address the very different needs of the growing data-driven life/medical science research community. If successful, the computers and techniques established here could join the established network of floating-point-centric HPC computers to respond to this growing unmet need. By engaging commercial partners and focusing efforts on specific demonstration applications, we will immediately demonstrate community value and dramatically increase the speed with which these novel systems can be deployed and scaled.
This system will immediately be available to users, especially the many biomedical research groups that are generating a variety of '-omics' data and for whom limited access to tools running on appropriate systems constrains the accuracy and completeness of analysis, and thus its value. By propagating our quantitative findings via a number of channels, we will contribute to how biomedical data is analyzed and establish a basis upon which future computational production facilities can be designed to meet this specific community need.