High-performance computing (HPC) providers and applications need next-generation solutions for processing big data from scientific simulations. Conventional HPC systems in national laboratories and universities are built on a compute-centric paradigm, whereas enterprise big data analytics applications favor a data-centric paradigm such as MapReduce. The deep architectural differences between these two paradigms demand unconventional approaches. This project takes a radically different approach: it investigates the key architectural components of the compute-centric and data-centric paradigms, designs a transformative dual-purpose framework called Tadoop that resolves their bipolarity in storage and communication management, and unifies the two paradigms to serve both HPC and enterprise analytics applications.
Although high risk, the Tadoop framework can enable a transformative data infrastructure for both HPC and data analytics applications, with broader impacts in several areas: demonstrating how existing HPC infrastructures can become dual-purpose systems for computing and analytics, improving computer science curricula and instructional effectiveness, strengthening multidisciplinary data analytics research, releasing open-source software, and transferring the technology to commercial services.