This project is motivated by the successful deployment of eScience applications on clouds, which raises the question of how to deploy HPC analytics applications on the cloud. Both eScience applications and HPC analytics applications manipulate terascale or petascale data and require access to expensive computing resources. However, HPC analytics applications have several distinct characteristics, such as complex data access patterns and interest locality, which pose new challenges to their adoption in clouds.
The goal of this project is to develop a data-semantics-aware framework to enable HPC analytics on clouds. The framework is composed of three components: 1) a MapReduce API with data-semantics awareness, used to develop high-performance analysis applications; 2) a translation layer equipped with data-semantics-aware HPC interfaces; and 3) a data-affinity-aware data placement scheme. Cost-effective scientific data processing is expected to significantly improve productivity and deliver economic benefits. Delivering open source software to the community is expected to speed up the 21st-century scientific discovery process in HPC analytics areas such as cosmology, astrophysics, chromodynamics, and bioinformatics. Educational benefits are expected from collaboration with several UCF educational projects, and from external collaboration and community ties through integration into FutureGrid and the scientific computing cloud at the Department of Energy.