In this EAGER proposal we will explore new methodologies for matching algorithms and code to architectures, allowing the efficient execution of large, complex workflows using an ensemble of available computing architectures ranging from distributed-memory multi-core supercomputers to graphics processing units and high-performance data-intensive computing devices. In this exploratory work we will focus on matching computer architectures to computational needs and on tuning the Uncertainty Quantification methodology to the architectures at hand. We will first instrument our suite of tools to record computational, memory, data, and energy usage. These data will then be analyzed and modeled to create guidelines for predicting effective and efficient usage. The facilities available to us at the University at Buffalo Center for Computational Research (CCR) and through the XD framework of resource providers will allow us to conduct this investigation.
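As a rough illustration of the kind of instrumentation envisioned here (a minimal sketch, not the project's actual tooling: the function and file names are hypothetical, and energy usage would in practice require hardware counters not shown), a workflow task could be wrapped so that its wall-clock time, CPU time, peak memory, and output data volume are logged for later analysis and modeling:

import json
import os
import resource
import time

def instrumented(task, label, output_path=None, log_file="usage_log.jsonl"):
    """Run task() and append a resource-usage record to log_file."""
    t0 = time.perf_counter()
    cpu0 = time.process_time()
    result = task()
    usage = resource.getrusage(resource.RUSAGE_SELF)
    record = {
        "task": label,
        "wall_seconds": time.perf_counter() - t0,
        "cpu_seconds": time.process_time() - cpu0,
        # ru_maxrss is reported in kilobytes on Linux and bytes on macOS.
        "peak_rss": usage.ru_maxrss,
        "output_bytes": os.path.getsize(output_path) if output_path else None,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return result

# Hypothetical usage: wrap one stage of the tool suite in the recorder.
instrumented(lambda: sum(i * i for i in range(10**6)), label="demo_stage")

Records of this form, accumulated over many runs, are the raw material for the usage-prediction guidelines described above.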
Intellectual Merit

The project focused on developing new methods for optimizing computational and data analytics workflows for analysis of the outcomes of large-scale simulations. Such complex inference from simulation ensembles used in Uncertainty Quantification leads to the twin computational challenges of managing large amounts of data and performing CPU-intensive computing. While algorithmic innovations using surrogates, localization, and parallelization can make the problem feasible, one is still left with very large data and compute tasks. The problem of dealing with large data is compounded when data warehousing and data mining are intertwined with computationally expensive tasks. We developed an approach to solving this problem that uses a mix of hardware suitable for each task in a carefully orchestrated workflow. In particular, we focused on the problem of estimating risk from volcanic mass flow hazards using large ensembles of simulations (O(2000) simulations), each of which generates more than 2 GB of data. The sample computing environment is essentially an integration of a Netezza database appliance and a high-performance cluster. It is based on the simple idea of segregating the data-intensive and compute-intensive tasks and assigning the right architecture to each; a schematic sketch of this idea is given at the end of this section. The computing model and the new computational scheme were adopted to generate probabilistic hazard maps an order of magnitude faster than earlier practice – from a week to a day. We also worked with cheaper, more readily available hardware platforms, for which the savings were still significant.

Broader Impacts

While the volcanic hazard analysis tools developed directly in this effort will help emergency managers, the uncertainty quantification methods generated here can benefit many other types of hazards, from floods to hurricanes, and even failure analysis calculations. During the project we supported three graduate students, including one female student, in the course of their doctoral studies. Work was disseminated not only at major national meetings but also through customized training for students and post-doctoral fellows. During the project we also hosted students from France, Colombia, Japan, and China.
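The following minimal sketch illustrates the task-segregation idea described under Intellectual Merit. All names are hypothetical stand-ins rather than the project's actual software, and the submit functions merely mark where a stage would be handed to a batch scheduler on the cluster or pushed to the database appliance.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    kind: str          # "compute" or "data"
    run: Callable[[], object]

def submit_to_cluster(stage):
    # Placeholder: in practice this would submit a batch job to the HPC cluster.
    print(f"[cluster] running compute-intensive stage: {stage.name}")
    return stage.run()

def submit_to_warehouse(stage):
    # Placeholder: in practice this would issue a query to the database appliance.
    print(f"[warehouse] running data-intensive stage: {stage.name}")
    return stage.run()

def run_workflow(stages):
    """Route each stage to the architecture suited to its dominant cost."""
    results = {}
    for stage in stages:
        dispatch = submit_to_cluster if stage.kind == "compute" else submit_to_warehouse
        results[stage.name] = dispatch(stage)
    return results

# Hypothetical hazard-map workflow: simulate, aggregate, map.
run_workflow([
    Stage("run_flow_simulations", "compute", lambda: "ensemble outputs"),
    Stage("aggregate_flow_depths", "data", lambda: "exceedance statistics"),
    Stage("build_hazard_map", "compute", lambda: "probabilistic hazard map"),
])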