Future computing systems will perform computations with massive parallelism. However, current practice in scientific simulation cannot exploit the full potential of these machines. This is fundamentally due to the need for data synchronization across computing cores, which can leave up to 80% of a machine idle; new methods are needed to overcome this inefficiency.

This research develops a new framework for high performance computing in which synchronization across the processing elements is relaxed. Relaxing synchronization eliminates the overhead associated with extreme parallelism and potentially lays the foundation for simulations at scale. The framework is based on an asynchronous model of computation designed to better utilize future systems. The price to pay for asynchrony is poor predictability of the code, resulting in uncertainty in the calculations. The central theme of this project is to accurately quantify this induced uncertainty and to develop techniques to mitigate it. Specific research thrusts include: i) study of the numerical stability, consistency and accuracy of widely used numerical algorithms, in the context of fluid flow, under asynchronous conditions; ii) development of new schemes that maintain their accuracy under asynchronous conditions; iii) determination of efficient implementations of the resulting algorithms on current and future systems. These research goals are addressed in a dynamical systems framework: the behavior of asynchronous numerical algorithms is modeled as a Markov jump system, and issues related to consistency, numerical stability and error control are addressed using tools for the analysis and design of Markov jump systems.
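To make the modeling idea concrete, a minimal sketch of a discrete-time Markov jump linear system x_{k+1} = A[sigma_k] x_k is shown below; the matrices, transition probabilities, and variable names are illustrative assumptions, not the project's actual model.

```python
import numpy as np

# Two-mode Markov jump linear system: at each step the dynamics matrix is
# chosen by a Markov chain, modeling, e.g., "message arrived" vs. "message
# delayed" states of an asynchronous iteration. (Illustrative values only.)
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),   # mode 0: contractive dynamics
     np.array([[1.1, 0.0], [0.2, 0.7]])]   # mode 1: locally expansive dynamics
P = np.array([[0.7, 0.3],                  # P[i, j]: probability of jumping
              [0.4, 0.6]])                 # from mode i to mode j

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])                   # initial state
mode = 0
for _ in range(50):
    x = A[mode] @ x                        # apply the current mode's dynamics
    mode = rng.choice(2, p=P[mode])        # Markov jump to the next mode
# x is now one random sample path; analysis works with the distribution of x.
```

Because the mode sequence is random, the state is a random process; the analysis tools referenced above work at the level of its probability density rather than a single trajectory.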

Project Report

The major goal of this project was the analysis of numerical algorithms under asynchronous communication between processing elements. Specifically, our goals were to: i) study the numerical stability, consistency and accuracy of widely used numerical algorithms under asynchronous conditions; ii) explore new schemes that maintain their accuracy under asynchronous conditions; iii) validate these ideas on current and future large-scale computing systems. These research goals were addressed in a dynamical systems framework, with issues related to consistency, numerical stability and error control addressed using tools for the analysis of Markov jump systems. The project had the following main thrusts.

1) Performance and robustness analysis of stochastic jump linear systems. This work focused on general stochastic jump systems that are applicable to asynchronous algorithms. In the presence of stochastic jumps, state variables evolve as random processes with associated time-varying probability density functions. Consequently, system analysis is performed at the density level, and a proper metric is necessary to quantify system performance. In this work, the Wasserstein metric, which measures the distance between probability density functions, was employed to develop new results for the performance analysis of stochastic jump linear systems. We presented a novel "Split-and-Merge" algorithm for the propagation of state uncertainty in such systems, providing a unifying framework for the performance and robustness analysis of general stochastic jump linear systems.

2) Optimal switching synthesis of jump linear systems. In this work, a new method was developed to design an optimal switching sequence for jump linear systems with given Gaussian initial state uncertainty.
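Since the performance analysis in thrust 1 measures distances between probability densities with the Wasserstein metric, a small sketch may help: for Gaussian densities the 2-Wasserstein distance has a well-known closed form. The function names below are illustrative, not from the project's code.

```python
import numpy as np

def _sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def wasserstein2_gaussian(m1, S1, m2, S2):
    """Closed-form 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2})."""
    root = _sqrtm_psd(S2)
    cross = _sqrtm_psd(root @ S1 @ root)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(max(d2, 0.0))

m0, S0 = np.zeros(2), np.eye(2)
m1 = np.array([3.0, 4.0])
d = wasserstein2_gaussian(m0, S0, m1, S0)  # equal covariances: distance = ||m0 - m1|| = 5
```

For jump linear systems the densities stay Gaussian mixtures rather than single Gaussians, which is where a split-and-merge style propagation of mixture components becomes necessary.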
In the context of this project, optimal switching sequence synthesis can be thought of as randomized scheduling with beneficial effects. In this study, we considered two algorithms with different computational complexity and determined a scheduling policy, i.e., probabilities of execution for each algorithm. It was shown that it is possible to nearly achieve the performance of the high-fidelity algorithm by executing both algorithms aperiodically. The synthesis was performed in a receding horizon control framework. The numerical algorithm with randomized scheduling was modeled as a stochastic jump linear system, and the performance metric was defined over the probability density function of the system state variables.

3) A switched system framework for the analysis of asynchronous high performance computing algorithms. In this work, we studied the massively parallel solution of the 1D heat equation with relaxed communication between processing elements, which results in asynchronous execution. We treated the computation in each processing element as a dynamical system and modeled the asynchrony as stochastic jumps; the overall system was then analyzed as a switched dynamical system. However, modeling massively parallel numerical algorithms as jump dynamical systems results in a large number of modes, which makes current analysis tools computationally intractable. In this work, we also presented new techniques that circumvent this scalability issue. The proposed analysis framework was verified by solving the partial differential equation on an NVIDIA Tesla GPU, with asynchronous communication between cores. Key accomplishments in this thrust are: i) stability analysis, or quantification of the steady-state error with respect to the synchronous solution; ii) convergence-rate analysis of the expected value of this error; iii) a probabilistic bound on this error.
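The relaxed-communication setting of thrust 3 can be illustrated with a toy sketch, not the project's GPU implementation: two "processing elements" each advance half of a 1D heat equation grid with an explicit scheme, and with some probability read a stale (one-step-old) boundary value from their neighbor instead of waiting for the freshest one. The grid size, delay probability, and two-subdomain split are assumptions for illustration.

```python
import numpy as np

N, steps, alpha = 64, 200, 0.25            # grid points, time steps, diffusion number (<= 0.5)
rng = np.random.default_rng(1)
u = np.sin(np.pi * np.arange(N) / (N - 1)) # initial condition; u = 0 at both ends
half = N // 2                              # PE0 owns points 0..half-1, PE1 owns half..N-1
stale_from_pe0, stale_from_pe1 = u[half - 1], u[half]  # last exchanged boundary values
p_delay = 0.3                              # probability a PE reads a stale neighbor value

for _ in range(steps):
    un = u.copy()
    # Relaxed synchronization: with probability p_delay a PE uses the stale value.
    ghost0 = stale_from_pe1 if rng.random() < p_delay else u[half]
    ghost1 = stale_from_pe0 if rng.random() < p_delay else u[half - 1]
    # Explicit (FTCS) update on each subdomain, ghost values at the interface.
    un[1:half - 1] = u[1:half - 1] + alpha * (u[2:half] - 2 * u[1:half - 1] + u[:half - 2])
    un[half - 1] = u[half - 1] + alpha * (ghost0 - 2 * u[half - 1] + u[half - 2])
    un[half] = u[half] + alpha * (u[half + 1] - 2 * u[half] + ghost1)
    un[half + 1:-1] = u[half + 1:-1] + alpha * (u[half + 2:] - 2 * u[half + 1:-1] + u[half:-2])
    stale_from_pe0, stale_from_pe1 = u[half - 1], u[half]  # values "sent" this step
    u = un
```

Each realization of the random delays gives a different mode sequence, which is exactly the switched-system structure analyzed in this thrust.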
4) Numerical properties of asynchronous PDE solvers. In this work, we analyzed the overall stability, consistency and accuracy of commonly used finite difference schemes. We found that, while the schemes remain stable and consistent under asynchrony, their accuracy is significantly affected. To take into account the stochastic nature of the method, we developed a statistical framework to analyze errors both theoretically and from numerical experiments. We observed that the overall accuracy of the solution is reduced to first order under asynchrony, irrespective of the accuracy of the original schemes under synchronous conditions. We were able to trace the source of the error to time delays in communication, and used this insight to design a novel algorithm for obtaining asynchrony-tolerant finite difference schemes.

5) Implementation of the asynchronous method for massively parallel PDE applications. To fully exploit the asynchrony-tolerant schemes, we developed a computing algorithm that relaxes synchronization at several levels. We utilized the latest remote memory access (RMA) programming schemes for communication, which have been shown to provide significant speedup on modern supercomputers. Because the processing elements do not synchronize messages, it is important to ensure atomicity of the values in the communication windows and to determine the status of the messages. To address these issues, we designed a "ring-buffer" memory scheme for accessing the communication windows. An implementation of this algorithm for a simple problem was analyzed. Results show that asynchronous computations provide a significant speedup that scales almost linearly on parallel machines at realistic scales.
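The ring-buffer idea in thrust 5 can be sketched in a few lines; the class, slot layout, and iteration tags below are assumptions for illustration, and a real implementation would operate on one-sided (MPI RMA) windows rather than Python objects. Each message is deposited into slot iteration mod B and tagged with its iteration number, so a reader can take the newest available value, possibly stale, without waiting, and can tell which iteration it came from.

```python
class RingWindow:
    """Illustrative ring-buffer indexing for a one-sided communication window."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size          # one slot per in-flight message
        self.latest = -1                    # iteration tag of the newest completed write

    def put(self, iteration, value):
        """Writer side: deposit the value for an iteration into slot iteration % size.
        Tagging the slot with the iteration lets readers detect message status."""
        self.slots[iteration % self.size] = (iteration, value)
        self.latest = iteration

    def get_latest(self):
        """Reader side: return the newest available (iteration, value) without
        waiting; the result may lag the writer by several iterations."""
        if self.latest < 0:
            return None
        return self.slots[self.latest % self.size]

w = RingWindow(4)
w.put(0, "a")
w.put(1, "b")
newest = w.get_latest()                     # (1, "b"): newest value, no blocking
```

The buffer must be deep enough to cover the maximum allowed delay, so that a slot is never overwritten while a reader could still legitimately consume it.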

Agency
National Science Foundation (NSF)
Institute
Division of Computing and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
1349017
Program Officer
Almadena Chtchelkanova
Project Start
Project End
Budget Start
2013-09-01
Budget End
2014-08-31
Support Year
Fiscal Year
2013
Total Cost
$10,000
Indirect Cost
Name
Georgia Tech Research Corporation
Department
Type
DUNS #
City
Atlanta
State
GA
Country
United States
Zip Code
30332