Debugging MPI and Hybrid-Heterogeneous Applications at Scale
Date: Sunday, November 11, 2012, 08:30 AM - 05:00 PM
Description: MPI programming is error-prone because of the complexity of MPI semantics and the inherent difficulty of parallel programming. These difficulties are exacerbated by increasing heterogeneity (e.g., MPI plus OpenMP/CUDA), the scale of parallelism, non-determinism, and platform-dependent bugs. This tutorial covers the detection and correction of errors in MPI programs as well as in heterogeneous/hybrid programs. We will first introduce our main tools: MUST, which detects MPI usage errors at runtime with a high degree of automation; ISP/DAMPI, which detects interleaving-dependent MPI deadlocks through application replay; and DDT, a parallel debugger that can debug at large scale. We will illustrate advanced MPI debugging using an example that models heat conduction. Attendees will be encouraged to explore our tools early in the tutorial to better appreciate their strengths and limitations. We will also present best practices and a cohesive workflow for thorough application debugging with all our tools. Leadership-scale systems such as Titan (ORNL) and Sequoia (LLNL) increasingly require hybrid/heterogeneous programming models. To address this, we will present debugging approaches for MPI, OpenMP, and CUDA in a dedicated part of the afternoon session. DDT's capabilities for CUDA/OpenMP debugging will be presented, and we will touch on the highlights of GKLEE, a new symbolic verifier for CUDA applications.
Links: Official link from SC12