Static Deadlock Detection in MPI Synchronization Communication
Liao Ming-Xue, He Xiao-Xin, Fan Zhi-Hua
Institute of Software, the Chinese Academy of Sciences, 100080, China
liaomingxue@sohu.com
Abstract
I
t is very common to use dynamic methods to detect
deadlocks in MPI programs for the reason that static
methods have some restrictions. To guarantee high reliability
of some important MPI-based application software, a model
of MPI synchronization communication is abstracted and a
type of static method is devised to examine deadlocks in such
modes. The model has three forms with different complexity:
sequential model, single-loop model and nested-loop model.
Sequential model is a base for all models. Single-loop model
must be treated with a special type of equation group and
nested-loop model extends the methods for the other two
models. A standard Java-based software framework
originated from these methods is constructed for determining
whether MPI programs are free from synchronization
communication deadlocks. Our practice shows the software
framework is better than those tools using dynamic methods
because it can dig out all synchronization communication
deadlocks before an MPI-based program goes into running.
Keywords: Message Passing Interface (MPI);
deadlock; static method; model; synchronization
communication
1. Introduction
Deadlock is a very common problem in software
designing and it may cause a running software
program to break down. Deadlock in big application
may even result into great loss, for example, the
deadlock happened in the control software on NASA’s
Pathfinder landed on Mars
0. In 1971 Coffman
addressed three strategies to process deadlocks:
deadlock prevention, deadlock avoidance and deadlock
detection and recovery
0. Deadlock prevention and
avoidance have many deficiencies so that they are
often used in systems which require high reliability
000. MPI 0 is a library specification for message-
passing, proposed as a standard by a broadly based
committee of vendors and users. It was designed for
high performance on both massively parallel machines
and on workstation clusters, however, it is very
difficult to debug software programs based on it
00.
Currently there are a few tools based on dynamic
methods to debug errors in MPI programs, especially
to detect deadlocks in them
0000. Both 0 and 0 need to
insert some hand-shake codes into user’s source
programs to gather status of nodes to determine a
deadlock. W. Haque utilizes MPI Profiling interface to
intercept all MPI routine calls in order to check
deadlocks
0. 0 is to find kinds of errors in MPI
programs and uses MPI Profiling interface too,
however, its interest covers not only deadlocks but also
other kinds of errors.
However, these dynamic deadlock detection
methods have a deadly deficiency that we can do
nothing but suffering oncoming disaster when the
deadlock happens. Systems requiring high reliability
can not suffer this deficiency. For example, if an on-
satellite cluster for monitoring rural flood or crops
breaks down from a deadlock the life and economic
loss will be innumerous. Therefore static methods are
necessary to be developed to serve in such
environments.
This paper introduces a static method to detect
deadlocks in MPI synchronization communication.
This static method is totally different from a static
method in
0 which is based on techniques of searching
finite state machine.
The second section defines a model of MPI
synchronization communication. The model takes
three different forms which are sequential model (S-
Model), single loop model (L0) and nested loop model
(L2). S-Model is the most basic models. We need to
transform L0 and L2 into S-Model at appropriate time
in order to detect deadlocks in them. To detect
deadlocks in L0 involves a special type of equation
group called ratio equation group. Algorithm for L2
deadlock detection is a combination of methods for L0
and S-Model.
Section 3 demonstrates how to examine deadlocks
in a sequential model and section 4 and 5 follows to
discuss L0 and L2. Section 6 gives an overview of our
software framework for determining whether MPI
programs are free from synchronization
communication deadlocks and concludes this paper.