MUltifrontal Massively Parallel Solver
(MUMPS 5.1.1)
Users’ guide
∗
March 20, 2017
Abstract
This document describes the Fortran 95 and C user interfaces to MUMPS5.1.1. We describe in detail
the data structures, parameters, calling sequences, and error diagnostics. Basic example programs using
MUMPS are also provided.
First improvements regarding low-rank compression (Block-Low Rank, or BLR) format are now
available in public releases of MUMPS.
∗
Information on how to obtain updated copies of MUMPS can be obtained from the Web page http://mumps-solver.org/
1
Contents
1 Introduction 5
2 Notes for users of previous versions of MUMPS 6
2.1 ChangeLog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Binary compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Upgrading between minor releases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Upgrading from MUMPS 5.0.2 to MUMPS 5.1.1 . . . . . . . . . . . . . . . . . . . 8
2.4.1 Changes on installation issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.2 Selective 64-bit integer feature . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Upgrading from MUMPS 4.10.0 to MUMPS 5.0.2 . . . . . . . . . . . . . . . . . . 9
2.5.1 Interface with the Metis and ParMetis orderings . . . . . . . . . . . . . . . . . 9
2.5.2 Interface with the SCOTCH and PT-SCOTCH orderings . . . . . . . . . . . . . 9
2.5.3 ICNTL(10): iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.4 ICNTL(11): error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5.5 ICNTL(20): sparse right-hand sides . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.6 ICNTL(4): Control of the level of printing . . . . . . . . . . . . . . . . . . . . 10
2.5.7 License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Main functionalities of MUMPS 5.1.1 10
3.1 Input matrix format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Post-processing facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3.1 Iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3.2 Error analysis and statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Null pivot detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.5 Computation of a solution of a deficient matrix and of a null space basis . . . . . . . . . 13
3.6 Solving the transposed system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.7 Forward elimination during factorization . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.8 Arithmetic versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.9 Numerical pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.10 The working host processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.11 MPI-free version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.12 Combining MPI and multithreaded parallelism . . . . . . . . . . . . . . . . . . . . . . 15
3.13 Out-of-core facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.14 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.15 Computing selected entries of A
−1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.16 Reduce/condense a problem on an interface (Schur complement and reduced/condensed
right-hand side) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.17 Block Low Rank (BLR) multifrontal factorization . . . . . . . . . . . . . . . . . . . . 17
4 User interface and available routines 18
5 Application Program Interface 21
5.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.1.1 Initialization, Analysis, Factorization, Solve, Termination (JOB) . . . . . . . . . 21
5.1.2 Version number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.1.3 Control of parallelism (COMM, PAR) . . . . . . . . . . . . . . . . . . . . . . . 22
5.2 Input Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2.1 Matrix type (SYM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2.2 Matrix format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2.2.1 Centralized assembled matrix (ICNTL(5)=0 and ICNTL(18)=0). . 24
5.2.2.2 Distributed assembled matrix (ICNTL(5)=0 and ICNTL(18)=1,2,3). 25
5.2.2.3 Elemental matrix (ICNTL(5)=1 and ICNTL(18)=0). . . . . . . . . 26
5.2.3 Writing the input matrix to a file . . . . . . . . . . . . . . . . . . . . . . . . . 27
2
5.3 Preprocessing: permutation to zero-free diagonal and scaling . . . . . . . . . . . . . . . 27
5.3.1 Permutation to a zero-free diagonal (ICNTL(6)) . . . . . . . . . . . . . . . . 29
5.3.2 Scaling (ICNTL(6) or ICNTL(8)) . . . . . . . . . . . . . . . . . . . . . . . 29
5.4 Preprocessing: symmetric permutations . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.4.1 Symmetric permutation vector (ICNTL(7) and ICNTL(29)) . . . . . . . . . 31
5.4.2 Given ordering (ICNTL(7)=1 and ICNTL(28)=1) . . . . . . . . . . . . . . . 32
5.5 Post-processing: iterative refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.6 Post-processing: error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.7 Out-of-core (ICNTL(22)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.8 Workspace parameters (ICNTL(14) and ICNTL(23)) and user workspace . . . . . . 35
5.9 Null pivot row detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.10 Discard matrix factors (ICNTL(31)) . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.11 Computation of the determinant (ICNTL(33)) . . . . . . . . . . . . . . . . . . . . . 39
5.12 Forward elimination during factorization (ICNTL(32)) . . . . . . . . . . . . . . . . . 40
5.13 Right-hand side and solution vectors/matrices . . . . . . . . . . . . . . . . . . . . . . . 41
5.13.1 Dense right-hand side (ICNTL(20)=0) . . . . . . . . . . . . . . . . . . . . . 42
5.13.2 Sparse right-hand side (ICNTL(20)=1,2,3) . . . . . . . . . . . . . . . . . . . 42
5.13.3 A particular case of sparse right-hand side: computing entries of A
−1
(ICNTL(30)=1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.13.4 Centralized solution (ICNTL(21)=0) . . . . . . . . . . . . . . . . . . . . . . 44
5.13.5 Distributed solution (ICNTL(21)=1) . . . . . . . . . . . . . . . . . . . . . . 45
5.14 Schur complement with reduced or condensed right-hand side (ICNTL(19) and
ICNTL(26)) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.14.1 Centralized Schur complement stored by rows (ICNTL(19)=1) . . . . . . . . 46
5.14.2 Distributed Schur complement (ICNTL(19)=2 or 3) . . . . . . . . . . . . . . 46
5.14.3 Centralized Schur complement stored by columns (ICNTL(19)=2 or 3) . . . . 48
5.14.4 Using partial factorization during solution phase (ICNTL(26)= 0, 1 or 2) . . . 49
5.15 Block Low Rank (BLR) factorization (ICNTL(35) and CNTL(7)) . . . . . . . . . 50
5.15.1 Enabling the BLR functionality at installation . . . . . . . . . . . . . . . . . . 50
5.15.2 Application Program Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.15.3 BLR output: statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Control parameters 53
6.1 Integer control parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Real/complex control parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3 Compatibility between options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Information parameters 72
7.1 Information local to each processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.2 Information available on all processors . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8 Error diagnostics 77
9 Calling MUMPS from C 80
9.1 Array indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.2 Issues related to the C and Fortran communicators . . . . . . . . . . . . . . . . . . . . 82
9.3 Fortran I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.4 Runtime libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.5 Integer, real and complex datatypes in C and Fortran . . . . . . . . . . . . . . . . . . . 83
9.6 Sequential version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10 Scilab and MATLAB/Octave interfaces 83
11 Examples of use of MUMPS 85
11.1 An assembled problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
11.2 An elemental problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
11.3 An example of calling MUMPS from C . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3
1 Introduction
MUMPS (“MUltifrontal Massively Parallel Solver”) is a package for solving systems of linear equations of
the form Ax = b, where A is a square sparse matrix that can be either unsymmetric, symmetric positive
definite, or general symmetric, on distributed memory computers. MUMPS implements a direct method
based on a multifrontal approach which performs a Gaussian factorization
A = LU (1)
where L is a lower triangular matrix and U an upper triangular matrix. If the matrix is symmetric then
the factorization
A = LDL
T
(2)
where D is block diagonal matrix with blocks of order 1 or 2 on the diagonal is performed. We refer the
reader to the papers [9, 10, 13, 28, 29, 33, 40, 31, 32, 18, 1, 45, 15, 41, 3, 35, 47, 19, 36, 44] for full details
of the techniques used, algorithms and related research.
The system Ax = b is solved in three main steps:
1. Analysis.
During analysis, preprocessing (see Subsection 3.2), including an ordering based on the
symmetrized pattern A + A
T
, and a symbolic factorization are performed. During the symbolic
factorization, a mapping of the multifrontal computational graph, the so called elimination tree
[38], is computed and used to estimate the number of operations and memory necessary for
factorization and solution. Both parallel and sequential implementations of the analysis phase are
available. Let A
pre
denote the preprocessed matrix (further defined in Subsection 3.2).
2. Factorization.
During factorization A
pre
= LU or A
pre
= LDL
T
, depending on the symmetry of the
preprocessed matrix, is computed. The original matrix is first distributed (or redistributed)
onto the processors depending on the mapping computed during the analysis. The numerical
factorization is then a sequence of dense factorization on so called frontal matrices. In addition to
standard threshold pivoting and two-by-two pivoting (not so standard in distributed memory codes)
there is an option to perform static pivoting. The elimination tree also expresses independency
between tasks and enables multiple fronts to be processed simultaneously. This approach is
called multifrontal approach. After factorization, the factor matrices are kept distributed (in-core
memory or on disk); they will be used at the solution phase.
3. Solution.
The solution x
pre
of LUx
pre
= b
pre
or LDL
T
x
pre
= b
pre
where x
pre
and b
pre
are
respectively the transformed solution x and right-hand side b associated to the preprocessed matrix
A
pre
, is obtained through a forward elimination step
Ly = b
pre
or LDy = b
pre
, (3)
followed by a backward elimination step
Ux
pre
= y or L
T
x
pre
= y . (4)
The solution x
pre
is finally postprocessed to obtain the solution x of the original system Ax = b,
where x is either assembled on an identified processor (the host) or kept distributed on the working
processors. Iterative refinement and backward error analysis are also postprocessing options of the
solution phase.
Each of these 3 phases can be called separately (see Subsection 5.1.1). A special case is the one
where the forward elimination step is performed during factorization (see Subsection 3.7), instead of
during the solve phase. This allows accessing the L factors right after they have been computed, with a
better locality, and can avoid writing the L factors to disk in an out-of-core context. In this case (forward
elimination during factorization), only the backward elimination is performed during the solution phase .
The software is mainly written in Fortran 95 although a C interface is available (see Section 9). Scilab
and MATLAB/Octave interfaces are also available in the case of sequential executions. The parallel
5
评论2