tion of the factorization is the main determinant of the
overall performance.
When both of the matrices P and Q of Eq. (2) are
non-trivial, i.e. neither of them is an identity matrix,
the factorization is said to use complete pivoting. In
practice, however, Q is an identity matrix, and this
strategy, called partial pivoting, tends to be sufficient
to retain numerical stability of the factorization un-
less the matrix A is singular or nearly so. Moderate
values of the condition number κ = ||A^(-1)|| · ||A||
guarantee success for a direct method, as opposed to
matrix structure and spectrum considerations required
for iterative methods.
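
Both ingredients are available through standard LAPACK routines. As a
concrete illustration (ours, not part of the original text), the sketch
below factors a small matrix with partial pivoting via LAPACKE_dgetrf()
and then estimates κ with LAPACKE_dgecon(); the LAPACKE C interface and
the arbitrary 2 x 2 matrix are our assumptions:

    #include <stdio.h>
    #include <lapacke.h>

    int main(void)
    {
        /* 2 x 2 example in column-major order: A = [4 6; 3 3] */
        double a[] = { 4.0, 3.0, 6.0, 3.0 };
        lapack_int n = 2, ipiv[2];

        /* ||A|| in the 1-norm, needed by the condition estimator */
        double anorm = LAPACKE_dlange(LAPACK_COL_MAJOR, '1', n, n, a, n);

        /* P A = L U with partial pivoting; a nonzero return value
           flags an exactly singular matrix */
        if (LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, n, ipiv) != 0)
            return 1;

        /* dgecon estimates 1 / (||A|| * ||A^(-1)||) from the factors */
        double rcond;
        LAPACKE_dgecon(LAPACK_COL_MAJOR, '1', n, a, n, anorm, &rcond);
        printf("estimated kappa(A) = %g\n", 1.0 / rcond);
        return 0;
    }
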
When the matrix A is sparse, i.e. enough of its entries
are zeros, it is important for the factorization process
to operate solely on the non-zero entries of the matrix.
However, the factorization introduces new nonzero
entries into the L and U factors, entries that are not
present in the original matrix A of Eq. (2). These
new entries are referred to
as fill-in and cause the number of non-zero entries in
the factors (we use the notation η(A) for the number
of nonzeros in a matrix) to be (almost always) greater
than that of the original matrix A: η(L + U) ≥ η(A).
The amount of fill-in can be controlled with the ma-
trix ordering performed prior to the factorization and
consequently, for the sparse case, both of the matrices
P and Q of Eq. (2) are non-trivial. Matrix Q induces
the column reordering that minimizes fill-in and P per-
mutes rows so that pivots selected during the Gaussian
elimination guarantee numerical stability.
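
As a concrete illustration (our example, not taken from the paper),
consider a 4 x 4 arrowhead matrix whose first row and column are dense:

        A = [ x x x x ]
            [ x x . . ]
            [ x . x . ]
            [ x . . x ]

Eliminating the dense first column fills the entire trailing 3 x 3
block, so the factors become completely dense and η(L + U) = 16.
Reversing the ordering of the rows and columns, so that the dense row
and column come last, produces no fill-in at all: η(L + U) = η(A) = 10.
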
Recursion started playing an important role in ap-
plied numerical linear algebra with the introduction
of Strassen’s algorithm [6,31,36] which reduced the
complexity of the matrix-matrix multiply operation
from O(n^3) to O(n^(log2 7)). Later on it was recognized
that factorization codes may also be formulated recur-
sively [3,4,21,25,27], and codes formulated this way
perform better [38] than leading linear algebra pack-
ages [2], which apply only a blocking technique to in-
crease performance. Unfortunately, the recursive ap-
proach cannot be applied directly to sparse matrices
because the sparsity pattern of a matrix has to be taken
into account in order to reduce both the storage require-
ments and the floating point operation count, which are
the determining factors of the performance of a sparse
code.
2. Dense recursive LU factorization
Figure 1 shows the classical LU factorization code
which uses Gaussian elimination.
Fig. 1. Iterative LU factorization function of a dense matrix A. It is
equivalent to LAPACK’s xGETRF() function and is performed using
Gaussian elimination (without a pivoting clause).
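
The figure itself did not survive the text extraction; the sketch below
(ours, not the paper's exact listing) shows what such a loop-based
kernel looks like, assuming column-major storage and, as the caption
indicates, no pivoting:

    /* Right-looking Gaussian elimination without pivoting.
       A is n x n, column-major, leading dimension lda; the unit
       lower triangular L and upper triangular U overwrite A. */
    static void lu_iter(int n, double *A, int lda)
    {
        for (int k = 0; k < n - 1; k++) {
            /* multipliers: the k-th column of L */
            for (int i = k + 1; i < n; i++)
                A[i + k * lda] /= A[k + k * lda];
            /* rank-1 update of the trailing submatrix */
            for (int j = k + 1; j < n; j++)
                for (int i = k + 1; i < n; i++)
                    A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        }
    }
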

Rearrangement of the loops and introduction of blocking techniques can
significantly increase the performance of this code [2,
9]. However, the recursive formulation of the Gaussian
elimination shown in Fig. 2 exhibits superior perfor-
mance [25]. It does not contain any looping statements
and most of the floating point operations are performed
by the Level 3 BLAS [14] routines: xTRSM() and
xGEMM(). These routines achieve near-peak MFLOP/s
rates on modern computers with a deep memory hierar-
chy. They are incorporated in many vendor-optimized
libraries, and they are used in the Atlas project [16]
which automatically generates implementations tuned
to specific platforms.
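
A minimal sketch of this recursive formulation might look as follows.
It is our code, not the paper's Fig. 2 listing, and the pivoting clause
is omitted for brevity (which in that respect makes it closer to the
variant of Fig. 3 discussed below); it assumes a column-major matrix
and the CBLAS interface:

    #include <cblas.h>

    /* Recursive LU factorization without pivoting (sketch).
       A is n x n, column-major, leading dimension lda;
       L and U overwrite A. */
    static void lu_rec(int n, double *A, int lda)
    {
        if (n == 1) return;              /* 1x1 block: nothing to eliminate */

        int n1 = n / 2, n2 = n - n1;     /* split A into four quadrants */
        double *A11 = A;
        double *A21 = A + n1;
        double *A12 = A + (size_t)n1 * lda;
        double *A22 = A + (size_t)n1 * lda + n1;

        lu_rec(n1, A11, lda);            /* A11 = L11 * U11 */

        /* U12 = inv(L11) * A12 */
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower, CblasNoTrans,
                    CblasUnit, n1, n2, 1.0, A11, lda, A12, lda);

        /* L21 = A21 * inv(U11) */
        cblas_dtrsm(CblasColMajor, CblasRight, CblasUpper, CblasNoTrans,
                    CblasNonUnit, n2, n1, 1.0, A11, lda, A21, lda);

        /* Schur complement: A22 -= L21 * U12 */
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    n2, n2, n1, -1.0, A21, lda, A12, lda, 1.0, A22, lda);

        lu_rec(n2, A22, lda);            /* factor the trailing block */
    }

All of the O(n^3) work is done inside cblas_dtrsm() and cblas_dgemm();
the recursion itself only partitions the matrix, which is what yields
the Level 3 BLAS performance described above.
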
Yet another implementation of the recursive algo-
rithm is shown in Fig. 3, this time without pivoting
code. Experiments show that this code performs as
well as the code from Fig. 2. The experiments also
provide indications that further performance improve-
ments are possible, if the matrix is stored recursive-
ly [26]. Such a storage scheme is illustrated in Fig. 4.
This scheme causes the dense submatrices to be aligned
recursively in memory. The recursive algorithm from
Fig. 3 then traverses the recursive matrix structure all
the way down to the level of a single dense submatrix.
At this point an appropriate computational routine is
called (either BLAS or xGETRF()). Depending on the
size of the submatrices (referred to as a block size [2]),
it is possible to achieve higher execution rates than for
the case when the matrix is stored in the column-major
or row-major order. This observation made us adopt
the code from Fig. 3 as the base for the sparse recursive
algorithm presented below.
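
The paper does not spell out the indexing of the recursive storage in
Fig. 4, but one common concrete realization of such a layout is a
Z-Morton ordering of the blocks. The helper below is our assumption,
purely illustrative, and computes the position of a block in that
ordering:

    /* Z-Morton index of block (i, j) in a 2^k x 2^k grid of blocks:
       quadrants are stored contiguously and subdivided recursively,
       so the four children of any quadrant are adjacent in memory. */
    static size_t block_index(unsigned i, unsigned j)
    {
        size_t z = 0;
        for (unsigned b = 0; b < 16; b++) {
            z |= (size_t)((j >> b) & 1u) << (2 * b);      /* column bits */
            z |= (size_t)((i >> b) & 1u) << (2 * b + 1);  /* row bits */
        }
        return z;
    }

Multiplying block_index(i, j) by the number of elements per block then
gives the memory offset of the dense submatrix that the recursion of
Fig. 3 eventually hands to BLAS or xGETRF().
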
3. Sparse matrix factorization
Matrices originating from the Finite Element Me-
thod [35], or most other discretizations of Partial Dif-
ferential Equations, have most of their entries equal to
zero.