tion of the factorization is the main determinant of the
overall performance.
When both of the matrices P and Q of Eq. (2) are
non-trivial, i.e. neither of them is an identity matrix,
the factorization is said to use complete pivoting. In
practice, however, Q is an identity matrix, and this
strategy, called partial pivoting, tends to be sufficient
to retain numerical stability of the factorization un-
less the matrix A is singular or nearly so. Moderate
values of the condition number κ = ||A^(-1)|| · ||A||
guarantee success for a direct method, as opposed to
matrix structure and spectrum considerations required
for iterative methods.
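
Both ingredients are available through standard LAPACK routines. As a
concrete illustration (ours, not part of the original text), the sketch
below factors a small matrix with partial pivoting via LAPACKE_dgetrf()
and then estimates κ with LAPACKE_dgecon(); the LAPACKE C interface and
the arbitrary 2 x 2 matrix are our assumptions:

    #include <stdio.h>
    #include <lapacke.h>

    int main(void)
    {
        /* 2 x 2 example in column-major order: A = [4 6; 3 3] */
        double a[] = { 4.0, 3.0, 6.0, 3.0 };
        lapack_int n = 2, ipiv[2];

        /* ||A|| in the 1-norm, needed by the condition estimator */
        double anorm = LAPACKE_dlange(LAPACK_COL_MAJOR, '1', n, n, a, n);

        /* P A = L U with partial pivoting; a nonzero return value
           flags an exactly singular matrix */
        if (LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, a, n, ipiv) != 0)
            return 1;

        /* dgecon estimates 1 / (||A|| * ||A^(-1)||) from the factors */
        double rcond;
        LAPACKE_dgecon(LAPACK_COL_MAJOR, '1', n, a, n, anorm, &rcond);
        printf("estimated kappa(A) = %g\n", 1.0 / rcond);
        return 0;
    }
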
When the matrix A is sparse, i.e. enough of its entries
are zeros, it is important for the factorization process
to operate solely on the non-zero entries of the matrix.
However, the factorization introduces new nonzero
entries into the L and U factors, entries that are not
present in the original matrix A of Eq. (2). These
new entries are referred to
as fill-in and cause the number of non-zero entries in
the factors (we use the notation η(A) for the number
of nonzeros in a matrix) to be (almost always) greater
than that of the original matrix A: η(L + U) ≥ η(A).
The amount of fill-in can be controlled with the ma-
trix ordering performed prior to the factorization and
consequently, for the sparse case, both of the matrices
P and Q of Eq. (2) are non-trivial. Matrix Q induces
the column reordering that minimizes fill-in and P per-
mutes rows so that pivots selected during the Gaussian
elimination guarantee numerical stability.
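
As a concrete illustration (our example, not taken from the paper),
consider a 4 x 4 arrowhead matrix whose first row and column are dense:

        A = [ x x x x ]
            [ x x . . ]
            [ x . x . ]
            [ x . . x ]

Eliminating the dense first column fills the entire trailing 3 x 3
block, so the factors become completely dense and η(L + U) = 16.
Reversing the ordering of the rows and columns, so that the dense row
and column come last, produces no fill-in at all: η(L + U) = η(A) = 10.
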
Recursion started playing an important role in ap-
plied numerical linear algebra with the introduction
of Strassen’s algorithm [6,31,36] which reduced the
complexity of the matrix-matrix multiply operation
from O(n^3) to O(n^(log2 7)). Later on it was recognized
that factorization codes may also be formulated recur-
sively [3,4,21,25,27], and codes formulated this way
perform better [38] than leading linear algebra pack-
ages [2], which apply only a blocking technique to in-
crease performance. Unfortunately, the recursive ap-
proach cannot be applied directly to sparse matrices
because the sparsity pattern of a matrix has to be taken
into account in order to reduce both the storage require-
ments and the floating point operation count, which are
the determining factors of the performance of a sparse
code.
2. Dense recursive LU factorization
Figure 1 shows the classical LU factorization code
which uses Gaussian elimination.
Fig. 1. Iterative LU factorization function of a dense matrix A. It is
equivalent to LAPACK’s xGETRF() function and is performed using
Gaussian elimination (without a pivoting clause).
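
The figure itself did not survive the text extraction; the sketch below
(ours, not the paper's exact listing) shows what such a loop-based
kernel looks like, assuming column-major storage and, as the caption
indicates, no pivoting:

    /* Right-looking Gaussian elimination without pivoting.
       A is n x n, column-major, leading dimension lda; the unit
       lower triangular L and upper triangular U overwrite A. */
    static void lu_iter(int n, double *A, int lda)
    {
        for (int k = 0; k < n - 1; k++) {
            /* multipliers: the k-th column of L */
            for (int i = k + 1; i < n; i++)
                A[i + k * lda] /= A[k + k * lda];
            /* rank-1 update of the trailing submatrix */
            for (int j = k + 1; j < n; j++)
                for (int i = k + 1; i < n; i++)
                    A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        }
    }
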

Rearrangement of the loops and introduction of blocking techniques can
significantly increase the performance of this code [2,
9]. However, the recursive formulation of the Gaussian
elimination shown in Fig. 2 exhibits superior perfor-
mance [25]. It does not contain any looping statements
and most of the floating point operations are performed
by the Level 3 BLAS [14] routines: xTRSM() and
xGEMM(). These routines achieve near-peak MFLOP/s
rates on modern computers with a deep memory hierar-
chy. They are incorporated in many vendor-optimized
libraries, and they are used in the Atlas project [16]
which automatically generates implementations tuned
to specific platforms.
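
A minimal sketch of this recursive formulation might look as follows.
It is our code, not the paper's Fig. 2 listing, and the pivoting clause
is omitted for brevity (which in that respect makes it closer to the
variant of Fig. 3 discussed below); it assumes a column-major matrix
and the CBLAS interface:

    #include <cblas.h>

    /* Recursive LU factorization without pivoting (sketch).
       A is n x n, column-major, leading dimension lda;
       L and U overwrite A. */
    static void lu_rec(int n, double *A, int lda)
    {
        if (n == 1) return;              /* 1x1 block: nothing to eliminate */

        int n1 = n / 2, n2 = n - n1;     /* split A into four quadrants */
        double *A11 = A;
        double *A21 = A + n1;
        double *A12 = A + (size_t)n1 * lda;
        double *A22 = A + (size_t)n1 * lda + n1;

        lu_rec(n1, A11, lda);            /* A11 = L11 * U11 */

        /* U12 = inv(L11) * A12 */
        cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower, CblasNoTrans,
                    CblasUnit, n1, n2, 1.0, A11, lda, A12, lda);

        /* L21 = A21 * inv(U11) */
        cblas_dtrsm(CblasColMajor, CblasRight, CblasUpper, CblasNoTrans,
                    CblasNonUnit, n2, n1, 1.0, A11, lda, A21, lda);

        /* Schur complement: A22 -= L21 * U12 */
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    n2, n2, n1, -1.0, A21, lda, A12, lda, 1.0, A22, lda);

        lu_rec(n2, A22, lda);            /* factor the trailing block */
    }

All of the O(n^3) work is done inside cblas_dtrsm() and cblas_dgemm();
the recursion itself only partitions the matrix, which is what yields
the Level 3 BLAS performance described above.
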
Yet another implementation of the recursive algo-
rithm is shown in Fig. 3, this time without pivoting
code. Experiments show that this code performs as
well as the code from Fig. 2. The experiments also
provide indications that further performance improve-
ments are possible, if the matrix is stored recursive-
ly [26]. Such a storage scheme is illustrated in Fig. 4.
This scheme causes the dense submatrices to be aligned
recursively in memory. The recursive algorithm from
Fig. 3 then traverses the recursive matrix structure all
the way down to the level of a single dense submatrix.
At this point an appropriate computational routine is
called (either BLAS or xGETRF()). Depending on the
size of the submatrices (referred to as a block size [2]),
it is possible to achieve higher execution rates than for
the case when the matrix is stored in the column-major
or row-major order. This observation made us adopt
the code from Fig. 3 as the base for the sparse recursive
algorithm presented below.
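
The paper does not spell out the indexing of the recursive storage in
Fig. 4, but one common concrete realization of such a layout is a
Z-Morton ordering of the blocks. The helper below is our assumption,
purely illustrative, and computes the position of a block in that
ordering:

    /* Z-Morton index of block (i, j) in a 2^k x 2^k grid of blocks:
       quadrants are stored contiguously and subdivided recursively,
       so the four children of any quadrant are adjacent in memory. */
    static size_t block_index(unsigned i, unsigned j)
    {
        size_t z = 0;
        for (unsigned b = 0; b < 16; b++) {
            z |= (size_t)((j >> b) & 1u) << (2 * b);      /* column bits */
            z |= (size_t)((i >> b) & 1u) << (2 * b + 1);  /* row bits */
        }
        return z;
    }

Multiplying block_index(i, j) by the number of elements per block then
gives the memory offset of the dense submatrix that the recursion of
Fig. 3 eventually hands to BLAS or xGETRF().
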
3. Sparse matrix factorization
Matrices originating from the Finite Element Me-
thod [35], or most other discretizations of Partial Dif-
ferential Equations, have most of their entries equal to
zero.