Solution of Large Systems of Linear and Nonlinear Equations Based on Multiple GPUs (2011)

Transactions of Nanjing University of Aeronautics and Astronautics, No. 3, Sept. 2011

SOLVERS FOR SYSTEMS OF LARGE SPARSE LINEAR AND NONLINEAR EQUATIONS BASED ON MULTI-GPUS

Liu Sha, Zhong Chengwen, Chen Xiaopeng

(National Key Laboratory of Science and Technology on Aerodynamic Design and Research, Northwestern Polytechnical University, Xi'an, 710072, P. R. China; Center for High Performance Computing, Northwestern Polytechnical University, Xi'an, 710072, P. R. China; School of Mechanics, Civil Engineering and Architecture, Northwestern Polytechnical University, Xi'an, 710072, P. R. China)

Abstract: Numerical treatment of engineering application problems often eventually results in the solution of systems of linear or nonlinear equations. The solution process on digital computational devices usually takes tremendous time because of the extremely large problem sizes encountered in most real-world engineering applications. Therefore, practical solvers for systems of linear and nonlinear equations based on multiple graphics processing units (GPUs) are proposed in order to accelerate the solving process. In the linear and nonlinear solvers, the preconditioned bi-conjugate gradient stable (PBi-CGstab) method and the Inexact Newton method are used to achieve fast and stable convergence behavior. Multi-GPUs are utilized to obtain the additional data storage that large size problems need.

Key words: general purpose graphics processing unit (GPGPU); compute unified device architecture (CUDA); system of linear equations; system of nonlinear equations; Inexact Newton method; bi-conjugate gradient stable (Bi-CGstab) method

CLC number: TP391    Document code: A    Article ID: 1005-1120(2011)03-0300-09

INTRODUCTION

Mathematical modeling of engineering problems often leads to systems of linear or nonlinear equations. Solving the resulting equations with numerical tools on digital computational devices is usually very time-consuming, because most real-world engineering applications are extremely large in size. In this paper, linear and nonlinear solvers based on multiple graphics processing units (GPUs) are proposed for large scale problems, using the Inexact Newton method and the preconditioned bi-conjugate gradient stable (PBi-CGstab) method.

The general purpose GPU (GPGPU) technique denotes the implementation of general purpose computing on programmable GPUs [1]. It has been widely applied in many computational areas owing to its powerful floating-point calculation abilities and its wider bandwidth compared with the traditional CPU [2-3]. Furthermore, the inherent single-instruction-multiple-data (SIMD) mechanism of GPGPU operation renders this technique suitable for massively loaded calculations. A series of GPU-based linear algebra operations was proposed by Krüger in 2003 [4]. The first GPU-based conjugate gradient solver for unstructured matrices was proposed in Ref. [5]. Buatois proposed a general sparse linear solver [6]. Cevahir developed a fast solver based on GPUs with some novel optimization techniques [7]. These articles show a great speedup ratio of GPU over CPU. In early work, Zhong and Liu proposed a fast solver which achieves a great speedup ratio on a single GPU [8].

In this paper, multi-GPUs are used to obtain the data storage space that large size problems need. In the case of the linear solver, the Bi-CGstab method is able to solve systems of linear equations with a non-symmetric matrix, which cannot be solved by the conjugate gradient (CG) method. A better convergence of the method is achieved by using a preconditioning strategy. For the nonlinear solver, the Inexact Newton method is utilized. A grid generation project in computational fluid dynamics is used to test the practicability of the linear and nonlinear solvers, in which systems of linear and nonlinear equations are solved in order to obtain the coordinates of the grid nodes.

Received date: 2010-10-13; revision received date: 2011-03-10
E-mail: virgilius@mail.nwpu.edu.cn
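The introduction names Bi-CGstab as the linear solver for non-symmetric systems. As a rough illustration only, the following C sketch implements the textbook unpreconditioned Bi-CGstab iteration on a small dense matrix. It is not the paper's solver: the preconditioner, the sparse storage, and the multi-GPU distribution are all omitted, and the names `bicgstab_solve`, `matvec`, `dot`, and `BICG_MAX_N` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BICG_MAX_N 64  /* illustrative size limit for the stack work arrays */

/* y = A x for a dense row-major n-by-n matrix (stand-in for a sparse product). */
static void matvec(size_t n, const double *A, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j) sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

static double dot(size_t n, const double *a, const double *b) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

/* Unpreconditioned Bi-CGstab. x holds the initial guess on entry and the
 * approximate solution on return. Returns the number of iterations used,
 * or -1 on breakdown or if max_iter is reached without convergence. */
int bicgstab_solve(size_t n, const double *A, const double *b, double *x,
                   double tol, int max_iter) {
    assert(n > 0 && n <= BICG_MAX_N);
    double r[BICG_MAX_N], r0[BICG_MAX_N], s[BICG_MAX_N], t[BICG_MAX_N];
    double p[BICG_MAX_N] = {0}, v[BICG_MAX_N] = {0};
    double rho = 1.0, alpha = 1.0, omega = 1.0;

    matvec(n, A, x, r);
    for (size_t i = 0; i < n; ++i) { r[i] = b[i] - r[i]; r0[i] = r[i]; }
    double tol2 = tol * tol * dot(n, b, b);  /* squared relative tolerance */
    if (dot(n, r, r) <= tol2) return 0;

    for (int k = 1; k <= max_iter; ++k) {
        double rho_new = dot(n, r0, r);
        if (rho_new == 0.0 || omega == 0.0) return -1;      /* breakdown */
        double beta = (rho_new / rho) * (alpha / omega);
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * (p[i] - omega * v[i]);
        matvec(n, A, p, v);
        double r0v = dot(n, r0, v);
        if (r0v == 0.0) return -1;                          /* breakdown */
        alpha = rho_new / r0v;
        for (size_t i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
        if (dot(n, s, s) <= tol2) {                         /* early exit */
            for (size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
            return k;
        }
        matvec(n, A, s, t);
        double tt = dot(n, t, t);
        if (tt == 0.0) return -1;                           /* breakdown */
        omega = dot(n, t, s) / tt;
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
        if (dot(n, r, r) <= tol2) return k;
        rho = rho_new;
    }
    return -1;
}
```

A caller passes a row-major matrix, the right-hand side, and an initial guess in `x`. The dense product in `matvec` is the part a GPU implementation would replace with a sparse matrix-vector kernel.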

COMPUTE UNIFIED DEVICE ARCHITECTURE

The compute unified device architecture (CUDA) is a GPU architecture developed by NVIDIA. A CUDA GPU contains a number of SIMD multiprocessors. Each multiprocessor contains its own shared memory and read-only constant and texture caches that are accessible by all processors on the multiprocessor. The GPU also has a device memory which is accessible by all multiprocessors.

CUDA GPU devices run a high number of threads in parallel. Threads are grouped together as thread blocks. Each block of threads is executed on the same multiprocessor, and its threads can communicate through fast shared memory. Threads in different blocks can communicate only through device memory.

Access to the device memory is very slow compared with the shared memory. Device memory accesses should be avoided as much as possible, and the remaining accesses should be coalesced to attain high performance. Coalescing is possible if the threads access consecutive memory addresses of 4, 8, or 16 bytes, and the base address for such a coalesced access should be a multiple of 16 (half warp [9]) times the size of the memory type accessed by each thread.
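The coalescing condition stated above is plain address arithmetic, so it can be checked on the host. The following C sketch does so for one half-warp. It is a reading of the rule as quoted here, not a model of any particular device; the function name `is_coalesced` and the constant `HALF_WARP` are hypothetical, and the values 16 (threads per half-warp) and 4, 8, or 16 bytes (word sizes) are assumed from NVIDIA's documentation for early CUDA hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_WARP 16  /* threads per half-warp (assumed, early CUDA devices) */

/* Returns 1 if the HALF_WARP byte addresses in addr[] follow the rule:
 * the word size is 4, 8, or 16 bytes, thread k accesses the k-th
 * consecutive word, and the base address (thread 0) is a multiple of
 * HALF_WARP * word_size. Returns 0 otherwise. */
int is_coalesced(const uintptr_t addr[HALF_WARP], size_t word_size) {
    assert(addr != NULL);
    if (word_size != 4 && word_size != 8 && word_size != 16) return 0;
    if (addr[0] % (HALF_WARP * word_size) != 0) return 0;   /* alignment */
    for (size_t k = 1; k < HALF_WARP; ++k) {
        if (addr[k] != addr[0] + k * word_size) return 0;   /* consecutive */
    }
    return 1;
}
```

For example, sixteen threads reading 4-byte words starting at byte address 256 pass the check, while the same pattern starting at address 260 fails it because 260 is not a multiple of 64.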

SYNCHRONIZATION

Using the Win64 API, a thread is created for each GPU board in the program. Each thread manages the data input and output of its GPU, calls the GPU kernel functions, and synchronizes with the other threads. When creating and ending threads, the CUDA codes "cutStartThread" and "cutWaitForThreads", which are based on the Win64 API, can also be used for simplification.

The multi-GPUs solver works in this pattern: first, the CPU distributes data and tasks to the GPUs. Then barriers are set for the GPU managing threads. When a GPU managing thread arrives at its barrier, its GPU has finished its computational work and renewed its own data in device memory along with the host memory counterpart, and it is now waiting for the needed data renewed by the other GPUs. When all threads have arrived at their barriers, the barriers are released; each GPU then obtains the renewed data that it needs and continues to work. The whole process of setting and releasing the barriers constitutes one synchronization.

In this paper, semaphores are used to manage the synchronization among threads. The synchronization process is shown in the accompanying figure. Define an array of semaphores (Sem[GPU_NUM]) for the GPU managing threads, with initial value 0 and activation value GPU_NUM:

    HANDLE Sem[GPU_NUM];
    Sem[device_num] = CreateSemaphore(NULL, 0, GPU_NUM, NULL);

When a thread reaches a barrier, its semaphore is activated by adding GPU_NUM to the initial value:

    ReleaseSemaphore(Sem[device_num], GPU_NUM, NULL);

Then the thread waits for the activation of the semaphores corresponding to the other threads:

    WaitForMultipleObjects(GPU_NUM, Sem, true, INFINITE);

The first parameter of this function is the number of semaphores. The second parameter "Sem" is the first word address of the semaphores. The third parameter, set to "true", means that the function cannot return until all semaphores are activated. The fourth parameter is the maximum waiting time, which is set to "INFINITE" to ensure the logical correctness of the multi-GPUs solvers. When the demand of this function is fulfilled, the synchronization process is finished.
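The barrier scheme described in this section can be tried without GPUs or Windows. The following C sketch swaps in POSIX threads and POSIX semaphores for the Windows calls and runs a single synchronization round. It is an approximation under stated assumptions: `sem_post` called `GPU_NUM` times stands in for `ReleaseSemaphore(..., GPU_NUM, NULL)`, a loop of `sem_wait` calls stands in for `WaitForMultipleObjects` (which acquires all handles at once rather than one by one), and the arrays `partial` and `seen_sum` and the function `run_one_synchronization` are hypothetical stand-ins for GPU data. Reusing the semaphores over many rounds is outside the scope of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define GPU_NUM 4   /* number of managing threads; illustrative value */

static sem_t Sem[GPU_NUM];        /* one semaphore per managing thread */
static double partial[GPU_NUM];   /* stand-in for data each GPU renews */
static double seen_sum[GPU_NUM];  /* what each thread reads after the barrier */

static void *managing_thread(void *arg) {
    int device_num = *(int *)arg;

    partial[device_num] = device_num + 1.0;   /* stand-in for GPU work */

    /* Arrive at the barrier: add GPU_NUM permits to this thread's semaphore. */
    for (int i = 0; i < GPU_NUM; ++i) sem_post(&Sem[device_num]);

    /* Take one permit from every semaphore; this blocks until every
     * managing thread has arrived. Retry if a signal interrupts the wait. */
    for (int i = 0; i < GPU_NUM; ++i) {
        while (sem_wait(&Sem[i]) == -1 && errno == EINTR) { }
    }

    /* Past the barrier: every entry of partial[] has been written. */
    double sum = 0.0;
    for (int i = 0; i < GPU_NUM; ++i) sum += partial[i];
    seen_sum[device_num] = sum;
    return NULL;
}

/* Runs one synchronization round; returns 1 if every thread saw all data. */
int run_one_synchronization(void) {
    pthread_t tid[GPU_NUM];
    int id[GPU_NUM];

    for (int i = 0; i < GPU_NUM; ++i) {
        partial[i] = 0.0;
        seen_sum[i] = 0.0;
        if (sem_init(&Sem[i], 0, 0) != 0) return 0;   /* initial value 0 */
    }
    for (int i = 0; i < GPU_NUM; ++i) {
        id[i] = i;
        int rc = pthread_create(&tid[i], NULL, managing_thread, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < GPU_NUM; ++i) pthread_join(tid[i], NULL);

    double expected = GPU_NUM * (GPU_NUM + 1) / 2.0;
    int ok = 1;
    for (int i = 0; i < GPU_NUM; ++i) {
        if (seen_sum[i] != expected) ok = 0;
        sem_destroy(&Sem[i]);
    }
    return ok;
}
```

Calling `run_one_synchronization()` returns 1 when every thread read the complete data after the barrier. Each semaphore receives `GPU_NUM` permits and each thread consumes exactly one permit per semaphore, so all counts return to zero at the end of the round. On most systems the program is compiled with the `-pthread` option.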
