Our parallel implementation targets graph families that are representative of real-world,
large-scale networks [8, 25, 13, 53, 52]. Real-world graphs are typically characterized by a
low diameter, heavy-tailed degree distributions modeled by power laws, and self-similarity.
They are often very large, with the number of vertices and edges ranging from several
hundreds of thousands to billions. On current workstations, it is not possible to do exact in-core
computations on these graphs due to the limited physical memory. In such cases, parallel
computing techniques can be applied to obtain exact solutions for memory and compute-
intensive graph problems quickly. For instance, recent experimental studies on Breadth-First
Search for large-scale graphs show that a parallel in-core implementation is two orders of
magnitude faster than an optimized external memory implementation [5, 2]. In this paper,
we present an efficient parallel implementation for the single source shortest paths problem
that can handle scale-free instances in the order of billions of edges. In addition, we con-
duct an experimental study of performance on several other graph families, and this work
is our submission to the 9th DIMACS Implementation Challenge [19] on Shortest Paths.
Sequential algorithms for the single source shortest path problem with non-negative edge
weights (NSSP) have been studied extensively, both theoretically [23, 21, 26, 27, 56, 58, 36, 33, 48]
and experimentally [22, 31, 30, 16, 61, 32]. Let m and n denote the number of edges and
vertices in the graph, respectively. Nearly all NSSP algorithms are based on the classical
algorithm of Dijkstra [23]. Using Fibonacci heaps [26], Dijkstra's algorithm can be
implemented in O(m + n log n) time. Thorup [58] presents an O(m + n) RAM algorithm for
undirected graphs that differs significantly from Dijkstra's approach. Instead of
visiting vertices in the order of increasing distance, it traverses a component tree. Meyer
[49] and Goldberg [32] propose simple algorithms with linear average time for uniformly
distributed edge weights.
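To make the baseline concrete, the following is a minimal sketch of Dijkstra's algorithm using a binary heap (Python's heapq); Fibonacci heaps achieve the O(m + n log n) bound stated above, while a binary heap gives O(m log n) but illustrates the same structure. The adjacency-list representation and function name here are our own illustrative choices, not from the cited implementations.

```python
import heapq

def dijkstra(adj, source):
    """adj: {u: [(v, w), ...]} with non-negative edge weights w."""
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    pq = [(0, source)]  # (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry; vertex already settled
        for v, w in adj[u]:
            if d + w < dist[v]:  # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist
```

Note that vertices are extracted in order of increasing tentative distance, which is exactly the sequential dependence that the component-tree and bucket-based approaches discussed above try to break.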
Parallel algorithms for solving NSSP are reviewed in detail by Meyer and Sanders [48, 51].
There are no known PRAM algorithms that run in sub-linear time and O(m + n log n)
work. Parallel priority queues [24, 12] for implementing Dijkstra’s algorithm have been
developed, but these linear-work algorithms have a worst-case time bound of Ω(n), as they
only perform edge relaxations in parallel. Several matrix-multiplication based algorithms
[37, 29], proposed for the parallel All-Pairs Shortest Paths (APSP), involve running time
and efficiency trade-offs. Parallel approximate NSSP algorithms [43, 17, 57] based on the
randomized Breadth-First Search algorithm of Ullman and Yannakakis [60] run in sub-linear
time. However, it is not known how to use the Ullman-Yannakakis randomized approach for
exact NSSP computations in sub-linear time.
Meyer and Sanders give the ∆-stepping [51] NSSP algorithm that divides Dijkstra's algorithm
into a number of phases, each of which can be executed in parallel. For random graphs
with uniformly distributed edge weights, this algorithm runs in sub-linear time with linear
average case work. Several theoretical improvements [50, 46, 47] are given for ∆-stepping
(for instance, finding shortcut edges, adaptive bucket-splitting), but it is unlikely that they
would be faster than the simple ∆-stepping algorithm in practice, as the improvements in-
volve sophisticated data structures that are hard to implement efficiently. On a random
d-regular graph instance (2^19 vertices and d = 3), Meyer and Sanders report a speedup of
2
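A sequential sketch of ∆-stepping may help fix ideas: vertices are kept in buckets of width ∆ by tentative distance; the current bucket is processed in phases that relax only light edges (weight ≤ ∆), and heavy edges are relaxed once after the bucket empties. In the parallel algorithm, all relaxations within a phase proceed concurrently; this single-threaded version (with names and representation of our choosing) only shows the bucket logic.

```python
import math

def delta_stepping(adj, source, delta):
    """adj: {u: [(v, w), ...]} with non-negative edge weights w."""
    dist = {u: math.inf for u in adj}
    dist[source] = 0.0
    buckets = {0: {source}}  # bucket index -> set of vertices

    def relax(v, d):
        # Move v to the bucket matching its improved tentative distance.
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets.get(int(dist[v] // delta), set()).discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    while True:
        nonempty = [j for j, b in buckets.items() if b]
        if not nonempty:
            break
        i = min(nonempty)
        settled = set()
        # Phase loop: relax light edges until bucket i stops refilling.
        while buckets.get(i):
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in adj[u]:
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Relax heavy edges once per vertex settled from this bucket.
        for u in settled:
            for v, w in adj[u]:
                if w > delta:
                    relax(v, dist[u] + w)
    return dist
```

The parameter ∆ trades work for parallelism: ∆ = minimum edge weight recovers Dijkstra-like behavior, while large ∆ approaches Bellman-Ford; the bucket-splitting refinements cited above adapt this width at runtime.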