average relative error for random queries. However, some of
these methods take milliseconds to answer queries [15, 38,
30], which is about three orders of magnitude slower than
other methods. Some other methods answer queries in microseconds [29, 40], but it has been reported that the precision of these methods for close pairs of vertices is not high [30, 4]. This
drawback might be critical for applications such as socially-
sensitive search or context-aware search since, in these ap-
plications, distance queries are employed to distinguish close
items.
1.1 Our Contributions
To address these issues, in this paper, we present a new
method for answering distance queries in complex networks.
The proposed method is an exact method; that is, it always returns the exact distance for each query. It has much better scalability than previous exact methods and can handle graphs with hundreds of millions of edges. Nevertheless, its query time remains very small, around ten microseconds.
Though our method can handle directed and/or weighted
graphs as we mention later, in the following, we assume
undirected, unweighted graphs for simplicity of exposition.
Our method is based on the notion of distance labeling
or distance-aware 2-hop cover. The idea of 2-hop cover is
as follows. For each vertex u, we pick a set C(u) of
candidate vertices so that every pair of vertices (u, v) has
at least one vertex w ∈ C(u) ∩ C(v) on a shortest path
between the pair. For each vertex u and a vertex w ∈ C(u),
we precompute the distance d_G(u, w) between them. We say that the set L(u) = {(w, d_G(u, w)) | w ∈ C(u)} is the label of u. Using labels, it is clear that the distance d_G(u, v) between two vertices u and v can be computed as min{δ + δ' | (w, δ) ∈ L(u), (w, δ') ∈ L(v)}. The family of labels {L(u)}
is called a 2-hop cover. Distance labeling is also commonly
used in previous exact methods [13, 12, 2, 17], but we propose an entirely new approach to computing the labels, referred to as pruned landmark labeling.
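For concreteness, the following is a minimal sketch of answering a query from such labels, assuming each label is stored as a list of (hub, distance) pairs sorted by vertex ID so that common hubs can be found by a linear merge; the names Label and query_distance are illustrative and not taken from the paper.

  #include <algorithm>
  #include <cstdio>
  #include <limits>
  #include <utility>
  #include <vector>

  // A label L(u): pairs (hub w, d_G(u, w)), assumed sorted by hub ID.
  using Label = std::vector<std::pair<int, int>>;
  constexpr int INF = std::numeric_limits<int>::max();

  // Compute min{delta + delta' | (w, delta) in L(u), (w, delta') in L(v)}
  // by merging the two sorted lists.
  int query_distance(const Label& lu, const Label& lv) {
    int best = INF;
    std::size_t i = 0, j = 0;
    while (i < lu.size() && j < lv.size()) {
      if (lu[i].first == lv[j].first) {            // common hub w
        best = std::min(best, lu[i].second + lv[j].second);
        ++i; ++j;
      } else if (lu[i].first < lv[j].first) {
        ++i;
      } else {
        ++j;
      }
    }
    return best;  // INF means no common hub, i.e., the two vertices are disconnected
  }

  int main() {
    // Toy example: path 0 - 1 - 2, labels built over hubs {0, 1}.
    Label l0 = {{0, 0}, {1, 1}};
    Label l2 = {{1, 1}};
    std::printf("d(0, 2) = %d\n", query_distance(l0, l2));  // prints 2
    return 0;
  }

Because the two sorted lists are scanned only once, a query costs time linear in the label sizes, which is why small labels translate directly into fast queries.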
The idea of our method is simple and rather radical: from
every vertex, we conduct a breadth-first search and add the
distance information to labels of visited vertices. Of course,
if we naively implement this idea, we need O(nm) prepro-
cessing time and O(n^2) space to store the index, which is
unacceptable. Here, n is the number of vertices and m is the
number of edges. Our key idea to make this method practi-
cal is pruning during the breadth-first searches. Let S be a
set of vertices and suppose that we already have labels that
can answer the correct distance between two vertices if a shortest
path between them passes through a vertex in S. Suppose
we are conducting a BFS from v and visiting u. If there is
a vertex w ∈ S such that d_G(v, u) = d_G(v, w) + d_G(w, u),
then we prune u. That is, we do not traverse any edges from
u. As we prove in Section 4.3, after this pruned BFS from
v, the labels can answer the distance between two vertices
if a shortest path between them passes through a vertex in
S ∪ {v}.
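To make this procedure concrete, the following is a simplified sketch of pruned BFS labeling for undirected, unweighted graphs; the vertex ordering is taken as given (intuitively, more central vertices should come first), and the helper names build_labels and query_with, as well as the deliberately naive quadratic pruning test, are illustrative choices rather than the paper's actual implementation.

  #include <algorithm>
  #include <queue>
  #include <utility>
  #include <vector>

  using Graph = std::vector<std::vector<int>>;      // adjacency lists
  using Label = std::vector<std::pair<int, int>>;   // (hub, distance) pairs
  constexpr int INF = 1 << 29;

  // Upper bound on the distance between two vertices via hubs labeled so far.
  int query_with(const Label& la, const Label& lb) {
    int best = INF;
    for (const auto& [w1, d1] : la)
      for (const auto& [w2, d2] : lb)
        if (w1 == w2) best = std::min(best, d1 + d2);
    return best;
  }

  std::vector<Label> build_labels(const Graph& g, const std::vector<int>& order) {
    const int n = static_cast<int>(g.size());
    std::vector<Label> labels(n);
    std::vector<int> dist(n, INF);
    for (int root : order) {
      // Pruned BFS from root.
      std::queue<int> que;
      std::vector<int> visited;
      que.push(root);
      dist[root] = 0;
      visited.push_back(root);
      while (!que.empty()) {
        int u = que.front(); que.pop();
        // Prune: if the labels built so far already certify a distance
        // <= dist[u], an earlier root covers this pair of vertices,
        // so we neither label u nor traverse its edges.
        if (query_with(labels[root], labels[u]) <= dist[u]) continue;
        labels[u].emplace_back(root, dist[u]);       // record (root, d_G(root, u))
        for (int v : g[u]) {
          if (dist[v] == INF) {
            dist[v] = dist[u] + 1;
            visited.push_back(v);
            que.push(v);
          }
        }
      }
      for (int v : visited) dist[v] = INF;           // reset for the next root
    }
    return labels;
  }

The labels produced this way are queried exactly as described above; Section 4.3 proves that the pruning never loses the information needed for exact answers.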
Table 1: Summary of experimental results of previous methods and the proposed method for exact distance queries.

  Method            Network    |V|      |E|      Indexing     Query
  TEDI [41]         Computer   22 K     46 K     17 s         4.2 µs
                    Social     0.6 M    0.6 M    2,226 s      55.0 µs
  HCL [17]          Social     7.1 K    0.1 M    1,003 s      28.2 µs
                    Citation   0.7 M    0.3 M    253,104 s    0.2 µs
  TD [4]            Social     0.3 M    0.4 M    9 s          0.5 µs
                    Social     2.4 M    4.7 M    2,473 s      0.8 µs
  HHL [2]           Computer   0.2 M    1.2 M    7,399 s      3.1 µs
                    Social     0.3 M    1.9 M    19,488 s     6.9 µs
  PLL (this work)   Web        0.3 M    1.5 M    4 s          0.5 µs
                    Social     2.4 M    4.7 M    61 s         0.6 µs
                    Social     1.1 M    114 M    15,164 s     15.6 µs
                    Web        7.4 M    194 M    6,068 s      4.1 µs

Interestingly, our method combines the advantages of three different previous successful approaches: landmark-based approximate methods [29, 38, 30], tree-decomposition-based exact methods [41, 4], and labeling-based exact methods [13, 12, 2]. Landmark-based approximate methods achieve remarkable precision by leveraging the existence of highly central vertices in complex networks [29]. This fact is also the main reason for the power of our pruning: by conducting breadth-first searches from these central vertices first,
later we can drastically prune breadth-first searches. Tree-
decomposition-based methods exploit the core–fringe structure of networks [10, 27] by decomposing tree-like fringes of low tree-width. Though our method does not explicitly use tree decompositions, we prove that it can efficiently process graphs of small tree-width. This result indicates that our method also exploits the core–fringe structure.
As with other labeling-based methods, the data structure of
our index is simple and query processing is very quick be-
cause of the locality of memory access.
Though this pruned landmark labeling scheme is already
powerful by itself, we propose another labeling scheme with
a different kind of strength and combine them to further
improve the performance. That is, we show that labeling
by breadth-first search can be implemented in a bit-parallel
way, which exploits the property that the number of bits
b in a register word is typically 32 or 64 and we can per-
form bit manipulations on these b bits simultaneously. By
this technique, we can perform BFSs from b + 1 vertices
simultaneously in O(m) time. In the early stages of labeling, this bit-parallel labeling (without pruning) works better than pruned landmark labeling, since little pruning occurs at that point. Note that this is not thread-level parallelism: our bit-parallelism actually decreases the computational complexity by a factor of b + 1. We can also
use thread-level parallelism in addition to these two labeling
schemes.
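As a rough illustration of this word-level parallelism only (not of the exact bit-parallel labels developed later in the paper), the sketch below runs BFSs from up to 64 sources at once by propagating 64-bit frontier masks, where bit i stands for source i; the name bit_parallel_bfs and the per-source distance arrays are simplifications for exposition.

  #include <cstdint>
  #include <vector>

  using Graph = std::vector<std::vector<int>>;   // adjacency lists

  // dists[i][v] = distance from sources[i] to v (-1 if unreachable).
  std::vector<std::vector<int>> bit_parallel_bfs(const Graph& g,
                                                 const std::vector<int>& sources) {
    const int n = static_cast<int>(g.size());
    const int k = static_cast<int>(sources.size());            // k <= 64 assumed
    std::vector<std::uint64_t> reached(n, 0), frontier(n, 0);
    std::vector<std::vector<int>> dists(k, std::vector<int>(n, -1));
    for (int i = 0; i < k; ++i) {
      frontier[sources[i]] |= std::uint64_t(1) << i;
      reached[sources[i]]  |= std::uint64_t(1) << i;
      dists[i][sources[i]] = 0;
    }
    for (int d = 1; ; ++d) {
      std::vector<std::uint64_t> next(n, 0);
      for (int u = 0; u < n; ++u) {
        if (!frontier[u]) continue;
        for (int v : g[u]) next[v] |= frontier[u];              // push all k frontiers at once
      }
      bool active = false;
      for (int v = 0; v < n; ++v) {
        std::uint64_t fresh = next[v] & ~reached[v];            // sources reaching v first at level d
        next[v] = fresh;
        if (!fresh) continue;
        active = true;
        reached[v] |= fresh;
        for (int i = 0; i < k; ++i)
          if (fresh >> i & 1) dists[i][v] = d;
      }
      if (!active) break;
      frontier.swap(next);
    }
    return dists;
  }

This sketch favors clarity over the O(m) bound stated above: it scans every vertex at each level and stores full per-source distance arrays, whereas an implementation targeting that bound would keep explicit frontier lists and a more compact per-vertex encoding.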
As we confirm in our experimental results, our method
outperforms other state-of-the-art methods for exact dis-
tance queries. In particular, it has significantly better scal-
ability than previous methods. It took only tens of seconds to index networks with millions of edges. This indexing
time is two orders of magnitude faster than previous meth-
ods, which took at least thousands of seconds or even more
than one day. Moreover, our method successfully handled
networks with hundreds of millions of edges, which is again
two orders of magnitude larger than networks that have been
previously used in experiments of exact methods. The query time is also better than that of previous methods for networks of the same size, and we confirmed that the query time does not increase rapidly with the size of the network. We also confirm that the index size of our method is comparable to that of other methods.
In Table 1, we summarize our experimental results and
those of previous exact methods, as reported in their respective papers. We list the results for the two largest real-world complex