没有合适的资源?快使用搜索试试~ 我知道了~
2-丁天琛-Lamps_Location-Aware_Moving_Top-k_PubSub1
需积分: 0 0 下载量 95 浏览量
2022-08-04
13:38:15
上传
评论
收藏 2.39MB PDF 举报
温馨提示
试读
14页
2-丁天琛-Lamps_Location-Aware_Moving_Top-k_PubSub1
资源详情
资源评论
资源推荐
1041-4347 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2020.2979176, IEEE
Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 8, AUGUST 2015 1
Lamps: Location-Aware Moving Top-k Pub/Sub
Shunya Nishio, Daichi Amagata, Member, IEEE, Takahiro Hara, Senior Member, IEEE
Abstract—Huge amounts of spatio-textual objects, such as geo-tagged tweets, are being generated at an unprecedented scale,
leading to a variety of applications such as location-based recommendation and sponsored search. Many of these applications need to
support moving top-k spatio-textual subscriptions. For example, while walking, a tourist issues a moving subscription and looks for
top-k advertisements published by nearby shops. Unfortunately, existing methods that monitor the results of spatio-textual
subscriptions support only static top-k subscriptions or moving boolean subscriptions.
In this paper, we propose a novel system, called Lamps (Location-Aware Moving Top-k Pub/Sub), which continuously monitors the
top-k most relevant spatio-textual objects for a large number of moving top-k spatio-textual subscriptions simultaneously. To the best of
our knowledge, this is the first study of a location-aware moving top-k pub/sub system. As with existing works on continuous moving
top-k subscription processing, Lamps employs the concept of a safe region to monitor top-k results. However, unlike with existing works
that assume static objects, top-k result updates may be triggered by newly generated objects. To continuously monitor the top-k results
for massive moving subscriptions efficiently, we propose SQ-tree, a novel index based on safe regions, to filter subscriptions whose
top-k results do not change. Moreover, to reduce the expensive cost of safe region re-evaluation, we develop a novel approximation
technique for safe region construction. Our experimental results on real datasets show that Lamps achieves higher performance than
baseline approaches.
Index Terms—Pub/Sub, Top-k queries, Moving queries, Spatio-textual objects
F
1 INTRODUCTION
B
ECAUSE of the prevalence of social networks and GPS-
enabled mobile devices, huge amounts of objects with
both spatial and textual information, referred to as spatio-
textual objects, have been generated in a streaming fashion.
This has led to the popularity of location-aware pub/sub
systems in many applications, such as location-based recom-
mendation [1] and sponsored search [2]. In such a system, a
large number of users register their interests (e.g., keywords
and locations) as spatio-textual subscriptions. Then, the
system delivers a stream of spatio-textual objects (e.g., geo-
tagged tweets, e-coupons, and advertisements) generated
by publishers (e.g., restaurants) to the relevant subscrip-
tions. Various location-aware pub/sub systems have been
proposed [3], [4], [5], [6], [7]. Most of them focus on boolean
matching, thereby users may receive few matching objects
or may be overwhelmed by a huge volume of matching
objects. In such a situation, a top-k subscription, which
can control the size of the objects to be received, is useful.
Unfortunately, existing works focus on static subscriptions
and cannot support moving subscriptions efficiently. Note
that many real-world applications need to support moving
subscriptions [5], [8], [9], [10], [11]. For example, while walk-
ing, a tourist looks for top-k advertisements published by
nearby shops. Because he/she is walking, his/her location
(subscription location) changes continuously.
Motivated by the fact that none of the current systems
can support moving top-k subscriptions against streaming
objects, in this paper, we propose a novel system, called
Lamps (Location-Aware Moving Top-k Pub/Sub), which
• The authors are with the Department of Multimedia Engineering,
Graduate School of Information Science and Technology, Osaka Uni-
versity, Suita 565-0871, Japan. E-mail: {nishio.syunya, amagata.daichi,
hara}@ist.osaka-u.ac.jp.
Manuscript received April 19, 2005; revised August 26, 2015.
: e-coupon
: user (subscription)
: subscription keywords
sushi, tea
coffee
Japanese
ramen
Keywords
sushi, Japanese
salmon
coffee
Japanese,
udon
, soba
seafood, beef
udon, tofu
sushi, tea
Newly published.
Fig. 1: A location-aware moving top-k pub/sub system. At
ts = 3, o
6
and o
7
are newly published and each user moves
to the tip of the arrow.
continuously monitors the top-k most relevant spatio-
textual objects for a large number of moving top-k spatio-
textual subscriptions simultaneously. Specifically, for each
subscription, we score objects based on their spatial and
textual similarities, and the top-k results are continuously
maintained against streaming objects (i.e., object generations
and expirations) and the movements of users.
Example 1. Fig. 1 shows an example of a location-aware
moving top-k pub/sub system that delivers e-coupons
to potential consumers. In this example, there are four
users (subscriptions) and four restaurants (publishers)
that continuously generate e-coupons. Each restaurant
can generate multiple e-coupons and delete its generated
e-coupons. Each user registers his/her interest as a sub-
scription and monitors the top-1 most relevant e-coupon.
At timestamp ts = 2, we assume that five e-coupons
(o
1
- o
5
) have been published and the top-1 result for
user u
3
is e-coupon o
1
, because o
1
has the highest spatial
and textual similarities to the subscription of u
3
. In other
Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 22,2020 at 12:24:04 UTC from IEEE Xplore. Restrictions apply.
1041-4347 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2020.2979176, IEEE
Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 8, AUGUST 2015 2
words, o
1
is the closest to the location of u
3
and has the
common keyword “sushi” as u
3
. Also, the top-1 result
for u
2
is o
4
. At ts = 3, new e-coupons o
6
and o
7
are
published and each user moves, and then the result for
each user is updated. The top-1 result for u
3
becomes
o
7
, because o
7
is closer to the current location of u
3
and
its keywords are more similar to the keywords of u
3
.
Furthermore, u
2
moves west, thereby the top-1 result for
u
2
becomes o
1
because o
1
is closer to u
2
than o
4
.
Challenges. There are two key challenges for efficiently
monitoring the top-k result of each moving subscription.
When a spatio-textual object is generated or expires, the
top-k result of each subscription may change. We need to
monitor the up-to-date top-k results for a massive number of
subscriptions over a stream of spatio-textual objects. When a
user u moves, the score of each object for u varies, so the top-
k result of u may change. We need to monitor the up-to-date
top-k result for each subscription against the movements of
users.
To monitor the top-k results for all subscriptions over
a stream of objects, a straightforward approach works as
follows: When a new spatio-textual object o is generated, for
each subscription s, we calculate the score of o. If the score of
o is better than the score of the current k-th result, o becomes
the result of s. Moreover, when an object o
′
expires, for each
subscription whose top-k result contains o
′
, we need to re-
evaluate its result. For example, in Fig. 1, the expiration of
o
1
invalidates the current top-1 result of u
2
, thereby we
have to re-evaluate the result for u
2
. The straightforward
approach accesses all objects and calculates the scores, but
this approach is computationally expensive if the number
of subscriptions is large and the spatio-textual objects are
frequently generated and frequently expire.
To monitor the top-k result for each subscription against
the movements of users, a straightforward approach re-
calculates the scores of all objects and updates the top-
k result of u whenever u moves. However, this approach
incurs an expensive computational cost. To avoid unneces-
sary computation, several works have proposed safe region
techniques for a moving top-k subscription [8], [10], [12].
A safe region is a region where the top-k result does not
change. In other words, the top-k result needs to be re-
evaluated only when the user exits the safe region. These
safe regions however fail to function when streaming objects
are considered. This is because safe regions may change
when objects are newly generated. When users exit their
safe regions or their top-k results change, moreover, we need
to re-evaluate their safe regions. Even a state-of-the-art safe
region construction algorithm [10] incurs a large evaluation
cost if the system stores a lot of objects.
The works in [4], [5] have proposed location-aware
moving pub/sub systems. The underlying idea of them
is to group similar subscriptions and filter simultaneously
a set of subscripitions for a newly generated object. It is
challenging to achieve a similar optimization for moving
top-k pub/sub. These works focus on range-based moving
boolean subscriptions.
(Rev-1) Boolean subscriptions do not
have the concept of score, so thresholds for filtering them
never change. Moreover, their results are not affected by
expired objects. Top-k results, on the other hand, are affected
by object generation, expiration, and user movements. This
means that the threshold for each subscription always changes.
Therefore, it is more challenging to monitor the exact top-k
result for each subscription efficiently.
Overview of our proposed system. Lamps employs a novel
index, namely SQ-tree, which integrates a Quad-tree with
safe regions to effectively maintain moving top-k spatio-
textual subscriptions. SQ-tree can filter subscriptions whose
top-k results and safe regions do not change when a new
object is generated. Besides, we propose an efficient algo-
rithm to retrieve objects that become top-k objects when
an object expires or users exit their safe regions. To reduce
the expensive safe region evaluation cost, furthermore, we
develop a novel approximation technique for safe region
construction.
Contributions. We summarize our contributions below.
• We address, for the first time, the problem of mon-
itoring top-k spatio-textual objects for a large num-
ber of moving top-k spatio-textual subscriptions. To
tackle this problem, we propose a novel system,
called Lamps.
• We develop a novel approximation technique for
safe region construction to reduce the safe region
evaluation cost.
• We propose a novel index based on a Quad-tree
and safe regions, SQ-tree, to efficiently filter moving
subscriptions whose top-k results and safe regions
do not change when a new object is generated.
• We conduct experiments using real datasets, and
the results show that Lamps achieves higher perfor-
mance than baseline approaches.
Organization. We provide preliminary information in Sec-
tion 2. We present Lamps from Section 3 to Section 6 and
introduce our experimental results in Section 7. We review
some related works in Section 8 and conclude this paper in
Section 9.
2 P
RELIMINARY
First, in Section 2.1 we formally present some concepts that
are used throughout this paper. Section 2.2 introduces the
safe region technique for a moving top-k subscription.
2.1 Definition
We define spatio-textual object and Moving top-k Spatio-
Textual (MkST) subscription issued by a user.
Definition 1 (Spatio-textual object). A spatio-textual object,
which is generated by a publisher, is defined as o = (p, t, ts),
where o.p is the location of o, o.t is a set of keywords of o, and
ts is the timestamp when o is generated.
Definition 2 (MkST subscription). A MkST subscription is
defined as s = (p, t, k, α), where s.p is the current location
of the user, s.t is a set of keywords such that 1 ≤ |s.t| ≤
t
max
, s.k is the number of objects requested, and s.α is the
preference parameter (0 < s.α < 1) used in the scoring
function to evaluate the relevance between s and an object.
Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 22,2020 at 12:24:04 UTC from IEEE Xplore. Restrictions apply.
1041-4347 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2020.2979176, IEEE
Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 14, NO. 8, AUGUST 2015 3
Given an object o and s, the score of o for s, scor e(s, o), is
calculated as follows [10], [13]:
score(s, o) = s.α · dist(s.p, o.p) + (1 − s.α ) · text(s.t, o.t)
(1)
where dist(s.p, o.p) is the spatial proximity and
text(s.t, o.t) is the textual similarity between s and o.
Given an object set O, the answer of s is a subset of O,
A, such that (1) |A| = k , and (2) ∀o
∗
∈ A, ∀o
′
∈ O\A,
score(s, o
∗
) ≤ score(s, o
′
)
1
.
In the following of this paper, we abbreviate s.k and s.α as
k and α, respectively, if there is no ambiguity.
To compute spatial proximity, we utilize the Euclidean
distance dist(s.p, o.p) =
Edist(s.p,o.p)
Maxdist
as with [3], [4], [5],
[10], [14], [15], [16]
2
. Edist(s.p, o.p) is the Euclidean distance
between s.p and o.p, and M axdist is the maximum distance
in the spatial area R
2
where subscriptions and objects exist.
We see that dist(·, ·) ∈ [0, 1]. For textual similarity, there
are three widely used set-based similarity functions, namely
Jaccard, Dice, and Cosine similarities [18]. Lamps can em-
ploy any similarity function that uniquely determines the
textual similarity by a given subscription and an object.
(Rev-2) In this paper, we use Jaccard similarity as the default
function, because it is usually used in set similarity search
[19], i.e., text(s.t, o.t) = 1 −
|s.t∩o.t|
|s.t∪o.t|
. Note that in Section 7.3
we verify the performance of Lamps when Dice and Cosine
similarities are employed. Then, score(s, o) is within [0, 1].
Problem statement. We consider a set of MkST subscrip-
tions S and a dynamic set O of spatio-textual objects. Sub-
scribers can move randomly at any time (i.e., movements
of users) [4]. Publishers can insert their newly generated
objects into O (i.e., object generation) and remove objects
that they have generated from O (i.e., object expiration).
We aim to continuously monitor the top-k results for all
subscriptions against object generation, object expiration,
and movements of users.
Example 2. Fig. 2 shows a running example used throughout
this paper. In this example, there are ten registered
subscriptions {s
1
, · · · , s
10
} and eight objects have been
generated {o
1
, · · · , o
8
}. Moreover, two objects o
9
and
o
10
are newly generated, and we assume s
1
.k = 2.
Specifically, o
2
and o
5
have high spatial and textual
similarities to s
1
. Thus, the top-k results of s
1
are o
2
and
o
5
.
2.2 Safe region technique
Lamps employs the concept of the safe region [12] to mon-
itor top-k results. Therefore, we first define the safe region
and a relevant concept, the dominant region.
1. As with [6], [13], to guarantee the top-k results are textual-relevant,
an object must contain at least one common keyword with a subscrip-
tion to become its top-k result.
2. The works in [11], [17] utilized the road network distance to
compute spatial proximity. Because the computaion cost of measureing
road network distance is higher than that of measureing Eucllidean
distance, we utilize the Euclidean distance to monitor the top-k results
of more subscriptions. However, Lamps can employ the road network
distance. In Section 6, we discuss the modifications needed when the
road network distance is used.
Keywords
Keywords
: subscription : object
Newly generated.
Fig. 2: Running example. Eight objects {o
1
, · · · , o
8
} have
been generated and two objects o
9
and o
10
are newly gener-
ated.
Definition 3 (Safe region). Given O, s = (p, t, k, α), and A of
s, the safe region of s, R, is:
R = {p
′
|∀o
∗
∈ A, ∀o
′
∈ O\A, score(s
′
, o
∗
) ≤ score(s
′
, o
′
)}
where s
′
is s after moving to a new location p
′
, i.e., s
′
=
(p
′
, t, k, α ).
Note that the safe region of s is a region where the top-k
result of s does not change.
Definition 4 (Dominant region). Given s = (p, t, k, α) and
two objects o
∗
and o
′
, the dominant region of o
∗
to o
′
, D
o
∗
,o
′
,
is:
D
o
∗
,o
′
= {p
′
|score(s
′
, o
∗
) ≤ score(s
′
, o
′
)}
where s
′
is s after moving to a new location p
′
, i.e., s
′
=
(p
′
, t, k, α ).
We see that the dominant region of o
∗
to o
′
is a region where
score(s, o
∗
) ≤ score(s, o
′
). The following lemma is showed
and proved in literature [10].
Lemma 1. Given O, s, and A, the safe region of s, R, is R =
∩
o
∗
∈A
(∩
o
′
∈O\A
D
o
∗
,o
′
).
Local safe region. To efficiently compute a safe region, lit-
erature [10] proposed a local safe region (LSR), which is a
subset of the safe region. Notice that we can monitor the
exact top-k results by re-evaluating them only when users
exit their LSR. [10] models each object o as a circle C
o
with
a center o.p and radius of r
o
=
1−α
α
· text(s.t, o.t). Then,
score(s, o) = α(dist(s.p, o.p) + r
o
). The LSR is computed in
the following three steps.
Step 1. For each object o
∗
∈ A, we compute an ellipse E
o
∗
and the intersection of these ellipses E = ∩
o
∗
∈A
E
o
∗
. Note
that
E
o
∗
= {p
′
|dist(s
′
.p
′
, o
∗
.p) + dist(s.p, s
′
.p
′
) ≤ γ − r
o
∗
}, (2)
where γ = max{dist(s.p, o
∗
.p) + r
o
∗
|o
∗
∈ A} + ∆ and ∆
is a parameter of an approximate ratio. Obviously E
o
∗
is an
ellipse with s.p and o
∗
.p as two foci.
Example 3. In Fig. 3, because s
1
.k = 2 and the top-k results
of s
1
are o
2
and o
5
, E is the intersection of E
o
2
and E
o
5
.
Step 2. Let C
s
be a circle centered at s.p with radius γ.
When s
′
.p
′
∈ E, each object o does not become the top-k
result of s
′
if C
o
is not inside C
s
. If C
o
is not inside C
s
,
γ ≤ dist(s.p, o.p) + r
o
. Then, score(s
′
, o
∗
) < score(s
′
, o) for
Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on September 22,2020 at 12:24:04 UTC from IEEE Xplore. Restrictions apply.
剩余13页未读,继续阅读
KerstinTongxi
- 粉丝: 21
- 资源: 277
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论0