Towards Personalized Maps: Mining User Preferences
from Geo-textual Data
Kaiqi Zhao¹, Yiding Liu², Quan Yuan³, Lisi Chen⁴, Zhida Chen⁵, Gao Cong⁶
Nanyang Technological University
{¹kzhao002@e., ²ydliu@, ³qyuan1@e., ⁵chen0936@e., ⁶gaocong@}ntu.edu.sg
Hong Kong Baptist University
⁴chenlisi@comp.hkbu.edu.hk
ABSTRACT
Rich geo-textual data is available online and the data keeps in-
creasing at a high speed. We propose two user behavior models
to learn several types of user preferences from geo-textual data,
and a prototype system on top of the user preference models for
mining and searching geo-textual data (called PreMiner) to support
personalized maps. Different from existing recommender systems
and data analysis systems, PreMiner highly personalizes user ex-
perience on maps and supports several applications, including user
mobility & interests mining, opinion mining in regions, user rec-
ommendation, point-of-interest recommendation, and querying and
subscribing on geo-textual data.
1. INTRODUCTION
People post a variety of content to the internet every day through
GPS-equipped mobile devices. Such posts are associated with
geographical coordinates (latitude and longitude), and some of them
are associated with semantic places, i.e., points-of-interest (POIs).
They also contain words that imply semantic topics (see Figure
1(a)), or words that imply users' opinions on different aspects of
a POI (see Figure 1(b)). With multiple types of information avail-
able from geo-textual posts, we face a great opportunity to mine
different kinds of user preferences, including preferences on topic,
region, POI aspect, and category. For example, a user who prefers
topic “sports” may often mention words like “shoot” and “goal” in
their posts. As another example, a user may frequently visit shops
in a shopping area she likes (i.e., preferences on region).
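To make the data concrete, a geo-textual post as described above can be sketched as a small record. This is only an illustrative sketch: the class name `GeoTextualPost`, its field names, and the helper `has_semantic_place` are assumptions made for this example and are not part of PreMiner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoTextualPost:
    """Hypothetical record for one geo-textual post (names are illustrative)."""
    user_id: str
    latitude: float                # geographical coordinates of the post
    longitude: float
    text: str                      # words that imply topics or opinions
    poi_id: Optional[str] = None   # semantic place (POI), if the post has one

    def __post_init__(self) -> None:
        # Reject coordinates outside the valid latitude/longitude ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be in [-90, 90]")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be in [-180, 180]")


def has_semantic_place(post: GeoTextualPost) -> bool:
    """True if the post is associated with a POI, not only with coordinates."""
    return post.poi_id is not None
```

Keeping `poi_id` optional reflects that only some posts are associated with a semantic place, while every post carries coordinates and text.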
However, building a unified model that captures different types
of user preferences poses three main challenges. First, the interac-
tions among different types of latent variables (e.g., aspect, senti-
ment, region, topic) and observable variables (e.g., text, time, cat-
egory, POI) are unclear. Second, the data could be in different for-
mats (continuous and discrete) from different data sources (e.g.,
Yelp and Foursquare). The variety of data makes the modeling and
parameter learning complicated. Third, the latent variables in dif-
ferent scopes further complicate the model learning. For example,
each sentence in a review is often related to one aspect and the
whole review should be posted in some latent region. This implies
that aspect and sentiment are often modeled in the scope of
sentence, while region is modeled in the scope of document.

(a) Words about topic “eat & drink” in a check-in post
(b) Words about different aspects (environment, food and service)
and their corresponding sentiments in a review
Figure 1: Screenshot of short text (e.g., Foursquare check-ins)
and long text (e.g., Yelp reviews) data

This work is licensed under the Creative Commons Attribution-
NonCommercial-NoDerivatives 4.0 International License. To view a copy
of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For
any use beyond those covered by this license, obtain permission by emailing
info@vldb.org.
Proceedings of the VLDB Endowment, Vol. 9, No. 13
Copyright 2016 VLDB Endowment 2150-8097/16/09.
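The difference in scope noted in the third challenge (one aspect and one sentiment per sentence, one region per review) can be pictured with a small container type. This is a hedged sketch of the scoping idea only, not the representation used by the SAR model; all names (`SentenceLatents`, `ReviewLatents`, `count_latent_assignments`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceLatents:
    """Latent variables whose scope is a single sentence (illustrative)."""
    aspect: int      # index of the latent aspect for this sentence
    sentiment: int   # index of the latent sentiment for this sentence


@dataclass
class ReviewLatents:
    """Latent variables for a whole review (illustrative)."""
    region: int                       # one latent region per document
    sentences: List[SentenceLatents]  # one (aspect, sentiment) per sentence


def count_latent_assignments(review: ReviewLatents) -> int:
    """One region for the document plus two assignments per sentence."""
    return 1 + 2 * len(review.sentences)
```

For a review with two sentences, this layout holds five latent assignments: one document-level region and an aspect and sentiment for each sentence.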
To tackle these challenges, we design two probabilistic mod-
els, namely the Who, Where, When, What (W4) model [7, 8] and the
Sentiment, Aspect, Region (SAR) model [9], for short text and long
text data, respectively. W4 mines user preferences on topics and re-
gions from short geo-textual documents with temporal information
(e.g., check-ins), while SAR mines user preferences on aspects,
categories and regions from geo-textual documents in which tem-
poral information is not available but the text is long enough for
sentiment analysis (e.g., geo-tagged reviews). Both user behavior
models support mining several types of user preferences and
hence cater to the needs of various applications. The proposed
models outperform other models in many applications.
For example, SAR achieves at least 60% higher accuracy
than other models in POI recommendation, and W4 is at
least 80% more accurate than other models in location prediction.
In this demonstration, we propose a prototype system¹, namely
PreMiner, which is built on top of the two user behavior models.
Our system supports querying and mining geo-textual data for
personalized map services, based on the two models and techniques
proposed in our previous work [1, 3]. It supports, but is not limited to,
the following applications:
¹ The system is available at http://spatialkeyword.sce.ntu.edu.sg/PreMiner.