Towards Personalized Maps: Mining User Preferences
from Geo-textual Data
Kaiqi Zhao
Yiding Liu
Quan Yuan
Lisi Chen
Zhida Chen
Gao Cong
Nanyang Technological University
Hong Kong Baptist University
Rich geo-textual data is available online, and its volume keeps
growing rapidly. We propose two user behavior models
to learn several types of user preferences from geo-textual data,
and a prototype system, called PreMiner, built on top of these models for
mining and searching geo-textual data to support
personalized maps. Unlike existing recommender systems
and data analysis systems, PreMiner offers a highly personalized user
experience on maps and supports several applications, including user
mobility & interests mining, opinion mining in regions, user rec-
ommendation, point-of-interest recommendation, and querying and
subscribing on geo-textual data.
People post a variety of content to the internet every day through
GPS-equipped mobile devices. Such posts are associated with
geographical coordinates (latitude and longitude), and some of them
are associated with semantic places, i.e., points-of-interest (POIs).
They also contain words that imply semantic topics (see Figure
1(a)), or words that imply users’ opinions on different aspects of
a POI (see Figure 1(b)). With multiple types of information avail-
able from geo-textual posts, we face a great opportunity to mine
different kinds of user preferences, including preferences on topic,
region, POI aspect, and category. For example, a user who prefers
topic “sports” may often mention words like “shoot” and “goal” in
their posts. As another example, a user may frequently visit shops
in a shopping area she likes (i.e., preferences on region).
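To make the notion of a geo-textual post and a region preference concrete, the following is a minimal sketch. It is not one of the models proposed in this paper: the `GeoPost` record, its field names, the `region_preferences` helper, and the fixed grid of `cell_deg` degrees are all assumptions made for illustration, and a region preference is approximated naively as the share of a user's posts that fall in each grid cell.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class GeoPost:
    """A geo-textual post; every field name here is hypothetical."""
    user: str
    lat: float
    lon: float
    text: str
    poi: Optional[str] = None  # semantic place, present only for some posts


def region_preferences(
    posts: Iterable[GeoPost], cell_deg: float = 0.01
) -> Dict[str, Dict[Cell, float]]:
    """Naive baseline: fraction of each user's posts per grid cell."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for p in posts:
        # Map coordinates to a grid cell (an assumed stand-in for a region).
        cell = (math.floor(p.lat / cell_deg), math.floor(p.lon / cell_deg))
        counts[p.user][cell] += 1
    return {
        user: {cell: n / sum(c.values()) for cell, n in c.items()}
        for user, c in counts.items()
    }


posts = [
    GeoPost("u1", 1.3005, 103.8505, "new shoes!", poi="Mall A"),
    GeoPost("u1", 1.3005, 103.8505, "sale today", poi="Mall A"),
    GeoPost("u1", 1.3505, 103.6805, "great goal tonight"),
]
prefs = region_preferences(posts)
```

A fixed grid is used only to keep the sketch short; it ignores text, time, and POI information, which is exactly the information a unified model would need to combine.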
However, building a unified model that captures different types
of user preferences poses three main challenges. First, the interac-
tions among different types of latent variables (e.g., aspect, senti-
ment, region, topic) and observable variables (e.g., text, time, cat-
egory, POI) are unclear. Second, the data could be in different for-
mats (continuous and discrete) from different data sources (e.g.,
Yelp and Foursquare). The variety of data makes the modeling and
parameter learning complicated. Third, the latent variables in dif-
ferent scopes further complicate the model learning. For example,
each sentence in a review is often related to one aspect, while the
whole review is posted in one latent region. This implies
that aspect and sentiment are modeled at the sentence
level, while region is modeled at the document level.
(a) Words about topic “eat & drink” in a check-in post
(b) Words about different aspects (environment, food and service)
and their corresponding sentiments in a review
Figure 1: Screenshots of short text (e.g., Foursquare check-ins)
and long text (e.g., Yelp reviews) data
This work is licensed under the Creative Commons Attribution-
NonCommercial-NoDerivatives 4.0 International License. To view a copy
of this license, visit For
any use beyond those covered by this license, obtain permission by emailing
Proceedings of the VLDB Endowment, Vol. 9, No. 13
Copyright 2016 VLDB Endowment 2150-8097/16/09.
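The difference in scope between the latent variables can be pictured as a nested data structure. The sketch below only illustrates the scoping described in the text; the class names, field names, and the `latent_variable_count` helper are hypothetical and are not taken from the SAR model's definition.

```python
from dataclasses import dataclass, field
from typing import List

UNASSIGNED = -1  # placeholder for a latent value that has not been inferred


@dataclass
class Sentence:
    words: List[str]
    aspect: int = UNASSIGNED     # latent variable, sentence scope
    sentiment: int = UNASSIGNED  # latent variable, sentence scope


@dataclass
class Review:
    user: str
    lat: float
    lon: float
    sentences: List[Sentence] = field(default_factory=list)
    region: int = UNASSIGNED     # latent variable, document scope


def latent_variable_count(review: Review) -> int:
    """One region per review plus one aspect and one sentiment per sentence."""
    return 1 + 2 * len(review.sentences)


review = Review(
    user="u1",
    lat=1.3005,
    lon=103.8505,
    sentences=[
        Sentence(["cozy", "environment"]),
        Sentence(["tasty", "food"]),
        Sentence(["slow", "service"]),
    ],
)
n_latent = latent_variable_count(review)
```

Under this layout, a three-sentence review carries seven latent assignments: one region for the document and an aspect and a sentiment for each sentence.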
To tackle these challenges, we design two probabilistic models,
namely the Who, Where, When, What (W4) model [7, 8] and the
Sentiment, Aspect, Region (SAR) model [9], for short-text and
long-text data, respectively. W4 mines user preferences on topics and
regions from short geo-textual documents with temporal information
(e.g., check-ins), while SAR mines user preferences on aspects,
categories and regions from geo-textual documents in which
temporal information is not available but the text is long enough for
sentiment analysis (e.g., geo-tagged reviews). Both user behavior
models support mining several types of user preferences and
hence cater to the needs of various applications. The proposed
models outperform other models in many applications.
For example, SAR achieves at least 60% higher accuracy
than other models in POI recommendation, and W4 is at
least 80% more accurate than other models in location prediction.
In this demonstration, we present a prototype system, namely
PreMiner, which is built on top of the two user behavior models.
Our system supports querying and mining geo-textual data for
personalized map services, based on the two models and techniques
proposed in our previous work [1, 3]. It supports, but is not limited to,
the following applications:
The system is available at http://spatialkeyword.sce.