
Information Fusion 81 (2022) 1–13

Available online 12 November 2021

Contents lists available at ScienceDirect

Information Fusion

journal homepage: www.elsevier.com/locate/inffus

Identifying user geolocation with Hierarchical Graph Neural Networks and explainable fusion

Fan Zhou, Tianliang Wang, Ting Zhong∗, Goce Trajcevski

University of Electronic Science and Technology of China, China

Iowa State University, USA

ARTICLE INFO

Keywords:

User geolocation

Social information fusion

Graph Neural Networks

Interpretable fusion

Influence function

ABSTRACT

Determining user geolocation from social media data is essential in various location-based applications: from improved transportation and supply management, through personalized services and targeted marketing, to better overall user experiences. Previous methods geolocate users based on the similarity of their posted content and of their neighboring nodes. They suffer from two problems: (1) position-agnostic network representation learning, which limits their prediction accuracy; and (2) noisy and unstable fusion of user relations, caused by the flat graph embedding methods they employ. This work presents Hierarchical Graph Neural Networks (HGNN), a novel methodology for location-aware collaborative fusion of user-aspect data and location prediction. HGNN incorporates users' geographical location information and the clustering effect of regions, and can capture topological relations while preserving relative positions. By encoding the structure and features of regions with hierarchical graph learning, HGNN largely alleviates the problem of noisy and unstable signal fusion. We further design a relation mechanism that bridges individual users and clusters. It not only leverages information from isolated nodes, which previous methods cannot use, but also captures the relations between unlabeled nodes and labeled subgraphs. Furthermore, we introduce a robust statistics method that interprets our model's behavior by identifying the importance of data samples when predicting users' locations. It provides meaningful explanations of the model's behavior and outputs. This overcomes a drawback of previous approaches, which treat user geolocation as "black-box" modeling and lack interpretability. Comprehensive evaluations on real-world Twitter datasets verify the proposed model's superior performance and its ability to interpret user geolocation results.

1. Introduction

The plethora of Online Social Networks (OSNs) has enabled novel interactions in daily activities. Examples include sharing notifications about events related to products and traffic jams, sharing personal experiences on Instagram and Facebook, reading news and popular topics on Twitter, and building academic connections on ResearchGate. These networks have changed the way we communicate, read, and socialize. They have also generated an unprecedented volume of heterogeneous data, which, in turn, fosters business innovation and emerging industrial opportunities [1]. Among various applications, identifying users' geographic locations has attracted lasting interest from both academia and industry. It has become an essential Internet service for many industrial applications, such as location-based targeted advertising, emergency location identification, political elections, substance use surveillance, local event/place recommendation, and natural disaster response [2–4].

∗ Corresponding author.

Fine-grained localization, such as various forms of sensor-based tracking of assets and processes, has already been exploited in multiple industrial applications. In more extensive geographical settings, however, it suffers from inaccuracy due to, e.g., cellular access restrictions, high measurement overhead, and unreliable client response times [5]. Complementary to this, the increased popularity of social media services (e.g., Twitter, Facebook, and Instagram) provides rich and timely metadata, e.g., published message contents, mention tags, and follower/followee relations. This information can be leveraged to promptly geolocate OSN users, which has recently spurred research interest in the so-called User Geolocation (UG) problem in OSNs [6–9]. For example, the U.S. Centers for Disease Control and Prevention (CDC) has been using social media to support epidemiological investigations in responding to the virus that causes COVID-19 [10].

https://doi.org/10.1016/j.inffus.2021.11.004

Received 9 December 2020; Received in revised form 6 July 2021; Accepted 6 November 2021


Online user geolocation is a passive crowd-sensing problem that requires hybrid information fusion. It draws insights from many user activities and sensing data to distill knowledge and refine the predicted results. Early efforts [6,11] mainly focused on mining users' posted content for indicative words that link users to their home locations. They used various natural language processing techniques (e.g., topic models and statistical models). For example, Term Frequency–Inverse Document Frequency (TF-IDF [12]) is a commonly used method to measure the distribution of location words [6]. More recent efforts fuse user interactions for collaborative sensing to boost geolocation accuracy. For instance, node2vec [13] is used to learn user representations [7], which are combined with doc2vec [14] text representations to predict user locations in an end-to-end manner. Recurrent Neural Networks (RNNs) with an attention mechanism are used in [8] to model user tweet content, combined with metadata such as timezone and self-declared profiles to predict user locations. A more recent work [9] employs GCNs [15] to learn network structures with graph convolution and pooling operations.
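To make the TF-IDF weighting above concrete, here is a minimal Python sketch that treats each user's concatenated tweets as one document. It multiplies length-normalized term frequency by log(N/df), which is only one of several common TF-IDF variants. The function name `tf_idf`, the whitespace tokenizer, and the sample tweets are hypothetical and are not taken from [6] or [12].

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses length-normalized term frequency times log(N / df), one common
    TF-IDF variant; the exact weighting used in [6] may differ.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical tweet histories, one concatenated document per user.
users = [
    "stuck in traffic on the brooklyn bridge again",
    "great pizza in brooklyn tonight",
    "traffic is terrible in austin today",
]
weights = tf_idf(users)
print(weights[0]["in"])                  # 0.0, since "in" occurs in every document
print(round(weights[0]["brooklyn"], 3))  # 0.051
```

Words that occur in every user's history, such as "in", receive zero weight. Words concentrated in a few users' histories, such as "brooklyn", are up-weighted. This is the property that makes TF-IDF useful for surfacing location-indicative words.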

Broadly speaking, existing state-of-the-art methods employ deep learning techniques to learn representations of user interactions and content. They do so without fully exploiting the specific constraints of the user geolocation task. Graph representation methods (e.g., GCN [15], GAT [16], node2vec [13], GraphSAGE [17]) are commonly used to learn user interactions. However, these are general, unweighted, and location-agnostic graph learning methods that do not consider the geographical positions/locations of nodes (users). Because these graph embedding methods are not tailored to the user geolocation task, existing approaches ignore the strong geolocation dependencies among nodes. As a result, they cannot capture the relative distance between any pair of nodes. In addition, existing graph-based UG methods are inherently flat graph learning models. They cannot capture region-level features and are therefore very sensitive to local network structure. For example, the homophily assumption, i.e., that online interactions imply a higher probability of geographical proximity, does not hold in many cases [2,18].
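A small sketch of hierarchical pooling shows the contrast between flat, node-level learning and region-level learning. It assumes each user is hard-assigned to a region; the assignment vector and the function `coarsen` are hypothetical. This is the generic assignment-matrix coarsening used by pooling methods such as DiffPool, not HGNN's own construction. Region-level features and connectivity follow from S^T X and S^T A S, where S is the one-hot assignment matrix:

```python
def coarsen(adj, feats, assign):
    """Pool a node-level graph into a region-level graph.

    adj:    n x n adjacency matrix (list of lists, symmetric)
    feats:  n x d node feature matrix
    assign: region id in 0..k-1 for each node (a hard assignment, assumed here)
    Returns (S^T A S, S^T X), where S is the one-hot n x k assignment matrix.
    """
    k = max(assign) + 1
    d = len(feats[0])
    # S^T X: sum the features of all nodes assigned to each region.
    region_feats = [[0.0] * d for _ in range(k)]
    for node, region in enumerate(assign):
        for j in range(d):
            region_feats[region][j] += feats[node][j]
    # S^T A S: accumulate edge weights between (and within) regions.
    region_adj = [[0.0] * k for _ in range(k)]
    for u, row in enumerate(adj):
        for v, w in enumerate(row):
            region_adj[assign[u]][assign[v]] += w
    return region_adj, region_feats


# Four hypothetical users: 0 and 1 in region 0, 2 and 3 in region 1.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
region_adj, region_feats = coarsen(adj, feats, [0, 0, 1, 1])
print(region_adj)    # [[2.0, 1.0], [1.0, 2.0]]
print(region_feats)  # [[2.0, 0.0], [0.0, 2.0]]
```

Diagonal entries count each intra-region edge twice because the adjacency matrix is symmetric. Each region aggregates many users, so a single noisy edge changes the region-level graph only slightly. That is the intuition behind using region-level signals to stabilize node-level aggregation.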

Our main motivation is the observation that methodologies in the existing literature do not exploit the benefits of jointly considering the topological structure of users and the influence of crowds from different regions. The former is usually noisy and unstable, whereas the latter may provide a more robust signal for geolocation. In addition, existing models, especially those based on deep neural networks, often lack transparency and cannot interpret model behavior and localization results. This restricts their applicability in safety-critical areas. For example, when locating areas with specific emergencies (e.g., the spread of COVID-19), explaining why and how a prediction was made can be more important than just presenting the predicted results [19–21].

To address these limitations of previous works, we propose a novel multi-view user geolocation framework called Hierarchical Graph Neural Networks (HGNN). It fuses user-generated content and network information for collaborative user geolocation, and it enhances user geolocation performance in four respects. First, it incorporates the relative distances of each node to other nodes (clusters) in the network. This enables the model to discriminate between nodes that have similar topological structures but reside in different regions. Second, the proposed hierarchical feature fusion method provides both coarse- and fine-grained graph representations by learning and distinguishing the crowd effects from different geographic regions. Third, our model naturally exploits unlabeled and isolated nodes for context information aggregation, which previous UG models do not. Fourth, the interpretability of the information fusion allows us to understand the trained geolocation model's behavior. It also shows how that behavior is affected by the information aggregated from the training samples (i.e., all in-network users and their associated features). The main contributions of this work in terms of the novelty of the proposed approach are four-fold. Specifically, we present:

• A new location-aware node relation learning model that takes geographical location and relative distance into account when performing non-linear transformation and feature aggregation. It not only preserves network topology but also encodes each node's position with respect to other nodes and/or clusters.

• A new hierarchical GNN framework that learns both region- and node-level features for robust feature aggregation and propagation, and that can be combined with any graph learning approach in an end-to-end manner. Compared to the flat node-level embeddings in existing UG approaches, it alleviates the influence of noisy interactions and the impact of outlier nodes.

• A new general framework to explain the behavior of user geolocation models and their prediction results. We take the initiative in using the influence function [22] to quantify the impact of in-network users and their corresponding features on the predicted outcomes.

• Extensive evaluations on three benchmark Twitter datasets. The results demonstrate that our method significantly outperforms state-of-the-art baselines while providing explanations of both model behavior and prediction results.
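The sketch below gives a rough idea of how an influence function scores training samples. It assumes the widely used formulation of Koh and Liang, which we take to be what [22] refers to. Under that formulation, the effect of upweighting a training point z on the loss at a test point is -grad L(z_test)^T H^{-1} grad L(z), where H is the Hessian of the training objective. To stay dependency-free, it uses a hypothetical one-parameter logistic regression with an L2 penalty, so H is a scalar. HGNN's actual model, parameters, and Hessian approximation are not shown.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def hessian(xs, theta, lam):
    # Second derivative of the mean log-loss plus the L2 term (lam / 2) * theta^2.
    return sum(sigmoid(theta * x) * (1 - sigmoid(theta * x)) * x * x for x in xs) / len(xs) + lam


def fit(xs, ys, lam=0.1, steps=50):
    """Newton's method for 1-D logistic regression p(y=1|x) = sigmoid(theta * x)."""
    theta = 0.0
    for _ in range(steps):
        grad = sum((sigmoid(theta * x) - y) * x for x, y in zip(xs, ys)) / len(xs) + lam * theta
        theta -= grad / hessian(xs, theta, lam)
    return theta


def influence(xs, ys, theta, x_test, y_test, lam=0.1):
    """I_up,loss(z_i, z_test) = -grad L(z_test) * H^-1 * grad L(z_i) for each z_i.

    Positive values: upweighting z_i would increase the test loss (harmful).
    """
    h = hessian(xs, theta, lam)
    g_test = (sigmoid(theta * x_test) - y_test) * x_test
    return [-g_test * ((sigmoid(theta * x) - y) * x) / h for x, y in zip(xs, ys)]


# Hypothetical 1-D data; the last point carries a deliberately noisy label.
xs = [2.0, 1.5, -1.0, -2.0, 1.0]
ys = [1, 1, 0, 0, 0]
theta = fit(xs, ys)
scores = influence(xs, ys, theta, x_test=1.8, y_test=1)
most_harmful = max(range(len(xs)), key=lambda i: scores[i])
print(most_harmful)  # 4
```

Only the noisily labeled point pushes the model toward the wrong answer for the test sample, so it receives the single positive (harmful) score. At toy scale, this mirrors how influence scores can flag the training users that most affect a given prediction.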

The rest of this paper is organized as follows. Section 2 reviews related work, and Section 3 formalizes the problem and presents the necessary background. In Section 4, we give the details of the methodology, as well as the approach for explaining the user-aspect data fusion and location prediction. Experimental evaluations quantifying the benefits of our approach are presented in Section 5. We conclude this work and outline directions for future work in Section 6.

2. Related work

Previous work on geolocating users of online social networks can be broadly categorized into three groups according to the type of data used to make the prediction. We now review relevant works and position our paper in the context of the existing literature.

2.1. Content-based approaches

User-generated content (UGC), such as textual posts and photos, may be casually tagged with real-time locations, facilitated by the increasing popularity of GPS-equipped devices. However, such geo-tagged tweets are extremely sparse: e.g., no more than 1% of published tweets are labeled with geographical locations [23]. A plethora of works [6,11,24–26] have studied the possibility of leveraging UGC to locate users. These methods address the geolocation problem by inferring locations from location-relevant words with various classification models. Identifying meaningful indicative words is therefore an important step towards accurate user geolocation, and TF-IDF [12] is a widely adopted textual content representation method in the literature [6,9,27–29]. For example, inverse location/city frequency has been used to measure the location words in the content [6,27]. Probabilistic models, in contrast, are usually used to characterize users' location distributions w.r.t. their published UGC. However, they require extensive manually labeled location-related words to achieve satisfactory results.
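Here is a minimal sketch of inverse city frequency. It assumes an IDF-style definition in which cities play the role of documents: ICF(w) = log(|C| / |C_w|), where C is the set of cities and C_w is the set of cities whose users used word w. The function name, whitespace tokenization, and toy corpus are hypothetical, and [6,27] may use a different normalization.

```python
import math
from collections import defaultdict


def inverse_city_frequency(city_posts):
    """Map each word to log(#cities / #cities in which the word appears).

    city_posts maps a city to the posts of users living there. This IDF-style
    definition is an assumption for illustration; [6,27] may weight it differently.
    """
    cities_with_word = defaultdict(set)
    for city, posts in city_posts.items():
        for post in posts:
            for word in post.lower().split():
                cities_with_word[word].add(city)
    n_cities = len(city_posts)
    return {w: math.log(n_cities / len(cs)) for w, cs in cities_with_word.items()}


# Hypothetical posts grouped by the authors' home cities.
corpus = {
    "new york": ["the subway is late", "bodega coffee is the best"],
    "austin": ["the breakfast tacos are great"],
    "chicago": ["the deep dish is great", "the L is late"],
}
icf = inverse_city_frequency(corpus)
print(icf["the"])               # 0.0, used in every city
print(round(icf["bodega"], 3))  # 1.099, used in a single city
```

Function words used everywhere score zero, while words confined to one city score highest. Ranking by ICF therefore surfaces candidate location-indicative words.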

Inspired by recent advances in applying deep learning to natural language processing, a few studies model users' textual content with various neural network-based models. Their aim is to learn tweet representations in an end-to-end manner [7,8,30,31]. Among these methods, doc2vec [14] and recurrent neural networks (RNNs) are simple yet effective choices for learning vector representations of textual content. For example, [7] combines TF-IDF and doc2vec representations of textual information to enhance prediction performance. GRU [32] with an attention mechanism [33] was used in [8] to model user tweet content and obtain timeline representations. Although doc2vec and RNN-based methods can learn language characteristics efficiently without manual location feature engineering, a recent study [34] finds that TF-IDF is consistently superior to doc2vec. It attributes this to the location-indicative words captured by TF-IDF.

Our present work enables better location-awareness than the existing literature; in particular, HGNN distinguishes the crowd effects from different geographic regions.

2.2. Network-based methods

Online social relationships are also important indicators for user geolocation under the homophily assumption [35–38], i.e., that people prefer to interact with others in nearby areas. Backstrom et al. [35] examine the relationship between users' geographical proximity and online friendships on Facebook. They find that the likelihood of a relation between any pair of users drops monotonically as a function of distance. Rather than relying solely on friendships, more and more works utilize various types of connections, such as co-mention tags and mentions between non-friends. These connections construct closer social interactions beyond friendships [2,31]. In this way, similar interests among users can be retrieved from such implicit networks to improve geolocation accuracy [30,39,40]. Moreover, researchers have also identified noisy interaction factors that may degrade prediction performance. For example, the social influence of celebrities is a distracting factor that may confuse the prediction, so celebrities are removed from the constructed user network [30,41].
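The celebrity-removal step can be sketched as degree-based pruning of a mention graph. The degree threshold `max_degree`, the function names, and the toy mentions are hypothetical, and the exact criterion used in [30,41] may differ.

```python
from collections import defaultdict


def build_mention_graph(mentions):
    """Build an undirected graph from (user, mentioned_user) pairs."""
    graph = defaultdict(set)
    for u, v in mentions:
        if u != v:  # ignore self-mentions
            graph[u].add(v)
            graph[v].add(u)
    return graph


def remove_celebrities(graph, max_degree):
    """Drop nodes with degree above max_degree, plus every edge touching them."""
    celebrities = {u for u, nbrs in graph.items() if len(nbrs) > max_degree}
    return {u: nbrs - celebrities for u, nbrs in graph.items() if u not in celebrities}


# Hypothetical mentions: four ordinary users all mention one popular account.
mentions = [("a", "star"), ("b", "star"), ("c", "star"), ("d", "star"),
            ("a", "b"), ("c", "d")]
pruned = remove_celebrities(build_mention_graph(mentions), max_degree=3)
print(sorted(pruned))       # ['a', 'b', 'c', 'd']
print(sorted(pruned["a"]))  # ['b']
```

Before pruning, users a and c are only two hops apart through the shared celebrity, even though nothing else connects them. After pruning, that spurious path disappears.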

Although existing approaches have explicitly modeled the location dependency between socially connected users, some challenges have not been properly addressed, namely the sparsity of geo-tagged users and inaccurate label propagation. More importantly, friends' locations often contradict each other, which hinders the practical applicability of these works. In contrast, our HGNN learns both region-level and node-level features and aggregates them in a manner that provides better interpretability.

2.3. Multi-information fusion based models

Recent efforts have leveraged deep graph learning methods to model user interaction networks by fusing user-generated content and various metadata, such as user profiles, tweeting time, and user timezone. For example, MENET [7] exploits node2vec [13] to learn user representations, which are combined with doc2vec text representations to predict users' locations. Another work [9] employs GCNs [15] to learn network structures with graph convolution and pooling operations, achieving state-of-the-art geolocation performance. A recent work [34] investigates several graph embedding methods on the user geolocation task. It finds that NetMF [42] performs better than node2vec and GraphSAGE [17], but does not outperform GCN-based models [9,34].
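GCN-based geolocation models such as [9] build on the propagation rule of [15], H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The pure-Python sketch below implements a single such layer, with no pooling and no training. The toy graph and weights are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the self-loop-augmented degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Path graph 0 - 1 - 2, one-hot input features, and an all-ones weight column.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weight = [[1.0], [1.0], [1.0]]
out = gcn_layer(adj, feats, weight)
print([round(row[0], 3) for row in out])  # [0.908, 1.15, 0.908]
```

Each node's output mixes only its own features and its immediate neighbors' features, normalized by degree. This is the flat, location-agnostic node-level aggregation that the introduction contrasts with HGNN.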

Some works also use various metadata (e.g., the self-declared location in the profile and timezone information) to improve prediction performance. For example, user timezone, as well as UTC offset and country noun, have been used for user geolocation [7,8,28,31,43]. Such auxiliary information is a strong indicator for regularizing the locations the model predicts. However, most users are not willing to disclose this private information, and it is sometimes camouflaged or posted casually. We further note that another line of efforts [36,44–47] studies the Twitter message geolocation problem. That problem tries to identify where tweets were posted, rather than the Twitter user location discussed in this work.

Despite their promising results in improving geolocation performance, existing state-of-the-art methods fail to identify the importance of individual users, which we address in this work. Various graph embedding techniques can be utilized for network representation in user geolocation. Arguably, however, understanding the influence of user connections is more important for interpreting the behavior of geolocation models, and therefore benefits downstream decision making. In this spirit, we take a first step toward analyzing, theoretically and experimentally, how the properties of graph structures influence geolocation performance. This not only demystifies and interprets the predictions made by the model. It also outlines the underlying constraints of existing approaches, which, in turn, should be taken into consideration when modeling and predicting user geolocation.

2.4. Graph neural networks

Graph neural networks are effective models for analyzing and learning from data on graphs. They have been successfully applied to a variety of domains, including image processing [48], social networks [49], and transportation systems [50]. Existing GNN models differ from each other in their message passing mechanisms, but most of them rely on flat information aggregation [15]. Several hierarchical GNN frameworks gradually coarsen the original graph with pooling operations for graph classification [51,52] and image recognition [48,53]. The main difference in our work is how the HGNN model defines the graph hierarchy for clusters and exploits geographic information. Directly applying GraphPool [51] or HGP-SL [52] to the UG task is problematic. Neither considers the relative location of nodes w.r.t. other nodes/clusters, and neither can cluster unlabeled nodes. Another related work is PGNN [54], recently proposed to learn the relative positions of nodes. However, it does not leverage nodes' geographic information, which is critical for UG. More importantly, all these methods are designed for learning on fully connected graphs. Our HGNN model, by contrast, can incorporate unlabeled and isolated nodes and is thus more suitable for the UG task.
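PGNN [54] encodes a node's position through its shortest-path distances to sets of anchor nodes, transformed as 1/(d + 1). The sketch below is a simplified, hypothetical version. Anchor sets are fixed rather than randomly sampled, and only the distance to the nearest anchor in each set is kept. PGNN's message passing and its q-hop distance truncation are omitted.

```python
from collections import deque


def bfs_hops(graph, source):
    """Hop distances from source in an unweighted graph given as {node: set(neighbors)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def anchor_features(graph, nodes, anchor_sets):
    """One feature per anchor set: 1 / (d + 1), where d is the hop distance to the
    nearest anchor in that set, or 0.0 if no anchor in the set is reachable."""
    feats = {v: [] for v in nodes}
    for anchors in anchor_sets:
        nearest = {}
        for a in anchors:
            for v, d in bfs_hops(graph, a).items():
                nearest[v] = min(nearest.get(v, d), d)
        for v in nodes:
            feats[v].append(1.0 / (nearest[v] + 1) if v in nearest else 0.0)
    return feats


# Path a - b - c - d plus an isolated node e; anchor sets are fixed here for clarity.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}
feats = anchor_features(graph, ["a", "b", "c", "d", "e"], [{"a"}, {"d"}])
print(feats["b"], feats["c"])  # [0.5, 0.3333333333333333] [0.3333333333333333, 0.5]
print(feats["e"])              # [0.0, 0.0]
```

Nodes b and c are structurally symmetric on the path, yet they receive different features; this is what lets a position-aware model tell them apart. The isolated node e, however, gets all zeros. This illustrates why purely topological positions cannot place isolated users without extra (e.g., geographic) information.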

Despite promising performance gains on many graph tasks, most GNNs are still black-box models whose behavior humans cannot readily understand. GAT [16] can learn the importance of edges and can thus, to some extent, explain node aggregation behaviors via its attention mechanism. However, it is limited to specific architectures and fails to provide single-instance explanations. To adaptively adjust the influence of each node, [55] proposed learning to exploit information from neighborhoods of differing locality and to selectively combine different aggregations. Their method can automatically discover the importance of each node in a GNN, but it is not specifically designed to explain model predictions. GNNExplainer [56] was proposed as a model-agnostic method for explaining the predictions of GNNs. It interprets GNN models by maximizing the mutual information between a subgraph (or a subset of node features) and the predictions for the original graph. Another work [57] explains node-level predictions with image interpretation methods, such as sensitivity analysis, guided backpropagation, and layer-wise relevance propagation (LRP). GraphLIME [58] is a local interpretable method that captures the nonlinear dependency between features and predictions. It considers perturbations near a node and uses a linear explanation model to find features that explain GNN predictions. X-GNN [59] finds graph patterns that maximize a particular prediction through graph generation, formulated as a reinforcement learning problem and trained with a policy gradient method. GNN-LRP [60] is a theoretically founded XAI method for interpreting GNN predictions, derived from higher-order Taylor expansions based on LRP. A recent work [61] systematically reviews existing explainable GNN methods. It proposes enabling information fusion for multi-modal causability using interpretable GNNs.
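The GNNExplainer objective can be written as follows, in notation that follows [56] as we understand it. Y is the prediction of the trained GNN, G_S is a subgraph of the computation graph, and X_S denotes the associated node features. Because the trained model is fixed, H(Y) is constant. Maximizing the mutual information is therefore equivalent to minimizing the conditional entropy, i.e., finding the subgraph under which the original prediction remains most certain. The original formulation additionally bounds the size of G_S.

```latex
\max_{G_S} \; \mathrm{MI}\big(Y, (G_S, X_S)\big) = H(Y) - H\big(Y \mid G = G_S,\, X = X_S\big)
```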

Our work differs from existing GNN-based approaches in two ways: we propose a learning model that incorporates geolocations and distances, and we provide a greater extent of explainability.
