1051-8215 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCSVT.2014.2355711, IEEE Transactions on Circuits and Systems for Video Technology
Social Attribute-aware Force Model: Exploiting
Richness of Interaction for Abnormal Crowd
Detection
Yanhao Zhang, Lei Qin, Member, IEEE, Rongrong Ji, Senior Member, IEEE, Hongxun Yao, Member, IEEE,
and Qingming Huang, Senior Member, IEEE
Abstract—Interactions between pedestrians usually play an
important role in understanding crowd behavior. However,
accurate analysis of pedestrian interactions faces great challenges,
such as occlusions, motion, and appearance variance.
In this paper, we introduce a novel social attribute-aware force
model for detection of abnormal crowd events. The proposed
model incorporates social characteristics of crowd behaviors to
improve the description of interactive behaviors. To this end,
we first efficiently estimate the scene scale in an unsupervised
manner. Then, we introduce the concepts of social disorder
and congestion attributes to characterize the interaction of
social behaviors, and construct our crowd interaction model on
the basis of social force by an online fusion strategy. These
attributes encode social interaction characteristics and offer
robustness against motion pattern variance. Abnormal event
detection is finally performed based on the proposed social
attribute-aware force model. In addition, the attribute-aware
interaction force indicates the possible locations of anomalous
interactions. We validate our method on publicly available
datasets for abnormality detection, and the experimental results
show promising performance compared with alternative and
state-of-the-art methods.
Index Terms—Social Force Model, Social Attributes, Crowd
Behaviors, Abnormal Detection
I. INTRODUCTION
Analyzing crowd behavior has become a salient research
topic in video surveillance and beyond. In contrast to individual
actions, crowd behaviors are far more challenging to
analyze, due to the possibility of complex interactions among
individuals.
Crowd behavior models aim to describe individuals and
groups in crowded scenes. From a sociological viewpoint,
crowd behaviors usually occur under the constraints of
sociologically inspired prior knowledge, and hence reflect
high-level semantic interactions. Based on the underlying
motion characteristics, mid-level visual representation has
become increasingly popular, as it can extend the semantic
description of visual content from feature-level to object-level,
focusing on the specific connections and interactions between
the objects. Therefore, constructing mid-level representations
of crowds for exploiting the richness of interactions can lead
to breakthroughs in a wide range of applications.
Copyright (c) 2014 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
This work was supported in part by National Basic Research Program of
China (973 Program): 2012CB316400, in part by National Natural Science
Foundation of China: 61025011, 61133003, 61332016 and 61035001.
Y. Zhang, Q. Huang, and H. Yao are with School of Computer Science and
Technology, Harbin Institute of Technology, Harbin 150001, China. Q. Huang
is also with University of Chinese Academy of Sciences, Beijing 100190,
China (e-mail: yhzhang@hit.edu.cn; qmhuang@jdl.ac.cn; H.yao@hit.edu.cn).
L. Qin is with Key Laboratory of Intelligent Information Processing of
Chinese Academy of Sciences (CAS), Institute of Computing Technology,
CAS, Beijing 100190, China (e-mail: qinlei@ict.ac.cn).
R. Ji is with Department of Cognitive Science, School of Information
Science and Engineering, Xiamen University, Xiamen 361005, China (e-mail:
rrji@xmu.edu.cn).
To automatically identify the interactions underlying crowd
behaviors, one emerging and challenging task is the detection
of abnormal crowd behaviors. In this setting, crowd behavior
is usually modeled as a quantitative result of semantic
representation, which is the basis for analyzing and
understanding the abnormality. Given a video clip, abnormality
detection models motion consistency among individuals, labeling
those that are significantly inconsistent with others as
“abnormal”. To this end, extensive work has been proposed
towards accurate and robust abnormality discovery, ranging from
motion feature representation to inconsistency model definition,
such as MPPCA [1], low-level statistics [2], Dynamic Texture [3],
MRF [4], and sparse representation [5]. Beyond classifying
the overall “inconsistency”, the works in [5], [3], [6], [7] also
focus on detecting and locating the “local” inconsistency or
the “selective” mechanism of the crowded scene.
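The general idea of labeling individuals whose motion is significantly inconsistent with the rest of the crowd can be illustrated with a minimal sketch. This is not the model proposed in this paper nor any of the cited methods; it is a generic robust-outlier test, and the function name `flag_inconsistent_motion`, the per-individual velocity input, and the threshold value are all assumptions made for illustration.

```python
import math
from statistics import median


def flag_inconsistent_motion(velocities, threshold=3.0):
    """Flag velocities that deviate strongly from the dominant crowd motion.

    velocities: list of (vx, vy) tuples, one per individual (hypothetical input).
    threshold: multiple of the median deviation treated as "significant"
               (an assumed value, not taken from the paper).
    Returns a list of booleans, True where the motion is flagged as abnormal.
    """
    # Component-wise median: a robust estimate of the dominant motion.
    mx = median(v[0] for v in velocities)
    my = median(v[1] for v in velocities)
    # Euclidean distance of each velocity from the dominant motion.
    deviations = [math.hypot(vx - mx, vy - my) for vx, vy in velocities]
    # Median deviation sets the scale; the epsilon avoids division by zero.
    scale = median(deviations) + 1e-9
    return [d / scale > threshold for d in deviations]


# Four individuals moving roughly rightwards, one moving against the flow.
crowd = [(1.0, 0.0), (1.1, 0.1), (0.9, -0.1), (1.0, 0.05), (-3.0, 2.0)]
flags = flag_inconsistent_motion(crowd)
```

In this toy example only the last individual is flagged. Note that such a test treats every velocity as directly comparable, ignoring scene scale and interactions between individuals.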
A. The Issues
Detecting abnormal crowd behaviors by analyzing low-level
semantics of crowds is still far from real-world application.
Despite the complex models extensively studied in the literature,
key issues remain in implementing a robust yet
accurate feature representation tailored to crowd behaviors,
for the following reasons:
• Traditional motion features in existing works [2],
[8], such as optical flow, Space-Time Interest Points
(STIP) [9], and spatial-temporal volumes [8], are
incapable of characterizing motion inconsistency among
individuals. For instance, basic representation elements
are assumed to share the same status (such as scale and
motion velocity), without considering the dynamic changes
and perspective effects of the scene.
• While a few recent works [10], [11], [12] attempt to
model crowd interactions, such modeling still relies on
the low-level visual and/or motion statistics among individuals,
without regard to their higher-level semantics and