Fast cross-modal object tracking: an infrared-visible fusion tracking algorithm
Communicated by Dr. H. Yu
Accepted Manuscript
Fast RGB-T Tracking via Cross-Modal Correlation Filters
Sulan Zhai, Pengpeng Shao, Xinyan Liang, Xin Wang
PII: S0925-2312(19)30034-7
DOI: https://doi.org/10.1016/j.neucom.2019.01.022
Reference: NEUCOM 20326
To appear in: Neurocomputing
Received date: 8 May 2018
Revised date: 11 December 2018
Accepted date: 10 January 2019
Please cite this article as: Sulan Zhai, Pengpeng Shao, Xinyan Liang, Xin Wang, Fast
RGB-T Tracking via Cross-Modal Correlation Filters, Neurocomputing (2019), doi:
https://doi.org/10.1016/j.neucom.2019.01.022
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Fast RGB-T Tracking via Cross-Modal Correlation Filters
Sulan Zhai¹, Pengpeng Shao¹, Xinyan Liang¹, Xin Wang²
¹ Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University
² Shenzhen Raixun Information Technology Co., Ltd. & PKU Shenzhen Institute
Abstract
This paper studies how to perform RGB-T object tracking in the correlation filter
framework. Given the input RGB and thermal videos, we use a correlation filter for
each modality because of its high performance in both accuracy and speed. To exploit
the interdependency between the RGB and thermal modalities, we introduce a low-rank
constraint to learn the filters collaboratively, based on the observation that the
features of different modalities should have similar filters so that they localize the
target object consistently. For optimization, we design an efficient ADMM (Alternating
Direction Method of Multipliers) algorithm to solve the proposed model. Experimental
results on the benchmark datasets (i.e., GTOT, RGBT210 and OSU-CT) suggest that
the proposed approach performs favorably in both accuracy and efficiency against the
state-of-the-art RGB-T methods.
Keywords: RGB-Thermal, Cross-Modal, Correlation Filters, Low-Rank, Tracking
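The low-rank idea in the abstract can be illustrated with a small numerical sketch. It is not the model from the paper, whose exact formulation is not given in this excerpt. It assumes the nuclear norm as a convex surrogate for rank and applies singular value thresholding, the proximal step that ADMM-style solvers commonly use for such a term. The names `singular_value_threshold`, `w_rgb` and `w_thermal`, the threshold `tau`, and the synthetic data are all hypothetical.

```python
import numpy as np

def singular_value_threshold(W, tau):
    """Proximal operator of tau * nuclear norm: shrink each singular value by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # scales column k of U by s_shrunk[k]

rng = np.random.default_rng(0)

# Assumption: both modality filters share a common component plus small noise.
shared = rng.standard_normal(64)
w_rgb = shared + 0.05 * rng.standard_normal(64)
w_thermal = shared + 0.05 * rng.standard_normal(64)

# Stack the vectorized filters as columns of one matrix, shape (64, 2).
W = np.stack([w_rgb, w_thermal], axis=1)
W_low_rank = singular_value_threshold(W, tau=1.0)

print("rank before:", np.linalg.matrix_rank(W))
print("rank after: ", np.linalg.matrix_rank(W_low_rank))
```

In this toy setting the small second singular value is removed, so the two columns of `W_low_rank` become multiples of each other, which is one way to read "similar filters" across modalities.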
1. Introduction
The task of visual tracking [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] is to estimate the position
and scale of a target object in a video sequence, given the initial state of the target
in the first frame. It is a fundamental problem in computer vision, and plays a critical
role in a wide range of applications, such as video surveillance, robotics,
human-computer interaction, autonomous driving, and public security. Although object
tracking has made great progress in recent years, it remains challenging due to various
internal and external factors, such as illumination change, occlusion, deformation,
background clutter, fast motion and bad weather (e.g., rain, haze, smog).
Preprint submitted to Neurocomputing January 17, 2019
As a new branch of visual tracking, tracking with RGB and thermal videos (named
RGB-T tracking) has received increasing interest in the computer vision community
because it remains effective when an individual source fails. The two modalities provide
strongly complementary information, so fusing them effectively can boost tracking
performance significantly [11, 12, 13, 14]. For example, thermal information [15]
penetrates haze and smog well and is insensitive to lighting conditions, which makes it
robust to most external challenging factors, such as illumination variation and bad
weather. Visible-spectrum sensors, in contrast, capture plenty of fine-grained features,
such as colors and textures, and are effective for tracking target objects in the
presence of internal challenges. Fig. 1 shows some typical examples.
Figure 1: Illustration of complementary benefits of RGB and thermal images. (a) Complementary benefits of
thermal images over RGB ones, where visible spectrum is disturbed by low illumination, high illumination
and background occlusion while thermal information can overcome them effectively. (b) Complementary
benefits of RGB images over thermal ones, where thermal spectrum is disturbed by thermal crossover and
glass while RGB information can handle them effectively.
Most existing works employ sparse representation for RGB-T object tracking in the
Bayesian filtering framework [11, 12, 13]. For example, Liu et al. [11] design a
similarity induced by joint sparse representation to construct the likelihood function
of a particle filter tracker, so that the color visible-spectrum and thermal-spectrum
images can be fused for object tracking. Li et al. [12] propose a collaborative sparse
representation which introduces modality weights for adaptive fusion of different source
data. The performance of these trackers, however, might be limited in the following two
aspects. Firstly, the joint sparse constraints on RGB and thermal information make the
modal consistency too strong to achieve effective fusion. For example, the
$\ell_{2,1}$-norm used in [11, 12] encourages column-wise sparsity of the joint modal
matrix, i.e., all elements of one column are either zero or non-zero, while RGB and
thermal data are heterogeneous. Secondly, the Bayesian filtering algorithm needs to
sample a large set of candidates for effective tracking, which gives the optimization of
sparse representation models a high computational complexity and makes it time consuming.
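The "all zero or all non-zero" behavior attributed to the $\ell_{2,1}$-norm can be seen in a short sketch. It assumes the grouping described in the text, where each column of the joint matrix is one group holding one entry per modality. Conventions differ between papers, and the matrix `X` and the functions `l21_norm` and `prox_l21` are hypothetical illustrations, not code from [11, 12].

```python
import numpy as np

def l21_norm(X):
    """Sum of the Euclidean norms of the columns of X (column grouping assumed)."""
    return np.linalg.norm(X, axis=0).sum()

def prox_l21(X, tau):
    """Column-wise group soft-thresholding, the proximal operator of tau * l21_norm."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale  # broadcasts one scale factor per column

# Rows: modalities (RGB, thermal). Columns: hypothetical representation coefficients.
X = np.array([[3.0, 0.1, 0.0],
              [4.0, 0.2, 0.5]])

print(l21_norm(X))
print(prox_l21(X, tau=1.0))
```

The third column is set to zero for both modalities even though only the thermal entry was active, which is the kind of forced agreement between heterogeneous modalities that the paragraph above criticizes.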
In this paper, we propose a novel and flexible algorithm for effective RGB-T tracking,
which achieves state-of-the-art results in terms of both accuracy and efficiency. The
proposed approach builds on the correlation filter because of its high computational
efficiency and robust tracking performance [7, 16, 17, 18, 19, 20, 21]. We employ a
correlation filter for each modality, and propose a flexible way to achieve collaborative
fusion of multiple modalities. In particular, we observe that the features of different
modalities should have similar correlation filters so that they localize the target
object consistently. To this end, a low-rank constraint is used to learn the filters
jointly and exploit the interdependency between the RGB and thermal modalities.
Benefiting from the proposed fusion method, the learned filters can incorporate useful
information from different source data and thus obtain robust tracking results, as shown
in Fig. 2. To optimize the proposed model, we present an efficient ADMM (Alternating
Direction Method of Multipliers) algorithm [22], which makes our tracker fast enough
for real-time applications (e.g., 227 frames per second in the experiments).
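To make the per-modality correlation filter concrete, the following is a minimal sketch under stated assumptions. It uses a generic single-channel ridge-regression filter solved element-wise in the Fourier domain and a naive sum of the two response maps. It does not implement the low-rank coupling or the ADMM solver proposed here. The function names, the regularizer `lam`, the Gaussian label and the random patches are all hypothetical.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at index (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    xs = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    return np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form ridge-regression filter, computed per frequency."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def respond(filter_hat, patch):
    """Response map of a new patch under a trained filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * filter_hat))

rng = np.random.default_rng(0)
rgb_patch = rng.standard_normal((32, 32))      # stand-in for an RGB (grayscale) patch
thermal_patch = rng.standard_normal((32, 32))  # stand-in for a thermal patch
label = gaussian_label((32, 32))

# One filter per modality, trained independently on the first frame.
filters = [train_filter(p, label) for p in (rgb_patch, thermal_patch)]

# Simulated next frame: the target moves by (dy, dx) = (3, 5) in both modalities.
shift = (3, 5)
next_patches = [np.roll(p, shift, axis=(0, 1)) for p in (rgb_patch, thermal_patch)]

# Naive late fusion (an assumption): sum the per-modality response maps.
response = sum(respond(f, z) for f, z in zip(filters, next_patches))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
print("estimated displacement:", peak)
```

Training and detection each cost only a few FFTs and element-wise operations, which is the usual reason correlation filters run at high frame rates.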
The contributions of the proposed approach are as follows. 1) We propose to use a
low-rank constraint to make the cross-modal correlation filters collaborate, which
exploits the interdependency between the RGB and thermal modalities in the learning
process. It is generic, and could incorporate more modalities (e.g., near infrared
spectrum). 2) We design an efficient solver based on the ADMM algorithm for the proposed
cross-modal correlation filter model, in which the time consuming step can be converted
into the Fourier domain to improve efficiency significantly. 3) Extensive experiments on
large-scale benchmark datasets are conducted, and our tracker achieves the state-of-the-art
[Figure 2: panel (a) input frames; panels (b) and (c) response maps, with columns labeled RGB, Thermal, RGB-T]
Figure 2: Illustration of the effectiveness of the proposed tracker in fusing RGB and thermal modalities. (a)
A pair of RGB and thermal frames. (b) Response maps of RGB, thermal and multiple modalities without the
low-rank constraint, respectively. Herein, a red solid point marks the peak of each response map, and the
corresponding results are shown on (a) with red bounding boxes. (c) Response maps of RGB, thermal and
multiple modalities with the low-rank constraint, respectively. The results are shown in black.