ID-YOLO: Real-Time Salient Object Detection
Based on the Driver’s Fixation Region
Long Qin , Yi Shi, Yahui He, Junrui Zhang, Xianshi Zhang, Yongjie Li , Senior Member, IEEE,
Tao Deng , Member, IEEE, and Hongmei Yan
Abstract— Object detection is an important task for
self-driving vehicles and advanced driver assistance systems (ADASs). Additionally, visual selective attention is a crucial
neural mechanism in a driver’s vision system that can rapidly
filter out unnecessary visual information in a driving scene. Some existing models detect all objects in driving scenes from a computer vision perspective. However, in a rapidly changing driving environment, detecting the salient or critical objects that appear in areas of driver interest or safety relevance is more useful for ADASs. In this paper, we detect salient and critical objects based on drivers' fixation regions. To this end, we built
an augmented eye tracking object detection (ETOD) dataset
based on driving videos with multiple drivers’ eye movement
collected by Deng et al. Furthermore, we proposed a real-time
salient object detection network named increase-decrease YOLO
(ID-YOLO) to discriminate the critical objects within the drivers’
fixation region. The proposed ID-YOLO accurately detects the major objects that drivers attend to while driving. Unlike existing object detection models in autonomous and assisted driving systems, our framework simulates drivers' selective attention mechanism: it does not detect all objects appearing in the driving scene but only those most relevant to driving safety. This greatly reduces interference from irrelevant scene information, showing potential for practical application in intelligent or assisted driving systems.
Index Terms— Visual attention, eye tracking, object detection,
YOLO, traffic driving.
Manuscript received 10 October 2020; revised 15 May 2021, 13 September 2021, and 6 December 2021; accepted 19 January 2022. Date of publication 3 February 2022; date of current version 12 September 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61773094, Grant 61503059, and Grant 62106208; in part by the China Postdoctoral Science Foundation under Grant 2021TQ0272 and Grant 2021M702715; and in part by the Sichuan Science and Technology Program under Grant 2020JDRC0031. The Associate Editor for this article was J. W. Choi. (Corresponding authors: Tao Deng; Hongmei Yan.)

Long Qin, Yi Shi, Yahui He, Junrui Zhang, Xianshi Zhang, Yongjie Li, and Hongmei Yan are with the MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: hmyan@uestc.edu.cn).

Tao Deng is with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China, and also with the MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: tdeng@swjtu.edu.cn).

Digital Object Identifier 10.1109/TITS.2022.3146271

I. INTRODUCTION

Driver assistance technology is currently deployed on production vehicles and is commonly referred to as human-centric advanced driver assistance systems (ADASs). ADASs sense the surrounding environment while driving to
detect, identify, and track dynamic objects and then perform system-level computation and analysis (e.g., adaptive cruise control (ACC), blind spot monitoring, and lane change assistance [1], [2]) so that vehicles can anticipate possible dangers, thereby significantly increasing driving comfort and safety.
However, a traffic-laden driving environment is a complex and dynamic scenario in which many objective and subjective factors combine. These factors attract the driver's gaze and attention and can arise from bottom-up sensory stimuli or from top-down goals and experience. While driving, drivers usually focus their attention on the most important and salient object, and sometimes there is more than one: for example, drivers must attend to surrounding vehicles and roadside pedestrians when crossing an intersection. Accurately and quickly detecting the objects at the salient locations that drivers focus on is therefore an important and challenging problem for driving assistance systems.
To achieve this goal, saliency detection models, object detection models, and large-scale databases must all be developed. Considerable progress has been made on these three fronts, which we briefly review in Section II.
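To make this goal concrete, consider a naive two-stage baseline (not the approach of this paper): run a generic detector over the frame, then keep only the boxes that overlap a driver's fixation region. The Gaussian fixation model and the 0.3 threshold below are illustrative assumptions; as described later, ID-YOLO instead learns this selection end-to-end from fixation-based labels.

```python
import numpy as np

def fixation_map(h, w, cx, cy, sigma=40.0):
    """Gaussian 'attention' map centered on a fixation point (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def filter_by_fixation(boxes, fmap, thresh=0.3):
    """Keep boxes (x1, y1, x2, y2) whose mean attention exceeds thresh."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        patch = fmap[int(y1):int(y2), int(x1):int(x2)]
        if patch.size > 0 and patch.mean() > thresh:
            kept.append((x1, y1, x2, y2))
    return kept

# A detector proposes two boxes; only the one near the fixation survives.
fmap = fixation_map(480, 640, cx=150, cy=200)
boxes = [(100, 150, 200, 250), (500, 50, 600, 120)]
print(filter_by_fixation(boxes, fmap))  # -> [(100, 150, 200, 250)]
```

Such post hoc filtering discards detector computation spent on irrelevant objects, which is precisely the inefficiency that motivates building attention into the detector itself.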
Current object detection in driving tasks faces at least two problems: (1) commonly used state-of-the-art detection models such as YOLOv3 detect all objects appearing in a traffic scene, yet not all of them are closely related to the driving task, and an excess of redundant detections may interfere with decision-making and compromise driving safety; and (2) there is a lack of suitable traffic object detection datasets based on drivers' attention. To address these problems, we accomplished the following:
• We built an eye tracking object detection (ETOD)
dataset based on traffic-laden driving videos with eye
movement tracking from multiple drivers collected by
Deng et al. [3]. ETOD is an augmented version of the dataset in [3], in which the key objects appearing within the drivers' fixation regions are labeled with bounding box annotations.
• Based on drivers' selective attention mechanism, we proposed a traffic salient object detection framework with increased prediction scales and decreased network depth, named ID-YOLO, to predict the objects within drivers' fixation regions. The ID-YOLO network, with 33 layers and five detection scales (see the sketch after this list), was trained on the ETOD dataset, which encodes bottom-up and top-down attention information from multiple drivers.
• Finally, we compared the performance of our model
with that of other methods. The experimental results
demonstrated that our model can detect the key objects within drivers' fixation regions.
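The exact 33-layer configuration is detailed in later sections; as referenced in the second item above, the following is a minimal PyTorch sketch of the general pattern only: a comparatively shallow backbone ("decreased depth") with 1x1 prediction heads attached at five feature scales ("increased prediction scales"). The channel widths, strides, class count, and anchors per scale are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    """3x3 convolution + batch norm + LeakyReLU, the basic YOLO-style unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class IDYoloSketch(nn.Module):
    """Shallow backbone with detection outputs at five feature scales."""

    def __init__(self, num_classes=10, anchors_per_scale=3):  # placeholders
        super().__init__()
        widths = [32, 64, 128, 256, 512, 1024]  # illustrative channel widths
        self.stages = nn.ModuleList()
        in_ch = 3
        for w in widths:  # each stage halves the spatial resolution once
            self.stages.append(nn.Sequential(
                conv_block(in_ch, w, stride=2),
                conv_block(w, w),
            ))
            in_ch = w
        # One 1x1 prediction conv per scale; each grid cell predicts
        # anchors_per_scale boxes: (x, y, w, h, objectness) + class scores.
        out_ch = anchors_per_scale * (5 + num_classes)
        self.heads = nn.ModuleList(nn.Conv2d(w, out_ch, 1) for w in widths[1:])

    def forward(self, x):
        preds = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i >= 1:  # emit predictions at the five deepest scales
                preds.append(self.heads[i - 1](x))
        return preds  # five prediction maps, from fine to coarse

model = IDYoloSketch()
outs = model(torch.randn(1, 3, 416, 416))
print([tuple(o.shape[-2:]) for o in outs])
# [(104, 104), (52, 52), (26, 26), (13, 13), (7, 7)]
```

Predicting at more, finer scales favors the small and medium-sized objects (pedestrians, cyclists, distant vehicles) that dominate drivers' fixation regions, while the reduced depth keeps inference real-time.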