ID-YOLO: Real-Time Salient Object Detection
Based on the Driver’s Fixation Region
Long Qin , Yi Shi, Yahui He, Junrui Zhang, Xianshi Zhang, Yongjie Li , Senior Member, IEEE,
Tao Deng , Member, IEEE, and Hongmei Yan
Abstract— Object detection is an important task for
self-driving vehicles and advanced driver assistance systems (ADASs). Additionally, visual selective attention is a crucial
neural mechanism in a driver’s vision system that can rapidly
filter out unnecessary visual information in a driving scene. Some existing models detect all objects in driving scenes from a computer vision perspective. However, in a rapidly changing driving environment, detecting the salient or critical objects that appear in areas of driver interest or safety relevance is more useful for ADASs. In this paper, we detect salient and critical objects based on drivers' fixation regions. To this end, we built
an augmented eye tracking object detection (ETOD) dataset
based on driving videos with multiple drivers’ eye movement
collected by Deng et al. Furthermore, we proposed a real-time
salient object detection network named increase-decrease YOLO
(ID-YOLO) to discriminate the critical objects within the drivers’
fixation region. The proposed ID-YOLO accurately detects the major objects that drivers attend to while driving. Unlike existing object detection models in autonomous and assisted driving systems, our framework simulates drivers' selective attention mechanism: it does not detect all objects appearing in the driving scene but only those most relevant to driving safety. This greatly reduces interference from irrelevant scene information, showing potential for practical application in intelligent or assisted driving systems.
Index Terms— Visual attention, eye tracking, object detection,
YOLO, traffic driving.
Manuscript received 10 October 2020; revised 15 May 2021, 13 September 2021, and 6 December 2021; accepted 19 January 2022. Date of publication 3 February 2022; date of current version 12 September 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61773094, Grant 61503059, and Grant 62106208; in part by the China Postdoctoral Science Foundation under Grant 2021TQ0272 and Grant 2021M702715; and in part by the Sichuan Science and Technology Program under Grant 2020JDRC0031. The Associate Editor for this article was J. W. Choi. (Corresponding authors: Tao Deng; Hongmei Yan.)

Long Qin, Yi Shi, Yahui He, Junrui Zhang, Xianshi Zhang, Yongjie Li, and Hongmei Yan are with the MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: hmyan@uestc.edu.cn).

Tao Deng is with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China, and also with the MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: tdeng@swjtu.edu.cn).

Digital Object Identifier 10.1109/TITS.2022.3146271

I. INTRODUCTION

Driver assistance technology is currently deployed on production vehicles and is commonly referred to as human-centric advanced driver assistance systems (ADASs). ADASs sense the surrounding environment while driving to
detect, identify, and track dynamic objects and then perform system-level computation and analysis (e.g., adaptive cruise control (ACC), blind spot monitoring, and lane change assistance [1], [2]) so that vehicles can anticipate possible dangers, thereby significantly increasing driving comfort and safety.
However, a traffic-laden driving environment is a complex and dynamic scenario in which many objective and subjective factors combine. These factors attract the driver's gaze and attention and can arise from bottom-up sensory stimuli or from top-down goals and experience. While driving, drivers usually focus their attention on the most important and salient object, and sometimes there is more than one: for example, drivers must attend to surrounding vehicles and roadside pedestrians when crossing an intersection. Accurately and quickly detecting the objects at the salient locations that drivers focus on is therefore an important and challenging problem for driving assistance systems.
To achieve this goal, saliency detection models, object detection models, and large-scale databases must all be developed. Considerable progress has been made on these three fronts, which we briefly review in Section II.
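To make this goal concrete, consider a naive two-stage baseline (not the approach of this paper): run a generic detector over the frame, then keep only the boxes that overlap a driver's fixation region. The Gaussian fixation model and the 0.3 threshold below are illustrative assumptions; as described later, ID-YOLO instead learns this selection end-to-end from fixation-based labels.

```python
import numpy as np

def fixation_map(h, w, cx, cy, sigma=40.0):
    """Gaussian 'attention' map centered on a fixation point (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def filter_by_fixation(boxes, fmap, thresh=0.3):
    """Keep boxes (x1, y1, x2, y2) whose mean attention exceeds thresh."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        patch = fmap[int(y1):int(y2), int(x1):int(x2)]
        if patch.size > 0 and patch.mean() > thresh:
            kept.append((x1, y1, x2, y2))
    return kept

# A detector proposes two boxes; only the one near the fixation survives.
fmap = fixation_map(480, 640, cx=150, cy=200)
boxes = [(100, 150, 200, 250), (500, 50, 600, 120)]
print(filter_by_fixation(boxes, fmap))  # -> [(100, 150, 200, 250)]
```

Such post hoc filtering discards detector computation spent on irrelevant objects, which is precisely the inefficiency that motivates building attention into the detector itself.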
Current object detection in driving tasks faces at least two problems: (1) commonly used state-of-the-art detection models such as YOLOv3 detect all objects appearing in a traffic scene, yet not all of them are closely related to the driving task, and an excess of redundant detections may interfere with decision-making and compromise driving safety; and (2) there is a lack of suitable traffic object detection datasets based on drivers' attention. To address these problems, we accomplished the following:
• We built an eye tracking object detection (ETOD)
dataset based on traffic-laden driving videos with eye
movement tracking from multiple drivers collected by
Deng et al. [3]. ETOD is an augmented version of the dataset in [3], in which the key objects appearing within the drivers' fixation regions are labeled with bounding box annotations.
• Based on drivers' selective attention mechanism, we proposed a traffic salient object detection framework with increased prediction scales and decreased network depth, named ID-YOLO, to predict the objects within drivers' fixation regions. The ID-YOLO network, with 33 layers and five detection scales (see the sketch after this list), was trained on the ETOD dataset, which encodes bottom-up and top-down attention information from multiple drivers.
• Finally, we compared the performance of our model
with that of other methods. The experimental results
demonstrated that our model can detect the key objects within drivers' fixation regions.
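The exact 33-layer configuration is detailed in later sections; as referenced in the second item above, the following is a minimal PyTorch sketch of the general pattern only: a comparatively shallow backbone ("decreased depth") with 1x1 prediction heads attached at five feature scales ("increased prediction scales"). The channel widths, strides, class count, and anchors per scale are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    """3x3 convolution + batch norm + LeakyReLU, the basic YOLO-style unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class IDYoloSketch(nn.Module):
    """Shallow backbone with detection outputs at five feature scales."""

    def __init__(self, num_classes=10, anchors_per_scale=3):  # placeholders
        super().__init__()
        widths = [32, 64, 128, 256, 512, 1024]  # illustrative channel widths
        self.stages = nn.ModuleList()
        in_ch = 3
        for w in widths:  # each stage halves the spatial resolution once
            self.stages.append(nn.Sequential(
                conv_block(in_ch, w, stride=2),
                conv_block(w, w),
            ))
            in_ch = w
        # One 1x1 prediction conv per scale; each grid cell predicts
        # anchors_per_scale boxes: (x, y, w, h, objectness) + class scores.
        out_ch = anchors_per_scale * (5 + num_classes)
        self.heads = nn.ModuleList(nn.Conv2d(w, out_ch, 1) for w in widths[1:])

    def forward(self, x):
        preds = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i >= 1:  # emit predictions at the five deepest scales
                preds.append(self.heads[i - 1](x))
        return preds  # five prediction maps, from fine to coarse

model = IDYoloSketch()
outs = model(torch.randn(1, 3, 416, 416))
print([tuple(o.shape[-2:]) for o in outs])
# [(104, 104), (52, 52), (26, 26), (13, 13), (7, 7)]
```

Predicting at more, finer scales favors the small and medium-sized objects (pedestrians, cyclists, distant vehicles) that dominate drivers' fixation regions, while the reduced depth keeps inference real-time.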