Citation: Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323. https://doi.org/10.3390/electronics12102323
Academic Editor: Donghyeon Cho
Received: 6 April 2023; Revised: 15 May 2023; Accepted: 16 May 2023; Published: 21 May 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor

Haitong Lou 1, Xuehu Duan 1, Junmei Guo 1, Haiying Liu 1,*, Jason Gu 2, Lingyun Bi 1 and Haonan Chen 1

1 The School of Information and Automation Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China; 10431210431@stu.qlu.edu.cn (X.D.); gjm@qlu.edu.cn (J.G.)
2 The School of Electrical and Computer Engineering, Dalhousie University, Halifax, NS B3J 1Z1, Canada
* Correspondence: haiyingliu2019@qlu.edu.cn
Abstract: Traditional camera sensors rely on human eyes for observation. However, human eyes are prone to fatigue when observing objects of different sizes for a long time in complex scenes, and human cognition is limited, which often leads to judgment errors and greatly reduces efficiency. Object recognition technology is an important technology used to judge an object's category on a camera sensor. To solve this problem, a small-size object detection algorithm for special scenarios is proposed in this paper. The advantage of this algorithm is that it not only achieves higher precision for small-size object detection but also ensures that the detection accuracy for every object size is no lower than that of existing algorithms. This paper makes three main contributions: (1) a new downsampling method that better preserves contextual feature information; (2) an improved feature fusion network that effectively combines shallow and deep information; (3) a new network structure that effectively improves the detection accuracy of the model. In terms of detection accuracy, the proposed algorithm outperforms YOLOX, YOLOR, YOLOv3, scaled YOLOv5, YOLOv7-Tiny, and YOLOv8. Three authoritative public datasets are used in the experiments: (a) on the VisDrone dataset (small-size objects), the mAP, precision, and recall of DC-YOLOv8 are 2.5%, 1.9%, and 2.1% higher than those of YOLOv8s, respectively; (b) on the TinyPerson dataset (minimal-size objects), they are 1%, 0.2%, and 1.2% higher, respectively; (c) on the PASCAL VOC2007 dataset (normal-size objects), they are 0.5%, 0.3%, and 0.4% higher, respectively.
Keywords: small-size objects; object detection; camera sensor; feature fusion
1. Introduction
As one of the most widely used devices, cameras have become essential in various industries and households, such as robotics, monitoring, transportation, medicine, autonomous driving, and so on [1–5]. A camera sensor is one of the core sensors for the above applications; it is composed of a lens, a lens module, a filter, a CMOS (complementary metal-oxide semiconductor) or CCD (charge-coupled device) sensor, an ISP (image signal processing) unit, and a data transmission part. It works by first collecting images using optical imaging principles and then performing image signal processing. In applications such as traffic, medicine, and automatic driving, it is crucial to identify objects accurately, so the object recognition algorithm is one of the most important parts of a camera sensor.
Traditional video cameras capture a scene and present it on a screen; the shape and type of each object are then observed and judged by the human eye. However, human cognitive ability is limited, and it is difficult to judge the category of an object when the camera resolution is too low. A complex scene will also strain the human eye, resulting in the inability to detect some small details. A viable alternative is to use camera sensors to find regions and categories of interest [6].
Electronics 2023, 12, 2323. https://doi.org/10.3390/electronics12102323 https://www.mdpi.com/journal/electronics
At present, object recognition through a camera is one of the most challenging topics, and accuracy and real-time performance are the most important indicators for a camera sensor. In recent years, with the ultimate goal of achieving high accuracy or real-time operation, researchers have proposed MobileNet [7–9], ShuffleNet [10,11], etc., which can be used on a CPU, and ResNet [12], DarkNet [13], etc., which can be used on a GPU.
At this stage, the most classical object detection algorithms are divided into two kinds: two-stage and one-stage object detection algorithms. Two-stage algorithms include R-CNN (Region-based Convolutional Neural Network) [14], Fast R-CNN [15], Faster R-CNN [16], Mask R-CNN [17], etc. One-stage algorithms include the YOLO (You Only Look Once) series [13,18–22], the SSD (Single Shot MultiBox Detector) algorithm [23], and so on. The YOLO series is one of the fastest-growing and best-performing families of algorithms so far, especially the novel YOLOv8 algorithm released in 2023, which has reached the highest accuracy to date. However, YOLO is designed for objects of all sizes; in special scenes dominated by objects of a particular size, its performance is not as good as some current small-size object detection algorithms [24,25].
To solve this problem, this paper proposes an improved algorithm based on YOLOv8. The algorithm provides a small but stable improvement in detection accuracy for normal-scale objects and greatly improves the detection accuracy for small objects in complex scenes. Small objects occupy few pixels, which makes it difficult for the detector to extract features accurately and comprehensively during feature extraction. It is even harder to extract information in complex scenes with, for example, overlapping objects, so the accuracy of most algorithms on small objects is generally low. The proposed algorithm greatly improves the detection accuracy of small objects in complex scenes while keeping the accuracy on normal-scale objects stable or slightly improved. Its main contributions are as follows:
(a) The MDC module is proposed to perform downsampling operations (concatenating the outputs of a depth-wise separable convolution, max pooling, and a 3 × 3 convolution with stride = 2). It supplements the information lost by each branch during downsampling, making the contextual information preserved during feature extraction more complete.
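A minimal sketch, in plain Python, of the spatial-size bookkeeping behind such a multi-branch downsampling step. The exact kernel sizes and paddings of the MDC branches are assumptions here (the paper's figures, not this excerpt, fix them); the point is that every branch must halve the spatial resolution so that their outputs can be concatenated along the channel axis:

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution/pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def mdc_branch_sizes(size: int) -> tuple:
    """Output sizes of three assumed MDC branches on a size x size map:
    a 3x3 depth-wise separable conv (stride 2, pad 1), a 2x2 max pool
    (stride 2), and an ordinary 3x3 conv (stride 2, pad 1)."""
    dw   = conv2d_out(size, kernel=3, stride=2, padding=1)  # depth-wise separable conv
    pool = conv2d_out(size, kernel=2, stride=2, padding=0)  # max pooling
    conv = conv2d_out(size, kernel=3, stride=2, padding=1)  # ordinary conv
    return dw, pool, conv
```

Because all three branches agree on the output size (e.g. 64 → 32), their feature maps can be concatenated channel-wise, which is how the module recombines the information each individual branch loses.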
(b) The C2f module in front of the detector in YOLOv8 is replaced by the DC module proposed in this paper (a network structure formed by stacking depth-wise separable convolutions and ordinary convolutions). A new network structure is formed by stacking DC modules and continuously fusing each small module. This increases the depth of the whole structure, achieves higher resolution without significant computational cost, and captures more contextual information.
(c) The feature fusion method of YOLOv8 is improved so that shallow and deep information are combined effectively, the information retained during network feature extraction is more comprehensive, and the problem of missed detections caused by inaccurate localization is alleviated.
This paper is divided into the following parts: Section 2 introduces the reasons
for choosing YOLOv8 as the baseline and the main idea of YOLOv8; Section 3 mainly
introduces the improved method of this paper; Section 4 focuses on the experimental
results and comparative experiments; Section 5 provides the conclusions and directions of
subsequent work and improvement.
2. Related Work
Currently, camera sensors are crucial and widely used in real life, and researchers have applied them to a variety of scenarios. For example, Zou et al. proposed a new camera-sensor-based obstacle detection method for day and night operation of a traditional excavator [1]. Sengupta et al. proposed robust multi-target tracking with sensor fusion based on both a camera sensor and object detection [26]. Bharati proposed a camera-sensor approach for assisted navigation of people with visual impairments [27]. However, to be applicable in real life, real-time detection is the most important indicator, so we used the most popular one-stage algorithms; the YOLO family of algorithms is the state of the art for real-time performance.
2.1. The Reason for Choosing YOLOv8 as the Baseline
This section introduces the most popular algorithms in recent years and describes in
detail some main contents of this paper for YOLOv8 improvement.
YOLO is currently the most popular real-time object detector and is widely accepted for the following reasons: (a) a lightweight network architecture, (b) effective feature fusion methods, and (c) more accurate detection results.
In terms of current usage, YOLOv5 and YOLOv7 are the two most widely accepted algorithms. YOLOv5 uses deep learning technology to achieve real-time and efficient object detection. Compared with its predecessor YOLOv4, YOLOv5 improved the model structure, training strategy, and performance. YOLOv5 adopted the CSP (Cross-Stage Partial) network structure, which effectively reduces repeated calculations and improves computational efficiency. However, YOLOv5 also has some drawbacks: it still falls short in small object detection, its detection of dense objects needs improvement, and its performance in complex situations such as occlusion and pose change still needs to be strengthened.
YOLOv7 proposed a novel training strategy, called Trainable Bag of Freebies (TBoF),
for improving the performance of real-time object detectors. The TBoF method included a
series of trainable tricks, such as data augmentation, MixUp, etc., which could significantly
improve the accuracy and generalization ability of the object detector by applying TBoF to
three different types of object detectors (SSD, RetinaNet, and YOLOv3). However, YOLOv7
is also limited by the training data, model structure, and hyperparameters, which leads to
performance degradation in some cases. In addition, the proposed method requires more
computational resources and training time to achieve the best performance.
YOLOv8, published in 2023, aimed to combine the best of many real-time object detectors. It still adopts the CSP idea of YOLOv5 [28], the PAN-FPN feature fusion method [29,30], and the SPPF module. Its main improvements were the following: (a) It provided a brand-new SOTA model, including P5 640 and P6 1280 resolution object detection networks and a YOLACT-style instance segmentation model [31]. To meet the needs of different projects, it also offers models of different scales based on scaling coefficients similar to YOLOv5's. (b) While retaining the original idea of YOLOv5, the C2f module was designed by referring to the ELAN structure in YOLOv7 [22]. (c) The detection head uses the currently popular decoupled design, separating the classification and detection heads [32]; most other parts are still based on the original idea of YOLOv5. (d) YOLOv8 uses BCE loss for classification. The regression loss takes the form CIoU loss + DFL, and VFL introduces an asymmetric weighting operation [33]. In DFL, the position of the box is modeled as a general distribution; the network quickly focuses on the distribution of locations close to the object location, placing the probability density as near that location as possible, as shown in Equation (1), where s_i is the sigmoid output of the network, y_i and y_{i+1} are the interval endpoints, and y is the label. Compared with previous YOLO algorithms, YOLOv8 is very extensible: it is a framework that can support previous versions of YOLO and switch between them, so it is easy to compare the performance of different versions.
DFL(s_i, s_{i+1}) = −((y_{i+1} − y) log(s_i) + (y − y_i) log(s_{i+1}))    (1)
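Equation (1) can be checked numerically. Below is a minimal sketch in plain Python, assuming s_i and s_{i+1} are the probabilities the network assigns to the two interval endpoints y_i and y_{i+1} that bracket the label y:

```python
import math

def dfl(s_i: float, s_i1: float, y_i: float, y_i1: float, y: float) -> float:
    """Distribution Focal Loss of Equation (1):
    -((y_{i+1} - y) * log(s_i) + (y - y_i) * log(s_{i+1}))."""
    return -((y_i1 - y) * math.log(s_i) + (y - y_i) * math.log(s_i1))
```

The loss is a weighted cross-entropy: with y_i = 2, y_{i+1} = 3, and y = 2.6, it is minimized when the probability mass splits as (0.4, 0.6), i.e. in proportion to how close y is to each endpoint, which is what "probability density as near the label as possible" means.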
YOLOv8 uses an anchor-free instead of an anchor-based design. It uses the dynamic TaskAlignedAssigner as its matching strategy, calculating the anchor-level alignment degree for each instance using Equation (2), where s is the classification score, u is the IoU value, and α and β are the weight hyperparameters. It selects the m anchors with the maximum value of t in each
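Equation (2) itself falls beyond this excerpt's truncation point; the TaskAlignedAssigner's alignment metric is commonly written t = s^α · u^β, and that form is assumed in the sketch below, along with hypothetical anchor scores:

```python
import heapq

def alignment(s: float, u: float, alpha: float, beta: float) -> float:
    """Anchor-level alignment degree t = s**alpha * u**beta
    (assumed form of Equation (2))."""
    return (s ** alpha) * (u ** beta)

def top_m_anchors(anchors, alpha: float, beta: float, m: int):
    """Pick the m anchors with the largest alignment t for one instance.
    `anchors` is a list of (classification_score, iou) pairs."""
    return heapq.nlargest(m, anchors,
                          key=lambda a: alignment(a[0], a[1], alpha, beta))
```

With α = 1 and a large β, IoU dominates the metric, so a well-localized anchor with a modest score can outrank a confident but poorly localized one.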