Citation: Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 2023, 12, 2323. https://doi.org/10.3390/electronics12102323
Academic Editor: Donghyeon Cho
Received: 6 April 2023; Revised: 15 May 2023; Accepted: 16 May 2023; Published: 21 May 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor

Haitong Lou 1, Xuehu Duan 1, Junmei Guo 1, Haiying Liu 1,*, Jason Gu 2, Lingyun Bi 1 and Haonan Chen 1

1 The School of Information and Automation Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China; 10431210431@stu.qlu.edu.cn (X.D.); gjm@qlu.edu.cn (J.G.)
2 The School of Electrical and Computer Engineering, Dalhousie University, Halifax, NS B3J 1Z1, Canada
* Correspondence: haiyingliu2019@qlu.edu.cn
Abstract: Traditional camera sensors rely on human eyes for observation. However, human eyes are prone to fatigue when observing objects of different sizes for a long time in complex scenes, and human cognition is limited, which often leads to judgment errors and greatly reduces efficiency. Object recognition technology is an important technology used to judge an object's category on a camera sensor. To solve this problem, a small-size object detection algorithm for special scenarios is proposed in this paper. The advantage of this algorithm is that it not only achieves higher precision for small-size object detection but also ensures that the detection accuracy for every object size is no lower than that of existing algorithms. This paper makes three main contributions: (1) a new downsampling method that better preserves contextual feature information; (2) an improved feature fusion network that effectively combines shallow and deep information; (3) a new network structure that effectively improves the detection accuracy of the model. In terms of detection accuracy, the proposed algorithm outperforms YOLOX, YOLOR, YOLOv3, scaled YOLOv5, YOLOv7-Tiny, and YOLOv8. Three authoritative public datasets are used in the experiments: (a) on the VisDrone dataset (small-size objects), the mAP, precision, and recall of DC-YOLOv8 are 2.5%, 1.9%, and 2.1% higher than those of YOLOv8s, respectively; (b) on the TinyPerson dataset (minimal-size objects), they are 1%, 0.2%, and 1.2% higher, respectively; (c) on the PASCAL VOC2007 dataset (normal-size objects), they are 0.5%, 0.3%, and 0.4% higher, respectively.
Keywords: small-size objects; object detection; camera sensor; feature fusion
1. Introduction
As one of the most widely used devices, cameras have become essential in various industries and households, such as robotics, monitoring, transportation, medicine, autonomous driving, and so on [1–5]. A camera sensor is one of the core sensors for the above applications; it is composed of a lens, a lens module, a filter, a CMOS (complementary metal-oxide semiconductor) or CCD (charge-coupled device) sensor, an ISP (image signal processing) unit, and a data transmission part. It works by first collecting images using optical imaging principles and then performing image signal processing. In applications such as traffic, medicine, and automatic driving, it is crucial to identify objects accurately, so the object recognition algorithm is one of the most important parts of a camera sensor.
Traditional video cameras capture a scene and present it on a screen; the shape and type of each object are then observed and judged by the human eye. However, human cognitive ability is limited, and it is difficult to judge the category of an object when the camera resolution is too low. A complex scene will also strain the human eye, resulting in the inability to detect some small details. A viable alternative is to use camera sensors to find regions and categories of interest [6].
Electronics 2023, 12, 2323. https://doi.org/10.3390/electronics12102323 https://www.mdpi.com/journal/electronics
At present, object recognition through a camera is one of the most challenging topics, and accuracy and real-time performance are the most important indicators for a camera sensor. In recent years, with the ultimate goal of achieving high accuracy or real-time operation, researchers have proposed MobileNet [7–9], ShuffleNet [10,11], etc., which can be used on a CPU, and ResNet [12], DarkNet [13], etc., which can be used on a GPU.
At this stage, the most classical object detection algorithms are divided into two kinds: two-stage and one-stage object detection algorithms. Two-stage algorithms include R-CNN (Region-based Convolutional Neural Network) [14], Fast R-CNN [15], Faster R-CNN [16], Mask R-CNN [17], etc. One-stage algorithms include the YOLO (You Only Look Once) series [13,18–22], the SSD (Single Shot MultiBox Detector) algorithm [23], and so on. The YOLO series is one of the fastest-growing and best-performing families of algorithms so far, especially the novel YOLOv8 algorithm released in 2023, which has reached the highest accuracy to date. However, YOLO is designed for objects of all sizes; in special scenes dominated by objects of a particular size, its performance is not as good as some current small-size object detection algorithms [24,25].
To solve this problem, this paper proposes an improved algorithm based on YOLOv8. The algorithm provides a small but stable improvement in detection accuracy for normal-scale objects and greatly improves the detection accuracy for small objects in complex scenes. Small objects occupy few pixels, which makes it difficult for the detector to extract features accurately and comprehensively during feature extraction. It is even harder to extract information in complex scenes with, for example, overlapping objects, so the accuracy of most algorithms on small objects is generally low. The proposed algorithm greatly improves the detection accuracy of small objects in complex scenes while keeping the accuracy on normal-scale objects stable or slightly improved. Its main contributions are as follows:
(a) The MDC module is proposed to perform downsampling operations (concatenating the outputs of a depth-wise separable convolution, max pooling, and a 3 × 3 convolution with stride = 2). It supplements the information lost by each branch during downsampling, making the contextual information preserved during feature extraction more complete.
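A minimal sketch, in plain Python, of the spatial-size bookkeeping behind such a multi-branch downsampling step. The exact kernel sizes and paddings of the MDC branches are assumptions here (the paper's figures, not this excerpt, fix them); the point is that every branch must halve the spatial resolution so that their outputs can be concatenated along the channel axis:

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution/pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def mdc_branch_sizes(size: int) -> tuple:
    """Output sizes of three assumed MDC branches on a size x size map:
    a 3x3 depth-wise separable conv (stride 2, pad 1), a 2x2 max pool
    (stride 2), and an ordinary 3x3 conv (stride 2, pad 1)."""
    dw   = conv2d_out(size, kernel=3, stride=2, padding=1)  # depth-wise separable conv
    pool = conv2d_out(size, kernel=2, stride=2, padding=0)  # max pooling
    conv = conv2d_out(size, kernel=3, stride=2, padding=1)  # ordinary conv
    return dw, pool, conv
```

Because all three branches agree on the output size (e.g. 64 → 32), their feature maps can be concatenated channel-wise, which is how the module recombines the information each individual branch loses.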
(b) The C2f module in front of the detector in YOLOv8 is replaced by the DC module proposed in this paper (a network structure formed by stacking depth-wise separable convolutions and ordinary convolutions). A new network structure is formed by stacking DC modules and continuously fusing each small module. This increases the depth of the whole structure, achieves higher resolution without significant computational cost, and captures more contextual information.
(c) The feature fusion method of YOLOv8 is improved so that shallow and deep information are combined effectively, the information retained during network feature extraction is more comprehensive, and the problem of missed detections caused by inaccurate localization is alleviated.
This paper is divided into the following parts: Section 2 introduces the reasons
for choosing YOLOv8 as the baseline and the main idea of YOLOv8; Section 3 mainly
introduces the improved method of this paper; Section 4 focuses on the experimental
results and comparative experiments; Section 5 provides the conclusions and directions of
subsequent work and improvement.
2. Related Work
Currently, camera sensors are crucial and widely used in real life, and researchers have applied them to a variety of scenarios. For example, Zou et al. proposed a new camera-sensor-based obstacle detection method for day and night operation of a traditional excavator [1]. Sengupta et al. proposed robust multi-target tracking with sensor fusion based on both a camera sensor and object detection [26]. Bharati proposed a camera-sensor approach for assisted navigation of people with visual impairments [27]. However, to be applicable in real life, real-time detection is the most important indicator, so we used the most popular one-stage algorithms; the YOLO family of algorithms is the state of the art for real-time performance.
2.1. The Reason for Choosing YOLOv8 as the Baseline
This section introduces the most popular algorithms in recent years and describes in
detail some main contents of this paper for YOLOv8 improvement.
YOLO is currently the most popular real-time object detector and is widely accepted for the following reasons: (a) a lightweight network architecture, (b) effective feature fusion methods, and (c) more accurate detection results.
In terms of current usage, YOLOv5 and YOLOv7 are the two most widely accepted algorithms. YOLOv5 uses deep learning technology to achieve real-time and efficient object detection. Compared with its predecessor YOLOv4, YOLOv5 improved the model structure, training strategy, and performance. YOLOv5 adopted the CSP (Cross-Stage Partial) network structure, which effectively reduces repeated calculations and improves computational efficiency. However, YOLOv5 also has some drawbacks: it still falls short in small object detection, its detection of dense objects needs improvement, and its performance in complex situations such as occlusion and pose change still needs to be strengthened.
YOLOv7 proposed a novel training strategy, called Trainable Bag of Freebies (TBoF),
for improving the performance of real-time object detectors. The TBoF method included a
series of trainable tricks, such as data augmentation, MixUp, etc., which could significantly
improve the accuracy and generalization ability of the object detector by applying TBoF to
three different types of object detectors (SSD, RetinaNet, and YOLOv3). However, YOLOv7
is also limited by the training data, model structure, and hyperparameters, which leads to
performance degradation in some cases. In addition, the proposed method requires more
computational resources and training time to achieve the best performance.
YOLOv8, published in 2023, aimed to combine the best of many real-time object detectors. It still adopts the CSP idea of YOLOv5 [28], the PAN-FPN feature fusion method [29,30], and the SPPF module. Its main improvements were the following: (a) It provided a brand-new SOTA model, including P5 640 and P6 1280 resolution object detection networks and a YOLACT-style instance segmentation model [31]. To meet the needs of different projects, it also offers models of different scales based on scaling coefficients similar to YOLOv5's. (b) While retaining the original idea of YOLOv5, the C2f module was designed by referring to the ELAN structure in YOLOv7 [22]. (c) The detection head uses the currently popular decoupled design, separating the classification and detection heads [32]; most other parts are still based on the original idea of YOLOv5. (d) YOLOv8 uses BCE loss for classification. The regression loss takes the form CIoU loss + DFL, and VFL introduces an asymmetric weighting operation [33]. In DFL, the position of the box is modeled as a general distribution; the network quickly focuses on the distribution of locations close to the object location, placing the probability density as near that location as possible, as shown in Equation (1), where s_i is the sigmoid output of the network, y_i and y_{i+1} are the interval endpoints, and y is the label. Compared with previous YOLO algorithms, YOLOv8 is very extensible: it is a framework that can support previous versions of YOLO and switch between them, so it is easy to compare the performance of different versions.
DFL(s_i, s_{i+1}) = −((y_{i+1} − y) log(s_i) + (y − y_i) log(s_{i+1}))    (1)
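Equation (1) can be checked numerically. Below is a minimal sketch in plain Python, assuming s_i and s_{i+1} are the probabilities the network assigns to the two interval endpoints y_i and y_{i+1} that bracket the label y:

```python
import math

def dfl(s_i: float, s_i1: float, y_i: float, y_i1: float, y: float) -> float:
    """Distribution Focal Loss of Equation (1):
    -((y_{i+1} - y) * log(s_i) + (y - y_i) * log(s_{i+1}))."""
    return -((y_i1 - y) * math.log(s_i) + (y - y_i) * math.log(s_i1))
```

The loss is a weighted cross-entropy: with y_i = 2, y_{i+1} = 3, and y = 2.6, it is minimized when the probability mass splits as (0.4, 0.6), i.e. in proportion to how close y is to each endpoint, which is what "probability density as near the label as possible" means.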
YOLOv8 uses an anchor-free instead of an anchor-based design. It uses the dynamic TaskAlignedAssigner as its matching strategy, calculating the anchor-level alignment degree for each instance using Equation (2), where s is the classification score, u is the IoU value, and α and β are the weight hyperparameters. It selects the m anchors with the maximum value of t in each
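Equation (2) itself falls beyond this excerpt's truncation point; the TaskAlignedAssigner's alignment metric is commonly written t = s^α · u^β, and that form is assumed in the sketch below, along with hypothetical anchor scores:

```python
import heapq

def alignment(s: float, u: float, alpha: float, beta: float) -> float:
    """Anchor-level alignment degree t = s**alpha * u**beta
    (assumed form of Equation (2))."""
    return (s ** alpha) * (u ** beta)

def top_m_anchors(anchors, alpha: float, beta: float, m: int):
    """Pick the m anchors with the largest alignment t for one instance.
    `anchors` is a list of (classification_score, iou) pairs."""
    return heapq.nlargest(m, anchors,
                          key=lambda a: alignment(a[0], a[1], alpha, beta))
```

With α = 1 and a large β, IoU dominates the metric, so a well-localized anchor with a modest score can outrank a confident but poorly localized one.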