feature extractors and classifiers to detect faces from coarse to fine. Despite their great success,
it is important to note that cascade detectors suffer from drawbacks such as difficult training
and slow detection speed. The other branch is derived from general-purpose
object detection algorithms [4, 5, 6]. General-purpose object detectors capture the more
common features and broader characteristics of objects. Task-specific detectors can therefore
share this information and then enforce face-specific properties through special designs. Some
popular face detectors including YOLO [7, 8, 9, 10], Faster R-CNN [5] and RetinaNet [6] fall
into this category. In this paper, inspired by YOLOv5 [11], TridentNet [12] and Attention
Network in FAN [13], we propose a novel face detector that achieves state-of-the-art performance
in one-stage face detection.
Although deep convolutional networks have improved face detection remarkably, detecting
faces with high variance in scale, pose, occlusion, expression, appearance, and illumination
in realistic scenes remains a great challenge. In our previous work, we proposed YOLO-
Face [14], an improved face detector based on YOLOv3 [9], which mainly focused on the
problem of scale variance, designed anchor ratios suitable for human faces, and utilized a more
accurate regression loss function. Its mAP on the Easy, Medium, and Hard subsets of the WiderFace [15]
validation set reached 0.899, 0.872, and 0.693, respectively. Since then, a variety of new detectors
have been presented and face detection performance has been significantly improved.
However, for small objects, one-stage detectors have to divide the search space with a finer
granularity, which is apt to cause an imbalance between positive and negative samples
[16]. Furthermore, face occlusion [13] in complex scenes remarkably degrades the accuracy of face
detectors. To address the problems of varying face scales, imbalance between easy and hard
samples, and face occlusion, we propose a YOLOv5-based face detection method
called YOLO-FaceV2.
By carefully analyzing the difficulties encountered by face detectors and the shortcomings
of the YOLOv5 detector, we carry out the following solutions.
Multi-scale fusion: In many scenarios, faces of different scales coexist in the same image,
and it is difficult for a face detector to detect them all. Handling faces at varying scales is
therefore a very important task for face detection algorithms. Currently, the main
approach to the scale-variance problem is to construct a pyramid that fuses the multi-scale
features of faces [17, 18, 19, 20]. For example, in YOLOv5, the FPN [20] fuses the features of the P3,
P4 and P5 layers. However, for small-scale objects, information is easily lost after
multi-layer convolutions, and very little pixel information is retained, even in the shallower
P3 layer. Therefore, increasing the resolution of the feature map can undoubtedly benefit the
detection of small objects.
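The top-down fusion performed by an FPN can be sketched in a few lines; the following is a minimal numpy illustration (lateral 1x1 and smoothing convolutions omitted, shapes are hypothetical), not YOLOv5's actual implementation:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(p3, p4, p5):
    # Top-down pathway: each coarser map is upsampled and added
    # element-wise to the next finer map.
    f5 = p5
    f4 = p4 + upsample2x(f5)
    f3 = p3 + upsample2x(f4)
    return f3, f4, f5

# Toy feature maps at strides 8, 16 and 32 of a 640x640 input.
p3 = np.zeros((8, 80, 80))
p4 = np.zeros((8, 40, 40))
p5 = np.ones((8, 20, 20))
f3, f4, f5 = fpn_fuse(p3, p4, p5)
```

Note how semantic information from the coarse P5 map propagates down into the high-resolution F3 map, which is why the finest level retains the most pixel information for small faces.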
Attention mechanism: In many complex scenes, face occlusion frequently occurs, and it is
one of the main reasons for the accuracy decline of face detectors. To address this problem,
some researchers apply attention mechanisms to facial feature extraction. FAN [13] proposes
an anchor-level attention: the idea is to maintain the response of the unoccluded
region and to compensate, through the attention mechanism, for the reduced response of the
occluded region. However, it does not fully utilize the information between channels.
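To illustrate what channel-wise information can add, here is a minimal squeeze-and-excitation style sketch in numpy; the weights are hypothetical, and this is an illustration of channel attention in general, not FAN's anchor-level attention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    # Squeeze-and-excitation style channel attention on a (C, H, W) map:
    # global-average-pool each channel, pass the result through a small
    # bottleneck MLP, and rescale every channel by a weight in (0, 1).
    s = x.mean(axis=(1, 2))                    # squeeze: (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excite:  (C,)
    return x * e[:, None, None]

# Toy example: 4 channels, bottleneck of size 2, fixed (hypothetical) weights.
x = np.ones((4, 2, 2))
w1 = np.full((2, 4), 0.5)   # reduction weights
w2 = np.full((4, 2), 0.5)   # expansion weights
out = channel_attention(x, w1, w2)
```

Unlike a purely spatial attention map, the excitation vector here reweights whole channels, so feature maps that respond to occluders can be suppressed relative to those that respond to visible facial parts.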
Hard Samples: In one-stage detectors, many bounding boxes are not filtered out
iteratively, so the number of easy samples is very large. During training,
their cumulative contribution dominates the update of the model, leading to
overfitting [16]. This is known as the sample imbalance problem. To deal with this
problem, Lin et al. proposed Focal Loss, which dynamically assigns more weight to hard
examples [6]. Similarly, the Gradient Harmonizing Mechanism (GHM) [21] suppresses
the gradients from easy positive and negative samples to focus more on hard samples.
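The down-weighting behaviour of Focal Loss can be seen in a small numpy sketch of its simplified binary form (this is an illustration, not the detector's full loss):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Binary focal loss: the (1 - pt)^gamma factor down-weights
    # well-classified (easy) samples so hard ones dominate the gradient.
    pt = np.where(y == 1, p, 1.0 - p)        # probability of the true class
    a = np.where(y == 1, alpha, 1.0 - alpha)
    return -a * (1.0 - pt) ** gamma * np.log(pt)

easy = float(focal_loss(np.array(0.9), 1))  # confidently correct positive
hard = float(focal_loss(np.array(0.1), 1))  # badly misclassified positive
```

For the easy sample, pt = 0.9 and the modulating factor (1 - 0.9)^2 = 0.01 nearly zeroes out its loss, while the hard sample with pt = 0.1 keeps a modulating factor of 0.81, so its loss is orders of magnitude larger.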
Prime Sample Attention (PISA) [22], proposed by Cao et al., assigns weights to positive and