Received 2 August 2023, accepted 24 September 2023, date of publication 4 October 2023, date of current version 11 October 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3321966
An Anchor-Free Lightweight Object Detection Network
WEINA WANG AND YUNYAN GOU
College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China
Corresponding author: Weina Wang (wangweina@jlict.edu.cn)
This work was supported in part by the Natural Science Foundation of China under Grant 62266046; in part by the Natural Science
Foundation of Jilin Province, China, under Grant YDZJ202201ZYTS603; and in part by the Natural Science Foundation of
Jilin Provincial Department of Education, China, under Grant JJKH20230281KJ.
ABSTRACT Existing anchor-free object detection methods have achieved strong results, but these
methods are relatively complex and their inference speed is slow. In this paper, an anchor-free
lightweight object detection network is proposed. The anchor-free mechanism overcomes the
limitations of anchor-based detection models, and the lightweight backbone network reduces the
computational cost. In addition, the proposed small object enhancement module strengthens the focus
on small objects, which improves the detection capability for small objects. Furthermore, a label
assignment strategy is proposed to determine the prominent features, and a center correction
mechanism is introduced to bring the predicted bounding box closer to the ground truth and further
improve the detection accuracy. Extensive experiments are conducted on the MS COCO and Pascal VOC
datasets. The results show that the proposed method improves detection accuracy over existing
detection methods by 0.5% on the MS COCO dataset and by 0.4% on the Pascal VOC dataset, which
demonstrates the superiority of the proposed method.
INDEX TERMS Object detection, anchor-free, lightweight, small object, label assignment strategy.
I. INTRODUCTION
Object detection aims to find all the objects of interest
in an image and determine their classes and locations, and it is
one of the core problems in the field of computer vision.
There are many important and challenging branches of object
detection, such as pedestrian detection [1], [2], medical
detection [3], small object detection [4], face detection [5],
[6], [7], salient object detection [8], [9], [10], text
detection [11], [12], and traffic detection [13], [14].
Owing to the boom in deep learning, object detection algorithms
have developed rapidly. Typical methods are Faster R-CNN [15]
and fully convolutional networks (FCN) [16].
They have the advantages of conceptual intuitiveness, good
flexibility, robustness, and fast training and inference.
In addition, feature pyramid networks (FPN) [17] use
multi-stage image pyramids to obtain feature pyramids,
which improves the average accuracy of the detector.
Focal loss [18] focuses on sparse hard samples and prevents
the large number of negative samples from dominating training,
which can improve the detection performance. However, these
anchor-based detectors require setting a large number of anchor
points in the detection process, which increases the
computational cost and degrades the detection speed.
The associate editor coordinating the review of this manuscript and
approving it for publication was Badri Narayan Subudhi.
To overcome the shortcomings of anchor-based detectors,
anchor-free detection has been proposed. The anchor-free detector
utilizes pixel-by-pixel prediction to solve the detection
problem. The detector does not require predetermined anchor
points, which avoids complex anchor point calculations and
greatly reduces computation time and computational power.
Therefore, anchor-free detection has gained great momentum
in the object detection field. CornerNet [19] converts the
network’s focus on the object bounding box into a pair
of key points without designing anchor points, which
improves the detection accuracy. However, the network
only focuses on the edges and corner points, resulting in
insufficient internal information acquisition. To overcome
this problem, CenterNet [20] directly focuses on the centroid
of the object. Although the method effectively improves
the accuracy and recall, detection under occlusion is still
not effectively solved when objects occlude one another.
Fully convolutional one-stage (FCOS) [21] is a pixel-by-pixel
object detection model, which utilizes center-ness to improve
the accuracy. Compared with anchor-based methods, FCOS has
achieved promising performance in detection accuracy and speed.
However, it still has high computational costs and shortcomings
in the detection capability for small objects.
VOLUME 11, 2023
2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Recently, lightweight networks have shown superior performance
in many practical applications. SqueezeNet [22] is
a smaller and more intelligent network. The model efficiently
reduces the number of parameters while maintaining good
performance. ShuffleNet [23] is specially designed for
mobile devices and has fewer parameters and efficient
computation. The traditional approaches to lightening a
network are simplification, pruning, and compression.
The MobileNet series [24], [25], [26] discards these approaches
and uses an efficient network architecture. MobileNets not
only require less computation, but also preserve the accuracy
of detection. Inspired by the structure of MobileNets, the
proposed method introduces this efficient architecture as the
backbone network to lighten the network.
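As a rough illustration of why such architectures are lighter, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution, the building block on which the original MobileNet design is based. This block is not described in the text above; the layer sizes (`k`, `c_in`, `c_out`) and the function names are assumptions chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Number of weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Number of weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 128, 256

std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, depthwise separable: {sep}, ratio: {ratio:.3f}")
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the parameter saving comes from.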
Considering all the above-mentioned observations, we
propose a lightweight anchor-free detection model, while
improving the detection capability of small objects. Firstly,
the backbone network of FCOS is optimized to lighten
the network. Then, the small object detection enhancement
module is constructed to enhance the detection capability
of small objects. Finally, a label assignment strategy is
proposed and combined with a center correction mechanism
to improve the detection performance that is otherwise limited
by the lack of accurate label assignment. The main
contributions of this paper are as follows:
1) A small object enhancement module is constructed
based on the single stage headless face detector. The module
can improve the focus on small objects, and enhance the
detection capability.
2) A label assignment strategy is proposed to select the
optimal anchor point for each object. The top-ranked pixels
are selected by self-learning for matching, and the accuracy
of the detector is improved.
3) A center correction mechanism is introduced to bring the
center of the predicted bounding box closer to that of the
ground truth. This can avoid the effect of extremely
abnormal bad cases, and the predicted index can be further
optimized.
4) An anchor-free lightweight object detection network
is proposed, which can reduce the computational cost and
improve the accuracy of detection. Meanwhile, the proposed
network can effectively detect small objects.
5) A comprehensive evaluation of the proposed method is
conducted on the MS COCO and Pascal VOC detection benchmarks.
The experimental results not only demonstrate that the
proposed detector has better detection accuracy than the other
detectors, but also show that the parameters of the model are
substantially reduced.
The structure of this paper is as follows: Section II
introduces three classes of object detection methods and
discusses the application of advanced algorithms for object
detection. The proposed object detection model is introduced,
and the main components and algorithms are described in
detail in Section III. In Section IV, a series of experiments and
visualizations are carried out to demonstrate the superiority
of the proposed model. Our work is summarized, and the
research direction of follow-up work is pointed out in
Section V.
II. RELATED WORK
A. ANCHOR-BASED OBJECT DETECTION
The workflow of the anchor-based object detector can be
summarized as follows: Firstly, a set of predefined anchors
is identified, and the image is divided into regions. Then,
each candidate box is classified with a classifier to determine
whether it contains the target object. Finally, according
to the confidence level of the classifier and the position
of the bounding box, the final detection result is output.
R-CNN [27] first used CNNs for object detection. Although it
greatly improves object detection performance, the problem
of redundant computation is not solved. Fast R-CNN [28]
uses a search method to construct candidate bounding boxes, but
the speed is still insufficient for real-time requirements. Faster
R-CNN [15] utilizes a fully convolutional network as a
region proposal network. The network generates the corresponding
candidate windows with associated object scores,
which indicate the probability that an object is present.
Compared with one-stage networks, the two-stage
Faster R-CNN model is more advantageous in
high-precision and multi-scale detection. SSD [29] uses
several different detection branches to detect objects at
multiple scales, thus improving the accuracy of multi-scale object
detection. YOLO9000 [30] uses a joint training technique of
object classification and detection to expand the network to
thousands of detection categories. The network has only one
detection branch and lacks the capture of multi-scale contextual
information, leading to poor performance for small
object detection. Dynamic R-CNN [31] can automatically
adjust the labels, and a loss function based on statistical
information is proposed to fit high-quality samples.
However, anchor-based object detection still has the
following limitations: 1) The anchor-based detector is very
sensitive to changes in anchors. 2) Fixed anchors compromise
the versatility of the detector, so the size and aspect ratio
of the anchors must be readjusted for different tasks.
3) The detector needs to generate a large number of anchors
to match the ground truth boxes. However, most of the
anchors are marked as negative samples, which can cause an
extreme sample imbalance. 4) During the training process,
the IoU of all anchor boxes with ground truth boxes needs
to be calculated, which consumes a lot of memory and time.
To overcome the above drawbacks, the anchor-free detector
was proposed.
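To make limitations 3) and 4) concrete, the following minimal sketch counts the anchors a multi-level detector would generate and labels a small grid of anchors by IoU. The input size, strides, nine anchors per location, the 0.5 IoU threshold, and all function names are illustrative assumptions, not settings taken from this paper.

```python
import math


def count_anchors(height, width, strides, anchors_per_location):
    """Total anchors over all feature levels for one input image."""
    total = 0
    for s in strides:
        total += math.ceil(height / s) * math.ceil(width / s) * anchors_per_location
    return total


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Label an anchor 1 if its best IoU with any ground-truth box reaches
    pos_thresh, else 0. Every anchor is compared with every ground truth."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thresh else 0)
    return labels


# Assumed setting: 800 x 1024 input, five levels, nine anchors per location.
total_anchors = count_anchors(800, 1024, [8, 16, 32, 64, 128], 9)

# Toy example: 32 x 32 anchors on a stride-16 grid over a 128 x 128 image.
anchors = [(cx - 16, cy - 16, cx + 16, cy + 16)
           for cy in range(8, 128, 16) for cx in range(8, 128, 16)]
gt_boxes = [(40, 40, 72, 72)]
labels = label_anchors(anchors, gt_boxes)

print(total_anchors, len(anchors), sum(labels))
```

In the toy example only one of the 64 anchors is labeled positive, which mirrors the sample imbalance described above, and the number of IoU evaluations grows with the product of the anchor count and the number of ground-truth boxes.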
B. ANCHOR-FREE OBJECT DETECTION
The workflow of the anchor-free object detector can be
summarized as follows: Firstly, the presence of an object is
predicted by each pixel on the feature map, and a confidence
map is generated. Then, for the pixels on each confidence
map, the position of the target bounding box is predicted
by a regressor. Finally, the final detection result is selected
based on the confidence map and bounding box location.
Anchor-free detectors do not need predefined anchor boxes,
and the detection process is implemented by the following
two methods. One method is called the key point method.
This method performs object detection by locating multiple
predefined or self-learning key points, and then constraining
the spatial extent of the object. CornerNet [19] transforms
bounding box detection into key point detection
without designing anchor boxes as prior boxes. However,
the network only attends to edges and corners, which not
only lacks internal information but also requires many
post-processing mechanisms. Inspired by CornerNet [19],
CenterNet [20] further improves the accuracy and recall by
using three points instead of two, which effectively overcomes
the drawbacks of too many false detection boxes and insufficient
recognition of intermediate information. ExtremeNet [32]
performs detection by standard key point estimation. Object
detection is transformed into a key point estimation
problem, thus avoiding region classification and implicit
feature learning. RepPoints [33] represents objects as a
collection of sample points. The model constrains the spatial
extent of the objects and emphasizes semantically important
local regions. YOLO [34] divides the image into grid cells,
and then predicts the bounding boxes and the corresponding
probabilities. DenseBox [35] uses a circular region at the
center to define a positive sample, and then predicts four
distances from the circle to the object boundary. FSAF [36]
integrates anchor-free branches and an online feature selection
mechanism into RetinaNet. The central region of the object
is defined as a positive sample, and the distances to
the four edges of the object are used for localization.
FoveaBox [37] assigns objects of different scales to different
feature layers for direct classification and regression of pixel
points. The network determines object locations based on the
central concave structure. FCOS [21] defines all positions within the
object bounding box as positive samples, and then detects
the object by four distance values and center-ness scores.
Therefore, FCOS [21] has achieved promising performance
in detection accuracy and speed.
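The per-pixel formulation used by FCOS can be sketched in a few lines. The code below computes the four distance values for a pixel inside a ground-truth box and the center-ness score as defined in the FCOS paper [21]. It illustrates FCOS only, not the label assignment or center correction proposed in this paper; the function names and the example box are assumptions.

```python
import math


def ltrb_targets(px, py, box):
    """Distances from pixel (px, py) to the left, top, right, and bottom
    sides of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return px - x1, py - y1, x2 - px, y2 - py


def is_positive(ltrb):
    """A pixel is a positive sample when it lies strictly inside the box."""
    return min(ltrb) > 0


def centerness(ltrb):
    """FCOS center-ness: 1.0 at the box center, decaying toward the border."""
    l, t, r, b = ltrb
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


box = (10, 20, 110, 80)                   # illustrative ground-truth box
center = ltrb_targets(60, 50, box)        # pixel at the box center
off_center = ltrb_targets(20, 50, box)    # pixel near the left border
outside = ltrb_targets(5, 50, box)        # pixel outside the box

print(center, centerness(center))
print(off_center, centerness(off_center))
print(is_positive(outside))
```

Pixels far from the center receive a low center-ness score, which FCOS uses to down-weight low-quality boxes predicted near object borders.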
However, these approaches still have some limitations:
1) The approaches have high computational complexity and
computing power requirements. 2) The detection performance
on small objects still needs to be improved. 3) The insufficient
recall of boxes leads to accuracy that cannot reach the state of
the art (SOTA) of the anchor-based methods. To solve the above problems,
we propose a lightweight backbone network, a small object
enhancement module, and a label assignment strategy to
enhance the detection performance.
C. SEMI-ANCHOR-FREE OBJECT DETECTION
The workflow of the semi-anchor-free object detector can be
summarized as follows: Firstly, a pretrained convolutional
neural network is used to extract features from the input
image. Secondly, a proposal generation network is
used to generate a series of proposal bounding boxes. Then,
classification and regression operations are performed on
each proposal bounding box. Finally, the detection results are
post-processed using a non-maximum suppression (NMS) algorithm
to remove redundant candidate boxes. The NMS algorithm
filters the proposal bounding boxes based on the degree of
overlap between them and retains the ones with the highest
confidence level. Semi-anchor-free object detection differs
from the traditional anchor-based method: instead of
predefining the anchors, the location and scale information
of the target is automatically generated by the network.
The advantage of this method is
that it can better adapt to the size and shape variations of
different targets and improve the accuracy of object detection.
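The NMS post-processing step in the workflow above can be sketched as a greedy procedure. This is a minimal pure-Python illustration of standard greedy NMS rather than the implementation used by any of the cited detectors; the 0.5 IoU threshold and the example boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    Visits boxes in order of decreasing score and keeps a box only if its
    IoU with every box kept so far does not exceed iou_thresh.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping proposals and one separate proposal (illustrative).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the highest-scoring box of the pair survives.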
SAFNet [38] proposes a new enhanced feature pyramid
generation paradigm consisting of an adaptive feature fusion
module (AFFM) and a self-enhanced module (SEM). The
model can obtain a clean and enhanced feature pyramid
because the paradigm adaptively integrates multi-scale
representations in a nonlinear manner while suppressing
redundant semantic information. In addition, the adaptive anchor
generator (AAG) generates a few suitable anchor boxes
for each input image. With this semi-anchor-free approach,
the detector overcomes the shortcomings of the anchor-based
model while retaining its strengths. SAFDet [39] addresses the
problem that two-stage detectors are affected by horizontal
proposal misalignment and complex background
interference in accurate object detection. First, the model
uses a rotation-anchor-free branch (RAFB) to enhance the
foreground features by accurately regressing the oriented
bounding box (OBB). Second, the center-prediction module
(CPM) is introduced to enhance object localization and
suppress background noise. The 3SNet [40] object detector uses
voxel-based methods to assist in learning point features
and achieves anchor-free performance in inference. Then, the
model designs a Directional Slice Attention to enhance the
discriminability of features. Finally, the framework proposes
a region of interest representation based on symmetric feature
propagation to alleviate the obstacles caused by incomplete
object scanning in autonomous driving scenarios.
III. PROPOSED METHOD
In this section, we propose an anchor-free lightweight
detection model, which can effectively detect small objects.
Firstly, based on the framework of FCOS, MobileNetV3
is incorporated into FPN to generate relevant features. The
detection heads are designed to reduce the parameters of