IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 14, NO. 8, AUGUST 2021 1
YOLO-Ant: A Lightweight Detector via Depthwise
Separable Convolutional and Large Kernel Design
for Antenna Interference Source Detection
Xiaoyu Tang, Member, IEEE, Xingming Chen, Jintao Cheng,
Jin Wu, Member, IEEE, Rui Fan, Senior Member, IEEE, Chengxi Zhang, Member, IEEE, Zebo Zhou
Abstract—In the era of 5G communication, removing interfer-
ence sources that affect communication is a resource-intensive
task. The rapid development of computer vision has enabled
unmanned aerial vehicles to perform various high-altitude de-
tection tasks. Because the field of object detection for antenna
interference sources has not been fully explored, this industry
lacks dedicated learning samples and detection models for this
specific task. In this article, an antenna dataset is created to
address important antenna interference source detection issues
and serves as the basis for subsequent research. We introduce
YOLO-Ant, a lightweight CNN and transformer hybrid detector
specifically designed for antenna interference source detection.
First, we formulate a lightweight design for
the network depth and width, ensuring that subsequent
investigations are conducted within a lightweight framework.
Then, we propose a DSLK-Block module based on depthwise
separable convolution and large convolution kernels to enhance
the network’s feature extraction ability, effectively improving
small object detection. To address challenges such as complex
backgrounds and large interclass differences in antenna detec-
tion, we construct DSLKVit-Block, a powerful feature extraction
module that combines DSLK-Block and transformer structures.
Considering both its lightweight design and accuracy, our method
not only achieves optimal performance on the antenna dataset
but also yields competitive results on public datasets.
Index Terms—YOLO-Ant, Antenna interference source detection,
Small object detection, Lightweight, CNN-transformer fusion.
This research was supported by the National Natural Science Foundation
of China under Grant 62001173, the Project of Special Funds for the
Cultivation of Guangdong College Students’ Scientific and Technological
Innovation (“Climbing Program” Special Funds) under Grant pdjh2022a0131
and pdjh2023b0141, the National Natural Science Foundation of China under
Grant 42074038, the Fundamental Research Funds for the Central Universities
and Xiaomi Young Talents Program.
Corresponding author: Xiaoyu Tang. E-mail address: tangxy@scnu.edu.cn.
Xiaoyu Tang, Xingming Chen and Jintao Cheng are with the School of
Electronic and Information Engineering, Faculty of Engineering, South China
Normal University, Foshan, Guangdong 528225, China, and also with the
School of Physics, South China Normal University, Guangzhou, Guangdong
510000, China.
Jin Wu is with the Department of Electronic and Computer Engineering,
Hong Kong University of Science and Technology, Hong Kong.
Rui Fan is with the College of Electronics & Information Engineering,
Shanghai Research Institute for Intelligent Autonomous Systems, the State
Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science
Center for Intelligent Autonomous Systems, Tongji University, Shanghai
201804, China (e-mail: rui.fan@ieee.org).
Chengxi Zhang is with the School of Internet of Things Engineering,
Jiangnan University, Wuxi, 214122, China.
Zebo Zhou is with the School of Aeronautics & Astronautics, University of
Electronic Science and Technology of China and Aircraft swarm intelligent
sensing and cooperative control Key Laboratory of Sichuan Province, Chengdu
610097, China, and also with the National Laboratory on Adaptive Optics,
Chengdu 610209, China.
I. INTRODUCTION
To ensure high-quality communication in people’s work
and daily lives, various wireless devices operate in differ-
ent frequency bands. 5G communication is of particular note
due to its introduction of new frequency bands into everyday
communication. However, due to the presence of numerous
private wireless signals that have not undergone spectrum al-
location by communication regulatory authorities, the 5G com-
munication network has accumulated a considerable number of
sources of interference. If individuals operate in the same geo-
graphical areas and occupy similar or adjacent frequency bands
as these interference signals in their everyday communication,
this will result in a significant deterioration in communication
quality, as shown in Fig. 1. Regular remediation of radio
interference sources is vital for communication departments to
alleviate this situation. The identification of signal interference
sources necessitates monitoring personnel to visually inspect
areas where communication quality is compromised due to
the presence of suspicious antennas elevated at high altitudes,
constituting a time-consuming and labor-intensive task. Given
the maturity of unmanned aerial vehicle (UAV) cruising
technology and of object detection techniques
within computer vision, UAVs have become viable
alternatives for handling complex and challenging detection
tasks previously performed by humans. For example, [1] [2]
[3] noted that object detection tasks in deep learning combined
with UAVs have been useful in production and other areas. The
success of these approaches has demonstrated the feasibility of
utilizing UAVs for object detection tasks aimed at interference
source antennas. However, due to the nascent stage of this
detection task within the current domain of object detection,
the creation of a suitable antenna dataset and the exploration of
appropriate object detection methodologies are of paramount
importance.
arXiv:2402.12641v1 [cs.CV] 20 Feb 2024
Convolutional architectures are the basis for most object
detection frameworks in industrial scenarios and rely on the
development of efficient convolutional neural networks in
deep learning. When addressing various tasks and technical
challenges, corresponding enhancements to these architectures
are necessary. The antenna interference source object detection
task presents three main challenges. The first issue pertains
to the lightweight nature and low computational complexity
of a detection model; consequently, object detectors can be
deployed on lightweight computing devices, enabling real-
time object detection via UAVs. Previous research, exempli-
fied by GhostNet [4] and EfficientNet [5], has focused on
designing lightweight networks as potential backbones for
different detection models to achieve an overall lightweight
solution. However, these networks are susceptible to infor-
mation feature loss. The second difficulty in the antenna
interference source object detection task lies in the differences
arising from the different angles and heights from which
the UAV captures the antennas. These variations result in a
nonuniform distribution of target sizes within the images, most
of which are extremely small in size. Additionally, there is
a significant interclass dissimilarity issue, wherein antennas
of the same type exhibit markedly different morphologies in
different images. To address these issues, researchers have
explored two aspects: multiscale feature learning and attention
mechanisms [6] [7] [8] [9]. However, although these methods
improve small object detection accuracy, they suffer from
weak generalizability and robustness, so the overall
detection accuracy across targets is compromised. Moreover,
such models have higher computational
complexity. The third difficulty is
complex target backgrounds, which cause serious false and
missed detections. Given that antennas are commonly installed
on tall buildings or fenced balconies in practical scenarios,
the resulting complex and mutually obscuring environment
between the target and the background significantly hinders
detection. Researchers have suggested using attention or
self-attention mechanisms to address this difficulty. The works in
[10] [11] [12] either propose a squeeze-and-excitation (SE)
attention module or build the whole object detection
network from self-attention structures. The advantage of these
models lies in their ability to effectively capture the spatial
relationship between the target and the background. This
capability significantly enhances object detection performance
on complex backgrounds. However, these mechanisms tend to
consume considerable computational resources and memory,
which conflicts with the goal of a lightweight
design. Additionally, networks built solely on self-attention
mechanisms suffer from long training times and poor
detection accuracy for small objects.
Fig. 1: The process of 5G communication in the CBN-U-H5H-
0713 area is shown in the figure. Two antenna interference
source signals appear in it. The gNB (gNodeB) denotes a 5G
base station. The UE (User Equipment) denotes the terminal
equipment that users use to access the wireless network.
In response to the aforementioned limitations, we propose
YOLO-Ant, a lightweight one-stage detector designed for
detecting antenna interference sources with small targets and
complex backgrounds. Initially, we analyze the scale and
number of channels in each feature layer of the model;
subsequently, we design the network’s width and depth to
ensure that the entire detection process is performed within
a lightweight framework. Our design considerations aim to
balance detection accuracy with the reduction of model
parameters and computational complexity. To address the issues of
small target size and large interclass variation, we implement
an efficient feature extraction module based on depthwise
separable convolution, DSLK-Block, which is applied to each
feature layer in the model. This method effectively enhances
the network’s feature learning and fusion capabilities, leading
to a significant improvement in detection accuracy for all
types of targets, particularly small targets. Additionally, this
approach contributes to reducing the model’s overall weight.
Finally, to address the problem of complex backgrounds,
YOLO-Ant uses an innovative CNN and transformer hybrid
structure to act on the neck of the model. This process enables
us to fully utilize both local and global feature learning to
address the challenges posed by complex backgrounds while
still accounting for small object detection. This approach sig-
nificantly improves all the detection accuracy indicators while
only slightly increasing the model’s number of parameters
and computational complexity. To demonstrate the model’s
generalizability and robustness, we also tested YOLO-Ant on
public datasets and achieved highly competitive results. In
conclusion, the main contributions of this paper are as follows:
1) In response to the lack of learning samples for antenna
object detection schemes, we conducted image acquisi-
tion and manual annotation of the three most common
types of antennas encountered in real-world interference
source investigation tasks. This dataset is pioneering and
establishes the foundation for subsequent work.
2) We initially pruned YOLOv5-s [13], obtaining a
lightweight detection framework. Within this framework,
(i) we propose a lightweight plug-and-play module based on
depthwise separable convolution combined with large
convolutional kernels, which effectively improves
the feature extraction and detection capabilities of the
network for small targets; (ii) we use a
transformer module to construct the neck structure of
the detection model, which improves the detection capability of
the network with only a slight increase in the model’s parameter
count and computational complexity, effectively addressing
the problem of complex backgrounds.
3) Our proposed method achieves state-of-the-art (SOTA)
performance on the antenna dataset, striking a bal-
ance between lightweight design and detection accu-
racy. Moreover, YOLO-Ant yields competitive results on
public datasets such as COCO, validating its robustness
and superior performance. Source code is released at
https://github.com/SCNU-RISLAB/YOLO-Ant.
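The lightweight argument behind contribution 2(i) can be illustrated with a simple weight count. The sketch below compares a standard convolution with a depthwise separable one as the kernel grows. It is a minimal illustration under stated assumptions and not the DSLK-Block itself: the channel width of 128 and the kernel sizes 3, 7, and 13 are hypothetical values chosen for the example, and biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = c_in * k * k      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    channels = 128  # hypothetical layer width, not taken from the paper
    for k in (3, 7, 13):  # hypothetical kernel sizes
        std = conv_params(channels, channels, k)
        dws = dw_separable_params(channels, channels, k)
        print(f"k={k:2d}  standard={std:,d}  separable={dws:,d}  ratio={std / dws:.1f}x")
```

Under these assumed values, the gap widens from roughly 8x at k = 3 to roughly 73x at k = 13, because only the depthwise term grows with the kernel area while the pointwise term stays fixed. This is one reason large kernels pair naturally with depthwise separable convolution in a lightweight design.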
The remainder of this paper is structured as follows. Section
II reviews related work relevant to the improvements proposed
in this paper, including the development of CNNs, the emergence
of transformer detection models, the combination of
CNNs and transformers, and antenna interference source detection.
Section III discusses the proposed lightweight detection framework
based on YOLOv5-s, including pruning the baseline
model, the design of the DSLK-Block module, and the neck
structure built on DSLKVit-Block. The experimental
results are given in Section IV, and Section V concludes the
paper.
II. RELATED WORK
A. CNN (convolutional neural network)-based object detection
The development of object detection in the computer vision
field has been greatly influenced by CNN-based methods.
Traditional approaches using hand-designed features and clas-
sifiers have been shown to be inadequate, leading to the
dominance of CNN-based methods. The initial CNN model,
LeNet-5 [14], was limited by computational resources and
model size. However, with advancements in computational
power and larger datasets, deeper and more complex CNN
models, such as AlexNet [15], VGGNet [16], GoogLeNet [17],
and ResNet [18], have emerged. These models have improved
network accuracy, reduced parameters, and addressed network
degradation issues, laying a solid foundation for 2D object
detection.
Two distinct families of methods have emerged from the
development of 2D object detection: two-stage and one-stage
detectors. Two-stage detectors, such as R-CNN [19] and Fast
R-CNN [20], generate candidate boxes using a separate region
proposal algorithm and
perform classification and regression on each candidate box.
Faster R-CNN [21] introduces the region proposal network
(RPN) for candidate box generation. In contrast, one-stage
detectors, such as YOLO [22] and SSD [23], perform classi-
fication and regression directly on each location in the input
image. YOLOv2 [24] and YOLOv3 [25] improved detection
accuracy through methods such as multiscale prediction, batch
normalization, and feature pyramid networks (FPNs). SSD
introduces multiscale detection using multiple-scale feature
maps, while RetinaNet [26] focuses on addressing the category
imbalance problem. Among the aforementioned models, one-stage
detectors are more suitable for real-time detection tasks on
UAVs than two-stage detectors because they do not require
additional networks or algorithms for candidate generation and refinement. However,
to compensate for the deficiency in accuracy resulting from
the pursuit of detection speed, improvements need to be
made to the backbone and neck of the one-stage detector
by developing various efficient feature extraction modules or
structures. The backbone and neck are the basic components
of object detection models. The backbone is a CNN trained on
image classification datasets such as ImageNet [27], in which
the input image is transformed into a high-dimensional feature
representation. The neck module further processes the feature
map, changing the scale and resolution to extract different lev-
els of feature information. Numerous object detection models,
such as NAS-FPN [28], EfficientDet [29], YOLOv4 [30], and
YOLOv7 [31], have been developed based on these concepts,
incorporating various improvements and techniques to enhance
accuracy and performance. However, these general models are
designed around modules that serve many common
tasks; they generalize well but do not effectively address
the specific challenges of particular scenarios. For instance, our
task involves both small object detection and
complex backgrounds. Therefore, task-specific
modifications are crucial.
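The role of multiscale feature maps in small object detection can be made concrete with a short calculation. The sketch below shows how many feature-map cells a small object spans at each pyramid level. The 640-pixel input and the strides 8, 16, and 32 are assumptions typical of YOLO-style detectors, and the 16-pixel object width is a hypothetical value; none of these numbers is taken from the paper.

```python
def grid_size(image_size, stride):
    """Side length of the feature map produced at a given stride."""
    return image_size // stride


def cells_covered(object_px, stride):
    """Approximate object width measured in feature-map cells."""
    return object_px / stride


if __name__ == "__main__":
    image_size = 640        # assumed input resolution
    strides = (8, 16, 32)   # assumed pyramid strides
    object_px = 16          # hypothetical small antenna, 16 pixels wide

    for s in strides:
        g = grid_size(image_size, s)
        span = cells_covered(object_px, s)
        print(f"stride {s:2d}: {g}x{g} grid, object spans {span:.1f} cells")
```

Under these assumptions, the object spans two cells at stride 8 but only half a cell at stride 32, which suggests why the high-resolution levels of the neck carry most of the evidence for small targets.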
B. Developing an attention mechanism in the CV domain
Attention mechanisms, initially utilized in natural language
processing, have gained significant traction in computer vi-
sion [32], [33], particularly in the field of object detection.
Attention mechanisms such as channel attention, spatial atten-
tion, and their combinations have been introduced [34] [35]
[36]. They effectively utilize global and local information in
feature maps, improving feature representation and attention
weighting, thereby enhancing model accuracy and efficiency.
However, for these conventional attention mechanisms, a fixed
window size or other constraints are typically employed to
regulate the correlation between each position and others. In
contrast, self-attention mechanisms can extract information
from different positions in the information sequence more
flexibly, enabling the extraction of global information. This
flexibility has contributed to the widespread application of
transformer [37] models based on self-attention mechanisms,
including in various domains such as computer vision. For
example, the Vision Transformer (ViT) [38] splits images
into patches for self-attention computations. The Swin
Transformer [39] improves local information processing by using
a window-based partitioning approach. The DEtection TRansformer
(DETR) [11] adopts a global self-attention mechanism,
allowing each position to obtain contextual information
from the entire image. However, transformers incur substantial
computational costs and long training times, posing challenges for
model convergence. To address these challenges, researchers
have introduced lightweight transformer networks,
including MobileViT [40] and EdgeViT [41]. Moreover,
approaches such as Conditional DETR [42] and
DN-DETR [43] have been developed to address the crucial issue
of slow training convergence. However, due to their
simplified design, the majority of current lightweight transformer
structures are applicable only to classification tasks involving
small-sized image inputs and are not suitable for detection
tasks. The proposed detection methods aimed at addressing
slow convergence have made transformer models more com-
plex. Therefore, achieving a balance between the lightweight
nature of transformer models and detection accuracy remains a
crucial research topic in the current field of computer vision.
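The cost difference between global and window-based self-attention discussed above can be estimated by counting query-key pairs. The following sketch is a rough illustration only: the 640-pixel input, the 16-pixel patch, and the 8-token window side are assumed values chosen so that the arithmetic divides evenly, and the count ignores heads, channels, and shifted windows.

```python
def num_tokens(image_size, patch_size):
    """Number of patch tokens for a square image split into square patches."""
    side = image_size // patch_size
    return side * side


def global_attention_pairs(n_tokens):
    """Query-key pairs when every token attends to every token."""
    return n_tokens * n_tokens


def window_attention_pairs(n_tokens, window_side):
    """Query-key pairs when attention is restricted to non-overlapping windows.

    Assumes the token grid divides evenly into windows.
    """
    tokens_per_window = window_side * window_side
    n_windows = n_tokens // tokens_per_window
    return n_windows * tokens_per_window ** 2


if __name__ == "__main__":
    n = num_tokens(640, 16)                   # assumed image and patch size
    full = global_attention_pairs(n)
    windowed = window_attention_pairs(n, 8)   # assumed 8 x 8 token windows
    print(f"tokens={n}  global pairs={full:,d}  windowed pairs={windowed:,d}")
```

With these assumed values the global variant evaluates 25 times as many pairs as the windowed one. Global attention grows quadratically with the number of tokens, whereas windowed attention grows linearly for a fixed window size, which is the trade-off between global context and cost described in this subsection.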
C. Combination of CNN and Transformer
In object detection, CNNs and transformers have distinct
applications and advantages. CNNs are known for their strong
image feature extraction abilities, ability to perform multi-
channel processing, and ability to learn spatial correlations.
However, CNN-based models have limitations in handling
objects of different sizes and proportions due to fixed window
sizes and strides. On the other hand, transformers exhibit
excellent performance in capturing long-range dependencies
within input sequences without prior knowledge, albeit at a
slower speed and requiring substantial amounts of training
data. Evidently, the amalgamation of CNNs and transform-
ers offers complementarity across various dimensions, and
researchers have already delved into numerous methodologies
to explore this synergy.
The pioneering DETR model replaces fully connected and
convolutional layers with transformers while using ResNet
as the feature extractor, improving accuracy and efficiency.
Huawei’s CMTBlock combines depthwise separable convolu-
tion and the transformer’s multihead self-attention module for
local and global information fusion. The CMT model [44]
stacks the CMTBlock in a hybrid CNN-transformer structure.
The Conformer [45] adopts a dual-network structure, where
the CNN branch enhances local perception of the transformer
branch. Mobile-Former [46] features parallel CNN and
transformer modules with bidirectional bridges, leveraging
MobileNet [47] for local processing and the transformer for
global interaction. However, networks or models employing
such hybrid structures face challenges in effectively balancing
accuracy and lightweight design. For instance, detectors such
as DETR, lacking FPN structures, exhibit suboptimal
performance in small object detection. While the CMT and
Conformer networks have proven effective in classification tasks,
applying them to downstream tasks such as object detection
is no longer lightweight. In contrast
to the aforementioned models, which combine both
structures, an alternative approach makes
transformer-style improvements directly to the CNN. ConvNeXt
[48] implements novel architectures and optimization strate-
gies similar to those of transformers, achieving competitive
results without attention structures. RepLKNet [49] employs
large convolutional kernels to widen the receptive field, thus
emulating the transformer-like capability for global feature
extraction. By investigating the computational principles of
transformers, ACmix [50] maps their operation process onto
convolutional operators, thereby combining them with
traditional convolution operations to construct a novel CNN
architecture. ParC-Net [51] introduces circular convolution
for global information extraction within a pure convolutional
structure. Although these innovative networks may not achieve
SOTA performance, their greater significance lies in exploring
the factors contributing to the success of transformers from
a CNN perspective, providing inspiration for subsequent re-
search endeavors. The fusion of transformers and CNNs offers
a flexible and diverse range of integration methods. Future
research should strive to deepen the understanding of their
interactions to improve design and optimization.
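The claim that large kernels widen the receptive field can be checked with the standard recurrence for the theoretical receptive field of stacked convolutions. The sketch below is a minimal illustration; the layer configurations are hypothetical examples rather than the architecture of any cited network, and the helper name receptive_field is introduced here only for illustration.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of convolutions.

    layers: iterable of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer extends the field by (k - 1) input steps
        jump *= stride                  # striding enlarges the step of later layers
    return rf


if __name__ == "__main__":
    three_small = [(3, 1)] * 3      # three stacked 3 x 3 convolutions
    one_large = [(31, 1)]           # a single 31 x 31 convolution
    fifteen_small = [(3, 1)] * 15   # depth needed to match it with 3 x 3 kernels

    for name, cfg in [("3 x (3x3)", three_small),
                      ("1 x (31x31)", one_large),
                      ("15 x (3x3)", fifteen_small)]:
        print(f"{name:12s} receptive field = {receptive_field(cfg)}")
```

Under this formula, a single 31 x 31 kernel covers the same theoretical field as fifteen stacked 3 x 3 layers at stride 1. The calculation concerns only the theoretical field; how much of it a network uses effectively in practice is a separate question that the formula does not address.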
D. Object Detection of Antenna Interference Sources
Regularly monitoring and mitigating antenna interference
sources has become one of the most critical tasks in the
wireless communication field. In the past, detecting antenna
interference sources mainly relied on traditional techniques
such as spectrum analysis, signal recognition and positioning.
However, these methods have many limitations. For example,
when detection personnel identify a radio interference signal
through a spectrum analyzer, they can determine only the
approximate direction of the interference source based on the
strength of the received signal and cannot accurately determine
its position.
The rapid advancement of deep learning and computer
vision has facilitated the successful application of object
detection-assisted tasks in various industries. Examples in-
clude defect detection in industrial settings, pest/weed de-
tection in agriculture, and vehicle and pedestrian detection
in transportation [52] [53] [54] [55] [56]. These solutions
provide effective ideas for our antenna interference source
detection task. Once investigators have confirmed the approximate
direction of the interference source antenna with a signal
receiver and spectrum analyzer, they can use camera-equipped
drones and object detection algorithms in place of
manual precise localization. Unfortunately, antenna
interference source detection based on object detection
remains largely unexplored. Because
learning samples and models for antenna interference
source detection are lacking, existing detection methods are not suitable
for antenna detection. Therefore, it is urgent and meaningful
to create a dedicated dataset and train a model suited to
this detection task to address the difficulty of locating antenna
interference sources in the wireless communication field.
III. PROPOSED DETECTION FRAMEWORK
A. Overall model structure
The overall idea of the network (Fig. 2) is to combine
a CNN and a transformer, exploiting both the inductive bias
of the convolutional operation and the ability of the
transformer to extract global information, while also meeting
the needs of a lightweight model with low computational
complexity. YOLO-Ant adopts DSLKNet, which is composed
of DSLK-Blocks, as the backbone for downsampling and
feature extraction in images. In DSLKNet, four DSLK-Layers
employ convolutional kernels of varying sizes to sequentially
extract rich features from different receptive fields of the
image. To address the challenge of detecting small objects, we
incorporate the neck structures of the FPN and PAN for
multiscale feature learning. For the neck, we pruned
the YOLOv5-s design (detailed data are provided in Section
IV). In comparison to the baseline model, the
pruned neck features an increased number of module
stacks and a reduced number of channels in each module.
This structural modification effectively alleviates redundancy