IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 14, NO. 8, AUGUST 2021

YOLO-Ant: A Lightweight Detector via Depthwise Separable Convolutional and Large Kernel Design for Antenna Interference Source Detection

Xiaoyu Tang, Member, IEEE, Xingming Chen, Jintao Cheng, Jin Wu, Member, IEEE, Rui Fan, Senior Member, IEEE, Chengxi Zhang, Member, IEEE, Zebo Zhou

Abstract—In the era of 5G communication, removing interference sources that affect communication is a resource-intensive task. The rapid development of computer vision has enabled unmanned aerial vehicles to perform various high-altitude detection tasks. Because the field of object detection for antenna interference sources has not been fully explored, this industry lacks dedicated learning samples and detection models for this specific task. In this article, an antenna dataset is created to address important antenna interference source detection issues and serves as the basis for subsequent research. We introduce YOLO-Ant, a lightweight CNN and transformer hybrid detector specifically designed for antenna interference source detection. Specifically, we initially formulated a lightweight design for the network depth and width, ensuring that subsequent investigations were conducted within a lightweight framework. Then, we propose a DSLK-Block module based on depthwise separable convolution and large convolution kernels to enhance the network’s feature extraction ability, effectively improving small object detection. To address challenges such as complex backgrounds and large interclass differences in antenna detection, we construct DSLKVit-Block, a powerful feature extraction module that combines DSLK-Block and transformer structures. Considering both its lightweight design and accuracy, our method not only achieves optimal performance on the antenna dataset but also yields competitive results on public datasets.

Index Terms—YOLO-Ant, Antenna interference source detection, Small object detection, Lightweight, CNN-transformer fusion.

This research was supported by the National Natural Science Foundation of China under Grant 62001173, the Project of Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (“Climbing Program” Special Funds) under Grant pdjh2022a0131 and pdjh2023b0141, the National Natural Science Foundation of China under Grant 42074038, the Fundamental Research Funds for the Central Universities and Xiaomi Young Talents Program.

Corresponding author: Xiaoyu Tang. E-mail address: tangxy@scnu.edu.cn.

Xiaoyu Tang, Xingming Chen and Jintao Cheng are with the School of Electronic and Information Engineering, Faculty of Engineering, South China Normal University, Foshan, Guangdong 528225, China, and also with the School of Physics, South China Normal University, Guangzhou, Guangdong 510000, China.

Jin Wu is with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong.

Rui Fan is with the College of Electronics & Information Engineering, Shanghai Research Institute for Intelligent Autonomous Systems, the State Key Laboratory of Intelligent Autonomous Systems, and Frontiers Science Center for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China (e-mail: rui.fan@ieee.org).

Chengxi Zhang is with the School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, China.

Zebo Zhou is with the School of Aeronautics & Astronautics, University of Electronic Science and Technology of China, and the Aircraft Swarm Intelligent Sensing and Cooperative Control Key Laboratory of Sichuan Province, Chengdu 610097, China, and also with the National Laboratory on Adaptive Optics, Chengdu 610209, China.

I. INTRODUCTION

To ensure high-quality communication in people’s work and daily lives, various wireless devices operate in different frequency bands. 5G communication is of particular note due to its introduction of new frequency bands into everyday communication. However, due to the presence of numerous private wireless signals that have not undergone spectrum allocation by communication regulatory authorities, the 5G communication network has accumulated a considerable number of sources of interference. If individuals operate in the same geographical areas and occupy similar or adjacent frequency bands as these interference signals in their everyday communication, communication quality deteriorates significantly, as shown in Fig. 1. Regular remediation of radio interference sources is vital for communication departments to alleviate this situation. Identifying signal interference sources requires monitoring personnel to visually inspect areas where communication quality is compromised for suspicious antennas mounted at height, which is a time-consuming and labor-intensive task. In light of the mature advancements in unmanned aerial vehicle (UAV) cruising technology and object detection techniques within computer vision, drones have become viable alternatives for handling complex and challenging detection tasks previously performed by humans. For example, [1] [2] [3] noted that object detection tasks in deep learning combined with UAVs have been useful in production and other areas. The success of these approaches has demonstrated the feasibility of utilizing UAVs for object detection tasks aimed at interference source antennas. However, because this detection task is still at a nascent stage within the current domain of object detection, the creation of a suitable antenna dataset and the exploration of appropriate object detection methodologies are of paramount importance.

arXiv:2402.12641v1 [cs.CV] 20 Feb 2024

Convolutional architectures are the basis for most object detection frameworks in industrial scenarios and rely on the development of efficient convolutional neural networks in deep learning. When addressing various tasks and technical challenges, corresponding enhancements to these architectures are necessary. The antenna interference source object detection task presents three main challenges. The first is that the detection model must be lightweight and have low computational complexity so that it can be deployed on lightweight computing devices, enabling real-time object detection via UAVs. Previous research, exemplified by GhostNet [4] and EfficientNet [5], has focused on designing lightweight networks as potential backbones for different detection models to achieve an overall lightweight solution. However, these networks are susceptible to losing feature information. The second difficulty in the antenna interference source object detection task lies in the differences arising from the different angles and heights from which the UAV captures the antennas. These variations result in a nonuniform distribution of target sizes within the images, with most targets being extremely small. Additionally, there is a significant interclass dissimilarity issue, wherein antennas of the same type exhibit markedly different morphologies in different images. To address these issues, researchers have explored two aspects: multiscale feature learning and attention mechanisms [6] [7] [8] [9]. However, despite improvements in small object detection accuracy, these methods encounter challenges related to the model’s weak generalizability and robustness. As a result, the overall detection accuracy for the targets is compromised. Moreover, the computational complexity of such models is higher. The third difficulty is complex target backgrounds, which cause serious false and missed detections. Given that antennas are commonly installed on tall buildings or fenced balconies in practical scenarios, the resulting complex and mutually obscuring environment between the target and the background significantly hinders detection. Researchers have suggested using attention or self-attention mechanisms to address this difficulty. In [10] [11] [12], a squeeze-and-excitation (SE) attention module was proposed, or a self-attention structure was used to build the whole network for object detection. The advantage of these models lies in their ability to effectively capture the spatial relationship between the target and the background. This capability significantly enhances object detection performance on complex backgrounds. However, these mechanisms tend to consume considerable computational resources and memory, which is not consistent with the original lightweight design intention. Additionally, networks built solely on self-attention mechanisms also suffer from long training times and poor detection accuracy for small objects.
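To make the squeeze-and-excitation idea mentioned above concrete, the following is a minimal pure-Python sketch of channel attention in the SE style: global average pooling (squeeze), two fully connected layers with ReLU and sigmoid (excitation), and per-channel rescaling. The function name `se_attention`, the omission of biases, and the toy weights are illustrative assumptions; this is not the implementation used in the cited works or in YOLO-Ant.

```python
import math


def se_attention(feature_map, w1, w2):
    """Apply SE-style channel attention (illustrative sketch, biases omitted).

    feature_map: list of C channels, each an H x W nested list of floats.
    w1: hidden x C weight matrix of the first fully connected layer.
    w2: C x hidden weight matrix of the second fully connected layer.
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


if __name__ == "__main__":
    # Two 2x2 channels and hypothetical weights with one hidden unit.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    scaled, gates = se_attention(fmap, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
    print(gates)
```

The gates depend only on channel-wise averages, so the module adds few parameters, but it reweights whole channels and does not by itself model relationships between spatial positions.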

Fig. 1: The process of 5G communication in the CBN-U-H5H-0713 area is shown in the figure. Two antenna interference source signals appear in it. The gNB (gNodeB) denotes a 5G base station. The UE (User Equipment) denotes the terminal equipment that users use to access the wireless network.

In response to the aforementioned limitations, we propose YOLO-Ant, a lightweight one-stage detector designed for detecting antenna interference sources with small targets and complex backgrounds. Initially, we analyze the scale and number of channels in each feature layer of the model; subsequently, we design the network’s width and depth to ensure that the entire detection process is performed within a lightweight framework. Our design considerations aim to balance detection accuracy with the reduction of model parameters and computational complexity. To address the issues of small target size and large interclass variation, we implement an efficient feature extraction module based on depthwise separable convolution, DSLK-Block, which is applied to each feature layer in the model. This method effectively enhances the network’s feature learning and fusion capabilities, leading to a significant improvement in detection accuracy for all types of targets, particularly small targets. Additionally, this approach contributes to reducing the model’s overall weight. Finally, to address the problem of complex backgrounds, YOLO-Ant uses an innovative CNN and transformer hybrid structure in the neck of the model. This process enables us to fully utilize both local and global feature learning to address the challenges posed by complex backgrounds while still accounting for small object detection. This approach significantly improves all the detection accuracy indicators while only slightly increasing the model’s number of parameters and computational complexity. To demonstrate the model’s generalizability and robustness, we also tested YOLO-Ant on public datasets and achieved highly competitive results. In conclusion, the main contributions of this paper are as follows:

1) In response to the lack of learning samples for antenna object detection schemes, we conducted image acquisition and manual annotation of the three most common types of antennas encountered in real-world interference source investigation tasks. This dataset is pioneering and establishes the foundation for subsequent work.

2) We initially pruned YOLOv5-s [13], obtaining a lightweight detection framework. Within this framework, (i) a lightweight plug-and-play module based on depthwise separable convolution combined with large convolutional kernels was proposed to effectively improve the feature extraction and detection capabilities of the network for small targets; (ii) the innovative use of a transformer module to construct the neck structure of the detection model improved the detection capability of the network without increasing the model’s parameter count or computational complexity, effectively solving the problem of dealing with complex backgrounds.

3) Our proposed method achieves state-of-the-art (SOTA) performance on the antenna dataset, striking a balance between lightweight design and detection accuracy. Moreover, YOLO-Ant yields competitive results on public datasets such as COCO, validating its robustness and superior performance. Source code is released at https://github.com/SCNU-RISLAB/YOLO-Ant.
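The reason depthwise separable convolution pairs well with large kernels can be seen from a simple weight count. The sketch below compares a standard k x k convolution with a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The channel count (128) and kernel sizes (3, 7, 13) are arbitrary assumptions chosen for illustration and are not taken from the DSLK-Block configuration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    channels = 128  # assumed channel count, for illustration only
    for k in (3, 7, 13):
        std = conv_params(channels, channels, k)
        dws = depthwise_separable_params(channels, channels, k)
        print(f"k={k:2d}  standard={std:8d}  separable={dws:6d}  ratio={std / dws:.1f}x")
```

Under these assumptions, a 13 x 13 depthwise separable layer needs 38,016 weights, roughly a quarter of the 147,456 weights of a standard 3 x 3 layer, because the kernel area only multiplies the per-channel depthwise term.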

The remainder of this paper is structured as follows. Section II presents related work, briefly introducing the improvement points of the model proposed in this paper and the related work involved, including CNN development, the emergence of the transformer detection model and the crossover development between the CNN and transformer. Section III discusses the proposed lightweight detection framework based on the YOLOv5-s improvement, including pruning the baseline model, the design of the DSLK-Block module, and the neck structure built based on DSLKVit-Block. The experimental results are given in Section IV, and Section V concludes the paper.

II. RELATED WORK

A. CNN (convolutional neural network)-based object detection

The development of object detection in the computer vision field has been greatly influenced by CNN-based methods. Traditional approaches using hand-designed features and classifiers have been shown to be inadequate, leading to the dominance of CNN-based methods. The initial CNN model, LeNet-5 [14], was limited by computational resources and model size. However, with advancements in computational power and larger datasets, deeper and more complex CNN models, such as AlexNet [15], VGGNet [16], GoogLeNet [17], and ResNet [18], have emerged. These models have improved network accuracy, reduced parameters, and addressed network degradation issues, laying a solid foundation for 2D object detection.

Two distinct methods have emerged from the development of 2D object detection: two-stage and one-stage detectors. Two-stage detectors, such as R-CNN [19] and Fast R-CNN [20], generate candidate frames using external proposal algorithms such as selective search and perform classification and regression on each candidate frame. Faster R-CNN [21] introduces the region proposal network (RPN) for candidate frame generation. In contrast, one-stage detectors, such as YOLO [22] and SSD [23], perform classification and regression directly on each location in the input image. YOLOv2 [24] and YOLOv3 [25] improved detection accuracy through methods such as multiscale prediction, batch normalization, and feature pyramid networks (FPNs). SSD introduces multiscale detection using multiple-scale feature maps, while RetinaNet [26] focuses on addressing the category imbalance problem. Among the aforementioned models, one-stage detectors are more suitable for real-time detection tasks on UAVs than two-stage detectors are because they do not require additional networks or algorithms for fine-tuning. However, to compensate for the deficiency in accuracy resulting from the pursuit of detection speed, improvements need to be made to the backbone and neck of the one-stage detector by developing various efficient feature extraction modules or structures. The backbone and neck are the basic components of object detection models. The backbone is a CNN trained on image classification datasets such as ImageNet [27]; it transforms the input image into a high-dimensional feature representation. The neck module further processes the feature map, changing the scale and resolution to extract different levels of feature information. Numerous object detection models, such as NAS-FPN [28], EfficientDet [29], YOLOv4 [30], and YOLOv7 [31], have been developed based on these concepts, incorporating various improvements and techniques to enhance accuracy and performance. However, these general models are often designed with modules that consider various common tasks, exhibiting generalizability but not effectively addressing specific challenges in particular scenarios. For instance, there are several challenges, such as small object detection and complex backgrounds, in our task. Therefore, making task-specific modifications is crucial when contemplating different tasks.
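As a rough illustration of what "classification and regression directly on each location" means for a multiscale one-stage detector, the sketch below counts the candidate predictions produced per image. The strides (8, 16, 32) and three anchors per grid cell are assumptions typical of YOLOv3/YOLOv5-style heads; they are not stated in this section and may differ from the YOLO-Ant configuration.

```python
def grid_predictions(image_size, strides=(8, 16, 32), anchors_per_cell=3):
    """Count candidate boxes of an anchor-based, grid-style one-stage detector.

    image_size: side length of a square input image in pixels.
    strides: downsampling factor of each detection scale (assumed values).
    anchors_per_cell: anchor boxes predicted at every grid cell (assumed value).
    Returns a list of (stride, grid_side, predictions) and the overall total.
    """
    per_scale = []
    total = 0
    for stride in strides:
        grid_side = image_size // stride          # cells along one side
        predictions = grid_side * grid_side * anchors_per_cell
        per_scale.append((stride, grid_side, predictions))
        total += predictions
    return per_scale, total


if __name__ == "__main__":
    per_scale, total = grid_predictions(640)
    for stride, grid_side, predictions in per_scale:
        print(f"stride {stride:2d}: {grid_side} x {grid_side} grid, {predictions} boxes")
    print("total candidate boxes:", total)
```

With a 640 x 640 input under these assumptions, the stride-8 scale alone contributes 19,200 of the 25,200 candidates. This is why the high-resolution feature level, and the neck that feeds it, matters most for small targets.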

B. Developing an attention mechanism in the CV domain

Attention mechanisms, initially utilized in natural language processing, have gained significant traction in computer vision [32], [33], particularly in the field of object detection. Attention mechanisms such as channel attention, spatial attention, and their combinations have been introduced [34] [35] [36]. They effectively utilize global and local information in feature maps, improving feature representation and attention weighting, thereby enhancing model accuracy and efficiency. However, for these conventional attention mechanisms, a fixed window size or other constraints are typically employed to regulate the correlation between each position and others. In contrast, self-attention mechanisms can extract information from different positions in the information sequence more flexibly, enabling the extraction of global information. This flexibility has contributed to the widespread application of transformer [37] models based on self-attention mechanisms in various domains, including computer vision. For example, the Vision Transformer (ViT) [38] splits images into patches for self-attention computations. The Swin Transformer [39] improves local information processing by using a window-based partitioning approach. Detection with transformers (DETR) [11] adopts a global self-attention mechanism, allowing each position to obtain contextual information from the entire image. Naturally, transformers incur substantial computational costs and training time, posing challenges for model convergence. To address these challenges, researchers have introduced lightweight transformer object detectors, including MobileViT [40] and EdgeViT [41]. Moreover, innovative approaches such as conditional DETR [42] and DN-DETR [43] have been developed to address the crucial issue of slow training convergence. However, due to their simplified design, the majority of current lightweight transformer structures are applicable only to classification tasks involving small-sized image inputs and are not suitable for detection tasks. The proposed detection methods aimed at addressing slow convergence have made transformer models more complex. Therefore, achieving a balance between the lightweight nature of transformer models and detection accuracy remains a crucial research topic in the current field of computer vision.
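The cost difference between global self-attention over patches and window-based attention can be estimated by counting query-key pairs. The sketch below is a back-of-the-envelope calculation only. The helper names are hypothetical, and the settings (224 x 224 input, 16-pixel patches for the global case, 4-pixel patches with 7 x 7 token windows for the windowed case) are assumptions modeled on common ViT and Swin configurations rather than values from this paper.

```python
def num_tokens(height, width, patch):
    """Number of patch tokens when an image is split into patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def global_attention_pairs(n_tokens):
    """Query-key pairs when every token attends to every token."""
    return n_tokens * n_tokens


def windowed_attention_pairs(n_tokens, window_tokens):
    """Query-key pairs when each token attends only within its own window."""
    return n_tokens * window_tokens


if __name__ == "__main__":
    coarse = num_tokens(224, 224, patch=16)   # ViT-like patching (assumed)
    fine = num_tokens(224, 224, patch=4)      # Swin-like patching (assumed)
    print("coarse tokens, global pairs:", coarse, global_attention_pairs(coarse))
    print("fine tokens, global pairs:  ", fine, global_attention_pairs(fine))
    print("fine tokens, 7x7 windows:   ", fine, windowed_attention_pairs(fine, 7 * 7))
```

Global attention grows quadratically with the token count, whereas windowed attention grows linearly. Finer patches, which help small objects, are therefore affordable only when attention is restricted or combined with convolution.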

C. Combination of CNN and Transformer

In object detection, CNNs and transformers have distinct applications and advantages. CNNs are known for their strong image feature extraction abilities, ability to perform multichannel processing, and ability to learn spatial correlations. However, CNN-based models have limitations in handling objects of different sizes and proportions due to fixed window sizes and strides. On the other hand, transformers exhibit excellent performance in capturing long-range dependencies within input sequences without prior knowledge, albeit at a slower speed and requiring substantial amounts of training data. Evidently, the amalgamation of CNNs and transformers offers complementarity across various dimensions, and researchers have already delved into numerous methodologies to explore this synergy.

The pioneering DETR model replaces fully connected and convolutional layers with transformers while using ResNet as the feature extractor, improving accuracy and efficiency. Huawei’s CMTBlock combines depthwise separable convolution and the transformer’s multihead self-attention module for local and global information fusion. The CMT model [44] stacks the CMTBlock in a hybrid CNN-transformer structure. The Conformer [45] adopts a dual-network structure, where the CNN branch enhances local perception of the transformer branch. Mobile-Former [46] features parallel CNN and transformer modules with bidirectional bridges, leveraging MobileNet [47] for local processing and the transformer for global interaction. However, networks or models employing such hybrid structures face challenges in effectively balancing accuracy and lightweight design. For instance, detectors such as DETR, lacking FPN structures, exhibit suboptimal performance in small object detection. While the CMT and Conformer networks have proven effective in classification tasks, their application to downstream tasks such as object detection deviates from the realm of lightweight design. In contrast to the aforementioned models, which concatenate both structures, an alternative approach involves making transformer-style improvements directly on the CNN. ConvNeXt [48] implements novel architectures and optimization strategies similar to those of transformers, achieving competitive results without attention structures. RepLKNet [49] employs large convolutional kernels to widen the receptive field, thus emulating the transformer-like capability for global feature extraction. By investigating the computational principles of transformers, ACMix [50] maps their operation process onto convolutional operators, thereby combining them with traditional convolution operations to construct a novel CNN architecture. ParC-Net [51] introduces circular convolution for global information extraction within a pure convolutional structure. Although these innovative networks may not achieve SOTA performance, their greater significance lies in exploring the factors contributing to the success of transformers from a CNN perspective, providing inspiration for subsequent research endeavors. The fusion of transformers and CNNs offers a flexible and diverse range of integration methods. Future research should strive to deepen the understanding of their interactions to improve design and optimization.
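The claim that large kernels widen the receptive field can be checked with the standard recurrence for the theoretical receptive field of a stack of convolutions. The sketch below is illustrative: the layer stacks are hypothetical examples rather than the RepLKNet or YOLO-Ant architectures, and the theoretical receptive field is only an upper bound on the region that effectively influences an output.

```python
def receptive_field(layers):
    """Theoretical receptive field of stacked convolutions, in input pixels.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1     # a single output position initially sees one input pixel
    jump = 1   # distance in input pixels between adjacent positions at this depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    print("one 3x3 layer:      ", receptive_field([(3, 1)]))
    print("fifteen 3x3 layers: ", receptive_field([(3, 1)] * 15))
    print("one 31x31 layer:    ", receptive_field([(31, 1)]))
```

In this calculation, a single stride-1 31 x 31 layer covers the same theoretical extent as fifteen stacked stride-1 3 x 3 layers. A shallow network with large depthwise kernels can therefore reach wide context without the depth that small kernels would require.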

D. Object Detection of Antenna Interference Sources

Regularly monitoring and mitigating antenna interference sources has become one of the most critical tasks in the wireless communication field. In the past, detecting antenna interference sources mainly relied on traditional techniques such as spectrum analysis, signal recognition and positioning. However, these methods have many limitations. For example, when detection personnel identify a radio interference signal through a spectrum analyzer, they can determine only the approximate direction of the interference source based on the strength of the received signal and cannot accurately determine its position.

The rapid advancement of deep learning and computer vision has facilitated the successful application of object detection-assisted tasks in various industries. Examples include defect detection in industrial settings, pest/weed detection in agriculture, and vehicle and pedestrian detection in transportation [52] [53] [54] [55] [56]. These solutions provide effective ideas for our antenna interference source detection task. When investigators confirm the approximate direction of the interference source antenna through a signal receiver and spectrum analyzer, they can use drones with cameras and related object detection algorithms to replace manual precise localization. Unfortunately, the field of antenna interference source detection based on object detection has largely not been explored. Due to the lack of learning samples and models for related antenna interference source detection, existing detection methods are not suitable for antenna detection. Therefore, it is urgent and meaningful to create a professional dataset and train a model suitable for this detection task to address the difficulty of locating antenna interference sources in the wireless communication field.

III. PROPOSED DETECTION FRAMEWORK

A. Overall model structure

The overall idea for the network (Fig. 2) lies in combining a CNN and a transformer, exploiting both the inductive bias of the convolutional operation and the ability of the transformer to extract global information while also meeting the need for a lightweight model with low computational complexity. YOLO-Ant adopts DSLKNet, which is composed of DSLK-Blocks, as the backbone for downsampling and feature extraction in images. In DSLKNet, four DSLK-Layers employ convolutional kernels of varying sizes to sequentially extract rich features from different receptive fields of the image. To address the challenge of detecting small objects, we incorporate the neck structures of the FPN and PAN for multiscale feature learning. On the neck component, we conducted pruning based on YOLOv5-s (detailed data are provided in Section IV, Experiment). In comparison to the baseline model, the pruned neck model features an increased number of module stacks and a reduced number of channels in each module. This structural modification effectively alleviates redundancy
