AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and
Arbitrary Number of Parameters
Abstract. Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but standard convolutional operations suffer from two inherent flaws. On the one hand, the convolution operation is confined to a local window, cannot capture information from other locations, and its sampled shape is fixed. On the other hand, the size of the convolutional kernel is fixed to k × k, a rigid square, and the number of parameters grows quadratically with the kernel size. It is obvious that the shape and size of targets vary across datasets and locations. Convolutional kernels with fixed sampled shapes and square forms do not adapt well to changing targets. In response to these problems, this work explores Alterable Kernel Convolution (AKConv), which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes, providing richer options for the trade-off between network overhead and performance. In AKConv, we define initial positions for convolutional kernels of arbitrary size by means of a new coordinate generation algorithm. To adapt to target variations, we introduce offsets to adjust the sampled shape at each position. Moreover, we explore the effect on the network of using AKConv with the same size but different initial sampled shapes. AKConv completes efficient feature extraction through irregular convolutional operations and opens up more exploration options for convolutional sampled shapes. Object detection experiments on the representative datasets COCO2017, VOC 7+12, and VisDrone-DET2021 fully demonstrate the advantages of AKConv. AKConv can be used as a plug-and-play operation to replace standard convolutional operations and improve network performance. The code for the relevant tasks can be found at https://github.com/CV-ZhangXin/AKConv.
1 Introduction
Convolutional Neural Networks (CNNs), such as ResNet [1], DenseNet [2], and YOLO [3], have demonstrated excellent performance in various applications and have led technological progress in many aspects of modern society. They have become indispensable, from image recognition in self-driving cars [4] and medical image analysis [5] to intelligent surveillance [6] and personalized recommendation systems [7]. These successful network models rely heavily on convolutional operations, which efficiently extract local features in images while keeping model complexity manageable.
Despite the fact that CNNs have achieved many successes in classification [8], object
detection [9], semantic segmentation [10], etc., they still have some limitations. One of the most
notable limitations concerns the choice of convolutional sampled shape and size. Standard convolution operations rely on square kernels with fixed sampling locations, such as 1 × 1, 3 × 3, 5 × 5, and 7 × 7. These sampling positions are not deformable and cannot change dynamically in response to changes in object shape. Deformable Conv [11, 12] enhances network performance by using offsets to flexibly adjust the sampled shape of the convolution kernel, adapting it to changes in the target. For instance, the works in [13, 14, 15] utilized it to align features. Zhao et al. [16] improved the effectiveness of dead-fish detection by adding it to YOLOv4 [17]. Yang et al. [18] improved YOLOv8 [19] for cattle detection by adding it to the backbone. Li et al. [20] introduced Deformable Conv into deep image compression tasks [21, 22] to obtain content-adaptive receptive fields.
Although the studies mentioned above have demonstrated the superior benefits of Deformable Conv, it is still not flexible enough: the convolution kernel remains limited to a choice among regular kernel sizes, and the number of kernel parameters in both standard convolution and Deformable Conv grows quadratically with kernel size, which is not a hardware-friendly growth pattern. Therefore, after careful analysis of standard convolution and Deformable Conv, we propose Alterable Kernel Convolution (AKConv). Unlike standard regular convolution, AKConv is a novel convolutional operation that can extract features using efficient convolution kernels with any number of parameters (1, 2, 3, 4, 5, 6, 7, ...), which neither standard convolution nor Deformable Conv implements. AKConv can easily replace the standard convolutional operations in a network to improve performance. Importantly, AKConv allows the number of convolutional parameters to scale linearly up or down, which is friendly to hardware environments: it can serve lightweight models as a way to reduce parameter count and computational overhead, and it offers more options for improving performance with large kernels when resources are sufficient. Fig. 1 shows that the parameter count of a regular convolutional kernel grows quadratically with kernel size, while AKConv grows only linearly; for example, with C_in input and C_out output channels, a regular k × k kernel uses C_in · C_out · k² weights, whereas an AKConv kernel with N sampled points uses C_in · C_out · N. Compared to quadratic growth, AKConv grows gently and provides more options for the choice of convolution kernel. Furthermore, its idea can be extended to specific domains: special sampled shapes can be created for convolution operations according to prior knowledge and then adapted dynamically and automatically to changes in target shape via offsets. Object detection experiments on the representative datasets VOC [23], COCO2017 [24], and VisDrone-DET2021 [25] fully demonstrate the advantages of AKConv. In summary, our contributions are as follows:
1. For convolutional kernels of different sizes, we propose an algorithm that generates initial sampled coordinates for kernels of arbitrary size.
2. To adapt to variations of the target, we adjust the sampling positions of the irregular convolutional kernel using the learned offsets.
3. Compared to regular convolution kernels, the proposed AKConv realizes feature extraction with irregular convolution kernels, providing kernels with arbitrary sampled shapes and sizes for a variety of changing targets and making up for the shortcomings of regular convolution.
2 Related works
In recent years, many works have considered and analyzed standard convolutional
operations from different perspectives, and then designed novel convolutional operations to
improve network performance.
Li et al. [26] argued that sharing convolutional kernel parameters across all spatial locations limits the modeling capability at different spatial positions and fails to capture spatially long-range relationships; moreover, using a different convolution kernel for each output channel is not actually efficient. To address these shortcomings, they proposed the Involution operator, which inverts the design characteristics of the convolution operation to improve network performance. Qi et al. [27] proposed DSConv based on Deformable Conv: the offsets learned in Deformable Conv are unconstrained, which can cause the model to lose a small fraction of fine structural features and poses a great challenge for segmenting elongated tubular structures, and DSConv is designed to address this. Zhang et al. [28] understood the spatial attention mechanism from a new perspective, asserting that it essentially solves the parameter-sharing problem of convolutional operations. However, some spatial attention mechanisms, such as CBAM [29] and CA [30], do not completely solve the problem of parameter sharing for large-size convolutions; therefore, they proposed RFAConv. Chen et al. [31] proposed Dynamic Conv. Unlike using a single convolutional kernel per layer, Dynamic Conv dynamically aggregates multiple parallel convolution kernels weighted by attention, providing greater representational power. Tan et al. [32] argued that kernel size is often neglected in CNNs, which may affect the accuracy and efficiency of the network, and that using depthwise convolution alone does not exploit the full potential of convolutional networks. Therefore, they proposed MixConv, which naturally mixes multiple kernel sizes in a single convolution to improve network performance.
Although these methods improve the performance of convolutional operations, they are still limited to regular convolutional operations and do not allow multiple variations of the convolutional sampled shape. In contrast, our proposed AKConv can efficiently extract features using a convolutional kernel with an arbitrary number of parameters and arbitrary sampled shapes.
3 Methods
3.1 Define the initial sampling position
Convolutional neural networks are built on the convolution operation, which localizes features at the corresponding positions by means of a regular sampling grid. In [11, 33, 34], the regular sampling grid for the 3 × 3 convolution operation is given. Let R denote the sampling grid; then R is written as

R = {(−1, −1), (−1, 0), (−1, 1), (0, −1), (0, 0), (0, 1), (1, −1), (1, 0), (1, 1)}.
However, this sampling grid is regular, while AKConv targets irregularly shaped convolutional kernels. Therefore, to give irregular convolutional kernels a sampling grid, we design an algorithm for convolutions of arbitrary size that generates the initial sampled coordinates Pn of the convolutional kernel. First, we generate a regular sampling grid, then create an irregular grid for the remaining sampled points, and finally stitch the two to form the overall sampling grid. The pseudo code is given in Algorithm 1.
Fig. 2 shows the initial sampled coordinates generated for convolutions of arbitrary size. The sampling grid of a regular convolution is centered at the (0, 0) point, but an irregular convolution has no center for many sizes; therefore, to adapt to the kernel size in use, the algorithm sets the top-left (0, 0) point as the sampling origin.
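Algorithm 1 itself is not reproduced in this copy, so below is a minimal Python sketch of the coordinate generation described above, assuming the top-left (0, 0) origin; the function name generate_initial_coords and the square-root split between the regular block and the stitched-on remainder row are our assumptions, not the authors' exact pseudo code.

import math
import torch

def generate_initial_coords(num_param: int) -> torch.Tensor:
    # Generate initial sampled coordinates Pn for a kernel with num_param
    # sampled points; the origin is the top-left (0, 0) point.
    # Returns a (num_param, 2) tensor of (row, col) positions.
    base = round(math.sqrt(num_param))          # width of the regular block
    rows, rem = divmod(num_param, base)         # full rows, leftover points
    coords = [(r, c) for r in range(rows) for c in range(base)]
    coords += [(rows, c) for c in range(rem)]   # irregular remainder, stitched on
    return torch.tensor(coords, dtype=torch.float32)

# A 5-parameter kernel: a 2 × 2 regular block plus one extra point.
print(generate_initial_coords(5))
# points: (0, 0), (0, 1), (1, 0), (1, 1), (2, 0)

For sizes such as 5, 7, or 13 this yields a non-square sampling set, which is exactly the case a regular square grid cannot express.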
After defining the initial coordinates Pn for the irregular convolution, the corresponding convolution operation at position P0 can be defined as

Conv(P0) = Σ_{Pn ∈ R} w(Pn) · x(P0 + Pn),

where w denotes the convolutional parameters and x denotes the input feature map. However, such irregular convolution operations cannot be realized directly, because irregular sampled coordinates cannot be matched to convolution operations of the corresponding size, e.g., convolutions of sizes 5, 7, and 13. Cleverly, our proposed AKConv realizes this.
3.2 Alterable convolutional operation
It is obvious that the standard convolutional sampling positions are fixed, so the convolution can only extract local information from the current window and cannot capture information from other positions. Deformable Conv learns offsets through convolutional operations to adjust the sampling grid of the initial regular pattern, which compensates for the shortcomings of the convolution operation to a certain extent. However, standard convolution and Deformable Conv both use regular sampling grids that do not allow kernels with an arbitrary number of parameters. Moreover, as the kernel size increases, their number of parameters tends to grow quadratically, which is not friendly to the hardware environment. Therefore, we propose the novel Alterable Kernel Convolution (AKConv). Fig. 3 illustrates the overall structure of an AKConv of size 5.
Similar to Deformable Conv, in AKConv the offsets for the corresponding kernel are first obtained by a convolutional operation; they have dimensions (B, 2N, H, W), where N is the size (number of sampled points) of the convolution kernel. Taking Fig. 3 as an example, N = 5. The modified coordinates are then obtained by adding the learned offsets to the initial sampled coordinates.
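The remainder of this section is truncated in this copy. As a rough illustration of the step just described, the following PyTorch sketch predicts offsets with a convolution, shifts the initial sampled coordinates, resamples the features, and aggregates the N samples per position; the class name AKConvSketch, the nearest-neighbor resampling (the paper interpolates fractional coordinates), and the (N, 1) strided aggregation are our assumptions, not the authors' exact implementation.

import math
import torch
import torch.nn as nn

class AKConvSketch(nn.Module):
    # Minimal sketch of an AKConv-style layer; not the authors' exact code.
    def __init__(self, in_ch, out_ch, num_param):
        super().__init__()
        self.num_param = num_param
        # Offset branch: predicts 2N offsets per position, shape (B, 2N, H, W).
        self.p_conv = nn.Conv2d(in_ch, 2 * num_param, kernel_size=3, padding=1)
        # The N samples per position are stacked along height afterwards,
        # so an (N, 1) kernel with stride (N, 1) aggregates them per position.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=(num_param, 1),
                              stride=(num_param, 1))
        # Initial sampled coordinates Pn (same generator as the Sec. 3.1 sketch).
        base = round(math.sqrt(num_param))
        rows, rem = divmod(num_param, base)
        pts = [(r, c) for r in range(rows) for c in range(base)]
        pts += [(rows, c) for c in range(rem)]
        self.register_buffer("p_n", torch.tensor(pts, dtype=torch.float32))

    def forward(self, x):
        B, C, H, W = x.shape
        N = self.num_param
        offset = self.p_conv(x).view(B, N, 2, H, W)
        gy, gx = torch.meshgrid(torch.arange(H, device=x.device),
                                torch.arange(W, device=x.device), indexing="ij")
        base = torch.stack((gy, gx), dim=0).float()             # (2, H, W)
        # Sampling position = base grid + initial coordinate + learned offset.
        p = base + self.p_n.view(1, N, 2, 1, 1) + offset        # (B, N, 2, H, W)
        # Nearest-neighbor resampling for brevity; the paper interpolates.
        py = p[:, :, 0].round().long().clamp(0, H - 1)          # (B, N, H, W)
        px = p[:, :, 1].round().long().clamp(0, W - 1)
        idx = (py * W + px).view(B, 1, N * H * W).expand(B, C, N * H * W)
        sampled = x.flatten(2).gather(2, idx).view(B, C, N, H, W)
        # Stack the N samples along height, then reduce with the (N, 1) conv.
        sampled = sampled.permute(0, 1, 3, 2, 4).reshape(B, C, H * N, W)
        return self.conv(sampled)                               # (B, out_ch, H, W)

For example, AKConvSketch(64, 128, num_param=5) predicts 2 × 5 = 10 offset channels per position, matching the size-5 example of Fig. 3, and returns a feature map with the same spatial size as its input.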