RESEARCH PAPER Open Access
A multi-scale learning network with depthwise separable convolutions
Gaihua Wang¹,², Guoliang Yuan²*, Tao Li² and Meng Lv²
Abstract
We present a simple multi-scale learning network for image classification that is inspired by MobileNet.
The proposed method has two advantages: (1) It uses a multi-scale block with depthwise separable convolutions,
which forms multiple sub-networks by increasing the width of the network while keeping the computational cost
constant. (2) It combines the multi-scale block with residual connections, which significantly accelerates the
training of the network. Experimental results show that the proposed method performs strongly compared with
other popular models on different datasets.
Keywords: Multi-scale, MobileNet, Residual connections, Image classification
1 Introduction
Convolutional neural networks (CNNs) have been studied
since the late 1970s [1], and the first successful applica-
tion was LeNet [2]. In CNNs, weights are shared across
the network, and pooling layers perform spatial or temporal
subsampling with an invariant function [3, 4]. In 2012,
AlexNet [5] used rectified linear units (ReLU) instead of
the hyperbolic tangent as the activation function and
added dropout [6] to reduce the effect of overfitting. In
subsequent years, further progress has been made by
using deeper architectures [7–10].
Regarding multi-scale learning networks, in 2015, He et al.
[11] proposed the ResNet architecture, which consists of many
stacked "residual units." Szegedy et al. [12] proposed an
inception module that concatenates the outputs of filters of
several sizes (1 × 1, 3 × 3, and 5 × 5) into a single output
vector. In 2017, Xie et al. [13] proposed the ResNeXt structure,
a multi-scale network that uses "cardinality" as an essential
factor. Multi-scale architectures have also been successfully
employed in detection [14] and feature selection [15]. However,
a bigger network means a larger number of parameters, which
makes it prone to overfitting, and a larger network also demands
more computational resources. Efficient network architectures
[16, 17] have therefore been proposed to build smaller,
lower-latency models. MobileNet [18] is built primarily from
depthwise separable convolutions for mobile and embedded
vision applications. ShuffleNet [19] uses pointwise group
convolution and channel shuffle to greatly reduce the
computational cost while maintaining accuracy.
Motivated by the analysis above, in this paper, we use a
multi-scale structure to construct a convolutional module;
filters of different sizes make the network more robust to
scale changes. We then modify the convolutional module with
depthwise separable convolutions and combine residual
connections into the network (a minimal sketch of such a
block is given below). Experimental results show that the
proposed method achieves better performance with fewer
parameters on different benchmark datasets for image
classification.
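To make the idea concrete, the following is a minimal, illustrative PyTorch sketch of a multi-scale block built from depthwise separable convolutions with a residual connection. It is not the paper's exact architecture (which is detailed in Section 3); the names `MultiScaleBlock` and `dw_separable`, the kernel sizes (3, 5, 7), and the channel layout are our assumptions for exposition.

```python
# Illustrative sketch only: a multi-scale block built from depthwise
# separable convolutions with a residual connection. Kernel sizes,
# channel splits, and names are assumptions, not the paper's design.
import torch
import torch.nn as nn

def dw_separable(channels, kernel_size):
    """Depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size,
                  padding=kernel_size // 2, groups=channels, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

class MultiScaleBlock(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Parallel sub-networks at different scales widen the block
        # without increasing its depth.
        self.branches = nn.ModuleList(
            dw_separable(channels, k) for k in kernel_sizes)
        # 1 x 1 convolution fuses the concatenated branch outputs.
        self.fuse = nn.Conv2d(len(kernel_sizes) * channels, channels, 1,
                              bias=False)

    def forward(self, x):
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(out) + x  # residual connection

# Usage: a 56 x 56 feature map with 32 channels passes through unchanged
# in shape, but filtered at three scales.
y = MultiScaleBlock(32)(torch.randn(1, 32, 56, 56))
```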
The remainder of this paper is organized as follows:
Section 2 reviews the related work. Section 3 details the
architecture of the proposed method. Section 4 describes
our experiments and discusses their results. Section 5
concludes the paper.
2 Depthwise separable convolutions
Depthwise separable convolutions factorize a standard convolution
into a depthwise convolution and a 1 × 1 pointwise convolution [18].
The depthwise convolution applies a single filter to each input
channel. Given an input feature map of size D_F × D_F × M, depthwise
convolution with one filter per input channel is computed as follows
(Eq. (1)):
* Correspondence: guoliang_yuan@hotmail.com
²School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China
Full list of author information is available at the end of the article
Wang et al. IPSJ Transactions on Computer Vision and Applications (2018) 10:11. https://doi.org/10.1186/s41074-018-0047-6