Helix Vol. 8(4): 3465-3469
Copyright © 2018 Helix ISSN 2319-5592 (Online)
Application of Interpolation Pooling in Convolutional Neural Networks
1Gaihua Wang, *2Guoliang Yuan, 3Meng Lv, 4WenZhou Liu
1Hubei Collaborative Innovation Centre for High-efficiency Utilization of Solar Energy, Hubei University of Technology, Wuhan 430068, China
1,2,3,4School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China
Email: guoliang_yuan@hotmail.com
Received: 22nd March 2018, Accepted: 6th April 2018, Published: 30th June 2018
Abstract
In existing convolutional neural networks, most pooling operations are max pooling or mean pooling, which lose important feature information when processing feature maps. Here we report interpolation pooling, which overcomes this problem by retaining more of the effective information in the feature maps. Interpolation pooling takes into account the 4×4 block of known pixels nearest to the interpolation point: the closer a pixel is to the point being interpolated, the larger its weight in the calculation. We apply the method to different convolutional neural networks, such as LeNet-5 and pyramid convolutional neural networks, and find that it converges faster and achieves higher accuracy than traditional pooling methods.
Keywords: Interpolation Pooling, Image Classification, Convolutional Networks
1. Introduction
In recent years, deep learning has attracted the attention of many scholars. It has been shown to have deeper and wider applications than traditional shallow learning networks, including visual recognition, speech recognition and natural language processing. In 2006, Hinton[1] proposed deep belief nets, a deep learning method that broke through the bottleneck in the development of BP neural networks.
Among all kinds of deep learning methods, convolutional neural networks (CNNs) have been the most extensively studied. CNNs consist of three types of layers: convolutional, pooling and fully connected layers[2]. In convolutional layers, the convolution kernel is shared across all spatial positions, which reduces the complexity of the model and makes the network easier to train[3]. Pooling is an important concept in CNNs, with common variants including max pooling, mean pooling and mixed pooling. A pooling layer reduces the computational load by reducing the spatial size of the feature maps passed between convolutional layers. In 2012, Krizhevsky et al. proposed the AlexNet[4] model, which showed significant improvements. AlexNet is similar to LeNet-5[3], but with a deeper structure.
Simonyan et al.[5] proposed the VGG network based on AlexNet, and showed that increasing network depth helps to improve the accuracy of image classification. By increasing the depth, the network can better approximate the objective function, increase non-linearity, and obtain a better representation of the features. However, this also increases the complexity of the network and makes it more difficult to optimize. To solve the degradation problem that arises with increasing CNN depth, He et al.[6] proposed ResNet, which won the 2015 ILSVRC championship. ResNet maps low-level features directly to higher layers of the network; it is eight times as deep as VGG and 20 times faster than AlexNet. Szegedy et al.[7] proposed the Inception
module by observing and optimizing the network structure; it reduces network complexity by replacing larger convolution kernels with 1×1 convolution kernels. GoogLeNet[7], built from Inception modules, has only 1/12th the training parameters of AlexNet, yet improves image classification accuracy on ImageNet. In 2017, Saining Xie et al.[8] proposed the ResNeXt network structure based on ResNet. ResNeXt improves accuracy without increasing the complexity of the parameters, while reducing the number of hyper-parameters. At the same time, a variety of methods[9-11] have been proposed to overcome the difficulties encountered in training deep CNNs.
All the methods mentioned above improve the depth, activation functions or convolution kernels of CNNs; in these models, max pooling or mean pooling is used. Max pooling simply selects the maximum value in the pooling region as the final response, which makes it sensitive to noise. Mean pooling takes the average value of the pooling region, which effectively reduces the impact of noise, but it smooths the image and leads to the loss of high-frequency information[12]. In this paper, to address these problems of max pooling and mean pooling in CNNs, interpolation pooling is proposed to optimize network efficiency.
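The contrasting behaviors of max and mean pooling described above can be illustrated with a minimal NumPy sketch (the function names and the toy feature map are illustrative, not from the paper):

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def mean_pool2x2(x):
    """2x2 mean pooling with stride 2 on a 2-D feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# A feature map with one noisy spike: max pooling propagates the spike,
# while mean pooling dilutes it together with all high-frequency detail.
fmap = np.array([[1., 2., 0., 0.],
                 [3., 9., 0., 0.],   # 9.0 is a noise spike
                 [0., 0., 5., 6.],
                 [0., 0., 7., 8.]])
print(max_pool2x2(fmap))   # top-left output = 9.0 (noise dominates)
print(mean_pool2x2(fmap))  # top-left output = 3.75 (spike diluted)
```

The top-left 2×2 region shows both failure modes at once: max pooling returns the noise value 9.0 as the region's response, while mean pooling returns 3.75, smoothing away the spike along with the genuine high-frequency content.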
Interpolation pooling mainly uses the method of image interpolation: the 16 nearest known pixels are used to compute each pixel value of the output feature map, thereby achieving the goal of scaling the feature map.
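As a hedged sketch of this idea, the following downsamples a feature map using the 4×4 neighbourhood around each target point, with weights that decay with distance. The paper's exact weighting function is not given in this section, so the standard cubic-convolution (Keys) kernel is assumed here; the names `cubic_kernel` and `interp_pool` are illustrative:

```python
import numpy as np

def cubic_kernel(d, a=-0.5):
    """Cubic-convolution weight for a sample at signed distance d (Keys kernel)."""
    d = abs(d)
    if d < 1:
        return (a + 2) * d**3 - (a + 3) * d**2 + 1
    if d < 2:
        return a * d**3 - 5 * a * d**2 + 8 * a * d - 4 * a
    return 0.0

def interp_pool(x, scale=2):
    """Downsample a 2-D feature map by `scale`, computing each output pixel
    from the 4x4 nearest known pixels, weighted by distance."""
    h, w = x.shape
    out_h, out_w = h // scale, w // scale
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            sy, sx = i * scale + 0.5, j * scale + 0.5   # source position
            y0, x0 = int(np.floor(sy)), int(np.floor(sx))
            acc = wsum = 0.0
            for m in range(-1, 3):            # 4 rows around the point
                for n in range(-1, 3):        # 4 columns around the point
                    yy = min(max(y0 + m, 0), h - 1)   # clamp at borders
                    xx = min(max(x0 + n, 0), w - 1)
                    wgt = cubic_kernel(sy - (y0 + m)) * cubic_kernel(sx - (x0 + n))
                    acc += wgt * x[yy, xx]
                    wsum += wgt
            out[i, j] = acc / wsum            # normalize the weights
    return out
```

Because the weights are normalized, a constant feature map is reproduced exactly, and nearby pixels contribute more than distant ones, which matches the distance-based weighting described above.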
DOI 10.29042/2018-3465-3469