MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko
Weijun Wang Tobias Weyand Marco Andreetto Hartwig Adam
Google Inc.
{howarda,menglong,bochen,dkalenichenko,weijunw,weyand,anm,hadam}@google.com
Abstract
We present a class of efficient models called MobileNets
for mobile and embedded vision applications. MobileNets
are based on a streamlined architecture that uses depthwise
separable convolutions to build lightweight deep neural
networks. We introduce two simple global hyper-parameters
that efficiently trade off between latency and accuracy.
These hyper-parameters allow the model builder to choose
the right-sized model for their application based on the
constraints of the problem. We present extensive experiments
on resource and accuracy tradeoffs and show strong
performance compared to other popular models on ImageNet
classification. We then demonstrate the effectiveness of
MobileNets across a wide range of applications and use
cases including object detection, fine-grained classification,
face attributes and large-scale geo-localization.
1. Introduction
Convolutional neural networks have become ubiquitous
in computer vision ever since AlexNet [19] popularized
deep convolutional neural networks by winning the ImageNet
Challenge: ILSVRC 2012 [24]. The general trend
has been to make deeper and more complicated networks
in order to achieve higher accuracy [27, 31, 29, 8]. However,
these accuracy gains do not necessarily make networks
more efficient with respect to size and speed. In many
real-world applications such as robotics, self-driving cars
and augmented reality, recognition tasks need to be carried
out in a timely fashion on a computationally limited platform.
This paper describes an efficient network architecture
and a set of two hyper-parameters that can be used to build
very small, low-latency models which can be easily matched
to the design requirements of mobile and embedded vision
applications. Section 2 reviews prior work on building small
models. Section 3 describes the MobileNet architecture and
the two hyper-parameters, the width multiplier and the
resolution multiplier, which define smaller and more efficient
MobileNets. Section 4 describes experiments on ImageNet as
well as a variety of different applications and use cases.
Section 5 closes with a summary and conclusion.
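As a rough illustration of how the two hyper-parameters act
(this is not the paper's reference code; scaled_channels and
scaled_resolution are hypothetical helper names), the width
multiplier thins every layer's channel count uniformly and the
resolution multiplier shrinks the input image:

    # Minimal Python sketch of the two global hyper-parameters.
    # Helper names are illustrative, not from the paper.

    def scaled_channels(base_channels, alpha):
        """Width multiplier alpha in (0, 1] thins each layer uniformly."""
        return max(1, int(base_channels * alpha))

    def scaled_resolution(base_resolution, rho):
        """Resolution multiplier rho in (0, 1] shrinks the input image."""
        return int(base_resolution * rho)

    # Example: a "0.5 MobileNet-160"-style configuration.
    alpha, rho = 0.5, 160 / 224
    print(scaled_channels(32, alpha))     # first layer: 32 -> 16 channels
    print(scaled_resolution(224, rho))    # input: 224x224 -> 160x160

Because a convolutional layer's cost grows with the product of
its input and output channel counts and with the spatial
resolution, compute scales roughly quadratically in both
multipliers.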
2. Prior Work
There has been rising interest in building small and efficient
neural networks in the recent literature, e.g. [16, 34,
12, 36, 22]. These approaches can generally be categorized
as either compressing pretrained networks or training small
networks directly. This paper proposes a class of network
architectures that allows a model developer to choose a
small network that specifically matches the resource
restrictions (latency, size) of their application. MobileNets
primarily focus on optimizing for latency but also yield
small networks. Many papers on small networks focus only on
size and do not consider speed.
MobileNets are built primarily from depthwise separable
convolutions, initially introduced in [26] and subsequently
used in Inception models [13] to reduce the computation in
the first few layers. Flattened networks [16] build a network
out of fully factorized convolutions and showed the potential
of extremely factorized networks. Independently of this
paper, Factorized Networks [34] introduces a similar
factorized convolution as well as the use of topological
connections. Subsequently, the Xception network [3]
demonstrated how to scale up depthwise separable filters to
outperform Inception V3 networks. Another small network is
SqueezeNet [12], which uses a bottleneck approach to design
a very small network. Other reduced-computation networks
include structured transform networks [28] and deep fried
convnets [37].
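To make the factorization concrete: a depthwise separable
convolution replaces a standard convolution with a per-channel
spatial (depthwise) filter followed by a 1x1 (pointwise)
convolution that mixes channels. The following is a minimal
PyTorch sketch (the paper does not prescribe a framework, and
its actual block also inserts batchnorm and ReLU after each of
the two convolutions):

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Per-channel 3x3 convolution followed by a 1x1 pointwise
        convolution; a generic sketch of the factorization."""

        def __init__(self, in_channels, out_channels, stride=1):
            super().__init__()
            # groups=in_channels applies one 3x3 filter per input channel.
            self.depthwise = nn.Conv2d(in_channels, in_channels,
                                       kernel_size=3, stride=stride,
                                       padding=1, groups=in_channels,
                                       bias=False)
            # The 1x1 convolution linearly combines the channels.
            self.pointwise = nn.Conv2d(in_channels, out_channels,
                                       kernel_size=1, bias=False)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    x = torch.randn(1, 32, 112, 112)       # one 32-channel feature map
    y = DepthwiseSeparableConv(32, 64)(x)  # -> shape (1, 64, 112, 112)

For a 3x3 kernel this factorization uses roughly 8 to 9 times
fewer multiply-adds than a standard convolution with the same
input and output shapes.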
A different approach for obtaining small networks is
shrinking, factorizing or compressing pretrained networks.
Compression based on product quantization [36], hashing