Binary Neural Networks: A Survey

Haotong Qin^a, Ruihao Gong^a, Xianglong Liu^{∗a,b}, Xiao Bai^e, Jingkuan Song^c, Nicu Sebe^d

^a State Key Lab of Software Development Environment, Beihang University, Beijing, China.
^b Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China.
^c Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
^d Department of Information Engineering and Computer Science, University of Trento, Trento, Italy.
^e School of Computer Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Jiangxi Research Institute, Beihang University, Beijing, China.
Abstract
The binary neural network, largely saving the storage and computation, serves as a promising technique for deploying deep models on resource-limited devices. However, the binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network. To address these issues, a variety of algorithms have been proposed, and achieved satisfying progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks such as the hardware-friendly design and the training tricks. Then, we give the evaluation and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, the challenges that may be faced in future research are prospected.

Keywords: binary neural network, deep learning, model compression, network quantization, model acceleration
∗ Corresponding author
1. Introduction
With the continuous development of deep learning [1], deep neural networks
have made significant progress in various fields, such as computer vision, natural
language processing and speech recognition. Convolutional neural networks
(CNNs) have proven to be reliable in the fields of image classification [2,
3, 4, 5, 6], object detection [7, 8, 9, 10] and object recognition [11, 12, 2, 13, 14],
and thus have been widely used in practice.
Owing to the deep structure with a number of layers and millions of parameters, deep CNNs enjoy strong learning capacity, and thus usually achieve satisfactory performance. For example, the VGG-16 [12] network contains about 140 million 32-bit floating-point parameters, and can achieve 92.7% top-5 test accuracy for the image classification task on the ImageNet dataset. The entire network needs to occupy more than 500 megabytes of storage space and perform 1.6×10^10 floating-point arithmetic operations.
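As a quick back-of-the-envelope check on these figures (our own arithmetic, not a statement from the paper):

$$140 \times 10^{6} \ \text{parameters} \times 32 \ \text{bits/parameter} = 4.48 \times 10^{9} \ \text{bits} \approx 560 \ \text{MB},$$

which is consistent with the storage footprint of more than 500 megabytes quoted above.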
This fact makes deep CNNs heavily rely on high-performance hardware such as GPUs, while in real-world applications, usually only devices with limited computational resources (e.g., mobile phones and embedded devices) are available [15]. For example, embedded devices based on FPGAs usually have only a few thousand computing units, far from sufficient for the millions of floating-point operations in common deep models. There exists a severe contradiction between complex models and limited computational resources. Although a large amount of dedicated hardware has emerged for deep learning [16, 17, 18, 19, 20], providing efficient vector operations that enable fast convolution in forward inference, the heavy computation and storage still inevitably limit the application of deep CNNs in practice. Besides, due to the huge model parameter space, the prediction of a neural network is usually viewed as a black box, which brings great challenges to the interpretability of CNNs. Some works like [21, 22, 23] empirically explore the function of each layer in the network. They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components.
From the aspect of explainable machine learning, we can summarize that some filters play a similar role in the model, especially when the model size is large, so it is reasonable to prune useless filters or reduce their precision to lower bits. On the one hand, we can enjoy more efficient inference with such compression techniques. On the other hand, we can utilize them to further study the interpretability of CNNs, i.e., finding out which layers are important, which layers are useless and can be removed from the black box, and what structures are beneficial for accurate prediction. Many prior studies have proven that there usually exists large redundancy in the deep structure [24, 25, 26, 27]. For example, by simply discarding the redundant weights, one can keep the performance of ResNet-50 [28] while saving more than 75% of the parameters and 50% of the computational time. In the literature, approaches for compressing deep networks can be classified into five categories: parameter pruning [26, 29, 30, 31], parameter quantizing [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], low-rank parameter factorization [42, 43, 44, 45, 46], transferred/compact convolutional filters [47, 48, 49, 50], and knowledge distillation [51, 52, 53, 54, 55, 56]. Parameter pruning and quantizing mainly focus on eliminating the redundancy in the model parameters, respectively by removing the redundant/uncritical ones or by compressing the parameter space (e.g., from floating-point weights to integer ones). Low-rank factorization applies matrix/tensor decomposition techniques to estimate the informative parameters using proxies of small size. The compact convolutional filter based approaches rely on carefully designed structural convolutional filters to reduce the storage and computational complexity. The knowledge distillation methods try to distill a more compact model that reproduces the output of a larger network.
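As a concrete illustration of the first of these categories, the sketch below implements magnitude-based parameter pruning (a minimal example of our own; the NumPy-only setting and the sparsity level are illustrative assumptions, not details from the cited works):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights.

    A minimal illustration of parameter pruning: keep only the
    (1 - sparsity) fraction of weights with the largest magnitude.
    """
    k = int(weights.size * sparsity)           # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0  # drop redundant weights
    return pruned

w = np.random.randn(4, 4).astype(np.float32)
print(magnitude_prune(w, sparsity=0.75))       # ~75% of entries become zero
```

In practice such masks are applied iteratively with fine-tuning in between, which is how results like the ResNet-50 figure above are typically obtained.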
Among the existing network compression techniques, the quantization based ones serve as a promising and fast solution that yields highly compact models compared to their floating-point counterparts, by representing the network weights with very low precision. Along this direction, the most extreme quantization is binarization, the interest of this survey. Binarization is a 1-bit quantization where data can only have two possible values, namely −1 (0) or +1. For network compression, both the weights and activations can be represented by 1 bit without taking too much memory. Besides, with binarization, the heavy matrix multiplication operations can be replaced with lightweight bitwise XNOR operations and Bitcount operations. Therefore, compared with other compression methods, binary neural networks enjoy a number of hardware-friendly properties including memory saving, power efficiency and significant acceleration. Pioneering works like BNN [57] and XNOR-Net [58] have proven the effectiveness of binarization: up to 32× memory saving and 58× speedup on CPUs have been achieved by XNOR-Net for a 1-bit convolution layer. Following the paradigm of binary neural networks, in the past years a large amount of research on this topic has been attracted from the fields of computer vision and machine learning [1, 2, 12, 28], and it has been applied to various popular tasks such as image classification [59, 60, 61, 62, 63], detection [64, 65], and so on. With the binarization technique, the importance of a layer can be easily validated by switching it between full-precision and 1-bit: if the performance greatly decreases after binarizing a certain layer, we can conclude that this layer is on the critical path of the network. Furthermore, it is also significant to find out whether the full-precision model and the binarized model work in the same way from the explainable machine learning view.
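To make the XNOR/Bitcount replacement concrete, here is a minimal NumPy sketch (our own illustration of the identity behind the trick, not code from any surveyed work; a production kernel would pack bits into machine words and use a hardware popcount instead of the array comparison used here):

```python
import numpy as np

def binary_dot(x: np.ndarray, w: np.ndarray) -> int:
    """Dot product of sign-binarized vectors via XNOR + Bitcount.

    For vectors a, b in {-1, +1}^n, encoding +1 as bit 1 and -1 as
    bit 0 gives sum(a * b) = 2 * popcount(XNOR(a, b)) - n, so the
    floating-point multiply-accumulate disappears entirely.
    """
    xb = (x >= 0).astype(np.uint8)        # sign binarization -> {0, 1}
    wb = (w >= 0).astype(np.uint8)
    agree = np.count_nonzero(xb == wb)    # popcount of the XNOR result
    return 2 * agree - x.size

x = np.random.randn(64)
w = np.random.randn(64)
reference = int(np.dot(np.sign(x), np.sign(w)))  # full-precision check
assert binary_dot(x, w) == reference
```

The same identity applied row-by-row turns a binarized matrix multiplication into pure bitwise work, which is the source of the memory and speed gains cited above.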
Besides focusing on the strategies of model binarization, many studies have attempted to reveal the behaviors of model binarization, and further to explain the connections between model robustness and the structure of deep neural networks. This possibly helps to approach the answers to the essential questions: how does the deep network actually work, and what network structure is better? It is very interesting and important to investigate the studies of binary neural networks, which is beneficial for understanding the behaviors and structures of efficient and robust deep learning models. Some studies in the literature have shown that binary neural networks can filter input noise, and pointed out that specially designed BNNs are more robust than full-precision neural networks. For example, [66] shows that noise is continuously amplified during the forward propagation of neural networks, and that binarization improves robustness by keeping the magnitude of the noise small.
The studies based on BNNs can also help us to analyze how structures in deep neural networks work. Liu et al. creatively proposed Bi-Real Net, which adds additional shortcuts (Bi-Real) to reduce the information loss caused by binarization [62]. This structure works like the shortcut in ResNet, and it helps to explain, to some extent, why the widely used shortcuts can improve the performance of deep neural networks. On the one hand, by visualizing the activations, it can be seen that more detailed information in the shallow layers can be passed to the deeper layers during forward propagation. On the other hand, gradients can be directly back-propagated through the shortcut to avoid the gradient vanishing problem. Zhu et al. leveraged ensemble methods to improve the performance of BNNs by building several groups of weak classifiers; the ensemble methods improve the performance of BNNs, although they sometimes face the over-fitting problem [67]. Based on the analysis and experimentation of BNNs, they showed that the number of neurons is more important than the bit-width, and that it may not be necessary to use real-valued neurons in deep neural networks, which is similar to the principle of biological neural networks. Besides, reducing the bit-width of a certain layer to explore its effect on accuracy is one effective approach to study the interpretability of deep neural networks. There are many works exploring the sensitivity of different layers to binarization. It is common sense that the first layer and the last layer should be kept at higher precision, which means that these layers play a more important role in the prediction of neural networks.
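To make the shortcut mechanism concrete, the following is a minimal PyTorch sketch of a Bi-Real-style block under our own simplifying assumptions (a single 3×3 convolution and a clipped straight-through estimator); it illustrates the idea rather than reproducing the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign() in the forward pass, straight-through estimator backward."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # pass gradients only where |x| <= 1 (clipped straight-through)
        return grad_output * (x.abs() <= 1).float()

class BiRealStyleBlock(nn.Module):
    """Hypothetical Bi-Real-style block: 1-bit conv + real-valued shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = BinarySign.apply(x)                   # binarize activations
        w_bin = BinarySign.apply(self.conv.weight)  # binarize weights
        out = self.bn(F.conv2d(out, w_bin, padding=1))
        return out + x                              # real-valued shortcut

x = torch.randn(1, 16, 8, 8)
print(BiRealStyleBlock(16)(x).shape)                # torch.Size([1, 16, 8, 8])
```

The `return out + x` line is the shortcut discussed above: it lets fine-grained activation information and gradients bypass the 1-bit convolution in both the forward and backward passes.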
This survey tries to exploit the nature of binary neural networks and categorizes them into the naive binarization without optimizing the quantization function, and the optimized binarization including minimizing the quantization error, improving the loss function, and reducing the gradient error. It also discusses the hardware-friendly methods and the useful tricks for training binary neural networks. In addition, we present the common datasets and network structures used for evaluation, and compare the performance of current methods on different tasks. The organization of the remaining part is given as the following.