Binary Neural Networks: A Survey

Haotong Qin^a, Ruihao Gong^a, Xianglong Liu^{∗a,b}, Xiao Bai^e, Jingkuan Song^c, Nicu Sebe^d

^a State Key Lab of Software Development Environment, Beihang University, Beijing, China.
^b Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China.
^c Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
^d Department of Information Engineering and Computer Science, University of Trento, Trento, Italy.
^e School of Computer Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Jiangxi Research Institute, Beihang University, Beijing, China.
Abstract
The binary neural network, largely saving the storage and computation, serves as a promising technique for deploying deep models on resource-limited devices. However, the binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network. To address these issues, a variety of algorithms have been proposed, and achieved satisfying progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks such as the hardware-friendly design and the training tricks. Then, we give the evaluation and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, the challenges that may be faced in future research are prospected.

Keywords: binary neural network, deep learning, model compression, network quantization, model acceleration
∗ Corresponding author
1. Introduction
With the continuous development of deep learning [1], deep neural networks
have made significant progress in various fields, such as computer vision, natural
language processing and speech recognition. Convolutional neural networks
(CNNs) have proven to be reliable in the fields of image classification [2,
3, 4, 5, 6], object detection [7, 8, 9, 10] and object recognition [11, 12, 2, 13, 14],
and thus have been widely used in practice.
Owing to the deep structure with a number of layers and millions of parameters, deep CNNs enjoy strong learning capacity, and thus usually achieve satisfactory performance. For example, the VGG-16 [12] network contains about 140 million 32-bit floating-point parameters, and can achieve 92.7% top-5 test accuracy for the image classification task on the ImageNet dataset. The entire network needs to occupy more than 500 megabytes of storage space and perform 1.6×10^10 floating-point arithmetic operations.
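As a quick back-of-the-envelope check on these figures (our own arithmetic, not a statement from the paper):

$$140 \times 10^{6} \ \text{parameters} \times 32 \ \text{bits/parameter} = 4.48 \times 10^{9} \ \text{bits} \approx 560 \ \text{MB},$$

which is consistent with the storage footprint of more than 500 megabytes quoted above.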
This fact makes deep CNNs heavily rely on high-performance hardware such as GPUs, while in real-world applications, usually only devices with limited computational resources (e.g., mobile phones and embedded devices) are available [15]. For example, embedded devices based on FPGAs usually have only a few thousand computing units, far from sufficient for the millions of floating-point operations in common deep models. There exists a severe contradiction between complex models and limited computational resources. Although a large amount of dedicated hardware has emerged for deep learning [16, 17, 18, 19, 20], providing efficient vector operations that enable fast convolution in forward inference, the heavy computation and storage still inevitably limit the application of deep CNNs in practice. Besides, due to the huge model parameter space, the prediction of a neural network is usually viewed as a black box, which brings great challenges to the interpretability of CNNs. Some works like [21, 22, 23] empirically explore the function of each layer in the network. They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components.
From the aspect of explainable machine learning, we can summarize that some filters play a similar role in the model, especially when the model size is large, so it is reasonable to prune useless filters or reduce their precision to lower bits. On the one hand, we can enjoy more efficient inference with such compression techniques. On the other hand, we can utilize them to further study the interpretability of CNNs, i.e., finding out which layers are important, which layers are useless and can be removed from the black box, and what structures are beneficial for accurate prediction. Many prior studies have proven that there usually exists large redundancy in the deep structure [24, 25, 26, 27]. For example, by simply discarding the redundant weights, one can keep the performance of ResNet-50 [28] while saving more than 75% of the parameters and 50% of the computational time. In the literature, approaches for compressing deep networks can be classified into five categories: parameter pruning [26, 29, 30, 31], parameter quantizing [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], low-rank parameter factorization [42, 43, 44, 45, 46], transferred/compact convolutional filters [47, 48, 49, 50], and knowledge distillation [51, 52, 53, 54, 55, 56]. Parameter pruning and quantizing mainly focus on eliminating the redundancy in the model parameters, respectively by removing the redundant/uncritical ones or by compressing the parameter space (e.g., from floating-point weights to integer ones). Low-rank factorization applies matrix/tensor decomposition techniques to estimate the informative parameters using proxies of small size. The compact convolutional filter based approaches rely on carefully designed structural convolutional filters to reduce the storage and computational complexity. The knowledge distillation methods try to distill a more compact model that reproduces the output of a larger network.
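As a concrete illustration of the first of these categories, the sketch below implements magnitude-based parameter pruning (a minimal example of our own; the NumPy-only setting and the sparsity level are illustrative assumptions, not details from the cited works):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights.

    A minimal illustration of parameter pruning: keep only the
    (1 - sparsity) fraction of weights with the largest magnitude.
    """
    k = int(weights.size * sparsity)           # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0  # drop redundant weights
    return pruned

w = np.random.randn(4, 4).astype(np.float32)
print(magnitude_prune(w, sparsity=0.75))       # ~75% of entries become zero
```

In practice such masks are applied iteratively with fine-tuning in between, which is how results like the ResNet-50 figure above are typically obtained.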
Among the existing network compression techniques, the quantization based ones serve as a promising and fast solution that yields highly compact models compared to their floating-point counterparts, by representing the network weights with very low precision. Along this direction, the most extreme quantization is binarization, the interest of this survey. Binarization is a 1-bit quantization where data can only have two possible values, namely −1 (0) or +1. For network compression, both the weights and activations can be represented by 1 bit without taking too much memory. Besides, with binarization, the heavy matrix multiplication operations can be replaced with lightweight bitwise XNOR operations and Bitcount operations. Therefore, compared with other compression methods, binary neural networks enjoy a number of hardware-friendly properties including memory saving, power efficiency and significant acceleration. Pioneering works like BNN [57] and XNOR-Net [58] have proven the effectiveness of binarization: up to 32× memory saving and 58× speedup on CPUs have been achieved by XNOR-Net for a 1-bit convolution layer. Following the paradigm of binary neural networks, in the past years a large amount of research on this topic has been attracted from the fields of computer vision and machine learning [1, 2, 12, 28], and it has been applied to various popular tasks such as image classification [59, 60, 61, 62, 63], detection [64, 65], and so on. With the binarization technique, the importance of a layer can be easily validated by switching it between full-precision and 1-bit: if the performance greatly decreases after binarizing a certain layer, we can conclude that this layer is on the critical path of the network. Furthermore, it is also significant to find out whether the full-precision model and the binarized model work in the same way from the explainable machine learning view.
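To make the XNOR/Bitcount replacement concrete, here is a minimal NumPy sketch (our own illustration of the identity behind the trick, not code from any surveyed work; a production kernel would pack bits into machine words and use a hardware popcount instead of the array comparison used here):

```python
import numpy as np

def binary_dot(x: np.ndarray, w: np.ndarray) -> int:
    """Dot product of sign-binarized vectors via XNOR + Bitcount.

    For vectors a, b in {-1, +1}^n, encoding +1 as bit 1 and -1 as
    bit 0 gives sum(a * b) = 2 * popcount(XNOR(a, b)) - n, so the
    floating-point multiply-accumulate disappears entirely.
    """
    xb = (x >= 0).astype(np.uint8)        # sign binarization -> {0, 1}
    wb = (w >= 0).astype(np.uint8)
    agree = np.count_nonzero(xb == wb)    # popcount of the XNOR result
    return 2 * agree - x.size

x = np.random.randn(64)
w = np.random.randn(64)
reference = int(np.dot(np.sign(x), np.sign(w)))  # full-precision check
assert binary_dot(x, w) == reference
```

The same identity applied row-by-row turns a binarized matrix multiplication into pure bitwise work, which is the source of the memory and speed gains cited above.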
Besides focusing on the strategies of model binarization, many studies have attempted to reveal the behaviors of model binarization, and further to explain the connections between model robustness and the structure of deep neural networks. This possibly helps to approach the answers to the essential questions: how does the deep network actually work, and what network structure is better? It is very interesting and important to investigate the studies of binary neural networks, which is beneficial for understanding the behaviors and structures of efficient and robust deep learning models. Some studies in the literature have shown that binary neural networks can filter input noise, and pointed out that specially designed BNNs are more robust than full-precision neural networks. For example, [66] shows that noise is continuously amplified during the forward propagation of neural networks, and that binarization improves robustness by keeping the magnitude of the noise small.
The studies based on BNNs can also help us to analyze how structures in deep neural networks work. Liu et al. creatively proposed Bi-Real Net, which adds additional shortcuts (Bi-Real) to reduce the information loss caused by binarization [62]. This structure works like the shortcut in ResNet, and it helps to explain, to some extent, why the widely used shortcuts can improve the performance of deep neural networks. On the one hand, by visualizing the activations, it can be seen that more detailed information in the shallow layers can be passed to the deeper layers during forward propagation. On the other hand, gradients can be directly back-propagated through the shortcut to avoid the gradient vanishing problem. Zhu et al. leveraged ensemble methods to improve the performance of BNNs by building several groups of weak classifiers; the ensemble methods improve the performance of BNNs, although they sometimes face the over-fitting problem [67]. Based on the analysis and experimentation of BNNs, they showed that the number of neurons is more important than the bit-width, and that it may not be necessary to use real-valued neurons in deep neural networks, which is similar to the principle of biological neural networks. Besides, reducing the bit-width of a certain layer to explore its effect on accuracy is one effective approach to study the interpretability of deep neural networks. There are many works exploring the sensitivity of different layers to binarization. It is common sense that the first layer and the last layer should be kept at higher precision, which means that these layers play a more important role in the prediction of neural networks.
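To make the shortcut mechanism concrete, the following is a minimal PyTorch sketch of a Bi-Real-style block under our own simplifying assumptions (a single 3×3 convolution and a clipped straight-through estimator); it illustrates the idea rather than reproducing the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign() in the forward pass, straight-through estimator backward."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # pass gradients only where |x| <= 1 (clipped straight-through)
        return grad_output * (x.abs() <= 1).float()

class BiRealStyleBlock(nn.Module):
    """Hypothetical Bi-Real-style block: 1-bit conv + real-valued shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = BinarySign.apply(x)                   # binarize activations
        w_bin = BinarySign.apply(self.conv.weight)  # binarize weights
        out = self.bn(F.conv2d(out, w_bin, padding=1))
        return out + x                              # real-valued shortcut

x = torch.randn(1, 16, 8, 8)
print(BiRealStyleBlock(16)(x).shape)                # torch.Size([1, 16, 8, 8])
```

The `return out + x` line is the shortcut discussed above: it lets fine-grained activation information and gradients bypass the 1-bit convolution in both the forward and backward passes.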
This survey tries to exploit the nature of binary neural networks and categorizes them into the naive binarization without optimizing the quantization function, and the optimized binarization including minimizing the quantization error, improving the loss function, and reducing the gradient error. It also discusses the hardware-friendly methods and the useful tricks for training binary neural networks. In addition, we present the common datasets and network structures used for evaluation, and compare the performance of current methods on different tasks. The organization of the remaining part is given as the following.