Sparsification and Separation of Deep Learning Layers
for Constrained Resource Inference on Wearables
Sourav Bhattacharya§ and Nicholas D. Lane§,†
§Nokia Bell Labs, †University College London
ABSTRACT
Deep learning has revolutionized the way sensor data are analyzed and interpreted. The accuracy gains these approaches offer make them attractive for the next generation of mobile, wearable and embedded sensory applications. However, state-of-the-art deep learning algorithms typically require a significant amount of device and processor resources, even just for the inference stages that are used to discriminate high-level classes from low-level data. The limited availability of memory, computation, and energy on mobile and embedded platforms thus poses a significant challenge to the adoption of these powerful learning techniques. In this paper, we propose SparseSep, a new approach that leverages the sparsification of fully connected layers and the separation of convolutional kernels to reduce the resource requirements of popular deep learning algorithms. As a result, SparseSep allows large-scale DNNs and CNNs to run efficiently on mobile and embedded hardware with only minimal impact on inference accuracy. We experiment using SparseSep across a variety of common processors, such as the Qualcomm Snapdragon 400, ARM Cortex M0 and M3, and Nvidia Tegra K1, and show that it allows inference for various deep models to execute more efficiently; for example, on average requiring 11.3 times less memory and running 13.3 times faster on these representative platforms.
CCS Concepts
• Computing methodologies → Machine learning; Neural networks; • Computer systems organization → Embedded software;
Keywords
Wearable computing; deep learning; sparse coding; weight factorization
1. INTRODUCTION
Recognizing contextual signals and the everyday activity of users from raw sensor data is a core enabler for mobile and wearable applications. By monitoring user actions (via speech, ambient audio, motion) and context using a variety
of sensing modalities, mobile developers are able to provide both enhanced, and brand new, application features. While sensor-related applications and systems are still maturing, and are highly diverse, a notable characteristic is their reliance on making a wide variety of sensor inferences.
Accurately extracting context and activity information from noisy mobile sensor data remains an unsolved problem. Because the real world is highly complex, unpredictable and constantly changing, it often confuses the machine learning and signal processing algorithms used by mobile devices. One of the most promising directions today for overcoming such challenges is deep learning [1, 2]. Developments in this particular field of machine learning have caused the approaches and algorithms used in even mature sensing tasks to be completely changed (e.g., speech [3] and face [4] recognition). The study of deep learning usage for mobile applications is in its early stages (e.g., [5, 6, 7, 8]), but with promising initial results.
While deep learning offers important benefits to robust modeling, its integration into mobiles and wearables is complicated by the sizable system resource requirements these algorithms introduce. Barriers exist in the form of memory, computation and energy; these collectively prevent most deep models from executing directly on mobile hardware. Consequently, existing examples of deep learning for smartphones (e.g., speech recognition) remain largely cloud-assisted. This has a number of negative side-effects: first, inference execution becomes coupled to fluctuating and unpredictable network quality (e.g., latency, throughput); more importantly, it exposes users to privacy dangers [9] as sensitive data (e.g., audio) is processed off-device by a third party.
Allowing broader device-centric deep learning classification and prediction will require the development of brand-new techniques for optimized, resource-sensitive execution. Up to this point, the machine learning community has made excellent progress in training-time optimizations and is only now beginning to consider how these ideas transfer to inference time. Currently, most knowledge of deep learning algorithm behavior on constrained devices is largely limited to one-off task-specific experiences (e.g., [10, 11]). These systems are limited, however, to providing examples and evidence that local execution is feasible, although they do provide some insights for ways forward. What is required is a deeper study of these issues with an aim towards the development of techniques like off-line model optimization and runtime execution environments to match the resources (e.g., memory, computation and energy) present on edge devices like wearables and mobile phones.
In this work, we make significant progress towards the development of such algorithms and software by developing a sparse coding- and convolution kernel separation-based approach to optimizing deep learning model layers. This framework – SparseSep – includes: (1) a compiler, the Layer Compression Compiler (LCC), into which unchanged deep models are inserted and then optimized; (2) a runtime framework, the Sparse Inference Runtime (SIR), that is able to exploit the transformation of the model and realize radical reductions in computation, energy and memory usage; and (3) a separator, the Convolution Separation Runtime (CSR), that significantly reduces convolution operations. SparseSep allows a developer to adopt existing off-the-shelf deep models and scale their resource demands to match targets such as acceptable accuracy reduction and device limits (e.g., memory and necessary execution time).
The core concept of this work is the hypothesis that the computational and space complexity of deep learning models can be significantly improved through the sparse representation of key layers and the separation of convolution layers. Deep models often have millions of parameters spread throughout a number of hidden layers that capture robust representations of the data. Using theory from sparse dictionary learning, we investigate how the originally complex synaptic weight matrix can be captured in much smaller matrices that require less computational and memory resources. Critically, such theory affords the ability of these sparsified layers to be faithful to the originals, with theoretical bounds on important aspects such as reconstruction error. This is the first time this approach has been used.
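To pin down the idea in notation (a hedged illustration in our own symbols, not the paper's exact formulation): the synaptic weight matrix W of a layer, of size m by n, is approximated by the product of a sparse code matrix C and a small dictionary D, with k much smaller than min(m, n):

% Illustration (our notation, not the paper's): sparse factorization of a
% layer's weight matrix, with at most s nonzero entries per row of C.
\min_{\mathbf{C} \in \mathbb{R}^{m \times k},\; \mathbf{D} \in \mathbb{R}^{k \times n}}
  \lVert \mathbf{W} - \mathbf{C}\mathbf{D} \rVert_F^2
\quad \text{subject to} \quad
  \lVert \mathbf{c}_i \rVert_0 \le s \quad \text{for every row } \mathbf{c}_i \text{ of } \mathbf{C}.

Storing only the nonzeros of C together with the small D, instead of all mn entries of W, is what produces the memory and computation savings quantified later.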
Our experiments include both DNNs and CNNs, the most popular forms of deep learning today. Tests span audio classification tasks (ambient scene analysis and speaker identification) that are common in mobile sensing systems, along with image tasks (object recognition) seen in mobile vision devices like Google Glass. We find that, across a range of experiments and devices, SparseSep allows deep models to execute using (on average) only 26% of the original energy while sacrificing at most approximately 5% of the accuracy of these models. Specific examples include the Snapdragon 400 processor running a deep learning model for speaker identification with a 4.1 times improvement in execution time and a 17.6 times reduction in memory. Furthermore, we benchmark this deep learning version of speaker identification and find, as expected, that the deep model is much more robust than conventionally used models (such as random forests). Most important of all, we examine device restrictions found on other common processors like the Cortex M3 equipped with 32 KB of RAM. Not surprisingly, we find these processors cannot support any form of deep learning model tested (due to restrictions on computation and/or memory) – until we apply the SparseSep process.
The key scientific contributions of this research are:
• We propose, for the first time, a sparse coding-based approach to the optimization of deep learning inference execution. We also propose the use of a convolution kernel separation technique to minimize the overall computations of CNNs on resource-constrained platforms.
• To our knowledge, this work is the first to demonstrate very deep learning models (many-layer DNNs and CNNs) executing on severely constrained wearable hardware with acceptable levels of performance (energy efficiency, computation times).
Figure 1: A CNN mixes convolutional and feed-forward layers. [Figure: input → convolution layer → pooling layer → convolution layer → fully connected layers → output layer]

• We design and implement a prototype that realizes our approach to sparse dictionary learning and kernel separation in deep learning model representation and inference execution. We implement the necessary runtime components for 4 embedded and mobile processor platforms.
• We experiment with four different CNN and DNN models on large audio and image datasets. We demonstrate gains on the order of 11.3× improvements in memory and 13.3× in execution time under multiple experiment configurations, while suffering an accuracy loss of only ≈ 5%.
2. BACKGROUND
Popular deep learning architectures, such as Restricted Boltzmann Machines and Deep Belief Networks, share a common architecture; they are often collectively referred to as Deep Neural Networks. Typically, a DNN contains a number of fully-connected layers, where each layer is composed of a collection of nodes. Sensor measurements (e.g., audio, images) are fed to the first layer (the input layer). The final layer, also known as the output layer, corresponds to inference classes, with nodes capturing individual inference categories (e.g., music or cat). Layers in between the input and the output layer are referred to as hidden layers. The degree of influence between units of adjacent layers varies on a pairwise basis, determined by a weight value. Together with the synaptic connections and inherent non-linearity, the hidden layers transform raw data applied to the input layer into the prediction classes captured in the output layer.
DNN-based inferencing follows a feed-forward algorithm that operates on sensor data segments in isolation. The algorithm starts at the input layer and moves sequentially layer by layer, updating the activation states of all nodes one by one. The process finishes at the output layer when all nodes have been updated. Finally, the inferred class is identified as the class corresponding to the output layer node with the greatest state value.
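The feed-forward pass described above reduces to a chain of matrix-vector products with a non-linearity in between. A minimal sketch in NumPy (our illustration, not the authors' implementation; the weight list Ws, bias list bs and the ReLU non-linearity are assumptions):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def dnn_inference(x, Ws, bs):
    # Propagate one sensor-data segment through the layers in sequence.
    a = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = relu(W @ a + b)            # update the activations of each hidden layer
    logits = Ws[-1] @ a + bs[-1]       # output layer
    return int(np.argmax(logits))      # class of the node with the greatest state value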
CNNs are another popular class of deep models that share architectural similarities with DNNs. As presented in Figure 1, a CNN model contains one or more convolutional layers, pooling or sub-sampling layers, and fully connected layers (equivalent to those used in DNNs). The objective of these layers is to extract simple representations from the input data, and then convert these representations into more complex representations at much coarser resolutions within the subsequent layers. For instance, first, convolutional filters (with small kernel width) are applied to the input data to capture local data properties. Next, max or min pooling is applied to make the representations invariant to translations. Pooling operations can also be seen as a form of dimensionality reduction. Lastly, fully connected layers (i.e., a DNN) help a CNN make predictions.
A CNN follows a sequential approach, as in DNNs, generating one isolated prediction at a time. Often in CNN-based predictions, sensor data is first vectorized into two dimensions. Next, the data is passed through a series of convolution, pooling and non-linear layers. The convolution and pooling layers can be viewed as a feature extractor that runs before the fully connected layers are engaged. Inference then proceeds exactly as previously described for DNNs until ultimately a classification is reached.
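The whole pipeline can be sketched compactly. The following NumPy illustration (ours, with a hypothetical single kernel and single dense layer; real CNNs stack many such layers) mirrors the convolution, pooling, non-linearity and fully connected flow described above:

import numpy as np

def conv2d(x, k):
    # Valid 2-D convolution in the deep learning sense (cross-correlation, no padding).
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, p=2):
    # Non-overlapping p x p max pooling: translation invariance, fewer dimensions.
    h, w = (x.shape[0] // p) * p, (x.shape[1] // p) * p
    return x[:h, :w].reshape(h // p, p, w // p, p).max(axis=(1, 3))

def cnn_inference(x, kernel, W, b):
    feat = np.maximum(0.0, max_pool(conv2d(x, kernel)))  # feature extraction stages
    logits = W @ feat.ravel() + b                        # fully connected (DNN) head
    return int(np.argmax(logits))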
Contrary to shallow learning-based models, deep learning models are usually large and often contain more than a million parameters. The large parameter space improves the capacity of these models, and they often outperform prior shallow models in terms of generalization performance. However, the accuracy gains come at the expense of high energy and memory costs. Although high-end wearables containing a GPU (e.g., the NVIDIA Tegra K1) can efficiently run deep models [12], the high resource demands make deep learning models unattractive for low-end wearables. In this paper, we explore sparse factorizations and convolutional kernel separations to optimize the resource demands of deep models, while maintaining the functional properties of the models.
3. DESIGN AND OPERATION
Beginning with this section, and spanning the following two, we detail the design and algorithms of SparseSep.
3.1 Design Goals
SparseSep is shaped by the following objectives.
• No Re-training. The training of a large deep model is the most time-consuming and computationally demanding task. For example, a large model such as GoogleNet is trained using thousands of CPU cores [13], which is beyond the current capabilities of a single wearable device. In this work, we focus mainly on the inference cycle of a deep model and perform no training on the resource-constrained devices. The training process also requires a very large training dataset, often inaccessible to developers [14]. Thus, new techniques are needed to compress popular cloud-scale deep learning models so that they run gracefully on wearable- and IoT-grade hardware.
• No Cloud Offloading. As noted in §1, offloading the execution of portions of deep models can result in leaking sensitive sensor data. By keeping inference completely local, users and applications have greater privacy protection, as neither the data nor any intermediate results ever leave the device.
• Target Low-resource Platforms. Even high-end mobile processors (such as the Tegra K1 [15]) still require careful resource use when executing deep learning models. But in this class of processors, the gap in resources is closing. However, for low-energy, highly portable wearable processors that lack GPUs or have only a few MBs of RAM (e.g., ARM Cortex M3 [16]), local execution of deep models remains impractical. For this reason, SparseSep turns to new ideas like the sparsification of weights and kernel separation, in search of the leaps in resource efficiency required to make these low-end processors viable.
• Minimize Model Changes. Deep models must undergo some degree of change to enable their operation on wearable hardware. However, a core tenet of SparseSep is to minimize the extent of such modifications and remain functionally faithful to the initial model architecture. For this reason, we frame the problem as one of deep model compression (originally formulated by the machine learning community), where model layer arrangements remain unchanged and only per-layer connections are changed through the insertion of additional summarizing layers. Thus, the degree of change made by SparseSep is a key metric that is minimized during model processing.
• Adopt Principled Approaches. Ad-hoc methods to alter a deep model – such as 'specializing' a model to recognize a smaller set of activities/contexts, or changing layer/unit parameters to generate a desired resource consumption profile – are dangerous, as they violate the domain experience of the modeling experts. Methods like sparse coding [17] and model compression [18] are supported by theoretical analysis [19]. Assessing whether a model can be altered solely by changes in the accuracy metric can be dangerous and can potentially hurt, for example, its ability to generalize.
3.2 Overview
We now briefly outline the core approach of SparseSep to optimize the architecture of large deep learning models so that they meet the constraints of target wearable devices. In §4 we provide the necessary theory and algorithms of this process, but we begin here with the key ideas.
The inference pipeline of a deep learning model is dominated by a series of matrix computations, especially multiplications, and convolutions. Attempts have been made to optimize the total number of computations by low-rank factorization of the weight matrix or by decomposing convolutional kernels into separable filters in an ad-hoc manner. Both weight factorization and kernel separation, however, require modifying the architecture of the model by inserting a new layer and updating weight components (see §4.1 and §4.4). Although counter-intuitive, the insertion of a new layer only achieves computational efficiency under certain conditions, which depend on, e.g., the size of the newly inserted layer, the size of the original weight matrix, and the size of the convolutional kernels. In §4.1, §4.2 and §4.4 we derive and show the conditions under which computational and memory efficiencies can be achieved.
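To see why inserting a layer can pay off, consider a back-of-the-envelope sketch (our illustration with hypothetical layer sizes, not the derivation of §4). Factorizing an m × n weight matrix through an inserted layer of k units stores and multiplies k(m + n) weights instead of mn, which is only a saving when k < mn/(m + n):

import numpy as np

m, n, k = 2048, 2048, 256            # hypothetical layer sizes and inserted-layer width
W = np.random.randn(m, n)            # stand-in for a trained weight matrix

# Truncated SVD yields an optimal rank-k factorization W ~= U @ V.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :k] * s[:k]            # (m, k): weights into the inserted layer
V = Vt[:k, :]                        # (k, n): weights out of the inserted layer

params_original = m * n              # 4,194,304 weights
params_factored = k * (m + n)        # 1,048,576 weights: a 4x reduction here
assert params_factored < params_original   # holds iff k < m*n / (m + n)

x = np.random.randn(n)
y = U @ (V @ x)                      # two small products replace one large one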
In this paper, we postulate that the computational and space efficiency of deep learning models can be further improved by adding sparsity constraints to the factorization process. Accordingly, we propose a sparse dictionary learning approach to enforcing a sparse factorization of the weight matrix (see §4.3). In §5.2 we show that under specific sparsity conditions the resource scalability of the proposed approach is significantly better than that of existing approaches.
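As a concrete, hedged illustration of sparsity-constrained factorization (not the paper's own solver, whose formulation appears in §4.3), off-the-shelf dictionary learning from scikit-learn can produce such a sparse code/dictionary pair:

import numpy as np
from sklearn.decomposition import DictionaryLearning

W = np.random.randn(512, 256)         # stand-in for fully connected layer weights

dict_learner = DictionaryLearning(
    n_components=64,                  # width of the inserted summarizing layer
    transform_algorithm='omp',        # orthogonal matching pursuit for the codes
    transform_n_nonzero_coefs=8,      # at most 8 nonzero coefficients per row code
    max_iter=10,
)
C = dict_learner.fit_transform(W)     # sparse code matrix, shape (512, 64)
D = dict_learner.components_          # dense dictionary, shape (64, 256)

# Only the nonzeros of C plus the small D need to be stored and multiplied
# at inference, in place of the full 512 x 256 matrix: W ~= C @ D.
rel_err = np.linalg.norm(W - C @ D) / np.linalg.norm(W)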
The weight factorization approach significantly reduces the memory footprint of both DNN and CNN models by optimizing the parameter space of the fully connected layers. The factorization also helps to reduce the overall number of operations needed and improves the inference time. However, the inference time improvement due to factorization is much more pronounced for DNNs than for CNNs. This is primarily because a major portion of the CNN-based inference time (often over 95%) is spent on performing convolution operations [12, 20], where the layer factorization technique has no influence. To overcome this limitation, we also propose a runtime convolution kernel separation technique that optimizes the convolution operations to reduce the overall inference time.
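A sketch of the underlying idea (ours, using a deliberately rank-1 kernel so the separation is exact; the technique for general learned kernels is developed in §4.4): a separable k × k kernel factors into a column filter and a row filter, so one pass costing k² multiplies per output pixel becomes two passes costing k multiplies each:

import numpy as np
from scipy.signal import convolve2d

k = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0])   # rank-1, Sobel-like 3x3 kernel

# SVD exposes separability: a rank-1 kernel has a single nonzero singular
# value and factors exactly into a column filter times a row filter.
u, s, vt = np.linalg.svd(k)
col = u[:, 0] * np.sqrt(s[0])          # vertical 1-D filter
row = vt[0, :] * np.sqrt(s[0])         # horizontal 1-D filter

x = np.random.randn(64, 64)
direct = convolve2d(x, k, mode='valid')            # ~9 multiplies per output pixel
separated = convolve2d(
    convolve2d(x, col[:, None], mode='valid'),     # ~3 multiplies per pixel...
    row[None, :], mode='valid')                    # ...plus ~3 more per pixel
assert np.allclose(direct, separated)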