IONN: Incremental Offloading of Neural Network Computations from Mobile Devices to Edge Servers
Hyuk-Jin Jeong
Seoul National University
Seoul, South Korea
jinevening@snu.ac.kr
Hyeon-Jae Lee
Seoul National University
Seoul, South Korea
thlhjq@snu.ac.kr
Chang Hyun Shin
Seoul National University
Seoul, South Korea
schyun9212@snu.ac.kr
Soo-Mook Moon
Seoul National University
Seoul, South Korea
smoon@snu.ac.kr
ABSTRACT
The current wisdom for running computation-intensive deep neural networks (DNNs) on resource-constrained mobile devices is to let mobile clients make DNN queries to central cloud servers, where the corresponding DNN models are pre-installed. Unfortunately, this centralized, cloud-based DNN offloading is not appropriate for emerging decentralized cloud infrastructures (e.g., cloudlet, edge/fog servers), where the client may send computation requests to any nearby server located at the edge of the network. To use such a generic edge server for DNN execution, the client must first upload its DNN model to the server, which can seriously delay query processing due to the long uploading time. This paper proposes IONN (Incremental Offloading of Neural Network), a partitioning-based DNN offloading technique for edge computing. IONN divides a client's DNN model into a few partitions and uploads them to the edge server one by one. The server incrementally builds the DNN model as each partition arrives, allowing the client to start offloading partial DNN execution even before the entire model is uploaded. To decide the best DNN partitions and their uploading order, IONN uses a novel graph-based algorithm. Our experiments show that IONN significantly improves query performance under realistic hardware configurations and network conditions.
CCS CONCEPTS
• Human-centered computing → Mobile computing; • Computing methodologies → Distributed computing methodologies; Neural networks;
KEYWORDS
Mobile computing, edge computing, computation offloading, neural network, cyber foraging
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SoCC '18, October 11–13, 2018, Carlsbad, CA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
ACM ISBN 978-1-4503-6011-1/18/10...$15.00
https://doi.org/10.1145/3267809.3267828
ACM Reference Format:
Hyuk-Jin Jeong, Hyeon-Jae Lee, Chang Hyun Shin, and Soo-Mook Moon. 2018. IONN: Incremental Offloading of Neural Network Computations from Mobile Devices to Edge Servers. In Proceedings of SoCC '18: ACM Symposium on Cloud Computing, Carlsbad, CA, USA, October 11–13, 2018 (SoCC '18), 11 pages. https://doi.org/10.1145/3267809.3267828
1 INTRODUCTION
In recent years, deep neural networks (DNNs) have shown remarkable achievements in the fields of computer vision [22], natural language processing [35], speech recognition [11], and artificial intelligence [34]. Owing to the success of DNNs, new applications using them are becoming increasingly popular on mobile devices. However, DNNs are known to be extremely computation-intensive, such that a mobile device with limited hardware has difficulty running DNN computations by itself. Some mobile devices may handle DNN computations with specialized hardware (e.g., GPU, ASIC) [25] [4], but this is not a general option for today's low-powered, compact mobile devices (e.g., wearables or IoT devices).
The current wisdom for running DNN applications on such resource-constrained devices is to offload DNN computations to central cloud servers. For example, mobile clients can send their machine learning (ML) queries (requests for execution) to the clouds of commercial ML services [26] [2] [10]. These services often provide servers where pre-trained DNN models or the client's own DNN models are installed in advance, so that the servers can execute the models on behalf of the client. More recently, there have been research efforts that install the same DNN models at the client as well as at the server, and execute the models partly on the client and partly on the server to trade off accuracy against resource usage [14] or to improve performance and energy savings [20]. Both approaches require the pre-installation of DNN models at dedicated servers.
Unfortunately, the previous approaches are not appropriate for the generic use of decentralized cloud infrastructures (e.g., cloudlet [33], fog nodes [3], edge servers [32]), where the client can send its ML queries to any nearby generic server located at the edge of the network (referred to as cyber foraging [31]). In this edge computing environment, it is not realistic to pre-install DNN models at the servers for use by the client, since we cannot know which servers will be used at runtime, especially when the client is on the move. Rather, on-demand installation by uploading the client's
DNN model to the server would be more practical. A critical issue with on-demand DNN installation is that the overhead of uploading the DNN model is non-trivial, making the client wait for a long time before it can use the edge server (see Section 2).
To solve this issue, we propose a new offloading approach, Incremental Offloading of Neural Network (IONN). IONN divides a client's DNN model into several partitions and determines the order of uploading them to the server. The client uploads the partitions to the server one by one, instead of sending the entire DNN model at once. The server incrementally builds the DNN model as each DNN partition arrives, allowing the client to start offloading DNN execution even before the entire DNN model is uploaded. That is, when there is a DNN query, the server will execute those partitions uploaded so far, while the client will execute the rest of the partitions, allowing collaborative execution. This incremental, partial DNN offloading enables mobile clients to use edge servers more quickly, improving query performance.
As far as we know, IONN is the first work on partitioning-based DNN offloading in the context of cyber foraging. To decide the best DNN partitions and the uploading order, we introduce a novel heuristic algorithm based on a graph data structure that expresses the process of collaborative DNN execution. In the proposed graph, IONN derives the first DNN partition to upload by running a shortest path algorithm, which is expected to yield the best query performance initially. To derive the next DNN partition to upload, IONN updates the edge weights of the graph and searches for the new shortest path. By repeating this process, IONN derives a complete uploading plan for the DNN partitions, which ensures that DNN query performance increases as more partitions are uploaded to the server and eventually converges to the best performance expected from collaborative DNN execution.
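To make the planning loop concrete, the following is a minimal sketch of the iterative shortest-path search described above (Section 5 presents the actual algorithm and graph construction). The graph layout, the node attributes (side, uploaded, upload_cost), and the use of networkx are our own illustrative assumptions, not IONN's implementation:

    # Sketch: iteratively pick partitions via shortest paths (assumed model).
    # Nodes marked side="server" are server-side copies of DNN layers; the
    # weight of an edge entering such a node initially includes the time to
    # upload that layer's parameters.
    import networkx as nx

    def build_upload_plan(g: nx.DiGraph, src="input", dst="output"):
        plan = []
        while True:
            path = nx.shortest_path(g, src, dst, weight="weight")
            part = [n for n in path
                    if g.nodes[n].get("side") == "server"
                    and not g.nodes[n].get("uploaded")]
            if not part:
                break  # performance has converged; nothing left to upload
            plan.append(part)
            for n in part:
                g.nodes[n]["uploaded"] = True
                # Once a layer is uploaded, later queries no longer pay its
                # upload cost, so remove it from the incoming edge weights.
                for p in g.predecessors(n):
                    g[p][n]["weight"] -= g.nodes[n]["upload_cost"]
        return plan

Each iteration moves more layers to the server side of the optimal path, so the plan naturally orders partitions by how much each one improves query latency.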
We implemented IONN based on the caffe DNN framework [19]. Experimental results show that IONN promptly improves DNN query performance by offloading partial DNN execution. Also, IONN processes more DNN queries while uploading the DNN model, making the embedded client consume energy more efficiently, compared to the simple all-at-once approach (i.e., uploading the entire DNN model at once).
The rest of this paper is organized as follows. Section 2 illustrates how much overhead is involved in uploading a DNN model for edge computing. In Section 3, we briefly review DNNs and previous approaches to DNN offloading. In Section 4, we explain how IONN works. Section 5 describes our partitioning algorithm in detail. We evaluate our work in Section 6 and discuss related work in Section 7. Finally, we conclude in Section 8.
2 MOTIVATION
In this section, we describe a motivating example where the overhead of uploading a DNN model obstructs the use of decentralized cloud servers (throughout this paper, we will refer to the decentralized cloud servers as edge servers).
Scenario: A man with poor eyesight wears smart glasses (without a powerful GPU) and rides the subway. In the crowded subway station, he can get help from his smart glasses to identify the objects around him. Fortunately, edge servers are deployed over the station (like Wi-Fi hotspots), so the smart glasses can use them to accelerate the object recognition service by offloading complex DNN computations to a nearby server.

[Figure 1: Example scenario of using remote servers to offload DNN computation for image recognition. Local execution on the client (ARM CPU) takes 1.3 sec/query; the edge server and the cloud server (NVIDIA GPU) each take 0.001 sec/query; uploading AlexNet takes ~24 seconds over an 80 Mbps wireless network.]
The above scenario is a typical case of mobile cognitive assistance [12]. The cognitive assistance on the smart glasses can help the user by whispering the names of the objects seen on the camera. For this, it will perform image recognition on the video frames by using DNNs [22] [29]. We performed a quick experiment to check the feasibility of using edge servers for this scenario, based on realistic hardware and network conditions.
Our client device is an embedded board, the Odroid XU4 [30], with an ARM big.LITTLE CPU (2.0 GHz/1.5 GHz, 4 cores) and 2 GB of memory. Our edge server has an x86 CPU (3.6 GHz, 4 cores), a GTX 1080 Ti GPU, and 32 GB of memory. We assumed that the client is connected to Wi-Fi with a strong signal, whose bandwidth was measured to be about 80 Mbps. We experimented with AlexNet [22], a representative DNN for image recognition.
Figure 1 shows the result. Local execution on the smart glasses takes 1.3 seconds to handle one DNN query to recognize an image. Although the CPU on our client board is competitive (the same one used in the Samsung Galaxy S5 smartphone), 1.3 seconds per query is barely usable, especially when the smart glasses must recognize several images per second.
If we employ the edge server for offloading DNN queries, one query will take about 1 ms to execute, which would enable a real-time service. However, the DNN model must be available at the edge server in advance for the server to be ready to execute the queries.
A popular technique for using a random edge server is VM (virtual machine)-based provisioning, where a mobile client uploads a service program and its execution environment, encapsulated in a VM, to the edge server (or the edge server can download them from the cloud), so that the server can run the service program [32]; some recent studies have proposed using lightweight container technology instead of VMs [23] [27]. If we used these techniques for DNN offloading, we would need to upload a VM (or container) image that includes a DNN model, a DNN framework, and other libraries from the client to the edge server. However, today's commercial DNN frameworks, such as caffe [19], tensorflow [8], or pytorch [28], require substantial space (more than 3 GB)¹, so it is not realistic to upload such an image on demand at runtime. Rather, it is more reasonable for a VM (or container) image for the DNN framework to be pre-installed at the edge server in advance, so the client uploads only its DNN model to the edge server on demand.
¹ We measured the size of a docker image for each DNN framework (GPU version) from dockerhub; the image contains all the libraries needed to run the framework as well as the framework itself.
To check the overhead of uploading a DNN model, we measured the time to transmit the DNN model over the wireless network. It takes about 24 seconds to upload the AlexNet model, meaning that the smart glasses must execute the queries locally for 24 seconds before they can use the edge server, with no improvement in the meantime. Of course, worse network conditions would further increase the uploading time.
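As a sanity check, the 24-second figure matches simple arithmetic, assuming the standard pre-trained AlexNet caffemodel is roughly 240 MB (our assumption; the measurement above reports only the time):

    t_{\text{upload}} \approx \frac{240\ \text{MB} \times 8\ \text{bit/B}}{80\ \text{Mbit/s}} = 24\ \text{s}

This also makes clear why the problem worsens with larger models or weaker links: upload time scales linearly with model size and inversely with bandwidth.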
If we used a central cloud server with the same hardware, where the user's DNN model is installed in advance, we would obtain the same DNN execution time, yet with a longer network latency. For example, if we access a cloud server in our local region (East Asia) [10], the network latency would be about 60 ms due to multi-hop transmission, compared to 1 ms for our edge server. Also, it is known that multi-hop transmission to distant cloud datacenters causes high jitter, which may hurt the real-time user experience [32].
Although edge servers are attractive alternatives for running DNN queries, our experimental result indicates that users must wait quite a while before they can use an edge server, due to the time to upload a DNN model. A highly mobile user, who may leave the service area of an edge server shortly, will suffer especially from this problem; if the client moves to another location before it completes uploading its DNN, the client will have wasted its battery on network transmission without ever using the edge server. To solve this issue, we propose IONN, which allows the client to offload partial DNN execution to the server while the DNN model is being uploaded.
3 BACKGROUND
Before explaining IONN, we briefly review the DNN and its variant, the Convolutional Neural Network (CNN), typically used for image processing. We also describe some previous approaches to offloading DNN computations to remote servers.
3.1 Deep Neural Network
A deep neural network (DNN) can be viewed as a directed graph whose nodes are layers. Each layer in a DNN performs its operation on the input matrices and passes the output matrices to the next layer (in other words, each layer is executed). Some layers just perform the same operations with fixed parameters, while others contain trainable parameters. The trainable parameters are iteratively updated according to learning algorithms using training data (training). After training, the DNN model can be deployed as a file and used to infer outputs for new input data (inference). DNN frameworks, such as caffe [19], can load a pre-trained DNN from the model file and perform inference for new data by executing the DNN. In this paper, we focus on offloading the computations for inference, because training requires much more resources than inference and hence is typically performed on powerful cloud datacenters.
A CNN is a DNN that includes convolution layers and is widely used to classify an image into one of a set of pre-determined classes. Image classification in a CNN commonly proceeds as follows. When an image is given to the CNN, the CNN extracts features from the image using convolution (conv) layers and pooling (pool) layers. The conv/pool layers can be placed in series [22] or in parallel [36] [15]. Using the features, a fully-connected (fc) layer calculates the scores of each output class, and a softmax layer normalizes the scores. The normalized scores are interpreted as the probability of the input image belonging to each output class. There are many other types of layers (e.g., about 50 types of layers are currently implemented in caffe [19]), but explaining all of them is beyond the scope of this paper.
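To illustrate the layer-by-layer execution described in this section, here is a minimal toy sketch of a series conv/pool/fc/softmax pipeline in Python with NumPy; the layer shapes and stand-in operations are our own illustrative assumptions, not AlexNet's or caffe's:

    # Toy sketch: a DNN as a sequence of layers, each mapping input
    # matrices to output matrices (the series case of the layer graph).
    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()  # normalized scores, read as class probabilities

    rng = np.random.default_rng(0)
    W = rng.standard_normal((10, 16))   # fc weights: 16 features -> 10 classes

    layers = [
        lambda x: np.maximum(x, 0),     # stand-in for a conv layer + ReLU
        lambda x: x.reshape(4, 4, 4).max(axis=0).reshape(-1),  # crude "pooling"
        lambda x: W @ x,                # fc layer: per-class scores
        softmax,                        # normalize the scores
    ]

    x = rng.standard_normal(64)         # stand-in for an input image
    for layer in layers:                # inference: execute each layer in order
        x = layer(x)
    print(x.argmax(), x.max())          # predicted class and its probability

The sequential loop is exactly the structure that partitioning schemes exploit: any prefix of the layer list can run on the client and the remaining suffix on the server.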
3.2 Offloading of DNN Computations
Many cloud providers offer machine learning (ML) services [26] [2] [10], which perform computation-intensive ML algorithms (including DNNs) on behalf of clients. They often provide an application programming interface (API) to app developers so that the developers can implement ML applications using the API. Typically, the API allows a user to make a request (query) for DNN computation by simply sending an input matrix to the service provider's clouds, where the DNN models are pre-installed. The server in the cloud executes the corresponding DNN model in response to the query and sends the result back to the client. Unfortunately, this centralized, cloud-only approach is not appropriate for our scenario of the generic use of edge servers, since pre-installing DNN models at the edge servers is not straightforward.
Recent studies have proposed executing a DNN using both the client and the server [20] [14]. NeuroSurgeon is the latest work on collaborative DNN execution using a DNN partitioning scheme [20]. NeuroSurgeon creates a prediction model for the DNN, which estimates the execution time and the energy consumption of each layer, by performing regression analysis on DNN execution profiles. Using the prediction model and runtime information, NeuroSurgeon dynamically partitions a DNN into a front part and a rear part. The client executes the front part and sends its output matrices to the server. The server runs the rear part with the delivered matrices and sends the new output matrices back to the client. To decide the partitioning point, NeuroSurgeon estimates the expected query execution time for every possible partitioning point and finds the best one. Their experiments show that collaborative DNN execution between the client and the server improves performance, compared to the server-only approach.
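The partition-point search just described amounts to a linear scan over the candidate split points. A minimal sketch under assumed per-layer latency predictions follows; the function name and the toy numbers are ours, not NeuroSurgeon's (and we ignore energy and the cost of returning the result):

    # Sketch: pick the split point k that minimizes predicted query time,
    # where the client runs layers 0..k-1 and the server runs layers k..n-1.
    def best_partition(client_ms, server_ms, tx_ms):
        # client_ms[i], server_ms[i]: predicted latency of layer i on each side
        # tx_ms[k]: time to transmit the data crossing split point k
        #           (the raw input for k = 0, layer k-1's output otherwise)
        n = len(client_ms)
        candidates = []
        for k in range(n + 1):  # k = n means fully local execution
            total = (sum(client_ms[:k])
                     + (tx_ms[k] if k < n else 0)
                     + sum(server_ms[k:]))
            candidates.append((total, k))
        return min(candidates)

    # Toy numbers: a fast server and a cheap-to-ship intermediate output
    # make splitting after layer 0 the best choice here.
    latency, split = best_partition(client_ms=[40, 30, 25],
                                    server_ms=[2, 1, 1],
                                    tx_ms=[50, 5, 8])
    print(split, latency)  # -> 1 47

Note that this search assumes the whole model already resides on the server; IONN's contribution is to account for the upload cost that this formulation leaves out.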
Although collaborative DNN execution in NeuroSurgeon is effective, it is still based on cloud servers where the DNN model is pre-installed, and thus is not well suited for our edge computing scenario; it neither uploads the DNN model, nor does its partitioning algorithm consider the uploading overhead. However, collaborative execution gives a useful insight for DNN edge computing: we can partition the DNN model and upload each partition incrementally, so that the client and the server can execute the partitions collaboratively, even before the whole model is uploaded. Starting from this insight, we designed the incremental offloading of