IONN: Incremental Offloading of Neural Network Computations from Mobile Devices to Edge Servers
Hyuk-Jin Jeong
Seoul National University
Seoul, South Korea
jinevening@snu.ac.kr
Hyeon-Jae Lee
Seoul National University
Seoul, South Korea
thlhjq@snu.ac.kr
Chang Hyun Shin
Seoul National University
Seoul, South Korea
schyun9212@snu.ac.kr
Soo-Mook Moon
Seoul National University
Seoul, South Korea
smoon@snu.ac.kr
ABSTRACT
The current wisdom for running computation-intensive deep neural networks (DNNs) on resource-constrained mobile devices is to let mobile clients make DNN queries to central cloud servers, where the corresponding DNN models are pre-installed. Unfortunately, this centralized, cloud-based DNN offloading is not appropriate for emerging decentralized cloud infrastructures (e.g., cloudlet, edge/fog servers), where the client may send computation requests to any nearby server located at the edge of the network. To use such a generic edge server for DNN execution, the client must first upload its DNN model to the server, which can seriously delay query processing due to the long uploading time. This paper proposes IONN (Incremental Offloading of Neural Network), a partitioning-based DNN offloading technique for edge computing. IONN divides a client's DNN model into a few partitions and uploads them to the edge server one by one. The server incrementally builds the DNN model as each partition arrives, allowing the client to start offloading partial DNN execution even before the entire model is uploaded. To decide the best DNN partitions and their uploading order, IONN uses a novel graph-based algorithm. Our experiments show that IONN significantly improves query performance under realistic hardware configurations and network conditions.
CCS CONCEPTS
• Human-centered computing → Mobile computing; • Computing methodologies → Distributed computing methodologies; Neural networks;
KEYWORDS
Mobile computing, edge computing, computation offloading, neural network, cyber foraging
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SoCC '18, October 11–13, 2018, Carlsbad, CA, USA
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
ACM ISBN 978-1-4503-6011-1/18/10...$15.00
https://doi.org/10.1145/3267809.3267828
ACM Reference Format:
Hyuk-Jin Jeong, Hyeon-Jae Lee, Chang Hyun Shin, and Soo-Mook Moon. 2018. IONN: Incremental Offloading of Neural Network Computations from Mobile Devices to Edge Servers. In Proceedings of SoCC '18: ACM Symposium on Cloud Computing, Carlsbad, CA, USA, October 11–13, 2018 (SoCC '18), 11 pages. https://doi.org/10.1145/3267809.3267828
1 INTRODUCTION
In recent years, deep neural networks (DNNs) have shown remarkable achievements in the fields of computer vision [22], natural language processing [35], speech recognition [11], and artificial intelligence [34]. Owing to the success of DNNs, new applications using them are becoming increasingly popular on mobile devices. However, DNNs are known to be extremely computation-intensive, such that a mobile device with limited hardware has difficulty running DNN computations by itself. Some mobile devices may handle DNN computations with specialized hardware (e.g., GPU, ASIC) [25] [4], but this is not a general option for today's low-powered, compact mobile devices (e.g., wearables or IoT devices).
The current wisdom for running DNN applications on such resource-constrained devices is to offload DNN computations to central cloud servers. For example, mobile clients can send their machine learning (ML) queries (requests for execution) to the clouds of commercial ML services [26] [2] [10]. These services often provide servers where pre-trained DNN models or the client's own DNN models are installed in advance, so that the servers can execute the models on behalf of the client. More recently, there have been research efforts that install the same DNN models at the client as well as at the server, and execute the models partly on the client and partly on the server to trade off accuracy against resource usage [14] or to improve performance and energy savings [20]. Both approaches require the pre-installation of DNN models at dedicated servers.
Unfortunately, the previous approaches are not appropriate for the generic use of decentralized cloud infrastructures (e.g., cloudlet [33], fog nodes [3], edge servers [32]), where the client can send its ML queries to any nearby generic server located at the edge of the network (referred to as cyber foraging [31]). In this edge computing environment, it is not realistic to pre-install DNN models at the servers for use by the client, since we cannot know which servers will be used at runtime, especially when the client is on the move. Rather, on-demand installation by uploading the client's
DNN model to the server would be more practical. A critical issue with on-demand DNN installation is that the overhead of uploading the DNN model is non-trivial, making the client wait for a long time before it can use the edge server (see Section 2).
To solve this issue, we propose a new offloading approach, Incremental Offloading of Neural Network (IONN). IONN divides a client's DNN model into several partitions and determines the order of uploading them to the server. The client uploads the partitions to the server one by one, instead of sending the entire DNN model at once. The server incrementally builds the DNN model as each DNN partition arrives, allowing the client to start offloading DNN execution even before the entire DNN model is uploaded. That is, when there is a DNN query, the server will execute those partitions uploaded so far, while the client will execute the rest of the partitions, allowing collaborative execution. This incremental, partial DNN offloading enables mobile clients to use edge servers more quickly, improving query performance.
As far as we know, IONN is the first work on partitioning-based DNN offloading in the context of cyber foraging. To decide the best DNN partitions and the uploading order, we introduce a novel heuristic algorithm based on a graph data structure that expresses the process of collaborative DNN execution. In the proposed graph, IONN derives the first DNN partition to upload by running a shortest path algorithm, which is expected to yield the best query performance initially. To derive the next DNN partition to upload, IONN updates the edge weights of the graph and searches for the new shortest path. By repeating this process, IONN derives a complete uploading plan for the DNN partitions, which ensures that DNN query performance increases as more partitions are uploaded to the server and eventually converges to the best performance expected from collaborative DNN execution.
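To make the planning loop concrete, the following is a minimal sketch of the iterative shortest-path search described above (Section 5 presents the actual algorithm and graph construction). The graph layout, the node attributes (side, uploaded, upload_cost), and the use of networkx are our own illustrative assumptions, not IONN's implementation:

    # Sketch: iteratively pick partitions via shortest paths (assumed model).
    # Nodes marked side="server" are server-side copies of DNN layers; the
    # weight of an edge entering such a node initially includes the time to
    # upload that layer's parameters.
    import networkx as nx

    def build_upload_plan(g: nx.DiGraph, src="input", dst="output"):
        plan = []
        while True:
            path = nx.shortest_path(g, src, dst, weight="weight")
            part = [n for n in path
                    if g.nodes[n].get("side") == "server"
                    and not g.nodes[n].get("uploaded")]
            if not part:
                break  # performance has converged; nothing left to upload
            plan.append(part)
            for n in part:
                g.nodes[n]["uploaded"] = True
                # Once a layer is uploaded, later queries no longer pay its
                # upload cost, so remove it from the incoming edge weights.
                for p in g.predecessors(n):
                    g[p][n]["weight"] -= g.nodes[n]["upload_cost"]
        return plan

Each iteration moves more layers to the server side of the optimal path, so the plan naturally orders partitions by how much each one improves query latency.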
We implemented IONN based on the caffe DNN framework [19]. Experimental results show that IONN promptly improves DNN query performance by offloading partial DNN execution. Also, IONN processes more DNN queries while uploading the DNN model, making the embedded client consume energy more efficiently, compared to the simple all-at-once approach (i.e., uploading the entire DNN model at once).
The rest of this paper is organized as follows. Section 2 illustrates how much overhead is involved in uploading a DNN model for edge computing. In Section 3, we briefly review DNNs and previous approaches to DNN offloading. In Section 4, we explain how IONN works. Section 5 describes our partitioning algorithm in detail. We evaluate our work in Section 6 and discuss related work in Section 7. Finally, we conclude in Section 8.
2 MOTIVATION
In this section, we describe a motivating example where the overhead of uploading a DNN model obstructs the use of decentralized cloud servers (throughout this paper, we will refer to the decentralized cloud servers as edge servers).
Scenario: A man with poor eyesight wears smart glasses (without a powerful GPU) and rides the subway. In the crowded subway station, he can get help from his smart glasses to identify the objects around him. Fortunately, edge servers are deployed over the station (like Wi-Fi hotspots), so the smart glasses can use them to accelerate the object recognition service by offloading complex DNN computations to a nearby server.

[Figure 1: Example scenario of using remote servers to offload DNN computation for image recognition. Local execution on the client (ARM CPU) takes 1.3 sec/query; the edge server and the cloud server (NVIDIA GPU) each take 0.001 sec/query; uploading AlexNet takes ~24 seconds over an 80 Mbps wireless network.]
The above scenario is a typical case of mobile cognitive assistance [12]. The cognitive assistance on the smart glasses can help the user by whispering the names of the objects seen on the camera. For this, it will perform image recognition on the video frames by using DNNs [22] [29]. We performed a quick experiment to check the feasibility of using edge servers for this scenario, based on realistic hardware and network conditions.
Our client device is an embedded board, the Odroid XU4 [30], with an ARM big.LITTLE CPU (2.0 GHz/1.5 GHz, 4 cores) and 2 GB of memory. Our edge server has an x86 CPU (3.6 GHz, 4 cores), a GTX 1080 Ti GPU, and 32 GB of memory. We assumed that the client is connected to Wi-Fi with a strong signal, whose bandwidth was measured to be about 80 Mbps. We experimented with AlexNet [22], a representative DNN for image recognition.
Figure 1 shows the result. Local execution on the smart glasses takes 1.3 seconds to handle one DNN query to recognize an image. Although the CPU on our client board is competitive (the same one used in the Samsung Galaxy S5 smartphone), 1.3 seconds per query is barely usable, especially when the smart glasses must recognize several images per second.
If we employ the edge server for offloading DNN queries, one query will take about 1 ms to execute, which would enable a real-time service. However, the DNN model must be available at the edge server in advance for the server to be ready to execute the queries.
A popular technique for using a random edge server is VM (virtual machine)-based provisioning, where a mobile client uploads a service program and its execution environment, encapsulated in a VM, to the edge server (or the edge server can download them from the cloud), so that the server can run the service program [32]; some recent studies have proposed using lightweight container technology instead of VMs [23] [27]. If we used these techniques for DNN offloading, we would need to upload a VM (or container) image that includes a DNN model, a DNN framework, and other libraries from the client to the edge server. However, today's commercial DNN frameworks, such as caffe [19], tensorflow [8], or pytorch [28], require substantial space (more than 3 GB)¹, so it is not realistic to upload such an image on demand at runtime. Rather, it is more reasonable for a VM (or container) image for the DNN framework to be pre-installed at the edge server in advance, so the client uploads only its DNN model to the edge server on demand.
¹ We measured the size of a docker image for each DNN framework (GPU version) from dockerhub; the image contains all the libraries needed to run the framework as well as the framework itself.
To check the overhead of uploading a DNN model, we measured the time to transmit the DNN model over the wireless network. It takes about 24 seconds to upload the AlexNet model, meaning that the smart glasses must execute the queries locally for 24 seconds before they can use the edge server, with no improvement in the meantime. Of course, worse network conditions would further increase the uploading time.
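As a sanity check, the 24-second figure matches simple arithmetic, assuming the standard pre-trained AlexNet caffemodel is roughly 240 MB (our assumption; the measurement above reports only the time):

    t_{\text{upload}} \approx \frac{240\ \text{MB} \times 8\ \text{bit/B}}{80\ \text{Mbit/s}} = 24\ \text{s}

This also makes clear why the problem worsens with larger models or weaker links: upload time scales linearly with model size and inversely with bandwidth.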
If we used a central cloud server with the same hardware, where the user's DNN model is installed in advance, we would obtain the same DNN execution time, yet with a longer network latency. For example, if we access a cloud server in our local region (East Asia) [10], the network latency would be about 60 ms due to multi-hop transmission, compared to 1 ms for our edge server. Also, it is known that multi-hop transmission to distant cloud datacenters causes high jitter, which may hurt the real-time user experience [32].
Although edge servers are attractive alternatives for running DNN queries, our experimental result indicates that users must wait quite a while before they can use an edge server, due to the time to upload a DNN model. A highly mobile user, who may leave the service area of an edge server shortly, will suffer especially from this problem; if the client moves to another location before it completes uploading its DNN, the client will have wasted its battery on network transmission without ever using the edge server. To solve this issue, we propose IONN, which allows the client to offload partial DNN execution to the server while the DNN model is being uploaded.
3 BACKGROUND
Before explaining IONN, we briefly review the DNN and its variant, the Convolutional Neural Network (CNN), typically used for image processing. We also describe some previous approaches to offloading DNN computations to remote servers.
3.1 Deep Neural Network
A deep neural network (DNN) can be viewed as a directed graph whose nodes are layers. Each layer in a DNN performs its operation on the input matrices and passes the output matrices to the next layer (in other words, each layer is executed). Some layers just perform the same operations with fixed parameters, while others contain trainable parameters. The trainable parameters are iteratively updated according to learning algorithms using training data (training). After training, the DNN model can be deployed as a file and used to infer outputs for new input data (inference). DNN frameworks, such as caffe [19], can load a pre-trained DNN from the model file and perform inference for new data by executing the DNN. In this paper, we focus on offloading the computations for inference, because training requires much more resources than inference and hence is typically performed on powerful cloud datacenters.
A CNN is a DNN that includes convolution layers and is widely used to classify an image into one of a set of pre-determined classes. Image classification in a CNN commonly proceeds as follows. When an image is given to the CNN, the CNN extracts features from the image using convolution (conv) layers and pooling (pool) layers. The conv/pool layers can be placed in series [22] or in parallel [36] [15]. Using the features, a fully-connected (fc) layer calculates the scores of each output class, and a softmax layer normalizes the scores. The normalized scores are interpreted as the probability of the input image belonging to each output class. There are many other types of layers (e.g., about 50 types of layers are currently implemented in caffe [19]), but explaining all of them is beyond the scope of this paper.
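To illustrate the layer-by-layer execution described in this section, here is a minimal toy sketch of a series conv/pool/fc/softmax pipeline in Python with NumPy; the layer shapes and stand-in operations are our own illustrative assumptions, not AlexNet's or caffe's:

    # Toy sketch: a DNN as a sequence of layers, each mapping input
    # matrices to output matrices (the series case of the layer graph).
    import numpy as np

    def softmax(scores):
        e = np.exp(scores - scores.max())
        return e / e.sum()  # normalized scores, read as class probabilities

    rng = np.random.default_rng(0)
    W = rng.standard_normal((10, 16))   # fc weights: 16 features -> 10 classes

    layers = [
        lambda x: np.maximum(x, 0),     # stand-in for a conv layer + ReLU
        lambda x: x.reshape(4, 4, 4).max(axis=0).reshape(-1),  # crude "pooling"
        lambda x: W @ x,                # fc layer: per-class scores
        softmax,                        # normalize the scores
    ]

    x = rng.standard_normal(64)         # stand-in for an input image
    for layer in layers:                # inference: execute each layer in order
        x = layer(x)
    print(x.argmax(), x.max())          # predicted class and its probability

The sequential loop is exactly the structure that partitioning schemes exploit: any prefix of the layer list can run on the client and the remaining suffix on the server.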
3.2 Offloading of DNN Computations
Many cloud providers offer machine learning (ML) services [26] [2] [10], which perform computation-intensive ML algorithms (including DNNs) on behalf of clients. They often provide an application programming interface (API) to app developers so that the developers can implement ML applications using the API. Typically, the API allows a user to make a request (query) for DNN computation by simply sending an input matrix to the service provider's clouds, where the DNN models are pre-installed. The server in the cloud executes the corresponding DNN model in response to the query and sends the result back to the client. Unfortunately, this centralized, cloud-only approach is not appropriate for our scenario of the generic use of edge servers, since pre-installing DNN models at the edge servers is not straightforward.
Recent studies have proposed executing a DNN using both the client and the server [20] [14]. NeuroSurgeon is the latest work on collaborative DNN execution using a DNN partitioning scheme [20]. NeuroSurgeon creates a prediction model for the DNN, which estimates the execution time and the energy consumption of each layer, by performing regression analysis on DNN execution profiles. Using the prediction model and runtime information, NeuroSurgeon dynamically partitions a DNN into a front part and a rear part. The client executes the front part and sends its output matrices to the server. The server runs the rear part with the delivered matrices and sends the new output matrices back to the client. To decide the partitioning point, NeuroSurgeon estimates the expected query execution time for every possible partitioning point and finds the best one. Their experiments show that collaborative DNN execution between the client and the server improves performance, compared to the server-only approach.
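The partition-point search just described amounts to a linear scan over the candidate split points. A minimal sketch under assumed per-layer latency predictions follows; the function name and the toy numbers are ours, not NeuroSurgeon's (and we ignore energy and the cost of returning the result):

    # Sketch: pick the split point k that minimizes predicted query time,
    # where the client runs layers 0..k-1 and the server runs layers k..n-1.
    def best_partition(client_ms, server_ms, tx_ms):
        # client_ms[i], server_ms[i]: predicted latency of layer i on each side
        # tx_ms[k]: time to transmit the data crossing split point k
        #           (the raw input for k = 0, layer k-1's output otherwise)
        n = len(client_ms)
        candidates = []
        for k in range(n + 1):  # k = n means fully local execution
            total = (sum(client_ms[:k])
                     + (tx_ms[k] if k < n else 0)
                     + sum(server_ms[k:]))
            candidates.append((total, k))
        return min(candidates)

    # Toy numbers: a fast server and a cheap-to-ship intermediate output
    # make splitting after layer 0 the best choice here.
    latency, split = best_partition(client_ms=[40, 30, 25],
                                    server_ms=[2, 1, 1],
                                    tx_ms=[50, 5, 8])
    print(split, latency)  # -> 1 47

Note that this search assumes the whole model already resides on the server; IONN's contribution is to account for the upload cost that this formulation leaves out.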
Although collaborative DNN execution in NeuroSurgeon is effective, it is still based on cloud servers where the DNN model is pre-installed, and thus is not well suited for our edge computing scenario; it neither uploads the DNN model, nor does its partitioning algorithm consider the uploading overhead. However, collaborative execution gives a useful insight for DNN edge computing: we can partition the DNN model and upload each partition incrementally, so that the client and the server can execute the partitions collaboratively, even before the whole model is uploaded. Starting from this insight, we designed the incremental offloading of