A Decentralized Federated Learning Framework
via Committee Mechanism with Convergence
Guarantee
Chunjiang Che, Xiaoli Li, Student Member, IEEE,
Chuan Chen, Member, IEEE, Xiaoyu He, and Zibin Zheng, Senior Member, IEEE
Chunjiang Che, Xiaoli Li, Chuan Chen, Xiaoyu He and Zibin Zheng are with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. E-mail: {chechj, lixli27}@mail2.sysu.edu.cn, {chenchuan, hexy73, zhzibin}@mail.sysu.edu.cn. Corresponding author: Chuan Chen.
Abstract—Federated learning allows multiple participants to collaboratively train an efficient model without exposing data privacy. However, this distributed machine learning training method is prone to attacks from Byzantine clients, which interfere with the training of the global model by modifying the model or uploading false gradients. In this paper, we propose a novel serverless federated learning framework, Committee Mechanism based Federated Learning (CMFL), which ensures the robustness of the algorithm with a convergence guarantee. In CMFL, a committee system is set up to screen the uploaded local gradients. The committee system selects the local gradients rated by the elected members for the aggregation procedure through a selection strategy, and replaces committee members through an election strategy. Based on different considerations of model performance and defense, two opposite selection strategies are designed for accuracy and robustness, respectively. Extensive experiments illustrate that CMFL achieves faster convergence and better accuracy than typical federated learning while obtaining better robustness than traditional Byzantine-tolerant algorithms, all in a decentralized manner. In addition, we theoretically analyze and prove the convergence of CMFL under different election and selection strategies, which coincides with the experimental results.
Index Terms—Decentralized Federated Learning, Committee Mechanism, Byzantine Robustness, Theoretical Convergence Analysis.
1 INTRODUCTION
Nowadays, data arise from a wide range of sources, such as mobile devices and commercial, industrial, and medical activities. These data are further used for training the Artificial Intelligence (AI) models applied in a variety of fields. Conventional AI methods typically require uploading the source data to a central server. However, this is usually impractical due to data privacy or commercial competition. Federated Learning (FL) [1], which allows multiple devices to train a shared global model without uploading their local source data to a central server, is an effective way to solve this problem. In FL settings, multiple clients (also known as participants) are responsible for model training and uploading the local gradients, while the central server is responsible for model aggregation. A single round of FL mainly follows four steps: (1) the clients download the global model from the server and train their local models on their local datasets; (2) the clients upload their local gradients to the server, which aggregates the received local gradients to construct the global gradient; (3) the server uses the global gradient to update the global model; (4) the clients download the updated global model to begin the next training round. These operations are repeated until the algorithm converges.
FL is appealing because it can perform model training without uploading source data; McMahan et al. [2] showed that FL can achieve a test accuracy similar to that of centralized training on the full dataset while providing stronger privacy guarantees. However, Lyu et al. [3] pointed out that conventional FL is vulnerable to malicious attacks from both Byzantine clients and the central server. For example, Byzantine clients upload false gradients to degrade the performance of the global model, which may lead to training failure [4], [5]. Hu et al. [6] showed that external attacks on the central server can cause the entire learning process to terminate. In recent years, many works have addressed the security problem of FL. Some works aim to design a robust aggregation rule to reduce the negative impact of malicious gradients [7]. For example, Blanchard et al. [8] proposed a Byzantine-tolerant algorithm, Krum, which can tolerate Byzantine workers via an aggregation rule with a resilience property. Similarly, Yin et al. [9] proposed two robust distributed gradient descent algorithms based on coordinate-wise median and trimmed mean operations for Byzantine-robust distributed learning. Chen et al. [10] proposed a simple variant of the classical gradient descent method based on the geometric median of means of the gradients, which can tolerate Byzantine failures. Other works detect Byzantine attackers and remove the malicious local gradients from aggregation by clustering [11], [12]. Different from the aforementioned works, Li et al. [13] proposed a framework that trains a detection model on the server to detect and remove malicious local gradients. Although these methods can guarantee convergence and Byzantine tolerance, they cannot provide an effective defense in the presence of malicious servers [14].
Furthermore, typical FL requires a central server to complete the gradient aggregation procedure, yet it is difficult to find a fully trusted server in real scenarios. Beyond that, the entire FL system will be paralyzed
if the server suffers a malicious attack. Therefore, many works are devoted to designing serverless FL frameworks to reduce the risk of a single point of failure [15]. Some existing works design serverless FL frameworks by borrowing from network protocols such as P2P [16], [17], [18], [19] and Gossip [20], [21], [22]. These approaches treat clients as network nodes, which communicate with each other according to an adapted network protocol and complete the local training and the aggregation of the global model. Other approaches employ blockchain technology to take over the work of the server and achieve a serverless FL framework [23], [24], [25], [26], [27], [28]. They treat clients as blockchain nodes, record the uploaded local gradients on the blocks, and then let the node that completes the PoW (Proof of Work) lead the aggregation procedure. However, few works present a theoretical convergence analysis of serverless FL, leaving its performance without guarantees.
In this paper, we comprehensively consider Byzantine attacks from both clients and the central server, and design a serverless Federated Learning framework under a Committee Mechanism (CMFL), in which some participating clients are appointed as committee members. The committee members are responsible for monitoring the entire FL training process and ensuring the dependable aggregation of the local gradients. CMFL consists of three components: the scoring system, the election strategy, and the selection strategy. The scoring system distinguishes different clients. The election strategy is responsible for selecting clients who can represent the majority of clients to become members of the new committee. Based on different considerations, we propose two opposite selection strategies by which the committee members verify the local gradients uploaded by clients. One selection strategy is designed to ensure the robustness of the training process: the committee members accept local gradients that are similar to their own local gradients and reject those that differ significantly from their own. The other selection strategy is designed to accelerate the convergence of the global model in a non-attack scenario: the committee members accept local gradients that differ from their own local gradients and reject those that are similar to their own. Theoretical analysis of the convergence of the model is further presented as a performance guarantee, which illustrates the impact of the proposed election and selection strategies. Extensive experiments further demonstrate that CMFL outperforms both typical and Byzantine-tolerant FL models, coinciding with the theoretical analysis of the efficiency of the proposed election and selection strategies. In summary, we highlight our contributions as follows:
• We propose a serverless FL framework, CMFL, with the ability to monitor the gradient aggregation procedure and prevent both malicious clients and a malicious server from hindering the training of the global model.
• We propose an election strategy and two selection strategies suitable for different scenarios, which can ensure the robustness of the algorithm or accelerate the training process.
• We provide a proof and analysis of the convergence of the proposed serverless FL framework as a theoretical guarantee, which considers the influence of the election and selection strategies on the performance of the global model.
• We conduct extensive experiments showing that CMFL has a faster model convergence rate and better model performance than typical and Byzantine-tolerant FL models.
The remainder of this paper is organized as follows.
Section 2 surveys related work. Section 3 briefly outlines the
background and problem formulation of FL. Section 4 introduces our proposed framework. We show the convergence
analysis in Section 5. Experimental results and analysis are
summarized in Section 6. We conclude the paper in Section
7. Finally, Appendix A gives the convergence proof.
2 RELATED WORK
Konečný et al. [29] introduced a new distributed optimization setting for machine learning in 2015. Based on this setting, the concept of FL was proposed [1], which aims to train an efficient centralized model in a scenario where the training data is distributed across a large number of clients. However, these frameworks suffer from heterogeneous clients, failing to achieve satisfactory performance on Non-IID datasets. To handle such issues, several works extend FL to the Non-IID setting. Li et al. [30] present theoretical guarantees under Non-IID settings and analyze the convergence of FedAvg. Li et al. [31] propose a framework, FedProx, with convergence guarantees to tackle heterogeneity on Non-IID datasets. Yue et al. [32] propose a strategy to mitigate the negative impact of Non-IID data by sharing a small subset of data among all the clients. Briggs et al. [33] improve the performance of FL on Non-IID datasets by introducing a hierarchical clustering step to separate clusters of clients. Different from the aforementioned works, the proposed framework provides a new perspective for enhancing performance in the Non-IID setting.
The attractiveness of federated learning lies in training a centralized model on user equipment without uploading user data. However, such a framework is vulnerable to Byzantine attackers due to the lack of identity authentication for local gradients. Many works have designed Byzantine-tolerant algorithms to ensure the robustness of the training process. For example, Blanchard et al. [8] proposed Krum, which selects the global model update based on the Euclidean distances between the local models. Yin et al. [9] proposed Median and Trimmed Mean, which remove extreme local gradients to ensure the robustness of the algorithm. The Median method constructs a global gradient where each element is the median of the same-coordinate elements across the local gradients, while the Trimmed Mean method removes the largest and smallest fraction of elements in the local gradients and then performs a weighted average on the remaining local gradients to construct the global gradient.
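As a concrete illustration, here is a minimal NumPy sketch of the two rules as described above; the function names and the trim fraction `beta` are our own assumptions, and the trimmed mean below uses a plain (unweighted) mean over the surviving elements rather than a weighted one.

```python
import numpy as np

def median_aggregate(local_grads):
    """Coordinate-wise median: each element of the global gradient is the
    median of the same-coordinate elements across all local gradients."""
    return np.median(np.stack(local_grads), axis=0)

def trimmed_mean_aggregate(local_grads, beta=0.1):
    """Coordinate-wise trimmed mean: per coordinate, drop the largest and
    smallest beta fraction of values, then average what remains."""
    G = np.sort(np.stack(local_grads), axis=0)  # sort each coordinate across clients
    k = int(beta * G.shape[0])                  # number of values trimmed at each end
    return G[k:G.shape[0] - k].mean(axis=0)
```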
Muñoz-González et al. [7] proposed a Hidden Markov Model to learn the quality of local gradients and designed a robust aggregation rule, which discards bad local gradients in the aggregation procedure. The aforementioned algorithms
all try to ensure the robustness of FL by designing a more appropriate aggregation mechanism. However, these works cannot provide an effective defense in the presence of malicious servers.
In real applications, commercial competition makes it difficult to find a fully trusted central server among the participants. In addition, server errors or a malicious server will cause irreparable damage to the FL system. To this end, many serverless FL frameworks have been proposed to solve these problems. Some of them achieve a serverless FL framework by imitating existing network protocols. For example, Abhijit et al. [16] proposed a P2P serverless FL framework, where any two clients exchange information end-to-end and update their local models at each epoch. Hu et al. [20] used the Gossip protocol to complete the model aggregation process, taking over the role of the central server. Besides, other works build a serverless FL framework on top of a blockchain system. For example, Kim et al. [23] proposed a blockchain FL architecture in which the blockchain nodes are divided into devices and miners. The device nodes provide data, train the model locally, and upload the local gradients to their associated miner in the blockchain network; the miner nodes exchange and verify all the local gradients. Although these works achieve reasonable performance, they lack a theoretical analysis of model convergence under a serverless FL framework and do not consider Byzantine attacks from clients.
To deal with the limitations of the existing works, we propose a serverless FL framework under a committee mechanism. In scenarios with Byzantine attacks, the framework is efficiently Byzantine-tolerant against malicious clients and servers. In scenarios without Byzantine attacks, the framework achieves better global model performance on Non-IID datasets. Beyond that, we present a theoretical analysis of the convergence under the framework.
3 BACKGROUND
3.1 Federated Learning
A typical FL framework consists of a central server and multiple clients. The server maintains a global model, and each client maintains a local model. At the beginning of training, the global model and all local models are initialized randomly. Then the following steps are performed at each communication round [1]:
1. The server randomly selects a subset of clients, which then download the global model.
2. Each client in the subset performs a certain number of Stochastic Gradient Descent (SGD) iterations [34], [35] and computes its local gradient.
3. The clients in the subset send their local gradients to the server.
4. The server receives the local gradients and performs the Federated Averaging (FedAvg) algorithm [36] to construct a global gradient, which is used to update the global model.
These steps are iterated until the algorithm converges or the model accuracy meets the requirements.
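To make the round structure concrete, the following is a minimal single-round sketch of steps 1-4; `local_train` stands in for the client-side SGD of step 2, and all names here are illustrative assumptions rather than the paper's notation.

```python
import random

def fl_round(global_model, clients, subset_size, local_train, fedavg):
    # Step 1: the server randomly selects a subset of clients, which then
    # download the global model.
    subset = random.sample(clients, subset_size)
    grads, sizes = [], []
    for client in subset:
        # Step 2: each selected client runs local SGD and computes its gradient.
        g = local_train(global_model, client["dataset"])
        # Step 3: the client uploads its local gradient to the server.
        grads.append(g)
        sizes.append(len(client["dataset"]))
    # Step 4: the server runs FedAvg over the received gradients and updates
    # the global model.
    return fedavg(global_model, grads, sizes)
```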
3.2 Problem Formulation
We consider the typical FL setup with K clients in total. The k-th client, for k = 1, ..., K, owns a local dataset $D_k$, which contains $n_k = |D_k|$ data samples. Denoting the user-defined loss function for sample $\xi$ and model parameter vector $w$ as $f(w, \xi)$, the local objective function $F_k(w)$ on the k-th client can be written as follows:

$$F_k(w) = \frac{1}{n_k} \sum_{\xi \in D_k} f(w, \xi). \tag{1}$$

We consider the following global objective function:

$$F(w) = \sum_{k=1}^{K} p_k F_k(w), \tag{2}$$

where $p_k = n_k / \sum_{k=1}^{K} n_k$ denotes the weight of the dataset on the k-th client. Formally, $\nabla F_k(w^t_{k,i})$ denotes the full local gradient over dataset $D_k$. Assume that at round t the k-th client trains its local model $w^t_{k,i}$ over mini-batch $B^t_{k,i}$ for i iterations of SGD, where $B^t_{k,i}$ is randomly sampled from $D_k$. Then the k-th client computes the local gradient $g_k(w^t_{k,i}, B^t_{k,i})$ by the following formula:

$$g_k(w^t_{k,i}, B^t_{k,i}) = \frac{1}{|B^t_{k,i}|} \sum_{\xi \in B^t_{k,i}} \nabla f(w^t_{k,i}, \xi). \tag{3}$$

The gradient $g_k(w^t_{k,i}, B^t_{k,i})$ is used to update the local model $w^t_{k,i}$ as follows:

$$w^t_{k,i+1} = w^t_{k,i} - \eta^t_i \, g_k(w^t_{k,i}, B^t_{k,i}), \tag{4}$$

where $\eta^t_i$ represents the local learning rate at iteration i of round t, and $\tau$ represents the maximal number of local SGD iterations. After $\tau$ iterations of local updating, the local gradient $g_k(w^t_{k,\tau}, B^t_{k,\tau})$ is sent to the server to construct a global gradient as follows:

$$g^t = \sum_{k \in S^t} p_{k,S^t} \, g_k(w^t_{k,\tau}, B^t_{k,\tau}), \tag{5}$$

where $S^t$ denotes the subset of selected clients and $p_{k,S^t} = n_k / \sum_{k \in S^t} n_k$ is the weight of the dataset on the k-th client of $S^t$. The global model $w^t$ is updated at each round as follows:

$$w^{t+1} = w^t - \eta^t g^t, \tag{6}$$

where $\eta^t$ represents the global learning rate.
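To ground the formulation, here is a minimal NumPy sketch of Eqs. (3)-(6) on a toy least-squares problem; the loss, data shapes, and all function names are illustrative assumptions, not from the paper.

```python
import numpy as np

def local_gradient(w, X, y, eta=0.1, tau=5, batch=32, rng=None):
    """Run tau mini-batch SGD iterations (Eqs. 3-4) on one client and return
    the final local gradient g_k(w_{k,tau}, B_{k,tau})."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = w.copy()
    g = np.zeros_like(w)
    for _ in range(tau):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        Xb, yb = X[idx], y[idx]
        g = Xb.T @ (Xb @ w - yb) / len(Xb)  # Eq. (3) for f = 0.5 * squared error
        w = w - eta * g                      # Eq. (4): local SGD step
    return g

def global_update(w_global, grads, n_samples, eta_global=1.0):
    """Aggregate local gradients weighted by data share (Eq. 5) and take a
    global step (Eq. 6)."""
    p = np.asarray(n_samples) / np.sum(n_samples)  # p_{k,S_t} = n_k / sum_j n_j
    g_global = sum(pk * gk for pk, gk in zip(p, grads))
    return w_global - eta_global * g_global
```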
4 COMMITTEE MECHANISM BASED FEDERATED LEARNING
The typical FL system is vulnerable to Byzantine attacks and malicious servers because it is not serverless and lacks verification of the uploaded local gradients. The proposed serverless FL framework, CMFL, overcomes these two flaws by introducing the committee mechanism. Under this framework, we appoint some of the clients participating in the training process as committee members, which are responsible for filtering the local gradients uploaded by the remaining clients. Therefore, how to select committee members and how to aggregate the appropriate gradients from the clients require careful consideration.
In this work, a reliable election strategy and two opposite selection strategies are designed for convergence guarantees. This section details the proposed framework and its training process; further theoretical analysis of the framework is presented in Section 5.
4.1 Framework of CMFL
In the CMFL framework, shown in Figure 1, the clients are divided into three categories: training clients, committee clients, and idle clients. At each round, the following steps are performed to complete the training process:
• Activate. Some of the idle clients are activated to be training clients, which participate in the training at this round.
• Training. The training clients and the committee clients download the global model for local training, while the idle clients stay idle until the next round. The training clients and the committee clients perform SGD over their local datasets and compute the local gradients. The difference is that the local gradients of the training clients are used to update the global model, while the local gradients of the committee clients are used to verify the gradients uploaded by the training clients.
• Scoring. The training clients send their local gradients to each committee client, and the committee clients assign them a score according to an established scoring system.
• Selection. Only the gradients that qualify under the chosen selection strategy are used to construct the global gradient.
• Aggregation. The clients who meet the selection strategy, called the aggregation clients, are responsible for completing the aggregation procedure.
• Election. An election strategy is designed to complete the replacement of committee members. The training clients who meet the election strategy become the new committee clients.
• Step Down. The prior committee clients become idle clients, waiting for their next participation.
Suppose that C clients are selected as committee clients; the local model of the c-th committee client, for c = 1, ..., C, at round t is expressed as $w^t_{c,\tau}$. Assume that we have a total of K clients, and that m gradients, verified by the C committee members, are accepted at each round. We represent the committee client set, the training client set, and the aggregation client set at round t as $S^t_c$, $S^t_b$, and $S^t_a$, respectively. The relevant symbols are summarized in Table 1.
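Putting the seven steps above together, the following is a minimal sketch of one CMFL round; the `score`, `select`, and `elect` callbacks correspond to the committee mechanism detailed in Section 4.2, and their exact behavior, like all names here, is our own illustrative assumption rather than the paper's algorithm.

```python
def cmfl_round(model, idle, committee, num_train, m, n_committee,
               local_train, score, select, elect, aggregate):
    # Activate: promote some idle clients to training clients for this round.
    training, idle = idle[:num_train], idle[num_train:]
    # Training: training and committee clients both compute local gradients.
    train_grads = {k: local_train(model, k) for k in training}
    comm_grads = [local_train(model, c) for c in committee]
    # Scoring: each committee member scores every uploaded training gradient.
    scores = {k: score(g, comm_grads) for k, g in train_grads.items()}
    # Selection: the m qualifying gradients form the aggregation client set S_a.
    accepted = select(scores, m)
    # Aggregation: construct the global gradient from the accepted gradients.
    model = aggregate(model, [train_grads[k] for k in accepted])
    # Election: training clients meeting the election strategy join the
    # committee; Step Down: the old committee clients become idle again.
    new_committee = elect(scores, n_committee)
    idle = idle + list(committee) + [k for k in training if k not in new_committee]
    return model, idle, new_committee
```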
4.2 Committee Mechanism
A committee system is set up to complete the screening of local gradients. The committee system consists of the scoring system, the election strategy, and the selection strategy.
TABLE 1: Notation

Notation | Description
$K$ | Number of total clients
$T$ | Number of maximal communication rounds
$\tau$ | Number of maximal SGD iterations
$D_k$ | Local dataset
$B^t_{k,i}$ | Mini-batch randomly sampled from $D_k$
$w^t_{k,i}$ | Local model parameter
$w^t$ | Global model parameter
$w^*_k$ | Optimal local model parameter
$w^*$ | Optimal global model parameter
$w^t_{c,i}$ | Local model parameter of committee client
$f(w, \xi)$ | Overall loss function
$F_k(w)$ | Local objective function
$g_k(w^t_{k,i}, B^t_{k,i})$ | Mini-batch local gradient at iteration i
$\nabla F_k(w^t_{k,i})$ | Full local gradient at iteration i
$\hat{g}^t_k$ | Local gradient after $\tau$ iterations
$F^*_k$ | Optimal value of the local objective function
$F^*$ | Optimal value of the global objective function
$S^t_a$ | Aggregation client set
$S^t_b$ | Training client set
$S^t_c$ | Committee client set
$m$ | Number of clients in $S^t_a$
$C$ | Number of clients in $S^t_c$
$P^c_k$ | Score of the k-th client assigned by the c-th client
$P_k$ | Final score of the k-th client
4.2.1 Scoring System
The scoring system is designed to distinguish between honest and malicious gradients: clients who upload honest gradients obtain higher scores, while clients who upload malicious gradients obtain lower scores. Assume that the local gradient of the k-th training client at round t is denoted as $\hat{g}^t_k = g_k(w^t_{k,\tau}, B^t_{k,\tau})$, and the local gradient of the c-th committee client at round t is expressed as $\hat{g}^t_c = g_c(w^t_{c,\tau}, B^t_{c,\tau})$. The score $P^c_k$ of the k-th training client assigned by the c-th committee client is computed as follows:

$$P^c_k = \frac{1}{\|\hat{g}^t_k - \hat{g}^t_c\|^2_2}. \tag{7}$$

Since $\hat{g}^t_k$ and $\hat{g}^t_c$ are local gradients generated by different clients, we assume that $\hat{g}^t_k \neq \hat{g}^t_c$ for any $k \in S^t_b$, $c \in S^t_c$. We define

$$P_k = \frac{1}{\frac{1}{C}\sum_{c=1}^{C} \|\hat{g}^t_k - \hat{g}^t_c\|^2_2} = \frac{C}{\sum_{c=1}^{C} \frac{1}{P^c_k}} \tag{8}$$

as the final score of the k-th training client. The scoring principle is based mainly on the Euclidean distance between the local gradients $\hat{g}^t_k$ and $\hat{g}^t_c$. The key insight is that Byzantine attackers usually replace some local gradients with malicious ones, which directly increases the Euclidean distance between these gradients and the honest gradients. As a result, when the proportion of malicious clients is within a tolerable range, the scores of the malicious training clients are expected to be lower than those of the honest training clients. In a scenario without Byzantine attacks, however, the score reflects the degree of heterogeneity of a client: since $P_k$ is inversely proportional to the mean distance from the committee gradients, a higher score means a lower degree of heterogeneity.
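A minimal NumPy sketch of the scoring rule in Eqs. (7)-(8), together with the two opposite selection rules sketched in Section 1; reading them as top-m / bottom-m selection by final score is our own interpretation, since the formal selection strategies are specified later in the paper.

```python
import numpy as np

def final_score(g_k, committee_grads):
    """Eq. (8): the final score P_k is the inverse of the mean squared
    Euclidean distance between g_k and the committee gradients (Eq. 7 gives
    the per-member score 1 / ||g_k - g_c||^2)."""
    dists = [np.sum((g_k - g_c) ** 2) for g_c in committee_grads]
    return 1.0 / np.mean(dists)

def select_robust(scores, m):
    """Defense-oriented strategy: accept the m gradients most similar to the
    committee's own (highest scores)."""
    return sorted(scores, key=scores.get, reverse=True)[:m]

def select_diverse(scores, m):
    """Accuracy-oriented strategy for the non-attack setting: accept the m
    gradients most different from the committee's own (lowest scores)."""
    return sorted(scores, key=scores.get)[:m]
```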