A Decentralized Federated Learning Framework
via Committee Mechanism with Convergence
Guarantee
Chunjiang Che, Xiaoli Li, Student Member, IEEE,
Chuan Chen, Member, IEEE, Xiaoyu He, and Zibin Zheng, Senior Member, IEEE
Chunjiang Che, Xiaoli Li, Chuan Chen, Xiaoyu He and Zibin Zheng are with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China. E-mail: {chechj, lixli27}@mail2.sysu.edu.cn, {chenchuan, hexy73, zhzibin}@mail.sysu.edu.cn. Corresponding author: Chuan Chen.
Abstract—Federated learning allows multiple participants to collaboratively train an efficient model without exposing data privacy. However, this distributed machine learning training method is prone to attacks from Byzantine clients, which interfere with the training of the global model by modifying the model or uploading false gradients. In this paper, we propose a novel serverless federated learning framework, Committee Mechanism based Federated Learning (CMFL), which ensures the robustness of the algorithm with a convergence guarantee. In CMFL, a committee system is set up to screen the uploaded local gradients. The committee system selects the local gradients rated by the elected members for the aggregation procedure through a selection strategy, and replaces committee members through an election strategy. Based on different considerations of model performance and defense, two opposite selection strategies are designed for accuracy and robustness, respectively. Extensive experiments illustrate that CMFL achieves faster convergence and better accuracy than typical federated learning while obtaining better robustness than traditional Byzantine-tolerant algorithms, all in a decentralized manner. In addition, we theoretically analyze and prove the convergence of CMFL under different election and selection strategies, which coincides with the experimental results.
Index Terms—Decentralized Federated Learning, Committee Mechanism, Byzantine Robustness, Theoretical Convergence Analysis.
1 INTRODUCTION
Nowadays, data arise from a wide range of sources, such as mobile devices and commercial, industrial, and medical activities. These data are further used for training the Artificial Intelligence (AI) models applied in a variety of fields. Conventional AI methods typically require uploading the source data to a central server. However, this is usually impractical due to data privacy or commercial competition. Federated Learning (FL) [1], which allows multiple devices to train a shared global model without uploading their local source data to a central server, is an effective way to solve this problem. In FL settings, multiple clients (also known as participants) are responsible for model training and uploading the local gradients, while the central server is responsible for model aggregation. A single round of FL mainly follows four steps: (1) the clients download the global model from the server and train their local models on their local datasets; (2) the clients upload their local gradients to the server, which aggregates the received local gradients to construct the global gradient; (3) the server uses the global gradient to update the global model; (4) the clients download the updated global model to begin the next training round. These operations are repeated until the algorithm converges.
FL is appealing because it can perform model training without uploading source data; McMahan et al. [2] showed that FL can achieve a test accuracy similar to that of centralized training on the full dataset while providing stronger privacy guarantees. However, Lyu et al. [3] pointed out that conventional FL is vulnerable to malicious attacks from both Byzantine clients and the central server. For example, Byzantine clients upload false gradients to degrade the performance of the global model, which may lead to training failure [4], [5]. Hu et al. [6] showed that external attacks on the central server can cause the entire learning process to terminate. In recent years, many works have addressed the security problem of FL. Some works aim to design a robust aggregation rule to reduce the negative impact of malicious gradients [7]. For example, Blanchard et al. [8] proposed a Byzantine-tolerant algorithm, Krum, which can tolerate Byzantine workers via an aggregation rule with a resilience property. Similarly, Yin et al. [9] proposed two robust distributed gradient descent algorithms based on coordinate-wise median and trimmed mean operations for Byzantine-robust distributed learning. Chen et al. [10] proposed a simple variant of the classical gradient descent method based on the geometric median of means of the gradients, which can tolerate Byzantine failures. Other works detect Byzantine attackers and remove the malicious local gradients from aggregation by clustering [11], [12]. Different from the aforementioned works, Li et al. [13] proposed a framework that trains a detection model on the server to detect and remove malicious local gradients. Although these methods can guarantee convergence and Byzantine tolerance, they cannot provide an effective defense in the presence of malicious servers [14].
Furthermore, typical FL requires a central server to complete the gradient aggregation procedure, yet it is difficult to find a fully trusted server in real scenarios. Beyond that, the entire FL system will be paralyzed
if the server suffers a malicious attack. Therefore, many works are devoted to designing serverless FL frameworks to reduce the risk of a single point of failure [15]. Some existing works design serverless FL frameworks by borrowing from network protocols such as P2P [16], [17], [18], [19] and Gossip [20], [21], [22]. These approaches treat clients as network nodes, which communicate with each other according to an adapted network protocol and complete the local training and the aggregation of the global model. Other approaches employ blockchain technology to take over the work of the server and achieve a serverless FL framework [23], [24], [25], [26], [27], [28]. They treat clients as blockchain nodes, record the uploaded local gradients on the blocks, and then let the node that completes the PoW (Proof of Work) lead the aggregation procedure. However, few works present a theoretical convergence analysis of serverless FL, leaving its performance without guarantees.
In this paper, we comprehensively consider Byzantine attacks from both clients and the central server, and design a serverless Federated Learning framework under a Committee Mechanism (CMFL), in which some participating clients are appointed as committee members. The committee members are responsible for monitoring the entire FL training process and ensuring the dependable aggregation of the local gradients. CMFL consists of three components: the scoring system, the election strategy, and the selection strategy. The scoring system distinguishes different clients. The election strategy is responsible for selecting clients who can represent the majority of clients to become members of the new committee. Based on different considerations, we propose two opposite selection strategies by which the committee members verify the local gradients uploaded by clients. One selection strategy is designed to ensure the robustness of the training process: the committee members accept local gradients that are similar to their own local gradients and reject those that differ significantly from their own. The other selection strategy is designed to accelerate the convergence of the global model in a non-attack scenario: the committee members accept local gradients that differ from their own local gradients and reject those that are similar to their own. Theoretical analysis of the convergence of the model is further presented as a performance guarantee, which illustrates the impact of the proposed election and selection strategies. Extensive experiments further demonstrate that CMFL outperforms both typical and Byzantine-tolerant FL models, coinciding with the theoretical analysis of the efficiency of the proposed election and selection strategies. In summary, we highlight our contributions as follows:
• We propose a serverless FL framework, CMFL, with the ability to monitor the gradient aggregation procedure and prevent both malicious clients and a malicious server from hindering the training of the global model.
• We propose an election strategy and two selection strategies suitable for different scenarios, which can ensure the robustness of the algorithm or accelerate the training process.
• We provide a proof and analysis of the convergence of the proposed serverless FL framework as a theoretical guarantee, which considers the influence of the election and selection strategies on the performance of the global model.
• We conduct extensive experiments showing that CMFL has a faster model convergence rate and better model performance than typical and Byzantine-tolerant FL models.
The remainder of this paper is organized as follows.
Section 2 surveys related work. Section 3 briefly outlines the
background and problem formulation of FL. Section 4 introduces our proposed framework. We show the convergence
analysis in Section 5. Experimental results and analysis are
summarized in Section 6. We conclude the paper in Section
7. Finally, Appendix A gives the convergence proof.
2 RELATED WORK
Konečný et al. [29] introduced a new distributed optimization setting for machine learning in 2015. Based on this setting, the concept of FL was proposed [1], which aims to train an efficient centralized model in a scenario where the training data is distributed across a large number of clients. However, these frameworks suffer from heterogeneous clients, failing to achieve satisfactory performance on Non-IID datasets. To handle such issues, several works extend FL to the Non-IID setting. Li et al. [30] present theoretical guarantees under Non-IID settings and analyze the convergence of FedAvg. Li et al. [31] propose a framework, FedProx, with convergence guarantees to tackle heterogeneity on Non-IID datasets. Yue et al. [32] propose a strategy to mitigate the negative impact of Non-IID data by sharing a small subset of data among all the clients. Briggs et al. [33] improve the performance of FL on Non-IID datasets by introducing a hierarchical clustering step to separate clusters of clients. Different from the aforementioned works, the proposed framework provides a new perspective for enhancing performance in the Non-IID setting.
The attractiveness of federated learning lies in training a centralized model on user equipment without uploading user data. However, such a framework is vulnerable to Byzantine attackers due to the lack of identity authentication for local gradients. Many works have designed Byzantine-tolerant algorithms to ensure the robustness of the training process. For example, Blanchard et al. [8] proposed Krum, which selects the global model update based on the Euclidean distances between the local models. Yin et al. [9] proposed Median and Trimmed Mean, which remove extreme local gradients to ensure the robustness of the algorithm. The Median method constructs a global gradient where each element is the median of the same-coordinate elements across the local gradients, while the Trimmed Mean method removes the largest and smallest fraction of elements in the local gradients and then performs a weighted average on the remaining local gradients to construct the global gradient.
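As a concrete illustration, here is a minimal NumPy sketch of the two rules as described above; the function names and the trim fraction `beta` are our own assumptions, and the trimmed mean below uses a plain (unweighted) mean over the surviving elements rather than a weighted one.

```python
import numpy as np

def median_aggregate(local_grads):
    """Coordinate-wise median: each element of the global gradient is the
    median of the same-coordinate elements across all local gradients."""
    return np.median(np.stack(local_grads), axis=0)

def trimmed_mean_aggregate(local_grads, beta=0.1):
    """Coordinate-wise trimmed mean: per coordinate, drop the largest and
    smallest beta fraction of values, then average what remains."""
    G = np.sort(np.stack(local_grads), axis=0)  # sort each coordinate across clients
    k = int(beta * G.shape[0])                  # number of values trimmed at each end
    return G[k:G.shape[0] - k].mean(axis=0)
```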
Muñoz-González et al. [7] proposed a Hidden Markov Model to learn the quality of local gradients and designed a robust aggregation rule, which discards bad local gradients in the aggregation procedure. The aforementioned algorithms
all try to ensure the robustness of FL by designing a more appropriate aggregation mechanism. However, these works cannot provide an effective defense in the presence of malicious servers.
In real applications, commercial competition makes it difficult to find a fully trusted central server among the participants. In addition, server errors or a malicious server will cause irreparable damage to the FL system. To this end, many serverless FL frameworks have been proposed to solve these problems. Some of them achieve a serverless FL framework by imitating existing network protocols. For example, Abhijit et al. [16] proposed a P2P serverless FL framework, where any two clients exchange information end-to-end and update their local models at each epoch. Hu et al. [20] used the Gossip protocol to complete the model aggregation process, taking over the role of the central server. Besides, other works build a serverless FL framework on top of a blockchain system. For example, Kim et al. [23] proposed a blockchain FL architecture in which the blockchain nodes are divided into devices and miners. The device nodes provide data, train the model locally, and upload the local gradients to their associated miner in the blockchain network; the miner nodes exchange and verify all the local gradients. Although these works achieve reasonable performance, they lack a theoretical analysis of model convergence under a serverless FL framework and do not consider Byzantine attacks from clients.
To deal with the limitations of the existing works, we propose a serverless FL framework under a committee mechanism. In scenarios with Byzantine attacks, the framework is efficiently Byzantine-tolerant against malicious clients and servers. In scenarios without Byzantine attacks, the framework achieves better global model performance on Non-IID datasets. Beyond that, we present a theoretical analysis of the convergence under the framework.
3 BACKGROUND
3.1 Federated Learning
A typical FL framework consists of a central server and multiple clients. The server maintains a global model, and each client maintains a local model. At the beginning of training, the global model and all local models are initialized randomly. Then the following steps are performed at each communication round [1]:
1. The server randomly selects a subset of clients, which then download the global model.
2. Each client in the subset performs a certain number of Stochastic Gradient Descent (SGD) iterations [34], [35] and computes its local gradient.
3. The clients in the subset send their local gradients to the server.
4. The server receives the local gradients and performs the Federated Averaging (FedAvg) algorithm [36] to construct a global gradient, which is used to update the global model.
These steps are iterated until the algorithm converges or the model accuracy meets the requirements.
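To make the round structure concrete, the following is a minimal single-round sketch of steps 1-4; `local_train` stands in for the client-side SGD of step 2, and all names here are illustrative assumptions rather than the paper's notation.

```python
import random

def fl_round(global_model, clients, subset_size, local_train, fedavg):
    # Step 1: the server randomly selects a subset of clients, which then
    # download the global model.
    subset = random.sample(clients, subset_size)
    grads, sizes = [], []
    for client in subset:
        # Step 2: each selected client runs local SGD and computes its gradient.
        g = local_train(global_model, client["dataset"])
        # Step 3: the client uploads its local gradient to the server.
        grads.append(g)
        sizes.append(len(client["dataset"]))
    # Step 4: the server runs FedAvg over the received gradients and updates
    # the global model.
    return fedavg(global_model, grads, sizes)
```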
3.2 Problem Formulation
We consider the typical FL setup with K clients in total. The k-th client, for k = 1, ..., K, owns a local dataset $D_k$, which contains $n_k = |D_k|$ data samples. Denoting the user-defined loss function for sample $\xi$ and model parameter vector $w$ as $f(w, \xi)$, the local objective function $F_k(w)$ on the k-th client can be written as follows:

$$F_k(w) = \frac{1}{n_k} \sum_{\xi \in D_k} f(w, \xi). \tag{1}$$

We consider the following global objective function:

$$F(w) = \sum_{k=1}^{K} p_k F_k(w), \tag{2}$$

where $p_k = n_k / \sum_{k=1}^{K} n_k$ denotes the weight of the dataset on the k-th client. Formally, $\nabla F_k(w^t_{k,i})$ denotes the full local gradient over dataset $D_k$. Assume that at round t the k-th client trains its local model $w^t_{k,i}$ over mini-batch $B^t_{k,i}$ for i iterations of SGD, where $B^t_{k,i}$ is randomly sampled from $D_k$. Then the k-th client computes the local gradient $g_k(w^t_{k,i}, B^t_{k,i})$ by the following formula:

$$g_k(w^t_{k,i}, B^t_{k,i}) = \frac{1}{|B^t_{k,i}|} \sum_{\xi \in B^t_{k,i}} \nabla f(w^t_{k,i}, \xi). \tag{3}$$

The gradient $g_k(w^t_{k,i}, B^t_{k,i})$ is used to update the local model $w^t_{k,i}$ as follows:

$$w^t_{k,i+1} = w^t_{k,i} - \eta^t_i \, g_k(w^t_{k,i}, B^t_{k,i}), \tag{4}$$

where $\eta^t_i$ represents the local learning rate at iteration i of round t, and $\tau$ represents the maximal number of local SGD iterations. After $\tau$ iterations of local updating, the local gradient $g_k(w^t_{k,\tau}, B^t_{k,\tau})$ is sent to the server to construct a global gradient as follows:

$$g^t = \sum_{k \in S^t} p_{k,S^t} \, g_k(w^t_{k,\tau}, B^t_{k,\tau}), \tag{5}$$

where $S^t$ denotes the subset of selected clients and $p_{k,S^t} = n_k / \sum_{k \in S^t} n_k$ is the weight of the dataset on the k-th client of $S^t$. The global model $w^t$ is updated at each round as follows:

$$w^{t+1} = w^t - \eta^t g^t, \tag{6}$$

where $\eta^t$ represents the global learning rate.
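To ground the formulation, here is a minimal NumPy sketch of Eqs. (3)-(6) on a toy least-squares problem; the loss, data shapes, and all function names are illustrative assumptions, not from the paper.

```python
import numpy as np

def local_gradient(w, X, y, eta=0.1, tau=5, batch=32, rng=None):
    """Run tau mini-batch SGD iterations (Eqs. 3-4) on one client and return
    the final local gradient g_k(w_{k,tau}, B_{k,tau})."""
    rng = rng if rng is not None else np.random.default_rng(0)
    w = w.copy()
    g = np.zeros_like(w)
    for _ in range(tau):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        Xb, yb = X[idx], y[idx]
        g = Xb.T @ (Xb @ w - yb) / len(Xb)  # Eq. (3) for f = 0.5 * squared error
        w = w - eta * g                      # Eq. (4): local SGD step
    return g

def global_update(w_global, grads, n_samples, eta_global=1.0):
    """Aggregate local gradients weighted by data share (Eq. 5) and take a
    global step (Eq. 6)."""
    p = np.asarray(n_samples) / np.sum(n_samples)  # p_{k,S_t} = n_k / sum_j n_j
    g_global = sum(pk * gk for pk, gk in zip(p, grads))
    return w_global - eta_global * g_global
```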
4 COMMITTEE MECHANISM BASED FEDERATED LEARNING
The typical FL system is vulnerable to Byzantine attacks and malicious servers because it is not serverless and lacks verification of the uploaded local gradients. The proposed serverless FL framework, CMFL, overcomes these two flaws by introducing the committee mechanism. Under this framework, we appoint some of the clients participating in the training process as committee members, which are responsible for filtering the local gradients uploaded by the remaining clients. Therefore, how to select committee members and how to aggregate the appropriate gradients from the clients require careful consideration.
In this work, a reliable election strategy and two opposite selection strategies are designed for convergence guarantees. This section details the proposed framework and its training process; further theoretical analysis of the framework is presented in Section 5.
4.1 Framework of CMFL
In the CMFL framework, shown in Figure 1, the clients are divided into three categories: training clients, committee clients, and idle clients. At each round, the following steps are performed to complete the training process:
• Activate. Some of the idle clients are activated to be training clients, which participate in the training at this round.
• Training. The training clients and the committee clients download the global model for local training, while the idle clients stay idle until the next round. The training clients and the committee clients perform SGD over their local datasets and compute the local gradients. The difference is that the local gradients of the training clients are used to update the global model, while the local gradients of the committee clients are used to verify the gradients uploaded by the training clients.
• Scoring. The training clients send their local gradients to each committee client, and the committee clients assign them a score according to an established scoring system.
• Selection. Only the gradients that qualify under the chosen selection strategy are used to construct the global gradient.
• Aggregation. The clients who meet the selection strategy, called the aggregation clients, are responsible for completing the aggregation procedure.
• Election. An election strategy is designed to complete the replacement of committee members. The training clients who meet the election strategy become the new committee clients.
• Step Down. The prior committee clients become idle clients, waiting for their next participation.
Suppose that C clients are selected as committee clients; the local model of the c-th committee client, for c = 1, ..., C, at round t is expressed as $w^t_{c,\tau}$. Assume that we have a total of K clients, and that m gradients, verified by the C committee members, are accepted at each round. We represent the committee client set, the training client set, and the aggregation client set at round t as $S^t_c$, $S^t_b$, and $S^t_a$, respectively. The relevant symbols are summarized in Table 1.
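Putting the seven steps above together, the following is a minimal sketch of one CMFL round; the `score`, `select`, and `elect` callbacks correspond to the committee mechanism detailed in Section 4.2, and their exact behavior, like all names here, is our own illustrative assumption rather than the paper's algorithm.

```python
def cmfl_round(model, idle, committee, num_train, m, n_committee,
               local_train, score, select, elect, aggregate):
    # Activate: promote some idle clients to training clients for this round.
    training, idle = idle[:num_train], idle[num_train:]
    # Training: training and committee clients both compute local gradients.
    train_grads = {k: local_train(model, k) for k in training}
    comm_grads = [local_train(model, c) for c in committee]
    # Scoring: each committee member scores every uploaded training gradient.
    scores = {k: score(g, comm_grads) for k, g in train_grads.items()}
    # Selection: the m qualifying gradients form the aggregation client set S_a.
    accepted = select(scores, m)
    # Aggregation: construct the global gradient from the accepted gradients.
    model = aggregate(model, [train_grads[k] for k in accepted])
    # Election: training clients meeting the election strategy join the
    # committee; Step Down: the old committee clients become idle again.
    new_committee = elect(scores, n_committee)
    idle = idle + list(committee) + [k for k in training if k not in new_committee]
    return model, idle, new_committee
```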
4.2 Committee Mechanism
A committee system is set up to complete the screening of local gradients. The committee system consists of the scoring system, the election strategy, and the selection strategy.
TABLE 1: Notation

Notation | Description
$K$ | Number of total clients
$T$ | Number of maximal communication rounds
$\tau$ | Number of maximal SGD iterations
$D_k$ | Local dataset
$B^t_{k,i}$ | Mini-batch randomly sampled from $D_k$
$w^t_{k,i}$ | Local model parameter
$w^t$ | Global model parameter
$w^*_k$ | Optimal local model parameter
$w^*$ | Optimal global model parameter
$w^t_{c,i}$ | Local model parameter of committee client
$f(w, \xi)$ | Overall loss function
$F_k(w)$ | Local objective function
$g_k(w^t_{k,i}, B^t_{k,i})$ | Mini-batch local gradient at iteration i
$\nabla F_k(w^t_{k,i})$ | Full local gradient at iteration i
$\hat{g}^t_k$ | Local gradient after $\tau$ iterations
$F^*_k$ | Optimal value of the local objective function
$F^*$ | Optimal value of the global objective function
$S^t_a$ | Aggregation client set
$S^t_b$ | Training client set
$S^t_c$ | Committee client set
$m$ | Number of clients in $S^t_a$
$C$ | Number of clients in $S^t_c$
$P^c_k$ | Score of the k-th client assigned by the c-th client
$P_k$ | Final score of the k-th client
4.2.1 Scoring System
The scoring system is designed to distinguish between honest and malicious gradients: clients who upload honest gradients obtain higher scores, while clients who upload malicious gradients obtain lower scores. Assume that the local gradient of the k-th training client at round t is denoted as $\hat{g}^t_k = g_k(w^t_{k,\tau}, B^t_{k,\tau})$, and the local gradient of the c-th committee client at round t is expressed as $\hat{g}^t_c = g_c(w^t_{c,\tau}, B^t_{c,\tau})$. The score $P^c_k$ of the k-th training client assigned by the c-th committee client is computed as follows:

$$P^c_k = \frac{1}{\|\hat{g}^t_k - \hat{g}^t_c\|^2_2}. \tag{7}$$

Since $\hat{g}^t_k$ and $\hat{g}^t_c$ are local gradients generated by different clients, we assume that $\hat{g}^t_k \neq \hat{g}^t_c$ for any $k \in S^t_b$, $c \in S^t_c$. We define

$$P_k = \frac{1}{\frac{1}{C}\sum_{c=1}^{C} \|\hat{g}^t_k - \hat{g}^t_c\|^2_2} = \frac{C}{\sum_{c=1}^{C} \frac{1}{P^c_k}} \tag{8}$$

as the final score of the k-th training client. The scoring principle is based mainly on the Euclidean distance between the local gradients $\hat{g}^t_k$ and $\hat{g}^t_c$. The key insight is that Byzantine attackers usually replace some local gradients with malicious ones, which directly increases the Euclidean distance between these gradients and the honest gradients. As a result, when the proportion of malicious clients is within a tolerable range, the scores of the malicious training clients are expected to be lower than those of the honest training clients. In a scenario without Byzantine attacks, however, the score reflects the degree of heterogeneity of a client: since $P_k$ is inversely proportional to the mean distance from the committee gradients, a higher score means a lower degree of heterogeneity.
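A minimal NumPy sketch of the scoring rule in Eqs. (7)-(8), together with the two opposite selection rules sketched in Section 1; reading them as top-m / bottom-m selection by final score is our own interpretation, since the formal selection strategies are specified later in the paper.

```python
import numpy as np

def final_score(g_k, committee_grads):
    """Eq. (8): the final score P_k is the inverse of the mean squared
    Euclidean distance between g_k and the committee gradients (Eq. 7 gives
    the per-member score 1 / ||g_k - g_c||^2)."""
    dists = [np.sum((g_k - g_c) ** 2) for g_c in committee_grads]
    return 1.0 / np.mean(dists)

def select_robust(scores, m):
    """Defense-oriented strategy: accept the m gradients most similar to the
    committee's own (highest scores)."""
    return sorted(scores, key=scores.get, reverse=True)[:m]

def select_diverse(scores, m):
    """Accuracy-oriented strategy for the non-attack setting: accept the m
    gradients most different from the committee's own (lowest scores)."""
    return sorted(scores, key=scores.get)[:m]
```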