Improving Performance of Federated Learning
based Medical Image Analysis in Non-IID Settings
using Image Augmentation
Alper Emin Cetinkaya
Information Security Program
Gazi University
Ankara, Turkey
aemin.cetinkaya@gazi.edu.tr
0000-0003-2424-6075
Murat Akin
Gazi AI Center of Gazi University,
Basarsoft Information Systems Inc.
Ankara, Turkey
muratakin@gazi.edu.tr
0000-0003-0001-1036
Seref Sagiroglu
Computer Engineering Dept.
Gazi AI Center, Gazi University
Ankara, Turkey
ss@gazi.edu.tr
0000-0003-0805-5818
Abstract—Federated Learning (FL) is a suitable solution for making use of sensitive data belonging to patients, people, companies, or industries that are obliged to work under rigid privacy constraints. FL fully or partially addresses data privacy and security issues and offers an alternative to centralized training by enabling multiple edge devices or organizations to contribute to the training of a global model on their local data without ever sharing that data. The non-IID data that arises from the distributed nature of FL causes significant performance degradation and training instability. This paper introduces a novel method that addresses the non-IID data problem of FL by dynamically balancing the data distributions of clients through image augmentation. The introduced method remarkably stabilizes model training and improves the model's test accuracy from 83.22% to 89.43% for the detection of multiple chest diseases from chest X-ray images in a highly non-IID FL setting. The results of federated training under IID, non-IID, and non-IID-with-the-proposed-method settings demonstrate that the method may encourage organizations and researchers to develop better systems that extract value from data while respecting data privacy, not only in healthcare but also in other fields.
Keywords—Federated Learning, Deep Learning, Medical
Image Analysis, Chest X-Ray Image, Privacy, Non-IID Data.
I. INTRODUCTION
Using deep learning (DL) for medical image analysis, such as detecting COVID-19 from chest X-ray (CXR) images without any dedicated test kits, is a low-cost and accurate alternative to laboratory-based testing. Recent advances in DL provide promising results for medical image analysis and large-scale diagnostics. Moreover, the ease of access, scalability, and rapid diagnosis of DL-based systems are major advantages over human-based diagnosis. However, DL methods require large numbers of samples to achieve competitive results, since the performance of DL algorithms is strongly affected by the volume and diversity of the data.
The approach of centrally training a DL model to leverage health data carries the risk of violating patient privacy, a risk that grows with the increasing concerns over data privacy. Moreover, medical institutions are usually unlikely to share their local data, owing to ownership concerns and strict data-privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR). Even if the collected data is adequately protected against malicious actors, there remains a high probability that unforeseen circumstances will result in violations of individuals' privacy. Healthcare data, including personal identity, behavior, biometrics, biomedical images, genomic data, and patients' medical histories, has become one of the primary targets of attackers, and healthcare is the sector most exposed to cyber-attacks. According to a recent report published by HIPAA [1], the healthcare records of more than 5 million people were breached across 38 incidents in August 2021, bringing the total number of incidents between September 2020 and August 2021 to 707. A breach of health data has a lifelong impact, unlike other personal data breaches, since it may include information such as genomic data that cannot be altered afterwards. Data holders, who are obliged to ensure the security and privacy of the data they keep, face serious economic and legal consequences in such cases. Hence, one of the primary challenges in developing data-driven intelligent applications for healthcare is to preserve privacy and to secure shared data against any kind of cyber threat or attack.
A naive solution for leveraging high-volume, diverse data across multiple organizations is to alter the data before collecting it in a central place, either by removing or anonymizing personal data so that no private information about individuals can be inferred. Unfortunately, re-identification of anonymized or redacted information is still possible using advanced attacks [2] such as linkage attacks. Furthermore, such privacy-preserving methods entail a trade-off between data utility and privacy: the utility of the data decreases as more privacy is required [3].
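As a toy illustration of such a linkage attack (all records, names, and field values below are fabricated for the example, and the field names are hypothetical), an adversary can join an "anonymized" release with public auxiliary data on the remaining quasi-identifiers:

```python
# "Anonymized" hospital records: names removed, but quasi-identifiers
# (ZIP code, birth year, sex) remain in the release.
anonymized = [
    {"zip": "06500", "birth_year": 1984, "sex": "F", "diagnosis": "pneumonia"},
    {"zip": "06510", "birth_year": 1990, "sex": "M", "diagnosis": "covid-19"},
]

# Public auxiliary data (e.g. a voter roll) sharing the same quasi-identifiers.
voter_roll = [
    {"name": "Ayse Yilmaz", "zip": "06500", "birth_year": 1984, "sex": "F"},
    {"name": "Mehmet Demir", "zip": "06510", "birth_year": 1990, "sex": "M"},
]

def linkage_attack(records, auxiliary, keys=("zip", "birth_year", "sex")):
    """Re-identify 'anonymized' records by joining them with auxiliary
    data on the quasi-identifier columns."""
    index = {tuple(p[k] for k in keys): p["name"] for p in auxiliary}
    return {
        index[tuple(r[k] for k in keys)]: r["diagnosis"]
        for r in records
        if tuple(r[k] for k in keys) in index
    }

reidentified = linkage_attack(anonymized, voter_roll)
# Both patients are uniquely re-identified despite the removed names.
```

Removing direct identifiers is therefore insufficient whenever the retained attributes are unique enough to act as a fingerprint, which motivates the federated alternative discussed next.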
An alternative to data anonymization is to train a global model with a recent approach called Federated Learning (FL), introduced by Google in 2016 [4]. In contrast to the conventional strategy of training a model centrally, FL enables collaborative training of a global model across multiple agents without gathering the data in a central place. Instead, the model training phase is decentralized, and training is performed on the devices where the data is produced. Since the data never leaves its origin, concerns about privacy risks and legal regulations no longer prevent leveraging high-volume, diverse data. FL also reduces the cost of
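The decentralized training scheme described above is typically realized with federated averaging (FedAvg [4]): in each round, clients train locally and the server aggregates their model weights, weighting each client by its local dataset size. A minimal sketch of the aggregation step (the toy weight tensors and client sizes are illustrative, not the paper's actual model):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate per-client model weights into a global model.
    Each client's contribution is weighted by its local dataset size."""
    total = sum(client_sizes)
    return [
        sum(n / total * w[i] for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Toy round: two clients, each holding one weight matrix and one bias vector.
w_a = [np.ones((2, 2)), np.zeros(2)]
w_b = [3 * np.ones((2, 2)), np.ones(2)]
global_w = fedavg([w_a, w_b], client_sizes=[100, 300])
# Client B holds 3/4 of the samples, so the average is pulled toward it.
```

Because the weighting depends on local dataset sizes and label mixes, skewed (non-IID) client distributions bias this average, which is the failure mode the paper's augmentation-based balancing targets.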
2-3 DECEMBER 2021, 14TH INTERNATIONAL CONFERENCE ON INFORMATION SECURITY AND CRYPTOLOGY, ANKARA-TURKEY
978-1-6654-0776-2/21/$31.00 ©2021 IEEE 69
2021 International Conference on Information Security and Cryptology (ISCTURKEY) | 978-1-6654-0776-2/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISCTURKEY53027.2021.9654356