# Personalized Federated Learning Platform
[![DOI](https://zenodo.org/badge/292225878.svg)](https://zenodo.org/badge/latestdoi/292225878)
***We provide this user-friendly platform for beginners who intend to start studying federated learning (FL).***
***This platform currently contains 22 FL (or pFL) methods, three scenarios, and 14 datasets.***
Due to frequent updates, please download the **master branch** for the latest version.
The **statistical heterogeneity** phenomenon originates from the personalization of users, who generate non-IID (not Independent and Identically Distributed) and unbalanced data. With statistical heterogeneity present in the FL scenario, a myriad of approaches have been proposed to tackle this hard problem. In contrast, personalized FL (pFL) can take advantage of the statistically heterogeneous data to learn a personalized model for each user.
Thanks to [@Stonesjtu](https://github.com/Stonesjtu/pytorch_memlab/blob/d590c489236ee25d157ff60ecd18433e8f9acbe3/pytorch_memlab/mem_reporter.py#L185), this platform can also record the **GPU memory usage** of the model. By using the [opacus](https://opacus.ai/) package, we introduce **DP (differential privacy)** into this platform (please refer to `./system/flcore/clients/clientavg.py` for an example). Following [FedCG](https://www.ijcai.org/proceedings/2022/0324), we also introduce the **[DLG (Deep Leakage from Gradients)](https://papers.nips.cc/paper_files/paper/2019/hash/60a6c4002cc7b29142def8871531281a-Abstract.html) attack** and the **PSNR (Peak Signal-to-Noise Ratio) metric** to evaluate the privacy-preserving ability of FL/pFL methods (please refer to `./system/flcore/servers/serveravg.py` for an example).
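As a quick illustration, the sketch below shows one way a local training loop could be wrapped with DP-SGD via opacus, together with a PSNR helper for judging DLG-style reconstructions. It is a minimal sketch, not the platform's actual client/server code: the toy model, data loader, and hyperparameter values (`noise_multiplier`, `max_grad_norm`) are placeholders.

```python
import torch
import torch.nn as nn
from opacus import PrivacyEngine

# Toy model and loader as placeholders for a client's real model and local data.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 1, 28, 28),
                                   torch.randint(0, 10, (64,))),
    batch_size=16,
)

# Wrap model/optimizer/loader so every local step clips per-sample gradients and adds noise.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # assumed noise scale
    max_grad_norm=1.0,     # assumed per-sample gradient clipping bound
)

loss_fn = nn.CrossEntropyLoss()
for x, y in data_loader:
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

# PSNR between an original image and its DLG reconstruction; a higher PSNR means a more
# faithful reconstruction, i.e. weaker privacy protection. Assumes pixels scaled to [0, 1].
def psnr(original: torch.Tensor, reconstructed: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((original - reconstructed) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))
```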
## Methods with Code (updating)
> ### Traditional FL
- **FedAvg** — [Communication-Efficient Learning of Deep Networks from Decentralized Data](http://proceedings.mlr.press/v54/mcmahan17a.html) *AISTATS 2017*
- **SCAFFOLD** — [SCAFFOLD: Stochastic Controlled Averaging for Federated Learning](http://proceedings.mlr.press/v119/karimireddy20a.html) *ICML 2020*
***Regularization-based FL***
- **FedProx** — [Federated Optimization in Heterogeneous Networks](https://proceedings.mlsys.org/paper/2020/hash/38af86134b65d0f10fe33d30dd76442e-Abstract.html) *MLsys 2020*
- **FedDyn** — [Federated Learning Based on Dynamic Regularization](https://openreview.net/forum?id=B7v4QMR6Z9w) *ICLR 2021*
***Feature-extraction-based FL***
- **MOON** — [Model-Contrastive Federated Learning](https://openaccess.thecvf.com/content/CVPR2021/html/Li_Model-Contrastive_Federated_Learning_CVPR_2021_paper.html) *CVPR 2021*
***Knowledge-distillation-based FL***
- **FedGen** — [Data-Free Knowledge Distillation for Heterogeneous Federated Learning](http://proceedings.mlr.press/v139/zhu21b.html) *ICML 2021*
> ### Personalized FL
- **FedMTL (not MOCHA)** — [Federated Multi-Task Learning](https://papers.nips.cc/paper/2017/hash/6211080fa89981f66b1a0c9d55c61d0f-Abstract.html) *NeurIPS 2017*
- **FedBN** — [FedBN: Federated Learning on non-IID Features via Local Batch Normalization](https://openreview.net/forum?id=6YEQUn0QICG) *ICLR 2021*
***Meta-learning-based pFL***
- **Per-FedAvg** — [Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach](https://proceedings.neurips.cc/paper/2020/hash/24389bfe4fe2eba8bf9aa9203a44cdad-Abstract.html) *NeurIPS 2020*
***Regularization-based pFL***
- **pFedMe** — [Personalized Federated Learning with Moreau Envelopes](https://papers.nips.cc/paper/2020/hash/f4f1f13c8289ac1b1ee0ff176b56fc60-Abstract.html) *NeurIPS 2020*
- **Ditto** — [Ditto: Fair and robust federated learning through personalization](https://proceedings.mlr.press/v139/li21h.html) *ICML 2021*
***Personalized-aggregation-based pFL***
- **APFL** — [Adaptive Personalized Federated Learning](https://arxiv.org/abs/2003.13461) *2020*
- **FedFomo** — [Personalized Federated Learning with First Order Model Optimization](https://openreview.net/forum?id=ehJqJQk9cw) *ICLR 2021*
- **FedAMP** — [Personalized Cross-Silo Federated Learning on non-IID Data](https://ojs.aaai.org/index.php/AAAI/article/view/16960) *AAAI 2021*
- **FedPHP** — [FedPHP: Federated Personalization with Inherited Private Models](https://link.springer.com/chapter/10.1007/978-3-030-86486-6_36) *ECML PKDD 2021*
- **APPLE** — [Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning](https://www.ijcai.org/proceedings/2022/301) *IJCAI 2022*
***Feature-extraction-based pFL***
- **FedPer** — [Federated Learning with Personalization Layers](https://arxiv.org/abs/1912.00818) *2019*
- **FedRep** — [Exploiting Shared Representations for Personalized Federated Learning](http://proceedings.mlr.press/v139/collins21a.html) *ICML 2021*
- **FedRoD** — [On Bridging Generic and Personalized Federated Learning for Image Classification](https://openreview.net/forum?id=I1hQbx10Kxn) *ICLR 2022*
- **FedBABU** — [FedBABU: Towards Enhanced Representation for Federated Image Classification](https://openreview.net/forum?id=HuaYQfggn5u) *ICLR 2022*
***Knowledge-distillation-based pFL***
- **FedDistill** — [Federated Knowledge Distillation](https://www.cambridge.org/core/books/abs/machine-learning-and-wireless-communications/federated-knowledge-distillation/F679266F85493319EB83635D2B17C2BD#access-block) *2020*
- **FedProto** — [FedProto: Federated Prototype Learning across Heterogeneous Clients](https://ojs.aaai.org/index.php/AAAI/article/view/20819) *AAAI 2022*
## Datasets and Separation (updating)
For the ***label skew*** scenario, we introduce **8** famous datasets: **MNIST**, **Fashion-MNIST**, **Cifar10**, **Cifar100**, **AG_News**, **Sogou_News** (if a ConnectionError is raised, please use the downloaded file provided in `./dataset`), and **Tiny-ImageNet** (fetch the raw data from [this site](http://cs231n.stanford.edu/tiny-imagenet-200.zip)); they can easily be split into **IID** and **non-IID** versions. Since some of the dataset-generation code, such as the splitting logic, is shared by all datasets, we move it into `./dataset/utils/dataset_utils.py`. In the **non-IID** setting, two situations exist: the **pathological non-IID** scenario and the **practical non-IID** scenario. In the **pathological non-IID** scenario, the data on each client contains only a specific number of labels (perhaps only two), even though the data across all clients covers all labels (e.g., the 10 labels of MNIST). In the **practical non-IID** scenario, a Dirichlet distribution is utilized to allocate data (please refer to this [paper](https://proceedings.neurips.cc/paper/2020/hash/18df51b97ccd68128e994804f3eccc87-Abstract.html) for details). We can also input `balance` for the balanced scenario, where the amount of data is uniformly distributed across clients.
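As an illustration of the **practical non-IID** split, the sketch below draws per-class client proportions from a Dirichlet distribution and partitions sample indices accordingly. It is a simplified stand-in for the real logic in `./dataset/utils/dataset_utils.py`; the function name, `alpha`, and the number of clients are just example choices.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int, alpha: float = 0.1, seed: int = 0):
    """Split sample indices among clients using a per-class Dirichlet distribution.

    A smaller alpha gives a more skewed (more heterogeneous) label distribution per client.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Proportion of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions into split points over this class's indices.
        split_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, part in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example: 10 clients over a toy 10-class label vector.
toy_labels = np.random.randint(0, 10, size=1000)
parts = dirichlet_partition(toy_labels, num_clients=10, alpha=0.1)
print([len(p) for p in parts])
```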
For the ***feature shift*** scenario, we use **three** datasets that are widely used in Domain Adaptation: **AmazonReview** (fetch raw data from [this site](https://drive.google.com/file/d/1QbXFENNyqor1IlCpRRFtOluI2_hMEd1W/view?usp=sharing)), **Digit5** (fetch raw data from [this site](https://drive.google.com/file/d/1PT6K-_wmsUEUCxoYzDy0mxF-15tvb2Eu/view?usp=share_link)), and **DomainNet**.
For the ***real-world (or IoT)*** scenario, we also introduce **three** naturally separated datasets: **Omniglot** (20 clients, 50 labels), **HAR (Human Activity Recognition)** (30 clients, 6 labels), and **PAMAP2** (9 clients, 12 labels). For details of the datasets and FL methods in **IoT**, please refer to [my FL-IoT repo](https://github.com/TsingZ0/FL-IoT).
*If you need another dataset, just write the code to download it and then use the utils in `./dataset/utils/dataset_utils.py` to split it.*
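For instance, a new `generate_<dataset>.py` script could roughly follow the skeleton below: download the raw data, partition it per client, and save one file per client. The dataset (KMNIST), the simple IID split, and the file layout here are illustrative assumptions only; for the exact interface, please follow the existing generate scripts and `./dataset/utils/dataset_utils.py`.

```python
import os
import numpy as np
import torchvision
import torchvision.transforms as transforms

NUM_CLIENTS = 20  # example value

# 1. Download the raw data (KMNIST is used here purely as a stand-in for a new dataset).
trainset = torchvision.datasets.KMNIST(
    "rawdata", train=True, download=True, transform=transforms.ToTensor())
images = trainset.data.numpy().astype(np.float32) / 255.0
labels = trainset.targets.numpy()

# 2. Partition sample indices per client (a simple IID split; reuse the non-IID utilities
#    in ./dataset/utils/dataset_utils.py for label-skew splits).
rng = np.random.default_rng(0)
client_splits = np.array_split(rng.permutation(len(labels)), NUM_CLIENTS)

# 3. Save one file per client; this directory layout is illustrative, not the platform's.
os.makedirs("KMNIST/train", exist_ok=True)
for client_id, idx in enumerate(client_splits):
    np.savez_compressed(f"KMNIST/train/{client_id}.npz", x=images[idx], y=labels[idx])
```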
### Examples for **MNIST**
- MNIST
```
cd ./dataset
python generate_mnist.py iid - - # for iid and unbalanced scenario
# python generate_mnist.py iid balance - # for iid and balanced scenario
# python generate_mnist.py noniid - pat # for pathological noniid and unbalanced
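# The practical (Dirichlet) non-IID split described above should follow a similar command;
# the partition option name below is an assumption, please check generate_mnist.py first:
# python generate_mnist.py noniid - dir # for practical (Dirichlet) noniid and unbalanced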