# Data-Free Knowledge Distillation For Deep Neural Networks
<div align="center">
<img alt="Production pipeline image" src="imgs/production_pipeline.png" />
</div>
## Abstract
Recent advances in model compression have provided procedures for compressing
large neural networks to a fraction of their original size while retaining most
if not all of their accuracy. However, all of these approaches rely on access
to the original training set, which might not always be possible if the network
to be compressed was trained on a very large non-public dataset. In this work,
we present a method for data-free knowledge distillation, which can compress
deep models to a fraction of their size using only some extra metadata
provided alongside a pretrained model release. We also explore
different kinds of metadata that can be used with our method, and discuss
tradeoffs involved in using each of them.
## Overview
Our method for knowledge distillation has a few different steps: training,
computing layer statistics on the dataset used for training, reconstructing (or
optimizing) a new dataset based solely on the trained model and the activation
statistics, and finally distilling the pre-trained "teacher" model into the
smaller "student" network. Each of these steps constitute a "procedure", which
are implemented in the `procedures/` module. Each procedure implements a `run`
function, which does everything from loading models to training.
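For orientation, the snippet below is a minimal sketch of how such a dispatch could look, assuming each procedure module exposes a module-level `run(flags)`; the flag handling here is illustrative, not a copy of `main.py`.
```python
# Illustrative sketch only: assumes each module in procedures/ exposes run(flags).
# Flag parsing here is simplified and not a copy of main.py.
import argparse
import importlib

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--procedure', default='train')
    parser.add_argument('--run_name', default='experiment')
    flags, _ = parser.parse_known_args()

    # Look up procedures/<name>.py and hand it the parsed flags; the procedure's
    # run() takes it from there (loading models, training, saving artifacts).
    procedure = importlib.import_module('procedures.' + flags.procedure)
    procedure.run(flags)

if __name__ == '__main__':
    main()
```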
When optimizing a dataset reconstruction, there's also the choice of different
optimization objectives (top layer, all layers, spectral all layers, spectral
layer pairs, all discussed in the paper). These are implemented in
`procedures/_optimization_objectives.py`; they take care of creating the
optimization and loss operations, as well as sampling from the saved
activation statistics and building a `feed_dict` that fills all necessary
placeholders.
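As a rough illustration of that responsibility (with hypothetical names, not the actual classes in `procedures/_optimization_objectives.py`), an objective conceptually samples target activations from the saved statistics and feeds them into the placeholders used by its loss:
```python
# Hypothetical sketch of what an optimization objective conceptually does:
# sample target activations from saved per-layer statistics and build the
# feed_dict for the corresponding placeholders. Names here are illustrative.
import numpy as np
import tensorflow as tf

class TopLayerObjective(object):
    def __init__(self, layer_means, layer_stddevs):
        # layer_means / layer_stddevs: saved statistics for the top layer,
        # each of shape [num_units].
        self.means = layer_means
        self.stddevs = layer_stddevs
        self.target = tf.placeholder(tf.float32, [None, len(layer_means)])

    def loss(self, top_layer_activations):
        # Match the network's top-layer activations to the sampled targets.
        return tf.reduce_mean(tf.square(top_layer_activations - self.target))

    def sample_feed_dict(self, batch_size):
        # Draw a batch of plausible top-layer activations from the statistics.
        samples = np.random.normal(self.means, self.stddevs,
                                   size=(batch_size, len(self.means)))
        return {self.target: samples.astype(np.float32)}
```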
Every dataset goes under `datasets/`, and needs to implement the same interface
as `datasets/mnist.py`. Namely, the dataset class needs to have an `io_size`
property that specifies the input size and the label output size. It also needs
two iterator methods: `train_epoch_in_batches` and `test_epoch_in_batches`.
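As a concrete example, a toy in-memory dataset satisfying this interface could look like the sketch below; the exact signatures in `datasets/mnist.py` may differ slightly (e.g. how the batch size is passed).
```python
# Minimal sketch of the dataset interface described above; the exact
# signatures in datasets/mnist.py may differ slightly.
import numpy as np

class RandomDataset(object):
    """Toy dataset: 784-dim inputs, 10-class one-hot labels."""

    @property
    def io_size(self):
        # (input size, label output size)
        return 784, 10

    def _epoch_in_batches(self, n, batch_size):
        for _ in range(n // batch_size):
            x = np.random.rand(batch_size, 784).astype(np.float32)
            y = np.eye(10, dtype=np.float32)[np.random.randint(10, size=batch_size)]
            yield x, y

    def train_epoch_in_batches(self, batch_size):
        return self._epoch_in_batches(60000, batch_size)

    def test_epoch_in_batches(self, batch_size):
        return self._epoch_in_batches(10000, batch_size)
```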
Note: Credit for the attribute and data files of [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)
dataset is given to [this repo](https://github.com/andersbll/deeppy/blob/master/deeppy/dataset/celeba.py).
We provide four models in `models/`: two fully connected and two convolutional.
The fully connected models are `hinton1200` and `hinton800`, as described in the
original [knowledge distillation paper](https://arxiv.org/abs/1503.02531). The
convolutional models are [LeNet-5](http://yann.lecun.com/exdb/lenet/) and a
modified version of it with half the number of convolutional filters per
layer. Any model meant to be used as a teacher network needs to implement all
three functions of the interface: `create_model`, `load_model`, and
`load_and_freeze_model`. If a model is only used as a student network, like
`lenet_half` and `hinton800`, then it need only implement `create_model`.
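The sketch below shows what such a model module might look like in outline; the argument lists and tensor names are assumptions, not the actual code in `models/`.
```python
# Hedged sketch of the three-function model interface; signatures, layer sizes,
# and tensor names are assumptions, not the actual code in models/.
import tensorflow as tf

def create_model(inputs, output_size):
    """Build a fresh, trainable copy of the network (all a student needs)."""
    hidden = tf.layers.dense(inputs, 1200, activation=tf.nn.relu, name='fc1')
    return tf.layers.dense(hidden, output_size, name='logits')

def load_model(sess, meta_path, checkpoint_path):
    """Restore a previously trained copy of the network, still trainable."""
    saver = tf.train.import_meta_graph(meta_path)
    saver.restore(sess, checkpoint_path)
    return tf.get_default_graph()

def load_and_freeze_model(sess, inputs, meta_path, checkpoint_path):
    """Rebuild the forward pass with trained weights baked in as tf.constant,
    so nothing but the inputs can receive gradients (used for reconstruction)."""
    graph = load_model(sess, meta_path, checkpoint_path)
    # Hypothetical tensor names; read the trained values out as numpy arrays.
    w1, b1 = sess.run([graph.get_tensor_by_name('fc1/kernel:0'),
                       graph.get_tensor_by_name('fc1/bias:0')])
    w2, b2 = sess.run([graph.get_tensor_by_name('logits/kernel:0'),
                       graph.get_tensor_by_name('logits/bias:0')])
    hidden = tf.nn.relu(tf.matmul(inputs, tf.constant(w1)) + tf.constant(b1))
    return tf.matmul(hidden, tf.constant(w2)) + tf.constant(b2)
```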
Every artifact created will be saved under `summaries/`, the default
`--summary_folder`. This includes tf summaries, checkpoints, optimized
datasets, log files with information about the experiment run, activation
statistics, etc.
For the newly added VGG11, VGG16, and VGG19 models, there is an option to initialize
the layers with ImageNet pre-trained weights. You can get those `*.npy` files
[here](https://github.com/machrisaa/tensorflow-vgg).
## Requirements
This code requires that you have [TensorFlow](https://tensorflow.org) 1.0 installed, along with `numpy`
and `scikit-image` 0.13.0, on Python 3.6+.
The visualization scripts (used to debug optimized/reconstructed datasets) also
require `opencv 3.2.0` and `matplotlib`.
## Usage
### Train and Save a Model
First, we need to have the model trained on the original dataset. This step can
be skipped if you already have a pre-trained model that you can easily load
through the same interface as the ones in `models/`.
The `procedure` flag specifies what to do with the model and dataset. In this
case, train it from scratch.
```bash
python main.py --run_name=experiment --model=hinton1200 --dataset=mnist \
--procedure=train
```
### Compute and Save Statistics for that Model
We use the original dataset to compute layer statistics for the model. These
are the "metadata" mentioned in the paper, which we save so we can reconstruct
a dataset representative of the original one.
The `model_meta` and `model_checkpoint` flags are required because the
`compute_stats` procedure loads a pre-trained model. If you are planning on
optimizing a dataset with a spectral optimization objective, you need to
compute stats with the flag `compute_graphwise_stats=True`. This is not done by
default because graphwise statistics are computationally expensive.
```bash
python main.py --run_name=experiment --model=hinton1200 --dataset=mnist \
--procedure=compute_stats \
--model_meta=summaries/experiment/train/checkpoint/hinton1200-8000.meta \
--model_checkpoint=summaries/experiment/train/checkpoint/hinton1200-8000
```
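Conceptually, these statistics boil down to estimates of each layer's activation distribution (e.g. means and variances) over the training set. The sketch below shows one way such statistics could be collected; the names and the exact statistics saved by `compute_stats` are assumptions.
```python
# Hedged sketch of collecting per-layer activation statistics; names and the
# exact statistics saved by compute_stats are assumptions.
import numpy as np
import tensorflow as tf

def compute_layer_stats(sess, layer_tensors, input_ph, dataset, batch_size=128):
    """Accumulate activation sums per layer, then return means and variances."""
    sums, sq_sums, count = None, None, 0
    for x, _ in dataset.train_epoch_in_batches(batch_size):
        acts = sess.run(layer_tensors, feed_dict={input_ph: x})
        if sums is None:
            sums = [a.sum(axis=0) for a in acts]
            sq_sums = [(a ** 2).sum(axis=0) for a in acts]
        else:
            sums = [s + a.sum(axis=0) for s, a in zip(sums, acts)]
            sq_sums = [s + (a ** 2).sum(axis=0) for s, a in zip(sq_sums, acts)]
        count += x.shape[0]
    means = [s / count for s in sums]
    variances = [sq / count - m ** 2 for sq, m in zip(sq_sums, means)]
    np.save('stats.npy', {'means': means, 'variances': variances})
    return means, variances
```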
### Optimize a Dataset Using the Saved Model and the Statistics
This is where the real magic happens. We use the saved metadata and the
pre-trained model (but not the original dataset) to reconstruct/optimize a new
dataset that best reproduces samples drawn from the activation statistics.
These samples and the corresponding objective loss can take different forms
(`top_layer`, `all_layers`, `all_layers_dropout`, `spectral_all_layers`, `spectral_layer_pairs`),
which are discussed in the paper. Note that `all_layers_dropout` is meant for
teacher models that were trained with dropout; currently, `hinton1200` is the
only provided model that was. Also note that spectral optimization objectives
require that the `compute_graphwise_stats` flag be set when running
`compute_stats`.
The pre-trained model is loaded, and a new graph is constructed using its saved
weights, but with each weight embedded as a `tf.constant`. This ensures that the
only thing gradients flow back to is the input `tf.Variable`, which is
initialized to random noise.
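A minimal sketch of this setup, using stand-in weight arrays and target values rather than the real saved model and sampled statistics:
```python
# Hedged sketch of the reconstruction setup: frozen weights as tf.constant,
# the input batch as the only tf.Variable. Weight arrays and target values
# here are stand-ins for the real saved model and sampled statistics.
import numpy as np
import tensorflow as tf

w1, b1 = np.random.randn(784, 1200).astype(np.float32), np.zeros(1200, np.float32)
w2, b2 = np.random.randn(1200, 10).astype(np.float32), np.zeros(10, np.float32)

# The reconstructed batch is the only trainable quantity, initialized to noise.
recon = tf.Variable(tf.random_normal([64, 784], stddev=0.1))

# Forward pass through the frozen teacher: weights are constants, not variables.
hidden = tf.nn.relu(tf.matmul(recon, tf.constant(w1)) + tf.constant(b1))
logits = tf.matmul(hidden, tf.constant(w2)) + tf.constant(b2)

# Target top-layer activations sampled from the saved statistics (placeholder).
target = tf.placeholder(tf.float32, [64, 10])
loss = tf.reduce_mean(tf.square(logits - target))

# Only `recon` receives gradients, since everything else is constant.
train_op = tf.train.AdamOptimizer(0.07).minimize(loss, var_list=[recon])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sampled = np.random.randn(64, 10).astype(np.float32)  # stand-in samples
        sess.run(train_op, feed_dict={target: sampled})
    reconstructed_batch = sess.run(recon)
```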
The `optimization_objective` flag is needed to determine what loss to use (see
paper for details, coming soon on arxiv). The `dataset` flag is only needed to
determine `io_size`, so if you're using a pre-trained model+statistics that you
don't have the original data for, you can mock the dataset class and simply
provide the `self.io_size` attribute. Using all of this, a new dataset will be
reconstructed and saved.
```bash
python main.py --run_name=experiment --model=hinton1200 --dataset=mnist \
--procedure=optimize_dataset \
--model_meta=summaries/experiment/train/checkpoint/hinton1200-8000.meta \
--model_checkpoint=summaries/experiment/train/checkpoint/hinton1200-8000 \
--optimization_objective=top_layer --lr=0.07
# or all_layers, spectral_all_layers, spectral_layer_pairs
```
### Distilling a Model Using One of the Reconstructed Datasets
You can then train a student network on the reconstructed dataset and the
temperature-scaled teacher model activations. This time, the `dataset` flag is
the location where the reconstructed dataset was saved. Additionally, a
`student_model` needs to be specified; it will be trained from scratch. If you want
to evaluate the student's performance on the original test set (if you have
access to it), you can specify it as the `eval_dataset`.
```bash
python main.py --run_name=experiment --model=hinton1200 \
--dataset="summaries/experiment/data/data_optimized_top_layer_experiment_<clas>_<batch>.npy" \
--procedure=distill \
--model_meta=summaries/experiment/train/checkpoint/hinton1200-8000.meta \
--model_checkpoint=summaries/experiment/train/checkpoint/hinton1200-8000 \
--eval_dataset=mnist --student_model=hinton800 --epochs=30 --lr=0.00001
```
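For reference, the core of a temperature-scaled distillation loss looks roughly like the sketch below; the temperature value and tensor names are illustrative, not the repo's exact settings.
```python
# Hedged sketch of a temperature-scaled distillation loss; the temperature and
# tensor names are illustrative, not values hard-coded in this repo.
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=8.0):
    """Cross-entropy between softened teacher and student distributions.
    (The original distillation paper scales this by temperature**2 when
    mixing it with a hard-label loss.)"""
    soft_targets = tf.nn.softmax(teacher_logits / temperature)
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.stop_gradient(soft_targets),
            logits=student_logits / temperature))
```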
### Distilling a Model Usi