# KnowledgeDistillation Layer (Caffe implementation)
## Installation
1. Install [Caffe](https://github.com/BVLC/caffe/) in a directory `CAFFE`<br>
2. Download this repository to a directory `ROOT`<br>
3. Copy the layer files into your Caffe tree<br>
```bash
cp $ROOT/knowledge_distillation_layer.hpp $CAFFE/include/caffe/layers
cp $ROOT/knowledge_distillation_layer.cpp $CAFFE/src/caffe/layers
```
4. Modify `$CAFFE/src/caffe/proto/caffe.proto`<br>Add an `optional KnowledgeDistillationParameter` field to `message LayerParameter`, using the next available layer-specific ID (147 in the example below)
```proto
message LayerParameter {
  ...
  // use the next available layer-specific ID
  optional KnowledgeDistillationParameter knowledge_distillation_param = 147;
}
```
<br>Then add the `KnowledgeDistillationParameter` message itself:<br>
```proto
message KnowledgeDistillationParameter {
  optional float temperature = 1 [default = 1];
}
```
5. Build Caffe
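Because `caffe.proto` changed, the generated protobuf code must be rebuilt along with the new layer. A minimal sketch for the default Makefile build (`-j8` is just an example; use the corresponding CMake targets if that is how you build Caffe):
```bash
cd $CAFFE
make clean          # force caffe.pb.h/.cc to be regenerated from the edited caffe.proto
make all -j8
make test -j8 && make runtest   # optional sanity check
```
If training later aborts with `Unknown layer type: KnowledgeDistillation`, the layer was not compiled in; verify that both files from step 3 were copied before building.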
<br>
## Usage
The KnowledgeDistillation layer has one layer-specific parameter, `temperature`.<br><br>The layer takes 2 or 3 input blobs:<br>
`bottom[0]`: the logits of the student<br>
`bottom[1]`: the logits of the teacher<br>
`bottom[2]` (*optional*): the labels<br>
The logits are first divided by the temperature T and then mapped to probability distributions over the classes by the softmax function. The layer computes the KL divergence between the two distributions rather than their cross entropy, and the gradients are multiplied by T^2, as suggested in the [paper](https://arxiv.org/abs/1503.02531).<br>
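Written out (our notation, following the description above; `z^s` and `z^t` are the student and teacher logits):
```latex
p_i = \frac{\exp(z^s_i / T)}{\sum_j \exp(z^s_j / T)}, \qquad
q_i = \frac{\exp(z^t_i / T)}{\sum_j \exp(z^t_j / T)}

\mathcal{L}_{\mathrm{KD}} = \mathrm{KL}(q \,\|\, p) = \sum_i q_i \left( \log q_i - \log p_i \right)

\frac{\partial \mathcal{L}_{\mathrm{KD}}}{\partial z^s_i} = \frac{1}{T}\,(p_i - q_i),
\quad \text{scaled by } T^2 \text{ to } T\,(p_i - q_i)
```
Since the teacher distribution q is fixed, minimizing KL(q||p) yields exactly the same gradients as minimizing the cross entropy; the two differ only by the constant entropy of q in the reported loss value.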
1. Common setting in the `prototxt` (2 input blobs given; see also the full training sketch after these examples)
```
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "teacher_logits"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 }  # usually larger than 1
  loss_weight: 1
}
```
2. If some labels should be ignored, give 3 input blobs and set `ignore_label` in `loss_param`
```
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "teacher_logits"
  bottom: "label"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 }
  loss_param { ignore_label: 2 }
  loss_weight: 1
}
```
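In a full distillation setup the KD term is typically combined with the usual hard-label loss on the student, and the teacher must stay fixed. A hedged sketch of that wiring (the blob names come from your own net definition; whether this layer honors per-bottom `propagate_down` is an assumption about the implementation, so setting `lr_mult: 0` on all teacher layers is the safe way to freeze it):
```
# hard-label cross entropy on the student
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "student_logits"
  bottom: "label"
  top: "CE_loss"
  loss_weight: 1
}
# soft-target KD term; gradients should not reach the teacher
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "teacher_logits"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 }
  propagate_down: true   # backprop into the student logits
  propagate_down: false  # teacher stays frozen
  loss_weight: 1
}
```
The relative `loss_weight` values trade off the hard-label and soft-target objectives, as in the original paper.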