基于强化学习的自动化裁剪CIFAR-10分类任务python源码+项目部署说明(提升模型精度+减少计算量).zip

共22个文件

py：8个

xml：5个

png：4个

版权申诉

毕业设计

python

人工智能

强化学习

5星 · 超过95%的资源 46 浏览量 2023-08-10 10:58:44 上传评论 2 收藏 678KB ZIP 举报

【资源说明】基于强化学习的自动化裁剪CIFAR-10分类任务python源码+项目部署说明(提升模型精度+减少计算量).zip 1、该资源内项目代码都是经过测试运行成功，功能ok的情况下才上传的，请放心下载使用！ 2、本项目适合计算机相关专业(如计科、人工智能、通信工程、自动化、电子信息等)的在校学生、老师或者企业员工下载使用，也适合小白学习进阶，当然也可作为毕设项目、课程设计、作业、项目初期立项演示等。 3、如果基础还行，也可在此代码基础上进行修改，以实现其他功能。目前的强化学习工作很多集中在利用外部环境的反馈训练agent，忽略了模型本身就是一种能够获得反馈的环境。本项目的核心思想是：将模型视为环境，构建附生于模型的 agent ，以辅助模型进一步拟合真实样本。大多数领域的模型都可以采用这种方式来优化，如cv\多模态等。它至少能够以三种方式工作：1.过滤噪音信息，如删减语音或图像特征；2.进一步丰富表征信息，如高效引用外部信息；3.实现记忆、联想、推理等复杂工作，如构建重要信息的记忆池。这里推出一款早期完成的裁剪机制transformer版本(后面称为APT)，实现了一种更高效的训练模式，能够优化模型指标；此外，可以使用动态图丢弃大量的不必要单元，在指标基本不变的情况下，大幅降低计算量。该项目希望为大家抛砖引玉。为什么要做自动剪枝在具体任务中，往往存在大量毫无价值的信息和过渡性信息，有时不但对任务无益，还会成为噪声。比如：表述会存在冗余/无关片段以及过渡性信息；动物图像识别中，有时候背景无益于辨别动物主体，即使是动物部分图像，也仅有小部分是关键的特征。以transformer为例，在进行self-attention计算时其复杂度与序列长度平方成正比。长度为10，复杂度为100；长度为9，复杂度为81。利用强化学习构建agent，能够精准且自动化地动态裁剪已丧失意义部分，甚至能将长序列信息压缩到50-100之内（实验中有从500+的序列长度压缩到个位数的示例），以大幅减少计算量。 ***实验中，发现与裁剪agent联合训练的模型比普通方法训练的模型效果要更好。*** 使用说明环境 ``` torch numpy tqdm tensorboard ml-collections ``` 下载经过预先训练的模型（来自Google官方）本项目使用的型号：ViT-B_16（您也可以选择其它型号进行测试） ``` imagenet21k pre-train wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz ``` 训练与推理下载好预训练模型就可以跑了。 ``` 训练 python3 train.py --name cifar10-100_500 --dataset cifar100 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz 推理 python3 infer.py --name cifar10-100_500 --dataset cifar100 --model_type ViT-B_16 --pretrained_dir checkpoint/ViT-B_16.npz ``` CIFAR-10和CIFAR-100会自动下载和培训。如果使用其他数据集，您需要自定义data_utils.py。在裁剪模式的推理过程中，预期您将看到如下格式的输出关于裁剪器的模型结构设计，本模型中认为如何衡量一个信息单元是否对模型有意义，建立于其自身的信息及它与任务的相关性上。因此以信息单元本身及它与CLS单元的交互作为agent的输入信息。

资源推荐

资源详情

资源评论

收起资源包目录

基于强化学习的自动化裁剪CIFAR-10分类任务python源码+项目部署说明(提升模型精度+减少计算量).zip （22个子文件）

pic

model.png 19KB

stormiii.jpg 84KB

vit.png 127KB

example.gif 456KB

APT-main.png 23KB

agent.png 18KB

infer.py 9KB

utils

dist_util.py 711B

scheduler.py 3KB

data_utils.py 2KB

.idea

.name 27B

vcs.xml 185B

workspace.xml 19KB

misc.xml 189B

modules.xml 298B

cv-automatic-pruning-transformer.iml 434B

encodings.xml 135B

models

modeling_resnet.py 6KB

modeling.py 18KB

configs.py 3KB

train.py 16KB

项目部署说明.md 7KB

# coding=utf-8 from __future__ import absolute_import from __future__ import division from __future__ import print_function import copy import logging import math from os.path import join as pjoin import random import torch import torch.nn as nn import numpy as np from torch.nn import CrossEntropyLoss, Dropout, Softmax, Linear, Conv2d, LayerNorm from torch.nn.modules.utils import _pair from scipy import ndimage import models.configs as configs from .modeling_resnet import ResNetV2 logger = logging.getLogger(__name__) ATTENTION_Q = "MultiHeadDotProductAttention_1/query" ATTENTION_K = "MultiHeadDotProductAttention_1/key" ATTENTION_V = "MultiHeadDotProductAttention_1/value" ATTENTION_OUT = "MultiHeadDotProductAttention_1/out" FC_0 = "MlpBlock_3/Dense_0" FC_1 = "MlpBlock_3/Dense_1" ATTENTION_NORM = "LayerNorm_0" MLP_NORM = "LayerNorm_2" def np2th(weights, conv=False): """Possibly convert HWIO to OIHW.""" if conv: weights = weights.transpose([3, 2, 0, 1]) return torch.from_numpy(weights) def swish(x): return x * torch.sigmoid(x) ACT2FN = {"gelu": torch.nn.functional.gelu, "relu": torch.nn.functional.relu, "swish": swish} class Attention(nn.Module): def __init__(self, config, vis): super(Attention, self).__init__() self.vis = vis self.num_attention_heads = config.transformer["num_heads"] self.attention_head_size = int(config.hidden_size / self.num_attention_heads) self.all_head_size = self.num_attention_heads * self.attention_head_size self.query = Linear(config.hidden_size, self.all_head_size) self.key = Linear(config.hidden_size, self.all_head_size) self.value = Linear(config.hidden_size, self.all_head_size) self.out = Linear(config.hidden_size, config.hidden_size) self.attn_dropout = Dropout(config.transformer["attention_dropout_rate"]) self.proj_dropout = Dropout(config.transformer["attention_dropout_rate"]) self.softmax = Softmax(dim=-1) def transpose_for_scores(self, x): new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) x = x.view(*new_x_shape) return x.permute(0, 2, 1, 3) def forward(self, hidden_states, attention_mask): mixed_query_layer = self.query(hidden_states) mixed_key_layer = self.key(hidden_states) mixed_value_layer = self.value(hidden_states) query_layer = self.transpose_for_scores(mixed_query_layer) key_layer = self.transpose_for_scores(mixed_key_layer) value_layer = self.transpose_for_scores(mixed_value_layer) attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2)) attention_scores = attention_scores / math.sqrt(self.attention_head_size) attention_scores += attention_mask attention_probs = self.softmax(attention_scores) weights = attention_probs if self.vis else None attention_probs = self.attn_dropout(attention_probs) context_layer = torch.matmul(attention_probs, value_layer) context_layer = context_layer.permute(0, 2, 1, 3).contiguous() new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,) context_layer = context_layer.view(*new_context_layer_shape) attention_output = self.out(context_layer) attention_output = self.proj_dropout(attention_output) return attention_output, weights class Mlp(nn.Module): def __init__(self, config): super(Mlp, self).__init__() self.fc1 = Linear(config.hidden_size, config.transformer["mlp_dim"]) self.fc2 = Linear(config.transformer["mlp_dim"], config.hidden_size) self.act_fn = ACT2FN["gelu"] self.dropout = Dropout(config.transformer["dropout_rate"]) self._init_weights() def _init_weights(self): nn.init.xavier_uniform_(self.fc1.weight) nn.init.xavier_uniform_(self.fc2.weight) nn.init.normal_(self.fc1.bias, std=1e-6) nn.init.normal_(self.fc2.bias, std=1e-6) def forward(self, x): x = self.fc1(x) x = self.act_fn(x) x = self.dropout(x) x = self.fc2(x) x = self.dropout(x) return x class Embeddings(nn.Module): """Construct the embeddings from patch, position embeddings. """ def __init__(self, config, img_size, in_channels=3): super(Embeddings, self).__init__() self.hybrid = None img_size = _pair(img_size) if config.patches.get("grid") is not None: grid_size = config.patches["grid"] patch_size = (img_size[0] // 16 // grid_size[0], img_size[1] // 16 // grid_size[1]) n_patches = (img_size[0] // 16) * (img_size[1] // 16) self.hybrid = True else: patch_size = _pair(config.patches["size"]) n_patches = (img_size[0] // patch_size[0]) * (img_size[1] // patch_size[1]) self.hybrid = False if self.hybrid: self.hybrid_model = ResNetV2(block_units=config.resnet.num_layers, width_factor=config.resnet.width_factor) in_channels = self.hybrid_model.width * 16 self.patch_embeddings = Conv2d(in_channels=in_channels, out_channels=config.hidden_size, kernel_size=patch_size, stride=patch_size) self.position_embeddings = nn.Parameter(torch.zeros(1, n_patches + 1, config.hidden_size)) self.cls_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size)) self.dropout = Dropout(config.transformer["dropout_rate"]) def forward(self, x): B = x.shape[0] cls_tokens = self.cls_token.expand(B, -1, -1) if self.hybrid: x = self.hybrid_model(x) x = self.patch_embeddings(x) x = x.flatten(2) x = x.transpose(-1, -2) x = torch.cat((cls_tokens, x), dim=1) embeddings = x + self.position_embeddings embeddings = self.dropout(embeddings) return embeddings class Block(nn.Module): def __init__(self, config, vis): super(Block, self).__init__() self.hidden_size = config.hidden_size self.attention_norm = LayerNorm(config.hidden_size, eps=1e-6) self.ffn_norm = LayerNorm(config.hidden_size, eps=1e-6) self.ffn = Mlp(config) self.attn = Attention(config, vis) def forward(self, x, attention_mask): h = x x = self.attention_norm(x) x, weights = self.attn(x, attention_mask) x = x + h h = x x = self.ffn_norm(x) x = self.ffn(x) x = x + h return x, weights def load_from(self, weights, n_block): ROOT = f"Transformer/encoderblock_{n_block}" with torch.no_grad(): query_weight = np2th(weights[pjoin(ROOT, ATTENTION_Q, "kernel")]).view(self.hidden_size, self.hidden_size).t() key_weight = np2th(weights[pjoin(ROOT, ATTENTION_K, "kernel")]).view(self.hidden_size, self.hidden_size).t() value_weight = np2th(weights[pjoin(ROOT, ATTENTION_V, "kernel")]).view(self.hidden_size, self.hidden_size).t() out_weight = np2th(weights[pjoin(ROOT, ATTENTION_OUT, "kernel")]).view(self.hidden_size, self.hidden_size).t() query_bias = np2th(weights[pjoin(ROOT, ATTENTION_Q, "bias")]).view(-1) key_bias = np2th(weights[pjoin(ROOT, ATTENTION_K, "bias")]).view(-1) value_bias = np2th(weights[pjoin(ROOT, ATTENTION_V, "bias")]).view(-1) out_bias = np2th(weights[pjoin(ROOT, ATTENTION_OUT, "bias")]).view(-1) self.attn.q

评论收藏

内容反馈

版权申诉