chatglm-6b微调进行数学计算.zip (chatglm-6b fine-tuning for math calculation) - CSDN Library

54 files in total

py: 22

sample: 13

head: 4


120 views, uploaded 2023-06-24 15:59:15, 310KB ZIP


chatglm-6b微调进行数学计算.zip (54 files)

chatglm-6b微调进行数学计算 (chatglm-6b fine-tuning for math calculation)

LORA

PPO

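推理, 样本为自动生成的整数 (inference, samples are auto-generated integer)

小数加减乘除运算 (decimal addition/subtraction/multiplication/division)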

__init__.py 101B

.git

index 3KB

HEAD 21B

refs

heads

main 41B

tags

remotes

origin

HEAD 30B

objects

pack

pack-a2d357a64d82c0feb60777da7580025d5f830c90.idx 6KB

pack-a2d357a64d82c0feb60777da7580025d5f830c90.pack 149KB

info

description 73B

packed-refs 112B

info

exclude 240B

logs

HEAD 185B

refs

heads

main 185B

remotes

origin

HEAD 185B

hooks

post-update.sample 189B

prepare-commit-msg.sample 1KB

commit-msg.sample 896B

pre-receive.sample 544B

update.sample 4KB

pre-commit.sample 2KB

pre-rebase.sample 5KB

applypatch-msg.sample 478B

fsmonitor-watchman.sample 5KB

push-to-checkout.sample 3KB

pre-applypatch.sample 424B

pre-push.sample 1KB

pre-merge-commit.sample 416B

config 311B

chatglm_maths

chatglm_6b

__init__.py 101B

config.json 772B

tokenizer_config.json 440B

__init__.py 101B

t00_tet_chat_chatglm.py 3KB

math23k_trainset.sample.json 14KB

c00_toy_lora_train_6b.py 22KB

c00_toy_cpu_train_6b.py 21KB

p00_toy_lora_predict_6b.py 18KB

c00_toy_gpu_train_6b.py 21KB

p10_lora_trl_predict_ppo.py 8KB

p01_toy_cpu_predict_small.py 16KB

p00_toy_cpu_predit_6b.py 16KB

c01_toy_cpu_train_small.py 22KB

t10_lora_trl_train_ppo.py 9KB

models

__init__.py 101B

modeling_chatglm.py 56KB

tokenization_chatglm.py 17KB

quantization.py 15KB

configuration_chatglm.py 4KB

ppo_trainer.py 45KB

README.md 4KB

t10_toy_trl_train_ppo.py 11KB

p10_toy_trl_predict_ppo.py 8KB

c01_toy_gpu_train_small.py 21KB

requirements.txt 135B

README.md 65KB

# chatglm-maths
chatglm-6b fine-tuning/LORA/PPO/inference; the samples are auto-generated integer/decimal addition, subtraction, multiplication and division problems; runs on GPU or CPU.

## Datasets (Chinese)
- [https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [https://github.com/LianjiaTech/BELLE](https://github.com/LianjiaTech/BELLE)
- [https://github.com/carbonz0/alpaca-chinese-dataset](https://github.com/carbonz0/alpaca-chinese-dataset)

## Pitfalls
```python
1. eps=1e-5 (do not make it smaller), half-precision float16, and LayerNorm uses Post-LN (better generalization) + DeepNorm [sigh, there is also an LN before Attention]; all of this is meant to prevent gradient overflow and similar problems in a large model;
2. Model inputs/outputs: the default tokenization_chatglm.py/modeling_chatglm.py cannot be used as-is, because they are written entirely for generation (generate); you have to construct all the input arguments yourself, or adapt those files yourself;
2.1 In ChatGLMModel, get_masks() works fine; in get_position_ids(), change 'context_length = seq.index(150004) + 1' to 'context_length = len(seq)';
2.2 The training input_ids format is tentatively (post-padding for training, pre-padding for inference [tokenization_chatglm.py pre-pads by default]):
    x: prompt_1 + "_" + text_1 + "\n" + prompt_2 + [gMASK] + [BOS] + "_" + text_2 + [PAD]*N
2.3 The training label_ids format is tentatively (CrossEntropyLoss ignores -100 by default, so those positions do not contribute to the loss):
    y = [-100]*len(text_1) + [BOS] + text_2 + [EOS] + [-100]*N
2.4 Pay attention to position ids/masks (the built-in ones only handle inference with batch_size=1, so for training inputs you have to write them yourself); see the README.md of GLM-130, or the GLM-1 source code: https://github.com/THUDM/GLM/blob/main/tasks/seq2seq/dataset.py
3. Note that the chatglm-6b weights are float16; the loss is computed in float32 and then cast back to float16 for the gradient update;
4. ChatGLMTokenizer sometimes throws strange errors; when generating, set max_new_tokens (at most {"max_new_tokens": 2048}); decode sometimes runs into ids that do not exist;
5. LoRA (low-rank adaptation): RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
   Try upgrading transformers to the latest version, calling .cuda() after get_peft_model, and using device_map={'':torch.cuda.current_device()}.
```

## Environment setup
```shell
transformers>=4.26.1
cpm_kernels==1.0.11
icetk==0.0.4
torch>=1.10.1
rouge==1.0.1
nltk==3.6.6
peft>=0.2.0
numpy
tqdm
lion_pytorch
macropodus
trl>=0.4.1
```

## Fine-tuning: arithmetic problems
```shell
# lora
python c00_toy_lora_train_6b.py      # fine-tune
python p00_toy_lora_predict_6b.py    # inference
# ppo
python t10_toy_trl_train_ppo.py      # train
python p10_toy_trl_predict_ppo.py    # test
# 6b
python c00_toy_cpu_train_6b.py       # fine-tune
python p00_toy_cpu_predit_6b.py      # inference
# small-layer
python c01_toy_cpu_train_small.py    # fine-tune
python p01_toy_cpu_predict_small.py  # inference
```

## References / Acknowledgements
- [https://github.com/THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)
- [https://github.com/THUDM/GLM](https://github.com/THUDM/GLM)
- [https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [https://github.com/LianjiaTech/BELLE](https://github.com/LianjiaTech/BELLE)
- [https://github.com/huggingface/peft](https://github.com/huggingface/peft)
- [https://github.com/mymusise/ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning)
- [https://github.com/bojone/bert4keras](https://github.com/bojone/bert4keras)
- [trl](https://github.com/lvwerra/trl)
- [math23k](https://aclanthology.org/D17-1088)

## Inference log (toy)
```cpu
generator_calculate_line: ('13+75=', '13+75=88')
tokenizer.vocab_size: 150344
eval: 0%| | 0/1 [00:00<?, ?it/s]batch_query: ['简便运算: 98+83= 剖析: 98+83=181']
batch_qtext_0: 简便运算: 98+83= 剖析:
batch_qans_0: 98+83=181
response_0: 98+83=171
{'rouge-1': 0.0, 'rouge-2': 0.0, 'rouge-l': 0.0, 'bleu': 0.0}
请输入: 25.31+86.35=
请稍等...
25.31+86.35=101.66
```

## Fine-tuning log (toy)
```cpu
generator_calculate_line: ('13+75=', '13+75=88')
tokenizer.vocab_size: 150344
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00, 1.31s/it]
transformer.word_embeddings.weight False
......
transformer.layers.26.mlp.dense_4h_to_h.bias False
transformer.layers.27.input_layernorm.weight True
transformer.layers.27.input_layernorm.bias True
transformer.layers.27.attention.query_key_value.weight True
transformer.layers.27.attention.query_key_value.bias True
transformer.layers.27.attention.dense.weight True
transformer.layers.27.attention.dense.bias True
transformer.layers.27.post_attention_layernorm.weight True
transformer.layers.27.post_attention_layernorm.bias True
transformer.layers.27.mlp.dense_h_to_4h.weight True
transformer.layers.27.mlp.dense_h_to_4h.bias True
transformer.layers.27.mlp.dense_4h_to_h.weight True
transformer.layers.27.mlp.dense_4h_to_h.bias True
transformer.final_layernorm.weight True
transformer.final_layernorm.bias True
model.chat start
13+75=88, but that's not the correct answer. The correct answer is 13+75=88, which is 90.
/anaconda3/envs/py371/lib/python3.7/site-packages/transformers/optimization.py:395: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
epoch: 0%| | 0/21 [00:00<?, ?it/s]epochs: batch_query: ['简便运算: 98+83= 剖析: 98+83=181'] | 0/8 [00:00<?, ?it/s]
epoch_global: 0, step_global: 1, step: 0, loss: 4.0625
batch_query: ['口算: 57.84+13.64 解: 57.84+13.64=71.48']
epoch_global: 0, step_global: 2, step: 1, loss: 2.5625███▌ | 2/8 [00:17<00:51, 8.54s/it]
batch_query: ['计算题: 48+1 解答: 48+1=49']
epoch_global: 0, step_global: 3, step: 2, loss: 4.15625█████████████████████▎ | 3/8 [00:38<01:09, 13.94s/it]
batch_query: ['计算题: 61.65+33.05 解答: 61.65+33.05=94.7']
epoch_global: 0, step_global: 4, step: 3, loss: 2.40625████████████████████████████████████████ | 4/8 [01:01<01:09, 17.43s/it]
batch_query: ['计算: 81+75 回答: 81+
```
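Pitfalls 2.2 and 2.3 describe the training layout of `input_ids` and `label_ids`. Below is a minimal sketch of that layout on already-tokenized integer ids; it is not the repository's code. `build_train_example`, `left_pad` and `IGNORE_INDEX` are hypothetical names, the ids in the usage lines are made up (not ChatGLM's real vocabulary ids), the `"_"`/`"\n"` separators are assumed to be already tokenized into `prefix_ids`/`target_ids`, and `[EOS]` is assumed to be appended to the input as well so both lists have the same length. It also assumes the model's forward pass shifts labels by one position internally, as Hugging Face causal-LM heads usually do.

```python
from typing import List, Tuple

# CrossEntropyLoss ignores targets equal to -100 by default (ignore_index=-100).
IGNORE_INDEX = -100


def build_train_example(
    prefix_ids: List[int],
    target_ids: List[int],
    gmask_id: int,
    bos_id: int,
    eos_id: int,
    pad_id: int,
    max_len: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: build post-padded input_ids and aligned labels.

    input_ids = prefix + [gMASK] + [BOS] + target + [EOS] + [PAD] * N
    labels    = [-100] * len(prefix + [gMASK]) + [BOS] + target + [EOS] + [-100] * N
    """
    context = prefix_ids + [gmask_id]
    answer = [bos_id] + target_ids + [eos_id]
    input_ids = context + answer
    labels = [IGNORE_INDEX] * len(context) + answer
    if len(input_ids) > max_len:
        raise ValueError(f"sequence length {len(input_ids)} exceeds max_len={max_len}")
    n_pad = max_len - len(input_ids)
    # Training: post-padding (pads on the right); pads never contribute to the loss.
    return input_ids + [pad_id] * n_pad, labels + [IGNORE_INDEX] * n_pad


def left_pad(ids: List[int], pad_id: int, max_len: int) -> List[int]:
    """Inference: pre-padding (pads on the left), as the README says tokenization_chatglm.py does by default."""
    return [pad_id] * (max_len - len(ids)) + ids


# Toy usage with made-up ids (NOT ChatGLM's real vocabulary ids).
x, y = build_train_example([5, 6, 7], [8, 9], gmask_id=90, bos_id=91, eos_id=92, pad_id=0, max_len=10)
print(x)  # [5, 6, 7, 90, 91, 8, 9, 92, 0, 0]
print(y)  # [-100, -100, -100, -100, 91, 8, 9, 92, -100, -100]
```

Raising on an overlong sample instead of silently truncating keeps the answer tokens, the only supervised part, from being cut off unnoticed.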

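Pitfall 2.4 says the built-in position/mask code only covers batch_size=1 inference, so training needs its own masks. The sketch below builds one prefix mask per sample, assuming the usual GLM design in which context tokens attend to each other bidirectionally and answer tokens attend causally. `prefix_lm_mask` and `batch_prefix_lm_masks` are hypothetical helpers, `context_length` is assumed to be the number of tokens before `[BOS]` in the pitfall 2.2 layout, and position ids are not covered.

```python
from typing import List


def prefix_lm_mask(context_length: int, seq_length: int) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True when query position i may attend to key j.

    Keys inside the context (j < context_length) are visible to every query
    (bidirectional); later keys are visible only causally (j <= i).
    """
    return [
        [j < context_length or j <= i for j in range(seq_length)]
        for i in range(seq_length)
    ]


def batch_prefix_lm_masks(context_lengths: List[int], seq_length: int) -> List[List[List[bool]]]:
    """One mask per sample, so a training batch can mix different context lengths."""
    return [prefix_lm_mask(c, seq_length) for c in context_lengths]


for row in prefix_lm_mask(context_length=2, seq_length=4):
    print([int(v) for v in row])
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Here True means "may attend". To our understanding, ChatGLM-6B's own `get_masks()` uses the inverse convention (True means masked out), so a mask like this would need to be negated before being fed into that code; check the modeling file you actually use.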


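In the inference log the wrong answer `98+83=171` (expected `98+83=181`) is reported through ROUGE/BLEU scores, and the interactive example `25.31+86.35=101.66` is also off (the exact sum is 111.66). Text-overlap metrics do not directly say whether the number after `=` is right, so a complementary check could compare the final numbers. This is a hypothetical sketch, not part of the repository: `parse_answer`, `numeric_match` and `accuracy` are invented names, and the 0.0001 tolerance is an assumption chosen to match 4-decimal rounding.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def parse_answer(text: str) -> Optional[Decimal]:
    """Hypothetical helper: the finite number after the last '=', or None."""
    if "=" not in text:
        return None
    tail = text.rsplit("=", 1)[1].strip()
    try:
        value = Decimal(tail)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def numeric_match(response: str, reference: str, tol: Decimal = Decimal("0.0001")) -> bool:
    """True when both answers parse and differ by at most tol."""
    got, want = parse_answer(response), parse_answer(reference)
    if got is None or want is None:
        return False
    return abs(got - want) <= tol


def accuracy(responses: List[str], references: List[str]) -> float:
    """Fraction of numerically correct responses (0.0 for empty input)."""
    pairs = list(zip(responses, references))
    if not pairs:
        return 0.0
    return sum(numeric_match(r, t) for r, t in pairs) / len(pairs)


print(numeric_match("98+83=171", "98+83=181"))  # False
print(numeric_match("61.65+33.05=94.70", "61.65+33.05=94.7"))  # True
print(accuracy(["98+83=171", "48+1=49"], ["98+83=181", "48+1=49"]))  # 0.5
```

The parser is deliberately strict: a chatty response such as the `model.chat` output in the fine-tuning log (`... which is 90.`) does not end in a bare number after its last `=`, so it counts as wrong.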
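The fine-tuning log prints every parameter name followed by True/False, and in the excerpt only `transformer.layers.27.*` and `transformer.final_layernorm.*` are True. That pattern suggests, although the README does not state it, that the flag is `requires_grad` and that everything except the last block and the final LayerNorm is frozen. A hypothetical name-based policy reproducing the pattern (`is_trainable` and `select_trainable` are invented names; `last_layer=27` assumes 28 blocks numbered 0 to 27, consistent with the log):

```python
from typing import Iterable, List


def is_trainable(name: str, last_layer: int = 27) -> bool:
    """Hypothetical policy: unfreeze only the last block and the final LayerNorm.

    The trailing '.' stops last_layer=2 from also matching 'transformer.layers.27.*'.
    """
    return name.startswith(f"transformer.layers.{last_layer}.") or name.startswith(
        "transformer.final_layernorm."
    )


def select_trainable(names: Iterable[str], last_layer: int = 27) -> List[str]:
    """Names that the policy above would leave trainable."""
    return [n for n in names if is_trainable(n, last_layer)]


# With PyTorch (not executed here), the policy would be applied as:
#     for name, param in model.named_parameters():
#         param.requires_grad = is_trainable(name)
names = [
    "transformer.word_embeddings.weight",
    "transformer.layers.26.mlp.dense_4h_to_h.bias",
    "transformer.layers.27.attention.query_key_value.weight",
    "transformer.final_layernorm.bias",
]
for n in names:
    print(n, is_trainable(n))
```

Training only the top block keeps the optimizer state small, which matters when, as in this log, fine-tuning runs on CPU.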