# 模型微调过程说明文档
## 概览
本文档提供了使用 XTuner 工具进行模型微调过程的详细指南。该过程包括转换、合并、训练以及为不同规模的模型(1.8B 和 20B)设置网络演示。
## 要求
- XTuner
- DeepSpeed
- Huggingface Transformers
- 具备 SSH 和 Git 的使用权限
### 环境安装
```
# 如果你是在 InternStudio 平台,则从本地 clone 一个已有 pytorch 的环境:
# pytorch 2.0.1 py3.10_cuda11.7_cudnn8.5.0_0
studio-conda xtuner0.1.17
# 如果你是在其他平台:
# conda create --name xtuner0.1.17 python=3.10 -y
# 激活环境
conda activate xtuner0.1.17
# 进入家目录 (~的意思是 “当前用户的home路径”)
cd ~
# 创建版本文件夹并进入,以跟随本教程
mkdir -p /root/xtuner0117 && cd /root/xtuner0117
# 拉取 0.1.17 的版本源码
git clone -b v0.1.17 https://github.com/InternLM/xtuner
# 无法访问github的用户请从 gitee 拉取:
# git clone -b v0.1.15 https://gitee.com/Internlm/xtuner
# 进入源码目录
cd /root/xtuner0117/xtuner
# 从源码安装 XTuner
pip install -e '.[all]'
```
## 1. 1.8B 模型训练
### 1.1 数据准备
```
# 在ft这个文件夹里再创建一个存放数据的data文件夹,存储数据
mkdir -p /root/ft/data && cd /root/ft/data
```
### 1.2 准备模型
```
# 创建目标文件夹,确保它存在。
# -p选项意味着如果上级目录不存在也会一并创建,且如果目标文件夹已存在则不会报错。
mkdir -p /root/ft/model
# 复制内容到目标文件夹。-r选项表示递归复制整个文件夹。
cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/* /root/ft/model/
```
如果是需要自己下载,可以使用transformers库
```
from transformers import AutoModel
# 指定模型名称
model_name = 'internlm/internlm2-chat-1_8b'
# 加载模型
model = AutoModel.from_pretrained(model_name)
# 指定保存模型的目录
model_save_path = '/root/ft/model'
# 保存模型
model.save_pretrained(model_save_path)
```
将这段代码保存为 `download_model.py`,然后在命令行中运行这个脚本:
```
python download_model.py
```
这个脚本会自动下载模型并将其保存到指定的 `/root/ft/model` 目录中。
### 1.3 下载配置文件
```
# XTuner 提供多个开箱即用的配置文件,用户可以通过下列命令查看:
# 列出所有内置配置文件
# xtuner list-cfg
# 假如我们想找到 internlm2-1.8b 模型里支持的配置文件
xtuner list-cfg -p internlm2_1_8b
# 创建一个存放 config 文件的文件夹
mkdir -p /root/ft/config
# 使用 XTuner 中的 copy-cfg 功能将 config 文件复制到指定的位置
xtuner copy-cfg internlm2_1_8b_qlora_alpaca_e3 /root/ft/config
```
### 1.4 修改配置参数
```
# 修改模型地址(在第27行的位置)
- pretrained_model_name_or_path = 'internlm/internlm2-1_8b'
+ pretrained_model_name_or_path = '/root/ft/model'
# 修改数据集地址为本地的json文件地址(在第31行的位置)
- alpaca_en_path = 'tatsu-lab/alpaca'
+ alpaca_en_path = '/root/ft/data/personal_assistant.json'
# 修改max_length来降低显存的消耗(在第33行的位置)
- max_length = 2048
+ max_length = 1024
# 减少训练的轮数(在第44行的位置)
- max_epochs = 3
+ max_epochs = 2
# 增加保存权重文件的总数(在第54行的位置)
- save_total_limit = 2
+ save_total_limit = 3
# 修改每多少轮进行一次评估(在第57行的位置)
- evaluation_freq = 500
+ evaluation_freq = 300
# 修改具体评估的问题(在第59到61行的位置)
# 把 OpenAI 格式的 map_fn 载入进来(在第15行的位置)
- from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
+ from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
# 将原本是 alpaca 的地址改为是 json 文件的地址(在第102行的位置)
- dataset=dict(type=load_dataset, path=alpaca_en_path),
+ dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
# 将 dataset_map_fn 改为通用的 OpenAI 数据集格式(在第105行的位置)
- dataset_map_fn=alpaca_map_fn,
+ dataset_map_fn=None,
```
### 1.5 模型训练
```
# 指定保存路径
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train
# 使用 deepspeed 来加速训练
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train_deepspeed --deepspeed deepspeed_zero2
```
### 1.6 转换到 Huggingface 格式
1. **创建目录**:为转换后的 Huggingface 模型创建一个存储目录:
```bash
mkdir -p /root/ft/huggingface/i8000
```
2. **模型转换**:使用提供的配置和权重文件进行模型转换:
```bash
xtuner convert pth_to_hf /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py /root/ft/train_deepspeed/iter_18000.pth /root/ft/huggingface/i8000 --fp32
```
3. **合并模型**:合并模型并解决依赖关系:
```bash
mkdir -p /root/ft/final_model_8000
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert merge /root/ft/model /root/ft/huggingface/1i8000 /root/ft/final_model_18000
```
4. **测试模型**:通过启动对话来测试模型:
```bash
xtuner chat /root/ft/final_model_18000 --prompt-template internlm2_chat
```
### 1.7 模型续训
```bash
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train_deepspeed --resume /root/ft/train_deepspeed/iter_8500.pth --deepspeed deepspeed_zero1
```
### 1.8 网络演示设置
1. **准备环境**:
```bash
mkdir -p /root/ft/web_demo && cd /root/ft/web_demo
git clone https://github.com/InternLM/InternLM.git
cd /root/ft/web_demo/InternLM
```
2. **运行演示** 使用 Streamlit:
```bash
streamlit run /root/ft/web_demo/InternLM/chat/web_demo.py --server.address 127.0.0.1 --server.port 6006
```
3. **通过 SSH 隧道访问演示**:
```bash
ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p 开发机端口号
```
## 2. 20B 模型训练
与1.8B模型训练过程类似,20B模型训练涉及到为配置、数据和最终模型创建相应的目录。此外,这一过程还包括使用多个GPU进行模型训练,并将模型转换为Huggingface格式。
### 2.1 数据准备
为大规模的20B模型训练准备数据。
```bash
# 创建一个专用于存放20B模型数据的目录
mkdir -p /root/ft20b/data && cd /root/ft20b/data
```
### 2.2 准备模型
准备模型包括创建目标文件夹并将预训练的20B模型复制到指定位置。
```bash
# 创建一个目录用来存放20B模型文件
mkdir -p /root/ft20b/model
# 将预训练的模型复制到新创建的目录中
cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-20b/* /root/ft20b/model/
```
### 2.3 下载配置文件
下载并准备20B模型的配置文件,以便进行训练。
```bash
# 列出所有支持20B模型的配置文件
xtuner list-cfg -p internlm2_20b
# 创建一个目录用于存放20B模型的配置文件
mkdir -p /root/ft20b/config
# 复制所需的配置文件到新创建的目录中
xtuner copy-cfg internlm2_20b_qlora_alpaca_e3 /root/ft20b/config
```
### 2.4 修改配置参数
根据训练需求调整配置文件,以优化20B模型的训练。
```bash
# 修改模型路径和数据集路径等关键参数以适配20B模型
- pretrained_model_name_or_path = 'internlm/internlm2-20b'
+ pretrained_model_name_or_path = '/root/ft20b/model'
- alpaca_en_path = 'tatsu-lab/alpaca'
+ alpaca_en_path = '/root/ft20b/data/specific_dataset.json'
- max_length = 2048
+ max_length = 1024
- max_epochs = 3
+ max_epochs = 2
- save_total_limit = 2
+ save_total_limit = 3
- evaluation_freq = 500
+ evaluation_freq = 300
```
### 2.5 模型训练
使用DeepSpeed和多GPU配置来加速20B模型的训练过程。
```bash
# 指定保存路径并开始训练
xtuner train /root/f
没有合适的资源?快使用搜索试试~ 我知道了~
微调internlm2模型实现针对煤矿事故和煤矿安全知识的智能问答:使用煤矿历史事故案例,事故处理报告、安全规程规章制度等数据
![preview](https://csdnimg.cn/release/downloadcmsfe/public/img/white-bg.ca8570fa.png)
共120个文件
txt:48个
py:27个
png:21个
![preview-icon](https://csdnimg.cn/release/downloadcmsfe/public/img/scale.ab9e0183.png)
需积分: 5 5 下载量 187 浏览量
2024-06-23
16:15:03
上传
评论
收藏 20.4MB ZIP 举报
温馨提示
本项目简介: 近年来,国家对煤矿安全生产的重视程度不断提升。为了确保煤矿作业的安全,提高从业人员的安全知识水平显得尤为重要。鉴于此,目前迫切需要一个高效、集成化的解决方案,该方案能够整合煤矿安全相关的各类知识,为煤矿企业负责人、安全管理人员、矿工提供一个精确、迅速的信息查询、学习与决策支持平台。 为实现这一目标,我们利用包括煤矿历史事故案例、事故处理报告、安全操作规程、规章制度、技术文档以及煤矿从业人员入职考试题库等在内的丰富数据资源,通过微调InternLM2模型,构建出一个专门针对煤矿事故和煤矿安全知识智能问答的煤矿安全大模型。 本项目的特点如下: 支持煤矿安全领域常规题型解答,如:单选题、多选题、判断题、填空题等 (针对煤矿主要负责人及安管人员、煤矿各种作业人员) 支持针对安全规程规章制度、技术等文档内容回答(如《中华人民共和国矿山安全法》、《煤矿建设安全规程》) 支持煤矿历史事故案例,事故处理报告查询,提供事故原因详细分析、事故预防措施以及应急响应知识
资源推荐
资源详情
资源评论
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![pptx](https://img-home.csdnimg.cn/images/20210720083543.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![pptx](https://img-home.csdnimg.cn/images/20210720083543.png)
![pdf](https://img-home.csdnimg.cn/images/20210720083512.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![rar](https://img-home.csdnimg.cn/images/20210720083606.png)
![pptx](https://img-home.csdnimg.cn/images/20210720083543.png)
![docx](https://img-home.csdnimg.cn/images/20210720083331.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![zip](https://img-home.csdnimg.cn/images/20210720083736.png)
![exe](https://img-home.csdnimg.cn/images/20210720083343.png)
收起资源包目录
![package](https://csdnimg.cn/release/downloadcmsfe/public/img/package.f3fc750b.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/DOC.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/DOCX.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/DOCX.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/JPG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/JPG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/JPG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PDF.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/PNG.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/UNKNOWN.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
![file-type](https://csdnimg.cn/release/download/static_files/pc/images/minetype/TXT.png)
共 120 条
- 1
- 2
资源评论
![avatar-default](https://csdnimg.cn/release/downloadcmsfe/public/img/lazyLogo2.1882d7f4.png)
![avatar](https://profile-avatar.csdnimg.cn/345947af0dd14bf98009d02b9630176c_sinat_39620217.jpg!1)
汀、人工智能
- 粉丝: 8w+
- 资源: 401
上传资源 快速赚钱
我的内容管理 展开
我的资源 快来上传第一个资源
我的收益
登录查看自己的收益我的积分 登录查看自己的积分
我的C币 登录后查看C币余额
我的收藏
我的下载
下载帮助
![voice](https://csdnimg.cn/release/downloadcmsfe/public/img/voice.245cc511.png)
![center-task](https://csdnimg.cn/release/downloadcmsfe/public/img/center-task.c2eda91a.png)
安全验证
文档复制为VIP权益,开通VIP直接复制
![dialog-icon](https://csdnimg.cn/release/downloadcmsfe/public/img/green-success.6a4acb44.png)