Chinese Medical Consultation Large Model code: a resource on the relationship between embeddings and large models - CSDN Library

78 files in total: 46 py, 21 pyc, 2 png


Uploaded 2024-02-20 17:07:04 · 21.22MB ZIP


MING-main.zip (78 files)

MING-main/
    LICENSE 11KB
    fastchat/
        utils.py 5KB
        __init__.py 0B
        conversation.py 9KB
        train/
            train_mt5.py 10KB
            train_last_head.py 11KB
            llama_flash_attn_monkey_patch.py 4KB
            train_mem.py 345B
            train_adapter.py 7KB
            train_lora.py 6KB
            train.py 11KB
            __pycache__/
                llama_flash_attn_monkey_patch.cpython-39.pyc 3KB
                train.cpython-39.pyc 8KB
        protocol/
            chat_completion.py 855B
        data/
            hardcoded_questions.py 6KB
            __init__.py 0B
            pretty_json.py 475B
            sample.py 1005B
            clean_sharegpt.py 5KB
            inspect.py 615B
            merge.py 649B
            split_long_conversation.py 3KB
            optional_clean.py 3KB
            alpaca-converter.py 2KB
        client/
            __init__.py 105B
            api.py 2KB
            test_client.py 639B
        serve/
            __init__.py 0B
            controller.py 10KB
            serve_chatglm.py 904B
            yushengliao.code-workspace 70B
            register_worker.py 734B
            gradio_web_server.py 16KB
            gateway/
                README.md 2KB
                nginx.conf 4KB
            model_worker.py 7KB
            test_throughput.py 4KB
            inference.py 8KB
            monkey_patch_non_inplace.py 4KB
            cacheflow_worker.py 11KB
            api.py 5KB
            gradio_css.py 3KB
            cli.py 5KB
            huggingface_api.py 2KB
            __pycache__/
                cli.cpython-39.pyc 4KB
                cli_hello.cpython-39.pyc 4KB
                inference_beam.cpython-39.pyc 6KB
                serve_chatglm.cpython-39.pyc 1000B
                __init__.cpython-39.pyc 171B
                inference.cpython-39.pyc 6KB
                compression.cpython-39.pyc 3KB
                monkey_patch_non_inplace.cpython-39.pyc 3KB
                inference_hello.cpython-39.pyc 7KB
                cli_beam.cpython-39.pyc 4KB
                __init__.cpython-38.pyc 175B
                cli.cpython-38.pyc 4KB
            compression.py 4KB
            test_message.py 2KB
            gradio_patch.py 7KB
        model/
            __init__.py 0B
            make_delta.py 2KB
            apply_delta.py 6KB
            convert_fp16.py 840B
            __pycache__/
                __init__.cpython-39.pyc 170B
                apply_delta.cpython-39.pyc 4KB
        __pycache__/
            constants.cpython-39.pyc 261B
            __init__.cpython-39.pyc 165B
            utils.cpython-39.pyc 5KB
            __init__.cpython-38.pyc 169B
            conversation.cpython-39.pyc 7KB
        constants.py 84B
    img/
        demo2.gif 11.7MB
        bgimage.png 4.17MB
        pie-labelLine-adjust1.svg 18KB
        guideline_qa.png 600KB
        demo1.gif 6.31MB
    pyproject.toml 1KB
    README.md 20KB

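# 明医 (MING): Chinese Medical Consultation Large Model

<p align="center"> <img src=".\img\bgimage.png" width=800px/> </p>

<div align="center"><img src="https://img.shields.io/badge/Version-1.3--alpha-brightgreen"> <img src="https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg"> <img src="https://img.shields.io/badge/python-3.9+-blue.svg"></div>

## 🌐Project Overview

This project open-sources **明医 (MING)**, a Chinese medical consultation model fine-tuned on medical instructions. The model currently offers the following main functions:

<!DOCTYPE html>
<html>
<body>
<table style="width: 100%;">
  <tr style="border-collapse: collapse; border: transparent;">
    <td style="width: 50%; border-collapse: collapse;border: transparent;"><img src=".\img\demo1.gif" alt="demo1"/></td>
    <td style="width: 50%; border-collapse: collapse;border: transparent;"><img src=".\img\demo2.gif" alt="demo2"/></td>
  </tr>
  <tr style="border-collapse: collapse; border: transparent;">
    <td style="width: 50%; border-collapse: collapse;border: transparent;" ><div align="center"><strong>Medical Q&amp;A</strong>: answers medical questions and analyzes cases.</div></td>
    <td style="width: 50%; border-collapse: collapse;border: transparent;"><div align="center"><strong>Intelligent consultation</strong>: gives a diagnosis and recommendations after a multi-turn consultation.</div></td>
  </tr>
</table>
</body>
</html>

## 💫Updates

* [2023/07/25] Open-sourced MING-7B, instruction-tuned from bloomz-7b
* [2023/07/25] MedicalGPT-zh was renamed to **MING**

## 🔬Model Parameters

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<table style="width: 70%;">
  <tr>
    <td style="width: 20%;"><div align="center"><strong>Model</strong></div></td>
    <td style="width: 20%;"><div align="center"><strong>Base model</strong></div></td>
    <td style="width: 30%;"><div align="center"><strong>HuggingFace</strong></div></td>
  </tr>
  <tr>
    <td><center>MING-7B</center></td>
    <td><center><a href="https://huggingface.co/bigscience/bloomz-7b1-mt">bloomz-7b1-mt</a></center></td>
    <td><center>🤗<a href="https://huggingface.co/BlueZeros/MING-7B">MING-7B</a></center></td>
  </tr>
</table>
</body>
</html>

## ⚡Quick Start

1. Set up the environment (the tested configuration is listed below; adjust versions to your needs)
   * python==3.9.16
   * pytorch==1.13.0+cu116

2. Install the project dependencies

   ```bash
   git clone https://github.com/MediaBrain-SJTU/MING
   cd MING
   pip install -e .
   ```

3. Download the model weights and run (requires at least 15 GB of memory on a single GPU)

   ```bash
   # --model-path     path to the model checkpoint
   # --max-new-token  maximum output length
   # --beam-size      beam search width
   # --temperature    sampling temperature
   CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.cli \
       --model-path {path_to_checkpoint} \
       --max-new-token 512 \
       --beam-size 3 \
       --temperature 1.2
   ```

   * Note: because of an issue in the transformers library, temperature must be >= 1.0 whenever beam-size > 1; otherwise an error is raised.

4. Command-line usage
   * Multi-turn conversation is supported.
   * Entering the keyword `new chat` during a conversation starts a new conversation.

## 🗃️Dataset Construction

The dataset consists of four main parts:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<table style="width: 70%;">
  <tr>
    <td style="width: 20%;"><strong>Data type</strong></td>
    <td style="width: 50%;"><strong>Composition</strong></td>
    <td style="width: 10%;"><strong>Count</strong></td>
    <td style="width: 10%;"><strong>Share (%)</strong></td>
  </tr>
  <tr>
    <td rowspan="4">Medical knowledge QA</td>
    <td>Knowledge QA based on clinical guidelines and medical consensus statements</td>
    <td>168k</td>
    <td rowspan="4">48.88</td>
  </tr>
  <tr>
    <td>Knowledge QA based on medical licensing examination questions</td>
    <td>77k</td>
  </tr>
  <tr>
    <td>Real doctor-patient QA</td>
    <td>140k</td>
  </tr>
  <tr>
    <td>Knowledge QA based on structured medical knowledge graphs</td>
    <td>160k</td>
  </tr>
  <tr>
    <td rowspan="3">Multi-turn scenario diagnosis and case analysis</td>
    <td>Multi-turn scenario QA and diagnosis constructed from HealthCareMagic</td>
    <td>200k</td>
    <td rowspan="3">21.52</td>
  </tr>
  <tr>
    <td>Formatted multi-turn consultations based on USMLE case-analysis questions</td>
    <td>20k</td>
  </tr>
  <tr>
    <td>Multi-turn patient-information reasoning and diagnosis</td>
    <td>20k</td>
  </tr>
  <tr>
    <td rowspan="2">Task instructions</td>
    <td>Medical instructions</td>
    <td>150k</td>
    <td rowspan="2">26.91</td>
  </tr>
  <tr>
    <td>General instructions</td>
    <td>150k</td>
  </tr>
  <tr>
    <td rowspan="2">Safety data</td>
    <td>Sensitive questions</td>
    <td>15k</td>
    <td rowspan="2">2.69</td>
  </tr>
  <tr>
    <td>Medical counterfactuals</td>
    <td>15k</td>
  </tr>
  <tr>
    <td><strong>Total</strong></td>
    <td>-</td>
    <td>1.12M</td>
    <td>100.00</td>
  </tr>
</table>
</body>
</html>

## 🧭Test Samples

<details><summary><strong>Health checkup report analysis</strong></summary>
<table style="width: 100%;">
  <tr>
    <td colspan="2"><strong>Question</strong></td>
  </tr>
  <tr>
    <td colspan="2">height (cm)=‘null’, weight (kg)=‘null’, bmi=‘null’, systolic blood pressure=‘130’, diastolic blood pressure=‘75’, history of hypertension=‘null’, heart rate=‘84’, history of diabetes=‘null’, liver function panel (10 items)=‘albumin/globulin ratio = [1.96], total protein = [74], albumin = [49], prealbumin = [264], total bilirubin = [11.6], direct bilirubin = [2.4], bile acids = [2.3], alanine aminotransferase = [64], aspartate aminotransferase = [30], γ-glutamyl transferase = [65] ↑, alkaline phosphatase = [80]’, lipid panel (4 items)=‘total cholesterol = [5.08], triglycerides = [1.75] ↑, high-density lipoprotein cholesterol = [1.07], low-density lipoprotein cholesterol = [3.34]’, thyroid (FT3 FT4 TSH)=‘thyroid-stimulating hormone (TSH) = [0.6415], free thyroxine (FT4) = [12.67], free triiodothyronine (FT3) = [4.98]’, fasting blood glucose=‘glucose = [5.35]’, carcinoembryonic antigen=‘carcinoembryonic antigen = [1.16]’, alpha-fetoprotein=‘alpha-fetoprotein = [4.68]’, ca199=‘carbohydrate antigen 199 = [3.1]’, ca125=‘carbohydrate antigen 125 = [5.5]’, ca153=‘null’, renal function panel (3 items)=‘urea = [5.2], creatinine = [82], uric acid = [390]’, complete blood count=‘basophil count = [0.00], basophil % = [0.6], eosinophil count = [0.10], eosinophil % = [0.8], hematocrit = [0.491], hemoglobin = [160], lymphocyte count = [2.50], lymphocyte % = [33.4], mean corpuscular hemoglobin = [28.5], mean corpuscular hemoglobin concentration = [325], mean corpuscular volume = [87.5], monocyte count = [0.60], monocyte % = [7.3], mean platelet volume = [8.3], neutrophil count = [4.40], neutrophil % = [57.9], platelet count = [276], red blood cell count = [5.61], red cell distribution width = [13.2], white blood cell count = [7.60]’, erythrocyte sedimentation=‘erythrocyte sedimentation rate = [1]’, glycated hemoglobin=‘glycated hemoglobin (HbA1C) = [5.3]’, urinalysis=‘crystals (microscopy) = [negative (-)], white blood cells = [negative (-)], specific gravity = [1.023], pH = [5.0], nitrite = [negative (-)], protein = [negative (-)], ketones = [negative (-)], urobilinogen = [negative (-)], bilirubin = [negative (-)], glucose = [negative (-)], occult blood = [weakly positive], red blood cells (microscopy) = [0], white blood cells (microscopy) = [0], epithelial cells (microscopy) = [0], casts (microscopy) = [0], color = [yellow], clarity = [clear]’, stool routine + occult blood=‘color = [yellow], consistency = [soft stool], parasite eggs = [negative (-)], red blood cells = [negative (-)], white blood cells = [negative (-)], occult blood test = [negative (-)]’, internal medicine=‘thick abdominal wall, heart rate [84] beats/min’, surgery=‘rectal examination declined, no obvious abnormality of the neck’, blood pressure=‘blood pressure [130/75] mmHg’, chest X-ray (frontal and lateral)=‘Imaging findings: the bony thorax and chest wall soft tissues shown are unremarkable. The mediastinum and trachea are midline without displacement. The mediastinum is not widened. The heart is normal in shape and size. Both hemidiaphragms are smooth and both costophrenic angles are sharp. The hila are normal in shape, size, and position. Both lung fields are clear with no abnormal density.’ Please provide an analysis and recommendations based on the checkup indicators above.</td>
  </tr>
</table>
</details>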
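
The share column of the dataset composition table can be reproduced from the per-source counts. The following is a minimal arithmetic sketch, not part of the MING code base: the names `DATASET_COUNTS_K` and `category_shares` are hypothetical, the category labels are English renderings of the table's row groups, and the counts are copied from the table in thousands of samples.

```python
# Sample counts in thousands, copied from the dataset composition table.
# The dictionary name and the English category labels are illustrative only.
DATASET_COUNTS_K = {
    "Medical knowledge QA": [168, 77, 140, 160],
    "Multi-turn scenario diagnosis and case analysis": [200, 20, 20],
    "Task instructions": [150, 150],
    "Safety data": [15, 15],
}


def category_shares(counts_k):
    """Return (total in thousands, {category: share in percent, 2 decimals})."""
    total = sum(sum(parts) for parts in counts_k.values())
    shares = {
        name: round(100 * sum(parts) / total, 2)
        for name, parts in counts_k.items()
    }
    return total, shares


total_k, shares = category_shares(DATASET_COUNTS_K)
print(f"total: {total_k}k")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The four shares match the table (48.88, 21.52, 26.91, 2.69), and the total of 1,115k samples is what the table rounds to 1.12M.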
