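Python graduation project: LLM-based intelligent diagnosis system for cardiovascular OCT, source code + usage guide (high-scoring project).zip, CSDN Library resource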

308 files in total

py: 292

txt: 3

md: 3


Tags: graduation project, python, language model, source code

58 views · uploaded 2024-05-09 14:12:20 · 70.88MB ZIP


Package contents: python毕业设计-基于大语言模型的心血管OCT智能诊断系统源码+使用说明（高分项目）.zip (308 files)

setup.cfg 465B

test.gif 61.81MB

.gitattributes 66B

.gitignore 3KB

ht.ico 230KB

disease_dict.json 9.4MB

msd_dict.json 376KB

result_gpt3.5.json 38KB

eval.lst 1KB

README.md 16KB

README.md 11KB

README.md 3KB

MSD.pkl 5.89MB

transforms.py 70KB

reppoints_v2_head.py 61KB

corner_head.py 45KB

lvis.py 45KB

yolact_head.py 39KB

structures.py 37KB

transformer.py 36KB

guided_anchor_head.py 36KB

auto_augment.py 36KB

vfnet_head.py 35KB

reppoints_head.py 34KB

anchor_head.py 34KB

cascade_rpn_head.py 32KB

anchor_generator.py 30KB

transformer_head.py 30KB

atss_head.py 30KB

paa_head.py 29KB

fcos_head.py 28KB

gfl_head.py 27KB

gswin_transformer.py 27KB

sabl_retina_head.py 27KB

htc_roi_head.py 25KB

loading.py 25KB

yolo_head.py 24KB

sabl_head.py 24KB

swin_transformer.py 24KB

scnet_roi_head.py 24KB

coco.py 24KB

resnet.py 23KB

cascade_roi_head.py 22KB

bbox_head.py 21KB

hrnet.py 20KB

condconv_mask_head.py 20KB

centripetal_head.py 19KB

mean_ap.py 19KB

checkpoint.py 19KB

fsaf_head.py 18KB

dii_head.py 18KB

iou_loss.py 16KB

fpg.py 16KB

fcn_mask_head.py 15KB

formating.py 15KB

grid_head.py 15KB

center_region_assigner.py 15KB

test_mixins.py 15KB

fovea_head.py 14KB

cityscapes.py 14KB

base.py 14KB

sparse_roi_head.py 14KB

bucketing_bbox_coder.py 14KB

anchor_free_head.py 13KB

mask_point_head.py 13KB

utils.py 13KB

res2net.py 12KB

eval_hooks.py 12KB

standard_roi_head.py 12KB

regnet.py 12KB

custom.py 11KB

score_hlr_sampler.py 11KB

free_anchor_retina_head.py 11KB

dataset_wrappers.py 11KB

ssd_head.py 11KB

image.py 11KB

trident_resnet.py 11KB

gfocal_loss.py 11KB

fpn_carafe.py 10KB

rpn_head.py 10KB

ld_head.py 10KB

detectors_resnet.py 10KB

resnest.py 10KB

point_rend_roi_head.py 10KB

bbox_nms.py 10KB

max_iou_assigner.py 10KB

focal_loss.py 9KB

fpn.py 9KB

region_assigner.py 9KB

atss_assigner_v2.py 9KB

delta_xywh_bbox_coder.py 9KB

bifpn.py 9KB

mask_reppoints_v2_detector.py 9KB

legacy_delta_xywh_bbox_coder.py 8KB

tblr_bbox_coder.py 8KB

reprocess.py 8KB

cross_entropy_loss.py 8KB

transforms.py 8KB

point_hm_assigner.py 8KB

atss_assigner.py 8KB

308 entries in total

# DISC-MedLLM

<div align="center">

[![Generic badge](https://img.shields.io/badge/🤗-Huggingface%20Repo-green.svg)](https://huggingface.co/Flmc/DISC-MedLLM)
[![license](https://img.shields.io/github/license/modelscope/modelscope.svg)](https://github.com/FudanDISC/DISC-MedLLM/blob/main/LICENSE)
<br>

</div>

<div align="center">

[Demo](http://med.fudan-disc.com) | [Technical report](https://arxiv.org/abs/2308.14346)
<br>
中文 | [EN](https://github.com/FudanDISC/DISC-MedLLM/blob/main/README_EN.md)

</div>

DISC-MedLLM is a large language model for the medical domain, designed specifically for conversational healthcare scenarios. It was developed and open-sourced by the [Fudan University Data Intelligence and Social Computing Lab (Fudan-DISC)](http://fudan-disc.com). The project releases the following resources:

* [DISC-Med-SFT dataset](https://huggingface.co/datasets/Flmc/DISC-Med-SFT) (excluding the behavioral preference training data)
* DISC-MedLLM [model weights](https://huggingface.co/Flmc/DISC-MedLLM)

You can try the model at this [link](http://med.fudan-disc.com).

## Overview

DISC-MedLLM is a domain-specific large model built for medical and healthcare dialogue. It covers a range of healthcare needs, including disease consultation and treatment-plan advice, and aims to provide high-quality health support.

DISC-MedLLM is effectively aligned with human preferences in medical scenarios, narrowing the gap between the output of general-purpose language models and real-world medical dialogue, as the experimental results show. This comes from our goal-oriented strategy and from a multi-source data construction pipeline built on real doctor-patient dialogues and a knowledge graph, with both LLM-in-the-loop and human-in-the-loop steps. DISC-MedLLM has the following features:

* **Reliable, rich domain knowledge.** Using a medical knowledge graph as the information source, we sample triples and use the language ability of a general-purpose LLM to construct dialogue samples.
* **Multi-turn inquiry.** Using real consultation records as the information source, we have an LLM reconstruct the dialogues, and the construction process requires the model to keep the medical information in each dialogue fully aligned with the original.
* **Responses aligned with human preferences.** Patients want richer supporting information and background knowledge during a consultation, while human doctors' answers tend to be terse. Through manual selection, we built a small set of high-quality behavioral fine-tuning samples that aligns the model with patients' needs.

<img src="https://github.com/FudanDISC/DISC-MedLLM/blob/main/images/data_construction.png" alt="data-construction" width="85%"/>

## Model demos

### Disease consultation

<img src="https://github.com/FudanDISC/DISC-MedLLM/blob/main/images/consultation.gif" alt="sample1" width="60%"/>

### Treatment-plan advice

<img src="https://github.com/FudanDISC/DISC-MedLLM/blob/main/images/advice.gif" alt="sample2" width="60%"/>

## Dataset

To train DISC-MedLLM, we built a high-quality dataset named DISC-Med-SFT, containing more than 470k samples reconstructed from existing medical datasets. Following a goal-oriented strategy, we obtained the SFT dataset by reconstructing several carefully selected data sources. These data help the model learn medical domain knowledge, align its behavior with human preferences, and match the distribution of real-world online medical dialogue.

<table class="tg" style="undefined;table-layout: fixed; width: 442px">
<colgroup>
<col style="width: 204.428571px">
<col style="width: 135.428571px">
<col style="width: 102.428571px">
</colgroup>
<thead>
  <tr>
    <th class="tg-9wq8" rowspan="2"><br>Dataset</th>
    <th class="tg-9wq8" rowspan="2"><br>Source</th>
    <th class="tg-9wq8" rowspan="2"><br>Samples</th>
  </tr>
  <tr>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-9wq8" rowspan="2">Reconstructed AI doctor-patient dialogues</td>
    <td class="tg-9wq8">MedDialog</td>
    <td class="tg-9wq8">400k</td>
  </tr>
  <tr>
    <td class="tg-9wq8">cMedQA2</td>
    <td class="tg-c3ow">20k</td>
  </tr>
  <tr>
    <td class="tg-c3ow">Knowledge-graph QA pairs</td>
    <td class="tg-9wq8">CMeKG</td>
    <td class="tg-9wq8">50k</td>
  </tr>
  <tr>
    <td class="tg-c3ow">Behavioral preference dataset</td>
    <td class="tg-9wq8">Manually selected</td>
    <td class="tg-9wq8">2k</td>
  </tr>
  <tr>
    <td class="tg-9wq8" rowspan="3">Other</td>
    <td class="tg-c3ow">MedMCQA</td>
    <td class="tg-c3ow">8k</td>
  </tr>
  <tr>
    <td class="tg-c3ow">MOSS-SFT</td>
    <td class="tg-c3ow">33k</td>
  </tr>
  <tr>
    <td class="tg-c3ow">Alpaca-GPT4-zh</td>
    <td class="tg-c3ow">1k</td>
  </tr>
</tbody>
</table>
<br>

### Download

We have released nearly 470k training samples in total, consisting of the reconstructed AI doctor-patient dialogues and the knowledge-graph QA pairs. You can download the dataset from this [link](https://huggingface.co/datasets/Flmc/DISC-Med-SFT).

<br>

## Deployment

The current version of DISC-MedLLM is trained from [Baichuan-13B-Base](https://github.com/baichuan-inc/Baichuan-13B). You can download the model weights directly from [Hugging Face](https://huggingface.co/Flmc/DISC-MedLLM), or have them fetched automatically as in the code sample below. First, install the project dependencies:

```shell
pip install -r requirements.txt
```

### Inference with the Hugging Face transformers library

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("Flmc/DISC-MedLLM", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("Flmc/DISC-MedLLM", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("Flmc/DISC-MedLLM")
>>> messages = []
>>> # "My cervical spine feels very uncomfortable, and I wake up with a headache every day."
>>> messages.append({"role": "user", "content": "我感觉自己颈椎非常不舒服，每天睡醒都会头痛"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
```

### Run the command-line demo

```shell
python cli_demo.py
```

### Run the web demo

```shell
streamlit run web_demo.py --server.port 8888
```

Because the current version of DISC-MedLLM uses Baichuan-13B as its base model, you can follow the [Baichuan-13B project](https://github.com/baichuan-inc/Baichuan-13B) to deploy int8 or int4 quantized inference. Note, however, that quantization may degrade model performance.

<br>

## Fine-tuning the model

You can fine-tune our model on data with the same structure as our dataset. Our training code is adapted from [Firefly](https://github.com/yangjianxin1/Firefly), with a different data structure and dialogue format. Only the full-parameter fine-tuning code is provided:

```shell
deepspeed --num_gpus={num_gpus} ./train/train.py --train_args_file ./train/train_args/sft.json
```

> Check the settings in `sft.json` before you start training.

If you want to fine-tune our model with other training code, use the following dialogue format:

```text
<\b><$user_token>content<$assistant_token>content<\s><$user_token>content ...
```

The `user_token` and `assistant_token` we use are `195` and `196` respectively, the same as in Baichuan-13B-Chat.

## Model evaluation

<!-- We compare our model with three general-purpose LLMs and two conversational Chinese medical domain LLMs. Specifically, these are GPT-3.5 and GPT-4 from OpenAI, the aligned conversational version of our backbone model Baichuan-13B-Base, Baichuan-13B-Chat, and the open-source Chinese conversational medical mode -->
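To make the fine-tuning dialogue format described above concrete, here is a minimal sketch of how a training sequence could be laid out. It is a hypothetical helper, not code from the DISC-MedLLM repository: `build_dialogue_ids` takes turns that are already tokenized, and it assumes that `<\b>` and `<\s>` in the template stand for the BOS and EOS tokens with ids 1 and 2. Only the ids 195 (`user_token`) and 196 (`assistant_token`) come from the README, so check the other ids against the actual tokenizer before relying on this.

```python
from typing import List, Sequence, Tuple

# Special token ids stated in the README (same as Baichuan-13B-Chat).
USER_TOKEN_ID = 195
ASSISTANT_TOKEN_ID = 196


def build_dialogue_ids(
    turns: Sequence[Tuple[str, List[int]]],
    bos_id: int = 1,  # assumption: verify against tokenizer.bos_token_id
    eos_id: int = 2,  # assumption: verify against tokenizer.eos_token_id
    add_generation_prompt: bool = False,  # hypothetical option, see note below
) -> List[int]:
    """Concatenate pre-tokenized turns as <bos><user>u<assistant>a<eos><user>u ..."""
    ids: List[int] = [bos_id]
    for role, content_ids in turns:
        if role == "user":
            ids.append(USER_TOKEN_ID)
            ids.extend(content_ids)
        elif role == "assistant":
            ids.append(ASSISTANT_TOKEN_ID)
            ids.extend(content_ids)
            # The README's <\s>, assumed here to be the EOS token.
            ids.append(eos_id)
        else:
            raise ValueError(f"unknown role: {role!r}")
    if add_generation_prompt:
        # End with the assistant token so a model would write the next reply.
        ids.append(ASSISTANT_TOKEN_ID)
    return ids


if __name__ == "__main__":
    # Toy ids stand in for real tokenizer output.
    demo = [("user", [10, 11]), ("assistant", [20, 21, 22]), ("user", [12])]
    print(build_dialogue_ids(demo, add_generation_prompt=True))
    # → [1, 195, 10, 11, 196, 20, 21, 22, 2, 195, 12, 196]
```

With a real tokenizer, the content ids would typically come from something like `tokenizer.encode(text, add_special_tokens=False)`, so that special tokens are inserted only by the helper. Ending a prompt with the assistant token mirrors how a chat model is asked for its next reply; whether DISC-MedLLM's own training code does this is not stated in the README.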
