Development Status :: 3 - Alpha <br>
*Copyright (c) 2023 MinWoo Park*
<br>
# GPT-BERT Medical QA Chatbot
[![Contributor Covenant](https://img.shields.io/badge/contributor%20covenant-v2.0%20adopted-black.svg)](code_of_conduct.md)
[![Python Version](https://img.shields.io/badge/python-3.6%2C3.7%2C3.8-black.svg)](code_of_conduct.md)
![Code convention](https://img.shields.io/badge/code%20convention-pep8-black)
![Black Formatter](https://img.shields.io/badge/code%20style-black-000000.svg)
> **Be careful when cloning this repository**: it contains large NLP model weights (>0.45 GB, via [`git-lfs`](https://git-lfs.com/)). <br>
> If you want to clone without git-lfs, run the following before `git clone`. *git-lfs provides only 1 GB of free bandwidth per month, so a 0.45 GB download through git-lfs will almost never succeed; please download the weights manually instead.*
```
git lfs install --skip-smudge
export GIT_LFS_SKIP_SMUDGE=1
```
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot_walle.png)
Since the advent of GPT-4, the field has changed significantly. Nevertheless, GPT-2 and GPT-3, as large autoregressive natural language processing models, remain effective in specific domains. This repository aims to qualitatively compare the performance of GPT-2 and GPT-4 in the medical domain, and to estimate the resources and cost required to fine-tune GPT-2 up to GPT-4's level of performance. It also attempts to evaluate how well the latest information is integrated and applied.
Although a few years behind GPT-4, the goal of this repository is to minimize the cost and resources needed to obtain usable weights and to update them afterwards. We plan to design few-shot learning experiments on large-scale NLP models and to test existing research. Note that this repository is intended for research and practice purposes only; we accept no responsibility for its use.
In addition, the ultimate goal of this repository is to achieve qualitative and quantitative performance similar to GPT-4 in certain domains through model lightweighting and optimization.
<br><br><br><br><br><br>
# Contents
- [GPT-BERT Medical QA Chatbot](#gpt-bert-medical-qa-chatbot)
- [Quick Start](#quick-start)
* [Command-Line Interface](#command-line-interface)
* [Streamlit application](#streamlit-application)
- [Docker](#docker)
* [Build from Docker Image](#build-from-docker-image)
* [Build from Docker Compose](#build-from-docker-compose)
* [Build from Docker Hub](#build-from-docker-hub)
  * [Pre-trained model information](#pre-trained-model-information)
- [Dataset](#dataset)
- [Pretrained Models](#pretrained-models)
- [Cites](#cites)
- [Tips](#tips)
* [About data handling](#about-data-handling)
* [About Tensorflow-GPU handling](#about-tensorflow-gpu-handling)
- [References](#references)
<br><br><br><br><br><br>
<br>
# Quick Start
## Command-Line Interface
You can chat with the chatbot through the command-line interface using the following commands.
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
python main.py
```
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.png)
<br>
## Streamlit application
A simple application can be implemented with Streamlit as follows: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit_app2.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
streamlit run chatbot.py
```
<!-- ![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit3.png) -->
# Docker
Check Docker Hub: https://hub.docker.com/r/parkminwoo91/medical-chatgpt-streamlit-v1 <br>
Tested with Docker version 20.10.24, build 297e128.
## Build from Docker Image
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
docker build -t chatgpt .
docker run -p 8501:8501 -v ${PWD}/:/usr/src/app/data chatgpt # Mount the local folder so the git-lfs downloads are reused at no extra bandwidth cost.
```
##### Since `git clone` already fetches the weights via git-lfs, the volume must be mounted as shown above. Alternatively, modify `chatbot/config.py` to mount a different folder.
## Build from Docker Compose
You can also run the application in a Docker container like this: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/docker_build.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
docker compose up
```
## Build from Docker Hub
```
docker pull parkminwoo91/medical-chatgpt-streamlit-v1:latest
docker compose up
```
http://localhost:8501/
###### Streamlit is very convenient for quickly building landing pages, but it offers limited design flexibility and little control over the application layout. Moreover, because the entire script is re-run on every change or interaction, large applications or datasets can cause speed issues. This landing page will eventually be replaced by Flask with further optimizations. Since Streamlit's chat features were only recently developed, this app should be treated as a simple demo.
## Pre-trained model information
`Pre-trained model weights needed`
Datasets and model weights are downloaded through the Hugging Face Hub, but some TensorFlow models must be downloaded manually and placed at the top level of the project folder. The downloadable models are listed below; you can also visit my Hugging Face repository to check them. <br>
<br>
`modules/chatbot/config.py`
```python
class Config:
    chat_params = {
        "gpt_tok": "danielpark/medical-QA-chatGPT2-tok-v1",
        "tf_gpt_model": "danielpark/medical-QA-chatGPT2-v1",
        "bert_tok": "danielpark/medical-QA-BioRedditBERT-uncased-v1",
        "tf_q_extractor": "question_extractor_model",
        "data": "danielpark/MQuAD-v1",
        "max_answer_len": 20,
        "isEval": False,
        # git-lfs bandwidth is easily exceeded, so mount local storage and
        # resolve the folder location from there (uses the python utifunction package).
        "runDocker": True,
        "container_mounted_folder_path": "/usr/src/app/data",
    }
```
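To illustrate how these parameters might be consumed, here is a minimal sketch of a path-resolution helper. The `resolve_model_path` function is hypothetical, not part of this repository; it only shows the intent of `runDocker` and `container_mounted_folder_path`: inside Docker the model lives under the mounted volume, otherwise at the top of the project folder.

```python
import os


class Config:
    # Subset of the repository's chat_params relevant to path resolution.
    chat_params = {
        "tf_q_extractor": "question_extractor_model",
        "runDocker": True,
        "container_mounted_folder_path": "/usr/src/app/data",
    }


def resolve_model_path(params: dict) -> str:
    """Return the local folder of the question-extractor model.

    Hypothetical helper: picks the mounted volume when running in Docker,
    otherwise the current project folder.
    """
    base = params["container_mounted_folder_path"] if params["runDocker"] else "."
    return os.path.join(base, params["tf_q_extractor"])


print(resolve_model_path(Config.chat_params))
# → /usr/src/app/data/question_extractor_model
```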
<br>
# Dataset
The Medical Question and Answering dataset (MQuAD) has been refined from the datasets listed below. You can download it through the Hugging Face Hub and load it with the `datasets` library as follows. You can find more information [here](https://huggingface.co/datasets/danielpark/MQuAD-v1).
```python
from datasets import load_dataset
dataset = load_dataset("danielpark/MQuAD-v1")
```
The medical Q/A data was gathered from the following websites on May 5, 2017:
- eHealth Forum
- iCliniq
- Question Doctors
- WebMD
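Forum-scraped Q/A text like this usually needs light normalization before training. The sketch below is a hypothetical preprocessing step, not part of this repository, and the `question`/`answer` column names are assumptions about MQuAD's schema; it collapses whitespace and drops incomplete pairs.

```python
def clean_qa_pairs(rows):
    """Normalize whitespace and drop empty Q/A pairs.

    Hypothetical helper; 'question'/'answer' keys are assumed column names.
    """
    cleaned = []
    for row in rows:
        # " ".join(s.split()) collapses runs of spaces, tabs, and newlines.
        question = " ".join(row.get("question", "").split())
        answer = " ".join(row.get("answer", "").split())
        if question and answer:  # keep only complete pairs
            cleaned.append({"question": question, "answer": answer})
    return cleaned


sample = [
    {"question": " What  causes  migraines? ", "answer": "Often triggers like stress.\n"},
    {"question": "", "answer": "orphan answer with no question"},
]
print(clean_qa_pairs(sample))
```

The same function could be mapped over the Hugging Face dataset with `dataset.map` in batched mode.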
<br>
# Pretrained Models
Hugging Face pretrained models:
- GPT2 pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-v1)
- GPT2 tokenizer [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-tok-v1)
- BIO Reddit BERT pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-BioRedditBERT-uncased-v1)
TensorFlow models for extracting context from QA are temporarily shared through my personal Google Drive.
- Q extractor [[download]](https://drive.google.com/drive/folders/1VjljBW_HXXIXoh0u2Y1anPCveQCj9vnQ?usp=share_link)
- A extractor [[download]](https://drive.google.com/drive/folders/1iZ6jCiZPqjsNOyVoHcagEf3hDC5H181j?usp=share_link)
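Because these extractor weights must be downloaded and placed manually, a quick sanity check of the folder layout can save a failed model load. The helper below is a hypothetical sketch, not part of this repository; it only checks for the standard TensorFlow SavedModel files (`saved_model.pb` plus a `variables/` subfolder).

```python
import os
import tempfile


def looks_like_saved_model(path: str) -> bool:
    """Heuristic check for the TensorFlow SavedModel layout.

    Hypothetical helper: verifies saved_model.pb and a variables/ folder
    exist, as expected for the manually placed extractor models.
    """
    return (
        os.path.isfile(os.path.join(path, "saved_model.pb"))
        and os.path.isdir(os.path.join(path, "variables"))
    )


# Toy demonstration with a temporary directory standing in for the model folder.
with tempfile.TemporaryDirectory() as model_dir:
    open(os.path.join(model_dir, "saved_model.pb"), "w").close()
    os.mkdir(os.path.join(model_dir, "variables"))
    print(looks_like_saved_model(model_dir))  # → True
```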
<br>
# Cites
```BibTex
@misc{hf_canonical_model_maintainers_2022,
  author    = {{HF Canonical Model Maintainers}},
  title     = {gpt2 (Revision 909a290)},
  year      = 2022,
  url       = {https://huggingface.co/gpt2},
  doi       = {10.57967/hf/0039},
  publisher = {Hugging Face}
}
@misc{vaswani2017attention,
  title         = {Attention Is All You Need},
  author        = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
  year          = 2017,
  eprint        = {1706.03762},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```