Development Status :: 3 - Alpha <br>
*Copyright (c) 2023 MinWoo Park*
<br>
# GPT-BERT Medical QA Chatbot
[![Contributor Covenant](https://img.shields.io/badge/contributor%20covenant-v2.0%20adopted-black.svg)](code_of_conduct.md)
[![Python Version](https://img.shields.io/badge/python-3.6%2C3.7%2C3.8-black.svg)](code_of_conduct.md)
![Code convention](https://img.shields.io/badge/code%20convention-pep8-black)
![Black Formatter](https://img.shields.io/badge/code%20style-black-000000.svg)
> **Be careful when cloning this repository**: it contains large NLP model weights (>0.45 GB, via [`git-lfs`](https://git-lfs.com/)). <br>
> If you want to clone without git-lfs, run the following before `git clone`. *git-lfs provides only 1 GB of free bandwidth per month, so a 0.45 GB download through git-lfs will almost never succeed; please download the weights manually instead.*
```
git lfs install --skip-smudge
export GIT_LFS_SKIP_SMUDGE=1
```
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot_walle.png)
Since the advent of GPT-4, the field has changed significantly. Nevertheless, GPT-2 and GPT-3, as large autoregressive natural language processing models, remain effective in specific domains. This repository aims to qualitatively compare the performance of GPT-2 and GPT-4 in the medical domain, and to estimate the resources and cost required to fine-tune GPT-2 up to GPT-4's level of performance. It also attempts to evaluate how well the latest information is integrated and applied.
Although a few years behind GPT-4, the goal of this repository is to minimize the cost and resources needed to obtain usable weights and to update them afterwards. We plan to design few-shot learning experiments on large-scale NLP models and to test existing research. Note that this repository is intended for research and practice purposes only; we accept no responsibility for its use.
In addition, the ultimate goal of this repository is to achieve qualitative and quantitative performance similar to GPT-4 in certain domains through model lightweighting and optimization.
<br><br><br><br><br><br>
# Contents
- [GPT-BERT Medical QA Chatbot](#gpt-bert-medical-qa-chatbot)
- [Quick Start](#quick-start)
* [Command-Line Interface](#command-line-interface)
* [Streamlit application](#streamlit-application)
- [Docker](#docker)
* [Build from Docker Image](#build-from-docker-image)
* [Build from Docker Compose](#build-from-docker-compose)
* [Build from Docker Hub](#build-from-docker-hub)
  * [Pre-trained model information](#pre-trained-model-information)
- [Dataset](#dataset)
- [Pretrained Models](#pretrained-models)
- [Cites](#cites)
- [Tips](#tips)
* [About data handling](#about-data-handling)
* [About Tensorflow-GPU handling](#about-tensorflow-gpu-handling)
- [References](#references)
<br><br><br><br><br><br>
<br>
# Quick Start
## Command-Line Interface
You can chat with the chatbot through the command-line interface using the following commands.
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
python main.py
```
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.png)
<br>
## Streamlit application
A simple application can be implemented with Streamlit as follows: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit_app2.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
streamlit run chatbot.py
```
<!-- ![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit3.png) -->
# Docker
Check Docker Hub: https://hub.docker.com/r/parkminwoo91/medical-chatgpt-streamlit-v1 <br>
Tested with Docker version 20.10.24, build 297e128.
## Build from Docker Image
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
docker build -t chatgpt .
docker run -p 8501:8501 -v ${PWD}/:/usr/src/app/data chatgpt # Mount the local folder so the git-lfs downloads are reused at no extra bandwidth cost.
```
##### Since `git clone` already fetches the weights via git-lfs, the volume must be mounted as shown above. Alternatively, modify `chatbot/config.py` to mount a different folder.
## Build from Docker Compose
You can also run the application in a Docker container like this: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/docker_build.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
docker compose up
```
## Build from Docker Hub
```
docker pull parkminwoo91/medical-chatgpt-streamlit-v1:latest
docker compose up
```
http://localhost:8501/
###### Streamlit is very convenient for quickly building landing pages, but it offers limited design flexibility and little control over the application layout. Moreover, because the entire script is re-run on every change or interaction, large applications or datasets can cause speed issues. This landing page will eventually be replaced by Flask with further optimizations. Since Streamlit's chat features were only recently developed, this app should be treated as a simple demo.
## Pre-trained model information
`Pre-trained model weights needed`
Datasets and model weights are downloaded through the Hugging Face Hub, but some TensorFlow models must be downloaded manually and placed at the top level of the project folder. The downloadable models are listed below; you can also visit my Hugging Face repository to check them. <br>
<br>
`modules/chatbot/config.py`
```python
class Config:
    chat_params = {
        "gpt_tok": "danielpark/medical-QA-chatGPT2-tok-v1",
        "tf_gpt_model": "danielpark/medical-QA-chatGPT2-v1",
        "bert_tok": "danielpark/medical-QA-BioRedditBERT-uncased-v1",
        "tf_q_extractor": "question_extractor_model",
        "data": "danielpark/MQuAD-v1",
        "max_answer_len": 20,
        "isEval": False,
        # git-lfs bandwidth is easily exceeded, so mount local storage and
        # resolve the folder location from there (uses the python utifunction package).
        "runDocker": True,
        "container_mounted_folder_path": "/usr/src/app/data",
    }
```
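To illustrate how these parameters might be consumed, here is a minimal sketch of a path-resolution helper. The `resolve_model_path` function is hypothetical, not part of this repository; it only shows the intent of `runDocker` and `container_mounted_folder_path`: inside Docker the model lives under the mounted volume, otherwise at the top of the project folder.

```python
import os


class Config:
    # Subset of the repository's chat_params relevant to path resolution.
    chat_params = {
        "tf_q_extractor": "question_extractor_model",
        "runDocker": True,
        "container_mounted_folder_path": "/usr/src/app/data",
    }


def resolve_model_path(params: dict) -> str:
    """Return the local folder of the question-extractor model.

    Hypothetical helper: picks the mounted volume when running in Docker,
    otherwise the current project folder.
    """
    base = params["container_mounted_folder_path"] if params["runDocker"] else "."
    return os.path.join(base, params["tf_q_extractor"])


print(resolve_model_path(Config.chat_params))
# → /usr/src/app/data/question_extractor_model
```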
<br>
# Dataset
The Medical Question and Answering dataset (MQuAD) has been refined from the datasets listed below. You can download it through the Hugging Face Hub and load it with the `datasets` library as follows. You can find more information [here](https://huggingface.co/datasets/danielpark/MQuAD-v1).
```python
from datasets import load_dataset
dataset = load_dataset("danielpark/MQuAD-v1")
```
The medical Q/A data was gathered from the following websites on May 5, 2017:
- eHealth Forum
- iCliniq
- Question Doctors
- WebMD
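Forum-scraped Q/A text like this usually needs light normalization before training. The sketch below is a hypothetical preprocessing step, not part of this repository, and the `question`/`answer` column names are assumptions about MQuAD's schema; it collapses whitespace and drops incomplete pairs.

```python
def clean_qa_pairs(rows):
    """Normalize whitespace and drop empty Q/A pairs.

    Hypothetical helper; 'question'/'answer' keys are assumed column names.
    """
    cleaned = []
    for row in rows:
        # " ".join(s.split()) collapses runs of spaces, tabs, and newlines.
        question = " ".join(row.get("question", "").split())
        answer = " ".join(row.get("answer", "").split())
        if question and answer:  # keep only complete pairs
            cleaned.append({"question": question, "answer": answer})
    return cleaned


sample = [
    {"question": " What  causes  migraines? ", "answer": "Often triggers like stress.\n"},
    {"question": "", "answer": "orphan answer with no question"},
]
print(clean_qa_pairs(sample))
```

The same function could be mapped over the Hugging Face dataset with `dataset.map` in batched mode.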
<br>
# Pretrained Models
Hugging Face pretrained models:
- GPT2 pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-v1)
- GPT2 tokenizer [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-tok-v1)
- BIO Reddit BERT pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-BioRedditBERT-uncased-v1)
TensorFlow models for extracting context from QA are temporarily shared through my personal Google Drive.
- Q extractor [[download]](https://drive.google.com/drive/folders/1VjljBW_HXXIXoh0u2Y1anPCveQCj9vnQ?usp=share_link)
- A extractor [[download]](https://drive.google.com/drive/folders/1iZ6jCiZPqjsNOyVoHcagEf3hDC5H181j?usp=share_link)
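Because these extractor weights must be downloaded and placed manually, a quick sanity check of the folder layout can save a failed model load. The helper below is a hypothetical sketch, not part of this repository; it only checks for the standard TensorFlow SavedModel files (`saved_model.pb` plus a `variables/` subfolder).

```python
import os
import tempfile


def looks_like_saved_model(path: str) -> bool:
    """Heuristic check for the TensorFlow SavedModel layout.

    Hypothetical helper: verifies saved_model.pb and a variables/ folder
    exist, as expected for the manually placed extractor models.
    """
    return (
        os.path.isfile(os.path.join(path, "saved_model.pb"))
        and os.path.isdir(os.path.join(path, "variables"))
    )


# Toy demonstration with a temporary directory standing in for the model folder.
with tempfile.TemporaryDirectory() as model_dir:
    open(os.path.join(model_dir, "saved_model.pb"), "w").close()
    os.mkdir(os.path.join(model_dir, "variables"))
    print(looks_like_saved_model(model_dir))  # → True
```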
<br>
# Cites
```BibTex
@misc{hf_canonical_model_maintainers_2022,
  author    = {{HF Canonical Model Maintainers}},
  title     = {gpt2 (Revision 909a290)},
  year      = 2022,
  url       = {https://huggingface.co/gpt2},
  doi       = {10.57967/hf/0039},
  publisher = {Hugging Face}
}
@misc{vaswani2017attention,
  title         = {Attention Is All You Need},
  author        = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
  year          = 2017,
  eprint        = {1706.03762},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```