Multimodal-GPT-add-baize.zip resource - CSDN Wenku

45 files in total

py: 37

md: 2

txt: 1

Points required: 5 · 125 views · Uploaded 2023-07-27 15:14:29 · 108KB ZIP

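*Multimodal-GPT-add_baize.zip* is an AI resource package that extends the Multimodal-GPT model. It appears to be aimed at researchers, developers, and AI hobbyists, and provides a version of Multimodal-GPT with Baize support added. The points below cover multimodal GPT models and how Baize fits into this package.

1. **Multimodal GPT models**: A multimodal GPT model is a pretrained language model built on the Transformer architecture that handles not only text but also other kinds of input such as images and audio. By learning the associations between input modalities, it becomes better at understanding and generating multimodal content. Typical tasks include image captioning, visual question answering, and video understanding.
2. **Core characteristics of GPT models**:
   - **Autoregressive**: A GPT model is an autoregressive language model. It predicts the probability of the next token from the context seen so far and generates a sequence one token at a time.
   - **Transformer architecture**: GPT models use the Transformer structure, whose self-attention mechanism and positional encoding let them capture long-range dependencies in a sequence.
   - **Pretrain-then-fine-tune paradigm**: A GPT model is first pretrained on large amounts of unlabeled data and then fine-tuned on labeled data for a specific task to improve downstream performance.
3. **Baize**: Baize is an open-source chatbot project (`project-baize/baize-chatbot`). According to the package's README, its `quora_chat_data.json` dialogue file is one of the datasets used for fine-tuning, and the file listing contains a matching `baize_dataset.py` among the dataset loaders. Distributed training in this package is handled separately, through `torchrun` and `mmgpt/train/distributed.py`.
4. **Combining Baize with Multimodal-GPT**: Adding the Baize data gives the model more language-only dialogue data to train on alongside the visual instruction data. The README reports that joint training on visual and language instructions improves model performance, and fine-tuning is kept parameter-efficient with LoRA.
5. **Example applications**: A model trained with the resources in *Multimodal-GPT-add_baize.zip* could serve as a starting point for:
   - Generating illustrated comments or stories for social media.
   - Building chatbots that understand and respond to conversations containing images.
   - Developing visual search, where the model interprets an uploaded image and finds related information.
   - Producing text descriptions of visual content, for example as part of a captioning pipeline.
6. **Learning and practice**: Readers who want to study multimodal GPT models can unpack the zip file and read the code and documentation to see how the model is implemented and used. Running the provided fine-tuning script and experimenting with the dataset configuration is a practical way to learn about multimodal instruction tuning and multi-GPU training.

Overall, the package provides the Multimodal-GPT code base with Baize dialogue data integrated into its instruction fine-tuning setup.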
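The autoregressive behaviour described above (predict the next token from the context, append it, repeat) can be illustrated with a minimal sketch. The probability table `BIGRAM_PROBS` and the function `greedy_decode` are hypothetical names invented for this illustration; the toy "model" looks only at the previous token, whereas a real GPT conditions on the whole prefix through a Transformer.

```python
from typing import Dict, List

# Hypothetical toy "model": next-token probabilities given only the previous token.
# A real GPT computes these probabilities from the full prefix with a Transformer.
BIGRAM_PROBS: Dict[str, Dict[str, float]] = {
    "<bos>": {"a": 0.6, "the": 0.4},
    "a": {"cat": 0.7, "dog": 0.3},
    "the": {"dog": 0.9, "cat": 0.1},
    "cat": {"sleeps": 0.8, "<eos>": 0.2},
    "dog": {"barks": 0.9, "<eos>": 0.1},
    "sleeps": {"<eos>": 1.0},
    "barks": {"<eos>": 1.0},
}


def greedy_decode(start: str = "<bos>", max_new_tokens: int = 10) -> List[str]:
    """Generate tokens one at a time, always picking the most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        next_probs = BIGRAM_PROBS[tokens[-1]]
        next_token = max(next_probs, key=next_probs.get)
        if next_token == "<eos>":
            break  # the model chose to end the sequence
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


if __name__ == "__main__":
    print(" ".join(greedy_decode()))
```

Greedy selection is only one decoding strategy; sampling or beam search replace the `max` step but keep the same token-by-token loop.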

Multimodal-GPT-add_baize.zip (45 files)

Multimodal-GPT-add_baize

setup.py 1KB

mmgpt

__init__.py 99B

train

__init__.py 1B

distributed.py 4KB

instruction_finetune.py 16KB

train_utils.py 9KB

datasets

aokvqa_dataset.py 1KB

__init__.py 203B

gqa_dataset.py 2KB

nlvr_dataset.py 7KB

samplers

__init__.py 46B

infinite_sampler.py 904B

vqa_dataset.py 8KB

alpaca_gpt4_dataset.py 875B

builder.py 4KB

cc_sbu_align_dataset.py 5KB

dial_dataset.py 4KB

clevr_dataset.py 2KB

llava_dataset.py 712B

text_ocr_dataset.py 2KB

coco_caption_dataset.py 6KB

snli_ve_datasets.py 3KB

ocr_vqa_dataset.py 626B

baize_dataset.py 3KB

dolly_dataset.py 6KB

models

__init__.py 0B

open_flamingo

utils.py 943B

__init__.py 121B

flamingo.py 8KB

flamingo_lm.py 5KB

builder.py 5KB

helpers.py 8KB

builder.py 2KB

blip2

__init__.py 0B

app.py 13KB

LICENSE 11KB

configs

dataset_config.py 1KB

lora_config.py 235B

docs

images

demo_image.jpg 47KB

新建文件夹 (New Folder)

environment.yml 143B

requirements.txt 195B

checkpoints

.gitkeep 0B

.gitignore 2KB

README.md 8KB

README_zh-CN.md 8KB
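The listing above includes `baize_dataset.py`, whose contents are not shown on this page. As a rough sketch of the kind of preprocessing such a loader needs, the snippet below splits one dialogue transcript into speaker turns. It assumes the Baize chat data stores each conversation as a single string with `[|Human|]` and `[|AI|]` speaker markers; that format, the `split_turns` helper, and the sample text are all assumptions made for illustration and should be checked against the actual data file.

```python
import re
from typing import List, Tuple

# Assumed speaker markers; verify them against the real quora_chat_data.json.
HUMAN, AI = "[|Human|]", "[|AI|]"
_MARKER = re.compile(r"(\[\|Human\|\]|\[\|AI\|\])")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split one transcript into (speaker, text) pairs, skipping empty turns."""
    # With a capturing group, re.split returns
    # [preamble, marker, text, marker, text, ...].
    parts = _MARKER.split(transcript)
    turns: List[Tuple[str, str]] = []
    for marker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            speaker = "human" if marker == HUMAN else "ai"
            turns.append((speaker, text))
    return turns


# Hypothetical sample written for this sketch, not taken from the dataset.
sample = (
    "The conversation between human and AI assistant.\n"
    "[|Human|] What is in the picture?\n"
    "[|AI|] I cannot see a picture in this text-only chat.\n"
    "[|Human|] "
)

if __name__ == "__main__":
    for speaker, text in split_turns(sample):
        print(f"{speaker}: {text}")
```

Turning each conversation into explicit turns makes it straightforward to build prompts and to compute the loss only on the assistant's replies, if the training code chooses to do so.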

# 🤖 Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions!

Based on the open-source multi-modal model [OpenFlamingo](https://github.com/mlfoundations/open_flamingo), we create various **visual instruction** data with open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. We also train the language model component of OpenFlamingo on **language-only instruction** data.

The **joint training** of visual and language instructions effectively improves the performance of the model! For more details, please refer to our [technical report](https://arxiv.org/abs/2305.04790).

Welcome to join us!

<div align="center">

English | [简体中文](README_zh-CN.md)

</div>

<div align="center">
  <a href="https://openmmlab.medium.com/" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/219255827-67c1a27f-f8c5-46a9-811d-5e57448c61d1.png" width="3%" alt="" /></a>
  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
  <a href="https://discord.com/channels/1037617289144569886/1046608014234370059" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/218347213-c080267f-cbb6-443e-8532-8e1ed9a58ea9.png" width="3%" alt="" /></a>
  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
  <a href="https://twitter.com/OpenMMLab" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/218346637-d30c8a0f-3eba-4699-8131-512fb06d46db.png" width="3%" alt="" /></a>
  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
  <a href="https://www.youtube.com/openmmlab" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/218346691-ceb2116a-465a-40af-8424-9f30d2348ca9.png" width="3%" alt="" /></a>
  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
  <a href="https://space.bilibili.com/1293512903" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/219026751-d7d14cce-a7c9-4e82-9942-8375fca65b99.png" width="3%" alt="" /></a>
  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
  <a href="https://www.zhihu.com/people/openmmlab" style="text-decoration:none;">
    <img src="https://user-images.githubusercontent.com/25839884/219026120-ba71e48b-6e94-4bd4-b4e9-b7d175b5e362.png" width="3%" alt="" /></a>
</div>

## Online Demo

[***Demo Link***](https://mmgpt.openmmlab.org.cn/)

<img src="https://user-images.githubusercontent.com/12907710/237001772-f6e94884-db35-47a0-9fb8-09c2c6a692ff.png" width="70%" alt="" />

## Features

- Supports various vision and language instruction data
- Parameter-efficient fine-tuning with LoRA
- Tunes vision and language at the same time so that they complement each other

## Installation

To install the package in an existing environment, run

```bash
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -v -e .
```

or create a new conda environment

```bash
conda env create -f environment.yml
```

## Launch Demo Locally

1. Download the pre-trained weights.

   Use [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) for converting LLaMA weights to Hugging Face format.

   Download the OpenFlamingo pre-trained model from [openflamingo/OpenFlamingo-9B](https://huggingface.co/openflamingo/OpenFlamingo-9B).

   Download our LoRA weights from [here](https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt).

   Then place these models in the `checkpoints` folder like this:

   ```
   checkpoints
   ├── llama-7b_hf
   │   ├── config.json
   │   ├── pytorch_model-00001-of-00002.bin
   │   ├── ......
   │   └── tokenizer.model
   ├── OpenFlamingo-9B
   │   └── checkpoint.pt
   └── mmgpt-lora-v0-release.pt
   ```

2. Launch the Gradio demo

   ```bash
   python app.py
   ```

## Examples

### Recipe:

![image4](https://user-images.githubusercontent.com/12907710/234554562-8f3be88f-d563-47ba-97d9-ade8d47c46b0.png)

### Travel plan:

![image3](https://user-images.githubusercontent.com/12907710/234523464-80c4e3f0-f99f-4498-96ef-dc43ef89c64b.png)

### Movie:

![image2](https://user-images.githubusercontent.com/12907710/234523468-e11905a6-491f-4b87-934f-90da7d14d1c3.png)

### Famous person:

![image](https://user-images.githubusercontent.com/12907710/234523475-fd91f979-a344-4228-813f-6b55a1bc250f.png)

## Fine-tuning

### Prepare datasets

1. [A-OKVQA](https://allenai.org/project/a-okvqa/home)

   Download the annotations from [this link](https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz) and unzip them to `data/aokvqa/annotations`.

   It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

2. [COCO Caption](https://cs.stanford.edu/people/karpathy/deepimagesent/)

   Download from [this link](https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip) and unzip to `data/coco`.

   It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

3. [OCR VQA](https://ocr-vqa.github.io/)

   Download from [this link](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) and place in `data/OCR_VQA/`.

4. [LLaVA](https://llava-vl.github.io/)

   Download from [liuhaotian/LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) and place in `data/llava/`.

   It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

5. [Mini-GPT4](https://minigpt-4.github.io/)

   Download from [Vision-CAIR/cc_sbu_align](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) and place in `data/cc_sbu_align/`.

6. [Dolly 15k](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)

   Download from [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and place it in `data/dolly/databricks-dolly-15k.jsonl`.

7. [Alpaca GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)

   Download it from [this link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json) and place it in `data/alpaca_gpt4/alpaca_gpt4_data.json`.

8. [Baize](https://github.com/project-baize/baize-chatbot)

   Download it from [this link](https://github.com/project-baize/baize-chatbot/blob/main/data/quora_chat_data.json) and place it in `data/baize/quora_chat_data.json`.

You can also customize the data paths in [configs/dataset_config.py](configs/dataset_config.py).

### Start training

```bash
torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name train-my-gpt4 \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py \
  --report_to_wandb
```

## Acknowledgements

- [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)
- [LAVIS](https://github.com/salesforce/LAVIS)
- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
- [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main)
- [I
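Both the local demo and the training command expect the checkpoint layout described under "Launch Demo Locally". A small pre-flight check along the following lines can catch a misplaced file before a long run starts. This is only a sketch: the helper `missing_checkpoints` and the list `EXPECTED_FILES` are hypothetical names and not part of the repository, and the entries elided as `......` in the README are deliberately not checked.

```python
from pathlib import Path
from typing import List

# Paths taken from the README's checkpoint layout. The "......" entries there
# are elided, so they are not checked here.
EXPECTED_FILES = [
    "llama-7b_hf/config.json",
    "llama-7b_hf/tokenizer.model",
    "OpenFlamingo-9B/checkpoint.pt",
    "mmgpt-lora-v0-release.pt",
]


def missing_checkpoints(root: str = "checkpoints") -> List[str]:
    """Return the expected files that are absent under root, in listed order."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  checkpoints/{rel}")
    else:
        print("All expected checkpoint files are present.")
```

Run it from the repository root so that the relative `checkpoints` path resolves to the folder used by `app.py` and the `torchrun` command.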
