# MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
[Deyao Zhu](https://tsutikgiau.github.io/)* (On Job Market!), [Jun Chen](https://junchen14.github.io/)* (On Job Market!), [Xiaoqian Shen](https://xiaoqian-shen.github.io), [Xiang Li](https://xiangli.ac.cn), and [Mohamed Elhoseiny](https://www.mohamed-elhoseiny.com/). *Equal Contribution
**King Abdullah University of Science and Technology**
<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
## News
We now provide a pretrained MiniGPT-4 aligned with Vicuna-7B! Demo GPU memory consumption can now be as low as 12 GB.
## Online Demo
Click the image to chat with MiniGPT-4 about your images
[![demo](figs/online_demo.png)](https://minigpt-4.github.io)
## Examples
| | |
:-------------------------:|:-------------------------:
![find wild](figs/examples/wop_2.png) | ![write story](figs/examples/ad_2.png)
![solve problem](figs/examples/fix_1.png) | ![write Poem](figs/examples/rhyme_1.png)
More examples can be found in the [project page](https://minigpt-4.github.io).
## Introduction
- MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
- We train MiniGPT-4 in two stages. The first, traditional pretraining stage uses roughly 5 million aligned image-text pairs and takes about 10 hours on 4 A100s. After this stage, Vicuna is able to understand the image, but its generation ability is heavily impaired.
- To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs using the model itself together with ChatGPT. Based on this, we then create a small (3,500 pairs in total) yet high-quality dataset.
- The second finetuning stage is trained on this dataset in a conversation template to significantly improve its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes with a single A100.
- MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4.
![overview](figs/overview.png)
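The core alignment idea above can be sketched in a few lines: a single linear projection maps frozen visual features into the frozen LLM's embedding space, and that projection is the only trainable module. The dimensions below are illustrative assumptions, not the exact model sizes.

```python
# Minimal sketch of the MiniGPT-4 alignment idea: only one projection
# layer is trained; the visual encoder and the LLM stay frozen.
import torch
import torch.nn as nn

VISION_DIM = 768   # hypothetical Q-Former output width
LLM_DIM = 5120     # hypothetical Vicuna-13B hidden size

proj = nn.Linear(VISION_DIM, LLM_DIM)  # the only trainable module

# e.g. 32 query tokens per image coming out of the frozen visual encoder
visual_tokens = torch.randn(1, 32, VISION_DIM)
llm_inputs = proj(visual_tokens)  # ready to prepend to text embeddings
print(llm_inputs.shape)  # torch.Size([1, 32, 5120])
```

Because everything except `proj` is frozen, both training stages only need to update this small module, which is why stage two finishes in minutes.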
## Getting Started
### Installation
**1. Prepare the code and the environment**
Git clone our repository, create a Python environment, and activate it via the following commands:
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```
**2. Prepare the pretrained Vicuna weights**
The current version of MiniGPT-4 is built on the v0 version of Vicuna-13B.
Please refer to our instruction [here](PrepareVicuna.md)
to prepare the Vicuna weights.
The final weights should be in a single folder with a structure similar to the following:
```
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...
```
Then, set the path to the vicuna weight in the model config file
[here](minigpt4/configs/models/minigpt4.yaml#L16) at Line 16.
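For reference, the relevant line in that config looks roughly like the following (the `llama_model` key name is an assumption about the current config layout; the path is a placeholder):

```yaml
# minigpt4/configs/models/minigpt4.yaml (excerpt)
llama_model: "/path/to/vicuna_weights/"
```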
**3. Prepare the pretrained MiniGPT-4 checkpoint**
Download the pretrained checkpoint corresponding to the Vicuna model you prepared.
| Checkpoint Aligned with Vicuna 13B | Checkpoint Aligned with Vicuna 7B |
:------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:
[Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing)
Then, set the path to the pretrained checkpoint in the evaluation config file
in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 11.
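The checkpoint path line looks roughly like this (the `ckpt` key name is an assumption; the path is a placeholder):

```yaml
# eval_configs/minigpt4_eval.yaml (excerpt)
ckpt: "/path/to/pretrained_minigpt4.pth"
```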
### Launching Demo Locally
Try out our demo [demo.py](demo.py) on your local machine by running
```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```
To save GPU memory, Vicuna is loaded in 8 bit by default, with a beam search width of 1.
This configuration requires about 23G GPU memory for Vicuna 13B and 11.5G GPU memory for Vicuna 7B.
For more powerful GPUs, you can run the model
in 16 bit by setting low_resource to False in the config file
[minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml) and use a larger beam search width.
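The 16-bit setting is a one-line change (assuming the `low_resource` key sits in this eval config):

```yaml
# eval_configs/minigpt4_eval.yaml (excerpt)
low_resource: False   # load Vicuna in 16 bit; requires more GPU memory
```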
Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run our code on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).
### Training
The training of MiniGPT-4 contains two alignment stages.
**1. First pretraining stage**
In the first pretraining stage, the model is trained on image-text pairs from the LAION and CC datasets
to align the vision and language model. To download and prepare the datasets, please check
our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
After the first stage, the visual features are mapped and can be understood by the language
model.
To launch the first stage training, run the following command. In our experiments, we use 4 A100s.
You can change the save path in the config file
[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml)
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```
A MiniGPT-4 checkpoint with only stage one training can be downloaded
[here](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link).
Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.
**2. Second finetuning stage**
In the second stage, we use a small, high-quality image-text pair dataset that we created ourselves
and convert it to a conversation format to further align MiniGPT-4.
To download and prepare our second stage dataset, please check our
[second stage dataset preparation instruction](dataset/README_2_STAGE.md).
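The conversation wrapping can be illustrated as follows. The exact wording and the `<ImageHere>` placeholder are assumptions modeled on the repo's `conversation.py`, not a verbatim copy of the template:

```python
# Illustrative sketch of the stage-2 conversation format. The image
# embedding is spliced in where the <ImageHere> placeholder appears.
def build_prompt(instruction: str) -> str:
    return f"###Human: <Img><ImageHere></Img> {instruction} ###Assistant: "

prompt = build_prompt("Describe this image in detail.")
print(prompt)
```

Wrapping each image-description pair in this human/assistant template is what teaches the model to respond conversationally instead of emitting raw captions.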
To launch the second stage alignment,
first specify the path to the checkpoint file trained in stage 1 in
[train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
You can also specify the output path there.
Then, run the following command. In our experiments, we use 1 A100.
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```
After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and in a user-friendly way.
## Acknowledgement
+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Check out this great open-source work if you haven't seen it before!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:
```bibtex
@misc{zhu2022minigpt4,
      title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
      author={Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
      year={2023},
      eprint={2304.10592},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```