# QLoRA: Efficient Finetuning of Quantized LLMs
## Demo
Guanaco is a system purely intended for research purposes and could produce problematic outputs.
1. Access the [live demo here](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi). Note this is the 33B model, the 65B model demo will come later.
2. Or host your own Guanaco gradio demo directly in Colab with [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing). Works with free GPUs for 7B and 13B models.
3. Alternatively, can you distinguish ChatGPT from Guanaco? Give it a try!
You can access [the model response Colab here](https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing) comparing ChatGPT and Guanaco 65B on Vicuna prompts.
## Installation
To load models in 4-bit precision with transformers and bitsandbytes, you need to install accelerate and transformers from source and make sure you have the latest version of the bitsandbytes library. After installing PyTorch (follow the instructions [here](https://pytorch.org/get-started/locally/)), you can install the remaining dependencies with:
```bash
pip install -U -r requirements.txt
```
## Getting Started
The `qlora.py` code is a starting point for finetuning and inference on various datasets.
Basic command for finetuning a baseline model on the Alpaca dataset:
```bash
python qlora.py --model_name_or_path <path_or_name>
```
For models larger than 13B, we recommend adjusting the learning rate:
```bash
python qlora.py --learning_rate 0.0001 --model_name_or_path <path_or_name>
```
To replicate our Guanaco models, see below.
### Tutorials and Demonstrations
Here is [a blog](https://huggingface.co/blog/4bit-transformers-bitsandbytes) discussing 4-bit quantization, QLoRA, and how they are integrated in transformers.
You can host your own gradio Guanaco demo directly in Colab following [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing).
In addition, here are Colab notebooks with examples for inference and finetuning using QLoRA:
- [Inference notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing)
- [Finetuning notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing)
Other examples can be found under the `examples/` folder. We include a getting-started generation example with Guanaco at `examples/guanaco_generate.py`.
### Quantization
Quantization parameters are controlled through the `BitsAndBytesConfig` ([see HF documentation](https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig)) as follows:
- Loading in 4 bits is activated through `load_in_4bit`
- The datatype used for the linear layer computations is set with `bnb_4bit_compute_dtype`
- Nested quantization is activated through `bnb_4bit_use_double_quant`
- The datatype used for quantization is specified with `bnb_4bit_quant_type`. Note that there are two supported quantization datatypes: `fp4` (four-bit float) and `nf4` (normal four-bit float). The latter is theoretically optimal for normally distributed weights, and we recommend using `nf4`.
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    '/name/or/path/to/your/model',
    load_in_4bit=True,
    device_map='auto',
    max_memory=max_memory,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
)
```
### Paged Optimizer
You can enable the paged optimizer with the argument `--optim paged_adamw_32bit`.
### Guanaco Finetuning
You can select `--dataset oasst1` to load the OpenAssistant dataset that was used to train Guanaco. You can also find it on HF at [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco).
We include scripts to reproduce the hyperparameters of Guanaco model training for various sizes at `./scripts/finetune_guanaco*.sh`. Make sure to adjust `per_device_train_batch_size` and `gradient_accumulation_steps` so that their product is 16 and training fits on your GPUs.
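The batch-size constraint above can be sketched as a small helper that, given your per-device batch size, returns the matching number of accumulation steps. This is a hypothetical helper for illustration, not part of the repo:

```python
# Guanaco training uses an effective batch size of 16:
# per_device_train_batch_size * gradient_accumulation_steps == 16
TARGET_EFFECTIVE_BATCH = 16

def grad_accum_steps(per_device_train_batch_size: int) -> int:
    """Return gradient_accumulation_steps so the product equals 16."""
    if TARGET_EFFECTIVE_BATCH % per_device_train_batch_size != 0:
        raise ValueError("per-device batch size must evenly divide 16")
    return TARGET_EFFECTIVE_BATCH // per_device_train_batch_size

# e.g. a per-device batch size of 4 needs 4 accumulation steps
print(grad_accum_steps(4))
```

So if your GPUs only fit `--per_device_train_batch_size 2`, you would pass `--gradient_accumulation_steps 8` to keep the product at 16.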
### Using Local Datasets
You can specify the path to your dataset using the `--dataset` argument. If the `--dataset_format` argument is not set, it will default to the Alpaca format. Here are a few examples:
- Training with an *alpaca* format dataset:
```bash
python qlora.py --dataset="path/to/your/dataset"
```
- Training with a *self-instruct* format dataset:
```bash
python qlora.py --dataset="path/to/your/dataset" --dataset_format="self-instruct"
```
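For reference, an *alpaca*-format dataset is a JSON list of records with `instruction`, `input`, and `output` fields. A minimal sketch of writing one (the filename is hypothetical):

```python
import json

# Minimal alpaca-format records: an instruction, an optional input, and the
# target output. The "input" field may be an empty string.
records = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Name a prime number.", "input": "", "output": "2"},
]

with open("my_alpaca_dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```

You could then point `--dataset` at this file and rely on the default alpaca `--dataset_format`.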
### Multi GPU
Multi-GPU training and inference work out of the box with Hugging Face's Accelerate. Note that the `per_device_train_batch_size` and `per_device_eval_batch_size` arguments are global batch sizes, despite what their names suggest.
When loading a model for training or inference on multiple GPUs you should pass something like the following to `AutoModelForCausalLM.from_pretrained()`:
```python
device_map = "auto"
max_memory = {i: '46000MB' for i in range(torch.cuda.device_count())}
```
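The `max_memory` mapping above can also be built without a GPU at hand; this hypothetical helper just parametrizes the dict comprehension (in practice, `torch.cuda.device_count()` supplies the GPU count):

```python
# Hedged sketch: build the max_memory mapping for a given GPU count.
# In real use, num_gpus comes from torch.cuda.device_count().
def build_max_memory(num_gpus: int, per_gpu: str = "46000MB") -> dict:
    """Map each GPU index to its memory budget string."""
    return {i: per_gpu for i in range(num_gpus)}

print(build_max_memory(2))  # {0: '46000MB', 1: '46000MB'}
```

Adjust the per-GPU budget downward if other processes share the devices.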
## Sample Outputs
We provide generations for the models described in the paper for both OA and Vicuna queries in the `eval/generations` folder. These are intended to foster further research on model evaluation and analysis.
Can you distinguish ChatGPT from Guanaco? Give it a try!
You can access [the model response Colab here](https://colab.research.google.com/drive/1kK6xasHiav9nhiRUJjPMZb4fAED4qRHb?usp=sharing) comparing ChatGPT and Guanaco 65B on Vicuna prompts.
## Evaluation
We include scripts adapted from the FastChat repo to automatically evaluate model generations using GPT-4. We include scripts for comparisons relative to ChatGPT with scores out of 10 as well as "pairwise comparisons" with three-class labeling (win, lose, or tie). These are found in the `eval` folder.
To facilitate the replication of our evaluation and future work in this area, we release GPT-4 and human ratings of our systems. These are found under `eval/ratings-human` and `eval/ratings-gpt4`.
More details can be found at `eval/EVAL_README.md`.