# LLaMA-Adapter: Efficient Fine-tuning of LLaMA
## Released Models
| Name | Approach | Data | Modality | Visual Encoder | Language Model |
| ------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------ | ---------------------------------- | -------------- | --------------------- |
| [LLaMA-Adapter V1](./alpaca_finetuning_v1) | prefix, gate | Alpaca | Text | × | LLaMA-7B |
| [LLaMA-Adapter V2 dialog](./llama_adapter_v2_chat65b) | scale, bias, norm | ShareGPT | Text | × | LLaMA-65B |
| [LLaMA-Adapter V2 multimodal](./llama_adapter_v2_multimodal7b) | [P] prefix, projection, gate <br />[F] bias, norm | [P] Image-Text-V1<br />[F] GPT4LLM, LLaVA | Image&Text | CLIP-ViT-L/14 | LLaMA-7B |
| [LLaMA-Adapter V2.1 multimodal](./llama_adapter_v2_multimodal7b) | [P] prefix, projection, gate <br />[F] bias, norm, lora | [P] Image-Text-V1<br />[F] GPT4LLM, LLaVA, VQAv2 | Image&Text | CLIP-ViT-L/14 | LLaMA-7B |
| [ImageBind-LLM](./imagebind_LLM) | [P] prefix, projection, gate<br />[F] bias, norm, lora | [P] Image-Text-V1<br />[F] [Instruction Following](https://github.com/OpenGVLab/LLaMA-Adapter/blob/main/imagebind_LLM/docs/train.md#data-1) | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |
| ImageBind-dialog | [P] prefix, projection, gate<br />[F] bias, norm, lora | [P] Image-Text-V1<br />[F] LLaVA, ShareGPT | ImageBind Modalities + Point Cloud | imagebind_huge | Open-Chinese-LLaMA-7B |
+ [P] means **P**re-train and [F] means **F**ine-tune
+ **Image-Text-V1** is a concatenation of LAION400M, COYO, MMC4, SBU, Conceptual Captions, and COCO
+ **ImageBind Modalities** include image, video, text, audio, depth, thermal, IMU
+ **ImageBind-dialog** will be released soon
## Overview
Efficiency Comparison:
| Model | Trainable Parameters | Storage Space | Training Time |
| :-----: | :-----: | :-----: | :-----: |
| [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) | 7B | 13 GB | 3 hours |
| LLaMA-Adapter | 1.2M | 4.7 MB | 1 hour |
By inserting adapters into LLaMA's transformer layers, our method introduces only **1.2M** learnable parameters and turns LLaMA into an instruction-following model within **1 hour** of training. To stabilize training in the early stages, we propose a novel **Zero-init Attention** with a zero gating mechanism that adaptively incorporates the instruction signals. After fine-tuning, LLaMA-Adapter can generate high-quality instruction-following responses, comparable to the fully fine-tuned [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [Alpaca-LoRA](https://github.com/tloen/alpaca-lora).
<div align="center">
<img src="docs/pipeline.png"/ width="90%">
</div>
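For intuition, below is a minimal PyTorch sketch of the zero-init attention described above: learnable adaptation prompts are prepended to the keys and values of a frozen attention layer, and their contribution is scaled by a gate initialized to zero, so training starts exactly from the frozen model's behavior. The class and parameter names (`ZeroInitPrefixAttention`, `prompt_len`, the per-head `gate`) are illustrative rather than taken from this repository, and LLaMA's causal mask and rotary embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroInitPrefixAttention(nn.Module):
    """Minimal sketch of zero-init attention with zero gating.

    Learnable adaptation prompts are prefixed to the keys/values of a frozen
    attention layer; their attention scores are scaled by a gate that starts
    at zero. Names and shapes are illustrative, not the repository's code.
    """

    def __init__(self, dim: int, n_heads: int, prompt_len: int = 10):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        # Frozen pretrained projections (weights would be loaded from LLaMA).
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        for p in self.parameters():
            p.requires_grad = False
        # Only the adaptation prompt and the zero-initialized gate are trainable.
        self.prompt = nn.Parameter(torch.randn(1, prompt_len, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1, n_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        split = lambda z: z.view(b, -1, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.wq(x)), split(self.wk(x)), split(self.wv(x))
        pk = split(self.wk(self.prompt.expand(b, -1, -1)))
        pv = split(self.wv(self.prompt.expand(b, -1, -1)))

        scale = self.head_dim ** -0.5
        scores = (q @ k.transpose(-2, -1)) * scale     # tokens -> tokens
        p_scores = (q @ pk.transpose(-2, -1)) * scale  # tokens -> prompt
        # Softmax the two parts separately; the prompt part contributes
        # nothing at initialization because the gate starts at zero.
        attn = F.softmax(scores, dim=-1)
        p_attn = F.softmax(p_scores, dim=-1) * self.gate.tanh()
        out = attn @ v + p_attn @ pv
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```

In the released configuration, such prompts are inserted only into the top transformer layers (see the `--adapter_layer 30` and `--adapter_len 10` flags in the training command below), which is what keeps the trainable parameter count at 1.2M.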
Our approach can be readily extended to **Multi-modal Input Instructions**. The reasoning framework of the image-conditioned LLaMA-Adapter for [ScienceQA](https://scienceqa.github.io/) is shown below; the same framework applies to other modalities, such as audio and video.
<div align="center">
<img src="docs/multimodal.png"/ width="90%">
</div>
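The multi-modal variant can be sketched in the same spirit: a global visual feature from the frozen encoder (CLIP-ViT-L/14 in the table above) is projected into LLaMA's embedding space and added to the adaptation prompts, so the gated attention above conditions generation on the image. The module and argument names below are illustrative, not the repository's exact code.

```python
import torch
import torch.nn as nn

class VisualPromptInjector(nn.Module):
    """Sketch: project a pooled visual feature from a frozen encoder and add
    it to the adaptation prompts before the gated attention (names illustrative)."""

    def __init__(self, visual_dim: int, llama_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(visual_dim, llama_dim)  # learnable projection

    def forward(self, prompt: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        # prompt:      (batch, prompt_len, llama_dim) learnable adaptation prompts
        # visual_feat: (batch, visual_dim) pooled feature from the frozen encoder
        return prompt + self.proj(visual_feat).unsqueeze(1)
```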
## Setup
Here is a from-scratch setup script for **LLaMA-Adapter V1**:
```bash
conda create -n llama_adapter -y python=3.8
conda activate llama_adapter
# install pytorch
conda install pytorch cudatoolkit -c pytorch -y
# install dependency and llama-adapter
pip install -r requirements.txt
pip install -e .
```
**Note**: **To set up other models**, please refer to [llama_adapter_v2_chat65b](llama_adapter_v2_chat65b), [llama_adapter_v2_multimodal7b](llama_adapter_v2_multimodal7b) and [imagebind_LLM](imagebind_LLM) for more details.
## Inference
Please request access to the pre-trained LLaMA weights via [this form](https://forms.gle/jk851eBVbX1m5TAv5) (official) or download LLaMA-7B from [Hugging Face](https://huggingface.co/nyanko7/LLaMA-7B/tree/main) (unofficial). Then, download the LLaMA-Adapter weights from [here](https://github.com/OpenGVLab/LLaMA-Adapter/releases/download/v.1.0.0/llama_adapter_len10_layer30_release.pth). Below, `TARGET_FOLDER` denotes the directory containing the downloaded LLaMA weights, and `ADAPTER_PATH` the path to the downloaded adapter weights.
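Before running the command below, you can optionally sanity-check the file layout. This is a hedged sketch assuming the standard LLaMA release layout (`tokenizer.model` at the top level, with `params.json` and `consolidated.00.pth` inside a model directory such as `7B/`); adjust the paths if your download is organized differently.

```python
import os
import sys

# Hedged sanity check for the weight layout assumed by the inference command.
# "7B" stands in for the model directory (the `model_size` placeholder below).
target_folder = os.environ.get("TARGET_FOLDER", "./LLaMA")
adapter_path = os.environ.get("ADAPTER_PATH", "./llama_adapter_len10_layer30_release.pth")

expected = [
    os.path.join(target_folder, "tokenizer.model"),
    os.path.join(target_folder, "7B", "params.json"),
    os.path.join(target_folder, "7B", "consolidated.00.pth"),
    adapter_path,
]
missing = [p for p in expected if not os.path.exists(p)]
if missing:
    sys.exit("Missing files:\n" + "\n".join(missing))
print("All expected weight files were found.")
```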
Here is an example of generating instruction-following sentences with the LLaMA-7B model and our LLaMA-Adapter (replace `model_size` below with the name of the model directory, e.g. `7B`):
```bash
torchrun --nproc_per_node 1 example.py \
--ckpt_dir $TARGET_FOLDER/model_size \
--tokenizer_path $TARGET_FOLDER/tokenizer.model \
--adapter_path $ADAPTER_PATH
```
## Training
We release the simple fine-tuning code of LLaMA-Adapter on the LLaMA-7B model [here](alpaca_finetuning_v1), which allows effortless reproduction with minimal dependencies. We will soon release the fine-tuning code for LLaMA-65B and the multi-modal LLaMA-Adapter.
Please download the 52K instruction-following training [data](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) from Stanford Alpaca and put it under `DATA_PATH`. Then run:
```bash
cd alpaca_finetuning_v1
torchrun --nproc_per_node 8 finetuning.py \
--model Llama7B_adapter \
--llama_model_path $TARGET_FOLDER/ \
--data_path $DATA_PATH/alpaca_data.json \
--adapter_layer 30 \
--adapter_len 10 \
--max_seq_len 512 \
--batch_size 4 \
--epochs 5 \
--warmup_epochs 2 \
--blr 9e-3 \
--weight_decay 0.02 \
--output_dir ./checkpoint/
```
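Before launching the run, you may want to peek at the training data. The snippet below is a small sketch assuming `DATA_PATH` is set as above and that the file follows the standard Alpaca schema (a JSON list of `instruction`/`input`/`output` records).

```python
import json
import os

# Hedged sketch: inspect the downloaded Alpaca data before training.
data_path = os.path.join(os.environ.get("DATA_PATH", "."), "alpaca_data.json")
with open(data_path) as f:
    data = json.load(f)

print(f"{len(data)} training examples")  # ~52K in the Stanford Alpaca release
print(json.dumps(data[0], indent=2))     # one full instruction record
```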
## Comparison with Other Methods
### Instruction 1:
How can I develop my critical thinking skills? Give me three suggestions.
* **LLaMA-Adapter V2:**
1. Read widely: Reading books, articles, and other sources of information can help you develop your critical thinking skills. By exposing yourself to different perspectives and ideas, you can learn to analyze and evaluate information more effectively.
2. Practice mindfulness: Mindfulness is the practice of being present in the moment and paying attention to your thoughts and feelings. By being more aware of your thoughts and emotions, you can better understand and evaluate them. This can help you develop a more objective and rational approach to thinking.
3. Engage in open-minded discussions: Join a discussion group or debate club where you can engage in open-minded discussions with others. This can help you develop your critical thinking skills by challenging your assumptions and encouraging you to consider different perspectives.
* **LLaMA-Adapter V1:**
1. Read widely and critically.
2. Engage in discussions with people who have different perspectives.
3. Ask questions and seek out answers.
### Instruction 2:
As a superhero, how would you explain your origin story to a curious child?
* **LLaMA-Adapter V2:**
As a superhero, I have a very interesting and exciting origin story to share with you. Once upon a time, there was a young boy named John who lived in a small town. John was always curious and wanted to help others, so he spent his days exploring the world around him and learning about different cultures and people. One day, while he was out exploring, he stumbled upon a mysterious cave. Inside the cave, John found a strange device that had the power to give him incredible abilities. With these abilities, John could fly, shoot lasers from his eyes, and even turn invisible.
John knew that he had to use the