<p align="center">
<img src="assets/image/ds-shiba.png" alt="DeepSpeed Shiba Inu!"/>
</p>
<div align="center">
## 🐕 DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales 🐕
</div>
<div align="center">
[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](LICENSE)
</div>
A fast, affordable, scalable and open system framework for an end-to-end Reinforcement Learning with Human Feedback (RLHF) training experience that generates high-quality ChatGPT-style models at all scales.
<div align="center">
<img src="assets/image/four_blocks.png" alt="DeepSpeed ChatGPT-Like Models Banner"/>
</div>
<!-- Three language version (Eng/Chinese/Japanese) -->
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
## Table of Contents
- [📰 Latest News 📰](#-latest-news-)
- [🚀 What is DeepSpeed Chat 🚀](#-what-is-deepspeed-chat-)
- [🧨 Capabilities 🧨](#-capabilities-)
- [☕ Quick Start ☕](#-quick-start-)
  - [🐼 Installation](#-installation)
  - [🐼 Single Script for Training 3-Step RLHF Pipeline](#-one-single-script-completes-all-three-steps-of-rlhf-training-and-generate-your-first-chatgpt-model)
  - [🐼 Demonstration: Individual Step Fine-Tuning](#-demonstration-individual-step-fine-tuning)
    - [🕐 Step 1 - Supervised Fine-Tuning](#-step-1---supervised-fine-tuning)
    - [🕑 Step 2 - Reward Model](#-step-2---reward-model)
    - [🕒 Step 3 - Reinforcement Learning with Human Feedback](#-step-3---reinforcement-learning-with-human-feedback)
  - [🐼 Adding and using your own datasets in DeepSpeed-Chat](#-adding-and-using-your-own-datasets-in-deepspeed-chat)
  - [🐼 Customizing RLHF training pipeline via DeepSpeed-Chat's APIs](#-customizing-your-own-rlhf-training-pipeline-using-deepspeed-chats-rlhf-apis)
  - [🐼 Serving Your Model: Plug-in and Test!](#-serving-plug-in-your-final-model-trained-by-deepspeed-chat-and-test-it-out)
- [🔥 Training Performance Evaluation 🔥](#-training-performance-evaluation-)
- [😽 Supported Models 😽](#-supported-models-)
- [🔬 Build Pipeline Status 🔬](#-build-pipeline-status-)
- [⚓ Documentation and Tutorial ⚓](#-documentation-and-tutorial-)
- [🌱 DeepSpeed Chat's Roadmap 🌱](#-deepspeed-chats-roadmap-)
- [💬 DeepSpeed Chat and DeepSpeed Community 💬](#-deepspeed-chat-and-deepspeed-community-)
- [🙏 Acknowledgement and Citation 🙏](#-acknowledgement-and-citation-)
<!-- markdown-toc end -->
## 📰 Latest News 📰
* ***[2023/04] 🚀 [DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat)*** [[English](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat/README.md)] [[Chinese](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat/chinese/README.md)] [[Japanese](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat/japanese/README.md)] 🚀
To cite DeepSpeed Chat, please cite our [arXiv report](https://arxiv.org/abs/2308.01320):
```
@article{yao2023dschat,
  title={{DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales}},
  author={Zhewei Yao and Reza Yazdani Aminabadi and Olatunji Ruwase and Samyam Rajbhandari and Xiaoxia Wu and Ammar Ahmad Awan and Jeff Rasley and Minjia Zhang and Conglong Li and Connor Holmes and Zhongzhu Zhou and Michael Wyatt and Molly Smith and Lev Kurilenko and Heyang Qin and Masahiro Tanaka and Shuai Che and Shuaiwen Leon Song and Yuxiong He},
  journal={arXiv preprint arXiv:2308.01320},
  year={2023}
}
```
## 🚀 What is DeepSpeed Chat 🚀
<div align="center">
https://user-images.githubusercontent.com/124002815/230290966-a78ea171-ab65-4fcc-b91e-67c7c4403497.mp4
</div>
In the spirit of democratizing ChatGPT-style models and their capabilities, DeepSpeed is proud to introduce ***DeepSpeed Chat***, a general system framework that enables an end-to-end training experience for ChatGPT-like models. It can automatically take your favorite pre-trained large language models through an OpenAI InstructGPT-style three-stage process to produce your very own high-quality ChatGPT-style model. DeepSpeed Chat makes training high-quality ChatGPT-style models easy, fast, affordable and scalable.
With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training of a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU, multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs per node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model in under 9 hours. Finally, it enables 15x faster training than existing RLHF systems, and can handle training of ChatGPT-like models with over 200 billion parameters: another feat that is impossible with existing systems. For a full discussion of the various model sizes and low training costs enabled by DeepSpeed-Chat, please refer to the [Release Blog](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat) and [Training Performance Evaluation](#-training-performance-evaluation-).
Beyond this release, the DeepSpeed system has been proudly serving as the system backend for accelerating a range of ongoing efforts to quickly train/fine-tune chat-style models (e.g., LLaMA). The following are some of the open-source examples powered by DeepSpeed:
- [Databricks Dolly](https://github.com/databrickslabs/dolly)
- [LMFlow](https://github.com/OptimalScale/LMFlow)
- [CarperAI-TRLX](https://github.com/CarperAI/trlx)
- [Huggingface-PEFT](https://github.com/huggingface/peft)
## 𧨠Capabilities ð§¨
DeepSpeed Chat is evolving fast to accommodate the increasing demand for system-level acceleration support for training/fine-tuning as well as serving emerging models. Please stay tuned for our upcoming milestones in the [Roadmap](#-deepspeed-chats-roadmap-).
A summary of DeepSpeed Chat includes:
+ **DeepSpeed Chat**: a complete, end-to-end, three-stage OpenAI InstructGPT training strategy with Reinforcement Learning with Human Feedback (RLHF) to generate high-quality ChatGPT-style models from users' favorite pre-trained large language model checkpoints.
+ **DeepSpeed Hybrid Engine**: new system support for fast, affordable and scalable RLHF training at all scales. It is built upon DeepSpeed's system capabilities such as the ZeRO technologies and DeepSpeed-Inference.
+ **Easy-breezy Training Experience**: a single script that takes a pre-trained Huggingface model and runs it through all three steps of RLHF training.
+ **A Universal System Support for Today's ChatGPT-like Model Training**: DeepSpeed Chat can serve as the system backend not only for the 3-step instruction-based RLHF pipeline, but also for the current single-model fine-tuning exploration (e.g., LLaMA-centric fine-tuning) and generic RLHF training for various models and scenarios.
Please check out our [Blog Release](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat) and [Documentation and Tutorial](#-documentation-and-tutorial-) for more details on our training methodology and new system technologies.
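To make the three-stage strategy above concrete, here is an illustrative sketch of the stage order the pipeline walks through. The stage names mirror the `step1_supervised_finetuning`, `step2_reward_model_finetuning` and `step3_rlhf_finetuning` training folders in the DeepSpeed-Chat repository; the loop itself is a stand-in for the real launcher, not the actual implementation.

```shell
# Illustrative only: the InstructGPT-style stage order that
# DeepSpeed-Chat automates end to end. Each name corresponds to a
# training/ subdirectory in the DeepSpeed-Chat repository.
run_pipeline () {
  for stage in step1_supervised_finetuning \
               step2_reward_model_finetuning \
               step3_rlhf_finetuning; do
    # The real launcher invokes each stage's training script here.
    echo "running ${stage}"
  done
}
run_pipeline
```

In practice you never call the stages yourself: the single launch script described in the Quick Start below wires them together for you.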
## ☕ Quick Start ☕
### 🐼 Installation
```bash
pip install "deepspeed>=0.9.0"
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
```
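After installing, you can optionally sanity-check the setup with the `ds_report` utility that ships with the `deepspeed` package. It prints the detected torch/CUDA versions and which DeepSpeed extension ops are compatible with your machine:

```shell
# Prints DeepSpeed's environment report: torch/CUDA versions and
# per-op compatibility for the local machine.
ds_report
```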
### 🐼 One Single Script Completes All Three Steps of RLHF Training and Generate Your First ChatGPT Model
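As a sketch of what "one single script" means, an end-to-end launch looks roughly like the command below. It assumes the `train.py` entry script in the `DeepSpeed-Chat` application directory and the OPT checkpoints used in the examples that follow; the exact flags may differ across releases, so consult `python train.py --help` in your checkout.

```shell
# Runs all three RLHF steps (SFT, reward model, PPO) in one go,
# using OPT-13b as the actor and OPT-350m as the reward model.
python train.py \
  --actor-model facebook/opt-13b \
  --reward-model facebook/opt-350m \
  --deployment-type single_node
```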
**:yellow_heart: DeepSpeed-Chatâs RLHF Example 1: Coffee Time Training for a 1.3B ChatGPT Model**
<details><summary> Expand </summary><p>
If you only have around **1-2 hours** for a coffee or lunch break, you c