# [UniControl](https://arxiv.org/abs/2305.11147) [![arXiv](https://img.shields.io/badge/arXiv-2305.11147-ff69b4)](https://arxiv.org/pdf/2305.11147.pdf) [![webpage](https://img.shields.io/badge/🔥-Website-9cf)](https://canqin001.github.io/UniControl-Page/) [![HuggingFace space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/Robert001/UniControl-Demo)
<div align="center">
<a><img src="figs/salesforce.png" height="100px" ></a>
<a><img src="figs/northeastern.png" height="100px" ></a>
<a><img src="figs/stanford.png" height="100px" ></a>
</div>
This repository is for the paper:
> **[UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild](https://arxiv.org/abs/2305.11147)** \
> Can Qin <sup>1,2</sup>, Shu Zhang<sup>1</sup>, Ning Yu <sup>1</sup>, Yihao Feng<sup>1</sup>, Xinyi Yang<sup>1</sup>, Yingbo Zhou <sup>1</sup>, Huan Wang <sup>1</sup>, Juan Carlos Niebles<sup>1</sup>, Caiming Xiong <sup>1</sup>, Silvio Savarese <sup>1</sup>, Stefano Ermon <sup>3</sup>, Yun Fu <sup>2</sup>, Ran Xu <sup>1</sup> \
> <sup>1</sup> Salesforce AI <sup>2</sup> Northeastern University <sup>3</sup> Stanford University \
> Work done when Can Qin was an intern at Salesforce AI Research.
![img](figs/demo_simple.png)
## Introduction
We introduce **UniControl**, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a single framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
![img](figs/method.png)
## Updates
* **05/18/23**: ***[UniControl](https://arxiv.org/abs/2305.11147) paper uploaded to arXiv.***
* **05/26/23**: ***UniControl inference code and checkpoint released to the public.***
* **05/28/23**: ***Latest UniControl model [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol.ckpt) (1.4B #params, 5.78GB) updated.***
* **06/08/23**: ***Latest UniControl model [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol.ckpt) updated; it now supports 12 tasks (Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring and Colorization)!***
* **06/08/23**: ***Training dataset ([MultiGen-20M](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset)) is fully released.***
* **06/08/23**: ***Training code is public.***:blush:
* **07/06/23**: ***Latest UniControl model v1.1 [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol_v1.1.ckpt) updated; it supports 12 tasks (Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring and Colorization)!***
* **07/25/23**: ***Huggingface Demo API is available! [![HuggingFace space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/Robert001/UniControl-Demo)***
* **07/25/23**: ***Safetensors model is available! [checkpoint](https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.st)***
* **09/21/23**: ***UniControl is accepted to NeurIPS 2023.***:blush:
## MultiGen-20M Datasets
[MultiGen-20M](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset) contains more than 20M image-prompt-condition triplets, with a total size of ***> 2TB***. It covers all 12 tasks (`Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring, Colorization`), and the data for every task is fully released.
## Instructions
### Environment Preparation
Set up the environment first (this takes a few minutes).
```
conda env create -f environment.yaml
conda activate unicontrol
```
### Checkpoint Preparation
Download the pre-trained UniControl checkpoint and save it at `./ckpts/unicontrol.ckpt`:
```
cd ckpts
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol.ckpt
```
You can also use the latest trained model, available in both ckpt and safetensors formats:
```
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.ckpt
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.st
```
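The download steps above can also be scripted so that files already present are skipped. The following is a minimal sketch rather than part of the repository: the helper name `download_missing` and its `fetch` parameter are hypothetical, and only the base URL and the three file names come from the commands above.

```python
import os
import urllib.request

# Base URL and file names taken from the wget commands in this README.
BASE_URL = "https://storage.googleapis.com/sfr-unicontrol-data-research"
CHECKPOINTS = ["unicontrol.ckpt", "unicontrol_v1.1.ckpt", "unicontrol_v1.1.st"]


def download_missing(ckpt_dir="./ckpts", names=CHECKPOINTS,
                     fetch=urllib.request.urlretrieve):
    """Download every checkpoint not yet in ckpt_dir; return the names fetched.

    `fetch` is a hypothetical hook (url, destination) so the download
    step can be replaced, for example in a dry run.
    """
    os.makedirs(ckpt_dir, exist_ok=True)
    fetched = []
    for name in names:
        dest = os.path.join(ckpt_dir, name)
        if os.path.exists(dest):
            continue  # already on disk, do not download again
        fetch(f"{BASE_URL}/{name}", dest)
        fetched.append(name)
    return fetched
```

Calling `download_missing()` with the defaults fetches all three files into `./ckpts`; pass a shorter `names` list if you only need one of them, since each checkpoint is several gigabytes.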
To train from scratch, follow ControlNet to prepare the checkpoint initialization; ControlNet provides a simple script for this. If your SD checkpoint is `./ckpts/v1-5-pruned.ckpt` and you want the processed model (SD+ControlNet) saved to `./ckpts/control_sd15_ini.ckpt`, run:
```
python tool_add_control.py ./ckpts/v1-5-pruned.ckpt ./ckpts/control_sd15_ini.ckpt
```
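For intuition, the initialization can be pictured as a key-by-key copy between state dicts. The sketch below is based on our understanding of the upstream ControlNet script and may not match the copy shipped in this repository: the function name `init_control_weights` is hypothetical, the `control_` and `model.diffusion_` prefixes are assumptions carried over from ControlNet, and plain Python values stand in for tensors.

```python
def init_control_weights(pretrained, scratch,
                         control_prefix="control_",
                         source_prefix="model.diffusion_"):
    """Build an SD+ControlNet state dict from pretrained SD weights.

    pretrained: state dict of the SD checkpoint.
    scratch:    state dict of the freshly constructed SD+ControlNet model.
    Returns (target, new_keys), where new_keys lists parameters that had
    no pretrained counterpart and keep their fresh initialization.
    """
    target, new_keys = {}, []
    for key, init_value in scratch.items():
        if key.startswith(control_prefix):
            # A control-branch parameter borrows the matching SD UNet weight.
            source_key = source_prefix + key[len(control_prefix):]
        else:
            source_key = key
        if source_key in pretrained:
            target[key] = pretrained[source_key]
        else:
            target[key] = init_value
            new_keys.append(key)
    return target, new_keys
```

Under these assumptions, the control branch starts as a copy of the SD encoder weights, and only layers without an SD counterpart (such as the condition input layers) start from their fresh initialization.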
### Data Preparation
Download the training dataset ([MultiGen-20M](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset)) to `./multigen20m`:
```
cd multigen20m
gsutil -m cp -r gs://sfr-unicontrol-data-research/dataset ./
```
Then unzip all the files.
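With this many files, extracting them by hand is tedious. A minimal sketch of a batch extraction follows, assuming the downloaded archives are `.zip` files (the README does not state the archive format) and that each should be extracted next to itself; the helper name `unzip_all` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_all(root="./multigen20m"):
    """Extract every .zip archive under root into the archive's own folder.

    Returns the list of archives that were extracted, in sorted order.
    """
    extracted = []
    # Collect the archive list first so newly extracted files are not rescanned.
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

Since the dataset is larger than 2TB, check the available disk space before running this over the whole directory.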
### Model Training (CUDA 11.0 and Conda 4.12.0 work)
Training from Scratch:
```
python train_unicontrol.py --ckpt ./ckpts/control_sd15_ini.ckpt --config ./models/cldm_v15_unicontrol_v11.yaml --lr 1e-5
```
Model Finetuning:
```
python train_unicontrol.py --ckpt ./ckpts/unicontrol.ckpt --config ./models/cldm_v15_unicontrol.yaml --lr 1e-7
```
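The two training modes differ only in the starting checkpoint, the config file and the learning rate. The sketch below collects those settings in one place; the `TRAIN_PRESETS` table and the `train_command` helper are hypothetical conveniences, while every path and value is copied from the commands above.

```python
import shlex

# Settings copied from the two training commands in this README.
TRAIN_PRESETS = {
    "scratch": {
        "ckpt": "./ckpts/control_sd15_ini.ckpt",
        "config": "./models/cldm_v15_unicontrol_v11.yaml",
        "lr": "1e-5",
    },
    "finetune": {
        "ckpt": "./ckpts/unicontrol.ckpt",
        "config": "./models/cldm_v15_unicontrol.yaml",
        "lr": "1e-7",
    },
}


def train_command(mode):
    """Return the argument list for the given mode ('scratch' or 'finetune')."""
    preset = TRAIN_PRESETS[mode]
    return ["python", "train_unicontrol.py",
            "--ckpt", preset["ckpt"],
            "--config", preset["config"],
            "--lr", preset["lr"]]


# Render the fine-tuning command as a shell string for inspection.
finetune_cmd = shlex.join(train_command("finetune"))
```

The returned list can be passed directly to `subprocess.run` from the repository root.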
### Model Inference (CUDA 11.0 and Conda 4.12.0 work)
Run the command for the task you want, as listed below. If you hit an out-of-memory (OOM) error, decrease `--num_samples`.
If you use the safetensors model, load it by following `./load_model/load_safetensors_model.py`.
Canny to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task canny
```
HED Edge to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task hed
```
HED-like Sketch to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task hedsketch
```
Depth Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task depth
```
Normal Surface Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task normal
```
Segmentation Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task seg
```
Human Skeleton to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task openpose
```
Object Bounding Boxes to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task bbox
```
Image Outpainting:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task outpainting
```
Image Inpainting:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task inpainting
```
Image Deblurring:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task blur
```
Image Colorization:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task grayscale
```
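Note that several `--task` values differ from the task names used elsewhere in this README (for example, Skeleton is `openpose` and Deblurring is `blur`). The sketch below records that mapping and builds the command for each task; the `TASK_FLAGS` table and the `inference_commands` helper are hypothetical, and the flag values are taken from the commands above.

```python
# Task name as listed in this README -> value passed to --task.
TASK_FLAGS = {
    "Canny": "canny",
    "HED": "hed",
    "Sketch": "hedsketch",
    "Depth": "depth",
    "Normal": "normal",
    "Seg": "seg",
    "Skeleton": "openpose",
    "Bbox": "bbox",
    "Outpainting": "outpainting",
    "Inpainting": "inpainting",
    "Deblurring": "blur",
    "Colorization": "grayscale",
}


def inference_commands(ckpt="./ckpts/unicontrol.ckpt", tasks=None):
    """Return one inference_demo.py argument list per requested task.

    tasks: iterable of task names (keys of TASK_FLAGS); all 12 if None.
    """
    names = list(TASK_FLAGS) if tasks is None else list(tasks)
    return [["python", "inference_demo.py", "--ckpt", ckpt,
             "--task", TASK_FLAGS[name]] for name in names]
```

Each returned list can be passed to `subprocess.run` from the repository root, for instance to run all 12 tasks in sequence against one checkpoint.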
### Gradio Demo ([App Demo Video](https://github.com/salesforce/UniControl/issues/1), CUDA 11.0 and Conda 4.12.0 work)
We provide Gradio demos for the different tasks. The example images are saved in `./test_imgs`.
<div align="center">
<a><img src="f