[Free] UnifiedControllableVisualGenerationModel.zip resource - CSDN Library

1436 files in total

py: 1266

jpg: 70

png: 54

AI source code

Points required: 0, 45 views, uploaded 2023-12-24 13:13:51, 66.14MB ZIP

Unified Controllable Visual Generation Model.zip (1436 files)

CODEOWNERS 141B

.gitignore 2KB

control_87875.jpg 174KB

control_592.jpg 89KB

control_315.jpg 79KB

control_24456.jpg 79KB

control_598.jpg 78KB

control_248.jpg 74KB

control_50426.jpg 63KB

man-donut.jpg 56KB

Zelda-Breath-of-the-Wild.jpg 55KB

control_00663.jpg 53KB

control_27182.jpg 52KB

control_246.jpg 48KB

control_78.jpg 48KB

control_332.jpg 47KB

kitchen.jpg 46KB

control_66524.jpg 40KB

girl-brush-teeth.jpg 39KB

control_276.jpg 37KB

super-man-city.jpg 36KB

woman-hat.jpg 35KB

control_00785.jpg 35KB

control_64786.jpg 34KB

baseball-player.jpg 32KB

control_92236.jpg 32KB

control_158.jpg 32KB

control_238.jpg 32KB

control_457.jpg 32KB

control_117.jpg 32KB

control_25062.jpg 31KB

control_66523.jpg 31KB

control_19578.jpg 31KB

control_73237.jpg 30KB

mousse-cake.jpg 29KB

control_81859.jpg 29KB

control_22.jpg 28KB

control_47.jpg 27KB

control_235.jpg 27KB

control_84477.jpg 27KB

control_66141.jpg 26KB

control_223.jpg 24KB

control_482.jpg 23KB

control_61.jpg 21KB

control_31269.jpg 20KB

man-shirt-tie.jpg 20KB

control_386.jpg 19KB

control_89381.jpg 19KB

control_147.jpg 18KB

control_217.jpg 17KB

control_276.jpg 17KB

control_148.jpg 16KB

control_194.jpg 16KB

control_75015.jpg 15KB

control_167.jpg 15KB

control_217.jpg 14KB

control_165.jpg 14KB

control_68294.jpg 13KB

control_40175.jpg 13KB

control_177.jpg 13KB

control_83.jpg 13KB

control_152.jpg 13KB

control_158.jpg 12KB

control_12836.jpg 12KB

control_01135.jpg 12KB

control_72517.jpg 12KB

control_53860.jpg 11KB

control_09798.jpg 11KB

control_80560.jpg 10KB

control_223.jpg 10KB

control_13403.jpg 10KB

control_186.jpg 8KB

open_mmlab.json 5KB

mmcls.json 4KB

hedsketch.json 2KB

openpose.json 1KB

normal.json 1KB

hed.json 1KB

depth.json 1KB

outpainting.json 919B

seg.json 907B

canny.json 871B

bbox.json 637B

grayscale.json 386B

blur.json 320B

inpainting.json 242B

deprecated.json 217B

LICENSE 11KB

LICENSE 9KB

LICENSE 1KB

README.md 12KB

CODE_OF_CONDUCT.md 5KB

README.md 415B

SECURITY.md 400B

README.md 13B

# [UniControl](https://arxiv.org/abs/2305.11147) [![arXiv](https://img.shields.io/badge/arXiv-ff69b4)](https://arxiv.org/pdf/2305.11147.pdf) [![webpage](https://img.shields.io/badge/🔥-Website-9cf)](https://canqin001.github.io/UniControl-Page/) [![HuggingFace space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/Robert001/UniControl-Demo)

<div align="center">
  <a><img src="figs/salesforce.png" height="100px" ></a>
  <a><img src="figs/northeastern.png" height="100px" ></a>
  <a><img src="figs/stanford.png" height="100px" ></a>
</div>

This repository is for the paper:
> **[UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild](https://arxiv.org/abs/2305.11147)** \
> Can Qin<sup>1,2</sup>, Shu Zhang<sup>1</sup>, Ning Yu<sup>1</sup>, Yihao Feng<sup>1</sup>, Xinyi Yang<sup>1</sup>, Yingbo Zhou<sup>1</sup>, Huan Wang<sup>1</sup>, Juan Carlos Niebles<sup>1</sup>, Caiming Xiong<sup>1</sup>, Silvio Savarese<sup>1</sup>, Stefano Ermon<sup>3</sup>, Yun Fu<sup>2</sup>, Ran Xu<sup>1</sup> \
> <sup>1</sup> Salesforce AI <sup>2</sup> Northeastern University <sup>3</sup> Stanford University \
> Work done when Can Qin was an intern at Salesforce AI Research.

![img](figs/demo_simple.png)

## Introduction

We introduce **UniControl**, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a single framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling adaptation to different C2I tasks simultaneously. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes.
This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.

![img](figs/method.png)

## Updates

* **05/18/23**: ***[UniControl](https://arxiv.org/abs/2305.11147) paper uploaded to arXiv.***
* **05/26/23**: ***UniControl inference code and checkpoint open to public.***
* **05/28/23**: ***Latest UniControl model [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol.ckpt) (1.4B #params, 5.78GB) updated.***
* **06/08/23**: ***Latest UniControl model [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol.ckpt) updated; it now supports 12 tasks (Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring and Colorization)!***
* **06/08/23**: ***Training dataset ([MultiGen-20M](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset)) is fully released.***
* **06/08/23**: ***Training code is public.***:blush:
* **07/06/23**: ***Latest UniControl model v1.1 [checkpoint](https://console.cloud.google.com/storage/browser/_details/sfr-unicontrol-data-research/unicontrol_v1.1.ckpt) updated; it now supports 12 tasks (Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring and Colorization)!***
* **07/25/23**: ***Huggingface Demo API is available! [![HuggingFace space](https://img.shields.io/badge/🤗-Huggingface%20Space-cyan.svg)](https://huggingface.co/spaces/Robert001/UniControl-Demo)***
* **07/25/23**: ***Safetensors model is available! [checkpoint](https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.st)***
* **09/21/23**: ***UniControl is accepted to NeurIPS 2023.***:blush:

## MultiGen-20M Datasets

More than 20M image-prompt-condition triplets are available [here](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset), with a total size of ***> 2TB***. The dataset covers all 12 tasks (`Canny, HED, Sketch, Depth, Normal, Skeleton, Bbox, Seg, Outpainting, Inpainting, Deblurring, Colorization`), all of which are fully released.

## Instruction

### Environment Preparation

Set up the environment first (this takes a few minutes).
```
conda env create -f environment.yaml
conda activate unicontrol
```

### Checkpoint Preparation

The checkpoint of the pre-trained UniControl model should be saved at `./ckpts/unicontrol.ckpt`.
```
cd ckpts
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol.ckpt
```

You can also use the latest trained model (ckpt and safetensors):
```
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.ckpt
wget https://storage.googleapis.com/sfr-unicontrol-data-research/unicontrol_v1.1.st
```

If you want to train from scratch, follow ControlNet to prepare the initial checkpoint. ControlNet provides a simple script for this. If your SD filename is `./ckpts/v1-5-pruned.ckpt` and you want the script to save the processed model (SD+ControlNet) at `./ckpts/control_sd15_ini.ckpt`, run:
```
python tool_add_control.py ./ckpts/v1-5-pruned.ckpt ./ckpts/control_sd15_ini.ckpt
```

### Data Preparation

Download the training dataset ([MultiGen-20M](https://console.cloud.google.com/storage/browser/sfr-unicontrol-data-research/dataset)) to `./multigen20m`:
```
cd multigen20m
gsutil -m cp -r gs://sfr-unicontrol-data-research/dataset ./
```
Then unzip all the files.
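The unzip step in Data Preparation can be scripted. The following is a minimal sketch, assuming the downloaded archives are ordinary `.zip` files located somewhere under `./multigen20m`. The helper name `unzip_all` and the choice to extract each archive into its own folder are illustrative assumptions, not part of the UniControl repository.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_all(root: str) -> List[Path]:
    """Extract every .zip archive found under `root` next to the archive itself.

    Hypothetical helper for illustration; it is not shipped with UniControl.
    Returns the archives that were extracted, in sorted order.
    """
    extracted = []
    # sorted() collects the full listing before any extraction starts,
    # so files created while extracting are not picked up by the loop.
    for archive in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

Calling `unzip_all("./multigen20m")` once the `gsutil` copy has finished would extract every archive in place. With a dataset of this size, check that enough free disk space is available before running it.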
### Model Training (CUDA 11.0 and Conda 4.12.0 work)

Training from Scratch:
```
python train_unicontrol.py --ckpt ./ckpts/control_sd15_ini.ckpt --config ./models/cldm_v15_unicontrol_v11.yaml --lr 1e-5
```

Model Finetuning:
```
python train_unicontrol.py --ckpt ./ckpts/unicontrol.ckpt --config ./models/cldm_v15_unicontrol.yaml --lr 1e-7
```

### Model Inference (CUDA 11.0 and Conda 4.12.0 work)

Run the command for the task you need, as listed below. If you hit an out-of-memory (OOM) error, decrease `--num_samples`. If you use the safetensors model, load it by following `./load_model/load_safetensors_model.py`.

Canny to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task canny
```

HED Edge to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task hed
```

HED-like Sketch to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task hedsketch
```

Depth Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task depth
```

Normal Surface Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task normal
```

Segmentation Map to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task seg
```

Human Skeleton to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task openpose
```

Object Bounding Boxes to Image Generation:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task bbox
```

Image Outpainting:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task outpainting
```

Image Inpainting:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task inpainting
```

Image Deblurring:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task blur
```

Image Colorization:
```
python inference_demo.py --ckpt ./ckpts/unicontrol.ckpt --task grayscale
```

### Gradio Demo ([App Demo Video](https://github.com/salesforce/UniControl/issues/1), CUDA 11.0 and Conda 4.12.0 work)

We provide Gradio demos for the different tasks. The example images are saved at `./test_imgs`.

<div align="center"> <a><img src="f
