Open-source code for Track 1 of the 3rd Jittor AI Challenge_jittor-jieke-semantic_images

38 files in total

py: 28

txt: 3

jpg: 3

Uploaded 2024-09-28 16:52:13 · 3.58MB ZIP


第三届计图人工智能挑战赛赛道一代码开源_jittor-jieke-semantic_images_synthesis.zip (38 files)

jittor-jieke-semantic_images_synthesis-main

select.txt 80B

预训练模型说明.txt 165B

image_sample.py 7KB

merge_model.py 907B

requirements.txt 187B

selects

4270669_be37f5c2ea_b.jpg 40KB

mask_2.png 1.89MB

5233570907_1780158de7_b.jpg 44KB

510659657_f6e93df6aa_b.jpg 46KB

mask_1.png 1.5MB

image_train.py 4KB

models

autoencoder

utils.py 7KB

__init__.py 83B

loss.py 14KB

ema.py 2KB

util.py 858B

quantize.py 5KB

dataset.py 10KB

model.py 15KB

autoencoder.py 10KB

distributions.py 2KB

guided_diffusion

__init__.py 138B

losses.py 3KB

fp16_util.py 8KB

nn.py 5KB

train_util.py 14KB

gaussian_diffusion.py 34KB

resample.py 2KB

image_datasets.py 14KB

unet.py 39KB

logger.py 12KB

script_util.py 15KB

respace.py 5KB

data_preprocess.py 3KB

train.py 706B

test.py 2KB

README.md 4KB

config

config-jittor.yaml 348B

# 3rd Jittor AI Challenge - Landscape Image Generation Track - IIDM

## Introduction

This project contains the code for the landscape image generation track of the 3rd Jittor Challenge.

## Installation

Because gradient checkpointing is difficult to implement in Jittor, this project has to be trained on six 3090 GPUs, and training takes about 8 days. We will keep trying to implement this mechanism later, since it can significantly reduce GPU memory usage.

### Environment

- ubuntu 22.04 LTS
- python 3.8.5
- `sudo apt install libomp-dev`
- Install MPI: `sudo apt install mpich`
- Install the Python libraries: `pip install -r requirements.txt`
- Install cupy according to your CUDA version:
  - v11.1 (x86_64): `pip install cupy-cuda111`
  - v11.2 ~ 11.8 (x86_64 / aarch64): `pip install cupy-cuda11x`
  - v12.x (x86_64 / aarch64): `pip install cupy-cuda12x`

## Dataset

The Graphics Lab of the Department of Computer Science at Tsinghua University collected 12,000 high-resolution (512 wide, 384 high) landscape images from Flickr and produced semantic segmentation maps for them. Of these, 10,000 image pairs are used for training. **Each label is a grayscale image with values in 0~28.** The labels cover 29 object classes:

```
"mountain", "sky", "water", "sea", "rock", "tree", "earth", "hill", "river", "sand", "land", "building", "grass", "plant", "person", "boat", "waterfall", "wall", "pier", "path", "lake", "bridge", "field", "road", "railing", "fence", "ship", "house", "other"
```

- The training dataset can be downloaded [here](https://cloud.tsinghua.edu.cn/f/063e7fcfe6a04184904d/?dl=1).
- The leaderboard A test dataset can be downloaded [here](https://cloud.tsinghua.edu.cn/d/cb748039138145f2b971/).
- The leaderboard B test dataset can be downloaded [here](https://cloud.tsinghua.edu.cn/d/9dd48340bbde4d9b9ffa/).

## Pretrained model

A VQ-GAN trained on ImageNet is used as the AutoEncoder. [Download checkpoint](https://drive.google.com/file/d/1nNpUzZSbYA5yWsNdzLeKNvHWwHcaJAPV/view?usp=sharing)

An AutoEncoder trained on the competition dataset may give better results. [How to pretrain the AutoEncoder?](https://github.com/CompVis/taming-transformers/tree/master#training-on-custom-data)

## Data preprocessing

The data are preprocessed automatically when training starts.

## Inference

```
python test.py --input_path path/to/test_data --img_path path/to/reference_images --output_path ./results
```

## Training

```
python train.py --input_path path/to/training_set
```

The data directories are organized as follows:

```
--input_path
  - val_B_labels_resized
  - label_to_img.json
--img_path
  - imgs
  - labels
```

After training finishes, `merge_model.py` can be used to ensemble models.

```
python merge_model.py --model_1 ./ckpts/model.pkl --weight_1 0.5 --model_2 ./ckpts/model2.pkl --weight_2 0.5
```

[Download our trained checkpoint](https://drive.google.com/file/d/12rqVXy7AJNM10oGtJkC76JbDhJplsR_C/view?usp=sharing)

## Sample generated images

From left to right: **reference image**, **semantic segmentation map**, **generated image**.

![image-20230919171215839](./selects/mask_1.png)

![image-20230919171248415](./selects/mask_2.png)

## Acknowledgements

This project is based on the paper [Semantic Image Synthesis via Diffusion Models (SDM)](https://arxiv.org/abs/2207.00050), and parts of the code draw on [Taming Transformers for High-Resolution Image Synthesis](https://github.com/CompVis/taming-transformers/tree/master).

## Citation

If you find this code useful, please kindly cite the following paper:

```
@article{liu2024iidm,
  title={IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis},
  author={Liu, Feng and Chang, Xiaobin},
  journal={Computational Visual Media Journal (CVMJ)},
  year={2024}
}
```
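Referring back to the dataset description: the labels are grayscale images with values 0 to 28 covering 29 classes. The sketch below shows one way to turn those values into class names and count pixels per class. It assumes that a pixel value `i` indexes the class list in the order given in the README, which the README does not state explicitly. The helper names (`class_name`, `class_histogram`) are hypothetical and not part of this repository.

```python
# Class names copied from the README, in the order they are listed there.
# Assumption: grayscale value i (0..28) corresponds to LABEL_CLASSES[i].
LABEL_CLASSES = [
    "mountain", "sky", "water", "sea", "rock", "tree", "earth", "hill",
    "river", "sand", "land", "building", "grass", "plant", "person", "boat",
    "waterfall", "wall", "pier", "path", "lake", "bridge", "field", "road",
    "railing", "fence", "ship", "house", "other",
]


def class_name(label_value):
    """Return the class name for a label value, rejecting out-of-range values."""
    if not 0 <= label_value < len(LABEL_CLASSES):
        raise ValueError(f"label value {label_value} is outside 0..28")
    return LABEL_CLASSES[label_value]


def class_histogram(label_map):
    """Count pixels per class in a 2D list (rows of integer label values)."""
    counts = {}
    for row in label_map:
        for value in row:
            name = class_name(value)
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    # A toy 2x3 "label image"; a real label is 512 wide and 384 high.
    toy_label = [[1, 1, 0], [5, 2, 2]]
    print(class_histogram(toy_label))
```

A check like this can help catch label files with values outside 0~28 before training starts.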
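Referring back to the model ensembling step in the training section: the README does not describe how `merge_model.py` combines the two checkpoints. One common approach consistent with the `--weight_1` and `--weight_2` flags is a weighted average of the parameters, and the sketch below illustrates only that idea. It is an assumption, not the repository's implementation. Parameters are represented as plain lists of floats so the example stays self-contained, and `merge_state_dicts` is a hypothetical name.

```python
def merge_state_dicts(state_1, weight_1, state_2, weight_2):
    """Weighted average of two parameter dictionaries with identical keys.

    Assumption: each value is a flat list of floats standing in for a tensor.
    """
    if state_1.keys() != state_2.keys():
        raise ValueError("checkpoints do not contain the same parameter names")

    merged = {}
    for name in state_1:
        params_1, params_2 = state_1[name], state_2[name]
        if len(params_1) != len(params_2):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum of the two parameter vectors.
        merged[name] = [
            weight_1 * a + weight_2 * b for a, b in zip(params_1, params_2)
        ]
    return merged


if __name__ == "__main__":
    model_a = {"conv.weight": [1.0, 3.0]}
    model_b = {"conv.weight": [3.0, 5.0]}
    print(merge_state_dicts(model_a, 0.5, model_b, 0.5))
```

With equal weights of 0.5, as in the README's example command, this reduces to a plain mean of the two checkpoints.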
