The community has already taken significant steps in this
direction, with convolutional neural nets (CNNs) becoming
the common workhorse behind a wide variety of image pre-
diction problems. CNNs learn to minimize a loss function –
an objective that scores the quality of results – and although
the learning process is automatic, a lot of manual effort still
goes into designing effective losses. In other words, we still
have to tell the CNN what we wish it to minimize. But,
just like Midas, we must be careful what we wish for! If
we take a naive approach, and ask the CNN to minimize
Euclidean distance between predicted and ground truth pix-
els, it will tend to produce blurry results [29, 46]. This is
because Euclidean distance is minimized by averaging all
plausible outputs, which causes blurring. Coming up with
loss functions that force the CNN to do what we really want
– e.g., output sharp, realistic images – is an open problem
and generally requires expert knowledge.
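To make the problem with the naive objective concrete, the sketch below (a hypothetical PyTorch illustration, not code from this paper) computes a per-pixel Euclidean loss and shows on a toy example that, when two outputs are equally plausible, their blurry average scores better under this loss than either sharp answer.

```python
import torch
import torch.nn.functional as F

def l2_pixel_loss(pred, target):
    """Naive per-pixel Euclidean loss.

    pred, target: float tensors of shape (N, C, H, W) in [0, 1].
    Treating each pixel independently, this loss is minimized by the
    conditional mean of all plausible outputs, which looks blurry.
    """
    return F.mse_loss(pred, target)

# Toy example: two equally plausible "ground truths" (e.g. a dark and a
# bright rendering of the same input). The L2-optimal prediction is their
# average, i.e. a washed-out compromise rather than either sharp answer.
dark = torch.zeros(1, 3, 4, 4)
bright = torch.ones(1, 3, 4, 4)
average = 0.5 * (dark + bright)
print(l2_pixel_loss(average, dark) + l2_pixel_loss(average, bright))  # 0.5
print(l2_pixel_loss(dark, dark) + l2_pixel_loss(dark, bright))        # 1.0
```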
It would be highly desirable if we could instead specify
only a high-level goal, like “make the output indistinguish-
able from reality”, and then automatically learn a loss func-
tion appropriate for satisfying this goal. Fortunately, this is
exactly what is done by the recently proposed Generative
Adversarial Networks (GANs) [14, 5, 30, 36, 47]. GANs
learn a loss that tries to classify if the output image is real
or fake, while simultaneously training a generative model
to minimize this loss. Blurry images will not be tolerated
since they look obviously fake. Because GANs learn a loss
that adapts to the data, they can be applied to a multitude of
tasks that traditionally would require very different kinds of
loss functions.
In this paper, we explore GANs in the conditional set-
ting. Just as GANs learn a generative model of data, condi-
tional GANs (cGANs) learn a conditional generative model
[14]. This makes cGANs suitable for image-to-image trans-
lation tasks, where we condition on an input image and gen-
erate a corresponding output image.
GANs have been vigorously studied in the last two
years and many of the techniques we explore in this pa-
per have been previously proposed. Nonetheless, ear-
lier papers have focused on specific applications, and
it has remained unclear how effective image-conditional
GANs can be as a general-purpose solution for image-to-
image translation. Our primary contribution is to demon-
strate that on a wide variety of problems, conditional
GANs produce reasonable results. Our second contri-
bution is to present a simple framework sufficient to
achieve good results, and to analyze the effects of sev-
eral important architectural choices. Code is available at
https://github.com/phillipi/pix2pix.
1. Related work
Structured losses for image modeling Image-to-image
translation problems are often formulated as per-pixel clas-
sification or regression [26, 42, 17, 23, 46]. These for-
mulations treat the output space as “unstructured” in the
sense that each output pixel is considered conditionally in-
dependent from all others given the input image. Condi-
tional GANs instead learn a structured loss. Structured
losses penalize the joint configuration of the output. A large
body of literature has considered losses of this kind, with
popular methods including conditional random fields [2],
the SSIM metric [40], feature matching [6], nonparametric
losses [24], the convolutional pseudo-prior [41], and losses
based on matching covariance statistics [19]. Our condi-
tional GAN is different in that the loss is learned, and can, in
theory, penalize any possible structure that differs between
output and target.
Conditional GANs We are not the first to apply GANs
in the conditional setting. Previous works have conditioned
GANs on discrete labels [28], text [32], and, indeed, im-
ages. The image-conditional models have tackled inpaint-
ing [29], image prediction from a normal map [39], image
manipulation guided by user constraints [49], future frame
prediction [27], future state prediction [48], product photo
generation [43], and style transfer [25]. Each of these meth-
ods was tailored for a specific application. Our framework
differs in that nothing is application-specific. This makes
our setup considerably simpler than most others.
Our method also differs from these prior works in sev-
eral architectural choices for the generator and discrimina-
tor. Unlike past work, for our generator we use a “U-Net”-
based architecture [34], and for our discriminator we use a
convolutional “PatchGAN” classifier, which only penalizes
structure at the scale of image patches. A similar Patch-
GAN architecture was previously proposed in [25], for the
purpose of capturing local style statistics. Here we show
that this approach is effective on a wider range of problems,
and we investigate the effect of changing the patch size.
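For concreteness, the sketch below is a hypothetical PyTorch rendering of a PatchGAN-style discriminator; the layer widths and depth are illustrative assumptions rather than the exact configuration used in this paper, but the key property is the same: the output is a grid of real/fake scores, each with a receptive field covering only a local patch.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Convolutional "PatchGAN"-style discriminator sketch.

    Instead of one real/fake score for the whole image, it emits a grid
    of scores, each depending only on a local patch of the input.
    Layer widths here are illustrative assumptions.
    """
    def __init__(self, in_channels=6):  # conditioning image + candidate output, concatenated
        super().__init__()
        def block(c_in, c_out, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, norm=False),
            *block(64, 128),
            *block(128, 256),
            # Final 1-channel map: one real/fake logit per patch location.
            nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, x, y):
        # Condition on the input x by concatenating it with the candidate output y.
        return self.model(torch.cat([x, y], dim=1))
```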
2. Method
GANs are generative models that learn a mapping from
random noise vector z to output image y: G : z → y
[14]. In contrast, conditional GANs learn a mapping from
observed image x and random noise vector z, to y: G :
{x, z} → y. The generator G is trained to produce outputs
that cannot be distinguished from “real” images by an ad-
versarially trained discriminator, D, which is trained to do as
well as possible at detecting the generator’s “fakes”. This
training procedure is diagrammed in Figure 2.
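As a minimal illustration of the mapping G : {x, z} → y (and not the U-Net generator described later), a hypothetical PyTorch sketch might broadcast the noise vector z spatially, concatenate it with the observed image x, and map the result to an output image:

```python
import torch
import torch.nn as nn

class ToyConditionalGenerator(nn.Module):
    """Minimal G : {x, z} -> y sketch (not the paper's U-Net).

    The noise vector z is broadcast spatially and concatenated with the
    observed image x, then mapped to an output image y by a small conv stack.
    """
    def __init__(self, in_channels=3, z_dim=8, out_channels=3):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + z_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.Tanh(),  # outputs in [-1, 1], a common image parameterization
        )

    def forward(self, x, z):
        n, _, h, w = x.shape
        z_map = z.view(n, self.z_dim, 1, 1).expand(n, self.z_dim, h, w)
        return self.net(torch.cat([x, z_map], dim=1))
```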
2.1. Objective
The objective of a conditional GAN can be expressed as
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y \sim p_{\text{data}}(x,y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{\text{data}}(x),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))], \tag{1}
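In code, a minibatch estimate of Eq. (1) can be written as follows; this is a hypothetical PyTorch sketch that assumes the discriminator returns raw logits (a PatchGAN returns a grid of them), so the two log terms are obtained as negated binary cross-entropies against all-ones and all-zeros targets.

```python
import torch
import torch.nn.functional as F

def cgan_objective(D, G, x, y, z):
    """Minibatch estimate of Eq. (1):

    L_cGAN(G, D) = E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]

    Assumes D returns raw logits; negated binary cross-entropy against
    all-ones / all-zeros targets recovers the two log terms.
    """
    real_logits = D(x, y)
    fake_logits = D(x, G(x, z))
    log_d_real = -F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    log_one_minus_d_fake = -F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    # D is trained to maximize this quantity and G to minimize it; in an
    # actual training loop the two updates alternate, detaching G(x, z)
    # when stepping D so that generator gradients are unaffected.
    return log_d_real + log_one_minus_d_fake
```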