Generative Adversarial Networks (GANs):
Challenges, Solutions, and Future Directions
Divya Saxena
University Research Facility in Big Data Analytics (UBDA), The Hong Kong Polytechnic University,
Hong Kong, divya.saxena.2015@ieee.org
Jiannong Cao
Department of Computing and UBDA, The Hong Kong Polytechnic University, Hong Kong,
csjcao@comp.polyu.edu.hk
ABSTRACT
Generative Adversarial Networks (GANs) are a novel class of deep generative models that has recently
gained significant attention. GANs learn complex and high-dimensional distributions implicitly over images,
audio, and other data. However, there exist major challenges in the training of GANs, i.e., mode collapse,
non-convergence, and instability, arising from inappropriate network architecture design, choice of objective
function, and selection of optimization algorithm. Recently, to address these challenges, several solutions for
the better design and optimization of GANs have been investigated, based on techniques of re-engineered
network architectures, new objective functions, and alternative optimization algorithms. To the best of our
knowledge, there is no existing survey that has particularly focused on the broad and systematic developments of these
optimization solutions proposed to handle GANs challenges. We first identify key research issues within each
design and optimization technique and then propose a new taxonomy to structure solutions by key research
issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants
proposed within each solution and their relationships. Finally, based on the insights gained, we present the
promising research directions in this rapidly growing field.
Index Terms—Generative Adversarial Networks, Deep learning, GANs, Deep Generative models, GANs
solution, GANs applications, Image generation
1. INTRODUCTION
Deep generative models (DGMs), such as Restricted Boltzmann Machines (RBMs), Deep Belief
Networks (DBNs), Deep Boltzmann Machines (DBMs), Denoising Autoencoders (DAEs), and Generative
Stochastic Networks (GSNs), have recently drawn significant attention for capturing rich underlying
distributions of data, such as audio, images, or video, and for synthesizing new samples. These deep generative
models are trained with Markov chain Monte Carlo (MCMC) based algorithms [1][2]. MCMC-based
approaches compute the gradient of the log-likelihood, and these gradients vanish as training advances. This is
the major reason that sample generation from the Markov chain is slow: it cannot mix between modes
fast enough. Another generative model, the variational autoencoder (VAE), uses deep learning with statistical
inference to represent a data point in a latent space [3], and suffers from the complexity of
approximating intractable probabilistic computations. In addition, these generative models are trained by
maximizing the training data likelihood, and likelihood-based methods suffer from the curse of dimensionality
on many datasets, such as images and video. Moreover, sampling from the Markov chain in high-dimensional
spaces is computationally slow and yields blurry, inaccurate samples.
To handle the abovementioned issues, Goodfellow, et al. [4] proposed Generative Adversarial Nets (GANs),
an alternative training methodology for generative models. GANs are a novel class of deep generative models
in which backpropagation is used for training, evading the issues associated with MCMC training. GANs
training is a minimax zero-sum game between a generative model and a discriminative model. GANs have
gained a lot of attention recently for generating realistic images, as they avoid the difficulties associated with
maximum likelihood learning [5]. Figure 1 shows an example of the progress in GANs capabilities from 2014
to 2018.
Figure 1. Progress in GANs capabilities for image generation from 2014 to 2018. Figure from
[4][6][7][8][9]
GANs are a structured probabilistic model comprising two adversarial models: a generative model,
called the Generator (G), which captures the data distribution, and a discriminative model, called the
Discriminator (D), which estimates the probability that a given sample came from the real data distribution
rather than from G's distribution. D and G play a two-player minimax game until Nash equilibrium using a
gradient-based optimization technique (simultaneous gradient descent), i.e., G generates images as if
sampled from the true distribution, and D cannot differentiate between the two sets of images. To update G
and D, gradient signals are received from the loss induced by the divergence between the two distributions,
as computed by D. Accordingly, the three main GANs design and optimization components are as follows:
(i) network architecture, (ii) objective (loss) function, and (iii) optimization algorithm.
Figure 2. Patches from the natural image manifold (red) and super-resolved patches obtained with MSE (blue)
and GANs (yellow). Figure from [10]
For a task that models multi-modal data, a particular input can correspond to several different correct and
acceptable answers. Figure 2 illustrates this with several natural image manifolds (in red), the result
achieved by a basic machine learning model using mean squared error (MSE), which computes a pixel-wise
average over numerous slightly different possible answers in pixel space (and thus causes a blurry image),
and the result achieved by GANs, which drives the reconstruction towards the natural image manifold.
Because of this advantage, GANs have been gaining huge attention, and their applicability is growing
in many fields.
GANs have worked well on several realistic tasks, such as image generation [8][9], video generation [11],
domain adaptation [12], and image super-resolution [10]. Despite this success in many applications,
traditional GANs are highly unstable in training because of the unbalanced training of D and G. D utilizes a
logistic loss which saturates quickly. In addition, if D can easily differentiate between real and fake images,
D's gradient vanishes, and when D cannot provide a gradient, G stops updating. In recent times, many
improvements have been introduced for handling the mode collapse problem, in which G produces samples
from only a few modes rather than the whole data space. On the other hand, several objective (loss) functions have
been introduced to minimize a divergence different from the traditional GANs formulation. Further, several
solutions have been proposed to stabilize the training.
1.1. Motivation and Contributions
In recent times, GANs have achieved outstanding performance in producing natural images. However, there
exist major challenges in the training of GANs, i.e., mode collapse, non-convergence, and instability, arising
from inappropriate network architecture design, choice of objective function, and selection of optimization
algorithm. Recently, to address these challenges, several solutions for the better design and optimization of GANs
have been investigated, based on techniques of re-engineered network architectures, new objective functions,
and alternative optimization algorithms. To study the GANs design and optimization solutions proposed to
handle GANs challenges in a contiguous and coherent way, this survey proposes a novel taxonomy of different
GANs solutions. We define taxonomic classes and sub-classes to structure the current works in
the most promising GANs research areas. By classifying the proposed GANs design and optimization solutions
into different categories, we analyze and discuss them in a systematic way. We also outline major open issues
that researchers can pursue further.
There are a limited number of existing reviews on the topic of GANs. [13] discussed how GANs and state-
of-the-art GANs variants work. [14]–[16] provided a brief introduction to some of the GANs models, while [16] also
presented development trends of GANs and the relation of GANs to parallel intelligence. [160] reviewed various
GANs methods from the perspectives of algorithms, theory, and applications. On the other hand, several
researchers reviewed specific topics related to GANs in detail. [17] reviewed GANs-based image synthesis
and editing approaches. [18] surveyed the threat of adversarial attacks on deep learning. [19] discussed various
types of adversarial attacks and defenses in detail.
Despite reviewing the state-of-the-art GANs, none of these surveys, to the best of our knowledge, has
particularly focused on a broad and systematic view of the GANs developments introduced to address the
GANs challenges. In this study, our main aim is to comprehensively structure and summarize the different GANs
design and optimization solutions proposed to alleviate GANs challenges, for researchers who are new to this
field.
Our Contributions. Our paper makes notable contributions summarized as follows:
New taxonomy. In this study, we identify key research issues within each design and optimization technique
and present a novel taxonomy that structures the solutions by these key research issues. Our proposed taxonomy will
help researchers deepen their understanding of the current developments handling GANs challenges
and of future research directions.
Comprehensive survey. In accordance with the taxonomy, we provide a comprehensive review of
the different solutions proposed to handle the major GANs challenges. For each type of solution, we provide
detailed descriptions and a systematic analysis of the GANs variants and their relationships. Still, due to the
wide range of GANs applications, different GANs variants are formulated, trained, and evaluated in
heterogeneous ways, and direct comparison among them is complicated. Therefore, we make the
necessary comparisons and summarize the corresponding approaches w.r.t. their novel solutions to
GANs challenges. This survey can be used as a guide for understanding, using, and developing different
GANs approaches for various real-life applications.
Future directions. This survey also highlights the most promising future research directions.
1.2. Organization
In this paper, we first discuss the three main components for designing and training the GANs framework,
analyze the challenges of the GANs framework, and present a detailed account of the current developments
handling GANs challenges from the GANs design and optimization perspective.
Figure 3 shows the organization of the paper. Section 2 explains the GANs framework from the
design and training perspective. In Section 3, we present the challenges in the training of GANs. In Section
4, we identify key issues related to the design and training of GANs and present a novel taxonomy of GANs
solutions handling these key issues. In accordance with the taxonomy, Sections 5, 6, and 7 summarize the GANs
design and optimization solutions, their pros and cons, and their relationships. Section 7 discusses the future
directions and Section 8 summarizes the paper.
2. GENERATIVE ADVERSARIAL NETWORKS
Before discussing in detail the solutions for the better design and optimization of GANs in the proposed
taxonomy, in this section we provide an overview of the GANs framework and of the main GANs design and
optimization components.
2.1. Overview
In recent years, generative models have grown continuously and have been applied successfully to a broad range of
real applications. Generative models perform density estimation, where a model distribution $p_{model}$ is
learned to approximate the true but unknown data distribution $p_{data}$. Methods for density estimation
face two major concerns: the selection of a suitable objective (loss) function and the appropriate
formulation of the density function $p_{model}$. The selection of the objective function for training a generative
model plays an important role in its learning behavior and performance [20][21]. The de-facto
standard and most widely used objective is based on maximum likelihood estimation, in which the
model parameters maximize the likelihood of the training data.
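For reference, the maximum likelihood objective can be stated as follows (a standard formulation, not specific to any one model):
$$\theta^{*} = \arg\max_{\theta}\; \mathbb{E}_{x \sim p_{data}}\left[\log p_{model}(x;\theta)\right] \approx \arg\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \log p_{model}(x_i;\theta)$$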
Researchers have shown that maximum likelihood is not a good choice of training objective, because a model
trained using maximum likelihood tends to overgeneralize and generate implausible samples [20]. In addition,
the marginal likelihood is intractable, which requires a workaround for learning the model
parameters. One possible solution to the intractability of the marginal likelihood is to never compute it,
and instead learn the model parameters indirectly [22].
GANs achieve this by having a powerful D with the capability to discriminate samples from $p_{data}$ and
$p_{model}$. When D is unable to discriminate samples from $p_{data}$ and $p_{model}$, the model has learned to generate
samples similar to those from the real data. One possible way to formulate the density function
$p_{model}$ is to use an explicit density function, for which the maximum likelihood framework is followed to estimate
the parameters. Another possibility is to use an implicit density function that estimates the data
distribution without an analytical form of $p_{model}$, i.e., to train a G such that, when real and generated data are mapped
to the feature space, they are enclosed in the same sphere [23][24]. GANs are the most notable
pioneering class of this latter approach.
GANs are an expressive class of generative models, as they support exact sampling and approximate estimation.
GANs learn high-dimensional distributions implicitly over images, audio, and other data that are challenging
to model with an explicit likelihood. A basic GAN is an algorithmic architecture of two neural networks
competing with each other to capture the real data distribution. The two networks optimize different and
opposing objective (loss) functions in a zero-sum game to find the (global) Nash equilibrium. The three
main components for the design and optimization of GANs are: (i) network architecture, (ii) objective (loss)
function, and (iii) optimization algorithm. There has been a large amount of work towards improving GANs
by re-engineering architectures [5][6][25], better objective functions [26]–[28], and alternative optimization
algorithms [29][30].
In the following sections, we discuss in detail the three main components for GANs design and optimization,
namely the network architecture, the loss function, and the optimization algorithm, followed by the minimax
optimization for Nash equilibrium.
2.2. Network Architecture
GANs learn to map a simple latent distribution to a more complex data distribution. GANs are based on
the concept of a non-cooperative game between two networks, a generator G and a discriminator D, in which G and
D play against each other. GANs belong to the deep generative models, or generative neural models, in which
G and D are parameterized via neural networks and updates are made in parameter space.
Figure 3. Basic GANs Architecture
Both G and D play a minimax game in which G's main aim is to produce samples similar to those drawn
from the real data distribution, and D's main goal is to discriminate between samples generated by G and
samples from the real data distribution, assigning higher probabilities to samples from the real data and
lower probabilities to those generated by G. The overall target of GANs training is to keep moving the
generated samples towards the real data manifold through the use of gradient information from D (see
Figure 3).
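To make this setup concrete, below is a minimal sketch of the two networks in PyTorch. It is an illustration under assumptions of our own (fully-connected layers, a 100-dimensional latent vector, flattened 28x28 images), not an architecture prescribed by any particular GANs variant:

import torch
import torch.nn as nn

# Generator: maps a latent vector z to a flattened 28x28 image.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),  # output pixels scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Discriminator: maps a (real or generated) sample to the probability
# that it came from the real data distribution.
class Discriminator(nn.Module):
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # D(x) in (0, 1)
        )

    def forward(self, x):
        return self.net(x)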
2.3. Loss Function
In GANs, x is a data sample drawn from the real data distribution $p_{data}$, the noise vector z is drawn from a
Gaussian prior distribution with zero mean and unit variance $p_z$, and $p_g$ denotes G's distribution over the data x.
The latent vector z is passed to G as input, and G outputs an image G(z) with the aim that D cannot differentiate
between generated samples G(z) and real samples x, i.e., that G(z) resembles the real data as closely as possible.
Simultaneously, D tries to keep itself from being fooled by G. D is a classifier with D(x) = 1 if x $\sim p_{data}$
and D(x) = 0 if x $\sim p_g$, i.e., according to whether x comes from $p_{data}$ or from $p_g$. The following
minimax objective is applied for training the G and D models jointly:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \qquad (1)$$
$V(D, G)$ is a binary cross-entropy function, commonly used in binary classification problems [31]. In Eq. 1,
to update the model parameters, G and D are trained by backpropagating the loss through their
respective models. In practice, Eq. 1 is solved by alternating the following two gradient updates:
$$\theta_D^{t+1} = \theta_D^t + \lambda^t \nabla_{\theta_D} V(D^t, G^t)$$
$$\theta_G^{t+1} = \theta_G^t - \lambda^t \nabla_{\theta_G} V(D^{t+1}, G^t)$$
where $\theta_G$ is the parameter of G, $\theta_D$ is the parameter of D, $\lambda$ is the learning rate, and t is the iteration number.
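As a hedged sketch, these alternating updates can be implemented in PyTorch as follows, reusing the Generator and Discriminator classes sketched in Section 2.2. The optimizer choice, learning rate, batch size, and the placeholder real-data batch are illustrative assumptions; since optimizers minimize, the ascent on V for D is expressed as minimizing the binary cross-entropy of Eq. 1:

latent_dim, batch_size, lr = 100, 64, 2e-4
G, D = Generator(latent_dim), Discriminator()
opt_D = torch.optim.SGD(D.parameters(), lr=lr)
opt_G = torch.optim.SGD(G.parameters(), lr=lr)
bce = nn.BCELoss()  # binary cross-entropy, as in Eq. 1

real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)

for t in range(10000):
    # D update (ascent on V): push D(x) toward 1 and D(G(z)) toward 0.
    x_real = torch.randn(batch_size, 784)  # placeholder for a real data batch
    z = torch.randn(batch_size, latent_dim)
    x_fake = G(z).detach()                 # block gradients into G during the D step
    loss_D = bce(D(x_real), real_labels) + bce(D(x_fake), fake_labels)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # G update (descent on V w.r.t. theta_G): minimize log(1 - D(G(z))),
    # written here as the negative cross-entropy against the fake label.
    z = torch.randn(batch_size, latent_dim)
    loss_G = -bce(D(G(z)), fake_labels)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()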
In practice, the second term in Eq. 1, $\log(1 - D(G(z)))$, saturates and provides insufficient gradient flow through
G, i.e., the gradient values become smaller and G stops learning. To overcome this vanishing gradient problem, the
objective function in Eq. 1 is reframed into two separate objectives:
$$\max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
and
$$\max_G \; \mathbb{E}_{z \sim p_z}[\log D(G(z))] \qquad (2)$$
Moreover, G's gradients for these two separate objectives have the same fixed points and always point in the
same direction, but the reframed objective in Eq. 2 provides much stronger gradients early in training.
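In code, the switch from the saturating generator objective of Eq. 1 to the non-saturating one of Eq. 2 is a one-line change. The sketch below assumes the G, D, bce, and label tensors defined in the training-loop sketch above:

z = torch.randn(batch_size, latent_dim)

# Saturating objective (Eq. 1): minimize log(1 - D(G(z)));
# its gradient vanishes once D confidently rejects G's samples.
loss_G_saturating = -bce(D(G(z)), fake_labels)

# Non-saturating objective (Eq. 2): maximize log D(G(z)), implemented
# as minimizing -log D(G(z)) by labeling the fakes as "real".
loss_G_nonsaturating = bce(D(G(z)), real_labels)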
[Figure 3 components: noise drawn from a 2D Gaussian is fed to the Generator (G); generated samples and real samples are passed to the Discriminator (D), whose real/fake predictions yield the D loss and G loss, and the resulting gradients drive training.]