Generative Adversarial Networks (GANs):
Challenges, Solutions, and Future Directions
Divya Saxena
University Research Facility in Big Data Analytics (UBDA), The Hong Kong Polytechnic University,
Hong Kong, divya.saxena.2015@ieee.org
Jiannong Cao
Department of Computing and UBDA, The Hong Kong Polytechnic University, Hong Kong,
csjcao@comp.polyu.edu.hk
ABSTRACT
Generative Adversarial Networks (GANs) are a novel class of deep generative models that has recently
gained significant attention. GANs learn complex and high-dimensional distributions implicitly over images,
audio, and other data. However, there exist major challenges in the training of GANs, i.e., mode collapse,
non-convergence, and instability, arising from inappropriate network architecture design, choice of objective
function, and selection of optimization algorithm. Recently, to address these challenges, several solutions for
the better design and optimization of GANs have been investigated, based on techniques of re-engineered
network architectures, new objective functions, and alternative optimization algorithms. To the best of our
knowledge, there is no existing survey that has particularly focused on the broad and systematic developments of these
optimization solutions proposed to handle GANs challenges. We first identify key research issues within each
design and optimization technique and then propose a new taxonomy to structure solutions by key research
issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants
proposed within each solution and their relationships. Finally, based on the insights gained, we present the
promising research directions in this rapidly growing field.
Index Terms—Generative Adversarial Networks, Deep learning, GANs, Deep Generative models, GANs
solution, GANs applications, Image generation
1. INTRODUCTION
Deep generative models (DGMs), such as Restricted Boltzmann Machines (RBMs), Deep Belief
Networks (DBNs), Deep Boltzmann Machines (DBMs), Denoising Autoencoders (DAEs), and Generative
Stochastic Networks (GSNs), have recently drawn significant attention for capturing rich underlying
distributions of data, such as audio, images, or video, and for synthesizing new samples. These deep generative
models are trained with Markov chain Monte Carlo (MCMC) based algorithms [1][2]. MCMC-based
approaches compute the gradient of the log-likelihood, and these gradients vanish as training advances. This is
the major reason that sample generation from the Markov chain is slow: it cannot mix between modes
fast enough. Another generative model, the variational autoencoder (VAE), uses deep learning with statistical
inference to represent a data point in a latent space [3], and suffers from the complexity of
approximating intractable probabilistic computations. In addition, these generative models are trained by
maximizing the training data likelihood, and likelihood-based methods suffer from the curse of dimensionality
on many datasets, such as images and video. Moreover, sampling from the Markov chain in high-dimensional
spaces is computationally slow and yields blurry, inaccurate samples.
To handle the abovementioned issues, Goodfellow, et al. [4] proposed Generative Adversarial Nets (GANs),
an alternative training methodology for generative models. GANs are a novel class of deep generative models
in which backpropagation is used for training, evading the issues associated with MCMC training. GANs
training is a minimax zero-sum game between a generative model and a discriminative model. GANs have
gained a lot of attention recently for generating realistic images, as they avoid the difficulties associated with
maximum likelihood learning [5]. Figure 1 shows an example of the progress in GANs capabilities from 2014
to 2018.
Figure 1. Progress in GANs capabilities for image generation from 2014 to 2018. Figure from
[4][6][7][8][9]
GANs are a structured probabilistic model comprising two adversarial models: a generative model,
called the Generator (G), which captures the data distribution, and a discriminative model, called the
Discriminator (D), which estimates the probability that a given sample came from the real data distribution
rather than from G's distribution. D and G play a two-player minimax game until Nash equilibrium using a
gradient-based optimization technique (simultaneous gradient descent), i.e., G generates images as if
sampled from the true distribution, and D cannot differentiate between the two sets of images. To update G
and D, gradient signals are received from the loss induced by the divergence between the two distributions,
as computed by D. Accordingly, the three main GANs design and optimization components are as follows:
(i) network architecture, (ii) objective (loss) function, and (iii) optimization algorithm.
Figure 2. Patches from the natural image manifold (red) and super-resolved patches obtained with MSE (blue)
and GANs (yellow). Figure from [10]
For a task that models multi-modal data, a particular input can correspond to several different correct and
acceptable answers. Figure 2 illustrates this with several natural image manifolds (in red), the result
achieved by a basic machine learning model using mean squared error (MSE), which computes a pixel-wise
average over numerous slightly different possible answers in pixel space (and thus causes a blurry image),
and the result achieved by GANs, which drives the reconstruction towards the natural image manifold.
Because of this advantage, GANs have been gaining huge attention, and their applicability is growing
in many fields.
GANs have worked well on several realistic tasks, such as image generation [8][9], video generation [11],
domain adaptation [12], and image super-resolution [10]. Despite this success in many applications,
traditional GANs are highly unstable in training because of the unbalanced training of D and G. D utilizes a
logistic loss which saturates quickly. In addition, if D can easily differentiate between real and fake images,
D's gradient vanishes, and when D cannot provide a gradient, G stops updating. In recent times, many
improvements have been introduced for handling the mode collapse problem, in which G produces samples
from only a few modes rather than the whole data space. On the other hand, several objective (loss) functions have
been introduced to minimize a divergence different from the traditional GANs formulation. Further, several
solutions have been proposed to stabilize the training.
1.1. Motivation and Contributions
In recent times, GANs have achieved outstanding performance in producing natural images. However, there
exist major challenges in the training of GANs, i.e., mode collapse, non-convergence, and instability, arising
from inappropriate network architecture design, choice of objective function, and selection of optimization
algorithm. Recently, to address these challenges, several solutions for the better design and optimization of GANs
have been investigated, based on techniques of re-engineered network architectures, new objective functions,
and alternative optimization algorithms. To study the GANs design and optimization solutions proposed to
handle GANs challenges in a contiguous and coherent way, this survey proposes a novel taxonomy of different
GANs solutions. We define taxonomic classes and sub-classes to structure the current works in
the most promising GANs research areas. By classifying the proposed GANs design and optimization solutions
into different categories, we analyze and discuss them in a systematic way. We also outline major open issues
that researchers can pursue further.
There are a limited number of existing reviews on the topic of GANs. [13] discussed how GANs and state-
of-the-art GANs variants work. [14]–[16] provided a brief introduction to some of the GANs models, while [16] also
presented development trends of GANs and the relation of GANs to parallel intelligence. [160] reviewed various
GANs methods from the perspectives of algorithms, theory, and applications. On the other hand, several
researchers reviewed specific topics related to GANs in detail. [17] reviewed GANs-based image synthesis
and editing approaches. [18] surveyed the threat of adversarial attacks on deep learning. [19] discussed various
types of adversarial attacks and defenses in detail.
Despite reviewing the state-of-the-art GANs, none of these surveys, to the best of our knowledge, has
particularly focused on a broad and systematic view of the GANs developments introduced to address the
GANs challenges. In this study, our main aim is to comprehensively structure and summarize the different GANs
design and optimization solutions proposed to alleviate GANs challenges, for researchers who are new to this
field.
Our Contributions. Our paper makes notable contributions summarized as follows:
New taxonomy. In this study, we identify key research issues within each design and optimization technique
and present a novel taxonomy that structures the solutions by these key research issues. Our proposed taxonomy will
help researchers deepen their understanding of the current developments handling GANs challenges
and of future research directions.
Comprehensive survey. In accordance with the taxonomy, we provide a comprehensive review of
the different solutions proposed to handle the major GANs challenges. For each type of solution, we provide
detailed descriptions and a systematic analysis of the GANs variants and their relationships. Still, due to the
wide range of GANs applications, different GANs variants are formulated, trained, and evaluated in
heterogeneous ways, and direct comparison among them is complicated. Therefore, we make the
necessary comparisons and summarize the corresponding approaches w.r.t. their novel solutions to
GANs challenges. This survey can be used as a guide for understanding, using, and developing different
GANs approaches for various real-life applications.
Future directions. This survey also highlights the most promising future research directions.
1.2. Organization
In this paper, we first discuss the three main components for designing and training the GANs framework,
analyze the challenges of the GANs framework, and present a detailed account of the current developments
handling GANs challenges from the GANs design and optimization perspective.
Figure 3 shows the organization of the paper. Section 2 explains the GANs framework from the
design and training perspective. In Section 3, we present the challenges in the training of GANs. In Section
4, we identify key issues related to the design and training of GANs and present a novel taxonomy of GANs
solutions handling these key issues. In accordance with the taxonomy, Sections 5, 6, and 7 summarize the GANs
design and optimization solutions, their pros and cons, and their relationships. Section 7 discusses the future
directions and Section 8 summarizes the paper.
2. GENERATIVE ADVERSARIAL NETWORKS
Before discussing in detail the solutions for the better design and optimization of GANs in the proposed
taxonomy, in this section we provide an overview of the GANs framework and of the main GANs design and
optimization components.
2.1. Overview
In recent years, generative models have grown continuously and have been applied successfully to a broad range of
real applications. Generative models perform density estimation, where a model distribution $p_{model}$ is
learned to approximate the true but unknown data distribution $p_{data}$. Methods for density estimation
face two major concerns: the selection of a suitable objective (loss) function and the appropriate
formulation of the density function $p_{model}$. The selection of the objective function for training a generative
model plays an important role in its learning behavior and performance [20][21]. The de-facto
standard and most widely used objective is based on maximum likelihood estimation, in which the
model parameters maximize the likelihood of the training data.
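For reference, the maximum likelihood objective can be stated as follows (a standard formulation, not specific to any one model):
$$\theta^{*} = \arg\max_{\theta}\; \mathbb{E}_{x \sim p_{data}}\left[\log p_{model}(x;\theta)\right] \approx \arg\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \log p_{model}(x_i;\theta)$$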
Researchers have shown that maximum likelihood is not a good choice of training objective, because a model
trained using maximum likelihood tends to overgeneralize and generate implausible samples [20]. In addition,
the marginal likelihood is intractable, which requires a workaround for learning the model
parameters. One possible solution to the intractability of the marginal likelihood is to never compute it,
and instead learn the model parameters indirectly [22].
GANs achieve this by having a powerful D with the capability to discriminate samples from $p_{data}$ and
$p_{model}$. When D is unable to discriminate samples from $p_{data}$ and $p_{model}$, the model has learned to generate
samples similar to those from the real data. One possible way to formulate the density function
$p_{model}$ is to use an explicit density function, for which the maximum likelihood framework is followed to estimate
the parameters. Another possibility is to use an implicit density function that estimates the data
distribution without an analytical form of $p_{model}$, i.e., to train a G such that, when real and generated data are mapped
to the feature space, they are enclosed in the same sphere [23][24]. GANs are the most notable
pioneering class of this latter approach.
GANs are an expressive class of generative models, as they support exact sampling and approximate estimation.
GANs learn high-dimensional distributions implicitly over images, audio, and other data that are challenging
to model with an explicit likelihood. A basic GAN is an algorithmic architecture of two neural networks
competing with each other to capture the real data distribution. The two networks optimize different and
opposing objective (loss) functions in a zero-sum game to find the (global) Nash equilibrium. The three
main components for the design and optimization of GANs are: (i) network architecture, (ii) objective (loss)
function, and (iii) optimization algorithm. There has been a large amount of work towards improving GANs
by re-engineering architectures [5][6][25], better objective functions [26]–[28], and alternative optimization
algorithms [29][30].
In the following sections, we discuss in detail the three main components for GANs design and optimization,
namely the network architecture, the loss function, and the optimization algorithm, followed by the minimax
optimization for Nash equilibrium.
2.2. Network Architecture
GANs learn to map a simple latent distribution to a more complex data distribution. GANs are based on
the concept of a non-cooperative game between two networks, a generator G and a discriminator D, in which G and
D play against each other. GANs belong to the deep generative models, or generative neural models, in which
G and D are parameterized via neural networks and updates are made in parameter space.
Figure 3. Basic GANs Architecture
Both G and D play a minimax game in which G's main aim is to produce samples similar to those drawn
from the real data distribution, and D's main goal is to discriminate between samples generated by G and
samples from the real data distribution, assigning higher probabilities to samples from the real data and
lower probabilities to those generated by G. The overall target of GANs training is to keep moving the
generated samples towards the real data manifold through the use of gradient information from D (see
Figure 3).
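To make this setup concrete, below is a minimal sketch of the two networks in PyTorch. It is an illustration under assumptions of our own (fully-connected layers, a 100-dimensional latent vector, flattened 28x28 images), not an architecture prescribed by any particular GANs variant:

import torch
import torch.nn as nn

# Generator: maps a latent vector z to a flattened 28x28 image.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
            nn.Tanh(),  # output pixels scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Discriminator: maps a (real or generated) sample to the probability
# that it came from the real data distribution.
class Discriminator(nn.Module):
    def __init__(self, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # D(x) in (0, 1)
        )

    def forward(self, x):
        return self.net(x)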
2.3. Loss Function
In GANs, x is a data sample drawn from the real data distribution $p_{data}$, the noise vector z is drawn from a
Gaussian prior distribution with zero mean and unit variance $p_z$, and $p_g$ denotes G's distribution over the data x.
The latent vector z is passed to G as input, and G outputs an image G(z) with the aim that D cannot differentiate
between generated samples G(z) and real samples x, i.e., that G(z) resembles the real data as closely as possible.
Simultaneously, D tries to keep itself from being fooled by G. D is a classifier with D(x) = 1 if x $\sim p_{data}$
and D(x) = 0 if x $\sim p_g$, i.e., according to whether x comes from $p_{data}$ or from $p_g$. The following
minimax objective is applied for training the G and D models jointly:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \qquad (1)$$
$V(D, G)$ is a binary cross-entropy function, commonly used in binary classification problems [31]. In Eq. 1,
to update the model parameters, G and D are trained by backpropagating the loss through their
respective models. In practice, Eq. 1 is solved by alternating the following two gradient updates:
$$\theta_D^{t+1} = \theta_D^t + \lambda^t \nabla_{\theta_D} V(D^t, G^t)$$
$$\theta_G^{t+1} = \theta_G^t - \lambda^t \nabla_{\theta_G} V(D^{t+1}, G^t)$$
where $\theta_G$ is the parameter of G, $\theta_D$ is the parameter of D, $\lambda$ is the learning rate, and t is the iteration number.
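As a hedged sketch, these alternating updates can be implemented in PyTorch as follows, reusing the Generator and Discriminator classes sketched in Section 2.2. The optimizer choice, learning rate, batch size, and the placeholder real-data batch are illustrative assumptions; since optimizers minimize, the ascent on V for D is expressed as minimizing the binary cross-entropy of Eq. 1:

latent_dim, batch_size, lr = 100, 64, 2e-4
G, D = Generator(latent_dim), Discriminator()
opt_D = torch.optim.SGD(D.parameters(), lr=lr)
opt_G = torch.optim.SGD(G.parameters(), lr=lr)
bce = nn.BCELoss()  # binary cross-entropy, as in Eq. 1

real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)

for t in range(10000):
    # D update (ascent on V): push D(x) toward 1 and D(G(z)) toward 0.
    x_real = torch.randn(batch_size, 784)  # placeholder for a real data batch
    z = torch.randn(batch_size, latent_dim)
    x_fake = G(z).detach()                 # block gradients into G during the D step
    loss_D = bce(D(x_real), real_labels) + bce(D(x_fake), fake_labels)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # G update (descent on V w.r.t. theta_G): minimize log(1 - D(G(z))),
    # written here as the negative cross-entropy against the fake label.
    z = torch.randn(batch_size, latent_dim)
    loss_G = -bce(D(G(z)), fake_labels)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()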
In practice, the second term in Eq. 1, $\log(1 - D(G(z)))$, saturates and provides insufficient gradient flow through
G, i.e., the gradient values become smaller and G stops learning. To overcome this vanishing gradient problem, the
objective function in Eq. 1 is reframed into two separate objectives:
$$\max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
and
$$\max_G \; \mathbb{E}_{z \sim p_z}[\log D(G(z))] \qquad (2)$$
Moreover, G's gradients for these two separate objectives have the same fixed points and always point in the
same direction, but the reframed objective in Eq. 2 provides much stronger gradients early in training.
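In code, the switch from the saturating generator objective of Eq. 1 to the non-saturating one of Eq. 2 is a one-line change. The sketch below assumes the G, D, bce, and label tensors defined in the training-loop sketch above:

z = torch.randn(batch_size, latent_dim)

# Saturating objective (Eq. 1): minimize log(1 - D(G(z)));
# its gradient vanishes once D confidently rejects G's samples.
loss_G_saturating = -bce(D(G(z)), fake_labels)

# Non-saturating objective (Eq. 2): maximize log D(G(z)), implemented
# as minimizing -log D(G(z)) by labeling the fakes as "real".
loss_G_nonsaturating = bce(D(G(z)), real_labels)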
[Figure 3 components: noise drawn from a 2D Gaussian is fed to the Generator (G); generated samples and real samples are passed to the Discriminator (D), whose real/fake predictions yield the D loss and G loss, and the resulting gradients drive training.]