Image2StyleGAN++: How to Edit the Embedded Images?
Rameen Abdal (KAUST), rameen.abdal@kaust.edu.sa
Yipeng Qin (Cardiff University), qiny16@cardiff.ac.uk
Peter Wonka (KAUST), pwonka@gmail.com
Figure 1: (a) and (b): input images; (c): the “two-face” generated by naively copying the left half from (a) and the right half from (b); (d): the “two-face” generated by our Image2StyleGAN++ framework.
Abstract
We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN [1] in three ways. First, we introduce noise optimization as a complement to the $W^+$ latent space embedding. Our noise optimization can restore high-frequency features in images and thus significantly improves the quality of reconstructed images, e.g. a large increase in PSNR from 20 dB to 45 dB. Second, we extend the global $W^+$ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images. Such edits motivate various high-quality image editing applications, e.g. image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute-level feature transfer. Examples of the edited images are shown throughout the paper for visual inspection.
1. Introduction
Recent GANs [19, 6] demonstrated that synthetic images can be generated with very high quality. This motivates research into embedding algorithms that embed a given photograph into a GAN latent space. Such embedding algorithms can be used to analyze the limitations of GANs [5] and to perform image inpainting [8, 39, 38, 36], local image editing [40, 17], global image transformations such as image morphing and expression transfer [1], and few-shot video generation [35, 34].
In this paper, we propose to extend a very recent embedding algorithm, Image2StyleGAN [1]. In particular, we would like to improve this previous algorithm in three aspects. First, we noticed that the embedding quality can be further improved by including Noise space optimization in the embedding framework. The key insight here is that stable Noise space optimization can only be conducted if the optimization is done sequentially with the $W^+$ space and not jointly. Second, we would like to improve the capabilities of the embedding algorithm to increase the local control over the embedding. One way to improve local control is to include masks with undefined content in the embedding algorithm. The goal of the embedding algorithm should be to find a plausible embedding for everything outside the mask, while filling in reasonable semantic content in the masked pixels. Similarly, we would like to provide the option of approximate embeddings, where the specified pixel colors are only a guide for the embedding. In this way, we aim to achieve high-quality embeddings that can be controlled by user scribbles. In the third technical part of the paper, we investigate the combination of the embedding algorithm and direct manipulations of the activation maps (called activation tensors in our paper).
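The two-stage schedule mentioned above (fit the latent code first, then fit the noise with the latent code frozen) can be sketched on a toy problem. This is only an illustration under strong assumptions: the linear `generate` function, the `basis` matrix, the learning rates, and the step counts are all hypothetical stand-ins and have nothing to do with the actual StyleGAN generator. The toy shows the ordering of the two stages, not why joint optimization is unstable in the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "generator": image = basis @ w + n.
# basis (64 x 8) stands in for the content a latent code can express,
# n (64,) stands in for per-pixel additive noise. Neither is StyleGAN.
basis = rng.standard_normal((64, 8))
target = rng.standard_normal(64)


def generate(w, n):
    return basis @ w + n


def mse(w, n):
    return float(np.mean((generate(w, n) - target) ** 2))


def optimize_sequentially(steps=500, lr_w=0.005, lr_n=0.5):
    w = np.zeros(8)
    n = np.zeros(64)
    # Stage 1: fit the latent code only; the noise stays frozen at zero.
    for _ in range(steps):
        residual = generate(w, n) - target
        w -= lr_w * (basis.T @ residual)  # gradient of 0.5 * ||residual||^2 w.r.t. w
    loss_after_w = mse(w, n)
    # Stage 2: freeze w and fit the noise to the remaining residual.
    for _ in range(steps):
        residual = generate(w, n) - target
        n -= lr_n * residual              # gradient of 0.5 * ||residual||^2 w.r.t. n
    return loss_after_w, mse(w, n)


loss_w_only, loss_w_then_n = optimize_sequentially()
```

In this toy, the latent code alone leaves a residual that eight basis vectors cannot explain, and the second stage lets the noise absorb it, which loosely mirrors the role the text assigns to noise optimization for high-frequency detail.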
Our main contributions are:
1. We propose Noise space optimization to restore the high-frequency features in an image that cannot be reproduced by other latent space optimization of GANs. The resulting images are very faithful reconstructions, with a PSNR of up to 45 dB compared to about 20 dB for the previously best results.
2. We propose an extended embedding algorithm into the $W^+$ space of StyleGAN that allows for local modifications such as missing regions and locally approximate embeddings.
3. We investigate the combination of embedding and activation tensor manipulation to perform high-quality local edits along with global semantic edits on images.
4. We apply our novel framework to multiple image editing and manipulation applications. The results show that the method can be successfully used to develop state-of-the-art image editing software.
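To give a feel for the PSNR figures quoted above, the following sketch uses the standard definition PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the [0, 1] pixel range, and the constant test image are assumptions made for illustration; the source does not state how its PSNR values were computed.

```python
import numpy as np


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


image = np.full((4, 4), 0.5)
# A uniform error of 0.1 gives MSE = 1e-2, i.e. 20 dB.
psnr_coarse = psnr(image, image + 0.1)
# A uniform error of 10^(-45/20), roughly 0.0056, gives 45 dB.
psnr_fine = psnr(image, image + 10 ** (-45 / 20))
```

On a [0, 1] scale, moving from 20 dB to 45 dB therefore corresponds to reducing the root-mean-square pixel error from about 0.1 to about 0.0056, a factor of roughly 18.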
2. Related Work
Generative Adversarial Networks (GANs) [14, 29] are among the most popular generative models and have been successfully applied to many computer vision applications, e.g. object detection [23], texture synthesis [22, 37, 31], image-to-image translation [16, 42, 28, 25], and video generation [33, 32, 35, 34]. Backing these applications are the massive improvements on GANs in terms of architecture [19, 6, 28, 16], loss function design [26, 2], and regularization [27, 15]. On the bright side, such improvements significantly boost the quality of the synthesized images. To date, the two highest-quality GANs are StyleGAN [19] and BigGAN [6]. Between them, StyleGAN produces excellent results for unconditional image synthesis tasks, especially on face images; BigGAN produces the best results for conditional image synthesis tasks (e.g. ImageNet [9]). On the dark side, these improvements make the training of GANs so expensive that nowadays it is almost a privilege of wealthy institutions to compete for the best performance. As a result, methods built on pre-trained generators have recently started to attract attention. In the following, we discuss previous work on two such approaches: embedding images into a GAN latent space and the manipulation of GAN activation tensors.
Latent Space Embedding. The embedding of an image into the latent space is a longstanding topic in both machine learning and computer vision. In general, the embedding can be implemented in two ways: i) passing the input image through an encoder neural network (e.g. the Variational Auto-Encoder [21]); ii) optimizing a random initial latent code to match the input image [41, 7]. Between them, the first approach dominated for a long time. Although it has an inherent problem generalizing beyond the training dataset, it produces higher-quality results than the naive latent code optimization methods [41, 7]. Recently, however, Abdal et al. [1] obtained excellent embedding results by optimizing the latent codes in an enhanced $W^+$ latent space instead of the initial $Z$ latent space. Their method suggests a new direction for various image editing applications and makes the second approach interesting again.
Activation Tensor Manipulation. With fixed neural network weights, the expressive power of a generator can be fully utilized by manipulating its activation tensors. Based on this observation, Bau et al. investigated what a GAN can and cannot generate by locating and manipulating relevant neurons in the activation tensors [4, 5]. Building on the understanding of how an object is “drawn” by the generator, they further designed a semantic image editing system that can add, remove, or change the appearance of an object in an input image [3]. Concurrently, Frühstück et al. [11] investigated the potential of activation tensor manipulation in image blending. Observing that boundary artifacts can be eliminated by cropping and combining activation tensors at early layers of a generator, they proposed an algorithm to create large-scale texture maps of hundreds of megapixels by combining outputs of GANs trained on a lower resolution.
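The idea of cropping and combining activation tensors can be sketched as a spatial blend of two feature maps. The function `combine_activations`, the tensor shape (512 channels at 8 x 8), and the left/right split are hypothetical choices for illustration, and the random arrays merely stand in for real generator activations. This is not the exact operation used in the cited works.

```python
import numpy as np


def combine_activations(act_a, act_b, mask):
    """Blend two activation tensors of shape (C, H, W) with a spatial mask (H, W).

    mask == 1 keeps act_a at that location, mask == 0 keeps act_b.
    """
    if act_a.shape != act_b.shape or mask.shape != act_a.shape[1:]:
        raise ValueError("shape mismatch between activations and mask")
    return mask[None, :, :] * act_a + (1.0 - mask[None, :, :]) * act_b


# Hypothetical early-layer activations: 512 channels at 8 x 8 resolution.
rng = np.random.default_rng(0)
act_a = rng.standard_normal((512, 8, 8))
act_b = rng.standard_normal((512, 8, 8))

left_half = np.zeros((8, 8))
left_half[:, :4] = 1.0  # take the left half from A and the right half from B

combined = combine_activations(act_a, act_b, left_half)
```

In a real pipeline the combined tensor would be fed through the remaining generator layers, so the seam is resolved by the network rather than by pixel-level copying.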
3. Overview
Our paper is structured as follows. First, we describe an extended version of the Image2StyleGAN [1] embedding algorithm (see Sec. 4). We propose two novel modifications: 1) To enable local edits, we integrate various spatial masks into the optimization framework. Spatial masks enable embeddings of incomplete images with missing values and embeddings of images with approximate color values such as user scribbles. In addition to spatial masks, we explore layer masks that restrict the embedding to a set of selected layers. The early layers of StyleGAN [19] encode content and the later layers control the style of the image. By restricting embeddings to a subset of layers, we can better control what attributes of a given image are extracted.
2) To further improve the embedding quality, we optimize for an additional group of variables $n$ that control additive noise maps. These noise maps encode high-frequency details and enable embedding with very high reconstruction quality.
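The spatial and layer masks described in the first modification can be sketched as follows. Both helper functions (`masked_mse`, `layer_masked_step`) are hypothetical, a plain squared error stands in for whatever loss the embedding actually uses, and the 18 x 512 latent shape and the choice of the first nine layers are assumptions for illustration only.

```python
import numpy as np


def masked_mse(generated, target, keep_mask):
    """Mean squared error over pixels where keep_mask == 1.

    generated, target: (H, W, 3) images; keep_mask: (H, W), 0 marks undefined pixels.
    Assumes at least one pixel is kept.
    """
    weights = keep_mask[:, :, None]
    squared_error = weights * (generated - target) ** 2
    return float(squared_error.sum() / (weights.sum() * generated.shape[2]))


def layer_masked_step(w_plus, grad, layer_mask, lr=0.01):
    """Gradient step on a (num_layers, dim) latent code, updating selected layers only."""
    return w_plus - lr * layer_mask[:, None] * grad


target = np.zeros((4, 4, 3))
generated = np.zeros((4, 4, 3))
generated[:, 2:, :] = 1.0        # the generator disagrees on the right half
keep_mask = np.ones((4, 4))
keep_mask[:, 2:] = 0.0           # but the right half is marked as undefined

loss_masked = masked_mse(generated, target, keep_mask)
loss_full = masked_mse(generated, target, np.ones((4, 4)))

# Assumed latent shape: 18 layers x 512 dimensions.
w_plus = np.zeros((18, 512))
grad = np.ones((18, 512))
layer_mask = np.zeros(18)
layer_mask[:9] = 1.0             # hypothetical choice: only the first nine layers may change
w_next = layer_masked_step(w_plus, grad, layer_mask)
```

With the mask in place the disagreement in the undefined region costs nothing, so an optimizer is free to fill that region with whatever the generator finds plausible; the layer mask likewise leaves the unselected layers of the latent code untouched.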
Second, we explore multiple operations to directly manipulate activation tensors (see Sec. 5). We mainly explore