没有合适的资源?快使用搜索试试~ 我知道了~
Li-StoryGAN-A-Sequential-Conditional-GAN-for-Story-Visualization
0 下载量 106 浏览量
2023-12-25
16:17:18
上传
评论
收藏 951KB PDF 举报
温馨提示
试读
10页
Li-StoryGAN-A-Sequential-Conditional-GAN-for-Story-Visualization
资源推荐
资源详情
资源评论
StoryGAN: A Sequential Conditional GAN for Story Visualization
Yitong Li
∗ 1
, Zhe Gan
2
, Yelong Shen
4
, Jingjing Liu
2
, Yu Cheng
2
, Yuexin Wu
5
,
Lawrence Carin
1
, David Carlson
1
and Jianfeng Gao
3
1
Duke University,
2
Microsoft Dynamics 365 AI Research,
3
Microsoft Research
4
Tencent AI Research,
5
Carnegie Mellon University
Abstract
In this work, we propose a new task called Story Vi-
sualization. Given a multi-sentence paragraph, the story
is visualized by generating a sequence of images, one for
each sentence. In contrast to video generation, story vi-
sualization focuses less on the continuity in generated im-
ages (frames), but more on the global consistency across dy-
namic scenes and characters – a challenge that has not been
addressed by any single-image or video generation meth-
ods. Therefore, we propose a new story-to-image-sequence
generation model, StoryGAN, based on the sequential con-
ditional GAN framework. Our model is unique in that it
consists of a deep Context Encoder that dynamically tracks
the story flow, and two discriminators at the story and im-
age levels, to enhance the image quality and the consistency
of the generated sequences. To evaluate the model, we mod-
ified existing datasets to create the CLEVR-SV and Pororo-
SV datasets. Empirically, StoryGAN outperformed state-
of-the-art models in image quality, contextual consistency
metrics, and human evaluation.
1. Introduction
Learning to generate meaningful and coherent sequences
of images from a natural language story is a challenging
task that requires understanding and reasoning on both nat-
ural language and images. In this work, we propose a new
Story Visualization task. Specifically, the goal is to generate
a sequence of images to describe a story written in a multi-
sentence paragraph, as shown in Figure
1.
There are two main challenges in this task. First, the se-
quence of images must consistently and coherently depict
the whole story. This task is highly related to text-to-image
generation [
35, 28, 17, 36, 34], where an image is generated
∗
This work was done while the first author was an intern at Microsoft
Dynamics 365 AI Research.
Pororo and Crong are
fishing together.
Crong is looking at
the bucket.
Pororo has a fish on
his fishing rod.
Real/Fake? Coherent to text? Consistent?
“Pororo and Crong are
fishing together. Crong
is looking at the
bucket.
Pororo has a fish on his
fishing rod.”
“Pororo
and Crong are
fishing together. Crong
is looking at the bucket.
Pororo has a fish on his
fishing rod.
”
“Pororo
and Crong are
fishing together. Crong
is looking at the bucket.
Pororo has a fish on his
fishing rod.”
Image
Generator
Image
Generator
Image
Generator
+
+ +
Figure 1: The input story is “Pororo and Crong are fishing to-
gether. Crong is looking at the bucket. Pororo has a fish on his
fishing rod.” Each sentence is visualized with one image. In this
work, the image generation for each sentence is enriched with con-
textual information from the Context Encoder. Two discriminators
at different levels guide the generation process.
based on a short description. However, by sequentially ap-
plying text-to-image methods to a story will not generate a
coherent image sequence, failing on the story visualization
task. For instance, consider the story “A red metallic cylin-
der cube is at the center. Then add a green rubber cube at
the right.” The second sentence alone does not capture the
entire scene.
The second challenge is how to display the logic of the
storyline. Specifically, the appearance of objects and the
layout in the background must evolve in a coherent way
as the story progresses. This is similar to video genera-
tion. However, story visualization and video generation dif-
fer as: (i) Video clips are continuous with smooth motion
transitions, so video generation models focus on extract-
ing dynamic features to maintain realistic motions [
32, 31].
1
6329
资源评论
全是头发的羊羊羊
- 粉丝: 302
- 资源: 14
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功