StoryGAN: A Sequential Conditional GAN for Story Visualization

Yitong Li∗, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson and Jianfeng Gao

Duke University, Microsoft Dynamics 365 AI Research, Microsoft Research, Tencent AI Research, Carnegie Mellon University

Abstract

In this work, we propose a new task called Story Visualization. Given a multi-sentence paragraph, the story is visualized by generating a sequence of images, one for each sentence. In contrast to video generation, story visualization focuses less on the continuity in generated images (frames), but more on the global consistency across dynamic scenes and characters, a challenge that has not been addressed by any single-image or video generation methods. Therefore, we propose a new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework. Our model is unique in that it consists of a deep Context Encoder that dynamically tracks the story flow, and two discriminators at the story and image levels, to enhance the image quality and the consistency of the generated sequences. To evaluate the model, we modified existing datasets to create the CLEVR-SV and Pororo-SV datasets. Empirically, StoryGAN outperformed state-of-the-art models in image quality, contextual consistency metrics, and human evaluation.
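The data flow described above (one image per sentence, a running context, and checks at both the image and the story level) can be illustrated with a toy sketch. It is not the StoryGAN implementation: the names `split_sentences`, `visualize_story`, `image_critic`, and `story_critic` are hypothetical, images are plain strings, the context is simply the list of sentences seen so far rather than a learned encoder state, and the critics are hand-written rules rather than trained discriminators.

```python
from typing import List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter (assumption: every sentence ends with a period)."""
    return [s.strip() + "." for s in paragraph.split(".") if s.strip()]


def visualize_story(paragraph: str) -> List[str]:
    """Produce one placeholder 'image' per sentence, conditioned on a running context."""
    context: List[str] = []  # stand-in for the Context Encoder state
    images: List[str] = []
    for sentence in split_sentences(paragraph):
        context.append(sentence)  # the context is updated before each generation step
        images.append(f"image[{sentence} | context={len(context)} sentence(s)]")
    return images


def image_critic(image: str, sentence: str) -> bool:
    """Image-level check: does a single image match its own sentence?"""
    return sentence in image


def story_critic(images: List[str], sentences: List[str]) -> bool:
    """Story-level check: does the whole sequence line up with the whole story?"""
    return len(images) == len(sentences) and all(
        image_critic(img, s) for img, s in zip(images, sentences)
    )


story = (
    "Pororo and Crong are fishing together. Crong is looking at the bucket. "
    "Pororo has a fish on his fishing rod."
)
sentences = split_sentences(story)
images = visualize_story(story)
print(len(images))
print(story_critic(images, sentences))
```

The point of the sketch is only the separation of concerns: the image-level check looks at one (image, sentence) pair at a time, while the story-level check receives the full sequence and can therefore reject outputs that are individually plausible but misaligned as a whole.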

1. Introduction

Learning to generate meaningful and coherent sequences of images from a natural language story is a challenging task that requires understanding and reasoning on both natural language and images. In this work, we propose a new Story Visualization task. Specifically, the goal is to generate a sequence of images to describe a story written in a multi-sentence paragraph, as shown in Figure 1.

Figure 1: The input story is “Pororo and Crong are fishing together. Crong is looking at the bucket. Pororo has a fish on his fishing rod.” Each sentence is visualized with one image. In this work, the image generation for each sentence is enriched with contextual information from the Context Encoder. Two discriminators at different levels guide the generation process.

∗ This work was done while the first author was an intern at Microsoft Dynamics 365 AI Research.

There are two main challenges in this task. First, the sequence of images must consistently and coherently depict the whole story. This task is highly related to text-to-image generation [35, 28, 17, 36, 34], where an image is generated based on a short description. However, sequentially applying text-to-image methods to a story will not generate a coherent image sequence, and so fails on the story visualization task. For instance, consider the story “A red metallic cylinder cube is at the center. Then add a green rubber cube at the right.” The second sentence alone does not capture the entire scene.
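The two-sentence example can be made concrete with a small sketch. It is a toy illustration under stated assumptions, not a generation model: each sentence is reduced by hand to the set of objects it mentions, a "frame" is just the set of objects that would be drawn, and the function names `render_independently` and `render_with_context` are hypothetical.

```python
from typing import List, Set

# Each sentence is reduced by hand to the objects it mentions (an assumption;
# a real system would have to infer this from the text).
story: List[Set[str]] = [
    {"red metallic object at the center"},
    {"green rubber cube at the right"},
]


def render_independently(sentences: List[Set[str]]) -> List[Set[str]]:
    """Per-sentence text-to-image: each frame sees only its own sentence."""
    return [set(mentioned) for mentioned in sentences]


def render_with_context(sentences: List[Set[str]]) -> List[Set[str]]:
    """Each frame sees its own sentence plus everything accumulated so far."""
    scene: Set[str] = set()
    frames: List[Set[str]] = []
    for mentioned in sentences:
        scene |= mentioned         # carry earlier objects forward
        frames.append(set(scene))  # copy so later updates do not alter this frame
    return frames


independent = render_independently(story)
contextual = render_with_context(story)
print(sorted(independent[1]))
print(sorted(contextual[1]))
```

In the independent version the second frame contains only the green cube, so the object introduced by the first sentence disappears; carrying a running scene forward keeps both objects in the second frame.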

The second challenge is how to display the logic of the storyline. Specifically, the appearance of objects and the layout in the background must evolve in a coherent way as the story progresses. This is similar to video generation. However, story visualization and video generation differ as follows: (i) Video clips are continuous with smooth motion transitions, so video generation models focus on extracting dynamic features to maintain realistic motions [32, 31].
