Semi-Automatic 2D-to-3D Conversion Using Disparity Propagation
Xun Cao, Student Member, IEEE, Zheng Li, and Qionghai Dai, Senior Member, IEEE
Abstract—Estimating 3D information from an image sequence has long been a challenging problem, especially for dynamic scenes. In this paper, a novel semi-automatic 2D-to-3D conversion method is presented that estimates disparity maps for regular 2D video shots. Our method requires only a few user scribbles on very sparse key frames; the remaining frames of the video are then converted to 3D automatically. Multiple objects are first segmented according to the input user scribbles. An initial disparity map is then assigned to each key frame with the aid of preset disparity models for each object. After this disparity assignment step, disparity maps for the other frames of the video are obtained through a disparity propagation strategy that takes both color similarity and motion information into account. Finally, the 3D video is synthesized according to the type of 3D display device. Our method is verified on several kinds of challenging sequences containing occlusion, textureless regions, color ambiguity, large displacement movements, etc. The experimental results show that our method outperforms state-of-the-art 2D-to-3D conversion systems.

Index Terms—Broadcasting, multimedia system, three-dimensional vision.

I. INTRODUCTION
3D can be regarded as the next revolution for many applications such as television, movies, and video games. The smash-hit movie “Avatar” demonstrated the great success of 3D and announced the approach of the 3D era. However, the tremendous production cost and the complicated 3D generation process reveal another fact: despite the remarkable development of stereoscopic display technologies and 3D display devices (e.g., stereo projectors, auto-stereoscopic displays, and holographic displays), there is still little 3D content to be played on these systems. The lack of 3D content is becoming a severe bottleneck for the entire 3D industry (see Fig. 1).

The purpose of 2D-to-3D conversion techniques is to estimate 3D information from monocular video shots, which is useful for converting conventional 2D video into 3D content [1]. An effective and efficient 2D-to-3D conversion technique can lower both the cost and the time of 3D content creation. Moreover, 2D-to-3D conversion can make full use of the
vast amount of old material generated many years ago, which is almost impossible to re-capture in 3D. As a result, 2D-to-3D conversion techniques can greatly alleviate the serious shortage of 3D content. Generally, current 2D-to-3D conversion algorithms can be divided into two categories: 1) semi-automatic methods with human-computer interactive operations [2]–[6], and 2) fully-automatic methods that directly output 3D video from 2D input without any user interaction [7]–[9]. Semi-automatic 2D-to-3D conversion unsurprisingly performs better than fully-automatic methods because of the high-level knowledge provided by users. However, the study of automatic 2D-to-3D conversion methods is still necessary, because human participation is impractical in many scenarios.

In most semi-automatic 2D-to-3D conversion frameworks [4]–[6], certain frames (key frames) of the video sequence are annotated with 3D information (e.g., depth or disparity) by users, and the other frames (non-key frames) are converted to 3D automatically. This framework is feasible because most frames within a single video shot resemble one another, so strong inter-frame correlation can be exploited. Some other works approach 2D-to-3D conversion from a machine-learning perspective [6], [10]: they first train on user-annotated pixels and then infer the 3D information of the remaining pixels from that training; a toy sketch of this idea is given below. These methods, which likewise distinguish key frames from non-key frames, are also semi-automatic.
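
As a concrete illustration of this learning-based inference pattern, the following Python sketch fits a k-nearest-neighbor regressor on scribble-annotated pixels and predicts depth for every remaining pixel. The feature design (normalized position plus color), the choice of k-NN, and all function names are illustrative assumptions for exposition only, not the actual algorithms of [6], [10].

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def infer_depth_from_scribbles(rgb, scribble_mask, scribble_depth, k=5):
        # rgb: HxWx3 uint8 frame; scribble_mask: boolean HxW array marking
        # user-annotated pixels; scribble_depth: HxW array whose values are
        # meaningful only where scribble_mask is True.
        h, w, _ = rgb.shape
        ys, xs = np.mgrid[0:h, 0:w]
        feats = np.stack([xs / w, ys / h,            # normalized position
                          rgb[..., 0] / 255.0,       # color channels
                          rgb[..., 1] / 255.0,
                          rgb[..., 2] / 255.0], axis=-1).reshape(-1, 5)
        labeled = scribble_mask.reshape(-1)
        model = KNeighborsRegressor(n_neighbors=k)
        model.fit(feats[labeled], scribble_depth.reshape(-1)[labeled])
        # Predict a dense depth map for every pixel, annotated or not.
        return model.predict(feats).reshape(h, w)

In practice, the cited methods use far richer features and learners; the point here is only the train-on-scribbles, infer-everywhere-else pattern.
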
Nevertheless, a tradeoff must be faced between labor cost and 3D conversion quality. Better 3D effects can be expected if more time is spent annotating key frames; the extreme case is manual conversion of every frame in the video, which would be extraordinarily time-consuming and almost impossible in practice. Therefore, recent efforts have been made toward 3D conversion with just a few user scribbles [4]. These methods greatly facilitate user operation, but the conversion results still leave much room for improvement.
In this paper, we propose a convenient semi-automatic 2D-to-3D conversion scheme that requires only a few user strokes while maintaining high accuracy in the 3D effects. A dense disparity map is first generated on each key frame with the aid of the strokes; this involves two steps: multiple-object segmentation and disparity assignment. Then, the dense disparity maps are propagated from the key frames to the non-key frames; a minimal sketch of these two stages is given below.
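
The following NumPy sketch illustrates the shape of this pipeline under heavy simplifying assumptions: per-object disparity assignment on a key frame from preset models, followed by one step of propagation combining a motion cue (forward warping along optical flow) with a color-similarity cue. All names, the linear-ramp preset, the median fallback, and the flow convention are hypothetical stand-ins for the components developed in the remainder of the paper.

    import numpy as np

    def ramp_model(shape, d_top, d_bottom):
        # One hypothetical preset disparity model: disparity varies
        # linearly from the top to the bottom of the frame (a common
        # "ground plane" preset; the paper's actual model set differs).
        h, w = shape
        return np.tile(np.linspace(d_top, d_bottom, h,
                                   dtype=np.float32)[:, None], (1, w))

    def assign_key_frame_disparity(masks, models):
        # Compose per-object preset models into one dense key-frame map.
        # masks: boolean HxW arrays from scribble-based segmentation.
        disp = np.zeros(masks[0].shape, dtype=np.float32)
        for mask, model in zip(masks, models):
            disp[mask] = model[mask]
        return disp

    def propagate_disparity(disp_prev, rgb_prev, rgb_cur, flow, sigma_c=12.0):
        # Carry the disparity one frame forward. Motion cue: forward-warp
        # along optical flow, where flow[..., 0] is assumed to hold the
        # x-displacement and flow[..., 1] the y-displacement. Color cue:
        # distrust pixels whose color changes strongly after warping
        # (occlusion or flow error) and let them fall back to a median.
        h, w = disp_prev.shape
        ys, xs = np.mgrid[0:h, 0:w]
        xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
        yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
        fallback = np.float32(np.median(disp_prev))
        disp_cur = np.full((h, w), fallback, dtype=np.float32)
        disp_cur[yt, xt] = disp_prev
        err = np.linalg.norm(rgb_cur[yt, xt].astype(np.float32)
                             - rgb_prev.astype(np.float32), axis=-1)
        ok = np.zeros((h, w), dtype=bool)
        ok[yt, xt] = err < 3.0 * sigma_c
        disp_cur[~ok] = fallback
        return disp_cur

A full propagator would also run backward from the surrounding key frames and refine object boundaries; this sketch moves a single frame forward only.
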
Although a similar disparity propagation strategy is presented in [5], many problems remain, such as occlusion, large displacement movements, color ambiguity, textureless regions, blurred edges, and camera zoom in/out. In the following, we detail the proposed 2D-to-3D scheme and show how to tackle the aforementioned issues. The most important contributions of this paper are: 1) a convenient multiple-object segmentation tool called “multi-snapping,” together with disparity assignment to