algorithms for multiview depth video coding by imbalanced bit rate allocation for different regions [18]. Zhang and Shao respectively proposed depth video coding algorithms based on distortion analyses [19,20].
Accurate and consistent depth video is the foundation of high compression performance in depth video coding. In general, depth acquisition methods include depth extraction from computer graphics content [21], depth from structured light [22], Kinect sensors [23], depth camera systems [24,25] and depth estimation software [26]. Limited by the structured-light principle of Kinect sensors, depth images suffer from temporal flickering, noise, holes and inconsistent edges between depth and color images [23]. In depth camera systems, depth video is captured based on the time-of-flight principle [24,25]. The captured depth maps may be inconsistent with the scene because of ambient light noise, motion artifacts, specular reflections, and so on. In addition, depth cameras are too expensive to be used on a large scale. So far, depth estimation software is the main alternative method of depth map acquisition [26]. However, depth video obtained by depth estimation software usually contains discrete and jagged noise. Hence, such depth video is inaccurate and inconsistent, and its weak temporal and spatial correlation decreases compression performance.
In order to improve encoding and rendering performance, many
depth video processing algorithms [27–34] have been proposed.
Mueller et al. produced accurate depth video for artifact-free vir-
tual view synthesis by combining hybrid recursive matching with
motion estimation, cross-bilateral post-processing and mutual
depth map fusion [27]. Min et al. presented a weighted mode filtering method that enhances temporal consistency and addresses the flickering problem in the virtual view [28]. Nguyen et al. suppressed coding artifacts over object boundaries using weighted mode filtering
[29]. Ekmekcioglu et al. proposed a content adaptive enhancement
method based on median filtering to enforce the coherence of
depth maps across the spatial, temporal and inter-view dimensions
[30]. Kim et al. presented a series of processing steps to solve the
critical problems of depth video captured by depth camera [31].
One of these processing steps is the enhancement of temporal con-
sistency by an algorithm based on motion estimation. Zhao et al.
proposed a depth no-synthesis-error (D-NOSE) model and pre-
sented a related smoothing scheme for depth video coding [32].
Fu et al. proposed a temporal enhancement algorithm for depth
video by utilizing adaptive temporal filtering [33]. In our previous
work, depth video was enhanced by temporal pixel classification
and smoothing [34].
However, the depth video processing algorithms in [27–34] do not consider the perceptual characteristics of the human visual system (HVS) and still leave room for improvement. Zhao et al. proposed a binocular
just-noticeable-difference model to measure the perceptible
distortion of binocular vision for stereoscopic images [35]. Silva
et al. experimentally derived a just noticeable depth difference
(JNDD) model [36] and applied it to depth video preprocessing
[37]. Jung proposed a modified JNDD model that considers size
consistency, and then used it in depth sensation enhancement
[38]. These JNDD models are built through subjective tests on stereoscopic displays. Hence, they are display dependent and unsuitable for estimating the depth distortion range in virtual view rendering. In this study, we propose a just noticeable rendering distortion (JNRD) model and apply it to spatial and temporal correlation enhancement. Different from other distortion models [35,36,38], the JNRD model is built on the perceptual distortion of the virtual view.
Firstly, the JNRD model is formulated by combining the geometric displacement in DIBR with a just noticeable distortion (JND) model that reflects human visual perception. Then, the spatial and temporal correlation of the depth video is enhanced using the JNRD model. Finally, the proposed algorithm is evaluated from three aspects: compression ratio, and objective and subjective quality of the virtual view. The experimental results show that the compression performance of the processed depth video is improved in comparison with both the original depth video and depth video processed by other algorithms, while the proposed algorithm maintains the quality of the virtual view.
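For intuition only, the geometric relation linking a depth error to its rendering displacement can be sketched as follows. The sketch assumes the common 1-D parallel camera arrangement with 8-bit depth, where a depth-level change $\Delta d$ shifts a pixel horizontally by $(f\,l/255)(1/Z_{near} - 1/Z_{far})\,\Delta d$; all function and parameter names are illustrative rather than the paper's notation, and the actual JNRD derivation appears in Section 3.

```python
def displacement_per_depth_level(focal_len, baseline, z_near, z_far):
    """Horizontal shift (in pixels) of a rendered pixel per one-level
    change of its 8-bit depth value, for a 1-D parallel camera setup."""
    return (focal_len * baseline / 255.0) * (1.0 / z_near - 1.0 / z_far)

def jnrd_depth_tolerance(jnd_pixels, focal_len, baseline, z_near, z_far):
    """Largest depth-level change whose rendering displacement stays
    within a just noticeable threshold expressed in pixels."""
    return jnd_pixels / displacement_per_depth_level(
        focal_len, baseline, z_near, z_far)

# Hypothetical usage: the depth tolerance for a 1-pixel JND.
# tolerance = jnrd_depth_tolerance(1.0, focal_len=1000.0,
#                                  baseline=5.0, z_near=40.0, z_far=120.0)
```

The key point this sketch captures is that depth values may be modified freely within the tolerance without producing a noticeable change in the rendered virtual view.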
The rest of this paper is organized as follows. Section 2 describes the spatial and temporal correlation problems of depth video and presents the overall block diagram of the proposed algorithm. Section 3 describes the JNRD model. Sections 4 and 5 detail the spatial and temporal enhancement algorithms, respectively. Experimental results are given in Section 6. Finally, conclusions are drawn in Section 7.
2. Proposed depth video correlation enhancement algorithm
2.1. Problem description
Depth video is inconsistent along the spatial and temporal directions because of the limitations of depth video capture technologies. Fig. 1 shows the frames and the frame differences of the color video and the corresponding depth video of the 'Leave Laptop' sequence.
The frame $S_i^{T_j}$ denotes the frame in the $i$th view at the $j$th time instant of the video sequence. Fig. 1(a), (b), (d), and (e) are the frames $S_{10}^{T_{35}}$ and $S_{10}^{T_{36}}$ of the color video and the associated depth frames; Fig. 1(c) and (f) are the texture and depth frame difference images between frames $S_{10}^{T_{35}}$ and $S_{10}^{T_{36}}$, where black means a larger difference. The scene in the red rectangular region
of the color video lies nearly in the same imaging plane, so the depth values in that region should be nearly identical. However, the depth values there are not consistent with the corresponding color video. This depth inconsistency decreases the spatial correlation of the depth video.
Depth video inconsistency also decreases temporal correlation. In the scene in Fig. 1, only the men and the chair move slightly. Hence, nearly the entire frame difference image of the color video, with the exception of the men and the chair, is bright, which represents content consistency along the temporal direction. In contrast, some areas in the frame difference image of the depth video, e.g., the blue rectangular region in Fig. 1(f), are dark, which indicates temporal inconsistency.
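This observation can be reproduced with a plain frame-difference check. The sketch below is a minimal illustration, assuming 8-bit grayscale NumPy arrays; it inverts the result so that, as in Fig. 1(c) and (f), black indicates a larger difference.

```python
import numpy as np

def frame_difference_image(frame_a, frame_b):
    """Absolute difference of two 8-bit frames, inverted so that black
    indicates a larger difference (matching Fig. 1(c) and (f))."""
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return (255 - diff).astype(np.uint8)

# For a static region, the color difference image is near-white; dark
# patches in the depth difference image with no corresponding color
# motion expose the temporal inconsistency of the depth video.
```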
Consequently, depth video inconsistency ultimately degrades encoding performance, because spatial and temporal correlation is the theoretical basis of the high compression efficiency of video signals.
2.2. Proposed depth video correlation enhancement algorithm
To improve the compression performance of depth video, a new
spatial and temporal correlation enhancement algorithm is
proposed in this paper. Fig. 2 shows the block diagram of the proposed algorithm, which includes three parts: JNRD model building, depth video spatial correlation enhancement, and depth video temporal correlation enhancement. The JNRD model is the basis for the depth video spatial and temporal correlation enhancement. In
Fig. 2, $G$ is the JNRD of the corresponding depth video; $R$, $D$ and $D''$ are the color video, the original depth video and the processed depth video, respectively; $E$, $F$ and $B$ are the edge, foreground and background regions of the depth video, respectively; and $D'$, $D'_1$ and $D'_2$ are intermediate processing results of the depth video.
In the proposed algorithm, the JNRD model of the depth video is built first. Then, the depth video is processed by spatial correlation enhancement followed by temporal correlation enhancement. JNRD model building and the spatial and temporal correlation enhancements are detailed in Sections 3, 4 and 5, respectively.
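For orientation, the overall chain of Fig. 2 can be summarized by the skeleton below; the three callables stand in for the methods of Sections 3-5 and are placeholders, not the paper's implementation.

```python
def enhance_depth_video(color_video, depth_video, camera_params,
                        build_jnrd_model, enhance_spatial, enhance_temporal):
    """Skeleton of the processing chain in Fig. 2 (placeholder callables)."""
    # Section 3: build the JNRD model G from the color/depth pair.
    G = build_jnrd_model(color_video, depth_video, camera_params)
    # Section 4: spatial correlation enhancement (operating on the edge,
    # foreground and background regions E, F, B), yielding D'.
    d_spatial = enhance_spatial(depth_video, G)
    # Section 5: temporal correlation enhancement, yielding D''.
    return enhance_temporal(d_spatial, G)
```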