JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Texture-enhanced Light Field Super-resolution
with Spatio-Angular Decomposition Kernels
Zexi Hu, Xiaoming Chen, Henry Wing Fung Yeung, Yuk Ying Chung, Member, IEEE
and Zhibo Chen, Senior Member, IEEE
Abstract—Despite the recent progress in light field super-
resolution (LFSR) achieved by convolutional neural networks,
the correlation information of light field (LF) images has not
been sufficiently studied and exploited due to the complexity
of 4D LF data. To cope with such high-dimensional LF data,
most of the existing LFSR methods resorted to decomposing it
into lower dimensions and subsequently performing optimization
on the decomposed sub-spaces. However, these methods are
inherently limited: they neglect the characteristics of the
decomposition operations and utilize only a limited set of LF
sub-spaces, failing to comprehensively extract spatio-angular
features and thus hitting a performance bottleneck. To
overcome these limitations, in this paper, we thoroughly explore
the potential of LF decomposition and propose the novel concept
of decomposition kernels. In particular, we systematically unify
the decomposition operations of various sub-spaces into a series
of such decomposition kernels, which are incorporated into our
proposed Decomposition Kernel Network (DKNet) for compre-
hensive spatio-angular feature extraction. The proposed DKNet
is experimentally verified to achieve substantial improvements
of 1.35 dB, 0.83 dB, and 1.80 dB PSNR at 2×, 3×, and 4×
LFSR scales, respectively, compared with the state-of-the-art
methods. To further improve DKNet toward more visually
pleasing LFSR results, we propose an LFVGG loss, based on
the VGG network, to guide the Texture-Enhanced DKNet
(TE-DKNet) to generate rich, authentic textures and significantly
enhance the visual quality of LF images. We also propose an indirect
evaluation metric by taking advantage of LF material recognition
to objectively assess the perceptual enhancement brought by the
LFVGG loss.
Index Terms—Light field, image processing, deep learning,
convolutional neural network.
I. INTRODUCTION
Compared with regular images captured by monocular cam-
eras, light field (LF) images can supply richer information
with light rays from multiple angular directions in one single
capture. Such a characteristic has facilitated several vision-
based measurement applications, e.g. material recognition [1],
[2], 3D measurement [3]–[7], salient object detection under
complex scenarios [8], [9] and anti-spoof face recognition
[10]–[13], which have achieved considerable improvements
compared with other types of sensors, e.g. monocular cameras [14],
stereo vision [15] and structured light [16].
Zexi Hu, Henry Wing Fung Yeung and Yuk Ying Chung are with the School
of Computer Science, University of Sydney, Darlington, NSW 2008, Australia.
Xiaoming Chen is with the School of Computer Science and Engineering,
Beijing Technology and Business University, Beijing 102488, China.
Zhibo Chen is with CAS Key Laboratory of Technology in Geo-spatial
Information Processing and Application System, University of Science and
Technology of China, Hefei 230027, China.
In the past, LF images were usually captured by self-built
dense camera arrays [17], [18], which are experimental and
expensive for general consumers. With the recent development
of more sophisticated LF cameras, e.g. Raytrix [19], Lytro
Illum [20] and Google’s Light Field VR Camera [21], LF
devices have become increasingly practical in both commercial
and industrial usage. However, LF cameras, especially portable
LF cameras, usually face a trade-off between the angular
and spatial resolutions due to the inherent limitation on the
camera’s sensor capability [22]. Hence the spatial resolution of
the images captured from the LF cameras is usually lower than
those from the traditional cameras. For instance, a Lytro Illum
camera could capture 14 × 14 sub-aperture images (SAIs) or
views, i.e. the angular resolution, but each SAI has a low
spatial resolution of only 376 × 540.
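The trade-off can be made concrete with back-of-the-envelope arithmetic on the figures above: the pixel budget is split between views and per-view resolution, so their product approximates the sensor's total sample count (a rough illustration, not an official Lytro specification):

```python
# Angular/spatial trade-off for a Lytro Illum-style capture,
# using the numbers quoted in the text.
views = 14 * 14              # angular resolution: 14 x 14 SAIs
pixels_per_view = 376 * 540  # spatial resolution of each SAI
total_samples = views * pixels_per_view
print(views, pixels_per_view, total_samples)  # 196 203040 39795840
```

Roughly 39.8 million samples in total: raising the spatial resolution of each view would require lowering the number of views, and vice versa.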
To alleviate this problem, a considerable number of works
have developed light field super-resolution (LFSR) solutions to
increase the spatial resolution of LF images. With convolutional
neural networks (CNN), the recent learning-based methods
[23]–[26] have achieved substantial progress compared with
traditional methods [27]–[29]. The vast majority of these
methods are designed to decompose the 4D data structure into
sub-spaces of two or three dimensions, which can be optimized
with simpler operations, e.g. regular 2D and 3D convolutions.
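The decomposition idea can be sketched with a NumPy reshaping example. Assuming an LF tensor indexed as `lf[u, v, x, y]` (angular coordinates `u, v`; spatial coordinates `x, y`; sizes here are illustrative, not any method's actual configuration), the spatial and angular sub-spaces are obtained by folding the remaining two dimensions into a batch axis, after which regular 2D convolutions apply:

```python
import numpy as np

# Hypothetical 4D light field: U x V angular views, each X x Y pixels.
U, V, X, Y = 7, 7, 32, 48
lf = np.random.rand(U, V, X, Y)

# Spatial sub-space: treat each of the U*V sub-aperture images as an
# independent 2D map, so a regular 2D convolution can process it.
spatial = lf.reshape(U * V, X, Y)

# Angular sub-space: fix a pixel location and gather its value across
# all views, giving a U x V "macro-pixel" per spatial position.
angular = lf.transpose(2, 3, 0, 1).reshape(X * Y, U, V)

print(spatial.shape, angular.shape)  # (49, 32, 48) (1536, 7, 7)
```

Each reshaped batch is a stack of 2D maps, which is exactly what makes 2D (or, with one extra axis kept, 3D) convolutions applicable to 4D LF data.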
Although these methods have achieved considerable per-
formance with the aforementioned decomposition operations,
they are still limited in the following aspects. Firstly, the
primary justification of these methods is simplifying the com-
plexity of the 4D data structure and reducing the number of
model parameters. The characteristics of the decomposition
operations themselves are largely neglected and have not been
studied for assisting LFSR. Secondly, due to the suboptimal
architecture design of these methods, their decomposition is
confined to limited sub-spaces. Given an LF image as shown in
Fig. 1 (a), some methods [26], [30] can only process the spatial
and angular sub-spaces which are shown in the gray and purple
boxes of Fig. 1 (b) and (c), and some others [4], [24], [31]–[33]
can only process the two typical epipolar-image (EPI) sub-
spaces which are shown in green and red boxes of Fig. 1 (d)
and (g). Thirdly, the typical form of EPIs reflects insufficient
sub-space coverage, i.e. the sub-spaces shown in the yellow
box of Fig. 1 (e) and the blue box of Fig. 1 (f) have long been
neglected by these EPI-based methods. In fact, these two EPI
sub-spaces also carry visual patterns as their siblings do in
the green and red boxes, reflecting complementary correlation
information from other perspectives. Due to the above lim-
itations, the existing methods cannot extract comprehensive
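The four EPI-type sub-spaces discussed above can be sketched as 2D slices of the same 4D tensor. Assuming the indexing convention `lf[u, v, x, y]` (sizes and fixed coordinates are assumptions for illustration only), the two typical EPIs and their two neglected siblings are:

```python
import numpy as np

# Illustrative 4D light field indexed as lf[u, v, x, y].
U, V, X, Y = 7, 7, 32, 48
lf = np.random.rand(U, V, X, Y)
u0, v0, x0, y0 = 3, 3, 16, 24  # arbitrary fixed coordinates

# The two typical EPI sub-spaces used by prior EPI-based methods:
epi_vy = lf[u0, :, x0, :]   # (v, y) plane, shape (V, Y)
epi_ux = lf[:, v0, :, y0]   # (u, x) plane, shape (U, X)

# The two long-neglected sibling EPI sub-spaces:
epi_vx = lf[u0, :, :, y0]   # (v, x) plane, shape (V, X)
epi_uy = lf[:, v0, x0, :]   # (u, y) plane, shape (U, Y)

print(epi_vy.shape, epi_ux.shape, epi_vx.shape, epi_uy.shape)
```

Each slice pairs one angular axis with one spatial axis; the neglected pairs carry correlation patterns just as the typical ones do, which is the observation motivating fuller sub-space coverage.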
arXiv:2111.04069v1 [eess.IV] 7 Nov 2021