JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Texture-enhanced Light Field Super-resolution
with Spatio-Angular Decomposition Kernels
Zexi Hu, Xiaoming Chen, Henry Wing Fung Yeung, Yuk Ying Chung, Member, IEEE
and Zhibo Chen, Senior Member, IEEE
Abstract—Despite the recent progress in light field super-
resolution (LFSR) achieved by convolutional neural networks,
the correlation information of light field (LF) images has not
been sufficiently studied and exploited due to the complexity
of 4D LF data. To cope with such high-dimensional LF data,
most of the existing LFSR methods resorted to decomposing it
into lower dimensions and subsequently performing optimization
on the decomposed sub-spaces. However, these methods are
inherently limited: they neglect the characteristics of the
decomposition operations and utilize only a limited set of LF
sub-spaces, failing to comprehensively extract spatio-angular
features and thus hitting a performance bottleneck. To
overcome these limitations, in this paper, we thoroughly explore
the potential of LF decomposition and propose the novel concept
of decomposition kernels. In particular, we systematically unify
the decomposition operations of various sub-spaces into a series
of such decomposition kernels, which are incorporated into our
proposed Decomposition Kernel Network (DKNet) for compre-
hensive spatio-angular feature extraction. The proposed DKNet
is experimentally verified to achieve substantial improvements
of 1.35 dB, 0.83 dB, and 1.80 dB PSNR at 2×, 3×, and 4×
LFSR scales, respectively, compared with the state-of-the-art
methods. To further improve DKNet toward more visually
pleasing LFSR results, we propose an LFVGG loss, based on
the VGG network, to guide the Texture-Enhanced DKNet
(TE-DKNet) to generate rich, authentic textures and significantly
enhance the visual quality of LF images. We also propose an indirect
evaluation metric by taking advantage of LF material recognition
to objectively assess the perceptual enhancement brought by the
LFVGG loss.
Index Terms—Light field, image processing, deep learning,
convolutional neural network.
I. INTRODUCTION
Compared with regular images captured by monocular cam-
eras, light field (LF) images can supply richer information
with light rays from multiple angular directions in one single
capture. Such a characteristic has facilitated several vision-
based measurement applications, e.g. material recognition [1],
[2], 3D measurement [3]–[7], salient object detection under
complex scenarios [8], [9] and anti-spoof face recognition
[10]–[13], which have achieved considerable improvements
compared with other types of sensors, e.g. monocular cameras [14],
stereo vision [15] and structured light [16].
Zexi Hu, Henry Wing Fung Yeung and Yuk Ying Chung are with the School
of Computer Science, University of Sydney, Darlington, NSW 2008, Australia.
Xiaoming Chen is with the School of Computer Science and Engineering,
Beijing Technology and Business University, Beijing 102488, China.
Zhibo Chen is with CAS Key Laboratory of Technology in Geo-spatial
Information Processing and Application System, University of Science and
Technology of China, Hefei 230027, China.
In the past, LF images were usually captured by self-built
dense camera arrays [17], [18], which are experimental and
expensive for general consumers. With the recent development
of more sophisticated LF cameras, e.g. Raytrix [19], Lytro
Illum [20] and Google’s Light Field VR Camera [21], LF
devices have become increasingly practical in both commercial
and industrial usage. However, LF cameras, especially portable
LF cameras, usually face a trade-off between the angular
and spatial resolutions due to the inherent limitation on the
camera’s sensor capability [22]. Hence the spatial resolution of
the images captured from the LF cameras is usually lower than
those from the traditional cameras. For instance, a Lytro Illum
camera could capture 14 × 14 sub-aperture images (SAIs) or
views, i.e. the angular resolution, but each SAI has a low
spatial resolution of only 376 × 540.
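The trade-off can be made concrete with back-of-the-envelope arithmetic on the figures above: the pixel budget is split between views and per-view resolution, so their product approximates the sensor's total sample count (a rough illustration, not an official Lytro specification):

```python
# Angular/spatial trade-off for a Lytro Illum-style capture,
# using the numbers quoted in the text.
views = 14 * 14              # angular resolution: 14 x 14 SAIs
pixels_per_view = 376 * 540  # spatial resolution of each SAI
total_samples = views * pixels_per_view
print(views, pixels_per_view, total_samples)  # 196 203040 39795840
```

Roughly 39.8 million samples in total: raising the spatial resolution of each view would require lowering the number of views, and vice versa.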
To alleviate this problem, a considerable number of works
have developed light field super-resolution (LFSR) solutions to
increase the spatial resolution of LF images. With convolutional
neural networks (CNN), the recent learning-based methods
[23]–[26] have achieved substantial progress compared with
traditional methods [27]–[29]. The vast majority of these
methods are designed to decompose the 4D data structure into
sub-spaces of two or three dimensions, which can be optimized
with simpler operations, e.g. regular 2D and 3D convolutions.
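The decomposition idea can be sketched with a NumPy reshaping example. Assuming an LF tensor indexed as `lf[u, v, x, y]` (angular coordinates `u, v`; spatial coordinates `x, y`; sizes here are illustrative, not any method's actual configuration), the spatial and angular sub-spaces are obtained by folding the remaining two dimensions into a batch axis, after which regular 2D convolutions apply:

```python
import numpy as np

# Hypothetical 4D light field: U x V angular views, each X x Y pixels.
U, V, X, Y = 7, 7, 32, 48
lf = np.random.rand(U, V, X, Y)

# Spatial sub-space: treat each of the U*V sub-aperture images as an
# independent 2D map, so a regular 2D convolution can process it.
spatial = lf.reshape(U * V, X, Y)

# Angular sub-space: fix a pixel location and gather its value across
# all views, giving a U x V "macro-pixel" per spatial position.
angular = lf.transpose(2, 3, 0, 1).reshape(X * Y, U, V)

print(spatial.shape, angular.shape)  # (49, 32, 48) (1536, 7, 7)
```

Each reshaped batch is a stack of 2D maps, which is exactly what makes 2D (or, with one extra axis kept, 3D) convolutions applicable to 4D LF data.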
Although these methods have achieved considerable per-
formance with the aforementioned decomposition operations,
they are still limited in the following aspects. Firstly, the
primary justification of these methods is simplifying the com-
plexity of the 4D data structure and reducing the number of
model parameters. The characteristics of the decomposition
operations themselves are largely neglected and have not been
studied for assisting LFSR. Secondly, due to the suboptimal
architecture design of these methods, their decomposition is
confined to limited sub-spaces. Given an LF image as shown in
Fig. 1 (a), some methods [26], [30] can only process the spatial
and angular sub-spaces which are shown in the gray and purple
boxes of Fig. 1 (b) and (c), and some others [4], [24], [31]–[33]
can only process the two typical epipolar-image (EPI) sub-
spaces which are shown in green and red boxes of Fig. 1 (d)
and (g). Thirdly, the typical form of EPIs reflects insufficient
sub-space coverage, i.e. the sub-spaces shown in the yellow
box of Fig. 1 (e) and the blue box of Fig. 1 (f) have long been
neglected by these EPI-based methods. In fact, these two EPI
sub-spaces also carry visual patterns as their siblings do in
the green and red boxes, reflecting complementary correlation
information from other perspectives. Due to the above lim-
itations, the existing methods cannot extract comprehensive
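The four EPI-type sub-spaces discussed above can be sketched as 2D slices of the same 4D tensor. Assuming the indexing convention `lf[u, v, x, y]` (sizes and fixed coordinates are assumptions for illustration only), the two typical EPIs and their two neglected siblings are:

```python
import numpy as np

# Illustrative 4D light field indexed as lf[u, v, x, y].
U, V, X, Y = 7, 7, 32, 48
lf = np.random.rand(U, V, X, Y)
u0, v0, x0, y0 = 3, 3, 16, 24  # arbitrary fixed coordinates

# The two typical EPI sub-spaces used by prior EPI-based methods:
epi_vy = lf[u0, :, x0, :]   # (v, y) plane, shape (V, Y)
epi_ux = lf[:, v0, :, y0]   # (u, x) plane, shape (U, X)

# The two long-neglected sibling EPI sub-spaces:
epi_vx = lf[u0, :, :, y0]   # (v, x) plane, shape (V, X)
epi_uy = lf[:, v0, x0, :]   # (u, y) plane, shape (U, Y)

print(epi_vy.shape, epi_ux.shape, epi_vx.shape, epi_uy.shape)
```

Each slice pairs one angular axis with one spatial axis; the neglected pairs carry correlation patterns just as the typical ones do, which is the observation motivating fuller sub-space coverage.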
arXiv:2111.04069v1 [eess.IV] 7 Nov 2021