96 Z. Wang et al. / Pattern Recognition Letters 65 (2015) 95–102
Table 1
Advantages and disadvantages of three categories of representative methods.
Category | Advantages | Disadvantages | Representatives
Filtering-based | Simple to implement and yield a clean image | Smooth out depth discontinuities and may fail to fill in large holes | Refs. [8–12]
Inpainting-based | Achieve good quality in smooth regions | Introduce artifacts (e.g., jagging, blurring, and ringing) around thin structures or sharp discontinuities | Refs. [14–17]
Reconstruction-based | Preserve sufficient accuracy in flat regions and sharp discontinuities around object edges simultaneously | Must be guided by accompanying color images, and incorrect predictions may occur | Refs. [18–24,28]
classified into three categories: filtering-based, inpainting-based and
reconstruction-based methods. Qi et al. [8] proposed a fusion-based
method using a non-local filtering scheme for restoring depth maps.
He et al. [9] proposed a guided filter that can preserve sharp edges
and avoid gradient reversal artifacts when smoothing a depth map. Dakkak
et al. [10] proposed an iterative diffusion method which utilizes both
available depth values and color segmentation results to recover
missing depth information, but the shown results are sensitive to the
segmentation accuracy. In order to obtain more precise filter coef-
ficients, Camplani et al. [11] used a joint bilateral filter to calculate
the weights of available depth pixels according to collocated pixels
in the color image. Based on a joint histogram, Min et al. [12] instead
proposed a weighted mode filter to prevent the output depth values
from being blurred on the depth boundaries. However, filtering-
based approaches often yield poor results near depth discontinuities,
especially when large holes occur in a depth map.
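The joint bilateral idea described above can be sketched as follows. This is a minimal illustration, not the implementation of [11]: the function name `joint_bilateral_fill`, the use of `None` to mark holes, the grayscale guide image, and the parameters `radius`, `sigma_s` and `sigma_r` are all assumptions made for this example.

```python
import math

def joint_bilateral_fill(depth, guide, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Fill holes (None) in `depth` with a weighted mean of valid neighbours.

    Weights combine spatial closeness with intensity similarity measured in
    the collocated guide (color/gray) image, which has no holes.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not None:
                continue  # only hole pixels are estimated
            num = den = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[ny][nx]
                    if d is None:
                        continue  # skip neighbours that are holes themselves
                    spatial = ((ny - y) ** 2 + (nx - x) ** 2) / (2 * sigma_s ** 2)
                    rng = (guide[ny][nx] - guide[y][x]) ** 2 / (2 * sigma_r ** 2)
                    wgt = math.exp(-(spatial + rng))
                    num += wgt * d
                    den += wgt
            if den > 0:
                out[y][x] = num / den  # stays None if no valid neighbour exists
    return out
```

Because the range term is computed on the guide image, valid depth samples across a color edge receive almost no weight, so a hole next to an object boundary is filled mainly from its own side. When a hole is larger than the window, no valid neighbour is found and the pixel remains unfilled, which matches the large-hole weakness noted above.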
Inpainting techniques seem more promising in depth hole filling
than filtering, interpolation and extrapolation algorithms. A popular
inpainting algorithm is the fast marching method (FMM) by Telea [13],
but it performs poorly on depth maps because it was designed
for generic color images. With an aligned color image, Liu et al.
[14] proposed an extended FMM approach to guide depth inpaint-
ing. Structure-based inpainting [15] fills the holes by propagating
structure into the target regions via diffusion. The diffusion process
blurs the filled regions, and texture is thus lost. Xu et al. [16] further
introduced the exemplar-based texture synthesis into structure
propagation so that the blurring effects can be somewhat avoided. In
order to prevent edge fattening or shrinking after hole inpainting, Miao
et al. [17] used the fluctuating edge region in the depth map to assist
hole completion. However, the missing depth values near the object
contour are directly assigned the mean of the available depth values in
the fluctuating edge region, which is hence inaccurate for representing
the depth contours.
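The blurring caused by diffusion-based filling can be seen in a small sketch. This is a deliberately simplified isotropic diffusion (repeated 4-neighbour averaging), not the structure-based method of [15]; the function name `diffusion_fill`, the `None` hole marker and the iteration count are assumptions made for this example, and the input is assumed to contain at least one valid pixel.

```python
def diffusion_fill(depth, iterations=500):
    """Fill holes (None) by repeatedly averaging each hole's 4-neighbours."""
    h, w = len(depth), len(depth[0])
    holes = [(y, x) for y in range(h) for x in range(w) if depth[y][x] is None]
    known = [v for row in depth for v in row if v is not None]
    init = sum(known) / len(known)  # start holes at the mean of known depths
    cur = [[init if v is None else float(v) for v in row] for row in depth]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for y, x in holes:
            nbrs = [cur[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w]
            if nbrs:
                nxt[y][x] = sum(nbrs) / len(nbrs)  # known pixels stay fixed
        cur = nxt
    return cur

# A step edge (0 -> 10) crossing a two-pixel hole.
row = diffusion_fill([[0.0, 0.0, None, None, 10.0, 10.0]])
```

In this one-row example the two hole pixels converge to intermediate values between 0 and 10, so the original step edge becomes a ramp. This is the smoothing effect that exemplar-based and edge-aware variants try to reduce.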
The methods reviewed above achieve good quality for smooth
depth regions, but may introduce artifacts, e.g., jagging, blurring,
and ringing, around thin structures or sharp discontinuities.
Reconstruction-based methods apply image synthesis techniques to
predict missing depth values. Since the reconstruction coefficients
are resolved in a closed-loop scheme in terms of the minimization
of residuals, higher hole-filling accuracy is achievable. A variety of
representation models have been employed to formulate hole filling
problems. Chen et al. [18,19] cast the depth recovery as an energy
minimization problem, which addresses the depth hole filling and
denoising simultaneously. In [20], an additional total variation (TV)
regularization term is introduced to produce smooth depth maps with
sharp boundaries. Yang et al. [21] proposed an adaptive color-guided
autoregressive (AR) model for high quality depth recovery, where the
depth recovery task is converted into a minimization of AR predic-
tion errors subject to measurement consistency. The AR predictor for
each pixel is constructed according to both the local correlation in
the initial depth map and the nonlocal similarity in the accompanying
high-quality color image. In contrast to the bilateral filtering methods
[4,11], obtaining reconstruction coefficients by solving a minimization
problem can avoid incorrect predictions in hole filling. However,
overemphasis on energy minimization [18,21] or on the total variation
penalty [20] is not conducive to preserving depth discontinuities.
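The AR formulation described above can be written schematically as follows. This sketch is derived only from the prose description; the symbols $D$ (recovered depth map), $D^{0}$ (initial depth map), $\Omega$ (pixels with valid measurements), $\mathcal{N}(x)$ (neighborhood of pixel $x$) and $a_{x,y}$ (AR coefficients) are notation assumed here and may differ from that of [21].

```latex
\min_{D} \; \sum_{x} \Bigl( D(x) - \sum_{y \in \mathcal{N}(x)} a_{x,y}\, D(y) \Bigr)^{2}
\quad \text{subject to} \quad D(x) = D^{0}(x), \;\; x \in \Omega
```

Here each coefficient $a_{x,y}$ would be built from the local correlation in the initial depth map and the nonlocal similarity in the color image, as stated in the text.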
Sparse representation (SR) has proven successful for natural
images, where a sparsity prior over an over-complete dictionary helps solve
inverse problems such as denoising and inpainting. Such priors would
be expected to also play a crucial role in solving the depth recovery
problem. Following this assumption, SR has been recently applied
to stereo vision fields [22–24], showing promising results in depth
map denoising, depth estimation and scene reconstruction. However,
because the depth values in hole regions are unavailable, the
reconstruction coefficients have to be learnt from the complementary
color images. Otherwise, the generated coefficients are not applicable
when naively used for depth prediction.
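As a concrete illustration of a sparsity prior, the sketch below solves the standard ℓ1-regularized least-squares problem, min 0.5·||y − Da||² + λ·||a||₁, with the iterative shrinkage-thresholding algorithm (ISTA). It is a generic textbook solver, not the method of [22–24]; the names `ista` and `soft_threshold`, the row-list matrix layout and the parameters are assumptions made for this example. The caller is assumed to pick `step` no larger than the reciprocal of the largest eigenvalue of DᵀD.

```python
import math

def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal operator of t*|.|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(D, y, lam, step, iterations=200):
    """Minimise 0.5*||y - D a||^2 + lam*||a||_1, with D given as a list of rows."""
    m, n = len(D), len(D[0])
    a = [0.0] * n
    for _ in range(iterations):
        # Residual r = D a - y and gradient g = D^T r of the quadratic term.
        r = [sum(D[i][j] * a[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding.
        a = [soft_threshold(a[j] - step * g[j], step * lam) for j in range(n)]
    return a

# With an identity dictionary the solution is y soft-thresholded by lam.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs = ista(identity, [3.0, 0.5, -2.0], lam=1.0, step=1.0)
```

The small entry of `y` is driven exactly to zero while the large ones are only shrunk, which is the sparsifying behaviour the ℓ1 penalty is meant to provide.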
To make the main features of the three categories discussed above
(i.e., filtering-based, inpainting-based and reconstruction-based
methods) easier to grasp, we briefly summarize their advantages and
disadvantages in Table 1.
Inspired by the success of locality constraints [25] in image clas-
sification [26] and image super-resolution [27], in our previous work
[28], we employed a color-image-guided locality regularized
representation (LRR) to determine the optimal weights from collocated
patches in the color image. The locality constraint requires the
reconstruction to rely only on the most relevant pixels rather than on all
pixels, and thus gives impressive results for Kinect depth hole filling.
Nevertheless, the sharp depth edges between objects cannot be adequately
retained owing to the inherent over-smoothing of the underlying
ridge regression (RR) model. Besides, [28] only accounts for the
impact of locality (namely, intensity similarity in Euclidean distance) on
coefficient learning but ignores two other important factors:
geometric distance and position. Since depth maps exhibit
flatness within objects and sharpness at boundaries, the
spatially neighboring pixels are more likely to share similar depth
information. Therefore, the spatial distances of reference patches from
the center patch should be taken into account when formulating the
regularized cost function. In addition, because the center pixel matters
most, the fitting errors of the individual pixels in a patch should not
be weighted equally, but according to their coordinates. In this paper, we
represent the missing depth in occluded regions as the linear combi-
nation of the surrounding available depth values, and establish a tri-
lateral constrained sparse representation (TCSR) to solve the optimal
weights with the help of the associated color image. TCSR comprises
a similarity-distance-inducing weighted $\ell_1$ sparsity penalty term and
a position-inducing weighted data-fidelity term, and thus not only
readily captures the salient features of depth images but also
considerably improves the representation accuracy.
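The general shape of such an objective can be sketched as below. This is only a schematic reading of the description above, not the paper's exact formulation; all symbols are assumed for illustration: $\mathbf{y}$ is the color patch centered at the hole pixel, the columns of $\mathbf{X}$ are the surrounding reference color patches, $\mathbf{P}$ is a diagonal matrix of position-dependent weights, $d_i$ is a penalty that grows with the intensity dissimilarity and spatial distance of the $i$-th reference patch, $\lambda$ is a regularization parameter, and $z_i$ are the available depth values.

```latex
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}}
  \bigl\| \mathbf{P}^{1/2} (\mathbf{y} - \mathbf{X}\mathbf{w}) \bigr\|_2^2
  + \lambda \sum_{i} d_i \, |w_i| ,
\qquad
\hat{z} = \sum_{i} \hat{w}_i \, z_i
```

Under this reading, the weights learnt on the color patches are reused to combine the surrounding depth values, and replacing the squared $\ell_2$ penalty of ridge regression with a weighted $\ell_1$ penalty favors a few highly relevant reference patches.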
The remainder of this paper is organized as follows. Section 2
describes the proposed method based on constrained sparse repre-
sentation in detail. Experimental results and analysis are provided in
Section 3, and we conclude this paper in Section 4.
2. Proposed method
In this section, we focus on the trilateral constrained sparse
representation model as well as its optimization.