affect the human visual system (HVS) more [12,13] than
the conventional 2D contents. Therefore, the HVS properties
related to 3D content are studied to help evaluate the
perceptual quality of 3D content more accurately.
According to the availability of the reference image, the conven-
tional 2D image quality assessment (IQA) methods can be divided into
three categories: full-reference (FR) [14–16], no-reference
(NR) [17–22], and reduced-reference (RR) [22–28].
FR metrics require full access to the original image to evaluate
the perceptual quality of the distorted image. The original
image is assumed to be artifact free and of perfect quality. Such
metrics can be employed to guide the perceptual quality optimi-
zation during image/video compression, watermarking, and so on.
The most appealing quality metrics are the mean square error
(MSE) and its related peak signal-to-noise ratio (PSNR), because of
their simplicity, clear physical meaning, and easy optimization.
However, MSE and PSNR do not correlate well with HVS properties.
Therefore, many FR metrics have been developed to incorporate the HVS
properties and image signal properties. Wang et al. developed the
structural similarity (SSIM) index [14], the most popular image quality
metric, which captures the structural information loss to depict the perceptual
quality of the distorted image. A wavelet-based visual signal-to-noise
ratio (VSNR) is developed to capture the distortions in the wavelet
domain [15]. A simple quality metric considering texture masking
and the contrast sensitivity function is developed for perceptual image
coding [29]. In [16], Ma et al. proposed to incorporate the horizontal
effect of HVS into SSIM for a better image quality metric.
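For reference, the MSE and PSNR discussed earlier in this paragraph can be sketched as follows. This is a minimal illustration, not an implementation taken from any of the cited works: the function names, the nested-list image representation, and the default peak value of 255 (8-bit grayscale images) are assumptions made for this example.

```python
import math


def mse(reference, distorted):
    """Mean square error between two equally sized grayscale images.

    Images are given as lists of rows of pixel values (an assumption
    made for this sketch).
    """
    if len(reference) != len(distorted) or any(
        len(r) != len(d) for r, d in zip(reference, distorted)
    ):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB; `peak` is the maximum pixel value (255 assumed, 8-bit)."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf  # identical images: no distortion energy
    return 10.0 * math.log10(peak ** 2 / error)
```

Both quantities depend only on pixel-wise differences, which is why they are simple to compute and optimize but take no account of HVS properties.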
However, in most real-world applications, the original image cannot
be accessed for quality analysis, and only the distorted image is
available. NR quality metrics are thus employed. Many researchers
exploit the behaviors of specific distortions for NR quality
assessment, such as the blocking artifact of
JPEG coded images, ringing artifact of the JPEG 2000 coded images,
and so on. As JPEG 2000 employs the wavelet transform to com-
press the image, the wavelet statistical model is utilized to capture
the compression distortion [18]. Liang et al. [19] combined the
sharpness, blurring, and ringing measurements together to depict
the perceptual quality of the JPEG 2000 coded image. The distribution
of the DCT coefficients after quantization is modeled in [20]
to predict the PSNR value of the JPEG coded image. Furthermore,
Ferzli et al. [21] conducted a psychophysical experiment to test the
blur tolerance of the HVS, based on which the just-noticeable
blur (JNB) model is developed. These methods employ the
behaviors of specific distortions to predict the degradation level.
Therefore, if a new distortion is introduced, these methods can
hardly evaluate the perceptual quality of the distorted image. In
order to compromise between the FR and NR IQAs, RR IQAs are
developed. It is expected that the RR methods can effectively
evaluate the image perceptual quality based on a limited number of
RR features extracted from the reference image. Only a small
number of bits are required for representing the extracted features,
which can be efficiently encoded and transmitted for the quality
analysis. Consequently, RR IQA is very useful for quality monitoring
during image transmission and communication.
For RR quality metrics, only partial information about the original
image is available for quality analysis. These metrics can be further
categorized into the following three classes.
Distortion-based RR metrics: The behaviors of the distortions are
modeled for RR quality metric design. Wolf et al. [30,31]
proposed to extract a set of spatial and temporal features for
measuring the distortions introduced in the standard video
compression and communication environment. The features
that are associated with the blurring, blocking, and frame
differences are extracted in [32] to depict the compression
artifacts introduced by MPEG-2. These RR quality metrics are
designed for specific distortions and cannot be effectively
applied to images with other types of distortion.
Therefore, a general RR IQA for evaluating the image perceptual
quality of different distortions is required.
HVS-based RR metrics: HVS properties should be considered for
quality assessment, as the human eyes are the ultimate re-
ceivers. Le Callet et al. [33] employed a neural network to train
and evaluate the perceptual qualities of video sequences based
on the perceptual-related features extracted from the video
frames. In [34,35], perceptual features motivated by
computational models of low-level vision are extracted as the
reduced descriptors to represent the image perceptual quality.
The merits of the contourlet transform, the contrast sensitivity
function, and Weber's law of just noticeable difference are
incorporated to derive an RR IQA [36], which is employed for
evaluating the perceptual qualities of the JPEG and JPEG 2000
coded images. Recently, an RR IQA [37] for wireless imaging is
developed by considering different structural information that
is observed in the distortion model of the wireless link.
Statistics-based RR metrics: The statistical modeling of the image
signal has been investigated for perceptual quality assessment
in RR IQAs. In [38,23], Wang et al. proposed a
wavelet-domain natural image statistic metric (WNISM), which
models the marginal probability distribution of the wavelet
coefficients of a natural image by the generalized Gaussian
density (GGD) function. The Kullback–Leibler distance (KLD) is
used to depict the distribution difference. To improve the
performance and reduce the number of features, the probability
distribution was represented by the Gaussian scale mixture
(GSM) model in the wavelet domain [39]. In [24–26], the RR features
are extracted in the reorganized domain. In [24], the DCT
coefficients are first reorganized into several representative
subbands, whose distributions are modeled by the GGD. The
city-block distance (CBD) is utilized to capture the image per-
ceptual quality. In [40], the statistics of image gradient magni-
tude are modeled by the Weibull distribution to develop an RR
image quality metric. The statistics of the edge [41] are also
utilized for developing the RR IQA. In [42], the authors measure
the differences between the entropies of wavelet coefficients of
the reference and distorted image to quantify the image informa-
tion change, which can indicate the image perceptual quality.
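To illustrate the statistics-based comparison described above, the following sketch evaluates a GGD on a uniform grid and computes the KLD and the CBD between two discretized densities. It is an illustration under stated assumptions, not the procedure of [23], [24], or [38]: the parameter values, the grid range, the small constant `eps`, and all function names are hypothetical, and fitting the GGD parameters to actual wavelet or DCT coefficients is omitted.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density with scale alpha > 0 and shape beta > 0."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def discretize(pdf, lo=-50.0, hi=50.0, bins=2001):
    """Sample a density on a uniform grid and normalize to a probability vector.

    The grid range and resolution are assumptions chosen for this example.
    """
    step = (hi - lo) / (bins - 1)
    samples = [pdf(lo + i * step) for i in range(bins)]
    total = sum(samples)
    return [s / total for s in samples]


def kld(p, q, eps=1e-12):
    """Kullback-Leibler distance D(p || q) between two probability vectors.

    `eps` guards against division by zero where q underflows (an assumption
    of this sketch, not part of the definition).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def cbd(p, q):
    """City-block (L1) distance between two probability vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical parameters: same scale, different shape.
p_ref = discretize(lambda x: ggd_pdf(x, alpha=2.0, beta=1.0))
p_dist = discretize(lambda x: ggd_pdf(x, alpha=2.0, beta=2.0))
distance_kld = kld(p_ref, p_dist)
distance_cbd = cbd(p_ref, p_dist)
```

In an RR setting, only the few GGD parameters of the reference (here `alpha` and `beta` per subband) would need to be transmitted, and the distance would be evaluated at the receiver against the statistics of the distorted image.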
In this paper, we propose a novel RR SIQA based on the statistical
modeling of the stereoscopic image. The difference image is
computed from the left view and right view images. After
performing the block-based DCT and reorganization process, the
coefficients of the images (left view, right view, and the difference
image) are reorganized into different RDCT subbands. The statis-
tical property within each RDCT subband is exploited by the GGD
modeling of the coefficient distribution. The statistical property
across RDCT subbands is modeled by the energy distribution ratio
(EDR), which can be further employed for modeling the HVS
mutual masking property. The main contributions of our proposed
method are listed as follows.
The statistical property of the stereoscopic image is studied for
perceptual quality analysis. The statistical properties of the
obtained difference image are first investigated in the RDCT
domain. By considering the difference image, the left and right
view images are considered together for perceptual quality
analysis, which matches the HVS perception of the stereoscopic
image.
The statistical properties of the difference image are characterized
from two perspectives, specifically the within-subband and
across-subband RDCT statistical properties. The
statistical properties depicted within and across the RDCT
L. Ma et al. / Neurocomputing 215 (2016) 21–31