Convolutional Neural Networks for No-Reference Image Quality Assessment
Le Kang¹, Peng Ye¹, Yi Li², and David Doermann¹
¹University of Maryland, College Park, MD, USA
²NICTA and ANU, Canberra, Australia
¹{lekang,pengye,doermann}@umiacs.umd.edu
²yi.li@cecs.anu.edu.au
Abstract
In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using the hand-crafted features employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers, and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state-of-the-art performance on the LIVE dataset and shows excellent generalization ability in cross-dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.
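To make the described architecture concrete, the sketch below runs a forward pass of a network with one convolutional layer whose feature maps are summarized by both max and min pooling, two fully connected layers, and a single linear output node. The patch size, kernel size, filter counts, and the ReLU nonlinearity are illustrative placeholders, not configuration details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 32      # input patch side (placeholder)
K = 7           # conv kernel side (placeholder)
N_FILTERS = 8   # number of conv filters (placeholder)
HIDDEN = 16     # fully connected width (placeholder)

# Randomly initialized weights; in practice these are learned jointly
# with the regression, which is the point of the integrated training.
conv_w = rng.normal(0, 0.1, (N_FILTERS, K, K))
fc1_w = rng.normal(0, 0.1, (2 * N_FILTERS, HIDDEN))
fc2_w = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
out_w = rng.normal(0, 0.1, HIDDEN)

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution of one single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def predict_quality(patch):
    """Forward pass: conv -> {max, min} pooling -> FC -> FC -> score."""
    maps = np.array([conv2d_valid(patch, w) for w in conv_w])
    # Max and min pooling over each whole feature map, concatenated.
    pooled = np.concatenate([maps.max(axis=(1, 2)), maps.min(axis=(1, 2))])
    h1 = np.maximum(pooled @ fc1_w, 0)   # ReLU (assumed)
    h2 = np.maximum(h1 @ fc2_w, 0)
    return float(h2 @ out_w)             # linear output: quality score

score = predict_quality(rng.normal(size=(PATCH, PATCH)))
```

Because the output node is a linear regressor on learned features, the whole pipeline can be trained end-to-end against subjective quality scores.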
1. Introduction
This paper presents a Convolutional Neural Network (CNN) that can accurately predict the quality of distorted images with respect to human perception. The work focuses on the most challenging category of objective image quality assessment (IQA) tasks: general-purpose No-Reference IQA (NR-IQA), which evaluates the visual quality of digital images without access to reference images and without prior knowledge of the types of distortions present.

Visual quality is a very complex yet inherent characteristic of an image. In principle, it is the measure of the distortion compared with an ideal imaging model or perfect reference image. When reference images are available, Full
Reference (FR) IQA methods [14, 22, 16, 17, 19] can be applied to directly quantify the differences between distorted images and their corresponding ideal versions. State-of-the-art FR measures, such as VIF [14] and FSIM [22], achieve very high correlation with human perception.

(The partial support of this research by DARPA through BBN/DARPA Award HR0011-08-C-0004 under subcontract 9500009235, and by the US Government through NSF Awards IIS-0812111 and IIS-1262122, is gratefully acknowledged.)
However, in many practical computer vision applications there do not exist perfect versions of the distorted images, so NR-IQA is required. NR-IQA measures directly quantify image degradations by exploiting features that are discriminant for those degradations. Most successful approaches use Natural Scene Statistics (NSS) based features. Typically, NSS based features characterize the distributions of certain filter responses. Traditional NSS based features are extracted in image transformation domains using, for example, the wavelet transform [10] or the DCT transform [13]. These methods are usually very slow due to the use of computationally expensive image transformations. Recent developments in NR-IQA methods, CORNIA [20, 21] and BRISQUE [9], promote extracting features from the spatial domain, which leads to a significant reduction in computation time. CORNIA demonstrates that it is possible to learn discriminant image features directly from the raw image pixels, instead of using handcrafted features.
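For a concrete sense of what a spatial-domain NSS feature looks like, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients, the kind of normalized local response whose distribution BRISQUE-style methods model. The window size and the use of a uniform (rather than Gaussian-weighted) local window are simplifications of ours, not details from the paper or from BRISQUE.

```python
import numpy as np

def mscn(image, win=7, c=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients.

    A uniform local window is used here for simplicity; BRISQUE-style
    methods use a Gaussian-weighted window.
    """
    pad = win // 2
    padded = np.pad(image.astype(float), pad, mode="reflect")
    # All win x win neighborhoods, one per pixel of the original image.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (win, win))
    mu = windows.mean(axis=(2, 3))        # local mean
    sigma = windows.std(axis=(2, 3))      # local contrast
    return (image - mu) / (sigma + c)     # normalized response

# NSS-style features then summarize the distribution of these
# coefficients, e.g. with a histogram or fitted distribution parameters.
rng = np.random.default_rng(0)
coeffs = mscn(rng.normal(size=(64, 64)))
```

Because everything here is computed with local spatial filtering rather than a full-image wavelet or DCT transform, such features avoid the expensive transformations that slow down traditional NSS methods.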
Based on these observations, we explore using a Convolutional Neural Network (CNN) to learn discriminant features for the NR-IQA task. Recently, deep neural networks have gained researchers' attention and achieved great success on various computer vision tasks. Specifically, CNNs have shown superior performance on many standard object recognition benchmarks [6, 7, 4]. One of the CNN's advantages is that it can take raw images as input and incorporate feature learning into the training process. With a deep structure, the CNN can effectively learn complicated mappings while requiring minimal domain knowledge.
To the best of our knowledge, CNNs have not been applied to general-purpose NR-IQA. The primary reason is that the original CNN is not designed for capturing image quality features. In the object recognition domain, good features generally encode local invariant parts; for the NR-IQA task, however, good features should be able to capture