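Image super-resolution based on sparse representation is a learning-based super-resolution method that achieves better results than bicubic interpolation.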
Copyright (c) 2010 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Image Super-Resolution via Sparse Representation
Jianchao Yang, Student Member, IEEE, John Wright, Member, IEEE, Thomas Huang, Life Fellow, IEEE, and
Yi Ma, Senior Member, IEEE
Abstract—This paper presents a new approach to single-image
superresolution, based on sparse signal representation. Research
on image statistics suggests that image patches can be well-
represented as a sparse linear combination of elements from
an appropriately chosen over-complete dictionary. Inspired by
this observation, we seek a sparse representation for each patch
of the low-resolution input, and then use the coefficients of this
representation to generate the high-resolution output. Theoretical
results from compressed sensing suggest that under mild condi-
tions, the sparse representation can be correctly recovered from
the downsampled signals. By jointly training two dictionaries for
the low- and high-resolution image patches, we can enforce the
similarity of sparse representations between the low resolution
and high resolution image patch pair with respect to their
own dictionaries. Therefore, the sparse representation of a low
resolution image patch can be applied with the high resolution
image patch dictionary to generate a high resolution image patch.
The learned dictionary pair is a more compact representation
of the patch pairs, compared to previous approaches, which
simply sample a large amount of image patch pairs [1], reducing
the computational cost substantially. The effectiveness of such
a sparsity prior is demonstrated for both general image super-
resolution and the special case of face hallucination. In both
cases, our algorithm generates high-resolution images that are
competitive or even superior in quality to images produced by
other similar SR methods. In addition, the local sparse modeling
of our approach is naturally robust to noise, and therefore the
proposed algorithm can handle super-resolution with noisy inputs
in a more unified framework.
Index Terms—Image super-resolution, sparse representation,
sparse coding, face hallucination, non-negative matrix factoriza-
tion.
I. INTRODUCTION
Super-resolution (SR) image reconstruction is currently a
very active area of research, as it offers the promise of
overcoming some of the inherent resolution limitations of
low-cost imaging sensors (e.g. cell phone or surveillance
cameras) allowing better utilization of the growing capability
of high-resolution displays (e.g. high-definition LCDs). Such
resolution-enhancing technology may also prove to be essen-
tial in medical imaging and satellite imaging where diagnosis
or analysis from low-quality images can be extremely difficult.
Jianchao Yang and Thomas Huang are with Beckman Institute, University
of Illinois Urbana-Champaign, Urbana, IL 61801 USA (email:
jyang29@ifp.uiuc.edu; huang@ifp.uiuc.edu). John Wright is with the Visual
Computing Group, Microsoft Research Asia (email: jnwright@uiuc.edu). Yi
Ma is with the Visual Computing Group, Microsoft Research Asia, as well
as Coordinated Science Laboratory, University of Illinois Urbana-Champaign,
Urbana, IL 61801 USA (email: yima@uiuc.edu).
This work was supported in part by the U.S. Army Research Laboratory
and the U.S. Army Research Office under grant number W911NF-09-1-0383.
It was also supported by grants NSF IIS 08-49292, NSF ECCS 07-01676,
and ONR N00014-09-1-0230.
Conventional approaches to generating a super-resolution image
normally require as input multiple low-resolution images
of the same scene, which are aligned with sub-pixel accuracy.
The SR task is cast as the inverse problem of recovering the
original high-resolution image by fusing the low-resolution
images, based on reasonable assumptions or prior knowledge
about the observation model that maps the high-resolution im-
age to the low-resolution ones. The fundamental reconstruction
constraint for SR is that the recovered image, after applying the
same generation model, should reproduce the observed low-
resolution images. However, SR image reconstruction is gen-
erally a severely ill-posed problem because of the insufficient
number of low resolution images, ill-conditioned registration
and unknown blurring operators, and the solution from the
reconstruction constraint is not unique. Various regularization
methods have been proposed to further stabilize the inversion
of this ill-posed problem, such as [2], [3], [4].
However, the performance of these reconstruction-based
super-resolution algorithms degrades rapidly when the desired
magnification factor is large or the number of available input
images is small. In these cases, the result may be overly
smooth, lacking important high-frequency details [5]. Another
class of SR approach is based on interpolation [6], [7],
[8]. While simple interpolation methods such as Bilinear or
Bicubic interpolation tend to generate overly smooth images
with ringing and jagged artifacts, interpolation by exploiting
the natural image priors will generally produce more favorable
results. Dai et al. [7] represented the local image patches using
the background/foreground descriptors and reconstructed the
sharp discontinuity between the two. Sun et al. [8] explored
the gradient profile prior for local image structures and ap-
plied it to super-resolution. Such approaches are effective in
preserving the edges in the zoomed image. However, they are
limited in modeling the visual complexity of the real images.
For natural images with fine textures or smooth shading, these
approaches tend to produce watercolor-like artifacts.
A third category of SR approach is based on ma-
chine learning techniques, which attempt to capture the co-
occurrence prior between low-resolution and high-resolution
image patches. [9] proposed an example-based learning strat-
egy that applies to generic images where the low-resolution
to high-resolution prediction is learned via a Markov Random
Field (MRF) solved by belief propagation. [10] extends this
approach by using the Primal Sketch priors to enhance blurred
edges, ridges and corners. Nevertheless, the above methods
typically require enormous databases of millions of high-
resolution and low-resolution patch pairs, and are therefore
computationally intensive. [11] adopts the philosophy of Lo-
cally Linear Embedding (LLE) [12] from manifold learning,
assuming similarity between the two manifolds in the high-
resolution and the low-resolution patch spaces. Their algorithm
maps the local geometry of the low-resolution patch space to
the high-resolution one, generating a high-resolution patch as
a linear combination of neighbors. Using this strategy, more
patch patterns can be represented using a smaller training
database. However, using a fixed number K of neighbors for
reconstruction often results in blurring effects, due to over- or
under-fitting. In our previous work [1], we proposed a method
for adaptively choosing the most relevant reconstruction neigh-
bors based on sparse coding, avoiding over- or under-fitting of
[11] and producing superior results. However, sparse coding
over a large sampled image patch database directly is too time-
consuming.
While the approaches mentioned above were proposed for
generic image super-resolution, specific image priors can be
incorporated when tailored to SR applications for specific
domains such as human faces. This face hallucination prob-
lem was addressed in the pioneering work of Baker and
Kanade [13]. However, the gradient pyramid-based prediction
introduced in [13] does not directly model the face prior, and
the pixels are predicted individually, causing discontinuities
and artifacts. Liu et al. [14] proposed a two-step statistical
approach integrating the global PCA model and a local patch
model. Although the algorithm yields good results, the holistic
PCA model tends to yield results like the mean face and the
probabilistic local patch model is complicated and compu-
tationally demanding. Wei Liu et al. [15] proposed a new
approach based on TensorPatches and residue compensation.
While this algorithm adds more details to the face, it also
introduces more artifacts.
This paper focuses on the problem of recovering the super-
resolution version of a given low-resolution image. Similar
to the aforementioned learning-based methods, we will rely
on patches from the input image. However, instead of work-
ing directly with the image patch pairs sampled from high-
and low-resolution images [1], we learn a compact repre-
sentation for these patch pairs to capture the co-occurrence
prior, significantly improving the speed of the algorithm.
Our approach is motivated by recent results in sparse signal
representation, which suggest that the linear relationships
among high-resolution signals can be accurately recovered
from their low-dimensional projections [16], [17]. Although
the super-resolution problem is very ill-posed, making precise
recovery impossible, the image patch sparse representation
demonstrates both effectiveness and robustness in regularizing
the inverse problem.
a) Basic Ideas: To be more precise, let $\mathbf{D} \in \mathbb{R}^{n \times K}$
be an overcomplete dictionary of $K$ atoms ($K > n$), and
suppose a signal $\mathbf{x} \in \mathbb{R}^{n}$ can be represented as a sparse linear
combination with respect to $\mathbf{D}$. That is, the signal $\mathbf{x}$ can be
written as $\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}_0$, where $\boldsymbol{\alpha}_0 \in \mathbb{R}^{K}$ is a vector with
very few ($\ll n$) nonzero entries. In practice, we might only
observe a small set of measurements $\mathbf{y}$ of $\mathbf{x}$:
$$\mathbf{y} \doteq L\mathbf{x} = L\mathbf{D}\boldsymbol{\alpha}_0, \tag{1}$$
where $L \in \mathbb{R}^{k \times n}$ with $k < n$ is a projection matrix. In our
super-resolution context, $\mathbf{x}$ is a high-resolution image (patch),
while $\mathbf{y}$ is its low-resolution counterpart (or features extracted
from it). If the dictionary $\mathbf{D}$ is overcomplete, the equation
$\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}$ is underdetermined for the unknown coefficients $\boldsymbol{\alpha}$.
Fig. 1. Reconstruction of a raccoon face with magnification factor 2. Left:
result by our method. Right: the original image. There is little noticeable
difference visually even for such a complicated texture. The RMSE for the
reconstructed image is 5.92 (only the local patch model is employed).
The equation $\mathbf{y} = L\mathbf{D}\boldsymbol{\alpha}$ is even more dramatically underdetermined.
Nevertheless, under mild conditions, the sparsest
solution $\boldsymbol{\alpha}_0$ to this equation will be unique. Furthermore, if
$\mathbf{D}$ satisfies an appropriate near-isometry condition, then for
a wide variety of matrices $L$, any sufficiently sparse linear
representation of a high-resolution image patch $\mathbf{x}$ in terms
of $\mathbf{D}$ can be recovered (almost) perfectly from the
low-resolution image patch [17], [18].¹ Fig. 1 shows an example
that demonstrates the capabilities of our method derived from
this principle. The image of the raccoon face is blurred and
downsampled to half of its original size in both dimensions.
Then we zoom the low-resolution image to the original size
using the proposed method. Even for such a complicated
texture, sparse representation recovers a visually appealing
reconstruction of the original signal.
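To make the underdetermination concrete, here is a minimal NumPy sketch. The sizes, the random Gaussian dictionary, and the seed are illustrative assumptions, not values from the paper. It builds a signal from a sparse code and compares that code with the minimum-norm solution of the same linear system. Both reproduce the signal, but only one is sparse, which is why an explicit sparsity criterion is needed to single out the intended coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, s = 8, 32, 3                      # signal dim, number of atoms, sparsity (illustrative)

D = rng.standard_normal((n, K))         # hypothetical overcomplete dictionary, K > n
alpha0 = np.zeros(K)
support = rng.choice(K, size=s, replace=False)
alpha0[support] = rng.uniform(1.0, 2.0, size=s)   # s nonzero coefficients
x = D @ alpha0                          # x = D alpha_0

# Minimum-l2-norm solution of the underdetermined system D alpha = x.
alpha_mn = np.linalg.pinv(D) @ x

nnz_sparse = int(np.count_nonzero(alpha0))
nnz_min_norm = int(np.count_nonzero(np.abs(alpha_mn) > 1e-8))
print("both reproduce x:", np.allclose(D @ alpha0, x), np.allclose(D @ alpha_mn, x))
print("nonzero entries:", nnz_sparse, "(sparse code) vs", nnz_min_norm, "(minimum-norm)")
```

The minimum-norm solution spreads energy over many atoms, so it says little about which dictionary elements actually generated the patch.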
Recently sparse representation has been successfully applied
to many other related inverse problems in image processing,
such as denoising [19] and restoration [20], often improving on
the state-of-the-art. For example in [19], the authors use the
K-SVD algorithm [21] to learn an overcomplete dictionary
from natural image patches and successfully apply it to the
image denoising problem. In our setting, we do not directly
compute the sparse representation of the high-resolution patch.
Instead, we will work with two coupled dictionaries, $\mathbf{D}_h$ for
high-resolution patches, and $\mathbf{D}_l$ for low-resolution ones. The
sparse representation of a low-resolution patch in terms of
$\mathbf{D}_l$ will be directly used to recover the corresponding
high-resolution patch from $\mathbf{D}_h$. We obtain a locally consistent
solution by allowing patches to overlap and demanding that the
reconstructed high-resolution patches agree on the overlapped
areas. In this paper, we try to learn the two overcomplete
dictionaries in a probabilistic model similar to [22]. To enforce
that the image patch pairs have the same sparse representations
with respect to $\mathbf{D}_h$ and $\mathbf{D}_l$, we learn the two dictionaries
simultaneously by concatenating them with proper normalization.
The learned compact dictionaries will be applied to
both generic image super-resolution and face hallucination to
demonstrate their effectiveness.
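The effect of concatenating the two dictionaries can be illustrated with a short NumPy sketch. The random matrices below are stand-ins for learned dictionaries. The 1/sqrt(dimension) scaling is one plausible normalization, assumed here for illustration, as are all sizes and index choices. The point is that a single code applied to the stacked dictionary yields a consistent high-resolution and low-resolution pair.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 25, 9, 64                     # HR patch dim, LR feature dim, atoms (illustrative)

D_h = rng.standard_normal((N, K))       # stand-in for a learned high-resolution dictionary
D_l = rng.standard_normal((M, K))       # stand-in for a learned low-resolution dictionary

# Stack the two dictionaries. The 1/sqrt(dim) scaling is an assumed normalization
# that keeps the HR and LR blocks on a comparable scale.
D_c = np.vstack([D_h / np.sqrt(N), D_l / np.sqrt(M)])

alpha = np.zeros(K)
alpha[[3, 17, 42]] = [1.5, -0.7, 0.9]   # one shared sparse code

z = D_c @ alpha                         # stacked (HR, LR) vector
x = z[:N] * np.sqrt(N)                  # HR part after undoing the scaling
y = z[N:] * np.sqrt(M)                  # LR part after undoing the scaling

print(np.allclose(x, D_h @ alpha), np.allclose(y, D_l @ alpha))
```

Because each atom of the stacked dictionary carries both a high-resolution and a low-resolution part, any training procedure that fits sparse codes to stacked patch pairs ties the two representations together by construction.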
¹ Even though the structured projection matrix defined by blurring and
downsampling in our SR context does not guarantee exact recovery of $\boldsymbol{\alpha}_0$,
empirical experiments indeed demonstrate the effectiveness of such a sparse
prior for our SR tasks.
Compared with the aforementioned learning-based methods,
our algorithm requires only two compact learned dictionaries,
instead of a large training patch database. The computation,
mainly based on linear programming or convex optimization,
is much more efficient and scalable, compared with [9], [10],
[11]. The online recovery of the sparse representation uses the
low-resolution dictionary only – the high-resolution dictionary
is used to calculate the final high-resolution image. The
computed sparse representation adaptively selects the most
relevant patch bases in the dictionary to best represent each
patch of the given low-resolution image. This leads to superior
performance, both qualitatively and quantitatively, compared
to the method described in [11], which uses a fixed number
of nearest neighbors, generating sharper edges and clearer
textures. In addition, the sparse representation is robust to
noise as suggested in [19], and thus our algorithm is more
robust to noise in the test image, while most other methods
cannot perform denoising and super-resolution simultaneously.
b) Organization of the Paper: The remainder of this
paper is organized as follows. Section II details our formula-
tion and solution to the image super-resolution problem based
on sparse representation. Specifically, we study how to apply
sparse representation for both generic image super-resolution
and face hallucination. In Section III, we discuss how to
learn the two dictionaries for the high- and low-resolution
image patches respectively. Various experimental results in
Section IV demonstrate the efficacy of sparsity as a prior for
regularizing image super-resolution.
c) Notations: X and Y denote the high- and low-resolution
images respectively, and x and y denote the high-
and low-resolution image patches respectively. We use bold
uppercase $\mathbf{D}$ to denote the dictionary for sparse coding;
specifically, we use $\mathbf{D}_h$ and $\mathbf{D}_l$ to denote the dictionaries
for high- and low-resolution image patches respectively. Bold
lowercase letters denote vectors. Plain uppercase letters denote
regular matrices, e.g., S is used as a downsampling operation
in matrix form. Plain lowercase letters are used as scalars.
II. IMAGE SUPER-RESOLUTION FROM SPARSITY
The single-image super-resolution problem asks: given a
low-resolution image Y , recover a higher-resolution image X
of the same scene. Two constraints are modeled in this work
to solve this ill-posed problem: 1) reconstruction constraint,
which requires that the recovered X should be consistent with
the input Y with respect to the image observation model;
and 2) sparsity prior, which assumes that the high resolution
patches can be sparsely represented in an appropriately chosen
overcomplete dictionary, and that their sparse representations
can be recovered from the low resolution observation.
1) Reconstruction constraint: The observed low-resolution
image Y is a blurred and downsampled version of the high
resolution image X:
$$Y = SHX \tag{2}$$
Here, H represents a blurring filter, and S the downsampling
operator.
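The observation model in (2) can be simulated with a few lines of NumPy. This is a minimal sketch under stated assumptions: the 3x3 box kernel, the edge-replicating boundary handling, the decimation factor, and the function names `blur`, `downsample`, and `observe` are all illustrative choices, not the operators used in the paper.

```python
import numpy as np

def blur(X, kernel):
    """Apply H: filter X with an odd-sized, symmetric kernel, replicating edges."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    Xp = np.pad(X, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(X, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * Xp[i:i + X.shape[0], j:j + X.shape[1]]
    return out

def downsample(X, factor):
    """Apply S: keep every factor-th pixel in both dimensions."""
    return X[::factor, ::factor]

def observe(X, kernel, factor):
    """Y = S H X for a single-channel image X."""
    return downsample(blur(X, kernel), factor)

kernel = np.full((3, 3), 1.0 / 9.0)             # assumed box blur
X = np.arange(64, dtype=float).reshape(8, 8)    # toy high-resolution image
Y = observe(X, kernel, factor=2)
print(X.shape, "->", Y.shape)
```

With a factor of 2, Y has a quarter as many pixels as X, so the map from X to Y cannot be inverted without additional prior information.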
Super-resolution remains extremely ill-posed, since for a
given low-resolution input Y , infinitely many high-resolution
images X satisfy the above reconstruction constraint. We
further regularize the problem via the following prior on small
patches x of X:
2) Sparsity prior: The patches $\mathbf{x}$ of the high-resolution
image X can be represented as a sparse linear combination in
a dictionary $\mathbf{D}_h$ trained from high-resolution patches sampled
from training images:
$$\mathbf{x} \approx \mathbf{D}_h \boldsymbol{\alpha} \quad \text{for some } \boldsymbol{\alpha} \in \mathbb{R}^{K} \text{ with } \|\boldsymbol{\alpha}\|_0 \ll K. \tag{3}$$
The sparse representation $\boldsymbol{\alpha}$ will be recovered by representing
patches $\mathbf{y}$ of the input image Y, with respect to a low
resolution dictionary $\mathbf{D}_l$ co-trained with $\mathbf{D}_h$. The dictionary
training process will be discussed in Section III.
We apply our approach to both generic images and face
images. For generic image super-resolution, we divide the
problem into two steps. First, as suggested by the sparsity
prior (3), we find the sparse representation for each local
patch, respecting spatial compatibility between neighbors.
Next, using the result from this local sparse representation,
we further regularize and refine the entire image using the
reconstruction constraint (2). In this strategy, a local model
from the sparsity prior is used to recover lost high-frequency content
for local details. The global model from the reconstruction
constraint is then applied to remove possible artifacts from
the first step and make the image more consistent and natural.
The face images differ from the generic images in that the face
images have more regular structure and thus reconstruction
constraints in the face subspace can be more effective. For
face image super-resolution, we reverse the above two steps
to make better use of the global face structure as a regularizer.
We first find a suitable subspace for human faces, and apply
the reconstruction constraints to recover a medium resolution
image. We then recover the local details using the sparsity
prior for image patches.
The remainder of this section is organized as follows: in
Section II-A, we discuss super-resolution for generic images.
We will introduce the local model based on sparse represen-
tation and global model based on reconstruction constraints.
In Section II-B we discuss how to introduce the global face
structure into this framework to achieve more accurate and
visually appealing super-resolution for face images.
A. Generic Image Super-Resolution from Sparsity
1) Local model from sparse representation: Similar to
the patch-based methods mentioned previously, our algorithm
tries to infer the high-resolution image patch for each
low-resolution image patch from the input. For this local model,
we have two dictionaries $\mathbf{D}_h$ and $\mathbf{D}_l$, which are trained to
have the same sparse representations for each high-resolution
and low-resolution image patch pair. We subtract the mean
pixel value for each patch, so that the dictionary represents
image textures rather than absolute intensities. In the recovery
process, the mean value for each high-resolution image patch
is then predicted by its low-resolution version.
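The mean-removal step can be sketched as follows. The patch size, the grid step, and the helper name `extract_patches` are hypothetical choices for illustration. The sketch only records the removed means so they can be added back later; it does not model how the high-resolution mean is predicted.

```python
import numpy as np

def extract_patches(img, size, step):
    """Return (patches, means): flattened size x size patches on a regular grid,
    each with its mean removed, plus the removed means (one per patch)."""
    H, W = img.shape
    rows = range(0, H - size + 1, step)
    cols = range(0, W - size + 1, step)
    patches = np.array([img[r:r + size, c:c + size].ravel()
                        for r in rows for c in cols], dtype=float)
    means = patches.mean(axis=1, keepdims=True)
    return patches - means, means

img = np.arange(36, dtype=float).reshape(6, 6)           # toy low-resolution image
patches, means = extract_patches(img, size=3, step=2)    # step < size gives overlap
print(patches.shape, means.shape)
```

After mean removal, two patches that differ only by a constant intensity offset map to the same vector, so a dictionary fitted to these vectors describes texture rather than brightness.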
For each input low-resolution patch $\mathbf{y}$, we find a sparse
representation with respect to $\mathbf{D}_l$. The corresponding
high-resolution patch bases $\mathbf{D}_h$ will be combined according to these
coefficients to generate the output high-resolution patch $\mathbf{x}$.
The problem of finding the sparsest representation of $\mathbf{y}$ can
be formulated as:
$$\min \|\boldsymbol{\alpha}\|_0 \quad \text{s.t.} \quad \|F\mathbf{D}_l\boldsymbol{\alpha} - F\mathbf{y}\|_2^2 \le \epsilon, \tag{4}$$
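A rough way to experiment with (4) is a greedy approximation. The sketch below uses orthogonal matching pursuit as a stand-in for the exact minimization, which is a swapped-in heuristic and not necessarily the solver used in the paper. It treats F as a linear operator set to the identity, and builds the low-resolution dictionary by pairwise averaging of a random high-resolution one. All of these choices, and the helper name `greedy_sparse_code`, are assumptions for illustration.

```python
import numpy as np

def greedy_sparse_code(A, y, eps, max_atoms):
    """Add atoms of A one at a time (orthogonal matching pursuit) until
    ||A @ alpha - y||_2^2 <= eps or max_atoms atoms have been selected."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0
    alpha = np.zeros(A.shape[1])
    support = []
    residual = y.astype(float).copy()
    while residual @ residual > eps and len(support) < max_atoms:
        corr = np.abs(A.T @ residual) / norms
        if support:
            corr[support] = -np.inf                  # never reselect an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    if support:
        alpha[support] = coef
    return alpha

rng = np.random.default_rng(2)
n_h, n_l, K = 16, 8, 40                              # illustrative sizes
D_h = rng.standard_normal((n_h, K))                  # stand-in HR dictionary
L = np.kron(np.eye(n_l), np.array([[0.5, 0.5]]))     # averages neighbouring pairs: 16 -> 8
D_l = L @ D_h                                        # coupled LR dictionary

alpha0 = np.zeros(K)
alpha0[[5, 21]] = [1.0, -2.0]
y = L @ (D_h @ alpha0)                               # observed low-resolution patch

F = np.eye(n_l)                                      # feature operator; identity assumed
eps = 1e-6
alpha = greedy_sparse_code(F @ D_l, F @ y, eps, max_atoms=n_l)
x_hat = D_h @ alpha                                  # high-resolution estimate from the LR code

print("atoms used:", np.count_nonzero(alpha))
print("constraint residual:", float(np.sum((F @ D_l @ alpha - F @ y) ** 2)))
```

The code found on the low-resolution side is reused unchanged with the high-resolution dictionary. The greedy search is not guaranteed to find the sparsest feasible code, so `x_hat` should be read as an estimate rather than an exact reconstruction.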