3454 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 50, NO. 9, SEPTEMBER 2012
majorization–minimization [18] and TwIST algorithms [19].
The EM algorithm is an iterative procedure developed to max-
imize the likelihood function corresponding to the observation
model. Each iteration consists of two steps: an E-step solving
the deblurring and an M-step managing the noise. Duijster et al.
extended such a technique to the restoration of MS images [3].
An even more recent regularization strategy is based on the total
variation minimization [20]–[22].
When more than one observation is available, a popular
technique is image fusion. As a prototype problem, usually, an
image of high spectral resolution is combined with an image of
high spatial resolution to obtain an image of optimal resolutions
both spectrally and spatially.
Most fusion techniques were developed for the specific pur-
pose of enhancing MS images by using a panchromatic (Pan)
image of higher spatial resolution (also referred to as pansharp-
ening). Principal component analysis (PCA)-based [23], [24]
and intensity–hue–saturation (IHS)-transform-based [25]–[27]
techniques are the most commonly used ones. The Pan image
is applied to totally or partially substitute the first principal
component or intensity component of the coregistered and
resampled MS image. To generalize to more than three bands
and to reduce spectral degradation, generalized IHS transforms
[28] and generalized intensity modulation techniques [29] were
defined. High-pass filtering and high-pass modulation tech-
niques were developed [23], [24], [30], in which spatial high-
frequency information is extracted and injected adequately into
each band of the MS image. With the rise of multiresolu-
tion analysis, many researchers have proposed pansharpening
techniques, using Gaussian and Laplacian pyramids as well as
discrete decimated and undecimated WTs [31]–[33]. Detailed
description and comparison of these techniques are given in
[34] and [35].
As a more general problem, consider that an MS image
of high spatial resolution and an HS image of lower spatial
resolution are available. Because the high-spatial-resolution
MS image is multiband, the pansharpening techniques cannot
be applied directly to the problem of MS and HS image fusion.
Usually, a spatial high-frequency component of the MS image
is extracted (by PCA, IHS, etc.) first and then fused with the HS
image, which may lead to spectral distortion.
A technique using a 2-D WT was proposed by Gomez et al.
[2]. Zhang and He [4] proposed an MS and HS image fusion
technique by using a 3-D WT where the third dimension is
the spectral one. In both approaches, the MS and HS images
were spectrally and spatially resampled apriori. These two ap-
proaches were both capable of enhancing the spatial resolution
of the HS image effectively. However, the performance was
highly dependent on the spectral resampling methods adopted.
Some researchers proposed statistical estimation techniques
for MS and HS image fusion. In [36], a specific method
was developed for the enhancement of the thermal band of a
Landsat image, for which the resolution difference with the
other bands is assumed to be known. Since only one band was
envisaged, spectral resampling was not an issue, while spatial
resampling was performed according to the known resolution
difference. The method in [5] employed MAP estimation based
on a spatially varying statistical model to enhance the HS
image resolution. The framework developed was validated for
pansharpening but allowed for any number of spectral bands in
both the MS and HS images. Extensions of this work applied
an extended stochastic mixing model [6], [7]. These statistical
estimation methods defined an observation model of the HS
image and a model between the high-resolution MS and HS
images. This leads to a complex framework with assumptions
on the knowledge of the spectral and spatial response functions
and the need to solve an incomplete deconvolution problem.
Although additive noise was included in the imaging model,
the technique worked in the image domain, so that the noise
handling resorted to a Wiener type of filtering, which is known
to be less effective. In [8], the technique was applied in the
wavelet domain, proven to be much more robust to noise.
However, a correct treatment of the restoration problem is still
lacking.
In this work, we treat the problem of enhancing a low-spatial-
resolution HS observation with a high-spatial-resolution MS
observation as auxiliary. We will refer to the former as the HS
image and the latter as the MS image. We propose a Bayesian
approach in which the estimation is performed by assuming an
observation model for the HS image and a joint statistical model
between the HS and MS images. The estimation problem is
separated into a deblurring problem and a denoising problem
and solved in an iterative way using the EM algorithm. As
a result, an expression is obtained, which is a mixture of a
restoration of the HS image and a fusion of the HS and MS
images. Simulation experiments are employed for verification
and comparison.
In the next section, the problem is described, the estimation
framework is elaborated, and a practical implementation is pro-
vided. In Section III, simulation experiments with a reference
are performed, in which the method is evaluated against the
pure deconvolution and image fusion. Finally, the conclusions
are given in Section IV.
II. EM-B
ASED BAY ESI AN RESTORATION
APPROACH FOR HS IMAGE
A. Problem Description and Previous Work
The general problem discussed in this paper is to describe
a scene S by the solar reflectance with specific spatial and
spectral details. Assume that a number of Q observed images
{M
1
, M
2
,...,M
Q
} are available, each with specific spatial
and spectral resolutions. As a prototype problem, we will
consider the case where a low-spatial-resolution high-spectral-
resolution observation M
1
(HS image) is available, together
with a high-spatial-resolution low-spectral-resolution observa-
tion M
2
(MS image).
Although the two observations may be presented at different
spatial and spectral sampling rates, in this paper, we will assume
that all images are equally spatially sampled at a grid of N
pixels, sufficiently fine to reveal the spatial resolution of S .
Since we will concentrate on the optimization of the spatial
resolution and no spectral enhancement will be performed,
we assume that the spectral resolution of M
1
is optimal so
that the spectral sampling rate of M
1
is sufficient. Each im-
age is presented in band-interleaved-by-pixel lexicographical