imizing a cost function. Section 2.1 shows how the GCRF
estimator can be motivated as a probabilistic model, while
Section 2.3 shows how it can be motivated as the minimum
of a quadratic cost function.
2.1. Basic Gaussian Conditional Random Field Model
Before presenting the actual Gaussian Conditional Ran-
dom Field (GCRF) model used to denoise images, this sec-
tion will first describe a GCRF model for images that suffers
from the problem of oversmoothing. Section 2.2 will then
show how this model can be improved to handle sharp edges
and give much better results.
This GCRF model is defined by a set of linear features that are convolution kernels. For a set of features, $f_1 \ldots f_{N_f}$, the probability density of an image, $X$, conditioned on the observed image, $O$, is defined to be

$$p(X) = \frac{1}{Z} \exp\left( -\sum_{i=1}^{N_f} \sum_{x,y} \big( (X \ast f_i)(x, y) - r_i(x, y; O) \big)^2 \right), \qquad (1)$$
where $(X \ast f_i)(x, y)$ denotes the value at location $(x, y)$ in the image produced by convolving $X$ with $f_i$. We assume that this image only contains those pixels where the entire filter fits in $X$. This corresponds to "valid" convolution in MATLAB. The function $r_i(x, y; O)$ contains the estimated value of $(X \ast f_i)(x, y)$. For each feature $f_i$, the function $r_i$ uses the observed image $O$ to estimate the value of the filter response at each pixel. In the rest of this paper, $r_i(x, y; O)$ will be shortened to $r_i$ for conciseness.
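As a concrete illustration (not from the paper itself), the exponent of Equation 1 can be evaluated directly with off-the-shelf "valid" convolution. The sketch below assumes NumPy/SciPy; the helper name `gcrf_exponent` is hypothetical:

```python
import numpy as np
from scipy.signal import convolve2d

def gcrf_exponent(X, filters, targets):
    """Negative exponent of Eq. 1: sum of squared residuals between
    each 'valid' filter response of X and its estimated response r_i."""
    total = 0.0
    for f_i, r_i in zip(filters, targets):
        # mode='valid' keeps only pixels where the whole kernel fits in X,
        # matching MATLAB's conv2(X, f, 'valid')
        resp = convolve2d(X, f_i, mode='valid')
        total += np.sum((resp - r_i) ** 2)
    return total

# toy check: one horizontal-derivative filter, with targets set to the
# true responses of X, so the exponent is zero at X itself
X = np.arange(16.0).reshape(4, 4)
f = np.array([[1.0, -1.0]])
r = convolve2d(X, f, mode='valid')
print(gcrf_exponent(X, [f], [r]))  # 0.0
```

Note that `scipy.signal.convolve2d` performs true convolution (kernel flipped), consistent with MATLAB's `conv2`.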
The exponent,

$$\sum_{i=1}^{N_f} \sum_{x,y} \big( (X \ast f_i)(x, y) - r_i \big)^2, \qquad (2)$$
can be written in matrix form by creating a set of matrices $F_1 \ldots F_{N_f}$. Each matrix $F_i$ performs the same set of linear operations as convolving an image with a filter $f_i$. In other words, if $\hat{X}$ is a vector created by unwrapping the image $X$, then $F_i \hat{X}$ is identical to $X \ast f_i$. These matrices can then be stacked and Equation 2 can be rewritten as
$$(F\hat{X} - r)^T (F\hat{X} - r), \qquad (3)$$

where

$$F = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_{N_f} \end{bmatrix} \quad \text{and} \quad r = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_{N_f} \end{bmatrix}.$$
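The correspondence between filtering and matrix multiplication can be made concrete with a small sketch (in Python with NumPy/SciPy; the helper `conv_matrix` is hypothetical, and a dense matrix is used only for clarity, where a real system would use a sparse one):

```python
import numpy as np
from scipy.signal import convolve2d

def conv_matrix(f, shape):
    """Dense matrix F_i such that F_i @ X.ravel() equals the 'valid'
    convolution of X with f, flattened row-major."""
    H, W = shape
    h, w = f.shape
    rows = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            row = np.zeros((H, W))
            # true convolution flips the kernel
            row[y:y + h, x:x + w] = f[::-1, ::-1]
            rows.append(row.ravel())
    return np.array(rows)

X = np.random.rand(5, 5)
f = np.array([[1.0, -1.0]])
F1 = conv_matrix(f, X.shape)
direct = convolve2d(X, f, mode='valid').ravel()
print(np.allclose(F1 @ X.ravel(), direct))  # True
```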
Equation 3 can be rewritten in the standard form of the exponent of a multivariate normal distribution by setting the precision matrix, $\Lambda^{-1}$, to be $F^T F$ and the mean, $\mu$, to $(F^T F)^{-1} F^T r$.
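Since $\mu = (F^T F)^{-1} F^T r$ is exactly the least-squares solution of $F\hat{X} \approx r$, it can be computed without forming an explicit inverse. A minimal sketch with toy data (NumPy, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# stacked feature matrix F (rows of all F_i) and targets r, as toy data
F = rng.standard_normal((40, 16))   # e.g. filter responses of a 4x4 image
r = rng.standard_normal(40)

# mean of the Gaussian: mu = (F^T F)^{-1} F^T r, solved stably via
# least squares rather than an explicit matrix inverse
mu, *_ = np.linalg.lstsq(F, r, rcond=None)

# mu satisfies the normal equations F^T F mu = F^T r
print(np.allclose(F.T @ F @ mu, F.T @ r))  # True
```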
Note that the difference between $(\hat{X} - \mu)^T \Lambda^{-1} (\hat{X} - \mu)$ and Equation 3 is the constant $r^T r - \mu^T \Lambda^{-1} \mu$. However, this value is constant for all $X$, so it only affects the normalization constant, $Z$, and the distributions defined using Equation 1 or $\Lambda$ and $\mu$ are identical.
2.2. Utilizing Weights
One of the weaknesses of the GCRF model in Equation
1 is that it does not properly capture the statistics of nat-
ural images. Natural images tend to have smooth regions
interrupted by sharp discontinuities. The Gaussian model
imposes smoothness, but penalizes sharp edges. In appli-
cations like image denoising, optic flow, and range estima-
tion, this violates the actual image statistics and leads to
oversmoothing.
One way to avoid this oversmoothing is to use non-
convex, robust potential functions, such as those used in
[18] and [12]. Unfortunately, the convenience of the
quadratic model is lost when using these models.
Alternatively, the quadratic model can be modified by assigning weights to the various terms in the sum. For example, if the filters, $f_1 \ldots f_{N_f}$, include derivative filters, the basic GCRF model penalizes the strong derivatives around sharp edges. This causes the images estimated from basic GCRF models to have blurry edges. Using the observed image to lower the weight for those derivatives that cross edges will preserve the sharpness of the estimated image. It should be noted that this example is illustrative. Section 3 will show how to use training data to learn the relationship between the observed image and the weights.
Adding weights to the exponent of Equation 1 can be expressed formally by modifying Equation 2:

$$\sum_{i=1}^{N_f} \sum_{x,y} w_i(x, y; O, \theta) \big( (X \ast f_i)(x, y) - r_i \big)^2, \qquad (4)$$

where $w_i(x, y; O, \theta)$ is a weighting function that uses the observed image, $O$, to assign a positive weight to each quadratic term in Equation 4.
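The effect of the weights can be seen in a small sketch (NumPy/SciPy; the helper name and toy data are illustrative, not from the paper). Down-weighting the derivative terms that cross an edge removes most of the penalty that the unweighted model would pay for keeping the edge sharp:

```python
import numpy as np
from scipy.signal import convolve2d

def weighted_gcrf_exponent(X, filters, targets, weights):
    """Exponent of Eq. 4: each squared filter-response residual is
    scaled by a per-pixel positive weight w_i(x, y; O, theta)."""
    total = 0.0
    for f_i, r_i, w_i in zip(filters, targets, weights):
        resp = convolve2d(X, f_i, mode='valid')  # 'valid' convolution
        total += np.sum(w_i * (resp - r_i) ** 2)
    return total

# toy image with a sharp vertical edge down the middle
X = np.hstack([np.zeros((4, 2)), np.ones((4, 2))])
f = np.array([[1.0, -1.0]])        # horizontal derivative filter
r = np.zeros((4, 3))               # target: a perfectly smooth response
w_flat = np.ones((4, 3))           # uniform weights penalize the edge
w_edge = np.ones((4, 3))
w_edge[:, 1] = 0.25                # down-weight the terms crossing the edge
print(weighted_gcrf_exponent(X, [f], [r], [w_flat]))  # 4.0
print(weighted_gcrf_exponent(X, [f], [r], [w_edge]))  # 1.0
```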
In the denoising example that is described in Section 5, the weight function associated with each filter is based on the absolute response of the observed image to a set of multi-scale oriented edge and bar filters. These filters will be shown later in Figure 2. These filters are designed to be sensitive to edges. The underlying idea is that these filter responses will enable the system to guess where edges occur in the image and reduce the smoothness constraints appropriately.
Assigning a weight to each quadratic term improves the
model’s ability to handle strong edges. For a term based on
a derivative filter, the weight could be increased in flat re-
gions of the image and decreased along edges. This would
enable the model to smooth out noise, while preserving
sharp edges. Section 3 will discuss how to learn the pa-
rameters, θ, of this weighting function.
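With weights added, the minimizer of the exponent becomes a weighted least-squares problem: collecting the weights into a diagonal matrix $W$, the mean satisfies $F^T W F \mu = F^T W r$. A minimal sketch with toy data (NumPy; not from the paper) solves this by rescaling the rows by $\sqrt{w_i}$:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((30, 9))   # stacked filter matrices (toy data)
r = rng.standard_normal(30)
w = rng.uniform(0.1, 1.0, 30)      # positive per-term weights

# minimizing sum_i w_i * ((F x - r)_i)^2 is ordinary least squares
# on rows scaled by sqrt(w_i)
sw = np.sqrt(w)
x_map, *_ = np.linalg.lstsq(sw[:, None] * F, sw * r, rcond=None)

# x_map satisfies the weighted normal equations F^T W F x = F^T W r
print(np.allclose(F.T @ (w * (F @ x_map)), F.T @ (w * r)))  # True
```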
The potential difficulty in using this model is the fact that the weights must be computed from the observed image. This will be problematic if the observed image is too noisy or the necessary information is not easy to compute. Fortunately, as Section 5 will show, the system is able to perform well at high noise levels when denoising images. This indicates that good weights can still be computed in high-noise problems.