Parameter-Free Fast Pixelwise Non-Local Means Denoising
NLM-Pa is that the increase in quality provided by the Gaussian kernel is low, while it requires the
additional parameter a to be tuned. However, it seems that there is no experimental study using
image databases to quantify the effect of the Gaussian kernel. It is a goal of this article to examine
to what extend the previous assertion is valid.
To unify the presentation of NLM-P and NLM-Pa, we may denote by K the kernel used to weight
the norm between patches (K = K
G
a
for NLM-Pa, K = 1/d
2
for NLM-P) and the corresponding
norm will be written k.k
2,K
. Before considering optimization, let us present the basic algorithm.
2 Basic Algorithm for the Pixelwise NLM
The pixelwise NLM method may be implemented in a straightforward manner using the above
equations. To denoise a pixel x ∈ Ω, one has to compute the patch V (y) for all y ∈ Ω
x
and
therefore to scan each pixel z such that ky − zk
∞
≤ d. A window of size (D + d)
2
centered in x is
then to be accessed and this requires adding to the borders of the input image v a strip of width
D−1
2
+
d−1
2
= D
s
+d
s
. As usual in image processing, the padding is performed by mirror reflection and
the resulting symmetrized image will be denoted Vsym in the following pseudo-codes (to assist the
reader, Table 1 lists the main notations specifying those used in the mathematical context and those
used for pseudo and source codes). The kernel K is precomputed as an array of size d
2
using (10)
for NLM-Pa (or as a constant K = 1/d
2
for NLM-P).
The main loop consists in going through each pixel x ∈ Ω and as explained in the next section,
parallelization may be performed at this stage by assigning a specific x to a specific thread. A first
pass of all neighboring pixels y ∈ Ω
x
is performed in order to compute the array of NLM-weights,
using (5) and (9) or (12) and (11). A second pass of all neighboring pixels y ∈ Ω
x
is performed to get
the denoised pixel ˜u(x) using the estimation in (2). Note that, in case of gray-level images, these two
passes can be merged together. A more detailed sketch of this algorithm is proposed as pseudo-code
in Algorithm 1.
From this pseudo-code, one can easily deduce the complexity of the pixelwise NLM in its ba-
sic implementation. Let us count the number of operations following the standard uniform cost
model [27]. Let N = |Ω| = N
1
N
2
be the number of pixels of the input image v, N
c
the number of
color channels, N
S
= (N
1
+ 2(D
s
+ d
s
))(N
2
+ 2(D
s
+ d
s
)) the number of pixels of the symmetrized
image and c a generic constant. The symmetrization of the input image requires cN
s
N
c
operations
and the precomputation of the kernel K, cd
2
operations. Inside the main loop x ∈ Ω, the first pass
of all neighboring pixels y ∈ Ω
x
used to compute NLM-weights requires cN
c
D
2
d
2
operations, while
the second pass that computes the estimator ˜u(x) needs cN
c
D
2
operations only. Therefore and using
the big O notation to hide constant factors and smaller terms, the complexity of the plain pixelwise
NLM is given by that of the first pass of the main loop, that is O(NN
c
D
2
d
2
).
3 Fast Algorithms for the Pixelwise NLM
Different strategies can be deployed to increase the speed of pixelwise NLM. First, it should be
noted that the algorithm is particularly well suited to parallel computing, thanks to the independent
processing of each pixel x ∈ Ω to be denoised. The easiest way to implement parallel computations
is therefore to assign, in the main loop level, a specific pixel x to a specific thread. This allows to
roughly divide the execution time by the number of available threads (the gain is actually a bit lower
due to input/output processes and because of the initialization pass, although this one may also be
parallelized). In the code associated to this article, parallelization is implemented in the main loop
level only, both for the plain and the proposed fast implementations.
303