CONTENTS
I Introduction
II Background
II-A Manifold Ranking
II-B Regularized Random Walks Ranking
III The Proposed Algorithm
III-A Multilevel Graph Model
III-B Background Saliency Estimation
III-C Foreground Saliency Estimation
III-D Saliency Map Formulation by Regularized Random Walks Ranking
III-E Otsu Binarization
IV Experiment Results
IV-A Evaluation Metrics
IV-B Examination of Design Options
IV-C Comparison with State-of-the-art
V Conclusion
Robust Saliency Detection Algorithm via
Multi-Level Graph Structure and Accurate
Background Queries Selection
Ziyu Shu, Yongan Shu
a) Abstract: In the field of saliency detection, many graph-based algorithms use boundary
pixels as background seeds to estimate the background and foreground saliency, which leads
to significant errors in some images. In addition, local context with high contrast can
mislead the algorithms. In this paper, we propose a novel multilevel bottom-up saliency detection
approach that accurately utilizes the boundary information and takes advantage of both
region-based features and local image details. To provide more accurate saliency estimations, we
build a three-level graph model to capture both region-based features and local image details.
Using the superpixels of all four boundaries, we first roughly identify the foreground superpixels.
After calculating the RGB distances between the average of the foreground superpixels and every
boundary superpixel, we discard the boundary superpixels with the longest distance to get a
set of accurate background boundary queries. Finally, we apply regularized random walks
ranking to formulate pixel-wise saliency maps. Experimental results on two public datasets indicate
significantly improved accuracy and robustness of our proposed algorithm in comparison with
7 state-of-the-art saliency detection approaches.
b) Index terms: saliency detection, manifold ranking, regularized random walks ranking,
accurate background queries selection, multi-level graph structure.
Anhui Provincial Natural Science Foundation funded
project (1408085MF125).
I. INTRODUCTION
One of the most important capabilities of the human visual system is
to pick out salient objects from a complicated visual scene. This
capability is also very important for computational visual systems.
With it, a visual system can identify and process the most salient
objects first, which saves considerable time and avoids the
information overload problem. Saliency detection imitates the human
visual system by identifying the most salient parts of an image and
neglecting the remaining parts. It has been widely applied to
numerous vision problems, including image segmentation [1], object
recognition [2], image compression [3], and content-based image
retrieval [4].
In computer vision, both bottom-up models [1, 5–10] and top-down
models [4, 11–13] can detect a salient object. Top-down models
analyze task-driven visual attention, which often entails supervised
learning with class labels from a large set of training examples
[13, 14]. Bottom-up models are fast, data-driven, and pre-attentive
[10]; they typically model saliency by visual distinctness or rarity
using low-level image information such as contrast, color, texture,
and boundary. Bottom-up models are usually faster to execute and
easier to adapt to various images than top-down models. In this
paper, we propose a bottom-up model to detect salient objects in
images.
Ziyu Shu is with NYU Langone Medical Center, New York
University, USA, email: zs919@nyu.edu.
Yongan Shu is the corresponding author and is with the
Computer Science and Technology Institute, Anhui University,
Anhui 230039, China, email: shuya@mail.ustc.edu.cn.
Existing methods typically use visual cues of foreground objects for
saliency detection, e.g., color [5, 15], distinct patterns [16], and
focus [17]. Recently, methods that use background cues, especially
the boundaries of images, to detect salient objects have been
developed [10, 18, 19]. These methods use three or four boundaries
of the image as background queries to detect salient objects. For
saliency detection, an image is represented by a set of nodes to be
labeled, and the labeling task is transformed into an energy
minimization problem [20, 21] or a random walks problem [22].
We observe that although background regions usually contain image
boundaries, using boundaries as background queries may also cause
mistakes. On the one hand, salient objects may touch one or two
boundaries (as in a portrait) and cause mistakes when all four
boundaries are used as background queries (Figure 1). On the other
hand, existing methods sometimes select the wrong boundaries
(Figure 2), which makes things even worse. In this work, we first use
all four boundaries as background queries to roughly estimate the
salient objects; then we calculate the RGB distances between
every boundary superpixel and the salient objects
to get the accurate background queries. With
a set of accurate background queries, we can
figure out salient objects better than previous
boundary selection methods.
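
The distance computation behind this selection can be sketched as follows. This is only an illustrative sketch in plain Python, not the authors' implementation: the function names, the dictionary of per-superpixel mean RGB colors, and the use of the Euclidean distance are assumptions made for this example. The sketch only ranks the boundary superpixels by their distance to the mean foreground color; how many to discard, and from which end of the ranking, is left to the caller (the abstract describes discarding the superpixels with the longest distance).

```python
import math


def mean_color(colors):
    """Average a list of (R, G, B) tuples channel by channel."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def rgb_distance(a, b):
    """Euclidean distance between two RGB colors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_boundary_superpixels(boundary_colors, foreground_colors):
    """Rank boundary superpixels by distance to the mean foreground color.

    boundary_colors: dict mapping a superpixel id to its mean (R, G, B).
    foreground_colors: list of mean (R, G, B) colors of the superpixels
        labeled as foreground by the rough first estimate.
    Returns a list of (superpixel_id, distance), closest first.
    """
    fg_mean = mean_color(foreground_colors)
    distances = [
        (sp_id, rgb_distance(color, fg_mean))
        for sp_id, color in boundary_colors.items()
    ]
    return sorted(distances, key=lambda item: item[1])


# Hypothetical toy data: two dark boundary superpixels and one reddish one.
boundary = {0: (10, 10, 10), 1: (200, 50, 50), 2: (12, 9, 11)}
foreground = [(200, 60, 40), (210, 40, 60)]
ranking = rank_boundary_superpixels(boundary, foreground)
```

In this toy example, superpixel 1 is the boundary superpixel closest in color to the rough foreground estimate, while superpixels 0 and 2 are far from it.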
Fig. 1. The shortcoming of using all boundaries as background
queries. Row A: Original image. Row B: Solutions using all
four boundaries as background. Row C: Ground Truth. In the
picture, we can see that using all four boundaries may wrongly
regard a part of background as a salient object (B1) or lose parts
of a salient object (B2, B3, B4, B5).
Fig. 2. The shortcoming of using three boundaries as background
queries. Row A: Original image. Row B: Solutions using three
boundaries as background. Row C: Ground Truth. In the picture, we
can see that this method may wrongly regard a large area of the
background as salient objects (B2, B3) or even come up with a
totally wrong solution (B1, B4).
We also observe that local contrast and textures play important
roles in saliency detection. When the color of the salient object is
similar to the background, local contrast and textures such as edges
can help us identify the salient object. But sometimes they also
mislead algorithms into neglecting the salient object, e.g., eyes on
a person's face or local textures on an object's surface (Figure 3).
To deal with this problem, we construct a multi-level graph structure
to simultaneously capture both local and global structure information
of an image, where we use a Gaussian filter to smooth the input image
at different levels. With a multi-level graph structure, our algorithm
can not only get information from edges and textures, but also
suppress the adverse influence of high local contrast and textures.
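
As a rough illustration of smoothing the input at different levels, the sketch below builds three versions of a single-channel image with increasing Gaussian blur, using only the Python standard library. The sigma values, the border handling, the function names, and the nested-list image format are assumptions chosen for this example; the section does not specify them, and how the graph is then built on each level is not shown here.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    blurred = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def build_levels(image, sigmas=(1.0, 2.0)):
    """Level 0 is the unsmoothed image; later levels are smoother.

    The two sigma values are hypothetical and give three levels in total.
    """
    return [image] + [gaussian_blur(image, s) for s in sigmas]


# A single bright pixel stands in for a small high-contrast texture.
spike = [[0.0] * 9 for _ in range(9)]
spike[4][4] = 1.0
levels = build_levels(spike)
```

In the smoother levels the isolated bright pixel is spread over its neighborhood, which is the effect that lets coarse levels ignore small high-contrast details while the unsmoothed level still retains edges.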
Fig. 3. The error caused by local textures. Row A: Original
image. Row B: Erroneous results caused by local textures. Row
C: Ground Truth.
Specifically, we use both manifold ranking
and regularized random walks ranking [10, 18]
to improve the overall quality of the saliency
map. Figure 4 shows the main steps of the proposed algorithm. In the
first step, we use each of the boundary superpixels as background
queries to roughly label the salient object. In the second step, we
calculate the differences between the salient object and each of the
boundary superpixels and select the superpixels that are most likely
to be background queries. With these more accurate background
queries, we calculate the saliency map again to improve the output.
In the third step, we use the salient object obtained from the first
step as foreground queries to improve the quality of the saliency
map. In the fourth step, we use pixel-wise regularized random walks
ranking to get a new saliency map. In the last step, we use the Otsu
method [23] to binarize the saliency map to get the final result.
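
The final binarization step can be illustrated with a minimal sketch of Otsu's threshold selection, which picks the threshold that maximizes the between-class variance of the histogram. The function names `otsu_threshold` and `binarize`, the 256-bin histogram, and the flat list of saliency values in [0, 1] are assumptions made for this illustration and are not taken from the authors' implementation.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold in (0, 1] that maximizes between-class variance."""
    # Histogram of saliency values, assumed to lie in [0, 1].
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_bin, best_var = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(bins):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0:
            continue  # no values in the lower class yet
        if count_high == 0:
            break     # every value is already in the lower class
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = count_low * count_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, t

    # Values falling in bins 0..best_bin form the lower class.
    return (best_bin + 1) / bins


def binarize(values, bins=256):
    """Label each value 1 (salient) or 0 (background) using Otsu's threshold."""
    threshold = otsu_threshold(values, bins)
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical saliency values: three background and three salient pixels.
saliency = [0.1, 0.12, 0.08, 0.9, 0.85, 0.95]
mask = binarize(saliency)
```

Because the threshold is derived from the histogram of each saliency map, no fixed cut-off has to be tuned per image.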
The main contributions of this work are sum-
marized as follows:
1) We construct a multi-level graph model
to capture the characteristics of both local
textures and long-range spatial connec-
tions between pairwise pixels.
2) We propose a new method to select background
queries more accurately.
3) Experimental results on four benchmark
data sets show that the proposed algo-
rithm performs more efficiently and fa-
vorably than the state-of-the-art saliency
detection methods.
Fig. 4. Main steps of the proposed method. A: Input image,
B: Three level graph model, C: Using every boundary as
background query to estimate saliency map, D: Output of step
C, E: Average of step D, F: Estimating foreground saliency
map, G: Selecting accurate background queries (white area),
H: Calculating background saliency, I: Calculating foreground
saliency map, apply Otsu method.
II. BACKGROUND
In this section, we provide a brief review of the manifold ranking
model and the regularized random walks ranking model as
preliminary knowledge.
A. Manifold Ranking
Manifold ranking was first used in pattern classification [24, 25].
Given a dataset $A = \{a_1, \ldots, a_m, a_{m+1}, \ldots, a_n\}$,
where n is the total number of elements, the first m elements are
the queries and the rest are the unknown elements, which we want to
rank according to their relevance to the queries. Let G = (V, E) denote
the graph structure of A, where the node set V
denotes every point of the data set A, and the
edge set E denotes all the connections of any