Interactive Segmentation on RGBD Images via Cue Selection
Jie Feng^1, Brian Price^2, Scott Cohen^2, Shih-Fu Chang^1
^1 Columbia University   ^2 Adobe Research
{jiefeng, sfchang}@cs.columbia.edu   {bprice, scohen}@adobe.com
Abstract
Interactive image segmentation is an important problem
in computer vision with many applications including im-
age editing, object recognition and image retrieval. Most
existing interactive segmentation methods operate only on
color images. Only recently have a few works been proposed
that leverage depth information from low-cost sensors
to improve interactive segmentation. While these methods
achieve better results than color-based methods, they are
still limited to either using depth as an additional color
channel or simply combining depth with color linearly.
We propose a novel interactive segmentation algorithm
that incorporates multiple feature cues, such as color,
depth, and normals, in a unified graph cut framework
to leverage these cues more effectively. A key contribution of
our method is that it automatically selects a single cue to be
used at each pixel, based on the intuition that only one cue is
necessary to determine the segmentation label locally. This
is achieved by optimizing over both segmentation labels and
cue labels, using terms designed to decide where both the
segmentation and the cue labels should change. Our algorithm
thus produces not only the segmentation mask but also a cue
label map that indicates where each cue contributes to the
final result. Extensive experiments on five large-scale RGBD
datasets show that our proposed algorithm performs significantly
better than other color-based and RGBD-based
algorithms, both in reducing the amount of user input and in
increasing segmentation accuracy.
1. Introduction
Binary image segmentation is the process of separating
pixels into foreground and background. It is an important
problem for many computer vision applications, e.g. image
editing, object recognition, image retrieval, etc. Automatic
segmentation is intrinsically ambiguous and thus cannot ob-
tain satisfactory results on an arbitrary image without any
high-level understanding of the content. On the other hand,
interactive image segmentation allows a user to tell the
algorithm what should and should not be selected.

Figure 1: Example foreground/background cases. (a) complex
appearance, clean depth separation; (b) touching surface;
(c) same surface, different appearance; (d) touching
surface, similar appearance, background clutter.

An ideal interactive segmentation algorithm should:
1) require a minimal amount of user interaction; 2) achieve
good accuracy. However, the
colors in an image are affected by illumination, appearance,
occlusion, etc., making them less reliable for the segmentation
task. As a result, significant user effort is still
necessary to achieve satisfactory results on complex
images.
Recent years have witnessed the emergence of low-cost
depth sensors, such as Microsoft Kinect, Intel Realsense
and Google Project Tango. These sensors are able to ac-
quire a depth image which captures the physical distance
of the scene to the camera at each pixel. This information
is very useful for image segmentation but is lost in the color
imaging process. Besides depth, other feature cues can also
be extracted from a depth image to describe the scene, e.g.
a normal map, 3D point cloud, or mesh structure. Together
with the paired RGB color image, an RGBD image opens the
possibility of combining multiple complementary cues for
interactive segmentation, reducing user input
while maintaining or even improving accuracy.
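To make the idea of per-pixel cue selection concrete, the joint optimization sketched in the abstract can be written (in notation of our own choosing, as a simplified illustration rather than the paper's exact objective) as minimizing an energy over both segmentation labels and cue labels:

E(S, C) = \sum_{p} D_p(s_p \,;\, c_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(s_p, s_q) + \mu \sum_{(p,q) \in \mathcal{N}} W_{pq}(c_p, c_q),

where s_p \in \{fg, bg\} is the segmentation label and c_p the selected cue (e.g. color, depth, or normal) at pixel p, the data term D_p scores s_p using only cue c_p, and the two pairwise terms over neighboring pixels \mathcal{N} encourage spatial coherence in the segmentation and in the cue map, respectively.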
Despite this, few works [5, 6] have been published on
interactive segmentation of RGBD images. These works
either treat depth as an additional color channel or simply
perform a global linear combination of different cue confidences
to produce the final result, which allows them to achieve
better performance than using color alone. However, objects
and scenes in the real world are complicated, as shown
in Fig. 1. Mixing depth and color cues in a fixed way can