Interactive Segmentation on RGBD Images via Cue Selection
Jie Feng^1, Brian Price^2, Scott Cohen^2, Shih-Fu Chang^1
^1 Columbia University   ^2 Adobe Research
{jiefeng, sfchang}@cs.columbia.edu   {bprice, scohen}@adobe.com
Abstract
Interactive image segmentation is an important problem
in computer vision with many applications including im-
age editing, object recognition and image retrieval. Most
existing interactive segmentation methods operate only on
color images. Only recently have a few works been proposed
that leverage depth information from low-cost sensors
to improve interactive segmentation. While these methods
achieve better results than color-based methods, they are
still limited to either using depth as an additional color
channel or simply combining depth with color linearly.
We propose a novel interactive segmentation algorithm
that incorporates multiple feature cues, such as color,
depth, and normals, in a unified graph cut framework
to leverage these cues more effectively. A key contribution of
our method is that it automatically selects a single cue to be
used at each pixel, based on the intuition that only one cue is
necessary to determine the segmentation label locally. This
is achieved by optimizing over both segmentation labels and
cue labels, using terms designed to decide where both the
segmentation and the cue labels should change. Our algorithm
thus produces not only the segmentation mask but also a cue
label map that indicates where each cue contributes to the
final result. Extensive experiments on five large-scale RGBD
datasets show that our proposed algorithm performs significantly
better than other color-based and RGBD-based
algorithms, both in reducing the amount of user input and in
increasing segmentation accuracy.
1. Introduction
Binary image segmentation is the process of separating
pixels into foreground and background. It is an important
problem for many computer vision applications, e.g. image
editing, object recognition, image retrieval, etc. Automatic
segmentation is intrinsically ambiguous and thus cannot ob-
tain satisfactory results on an arbitrary image without any
high-level understanding of the content. On the other hand,
interactive image segmentation allows a user to tell the
algorithm what should and should not be selected.

Figure 1: Example foreground/background cases. (a) complex
appearance, clean depth separation; (b) touching surface;
(c) same surface, different appearance; (d) touching
surface, similar appearance, background clutter.

An ideal interactive segmentation algorithm should:
1) require a minimal amount of user interaction; 2) achieve
good accuracy. However, the
colors in an image are affected by illumination, appearance,
occlusion, etc., making them less reliable for the segmentation
task. As a result, significant user effort is still
necessary to achieve satisfactory results on complex
images.
Recent years have witnessed the emergence of low-cost
depth sensors, such as Microsoft Kinect, Intel Realsense
and Google Project Tango. These sensors are able to ac-
quire a depth image which captures the physical distance
of the scene to the camera at each pixel. This information
is very useful for image segmentation but is lost in the color
imaging process. Besides depth, other feature cues can also
be extracted from a depth image to describe the scene, e.g.
a normal map, 3D point cloud, or mesh structure. Together
with the paired RGB color image, an RGBD image opens the
possibility of combining multiple complementary cues for
interactive segmentation, reducing user input
while maintaining or even improving accuracy.
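To make the idea of per-pixel cue selection concrete, the joint optimization sketched in the abstract can be written (in notation of our own choosing, as a simplified illustration rather than the paper's exact objective) as minimizing an energy over both segmentation labels and cue labels:

E(S, C) = \sum_{p} D_p(s_p \,;\, c_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(s_p, s_q) + \mu \sum_{(p,q) \in \mathcal{N}} W_{pq}(c_p, c_q),

where s_p \in \{fg, bg\} is the segmentation label and c_p the selected cue (e.g. color, depth, or normal) at pixel p, the data term D_p scores s_p using only cue c_p, and the two pairwise terms over neighboring pixels \mathcal{N} encourage spatial coherence in the segmentation and in the cue map, respectively.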
Despite this, few works [5, 6] have been published on
interactive segmentation of RGBD images. These works
either treat depth as an additional color channel or simply
perform a global linear combination of different cue confidences
to produce the final result, which allows them to achieve
better performance than using color alone. However, objects
and scenes in the real world are complicated, as shown
in Fig. 1. Mixing depth and color cues in a fixed way can