SUN RGB-D
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Shuran Song, Samuel P. Lichtenberg, Jianxiong Xiao
Motivation
Sensors: Kinect v1, Kinect v2, Asus Xtion, Intel RealSense
Shown per sensor: color, raw depth, refined depth, raw points, refined points
bedroom, classroom, dining room, bathroom, office, home office, conference room, kitchen
2D segmentation, 3D annotation
Effective free space
Outside the room
Inside some objects
Beyond cutoff distance
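The legend above suggests that effective free space is what remains after excluding regions outside the room, inside objects, or beyond a cutoff distance. The following is a minimal sketch of that idea and of an intersection-over-union score between two free-space regions. The voxel-set representation, the camera at the origin, and the function names `effective_free_space` and `free_space_iou` are assumptions made for illustration, not the benchmark's evaluation code.

```python
def effective_free_space(candidates, inside_room, inside_objects, cutoff):
    """Return the voxels that count as free space (illustrative only).

    candidates     -- iterable of (x, y, z) voxel coordinates to consider
    inside_room    -- set of voxels that lie inside the room layout
    inside_objects -- set of voxels occupied by some object
    cutoff         -- maximum distance from the camera, assumed at the origin
    """
    free = set()
    for voxel in candidates:
        x, y, z = voxel
        distance = (x * x + y * y + z * z) ** 0.5
        if voxel in inside_room and voxel not in inside_objects and distance <= cutoff:
            free.add(voxel)
    return free


def free_space_iou(free_a, free_b):
    """Intersection over union of two free-space voxel sets."""
    union = free_a | free_b
    if not union:
        return 0.0  # both regions empty: define the score as 0
    return len(free_a & free_b) / len(union)


ground_truth = {(0, 0, 1), (0, 0, 2), (1, 0, 1), (1, 0, 2)}
prediction = {(0, 0, 1), (0, 0, 2), (1, 0, 1), (2, 0, 1)}
print(free_space_iou(ground_truth, prediction))  # 3 shared voxels out of 5 in the union
```

Working on sets of voxel indices keeps the sketch dependency-free; a dense boolean grid would serve the same purpose for larger scenes.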
Figure: (a) object distribution, (b) scene distribution
(a) Object distribution. Count axis from 0 to 5000, with one value labeled 17712. Categories: chair, table, desk, pillow, sofa, bed, box, cabinet, garbage bin, lamp, shelf, sofa chair, monitor, drawer, frame, sink, side table, paper, trash can, night stand, book, door, book shelf, computer, dresser, curtain, toilet, rack, bag, keyboard, cpu, tv, kitchen cabinet, coffee table, fridge, white board, printer, dining table, stool, bottle, towel, cup, plant, painting, hanging cabinet, mirror, unknown, computer monitor, laptop, board, tray, steel cabinet, oven, bench, clothes, bowl, cooker, kitchen tool, carton box
(b) Scene distribution: bedroom (12.6%), furniture store (11.3%), office (11.0%), classroom (9.3%), others (8.0%), bathroom (6.4%), rest space (6.3%), living room (6.0%), kitchen (5.6%), corridor (3.8%), lab (3.0%), conference room (2.6%), dining area (2.4%), dining room (2.3%), discussion area (2.0%), home office (1.9%), study space (1.9%), library (1.4%), lecture theatre (1.2%), computer room (1.0%)
Kinect v2, SUN3D (ASUS Xtion), NYUv2 (Kinect v1), Intel RealSense
Benchmark Tasks
Manhattan Box (0.99), Ground Truth, Geometric Context (0.27), Convex Hull (0.90)
Convex Hull (0.85)
Geometric Context (0.61), Convex Hull (0.43), Manhattan Box (0.72)
Geometric Context (0.57), Ground Truth
Ground Truth
Manhattan Box (0.811)
Data Capturing Annotation
IoU: 50.7, Rr: 0.333, Rg: 0.333, Pg: 0.375
IoU: 53.1, Rr: 0.333, Rg: 0.333, Pg: 0.125
IoU: 57.3, Rr: 0.33, Rg: 0.667, Pg: 0.125
IoU: 53.1, Rr: 0.111, Rg: 0.111, Pg: 0.5
IoU: 72.9, Rr: 0.333, Rg: 0.667, Pg: 0.667
IoU: 63.9, Rr: 0.333, Rg: 0.667, Pg: 1
IoU: 77.0, Rr: 0.25, Rg: 0.25, Pg: 0.5
IoU: 78.8, Rr: 1, Rg: 1, Pg: 0.5
Ground truth, Sliding Shapes, 3D RCNN
IoU: 54.6, Rr: 0.333, Rg: 0.333, Pg: 0.125
IoU: 60, Rr: 0.50, Rg: 0.50, Pg: 0.5
bathtub, bed, bookshelf, box, chair, counter, desk, door, dresser, garbage bin, lamp, monitor, night stand, pillow, sink, sofa, table, tv, toilet
Scene Classification
Room Layout Estimation
Total Scene Understanding
3D Object Detection and Pose Estimation
Semantic Segmentation
Dataset sizes (number of images, log10 axis)
RGB datasets: LSUN 10,195,373; ImageNet 131,067; PASCAL VOC 11,530
RGB-D datasets: SUNRGBD 10,335; NYU 1,449
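The chart labels above place the dataset sizes on a log10 axis. As a small worked example, the sketch below converts the image counts printed on the poster to log10 values and compares SUN RGB-D with PASCAL VOC and NYU. The dictionary and function names are illustrative assumptions.

```python
import math

# Image counts as printed on the poster's dataset-size chart.
DATASET_SIZES = {
    "LSUN": 10_195_373,
    "ImageNet": 131_067,
    "PASCAL VOC": 11_530,
    "SUNRGBD": 10_335,
    "NYU": 1_449,
}


def log10_sizes(sizes):
    """Map each dataset name to log10 of its image count."""
    return {name: math.log10(count) for name, count in sizes.items()}


logs = log10_sizes(DATASET_SIZES)
for name, value in sorted(logs.items(), key=lambda item: item[1]):
    print(f"{name:12s} {value:.2f}")

# How much larger SUN RGB-D is than NYU, by image count.
ratio_to_nyu = DATASET_SIZES["SUNRGBD"] / DATASET_SIZES["NYU"]
print(f"SUNRGBD / NYU = {ratio_to_nyu:.1f}x")
```

On this scale SUN RGB-D and PASCAL VOC fall within a small fraction of one decade of each other, which is what the "Pascal-scale" description refers to.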
NYU depth V2, B3DO, SUN3D
RGB-D Sensors
                   RealSense          Xtion          Kinect v1        Kinect v2
weight (pound)     0.077              0.5            4                4.5
size (inch)        5.2 × 0.25 × 0.75  7.1 × 1.4 × 2  11 × 2.3 × 2.7   9.8 × 2.7 × 2.7
power              2.5W USB           2.5W USB       12.96W           115W
depth resolution   628 × 468          640 × 480      640 × 480        512 × 424
color resolution   1920 × 1080        640 × 480      640 × 480        1920 × 1080
Annotation tool views: 1. Image, 2. Top, 3. Side, 4. Front
Example Scenes
Annotation Tool for 3D Object and 3D Room Layout
Statistics of Semantic Annotation
Examples of 2D and 3D Annotation
References
[NYU] Indoor Segmentation and Support Inference from RGBD Images. N. Silberman, P. Kohli, D. Hoiem, and R. Fergus. In ECCV, 2012.
[SUN3D] SUN3D: A database of big spaces reconstructed using SfM and object labels. J. Xiao, A. Owens, and A. Torralba. In ICCV, 2013.
[B3DO] A category-level 3-d object dataset: Putting the kinect to work. A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. In ICCV Workshop, 2011.
Scene classification accuracy (%)
GIST + RBF kernel: RGB (19.7), D (20.1), RGB-D (23.0)
PLACES-CNN + RBF kernel: RGB (38.1), D (27.7), RGB-D (39.0)
Scene labels on the figure axes: bathroom, bedroom, classroom, computer room, conference room, corridor, dining area, dining room, discussion area, furniture store, home office, kitchen, lab, lecture theatre, library, living room, rest space
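Both scene classification baselines above pair a feature (GIST or PLACES-CNN) with an RBF kernel. The poster gives no further classifier details, so the following is only a sketch of the Gaussian RBF kernel itself, k(x, y) = exp(-gamma * ||x - y||^2). The `gamma` value, the toy feature vectors, and the function names are assumptions for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(features, gamma):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(x, y, gamma) for y in features] for x in features]


# Toy 2-D "features"; real GIST or CNN descriptors are much longer.
features = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
K = kernel_matrix(features, gamma=0.5)
for row in K:
    print(["%.4f" % value for value in row])
```

A kernel matrix like `K` is what a kernel classifier such as an SVM would consume. For RGB-D input, one simple option is to concatenate the RGB and depth feature vectors before computing the kernel, though the poster does not say how the modalities were combined.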
Precision = (# matched pairs with a correct category label) / (# prediction boxes)
Recall = (# matched pairs with a correct category label) / (# ground truth boxes)
Metrics: free space IoU; precision and recall for all objects
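The precision and recall definitions above can be sketched as follows. The sketch assumes that predicted and ground truth boxes have already been matched into pairs by some separate step, which is not shown here. The function name and input format are hypothetical.

```python
def precision_recall(matched_pairs, num_predictions, num_ground_truth):
    """Precision and recall over matched box pairs.

    matched_pairs    -- list of (predicted_label, ground_truth_label) tuples,
                        one per predicted box matched to a ground truth box
    num_predictions  -- total number of prediction boxes
    num_ground_truth -- total number of ground truth boxes
    """
    # Only matched pairs whose category labels agree count as correct.
    correct = sum(1 for predicted, truth in matched_pairs if predicted == truth)
    precision = correct / num_predictions if num_predictions else 0.0
    recall = correct / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


pairs = [("chair", "chair"), ("table", "desk"), ("bed", "bed")]
precision, recall = precision_recall(pairs, num_predictions=4, num_ground_truth=6)
print(precision, recall)  # 2 correct pairs: 2/4 and 2/6
```

Guarding the divisions keeps the function defined for scenes with no predictions or no annotated objects; returning 0.0 in those cases is a convention chosen for this sketch.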
We introduce SUN RGB-D, a PASCAL-scale RGB-D scene understanding dataset with 2D and 3D annotations for both objects and rooms.
RGB-D sensors have enabled rapid progress in scene understanding. However, small dataset sizes have become a major bottleneck.
Another problem with existing RGB-D datasets is that most of them are labeled only in 2D.
Data & Code:
http://rgbd.cs.princeton.edu
Example Objects
Acknowledgement
This work is supported by gift funds from Intel.
Kinect v2 and Battery Capturing Setup