Real-time RGB-D image stitching using multiple Kinects for improved field of view
Hengyu Li¹, Hang Liu¹*, Ning Cao¹, Yan Peng¹, Shaorong Xie¹, Jun Luo¹ and Yu Sun²
Abstract
This paper addresses two problems of Kinect-style RGB-D sensors: defective depth maps and a limited field of view (FOV). An anisotropic diffusion (AD) based hole-filling method is proposed to recover invalid depth data in the depth map. The FOV of a Kinect-style RGB-D sensor is extended by stitching depth and color images from multiple RGB-D sensors. Because each depth map is aligned with its color image, the registration data calculated by registering the color images can be used to stitch the depth and color images into panoramic depth and color images concurrently in real time. Experimental results show that the proposed stitching method generates an RGB-D panorama with no invalid depth data and little distortion in real time, and that it can be extended to incorporate more RGB-D sensors to construct a panoramic RGB-D image with up to a 360-degree FOV.
Keywords
depth image stitching, RGB-D panorama, improved field of view, depth map hole filling, Kinect, image registration
Introduction
Depth information is an important complement to visual (RGB) information in computer vision applications. Traditional depth sensors include time-of-flight cameras, laser range scanners, structured light scanners and binocular cameras. Another type is the infrared (IR) based sensor, such as the Microsoft Kinect, which generates a depth map by matching dots in the IR image against a pre-calibrated IR pattern (Zhang (2012)). Compared with laser scanners and binocular cameras, the Kinect costs much less and generates reliable depth maps at much higher speed (Smisek et al. (2013), Zug et al. (2012)). The Kinect has been widely used as
the primary 3D sensor for computer vision applications like
detection, segmentation and recognition of objects (Gupta et
al. (2014), Shahroudy et al. (2016)), 3D modeling (Henry et
al. (2013), Barron and Malik (2013)) and SLAM (Whelan et
al. (2015)). However, a main limitation when applying the Kinect in these applications is its narrow field of view (FOV), which restricts how much of a scene can be covered (Zug et al. (2012), Han et al. (2013)). The depth camera of the Kinect has a horizontal FOV of 57°, much smaller than the 240° FOV of the Hokuyo URG-04LX-UG01, a laser scanner with similar maximum range and accuracy (Zug et al. (2012)).
To extend the sensing area of a single Kinect, multiple Kinects have been used in 3D reconstruction (Tong et al. (2012), Alexiadis et al. (2013)) and 3D detection (Susanto et al. (2012), Asteriadis et al. (2013), Morato et al. (2014)). In these works, the Kinects were placed facing the same object or observing the same scene, to cover all sides of the model and to avoid depth shadows caused by occlusion. Instead of facing inward, the Kinects can also be placed facing outward to extend the limited FOV through image stitching, which is the purpose of our work. Song et al. (2015) extended the FOV with a pre-calibrated, rotated top-bottom arrangement of two Kinects: the pair of depth maps was perspectively transformed to a common frontal flat reference frame, using the homography between the depth maps, to form a panoramic depth map. Though the depth maps can be stitched seamlessly, the resulting depth panorama suffers from considerable distortion, because for larger fields of view a flat representation cannot be maintained without excessively stretching pixels near the border of the image (Szeliski (2006)).
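For illustration, the following sketch (ours, not Song et al.'s implementation; the homography values are hypothetical) warps one depth map into the frame of another with a known 3 × 3 homography H and composites them on a flat reference plane. Nearest-neighbor interpolation is used so that depth values are not blended across object boundaries.

```python
import numpy as np
import cv2

# Hypothetical 3x3 homography mapping the second depth map into the
# frame of the first (values for illustration only).
H = np.array([[1.0, 0.02, 300.0],
              [0.0, 1.00,  -5.0],
              [0.0, 0.00,   1.0]])

def stitch_flat(depth1, depth2, H, pano_width):
    """Warp depth2 onto depth1's flat reference plane and composite."""
    h, w = depth1.shape
    pano = np.zeros((h, pano_width), dtype=depth1.dtype)
    pano[:, :w] = depth1
    # Nearest-neighbor keeps raw depth values intact; linear interpolation
    # would blend foreground and background depths at object boundaries.
    warped = cv2.warpPerspective(depth2, H, (pano_width, h),
                                 flags=cv2.INTER_NEAREST)
    # Fill only pixels still empty (0 encodes invalid/unobserved depth).
    hole = (pano == 0) & (warped > 0)
    pano[hole] = warped[hole]
    return pano
```

Because every pixel is forced onto a single flat plane, content far from the reference view gets stretched, which is exactly the distortion noted above.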
To generate a depth panorama with large FOV and little distortion, a cylindrical or spherical projection is usually chosen: each input image is warped onto a cylindrical or spherical surface according to an estimated 3 × 3 camera matrix or homography (Szeliski (2006)).
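As a minimal sketch of cylindrical warping under a pinhole model, assuming a single known focal length f (in pixels) and the image center as principal point, a pixel (x, y) measured from the center maps to (f·atan(x/f), f·y/√(x² + f²)) on the cylinder:

```python
import numpy as np
import cv2

def warp_cylindrical(img, f):
    """Warp an image onto a cylinder of radius f (focal length in pixels).

    Forward model: x' = f*atan(x/f), y' = f*y/sqrt(x^2 + f^2).
    Implemented as the inverse map required by cv2.remap.
    """
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    ys, xs = np.indices((h, w), dtype=np.float32)
    theta = (xs - cx) / f           # angle around the cylinder axis
    height = (ys - cy) / f          # normalized height on the cylinder
    # Invert the forward model: cylinder pixel -> source image pixel.
    x_src = (f * np.tan(theta) + cx).astype(np.float32)
    y_src = (height * f / np.cos(theta) + cy).astype(np.float32)
    # Use INTER_NEAREST instead when warping depth maps.
    return cv2.remap(img, x_src, y_src, interpolation=cv2.INTER_LINEAR)
```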
This estimation problem is well addressed by Brown and Lowe (2007), where the camera matrix is estimated and refined from matched SIFT features between the input color images. However, since depth maps lack SIFT features to extract, this estimation method cannot be applied directly to depth map registration.
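This transfer can be sketched as follows (an illustrative sketch, not the paper's exact implementation): the homography is estimated from SIFT matches between the two color images, and the same matrix then registers the depth maps, provided each depth map is already pixel-aligned with its color image. Note that cv2.SIFT_create requires OpenCV 4.4+ (earlier versions ship SIFT in opencv-contrib).

```python
import numpy as np
import cv2

def homography_from_color(color1, color2):
    """Estimate the homography mapping color2 into color1 via SIFT + RANSAC."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(color1, None)
    kp2, des2 = sift.detectAndCompute(color2, None)
    matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

# Because each depth map is pixel-aligned with its color image, the same H
# registers the depth maps (nearest-neighbor to preserve raw depth values):
# H = homography_from_color(color1, color2)
# depth2_reg = cv2.warpPerspective(depth2, H, out_size, flags=cv2.INTER_NEAREST)
```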
In this paper, we found that by aligning each depth map with its color image, the registration matrix estimated from the color images can also be used to register the depth maps; the problem of registering depth maps is thus transformed into the problem of registering color images. We also found that if the scene around the cameras does not change much, the registration matrices do not need to be updated
¹School of Mechatronic Engineering and Automation, Shanghai University, 200072, China
²Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, M5S 3G8, Canada
Corresponding author:
Hang Liu
Email: liuhang shu@126.com