Real-time Augmented Reality with Occlusion Handling Based on RGBD Images
Xiaozhi Guo, Chen Wang, Yue Qi
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University
Beijing 100191, China
Beihang University Qingdao Research Institute
Qingdao 266100, China
xiaozhi@buaa.edu.cn, vr_wangchen@buaa.edu.cn, qy@buaa.edu.cn
Abstract—Augmented Reality (AR) is one of the latest developments in human-computer interaction technology. It aims to create the illusion that virtual objects are seamlessly fused with the real world. A typical AR system requires two basic components: three-dimensional registration and real-virtual fusion. Occlusion handling is crucial for visual realism. To improve visual realism, we present a real-time system architecture for occlusion handling. The architecture is based on RGBD images and consists of three parts: a real-time camera tracking system, a 3D reconstruction system, and an AR fusion system. Specifically, we use a two-pass scheme to execute the AR system. The first pass tracks camera poses at video rate, which allows the reconstruction results to be updated and visualized during scanning. The second pass runs simultaneously and handles occlusion between the virtual objects and the real scene according to the camera pose. Finally, the rendered virtual objects and the color images are fused to generate the AR content. Our results indicate that this method is stable and precise for occlusion handling and can effectively improve realism in AR systems.
Keywords-augmented reality; occlusion handling; scene reconstruction; RGBD
I. INTRODUCTION
Augmented Reality (AR) has been an active research topic for a long time [1]. It combines computer-generated virtual objects with the real scene. By orchestrating the virtual and real worlds into a whole, an AR system can convey more semantic meaning than either world alone. The primary challenge in generating convincing augmented reality is to project 3D models onto a user's view of the real world and create a sustained spatial illusion that the virtual and real scenes coexist.
For a system to meet Azuma's definition of an augmented reality system [2], it must fulfill two fundamental requirements: occlusion handling and 3D registration. Both play a very important role in a convincing augmented reality system. Current AR systems can generally be divided into two groups according to their tracking method [3]. The first group is based on sensor tracking technology, in which the rotation and position of the camera are acquired from accelerometer, GPS, and compass data; the cost of such systems is generally high. The second group is based on computer vision technology, in which markers or scene features are tracked by means of computer vision, for example approaches based on QR codes [4], edge snapping [5], 3D line segments [6], and convex polygon markers [7].
Since Davison [8] and other researchers introduced real-time visual simultaneous localization and mapping (SLAM), it has received wide attention in the field of augmented reality [9][10][11].
One of the main problems of current augmented reality is the lack of reliable depth information: virtual 3D objects are simply overlaid on real-world imagery [12][13]. Such an overlay is unconvincing when displaying data in three dimensions because the occlusion between real and computer-generated objects is not addressed. Hauck et al. utilized a single depth image to handle occlusion [14], but its performance suffers in fine detail due to unstable depth values (Figure 1). As we can see, occlusion handling is a key issue for AR realism.
To handle the occlusion between virtual objects and the real scene, and to improve the accuracy of camera tracking, we adopt a computer vision method for camera tracking and a model-based approach for occlusion handling. In this paper, we propose a robust marker-less AR architecture based on RGBD images.
II. SYSTEM OVERVIEW
The goal of our system is to build a convincing model-based augmented reality system that handles occlusion correctly and tracks the camera precisely. To achieve this goal, we reconstruct the real scene while simultaneously tracking the camera. We take advantage of the Kinect sensor, a conventional, low-cost RGBD camera.
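As a hedged capture sketch (assuming an OpenCV build with the OpenNI2 backend enabled; not necessarily the capture path of our implementation), registered depth and color frames can be grabbed from a Kinect as follows:

```python
import cv2

# Open the Kinect through OpenCV's OpenNI2 backend (requires an
# OpenCV build compiled with OpenNI2 support).
cap = cv2.VideoCapture(cv2.CAP_OPENNI2)
if not cap.isOpened():
    raise RuntimeError("Kinect sensor not found")

while cap.grab():
    # Depth map: uint16 image in millimeters; color: 8-bit BGR.
    ok_d, depth = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)
    ok_c, color = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)
    if not (ok_d and ok_c):
        continue
    # ... feed (color, depth) into tracking and reconstruction ...
```

The depth map arrives as a 16-bit image in millimeters, which is the input to the pre-processing described below.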
In our system, we adopt a two-pass scheme. The first pass performs camera tracking in real time and allows the reconstruction results to be updated and visualized during the scanning process. The second pass handles the occlusion between the virtual objects and the real scene model according to the camera pose, and fuses the rendering result with the color image.
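As an illustration of the depth-test compositing in this second pass, the following minimal sketch (a simplification under stated assumptions, not our full renderer) keeps a virtual pixel only where the virtual surface lies in front of the reconstructed scene surface. Here virtual_rgb, virtual_depth, and scene_depth are assumed to be off-screen renderings produced at the tracked camera pose:

```python
import numpy as np

def composite_with_occlusion(color_image, virtual_rgb, virtual_depth,
                             scene_depth, no_hit=np.inf):
    """Fuse a rendered virtual object into the real color frame.

    color_image   : HxWx3 real RGB frame from the sensor
    virtual_rgb   : HxWx3 rendering of the virtual objects
    virtual_depth : HxW depth of that rendering (no_hit = empty pixel)
    scene_depth   : HxW depth of the reconstructed scene model,
                    rendered from the tracked camera pose
    """
    # Pixels where a virtual object was actually rendered.
    hit = virtual_depth < no_hit
    # Per-pixel depth test: the virtual surface is visible only
    # where it lies in front of the reconstructed real surface.
    visible = hit & (virtual_depth < scene_depth)
    out = color_image.copy()
    out[visible] = virtual_rgb[visible]
    return out
```

Because scene_depth comes from the reconstructed model rather than the raw sensor frame, the per-pixel comparison stays stable over time, which is the motivation for the model-based approach.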
The data processing in our system consists of three major components. The flowchart of our system is shown in Fig. 2.
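As a concrete illustration of the pre-processing step in the camera tracking component described next, the sketch below denoises a raw depth frame and back-projects it into a point cloud. The pinhole intrinsics fx, fy, cx, cy and the filter parameters are assumed, illustrative values:

```python
import cv2
import numpy as np

def depth_to_point_cloud(raw_depth_mm, fx, fy, cx, cy):
    """Filter a raw depth frame and back-project it to 3D.

    raw_depth_mm : HxW uint16 depth image in millimeters (0 = invalid;
                   such pixels map to the origin and can be masked later)
    fx, fy, cx, cy : pinhole intrinsics of the depth camera
    Returns an HxWx3 array of points in the camera frame (meters).
    """
    depth = raw_depth_mm.astype(np.float32) / 1000.0  # mm -> m
    # Edge-preserving bilateral filter: smooths sensor noise while
    # keeping depth discontinuities at object boundaries sharp.
    depth = cv2.bilateralFilter(depth, d=5, sigmaColor=0.03, sigmaSpace=4.5)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Inverse pinhole projection: p = z * K^-1 * (u, v, 1)^T
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.dstack((x, y, depth))
```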
• Camera tracking. Camera tracking runs from the start of the system. We utilize a fast bilateral filter to pre-process the raw depth image, as sketched above. Then, we exploit the camera parameters to transform the depth image into a point cloud in 3D space, and