Enhanced Computer Vision with
Microsoft Kinect Sensor: A Review
Jungong Han, Member, IEEE, Ling Shao, Senior Member, IEEE, Dong Xu, Member, IEEE,
and Jamie Shotton, Member, IEEE
Abstract—With the introduction of the low-cost Microsoft Kinect
sensor, high-resolution depth and visual (RGB) sensing has
become available for widespread use. The complementary nature
of the depth and visual information provided by the Kinect sensor
opens up new opportunities to solve fundamental problems in
computer vision. This paper presents a comprehensive review of
recent Kinect-based computer vision algorithms and applications.
The reviewed approaches are classified according to the type of
vision problems that can be addressed or enhanced by means
of the Kinect sensor. The covered topics include preprocessing,
object tracking and recognition, human activity analysis, hand
gesture analysis, and indoor 3-D mapping. For each category of
methods, we outline their main algorithmic contributions and
summarize their advantages/differences compared to their RGB
counterparts. Finally, we give an overview of the challenges in
this field and future research trends. This paper is expected to
serve as a tutorial and source of references for Kinect-based
computer vision researchers.
Index Terms—Computer vision, depth image, information
fusion, Kinect sensor.
I. Introduction
Kinect is an RGB-D sensor providing synchronized
color and depth images. It was initially used as an
input device by Microsoft for the Xbox game console [1].
With a 3-D human motion capturing algorithm, it enables
interactions between users and a game without the need
to touch a controller. Recently, the computer vision community
discovered that the depth sensing technology of Kinect could be
extended far beyond gaming, at a much lower cost than traditional
3-D cameras such as stereo cameras [2] and time-of-flight (TOF)
cameras [3]. Additionally, the complementary
nature of the depth and visual (RGB) information provided
by Kinect bootstraps potential new solutions for classical
problems in computer vision. In the two years since Kinect was
released, a large number of scientific papers as well as technical
demonstrations have already appeared at various computer vision
conferences and in journals.
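To make the notion of synchronized color and depth concrete, the short sketch below back-projects a Kinect depth map into a 3-D point cloud with the standard pinhole camera model and attaches the RGB value of each pixel. This is only an illustrative Python sketch: the intrinsic parameters are assumed approximate values, the function name is ours, and it presumes that the depth and color images have already been registered to the same viewpoint (the recalibration step discussed in Section III).

    import numpy as np

    # Assumed approximate intrinsics of the Kinect depth camera; real
    # values should be obtained through calibration (see Section III).
    FX, FY = 580.0, 580.0   # focal lengths in pixels (assumption)
    CX, CY = 320.0, 240.0   # principal point for a 640x480 image (assumption)

    def depth_to_colored_points(depth_mm, rgb):
        """Back-project a 640x480 depth map in millimetres to 3-D points
        and attach the RGB value of each valid pixel."""
        h, w = depth_mm.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float32) / 1000.0         # depth in metres
        x = (u - CX) * z / FX                             # pinhole model
        y = (v - CY) * z / FY
        valid = z > 0                                     # zero depth = no reading
        points = np.stack([x[valid], y[valid], z[valid]], axis=1)
        colors = rgb[valid]                               # per-point color
        return points, colors

Colored point clouds of essentially this form underlie many of the approaches reviewed in the following sections, from object recognition to indoor 3-D mapping.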
In this paper, we review the recent developments of Kinect
technologies from the perspective of computer vision. The
criteria for topic selection are that the algorithms go well
beyond the algorithmic modules provided by the Kinect development
tools, and that the topics are relatively popular, with a
substantial number of publications. Fig. 1
illustrates a tree-structured taxonomy that our review follows,
indicating the type of vision problems that can be addressed
or enhanced by means of the Kinect sensor. More specifically,
the reviewed topics include object tracking and recognition,
human activity analysis, hand gesture recognition, and indoor
3-D mapping. The broad diversity of topics clearly shows the
potential impact of Kinect in the computer vision field. We do
not dwell on the details of particular algorithms or the results
of comparative experiments, but instead summarize the main paths
that most approaches follow and point out their contributions.
To date, we have found only one other survey-like paper
introducing Kinect-related research [4]. The objective of that
paper is to unravel the intelligent technologies encoded in
Kinect, such as sensor calibration, human skeletal tracking
and facial-expression tracking. It also demonstrates a prototype
system that employs multiple Kinects in an immersive
teleconferencing application. The major difference between our
paper and [4] is that [4] tries to answer what is inside Kinect,
while our paper intends to give insights into how researchers
exploit and improve computer vision algorithms using
Kinect.
The rest of the paper is organized as follows. First, we
discuss the working mechanism of the Kinect sensor in Section II,
taking both hardware and software into account. The purpose
is to answer what signals the Kinect can output and what
advantages the Kinect offers over conventional cameras
in the context of several classical vision problems. In Section
III, we introduce two preprocessing steps: Kinect recalibration
and depth data filtering. From Section IV to Section VII, we
give technical overviews of object tracking and recognition,
human activity analysis, hand gesture recognition, and indoor
3-D mapping, respectively. Section VIII summarizes the
corresponding challenges of each topic and reports the major
research trends in this exciting domain.