Real-Time Robust Hand Tracking Based on Camshift
and Motion Velocity
Chenyang Chen
College of Computer
Science and
Technology
Zhejiang University
Hangzhou, China
ccyang1897@gmail.com
Mingmin Zhang
College of Computer
Science and
Technology
Zhejiang University
Hangzhou, China
zmm@cad.zju.edu.cn
Kaijia Qiu
College of Computer
Science and
Technology
Zhejiang University
Hangzhou, China
qiukaijia@gmail.com
Zhigeng Pan
Institute of Service
Hangzhou Normal
University
Hangzhou, China
zhigengpan@gmail.com
Abstract—Although much work has been done in the domain of
hand tracking, robustly tracking hand motion remains a
challenge. The traditional Camshift algorithm, which can
efficiently track an object in a simple scene, is sensitive to
background changes, and variants such as Camshift combined with
Kalman filtering are still not robust enough to give reliable
results. This paper proposes a real-time hand tracking
algorithm that uses only an ordinary camera. KLT feature
tracking is used to track good features on the hand, and the
tracking result is used to calculate the main velocity of the
hand motion. Additionally, a global velocity calculated from the
Bayesian skin-color probability is used to refine the velocity of
the hand motion. The Camshift tracking window is then updated
using this velocity. Finally, Camshift is used to detect a more
precise hand region. This approach is relatively insensitive to
the background and achieves robust tracking performance in real
time.
Keywords—Hand tracking; Camshift; Human-Computer
Interaction
I. INTRODUCTION
Hand tracking and gesture recognition are among the most
natural means of human-computer interaction. Recently, many
studies have focused on this domain, trying to find an efficient
way to let a computer track the hand or recognize the meaning
of hand motion. In the past, the data glove was probably the
most popular way to track hand motion [1]; however, because of
its high cost it is not widely used. [2] and [3] use an ordinary
camera to recognize hand gestures, but those methods are not
robust enough: a cluttered background can cause them to fail to
track the hand. Furthermore, motion blur makes the situation
even worse. [4], [5] and [6] introduced 3D cameras to solve
this problem. By using depth information, the tracking result
and recognition rate are improved sharply; however, a 3D camera
is relatively more expensive than an ordinary camera and, like
the data glove, is less likely to be widely adopted.
This paper focuses on how to use an ordinary camera to
track the hand stably. Camshift (Continuously Adaptive
Mean-SHIFT) [7], an improvement on the Mean-SHIFT algorithm,
can track the hand in real time across the continuous frames
captured by a camera. Although this algorithm tracks the hand
efficiently against a simple background, it cannot achieve the
same result when the scene is cluttered. If the scene contains
large skin-like areas and the hand overlaps such a region,
Camshift will confuse that region with the hand itself. What's
worse, after the hand leaves that area, the search window may
stay fixed on the skin-like area and lose track of the hand.
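The failure mode described above follows from how the mean-shift step at the core of Camshift works: the search window is repeatedly re-centered on the centroid of the skin-probability mass it currently covers, so it only reacts to what is already inside it. The following is a minimal sketch of that step in plain Python, not the implementation used in this paper. The function names (mean_shift, paint), the toy probability maps, and the fixed window size are assumptions made for illustration, and the adaptive window resizing that distinguishes Camshift from Mean-SHIFT is omitted.

```python
import math


def mean_shift(prob, window, max_iter=20):
    """Re-center a fixed-size window on the probability mass it covers.

    prob   -- 2D list of skin probabilities, indexed prob[row][col]
    window -- (x, y, w, h) with (x, y) the top-left corner
    """
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0  # zeroth and first image moments
        for r in range(max(y, 0), min(y + h, rows)):
            for c in range(max(x, 0), min(x + w, cols)):
                p = prob[r][c]
                m00 += p
                m10 += p * c
                m01 += p * r
        if m00 == 0.0:
            break  # no skin-like mass inside the window
        cx, cy = m10 / m00, m01 / m00
        # Top-left corner that puts the window center on the centroid.
        new_x = math.floor(cx - (w - 1) / 2 + 0.5)
        new_y = math.floor(cy - (h - 1) / 2 + 0.5)
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return (x, y, w, h)


def paint(prob, x, y, w, h, value=1.0):
    """Fill a rectangle of the toy probability map with `value`."""
    for r in range(y, y + h):
        for c in range(x, x + w):
            prob[r][c] = value


# Case 1: the window partly overlaps the hand and is pulled onto it.
simple = [[0.0] * 10 for _ in range(10)]
paint(simple, 5, 5, 3, 3)                  # hand
print(mean_shift(simple, (3, 3, 3, 3)))    # (5, 5, 3, 3)

# Case 2: the hand has left a static skin-like region the window sits on.
clutter = [[0.0] * 14 for _ in range(8)]
paint(clutter, 2, 2, 3, 3)                 # static skin-like region
paint(clutter, 9, 2, 3, 3)                 # hand, now elsewhere
print(mean_shift(clutter, (2, 2, 3, 3)))   # (2, 2, 3, 3)
```

In the second case the window stays on the static region even though the hand lies elsewhere in the frame, which is the lost-track behaviour described above.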
[8], [9] and [10] are also based on Camshift; they
introduced the Kalman filter and made considerable progress.
However, those algorithms are stable only when the motion of
the hand is regular. In other words, they fail to track random
motion. For example, when a hand reverses direction just as it
reaches a large skin-like area, the search window may stay in
that area. Tracking will also fail if the hand stays in a large
skin-like area for a relatively long time. The other main
drawback shared by these methods and the pure Camshift
algorithm is that once the search window is captured by a false
region, the algorithm cannot easily recover from this error
state. [11] uses a motion mask and motion prediction for
tracking, but it needs to calculate a mask for the entire image,
which makes it time-consuming.
Kolsch M. introduced a hand tracking algorithm based on KLT
tracking [12]. This algorithm performs well in controlled
scenes but cannot achieve the same result in a cluttered scene
that contains several skin-like regions. Zhigeng Pan [13]
proposed a similar approach that introduces multi-cue features,
namely good features [14] combined with the velocity of certain
features. Though this method outperforms all the methods above,
lost features are replaced by randomly chosen ones, which makes
the performance of the algorithm degrade over time.
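The KLT-based trackers above all rely on the displacement of individual features between consecutive frames. As a minimal sketch of how such displacements could be condensed into a single hand velocity and used to move a search window, the following plain-Python example takes the median displacement, which tolerates a few drifting features. The use of the median, the function names (main_velocity, shift_window), and the sample coordinates are assumptions for illustration; this section does not specify the formulas used in [12], [13], or the proposed method.

```python
from statistics import median


def main_velocity(prev_pts, curr_pts):
    """Median per-feature displacement (vx, vy) between two frames."""
    dxs = [c[0] - p[0] for p, c in zip(prev_pts, curr_pts)]
    dys = [c[1] - p[1] for p, c in zip(prev_pts, curr_pts)]
    return (median(dxs), median(dys))


def shift_window(window, velocity):
    """Translate an (x, y, w, h) window by a velocity in pixels per frame."""
    x, y, w, h = window
    vx, vy = velocity
    return (x + round(vx), y + round(vy), w, h)


# Hypothetical feature positions in two consecutive frames. Four features
# move with the hand by (+5, -2); the last one has drifted away.
prev_pts = [(10, 10), (12, 11), (14, 9), (11, 13), (13, 12)]
curr_pts = [(15, 8), (17, 9), (19, 7), (16, 11), (-7, 27)]

velocity = main_velocity(prev_pts, curr_pts)
print(velocity)                                  # (5, -2)
print(shift_window((8, 8, 10, 10), velocity))    # (13, 6, 10, 10)
```

A mean would be dragged toward the drifting feature here, which is why a robust summary such as the median is a common choice for this kind of estimate.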
This paper proposes an improved Camshift algorithm. With
the help of KLT tracking, we can efficiently update the search
window of Camshift, and then choose entirely new features in
the search window updated by Camshift. These two methods work
together to achieve a robust tracking result. The experimental
results show that the proposed algorithm outperforms all of the
methods above. The main contributions of this paper are:
2014 International Conference on Digital Home
978-1-4799-4284-8/14 $31.00 © 2014 IEEE
DOI 10.1109/ICDH.2014.11