【免费】运动员的运动员的自动检测..论文资源-CSDN文库

自动检测.

需积分: 0 120 浏览量 2012-02-28 16:53:25 上传评论收藏 1.18MB PDF 举报

资源推荐

资源详情

资源评论

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 3, MARCH 2010 351

Automatic Detection and Analysis of Player Action

in Moving Background Sports Video Sequences

Haojie Li, Jinhui Tang, Member, IEEE, Si Wu, Yongdong Zhang, and Shouxun Lin, Member, IEEE

Abstract—This paper presents a system for automatically

detecting and analyzing complex player actions in moving back-

ground sports video sequences, aiming at action-based sports

videos indexing and providing kinematic measurements for coach

assistance and performance improvement. The system works

in a coarse-to-ﬁne fashion. For an input video, in the coarse

granularity level, we automatically segment the highlights, that

is, the video clips containing the desired action as summaries

for general user viewing purposes; in the middle granularity

level, we recognize the action types to support action-based

video indexing and retrieval; and ﬁnally in the ﬁne granularity

level, the critical kinematic parameters of player action are

obtained for sports professionals’ training purposes. However,

the complex and dynamic background of sports videos and the

complexity of player actions bring considerable difﬁculty to the

automatic analysis. To fulﬁll such a challenging task, robust

algorithms including global motion estimation with adaptive out-

liers ﬁltering, object segmentation based on adaptive background

construction, and automatic human body tracking are proposed

in this paper. Two visual analyzing tools: motion panorama and

overlay composition, are also introduced. Real diving and jump

game videos are used to test the proposed system and algorithms,

and the extensive and encouraging experimental results show

their effectiveness.

Index Terms—Action recognition, human body tracking, sports

training, video analysis, video object segmentation.

I. Introduction

ITH THE EXPLOSIVE growth of digital videos in

our daily life, automatic video content analysis has

become a basic requirement for efﬁcient indexing and retrieval

of long video sequences. In recent years, the analysis of sports

videos has attracted great attention due to its mass appeal

and tremendous commercial potentials. Many works have been

conducted and technologies and systems have been developed

for automatic or semi-automatic parsing the structure of sports

Manuscript received October 9, 2008; revised May 10, 2009. First version

published November 3, 2009; current version published March 5, 2010. This

work was supported in part by the National Basic Research Program of China

(Grant No. 2007CB311100), and the Co-building Program of the Beijing

Municipal Education Commission. This paper was recommended by Associate

Editor T. Fujii.

H. Li is with the School of Software, Dalian University of Technology,

Dalian 116620, China.

J. Tang is with the School of Computing, National University of Singapore,

117590 Singapore (e-mail: tangjh@comp.nus.edu.sg).

S. Wu is with France Telecom Research and Development Beijing, Beijing

100080, China.

Y. Zhang and S. Lin are with the Institute of Computing Technology,

Chinese Academy of Sciences, Beijing 100080, China.

Color versions of one or more of the ﬁgures in this paper are available

online at http://ieeexplore.ieee.org.

Digital Object Identiﬁer 10.1109/TCSVT.2009.2035833

video [1], [2], semantic events detection and video summariza-

tion [3], [4], enhanced sports TV broadcasting [5], and content

insertion [6]. The difﬁculty of video analysis is the semantic

gap between the low-level audio-visual features and high-level

concepts. To bridge the gap, some mid-level representations

are constructed based on low-level features with clustering

or classiﬁcation methods [7]. However, these representations

are extracted from frame-based approaches and have no direct

link to high-level semantics. Moreover, they can only deduce

coarse-level knowledge of the video contents for general user

viewing purposes. Object’s behavior is another type but more

effective mid-level representation for video content analysis

[8], [9]. At the same time, for sports professionals such as

coachers and players, it is desired for ﬁner granularity analysis

to get more detailed information such as action names, match

tactics, kinematical, or biometric measurements from videos

for coaching assistant and performance improvements. For ex-

ample, by automatically recognizing the actions and obtaining

player body joint angles in diving game videos, the coaches or

players can easily retrieve and compare qualitatively or quan-

titatively the performed actions with the same ones performed

by elite players in video database, and then improve their

performance in later training or competition. To these ends,

this paper addresses the automatic detection, recognition, and

analysis of player actions from broadcast sports game videos

or videos recorded during daily training. More speciﬁcally,

we focus on one sports genre where the player performs

his or her action in a large arena and the camera needs to

be operated with pan/tilt/zoom to capture the player in the

middle of the image. Usually, one or more cameras are placed

in the side-view to record the entire detailed action. This

includes a broad category of individual sports videos, such

as diving, jumps, gymnastics videos, and so on (see Fig. 1).

For such kind of action-critical sports videos, general users

would like to rapidly locate and watch the highlights, namely,

the video clips containing the desired action, while sports

professionals will be more interested in the performances of

the players. Manual analysis of sports videos to achieve such

aims is labor-intensive and time-consuming. Therefore, the

systems and techniques that can automatically parse long time

videos into browse-able actions and further provide kinematic

measurements for performance analysis are demanding.

In this paper,we present an integrated coarse-to-ﬁne sports

video analysis system and various robust algorithms. Diving

videos are used as case study to demonstrate the effectiveness

of the system and algorithms due to the following motivations:

1051-8215/$26.00

 2010 IEEE

352 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 3, MARCH 2010

Fig. 1. Some individual sports actions captured with a side-view camera.

From left to right: diving, long jump, and tumbling.

1) diving is among one of the most popular spectator sports;

2) to make the player actions clearly viewed by the references

and audiences, there are always side-view shots for diving

videos; and 3) the complex and moving background of diving

videos and the complexity of diving action make the auto-

matic analysis a rather challenging task. To show the general

versatility of the proposed algorithms, we also test some key

algorithms on jump videos.

A. Related Work

Human action detection and recognition in videos is a hot

topic in computer vision research community and various

approaches have been proposed [10], [11]. But in the ﬁeld of

sports video, due to the dynamic background and the complex-

ity of sports action, there existed few works and most of them

were focused on action recognition with simple background,

e.g., tennis or soccer videos, or focused on relatively simple

action recognition. These works either took not much pain

to segment player’s body or didn’t need accurate tracking and

segmentation of the players. One of the earlier works on sports

action recognition was provided by Yamato et al. in [12].

They proposed using discrete hidden Markov models (HMMs)

to recognize image sequences of six different tennis strokes

among three subjects. Their system worked in a constrained

testing environment where the mesh features they used were

extracted from binarized human images which were obtained

from a pre-known background. Sullivan et al. [13] presented

a method for detecting tennis players’ strokes based on sim-

ilarity computed by point-to-point correspondence between

shapes. Although impressive results in terms of posture match-

ing were given, the extraction of edge needed a clean back-

ground. Miyamori et al. [8] analyzed the tennis player behav-

iors based on silhouette transitions. They extracted the player

silhouette images by frame-differencing, so the static back-

ground was necessary. Efros et al. [14] developed a generic

approach to recognizing action in “medium ﬁeld” sports video

by introducing a novel motion descriptor based on noisy op-

tical ﬂow measurements in a spatio-temporal volume for each

stabilized human ﬁgure. Similarly, Zhu et al. [9] proposed slice

based optical ﬂow histograms as motion descriptor to classify

player’s basic actions such as left-swing/right-swing in low

resolution tennis video sequence. Lu et al. [15] used the grids

of histograms of oriented gradient as representation of players

to track and recognize player actions like skate left/skate right

and running/walking in hockey and soccer sequences. The

limitation of the above three works [9], [14], [15] is that they

only presented results with relatively simple actions and simple

background such as soccer ﬁeld, tennis court, or hockey ﬁeld.

Compared with these previous works, the task in this paper,

to automatically detect and classify more complex actions in

video captured with moving camera, is more challenging.

Recent advances in video technology and computing power

have motivated the research of using computer vision and

image processing techniques to analyze sports game video

for tactics statistics, computer-aided coaching, or performance

improvements. Compared with previous methods that require

retro-reﬂective markers or magnetic sensors to be placed on

a player body, video-based approach has more advantages:

much less cost, no interference to the performance of player

and can analyze the rich archived video clips. Among such

works, Pascual et al. [16] developed a method for soccer player

position tracking aiming at the kinematical motion analysis

through a graph representation with four static cameras. Aided

with manually tracking in some cases, their method could

collect statistics measurements of each player in the game.

Wang et al. [17] classiﬁed tennis games into 58 winning

tactics patterns for archiving video clips and training purposes

using the ball trajectory and Bayesian network. To recover the

trajectory and ball landing position they turned to a wide-view

calibrated camera. The widely-studied human body model

based tracking approach has also been suggested for sports

biometric analysis. As shown in [18], a 42-dimensional body

model was used to track the golfer’s postural information and

then the obtained information was analyzed with respect to

a learned ideal motion. Nevertheless, their system ran such

slowly that each frame would take about 25 min to process,

and the initial parameters of the body model needed be set

up manually before tracking. Raquel et al. [19] incorporated

strong dynamic models into the human body tracking pro-

cess to recover the 3-D postural parameters of golfer from

monocular golf videos. One similar work to our player action

analysis was provided by Ryan et al. [20]. In their system,

the authors analyzed the acrobatic gestures of several sports

through modeling and characterizing acrobatic movements and

image processing techniques. However, their work was less

challenging than ours since that the player movements were

captured with static camera and they only recognized simple

acrobatic gestures by global measurements analysis.

B. Contributions of This Paper

In this paper, we develop robust algorithms and a system

for fully automatic analysis of complex actions in challeng-

ing dynamic background videos, aiming at high-level sports

video indexing/retrieval and training purposes. The schematic

diagram of the proposed system is illustrated in Fig. 2. For

an input video, global motion parameters between adjacent

frames are ﬁrst estimated, which are used as motion feature

for the highlights, to say, action clips detection. The detected

highlights are stored into library as video summaries for

user’s quick browsing. Then the player body shapes in action

clips are segmented and fed into hidden Markov models to

recognize the action type, which is used to index the highlight

library. With the segmented shape and recognized action type,

some critical kinematic parameters of action are automatically

obtained through the kinematic analysis component, and visual

analysis is conducted.

LI et al.: AUTOMATIC DETECTION AND ANALYSIS OF PLAYER ACTION IN MOVING BACKGROUND SPORTS VIDEO SEQUENCES 353

Fig. 2. Block diagram of the proposed system.

The contributions of our paper consist of the following

points.

1) An integrated framework for automatic analysis of sports

video in a coarse-to-ﬁne fashion is presented, which

attempts to extract semantic information with different

granularity to sufﬁce the retrieval requirements from

general users to sports professionals.

2) To robustly estimate the global motion between adjacent

video frames, we propose an adaptive outliers ﬁltering

strategy using Fisher linear discriminant analysis.

3) An object segmentation algorithm based on adaptive

dynamic background construction is proposed, which is

robust to complex and dynamic scenes.

4) The automatic kinematic analysis of player body move-

ments is achieved by model ﬁtting. By transferring the

initial model parameters from the recognized action

templates, our method avoids the manual setup of human

body model.

5) With the enabling techniques above, we present two

visual tools for individual sports game training: motion

panorama and overlay composition, which are percep-

tible for the visual analysis or comparing of player

performance.

II. Global Motion Estimation

Global motion (GM) is referred to the motion of the

background in a video sequence caused by the camera motion.

Global motion estimation (GME) is a process to estimate

the rotation, scaling, and translation parameters of camera

motion by comparing two different frames of a video, which

is the basic technique to the following algorithms of this

paper. The methods for GME can be classiﬁed into two cat-

egories: differential method [21], [22] and feature matching-

based method [23]. For methods among both categories, their

accuracies are inﬂuenced by the so-called outliers, to say, the

measurements whose motion is not consistent with the global

motion, mainly caused by local object (such as player) motion.

Some robust estimation methods have been developed to

handle outliers [21]–[23]. However, these methods used either

experimentally determined or manually speciﬁed thresholds

to remove outliers, thus are not adaptive to other data. In

this paper, we propose an adaptive outlier ﬁltering strategy

using Fisher linear discriminant analysis [24] to improve the

robustness of GME.

In this paper, we adopt the 6-parameter afﬁne model [25] to

describe the camera motion since it is computationally efﬁcient

but powerful to model the translation, rotation, and scaling op-

erations of camera when the relative depth of the object is not

large, which is applicable to most videos including our case.

Let u

=(x

)

be the position of a point p in the current

frame I

, and u



=(x



)

be the position of p in frame

k−1

, then we can build the relation between u

and u



using

the afﬁne model as follows:

= H

A (1)

where A =(a, b, c, d, e, f )

is the global motion parameter

and H

isa2× 6 matrix



10 00

000x





. (2)

Assume we have N feature point pairs (u



)(i =

1,...,N; N ≥ 3) from frames I

and I

k−1

, then we have

U = HA (3)

where H =(H

,...,H

)

,U=(u

,...,

)

. Thus A can be obtained by solving (3) using the least

squares method.

The feature point pairs are constructed as follows. We ﬁrst

calculate the global standard variance, gstd, of pixel values in

frame I

. Then we scan frame I

and check each n × n block.

If the standard variance, std, of a block is large enough, to

say, larger than α ∗ gstd, the upper left corner of the block is

selected as u

. u



is obtained by searching nearby blocks in

frame I

k−1

. In this way, we collect all the point pairs between

and I

k−1

Suppose A

∗

is the initial solution for global motion, we

calculate the residual error for each pair (u



)as

= u

− H

∗

. (4)

Since A* is the approximate solution to GM and the motion

of outliers is not consistent with GM, the residual errors of

outliers will be larger than those of inliers. Therefore, we can

separate the point pairs into inliers and outliers according to

their residual errors and then use the inliers to reﬁne A. Let

R = {|r

|,...,|r

|} be the residual error set for point pair set

{(u



),...,(u



)}, R

and R

out

be the residual error set

for inliers and outliers. The key problem is to ﬁnd a appropriate

threshold T such that

= {|r

|||r

| <T},R

out

= {|r

|||r

|≥T }

subject to R



out

= R & R



out

= ∅. (5)

Assume that the mean for R

is µ

, probability is p

|/N; the mean for R

out

is µ

out

, and probability is p

out

剩余13页未读，继续阅读

评论收藏

内容反馈

huangjie817

粉丝: 0
资源: 8

运动员的运动员的自动检测..论文

液位检测研究论文.doc

自动化毕业论文-电气自动化毕业论文.

钢材质量检测论文.doc

入侵检测技术论文.docx

目标检测word论文.rar

视频车辆压线的判定论文.zip资源matlab opencv运动目标检测程序资料

论文研究 - 男女高中越野滑雪者和速滑运动员的营养摄入模式

人工智能论文：基于深度学习的目标检测技术综述.docx

110m栏运动员竞技特征研究论文.doc

opencv的运动目标检测毕业论文.doc

最新电气自动化毕业设计论文题目. (2).pdf

基于单片机温度检测毕业论文.doc

对短跑运动员选材的探讨毕业论文.doc

音频信号检测（方案.芯片.论文）.rar

新闻报道对运动员生涯的影响-论文.zip

基于555定时器的湿度检测...设计与Multisim仿真__自动化工程 555定时器专业论文设计范文.pdf

运动员心率无线检测系统

Qt 5实现串口调试助手 （源工程文件、0积分下载）

【SystemVerilog】路科验证V2学习笔记（全600页）.pdf

AutoSAR标准协议4.2.2

光伏-储能并网系统仿真.rar

NPPJSONViewer.zip

GD32替换STM32注意事项.pdf

XCP协议的规范文档

VS2015安装证书，JavaScript_ProjectSystem.msi，JavaScript_LanguageService.msi

CANoe通过CAPL脚本实现自动测试

蓝牙BLE协议中文版.pdf

BaiduOCR.zip

最新资源

Qt 5实现串口调试助手（源工程文件、0积分下载）