A Hybrid Method for Human Interaction Recognition
using Spatio-Temporal Interest Points
Nijun Li, Xu Cheng, Haiyan Guo, Zhenyang Wu
School of Information Science and Engineering
Southeast University, Nanjing, China
{lnjleo, xcheng, haiyan.guo, zhenyang}@seu.edu.cn
Abstract—This paper proposes an effective hybrid method for
recognizing human interactions, which incorporates the
advantages of both a global feature (Motion Context, MC) and the
Spatio-Temporal (S-T) correlation of local Spatio-Temporal
Interest Points (STIPs). The MC feature, which also derives from
STIPs, is used to train a random forest where Genetic Algorithm
(GA) is applied to the training phase to achieve a good
compromise between reliability and efficiency. In addition, we design
an effective and efficient S-T correlation based match to assist the
MC feature, where MC’s structure and a biological sequence
matching algorithm are employed to calculate the spatial and
temporal correlation score, respectively. Experiments on the UT-
Interaction dataset show that our GA search based random
forest and S-T correlation based match achieve better
performance than several other prevalent machine learning
methods, and that a combination of those two methods
outperforms most of the state-of-the-art works.
Keywords—spatio-temporal interest points (STIPs); motion
context (MC); random forest; genetic algorithm (GA); spatio-
temporal (S-T) correlation
I. INTRODUCTION
Human action recognition, which has attracted
increasing research interest over the past decades, is now of
central importance in many applications related to computer
vision such as video surveillance, video retrieval, human-
computer interactions, robot vision, etc. Early studies in this
area usually experiment on simple datasets which only contain
single-person activities (e.g. Weizmann and KTH), and the
recognition rates on those benchmark datasets are now close to
100% [1]. However, the recognition rates on human
interactions are relatively low due to their richer inner
semantics and contextual information [2].
3D reconstruction and pose estimation based methods were
common in the early years of this century, but now the prevalent
approach is extracting 2D features directly from video
sequences, among which the Spatio-Temporal Interest Points
(STIPs) [3, 4] have been widely adopted in the past decade due to
their simplicity, effectiveness and robustness to cluttered
backgrounds [5]. To exploit STIPs wisely, the next question
to consider is whether to use descriptors to describe
them. Most researchers give an affirmative answer to this
question: they describe STIPs by various histograms (e.g.
HOF, HOG, HOG3D [6], 3D-SIFT [7], etc.), then cluster the
STIP descriptors to form an unstructured (Bag-of-Words,
BoW [7]) or structured (vocabulary tree [9]) codebook, and
finally fit them into a supervised (e.g. SVM [8] or neural
network [10]) or unsupervised (e.g. probabilistic Latent
Semantic Analysis, pLSA [11]) learning framework.
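The describe, cluster, and classify pipeline outlined above can be illustrated with a minimal NumPy sketch; the descriptor dimension (72) and vocabulary size (8) are toy values for illustration, not choices made in this paper:

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    # plain k-means: pick k random descriptors as initial centers
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        # assign each descriptor to its nearest center, then re-estimate centers
        dist = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    # quantize each descriptor to its nearest visual word; return normalized counts
    dist = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# toy data standing in for HOG/HOF-style STIP descriptors of one video
rng = np.random.default_rng(1)
descs = rng.standard_normal((200, 72))
codebook = kmeans(descs, k=8)
hist = bow_histogram(descs, codebook)
```

The resulting per-video histogram is what would then be fed to a classifier such as an SVM.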
Nevertheless, it is also possible to focus only on the Spatio-
Temporal (S-T) relationships of STIPs. Bregonzio et al. [12]
extract multiple features from “clouds” of STIPs and
successfully use Nearest Neighbor (NN) classifier and SVM to
recognize human actions. Another inspiring example is
“Motion Context (MC)” [11] which is derived from “Shape
Context” [13] for object recognition, capturing the distribution
of STIPs.
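As an illustration of this idea, a Shape-Context-style log-polar occupancy histogram over 2D STIP coordinates can be computed as below; the bin layout and normalization are illustrative assumptions, not the exact MC construction of [11]:

```python
import numpy as np

def log_polar_hist(points, center, n_r=3, n_theta=8):
    # count interest points falling into log-radius x angle bins around a center
    pts = np.asarray(points, float) - np.asarray(center, float)
    r = np.hypot(pts[:, 0], pts[:, 1])
    theta = np.mod(np.arctan2(pts[:, 1], pts[:, 0]), 2 * np.pi)
    r_max = r.max() + 1e-9
    # log-spaced ring boundaries: each ring doubles in outer radius
    r_edges = r_max * 2.0 ** np.arange(-n_r, 1)        # e.g. [r/8, r/4, r/2, r]
    r_bin = np.searchsorted(r_edges, r, side='right')  # 0 = innermost disc
    t_bin = (theta / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r + 1, n_theta))
    for rb, tb in zip(r_bin, t_bin):
        hist[rb, tb] += 1
    return hist / len(pts)

# scattered 2D STIP coordinates around a reference point (toy values)
rng = np.random.default_rng(0)
stips = rng.uniform(-10, 10, (50, 2))
H = log_polar_hist(stips, center=(0.0, 0.0))
```

Such a histogram captures where interest points fall relative to a reference, which is the essence of a context-style descriptor.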
Although the approach combining STIPs with BoW and
SVM is well known for its good performance, it has some
obvious shortcomings: (1) BoW uses unstructured local
features whose informative S-T relationships are totally
ignored; (2) SVM is not necessarily the best choice for
discriminative learning machine due to its binary classification
nature and difficulty in determining the kernel function
parameters. To overcome those shortcomings, Matikainen et
al. [14] describe human actions by pairwise S-T relationships
whereas Zhang et al. [15] put forward a “Bag of S-T Phrases
(BoP)” model, both taking advantage of the S-T constraints of
STIPs and achieving promising results. Apart from SVM, the
decision tree is receiving more and more attention [1, 9] for its
merits of multiclass classification and ability to create a
structured codebook. To better deal with noise and enjoy the
benefits of boosting, some works train a series of decision
trees (also called a “random forest”) [16, 17] instead of a single
tree.
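The bagging principle behind a random forest can be sketched with a deliberately tiny ensemble of depth-one decision stumps trained on bootstrap resamples; this is a stand-in for full decision trees, not the GA-based training proposed in this paper:

```python
import numpy as np

def train_stump(X, y):
    # exhaustive search for the best single-feature threshold split
    best = (0.0, 0, 0.0, False)              # (accuracy, feature, threshold, flip)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            acc = ((X[:, f] >= t).astype(int) == y).mean()
            if acc > best[0]:
                best = (acc, f, t, False)
            if 1 - acc > best[0]:
                best = (1 - acc, f, t, True)  # prediction flipped
    return best[1:]

def train_forest(X, y, n_trees=25, seed=0):
    # bagging: train each stump on a bootstrap resample of the data
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        i = rng.integers(0, len(X), len(X))
        forest.append(train_stump(X[i], y[i]))
    return forest

def forest_predict(forest, X):
    # majority vote over the ensemble
    votes = np.array([(X[:, f] >= t).astype(int) ^ int(flip)
                      for f, t, flip in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# two well-separated toy classes in 2D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
y = np.r_[np.zeros(30, int), np.ones(30, int)]
forest = train_forest(X, y)
```

Averaging many weak, resampled learners is what makes the ensemble robust to label noise in the training set.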
This paper aims at exploring and presenting effective and
efficient methods for human interaction recognition, and the
contributions of our work are as follows.
(1) An innovative hybrid framework which incorporates
both global features and S-T correlation of local features is
proposed to recognize human interactions and achieves
promising results.
(2) Genetic Algorithm (GA) search is integrated into the
training of random forest for the first time, which proves to be
a good compromise between reliability and efficiency.
(3) An efficient scheme to calculate S-T correlation score
between two videos is presented, and such score based match
outperforms both BoW and pLSA (using the same codebook).
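To give a concrete picture of sequence-based temporal matching, the sketch below computes a Needleman-Wunsch global alignment score over per-frame symbol sequences; the scoring values, and the choice of this particular biological sequence algorithm, are illustrative assumptions rather than the exact scheme of this paper:

```python
import numpy as np

def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    # Needleman-Wunsch global alignment via dynamic programming;
    # a and b could be, e.g., per-frame visual-word labels of two videos
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = gap * np.arange(n + 1)   # aligning a prefix against all gaps
    dp[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i, j] = max(dp[i - 1, j - 1] + s,   # substitute / match
                           dp[i - 1, j] + gap,     # gap in b
                           dp[i, j - 1] + gap)     # gap in a
    return dp[n, m]

score_same = alignment_score("abcab", "abcab")
score_diff = alignment_score("abc", "abd")
```

Gap tolerance is what makes such a score forgiving of small temporal misalignments between two videos of the same interaction.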
2014 22nd International Conference on Pattern Recognition
1051-4651/14 $31.00 © 2014 IEEE
DOI 10.1109/ICPR.2014.434