Convolutional Neural Networks for Multivariate Time Series Classification using both Inter- & Intra- Channel Parallel Convolutions

G. Devineau¹, W. Xi², F. Moutarde¹, J. Yang²

¹ MINES ParisTech, PSL Research University, Center for Robotics, Paris, France
² Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, China
{guillaume.devineau, wang.xi, fabien.moutarde}@mines-paristech.fr
Abstract

In this paper, we study a convolutional neural network we recently introduced in [9], intended to recognize 3D hand gestures via multivariate time series classification. The Convolutional Neural Network (CNN) we proposed processes sequences of hand-skeletal joints' positions using parallel convolutions. We justify the model's architecture and investigate its performance on hand gesture sequence classification tasks. Our model only uses hand-skeletal data and no depth images. Experimental results show that our approach achieves state-of-the-art performance on a challenging dataset (the DHG dataset from the SHREC 2017 3D Shape Retrieval Contest). Our model achieves a 91.28% classification accuracy for the 14-gesture-class case and an 84.35% classification accuracy for the 28-gesture-class case.
1 Introduction
Gesture is a natural way for a user to interact with their environment. One preferred way to infer the intent of a gesture is to use a taxonomy of gestures and to classify the unknown gesture into one of the existing categories based on the gesture data, e.g. using a neural network to perform the classification. In this paper we present and study a convolutional neural network architecture relying on intra- and inter-channel parallel processing of sequences of hand-skeletal joints' positions to classify complete hand gestures. Where most existing deep learning approaches to gesture recognition use RGB-D image sequences to classify gestures [41], our neural network only uses hand (3D) skeletal data sequences, which are quicker to process than image sequences. The rest of this paper is structured as follows. We first review common recognition methods in Section II. We then present the DHG dataset we used to evaluate our network in Section III. We detail our approach in Section IV in terms of motivations, architecture and results. Finally, we conclude in Section VI and discuss how our model can be improved and integrated into a real-time interactive system.

Note that the contents of this paper are highly similar to those of [9], especially sections 1, 2 and 3, as well as the figure illustrating the network; however, in this article we focus more on practical tips and on justifying the network architecture, whereas the original paper was more centered on gesture-related aspects. Readers familiar with [9] can skip directly to the subsection Architecture Tuning of Section IV, in which the network architecture is justified more thoroughly.
2 Definition & Related Work
We define a 3D skeletal data sequence $s$ as a vector $s = (p_1 \cdots p_n)^T$ whose components $p_i$ are multivariate time sequences. Each component $p_i = (p_i(t))_{t \in \mathbb{N}}$ represents a multivariate sequence with three components (univariate sequences) $p_i = (x^{(i)}, y^{(i)}, z^{(i)})$ that altogether represent a time sequence of the positions $p_i(t)$ of the $i$-th skeletal joint $j_i$. Every skeletal joint $j_i$ represents a distinct and precise articulation or part of one's hand in the physical world.
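In practice, such a sequence is conveniently stored as a single 3D array indexed by joint, coordinate and time. The sketch below uses numpy with hypothetical sizes (a 22-joint hand skeleton and 100 time steps; neither number is taken from this section):

```python
import numpy as np

# Hypothetical sketch: a skeletal sequence s = (p_1 ... p_n)^T as an array.
# Each joint j_i contributes three univariate series x_i(t), y_i(t), z_i(t).
n_joints = 22   # assumed hand skeleton size
T = 100         # assumed number of time steps

# Shape (n_joints, 3, T): joint index, coordinate (x, y, z), time.
s = np.random.randn(n_joints, 3, T)

# The position p_i(t) of joint i at time t is a point in R^3:
p_i_t = s[4, :, 10]   # joint 5's (x, y, z) at time step 11

# Viewed channel-wise, the sequence is n_joints * 3 univariate series:
channels = s.reshape(n_joints * 3, T)
```

This channel-wise view is what makes the parallel, per-channel convolutions discussed later in the paper straightforward to express.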
In the following subsections, we present a short review of some approaches to gesture recognition. Typical approaches to hand gesture recognition begin with the extraction of spatial and temporal features from raw data. The features are later classified by a Machine Learning algorithm. The feature extraction step can either be explicit, using hand-crafted features known to be useful for classification, or implicit, using (machine-)learned features that describe the data without requiring human labor or expert knowledge. Deep Learning algorithms leverage such learned features to obtain hierarchical representations (features) that often describe the data better than hand-crafted features. As we work on skeletal data only, with a deep-learning perspective, this review pays limited attention to non-deep-learning-based approaches and to depth-based approaches; a survey on the former can be found in [19], while several recent surveys on the latter are listed in Neverova's thesis [21].
2.1 Non-deep-learning methods using hand-crafted features
Various hand-crafted representations of skeletal data can be used for classification. These representations often describe physical attributes and constraints, or easily interpretable properties and correlations of the data, with an emphasis on geometric features and statistical features. Some commonly used features are the positions of the skeletal joints, the orientation of the joints, the distances between joints, the angles between joints, the curvature of the joints' trajectories, the presence of symmetries in the skeleton, and more generally other features that involve a human-interpretable metric calculated from the skeletal data [15, 16, 33]. For instance, in [37], Vemulapalli et al. propose a human skeletal representation within the Lie group SE(3) × ... × SE(3), based on the idea that rigid body rotations and translations in 3D space are members of the Special Euclidean group SE(3). Human actions are then viewed as curves in this manifold. Recognition (classification) is finally performed in the corresponding Lie algebra. In [8], Devanne et al. represent skeletal joints' sequences as trajectories in an n-dimensional space; the trajectories of the joints are then interpreted in a Riemannian manifold. Similarities between the shapes of trajectories in this shape space are then computed with k-Nearest Neighbors (k-NN) to achieve the sequence classification. In [7], two approaches for gesture recognition (on the DHG dataset presented in the next section) are presented. The first one, proposed by Guerry et al., is a deep-learning method presented in the next subsection. The second one, proposed by De Smedt et al., uses three hand-crafted descriptors: Shape of Connected Joints (SoCJ), Histogram of Hand Directions (HoHD) and Histogram of Wrist Rotations (HoWR), as well as Fisher Vectors (FV) for the final representation.

Regardless of the features used, hand-crafted features are always fed into a classifier to perform the gesture recognition. In [5], Cippitelli et al. use a multi-class Support Vector Machine (SVM) for the final classification of activity features based on posture features. Other very frequently used classifiers [40] are Hidden Markov Models (HMM), Conditional Random Fields (CRF), discrete distance-based methods, Naive Bayes, and even simple k-Nearest Neighbors (k-NN) with a Dynamic Time Warping (DTW) discrepancy.
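To make the last pairing concrete, a minimal DTW discrepancy between two univariate sequences can be sketched as follows. This is a textbook dynamic-programming implementation with an absolute-difference cost, not the exact variant used in the cited works:

```python
def dtw(a, b):
    """Dynamic Time Warping discrepancy between two univariate sequences.

    Classic O(len(a) * len(b)) dynamic program; local cost is |a_i - b_j|.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal cumulative cost aligning a[:i] with b[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # step in a only
                                 D[i][j - 1],      # step in b only
                                 D[i - 1][j - 1])  # step in both
    return D[n][m]

print(dtw([0, 1, 2], [0, 1, 1, 2]))  # 0.0: the sequences align perfectly
```

A k-NN classifier then labels a query sequence by a majority vote among the k training sequences with the smallest DTW discrepancy.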
2.2 Deep-Learning based methods
Deep Learning, also known as Hierarchical Learning, is a subclass of Machine Learning where algorithms $f$ use a cascade of non-linear computational units $f_i$ (layers), e.g. using convolutions, for feature extraction and transformation: $f = f_1 \circ f_2 \circ \cdots \circ f_n$. A traditional Convolutional Neural Network (CNN, or ConvNet) model almost always involves a sequence of convolution and pooling layers, followed by dense layers. Convolution and pooling layers serve as feature extractors, whereas the dense layers, also called a Multi-Layer Perceptron (MLP), can be seen as a classifier.

A strategy to mix deep-learning algorithms and (hand) gesture recognition consists in training convolutional neural networks [18] on RGB-D images. A direct example of hand gesture recognition via image CNNs can be found in the works of Strezoski et al. [32], where CNNs are simply applied to the RGB images of the sequences to classify. Guerry et al. [7] propose a deep-learning approach for hand gesture recognition on the DHG dataset, which is described in Section III of this paper. The Guerry et al. approach consists in concatenating the Red, Green, Blue and Depth channels of each RGB-D image. An already pretrained VGG [29] image classification model is then applied to sequences of 5 concatenated images consecutive in time. In [20], Molchanov et al. introduce a CNN architecture for RGB-D images where the classifier is made of two CNN networks (a high-resolution network and a low-resolution network) whose class-membership outputs are fused with an element-wise multiplication. Neverova et al. carry out a gesture classification task on multi-modal data (RGB-D images, audio streams and skeletal data) in [22, 23]. Each modality is first processed independently with convolution layers, and then merged. To avoid meaningless co-adaptation of modalities, a multi-modal dropout (ModDrop) is introduced. Nevertheless, these approaches use depth information, whereas we only want to use skeletal data. In [38], Wang et al. color-code the joints of a 3D skeleton across time. The colored (3D) trajectories are projected on 2D planes in order to obtain images that serve as inputs of CNNs. Each CNN emits a gesture class-membership probability. Finally, a class score (probability) is obtained by fusing the CNNs' scores.
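The layer cascade defined at the start of this subsection can be illustrated with a toy composition, with layers applied in the order listed, as in a feed-forward pass. The three "layers" below are hypothetical stand-ins, not part of any architecture discussed here:

```python
from functools import reduce

def compose(*layers):
    """Compose layers so that compose(f1, f2, f3)(x) == f3(f2(f1(x)))."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)

# Toy "layers" on a list of numbers: a moving average standing in for a
# 1D convolution (feature extraction), a ReLU non-linearity, and a sum
# standing in for the final dense classifier.
conv = lambda x: [(x[i] + x[i + 1]) / 2 for i in range(len(x) - 1)]
relu = lambda x: [max(v, 0.0) for v in x]
dense = lambda x: sum(x)

f = compose(conv, relu, dense)
print(f([1.0, -3.0, 5.0]))  # conv -> [-1.0, 1.0]; relu -> [0.0, 1.0]; sum -> 1.0
```

The same structure (feature-extracting layers followed by a dense classifier) is what the convolution-based approaches reviewed above instantiate at scale.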
Recurrent Neural Networks (RNN), e.g. networks that use Long Short-Term Memory (LSTM) [12] or Gated Recurrent Units (GRU) [4], have long been considered the best way to achieve state-of-the-art results when working with neural networks on sequences such as time series. Recently, the emergence of new neural network architectures that use convolutions or attention mechanisms [35, 36] rather than recurrent cells has challenged this assumption, given that RNNs present some significant issues: to name only a few, they are sensitive to the first examples seen, they have complex dynamics that can lead to chaotic behavior [17], and they are intrinsically sequential models, which means that their internal state computations are difficult to parallelize. In [30], Song et al. elegantly combine the use of an LSTM-based neural network for human action recognition from skeleton data with a spatio-temporal attention mechanism. While this approach seems promising, we seek a convolution-only architecture rather than a recurrent one.
Zheng et al. propose a convolution-based architecture that does not involve recurrent cells in [42], although this architecture can easily be extended with recurrent cells [25]. Zheng et al. introduce a general framework (Multi-Channels Deep Convolution Neural Networks, or MC-DCNN) for multivariate sequence classification. In MC-DCNN, multivariate time series are seen as multiple univariate time series; as such, the neural network input consists of several 1D time series sequences. The feature learning step is executed on every univariate sequence individually. The respective learned features are later concatenated and merged using a classic MLP placed at the end of the feature extraction layers to perform classification. The major