Deep Learning-Based Human Pose Estimation: A Survey

Ce Zheng*, Wenhan Wu*, Taojiannan Yang, Sijie Zhu, Chen Chen, Member, IEEE, Ruixu Liu, Ju Shen, Senior Member, IEEE, Nasser Kehtarnavaz, Fellow, IEEE, and Mubarak Shah, Fellow, IEEE
Abstract—Human pose estimation aims to locate human body parts and build a human body representation (e.g., a body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide range of applications, including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although recently developed deep learning-based solutions have achieved high performance in human pose estimation, challenges remain due to insufficient training data, depth ambiguities, and occlusions. The goal of this survey is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 240 research papers since 2014 are covered. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included, and quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, we discuss the remaining challenges, applications, and promising future research directions. We also provide a regularly updated project page at: https://github.com/zczcwh/DL-HPE

Index Terms—Survey of human pose estimation, 2D and 3D pose estimation, deep learning-based pose estimation, pose estimation datasets, pose estimation metrics
1 INTRODUCTION
Human pose estimation (HPE), which has been extensively studied in the computer vision literature, involves estimating the configuration of human body parts from input data captured by sensors, in particular images and videos. HPE provides geometric and motion information about the human body, which has been applied to a wide range of applications (e.g., human-computer interaction, motion analysis, augmented reality (AR), virtual reality (VR), and healthcare). With the rapid development of deep learning in recent years, deep learning-based solutions have been shown to outperform classical computer vision methods in various tasks, including image classification [1], semantic segmentation [2], and object detection [3]. Significant progress and remarkable performance have already been achieved by employing deep learning techniques in HPE tasks. However, challenges such as occlusion, insufficient training data, and depth ambiguity still need to be overcome. 2D HPE from images and videos with 2D pose annotations is readily achievable, and high performance has been reached for single-person pose estimation using deep
learning techniques. More recently, attention has been paid to highly occluded multi-person HPE in complex scenes.

• * The first two authors contributed equally.
• C. Zheng, W. Wu, T. Yang, S. Zhu, and C. Chen are with the Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC 28223. E-mail: {czheng6, wwu25, tyang30, szhu3, chen.chen}@uncc.edu
• R. Liu and J. Shen are with the Department of Computer Science, University of Dayton, Dayton, OH 45469. E-mail: {liur05, jshen1}@udayton.edu
• N. Kehtarnavaz is with the Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX 75080. E-mail: kehtar@utdallas.edu
• M. Shah is with the Center for Research in Computer Vision, University of Central Florida, Orlando, FL 32816. E-mail: shah@crcv.ucf.edu

In
contrast, for 3D HPE, obtaining accurate 3D pose annotations
is much more difficult than for its 2D counterpart. Motion capture systems can collect 3D pose annotations in controlled lab environments, but they have limited applicability in in-the-wild environments. For 3D HPE from monocular RGB images and videos, the main challenge is depth ambiguity. In multi-view settings, viewpoint association is the key issue that needs to be addressed. Some works have utilized sensors such as depth cameras, inertial measurement units (IMUs), and radio-frequency devices, but these approaches are usually not cost-effective and require special-purpose hardware.
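The depth ambiguity mentioned above can be made concrete with a minimal pinhole-projection sketch (a hypothetical NumPy example; the focal length and joint coordinates are made-up illustrative values, not from any real camera or dataset): scaling a 3D point along its camera ray leaves its 2D projection unchanged, so infinitely many 3D poses are consistent with a single 2D observation.

```python
import numpy as np

# Simple pinhole camera model: u = f*X/Z, v = f*Y/Z.
# Focal length and coordinates below are illustrative values only.
def project(point_3d, focal=1000.0):
    X, Y, Z = point_3d
    return np.array([focal * X / Z, focal * Y / Z])

joint_near = np.array([0.3, 0.5, 2.0])  # a joint 2 m from the camera
joint_far = 2.5 * joint_near            # a different joint on the same ray

# Both 3D positions project to the same 2D pixel, so the 2D
# observation alone cannot recover the joint's depth.
print(project(joint_near))                                   # [150. 250.]
print(np.allclose(project(joint_near), project(joint_far)))  # True
```

This is why monocular 3D HPE methods must rely on priors such as body-proportion constraints or learned pose statistics to resolve depth.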
Given the rapid progress in HPE research, this article
attempts to track recent advances and summarize their
achievements in order to provide a clear picture of current
research on deep learning-based 2D and 3D HPE.
1.1 Previous surveys and our contributions
Table 1 lists the related surveys and reviews previously
reported on HPE. Among them, [4], [5], [6], [7] focus on the general field of vision-based human motion capture methods and their implementations, including pose estimation, tracking, and action recognition; pose estimation is therefore only one of the topics covered in these surveys. Research on 3D human pose estimation before 2012 is reviewed in [8]. Body-part parsing-based methods for single-view and multi-view HPE are reported in [9]. These surveys, published during 2001-2015, mainly focus on conventional methods without deep learning. A survey covering both traditional and deep learning-based HPE methods is presented in [10]; however, only a handful of deep learning-based approaches are included. The survey in [11] covers 3D HPE methods with RGB inputs. The survey in [13] only reviews
arXiv:2012.13392v1 [cs.CV] 24 Dec 2020