## Introduction
**[TransPose](https://arxiv.org/abs/2012.14214)** is a human pose estimation model built from a CNN feature extractor, a Transformer encoder, and a prediction head. Given an image, the attention layers in the Transformer efficiently capture long-range spatial relationships between keypoints and reveal which dependencies the predicted keypoint locations rely on.
![Architecture](transpose_architecture.png)
[[arxiv 2012.14214]](https://arxiv.org/abs/2012.14214) [[paper]](https://arxiv.org/pdf/2012.14214.pdf) [[demo-notebook]](demo.ipynb)
> TransPose: Keypoint Localization via Transformer,
> [Sen Yang](https://github.com/yangsenius), [Zhibin Quan](https://github.com/SigmaQuan), [Mu Nie](https://github.com/niechuanmu), [Wankou Yang](https://dblp.org/pid/99/3602.html),
> ICCV 2021
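As a reading aid, the overall pipeline can be sketched as a CNN backbone that produces a feature map, a Transformer encoder that attends over the flattened feature sequence, and a head that predicts one heatmap per keypoint. The snippet below is a simplified illustration in this spirit, not the implementation in `lib/models` (positional embeddings and other details are omitted; the layer sizes follow the TransPose-R rows of the Model Zoo table):
```python
import torch
import torch.nn as nn
import torchvision

class TransPoseSketch(nn.Module):
    """Illustrative CNN -> Transformer encoder -> heatmap head pipeline (not the repo's model)."""

    def __init__(self, num_keypoints=17, d_model=256, dim_ff=1024, num_heads=8, num_layers=4):
        super().__init__()
        # CNN backbone: the early ResNet-50 stages give a 512-channel, 1/8-resolution feature map.
        resnet = torchvision.models.resnet50()
        self.backbone = nn.Sequential(*list(resnet.children())[:6])
        self.reduce = nn.Conv2d(512, d_model, kernel_size=1)
        # Transformer encoder over the flattened spatial positions.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, dim_feedforward=dim_ff)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Head: upsample to 1/4 resolution and predict one heatmap per keypoint.
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(d_model, num_keypoints, kernel_size=1),
        )

    def forward(self, x):
        feat = self.reduce(self.backbone(x))        # (B, d, h, w)
        b, d, h, w = feat.shape
        seq = feat.flatten(2).permute(2, 0, 1)      # (h*w, B, d) token sequence
        seq = self.encoder(seq)                     # self-attention over all spatial positions
        feat = seq.permute(1, 2, 0).reshape(b, d, h, w)
        return self.head(feat)                      # (B, K, 2h, 2w) keypoint heatmaps

heatmaps = TransPoseSketch()(torch.randn(1, 3, 256, 192))
print(heatmaps.shape)  # torch.Size([1, 17, 64, 48])
```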
## Model Zoo
We choose two types of CNNs as the backbone candidates: ResNet and HRNet. The derived convolutional blocks are ResNet-Small, HRNet-Small-W32, and HRNet-Small-W48.
| Model | Backbone | #Attention layers | d | h | #Heads | #Params | AP (COCO val, gt bbox) | Download |
| -------------- | ----------- | :---------------: | :--: | :--: | :----: | :-----: | :--------------------: | :------: |
| TransPose-R-A3 | ResNet-S | 3 | 256 | 1024 | 8 | 5.2M | 73.8 | [model](https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc3_d256_h1024_mh8.pth) |
| TransPose-R-A4 | ResNet-S | 4 | 256 | 1024 | 8 | 6.0M | 75.1 | [model](https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc4_d256_h1024_mh8.pth) |
| TransPose-H-S | HRNet-S-W32 | 4 | 64 | 128 | 1 | 8.0M | 76.1 | [model](https://github.com/yangsenius/TransPose/releases/download/Hub/tp_h_32_256x192_enc4_d64_h128_mh1.pth) |
| TransPose-H-A4 | HRNet-S-W48 | 4 | 96 | 192 | 1 | 17.3M | 77.5 | [model](https://github.com/yangsenius/TransPose/releases/download/Hub/tp_h_48_256x192_enc4_d96_h192_mh1.pth) |
| TransPose-H-A6 | HRNet-S-W48 | 6 | 96 | 192 | 1 | 17.5M | 78.1 | [model](https://github.com/yangsenius/TransPose/releases/download/Hub/tp_h_48_256x192_enc6_d96_h192_mh1.pth) |
### Quick use
Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/satpalsr/TransPose)
You can directly load the TransPose-R-A4 or TransPose-H-A4 model, with weights pretrained on the COCO train2017 dataset, from Torch Hub:
```python
import torch
tpr = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
tph = torch.hub.load('yangsenius/TransPose:main', 'tph_a4_256x192', pretrained=True)
```
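A minimal inference sketch for the hub models follows; it assumes the model returns per-keypoint heatmaps (roughly 1/4 of the input resolution for a 256x192 crop) and decodes coarse locations with a simple argmax. See the demo notebook for the full pre- and post-processing.
```python
import torch

# Load TransPose-R-A4 from Torch Hub as shown above.
model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model.eval()

# Dummy tensor standing in for a cropped and normalized 256x192 person image.
img = torch.randn(1, 3, 256, 192)

with torch.no_grad():
    heatmaps = model(img)                       # (1, num_keypoints, H/4, W/4)

# Coarse keypoint locations from the heatmap argmax (in heatmap coordinates).
b, k, h, w = heatmaps.shape
flat_idx = heatmaps.reshape(b, k, -1).argmax(dim=-1)
ys, xs = flat_idx // w, flat_idx % w
print(heatmaps.shape, list(zip(xs[0].tolist(), ys[0].tolist()))[:3])
```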
### Results on COCO val2017 with a person detector that has 56.4 AP on COCO val2017
| Model | Input size | FPS* | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
| :------------: | :--------: | :--: | :----: | :---: | :---: | :----: | :----: | :----: | :---: | :---: | :----: | :----: | :----: |
| TransPose-R-A3 | 256x192 | 141 | 8.0 | 0.717 | 0.889 | 0.788 | 0.680 | 0.786 | 0.771 | 0.930 | 0.836 | 0.727 | 0.835 |
| TransPose-R-A4 | 256x192 | 138 | 8.9 | 0.726 | 0.891 | 0.799 | 0.688 | 0.798 | 0.780 | 0.931 | 0.845 | 0.735 | 0.844 |
| TransPose-H-S | 256x192 | 45 | 10.2 | 0.742 | 0.896 | 0.808 | 0.706 | 0.810 | 0.795 | 0.935 | 0.855 | 0.752 | 0.856 |
| TransPose-H-A4 | 256x192 | 41 | 17.5 | 0.753 | 0.900 | 0.818 | 0.717 | 0.821 | 0.803 | 0.939 | 0.861 | 0.761 | 0.865 |
| TransPose-H-A6 | 256x192 | 38 | 21.8 | 0.758 | 0.901 | 0.821 | 0.719 | 0.828 | 0.808 | 0.939 | 0.864 | 0.764 | 0.872 |
Note:
- FPS* is the average over 100 samples from the COCO val set (batch size 1) on a single NVIDIA RTX 2080Ti GPU; it may fluctuate from run to run.
- We trained the models on different hardware platforms: *1 x RTX 2080Ti GPU (TP-R-A4), 4 x Titan XP GPUs (TP-H-S, TP-H-A4), and 4 x Tesla P40 GPUs (TP-H-A6)*.
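For reference, a timing loop along the lines below reproduces the kind of measurement behind FPS* (a sketch assuming batch size 1 and a CUDA device; absolute numbers depend heavily on hardware and software versions):
```python
import time
import torch

model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model = model.cuda().eval()

# 100 dummy 256x192 inputs standing in for COCO val samples.
inputs = [torch.randn(1, 3, 256, 192, device='cuda') for _ in range(100)]

with torch.no_grad():
    for x in inputs[:10]:                 # warm-up so startup cost is not timed
        model(x)
    torch.cuda.synchronize()

    start = time.time()
    for x in inputs:
        model(x)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"average FPS over {len(inputs)} samples: {len(inputs) / elapsed:.1f}")
```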
### Results on COCO test-dev2017 with a person detector that has 60.9 AP on COCO test-dev2017
| Model | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
| :------------: | :--------: | :-----: | :----: | :---: | :---: | :----: | :----: | :----: | :---: | :---: | :----: | :----: | :----: |
| TransPose-H-S | 256x192 | 8.0M | 10.2 | 0.734 | 0.916 | 0.811 | 0.701 | 0.793 | 0.786 | 0.950 | 0.856 | 0.745 | 0.843 |
| TransPose-H-A4 | 256x192 | 17.3M | 17.5 | 0.747 | 0.919 | 0.822 | 0.714 | 0.807 | 0.799 | 0.953 | 0.866 | 0.758 | 0.854 |
| TransPose-H-A6 | 256x192 | 17.5M | 21.8 | 0.750 | 0.922 | 0.823 | 0.713 | 0.811 | 0.801 | 0.954 | 0.867 | 0.759 | 0.859 |
### Visualization
[Jupyter Notebook Demo](demo.ipynb)
Given an input image, a pretrained TransPose model, and the predicted keypoint locations, we can visualize the spatial dependencies of those locations by thresholding the attention scores.
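Conceptually, this amounts to reading out one row of an encoder layer's attention matrix (the row of the query position closest to a predicted keypoint), zeroing scores below the threshold, and overlaying the rest on the image. A rough sketch of that step is shown below; the notebook is the authoritative version, and `attn`, `feat_h`, `feat_w`, and the keypoint position here are placeholders:
```python
import numpy as np
import matplotlib.pyplot as plt

def dependency_map(attn, keypoint_xy, feat_hw, threshold=0.0):
    """One row of an attention matrix, reshaped to the feature map and thresholded.

    attn        : (feat_h*feat_w, feat_h*feat_w) attention scores of one layer/head
    keypoint_xy : (x, y) predicted keypoint location in feature-map coordinates
    feat_hw     : (feat_h, feat_w) spatial size of the flattened feature map
    """
    feat_h, feat_w = feat_hw
    x, y = keypoint_xy
    query_index = y * feat_w + x                     # position of the keypoint in the token sequence
    scores = attn[query_index].reshape(feat_h, feat_w).copy()
    scores[scores < threshold] = 0.0                 # keep only the strong dependencies
    return scores

# Example with random scores standing in for a real attention matrix.
feat_h, feat_w = 64, 48
attn = np.random.rand(feat_h * feat_w, feat_h * feat_w)
attn /= attn.sum(axis=-1, keepdims=True)             # rows behave like attention weights
scores = dependency_map(attn, keypoint_xy=(20, 30), feat_hw=(feat_h, feat_w), threshold=0.01)

plt.imshow(scores, cmap='jet')
plt.title('spatial dependencies of one predicted keypoint')
plt.show()
```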
`TransPose-R-A4` with `threshold=0.00`
![example](attention_map_image_dependency_transposer_thres_0.0.jpg)
`TransPose-R-A4` with `threshold=0.01`
![example](attention_map_image_dependency_transposer_thres_0.01.jpg)
`TransPose-H-A4` with `threshold=0.00`
![example](attention_map_image_dependency_transposeh_thres_0.0.jpg)
`TransPose-H-A4` with `threshold=0.00075`
![example](attention_map_image_dependency_transposeh_thres_0.00075.jpg)
## Getting started
### Installation
1. Clone this repository; we'll refer to the directory that you cloned as ${POSE_ROOT}
```bash
git clone https://github.com/yangsenius/TransPose.git
```
2. Install PyTorch>=1.6 and torchvision>=0.7 from the PyTorch [official website](https://pytorch.org/get-started/locally/)
3. Install package dependencies. Make sure your Python version is >= 3.7
```bash
pip install -r requirements.txt
```
4. Create the output (trained models and files) and log (TensorBoard logs) directories under ${POSE_ROOT}, and build the libs
```bash
mkdir output log
cd ${POSE_ROOT}/lib
make
```
5. Download pretrained models from the [releases](https://github.com/yangsenius/TransPose/releases) of this repo to the specified directory (a scripted download sketch follows the directory tree below)
```txt
${POSE_ROOT}
`-- models
`-- pytorch
|-- imagenet
| |-- hrnet_w32-36af842e.pth
| |-- hrnet_w48-8ef0771d.pth
| |-- resnet50-19c8e357.pth
|-- transpose_coco
| |-- tp_r_256x192_enc3_d256_h1024_mh8.pth
| |-- tp_r_256x192_enc4_d256_h1024_mh8.pth
| |-- tp_h_32_256x192_enc4_d64_h128_mh1.pth
| |-- tp_h_48_256x192_enc4_d96_h192_mh1.pth
| |-- tp_h_48_256x192_enc6_d96_h192_mh1.pth
```
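The checkpoints can also be fetched programmatically from the release URLs in the Model Zoo table; below is a small sketch for TransPose-R-A4 (the other URLs work the same way, and the checkpoint is assumed to be a plain state dict):
```python
import os
import torch

url = ('https://github.com/yangsenius/TransPose/releases/download/Hub/'
       'tp_r_256x192_enc4_d256_h1024_mh8.pth')
dst_dir = 'models/pytorch/transpose_coco'
os.makedirs(dst_dir, exist_ok=True)
dst = os.path.join(dst_dir, os.path.basename(url))

# Download once, skip if the file is already in place.
if not os.path.exists(dst):
    torch.hub.download_url_to_file(url, dst, progress=True)

state_dict = torch.load(dst, map_location='cpu')
print(f'loaded {len(state_dict)} tensors from {dst}')
```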
### Data Preparation
We follow the steps of [HRNet](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch#data-preparation) to prepare the COCO train/val/test dataset and the annotations. The detected person results are downloaded from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing). Please download or link them to ${POSE_ROOT}/data/coco/, and make them look like this:
```txt
${POSE_ROOT}/data/coco/
|-- annotations
| |-- person_keypoints_train2017.json
| `-- person_keypoints_val2017.json
|-- person_detection_results
| |-- COCO_val2017_detections_AP_H_56_person.json
| `-- COCO_test-dev2017_detections_AP_H_609_person.json
`-- images
|-- train2017
| |-- 000000000009.jpg
| |-- ...
`-- val2017
|-- 000000000139.jpg
|-- ...
```
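Once the files are in place, a quick sanity check with the COCO API confirms the annotations load (assuming `pycocotools` is installed, which COCO evaluation needs anyway; run from ${POSE_ROOT}):
```python
from pycocotools.coco import COCO

# Load the val2017 keypoint annotations from the layout above.
coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
person_ids = coco.getCatIds(catNms=['person'])
img_ids = coco.getImgIds(catIds=person_ids)
print(f'val2017 images with person annotations: {len(img_ids)}')
```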
### Training & Testing
#### Testing on COCO val2017 dataset
```bash
# Example evaluation command (illustrative; the repo follows the HRNet-style layout,
# so pick the config YAML under experiments/coco/ that matches your checkpoint).
python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml
```