# TSN
[Temporal segment networks: Towards good practices for deep action recognition](https://link.springer.com/chapter/10.1007/978-3-319-46484-8_2)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles to design effective ConvNet architectures for action recognition in videos and learn these models given limited training samples. Our first contribution is temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the idea of long-range temporal structure modeling. It combines a sparse temporal sampling strategy and video-level supervision to enable efficient and effective learning using the whole action video. The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network. Our approach obtains the state-of-the-art performance on the datasets of HMDB51 (69.4%) and UCF101 (94.2%). We also visualize the learned ConvNet models, which qualitatively demonstrates the effectiveness of temporal segment network and the proposed good practices.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/34324155/143019237-8823045b-dfa3-45cc-a992-ee83ab9d8459.png" width="800"/>
</div>
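
For orientation, here is a minimal PyTorch sketch of the idea in the abstract: split each video into `K` segments, sample one RGB snippet per segment, push all snippets through a single shared 2D backbone, and average the segment-level predictions (the segmental consensus). The class name and the torchvision backbone are illustrative stand-ins, not the MMAction2 implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class TinyTSN(nn.Module):
    """Toy TSN: shared 2D ConvNet over K snippets + average consensus."""

    def __init__(self, num_classes: int, num_segments: int = 3):
        super().__init__()
        self.num_segments = num_segments
        backbone = resnet50(weights=None)  # ImageNet weights would be loaded here
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone

    def forward(self, snippets: torch.Tensor) -> torch.Tensor:
        # snippets: (N, K, C, H, W), one RGB frame sampled from each of K segments
        n, k, c, h, w = snippets.shape
        logits = self.backbone(snippets.view(n * k, c, h, w))  # shared weights
        return logits.view(n, k, -1).mean(dim=1)  # average (segmental) consensus


# 2 videos, 3 segments each -> video-level class scores of shape (2, 101)
print(TinyTSN(num_classes=101)(torch.randn(2, 3, 3, 224, 224)).shape)
```
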
## Results and Models
### UCF-101
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x3_75e_ucf101_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py) [1] |8| ResNet50 | ImageNet |83.03|96.78|8332| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.json) |
[1] We report the performance on UCF-101 split1.
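
To try a released checkpoint on a video, MMAction2 exposes a high-level Python API. The sketch below assumes the MMAction2 0.x API; the exact signature and return format of `inference_recognizer` differ between releases, so treat it as a template rather than a guaranteed recipe.

```python
# Hedged sketch, assuming the MMAction2 0.x high-level API; check your
# installed version, since the signature/return format changed across releases.
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py'
checkpoint_file = 'tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth'  # ckpt from the table above

model = init_recognizer(config_file, checkpoint_file, device='cuda:0')
results = inference_recognizer(model, 'demo/demo.mp4')  # any local video file
print(results)  # typically (label_index, score) pairs, highest score first
```
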
### Diving48
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_video_1x1x8_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py)|8| ResNet50 | ImageNet | 71.27 | 95.74 | 5699 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/tsn_r50_video_1x1x8_100e_diving48_rgb_20210426-6dde0185.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log.json)|
|[tsn_r50_video_1x1x16_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py)|8| ResNet50 | ImageNet | 76.75 | 96.95 | 5705 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/tsn_r50_video_1x1x16_100e_diving48_rgb_20210426-63c5f2f7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log.json)|
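
The `video` in these config names means frames are decoded directly from video files (e.g., with Decord) rather than read from pre-extracted rawframe directories, and the `1x1x8` / `1x1x16` suffix encodes the frame sampling as `{clip_len}x{frame_interval}x{num_clips}`. The fragment below is a hypothetical data-pipeline sketch in the MMAction2 0.x config style, not a copy of the shipped config.

```python
# Hypothetical pipeline fragment (MMAction2 0.x style): decode directly from
# the video file; non-"video" configs use a rawframe-based pipeline instead.
train_pipeline = [
    dict(type='DecordInit'),
    # "1x1x8" in the config name: clip_len=1, frame_interval=1, num_clips=8
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_bgr=False),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label']),
]
```
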
### HMDB51
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py)|8| ResNet50 | ImageNet | 48.95| 80.19| 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb_20201123-ce6c27ed.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log.json) |
|[tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py) |8| ResNet50 | Kinetics400 | 56.08 | 84.31 | 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb_20201123-7f84701b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log.json) |
|[tsn_r50_1x1x8_50e_hmdb51_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py) |8| ResNet50 | Moments | 54.25 | 83.86| 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/tsn_r50_1x1x8_50e_hmdb51_mit_rgb_20201123-01526d41.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log.json) |
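
The three HMDB51 rows differ only in where the initial weights come from (ImageNet, Kinetics400, Moments), and the Kinetics400 start is clearly the strongest on this small dataset. In MMAction2-style configs, that choice is made through the backbone's `pretrained` field or a recognizer-level `load_from`; the fragment below is a hypothetical sketch of the switch, not one of the shipped configs.

```python
# Hypothetical config fragment (MMAction2 0.x conventions); the load_from
# path is a placeholder, not a real checkpoint URL.
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        depth=50,
        pretrained='torchvision://resnet50',  # ImageNet-pretrained backbone
    ),
    cls_head=dict(type='TSNHead', num_classes=51, in_channels=2048),
)

# To start from a full Kinetics-400 (or Moments) TSN recognizer instead,
# leave the backbone as-is and load the whole checkpoint:
# load_from = '<path/to/tsn_kinetics400_checkpoint.pth>'  # placeholder
```
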
### Kinetics-400
|config | resolution | gpus | backbone|pretrain | top1 acc| top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M)| ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) |340x256|8| ResNet50 | ImageNet|70.60|89.26|x|x|4.3 (25x10 frames)|8344| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json)|
|[tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) |short-side 256|8| ResNet50 | ImageNet|70.42|89.03|x|x|x|8343|[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth)|[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log)|[json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json)|
|[tsn_r50_dense_1x1x5_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py) |340x256|8x3| ResNet50| ImageNet |70.18|89.10|[69.15](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|[88.56](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|12.7 (8x10 frames)|7028| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/tsn_r50_dense_1x1x5_100e_kinetics400_rgb_20200627-a063165f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log.json)|
|[tsn_r50_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py) |short-side 320|8x2| ResNet50| ImageNet |70.91|89.51|x|x|10.7 (25x3 frames)| 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rg…) | | |
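
The top1/top5 columns in these tables are plain top-k accuracy over the validation videos. For reference, here is a self-contained sketch of the metric (not MMAction2's own implementation):

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of videos whose true label is among the k highest-scoring classes.

    scores: (num_videos, num_classes) prediction scores; labels: (num_videos,).
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]  # k best classes per video
    return float((top_k == labels[:, None]).any(axis=1).mean())


scores = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
labels = np.array([2, 0])
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```
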