# TSN
[Temporal segment networks: Towards good practices for deep action recognition](https://link.springer.com/chapter/10.1007/978-3-319-46484-8_2)
<!-- [ALGORITHM] -->
## Abstract
<!-- [ABSTRACT] -->
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles to design effective ConvNet architectures for action recognition in videos and learn these models given limited training samples. Our first contribution is temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the idea of long-range temporal structure modeling. It combines a sparse temporal sampling strategy and video-level supervision to enable efficient and effective learning using the whole action video. The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network. Our approach obtains the state-of-the-art performance on the datasets of HMDB51 (69.4%) and UCF101 (94.2%). We also visualize the learned ConvNet models, which qualitatively demonstrates the effectiveness of temporal segment network and the proposed good practices.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/34324155/143019237-8823045b-dfa3-45cc-a992-ee83ab9d8459.png" width="800"/>
</div>
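
For orientation, here is a minimal PyTorch sketch of the idea in the abstract: split each video into `K` segments, sample one RGB snippet per segment, push all snippets through a single shared 2D backbone, and average the segment-level predictions (the segmental consensus). The class name and the torchvision backbone are illustrative stand-ins, not the MMAction2 implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class TinyTSN(nn.Module):
    """Toy TSN: shared 2D ConvNet over K snippets + average consensus."""

    def __init__(self, num_classes: int, num_segments: int = 3):
        super().__init__()
        self.num_segments = num_segments
        backbone = resnet50(weights=None)  # ImageNet weights would be loaded here
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone

    def forward(self, snippets: torch.Tensor) -> torch.Tensor:
        # snippets: (N, K, C, H, W), one RGB frame sampled from each of K segments
        n, k, c, h, w = snippets.shape
        logits = self.backbone(snippets.view(n * k, c, h, w))  # shared weights
        return logits.view(n, k, -1).mean(dim=1)  # average (segmental) consensus


# 2 videos, 3 segments each -> video-level class scores of shape (2, 101)
print(TinyTSN(num_classes=101)(torch.randn(2, 3, 3, 224, 224)).shape)
```
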
## Results and Models
### UCF-101
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x3_75e_ucf101_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py) [1] |8| ResNet50 | ImageNet |83.03|96.78|8332| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb/tsn_r50_1x1x3_75e_ucf101_rgb_20201023.json) |
[1] We report the performance on UCF-101 split1.
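
To try a released checkpoint on a video, MMAction2 exposes a high-level Python API. The sketch below assumes the MMAction2 0.x API; the exact signature and return format of `inference_recognizer` differ between releases, so treat it as a template rather than a guaranteed recipe.

```python
# Hedged sketch, assuming the MMAction2 0.x high-level API; check your
# installed version, since the signature/return format changed across releases.
from mmaction.apis import init_recognizer, inference_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_1x1x3_75e_ucf101_rgb.py'
checkpoint_file = 'tsn_r50_1x1x3_75e_ucf101_rgb_20201023-d85ab600.pth'  # ckpt from the table above

model = init_recognizer(config_file, checkpoint_file, device='cuda:0')
results = inference_recognizer(model, 'demo/demo.mp4')  # any local video file
print(results)  # typically (label_index, score) pairs, highest score first
```
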
### Diving48
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_video_1x1x8_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb.py)|8| ResNet50 | ImageNet | 71.27 | 95.74 | 5699 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/tsn_r50_video_1x1x8_100e_diving48_rgb_20210426-6dde0185.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_diving48_rgb/20210426_014138.log.json)|
|[tsn_r50_video_1x1x16_100e_diving48_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb.py)|8| ResNet50 | ImageNet | 76.75 | 96.95 | 5705 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/tsn_r50_video_1x1x16_100e_diving48_rgb_20210426-63c5f2f7.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x16_100e_diving48_rgb/20210426_014103.log.json)|
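
The `video` in these config names means frames are decoded directly from video files (e.g., with Decord) rather than read from pre-extracted rawframe directories, and the `1x1x8` / `1x1x16` suffix encodes the frame sampling as `{clip_len}x{frame_interval}x{num_clips}`. The fragment below is a hypothetical data-pipeline sketch in the MMAction2 0.x config style, not a copy of the shipped config.

```python
# Hypothetical pipeline fragment (MMAction2 0.x style): decode directly from
# the video file; non-"video" configs use a rawframe-based pipeline instead.
train_pipeline = [
    dict(type='DecordInit'),
    # "1x1x8" in the config name: clip_len=1, frame_interval=1, num_clips=8
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_bgr=False),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label']),
]
```
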
### HMDB51
|config | gpus | backbone | pretrain | top1 acc| top5 acc | gpu_mem(M) | ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb.py)|8| ResNet50 | ImageNet | 48.95| 80.19| 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb_20201123-ce6c27ed.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_imagenet_rgb/20201025_231108.log.json) |
|[tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb.py) |8| ResNet50 | Kinetics400 | 56.08 | 84.31 | 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb_20201123-7f84701b.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_kinetics400_rgb/20201108_190805.log.json) |
|[tsn_r50_1x1x8_50e_hmdb51_mit_rgb](/configs/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb.py) |8| ResNet50 | Moments | 54.25 | 83.86| 21535| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/tsn_r50_1x1x8_50e_hmdb51_mit_rgb_20201123-01526d41.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x8_50e_hmdb51_mit_rgb/20201112_170135.log.json) |
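
The three HMDB51 rows differ only in where the initial weights come from (ImageNet, Kinetics400, Moments), and the Kinetics400 start is clearly the strongest on this small dataset. In MMAction2-style configs, that choice is made through the backbone's `pretrained` field or a recognizer-level `load_from`; the fragment below is a hypothetical sketch of the switch, not one of the shipped configs.

```python
# Hypothetical config fragment (MMAction2 0.x conventions); the load_from
# path is a placeholder, not a real checkpoint URL.
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNet',
        depth=50,
        pretrained='torchvision://resnet50',  # ImageNet-pretrained backbone
    ),
    cls_head=dict(type='TSNHead', num_classes=51, in_channels=2048),
)

# To start from a full Kinetics-400 (or Moments) TSN recognizer instead,
# leave the backbone as-is and load the whole checkpoint:
# load_from = '<path/to/tsn_kinetics400_checkpoint.pth>'  # placeholder
```
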
### Kinetics-400
|config | resolution | gpus | backbone|pretrain | top1 acc| top5 acc | reference top1 acc | reference top5 acc | inference_time(video/s) | gpu_mem(M)| ckpt | log| json|
|:--|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|[tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) |340x256|8| ResNet50 | ImageNet|70.60|89.26|x|x|4.3 (25x10 frames)|8344| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json)|
|[tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) |short-side 256|8| ResNet50 | ImageNet|70.42|89.03|x|x|x|8343|[ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_1x1x3_100e_kinetics400_rgb_20200725-22592236.pth)|[log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log)|[json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_1x1x3_100e_kinetics400_rgb/20200725_031325.log.json)|
|[tsn_r50_dense_1x1x5_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb.py) |340x256|8x3| ResNet50| ImageNet |70.18|89.10|[69.15](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|[88.56](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|12.7 (8x10 frames)|7028| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/tsn_r50_dense_1x1x5_100e_kinetics400_rgb_20200627-a063165f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x5_100e_kinetics400_rgb/20200627_105310.log.json)|
|[tsn_r50_320p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rgb.py) |short-side 320|8x2| ResNet50| ImageNet |70.91|89.51|x|x|10.7 (25x3 frames)| 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_320p_1x1x3_100e_kinetics400_rg…) | | |
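
The top1/top5 columns in these tables are plain top-k accuracy over the validation videos. For reference, here is a self-contained sketch of the metric (not MMAction2's own implementation):

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of videos whose true label is among the k highest-scoring classes.

    scores: (num_videos, num_classes) prediction scores; labels: (num_videos,).
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]  # k best classes per video
    return float((top_k == labels[:, None]).any(axis=1).mean())


scores = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
labels = np.array([2, 0])
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```
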