[![Documentation Status](https://readthedocs.org/projects/video-dataset-loading-pytorch/badge/?version=latest)](https://video-dataset-loading-pytorch.readthedocs.io/en/latest/?badge=latest)
# Efficient Video Dataset Loading and Augmentation in PyTorch
Author: [Raivo Koot](https://github.com/RaivoKoot)
https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html
If you find the code useful, please star the repository.
If you are completely unfamiliar with loading datasets in PyTorch using `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, I recommend
getting familiar with these first through [this](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) or
[this](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
### Overview: This example demonstrates the use of `VideoFrameDataset`
The VideoFrameDataset class (an implementation of `torch.utils.data.Dataset`) serves to `easily`, `efficiently` and `effectively` load video samples from video datasets in PyTorch.
1) Easily because this dataset class can be used with custom datasets with minimum effort and no modification. The class merely expects the
video dataset to have a certain structure on disk and expects a .txt annotation file that enumerates each video sample. Details on this
can be found below and at `https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html`.
2) Efficiently because the video loading pipeline that this class implements is very fast. This minimizes GPU waiting time during training by eliminating input bottlenecks
that can otherwise slow down training severalfold.
3) Effectively because the implemented sampling strategy for video frames is very strong. Training on the entire sequence of
video frames (often several hundred) is too memory- and compute-intensive. Therefore, this implementation samples frames evenly from the video (sparse temporal sampling)
so that the loaded frames represent every part of the video, with support for arbitrary and differing video lengths within the same dataset.
This approach has been shown to be very effective and is taken from
["Temporal Segment Networks (ECCV2016)"](https://arxiv.org/abs/1608.00859) with modifications.
In conjunction with PyTorch's DataLoader, the VideoFrameDataset class returns video batch tensors of size `BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH`.
For a demo, visit `demo.py`.
### QuickDemo (demo.py)
```python
import os
import matplotlib.pyplot as plt
from video_dataset import VideoFrameDataset

root = os.path.join(os.getcwd(), 'demo_dataset')  # folder in which all videos lie in a specific structure
annotation_file = os.path.join(root, 'annotations.txt')  # one row per video sample: VIDEO_PATH START_FRAME END_FRAME CLASS_ID

""" DEMO 1 WITHOUT IMAGE TRANSFORMS """
dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=5,
    frames_per_segment=1,
    image_template='img_{:05d}.jpg',
    transform=None,
    random_shift=True,
    test_mode=False
)

sample = dataset[0]  # take the first sample of the dataset
frames = sample[0]   # list of PIL images
label = sample[1]    # integer label

for image in frames:
    plt.imshow(image)
    plt.title(label)
    plt.show()
    plt.pause(1)
```
![alt text](https://github.com/RaivoKoot/images/blob/main/Action_Video.jpg "Action Video")
# Table of Contents
- [1. Requirements](#1-requirements)
- [2. Custom Dataset](#2-custom-dataset)
- [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
- [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
- [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
- [6. Allowing Multiple Labels per Sample](#6-allowing-multiple-labels-per-sample)
- [7. Conclusion](#7-conclusion)
- [8. Upcoming Features](#8-upcoming-features)
- [9. Acknowledgements](#9-acknowledgements)
### 1. Requirements
```
# Without these three, VideoFrameDataset will not work.
torchvision >= 0.8.0
torch >= 1.7.0
python >= 3.6
```
### 2. Custom Dataset
(This description covers custom datasets where each sample has a single class label. If you want to use
a dataset where a sample can have more than one class label, read this anyway and then read `6.` below.)
To use any dataset, two conditions must be met.
1) The video data must be supplied as RGB frames, each frame saved as an image file. Each video must have its own folder, in which the frames of
that video lie. The frames of a video inside its folder must be named uniformly with consecutive indices such as `img_00001.jpg` ... `img_00120.jpg`, if there are 120 frames.
Indices can start at zero or any other number and the exact file name template can be chosen freely. The filename template
for frames in this example is "img_{:05d}.jpg" (python string formatting, specifying 5 digits after the underscore), and must be supplied to the
constructor of VideoFrameDataset as a parameter. Each video folder must lie inside some `root` folder.
2) To enumerate all video samples in the dataset and their required metadata, a `.txt` annotation file must be manually created that contains a row for each
video clip sample in the dataset. The training, validation, and testing datasets must have separate annotation files. Each row must be a space-separated list that contains
`VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`. The `VIDEO_PATH` of a video sample should be provided without the `root` prefix of this dataset.
This example project demonstrates this using a dummy dataset inside of `demo_dataset/`, which is the `root` dataset folder of this example. The folder
structure looks as follows:
```
demo_dataset
│
├───annotations.txt
├───jumping # arbitrary class folder naming
│ ├───0001 # arbitrary video folder naming
│ │ ├───img_00001.jpg
│ │ .
│ │ └───img_00017.jpg
│ └───0002
│ ├───img_00001.jpg
│ .
│ └───img_00018.jpg
│
└───running # arbitrary folder naming
├───0001 # arbitrary video folder naming
│ ├───img_00001.jpg
│ .
│ └───img_00015.jpg
└───0002
├───img_00001.jpg
.
└───img_00015.jpg
```
The accompanying annotation `.txt` file contains the following rows (`VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`):
```
jumping/0001 1 17 0
jumping/0002 1 18 0
running/0001 1 15 1
running/0002 1 15 1
```
Another annotation file that uses multiple clips from each video could look like this:
```
jumping/0001 1 8 0
jumping/0001 5 17 0
jumping/0002 1 18 0
running/0001 10 15 1
running/0001 5 10 1
running/0002 1 15 1
```
(END_FRAME is inclusive)
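Writing this annotation file by hand is tedious for larger datasets. A minimal sketch of a script that generates it from the folder structure above (the `write_annotations` helper is illustrative, not part of the library; it assumes one clip per video folder spanning all frames, frame indices starting at 1, and class indices assigned in sorted order of class folder names):

```python
import os

def write_annotations(root, out_file='annotations.txt'):
    """Write one row 'VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX'
    for every video folder under each class folder in `root`."""
    rows = []
    class_folders = sorted(d for d in os.listdir(root)
                           if os.path.isdir(os.path.join(root, d)))
    for class_id, class_name in enumerate(class_folders):
        class_path = os.path.join(root, class_name)
        for video in sorted(os.listdir(class_path)):
            video_path = os.path.join(class_path, video)
            if not os.path.isdir(video_path):
                continue
            # Count the frame images; frames are assumed indexed 1..num_frames
            num_frames = len([f for f in os.listdir(video_path)
                              if f.endswith('.jpg')])
            rows.append(f'{class_name}/{video} 1 {num_frames} {class_id}')
    with open(os.path.join(root, out_file), 'w') as f:
        f.write('\n'.join(rows))
    return rows
```

Run against `demo_dataset/`, this would reproduce the four rows of the first annotation file shown above.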
Instantiating a VideoFrameDataset with the `root_path` parameter pointing to `demo_dataset`, the `annotationfile_path` parameter pointing to the annotation file, and
the `image_template` parameter set to `"img_{:05d}.jpg"`, is all it takes to start using the VideoFrameDataset class.
### 3. Video Frame Sampling Method
When loading a video, only a subset of its frames is loaded. They are chosen as follows: the frame index range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS even segments. From each segment, a random start index is sampled, from which FRAMES_PER_SEGMENT consecutive indices are loaded.
This yields NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images, put into a list, and returned when calling
`dataset[i]`.
![alt text](https://github.com/RaivoKoot/images/blob/main/Sparse_Temporal_Sampling.jpg "Sparse-Temporal-Sampling-Strategy")
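The index computation can be sketched in a few lines of plain Python (this is an illustrative re-implementation of the strategy described above, not the library's actual code):

```python
import random

def sample_frame_indices(start_frame, end_frame, num_segments, frames_per_segment,
                         random_shift=True):
    """Pick num_segments * frames_per_segment frame indices via sparse temporal
    sampling. END_FRAME is inclusive, matching the annotation file convention."""
    num_frames = end_frame - start_frame + 1
    segment_length = num_frames / num_segments
    indices = []
    for seg in range(num_segments):
        seg_start = start_frame + int(seg * segment_length)
        # Latest offset that keeps frames_per_segment consecutive frames in the segment
        max_offset = max(int(segment_length) - frames_per_segment, 0)
        offset = random.randint(0, max_offset) if random_shift else max_offset // 2
        first = seg_start + offset
        indices.extend(range(first, first + frames_per_segment))
    return indices

# A 17-frame video split into 5 segments of 1 frame each
# yields 5 indices spread evenly across the whole video
print(sample_frame_indices(1, 17, num_segments=5, frames_per_segment=1))
```

With `random_shift=False` (as used for evaluation) the offset within each segment is fixed at the segment's center instead of being drawn at random.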
### 4. Alternate Video Frame Sampling Methods
If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
clip from a video, this is possible: set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. The video is then treated as a single segment, from which N consecutive frames are loaded starting at a random index.