[![Documentation Status](https://readthedocs.org/projects/video-dataset-loading-pytorch/badge/?version=latest)](https://video-dataset-loading-pytorch.readthedocs.io/en/latest/?badge=latest)
# Efficient Video Dataset Loading and Augmentation in PyTorch
Author: [Raivo Koot](https://github.com/RaivoKoot)
https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html
If you find the code useful, please star the repository.
If you are completely unfamiliar with loading datasets in PyTorch using `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, I recommend
getting familiar with these first through [this](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) or
[this](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
### Overview: This example demonstrates the use of `VideoFrameDataset`
The VideoFrameDataset class (an implementation of `torch.utils.data.Dataset`) serves to `easily`, `efficiently` and `effectively` load video samples from video datasets in PyTorch.
1) Easily because this dataset class can be used with custom datasets with minimum effort and no modification. The class merely expects the
video dataset to have a certain structure on disk and expects a .txt annotation file that enumerates each video sample. Details on this
can be found below and at `https://video-dataset-loading-pytorch.readthedocs.io/en/latest/VideoDataset.html`.
2) Efficiently because the video loading pipeline that this class implements is very fast. This minimizes GPU waiting time during training by eliminating input bottlenecks
that can otherwise slow down training severalfold.
3) Effectively because the implemented sampling strategy for video frames is very strong. Training on the entire sequence of
video frames (often several hundred) is too memory- and compute-intensive. Therefore, this implementation samples frames evenly from the video (sparse temporal sampling)
so that the loaded frames represent every part of the video, with support for arbitrary and differing video lengths within the same dataset.
This approach has been shown to be very effective and is taken from
["Temporal Segment Networks (ECCV2016)"](https://arxiv.org/abs/1608.00859) with modifications.
In conjunction with PyTorch's DataLoader, the VideoFrameDataset class returns video batch tensors of size `BATCH x FRAMES x CHANNELS x HEIGHT x WIDTH`.
For a demo, visit `demo.py`.
### QuickDemo (demo.py)
```python
import os
import matplotlib.pyplot as plt
from video_dataset import VideoFrameDataset

root = os.path.join(os.getcwd(), 'demo_dataset')  # folder in which all videos lie in a specific structure
annotation_file = os.path.join(root, 'annotations.txt')  # one row per video sample: VIDEO_PATH START_FRAME END_FRAME CLASS_ID

""" DEMO 1 WITHOUT IMAGE TRANSFORMS """
dataset = VideoFrameDataset(
    root_path=root,
    annotationfile_path=annotation_file,
    num_segments=5,
    frames_per_segment=1,
    image_template='img_{:05d}.jpg',
    transform=None,
    random_shift=True,
    test_mode=False
)

sample = dataset[0]  # take the first sample of the dataset
frames = sample[0]   # list of PIL images
label = sample[1]    # integer label

for image in frames:
    plt.imshow(image)
    plt.title(label)
    plt.show()
    plt.pause(1)
```
![alt text](https://github.com/RaivoKoot/images/blob/main/Action_Video.jpg "Action Video")
# Table of Contents
- [1. Requirements](#1-requirements)
- [2. Custom Dataset](#2-custom-dataset)
- [3. Video Frame Sampling Method](#3-video-frame-sampling-method)
- [4. Alternate Video Frame Sampling Methods](#4-alternate-video-frame-sampling-methods)
- [5. Using VideoFrameDataset for Training](#5-using-videoframedataset-for-training)
- [6. Allowing Multiple Labels per Sample](#6-allowing-multiple-labels-per-sample)
- [7. Conclusion](#7-conclusion)
- [8. Upcoming Features](#8-upcoming-features)
- [9. Acknowledgements](#9-acknowledgements)
### 1. Requirements
```
# Without these three, VideoFrameDataset will not work.
torchvision >= 0.8.0
torch >= 1.7.0
python >= 3.6
```
### 2. Custom Dataset
(This description covers custom datasets where each sample has a single class label. If you want to use
a dataset where a sample can have more than one class label, read this anyway and then read `6.` below.)
To use any dataset, two conditions must be met.
1) The video data must be supplied as RGB frames, each frame saved as an image file. Each video must have its own folder, in which the frames of
that video lie. The frames of a video inside its folder must be named uniformly with consecutive indices such as `img_00001.jpg` ... `img_00120.jpg`, if there are 120 frames.
Indices can start at zero or any other number and the exact file name template can be chosen freely. The filename template
for frames in this example is "img_{:05d}.jpg" (python string formatting, specifying 5 digits after the underscore), and must be supplied to the
constructor of VideoFrameDataset as a parameter. Each video folder must lie inside some `root` folder.
2) To enumerate all video samples in the dataset and their required metadata, a `.txt` annotation file must be manually created that contains a row for each
video clip sample in the dataset. The training, validation, and testing datasets must have separate annotation files. Each row must be a space-separated list that contains
`VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`. The `VIDEO_PATH` of a video sample should be provided without the `root` prefix of this dataset.
This example project demonstrates this using a dummy dataset inside of `demo_dataset/`, which is the `root` dataset folder of this example. The folder
structure looks as follows:
```
demo_dataset
│
├───annotations.txt
├───jumping # arbitrary class folder naming
│ ├───0001 # arbitrary video folder naming
│ │ ├───img_00001.jpg
│ │ .
│ │ └───img_00017.jpg
│ └───0002
│ ├───img_00001.jpg
│ .
│ └───img_00018.jpg
│
└───running # arbitrary folder naming
├───0001 # arbitrary video folder naming
│ ├───img_00001.jpg
│ .
│ └───img_00015.jpg
└───0002
├───img_00001.jpg
.
└───img_00015.jpg
```
The accompanying annotation `.txt` file contains the following rows (`VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX`):
```
jumping/0001 1 17 0
jumping/0002 1 18 0
running/0001 1 15 1
running/0002 1 15 1
```
Another annotation file that uses multiple clips from each video could look like this:
```
jumping/0001 1 8 0
jumping/0001 5 17 0
jumping/0002 1 18 0
running/0001 10 15 1
running/0001 5 10 1
running/0002 1 15 1
```
(END_FRAME is inclusive)
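Writing this annotation file by hand is tedious for larger datasets. A minimal sketch of a script that generates it from the folder structure above (the `write_annotations` helper is illustrative, not part of the library; it assumes one clip per video folder spanning all frames, frame indices starting at 1, and class indices assigned in sorted order of class folder names):

```python
import os

def write_annotations(root, out_file='annotations.txt'):
    """Write one row 'VIDEO_PATH START_FRAME END_FRAME CLASS_INDEX'
    for every video folder under each class folder in `root`."""
    rows = []
    class_folders = sorted(d for d in os.listdir(root)
                           if os.path.isdir(os.path.join(root, d)))
    for class_id, class_name in enumerate(class_folders):
        class_path = os.path.join(root, class_name)
        for video in sorted(os.listdir(class_path)):
            video_path = os.path.join(class_path, video)
            if not os.path.isdir(video_path):
                continue
            # Count the frame images; frames are assumed indexed 1..num_frames
            num_frames = len([f for f in os.listdir(video_path)
                              if f.endswith('.jpg')])
            rows.append(f'{class_name}/{video} 1 {num_frames} {class_id}')
    with open(os.path.join(root, out_file), 'w') as f:
        f.write('\n'.join(rows))
    return rows
```

Run against `demo_dataset/`, this would reproduce the four rows of the first annotation file shown above.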
Instantiating a VideoFrameDataset with the `root_path` parameter pointing to `demo_dataset`, the `annotationfile_path` parameter pointing to the annotation file, and
the `image_template` parameter set to `"img_{:05d}.jpg"`, is all it takes to start using the VideoFrameDataset class.
### 3. Video Frame Sampling Method
When loading a video, only a subset of its frames is loaded. They are chosen as follows: the frame index range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS even segments. From each segment, a random start index is sampled, from which FRAMES_PER_SEGMENT consecutive indices are loaded.
This yields NUM_SEGMENTS*FRAMES_PER_SEGMENT chosen indices, whose frames are loaded as PIL images, put into a list, and returned when calling
`dataset[i]`.
![alt text](https://github.com/RaivoKoot/images/blob/main/Sparse_Temporal_Sampling.jpg "Sparse-Temporal-Sampling-Strategy")
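The index computation can be sketched in a few lines of plain Python (this is an illustrative re-implementation of the strategy described above, not the library's actual code):

```python
import random

def sample_frame_indices(start_frame, end_frame, num_segments, frames_per_segment,
                         random_shift=True):
    """Pick num_segments * frames_per_segment frame indices via sparse temporal
    sampling. END_FRAME is inclusive, matching the annotation file convention."""
    num_frames = end_frame - start_frame + 1
    segment_length = num_frames / num_segments
    indices = []
    for seg in range(num_segments):
        seg_start = start_frame + int(seg * segment_length)
        # Latest offset that keeps frames_per_segment consecutive frames in the segment
        max_offset = max(int(segment_length) - frames_per_segment, 0)
        offset = random.randint(0, max_offset) if random_shift else max_offset // 2
        first = seg_start + offset
        indices.extend(range(first, first + frames_per_segment))
    return indices

# A 17-frame video split into 5 segments of 1 frame each
# yields 5 indices spread evenly across the whole video
print(sample_frame_indices(1, 17, num_segments=5, frames_per_segment=1))
```

With `random_shift=False` (as used for evaluation) the offset within each segment is fixed at the segment's center instead of being drawn at random.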
### 4. Alternate Video Frame Sampling Methods
If you do not want to use sparse temporal sampling and instead want to sample a single N-frame continuous
clip from a video, this is possible: set `NUM_SEGMENTS=1` and `FRAMES_PER_SEGMENT=N`. The video is then treated as a single segment, from which N consecutive frames are loaded starting at a random index.