RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation
Zhewei Huang¹   Tianyuan Zhang¹   Wen Heng¹   Boxin Shi²   Shuchang Zhou¹
¹Megvii Inc   ²Peking University
{huangzhewei, zhangtianyuan, hengwen, zsc}@megvii.com, shiboxin@pku.edu.cn
Abstract
We propose RIFE, a Real-time Intermediate Flow Es-
timation algorithm for Video Frame Interpolation (VFI).
Most existing methods first estimate the bi-directional opti-
cal flows and then linearly combine them to approximate in-
termediate flows, leading to artifacts on motion boundaries.
RIFE uses a neural network named IFNet that can directly
estimate the intermediate flows from images. With these more
precise flows and our simplified fusion process, RIFE improves
interpolation quality while running much faster.
Based on our proposed leakage distillation loss, RIFE can
be trained in an end-to-end fashion. Experiments demon-
strate that our method is significantly faster than existing
VFI methods and can achieve state-of-the-art performance
on public benchmarks. The code is available at
https://github.com/hzwer/arXiv2020-RIFE.
1. Introduction
Video Frame Interpolation (VFI) aims to synthesize in-
termediate frames between two consecutive frames of a
video and is widely used to improve the frame rate and
enhance visual quality. VFI also supports various ap-
plications like slow-motion generation, video compres-
sion [31], and training data generation for video motion de-
blurring [4]. Moreover, VFI algorithms that process high-
resolution videos (e.g., 720p and 1080p) at real-time speed
enable many more applications, such as playing a video at a
higher frame rate on the client's player and providing video
editing services to users with limited computing resources.
VFI is challenging due to the complex, large non-linear
motions and illumination changes in the real world. Flow-
based VFI algorithms have recently offered a framework
to address these challenges and achieved impressive re-
sults [17, 22, 35, 2]. These methods typically involve two steps:
1) warping the input frames according to
approximated optical flows and 2) fusing and refining the
warped frames using Convolutional Neural Networks (CNNs).

[Figure 1: Speed and accuracy trade-off obtained by adjusting the model size parameters C and F. We compare our models with prior VFI methods including TOFlow [35], SepConv-L1 [24], MEMC-Net [3], DAIN [2], CAIN [8], SoftSplat [23] and BMBC [26] on the Vimeo90K testing set.]
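To make the two-step pipeline concrete: writing $\overleftarrow{w}$ for backward warping and $M$ for a learned blending mask, a schematic form of the fusion step (the exact fusion and refinement components vary across methods) is
$$\hat{I}_t = M \odot \overleftarrow{w}(I_0, F_{t\rightarrow 0}) + (1-M) \odot \overleftarrow{w}(I_1, F_{t\rightarrow 1}),$$
where $\odot$ denotes element-wise multiplication and $F_{t\rightarrow 0}$, $F_{t\rightarrow 1}$ are the intermediate flows introduced below.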
According to the way of warping frames, flow-based VFI
algorithms can be classified into forward warping based
methods and backward warping based methods. Backward
warping is more widely used because forward warping lacks
a unified and efficient implementation and suffers from con-
flicts when multiple source pixels are mapped to the same
location, which leads to overlapped pixels and holes.
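As a concrete illustration, backward warping reads each output pixel from a flow-displaced location in the source frame, which is why it admits a unified, differentiable implementation via bilinear sampling. Below is a minimal PyTorch sketch of such a warp (an illustrative helper, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def backward_warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `img` with per-pixel displacements `flow`.

    img:  (N, C, H, W) source frame.
    flow: (N, 2, H, W) flow in pixels; channel 0 is x, channel 1 is y.
    Output pixel (x, y) is sampled from img at (x + flow_x, y + flow_y),
    so every output location receives exactly one value: no holes, no overlaps.
    """
    _, _, h, w = flow.shape
    # Base grid of integer pixel coordinates.
    gy, gx = torch.meshgrid(
        torch.arange(h, dtype=img.dtype, device=img.device),
        torch.arange(w, dtype=img.dtype, device=img.device),
        indexing="ij",
    )
    coords_x = gx.unsqueeze(0) + flow[:, 0]  # (N, H, W)
    coords_y = gy.unsqueeze(0) + flow[:, 1]
    # grid_sample expects sampling coordinates normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * coords_x / (w - 1) - 1.0,
         2.0 * coords_y / (h - 1) - 1.0),
        dim=-1,
    )  # (N, H, W, 2)
    return F.grid_sample(img, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```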
Given the input frames $I_0$ and $I_1$, backward warping based methods need to approximate the intermediate flows $F_{t\rightarrow 0}$ and $F_{t\rightarrow 1}$ from the perspective of the frame $I_t$ that we are expected to synthesize. Common practice [17, 34, 2] first computes bi-directional flows using pre-trained off-the-shelf optical flow models, then linearly combines them. This combination, however, fails on motion boundaries, where the two frames contain different objects.
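For reference, assuming locally linear motion, a widely used linear combination (as in [17]) approximates the intermediate flows as
$$\hat{F}_{t\rightarrow 0} = -(1-t)\,t\,F_{0\rightarrow 1} + t^{2}\,F_{1\rightarrow 0}, \qquad \hat{F}_{t\rightarrow 1} = (1-t)^{2}\,F_{0\rightarrow 1} - t\,(1-t)\,F_{1\rightarrow 0},$$
so errors in the bi-directional flows, and the linear-motion assumption itself, propagate directly into the intermediate flows near object boundaries.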
Consequently, previous VFI methods share two major drawbacks:
1) To remove the artifacts caused by the linear combination
of optical flows, these methods usually need to approximate
additional representations, e.g., image depth [2] or
intermediate flow refinement [17]. Coupled with the
large complexity of the bi-directional flow estimation,