

Scale-space flow for end-to-end optimized video compression

Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici

Google Research, Perception Team

{eirikur, dminnen, nickj, jballe, sjhwang, gtoderici}@google.com

Abstract

Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task. Recently proposed methods for learned video compression use optical flow and bilinear warping for motion compensation and show competitive rate–distortion performance relative to hand-engineered codecs like H.264 and HEVC. However, these learning-based methods rely on complex architectures and training schemes including the use of pre-trained optical flow networks, sequential training of sub-networks, adaptive rate control, and buffering intermediate reconstructions to disk during training. In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure. Specifically, we propose scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty. Our experiments show that a low-latency video compression model (no B-frames) using scale-space flow for motion compensation can outperform analogous state-of-the-art learned video compression models while being trained using a much simpler procedure and without any pre-trained optical flow networks.

1. Introduction

Recently, there has been significant progress in the area of end-to-end optimized image compression, which went from barely matching JPEG [33] to methods such as [8, 26, 5] that can outperform the best hand-engineered codecs when evaluated in terms of multi-scale structural similarity (MS-SSIM) [36], PSNR, and subjective quality assessments from user studies. While this is very encouraging, over 60% of downstream internet traffic currently consists of streaming video data [1], which means that in order to maximize impact on bandwidth reduction, researchers should focus on video compression.
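The scale-space warping idea from the abstract and Figure 1 can be illustrated with a small sketch: build a stack of progressively blurred copies of the source image, then sample it trilinearly with a per-pixel displacement+scale field. This is only a minimal pure-Python illustration, not the authors' implementation. The blur levels in `sigmas`, the edge clamping, the treatment of the scale value as a fractional index into the stack, and all function names are assumptions made here for the example.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur of a 2-D list-of-lists image, clamping at edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    horizontal = [[sum(k * row[clamp(x + i - radius, 0, width - 1)]
                       for i, k in enumerate(kernel))
                   for x in range(width)] for row in image]
    return [[sum(k * horizontal[clamp(y + i - radius, 0, height - 1)][x]
                 for i, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]


def scale_space_volume(image, sigmas):
    """Stack progressively blurred copies of the image along a scale axis."""
    return [blur(image, sigma) for sigma in sigmas]


def trilinear_sample(volume, x, y, z):
    """Trilinearly interpolate volume[z][y][x] at a fractional position."""
    limits = (len(volume[0][0]) - 1, len(volume[0]) - 1, len(volume) - 1)
    axes = []
    for coord, limit in zip((x, y, z), limits):
        coord = clamp(coord, 0.0, float(limit))
        low = int(math.floor(coord))
        high = min(low + 1, limit)
        frac = coord - low
        axes.append(((low, 1.0 - frac), (high, frac)))
    return sum(wx * wy * wz * volume[zi][yi][xi]
               for xi, wx in axes[0] for yi, wy in axes[1] for zi, wz in axes[2])


def scale_space_warp(image, field, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """Warp image with a per-pixel (g_x, g_y, g_z) displacement+scale field.

    The sigma values are illustrative assumptions, not taken from the paper.
    """
    volume = scale_space_volume(image, sigmas)
    return [[trilinear_sample(volume, x + gx, y + gy, gz)
             for x, (gx, gy, gz) in enumerate(row)]
            for y, row in enumerate(field)]


image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
identity = [[(0.0, 0.0, 0.0)] * 3 for _ in range(3)]
warped = scale_space_warp(image, identity)  # a zero field reproduces the source
```

With the scale component fixed at zero, the sketch reduces to ordinary bilinear warping of the unblurred image. Raising the scale component at a pixel samples from a blurrier level instead, which is the "knob" the caption describes. In a learned codec the field would be predicted by a network rather than set by hand as it is here.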

Figure 1. Our proposed scale-space warping module. From the source image x, we construct a fixed-resolution scale-space volume X. In contrast to bilinear warping, where the warped output is sampled directly from the 2-D source image using a 2-channel displacement field (f_x, f_y), we trilinearly sample from the 3-D scale-space volume using a 3-channel displacements+scale field (g_x, g_y, g_z). The scale value gives a continuous, differentiable knob that can adaptively blur the source image when warping if the warp is not a good prediction of the target image.

Since the area of neural video compression is in early stages, it is not yet clear which network architectures are most effective for different application scenarios. We can roughly categorize the existing research methods into the following three categories:

1) 3D autoencoders are a natural extension of the work done for learned image compression, but [27] demonstrated that representing video using spatiotemporal transformations alone does not lead to better performance compared to standard methods. However, when combined with temporally conditioned entropy models [19], such methods can perform on par with standard methods in terms of MS-SSIM.

2) Frame interpolation methods use neural networks to temporally interpolate between frames in a video and then encode the residuals [38, 17]. This approach is commonly used in standard video coding (called “bidirectional prediction” or “B-frame coding”) [37], but has the disadvantage that it is generally not suitable for low-latency streaming since such methods need information “from the future” to decode each B-frame. However, in standard codecs, the use of B-frames typically provides the best rate–distortion (RD)

