
1051-8215 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2019.2940092, IEEE

Transactions on Circuits and Systems for Video Technology

SCHIOPU et al.: CNN-BASED INTRA-PREDICTION FOR LOSSLESS HEVC 1

CNN-based Intra-Prediction for Lossless HEVC

Ionut Schiopu, Member, IEEE, Hongyue Huang, Adrian Munteanu, Member, IEEE

Abstract—The paper proposes a novel block-wise prediction

paradigm based on Convolutional Neural Networks (CNNs) for lossless video coding. A deep neural network model which follows a multi-resolution design is employed for block-wise prediction. Several contributions are proposed to improve neural network training. A first contribution proposes a novel loss function formulation for an efficient network training based on a new approach for patch selection. Another contribution consists in replacing all HEVC-based angular intra-prediction modes with a CNN-based intra-prediction method, where each angular prediction mode is complemented by a CNN-based prediction mode using a specifically trained model. Another contribution consists in an efficient adaptation of the CNN-based intra-prediction residual for lossless video coding. Experimental results on standard test sequences show that the proposed coding system outperforms the HEVC standard with an average bitrate improvement of around 5%. To our knowledge, the paper is the first to replace all the traditional HEVC-based angular intra-prediction modes with an intra-prediction method based on modern Machine Learning techniques for lossless video coding applications.

EDICS: IMD-CODE image/video coding and

transmission

I. INTRODUCTION

As new image processing technologies are developing, with the growing spatio-temporal resolutions of sensors and decreasing prices for storage, the user's demand for video-based applications is steadily increasing. Lossless video coding solutions are typically needed in applications which require high quality input data. In domains such as medical imaging or satellite image processing, image and video processing applications generally deal with raw data containing critical information that might be lost when applying a lossy coding solution.

In the video coding domain, the current compression standard is High Efficiency Video Coding (HEVC) [1], which was introduced to replace the H.264/AVC standard [2]. HEVC provides a bitrate reduction of 50% compared to its popular H.264 predecessor by adopting a variety of highly-efficient coding tools. HEVC was mainly developed for lossy coding applications; however, it also offers the possibility to operate in lossless coding mode. Improvements were recently proposed in [3], further enhancing the performance in lossless coding applications and improving over prior lossless image coding standards such as JPEG-LS [4] and JPEG 2000 [5].

The authors are with the Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium. (Corresponding author: Ionut Schiopu)

Manuscript received May 24, 2019; revised August 9, 2019 and September 2, 2019; accepted September 2, 2019.

Triggered by the rapid developments and success of the Machine Learning (ML) techniques in numerous image processing domains, several approaches have recently explored the potential offered by ML-based solutions for specific components in the video coding framework. The goal of these approaches is to replace a specific component in the traditional coding systems and to offer viable alternatives to existing coding tools. The applicability of ML-based tools has been explored for several coding components, such as video inter-prediction [6], rate control [7], transform [8], image interpolation [9], loop filtering [10], and post-processing [11], to name a few.

In recent years, several ML-based solutions were specially

proposed for improving the intra-prediction component in

HEVC. In [11], the authors proposed a Variable-Filter-size

Residue-learning Convolutional Neural Network (VRCNN)

model based on a basic sequence of six convolution layers.

The VRCNN model achieves accelerated network training and

improves the HEVC performance. In [12], the authors pro-

posed an arithmetic coding strategy by directly estimating the

probability distribution of the 35 intra-prediction modes with

the adoption of a multi-level arithmetic codec, and by training

a simple neural network to perform probability estimation. In

[13], the authors proposed a neural network model, consisting

of ten convolutional layers, trained to compute a residual

block estimation. The intra-prediction module was improved

by simply adding the predicted residual block with the HEVC

intra-prediction block. In [14], the authors proposed a neural

network model based only on eight fully-connected layers.

The authors proposed the use of two models, one trained for

extending the angular intra-prediction modes, and the other

for extending the non-angular direction intra-prediction modes

(DC and planar). In [15], the authors proposed to extract a set

of features from the causal neighborhood and used them to

select a predeﬁned image pattern as the prediction signal.

It is important to observe that all these ML-based methods

aim at improving the performance in lossy coding of video. In

contrast, in this work, we approach the problem of improving

performance in lossless coding of video. To achieve this

goal, we propose a novel deep learning-based intra-prediction

approach for angular direction prediction and integrate the

proposed solution in a video coding system built based on

the HEVC architecture.

In our prior work, we proposed several CNN-based pre-

diction methods which successfully employ ML tools for

data compression. These methods have replaced the classical

prediction methods used in the traditional coding schemes with

a deep-learning-based prediction method for lossless image

coding applications. Several solutions were proposed for both

the pixel-wise and the block-wise (or macro-pixel-wise) pre-

diction strategies for compressing different types of images.

In [16], the ﬁrst pixel-wise CNN-based prediction scheme for

lossless compression of photographic images was proposed. In

[17], a novel residual-error prediction method based on deep-

learning was proposed showing an improved performance of

up to 32% compared to traditional image codecs. In [18], a

novel macro-pixel-wise (block-wise) prediction method based

on CNNs was proposed for lossless compression of light

ﬁeld images. In [19], a novel approach for lossless image

coding was proposed based on an improved deep-learning-

based pixel-wise prediction and a novel context-tree based

bit-plane codec for photographic images, lenslet images and

video sequences. In our preliminary work [20], we proposed

the ﬁrst method for lossless HEVC video coding based on

CNNs, where a novel neural network design, called Angular

Intra-Prediction Convolutional Neural Network (AP-CNN) is

devised to efficiently predict 4 × 4 blocks. The method offers an improved performance by replacing a reduced set of nine HEVC angular intra-prediction modes with a CNN-based prediction and by combining linear and nonlinear prediction methods.

The goal of this work is to further advance over our prelim-

inary results in [20] by proposing a novel deep-learning-based

intra-prediction approach for lossless video coding applica-

tions. The proposed approach incorporates a set of novel con-

tributions for block-wise neural network training. Additionally,

novel ML-based tools are proposed to target the challenging

task of lossless angular directional prediction. The proposed

deep-learning-based intra-prediction method was adapted to

the video coding framework of the HEVC standard.

In summary, the novel contributions of this paper are as

follows:

(1) a novel intra-prediction approach based on CNNs for an

improved HEVC performance in lossless video coding

applications;

(2) a new strategy for training the proposed modiﬁed version

of our AP-CNN model in [20], where the neural network

is trained to learn more efﬁciently from both the input

patch and the target output prediction block;

(3) a novel CNN-based angular intra-prediction method

which replaces all the 33 HEVC angular intra-prediction

modes proposed in the HEVC standard for two block

sizes: 4 × 4 and 8 × 8;

(4) a novel loss function formulation for training the pro-

posed neural network model based on a general loss

term, a local loss term, and a rate-based term where the

position of each predicted value relative to the causal

neighborhood is taken into account for an improved

block-wise intra-prediction;

(5) a novel approach for adapting the residual of the deep-

learning-based intra-prediction method in the HEVC

video coding framework for lossless coding applications;

(6) a novel training setup based on training patches extracted

after processing regular RGB images;

(7) an improved video coding system built based on the

HEVC architecture which offers remarkable rate savings

of 5% over HEVC in lossless video coding.

The remainder of this paper is organized as follows. Section

II outlines state-of-the-art methods in video coding based on

ML techniques. Section III describes the proposed CNN-based

intra-prediction approach. Section IV presents the experimen-

tal validation and the performance analysis of the proposed

lossless video coding system. Finally, Section V draws the

conclusions of this work.

II. STATE-OF-THE-ART

In the video coding domain, the research community is

currently developing future video coding technologies that

aim at substantially improving the coding performance by targeting 30% to 50% rate savings over the current HEVC standard [1] and support for lossless compression. A Call for

Proposals (CfP) on video compression beyond HEVC and its

extensions [21] was published by the collaborative Joint Video

Exploration Team (JVET) formed by the ITU-T Video Coding

Experts Group (VCEG) and the ISO/IEC Moving Picture

Experts Group (MPEG). A new video coding standard, known

as Versatile Video Coding (VVC) is being developed by JVET

based on the CfP responses. One of the main approaches

which brings substantial coding gains in VVC is to partition

the current block based on the traditional Quad-Tree (QT)

partitioning followed by other newly developed techniques

such as Binary-Tree (BT) and Ternary Tree (TT) partitioning.

Recently, the block partitioning topic was intensively explored

and several solutions were proposed in [22], [23], [24], [25],

[26]. Although all these newly developed techniques enhance

VVC and substantially improve the lossy coding performance

over HEVC, they are also characterized by a very high

computational complexity.
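The three partitioning schemes named above can be illustrated with a minimal sketch. The mode labels (`"QT"`, `"BT_H"`, `"BT_V"`, `"TT_H"`, `"TT_V"`) and the function name are hypothetical and chosen only for this example. The sketch assumes the usual conventions that a quad-tree split yields four equal quadrants, a binary split yields two halves, and a ternary split uses a 1:2:1 ratio along the split direction.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (width, height) in pixels


def split_block(width: int, height: int, mode: str) -> List[Block]:
    """Return the sub-block sizes produced by a single split of one block.

    Mode labels are illustrative only; they are not syntax elements of any codec.
    """
    if mode == "QT":    # quad-tree: four equal quadrants
        return [(width // 2, height // 2)] * 4
    if mode == "BT_H":  # binary split with a horizontal boundary
        return [(width, height // 2)] * 2
    if mode == "BT_V":  # binary split with a vertical boundary
        return [(width // 2, height)] * 2
    if mode == "TT_H":  # ternary split, 1:2:1 along the height
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if mode == "TT_V":  # ternary split, 1:2:1 along the width
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split mode: {mode}")
```

Because every split can be applied recursively to each sub-block, the number of candidate partitions grows very quickly, which is the source of the search complexity mentioned above.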

As an alternative, the research community targeted the reduction of computational complexity by employing algorithmic solutions based on ML methods. In this context, several CNN-based approaches were proposed to solve the problem of intensive search for an optimal frame partition.

In [27], a CNN-based solution is proposed to directly predict the Coding Unit (CU) partition structure without reference to information about neighboring CUs. In [28], a deep-learning method to replace the brute-force search for Rate-Distortion Optimization (RDO) is proposed by predicting the CU partition for both intra and inter-modes based on CNNs and Long Short-Term Memory (LSTM) networks. In [29], a deep-learning approach is proposed which aims at driving the encoder by estimating probabilities of block or CU splitting in intra slices based on a texture analysis of the original content in these blocks, partly replacing the costly RDO employed in the search for an optimal frame segmentation.

ML-based tools have also been investigated as efﬁcient

alternatives to existing components in HEVC with the aim

of improving coding performance with an affordable increase

in computational complexity.

Such contributions include [30] which improves the per-

formance of the HEVC deblocking ﬁlter. In [31], the authors

are the ﬁrst to integrate the residual error prediction concept

[17] in the training of the network and effectively predict

some of the image details lost in lossy coding. This method

proposes to simply enhance the HEVC encoded videos after

the decoding process, therefore, no modiﬁcations are required

to the original HEVC implementation. In [32], the authors

apply the Residual Learning (ResL) concept [33] for training

a CNN-based model called Multi-reconstruction Recurrent

Residual Network (MRRN). In [34], a Quality Enhancement

Convolutional Neural Network (QE-CNN) method is proposed

based on a sequence of four convolutional layers where two

models are trained for I and P/B frames, respectively.

Another important area that has recently drawn attention

is improving the performance of intra-prediction and in this

context, a few ML-based solutions were recently proposed.

In [14] a neural network model design based on eight fully-

connected layers is proposed, where the loss function is

regularized by parameter penalization to avoid over-ﬁtting. The

model was trained to predict either the angular intra-prediction

modes or the non-angular direction (DC and planar) intra-

prediction modes. The model of [14] was trained only for 8×8

blocks, motivated by the observation that 8 × 8 blocks have

the highest frequency of occurrence in lossy coding. In [35],

the authors employ a neural network to perform block-wise intra-prediction based on multiple prediction modes which co-adapt during training to minimize a loss function. The loss function applies the ℓ-norm and a sigmoid function to the prediction residual in the DCT domain.

In this paper, we propose a deep-learning-based approach

for lossless video coding. It is important to note that in lossless

coding, intra-prediction plays a much more important role than

in the lossy coding case as it must accurately predict many

more ﬁne textural details which are missing in lossy coding

due to the inherent quantization of the high frequency details.

Additionally, HEVC’s optimal segmentation yields a deep

quad-tree frame partition characterized by a high probability

of occurrence of small block sizes. HEVC intra-prediction modes predict small blocks more efficiently than larger ones, which is highly valued in lossless video coding but also makes them particularly difficult to improve upon. Therefore, the goal of the proposed method is to improve the intra-prediction for 4 × 4 and 8 × 8 blocks. In our preliminary work [20], an AP-CNN

model was proposed to compute a block-based prediction at

different resolutions, by following a multi-resolution design

inspired from the U-NET architecture [36]. Our model in [20]

was able to provide an improved performance only for a

limited set of intra-prediction modes.
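The role of prediction in lossless coding can be made concrete with a small round-trip sketch. It is not the residual adaptation proposed in this paper. It only assumes that a network outputs floating-point values and that encoder and decoder round and clip them in exactly the same way, so that the integer residual restores the original block bit-exactly. The function names and the 8-bit default are assumptions made for illustration.

```python
import numpy as np


def quantize_prediction(pred_float: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Round and clip a float prediction to the valid integer sample range.

    Encoder and decoder must apply this identically for the round trip to be exact.
    """
    max_val = (1 << bit_depth) - 1
    return np.clip(np.rint(pred_float), 0, max_val).astype(np.int32)


def encode_residual(block: np.ndarray, pred_float: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Integer residual that would be passed on to entropy coding."""
    return block.astype(np.int32) - quantize_prediction(pred_float, bit_depth)


def decode_block(residual: np.ndarray, pred_float: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Reconstruct the block from the residual and the same prediction."""
    return quantize_prediction(pred_float, bit_depth) + residual
```

Since no quantization is applied to the residual, every fine texture detail that the predictor misses must be paid for in bits, which is why prediction accuracy matters more here than in lossy coding.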

Here we further advance over our preliminary results in [20]

by bringing a set of novel contributions that improve the neural

network training and performance. Based on a similar concept

as in [14], the angular intra-prediction modes are separated

from the DC and planar modes; however, in this paper we

speciﬁcally train a model for each one of the HEVC angular

intra-prediction modes.

III. PROPOSED CNN-BASED INTRA-PREDICTION APPROACH

In this paper, we propose a novel deep-learning-based intra-prediction approach for lossless video coding applications. The proposed method trains a neural network model for each of the 33 HEVC angular intra-prediction modes based on a novel training procedure, and replaces them with a CNN-based prediction method. Moreover, the proposed coding system adapts the HEVC architecture to employ the CNN-based prediction method for two different block sizes, yielding substantial and systematic coding gains over HEVC.

In this section we present the set of contributions proposed in this paper for the supervised training of a neural network for block-wise prediction. Section III-A describes the proposed deep-learning-based prediction method. Section III-B describes the proposed coding system based on HEVC, designed for lossless video coding applications.

A. Deep-Learning-based Prediction

For the currently predicted block, the HEVC video coding standard [1] selects the optimal intra-prediction mode $m$ from the set $S_{HEVC} = \{m_i\}_{i=0:34}$ of 35 available modes as the one yielding the best Rate-Distortion (RD) performance. The $S_{HEVC}$ set contains the following mode types:

(a) $m_0$, which is the planar prediction mode designed for encoding planar surfaces;

(b) $m_1$, which is the DC mode designed for encoding flat surfaces;

(c) $\{m_2, m_3, \ldots, m_{34}\}$, which are the angular direction modes designed for computing the directional prediction corresponding to 33 different prediction angles.

In this paper, a block-wise CNN-based prediction scheme is proposed to improve HEVC's lossless coding performance by replacing the HEVC-based angular intra-prediction with an improved CNN-based prediction for the set of 33 angular direction intra-prediction modes. Let $S_{CNN} = \{m_i\}_{i=2:34}$ denote these angular modes, as depicted in Fig. 1a.
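The mode sets can be written down directly, which also makes explicit when a CNN-based predictor would be used. The sketch below is illustrative only: the constant and function names are hypothetical, and the dispatch rule simply encodes what the text states, namely that the 33 angular modes are replaced for 4 × 4 and 8 × 8 blocks while the planar and DC modes keep the HEVC predictor.

```python
PLANAR_MODE = 0   # m_0
DC_MODE = 1       # m_1

S_HEVC = list(range(35))                  # all 35 intra-prediction modes m_0 .. m_34
S_CNN = [m for m in S_HEVC if m >= 2]     # the 33 angular modes m_2 .. m_34
CNN_BLOCK_SIZES = (4, 8)                  # block sizes handled by the CNN-based predictor


def predictor_for(mode: int, block_size: int) -> str:
    """Return 'cnn' if the CNN-based prediction applies, otherwise 'hevc'."""
    if mode not in S_HEVC:
        raise ValueError(f"invalid intra-prediction mode index: {mode}")
    if mode in S_CNN and block_size in CNN_BLOCK_SIZES:
        return "cnn"
    return "hevc"
```

Since one model is trained per angular mode, a practical implementation would likely key its models by the pair (mode, block size); that lookup is omitted here.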

For lossless video coding applications, one may note that the complexity of the block-wise prediction task increases with the size of the prediction block. More exactly, the pixels found at the most distant positions from the causal neighborhood (i.e., the bottom-right corner) are affected by a higher prediction error. The effect is more visible for larger block partitions. An important observation is that, in lossless coding, the optimal block segmentation of the input frame generated by HEVC is generally characterized by a high frequency of 4 × 4 blocks, followed by blocks of 8 × 8 pixels. Since larger block sizes, such as 16 × 16 and 32 × 32, occur very rarely, not enough patches are available for efficient model training. Hence, in this paper, we focus on improving the prediction for blocks of size $N_k \times N_k$ pixels, with $k = 1, 2$, where $N_1 = 4$ and $N_2 = 8$.

Proposed Training Strategy

Let us denote $B$ as the currently predicted block of size $N_k \times N_k$. The proposed CNN-based prediction method generates input patches of size $4N_k \times 4N_k$ by selecting a [...] $\times 4N_k$ causal neighborhood around $B$. The input patches are employed to compute the network's output prediction of size $4N_k \times 4N_k$. However, the final prediction block of size $N_k \times N_k$ is obtained by cropping the network's output prediction block at the corresponding position of $B$.
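The patch-and-crop procedure can be sketched as follows. Several details are assumptions, because this part of the text does not fix them: the position of the block inside the patch is passed in as hypothetical offsets `row_off` and `col_off`, and the set of already-decoded (causal) samples is passed in as a boolean `known_mask` rather than derived from a particular patch design. Samples outside the mask are set to zero here, which is also an assumption.

```python
import numpy as np


def extract_patch(frame: np.ndarray, top: int, left: int, n: int,
                  row_off: int, col_off: int, known_mask: np.ndarray) -> np.ndarray:
    """Cut a 4n x 4n input patch around the n x n block located at (top, left).

    The block lands at (row_off, col_off) inside the patch. Positions where
    known_mask is False (not yet decoded) are replaced by zero.
    """
    size = 4 * n
    r0, c0 = top - row_off, left - col_off
    if r0 < 0 or c0 < 0 or r0 + size > frame.shape[0] or c0 + size > frame.shape[1]:
        raise ValueError("patch falls outside the frame")
    patch = frame[r0:r0 + size, c0:c0 + size].astype(np.float32)
    return np.where(known_mask, patch, 0.0)


def crop_prediction(output: np.ndarray, n: int, row_off: int, col_off: int) -> np.ndarray:
    """Keep only the n x n prediction at the block position in the 4n x 4n output."""
    return output[row_off:row_off + n, col_off:col_off + n]
```

A caller would build `known_mask` once per patch design and reuse it for every block predicted with that design, so that training and inference see identically masked inputs.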

In this paper, we propose to use three different patch

designs selected according to the current intra-prediction an-

gular mode. Fig. 1b shows the general causal neighborhood

conﬁguration, denoted by N C

, which is employed to generate
