CNN-Based Intra-Prediction Method for Lossless HEVC Video Coding
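Summary: This paper proposes a novel block-wise prediction method based on Convolutional Neural Networks (CNNs) for lossless video coding. Specifically, an improved deep learning model replaces the existing HEVC intra-prediction modes, with notable gains for the angular modes at the 4×4 and 8×8 block sizes. The paper also introduces a newly designed loss function, a training strategy, and other optimizations that improve network performance. Experimental results on standard test sequences show that the proposed system reduces the average bitrate by about 5% compared with conventional HEVC coding. In addition, a hybrid coding method that selects between the HEVC and CNN methods is proposed, further improving performance.
Intended audience: researchers and engineers in image processing and video coding, especially those interested in applying machine learning to traditional video coding techniques.
Use cases and goals: applications that require high-fidelity data compression, such as medical imaging and satellite image processing. The main goal is to use deep learning to improve the video intra-prediction mechanism and achieve higher coding efficiency.
Additional notes: the paper provides a concrete solution for improving HEVC lossless coding and also discusses the challenges and potential of deep learning in network training and coding applications. It is useful for scholars and practitioners who want to understand how modern AI techniques can be integrated into traditional engineering tasks.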
1051-8215 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2019.2940092, IEEE
Transactions on Circuits and Systems for Video Technology
SCHIOPU et al.: CNN-BASED INTRA-PREDICTION FOR LOSSLESS HEVC 1
CNN-based Intra-Prediction for Lossless HEVC
Ionut Schiopu, Member, IEEE, Hongyue Huang, Adrian Munteanu, Member, IEEE
Abstract—The paper proposes a novel block-wise prediction
paradigm based on Convolutional Neural Networks (CNNs) for
lossless video coding. A deep neural network model which
follows a multi-resolution design is employed for block-wise
prediction. Several contributions are proposed to improve neural
network training. A first contribution proposes a novel loss
function formulation for an efficient network training based
on a new approach for patch selection. Another contribution
consists in replacing all HEVC-based angular intra-prediction
modes with a CNN-based intra-prediction method, where each
angular prediction mode is complemented by a CNN-based
prediction mode using a specifically trained model. Another
contribution consists in an efficient adaptation of the CNN-based
intra-prediction residual for lossless video coding. Experimental
results on standard test sequences show that the proposed coding
system outperforms the HEVC standard with an average bitrate
improvement of around 5%. To our knowledge, the paper is the
first to replace all the traditional HEVC-based angular intra-
prediction modes with an intra-prediction method based on
modern Machine Learning techniques for lossless video coding
applications.
EDICS: IMD-CODE image/video coding and
transmission
I. INTRODUCTION
As new image processing technologies are developing,
with the growing spatio-temporal resolutions of sensors
and decreasing prices for storage, the user’s demand for video-
based applications is steadily increasing. Lossless video coding
solutions are typically needed in applications which require
high quality input data. In domains such as medical imaging
or satellite image processing, image and video processing ap-
plications are generally dealing with raw data which contains
critical information that might be lost when applying a lossy
coding solution.
In the video coding domain, the current compression stan-
dard is High Efficiency Video Coding (HEVC) [1], which
was introduced to replace the H.264/AVC standard [2]. HEVC
provides a bitrate reduction of 50% compared to its popular
H.264 predecessor by adopting a variety of highly-efficient
coding tools. HEVC was mainly developed for lossy coding
applications, however, it also offers the possibility to operate in
lossless coding mode. Improvements were recently proposed
in [3], further enhancing the performance in lossless coding
applications and improving over prior lossless image coding
standards such as JPEG-LS [4] and JPEG 2000 [5].
The authors are with the Department of Electronics and Informatics (ETRO),
Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium.
(Corresponding author: Ionut Schiopu)
Manuscript received May 24, 2019; revised August 9, 2019 and September
2, 2019; accepted September 2, 2019.
Triggered by the rapid developments and success of the
Machine Learning (ML) techniques in numerous image pro-
cessing domains, several approaches have recently explored
the potential offered by ML-based solutions for specific
components in the video coding framework. The goal of
these approaches is to replace a specific component in the
traditional coding systems and to offer viable alternatives to
existing coding tools. The applicability of ML-based tools
has been explored for several coding components, such as
video inter-prediction [6], rate control [7], transform [8], image
interpolation [9], loop filtering [10], and post-processing [11],
to name a few.
In recent years, several ML-based solutions were specially
proposed for improving the intra-prediction component in
HEVC. In [11], the authors proposed a Variable-Filter-size
Residue-learning Convolutional Neural Network (VRCNN)
model based on a basic sequence of six convolution layers.
The VRCNN model achieves accelerated network training and
improves the HEVC performance. In [12], the authors pro-
posed an arithmetic coding strategy by directly estimating the
probability distribution of the 35 intra-prediction modes with
the adoption of a multi-level arithmetic codec, and by training
a simple neural network to perform probability estimation. In
[13], the authors proposed a neural network model, consisting
of ten convolutional layers, trained to compute a residual
block estimation. The intra-prediction module was improved
by simply adding the predicted residual block with the HEVC
intra-prediction block. In [14], the authors proposed a neural
network model based only on eight fully-connected layers.
The authors proposed the use of two models, one trained for
extending the angular intra-prediction modes, and the other
for extending the non-angular direction intra-prediction modes
(DC and planar). In [15], the authors proposed to extract a set
of features from the causal neighborhood and used them to
select a predefined image pattern as the prediction signal.
It is important to observe that all these ML-based methods
aim at improving the performance in lossy coding of video. In
contrast, in this work, we approach the problem of improving
performance in lossless coding of video. To achieve this
goal, we propose a novel deep learning-based intra-prediction
approach for angular direction prediction and integrate the
proposed solution in a video coding system built based on
the HEVC architecture.
In our prior work, we proposed several CNN-based pre-
diction methods which successfully employ ML tools for
data compression. These methods have replaced the classical
prediction methods used in the traditional coding schemes with
a deep-learning-based prediction method for lossless image
coding applications. Several solutions were proposed for both
the pixel-wise and the block-wise (or macro-pixel-wise) pre-
diction strategies for compressing different types of images.
In [16], the first pixel-wise CNN-based prediction scheme for
lossless compression of photographic images was proposed. In
[17], a novel residual-error prediction method based on deep-
learning was proposed showing an improved performance of
up to 32% compared to traditional image codecs. In [18], a
novel macro-pixel-wise (block-wise) prediction method based
on CNNs was proposed for lossless compression of light
field images. In [19], a novel approach for lossless image
coding was proposed based on an improved deep-learning-
based pixel-wise prediction and a novel context-tree based
bit-plane codec for photographic images, lenslet images and
video sequences. In our preliminary work [20], we proposed
the first method for lossless HEVC video coding based on
CNNs, where a novel neural network design, called Angular
Intra-Prediction Convolutional Neural Network (AP-CNN) is
devised to efficiently predict 4 × 4 blocks. The method offers
an improved performance by replacing a reduced set of nine
HEVC-based angular intra-prediction modes with a CNN-based
prediction and by combining linear and nonlinear prediction
methods.
The goal of this work is to further advance over our prelim-
inary results in [20] by proposing a novel deep-learning-based
intra-prediction approach for lossless video coding applica-
tions. The proposed approach incorporates a set of novel con-
tributions for block-wise neural network training. Additionally,
novel ML-based tools are proposed to target the challenging
task of lossless angular directional prediction. The proposed
deep-learning-based intra-prediction method was adapted to
the video coding framework of the HEVC standard.
In summary, the novel contributions of this paper are as
follows:
(1) a novel intra-prediction approach based on CNNs for an
improved HEVC performance in lossless video coding
applications;
(2) a new strategy for training the proposed modified version
of our AP-CNN model in [20], where the neural network
is trained to learn more efficiently from both the input
patch and the target output prediction block;
(3) a novel CNN-based angular intra-prediction method
which replaces all the 33 HEVC angular intra-prediction
modes proposed in the HEVC standard for two block
sizes: 4 × 4 and 8 × 8;
(4) a novel loss function formulation for training the pro-
posed neural network model based on a general loss
term, a local loss term, and a rate-based term where the
position of each predicted value relative to the causal
neighborhood is taken into account for an improved
block-wise intra-prediction;
(5) a novel approach for adapting the residual of the deep-
learning-based intra-prediction method in the HEVC
video coding framework for lossless coding applications;
(6) a novel training setup based on training patches extracted
after processing regular RGB images;
(7) an improved video coding system built based on the
HEVC architecture which offers remarkable rate savings
of 5% over HEVC in lossless video coding.
The remainder of this paper is organized as follows. Section
II outlines state-of-the-art methods in video coding based on
ML techniques. Section III describes the proposed CNN-based
intra-prediction approach. Section IV presents the experimen-
tal validation and the performance analysis of the proposed
lossless video coding system. Finally, Section V draws the
conclusions of this work.
II. STATE-OF-THE-ART
In the video coding domain, the research community is
currently developing future video coding technologies that
aim at substantially improving the coding performance by
targeting "30-50%" rate savings over the current HEVC
standard [1] and support for lossless compression. A Call for
Proposals (CfP) on video compression beyond HEVC and its
extensions [21] was published by the collaborative Joint Video
Exploration Team (JVET) formed by the ITU-T Video Coding
Experts Group (VCEG) and the ISO/IEC Moving Picture
Experts Group (MPEG). A new video coding standard, known
as Versatile Video Coding (VVC) is being developed by JVET
based on the CfP responses. One of the main approaches
which brings substantial coding gains in VVC is to partition
the current block based on the traditional Quad-Tree (QT)
partitioning followed by other newly developed techniques
such as Binary-Tree (BT) and Ternary Tree (TT) partitioning.
Recently, the block partitioning topic was intensively explored
and several solutions were proposed in [22], [23], [24], [25],
[26]. Although all these newly developed techniques enhance
VVC and substantially improve the lossy coding performance
over HEVC, they are also characterized by a very high
computational complexity.
As an alternative, the research community targeted the reduction
of computational complexity by employing algorithmic
solutions based on ML methods. In this context, several CNN-
based approaches were proposed to solve the problem of
intensive search for an optimal frame partition.
In [27], a CNN-based solution is proposed to directly predict
the Coding Unit (CU) partition structure without the reference
to the information about its neighboring CU. In [28], a deep-
learning method to replace the brute-force search for Rate-
Distortion Optimization (RDO) is proposed by predicting the
CU partition for both intra and inter-modes based on CNNs
and Long Short-Term Memory (LSTM) networks. In [29],
a deep-learning approach is proposed which aims at driving the
encoder by estimating probabilities of blocks or CU splitting in
intra slices based on a texture analysis of the original content in
these blocks, and partly replacing the costly RDO optimization
employed in the search of an optimal frame segmentation.
ML-based tools have also been investigated as efficient
alternatives to existing components in HEVC with the aim
of improving coding performance with an affordable increase
in computational complexity.
Such contributions include [30] which improves the per-
formance of the HEVC deblocking filter. In [31], the authors
are the first to integrate the residual error prediction concept
[17] in the training of the network and effectively predict
some of the image details lost in lossy coding. This method
proposes to simply enhance the HEVC encoded videos after
the decoding process, therefore, no modifications are required
to the original HEVC implementation. In [32], the authors
apply the Residual Learning (ResL) concept [33] for training
a CNN-based model called Multi-reconstruction Recurrent
Residual Network (MRRN). In [34], a Quality Enhancement
Convolutional Neural Network (QE-CNN) method is proposed
based on a sequence of four convolutional layers where two
models are trained for I and P/B frames, respectively.
Another important area that has recently drawn attention
is improving the performance of intra-prediction and in this
context, a few ML-based solutions were recently proposed.
In [14] a neural network model design based on eight fully-
connected layers is proposed, where the loss function is
regularized by parameter penalization to avoid over-fitting. The
model was trained to predict either the angular intra-prediction
modes or the non-angular direction (DC and planar) intra-
prediction modes. The model of [14] was trained only for 8×8
blocks, motivated by the observation that 8 × 8 blocks have
the highest frequency of occurrence in lossy coding. In [35],
the authors employ a neural network to perform block-wise
intra-prediction based on multiple prediction modes which
co-adapt during training to minimize a loss function. The loss
function applies the $\ell_1$-norm and a sigmoid function to the
prediction residual in the DCT domain.
In this paper, we propose a deep-learning-based approach
for lossless video coding. It is important to note that in lossless
coding, intra-prediction plays a much more important role than
in the lossy coding case as it must accurately predict many
more fine textural details which are missing in lossy coding
due to the inherent quantization of the high frequency details.
Additionally, HEVC’s optimal segmentation yields a deep
quad-tree frame partition characterized by a high probability
of occurrence of small block sizes. One notes that the HEVC
intra-prediction modes provide a more efficient intra-
prediction for small block sizes than for larger ones,
which is highly valued in lossless video coding, but also makes
them particularly difficult to outperform. Therefore, the goal of the
proposed method is to improve the intra-prediction for 4 × 4
and 8 × 8 blocks. In our preliminary work [20], an AP-CNN
model was proposed to compute a block-based prediction at
different resolutions, by following a multi-resolution design
inspired from the U-NET architecture [36]. Our model in [20]
was able to provide an improved performance only for a
limited set of intra-prediction modes.
Here we further advance over our preliminary results in [20]
by bringing a set of novel contributions that improve the neural
network training and performance. Based on a similar concept
as in [14], the angular intra-prediction modes are separated
from the DC and planar modes; however, in this paper we
specifically train a model for each one of the HEVC angular
intra-prediction modes.
III. PROPOSED CNN-BASED INTRA-PREDICTION APPROACH
In this paper, we propose a novel deep-learning-based intra-
prediction approach for lossless video coding applications.
The proposed method trains a neural network model for
each of the 33 HEVC angular intra-prediction modes based
on a novel training procedure, and replaces them with a
CNN-based prediction method. Moreover, the proposed coding
system adapts the HEVC architecture to employ the CNN-
based prediction method for two different block sizes, yielding
substantial and systematic coding gains over HEVC.
In this section we present the set of contributions proposed
in this paper for the supervised training of a neural network
for block-wise prediction. Section III-A describes the proposed
deep-learning-based prediction method. Section III-B
describes the proposed coding system based on HEVC, de-
signed for lossless video coding applications.
A. Deep-Learning-based Prediction
For the currently predicted block, the HEVC video coding
standard [1] selects the optimal intra-prediction mode $m_i$ from
the set $S_{HEVC} = \{m_i\}_{i=0:34}$ of 35 available modes as the
one yielding the best Rate-Distortion (RD) performance. The
$S_{HEVC}$ set contains the following mode types:
(a) $m_0$, which is the planar prediction mode designed for
encoding planar surfaces;
(b) $m_1$, which is the DC mode designed for encoding flat
surfaces;
(c) $\{m_2, m_3, \ldots, m_{34}\}$, which are the angular direction
modes designed for computing the directional prediction
corresponding to 33 different prediction angles.
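The mode set and the selection rule described above can be illustrated with a minimal Python sketch. It is not the HEVC reference implementation: the names `mode_type` and `select_best_mode` are hypothetical, and the per-mode cost is assumed to be supplied by the caller (for example an estimated number of bits, since in lossless coding the distortion is zero and the RD cost reduces to the rate).

```python
# Illustrative sketch only; names and the cost model are assumptions,
# not part of the HEVC standard text or of the paper.
PLANAR, DC = 0, 1
ANGULAR_MODES = tuple(range(2, 35))      # m_2 ... m_34, 33 angular modes
S_HEVC = (PLANAR, DC) + ANGULAR_MODES    # m_0 ... m_34, 35 modes in total


def mode_type(i):
    """Classify a mode index as 'planar', 'dc' or 'angular'."""
    if i == PLANAR:
        return "planar"
    if i == DC:
        return "dc"
    if 2 <= i <= 34:
        return "angular"
    raise ValueError(f"invalid intra-prediction mode index: {i}")


def select_best_mode(cost_by_mode):
    """Return the mode index with the lowest cost.

    cost_by_mode maps a mode index to a caller-supplied cost
    (hypothetical here, e.g. estimated bits for the block).
    Ties are resolved in favour of the lowest mode index.
    """
    if not cost_by_mode:
        raise ValueError("no candidate modes given")
    return min(sorted(cost_by_mode), key=cost_by_mode.__getitem__)
```

For instance, `select_best_mode({0: 120, 1: 118, 10: 97, 26: 97})` picks mode 10, an angular mode, because it shares the lowest cost with mode 26 and has the smaller index.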
In this paper, a block-wise CNN-based prediction scheme
is proposed to improve HEVC’s lossless coding performance
by replacing the HEVC-based angular intra-prediction with
an improved CNN-based prediction for the set of 33 angular
direction intra-prediction modes. Let $S_{CNN} = \{m_i\}_{i=2:34}$
denote these angular modes, as depicted in Fig. 1a.
For lossless video coding applications, one may note that the
complexity of the block-wise prediction task is increasing with
the size of the prediction block. More exactly, the pixels found
at the most distant positions from the causal neighborhood (i.e.
the bottom right corner) are affected by a higher prediction
error. The effect is more visible for larger-size block partitions.
An important observation to be made is that, in lossless coding,
the optimal block segmentation of the input frame generated
by HEVC is generally characterized by a high frequency of
4 × 4 blocks, followed by block sizes of 8 × 8 pixels. One
can note that, since the frequency of the larger block sizes,
such as 16 × 16 and 32 × 32, is very low, not enough patches
are available for an efficient model training. Hence, in this
paper, we focus on improving the prediction for blocks of size
$N_k \times N_k$ pixels, with $k = 1, 2$, where $N_1 = 4$ and $N_2 = 8$.
Proposed Training Strategy
Let us denote $B_x$ as the currently predicted block of size
$N_k \times N_k$. The proposed CNN-based prediction method
generates input patches of size $4N_k \times 4N_k$ by selecting a
$4N_k \times 4N_k$ causal neighborhood around $B_x$. The input patches
are employed to compute the network's output prediction of
size $4N_k \times 4N_k$. However, the final prediction block of size
$N_k \times N_k$ is obtained by cropping the network's output
prediction block at the corresponding position of $B_x$.
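The patch-and-crop procedure above can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the position of the block inside the patch (`row_off`, `col_off`) is left as a parameter because the excerpt does not specify it. The sketch also does not model which patch pixels are causal (already coded).

```python
# Illustrative sketch; function names and the block offset inside the
# patch are assumptions, not taken from the paper.

def extract_patch(frame, top, left, n, row_off, col_off):
    """Return the 4n x 4n patch around an n x n block.

    frame            : 2-D list of pixel values
    top, left        : frame coordinates of the block's top-left pixel
    row_off, col_off : assumed position of that pixel inside the patch
    """
    size = 4 * n
    r0, c0 = top - row_off, left - col_off
    if r0 < 0 or c0 < 0 or r0 + size > len(frame) or c0 + size > len(frame[0]):
        raise ValueError("patch falls outside the frame")
    return [row[c0:c0 + size] for row in frame[r0:r0 + size]]


def crop_block(output, n, row_off, col_off):
    """Crop the n x n prediction block from a 4n x 4n network output."""
    return [row[col_off:col_off + n] for row in output[row_off:row_off + n]]


if __name__ == "__main__":
    # Toy 32 x 32 frame and a 4 x 4 block at (16, 16); the network is
    # replaced by the identity so that only the geometry is exercised.
    frame = [[r * 32 + c for c in range(32)] for r in range(32)]
    patch = extract_patch(frame, top=16, left=16, n=4, row_off=8, col_off=8)
    block = crop_block(patch, n=4, row_off=8, col_off=8)
    print(len(patch), len(patch[0]), len(block), len(block[0]))
```

Because input and output have the same $4N_k \times 4N_k$ size, the same offsets locate the block in both, which is why the crop reuses `row_off` and `col_off`.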
In this paper, we propose to use three different patch
designs selected according to the current intra-prediction an-
gular mode. Fig. 1b shows the general causal neighborhood
configuration, denoted by $NC_g$, which is employed to generate