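Summary: This paper proposes a machine learning (LightGBM) based Early Skip Decision (ESD) method to speed up intra subpartition (ISP) coding in the Versatile Video Coding (VVC) standard. ISP mode improves compression efficiency but increases encoding complexity, so the paper proposes deciding early whether to skip the ISP mode test. Specifically, a LightGBM classifier is trained on features such as block size, aspect ratio, prediction mode, and the mean absolute sum of transform coefficients (MAST) of the subpartitions to decide whether the ISP mode test can be skipped. Experimental results show that, compared with the conventional method, the new method reduces encoding time by 7.2% at an average BD-rate loss of 0.08%. Intended audience: researchers and developers in the video coding field. Use cases and goals: applications that need efficient compression of high-resolution video, such as 5G mobile infrastructure, virtual reality (VR), and augmented reality (AR); the goal is to reduce encoding time significantly while keeping coding quality high. Additional notes: the study considers not only the prediction benefit of ISP mode but also its effect in the transform domain, and is the first to combine the two for fast decision making, which helps the adoption of the new video coding standard in practical applications.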
Received 27 September 2022, accepted 10 October 2022, date of publication 17 October 2022, date of current version 25 October 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3215163
Machine Learning-Based Early Skip Decision for
Intra Subpartition Prediction in VVC
JEEYOON PARK, (Student Member, IEEE), BUMYOON KIM, (Student Member, IEEE),
JEEHWAN LEE, (Student Member, IEEE), AND BYEUNGWOO JEON, (Senior Member, IEEE)
Department of Electrical and Computer Engineering, Sungkyunkwan University, Jangan-gu, Suwon 16410, South Korea
Corresponding author: Byeungwoo Jeon (bjeon@skku.edu)
This work was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) through
the Ministry of Science and ICT under Grant NRF-2020R1A2C2007673; and in part by the System LSI Division, Samsung Electronics
Company Ltd.
ABSTRACT The recently published video coding standard, Versatile Video Coding (VVC/H.266), has
the intra subpartition (ISP) coding mode, which divides an intra-predicted block into smaller blocks called
subpartitions, each of which can be predicted using the newly reconstructed subpartition while still sharing
the same intra mode. It is a VVC intra prediction tool that brings significant coding gains but also increases
its encoding complexity. In this context, this paper addresses how to speed up the ISP encoding process by
designing an ISP early skip decision scheme using a simple LightGBM model. The proposed ISP decision
expedites the encoding process by early determination of whether or not to skip the ISP mode test. The
proposed method uses the mean absolute sum of transform coefficients as a key feature. Our experimental
results show an average encoding time saving of 7.2% under the all intra coding configuration with 0.08%
BDBR loss. Compared to the state-of-the-art methods, our solution is able to outperform related works in
terms of the combined rate-distortion and time saving.
INDEX TERMS VVC, intra prediction, fast intra prediction, H.266/VVC, encoder optimization, intra
subpartition (ISP), light gradient boosting machine (LightGBM).
I. INTRODUCTION
Along with the recent commercial introduction of 5G mobile
infrastructure, unconventional media, such as 360-degree
video/VR or immersive media providing up to 6 DoF (degrees
of freedom), have started to emerge as new business oppor-
tunities (in addition to well-known HD, 4K, and 8K video).
But all of these types of media carry a large amount of
data, causing explosive video traffic. This demands a very
powerful video coding technique that can provide very high
compression performance.
Versatile Video Coding (VVC) [1], [2], [3] is the latest
video coding standard by the Joint Video Experts Team
(JVET), jointly formed by the Moving Picture Experts
Group (ISO/IEC MPEG) and the Video Coding Experts
Group (ITU-T VCEG), and provides more than twice the
compression performance compared to the High Efficiency
Video Coding (HEVC) standard [4]. It has many advanced
coding tools compared to HEVC. It is reported [5], [6], [7]
that the coding efficiency of VVC surpasses that of HEVC,
with an average bitrate savings of 25.06% (all intra (AI)
case), 41.04% (random access (RA) case), and 30.88% (low
delay - B (LDB) case) at the same video quality. However,
it is also noted that its encoding time has increased
significantly, by factors of 26, 8, and 6 relative to HEVC in
the AI, RA, and LDB cases, respectively.
The associate editor coordinating the review of this manuscript and
approving it for publication was Chaker Larabi.
Intra coding is a method of encoding a given block through
intra prediction referring to samples already reconstructed in
the same picture [8]. It is reported [8] that VVC includes
many powerful intra-coding tools, such as mode dependent
intra smoothing (MDIS) [9], cross-component linear model
(CCLM) [10], position dependent intra prediction
combination (PDPC) [11], multiple reference line (MRL) [12],
[13] intra prediction, intra subpartition (ISP) [14], [15], and
matrix-based intra prediction (MIP) [17]. As shown in Fig. 1,
VVC supports up to 95 intra prediction modes, among which
28 modes are referred to as wide angular intra prediction
(WAIP) modes [18] while 65 modes are general angular
modes. It also has DC and planar modes as non-directional
intra modes. For intra coding, VVC achieves a 25.06%
bitrate saving but requires 26 times the encoding time
of HEVC [6]. The optimal intra
prediction modes are determined through a complex search
process that involves recursive block partitioning and
testing of various predictions for each block, which greatly
increases the coding complexity. From a practical point of
view, a substantial reduction in coding complexity can help
widespread use of the new coding standard. In this regard,
many researchers have studied coding complexity reduction
of VVC intra prediction for fast VVC encoding.
FIGURE 1. Intra prediction modes for luma block in VVC [1].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
J. Park et al.: Machine Learning-Based Early Skip Decision for Intra Subpartition Prediction in VVC
VOLUME 10, 2022
The ISP [14], [15], [16] is an efficient VVC intra prediction
tool. As shown in Fig. 2, the ISP divides a luma intra
prediction block equally into two or four smaller blocks. These
are called subpartitions, each of which is predicted using the
same intra mode. Reference [16] describes the ISP scheme
implemented in the VVC test model (VTM), which also has various
early termination strategies to reduce the complexity of the ISP
encoding search process. Even after this much enhanced ISP
encoder search solution was implemented, however, efforts
to minimize ISP complexity while maintaining ISP coding
efficiency have continued. Park et al. [19] proposed a fast
algorithm that limits the use of ISP by focusing on the
reference samples used for each subpartition block when the ISP
is applied. In other words, if a block is not predicted using
closer reference samples by ISP, its ISP mode test is skipped
to make the encoder faster. Optimization schemes for a fast
ISP coding mode decision have also been proposed based on the
CU texture complexity [20], [21], [22]. They measure the CU block
texture complexity to determine whether a CU needs to use the
ISP mode or not, so as to achieve faster encoding. We note that
previous fast ISP decision approaches [19], [20], [21], [22]
can reduce the overall encoding time by effectively avoiding
unnecessary rate distortion optimization (RDO) processes
by fast intra mode decision through characterization of the
information of each block. However, we also note that those
previous approaches [19], [20], [21], [22] considered only
the intra prediction direction and the texture of the block
itself; that is, they did not give due consideration to the
benefit of performing separate transforms for each subpartition.
FIGURE 2. ISP mode in VVC [1].
Meanwhile, since machine learning is a recent viable
method to reduce encoder complexity with a small influ-
ence on coding efficiency, there are several studies on fast
decision making processes by implementing learning-based
algorithms [23], [24], [25], [26], [27], [28]. Dong et al. [22]
used a decision tree (DT) [37] model to design a fast ISP
mode skip method, but it considers only CU texture complexity,
as in previous works [19], [20], [21]. While the goal
of this paper is to design a fast ISP search scheme (the same
goal as previous approaches), the proposed ISP Early Skip
Decision (ISP-ESD) scheme also makes early determinations
on whether or not to test ISP mode in the RDO process
by considering the efficiency facilitated by ISP prediction
and transforming each subpartition individually. Moreover,
the proposed method uses Light Gradient Boosting Machine
(LightGBM) classifiers [38]. Therefore, our solution is the
first machine learning-based fast ISP search algorithm that
takes both aspects of prediction and transform into consider-
ation. In this paper, in comparison with the ISP tool-off test
in VTM, the proposed method reduces the encoder run-time
of ISP from 13.8% to 7.2% (i.e., about 50% reduction) in
exchange for a loss of 0.08% BD-Rate.
The main contributions of this work are:
• A new and efficient solution for reducing the complexity of
VVC ISP intra prediction.
• Use of an efficient LightGBM model to reduce the complexity
of the ISP mode test while minimizing the coding
efficiency loss.
• Definition of key features and their use in machine learning
classifiers.
• An ISP-ESD implementation that is independent
of the quantization parameter (QP) setting.
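The early skip idea summarized above can be sketched as a gate placed in front of the ISP test. This is only an illustration under stated assumptions, not the authors' encoder code: the names `decide_intra`, `IntraDecision`, `skip_isp`, and the feature key `"mast"` are hypothetical, and the classifier is a plain callable standing in for a trained LightGBM model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class IntraDecision:
    rd_cost: float      # RD cost of the selected option
    uses_isp: bool      # True if the ISP option won
    isp_tested: bool    # False if the ISP test was skipped early


def decide_intra(
    features: Dict[str, float],
    regular_rd_cost: float,
    isp_rd_cost_fn: Callable[[], float],
    skip_isp: Callable[[Dict[str, float]], bool],
) -> IntraDecision:
    """Gate the (expensive) ISP test behind an early skip decision.

    skip_isp is a stand-in for a trained classifier (hypothetical here).
    isp_rd_cost_fn is only called when the ISP test is not skipped.
    """
    if skip_isp(features):
        # Early skip: keep the regular intra result, spend no time on ISP.
        return IntraDecision(regular_rd_cost, False, False)
    isp_cost = isp_rd_cost_fn()
    if isp_cost < regular_rd_cost:
        return IntraDecision(isp_cost, True, True)
    return IntraDecision(regular_rd_cost, False, True)
```

The time saving comes entirely from the calls to `isp_rd_cost_fn` that are avoided, and the coding loss comes from the cases where the gate skips an ISP test that would have won.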
The remainder of this paper is organized as follows.
Section II describes the process of the ISP scheme in VVC.
Section III explains the motivation for the ISP early skip decision
method. In Section IV, the proposed machine learning-based
ISP-ESD scheme is explained in detail. Subsequently, the
simulation results are shown in Section V. Finally, Section VI
concludes the paper.
II. ISP PREDICTION SCHEME IN VVC
As the first step of encoding video, each picture is partitioned
into coding units (CUs) of various shapes and sizes. How
a picture is partitioned into CUs is represented in a tree
structure, and the tree information is transmitted to a decoder.
Each CU represents a group of pixels that are encoded in the
same coding mode. A larger CU is desirable in reducing
the signaling overhead of the coding mode and relevant
information, but it may cause prediction performance loss
unless all the pixels in the CU are either homogeneous (in
intra prediction) or well represented by a motion vector (in
inter prediction). Especially in intra prediction, a larger CU
inevitably means a larger distance from the reference samples
in neighboring CU blocks; this tends to decrease the accuracy
of intra prediction. Conversely, a smaller CU can enhance
intra-prediction accuracy, but it increases signaling overhead due
to the increased number of CUs in a picture. In order to
solve this dilemma, under ISP mode, an intra-coded block
is subdivided into smaller blocks that still share the same
intra prediction mode. ISP performs intra prediction for each
subpartition using nearer reconstructed reference samples in
already encoded subpartition blocks. In VVC, the regular
intra modes, i.e., planar, DC, and all angular modes, can be
used with ISP.
A. BLOCK SUBPARTITION IN ISP SCHEME
As shown in Fig. 2, under the ISP mode, a CU can be split into
four subpartitions either horizontally (HOR-ISP) or vertically
(VER-ISP), where the subpartition direction is indicated by
the two ISP flags (Table 1). It should be noted that due to
practical considerations of memory access, the partitioning is
carried out in such a way that there are at least 16 samples
per subpartition [16]. Therefore, ISP is not applied to 4 × 4
CUs. Additionally, in the case of 4 × 8 or 8 × 4 CUs,
a CU is divided only into two blocks (called a half split)
instead of four. For the other sizes, a CU is divided into
four subpartitions of the same shape and size (called a quad
split). Furthermore, to avoid writing narrow blocks of data
to memory, the minimum width of an intra prediction is four
samples. Therefore, when the VER-ISP mode is used for a
CU with a width of four, the partition is not made in the prediction
process, but is still made in the transform process [16].
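The split rules in this subsection can be sketched as a small helper. This is a minimal illustration, assuming power-of-two CU dimensions of at least four samples. The function name `isp_subpartitions` and the `"HOR"`/`"VER"` labels are hypothetical, any size limits not stated in the text are not modeled, and the returned sizes are the transform subpartitions (the prediction-side exception for CUs of width four is not modeled).

```python
from typing import List, Tuple


def isp_subpartitions(width: int, height: int, split: str) -> List[Tuple[int, int]]:
    """Return the (width, height) of each ISP subpartition of a CU.

    An empty list means ISP is not applied (the 4x4 case).
    split is "HOR" (stacked top to bottom) or "VER" (side by side).
    """
    if split not in ("HOR", "VER"):
        raise ValueError("split must be 'HOR' or 'VER'")
    if (width, height) == (4, 4):
        return []  # a split would leave fewer than 16 samples per subpartition
    # 4x8 and 8x4 CUs get a half split, every other size a quad split.
    count = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if split == "HOR":
        return [(width, height // count)] * count
    return [(width // count, height)] * count


if __name__ == "__main__":
    for size in [(4, 4), (4, 8), (8, 4), (8, 8), (16, 16), (4, 32)]:
        for direction in ("HOR", "VER"):
            print(size, direction, isp_subpartitions(*size, direction))
```

For every size listed in the demo loop, each resulting subpartition holds at least 16 samples, which matches the memory access constraint cited from [16].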
B. TRANSFORM IN ISP SCHEME
ISP is related not only to intra prediction but also to the
transform. VVC has two types of transforms. One is the
primary transform, whose kernel is selected among DCT-II
and DST-VII separately for the horizontal and vertical directions
[29], [30]. The other is the secondary transform, which is the
low-frequency non-separable transform (LFNST), obtained
by offline training with intra-prediction residuals [29], [31].
While the selection of the primary and secondary transforms in
VVC is signaled by the CU-level syntax elements mts_idx and
lfnst_idx, under the ISP mode it is signaled implicitly by a
single CU-level syntax element, lfnst_idx, which indicates both
the primary and secondary transforms for the CU, as in Table 2.
If lfnst_idx is 0, a primary transform is selected based on the
width (or height) of a subpartition, and the secondary transform
is not used. If lfnst_idx is either 1 or 2, then DCT-II is used as
the primary transform. In addition, lfnst_idx is signaled for a CU
block; thus, the same LFNST transform kernel is utilized for
all the subpartitions that have a non-zero coded block flag
(CBF) [32].
TABLE 1. IntraSubPartitionsSplitType and related flags [2].
TABLE 2. Implicit transform selection for ISP.
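The implicit selection described above can be sketched as follows. The body of Table 2 is not reproduced in this text, so the size rule is an assumption: the sketch picks DST-VII when the subpartition dimension lies in an inclusive range, defaulting to 4..16 samples, and DCT-II otherwise. The function name `isp_transforms` and the parameter `dst7_range` are hypothetical, and the range should be checked against Table 2 or the standard before relying on it.

```python
from typing import Dict, Tuple, Union


def isp_transforms(
    sub_width: int,
    sub_height: int,
    lfnst_idx: int,
    dst7_range: Tuple[int, int] = (4, 16),  # assumed threshold, see lead-in
) -> Dict[str, Union[str, bool]]:
    """Pick primary transforms and LFNST usage for one ISP subpartition."""
    if lfnst_idx not in (0, 1, 2):
        raise ValueError("lfnst_idx must be 0, 1, or 2")
    if lfnst_idx != 0:
        # LFNST in use: DCT-II is the primary transform in both directions.
        return {"hor": "DCT-II", "ver": "DCT-II", "lfnst": True}

    low, high = dst7_range

    def pick(size: int) -> str:
        # Size-based choice; width drives horizontal, height drives vertical.
        return "DST-VII" if low <= size <= high else "DCT-II"

    return {"hor": pick(sub_width), "ver": pick(sub_height), "lfnst": False}
```

Under the assumed range, a 16 × 4 subpartition with lfnst_idx equal to 0 would use DST-VII in both directions, while a 32 × 8 subpartition would use DCT-II horizontally and DST-VII vertically.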
C. ENCODER SEARCH SCHEME OF ISP MODE
The ISP search is carried out to select the best ISP coding
mode for each CU block to encode. This search decides the
best intra prediction mode and whether ISP mode is selected
or not. If ISP is selected, it also determines whether its split
is vertical or horizontal. The ISP test evaluates the RD cost
of each combination (mode, split, lfnst). Here, "mode" refers
to the intra mode (planar, DC, and all angular modes in
Fig. 1); "split" refers to the ISP split direction, which is
either HOR-ISP or VER-ISP; and "lfnst" refers to the LFNST
index (0 when LFNST is not used, 1 or 2 otherwise). The
RD cost of each combination (mode, split, lfnst) is obtained
as a cumulative sum of the RD costs of its subpartitions.
A detailed technical description on how to configure the list
for the ISP mode test, the ISP encoder search process, early
termination steps, and rules used to skip the ISP test from
RDO process can be found in [16].
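The cost accumulation described in this subsection can be sketched as an exhaustive loop over (mode, split, lfnst) combinations. This is a simplification for illustration only: the actual VTM search builds a candidate list and applies the early termination rules of [16] rather than testing everything. The function `best_isp_combination` and the callback `sub_cost` are hypothetical names, and the per-subpartition cost is supplied by the caller.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Combination = Tuple[int, str, int]  # (intra mode, split direction, lfnst index)


def best_isp_combination(
    modes: Iterable[int],
    sub_cost: Callable[[int, str, int, int], float],
    num_subpartitions: int = 4,
) -> Optional[Tuple[float, Combination]]:
    """Return (total RD cost, combination) with the lowest accumulated cost.

    sub_cost(mode, split, lfnst, index) gives the RD cost of one subpartition;
    the cost of a combination is the sum over its subpartitions.
    """
    best: Optional[Tuple[float, Combination]] = None
    for mode, split, lfnst in product(modes, ("HOR", "VER"), (0, 1, 2)):
        total = sum(
            sub_cost(mode, split, lfnst, index)
            for index in range(num_subpartitions)
        )
        if best is None or total < best[0]:
            best = (total, (mode, split, lfnst))
    return best
```

Because the subpartitions are coded in order, a real encoder can abandon a combination as soon as its running sum exceeds the best cost found so far; the sketch omits that shortcut to keep the accumulation itself visible.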
III. MOTIVATION
The benefits of ISP come not only from better intra pre-
diction but also from better utilization of the correlation
between pixels within each subpartition by transform. Since
intra prediction can exploit closer reconstructed samples in
previous subpartitions, significant accuracy improvement is
expected in predictor generation [14]. In this regard, the