
Received 27 September 2022, accepted 10 October 2022, date of publication 17 October 2022, date of current version 25 October 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3215163

Machine Learning-Based Early Skip Decision for Intra Subpartition Prediction in VVC

JEEYOON PARK (Student Member, IEEE), BUMYOON KIM (Student Member, IEEE), JEEHWAN LEE (Student Member, IEEE), AND BYEUNGWOO JEON (Senior Member, IEEE)

Department of Electrical and Computer Engineering, Sungkyunkwan University, Jangan-gu, Suwon 16410, South Korea

Corresponding author: Byeungwoo Jeon (bjeon@skku.edu)

This work was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) through

the Ministry of Science and ICT under Grant NRF-2020R1A2C2007673; and in part by the System LSI Division, Samsung Electronics

Company Ltd.

ABSTRACT The recently published video coding standard, Versatile Video Coding (VVC/H.266), has the intra subpartition (ISP) coding mode, which divides an intra-predicted block into smaller blocks called subpartitions, each of which can be predicted using the newly reconstructed subpartition while still sharing the same intra mode. ISP is a VVC intra prediction tool that brings significant coding gains but also increases encoding complexity. In this context, this paper addresses how to speed up the ISP encoding process by designing an ISP early skip decision scheme using a simple LightGBM model. The proposed ISP decision expedites the encoding process by determining early whether or not to skip the ISP mode test. The proposed method uses the mean absolute sum of transform coefficients as a key feature. Our experimental results show an average encoding time saving of 7.2% under the all intra coding configuration with 0.08% BDBR loss. Compared with state-of-the-art methods, our solution outperforms related works in terms of combined rate-distortion performance and time saving.

INDEX TERMS VVC, intra prediction, fast intra prediction, H.266/VVC, encoder optimization, intra

subpartition (ISP), light gradient boosting machine (LightGBM).

I. INTRODUCTION

Along with the recent commercial introduction of 5G mobile infrastructure, unconventional media, such as 360-degree video/VR or immersive media providing up to 6 DoF (degrees of freedom), have started to emerge as new business opportunities (in addition to well-known HD, 4K, and 8K video). However, all of these types of media carry a large amount of data, causing explosive growth in video traffic. This demands a very powerful video coding technique that can provide very high compression performance.

Versatile Video Coding (VVC) [1], [2], [3] is the latest video coding standard by the Joint Video Experts Team (JVET), jointly formed by the Moving Picture Experts Group (ISO/IEC MPEG) and the Video Coding Experts Group (ITU-T VCEG), and provides more than twice the compression performance compared to the High Efficiency Video Coding (HEVC) standard [4]. It has many advanced coding tools compared to HEVC. It is reported [5], [6], [7] that the coding efficiency of VVC surpasses that of HEVC, with average bitrate savings of 25.06% (all intra (AI) case), 41.04% (random access (RA) case), and 30.88% (low delay-B (LDB) case) at the same video quality. However, it is also noted that its encoding time has increased significantly, by factors of 26, 8, and 6 relative to HEVC in the AI, RA, and LDB cases, respectively.

The associate editor coordinating the review of this manuscript and approving it for publication was Chaker Larabi.

Intra coding is a method of encoding a given block through intra prediction referring to samples already reconstructed in the same picture [8]. It is reported [8] that VVC includes many powerful intra-coding tools, such as mode dependent intra smoothing (MDIS) [9], cross-component linear model (CCLM) [10], position dependent intra prediction combination (PDPC) [11], multiple reference line (MRL) [12], [13] intra prediction, intra subpartition (ISP) [14], [15], and matrix-based intra prediction (MIP) [17]. As shown in Fig. 1, VVC supports up to 95 intra prediction modes, among which 28 modes are referred to as wide angular intra prediction (WAIP) modes [18] while 65 modes are general angular modes. It also has DC and planar modes as non-directional intra modes. For intra prediction, VVC achieves a 25.06% coding efficiency improvement but requires 26 times the encoding time of HEVC [6]. The optimal intra prediction modes are determined through a complex search process that involves recursive block partitioning and testing of various predictions for each block, which greatly increases the coding complexity. From a practical point of view, a substantial reduction in coding complexity can help promote widespread use of the new coding standard. In this regard, many researchers have studied coding complexity reduction of VVC intra prediction for fast VVC encoding.

FIGURE 1. Intra prediction modes for luma block in VVC [1].

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

VOLUME 10, 2022

The ISP [14], [15], [16] is an efficient VVC intra prediction tool. As shown in Fig. 2, the ISP divides a luma intra prediction block equally into two or four smaller blocks. These are called subpartitions, each of which is predicted using the same intra mode. Reference [16] describes the ISP scheme implemented in the VVC test model (VTM), which also has various early termination strategies to reduce the complexity of the ISP encoding search process. Even after this much enhanced ISP encoder search solution was implemented, however, efforts to minimize ISP complexity while maintaining ISP coding efficiency have continued. Park et al. [19] proposed a fast algorithm that limits the use of ISP by focusing on the reference samples used for each subpartition block when the ISP is applied. In other words, if ISP does not allow a block to be predicted from closer reference samples, its ISP mode test is skipped to make the encoder faster. Optimization schemes for fast ISP mode decision have also been proposed based on CU texture complexity [20], [21], [22]. These schemes measure the texture complexity of a CU block to determine whether the CU needs to use the ISP mode or not, so as to achieve faster encoding. We note that previous fast ISP decision approaches [19], [20], [21], [22] can reduce the overall encoding time by effectively avoiding unnecessary rate distortion optimization (RDO) processes through fast intra mode decisions that characterize the information of each block. However, we also note that those previous approaches [19], [20], [21], [22] considered only the intra prediction direction and the texture of the block itself; that is, they did not account for the benefit of performing a separate transform for each subpartition.

FIGURE 2. ISP mode in VVC [1].

Meanwhile, since machine learning has recently become a viable method to reduce encoder complexity with a small influence on coding efficiency, there are several studies on fast decision-making processes that implement learning-based algorithms [23], [24], [25], [26], [27], [28]. Dong et al. [22] used a decision tree (DT) [37] model to design a fast ISP mode skip method, but considered only CU texture complexity, as in previous works [19], [20], [21]. While the goal of this paper is to design a fast ISP search scheme (the same goal as previous approaches), the proposed ISP Early Skip Decision (ISP-ESD) scheme makes early determinations on whether or not to test ISP mode in the RDO process by also considering the efficiency facilitated by ISP prediction and by transforming each subpartition individually. Moreover, the proposed method uses Light Gradient Boosting Machine (LightGBM) classifiers [38]. Therefore, our solution is the first machine learning-based fast ISP search algorithm that takes both aspects of prediction and transform into consideration. In this paper, in comparison with the ISP tool-off test in VTM, the proposed method reduces the encoder run-time of ISP from 13.8% to 7.2% (i.e., about 50% reduction) in exchange for a loss of 0.08% BD-Rate.

The main contributions of this work are:

• A new and efficient complexity reduction solution for VVC ISP intra prediction.

• Use of an efficient LightGBM model to reduce the complexity of the ISP mode test while minimizing the coding efficiency loss.

• Definition of key features and their use in machine learning classifiers.

• An ISP-ESD implementation that is independent of the quantization parameter (QP) setting.


The remainder of this paper is organized as follows. Section II describes the process of the ISP scheme in VVC. Section III explains the motivation for the ISP early skip decision method. In Section IV, the proposed machine learning-based ISP-ESD scheme is explained in detail. Subsequently, the simulation results are shown in Section V. Finally, Section VI concludes the paper.

II. ISP PREDICTION SCHEME IN VVC

As the first step of encoding video, each picture is partitioned into coding units (CUs) of various shapes and sizes. How a picture is partitioned into CUs is represented in a tree structure, and the tree information is transmitted to the decoder. A CU represents a group of pixels that are encoded in the same coding mode. A larger CU is desirable for reducing the signaling overhead of the coding mode and relevant information, but it may cause prediction performance loss unless all the pixels in the CU are either homogeneous (in intra prediction) or well represented by a motion vector (in inter prediction). Especially in intra prediction, a larger CU inevitably means a larger distance from the reference samples in neighboring CU blocks; this tends to decrease the accuracy of intra prediction. Conversely, a smaller CU can enhance intra-prediction accuracy, but it increases signaling overhead due to the increased number of CUs in a picture. In order to solve this dilemma, under ISP mode, an intra-coded block is subdivided into smaller blocks that still share the same intra prediction mode. ISP performs intra prediction for each subpartition using nearer reconstructed reference samples in already encoded subpartition blocks. In VVC, the regular intra modes, i.e., planar, DC, and all angular modes, can be used with ISP.

A. BLOCK SUBPARTITION IN ISP SCHEME

As shown in Fig. 2, under the ISP mode, a CU can be split into four subpartitions either horizontally (HOR-ISP) or vertically (VER-ISP), where the subpartition direction is indicated by the two ISP flags (Table 1). It should be noted that, due to practical considerations of memory access, the partitioning is carried out in such a way that there are at least 16 samples per subpartition [16]. Therefore, ISP is not applied to 4 × 4 CUs. Additionally, in the case of 4 × 8 or 8 × 4 CUs, a CU is divided only into two blocks (called a half split) instead of four. For the other sizes, a CU is divided into four subpartitions of the same shape and size (called a quad split). Furthermore, to avoid writing narrow blocks of data to memory, the minimum width of an intra prediction block is four samples. Therefore, when the VER-ISP mode is used for a CU with a width of four, the partition is not made in the prediction process, but is still made in the transform process [16].
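The partitioning rules above can be sketched as a short Python function. This is a minimal illustration and is not taken from VTM: the function name `isp_subpartitions` and the "HOR"/"VER" string labels are assumptions made here, only the rules stated in this subsection are modeled, and any further restriction (such as a maximum CU size) is ignored.

```python
def isp_subpartitions(width, height, split):
    """Return the (width, height) of each ISP subpartition of a CU.

    split is "HOR" (horizontal split, subpartitions stacked vertically) or
    "VER" (vertical split, subpartitions placed side by side).
    An empty list means ISP is not applicable to this CU size.
    The name and the string labels are illustrative, not VTM identifiers.
    """
    if split not in ("HOR", "VER"):
        raise ValueError("split must be 'HOR' or 'VER'")
    # ISP is not applied to 4x4 CUs.
    if width == 4 and height == 4:
        return []
    # 4x8 and 8x4 CUs use a half split; all other sizes use a quad split.
    count = 2 if (width, height) in ((4, 8), (8, 4)) else 4
    if split == "HOR":
        sub_w, sub_h = width, height // count
    else:
        sub_w, sub_h = width // count, height
    # Sanity check: every subpartition holds at least 16 samples.
    assert sub_w * sub_h >= 16
    return [(sub_w, sub_h)] * count
```

For example, an 8 × 4 CU with a horizontal split yields two 8 × 2 subpartitions. A 4 × 16 CU with a vertical split yields four 1 × 16 subpartitions; according to the text, these sizes then apply to the transform only, since prediction is not partitioned when the CU width is four.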

B. TRANSFORM IN ISP SCHEME

TABLE 1. IntraSubPartitionsSplitType and related flags [2].

TABLE 2. Implicit transform selection for ISP.

ISP is related not only to intra prediction but also to the transform. VVC has two types of transforms. One is the primary transform, whose kernel is selected from DCT-II and DST-VII separately for the horizontal and vertical directions [29], [30]. The other is the secondary transform, which is the low-frequency non-separable transform (LFNST), obtained by offline training with intra-prediction residuals [29], [31]. While the selection of the primary and secondary transforms in VVC is normally signaled by the CU-level syntax elements mts_idx and lfnst_idx, under the ISP mode it is derived implicitly from a single CU-level syntax element, lfnst_idx, which determines both the primary and secondary transforms for the CU, as in Table 2. If lfnst_idx is 0, a primary transform is selected based on the width (or height) of a subpartition, and the secondary transform is not used. If lfnst_idx is either 1 or 2, then DCT-II is used as the primary transform. In addition, lfnst_idx is signaled for a CU block; thus, the same LFNST transform kernel is utilized for all the subpartitions that have a non-zero coded block flag (CBF) [32].
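As a hedged sketch of this implicit selection, the function below maps a subpartition size and lfnst_idx to a pair of primary kernels. The contents of Table 2 are not reproduced in this text, so the size rule used when lfnst_idx is 0 (DST-VII for a dimension between 4 and 16 samples, DCT-II otherwise) is an assumption based on the commonly described VVC implicit transform selection, and the function name is hypothetical.

```python
DCT2, DST7 = "DCT-II", "DST-VII"


def isp_transforms(sub_width, sub_height, lfnst_idx):
    """Return (horizontal kernel, vertical kernel, uses_lfnst).

    Illustrative only: the 4..16 size range below is an ASSUMPTION,
    since Table 2 of the paper is not available in this text.
    """
    if lfnst_idx not in (0, 1, 2):
        raise ValueError("lfnst_idx must be 0, 1, or 2")
    if lfnst_idx != 0:
        # With LFNST enabled, DCT-II is the primary transform in both directions.
        return DCT2, DCT2, True

    def pick(size):
        # ASSUMPTION: DST-VII for sizes 4..16, DCT-II for all other sizes.
        return DST7 if 4 <= size <= 16 else DCT2

    # Horizontal kernel depends on the width, vertical kernel on the height.
    return pick(sub_width), pick(sub_height), False
```

Under these assumptions, an 8 × 2 subpartition with lfnst_idx equal to 0 would use DST-VII horizontally and DCT-II vertically, whereas any non-zero lfnst_idx would switch both directions to DCT-II followed by LFNST.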

C. ENCODER SEARCH SCHEME OF ISP MODE

The ISP search is carried out to select the best ISP coding mode for each CU block to encode. This search decides the best intra prediction mode and whether ISP mode is selected or not. If ISP is selected, it also determines whether its split is vertical or horizontal. The ISP test evaluates the RD cost of each combination (mode, split, lfnst). Here, "mode" refers to the intra mode (planar, DC, and all angular modes in Fig. 1); "split" refers to the ISP split direction, which is either HOR-ISP or VER-ISP; and "lfnst" indicates whether or not LFNST is used (i.e., whether the LFNST index is 0, 1, or 2). The RD cost of each combination (mode, split, lfnst) is obtained as a cumulative sum of the RD costs of its subpartitions. A detailed technical description of how to configure the list for the ISP mode test, the ISP encoder search process, the early termination steps, and the rules used to skip the ISP test in the RDO process can be found in [16].
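The cumulative RD cost evaluation can be illustrated with the following Python sketch. It is not the VTM search: VTM builds a candidate list and applies the rules of [16], whereas this example simply enumerates every combination. The function name `isp_search`, the callback `subpartition_cost`, and the simple early termination on the partial sum are all assumptions introduced for illustration.

```python
from itertools import product


def isp_search(modes, subpartition_cost, num_subpartitions):
    """Return (best_cost, (mode, split, lfnst)) over all ISP combinations.

    subpartition_cost(mode, split, lfnst, index) is a hypothetical callback
    returning the non-negative RD cost of one subpartition.
    num_subpartitions is expected to be 2 or 4.
    """
    best = None
    for mode, split, lfnst in product(modes, ("HOR", "VER"), (0, 1, 2)):
        total = 0.0
        for index in range(num_subpartitions):
            # The RD cost of a combination is accumulated over its subpartitions.
            total += subpartition_cost(mode, split, lfnst, index)
            # Illustrative early termination (not the VTM rule): with
            # non-negative costs, a partial sum that already reaches the
            # best cost so far cannot become the new best.
            if best is not None and total >= best[0]:
                break
        else:
            # Reached only when no early termination occurred.
            best = (total, (mode, split, lfnst))
    return best
```

Because the cost is accumulated subpartition by subpartition, a poor candidate can be abandoned before all of its subpartitions are coded, which is one reason the cumulative formulation lends itself to early termination.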

III. MOTIVATION

The benefits of ISP come not only from better intra prediction but also from better utilization of the correlation between pixels within each subpartition by transform. Since intra prediction can exploit closer reconstructed samples in previous subpartitions, significant accuracy improvement is expected in predictor generation [14]. In this regard, the
