Content overview: This article explores the influence of content characteristics on rate, distortion, and computational complexity in video encoding, and proposes a new method that uses video content features to predict optimal encoding parameters. Specifically, for high-resolution (UHD) video the authors propose a decision framework that operates at the sequence level rather than at the frame or block level. By selecting and predicting encoding parameters (such as the motion search range and block size), the trade-off between compression rate, quality, and complexity is optimized. Experimental results show that the proposed framework effectively reduces complexity while, to some extent, lowering bitrate and improving quality. To evaluate the model and demonstrate its prediction performance, the authors ran comparative tests with the HM and x265 encoders, confirming the effectiveness and practicality of the framework. The paper also discusses directions for future improvement, such as incorporating semantic features and modeling the visual system's response more finely.
Intended audience: Researchers and developers in video coding technology, and university teachers and students in related fields.
Use cases and goals: Providing smarter and more efficient video encoding solutions for different types of devices and services (especially hardware-constrained ones such as mobile devices); improving encoding efficiency without compromising playback smoothness or picture quality.
Additional notes: The paper emphasizes analyzing the best parameter configurations for different application scenarios, not limited to a specific software platform or protocol version. For future work, the authors propose expanding the feature space to explore further possibilities; subjective quality evaluation is also identified as a topic deserving further study.
Multimed Tools Appl (2018) 77:16113–16141
DOI 10.1007/s11042-017-5180-1
Proof-of-concept: role of generic content characteristics
in optimizing video encoders
Complexity- and content-aware sequence-level encoder
parameter decision framework
Ahmed Aldahdooh¹ · Marcus Barkowsky¹ · Patrick Le Callet¹
Received: 19 January 2017 / Revised: 3 July 2017 / Accepted: 30 August 2017 /
Published online: 11 September 2017
© Springer Science+Business Media, LLC 2017
Abstract The influence of content characteristics on the efficiency of redundancy and irrel-
evance reduction in video coding is well known. Each new standard in video coding includes
additional coding tools that potentially increase the complexity of the encoding process in
order to gain further rate-distortion efficiency. In order to be versatile, encoder implemen-
tations often neglect the content dependency or they optimize the encoding complexity on
a local scale, i.e. on a single frame or on the coding unit level without being aware of the
global content type. In this contribution, an analysis is presented of which coding tool settings
of the recent High Efficiency Video Coding (HEVC) standard are most efficient for a given
content type when balancing rate-distortion performance against computational complexity, measured as
encoding time. The content type is determined algorithmically, leading to a framework for
rate-distortion-complexity based encoder parameter decisions for any given video sequence.
The implementability is demonstrated using a set of 35 Ultra-HD (UHD) sequences. The
performance results and evaluations show that the encoding parameters may be predicted
to optimize the video coding. For instance, predicting the motion search range achieves a complexity
reduction of 36% on average when the HEVC reference software HM is used, at a cost of 2% in bitrate.
When another implementation of the HEVC standard, x265, is used to predict the coding
unit (CU) size, bitrate is reduced by 20% and distortion by 8%, but execution time is reduced by only 6%.
Keywords Content-aware coding · Video content features · Execution time (complexity) · HEVC · UHD
✉ Ahmed Aldahdooh
Ahmed.Aldahdooh@univ-nantes.fr
Marcus Barkowsky
Marcus.Barkowsky@univ-nantes.fr
Patrick Le Callet
Patrick.LeCallet@univ-nantes.fr
1 Laboratoire des Sciences du Numérique de Nantes (LS2N) – UMR 6004, Université de Nantes, Nantes, France
1 Introduction
Recent developments in multimedia devices and mobile networks have made it easy for end-users
to capture videos at various resolutions and quality levels, and the demand for delivering
high-quality, immersive video is therefore increasing. According to the Cisco report [47],
15% of connected flat-panel TV sets were 4K in 2016, and this share is expected to reach 56% by 2021.
In addition, 30% of global video-on-demand (VoD) streaming content will be in UHD by 2021,
66% in HD (76% in 2016), and 4% in SD (23% in 2016).
Moreover, smartphone applications have become popular and important, yet these devices have
limited computational power and battery capacity. The latest video coding standard,
High Efficiency Video Coding (HEVC) [19], is designed to target different types of applications
and, in particular, high-resolution video applications [46]. Quality, bitrate, and complexity
(encoding time) are the key elements of video coding performance evaluation. The complexity of
HEVC has increased due to its new and improved coding tools. This complexity is a liability for
some targeted users, applications, or devices. Some users, such as content providers, may not
care about complexity since they can build high-performance encoders, e.g. parallel encoders.
Some applications (e.g. security and safety applications) require that captured videos be
encoded and transmitted quickly. For devices with limited computational power and batteries,
complexity is an important issue. Therefore, tools that reduce encoding time without
compromising coding efficiency and perceived quality are important.
In this paper, we measure computational complexity by execution time. Measuring algorithmic
complexity would require a detailed analysis of the video coding algorithm, which is out of the
scope of this paper and, given the complexity of the decision process in the encoder, also
questionable. Our approach may be justified by the fact that the Joint Video Team used the same
measure when evaluating new coding tools [5].
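To make this operational definition concrete, the following sketch (not taken from the paper) measures the wall-clock execution time of a single encoder run; the encoder binary, input file, and settings are illustrative placeholders.

```python
import subprocess
import time

def measure_encoding_time(cmd):
    """Run an encoder command line and return its wall-clock execution time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

# Hypothetical x265 invocation; file names, resolution, frame rate, and QP are placeholders.
cmd = ["x265", "--input", "sequence.yuv", "--input-res", "3840x2160",
       "--fps", "60", "--qp", "32", "--output", "sequence.hevc"]
print(f"Encoding took {measure_encoding_time(cmd):.1f} s")
```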
There are several sources of increased complexity in video coding. First, new or improved
encoding tools are introduced in HEVC, such as new intra and inter modes, the new quad-tree
block structure, improved motion estimation, and a larger number of reference frames [46].
For instance, testing all combinations of block splitting and inter modes for each reference
frame greatly increases the complexity. In [5], the distribution of encoding time per operation
and encoding configuration is analyzed. Second, the choice of encoder parameter values also
trades off quality against complexity. For instance, selecting a smaller motion search range
accelerates the encoding process at the price of quality, while a larger value may slow down
the encoding process.
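As an illustration of this trade-off, the sketch below sweeps the motion search range of an x265 run and records the execution time for each setting (in x265 the option is --merange; in the HM reference software the corresponding configuration key is SearchRange). All file names and values are placeholders, not the settings used in this paper.

```python
import subprocess
import time

# Sweep x265's motion search range (--merange) and record wall-clock encoding time.
# Input file, resolution, frame rate, and QP are illustrative placeholders.
for merange in (16, 32, 57, 92):
    cmd = ["x265", "--input", "sequence.yuv", "--input-res", "3840x2160",
           "--fps", "60", "--qp", "32", "--merange", str(merange),
           "--output", f"sequence_mr{merange}.hevc"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    print(f"merange={merange}: {time.perf_counter() - start:.1f} s")
```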
Finally, many research efforts have pointed to the importance of content types and their
underlying characteristics in video coding. In the real world, different video classes exist,
such as natural scenes, cartoons, sports, news broadcasting, and computer-generated videos, and
each class may be categorized into subclasses. The wide variety of video content poses a
challenge for content-based research, since natural scenes differ from sport scenes, sport
scenes differ from cartoons, and so on. Hence, considering content types and their corresponding
features is important in several respects. First, in setting up subjective experiments: in
[14, 31, 32], the Video Quality Experts Group (VQEG) and Pinson et al. mention some limitations
on video source selection for conducting research. Second, in improving objective measures of
video quality: in [4, 22, 29], video content features are analyzed to improve objective video
quality measures. Third, in improving video coding efficiency: it can be concluded from the
subjective experiment of Pitrey et al. [33] that the video content influences the video encoding.
In more detail, one may encode a given video with several configurations and obtain a similar
Mean Opinion Score (MOS) for the output videos, but in practice one of them requires minimal
computational power; this minimum may change from one content to another depending on the video
content characteristics. Fourth, in designing error resilience tools: in [1, 6, 18, 35],
switching algorithms that utilize content features decide which error concealment technique
should be applied, while in [3] a content-aware adaptive multiple description coding scheme is
proposed.
Even though this dependency is well known, few research efforts analyze content features to
predict the appropriate encoder parameter values that trade off bitrate, distortion, and
complexity. The existing tools, reviewed in Section 2, focus on reducing complexity to a certain
extent, while the levels of quality loss and bitrate change are not assured; their complexity
awareness stems from the fact that some of the modes and tools of the video codec are either
rarely used or unnecessary in some situations. There is room for improvement, not just in
complexity reduction but also in trading off bitrate (R), distortion (D), and complexity (C),
by utilizing the underlying content features to predict the encoder parameter values.
In this paper, a new approach that analyzes content features to predict encoder parameter values
is demonstrated. In the analysis phase, the content features are analyzed to find those that
influence the choice of appropriate encoder parameter values at the sequence level and for a
given QP. In order to find this relationship, R (the total bitrate of a sequence required for
transmission/storage), D (the PSNR values or another video quality measurement), and C (the
encoding time required to encode a sequence using a specific configuration) are considered.
Then, for each sample in the data set, the content features are labeled with the appropriate
encoder parameter value (class). Finally, a prediction model is trained on the data set using
classification tree or support vector machine learning algorithms, and this model is used to
predict the encoding parameters of the video to be encoded. The results show, for instance, that
predicting the motion search range for UHD content achieves a complexity reduction of 36% on
average when the HEVC reference software HM13 is used, at a cost of 2% in bitrate. It is also
shown that the x265 implementation of the HEVC standard gains 20% in bitrate and 8% in
distortion, but only a 6% reduction in execution time, when optimizing the coding unit (CU) size.
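As a rough sketch of this training step, assuming per-sequence feature vectors have already been labeled with the best parameter class, a classification tree can be fitted as below; the feature values and class labels are invented for illustration and are not the content features defined in Section 4.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Rows: per-sequence content feature vectors (placeholder values).
# Labels: the encoder parameter class judged best for each sequence,
# e.g. a hypothetical "best motion search range" of 16 or 64.
X = np.array([[0.42, 0.13, 0.77],
              [0.10, 0.55, 0.31],
              [0.66, 0.21, 0.48],
              [0.05, 0.80, 0.12]])
y = np.array([64, 16, 64, 16])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Predict the parameter class for a new, unseen sequence.
new_sequence = np.array([[0.50, 0.18, 0.60]])
print("Predicted parameter class:", clf.predict(new_sequence)[0])
```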
Although local content characteristics are frequently used to select local coding parameters,
the authors are not aware of publications that first estimate the temporal and spatial sequence
characteristics before choosing the encoding parameters. In video coding research, the test
sequences are well known to the experts and thus adaptations of the encoding algorithm are
known. While sequence characteristics may be used in industrial products, this does not seem to
have been published. To the best of the authors' knowledge, open-source software does not use
content statistics for choosing parameters but rather relies on predefined, profile-based
approaches.
UHD content is particularly interesting for this study since the computational complexity at
such a high resolution is still an issue in many applications. In addition, UHD content requires
attention both to small details and to the general content structure, so the results presented
in this paper should also apply to lower resolutions, which are determined more by the general
content structure. The primary contribution of this paper is the prediction of the encoding
parameter values that lead to minimum complexity, in terms of execution time, using the
underlying content features. For instance, if a video sample is encoded using different
configurations and the output videos lie in the same bitrate and distortion ranges, the
configuration that achieves the minimum encoding time is chosen; if the output videos lie in the
same complexity and distortion ranges, the configuration with the lowest bitrate is chosen
(a sketch of this selection rule is given after the list of properties below). The following
properties of the proposed model can be noted:
– The video coding tools are not changed and the candidate prediction modes are not
reduced.
– The model targets global quality rather than local quality, since block-to-block and
frame-to-frame quality variations yield annoying temporal artifacts.
– The proposed model uses the relational values, not the absolute values, of the bitrate,
the distortion, and the execution time. Therefore, one can continue using the model as is
for low-complexity devices such as portable devices, but retraining the model is
recommended since these relational values will change if the computer architecture changes.
– The proposed model is complementary to other complexity reduction techniques. Suppose there
are N sequences to be encoded under a time limitation (not necessarily a power supply
limitation); one possible solution is to distribute the time budget evenly. This solution
might not be optimal, since some sequences are harder to code than others. Therefore, one
option is to map the predicted parameter values of each sequence onto the available budget
and then use a state-of-the-art complexity reduction algorithm such as [26].
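The sketch below, referred to earlier, illustrates the kind of selection rule described for the primary contribution: among candidate configurations whose rate and distortion lie within a small tolerance of the best observed values, the one with the smallest encoding time is picked. The data structure, tolerance values, and numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    config: str      # encoder configuration identifier (hypothetical)
    bitrate: float   # kbps
    psnr: float      # dB, used as a distortion proxy
    time_s: float    # encoding time in seconds

def pick_config(results, rate_tol=0.02, psnr_tol=0.1):
    """Among runs whose bitrate and PSNR lie within a tolerance of the best
    observed values, return the one with the minimum encoding time."""
    best_rate = min(r.bitrate for r in results)
    best_psnr = max(r.psnr for r in results)
    candidates = [r for r in results
                  if r.bitrate <= best_rate * (1 + rate_tol)
                  and r.psnr >= best_psnr - psnr_tol]
    return min(candidates, key=lambda r: r.time_s)

runs = [RunResult("search_range_16", 10100.0, 38.95, 210.0),
        RunResult("search_range_64", 10000.0, 39.00, 330.0),
        RunResult("search_range_128", 9990.0, 39.02, 480.0)]
print(pick_config(runs).config)   # -> search_range_16
```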
The rest of the paper is organized as follows: related work is summarized in
Section 2. The observations and the problem statement are formalized in Section 3.
Section 4 details the content features. The description of the proposed model is illustrated
in detail in Section 5. Experimental results and performance evaluation are presented in
Sections 6 and 7 respectively. Finally, Section 8 contains some concluding remarks along
with a summary of the paper.
2 Related work
The aim of any encoding system is to provide the best possible video quality for the end users.
Due to limited bandwidth, coding systems since H.261 have employed a rate-distortion
optimization (RDO) model aiming to achieve minimum degradation in video quality for a given
bitrate. It is expressed mathematically as [45]:
min{D}, subject to R ≤ R_c,    (1)

where R_c is the given bandwidth. This is solved using Lagrangian optimization, as expressed in [45]:

min{J}, where J = D + λR,    (2)

where λ is a Lagrange multiplier.
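To make the Lagrangian decision in (2) concrete, a minimal sketch: each candidate mode has a measured distortion D and rate R, the cost J = D + λR is computed, and the candidate with the smallest J is kept. The mode names, D/R values, and λ are illustrative only.

```python
# Minimal Lagrangian mode decision: pick the candidate minimizing J = D + lambda * R.
lam = 20.0  # Lagrange multiplier (illustrative; in a real encoder it is derived from the QP)
candidates = {
    "skip":        {"D": 1200.0, "R": 2.0},
    "inter_2Nx2N": {"D": 450.0,  "R": 14.0},
    "intra_NxN":   {"D": 300.0,  "R": 35.0},
}
costs = {mode: v["D"] + lam * v["R"] for mode, v in candidates.items()}
best_mode = min(costs, key=costs.get)
print(best_mode, costs[best_mode])   # -> inter_2Nx2N 730.0
```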
When a new video coding standard is introduced, new or improved tools are added that increase
the complexity dramatically, since the RDO needs to test all possible combinations of the modes.
Therefore, many research efforts have been conducted to reduce encoder complexity. The
complexity awareness of these algorithms comes from the fact that not every coding tool needs to
be utilized. In [8, 21, 36, 39, 44, 49, 53], complexity control algorithms are implemented for
inter frames based on fast mode decision, changing the motion estimation search method, or
reducing the number of reference pictures. Going into more detail, in [36] three complexity
controls are proposed: the first uses spatial and temporal blocks to determine the search range
using a proper threshold; the second uses the sum of absolute differences (SAD) cost with two
thresholds to determine the prediction mode; finally, SAD, motion vectors, and the optimal
reference frame are used to decide the number of reference frames.
In [53], the authors propose using both fractional and fast integer motion estimation algorithms
to reduce the complexity. Kim et al. [21] use the best mode information of correlated
macroblocks (MBs) in the time-successive frame to determine the search mode and use an adaptive
rate-distortion cost threshold for the early termination process. Su et al. [44] manage the
complexity by changing the motion estimation parameters and by adjusting mode decision processes
using different complexity levels. In [8], the motion estimation tools are categorized into five
states according to their complexity using the SAD cost. Shen et al. [39] utilize the
inter-level correlation of the quadtree structure and the spatiotemporal correlation to
determine the inter mode. In [49], the rate-distortion cost of the reference frame is used to
determine the CU splitting. Other algorithms are implemented for intra frames [9, 21, 23, 27, 54].
In [27], partial computation of the cost function is used to determine the intra mode, while in
[23] the discrete cosine transform (DCT) based dominant edge direction is used. In [21], the
best inter mode is used to determine the proper intra mode. Chen et al. [9] map the edge
direction to a proper prediction mode in HEVC, while Zhao et al. [54] use the SSIM structural
similarity of neighboring coding units to determine the intra mode. Algorithms like [44] analyze
the complexity of each coding tool and rank them to provide coding levels of complexity. Some of
the above-mentioned algorithms are implemented in H.264/AVC and might be adapted to HEVC. The
largest part of the complexity of HEVC is due to the quadtree structure; therefore, many efforts
have been devoted to this domain [7, 9, 11, 17, 24, 28, 38, 40].
In [24], the authors utilize the correlation between consecutive frames to ignore rarely used
depth information at the frame level and utilize neighboring and co-located blocks to determine
the CU splitting, while in [38] the authors extract content-related features at the CU level and
use them to build a prediction model that determines the CU splitting. Shen et al. [40] use the
mean absolute deviation (MAD) to measure the texture homogeneity of the CU and terminate the
splitting early. In [7], the CU depth decision is determined by utilizing the spatial
correlations within the frame, while Nguyen et al. [28] determine the most probable CU depth
ranges by utilizing the temporal correlation of depth levels among CUs and the continuity of the
motion vector field. Chen et al. [9] propose a bottom-up partition process utilizing the
gradient information of pixels. In [17], a back-propagation neural network (BPNN) is used to
build a classifier that decides the splitting of the CU using the sum of absolute transform
differences (SATD) and the coded block flag (CBF) as features, while in [11] the authors use
decision trees to decide the CU splitting by utilizing encoding information such as the
rate-distortion cost, the skip merge flag, and the merge flag. In [10], the authors use the
maximum tree depth of an unconstrained frame to encode a specific number of subsequent frames,
while in [26] the complexity is controlled by weighting the basic operations in the reference
encoder.
The above-mentioned techniques achieve significant reductions in complexity, although their
complexity awareness comes from the fact that some of the introduced modes and tools of the
encoder are either rarely used or unnecessary in some situations. Most of these algorithms
employ content properties, as demonstrated above; properties such as the spatio-temporal
correlation between blocks, the SAD cost, motion vectors, the RD cost, or coding flags are used.
Moreover, these algorithms work at the block or frame level, which may yield block-to-block and
frame-to-frame variations in quality. The aim of these algorithms is to reduce the complexity,
while the bitrate and quality are not balanced against it. A good deal of improvement can be
accomplished by utilizing the underlying content features to predict the encoder's parameters at
the sequence level.