Content overview: This article explores the influence of content characteristics on rate, distortion, and computational complexity in video encoding, and proposes a new method that uses video content features to predict optimal encoding parameters. Specifically, for high-resolution (UHD) video the authors propose a decision framework that operates at the sequence level rather than at the frame or block level. By selecting and predicting encoding parameters (such as the motion search range and block size), the trade-off between compression rate, quality, and complexity is optimized. Experimental results show that the proposed framework effectively reduces complexity while, to some extent, lowering bitrate and improving quality. To evaluate the model and demonstrate its prediction performance, the authors ran comparative tests with the HM and x265 encoders, confirming the effectiveness and practicality of the framework. The paper also discusses directions for future improvement, such as incorporating semantic features and modeling the visual system's response more finely.
Intended audience: Researchers and developers in video coding technology, and university teachers and students in related fields.
Use cases and goals: Providing smarter and more efficient video encoding solutions for different types of devices and services (especially hardware-constrained ones such as mobile devices); improving encoding efficiency without compromising playback smoothness or picture quality.
Additional notes: The paper emphasizes analyzing the best parameter configurations for different application scenarios, not limited to a specific software platform or protocol version. For future work, the authors propose expanding the feature space to explore further possibilities; subjective quality evaluation is also identified as a topic deserving further study.
Multimed Tools Appl (2018) 77:16113–16141
DOI 10.1007/s11042-017-5180-1
Proof-of-concept: role of generic content characteristics
in optimizing video encoders
Complexity- and content-aware sequence-level encoder
parameter decision framework
Ahmed Aldahdooh¹ · Marcus Barkowsky¹ · Patrick Le Callet¹
Received: 19 January 2017 / Revised: 3 July 2017 / Accepted: 30 August 2017 /
Published online: 11 September 2017
© Springer Science+Business Media, LLC 2017
Abstract The influence of content characteristics on the efficiency of redundancy and irrel-
evance reduction in video coding is well known. Each new standard in video coding includes
additional coding tools that potentially increase the complexity of the encoding process in
order to gain further rate-distortion efficiency. In order to be versatile, encoder implemen-
tations often neglect the content dependency or they optimize the encoding complexity on
a local scale, i.e. on a single frame or on the coding unit level without being aware of the
global content type. In this contribution, an analysis is presented of which coding tool settings
of the recent High Efficiency Video Coding (HEVC) standard are most efficient for a given
content type when balancing rate-distortion performance against computational complexity, measured as
encoding time. The content type is determined algorithmically, leading to a framework for
rate-distortion-complexity based encoder parameter decisions for any given video sequence.
The implementability is demonstrated using a set of 35 Ultra-HD (UHD) sequences. The
performance results and evaluations show that the encoding parameters may be predicted
to optimize the video coding. For instance, predicting the motion search range achieves a complexity
reduction of 36% on average when the HEVC reference software HM is used, at a cost of 2% in bitrate.
When another implementation of the HEVC standard, x265, is used to predict the coding
unit (CU) size, bitrate is reduced by 20% and distortion by 8%, but execution time is reduced by only 6%.
Keywords Content-aware coding · Video content features · Execution time (complexity) · HEVC · UHD
✉ Ahmed Aldahdooh
Ahmed.Aldahdooh@univ-nantes.fr
Marcus Barkowsky
Marcus.Barkowsky@univ-nantes.fr
Patrick Le Callet
Patrick.LeCallet@univ-nantes.fr
1 Laboratoire des Sciences du Numérique de Nantes (LS2N) – UMR 6004, Université de Nantes, Nantes, France
1 Introduction
Recent developments in multimedia devices and mobile networks have made it easy for end-users
to capture videos at various resolutions and quality levels, and the demand for delivering
high-quality, immersive video is therefore increasing. According to the Cisco report [47],
15% of connected flat-panel TV sets were 4K in 2016, and this share is expected to reach 56% by 2021.
In addition, 30% of global video-on-demand (VoD) streaming content will be in UHD by 2021,
66% in HD (76% in 2016), and 4% in SD (23% in 2016).
Moreover, smartphone applications have become popular and important, yet these devices have
limited computational power and battery capacity. The latest video coding standard,
High Efficiency Video Coding (HEVC) [19], is designed to target different types of applications
and, in particular, high-resolution video applications [46]. Quality, bitrate, and complexity
(encoding time) are the key elements of video coding performance evaluation. The complexity of
HEVC has increased due to its new and improved coding tools. This complexity is a liability for
some targeted users, applications, or devices. Some users, such as content providers, may not
care about complexity since they can build high-performance encoders, e.g. parallel encoders.
Some applications (e.g. security and safety applications) require that captured videos be
encoded and transmitted quickly. For devices with limited computational power and batteries,
complexity is an important issue. Therefore, tools that reduce encoding time without
compromising coding efficiency and perceived quality are important.
In this paper, we measure computational complexity by execution time. Measuring algorithmic
complexity would require a detailed analysis of the video coding algorithm, which is out of the
scope of this paper and, given the complexity of the decision process in the encoder, also
questionable. Our approach may be justified by the fact that the Joint Video Team used the same
measure when evaluating new coding tools [5].
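To make this operational definition concrete, the following sketch (not taken from the paper) measures the wall-clock execution time of a single encoder run; the encoder binary, input file, and settings are illustrative placeholders.

```python
import subprocess
import time

def measure_encoding_time(cmd):
    """Run an encoder command line and return its wall-clock execution time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

# Hypothetical x265 invocation; file names, resolution, frame rate, and QP are placeholders.
cmd = ["x265", "--input", "sequence.yuv", "--input-res", "3840x2160",
       "--fps", "60", "--qp", "32", "--output", "sequence.hevc"]
print(f"Encoding took {measure_encoding_time(cmd):.1f} s")
```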
There are several sources of increased complexity in video coding. First, new or improved
encoding tools are introduced in HEVC, such as new intra and inter modes, the new quad-tree
block structure, improved motion estimation, and a larger number of reference frames [46].
For instance, testing all combinations of block splitting and inter modes for each reference
frame greatly increases the complexity. In [5], the distribution of encoding time per operation
and encoding configuration is analyzed. Second, the choice of encoder parameter values also
trades off quality against complexity. For instance, selecting a smaller motion search range
accelerates the encoding process at the price of quality, while a larger value may slow down
the encoding process.
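As an illustration of this trade-off, the sketch below sweeps the motion search range of an x265 run and records the execution time for each setting (in x265 the option is --merange; in the HM reference software the corresponding configuration key is SearchRange). All file names and values are placeholders, not the settings used in this paper.

```python
import subprocess
import time

# Sweep x265's motion search range (--merange) and record wall-clock encoding time.
# Input file, resolution, frame rate, and QP are illustrative placeholders.
for merange in (16, 32, 57, 92):
    cmd = ["x265", "--input", "sequence.yuv", "--input-res", "3840x2160",
           "--fps", "60", "--qp", "32", "--merange", str(merange),
           "--output", f"sequence_mr{merange}.hevc"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    print(f"merange={merange}: {time.perf_counter() - start:.1f} s")
```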
Finally, many research efforts have pointed to the importance of content types and their
underlying characteristics in video coding. In the real world, different video classes exist,
such as natural scenes, cartoons, sports, news broadcasting, and computer-generated videos, and
each class may be categorized into subclasses. The wide variety of video content poses a
challenge for content-based research, since natural scenes differ from sport scenes, sport
scenes differ from cartoons, and so on. Hence, considering content types and their corresponding
features is important in several respects. First, in setting up subjective experiments: in
[14, 31, 32], the Video Quality Experts Group (VQEG) and Pinson et al. mention some limitations
on video source selection for conducting research. Second, in improving objective measures of
video quality: in [4, 22, 29], video content features are analyzed to improve objective video
quality measures. Third, in improving video coding efficiency: it can be concluded from the
subjective experiment of Pitrey et al. [33] that the video content influences the video encoding.
In more detail, one may encode a given video with several configurations and obtain a similar
Mean Opinion Score (MOS) for the output videos, but in practice one of them requires minimal
computational power; this minimum may change from one content to another depending on the video
content characteristics. Fourth, in designing error resilience tools: in [1, 6, 18, 35],
switching algorithms that utilize content features decide which error concealment technique
should be applied, while in [3] a content-aware adaptive multiple description coding scheme is
proposed.
Even though this dependency is well known, few research efforts analyze content features to
predict the appropriate encoder parameter values that trade off bitrate, distortion, and
complexity. The existing tools, reviewed in Section 2, focus on reducing complexity to a certain
extent, while the levels of quality loss and bitrate change are not assured; their complexity
awareness stems from the fact that some of the modes and tools of the video codec are either
rarely used or unnecessary in some situations. There is room for improvement, not just in
complexity reduction but also in trading off bitrate (R), distortion (D), and complexity (C),
by utilizing the underlying content features to predict the encoder parameter values.
In this paper, a new approach that analyzes content features to predict encoder parameter values
is demonstrated. In the analysis phase, the content features are analyzed to find those that
influence the choice of appropriate encoder parameter values at the sequence level and for a
given QP. In order to find this relationship, R (the total bitrate of a sequence required for
transmission/storage), D (the PSNR values or another video quality measurement), and C (the
encoding time required to encode a sequence using a specific configuration) are considered.
Then, for each sample in the data set, the content features are labeled with the appropriate
encoder parameter value (class). Finally, a prediction model is trained on the data set using
classification tree or support vector machine learning algorithms, and this model is used to
predict the encoding parameters of the video to be encoded. The results show, for instance, that
predicting the motion search range for UHD content achieves a complexity reduction of 36% on
average when the HEVC reference software HM13 is used, at a cost of 2% in bitrate. It is also
shown that the x265 implementation of the HEVC standard gains 20% in bitrate and 8% in
distortion, but only a 6% reduction in execution time, when optimizing the coding unit (CU) size.
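As a rough sketch of this training step, assuming per-sequence feature vectors have already been labeled with the best parameter class, a classification tree can be fitted as below; the feature values and class labels are invented for illustration and are not the content features defined in Section 4.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Rows: per-sequence content feature vectors (placeholder values).
# Labels: the encoder parameter class judged best for each sequence,
# e.g. a hypothetical "best motion search range" of 16 or 64.
X = np.array([[0.42, 0.13, 0.77],
              [0.10, 0.55, 0.31],
              [0.66, 0.21, 0.48],
              [0.05, 0.80, 0.12]])
y = np.array([64, 16, 64, 16])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Predict the parameter class for a new, unseen sequence.
new_sequence = np.array([[0.50, 0.18, 0.60]])
print("Predicted parameter class:", clf.predict(new_sequence)[0])
```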
Although local content characteristics are frequently used to select local coding parameters,
the authors are not aware of publications that first estimate the temporal and spatial sequence
characteristics before choosing the encoding parameters. In video coding research, the test
sequences are well known to the experts and thus adaptations of the encoding algorithm are
known. While sequence characteristics may be used in industrial products, this does not seem to
have been published. To the best of the authors' knowledge, open-source software does not use
content statistics for choosing parameters but rather relies on predefined, profile-based
approaches.
UHD content is particularly interesting for this study since the computational complexity at
such a high resolution is still an issue in many applications. In addition, UHD content requires
attention both to small details and to the general content structure, so the results presented
in this paper should also apply to lower resolutions, which are determined more by the general
content structure. The primary contribution of this paper is the prediction of the encoding
parameter values that lead to minimum complexity, in terms of execution time, using the
underlying content features. For instance, if a video sample is encoded using different
configurations and the output videos lie in the same bitrate and distortion ranges, the
configuration that achieves the minimum encoding time is chosen; if the output videos lie in the
same complexity and distortion ranges, the configuration with the lowest bitrate is chosen
(a sketch of this selection rule is given after the list of properties below). The following
properties of the proposed model can be noted:
– The video coding tools are not changed and the candidate prediction modes are not
reduced.
– The model targets global quality rather than local quality, since block-to-block and
frame-to-frame quality variations yield annoying temporal artifacts.
– The proposed model uses the relational values, not the absolute values, of the bitrate,
the distortion, and the execution time. Therefore, one can continue using the model as is
for low-complexity devices such as portable devices, but retraining the model is
recommended since these relational values will change if the computer architecture changes.
– The proposed model is complementary to other complexity reduction techniques. Suppose there
are N sequences to be encoded under a time limitation (not necessarily a power supply
limitation); one possible solution is to distribute the time budget evenly. This solution
might not be optimal, since some sequences are harder to code than others. Therefore, one
option is to map the predicted parameter values of each sequence onto the available budget
and then use a state-of-the-art complexity reduction algorithm such as [26].
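The sketch below, referred to earlier, illustrates the kind of selection rule described for the primary contribution: among candidate configurations whose rate and distortion lie within a small tolerance of the best observed values, the one with the smallest encoding time is picked. The data structure, tolerance values, and numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    config: str      # encoder configuration identifier (hypothetical)
    bitrate: float   # kbps
    psnr: float      # dB, used as a distortion proxy
    time_s: float    # encoding time in seconds

def pick_config(results, rate_tol=0.02, psnr_tol=0.1):
    """Among runs whose bitrate and PSNR lie within a tolerance of the best
    observed values, return the one with the minimum encoding time."""
    best_rate = min(r.bitrate for r in results)
    best_psnr = max(r.psnr for r in results)
    candidates = [r for r in results
                  if r.bitrate <= best_rate * (1 + rate_tol)
                  and r.psnr >= best_psnr - psnr_tol]
    return min(candidates, key=lambda r: r.time_s)

runs = [RunResult("search_range_16", 10100.0, 38.95, 210.0),
        RunResult("search_range_64", 10000.0, 39.00, 330.0),
        RunResult("search_range_128", 9990.0, 39.02, 480.0)]
print(pick_config(runs).config)   # -> search_range_16
```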
The rest of the paper is organized as follows: related work is summarized in
Section 2. The observations and the problem statement are formalized in Section 3.
Section 4 details the content features. The description of the proposed model is illustrated
in detail in Section 5. Experimental results and performance evaluation are presented in
Sections 6 and 7 respectively. Finally, Section 8 contains some concluding remarks along
with a summary of the paper.
2 Related work
The aim of any encoding system is to provide the best possible video quality for the end users.
Due to limited bandwidth, coding systems since H.261 have employed a rate-distortion
optimization (RDO) model aiming to achieve minimum degradation in video quality for a given
bitrate. It is expressed mathematically as [45]:
min{D}, subject to R ≤ R_c,    (1)

where R_c is the given bandwidth. This is solved using Lagrangian optimization, as expressed in [45]:

min{J}, where J = D + λR,    (2)

where λ is a Lagrange multiplier.
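To make the Lagrangian decision in (2) concrete, a minimal sketch: each candidate mode has a measured distortion D and rate R, the cost J = D + λR is computed, and the candidate with the smallest J is kept. The mode names, D/R values, and λ are illustrative only.

```python
# Minimal Lagrangian mode decision: pick the candidate minimizing J = D + lambda * R.
lam = 20.0  # Lagrange multiplier (illustrative; in a real encoder it is derived from the QP)
candidates = {
    "skip":        {"D": 1200.0, "R": 2.0},
    "inter_2Nx2N": {"D": 450.0,  "R": 14.0},
    "intra_NxN":   {"D": 300.0,  "R": 35.0},
}
costs = {mode: v["D"] + lam * v["R"] for mode, v in candidates.items()}
best_mode = min(costs, key=costs.get)
print(best_mode, costs[best_mode])   # -> inter_2Nx2N 730.0
```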
When a new video coding standard is introduced, new or improved tools are added that increase
the complexity dramatically, since the RDO needs to test all possible combinations of the modes.
Therefore, many research efforts have been conducted to reduce encoder complexity. The
complexity awareness of these algorithms comes from the fact that not every coding tool needs to
be utilized. In [8, 21, 36, 39, 44, 49, 53], complexity control algorithms are implemented for
inter frames based on fast mode decision, changing the motion estimation search method, or
reducing the number of reference pictures. Going into more detail, in [36] three complexity
controls are proposed: the first uses spatial and temporal blocks to determine the search range
using a proper threshold; the second uses the sum of absolute differences (SAD) cost with two
thresholds to determine the prediction mode; finally, SAD, motion vectors, and the optimal
reference frame are used to decide the number of reference frames.
In [53], the authors propose using both fractional and fast integer motion estimation algorithms
to reduce the complexity. Kim et al. [21] use the best mode information of correlated
macroblocks (MBs) in the time-successive frame to determine the search mode and use an adaptive
rate-distortion cost threshold for the early termination process. Su et al. [44] manage the
complexity by changing the motion estimation parameters and by adjusting mode decision processes
using different complexity levels. In [8], the motion estimation tools are categorized into five
states according to their complexity using the SAD cost. Shen et al. [39] utilize the
inter-level correlation of the quadtree structure and the spatiotemporal correlation to
determine the inter mode. In [49], the rate-distortion cost of the reference frame is used to
determine the CU splitting. Other algorithms are implemented for intra frames [9, 21, 23, 27, 54].
In [27], partial computation of the cost function is used to determine the intra mode, while in
[23] the discrete cosine transform (DCT) based dominant edge direction is used. In [21], the
best inter mode is used to determine the proper intra mode. Chen et al. [9] map the edge
direction to a proper prediction mode in HEVC, while Zhao et al. [54] use the SSIM structural
similarity of neighboring coding units to determine the intra mode. Algorithms like [44] analyze
the complexity of each coding tool and rank them to provide coding levels of complexity. Some of
the above-mentioned algorithms are implemented in H.264/AVC and might be adapted to HEVC. The
largest part of the complexity of HEVC is due to the quadtree structure; therefore, many efforts
have been devoted to this domain [7, 9, 11, 17, 24, 28, 38, 40].
In [24], the authors utilize the correlation between consecutive frames to ignore rarely used
depth information at the frame level and utilize neighboring and co-located blocks to determine
the CU splitting, while in [38] the authors extract content-related features at the CU level and
use them to build a prediction model that determines the CU splitting. Shen et al. [40] use the
mean absolute deviation (MAD) to measure the texture homogeneity of the CU and terminate the
splitting early. In [7], the CU depth decision is determined by utilizing the spatial
correlations within the frame, while Nguyen et al. [28] determine the most probable CU depth
ranges by utilizing the temporal correlation of depth levels among CUs and the continuity of the
motion vector field. Chen et al. [9] propose a bottom-up partition process utilizing the
gradient information of pixels. In [17], a back-propagation neural network (BPNN) is used to
build a classifier that decides the splitting of the CU using the sum of absolute transform
differences (SATD) and the coded block flag (CBF) as features, while in [11] the authors use
decision trees to decide the CU splitting by utilizing encoding information such as the
rate-distortion cost, the skip merge flag, and the merge flag. In [10], the authors use the
maximum tree depth of an unconstrained frame to encode a specific number of subsequent frames,
while in [26] the complexity is controlled by weighting the basic operations in the reference
encoder.
The above-mentioned techniques achieve significant reductions in complexity, although their
complexity awareness comes from the fact that some of the introduced modes and tools of the
encoder are either rarely used or unnecessary in some situations. Most of these algorithms
employ content properties, as demonstrated above; properties such as the spatio-temporal
correlation between blocks, the SAD cost, motion vectors, the RD cost, or coding flags are used.
Moreover, these algorithms work at the block or frame level, which may yield block-to-block and
frame-to-frame variations in quality. The aim of these algorithms is to reduce the complexity,
while the bitrate and quality are not balanced against it. A good deal of improvement can be
accomplished by utilizing the underlying content features to predict the encoder's parameters at
the sequence level.