PRE-PUBLICATION DRAFT, TO APPEAR IN IEEE TRANS. ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, DEC. 2012
cient levels are coded using a three-dimensional run-level-last
VLC, with tables optimized for lower bit rates. The first ver-
sion of H.263 contains four annexes (annexes D through G)
that specify additional coding options, among which annexes
D and F are frequently used for improving coding efficiency.
The usage of annex D allows motion vectors to point outside
the reference picture, a key feature that is not permitted in
H.262/MPEG-2 Video. Annex F introduces a coding mode for
P pictures, the inter 8×8 mode, in which four motion vectors
are transmitted for a MB, each for an 8×8 sub-block. It further
specifies the usage of overlapped block motion compensation.
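The three-dimensional run-level-last coding mentioned above can be illustrated with a small sketch. The toy Python function below only forms the (run, level, last) symbols from a scanned coefficient list; the actual mapping of each symbol to a variable-length code word via the standardized tables is omitted:

```python
def run_level_last_tokens(coeffs):
    """Form (run, level, last) symbols from a zig-zag scanned coefficient
    list, sketching the symbol formation behind the three-dimensional
    run-level-last VLC of H.263 (the VLC table lookup itself is omitted)."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    tokens = []
    prev = -1
    for idx, pos in enumerate(positions):
        run = pos - prev - 1                  # zeros preceding this coefficient
        last = 1 if idx == len(positions) - 1 else 0
        tokens.append((run, coeffs[pos], last))
        prev = pos
    return tokens

# Example: a scanned block with interior and trailing zeros
print(run_level_last_tokens([7, 0, 0, -2, 1, 0, 0, 0]))
# -> [(0, 7, 0), (2, -2, 0), (0, 1, 1)]
```

The `last` flag makes an explicit end-of-block symbol unnecessary, since the final nonzero coefficient marks itself.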
The second and third versions of H.263, which are often
called H.263+ and H.263++, respectively, add several optional
coding features in the form of annexes. Annex I improves the
intra coding by supporting a prediction of intra AC coeffi-
cients, defining alternative scan patterns for horizontally and
vertically predicted blocks, and adding a specialized quantiza-
tion and VLC for intra coefficients. Annex J specifies a
deblocking filter that is applied inside the motion compensa-
tion loop. Annex O adds scalability support, which includes a
specification of B pictures roughly similar to those in
H.262/MPEG-2 Video. Some limitations of version 1 in terms
of quantization are removed by annex T, which also improves
the chroma fidelity by specifying a smaller quantization step
size for chroma coefficients than for luma coefficients. An-
nex U introduces the concept of multiple reference pictures.
With this feature, motion-compensated prediction is not re-
stricted to use just the last decoded I/P picture (or, for coded B
pictures using annex O, the last two I/P pictures) as a refer-
ence picture. Instead, multiple decoded reference pictures are
inserted into a picture buffer and can be used for inter predic-
tion. For each motion vector, a reference picture index is
transmitted, which indicates the employed reference picture
for the corresponding block. The other annexes in H.263+ and
H.263++ mainly provide additional functionalities such as the
specification of features for improved error resilience.
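The per-block reference selection enabled by annex U can be sketched as follows. This toy Python/NumPy function tries each picture in the reference buffer and returns the winning reference index together with the motion vector; the exhaustive candidate list and SAD cost are stand-ins for a real encoder's motion search:

```python
import numpy as np

def best_reference(block, ref_buffer, candidates):
    """For one block, pick the reference picture index and motion vector
    that minimize SAD over a small candidate set -- a toy sketch of the
    multi-reference selection enabled by H.263 annex U. `candidates` is a
    list of (dy, dx) displacements into each reference picture."""
    h, w = block.shape
    best = None
    for ref_idx, ref in enumerate(ref_buffer):
        for dy, dx in candidates:
            patch = ref[dy:dy + h, dx:dx + w]
            if patch.shape != block.shape:    # candidate outside the picture
                continue
            sad = np.abs(block.astype(int) - patch.astype(int)).sum()
            if best is None or sad < best[0]:
                best = (sad, ref_idx, (dy, dx))
    # Transmitted per motion vector: the reference index and the vector
    return best[1], best[2]
```

In the standard, the reference index is coded alongside each motion vector, exactly as the return value here suggests.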
The H.263 profiles that provide the best coding efficiency
are the Conversational High Compression (CHC) profile and
the High Latency Profile (HLP). The CHC profile includes
most of the optional features (annexes D, F, I, J, T, and U) that
provide enhanced coding efficiency for low-delay applica-
tions. The High Latency Profile adds the support of B pictures
(as defined in annex O) to the coding efficiency tools of the
CHC profile and is targeted for applications that allow a high-
er coding delay.
C. ISO/IEC 14496-2 (MPEG-4 Visual)
MPEG-4 Visual [9], a.k.a. Part 2 of the MPEG-4 suite, is
backward compatible with H.263 in the sense that each conform-
ing MPEG-4 decoder must be capable of decoding H.263
Baseline bitstreams (i.e., bitstreams that use no optional H.263
annex features). As with annex F of H.263, the inter
prediction in MPEG-4 can be done with 16×16 or 8×8 blocks.
While the first version of MPEG-4 only supports motion com-
pensation with half-sample precision motion vectors and bi-
linear interpolation (similar to H.262/MPEG-2 Video and
H.263), version 2 added support for quarter-sample precision
motion vectors. The luma prediction signal at half-sample
locations is generated using an 8-tap interpolation filter. For
generating the quarter-sample positions, bi-linear interpolation
of the integer- and half-sample positions is used. The chroma
prediction signal is generated by bi-linear interpolation. Mo-
tion vectors are differentially coded using a component-wise
median prediction and are allowed to point outside the refer-
ence picture. MPEG-4 Visual supports B pictures (in some
profiles), but it does not support the feature of multiple refer-
ence pictures (except on a slice basis for loss resilience pur-
poses) and it does not specify a deblocking filter inside the
motion compensation loop.
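The component-wise median prediction of motion vectors can be sketched in a few lines of Python. This simplified function takes the motion vectors of three neighboring blocks (left, above, above-right) and omits the boundary and availability rules the standards define:

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Component-wise median prediction of a motion vector from three
    neighboring blocks, as used (in simplified form) by MPEG-4 Visual
    and later by H.264/MPEG-4 AVC. Boundary rules are omitted."""
    def median3(a, b, c):
        # Median of three values without sorting
        return a + b + c - min(a, b, c) - max(a, b, c)
    return tuple(median3(l, u, ur)
                 for l, u, ur in zip(mv_left, mv_above, mv_above_right))

# The encoder codes only the difference between the actual motion
# vector and this predictor.
pred = median_mv_predictor((4, -2), (6, 0), (2, 2))
# pred == (4, 0): medians of (4, 6, 2) and (-2, 0, 2)
```

Taking the median per component makes the predictor robust to a single outlier among the three neighbors.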
The transform coding in MPEG-4 Visual is conceptually similar
to that of H.262/MPEG-2 Video and H.263. However, two different
quantization methods are supported. The first quantization
method, which is sometimes referred to as MPEG-style quan-
tization, supports quantization weighting matrices similarly to
H.262/MPEG-2 Video. With the second quantization method,
which is called H.263-style quantization, the same quantiza-
tion step size is used for all transform coefficients with the
exception of the DC coefficient in intra blocks. The transform
coefficient levels are coded using a three-dimensional run-
level-last code as in H.263. As in annex I of H.263,
MPEG-4 Visual also supports the prediction of AC coeffi-
cients in intra blocks as well as alternative scan patterns for
horizontally and vertically predicted intra blocks and the usage
of a separate VLC table for intra coefficients.
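The difference between the two quantization methods can be sketched as follows. Both Python functions are simplifications: the uniform AC step size of 2·QP is approximate, the normalization of the weighting matrix by 16 is illustrative rather than the normative formula, and intra-DC handling and dead-zone behavior are omitted:

```python
import numpy as np

def quantize_h263_style(coeffs, qp):
    """'H.263-style' quantization: the same step size (roughly 2*QP)
    for all coefficients. Intra-DC and dead-zone details are omitted."""
    return np.round(coeffs / (2 * qp)).astype(int)

def quantize_mpeg_style(coeffs, qp, weight_matrix):
    """'MPEG-style' quantization: the step size varies per coefficient
    via a weighting matrix, as in H.262/MPEG-2 Video. The division of
    the matrix by 16 is an illustrative normalization, not the
    normative formula."""
    step = 2 * qp * weight_matrix / 16.0
    return np.round(coeffs / step).astype(int)
```

A weighting matrix with larger entries for high-frequency positions quantizes those coefficients more coarsely, matching the lower perceptual sensitivity to high-frequency distortion.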
For the comparisons in this paper, we used the Advanced
Simple Profile (ASP) of MPEG-4 Visual, which includes all
relevant coding tools. We generally enabled quarter-sample
precision motion vectors. MPEG-4 ASP additionally includes
global motion compensation. Due to its limited benefit in
practice and the difficulty of estimating global motion fields
that actually improve coding efficiency, this feature is rarely
supported in encoder implementations and is also not used in
our comparison.
D. ITU-T Rec. H.264 | ISO/IEC 14496-10 (MPEG-4 AVC)
H.264/MPEG-4 AVC [10][12] is the second video coding
standard that was jointly developed by ITU-T VCEG and
ISO/IEC MPEG. It still uses the concept of 16×16 macro-
blocks, but contains many additional features. One of the most
obvious differences from older standards is its increased flexi-
bility for inter coding. For the purpose of motion-compensated
prediction, a macroblock can be partitioned into square and
rectangular block shapes with sizes ranging from 4×4 to
16×16 luma samples. H.264/MPEG-4 AVC also supports
multiple reference pictures. Similarly to annex U of H.263,
motion vectors are associated with a reference picture index
for specifying the employed reference picture. The motion
vectors are transmitted using quarter-sample precision relative
to the luma sampling grid. Luma prediction values at half-
sample locations are generated using a 6-tap interpolation
filter and prediction values at quarter-sample locations are
obtained by averaging two values at integer- and half-sample
positions. Weighted prediction can be applied using a scaling
and offset of the prediction signal. For the chroma compo-
nents, a bi-linear interpolation is applied. In general, motion
vectors are predicted by the component-wise median of the