1670 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 12, DECEMBER 2012
the video encoders. Additionally, a greater emphasis is placed
on subjective video quality analysis than was applied
in [13], as the most important measure of video quality is
the subjective perception of quality as experienced by human
observers.
The paper is organized as follows. Section II describes the
syntax features of the investigated video coding standards
and highlights the main coding tools that contribute to the
coding efficiency improvement from one standard generation
to the next. The uniform encoding approach that is used for
all standards discussed in this paper is described in Section
III. In Section IV, the current performance of the HEVC
reference implementation is investigated in terms of a tool-by-
tool analysis, and in comparison with previous standards,
as assessed by objective quality measurement, particularly
peak signal-to-noise ratio (PSNR). Section V provides results
of the subjective quality testing of HEVC in comparison
to the previous best-performing standard, H.264/MPEG-4
AVC.
II. Syntax Overview
The basic design of all major video coding standards since
H.261 (in 1990) [14] follows the so-called block-based hybrid
video coding approach. Each block of a picture is either intra-
picture coded (also known as coded in an intra coding mode),
without referring to other pictures of the video sequence, or it
is temporally predicted (i.e., inter-picture coded, also known
as coded in an inter coding mode), where the prediction signal
is formed by a displaced block of an already coded picture.
The latter technique is also referred to as motion-compensated
prediction and represents the key concept for utilizing the
large amount of temporal redundancy in video sequences. The
prediction error signal (or the complete intra-coded block)
is processed using transform coding for exploiting spatial
redundancy. The transform coefficients that are obtained by
applying a decorrelating (linear or approximately linear) trans-
form to the input signal are quantized and then entropy coded
together with side information such as coding modes and
motion parameters. Although all considered standards follow
the same basic design, they differ in various aspects, which
finally results in a significantly improved coding efficiency
from one generation of standard to the next. In the following,
we provide an overview of the main syntax features for the
considered standards. The description is limited to coding tools
for progressive-scan video that are relevant for the comparison
in this paper. For further details, the reader is referred to the
draft HEVC standard [4], the prior standards [5], [8]–[10], and
corresponding books [6], [7], [15] and overview articles [3],
[11], [12].
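The block-based hybrid coding loop described above can be sketched in a few lines of Python. This is a deliberately minimal 1-D illustration, not code from any standard: the function names, block size, and sample values are invented, and the decorrelating transform that a real codec applies to the residual before quantization is omitted for brevity.

```python
# Minimal 1-D sketch of the hybrid coding loop: predict a block,
# quantize the prediction residual, and reconstruct. A real coder
# would apply a decorrelating transform (e.g., a DCT) to the
# residual before quantization; that step is omitted here.

def motion_compensated_prediction(ref_frame, x, mv):
    """Inter prediction: copy a displaced 4-sample block from the
    previously coded reference frame."""
    return ref_frame[x + mv : x + mv + 4]

def quantize(coeffs, step):
    """Scalar quantization of the residual (transform omitted)."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [l * step for l in levels]

# 1-D "frames" of 8 samples; the coded block covers samples 2..5.
ref = [10, 12, 50, 52, 54, 56, 20, 22]
cur = [11, 13, 51, 53, 55, 57, 21, 23]

pred = motion_compensated_prediction(ref, 2, 0)      # zero motion vector
residual = [c - p for c, p in zip(cur[2:6], pred)]   # [1, 1, 1, 1]
levels = quantize(residual, step=2)                  # small residual -> all zero
recon = [p + r for p, r in zip(pred, dequantize(levels, step=2))]
```

At this step size the small residual quantizes to zero, so only the coding mode and motion parameters would need to be entropy coded, which mirrors the skipped-block syntax discussed below. The decoder repeats the prediction, dequantization, and addition steps to obtain the same reconstruction.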
In order to specify conformance points facilitating interop-
erability for different application areas, each standard defines
particular profiles. A profile specifies a set of coding tools
that can be employed in generating conforming bitstreams. We
concentrate on the profiles that provide the best coding effi-
ciency for progressive-scanned 8-bit-per-sample video with the
4:2:0 chroma sampling format, as the encoding of interlaced-
scan video, high bit depths, and non-4:2:0 material has not
been a central focus of the HEVC project in developing
the first version of the standard.
A. ITU-T Rec. H.262 | ISO/IEC 13818-2 (MPEG-2 Video)
H.262/MPEG-2 Video [5] was developed as an official joint
project of ITU-T and ISO/IEC JTC 1. It was finalized in 1994
and is still widely used for digital television and the DVD-
Video optical disc format. As with its predecessors
H.261 [14] and MPEG-1 Video [16], each picture of a video
sequence is partitioned into macroblocks (MBs), which consist
of a 16 × 16 luma block and, in the 4:2:0 chroma sampling
format, two associated 8 × 8 chroma blocks. The standard
defines three picture types: I, P, and B pictures. I and P
pictures are always coded in display/output order. In I pictures,
all MBs are coded in intra coding mode, without referencing
other pictures in the video sequence. An MB in a P picture
can be either transmitted in intra or in inter mode. For the
inter mode, the last previously coded I or P picture is used
as reference picture. The displacement of an inter MB in
a P picture relative to the reference picture is specified by
a half-sample precision motion vector. The prediction signal
at half-sample locations is obtained by bilinear interpolation.
In general, the motion vector is differentially coded using
the motion vector of the MB to the left as a predictor.
The standard includes syntax features that allow a partic-
ularly efficient signaling of zero-valued motion vectors. In
H.262/MPEG-2 Video, B pictures have the property that they
are coded after, but displayed before the previously coded
I or P picture. For a B picture, two reference pictures can
be employed: the I/P picture that precedes the B picture
in display order and the I/P picture that succeeds it. When
only one motion vector is used for motion compensation
of an MB, the chosen reference picture is indicated by the
coding mode. B pictures also provide an additional coding
mode, for which the prediction signal is obtained by averaging
prediction signals from both reference pictures. For this mode,
which is referred to as the biprediction or bidirectional predic-
tion mode, two motion vectors are transmitted. Consecutive
runs of inter MBs in B pictures that use the same motion
parameters as the MB to their left and do not include a
prediction error signal can be indicated by a particularly
efficient syntax.
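The biprediction mode amounts to averaging the two motion-compensated prediction signals. The following one-line sketch is illustrative; the rounding offset is an assumption, not the standard's exact formula.

```python
# Biprediction sketch: average the forward prediction (from the
# preceding I/P picture) and the backward prediction (from the
# succeeding I/P picture), sample by sample, with rounding.

def bipredict(fwd_pred, bwd_pred):
    return [(a + b + 1) // 2 for a, b in zip(fwd_pred, bwd_pred)]

fwd = [100, 104, 108, 112]   # block predicted from the preceding picture
bwd = [102, 106, 110, 118]   # block predicted from the succeeding picture
print(bipredict(fwd, bwd))   # [101, 105, 109, 115]
```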
For transform coding of intra MBs and the prediction errors
of inter MBs, a discrete cosine transform (DCT) is applied to
blocks of 8 × 8 samples. The DCT coefficients are quantized
using a scalar quantizer. For intra MBs, the reconstruction
values are uniformly distributed, while for inter MBs, the
distance between zero and the first nonzero reconstruction
values is increased to three halves of the quantization step size.
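The two reconstruction grids can be illustrated as follows. This sketch ignores the quantization weighting matrices, mismatch control, and clipping, and the function names are invented; it only reproduces the spacing described above, with the first nonzero inter reconstruction value at three halves of the step size.

```python
# Reconstruction (dequantization) grids of a scalar quantizer, in
# the spirit of H.262/MPEG-2 (simplified: no weighting matrices,
# no mismatch control, no clipping).

def dequant_intra(level, step):
    """Intra blocks: uniformly spaced reconstruction values."""
    return level * step

def dequant_inter(level, step):
    """Inter blocks: first nonzero reconstruction value sits at
    3/2 of the step size; spacing beyond that remains `step`."""
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * ((2 * abs(level) + 1) * step // 2)

step = 8
print([dequant_intra(l, step) for l in (-2, -1, 0, 1, 2)])
# intra grid: [-16, -8, 0, 8, 16]
print([dequant_inter(l, step) for l in (-2, -1, 0, 1, 2)])
# inter grid: [-20, -12, 0, 12, 20]
```

The widened dead zone around zero for inter blocks reflects the sharply peaked distribution of prediction residuals, for which mapping small values to zero costs little distortion.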
The intra DC coefficients are differentially coded using the
intra DC coefficient of the block to their left (if available) as
their predicted value. For perceptual optimization, the standard
supports the usage of quantization weighting matrices, by
which effectively different quantization step sizes can be used
for different transform coefficient frequencies. The transform
coefficients of a block are scanned in a zigzag manner
and transmitted using 2-D run-level variable-length coding
(VLC). Two VLC tables are specified for quantized transform