1650 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 12, DECEMBER 2012
is generic and should also be generally suited for other
applications that are not specifically mentioned above.
As has been the case for all past ITU-T and ISO/IEC video
coding standards, in HEVC only the bitstream structure and
syntax are standardized, as well as constraints on the bitstream
and its mapping for the generation of decoded pictures. The
mapping is given by defining the semantic meaning of syntax
elements and a decoding process such that every decoder
conforming to the standard will produce the same output
when given a bitstream that conforms to the constraints of the
standard. This limitation of the scope of the standard permits
maximal freedom to optimize implementations in a manner
appropriate to specific applications (balancing compression
quality, implementation cost, time to market, and other
considerations). However, it provides no guarantees of
end-to-end reproduction quality, as it allows even crude encoding
techniques to be considered conforming.
To assist the industry community in learning how to use the
standard, the standardization effort includes not only the
development of a text specification document, but also reference
software source code as an example of how HEVC video can
be encoded and decoded. The draft reference software has been
used as a research tool for the internal work of the committee
during the design of the standard, and can also be used as a
general research tool and as the basis of products. A standard
test data suite is also being developed for testing conformance
to the standard.
This paper is organized as follows. Section II highlights
some key features of the HEVC coding design. Section III
explains the high-level syntax and the overall structure of
HEVC coded data. The HEVC coding technology is then
described in greater detail in Section IV. Section V explains
the profile, tier, and level design of HEVC. Since writing an
overview of a technology as substantial as HEVC involves a
significant amount of summarization, the reader is referred
to [1] for any omitted details. The history of the HEVC
standardization effort is discussed in Section VI.
II. HEVC Coding Design and Feature Highlights
The HEVC standard is designed to achieve multiple goals,
including coding efficiency, ease of transport system
integration and data loss resilience, as well as implementability using
parallel processing architectures. The following subsections
briefly describe the key elements of the design by which
these goals are achieved, and the typical encoder operation
that would generate a valid bitstream. More details about the
associated syntax and the decoding process of the different
elements are provided in Sections III and IV.
A. Video Coding Layer
The video coding layer of HEVC employs the same hybrid
approach (inter-/intrapicture prediction and 2-D transform
coding) used in all video compression standards since H.261.
Fig. 1 depicts the block diagram of a hybrid video encoder,
which could create a bitstream conforming to the HEVC
standard.
An encoding algorithm producing an HEVC-compliant
bitstream would typically proceed as follows. Each picture
is split into block-shaped regions, with the exact block
partitioning being conveyed to the decoder. The first picture
of a video sequence (and the first picture at each clean
random access point into a video sequence) is coded using
only intrapicture prediction, which predicts data spatially
from region to region within the same picture and has
no dependence on other pictures. For all remaining pictures
of a sequence or between random access points, interpicture
temporally predictive coding modes are typically used for
most blocks. The encoding process for interpicture prediction
consists of choosing motion data comprising the selected
reference picture and motion vector (MV) to be applied for
predicting the samples of each block. The encoder and decoder
generate identical interpicture prediction signals by applying
motion compensation (MC) using the MV and mode decision
data, which are transmitted as side information.
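The interpicture prediction step described above can be illustrated with a minimal sketch. It assumes integer-sample motion vectors, a single reference picture, an exhaustive search, and a sum-of-absolute-differences cost; none of these choices is mandated by the standard, and the names `motion_compensate`, `motion_search`, and `sad` are hypothetical. The point it shows is that the decoder, given only the transmitted MV, forms the same prediction as the encoder.

```python
from typing import List, Tuple

Picture = List[List[int]]  # rows of sample values (illustrative type)


def motion_compensate(ref: Picture, x: int, y: int, w: int, h: int,
                      mv_x: int, mv_y: int) -> Picture:
    """Predict the w x h block at (x, y) from the reference block displaced by the MV.

    Assumption: integer-sample MVs only; coordinates are clamped so that
    vectors pointing outside the picture repeat the border samples.
    """
    height, width = len(ref), len(ref[0])
    pred = []
    for row in range(y, y + h):
        ref_row = ref[min(max(row + mv_y, 0), height - 1)]
        pred.append([ref_row[min(max(col + mv_x, 0), width - 1)]
                     for col in range(x, x + w)])
    return pred


def sad(a: Picture, b: Picture) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_search(cur: Picture, ref: Picture, x: int, y: int, w: int, h: int,
                  search_range: int) -> Tuple[int, int, int]:
    """Encoder-side full search: return (mv_x, mv_y, cost) with the lowest SAD."""
    target = [row[x:x + w] for row in cur[y:y + h]]
    best = None
    for mv_y in range(-search_range, search_range + 1):
        for mv_x in range(-search_range, search_range + 1):
            cost = sad(target, motion_compensate(ref, x, y, w, h, mv_x, mv_y))
            if best is None or cost < best[2]:
                best = (mv_x, mv_y, cost)
    return best


# Toy data: the current picture is the reference displaced by (2, 1).
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]

# Encoder chooses the motion data; the decoder repeats only the compensation.
mv_x, mv_y, cost = motion_search(cur, ref, x=2, y=2, w=4, h=4, search_range=2)
decoder_pred = motion_compensate(ref, 2, 2, 4, 4, mv_x, mv_y)
```

Only `motion_compensate` corresponds to something a decoder must do; how the encoder finds the MV is outside the scope of the standard, so `motion_search` is just one possible strategy.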
The residual signal of the intra- or interpicture prediction,
which is the difference between the original block and its
prediction, is transformed by a linear spatial transform. The
transform coefficients are then scaled, quantized, entropy coded,
and transmitted together with the prediction information.
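As a rough illustration of the residual path, the following sketch transforms a residual block and quantizes the coefficients with a uniform step size. It is written under simplifying assumptions: a floating-point orthonormal DCT-II stands in for the transform, the scalar `step` and the helper names (`dct_matrix`, `forward`, `inverse`) are hypothetical, and entropy coding is omitted. The inverse path is included only to show that the reconstructed residual is an approximation of the original one.

```python
import math
from typing import List

Block = List[List[float]]


def dct_matrix(n: int) -> Block:
    """Orthonormal DCT-II matrix; row k holds the k-th basis function."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a: Block, b: Block) -> Block:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Block) -> Block:
    return [list(col) for col in zip(*a)]


def forward(residual: Block, step: float) -> List[List[int]]:
    """Separable 2-D transform (T * X * T^T) followed by uniform quantization."""
    t = dct_matrix(len(residual))
    coeffs = matmul(matmul(t, residual), transpose(t))
    return [[round(c / step) for c in row] for row in coeffs]


def inverse(levels: List[List[int]], step: float) -> Block:
    """Inverse scaling followed by the inverse transform (T^T * C * T)."""
    t = dct_matrix(len(levels))
    coeffs = [[level * step for level in row] for row in levels]
    return matmul(matmul(transpose(t), coeffs), t)


original = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
prediction = [[60] * 4 for _ in range(4)]
residual = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(original, prediction)]

step = 2.0                          # hypothetical quantization step size
levels = forward(residual, step)    # integers that would be entropy coded
approx = inverse(levels, step)      # decoded approximation of the residual
max_error = max(abs(a - r) for ra, rr in zip(approx, residual)
                for a, r in zip(ra, rr))
```

Because the transform in this sketch is orthonormal, each coefficient is off by at most `step / 2` after quantization, which bounds the sample-domain error in a 4×4 block by `4 * step / 2`.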
The encoder duplicates the decoder processing loop (see
gray-shaded boxes in Fig. 1) such that both will generate
identical predictions for subsequent data. Therefore, the transform
coefficients are reconstructed from their quantized values by inverse scaling
and are then inverse transformed to duplicate the decoded
approximation of the residual signal. The residual is then
added to the prediction, and the result of that addition may
then be fed into one or two loop filters to smooth out artifacts
induced by block-wise processing and quantization. The final
picture representation (that is a duplicate of the output of the
decoder) is stored in a decoded picture buffer to be used for
the prediction of subsequent pictures. The order of
encoding or decoding processing of pictures often differs from
the order in which they arrive from the source, necessitating a
distinction between the decoding order (i.e., bitstream order)
and the output order (i.e., display order) for a decoder.
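The distinction between decoding order and output order can be made concrete with a small sketch. It assumes that every picture carries a consecutive output index starting at 0 (the field `output_index` and the function `output_stream` are hypothetical), and it releases each picture as soon as all earlier pictures in output order have been released. This is a simplification for illustration, not the picture output process defined by the standard.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Picture(NamedTuple):
    name: str
    output_index: int  # position in display order (assumed consecutive from 0)


def output_stream(decoding_order: Iterable[Picture]) -> Iterator[Picture]:
    """Yield pictures in output order while consuming them in decoding order."""
    pending: Dict[int, Picture] = {}   # decoded pictures waiting to be output
    next_out = 0
    for pic in decoding_order:
        pending[pic.output_index] = pic
        # Release every picture whose turn in output order has come.
        while next_out in pending:
            yield pending.pop(next_out)
            next_out += 1


# Bitstream order: the later picture P3 is decoded before B1 and B2,
# which can then be predicted from both I0 and P3.
bitstream = [Picture("I0", 0), Picture("P3", 3), Picture("B1", 1), Picture("B2", 2)]
display = [pic.name for pic in output_stream(bitstream)]
```

In this toy sequence `P3` must be held back until `B1` and `B2` have been decoded, which is why a decoder needs buffer space beyond the picture currently being decoded.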
Video material to be encoded by HEVC is generally
expected to be input as progressive scan imagery (either due to
the source video originating in that format or resulting from
deinterlacing prior to encoding). No explicit coding features
are present in the HEVC design to support the use of interlaced
scanning, as interlaced scanning is no longer used for displays
and is becoming substantially less common for distribution.
However, a metadata syntax has been provided in HEVC to
allow an encoder to indicate that interlace-scanned video has
been sent by coding each field (i.e., the even or odd numbered
lines of each video frame) of interlaced video as a separate
picture or that it has been sent by coding each interlaced frame
as an HEVC coded picture. This provides an efficient method
of coding interlaced video without burdening decoders with a
need to support a special decoding process for it.
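The field-based option can be pictured as follows: an interlaced frame is separated into its even-numbered and odd-numbered lines, each field is handed to the encoder as an ordinary picture, and the receiver interleaves the decoded fields again using the signaled metadata. The sketch below shows only this splitting and re-interleaving; the function names `split_fields` and `weave` are hypothetical, and the metadata syntax itself is not modeled.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of sample values (illustrative type)


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Separate a frame into its even-numbered and odd-numbered lines."""
    return frame[0::2], frame[1::2]


def weave(even_field: Frame, odd_field: Frame) -> Frame:
    """Interleave two fields back into a frame (inverse of split_fields)."""
    frame: Frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    # A frame with an odd number of lines has one extra even-numbered line.
    if len(even_field) > len(odd_field):
        frame.append(even_field[-1])
    return frame


frame = [[line * 10 + col for col in range(4)] for line in range(6)]
even_field, odd_field = split_fields(frame)   # each would be coded as a separate picture
restored = weave(even_field, odd_field)
```

Each field has half the vertical resolution of the frame, so the decoder processes it exactly like any other picture of that size.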
The various features involved in hybrid video coding using
HEVC are highlighted as follows.
1) Coding tree units and coding tree block (CTB) structure:
The core of the coding layer in previous standards was