Introduction to H.264 Advanced Video Coding
Abstract - We give a tutorial on video coding principles and
standards, with emphasis on the latest technology, called H.264 or
MPEG-4 Part 10. We describe block-based hybrid coding, the basic
method employed by most video coding standards, and use graphical
illustrations to show how it works. This paper is suitable for
those interested in implementing a video codec in embedded
software, in hardwired logic, or in a combination of both.
I. Introduction
Digitized video plays an important role in many
consumer electronics applications, including VCD, DVD,
video phone, portable media player, video conferencing,
video recording, and e-learning. To provide
solutions of high quality (high frame resolution, high frame
rate, and low distortion), low cost (low bit rate for storage
or transmission), or both, video compression is indispensable.
Advances in semiconductor technology make possible the
efficient implementation of effective but computationally
complex compression methods.
Because there is a wide range of target applications,
from low-end to high-end, under various constraints such as
power consumption and area cost, an application-specific
implementation may be pure software, pure hardwired, or
something in between. To arrive at an optimal
implementation, it is essential to fully understand the
principles behind video coding and the algorithms it employs.
Starting with MPEG-1 [1] and H.261 [2], video coding
techniques and standards have gone through several generations.
The latest standard is H.264 (also called
MPEG-4 AVC, Advanced Video Coding, defined in MPEG-4
Part 10) [3][4][5]. Compared with previous standards, H.264
achieves up to 50% improvement in bit-rate efficiency. It has
been adopted by many application standards such as HD
DVD [6], DVB-H [7], and HD-DTV [8]. Its
implementation is therefore a very active research topic. In
this tutorial paper, we introduce the essential features of
H.264.
The rest of this paper is organized as follows. In
Section II, we give an outline of the block-based hybrid
video coding method. In Section III, we describe each basic
coding function in more detail. Finally, in Section IV, we
draw some conclusions.
II. Block-Based Hybrid Video Coding
A digitized video signal consists of a periodic
sequence of images called frames. Each frame consists of a
two-dimensional array of pixels. Each pixel consists of three
color components, R, G, and B. Usually, pixel data is
converted from RGB to another color space called YUV, in
which the U and V components can be sub-sampled. A
block-based coding approach divides a frame into
macroblocks (MBs), each consisting of, say, 16x16 pixels. In a
4:2:0 format, each MB consists of 16x16 = 256 Y components,
8x8 = 64 U components, and 8x8 = 64 V components. Each of the
three components of an MB is processed separately.
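As a rough illustration of the sample counts above, the following Python sketch converts one 16x16 RGB macroblock to YUV and sub-samples U and V by averaging each 2x2 neighborhood. The BT.601 conversion weights and the averaging filter are assumptions chosen for illustration; the text does not specify either, and all function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (BT.601 weights, assumed for illustration)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def subsample_2x2(plane):
    """Halve a plane in both dimensions by averaging each 2x2 neighborhood."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


def macroblock_to_420(rgb_mb):
    """Split a 16x16 block of (R, G, B) tuples into Y, U, V planes (4:2:0)."""
    yuv = [[rgb_to_yuv(*px) for px in row] for row in rgb_mb]
    y_plane = [[p[0] for p in row] for row in yuv]
    u_plane = subsample_2x2([[p[1] for p in row] for row in yuv])
    v_plane = subsample_2x2([[p[2] for p in row] for row in yuv])
    return y_plane, u_plane, v_plane


# A synthetic 16x16 macroblock with a simple color gradient.
mb = [[(16 * i, 16 * j, 128) for j in range(16)] for i in range(16)]
y_plane, u_plane, v_plane = macroblock_to_420(mb)
counts = [sum(len(row) for row in p) for p in (y_plane, u_plane, v_plane)]
print(counts)  # number of Y, U, and V samples in the macroblock
```

Under these assumptions the macroblock carries 256 + 64 + 64 = 384 samples, half of the 768 samples needed for the same block in RGB.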
Fig. 1 shows a pseudo-code description of how to
compress a frame MB by MB. To compress an MB, we use a
hybrid of three techniques: prediction, transformation &
quantization, and entropy coding. The procedure works on a
single frame of video. At the video sequence level, a
top-level handler is needed, which is not covered in this
paper. In the pseudo code, f_t denotes the current frame to be
compressed, and mode can be I, P, or B.
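The loop in Fig. 1 can be sketched in runnable form as follows. This is a toy sketch under stated assumptions, not the H.264 procedure: it handles only a P-like mode, uses the co-located block of the previous reconstructed frame as the prediction (zero motion, in place of ME), omits the transform (identity), uses a uniform quantizer with an assumed step size, and collects quantized residuals in a list in place of entropy coding. The 4x4 block size and all names are hypothetical.

```python
MB = 4      # toy macroblock size (assumption; the text uses 16x16)
QSTEP = 8   # assumed uniform quantization step


def get_block(frame, bi, bj):
    """Return the MB x MB block at block-row bi, block-column bj."""
    return [row[bj * MB:(bj + 1) * MB] for row in frame[bi * MB:(bi + 1) * MB]]


def quant(block):
    """Uniform scalar quantization of a residual block."""
    return [[int(round(v / QSTEP)) for v in row] for row in block]


def iquant(block):
    """Inverse quantization: scale the levels back by the step size."""
    return [[v * QSTEP for v in row] for row in block]


def encode_a_frame(cur, ref_recon):
    """Encode `cur` against the previous reconstructed frame `ref_recon`."""
    rows, cols = len(cur) // MB, len(cur[0]) // MB
    recon = [[0] * len(cur[0]) for _ in cur]
    symbols = []  # stands in for the entropy-coded output
    for bi in range(rows):
        for bj in range(cols):
            curr_mb = get_block(cur, bi, bj)
            pred_mb = get_block(ref_recon, bi, bj)  # zero-motion prediction
            res_mb = [[c - p for c, p in zip(cr, pr)]
                      for cr, pr in zip(curr_mb, pred_mb)]
            res_coef = quant(res_mb)  # transform omitted (identity)
            symbols.append(res_coef)
            # Rebuild the block exactly as a decoder would, from the
            # quantized residual, and store it in the reconstructed frame.
            reconst_res = iquant(res_coef)
            for y in range(MB):
                for x in range(MB):
                    recon[bi * MB + y][bj * MB + x] = (
                        pred_mb[y][x] + reconst_res[y][x])
    return symbols, recon


# Example: a flat previous frame and a current frame with a mild gradient.
prev_recon = [[100] * 8 for _ in range(8)]
cur = [[100 + x + y for x in range(8)] for y in range(8)]
symbols, recon = encode_a_frame(cur, prev_recon)
max_err = max(abs(c - r) for cr, rr in zip(cur, recon) for c, r in zip(cr, rr))
print(len(symbols), max_err)
```

The reconstructed frame, rather than the original, is kept as the reference for later frames, mirroring the Insert step of Fig. 1. With this quantizer the reconstruction error per pixel is bounded by half the step size.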
Prediction tries to find a reference MB that is similar to
the current MB being processed so that, instead of the
whole current MB, only their (hopefully small) difference
needs to be coded. Depending on where the reference MB
comes from, prediction is classified into inter-frame
prediction and intra-frame prediction. In an inter-prediction
(P or B) mode, the reference MB is somewhere in a frame
before or after the current frame, in which the current MB
resides. It could also be some weighted function of MBs
procedure encode_a_frame (f_t, mode)
  for I = 1, N        //** N: #rows of MBs per frame
    for J = 1, M      //** M: #columns of MBs per frame
      Curr_MB = MB(f_t, I, J);
      case (mode)
        I: Pred_MB = Intra_Pred(f'_t, I, J);
        P: Pred_MB = ME(f'_{t-1}, I, J);
        B: Pred_MB = ME(f'_{t-1}, f'_{t+1}, I, J);
      Res_MB = Curr_MB - Pred_MB;
      Res_Coef = Quant(Transform(Res_MB));
      Output(Entropy_code(Res_Coef));
      Reconst_res = ITransform(IQuant(Res_Coef));
      Reconst_MB = Reconst_res + Pred_MB;
      Insert(Reconst_MB, f'_t);
end encode_a_frame;
Fig. 1. Pseudo Code for Block-Based Hybrid Coding of a Video Frame
Jian-Wen Chen Chao-Yang Kao Youn-Long Lin
Department of Computer Science
National Tsing Hua University
Hsin-Chu, TAIWAN 300
Tel : +886-3-573-1072
Fax : +886-3-572-3694
e-mail : ylin@cs.nthu.edu.tw.