IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 8, AUGUST 2009 1095
Motion Estimation Optimization for H.264/AVC
Using Source Image Edge Features
Zhenyu Liu, Member, IEEE, Junwei Zhou, Satoshi Goto, Fellow, IEEE, and Takeshi Ikenaga, Member, IEEE
Abstract— The H.264/AVC coding standard processes variable
block size motion-compensated prediction with multiple reference
frames to achieve a pronounced improvement in compression
efficiency. Accordingly, the computation of motion estimation
increases in proportion to the product of the number of reference
frames and the number of intermodes. The mathematical analysis
in this paper illustrates that the motion-compensated prediction
errors are mainly determined by the detailed textures in the
source image. An image block that is rich in textures contains
numerous high-frequency signals, which make variable block size
and multiple reference frame techniques essential. On the basis
of rate-distortion theory, this paper defines the spatial homogeneity of
an image block as a relative concept with respect to the
current quantization step. For a homogeneous block, the futile
reference frames and intermodes can be eliminated efficiently. It
is further revealed that the sum of absolute differences value of an
image block is mainly determined by the sum of its edge gradient
amplitude and the current quantization step. Consequently, the
image content-based early termination algorithm is proposed,
and it outperforms the original method adopted by JVT
reference software. Moreover, the dynamic search range
algorithm based on the edge gradient amplitude of source image
block is analyzed. One notable advantage of the proposed edge-
based algorithms is their suitability for the macroblock-pipelining
architecture, and another desirable feature is their orthogonality
to fast block-matching algorithms. Experimental results
show that when these algorithms are integrated with hybrid
unsymmetrical-cross multi-hexagon-grid search, 31.4–60.0% of
the motion estimation time can be saved on average, whereas the
average BDPSNR loss is 0.0497 dB for all tested sequences.
Index Terms— Edge gradient, fast mode decision, H.264/AVC,
motion estimation (ME), multiple reference frame (MRF), vari-
able block size (VBS).
I. INTRODUCTION
HIGH-PERFORMANCE video coding algorithms strive
to reduce temporal and spatial redundancy. For this pur-
pose, the latest international video coding standard H.264/AVC
adopts the state-of-the-art techniques, which include quarter-
pixel accurate, variable block size (VBS), and multiple ref-
erence frame (MRF) techniques, to improve the coding gain.
Manuscript received January 1, 2008; revised June 15, 2008, August 31,
2008, and December 15, 2008. First version published May 12, 2009; current
version published August 14, 2009. This work was supported by CREST
JST. This paper was recommended by Associate Editor G. Wen.
Z. Liu is with RIIT of Tsinghua University, Beijing, 100084 China (e-mail:
liuzhenyu73@tsinghua.edu.cn).
J. Zhou is with the Sun Microsystems Incorporation, Santa Clara, CA 95054
USA (e-mail: junwei.zhou@sun.com).
S. Goto, and T. Ikenaga are with the Graduate School of IPS,
Waseda University, Tokyo, 808-0135 Japan (e-mail: goto@waseda.jp;
ikenaga@waseda.jp).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2009.2022796
According to the analysis in [1], 89.2% of the computation power is
consumed by the motion estimation (ME) part, and hence reducing
the redundant computation of ME in H.264/AVC has become
a fundamental research topic. In general, the computation
saving of ME rests on the following approaches: 1) reduc-
ing the search positions through the efficient search pattern
(fast block matching), for example, 1-D full search, four-step
search, and diamond search; 2) eliminating the searches of the
redundant search modes and reference frames; 3) early termi-
nation of the block matching by defining some thresholds; and
4) dynamically adjusting the search range. The first category
of algorithms, i.e., fast block matching, has been discussed
thoroughly in many papers [2]–[4] and widely adopted in the
software- or hardware-oriented implementations [5], [6]. In
this paper, we focus on exploring the approaches in the other
three categories. Through analyzing the edge features of the
source image block, we discard the trivial inter-search modes
and reference frames, define the content-based thresholds for
early termination, and dynamically reduce the search range.
The proposed approaches are compatible with the traditional
fast block-matching methods. In addition, they are friendly to
the macroblock (MB)-pipelining architecture, which is widely
adopted in hardwired encoder designs [7], [8].
VBS and MRF algorithms are the major issues leading to
massive computation in H.264/AVC encoding. The required
computation is in direct proportion to the product of the num-
ber of reference frames and the number of intermodes. The
traditional fast block-matching algorithms cannot efficiently
reduce the computational complexity introduced by VBS and
MRF techniques. On the other hand, it has been justified
that the performance of VBS and MRF algorithms depends
mainly on the nature of video sequences [1]. This means that
a great deal of computation is performed without achieving
any coding improvement. The experiments in this paper further
reveal that, at low bit rates, significant superfluous computation
exists in H.264/AVC ME processing.
Many algorithms have been provided to discard the redundant
computation in MRFs [1], [9]–[11]. Huang et al. [1]
developed four criteria for early terminating the motion search
on MRFs. Other works [9]–[11] reduced the search areas
depending on the strong correlations of motion vectors (MVs)
in consecutive pictures. However, the restrictions of the MB-
pipelining architecture either degrade their performance or
introduce considerable hardware overheads, as we shall see
in Section VI. Although some reasons have been adduced
in [1] and [9] for the superior prediction performance of
the MRF technique, the analysis of [12], [13] reveals that
the most critical issue is the aliasing problem [14] coming
from high-frequency signals (or detailed textures in the spatial
domain) of the source image. Consequently, the efficiency of
the MRF technique depends on the homogeneity of the image
block. With rate-distortion theory, we define homogeneity
as a relative concept, which depends on the power spectral
density of the current quantization noise. For a homogeneous
image block, disabling the MRF technique introduces only
negligible coding quality loss, as shown in Section VII-A.
As the quantization parameter (QP) increases, more image
blocks are classified as homogeneous by the proposed criteria,
and consequently the computation saving of our methods
improves with increasing QP. Moreover,
the proposed homogeneity-based reference frame reduction
algorithm is well suited to the MB-pipelining architecture, as
analyzed in Section VI.
The homogeneity-based fast intermode decision algorithm
was first proposed by Wu et al. [15] to discard the redundant
intermodes in H.264/AVC. In detail, the homogeneity of one
N × N block, where N = 16 or N = 8, is determined
by evaluating the sum of magnitude of edges at all pixels
in this block. If the sum is less than the preset constant
threshold, this block is designated as a homogenous one and it
is not further split for other intermode search. The thresholds
for homogeneity decision in [15] are constants. However,
experiments illustrate that with the increase of QP, as the
rate cost for side information, such as MVs and sub-MB
types, becomes expensive, the ratio of 8 × 8 sub-MB modes
always declines. Hence, in this paper the relative homogeneity
concept is also adopted in the intermode reduction algorithm
to improve the computation saving at low bit rates.
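
The homogeneity test described in this paragraph can be sketched as follows. This is only an illustration and not the implementation of [15]: the edge operator (central differences with clamped borders), the use of |g_x| + |g_y| as the per-pixel edge magnitude, the threshold value, and the function names are all assumptions made for the example.

```python
from typing import Sequence

Block = Sequence[Sequence[float]]


def edge_magnitude_sum(block: Block) -> float:
    """Sum of per-pixel edge magnitudes |g_x| + |g_y| over a block.

    Assumption: gradients are central differences with clamped borders;
    the text does not fix the edge operator.
    """
    height, width = len(block), len(block[0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            g_x = (block[i][min(j + 1, width - 1)] - block[i][max(j - 1, 0)]) / 2.0
            g_y = (block[min(i + 1, height - 1)][j] - block[max(i - 1, 0)][j]) / 2.0
            total += abs(g_x) + abs(g_y)
    return total


def is_homogeneous(block: Block, threshold: float) -> bool:
    """Treat a block as homogeneous when its edge sum is below the threshold."""
    return edge_magnitude_sum(block) < threshold


if __name__ == "__main__":
    flat = [[100.0] * 8 for _ in range(8)]              # no texture
    step = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]  # one vertical edge
    # The threshold 500.0 is an arbitrary illustrative value.
    print(is_homogeneous(flat, threshold=500.0))
    print(is_homogeneous(step, threshold=500.0))
```

With a constant `threshold` this mirrors the fixed-threshold scheme; a relative homogeneity criterion would instead pass a threshold that grows with the quantization step.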
Early termination schemes were used in [1], [16], [17],
which depend on either the all_zero block detection or some
constant thresholds derived by experiments. The investigations
in [16] show that at high bit rates, as the quantization interval
has a small value, the all_zero block detection algorithm hardly
provides any computation saving, especially for texture-rich
video sequences. In this paper, the spatial investigation
reveals that the sum of absolute differences (SAD) value of
the image block has strong correlations with the sum of
edge amplitude of all pixels in this block and the current
quantization interval ($Q_{step}$). This conclusion explains
the strong SAD correlation in pictures, which is illustrated
in [5]. Consequently, the content-based early termination
thresholds for integer motion estimation (IME) are developed.
Experiments show that at high and moderate bit rates, the
proposed adaptive thresholds outperform the original method [17]
adopted by JM reference software version 11.0 (JM11.0)
in terms of the coding quality and the computation saving,
especially for sequences that are rich in detailed textures.
The mathematical analysis in [18] suggested that the motion
between the object and the camera sensor works as a low-pass
filter, which can smooth the edges of the sampled image, and
this is designated as motion blur [19]. Therefore, when consid-
ering the typical video recording conditions [20] which refer to
30-frames/s frame rate, 1/60 s exposure time, and no synthetic
videos, we can estimate the motion speed of the image block
according to its edge gradient nature. In detail, for a block
containing a large edge gradient amplitude, it is reasonable
to assume that this block undergoes slow movement and then
its search range can be reduced accordingly. This algorithm
is combined with the original dynamic search range (DSR)
algorithm in JM11.0 [21] to further improve its computation
saving.
The rest of this paper is organized as follows. In Sec-
tion II, the impact of the spatial edge gradient on prediction
error is analyzed. And then, the edge-based fast reference
frame and intermode decision algorithms are proposed in
Section III. The content-based early termination approach
and the edge gradient-based search range decision algorithm
are described in Section IV and Section V, respectively.
The whole process flow of the proposed fast algorithms
is depicted in Section VI. The proposed algorithms are
integrated with unsymmetrical-cross multihexagongrid search
(UMHexagonS) [5] to demonstrate their compatibility with
the traditional fast block-matching algorithms. Section VII
presents the detailed experimental results to verify the per-
formance of the proposed schemes. Finally, conclusions are
drawn in Section VIII.
II. MATHEMATICAL ANALYSIS OF EDGE GRADIENT
IMPACT TO PREDICTION ERROR
With the hybrid coder model [12], Girod deduced that when
$S_{ss} > \Theta$, the power spectral density of prediction errors is

$$S_{ee}(\Lambda) = S_{ss}(\Lambda)\left[1 - \frac{|P(\Lambda)|^{2}\, S_{ss}(\Lambda)}{S_{ss}(\Lambda) + \Theta}\right] \tag{1}$$

where $\Lambda = (\omega_x, \omega_y)$, $S_{ee}$ and $S_{ss}$ denote the power spectral
density of the prediction errors and the source signals, respectively,
$P(\Lambda)$ is the 2-D Fourier transform of the probability
density function (pdf) of the displacement estimation error,
and $\Theta$ can be interpreted as the power spectral density of the
white noise incurred for the quantization processing. From (1),
the prediction error power ($S_{ee}$) has a strong correlation with
the source signals $S_{ss}$. When $S_{ss} \gg \Theta$, (1) can be simplified
as $S_{ee}(\Lambda) = S_{ss}(\Lambda)[1 - |P(\Lambda)|^{2}]$. In this case, the power of
the prediction error hinges entirely on the image content and
pdf of displacement estimation error.
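
To make the behavior of (1) concrete, the sketch below evaluates it at a single frequency and compares it with the simplified form. Here `theta` stands for the quantization-noise power spectral density and `p_abs` for the magnitude of the Fourier transform of the displacement-error pdf; the function names and numeric values are illustrative assumptions, not taken from [12].

```python
def prediction_error_psd(s_ss: float, p_abs: float, theta: float) -> float:
    """Evaluate (1) at one frequency.

    S_ee = S_ss * [1 - |P|^2 * S_ss / (S_ss + theta)]

    s_ss  : source power spectral density at this frequency
    p_abs : |P|, magnitude of the transform of the displacement-error pdf
    theta : power spectral density of the quantization noise
    """
    if s_ss <= theta:
        raise ValueError("(1) is stated only for S_ss > theta")
    return s_ss * (1.0 - (p_abs ** 2) * s_ss / (s_ss + theta))


def prediction_error_psd_limit(s_ss: float, p_abs: float) -> float:
    """Simplified form for S_ss much larger than theta: S_ss * (1 - |P|^2)."""
    return s_ss * (1.0 - p_abs ** 2)


if __name__ == "__main__":
    s_ss, p_abs = 1000.0, 0.9  # illustrative values only
    for theta in (100.0, 10.0, 1.0, 0.01):
        print(theta, prediction_error_psd(s_ss, p_abs, theta))
    print("limit", prediction_error_psd_limit(s_ss, p_abs))
```

As `theta` shrinks relative to `s_ss`, the full expression approaches the simplified one, which is the regime in which the prediction error depends only on the image content and the displacement-error pdf.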
The spatial analysis presents a more straightforward insight
of the prediction error power as compared to the spectral
analysis provided in [12]. In this section, the impact of the
edge intensity of the source image on the prediction errors
is investigated in the spatial domain. In order to simplify the
mathematical description, the analysis is first restricted to 1-D
spatial signals, as shown in Fig. 1, and the quantization noise
is temporarily ignored. $s_t(x)$ and $s_{t-1}(x)$ denote the
spatial-continuous signals at time instance $t$ and $t-1$. $s_t(x)$ is a
displaced version of $s_{t-1}(x)$ and the distance is $d_x$, which
can be expressed as $s_t(x) = s_{t-1}(x - d_x)$. These continuous
image signals are sampled by the sensor array before digital
processing. The spatial sampling interval is denoted as $u_x$. The
displacement estimation error is $\Delta_x = d_x - \mathrm{round}(d_x/u_x) \cdot u_x$.
From Fig. 1, the prediction error $e(i \cdot u_x)$ of pixel $i$ can be
approximated as

$$e(i \cdot u_x) \approx \Delta_x \cdot s'_t(i \cdot u_x) \tag{2}$$
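[Figure: 1-D signals $s_t(x)$ and $s_{t-1}(x)$ sampled at camera sensor positions $i$ to $i+3$ with spacing $u_x$, marking the displacement $d_x$, the displacement estimation error $\Delta_x$, the edge gradient $s'_t(i \cdot u_x)$, and the prediction error $e(i \cdot u_x)$.]
Fig. 1. Analysis of 1-D prediction error caused by edge gradient and
displacement estimation error.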
where $s'_t(i \cdot u_x)$ is the edge gradient of $s_t(x)$ at the $i$th
camera sensor and the displacement estimation error $\Delta_x$ is
a random variable with zero mean and $\Delta_x \in [-u_x/2, u_x/2]$.
When $\Delta_x = \pm u_x/2$, $|e(i \cdot u_x)|$ reaches its maximum value
$(u_x \cdot |s'_t(i \cdot u_x)|)/2$ and when $\Delta_x = 0$, $|e(i \cdot u_x)|$ vanishes.
This conclusion agrees with the aliasing investigation in the
spectral domain provided in the literature [14]. Equation
(2) also interprets the necessity of MRFs during prediction
processing: If the displacement error $\Delta_{x,t-1}$ between the
current image $s_t(x)$ and the first previous one $s_{t-1}(x)$ is larger
than that of the $k$th previous image $s_{t-k}(x)$, i.e., $\Delta_{x,t-k}$,
$s_{t-k}(x)$ is preferred to be chosen as the prediction signal
because its prediction error coming from aliasing problem is
reduced.
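
The first-order approximation in (2) can be checked numerically. The sketch below is an illustration under stated assumptions: the test signal (a slow sinusoid), the sign convention (prediction minus current sample), and all function names are hypothetical choices that do not come from the paper.

```python
import math


def displacement_error(d_x: float, u_x: float = 1.0) -> float:
    """Sub-sample part of the displacement: d_x - round(d_x / u_x) * u_x."""
    return d_x - round(d_x / u_x) * u_x


def s_prev(x: float) -> float:
    """Hypothetical smooth test signal playing the role of s_{t-1}(x)."""
    return math.sin(0.2 * x)


def exact_and_approx_error(i: int, d_x: float, u_x: float = 1.0):
    """Return (exact, approximate) prediction error at sensor position i.

    Assumed sign convention: error = whole-sample prediction minus the
    current sample, so that it matches Delta_x * s'_t to first order.
    """
    x = i * u_x
    current = s_prev(x - d_x)                        # s_t(x) = s_{t-1}(x - d_x)
    prediction = s_prev(x - round(d_x / u_x) * u_x)  # best whole-sample match
    gradient = 0.2 * math.cos(0.2 * (x - d_x))       # analytic s'_t(x)
    return prediction - current, displacement_error(d_x, u_x) * gradient


if __name__ == "__main__":
    for i in range(5):
        exact, approx = exact_and_approx_error(i, d_x=2.3)
        print(i, round(exact, 5), round(approx, 5))
```

For a whole-sample displacement such as `d_x = 3.0` both values are zero, and the gap between the two columns grows with the square of the displacement estimation error, as expected from a first-order expansion.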
In order to simplify the notations in the following discussions,
it is assumed that the spatial sampling intervals in x-
and y-direction are $u_x = u_y = 1$. From (2), it is convenient
to derive the 2-D prediction error in one pixel

$$e(i,j) \approx \Delta_x(i,j) \cdot \frac{\partial s_t(i,j)}{\partial x} + \Delta_y(i,j) \cdot \frac{\partial s_t(i,j)}{\partial y}. \tag{3}$$
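If it is assumed that $\Delta_x(i,j)$ and $\Delta_y(i,j)$ are independent,
$E(\Delta_x) = E(\Delta_y) = 0$, and $E(\Delta_x^2) = E(\Delta_y^2) = \sigma_\Delta^2$, the variance
of $e(i,j)$, i.e., $\sigma^2(i,j)$, is written as

$$\sigma^2(i,j) = \sigma_\Delta^2 \left[ \left( \frac{\partial s_t(i,j)}{\partial x} \right)^{2} + \left( \frac{\partial s_t(i,j)}{\partial y} \right)^{2} \right]. \tag{4}$$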
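
The step from (3) to (4) can be spelled out as follows. This is a short worked derivation under the stated assumptions, writing $\Delta_x$ and $\Delta_y$ for the displacement estimation errors, $\sigma_\Delta^2$ for their common variance, and using the shorthand $g_x = \partial s_t(i,j)/\partial x$ and $g_y = \partial s_t(i,j)/\partial y$, which is introduced here only for brevity.

$$
\begin{aligned}
E[e(i,j)] &\approx E(\Delta_x)\, g_x + E(\Delta_y)\, g_y = 0, \\
\sigma^2(i,j) &= E[e^2(i,j)] \approx E(\Delta_x^2)\, g_x^2 + 2\, E(\Delta_x \Delta_y)\, g_x g_y + E(\Delta_y^2)\, g_y^2, \\
E(\Delta_x \Delta_y) &= E(\Delta_x)\, E(\Delta_y) = 0 \quad \text{(independence and zero mean)}, \\
\sigma^2(i,j) &\approx \sigma_\Delta^2 \left( g_x^2 + g_y^2 \right).
\end{aligned}
$$

The cross term vanishes only because the two error components are assumed independent with zero mean; correlated errors would leave a term proportional to $g_x g_y$.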
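Using the prediction error variance of one pixel (4), the
prediction error power of an image block can be deduced as

$$\sum_{i,j} \sigma^2(i,j) = \sigma_\Delta^2 \sum_{i,j} \left[ \left( \frac{\partial s_t(i,j)}{\partial x} \right)^{2} + \left( \frac{\partial s_t(i,j)}{\partial y} \right)^{2} \right] \tag{5}$$

where $(i,j) \in$ block.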
Like the spectral analysis represented by (1), (5) also
indicates that the prediction error power is determined by
the image features and the displacement estimation error.
Additionally, the spatial analysis illustrates that the power
of the block prediction error is proportional to the sum of
squares of the edge gradient amplitudes. This conclusion plays
an important role in the proposed early termination threshold
definition described in Section IV.
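
The proportionality stated by (5) can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the paper: gradients are central differences with clamped borders, the displacement estimation errors are drawn uniformly from [-1/2, 1/2] (variance 1/12) independently per pixel, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[float]]


def gradients(block: Block) -> List[Tuple[float, float]]:
    """Per-pixel (g_x, g_y) using central differences with clamped borders."""
    height, width = len(block), len(block[0])
    result = []
    for i in range(height):
        for j in range(width):
            g_x = (block[i][min(j + 1, width - 1)] - block[i][max(j - 1, 0)]) / 2.0
            g_y = (block[min(i + 1, height - 1)][j] - block[max(i - 1, 0)][j]) / 2.0
            result.append((g_x, g_y))
    return result


def gradient_energy(block: Block) -> float:
    """Sum over the block of g_x^2 + g_y^2, the bracketed sum in (5)."""
    return sum(g_x * g_x + g_y * g_y for g_x, g_y in gradients(block))


def predicted_error_power(block: Block, sigma2_delta: float) -> float:
    """Right-hand side of (5): displacement-error variance times gradient energy."""
    return sigma2_delta * gradient_energy(block)


def simulated_error_power(block: Block, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the block error power using (3) per pixel."""
    rng = random.Random(seed)
    grads = gradients(block)
    total = 0.0
    for _ in range(trials):
        for g_x, g_y in grads:
            delta_x = rng.uniform(-0.5, 0.5)  # assumed uniform error
            delta_y = rng.uniform(-0.5, 0.5)
            error = delta_x * g_x + delta_y * g_y
            total += error * error
    return total / trials


if __name__ == "__main__":
    ramp = [[2.0 * j for j in range(4)] for _ in range(4)]
    print(predicted_error_power(ramp, 1.0 / 12.0))
    print(simulated_error_power(ramp, trials=5000, seed=1))
```

The two printed values should agree closely, and scaling the block contrast by a factor multiplies both by the square of that factor, which is the dependence on squared gradient amplitude that (5) expresses.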
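[Figure: block diagram of the hybrid coder, with source $S_t(u,v)$, prediction error $E_t(u,v)$, the optimum forward channel formed by the filter $G(u,v)$ and the additive noise $N(u,v)$, the prediction filter $F(u,v)$, and the prediction signal $S_{t-1}(u,v)$.]
Fig. 2. Model of hybrid coder with the optimum forward channel, $G(u,v) = \max[0, 1 - \Theta/S_{ee}(u,v)]$ and the power spectral density of $N(u,v)$ is
$S_{nn}(u,v) = \max[0, \Theta(1 - \Theta/S_{ee}(u,v))]$.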
Equation (3) yields two important conclusions.
1) According to the terms of displacement error $|\Delta_x|$
and $|\Delta_y|$, the impact of aliasing vanishes at full pixel
displacements and is at its maximum at half pixel
displacements.
2) Because of the terms of edge gradient
$(\partial s_t(i,j)/\partial x, \partial s_t(i,j)/\partial y)$, aliasing is caused by
high-frequency signals in the source image.
In practice, a picture that is rich in sharp edges must contain
numerous high-frequency signals. In the literature [22],
for a 2-D spatial signal $s(x,y)$, $(\partial s(x,y)/\partial x, \partial s(x,y)/\partial y)$ is
defined as the local spatial frequency, which is introduced to
describe the local frequency feature in a region. The spatial
edge gradient analysis is superior to the spectral analysis
because it can efficiently reveal the local frequency nature of
the image with trivial computational overhead. Therefore, as
we shall see in Section III, when the image block contains
numerous textures, the power of its prediction errors
increases, which requires advanced coding approaches, such
as VBS and MRF techniques. Otherwise, the redundant
computation can be discarded with negligible coding quality
degradation. This is the essence of our homogeneity-based fast
algorithms.
III. HOMOGENEITY-BASED REFERENCE FRAME AND
INTERMODE REDUCTION
Using rate-distortion theory, the relative homogeneity con-
cept is developed in Section III-A. Based on the relative
homogeneous block detection algorithm, the futile reference
frames and intermodes could be eliminated efficiently, which
is described in Section III-B.
A. Relative Homogeneous Block Detection Algorithm
Based on the hybrid coder model with the optimum forward
channel, as shown in Fig. 2, it is convenient to develop the
relative homogeneity concept. Capital letters, for example
$S_t(u,v)$, represent the discrete 2-D Fourier transforms of
the corresponding spatial signals. Let $S_t(u,v)$ denote the
$N \times N$ small image block to be encoded through the hybrid
coder, and let $S_{t-1}(u,v)$ be the prediction signal generated from
the previously decoded image signals by the low-pass filter
$F(u,v)$. The optimum forward channel consists of a nonideal
band-limiting filter $G(u,v)$ and an additional noise $N(u,v)$.
With rate-distortion theory [23], the distortion D and the