Keywords Multi-view video coding · Vector field estimation · Loop constraint · Weighted disparity interpolation
1 Introduction
3DTV and free viewpoint video are new types of natural video sources that expand
the user’s sensation significantly beyond what is offered by traditional 2D media.
The key technologies related to 3D video and free viewpoint video cover the whole
processing chain, from capture and signal processing through data representation,
compression, and transmission to display and interaction. However, since multi-view
video with N views results in N times the raw data rate of a single view, efficient
compression is crucial to make 3D and free viewpoint video practical.
Statistical analysis shows that multi-view video contains a large degree of inter-view
statistical dependency in addition to the temporal statistical dependency that can be
exploited for video compression. To investigate multi-view coding (MVC) technology
in depth, MPEG decided to issue a “Call for Proposals” (CfP) for MVC technology
along with related requirements. The call pointed out that the relations between the
disparity and motion fields should be fully exploited while keeping the computational
complexity of disparity and motion estimation low [7, 9]. Multi-view video also
exhibits a high degree of temporal statistical dependency between temporally
successive images [1–3]. For instance, disparity-compensated view prediction exploits
the correlation among views using concepts known from motion-compensated
prediction. The current Joint Multiview Video Model (JMVM) proposed by the Joint
Video Team (JVT) adopts a prediction structure based on hierarchical B pictures.
This structure uses the block-based coding tools of H.264/AVC to exploit both the
temporal correlation among successive pictures and the inter-view correlation among
neighboring views [5].
However, schemes that use only the “spatial prediction” or “temporal prediction”
mode do not sufficiently exploit the correlation between views, which limits coding
efficiency. Kim et al. therefore proposed a fast disparity and motion estimation
method [4], in which the reliability of each macroblock is calculated from the
difference between the predicted vectors obtained from different methods, including
joint disparity/motion estimation.
It is worth noting that the “loop constraint” derived from multi-view camera geometry
is well suited to exploiting the relation between disparity fields and motion fields
simultaneously [10, 11]. In our scheme we extend it to multi-view images and propose
a novel “vector field estimation” method based on the linear relationship between
adjacent views in a parallel camera model; a sketch of the constraint is given below.
Experimental results on multi-view image sets show that coding efficiency is improved
by about 0.2–0.5 dB compared with previous coding approaches such as H.264/AVC
simulcast and JMVM.
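As an illustrative sketch of the loop constraint (with notation introduced here for exposition rather than taken from [10, 11]), consider two adjacent views l and r of a parallel camera setup at times t and t+1. Let $d_t(\mathbf{p})$ denote the disparity vector mapping pixel $\mathbf{p}$ from view l to view r at time t, and let $m_l(\mathbf{p})$ and $m_r(\mathbf{p})$ denote the motion vectors from time t to t+1 in views l and r, respectively. Tracing the view/time loop in either order should land at the same position in view r at time t+1:
$$ d_t(\mathbf{p}) + m_r\bigl(\mathbf{p} + d_t(\mathbf{p})\bigr) \approx m_l(\mathbf{p}) + d_{t+1}\bigl(\mathbf{p} + m_l(\mathbf{p})\bigr). $$
A candidate vector field that violates this relation by a large margin can be treated as unreliable, which is one way such a constraint can be used when disparity and motion fields are estimated jointly.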
The rest of this paper is organized as follows. Section 2 presents our multi-view
coding scheme based on vector field estimation and weighted disparity interpolation.
Section 3 shows the experimental results and a comparison of coding efficiency with
other schemes. Section 4 contains conclusions and future work.