ISO/IEC 14496-2 Committee Draft
INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N2202
Tokyo, March 1998
INFORMATION TECHNOLOGY -
CODING OF AUDIO-VISUAL OBJECTS: VISUAL
ISO/IEC 14496-2
Committee Draft
Draft of 15 May, 1998
Contents
1. Introduction...................................................................................................................................... vii
1.1 Purpose...................................................................................................................................... vii
1.2 Application................................................................................................................................ vii
1.3 Profiles and levels...................................................................................................................... vii
1.4 Object based coding syntax....................................................................................................... viii
1.4.1 Video object.................................................................................................................... viii
1.4.2 Face object........................................................................................................................ ix
1.4.3 Mesh object........................................................................................................................ x
1.4.4 Overview of the object based nonscalable syntax ................................................................ x
1.4.5 Generalized scalability...................................................................................................... xi
1.5 Error Resilience........................................................................................................................ xiii
1. Scope................................................................................................................................................. 14
2. Normative references........................................................................................................................14
3. Definitions .........................................................................................................................................16
4. Abbreviations and symbols ...............................................................................................................24
4.1 Arithmetic operators...................................................................................................................24
4.2 Logical operators........................................................................................................................25
4.3 Relational operators....................................................................................................................25
4.4 Bitwise operators........................................................................................................................25
4.5 Conditional operators.................................................................................................................25
4.6 Assignment................................................................................................................................25
4.7 Mnemonics ................................................................................................................................25
4.8 Constants ...................................................................................................................................26
5. Conventions .......................................................................................................................................27
5.1 Method of describing bitstream syntax........................................................................................27
5.2 Definition of functions................................................................................................................28
5.2.1 Definition of bytealigned() function..................................................................................28
5.2.2 Definition of nextbits_bytealigned() function....................................................................28
5.2.3 Definition of next_start_code() function............................................................................28
5.2.4 Definition of next_resync_marker() function ....................................................................28
5.2.5 Definition of transparent_mb() function............................................................................29
5.2.6 Definition of transparent_block() function ........................................................................29
5.3 Reserved, forbidden and marker_bit ...........................................................................................29
5.4 Arithmetic precision...................................................................................................................29
6. Visual bitstream syntax and semantics.............................................................................................30
6.1 Structure of coded visual data.....................................................................................................30
6.1.1 Visual object sequence......................................................................................................30
6.1.2 Visual object.....................................................................................................................31
6.1.3 Video object......................................................................................................................31
6.1.4 Mesh object ......................................................................................................................37
6.1.5 Face object........................................................................................................................39
6.2 Visual bitstream syntax ..............................................................................................................42
6.2.1 Start codes........................................................................................................................42
6.2.2 Visual Object Sequence and Visual Object........................................................................43
6.2.3 Video Object.....................................................................................................................45
6.2.4 Video Object Layer...........................................................................................................46
6.2.5 Group of Video Object Plane ............................................................................................50
6.2.6 Video Object Plane and Video Plane with Short Header....................................................51
6.2.7 Macroblock.......................................................................................................................64
6.2.8 Block................................................................................................................................69
6.2.9 Still Texture Object ..........................................................................................................71
6.3.8 Block related...................................................................................................................121
6.3.9 Still texture object...........................................................................................................122
6.3.10 Mesh related .................................................................................................................127
6.3.11 Face object....................................................................................................................129
7. The visual decoding process ............................................................................................................135
7.1 Video decoding process ............................................................................................................135
7.2 Higher syntactic structures........................................................................................................136
7.3 Texture decoding......................................................................................................................137
7.3.1 Variable length decoding ................................................................................................137
7.3.2 Inverse scan....................................................................................................................139
7.3.3 Intra dc and ac prediction for intra macroblocks..............................................................140
7.3.4 Inverse quantisation........................................................................................................142
7.3.5 Inverse DCT ...................................................................................................................145
7.4 Shape decoding.........................................................................................................................146
7.4.1 Higher syntactic structures..............................................................................................146
7.4.2 Macroblock decoding......................................................................................................146
7.4.3 Arithmetic decoding .......................................................................................................156
7.4.4 Grayscale Shape Decoding..............................................................................................158
7.5 Motion compensation decoding.................................................................................................160
7.5.1 Padding process ..............................................................................................................161
7.5.2 Half sample interpolation................................................................................................164
7.5.3 General motion vector decoding process..........................................................................165
7.5.4 Unrestricted motion compensation ..................................................................................167
7.5.5 Vector decoding processing and motion-compensation in progressive P-VOP .................167
7.5.6 Overlapped motion compensation ...................................................................................169
7.5.7 Temporal prediction structure .........................................................................................171
7.5.8 Vector decoding process of non-scalable progressive B-VOPs .........................................172
7.5.9 Motion compensation in non-scalable progressive B-VOPs .............................................172
7.6 Interlaced video decoding .........................................................................................................176
7.6.1 Field DCT and DC and AC Prediction............................................................................176
7.6.2 Motion compensation......................................................................................................176
7.7 Sprite decoding.........................................................................................................................185
7.7.1 Higher syntactic structures..............................................................................................186
7.7.2 Sprite Reconstruction......................................................................................................186
7.7.3 Low-latency sprite reconstruction....................................................................................186
7.7.4 Sprite reference point decoding.......................................................................................188
7.7.5 Warping..........................................................................................................................189
7.7.6 Sample reconstruction.....................................................................................................190
7.7.7 Scalable sprite decoding..................................................................................................191
7.8 Generalized scalable decoding ..................................................................................................192
7.8.1 Temporal scalability........................................................................................................194
7.8.2 Spatial scalability............................................................................................................197
7.9 Still texture object decoding......................................................................................................201
7.9.1 Decoding of the DC subband...........................................................................................201
7.9.2 ZeroTree Decoding of the Higher Bands .........................................................................202
7.9.3 Inverse Quantization.......................................................................................................208
7.10 Mesh object decoding..............................................................................................................212
7.10.1 Mesh geometry decoding...............................................................................................213
7.10.2 Decoding of mesh motion vectors..................................................................................216
7.11 Face object decoding...............................................................................................................219
7.11.1 Frame based face object decoding..................................................................................219
7.11.2 DCT based face object decoding....................................................................................220
7.11.3 Decoding of the viseme parameter fap 1........................................................................221
7.11.4 Decoding of the viseme parameter fap 2........................................................................222
7.11.5 Fap masking.................................................................................................................222
7.12 Output of the decoding process...............................................................................................223
7.12.1 Video data ....................................................................................................................223
7.12.2 2D Mesh data ...............................................................................................................223
7.12.3 Face animation parameter data .....................................................................................223
8. Visual-Systems Composition Issues.................................................................................................224
8.1 Temporal Scalability Composition............................................................................................224
8.2 Sprite Composition...................................................................................................................225
9. Profiles and Levels ..........................................................................................................................226
9.1 Visual Object Profiles...............................................................................................................226
9.2 Visual Combination Profiles.....................................................................................................228
9.3 Visual Combination Profiles@Levels .......................................................................................228
9.3.1 Natural Visual................................................................................................................228
9.3.2 Synthetic Visual .............................................................................................................228
9.3.3 Synthetic/Natural Hybrid Visual.....................................................................................229
10. Annex A.........................................................................................................................................230
10.1 Discrete cosine transform for video texture.............................................................................230
10.2 Discrete wavelet transform for still texture .............................................................................231
10.2.1 Adding the mean ..........................................................................................................231
10.2.2 Wavelet filter.................................................................................................................231
10.2.3 Symmetric extension.....................................................................................................232
10.2.4 Decomposition level .....................................................................................................232
10.2.5 Shape adaptive wavelet filtering and symmetric extension ............................................233
11. Annex B .........................................................................................................................................234
11.1 Variable length codes .............................................................................................................234
11.1.1 Macroblock type ...........................................................................................................234
11.1.2 Macroblock pattern.......................................................................................................235
11.1.3 Motion vector ..............................................................................................................238
11.1.4 DCT coefficients...........................................................................................................240
11.1.5 Shape Coding ...............................................................................................................250
11.1.6 Sprite Coding ...............................................................................................................256
11.1.7 DCT based facial object decoding .................................................................................257
11.2 Arithmetic Decoding..............................................................................................................266
11.2.1 Arithmetic decoding for still texture object ....................................................................266
11.2.2 Arithmetic decoding for shape decoding .......................................................................271
11.2.3 Face Object Decoding...................................................................................................273
12. Annex C.........................................................................................................................................275
13. Annex D.........................................................................................................................................286
14. Annex E .........................................................................................................................................287
14.1 Error resilience.......................................................................................................................287
14.1.1 Resynchronization ........................................................................................................287
14.1.2 Data Partitioning ..........................................................................................................288
14.1.3 Reversible VLC ............................................................................................................289
14.1.4 Decoder Operation........................................................................................................290
14.1.5 Adaptive Intra Refresh (AIR) Method ...........................................................................294
14.2 Complexity Estimation ...........................................................................................................296
14.2.1 Video Object Layer Class..............................................................................................297
14.2.2 Video Object Plane Class ..............................................................................................300
14.2.3 Video Object Plane........................................................................................................300
14.2.4 Resynchronization in Case of Unknown Video Header Format......................................302
15. Annex F..........................................................................................................................................304
15.1 Segmentation for VOP Generation..........................................................................................304
15.1.1 Introduction ..................................................................................................................304
15.1.2 Description of a combined temporal and spatial segmentation framework .....................304
15.1.3 References.....................................................................................................................306
15.2 Bounding Rectangle of VOP Formation..................................................................................308
15.3 Postprocessing for Coding Noise Reduction ............................................................................309
15.3.1 Deblocking filter...........................................................................................................309
15.3.2 Deringing filter.............................................................................................................310
15.3.3 Further issues................................................................................................................312
16. Annex G .........................................................................................................................................313
17. Annex H .........................................................................................................................................314
18. Annex I...........................................................................................................................................315
19. Annex J ..........................................................................................................................................316
20. Annex K .........................................................................................................................................317
20.1 Introduction............................................................................................................................317
20.2 Decoding Process of a View-Dependent Object .......................................................................317
20.2.1 General Decoding Scheme ............................................................................................317
20.2.2 Computation of the View-Dependent Scalability parameters .........................................319
20.2.3 VD mask computation...................................................................................................321
20.2.4 Differential mask computation ......................................................................................322
20.2.5 DCT coefficients decoding ............................................................................................322
20.2.6 Texture update..............................................................................................................322
20.2.7 IDCT ............................................................................................................................323
21. Annex L..........................................................................................................................................324
21.1 Introduction............................................................................................................................324
21.2 Description of the set up of a visual decoder (informative) ......................................................324
21.2.1 Processing of decoder configuration information...........................................................325
21.3 Specification of decoder configuration information .................................................................326
21.3.1 VideoObject..................................................................................................................326
21.3.2 StillTextureObject.........................................................................................................326
21.3.3 MeshObject...................................................................................................................327
21.3.4 FaceObject ....................................................................................................................327
22. Annex M.........................................................................................................................................328