
An Overview of Core Coding Tools in the AV1 Video Codec

Yue Chen∗, Debargha Mukherjee∗, Jingning Han∗, Adrian Grange∗, Yaowu Xu∗, Zoe Liu∗, Sarah Parker∗, Cheng Chen∗, Hui Su∗, Urvang Joshi∗, Ching-Han Chiang∗, Yunqing Wang∗, Paul Wilkins∗, Jim Bankoski∗, Luc Trudeau†, Nathan Egge†, Jean-Marc Valin†, Thomas Davies‡, Steinar Midtskogen‡, Andrey Norkin§ and Peter de Rivaz¶

∗Google, USA
†Mozilla, USA
‡Cisco, UK and Norway
§Netflix, USA
¶Argon Design, UK

Abstract—AV1 is an emerging open-source and royalty-free video compression format, jointly developed and finalized in early 2018 by the Alliance for Open Media (AOMedia) industry consortium. The main goal of AV1 development is to achieve substantial compression gain over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility. This paper provides a brief technical overview of key coding techniques in AV1, along with a preliminary compression performance comparison against VP9 and HEVC.

Index Terms—Video Compression, AV1, Alliance for Open Media,

Open-source Video Coding

I. INTRODUCTION

Video applications have become ubiquitous on the internet over the last decade, with modern devices driving high growth in the consumption of high-resolution, high-quality content. Services such as video-on-demand and conversational video are predominant bandwidth consumers that impose severe challenges on delivery infrastructures and hence create an even stronger need for high-efficiency video compression technology. On the other hand, a key factor in the success of the web is that the core technologies, for example, HTML, web browsers (Firefox, Chrome, etc.), and operating systems like Android, are open and freely implementable. Therefore, in an effort to create an open video format on par with the leading commercial choices, in mid 2013, Google launched and deployed the VP9 video codec [1]. VP9 is competitive in coding efficiency with the state-of-the-art royalty-bearing HEVC [2] codec, while considerably outperforming the most commonly used format H.264 [3] as well as its own predecessor VP8 [4].

However, as the demand for high-efficiency video applications rose and diversified, it soon became imperative to continue the advances in compression performance. To that end, in late 2015, Google co-founded the Alliance for Open Media (AOMedia) [5], a consortium of more than 30 leading hi-tech companies, to work jointly towards a next-generation open video coding format called AV1.

The focus of AV1 development includes, but is not limited to, achieving: consistent high-quality real-time video delivery, scalability to modern devices at various bandwidths, tractable computational footprint, optimization for hardware, and flexibility for both commercial and non-commercial content. The codec was first initialized with VP9 tools and enhancements, and then new coding tools were proposed, tested, discussed and iterated in AOMedia's codec, hardware, and testing workgroups. As of today, the AV1 codebase has reached the final bug-fix phase, and already incorporates a variety of new compression tools, along with high-level syntax and parallelization features designed for specific use cases. This paper presents the key coding tools in AV1 that provide the majority of the almost 30% reduction in average bitrate compared with the most performant libvpx VP9 encoder at the same quality.

Fig. 1. Partition tree in VP9 (starting from 64×64) and AV1 (starting from 128×128). R: recursive.

II. AV1 CODING TECHNIQUES

A. Coding Block Partition

VP9 uses a 4-way partition tree starting from the 64×64 level down to the 4×4 level, with some additional restrictions for blocks 8×8 and below, as shown in the top half of Fig. 1. Note that partitions designated as R are recursive, in that the same partition tree is repeated at a lower scale until the lowest 4×4 level is reached. AV1 not only expands the partition tree to a 10-way structure as shown in the same figure, but also increases the largest size (referred to as a superblock in VP9/AV1 parlance) to start from 128×128. Note that this includes 4:1/1:4 rectangular partitions that did not exist in VP9. None of the rectangular partitions can be further subdivided. In addition, AV1 adds more flexibility to the use of partitions below the 8×8 level, in the sense that 2×2 chroma inter prediction now becomes possible in certain cases.
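As a rough illustration of the 10-way structure, the sketch below lists the sub-block sizes produced by each partition type for a square block. The partition names follow libaom's naming and the layouts are read off the usual depiction of the tree; both are assumptions here, since the text above only refers to Fig. 1.

```python
def partition_shapes(n):
    """Return {partition name: [(width, height), ...]} for an n x n block.

    Names (NONE, HORZ_A, ...) are assumed from libaom; the paper only shows a figure.
    """
    h, q = n // 2, n // 4
    return {
        "NONE":   [(n, n)],
        "HORZ":   [(n, h)] * 2,
        "VERT":   [(h, n)] * 2,
        "SPLIT":  [(h, h)] * 4,              # the only recursive (R) partition
        "HORZ_A": [(h, h), (h, h), (n, h)],  # two squares on top, one strip below
        "HORZ_B": [(n, h), (h, h), (h, h)],  # one strip on top, two squares below
        "VERT_A": [(h, h), (h, h), (h, n)],  # two squares on the left, one strip right
        "VERT_B": [(h, n), (h, h), (h, h)],  # one strip on the left, two squares right
        "HORZ_4": [(n, q)] * 4,              # 4:1 strips
        "VERT_4": [(q, n)] * 4,              # 1:4 strips
    }

shapes = partition_shapes(64)
```

Every layout tiles the parent block exactly, which is a quick sanity check on the table: the sub-block areas of each entry sum to n × n.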

B. Intra Prediction

VP9 supports 10 intra prediction modes, including 8 directional modes corresponding to angles from 45 to 207 degrees, and 2 non-directional predictors: DC and true motion (TM) mode. In AV1, the potential of an intra coder is further explored in various ways: the granularity of directional extrapolation is refined, non-directional predictors are enriched by taking into account gradients and evolving correlations, coherence of luma and chroma signals is exploited, and tools are developed particularly for artificial content.

1) Enhanced Directional Intra Prediction: To exploit more varieties of spatial redundancy in directional textures, in AV1, directional intra modes are extended to an angle set with finer granularity. The original 8 angles are made nominal angles, based on which fine angle variations in a step size of 3 degrees are introduced, i.e., the prediction angle is represented by a nominal intra angle plus an angle delta, which is an integer multiple of the step size ranging from -3 to 3. To implement directional prediction modes in AV1 in a generic way, the 48 extension modes are realized by a unified directional predictor that links each pixel to a reference sub-pixel location in the edge and interpolates the reference pixel by a 2-tap bilinear filter. In total, there are 56 directional intra modes enabled in AV1 (8 nominal angles × 7 deltas).
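The mode count can be checked with a short enumeration. This is a minimal sketch: the nominal angle values are an assumption taken from libaom's mode table (the text above does not list them), and `bilinear_sample` is a hypothetical helper that only illustrates 2-tap interpolation at a fractional edge position, not the codec's fixed-point arithmetic.

```python
# Assumed nominal angles in degrees (from libaom's mode table, not from this paper).
NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]
STEP_DEGREES = 3

def directional_angles(nominal=NOMINAL_ANGLES, step=STEP_DEGREES, max_delta=3):
    """List (nominal, delta, angle) for every nominal angle and delta in [-3, 3]."""
    return [(base, d, base + d * step)
            for base in nominal
            for d in range(-max_delta, max_delta + 1)]

def bilinear_sample(edge, pos):
    """2-tap interpolation of the reference edge at fractional position pos >= 0."""
    i = int(pos)
    frac = pos - i
    nxt = edge[min(i + 1, len(edge) - 1)]  # clamp at the end of the edge
    return (1 - frac) * edge[i] + frac * nxt

modes = directional_angles()
extension_modes = [m for m in modes if m[1] != 0]  # non-zero delta: new in AV1
```

With these definitions, `len(modes)` is 56 and `len(extension_modes)` is 48, matching the counts quoted in the text.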

2) Non-directional Smooth Intra Predictors: AV1 expands on non-directional intra modes by adding 3 new smooth predictors, SMOOTH_V, SMOOTH_H, and SMOOTH, which predict the block using quadratic interpolation in the vertical or horizontal direction, or the average thereof, after approximating the right and bottom edges as the rightmost pixel in the top edge and the bottom pixel in the left edge. In addition, the TM mode is replaced by the PAETH predictor: for each pixel, we copy the one of the top, left and top-left edge references whose value is closest to (top + left - topleft), which is meant to adopt the reference from the direction with the lower gradient.
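The PAETH rule described above fits in a few lines. The sketch below is illustrative only: the function names are made up, and the tie-breaking order (left, then top, then top-left) is an assumption that the text does not specify.

```python
def paeth_pixel(top, left, top_left):
    """Pick the reference closest to top + left - top_left."""
    base = top + left - top_left
    d_left = abs(base - left)
    d_top = abs(base - top)
    d_top_left = abs(base - top_left)
    # Tie-breaking order is an assumption: left, then top, then top-left.
    if d_left <= d_top and d_left <= d_top_left:
        return left
    if d_top <= d_top_left:
        return top
    return top_left

def paeth_predict(top_row, left_col, top_left):
    """Predict a block: pixel (r, c) uses top_row[c], left_col[r] and the corner."""
    return [[paeth_pixel(t, l, top_left) for t in top_row] for l in left_col]

block = paeth_predict([10, 20], [10, 30], 10)
```

For example, with top = 100, left = 50 and top-left = 50 the base value is 100, so the top reference is copied: the values change horizontally, not vertically, and the predictor follows the flat (vertical) direction.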

3) Recursive-filtering-based Intra Predictor: To capture decaying spatial correlation with references on the edges, FILTER_INTRA modes are designed for luma blocks by viewing them as 2-D non-separable Markov processes. Five filter intra modes are pre-designed for AV1, each represented by a set of eight 7-tap filters reflecting the correlation between pixels in a 4×2 patch and the 7 neighbors adjacent to it. An intra block can pick one filter intra mode, and is then predicted in batches of 4×2 patches. Each patch is predicted via the selected set of 7-tap filters, which weight the neighbors differently at the 8 pixel locations. For those patches not fully attached to references on the block boundary, predicted values of immediate neighbors are used as the reference, meaning prediction is computed recursively among the patches so as to combine more edge pixels at remote locations.
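The recursion over 4×2 patches can be sketched as follows. This is an illustration under stated assumptions, not AV1's filter: the tap values in `PLACEHOLDER_TAPS` are invented (the trained filter sets are not given in this overview), and the neighbor layout (one top-left, four top and two left pixels) and all names are assumed for the example.

```python
# Hypothetical taps: 8 output positions x 7 neighbors, each row summing to 16.
PLACEHOLDER_TAPS = [[2, 2, 2, 2, 2, 3, 3]] * 8

def filter_intra_predict(top, left, top_left, width, height, taps=PLACEHOLDER_TAPS):
    """Predict a width x height block (width % 4 == 0, height % 2 == 0)."""
    # rec[0] holds the corner plus the top edge; rec[r][0] holds the left edge.
    rec = [[top_left] + list(top[:width])]
    rec += [[left[r]] + [0] * width for r in range(height)]
    for r in range(0, height, 2):        # patches visited in raster order
        for c in range(0, width, 4):
            # 7 neighbors: corner, 4 above, 2 to the left. For interior patches
            # these are already-predicted pixels, which makes the process recursive.
            nb = [rec[r][c]] + rec[r][c + 1:c + 5] + [rec[r + 1][c], rec[r + 2][c]]
            for k in range(8):           # 8 pixels of the 4x2 patch
                i, j = divmod(k, 4)
                acc = sum(t * n for t, n in zip(taps[k], nb))
                rec[r + 1 + i][c + 1 + j] = (acc + 8) >> 4  # rounded divide by 16
    return [row[1:] for row in rec[1:]]

flat = filter_intra_predict([100] * 8, [100] * 4, 100, 8, 4)
```

Because every placeholder filter sums to 16, a flat edge of value 100 reproduces a flat block, which is a convenient check that the neighbor bookkeeping is consistent.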

4) Chroma Predicted from Luma: Chroma from Luma (CfL) is a chroma-only intra predictor that models chroma pixels as a linear function of coincident reconstructed luma pixels. Reconstructed luma pixels are subsampled into the chroma resolution, and then the DC component is removed to form the AC contribution. To approximate the chroma AC component from the AC contribution, instead of requiring the decoder to infer scaling parameters as in some prior art, AV1-CfL determines the parameters based on the original chroma pixels and signals them in the bitstream. This reduces decoder complexity and yields more precise predictions. As for the DC prediction, it is computed using intra DC mode, which is sufficient for most chroma content and has mature fast implementations. More details of the AV1-CfL tool can be found in [6].
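The linear model can be written out as a small sketch. The assumptions are: 4:2:0 subsampling done by averaging each 2×2 luma group, floating-point arithmetic, and a least-squares fit as one way an encoder might pick the scaling parameter (the text only says it is derived from the original chroma and signalled). All function names are hypothetical.

```python
def luma_ac(recon_luma):
    """Subsample luma 2x2 (4:2:0 assumed) and remove the mean (DC)."""
    h, w = len(recon_luma) // 2, len(recon_luma[0]) // 2
    sub = [[(recon_luma[2 * y][2 * x] + recon_luma[2 * y][2 * x + 1]
             + recon_luma[2 * y + 1][2 * x] + recon_luma[2 * y + 1][2 * x + 1]) / 4
            for x in range(w)] for y in range(h)]
    mean = sum(sum(row) for row in sub) / (w * h)
    return [[v - mean for v in row] for row in sub]

def cfl_predict(recon_luma, chroma_dc, alpha):
    """Chroma prediction = DC prediction + alpha * luma AC contribution."""
    return [[chroma_dc + alpha * a for a in row] for row in luma_ac(recon_luma)]

def fit_alpha(recon_luma, orig_chroma, chroma_dc):
    """Encoder-side least-squares choice of alpha (an assumed strategy)."""
    ac = luma_ac(recon_luma)
    num = sum(a * (c - chroma_dc)
              for row_a, row_c in zip(ac, orig_chroma)
              for a, c in zip(row_a, row_c))
    den = sum(a * a for row in ac for a in row)
    return num / den if den else 0.0

luma = [[10, 10, 30, 30], [10, 10, 30, 30], [50, 50, 70, 70], [50, 50, 70, 70]]
pred = cfl_predict(luma, chroma_dc=128, alpha=0.5)
```

Only `cfl_predict` would run in a decoder; `fit_alpha` stands in for the encoder-side decision, which is why the decoder stays simple.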

5) Color Palette as a Predictor: Sometimes, especially for artificial videos like screen capture and games, blocks can be approximated by a small number of unique colors. Therefore, AV1 introduces palette modes to the intra coder as a general extra coding tool. The palette predictor for each plane of a block is specified by (i) a color palette, with 2 to 8 colors, and (ii) color indices for all pixels in the block. The number of base colors determines the trade-off between fidelity and compactness. The color indices are entropy coded using the neighborhood-based context.
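The two parts of the palette predictor, a color list and a per-pixel index map, can be sketched for one plane as below. The color selection here (most frequent values, nearest-color mapping) is a toy stand-in chosen for brevity and is an assumption; the entropy coding of the indices is not modelled.

```python
from collections import Counter

def palette_encode(block, max_colors=8):
    """Toy palette selection: keep the most frequent values, map pixels to the nearest."""
    counts = Counter(v for row in block for v in row)
    if len(counts) < 2:
        return None  # a palette needs 2 to 8 colors
    palette = sorted(v for v, _ in counts.most_common(max_colors))
    indices = [[min(range(len(palette)), key=lambda i: abs(palette[i] - v))
                for v in row] for row in block]
    return palette, indices

def palette_reconstruct(palette, indices):
    """Decoder side: look each index up in the palette."""
    return [[palette[i] for i in row] for row in indices]

block = [[0, 0, 255], [0, 255, 255]]
palette, indices = palette_encode(block)
```

Lowering `max_colors` shortens the palette and makes the index map cheaper, at the cost of mapping more pixels to an approximate color, which is the fidelity/compactness trade-off mentioned above.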

6) Intra Block Copy: AV1 allows its intra coder to refer back to previously reconstructed blocks in the same frame, in a manner similar to how the inter coder refers to blocks from previous frames. This can be very beneficial for screen content videos, which typically contain repeated textures, patterns and characters in the same frame. Specifically, a new prediction mode named IntraBC is introduced, which copies a reconstructed block in the current frame as the prediction. The location of the reference block is specified by a displacement vector, coded in a way similar to motion vector compression in motion compensation. Displacement vectors are in whole pixels for the luma plane, and may refer to half-pel positions on the corresponding chrominance planes, where bilinear filtering is applied for sub-pel interpolation.
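A luma-only sketch of the copy operation is given below. The validity check is a deliberately simplified assumption (the reference must lie entirely above the current block, or entirely to its left without extending below its top row); the actual constraints on the reference area are not described in this overview. Function and parameter names are hypothetical.

```python
def intrabc_predict(recon, x, y, w, h, dv):
    """Copy a w x h luma block displaced by dv = (dx, dy) whole pixels."""
    dx, dy = dv
    rx, ry = x + dx, y + dy
    inside = rx >= 0 and ry >= 0 and rx + w <= len(recon[0]) and ry + h <= len(recon)
    # Simplified assumption about what has already been reconstructed.
    already_decoded = (ry + h <= y) or (ry <= y and rx + w <= x)
    if not (inside and already_decoded):
        raise ValueError("displacement vector points outside the usable area")
    return [row[rx:rx + w] for row in recon[ry:ry + h]]

recon = [[1, 2, 3, 4, 0, 0],
         [5, 6, 7, 8, 0, 0]]
pred = intrabc_predict(recon, x=4, y=0, w=2, h=2, dv=(-4, 0))
```

Chroma handling, where the halved vector may land on a half-pel position and needs bilinear filtering, is left out of this sketch.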

Fig. 2. Example multi-layer structure of a golden-frame group. Frames are laid out in display order and numbered in decoding order; labelled elements: KEY/GOLDEN, ALTREF, ALTREF2, BWDREF, overlay frame, golden-frame (GF) group, sub-group.

C. Inter Prediction

Motion compensation is an essential module in video coding. In VP9, up to 2 references, amongst up to 3 candidate reference frames, are allowed; the predictor then either performs block-based translational motion compensation, or averages two such predictions if two references are signalled. AV1 has a more powerful inter coder, which largely extends the pool of reference frames and motion vectors, breaks the limitation of block-based translational prediction, and also enhances compound prediction by using highly adaptable weighting algorithms as well as sources.

1) Extended Reference Frames: AV1 extends the number of references for each frame from 3 to 7. In addition to VP9's LAST (nearest past) frame, GOLDEN (distant past) frame and ALTREF (temporally filtered future) frame, we add two near past frames (LAST2 and LAST3) and two future frames (BWDREF and ALTREF2) [7]. Fig. 2 demonstrates the multi-layer structure of a golden-frame group, in which an adaptive number of frames share the same GOLDEN and ALTREF frames. BWDREF is a look-ahead frame directly coded without applying temporal filtering, and is thus more applicable as a backward reference at a relatively shorter distance. ALTREF2 serves as an intermediate filtered future reference between GOLDEN and ALTREF. All the new references can be picked by a single prediction mode or be combined into a pair to form a compound mode. AV1 provides an abundant set of reference frame pairs, offering both bi-directional compound prediction and uni-directional compound prediction, and can thus encode a variety of videos with dynamic temporal correlation characteristics in a more adaptive and optimal way.
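To make the size of the reference pool concrete, the sketch below enumerates the seven references and the bi-directional pairs they allow. The grouping into past and future lists follows the description above; the same-direction candidates are only a superset, since the format permits just a subset of them and this overview does not enumerate which.

```python
from itertools import combinations, product

PAST_REFS = ["LAST", "LAST2", "LAST3", "GOLDEN"]
FUTURE_REFS = ["BWDREF", "ALTREF2", "ALTREF"]
ALL_REFS = PAST_REFS + FUTURE_REFS

# Bi-directional compound: one past reference paired with one future reference.
bidir_pairs = list(product(PAST_REFS, FUTURE_REFS))

# Uni-directional compound: both references on the same side. This list is an
# illustrative superset, not the set of pairs the format actually allows.
unidir_candidates = (list(combinations(PAST_REFS, 2))
                     + list(combinations(FUTURE_REFS, 2)))
```

Here `len(ALL_REFS)` is 7 and `len(bidir_pairs)` is 12, compared with VP9's three candidate references.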

2) Dynamic Spatial and Temporal Motion Vector Referencing: Efficient motion vector (MV) coding is crucial to a video codec because it takes a large portion of the rate cost for inter frames. To that end, AV1 incorporates a sophisticated MV reference selection scheme to obtain good MV references for a given block by searching both spatial and temporal candidates. AV1 not only searches a deeper spatial neighborhood than VP9 to construct a spatial candidate pool, but also utilizes a temporal motion field estimation mechanism to generate temporal candidates. The motion field estimation process works in three stages: motion vector buffering, motion trajectory creation, and motion vector projection. First, for coded frames, we store the reference frame indices and the associated motion vectors. Before decoding a current frame, we examine motion trajectories, like MV_Ref2 in Fig. 3 pointing from a block in frame Ref2 to somewhere in frame Ref0_Ref2, that possibly pass through each 64×64 processing unit, by checking the collocated 192×128 buffered motion fields in up to 3 references. By doing so, for any 8×8 block, all the trajectories it belongs to are recorded. Next, at the coding block level, once the reference frame(s) have been determined, motion vector candidates are derived by linearly projecting passing motion trajectories onto the desired reference frames, e.g., converting MV_Ref2 in Fig. 3 to MV_0 or MV_1. Once all spatial and temporal candidates have been aggregated in the pool, they are sorted, merged and ranked to obtain up to 4 final candidates [8]. The scoring scheme relies on calculating a
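The projection stage described above amounts to scaling a stored motion vector by a ratio of temporal distances. The sketch below assumes that distances are signed frame offsets in display order and that plain rounding is acceptable; the function name, the (row, col) convention and the rounding are illustrative assumptions, not the codec's fixed-point procedure.

```python
def project_mv(mv, src_dist, dst_dist):
    """Linearly rescale mv = (row, col), which spans src_dist frames, to span dst_dist.

    A negative dst_dist projects in the opposite temporal direction.
    """
    if src_dist == 0:
        raise ValueError("source trajectory must span at least one frame")
    scale = dst_dist / src_dist
    return (round(mv[0] * scale), round(mv[1] * scale))

# A trajectory spanning 4 frames, projected onto references 1 frame away in the
# same direction and 2 frames away in the opposite direction.
same_side = project_mv((8, -4), src_dist=4, dst_dist=1)
other_side = project_mv((8, -4), src_dist=4, dst_dist=-2)
```

Under these assumptions `same_side` is `(2, -1)` and `other_side` is `(-4, 2)`: the motion is assumed to be linear along the trajectory, so only the time ratio matters.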
