An Overview of Core Coding Tools in the AV1 Video Codec
Yue Chen∗, Debargha Mukherjee∗, Jingning Han∗, Adrian Grange∗, Yaowu Xu∗, Zoe Liu∗, Sarah Parker∗, Cheng Chen∗, Hui Su∗, Urvang Joshi∗, Ching-Han Chiang∗, Yunqing Wang∗, Paul Wilkins∗, Jim Bankoski∗, Luc Trudeau†, Nathan Egge†, Jean-Marc Valin†, Thomas Davies‡, Steinar Midtskogen‡, Andrey Norkin§ and Peter de Rivaz¶
∗Google, USA; †Mozilla, USA; ‡Cisco, UK and Norway; §Netflix, USA; ¶Argon Design, UK
Abstract—AV1 is an emerging open-source and royalty-free video compression format, which was jointly developed and finalized in early 2018 by the Alliance for Open Media (AOMedia) industry consortium. The main goal of AV1 development is to achieve substantial compression gain over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility. This paper provides a brief technical overview of key coding techniques in AV1, along with a preliminary compression performance comparison against VP9 and HEVC.
Index Terms—Video Compression, AV1, Alliance for Open Media,
Open-source Video Coding
I. INTRODUCTION
Video applications have become ubiquitous on the internet over the last decade, with modern devices driving rapid growth in the consumption of high-resolution, high-quality content. Services such as video-on-demand and conversational video are the predominant bandwidth consumers; they impose severe challenges on delivery infrastructures and hence create an even stronger need for high-efficiency video compression technology. On the other hand, a key factor in the success of the web is that its core technologies, for example HTML, web browsers (Firefox, Chrome, etc.), and operating systems like Android, are open and freely implementable. Therefore, in an effort to create an open video format on par with the leading commercial choices, Google launched and deployed the VP9 video codec [1] in mid-2013. VP9 achieves coding efficiency competitive with the state-of-the-art royalty-bearing HEVC [2] codec, while considerably outperforming the most commonly used format, H.264 [3], as well as its own predecessor, VP8 [4].
However, as the demand for high-efficiency video applications rose and diversified, it soon became imperative to continue advancing compression performance. To that end, in late 2015, Google co-founded the Alliance for Open Media (AOMedia) [5], a consortium of more than 30 leading high-tech companies, to work jointly towards a next-generation open video coding format called AV1.
The focus of AV1 development includes, but is not limited to, achieving consistent high-quality real-time video delivery, scalability to modern devices at various bandwidths, a tractable computational footprint, optimization for hardware, and flexibility for both commercial and non-commercial content. The codec was first initialized with VP9 tools and enhancements, and then new coding tools were proposed, tested, discussed and iterated in AOMedia’s codec, hardware, and testing workgroups. As of today, the AV1 codebase has reached the final bug-fix phase and already incorporates a variety of new compression tools, along with high-level syntax and parallelization features designed for specific use cases. This paper presents the key coding tools in AV1, which together account for most of the almost 30% reduction in average bitrate compared with the most performant libvpx VP9 encoder at the same quality.
[Figure: partition trees of VP9 (starting at 64×64) and AV1 (starting at 128×128); R: recursive]
Fig. 1. Partition tree in VP9 and AV1
II. AV1 CODING TECHNIQUES
A. Coding Block Partition
VP9 uses a 4-way partition tree starting from the 64×64 level down to the 4×4 level, with some additional restrictions for blocks of 8×8 and below, as shown in the top half of Fig. 1. Note that partitions designated as R are referred to as recursive, in that the same partition tree is repeated at a lower scale until the lowest 4×4 level is reached. AV1 not only expands the partition tree to a 10-way structure, as shown in the same figure, but also increases the largest block size (referred to as a superblock in VP9/AV1 parlance) so that partitioning starts from 128×128. Note that this includes 4:1/1:4 rectangular partitions that did not exist in VP9. None of the rectangular partitions can be further subdivided. In addition, AV1 adds more flexibility to the use of partitions below the 8×8 level, in the sense that 2×2 chroma inter prediction now becomes possible in certain cases.
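The 10-way tree can be illustrated with a small sketch. It is not code from libaom or the AV1 specification: the partition names mirror the PARTITION_* types used in libaom, the geometry of the A/B variants (one half split into two squares, the other half left whole) follows our reading of the format, and `choose` is a hypothetical stand-in for the encoder's rate-distortion decision. Size-dependent restrictions (for example, which types are permitted at 8×8 or at 128×128) and the sub-8×8 chroma handling are deliberately ignored.

```python
from enum import Enum


class Partition(Enum):
    # The ten AV1 partition types, named after libaom's PARTITION_* values.
    NONE = "none"
    HORZ = "horz"
    VERT = "vert"
    SPLIT = "split"     # the only recursive type ("R" in Fig. 1)
    HORZ_A = "horz_a"   # top half split into two squares, bottom half whole
    HORZ_B = "horz_b"   # top half whole, bottom half split into two squares
    VERT_A = "vert_a"   # left half split into two squares, right half whole
    VERT_B = "vert_b"   # left half whole, right half split into two squares
    HORZ_4 = "horz_4"   # four horizontal strips (4:1 aspect ratio)
    VERT_4 = "vert_4"   # four vertical strips (1:4 aspect ratio)


# VP9's 4-way tree uses only the first four types; AV1 uses all ten.
VP9_PARTITIONS = [Partition.NONE, Partition.HORZ, Partition.VERT, Partition.SPLIT]
AV1_PARTITIONS = list(Partition)


def sub_blocks(p, x, y, size):
    """Return the (x, y, w, h) rectangles produced by partition p of a square block."""
    h = size // 2  # half of the block side
    q = size // 4  # quarter of the block side
    if p is Partition.NONE:
        return [(x, y, size, size)]
    if p is Partition.HORZ:
        return [(x, y, size, h), (x, y + h, size, h)]
    if p is Partition.VERT:
        return [(x, y, h, size), (x + h, y, h, size)]
    if p is Partition.SPLIT:
        return [(x, y, h, h), (x + h, y, h, h), (x, y + h, h, h), (x + h, y + h, h, h)]
    if p is Partition.HORZ_A:
        return [(x, y, h, h), (x + h, y, h, h), (x, y + h, size, h)]
    if p is Partition.HORZ_B:
        return [(x, y, size, h), (x, y + h, h, h), (x + h, y + h, h, h)]
    if p is Partition.VERT_A:
        return [(x, y, h, h), (x, y + h, h, h), (x + h, y, h, size)]
    if p is Partition.VERT_B:
        return [(x, y, h, size), (x + h, y, h, h), (x + h, y + h, h, h)]
    if p is Partition.HORZ_4:
        return [(x, y + i * q, size, q) for i in range(4)]
    if p is Partition.VERT_4:
        return [(x + i * q, y, q, size) for i in range(4)]
    raise ValueError(p)


def partition_tree(x, y, size, choose, min_size=4):
    """Partition a square block and return its leaf blocks.

    `choose(x, y, size)` is a hypothetical decision callback returning a
    Partition. Only SPLIT recurses; every other type yields final blocks.
    """
    p = choose(x, y, size) if size > min_size else Partition.NONE
    if p is Partition.SPLIT:
        leaves = []
        for sx, sy, sw, _ in sub_blocks(p, x, y, size):
            leaves += partition_tree(sx, sy, sw, choose, min_size)
        return leaves
    return sub_blocks(p, x, y, size)
```

For instance, a callback that returns SPLIT at 128×128 and VERT_4 at 64×64 yields sixteen 16×64 leaf blocks. Only SPLIT recurses, mirroring the R nodes in Fig. 1, while the rectangular types produce blocks that are not subdivided further.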
B. Intra Prediction
VP9 supports 10 intra prediction modes, including 8 directional modes corresponding to angles from 45 to 207 degrees, and 2 non-directional predictors: DC and true motion (TM) mode. In AV1, the potential of an intra coder is further explored in various ways: the granularity of directional extrapolation is refined, non-directional predictors are enriched by taking into account gradients and evolving correlations, the coherence of luma and chroma signals is exploited, and tools are developed specifically for artificial content.
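As a small illustration of the finer angular granularity described in the next subsection, the sketch below enumerates the prediction angles obtained by adding a delta of -3 to 3 steps of 3 degrees to each nominal angle. The nominal values (45, 67, 90, 113, 135, 157, 180, 203) are assumed from the AV1 specification's mode-to-angle mapping, which differs slightly from the rounded angles implied by VP9's mode names; `prediction_angle` and `all_angles` are hypothetical helpers, not codec APIs.

```python
# Nominal angles in degrees, assumed from the AV1 spec's mode-to-angle table.
NOMINAL_ANGLES = {
    "D45_PRED": 45,
    "D67_PRED": 67,
    "V_PRED": 90,
    "D113_PRED": 113,
    "D135_PRED": 135,
    "D157_PRED": 157,
    "H_PRED": 180,
    "D203_PRED": 203,
}
ANGLE_STEP = 3       # degrees between neighbouring fine angles
MAX_ANGLE_DELTA = 3  # the angle delta ranges over -3..3 steps


def prediction_angle(mode, delta):
    """Nominal angle of `mode` plus `delta` steps of ANGLE_STEP degrees."""
    if not -MAX_ANGLE_DELTA <= delta <= MAX_ANGLE_DELTA:
        raise ValueError("angle delta must lie in [-3, 3]")
    return NOMINAL_ANGLES[mode] + delta * ANGLE_STEP


def all_angles():
    """Sorted set of every angle reachable from the 8 nominal modes."""
    return sorted(
        {
            prediction_angle(mode, delta)
            for mode in NOMINAL_ANGLES
            for delta in range(-MAX_ANGLE_DELTA, MAX_ANGLE_DELTA + 1)
        }
    )
```

With 8 nominal angles and 7 deltas each, the sketch produces 56 distinct angles (36 to 212 degrees); excluding the 8 nominal ones leaves 48, which matches the count of extension modes given in the text.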
1) Enhanced Directional Intra Prediction: To exploit more varieties of spatial redundancy in directional textures, AV1 extends the directional intra modes to an angle set with finer granularity. The original 8 angles become nominal angles, around which fine angle variations in steps of 3 degrees are introduced; i.e., the prediction angle is represented by a nominal intra angle plus an angle delta, which is an integer multiple of the step size between -3 and 3. To implement directional prediction modes in AV1 in a generic way, the 48 extension modes are realized by a unified directional predictor that links each pixel to a