Internet Engineering Task Force (IETF)                         JM. Valin
Request for Comments: 6716                           Mozilla Corporation
Category: Standards Track                                         K. Vos
ISSN: 2070-1721                                  Skype Technologies S.A.
                                                           T. Terriberry
                                                     Mozilla Corporation
                                                          September 2012
Definition of the Opus Audio Codec
Abstract
This document defines the Opus interactive speech and audio codec.
Opus is designed to handle a wide range of interactive audio
applications, including Voice over IP, videoconferencing, in-game
chat, and even live, distributed music performances. It scales from
low bitrate narrowband speech at 6 kbit/s to very high quality stereo
music at 510 kbit/s. Opus uses both Linear Prediction (LP) and the
Modified Discrete Cosine Transform (MDCT) to achieve good compression
of both speech and music.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6716.
Valin, et al. Standards Track [Page 1]
RFC 6716
Interactive Audio Codec September 2012
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
The licenses granted by the IETF Trust to this RFC under Section 3.c
of the Trust Legal Provisions shall also include the right to extract
text from Sections 1 through 8 and Appendix A and Appendix B of this
RFC and create derivative works from these extracts, and to copy,
publish, display and distribute such derivative works in any medium
and for any purpose, provided that no such derivative work shall be
presented, displayed or published in a manner that states or implies
that it is part of this RFC or any other IETF Document.
Table of Contents
1. Introduction ....................................................5
1.1. Notation and Conventions ...................................6
2. Opus Codec Overview .............................................8
2.1. Control Parameters ........................................10
2.1.1. Bitrate ............................................10
2.1.2. Number of Channels (Mono/Stereo) ...................11
2.1.3. Audio Bandwidth ....................................11
2.1.4. Frame Duration .....................................11
2.1.5. Complexity .........................................11
2.1.6. Packet Loss Resilience .............................12
2.1.7. Forward Error Correction (FEC) .....................12
2.1.8. Constant/Variable Bitrate ..........................12
2.1.9. Discontinuous Transmission (DTX) ...................13
3. Internal Framing ...............................................13
3.1. The TOC Byte ..............................................13
3.2. Frame Packing .............................................16
3.2.1. Frame Length Coding ................................16
3.2.2. Code 0: One Frame in the Packet ....................16
3.2.3. Code 1: Two Frames in the Packet, Each with
Equal Compressed Size ..............................17
3.2.4. Code 2: Two Frames in the Packet, with
Different Compressed Sizes .........................17
3.2.5. Code 3: A Signaled Number of Frames in the Packet ..18
3.3. Examples ..................................................21
3.4. Receiving Malformed Packets ...............................22
4. Opus Decoder ...................................................23
4.1. Range Decoder .............................................23
4.1.1. Range Decoder Initialization .......................25
4.1.2. Decoding Symbols ...................................25
4.1.3. Alternate Decoding Methods .........................27
4.1.4. Decoding Raw Bits ..................................29
4.1.5. Decoding Uniformly Distributed Integers ............29
4.1.6. Current Bit Usage ..................................30
4.2. SILK Decoder ..............................................32
4.2.1. SILK Decoder Modules ...............................32
4.2.2. LP Layer Organization ..............................33
4.2.3. Header Bits ........................................35
4.2.4. Per-Frame LBRR Flags ...............................36
4.2.5. LBRR Frames ........................................36
4.2.6. Regular SILK Frames ................................37
4.2.7. SILK Frame Contents ................................37
4.2.7.1. Stereo Prediction Weights .................40
4.2.7.2. Mid-Only Flag .............................42
4.2.7.3. Frame Type ................................43
4.2.7.4. Subframe Gains ............................44
4.2.7.5. Normalized Line Spectral Frequency
(LSF) and Linear Predictive Coding (LPC)
Coefficients ..............................46
4.2.7.6. Long-Term Prediction (LTP) Parameters .....74
4.2.7.7. Linear Congruential Generator (LCG) Seed ..86
4.2.7.8. Excitation ................................86
4.2.7.9. SILK Frame Reconstruction .................98
4.2.8. Stereo Unmixing ...................................102
4.2.9. Resampling ........................................103
4.3. CELT Decoder .............................................104
4.3.1. Transient Decoding ................................108
4.3.2. Energy Envelope Decoding ..........................108
4.3.3. Bit Allocation ....................................110
4.3.4. Shape Decoding ....................................116
4.3.5. Anti-collapse Processing ..........................120
4.3.6. Denormalization ...................................121
4.3.7. Inverse MDCT ......................................121
4.4. Packet Loss Concealment (PLC) ............................122
4.4.1. Clock Drift Compensation ..........................122
4.5. Configuration Switching ..................................123
4.5.1. Transition Side Information (Redundancy) ..........124
4.5.2. State Reset .......................................127
4.5.3. Summary of Transitions ............................128
5. Opus Encoder ..................................................131
5.1. Range Encoder ............................................132
5.1.1. Encoding Symbols ..................................133
5.1.2. Alternate Encoding Methods ........................134
5.1.3. Encoding Raw Bits .................................135
5.1.4. Encoding Uniformly Distributed Integers ...........135
5.1.5. Finalizing the Stream .............................135
5.1.6. Current Bit Usage .................................136
5.2. SILK Encoder .............................................136
5.2.1. Sample Rate Conversion ............................137
5.2.2. Stereo Mixing .....................................137
5.2.3. SILK Core Encoder .................................138
5.3. CELT Encoder .............................................150
5.3.1. Pitch Pre-filter ..................................150
5.3.2. Bands and Normalization ...........................151
5.3.3. Energy Envelope Quantization ......................151
5.3.4. Bit Allocation ....................................151
5.3.5. Stereo Decisions ..................................152
5.3.6. Time-Frequency Decision ...........................153
5.3.7. Spreading Values Decision .........................153
5.3.8. Spherical Vector Quantization .....................154
6. Conformance ...................................................155
6.1. Testing ..................................................155
6.2. Opus Custom ..............................................156
7. Security Considerations .......................................157
8. Acknowledgements ..............................................158
9. References ....................................................159
9.1. Normative References .....................................159
9.2. Informative References ...................................159
Appendix A. Reference Implementation .............................163
A.1. Extracting the Source ....................................164
A.2. Up-to-Date Implementation ................................164
A.3. Base64-Encoded Source Code ...............................164
A.4. Test Vectors .............................................321
Appendix B. Self-Delimiting Framing ..............................321
1. Introduction
The Opus codec is a real-time interactive audio codec designed to
meet the requirements described in [REQUIREMENTS]. It is composed of
a layer based on Linear Prediction (LP) [LPC] and a layer based on
the Modified Discrete Cosine Transform (MDCT) [MDCT]. The main idea
behind using two layers is as follows: in speech, linear prediction
techniques (such as Code-Excited Linear Prediction, or CELP) code low
frequencies more efficiently than transform (e.g., MDCT) domain
techniques, while the situation is reversed for music and higher
speech frequencies. Thus, a codec with both layers available can
operate over a wider range than either one alone and can achieve
better quality by combining them than by using either one
individually.
The primary normative part of this specification is provided by the
source code in Appendix A. Only the decoder portion of this software
is normative, though a significant amount of code is shared by both
the encoder and decoder. Section 6 provides a decoder conformance
test. The decoder contains a great deal of integer and fixed-point
arithmetic that needs to be performed exactly, including all rounding
considerations, so any useful specification requires domain-specific
symbolic language to adequately define these operations.
Additionally, any conflict between the symbolic representation and
the included reference implementation must be resolved. For the
practical reasons of compatibility and testability, it would be
advantageous to give the reference implementation priority in any
disagreement. The C language is also one of the most widely
understood, human-readable symbolic representations for machine
behavior. For these reasons, this RFC uses the reference
implementation as the sole symbolic representation of the codec.
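As an informal illustration of why fixed-point operations need to be
specified exactly, including their rounding, the following sketch
compares two plausible ways of multiplying Q15 values. The function
names (q15_mul_trunc, q15_mul_round, q15_demo) are hypothetical and do
not appear in the reference implementation; the operands are chosen
only to expose the difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are NOT functions
 * from the Opus reference implementation. */

/* Q15 multiply that truncates: the 32-bit product is shifted right by
 * 15 bits. Operands are assumed non-negative, because right-shifting a
 * negative value is implementation-defined in C. */
int32_t q15_mul_trunc(int16_t a, int16_t b)
{
    return ((int32_t)a * b) >> 15;
}

/* Q15 multiply that rounds to nearest by adding half an LSB (1 << 14)
 * before the shift. Same non-negative operand assumption. */
int32_t q15_mul_round(int16_t a, int16_t b)
{
    return (((int32_t)a * b) + (1 << 14)) >> 15;
}

/* The exact product of these operands is 8191.5 in Q15 units, so the
 * two conventions disagree by one least significant bit. */
void q15_demo(void)
{
    int16_t a = 24576; /* 0.75 in Q15 */
    int16_t b = 10922; /* slightly below 1/3 in Q15 */

    assert(q15_mul_trunc(a, b) == 8191);
    assert(q15_mul_round(a, b) == 8192);
}
```

Both results are reasonable approximations of the same product, yet
they differ in the last bit. A decoder that is required to match the
reference output exactly cannot leave such a choice open, which is one
reason a prose description alone would not be sufficient.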
While the symbolic representation is unambiguous and complete, it is
not always the easiest way to understand the codec’s operation. For
this reason, this document also describes significant parts of the
codec in prose and takes the opportunity to explain the rationale
behind many of the more surprising elements of the design. These
descriptions are intended to be accurate and informative, but the
limitations of common English sometimes result in ambiguity, so it is
expected that the reader will always read them alongside the symbolic
representation. Numerous references to the implementation are
provided for this purpose. The descriptions sometimes differ from
the reference in ordering or through mathematical simplification
wherever such deviation makes an explanation easier to understand.
For example, the right shift and left shift operations in the
reference implementation are often described using division and