The .xz File Format
===================
Version 1.2.0 (2024-01-19)
0. Preface
   0.1. Notices and Acknowledgements
   0.2. Getting the Latest Version
   0.3. Version History
1. Conventions
   1.1. Byte and Its Representation
   1.2. Multibyte Integers
2. Overall Structure of .xz File
   2.1. Stream
        2.1.1. Stream Header
               2.1.1.1. Header Magic Bytes
               2.1.1.2. Stream Flags
               2.1.1.3. CRC32
        2.1.2. Stream Footer
               2.1.2.1. CRC32
               2.1.2.2. Backward Size
               2.1.2.3. Stream Flags
               2.1.2.4. Footer Magic Bytes
   2.2. Stream Padding
3. Block
   3.1. Block Header
        3.1.1. Block Header Size
        3.1.2. Block Flags
        3.1.3. Compressed Size
        3.1.4. Uncompressed Size
        3.1.5. List of Filter Flags
        3.1.6. Header Padding
        3.1.7. CRC32
   3.2. Compressed Data
   3.3. Block Padding
   3.4. Check
4. Index
   4.1. Index Indicator
   4.2. Number of Records
   4.3. List of Records
        4.3.1. Unpadded Size
        4.3.2. Uncompressed Size
   4.4. Index Padding
   4.5. CRC32
5. Filter Chains
   5.1. Alignment
   5.2. Security
   5.3. Filters
        5.3.1. LZMA2
        5.3.2. Branch/Call/Jump Filters for Executables
        5.3.3. Delta
               5.3.3.1. Format of the Encoded Output
   5.4. Custom Filter IDs
        5.4.1. Reserved Custom Filter ID Ranges
6. Cyclic Redundancy Checks
7. References
0. Preface
This document describes the .xz file format (filename suffix
".xz", MIME type "application/x-xz"). It is intended that this
format replace the old .lzma format used by LZMA SDK and
LZMA Utils.
0.1. Notices and Acknowledgements
This file format was designed by Lasse Collin
<lasse.collin@tukaani.org> and Igor Pavlov.
Special thanks for helping with this document goes to
Ville Koskinen. Thanks for helping with this document goes to
Mark Adler, H. Peter Anvin, Mikko Pouru, and Lars Wirzenius.
This document has been put into the public domain.
0.2. Getting the Latest Version
The latest official version of this document can be downloaded
from <https://xz.tukaani.org/format/xz-file-format.txt>.
Specific versions of this document have a filename
xz-file-format-X.Y.Z.txt where X.Y.Z is the version number.
For example, the version 1.0.0 of this document is available
at <https://xz.tukaani.org/format/xz-file-format-1.0.0.txt>.
0.3. Version History
Version   Date          Description
1.2.0     2024-01-19    Added RISC-V filter and updated URLs in
                        Sections 0.2 and 7. The URL of this
                        specification was changed.
1.1.0     2022-12-11    Added ARM64 filter and clarified 32-bit
                        ARM endianness in Section 5.3.2,
                        language improvements in Section 5.4
1.0.4     2009-08-27    Language improvements in Sections 1.2,
                        2.1.1.2, 3.1.1, 3.1.2, and 5.3.1
1.0.3     2009-06-05    Spelling fixes in Sections 5.1 and 5.4
1.0.2     2009-06-04    Typo fixes in Sections 4 and 5.3.1
1.0.1     2009-06-01    Typo fix in Section 0.3 and minor
                        clarifications to Sections 2, 2.2,
                        3.3, 4.4, and 5.3.2
1.0.0     2009-01-14    The first official version
1. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC-2119].
Indicating a warning means displaying a message, returning
appropriate exit status, or doing something else to let the
user know that something worth warning about occurred. The operation
SHOULD still finish if a warning is indicated.
Indicating an error means displaying a message, returning
appropriate exit status, or doing something else to let the
user know that something prevented successfully finishing the
operation. The operation MUST be aborted once an error has
been indicated.
1.1. Byte and Its Representation
In this document, a byte is always 8 bits.
A "null byte" has all bits unset. That is, the value of a null
byte is 0x00.
To represent byte blocks, this document uses notation that
is similar to the notation used in [RFC-1952]:
+-------+
|  Foo  |   One byte.
+-------+

+---+---+
|  Foo  |   Two bytes; that is, some of the vertical bars
+---+---+   can be missing.

+=======+
|  Foo  |   Zero or more bytes.
+=======+
In this document, a boxed byte or a byte sequence declared
using this notation is called "a field". The example field
above would be called "the Foo field" or plain "Foo".
If there are many fields, they may be split across multiple lines.
This is indicated with an arrow ("--->"):
+=====+
| Foo |
+=====+

     +=====+
---> | Bar |
     +=====+
The above is equivalent to this:
+=====+=====+
| Foo | Bar |
+=====+=====+
1.2. Multibyte Integers
Multibyte integers of static length, such as CRC values,
are stored in little endian byte order (least significant
byte first).
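As an illustration of the little endian rule, the following C sketch stores and loads a fixed-length 32-bit value such as a CRC32. The helper names write32le and read32le are hypothetical and are not defined by this specification.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value, least significant byte first. */
void
write32le(uint8_t buf[4], uint32_t value)
{
	buf[0] = (uint8_t)(value);
	buf[1] = (uint8_t)(value >> 8);
	buf[2] = (uint8_t)(value >> 16);
	buf[3] = (uint8_t)(value >> 24);
}

/* Hypothetical helper: load a 32-bit value stored least significant byte first. */
uint32_t
read32le(const uint8_t buf[4])
{
	return (uint32_t)buf[0]
			| ((uint32_t)buf[1] << 8)
			| ((uint32_t)buf[2] << 16)
			| ((uint32_t)buf[3] << 24);
}
```

For example, write32le(buf, 0x12345678) stores the bytes 0x78, 0x56, 0x34, and 0x12 in that order, and read32le on those bytes returns 0x12345678. Using shifts instead of copying the integer's memory makes the result independent of the host CPU's byte order.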
When smaller values are more likely than bigger values (for
example file sizes), multibyte integers are encoded in a
variable-length representation:
- Numbers in the range [0, 127] are copied as is, and take
one byte of space.
- Bigger numbers will occupy two or more bytes. All but the
last byte of the multibyte representation have the highest
(eighth) bit set.
For now, the value of the variable-length integers is limited
to 63 bits, which limits the encoded size of the integer to
nine bytes. These limits may be increased in the future if
needed.
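Because each byte of the variable-length representation carries seven bits of the value, the encoded size is one byte per started group of seven bits. The following C sketch, using a hypothetical function vli_size that is not part of this specification, computes that size and shows why the 63-bit limit corresponds to nine bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the number of bytes (1-9) that the
 * variable-length encoding of num occupies, or zero if num does
 * not fit in 63 bits.
 */
size_t
vli_size(uint64_t num)
{
	/* Values are limited to 63 bits. */
	if (num > UINT64_MAX / 2)
		return 0;

	size_t size = 1;

	/* Each additional byte holds seven more bits of the value. */
	while (num >= 0x80) {
		num >>= 7;
		++size;
	}

	return size;
}
```

For example, vli_size(300) returns 2, matching the two-byte encoding 0xAC 0x02: the low seven bits of 300 are 0x2C, stored with the highest bit set as 0xAC, and the remaining value 300 >> 7 = 2 is stored as 0x02.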
The following C code illustrates encoding and decoding of
variable-length integers. The functions return the number of
bytes occupied by the integer (1-9), or zero on error.
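#include <stddef.h>
#include <inttypes.h>

size_t
encode(uint8_t buf[static 9], uint64_t num)
{
	/* Values are limited to 63 bits. */
	if (num > UINT64_MAX / 2)
		return 0;

	size_t i = 0;

	/* Emit seven bits at a time, setting the highest bit
	   on every byte except the last one. */
	while (num >= 0x80) {
		buf[i++] = (uint8_t)(num) | 0x80;
		num >>= 7;
	}

	buf[i++] = (uint8_t)(num);

	return i;
}

size_t
decode(const uint8_t buf[], size_t size_max, uint64_t *num)
{
	if (size_max == 0)
		return 0;

	/* The encoded form never exceeds nine bytes. */
	if (size_max > 9)
		size_max = 9;

	*num = buf[0] & 0x7F;
	size_t i = 0;

	while (buf[i++] & 0x80) {
		/* Fail if the highest bit is set on the last available
		   byte (truncated input or more than nine bytes), or if
		   the final byte is 0x00, which would make the encoding
		   longer than necessary. */
		if (i >= size_max || buf[i] == 0x00)
			return 0;

		*num |= (uint64_t)(buf[i] & 0x7F) << (i * 7);
	}

	return i;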