2 SHA-1 is a Shambles
hash functions are classically applied on the message before signing it, in order to improve
efficiency and provide security guarantees. Informally, a cryptographic hash function
H
is a function that maps an arbitrarily long message
M
to a fixed-length hash value (we
denote
n
its bit size). Collision resistance is the main security property expected from a
hash function: it should be hard for an adversary to compute a collision, aka two distinct
messages
M
and
M
0
that map to the same hash value
H
(
M
) =
H
(
M
0
), where by “hard”
one means not faster than the generic 2
n/2
computations birthday attack.
A cryptanalyst will try to find a collision for the hash function at a reduced cost, but
ad-hoc collision attacks are hard to exploit in practice, because the attacker has then
usually little control over the value of the actual colliding messages (in particular where the
differences are inserted, which are the interesting parts when attacking a digital signature
scheme for example). Thus, one can consider stronger and more relevant variants of the
collision attack in practice, such as the so-called chosen-prefix collision [
SLdW07
] or CP
collision: two message prefixes
P
and
P
0
are first given as challenge to the adversary,
and his goal is to compute two messages
M
and
M
0
such that
H
(
P k M
) =
H
(
P
0
k M
0
),
where
k
denotes concatenation. With such ability, the attacker can obtain a collision even
though prefixes can be chosen arbitrarily (and thus potentially contain some meaningful
information). A CP collision can also be found generically with 2
n/2
computations (thus
2
80
for a 160-bit hash function like
SHA-1
), but ad-hoc CP collision attacks are much
more difficult to find than plain collision attacks, because of the random and completely
uncontrolled internal differences created by the prefixes. Yet, a CP collision attack was
found for the
MD5
hash function [
SLdW07
], eventually leading to the creation of colliding
X.509 certificates, and later of a rogue Certificate Authority (CA) [
SSA
+
09
]. CP collisions
have also been shown to break important internet protocols, including TLS, IKE, and
SSH [BL16], because they allow forgeries of the handshake messages.
Largely inspired by
MD4
[
Riv91
] and then
MD5
[
Riv92
],
SHA-1
is one the most famous
cryptographic hash functions in the world, having been the NIST and de-facto worldwide
hash function standard for nearly two decades. It remained a NIST standard until its
deprecation in 2011 (and disallowed to be used for digital signatures at the end of 2013).
Indeed, even though his successors
SHA-2
or
SHA-3
are believed to be secure,
SHA-1
has
been broken by a theoretical collision attack in 2004 [
WYY05
]. Due to its high technicality
and computational complexity (originally estimated to about 2
69
hash function calls), this
attack was only implemented in practice in 2017, using a large GPU cluster [
SBK
+
17
].
Because it took more than a decade to compute an actual collision and because plain
collisions are difficult to use directly to attack a protocol, the on-field
SHA-1
deprecation
process has been quite slow in practice and one can still observe many uses of
SHA-1
in
the wild unfortunately, migration being expensive. Web browsers have recently started to
reject certificates with
SHA-1
signatures, but there are still many users with older browsers,
and many protocols and software that allow
SHA-1
signatures. As observed in [
LP19
], it is
still possible to buy a
SHA-1
certificate from a trusted CA, many email clients accept a
SHA-1
certificate when opening a TLS connection, and
SHA-1
is also widely supported to
authenticate TLS handshake messages.
Very recently, a CP collision attack against
SHA-1
has been published [
LP19
], which
requires an estimated complexity between 2
66.9
and 2
69.4
SHA-1
computations. It works
with a two-phase strategy: given the challenge prefix and the random differences on the
internal state it will induce, the first part of the attack uses a birthday approach to limit
the internal state differences to a not-too-big subset (as done in [
SLdW07
,
Ste13b
]). From
this subset, reusing basic principles of the various collision search advances on
SHA-1
, one
slowly adds successive message blocks to come closer to a collision, eventually reaching
the goal after a dozen blocks. Even though these advances put the CP collisions within
practical reach for very well-funded entities, it remains very expensive to conduct and also
very difficult to deploy as the attack contains many very technical parts.