0. Introduction
1. Algorithm Details (Not Brute Force!)
a. Reference to Papers
b. How practical is this?
2. How long will this take?
3. How much disk space is needed?
4. How much bandwidth is needed?
5. About the prototype client
6. About J-L
0. INTRODUCTION
---------------------------------------
I'm writing a paper for the RSA conference calling for an attack on the MD5
algorithm. As you may or may not know, the MD5 algorithm is one of the most
widely used crypto algorithms and the cost to attack it has become very
affordable for a determined individual ($200,000USD - see below).
Wide-spread applications currently using MD5:
- /etc/shadow passwords in UNIX use it,
- ssh uses it to verify integrity of traffic,
- ssl uses it for digitally signing certificates and traffic integrity,
See the certificates for these 'secure' websites:
- Amazon.com
- Chapters.ca
- SourceForge.net
- Login.Yahoo.com
- Many pseudo-random number generators use it,
- Intrusion detection systems use it,
- The RPM package management (as well as others) uses it...
...and the list goes on!
So here's where the fun comes in. My colleague (Tom St Denis) and I have
developed a highly optimized prototype client to find collisions in MD5.
Thoughput has been measured at 5 MegaMD5s/sec comapred to OpenSSL's 1.3
MegaMD5s/sec on an AthlonXP 1700. If used in a distributed fashion and to
the same magnitude as the RC5-64 project, it will find it's first collision
in 2 years using approximately 70GB of space on a central server.
Shortly after this projected 2 years for the first collision, millions and
millions of MD5 collisions (or 'breaks' in MD5) will be discoverd. So 2
years of work is not wasted finding a single solution, but to find MANY
solutions.
1. ALGORITHM DETAILS (NOT BRUTE FORCE!)
---------------------------------------
a. REFERENCE TO PAPERS
======================================
Paul van Oorschot (former chief cryptographer at Entrust) wrote a paper back
in 1994 (acmccs94.pdf). Basically it
gave a design for a $10,000,000USD machine to break MD5 in 24 days. Over eight
years later, this machine would cost about $200,000USD. So MD5 needs to go,
but no one is taking it seriously! We need to change this.
I propose we use the same technique as the one in Paul's paper.
All functions with a finite number of outputs will fall into a cycle if you
do this:
x = <initial value>
loop {
x = f( x );
}
Simply replace 'f(x)' with the MD5 hash function and we have what is call a
random walk using MD5.
When you notice it has fallen into a cycle, you know two inputs produced the
same output. This is a collision. This is effectively breaks the
algorithm. The way the statistics work out, if you find one collision millions
will follow shortly afterwards.
You can break this task up into any number of processors by giving each node
a unique starting point to "walk" along. Whenever the top 32bits of the
hash output value are all zeros we call this a Distinguished Point or DP.
The client sends this to the central server, which will store the bottom 96bits
(since the top 32 are implicitly zeros) and the index of the DP for this user.
ie. jdoe submits their nth DP, the server stores
<low 96bits of hash>,<40bit long integer 'n'>
If the server is asked to store a DP already in the database, then we
know two "walks" have collided.
Say user1 and user2 have colliding DPs, their nth and mth respectively. We
know the true collision occurred somewhere between {n-1, n} and {m-1, m}. This
collision can be found relatively quickly.
b. HOW PRACTICAL IS THIS?
======================================
Unlike the DES3, OGR, RC5-64, CSC and other attacks, this attack doesn't have
to start over once an answer is found!!! The first collision is expected to
occur after 264 hash's. The next trillion collisions are expected to happen
shortly after!!! Within the first day after the critical statistical point of 264,
it's very likely we'll have enough data to fill many CDs!
The proposed attack is as follows. Bart wants to get $1,000,000USD from Homer,
but Homer will only sign a contract worth $1USD. So Bart will resort to using
distributed.net to create two contracts with the same MD5 hash; one Homer agreeing
to give Bart $1USD, another agreeing to give Bart $1,000,001USD. If Homer signs the $1USD
contract using MD5 and his public-key, the same signature will correctly verify the
$1,000,000USD contract.
A sample contract can be found here
2. HOW LONG WILL THIS TAKE?
---------------------------------------
Our prototype client gets about 5 MegaMD5s/sec on an AthlonXP 1700, comparable
to the same number of RC5-64 MegaKeys/sec on the same processor.
264 MD5 hash's will need to be calculated before we can expect to have a
collision. This is due to the birthday paradox. With 127.24 GigaKeys/sec
== 127.24 GigaMD5s/sec the same speed as the RC5-64 project, and extrapolating
Moore's law, 264 MD5s will be reached in 2 years.
3. HOW MUCH DISK SPACE IS NEEDED?
---------------------------------------
264 MD5s divided-by 232 MD5s/DP multiplied by
(96bits for the hash + 40bits for the index)
= 264 / 232 * 136
= approx 239 bits
= approx 26 GigaBytes
= approx 64GigaBytes
4. HOW MUCH BANDWIDTH IS REQUIRED?
---------------------------------------
At 5 MegaMD5s/sec, a DP is expected to be discovered once every 14mins.
That's 17 bytes ever 14mins. This is very low, even when ramped up to
50,000 users. This is similar to, or less, than the bandwidth required
by current projects.
5. ABOUT THE CLIENT
---------------------------------------
Right now it's all in x86 ASM. The ALU and the FPU are both used
simultaneously. A single CPU runs 3 concurrent MD5 random walks, 1 in the
ALU, 2 in the 64bit MMX registers. OpenSSL's benchmarks on an AthlonXP 1700
gets about 1.3 MegaMD5s/sec, we get about 5 MegaMD5s/sec. Intel CPUs have a
better MMX/FPU pipeline then the AMD, so the performance increase is
significant there. A bit-sliced implementation is being developed for
CPUs with deep pipelines.
Prototype source code, win32 and elf-x86 executables:
http://www.certainkey.com/dnet/md5col_cli2.zip (Licence: GPL)
To benchmark:
To prototype with random initial point:
6. ABOUT J-L
---------------------------------------
Jean-Luc has a small I.T. security firm based in Ottawa Ontario Canada
which develops cryptographic software and provides consulting services to
both public and private sectors. He did the AES implementation and some
analysis for Entrust Technologies before starting his own practice.
J-L was a speaker at the 2002 Ottawa International Linux Symposium (here)
where is discussed the AES, it's effects on standards, and the obsolescence of
128bit hash algorithms such as MD5.
J-L is a developer for the GNU/Linux CryptoAPI (aka. The Linux crypto patch -
here). Advocating the move away from MD5 and encouraging everyone to comply
with their countries crypto laws... as they themselves encourage their
governments to change these laws.
|