Cryptanalysis of Hash Functions using an approach based on SAT-solvers (Formal Proposal, first draft)


Cryptanalysis of Hash Functions using an approach based on SAT-solvers.

Jair Cazarin Villanueva 125535

Dr. Mauricio Osorio

February 2008



1. Introduction.

Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication, and data origin authentication [1]. In other words, cryptography is about the prevention and detection of cheating and other malicious activities. Cryptographic hash functions produce hash values, which concisely represent the longer messages or documents from which they were computed [2]. They are used for a variety of purposes, mainly in cryptography, so computer security depends heavily on their strength.

Examples of hash functions are MD4 [7], MD5 [8] and the SHA family [9]. The main role of cryptographic hash functions is in the provision of message integrity checks and digital signatures [3]. Formal analysis of their robustness is therefore of the utmost importance. Unfortunately, several standard cryptographic hash functions were broken in 2005 [4]. Breaking a hash function here means finding a way to efficiently produce different messages that the function maps to the same hash value, which would compromise the security of the applications in which the function is used. Other kinds of approaches therefore seem to be the next step toward greater assurance of security.

On the other hand, we have Boolean satisfiability (SAT) solvers, which attack the problem of determining whether the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to true. The use of SAT solvers in various applications is increasing, and a growing number of problems that can be efficiently encoded into SAT are being tackled successfully by these programs [5]. Furthermore, the performance of their algorithms has improved [6]. Most of these applications still belong to the traditional domains of formal verification and artificial intelligence. Although several applications of SAT solvers to cryptanalysis have been described in the literature, these efforts failed to produce any attacks of interest to cryptologists [4] until the research of Ilya Mironov and Lintao Zhang in [2].

In that paper, they showed that some of the attacks on these hash functions can be automated by encoding them as CNF formulas that are within reach of modern SAT solvers. This transformation delegates the more laborious part of the attack to the solver, creating the first example of SAT-solver-aided cryptanalysis of a non-trivial cryptographic primitive.

The strategy was based on the fact that the original attacks consist of several steps, each of which involves a lot of bit-tweaking and manual work, requiring the cryptanalyst to keep track of as many as 122 Boolean conditions in the simplest function. Mironov and Zhang found a way to transform these conditions into CNF formulas, which a SAT solver then uses to automate certain parts of the attack, obviating the need to compile tables of sufficient conditions and to design clever message modification techniques.
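To make the idea of encoding bit-level conditions as CNF concrete, the following minimal sketch (an illustration, not the encoding used in [2]) expresses the constraint z = x XOR y as four clauses and checks them against the truth table. The helper names `xor_clauses` and `satisfies`, and the convention of writing literals as signed integers, are assumptions made for this example.

```python
from itertools import product

def xor_clauses(x, y, z):
    """CNF clauses (signed-integer literals) that force z == x XOR y.

    Each clause rules out exactly one of the four forbidden assignments.
    """
    return [[-x, -y, -z], [x, y, -z], [x, -y, z], [-x, y, z]]

def satisfies(clauses, assignment):
    """True if every clause contains at least one literal that is true."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Hypothetical variable numbering: 1 = x, 2 = y, 3 = z.
clauses = xor_clauses(1, 2, 3)
models = [
    bits
    for bits in product([False, True], repeat=3)
    if satisfies(clauses, dict(zip((1, 2, 3), bits)))
]
```

The surviving assignments in `models` are exactly those where the third bit equals the XOR of the first two, which is the kind of local relation a hash function's bit operations reduce to.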

So far, research in this field has found ways to automate only certain parts of these attacks. These approaches still need to be verified and tested in order to reach complete automation and/or improvements in both areas. This can be accomplished in two ways: by creating a new toolkit for cryptanalysts, or by improving SAT solvers for specific cryptographic problems.

2. Objectives.

2.1 General Objective.

To design and test this approach with different kinds of SAT solvers in order to find the most suitable one; then to implement a semiautomatic testing tool that helps cryptanalysts find weaknesses in hash functions; and, through this tool, to analyze in more depth possible improvements to SAT-solver algorithms for this specific kind of problem.

2.2 Specific Objectives.

* Understand the theory and construction of hash functions, including their principles.
* Study the MD4 and MD5 family of hash functions and how it works.
* Interpret the problem of Boolean satisfiability, including its complexity and variations.
* Install and use the MiniSAT SAT solver.
* Understand how the attacks on hash functions run.
* Study the discovery process of the differential path for MD4 and MD5.
* Realize how to automate the differential path via SAT solvers.
* Discover one or two more SAT solvers in order to compare results with MiniSAT.
* Generate full collisions for the MD4 and MD5 hash functions.

3. Research Scope.

This thesis focuses on the MDx family of hash functions designed by Ron Rivest. There are other state-of-the-art hash functions, such as SHA-0 and SHA-1, whose foundations share similar design principles. However, it is well known that the attack on SHA-1 is only theoretical [10], and generating a full collision for SHA-0 would require 3 million CPU hours using common SAT solvers [4].

Moreover, the first two stages of the attacks are usually done by hand or by applying heuristics that demand a lot of creativity. A fully automatic attack is therefore not yet possible, so we are going to replicate only the attacks already known in the literature.

As for SAT solvers, we will use MiniSAT [11] as the main solver and, if time constraints permit, we will test two or more different SAT solvers, to be chosen as the research advances.

4. Hardware and Software.

A simple personal computer with minimal processing and storage characteristics is required to execute the different approaches and algorithms in this research. For this reason, a laptop with the following features will be used:

Dell XPS M1210, Intel Core 2 Duo, 2 GHz.

3 GB of RAM.

For the software, so far we have only taken into account MiniSAT [11], a minimalistic, open-source SAT solver intended to help researchers and developers with SAT-related projects. We will probably also use SatELite, a CNF minimizer and preprocessor for MiniSAT. Among the reasons for choosing MiniSAT are its key features, such as efficiency, ease of integration and ease of modification, and, more importantly, its performance in SAT competitions [6].

5. Problem Statement.

5.1 Theory of Hash Functions.

Cryptographic hash functions, also known as one-way hash functions, are a major tool in modern cryptography.

A hash function is defined as a computationally efficient function mapping binary strings of arbitrary length to binary strings of some fixed length, called hash-values [1]. The basic idea is that a hash-value serves as a compact representative of an input string. In the cryptographic field, a hash function h is typically chosen such that it is computationally infeasible to find two distinct inputs which hash to a common value, that is, to find distinct inputs x and y such that h(x) = h(y). This is called the collision-resistance property and is recognized as the gold standard of security for hash functions [13]. It is also required that, given a specific hash-value y, it is computationally infeasible to find an input x such that h(x) = y. This is known as preimage resistance.

The formal definition of a cryptographic hash function is a mapping:

$h : \{0,1\}^* \to \{0,1\}^n$

where $\{0,1\}^*$ denotes the set of bit strings of arbitrary length and $\{0,1\}^n$ the set of bit strings of fixed length n. The image h(X) of some message $X \in \{0,1\}^*$ is called the hash value of X [16].



The most common cryptographic uses of hash functions are digital signatures and data integrity. With digital signatures, a long message is usually hashed and only the hash-value is signed (as is done in signature schemes) [14]. The party receiving the message then hashes the received message and verifies that the received signature is correct for this hash-value. Note that the inability to find two messages with the same hash-value is a security requirement: otherwise, the signature on one message's hash-value would be the same as that on another, allowing a signer to sign one message and at a later point in time claim to have signed another. In the case of data integrity, the hash-value corresponding to a particular input is computed at some point in time, and the integrity of this hash-value is protected in some manner. At a subsequent point in time, to verify that the input data has not been altered, the hash-value is recomputed using the input at hand and compared for equality with the original hash-value. Specific applications include virus protection and software distribution.
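The data-integrity procedure described above (compute a hash-value once, protect it, recompute and compare later) can be sketched in a few lines. This is only an illustration: the function names `fingerprint` and `is_unaltered` and the sample data are hypothetical, and SHA-256 from Python's standard `hashlib` module is used as a stand-in hash function rather than one of the MDx functions studied here.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """Hash-value of the input, as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(data: bytes, stored: str) -> bool:
    """Recompute the hash-value and compare it with the stored one."""
    return hmac.compare_digest(fingerprint(data), stored)

# The hash-value is computed once and assumed to be stored safely.
original = b"release-1.0 contents"
stored = fingerprint(original)

unchanged_ok = is_unaltered(original, stored)          # same data
tampered_ok = is_unaltered(original + b"!", stored)    # altered data
```

The check is only as strong as the hash function: if an adversary can produce a second input with the same hash-value, the comparison succeeds on altered data.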

It is important to differentiate between a weak one-way hash function and a strong one-way hash function, so let us define each one.

A weak one-way hash function is a function H such that [14]:

* H can be applied to any argument of any size.
* H produces a fixed-size output.
* Given H and y, it is easy to compute H(y), meaning that it is computable in polynomial time.
* Given H and a suitably chosen y, it is computationally infeasible to find y' ≠ y such that H(y) = H(y').

A strong one-way hash function is a function H such that [14]:

* H can be applied to any argument of any size.
* H produces a fixed-size output (larger than that of a weak hash function).
* Given H and y, it is easy to compute H(y).
* Given H, it is computationally infeasible to find any pair y, y' such that y ≠ y' and H(y) = H(y').

The main differences between strong and weak hash functions are that strong ones are easier to use in systems design, because there are no preconditions on the selection of y, and that they provide the full claimed level of security even when used repeatedly. In this thesis research, we will use only strong one-way hash functions.

The literature also offers a different taxonomy for hash functions [17]. A hash function h is:

* Preimage resistant, if given a hash value y, it is computationally infeasible to find a message x with h(x) = y.
* Second preimage resistant, if given a message y, it is computationally infeasible to find a message x ≠ y with h(x) = h(y).
* Collision resistant, if it is computationally infeasible to find a collision, that is, a pair of two different messages y and y' with h(y) = h(y').

We can infer that one-way is equivalent to preimage resistant and that a weak hash function is second preimage resistant. These properties will be used later when we discuss hash function construction.
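To illustrate what a collision is, and why the output length matters, the sketch below searches for a collision on a deliberately weakened hash: MD5 truncated to its first 24 bits. This is a generic birthday search, not one of the differential attacks discussed in this proposal; the truncation length, the helper names and the choice of decimal counters as messages are all assumptions made for the example.

```python
import hashlib
from itertools import count

def truncated_md5(data: bytes, bits: int = 24) -> int:
    """The first `bits` bits of the MD5 digest, as an integer."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)

def find_collision(bits: int = 24):
    """Hash successive messages until two share a truncated hash value."""
    seen = {}
    for i in count():
        message = str(i).encode()
        value = truncated_md5(message, bits)
        if value in seen:
            return seen[value], message
        seen[value] = message

m1, m2 = find_collision(24)
```

With an n-bit output, a search of this kind is expected to succeed after roughly 2^(n/2) hash evaluations, which is a few thousand for n = 24 but far out of reach for the full 128-bit digest.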

The first standard hash function was MD4, designed by Ron Rivest [7], followed by an improved version called MD5. Then came the first NIST-approved hash function, SHA-0 (Secure Hash Algorithm) [9], which adopted the general structure of MD4 and was replaced two years later by a new version, SHA-1 [4]. Currently, just two hash functions are in widespread use: MD5 and SHA-1. One of the first people to study the construction of hash functions was Damgård, from Aarhus University in Denmark. One idea in his theory of collision-free hash function construction was to consider families of hash functions instead of a single hash function, in order to make a complexity-theoretic treatment possible [14]. In this sense, we can say that MD4, MD5, SHA-0 and SHA-1 belong to one family.



Most hash functions have a similar iterative structure, which is based around what is termed a compression function [12]. In summary, the computation of the hash value for a message depends on what is called a chaining variable. At the start of hashing, this chaining variable has a fixed initial value which is specified as a part of the algorithm. The compression function is then used to update the value of this chaining variable in a suitably complex way under the action and influence of the part of the message being hashed. This process continues recursively, with the chaining variable being updated under the action of different parts of the message, until the entire message has been used. The final value of the chaining variable is then output as the hash value corresponding to that message. Later we will study some of the approaches to the basic construction block of a collision-resistant compression function.

Figure 1. The use of compression functions in an iterative hash function. MDx and SHAx follow the same design.

5.2 MD4.

MD4 is a message-digest algorithm developed by Rivest in 1990 that operates on 32-bit words. The message is padded to ensure that its length in bits plus 64 is divisible by 512. A 64-bit binary representation of the original length of the message is then concatenated to the message [17]. The message is processed in 512-bit blocks, and each block is processed in three distinct rounds.
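The iterative structure and the padding rule described above can be sketched as follows. The padding follows the MDx convention for byte-aligned messages (a single 1 bit, zero bits up to 64 bits short of a block boundary, then the 64-bit length, low-order byte first). The compression function `toy_compress` is an explicit stand-in built from SHA-256 and is not the MD4 or MD5 compression function; the all-zero initial value is likewise an assumption for illustration.

```python
import hashlib

BLOCK_BYTES = 64  # 512-bit blocks

def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function (assumption): 16-byte state, 64-byte block."""
    return hashlib.sha256(chaining + block).digest()[:16]

def pad(message: bytes) -> bytes:
    """MDx-style padding for a message given as whole bytes."""
    bit_length = (8 * len(message)) % 2**64
    padded = message + b"\x80"                        # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)     # stop 64 bits short of a block
    return padded + bit_length.to_bytes(8, "little")  # original length in bits

def iterated_hash(message: bytes, iv: bytes = bytes(16)) -> bytes:
    """Feed the padded message block by block through the compression function."""
    chaining = iv
    data = pad(message)
    for offset in range(0, len(data), BLOCK_BYTES):
        chaining = toy_compress(chaining, data[offset:offset + BLOCK_BYTES])
    return chaining  # final chaining value is the hash value

digest = iterated_hash(b"abc")
```

Because the final hash value is just the last chaining value, a collision in the compression function for a fixed chaining input immediately gives a collision for the whole construction, which is why the attacks target the compression function.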



Attacks on versions of MD4 with either the first or the last round missing were developed very quickly by den Boer, Bosselaers et al. [18]. It has also been shown [19] how collisions for the full version of MD4 can be found in under a minute on a typical PC.

5.3 MD5.

Some weaknesses that might lead to a compromise were discovered in MD4, so RSA had to improve it, and MD5 was born in 1991 (also designed by Rivest). It is basically MD4 with safety belts: while it is slightly slower than MD4, it is more secure [17]. The algorithm consists of four distinct rounds, whose design differs slightly from that of MD4. The message-digest size, as well as the padding requirements, remain the same.

Attacking MD5 is a much more involved proposition than attacking MD4, since it is a far more complicated algorithm to analyze.

5.4 Overview of the attacks on hash functions.

The attacks on the MDx family of hash algorithms are very similar. They can be summarized as finding a 512-bit message M such that $H(IV, M) = H(IV, M \circ \Delta)$, where H is the compression function and $\Delta$ is fixed [4]. As stated in [2], the complexity of the generic attack is $2^{n/2}$, where n = 128 or 160. The trick lies in choosing a good $\Delta$ and using techniques to find an M that takes advantage of the weaknesses of the compression function, bringing the complexity of the attack down to fewer than $2^{42}$ evaluations of the hash function.

In more detail, these attacks are divided into four stages [2]:

1. Choose $\Delta m_0, \ldots, \Delta m_{15}$. Here $\Delta$ stands for both xor and mod $2^{32}$ differences.
2. Choose a differential path $\Delta Q_0, \ldots, \Delta Q_{r-1}$, where r is the number of rounds (r = 48, 64 or 80).
3. Find a set of sufficient conditions on the message $M = (m_0, \ldots, m_{15})$ and the intermediate variables $Q_0, \ldots, Q_{r-1}$ that guarantee that the message pair $M, M' = (m_0 + \Delta m_0, \ldots, m_{15} + \Delta m_{15})$ follows the differential path $\Delta Q_0, \ldots, \Delta Q_{r-1}$.
4. Choose a message M such that all sufficient conditions hold.



An equivalent formulation is to say that each clause should have at least one literal that is true under the assignment. Such a clause is then said to be satisfied. If there is no assignment satisfying all clauses, the CNF is said to be unsatisfiable.

An example of what an instance of SAT looks like:

SAT is a typical search problem. We are given an instance I (that is, some input data

specifying the problem at hand, in this case a Boolean formula in a conjunctive normal

form), and we are asked to find a solution S (an object that meets a particular

specification, in this case an assignment that satisfies each clause). If no such solution

exists, we must say so.
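As a small illustration of SAT as a search problem, the sketch below checks a CNF instance by trying every assignment. The instance `example`, the signed-integer literal convention (3 means variable 3, -3 its negation) and the function name `brute_force_sat` are assumptions made for this example; exhaustive search takes time exponential in the number of variables and is shown only to make the definitions concrete.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as {variable: bool}, or None if none exists."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {v: bits[v - 1] for v in range(1, num_vars + 1)}
        # The formula is satisfied when every clause has at least one true literal.
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None

# Hypothetical instance: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
example = [[1, -2], [2, 3], [-1, -3]]
solution = brute_force_sat(example, 3)

# (x1) and (not x1) has no satisfying assignment, so the search must say so.
no_solution = brute_force_sat([[1], [-1]], 1)
```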

5.6 SAT-Solver Algorithms.

A wide variety of techniques have been developed for solving SAT instances, and all of them can be classified as either complete or approximate. Complete methods systematically examine the entire search space and either return a solution (if one exists) in bounded time or report that the formula is unsatisfiable. In this thesis we are going to focus on software that implements a complete method, although later we may analyze the behavior of some approximate methods.

5.6.1 DPLL Algorithm.

The original Davis-Putnam procedure was based on a resolution rule that eliminated the variables one by one and added all possible resolvents to the set of clauses; this was known as the DP method. Unfortunately, this procedure requires exponential space, so the resolution rule was quickly replaced with a splitting rule that divides the problem into two smaller subproblems. The resulting procedure is known as DPLL after its authors, Martin Davis, George Logemann and Donald Loveland, who published it in 1962 [25].

DPLL remains the basis of the fastest known algorithms for satisfiability testing that are not just sound but also complete. In summary, DPLL is depth-first search with backtracking and unit propagation.
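A minimal sketch of DPLL as summarized above (depth-first search with backtracking, a splitting rule, and unit propagation) might look like the following. It is a teaching sketch under simple assumptions: clauses are lists of signed integers, the branching variable is just the first literal of the first clause, and none of the optimizations of modern solvers such as MiniSAT are included. The names `simplify` and `dpll` are chosen for this example.

```python
def simplify(clauses, literal):
    """Make `literal` true: drop satisfied clauses, delete the opposite literal."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is satisfied
        result.append([lit for lit in clause if lit != -literal])
    return result

def dpll(clauses, assignment=None):
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    while True:
        if any(len(clause) == 0 for clause in clauses):
            return None  # empty clause: conflict, backtrack
        units = [clause[0] for clause in clauses if len(clause) == 1]
        if not units:
            break
        literal = units[0]  # unit propagation: this literal is forced
        assignment[abs(literal)] = literal > 0
        clauses = simplify(clauses, literal)
    if not clauses:
        return assignment  # every clause is satisfied
    literal = clauses[0][0]  # splitting rule: try the literal, then its negation
    for choice in (literal, -literal):
        result = dpll(simplify(clauses, choice),
                      {**assignment, abs(choice): choice > 0})
        if result is not None:
            return result
    return None

sat_formula = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
model = dpll(sat_formula)
unsat_result = dpll([[1, 2], [1, -2], [-1, 2], [-1, -2]])
```

Variables absent from the returned assignment are unconstrained and may take either value.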



With good data structures, we can implement unit propagation to take linear time in the

size of the input set of clauses.

5.6.2 Stochastic Local Search Algorithms.

Approximate SAT algorithms have gained widespread attention because they offer a computationally feasible approach to finding high-quality solutions to NP-hard problems in a scalable and efficient manner [26].

Stochastic local search (SLS) algorithms generally involve taking a candidate solution and performing some sort of perturbation that results in one or more new candidate solutions. An evaluation function is then used to determine which of the candidate solutions should be accepted. This kind of algorithm also includes two operations called intensification and diversification [27]. Intensification is a means of greedily improving solution quality within a small area of the search space in pursuit of a local optimum, while diversification helps to prevent stagnation by ensuring that the search does not get stuck in regions that contain only suboptimal solutions. Incorporating some form of randomness has proved to be an efficient diversification mechanism, while intensification can be achieved through a variety of techniques such as iterative improvement or the selection step in a genetic algorithm.
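The interplay of intensification and diversification can be illustrated with a small local search in the style of WalkSAT. This is a hedged sketch rather than a faithful reproduction of any published solver: the parameter names `max_flips`, `noise` and `seed`, their default values, and the flip-selection rule are assumptions chosen for readability. Being an approximate method, it can return None even when a solution exists.

```python
import random

def walksat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """Local search for a satisfying assignment; returns a dict or None."""
    rng = random.Random(seed)
    assign = {v: rng.choice([False, True]) for v in range(1, num_vars + 1)}

    def is_true(literal):
        return assign[abs(literal)] == (literal > 0)

    def unsatisfied():
        return [c for c in clauses if not any(is_true(l) for l in c)]

    def cost_after_flip(var):
        """Evaluation function: unsatisfied clauses if `var` were flipped."""
        assign[var] = not assign[var]
        cost = len(unsatisfied())
        assign[var] = not assign[var]
        return cost

    for _ in range(max_flips):
        broken = unsatisfied()
        if not broken:
            return assign
        clause = rng.choice(broken)
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # diversification: random step
        else:
            # intensification: greedy step on the evaluation function
            var = min((abs(l) for l in clause), key=cost_after_flip)
        assign[var] = not assign[var]
    return None  # gave up; says nothing about unsatisfiability

formula = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
found = walksat(formula, 3)
```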

5.7 MiniSAT.

MiniSAT was described in the paper "An Extensible SAT-solver" by Niklas Eén and Niklas Sörensson from Chalmers University of Technology in Sweden [4]. Given the growing number of problems encoded into SAT, they found that modifying an existing solver requires an understanding of both the problem domain and modern SAT techniques, which makes it difficult. For this reason, they developed a small, complete and efficient SAT solver and documented its implementation in sufficient detail to enable researchers around the world to construct their own solver in a very short time, in order to meet the needs of a particular application area.

The ideas behind MiniSAT are based on conflict-driven backtracking, watched literals and dynamic variable ordering. MiniSAT is implemented in C++. Later, we will analyze the internal algorithms of MiniSAT in more depth.
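MiniSAT, like most SAT solvers, reads its input in the DIMACS CNF text format: a header line `p cnf <variables> <clauses>` followed by one clause per line, each written as signed integers and terminated by 0. The helper below is a minimal sketch for producing such input from a clause list; the function name `to_dimacs` and the sample clauses are hypothetical.

```python
def to_dimacs(clauses, num_vars):
    """Serialize clauses (lists of signed integers) as DIMACS CNF text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    for clause in clauses:
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines) + "\n"

# Hypothetical instance: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
cnf_text = to_dimacs([[1, -2], [2, 3], [-1, -3]], 3)
```

The resulting text can be saved to a file and passed to the solver on the command line, typically as `minisat input.cnf output.txt`, where the second file receives the result.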



6. Bibliography already reviewed.

[2] Dejan Jovanovic and Predrag Janicic. Logical analysis of hash functions. Pages 200-215. Springer Verlag, 2005.

[3] RSA Laboratories - 2.1.6 What is a hash function?.

http://www.rsa.com/rsalabs/node.asp?id=2176.

[4] Ilya Mironov, Lintao Zhang. Applications of SAT Solvers to cryptanalysis of hash functions.

[6] SAT Competitions. http://www.satcompetition.org/.

[7] RFC 1186 (rfc1186) - MD4 Message Digest Algorithm. http://www.faqs.org/rfcs/rfc1186.html.

[8] RFC 1321 (rfc1321) - The MD5 Message-Digest Algorithm.

http://www.faqs.org/rfcs/rfc1321.html.

[9] RFC 3174 (rfc3174) - US Secure Hash Algorithm 1 (SHA1).

http://www.faqs.org/rfcs/rfc3174.html.

[11] MiniSat Page. http://minisat.se/.

[12] Ivan Damgard. Collision free hash functions and public key signature schemes. In David Chaum and Wyn L. Price, editors, Advances in Cryptology. Springer, 1988.

[14] Ivan Damgard. A design principle for hash functions. In advances in cryptology. Springer,

1990.

[15] Brassard, Gilles. One way hash functions and DES. Advances in Cryptology. Berlin: Springer-Verlag, 1990.

[16] The MD4 reference.

[18] Crypto FAQ RSA http://www.rsa.com/rsalabs/node.asp?id=2253

6. Bibliography partially reviewed.

[1] Menezes, A. et al. Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.

[5] Niklas Een and Niklas Sorensson. An extensible SAT Solver. SAT 2003.

[10] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding collisions in the full SHA-1.

[17] Ilya Mironov. Hash functions, theory, attacks and applications.

[19] B. den Boer and A. Bosselaers, An attack on the last two rounds of MD4, Advances in Cryptology - Crypto '91, Springer-Verlag (1992), 194-203.

[20] H. Dobbertin, Alf Swindles Ann, CryptoBytes (3) 1 (Autumn 1995).



[21] Gabbay, Dov and Christopher Hogger. Handbook of Logic in Artificial Intelligence and Logic

Programming. Oxford: Clarendon Press, 1993.

[22] I.P. Gent and T. Walsh, "The search for Satisfaction", Internal Report, Dept. of Computer

Science, University of Strathclyde, 1999

[23] Algorithms for the Satisfiability (SAT) Problem: A survey. J. Gu, P. W. Purdom, J. Franco, and

B. W. Wah, in "Satisfiability Problem: Theory and Applications", DIMACS Series in Discrete

Mathematics and Theoretical Computer Science, American Mathematical Society, 1997, pp. 19-152.

[24] Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani. Algorithms. McGrawHill.

[25] Davis, M., G. Logemann, and D. Loveland (1962, July). A machine program for theorem-

proving. Commun. ACM 5 (7), 394-397.

[26] Holger H. Hoos and Thomas Stützle: Stochastic local search: foundations and applications (2005).

[27] Li, C.M., and Anbulagan. Heuristics based on unit propagation for satisfiability problems. In

Proc. 15th IJCAI. 1997.

7. Bibliography to Review.

* Philip Hawkes, Michael Paddon, Gregory G. Rose: Musings on the Wang et al. MD5 Collision, Cryptology ePrint Archive, Report 2004/264, 13 October 2004.

* M.J.B. Robshaw, On Recent Results for MD2, MD4 and MD5. RSA Laboratories Bulletin, News and advice from RSA Laboratories. Number 4. November 12, 1996.

* Hans Dobbertin, The Status of MD5 after a Recent Attack. RSA Laboratories, CryptoBytes, The

technical newsletter of RSA laboratories, a division of RSA Data Security, INC. Number 2, Summer

1996.

* Ilya Mironov. Hash functions: Theory, attacks and applications. November 14, 2005.

* Ilya Mironov. Hash Functions: From Merkle-Damgard to Shoup. Computer Science Department,

Stanford University.

* Preneel Bart. Analysis and Design of Cryptographic Hash Functions. February 2003.

* Klima Vlastimil. Finding MD5 Collisions on a Notebook PC Using Multi-message Modifications.

March 31, 2005.

* Propositional Logic, Class Notes for CS264A, UCLA.

* Goldberg Evgueni and Yakov Novikov. BerkMin: a Fast and Robust Sat-Solver.



* Niklas Een and Armin Biere. Effective Preprocessing in SAT through Variable and Clause

Elimination.

* Marques-Silva Joao et al. GRASP: A Search Algorithm for Propositional Satisfiability.

* Irina Rish and Rina Dechter. Resolution versus Search: Two strategies for SAT.

* Fabio Massacci. Using WALK-SAT and Rel-SAT for Cryptographic Key Search.

Appendix 1.

MD4 Algorithm Description

We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of 8, and it may be arbitrarily large. We imagine the bits of the message written down as follows:

m_0 m_1 ... m_{b-1} .

The following five steps are performed to compute the message digest of the message.

Step 1. Append padding bits

The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512 (in which case 512 bits of padding are added).

Padding is performed as follows: a single "1" bit is appended to the message, and then enough zero bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512.

Step 2. Append length

A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)

At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.

Step 3. Initialize MD buffer



[B C D A 11 19]
[A B C D 12 3]
[D A B C 13 7]
[C D A B 14 11]
[B C D A 15 19]

[Round 2]
Let [A B C D i s] denote the operation

A = (A + g(B,C,D) + X[i] + 5A827999)



[D A B C 9 9]
[C D A B 5 11]
[B C D A 13 15]
[A B C D 3 3]
[D A B C 11 9]
[C D A B 7 11]
[B C D A 15 15]

Then perform the following additions:
A = A + AA
B = B + BB
C = C + CC
D = D + DD

(That is, each of the four registers is incremented by the value it had before this block was started.)

end /* of loop on i */

Step 5. Output

The message digest produced as output is A,B,C,D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.

MD5 Algorithm Description

We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of eight, and it may be arbitrarily large. We imagine the bits of the message written down as follows:

m_0 m_1 ... m_{b-1}

The following five steps are performed to compute the message digest of the message.

3.1 Step 1. Append Padding Bits

The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512.

Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.

3.2 Step 2. Append Length

A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)

At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.

3.3 Step 3. Initialize MD Buffer

A four-word buffer (A,B,C,D) is used to compute the message digest. Here each of A, B, C, D is a 32-bit register. These registers are initialized to the following values (in hexadecimal, low-order bytes first):

word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10

3.4 Step 4. Process Message in 16-Word Blocks

We first define four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word.

F(X,Y,Z) = XY v not(X) Z
G(X,Y,Z) = XZ v Y not(Z)
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X v not(Z))

In each bit position F acts as a conditional: if X then Y else Z. (The function F could have been defined using + instead of v since XY and not(X)Z will never have 1's in the same bit position.) It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, then each bit of F(X,Y,Z) will be independent and unbiased.

The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z, in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise "xor" or "parity" function of its inputs.
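The four auxiliary functions translate directly into bitwise operations on 32-bit words. The sketch below is an illustration in Python, where integers are unbounded, so a mask (the name `MASK` is an assumption of this example) keeps the complemented values within 32 bits. The sample words are arbitrary.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def F(x, y, z):
    return (x & y) | (~x & z & MASK)      # if x then y else z, per bit

def G(x, y, z):
    return (x & z) | (y & ~z & MASK)      # if z then x else y, per bit

def H(x, y, z):
    return x ^ y ^ z                      # bitwise parity

def I(x, y, z):
    return (y ^ (x | (~z & MASK))) & MASK

# Arbitrary sample words for checking the remarks in the text.
x, y, z = 0x0F0F1234, 0xDEADBEEF, 0x01234567

# F could use + instead of "or": the two terms never overlap in any bit.
f_with_plus = ((x & y) + (~x & z & MASK)) & MASK
```

For these sample words `f_with_plus` equals `F(x, y, z)`, matching the remark that XY and not(X)Z never have a 1 in the same bit position.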

This step uses a 64-element table T[1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix.
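The table definition can be checked numerically. The following sketch builds T from the formula above using Python's standard `math` module; index 0 is left unused (an assumption of this example) so that the indices match the 1-based table of the description.

```python
import math

# T[i] = integer part of 2^32 * abs(sin(i)), with i in radians, for i = 1..64.
T = [None] + [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

first_entry = f"{T[1]:08x}"
last_entry = f"{T[64]:08x}"
```

The first and last entries computed this way are d76aa478 and eb86d391, which agree with the constants listed in RFC 1321.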

Do the following:

/* Process each 16-word block. */



For i = 0 to N/16-1 do

/* Copy block i into X. */
For j = 0 to 15 do
Set X[j] to M[i*16+j].
end /* of loop on j */

/* Save A as AA, B as BB, C as CC, and D as DD. */
AA = A
BB = B
CC = C
DD = D

/* Round 1. */
/* Let [abcd k s i] denote the operation

a = b + ((a + F(b,c,d) + X[k] + T[i])



end /* of loop on i */

3.5 Step 5. Output

The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.

This completes the description of MD5. A reference implementation in C is given in the appendix.

4. Summary

The MD5 message-digest algorithm is simple to implement, and provides a "fingerprint" or message digest of a message of arbitrary length. It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2^64 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2^128 operations. The MD5 algorithm has been carefully scrutinized for weaknesses. It is, however, a relatively new algorithm and further security analysis is of course justified, as is the case with any new proposal of this sort.
