Cryptanalysis of Hash Functions using an approach based on SAT-solvers (Formal Proposal, first draft)


  • 8/14/2019 Cryptanalysis of Hash Functions using an approach based on SAT-solvers (Formal Proposal, first draft)


    Cryptanalysis of Hash Functions using an approach based on SAT-solvers.

    Jair Cazarin Villanueva 125535

    Dr. Mauricio Osorio

    February 2008


    1. Introduction.

    Cryptography is the study of mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication and data origin authentication [1]. In other words, cryptography is about the prevention and detection of cheating and other malicious activities. Cryptographic hash functions produce hash values, which concisely represent the longer messages or documents from which they were computed [2]. They are used for a variety of purposes, mainly in cryptography, so computer security depends heavily on their strength.

    Examples of hash functions are MD4 [7], MD5 [8] and the SHA family [9]. The main role of cryptographic hash functions is to provide message integrity checks and digital signatures [3]. A formal analysis of their robustness is therefore of utmost importance. Unfortunately, several standard cryptographic hash functions were broken in 2005 [4]. Breaking, in this case, means finding a way to efficiently produce different messages that a hash function maps to the same hash value. This compromises the security of the applications in which these functions are used. Other kinds of approaches therefore seem to be the next step toward a greater assurance of security.

    On the other hand, we have Boolean Satisfiability (SAT) solvers. These attack the problem of determining whether the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to true. SAT-solvers are used in a growing range of applications: an increasing number of problems that can be efficiently encoded into SAT are being successfully tackled by these programs [5], and the performance of their algorithms keeps improving [6]. Most of these applications still belong to the traditional domains of formal verification and artificial intelligence. Several applications of SAT solvers to cryptanalysis have been described in the literature, but these efforts had failed to produce any attacks of interest to cryptologists until the research of Ilya Mironov and Lintao Zhang [4].

    In that paper, they showed that some of the attacks on these hash functions can be automated by encoding them as CNF formulas that are within reach of modern SAT-solvers. This transformation delegates the most laborious part of the attack to the solver. It produced the first example of SAT-solver-aided cryptanalysis of a non-trivial cryptographic primitive.

    The strategy is based on the fact that the original attacks consist of several steps, each of which involves a lot of bit-tweaking and manual work. Even for the simplest function, as many as 122 Boolean conditions must be tracked. Mironov and Zhang found a way to transform these conditions into CNF formulas, which a SAT-solver then uses to automate certain parts of the attack. This obviates the need to compile tables of sufficient conditions and to design clever message-modification techniques.
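    As a small illustration of what such a transformation involves, the following Python sketch encodes one bit-level relation, c = a XOR b, as CNF clauses. This is the kind of constraint that appears when a hash step is written out bit by bit. The sketch uses the signed-integer notation of DIMACS files and checks the encoding against the full truth table. The helper names and variable numbering are illustrative assumptions; this is not the encoding used in [4].

```python
from itertools import product


def xor_clauses(a, b, c):
    """CNF clauses that force variable c to equal a XOR b.

    Literals are signed integers, as in DIMACS files:
    v means "v is true" and -v means "v is false".
    """
    return [
        [-a, -b, -c],  # forbids a=1, b=1, c=1
        [a, b, -c],    # forbids a=0, b=0, c=1
        [a, -b, c],    # forbids a=0, b=1, c=0
        [-a, b, c],    # forbids a=1, b=0, c=0
    ]


def satisfies(clauses, assignment):
    """True if every clause contains at least one literal made true by `assignment`."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# Exhaustive check: the clauses hold exactly when c == a XOR b.
clauses = xor_clauses(1, 2, 3)
for va, vb, vc in product([False, True], repeat=3):
    assignment = {1: va, 2: vb, 3: vc}
    assert satisfies(clauses, assignment) == (vc == (va != vb))
print("XOR encoding verified on all 8 assignments")
```

    Chaining gate encodings like this over every bit of every step is the general idea behind turning a compression function into a single CNF formula. Modular additions need additional clauses for the carry bits.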

    So far, research in this field has found ways to automate certain parts of these attacks. These approaches still need to be verified and tested in order to achieve complete automation and/or improvements in both areas. This can be pursued in two ways: by creating a new toolkit for cryptanalysts, or by improving SAT-solvers for specific cryptographic problems.

    2. Objectives.

    2.1 General Objective.

    To design and test this approach with different SAT-solvers in order to find the most suitable one. Then, to implement a semi-automatic testing tool that helps cryptanalysts find weaknesses in hash functions. Finally, to use this tool to analyze in more depth possible improvements to SAT-solver algorithms for this specific kind of problem.

    2.2 Specific Objectives.

    * Understand the theory and construction of hash functions, including their principles.
    * Study the MD4 and MD5 family of hash functions and how they work.
    * Understand the Boolean satisfiability problem, including its complexity and variations.
    * Install and use the MiniSAT SAT-solver.
    * Understand how attacks on hash functions work.
    * Study the process of discovering the differential paths for MD4 and MD5.
    * Work out how to automate the differential path via SAT-solvers.
    * Find one or two more SAT-solvers in order to compare their results with MiniSAT.
    * Generate full collisions for the MD4 and MD5 hash functions.

    3. Research Scope.

    This thesis focuses on the MDx family of hash functions designed by Ron Rivest. There are other state-of-the-art hash functions, such as SHA-0 and SHA-1, whose foundations share similar design principles. However, it is well known that the attack on SHA-1 is only theoretical [10], and generating a full collision for SHA-0 would require 3 million CPU hours using common SAT-solvers [4].

    Moreover, the first two stages of the attacks are usually done by hand or by applying heuristics that demand a lot of creativity. It is therefore not yet possible to develop a fully automatic attack, so we will replicate only the attacks already known in the literature.

    As for SAT-solvers, we will use MiniSAT [11] as the main solver. If time permits, we will also test two or more other SAT-solvers, to be chosen as the research advances.

    4. Hardware and Software.

    A simple personal computer with modest processing and storage capabilities is sufficient to run the different approaches and algorithms in this research. For this reason, a laptop with the following features will be used:

    * Dell XPS M1210, Intel Core 2 Duo at 2 GHz.
    * 3 GB of RAM.

    For the software, so far we have only considered MiniSAT [11], a minimalistic, open-source SAT-solver intended to help researchers and developers with SAT-related projects. We will probably also use SatELite, a CNF minimizer and preprocessor for MiniSAT. We chose MiniSAT for its key features, such as efficiency, ease of integration and ease of modification, but above all for its performance in the SAT competitions [6].

    5. Problem Statement.

    5.1 Theory of Hash Functions.

    Cryptographic hash functions, also known as one-way hash functions, are a major tool in modern cryptography.

    A hash function is a computationally efficient function mapping binary strings of arbitrary length to binary strings of some fixed length, called hash-values [1]. The basic idea is that a hash-value serves as a compact representative of an input string.

    In the cryptographic field, a hash function h is typically chosen to satisfy two properties. First, it is computationally infeasible to find two distinct inputs that hash to a common value, that is, inputs x ≠ y such that h(x) = h(y). This is called collision resistance and is recognized as the gold standard of security for hash functions [13]. Second, given a specific hash-value y, it is computationally infeasible to find an input x such that h(x) = y. This is known as preimage resistance.

    The formal definition of a cryptographic hash function is a mapping

    $$h: \{0,1\}^* \rightarrow \{0,1\}^n$$

    where $\{0,1\}^*$ denotes the set of bit strings of arbitrary length and $n$ is the fixed length of the hash value. The image $h(X)$ of some message $X \in \{0,1\}^*$ is called the hash value of $X$ [16].
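    As a concrete instance of this definition, the Python sketch below uses MD5 from the standard hashlib module as h, so n = 128. One caveat: hashlib works on byte strings, so inputs are limited to bit strings whose length is a multiple of 8. The wrapper name h is ours.

```python
import hashlib


def h(message: bytes) -> bytes:
    """MD5 viewed as a mapping from byte strings to 128-bit hash values."""
    return hashlib.md5(message).digest()


# Inputs of very different lengths all map to hash values of the same length n = 128.
for msg in [b"", b"abc", b"x" * 100_000]:
    digest = h(msg)
    print(f"input: {8 * len(msg):>7} bits -> hash value: {8 * len(digest)} bits, {digest.hex()}")
```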


    The most common cryptographic uses of hash functions are digital signatures and data integrity. With digital signatures, a long message is usually hashed and only the hash-value is signed; this is how hash functions are used in signature schemes [14]. The party receiving the message then hashes the received message and verifies that the received signature is correct for this hash-value. Note that the inability to find two messages with the same hash-value is a security requirement. Otherwise, the signature on one message's hash-value would be the same as that on another, allowing a signer to sign one message and, at a later point in time, claim to have signed another.

    In the case of data integrity, the hash-value corresponding to a particular input is computed at some point in time, and the integrity of this hash-value is protected in some manner. At a subsequent point in time, to verify that the input data has not been altered, the hash-value is recomputed using the input at hand and compared for equality with the original hash-value. Specific applications include virus protection and software distribution.
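    The data-integrity use just described can be sketched in a few lines of Python. This is an illustration under our own assumptions. The helper names are hypothetical, and SHA-256 from hashlib is used only as an example hash function. hmac.compare_digest performs the equality test because it compares in constant time.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Hash-value computed when the data is first stored or published."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Recompute the hash-value of the data at hand and compare it with the stored one."""
    return hmac.compare_digest(fingerprint(data), expected)


original = b"installer contents, version 1.0"
stored = fingerprint(original)           # this value itself must be protected (e.g. signed)
print(verify(original, stored))          # True: data unchanged
print(verify(original + b"!", stored))   # False: data altered
```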

    It is now important to distinguish between a weak one-way hash function and a strong one-way hash function, so let us define each one.

    A weak one-way hash function is a function H such that [14]:

    * H can be applied to an argument of any size.
    * H produces a fixed-size output.
    * Given H and y, it is easy to compute H(y); that is, H is computable in polynomial time.
    * Given H and a suitably chosen y, it is computationally infeasible to find y' ≠ y such that H(y') = H(y).

    A strong one-way hash function is a function H such that [14]:

    * H can be applied to an argument of any size.
    * H produces a fixed-size output (larger than that of a weak hash function).
    * Given H and y, it is easy to compute H(y).
    * Given H, it is computationally infeasible to find any pair y, y' such that y ≠ y' and H(y) = H(y').

    Strong hash functions have two main advantages over weak ones. They are easier to use in system design, because there are no preconditions on the choice of y. They also provide the full claimed level of security even when used repeatedly. In this thesis, we will consider only strong one-way hash functions.

    A different taxonomy for hash functions can also be found in the literature [17]:

    * Preimage resistant: given a hash value y, it is computationally infeasible to find a message x with h(x) = y.
    * Second-preimage resistant: given a message y, it is computationally infeasible to find a message x ≠ y with h(x) = h(y).
    * Collision resistant: it is computationally infeasible to find a collision, that is, a pair of two different messages y and y' with h(y) = h(y').
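    To make these notions tangible, the Python sketch below performs a brute-force second-preimage search against a deliberately weakened hash: the first 16 bits of MD5. The truncation and the helper names h_trunc and second_preimage are our own illustrative assumptions. With a full 128-bit output the same loop would be infeasible.

```python
import hashlib
from itertools import count


def h_trunc(message: bytes, bits: int) -> int:
    """Toy hash: the leading `bits` bits of MD5, small enough to attack by brute force."""
    digest = int.from_bytes(hashlib.md5(message).digest(), "big")
    return digest >> (128 - bits)


def second_preimage(y: bytes, bits: int):
    """Brute-force search for a message x != y with h_trunc(x) == h_trunc(y).

    Each candidate matches with probability 2**-bits, so about 2**bits
    evaluations are expected; for bits = 128 this is hopeless.
    """
    target = h_trunc(y, bits)
    for tries in count(1):
        x = b"candidate-%d" % tries
        if x != y and h_trunc(x, bits) == target:
            return x, tries


y = b"the original message"
x, tries = second_preimage(y, 16)
print(x, tries)
```

    With 16 bits, the search needs on the order of 2^16 = 65,536 hash evaluations. Every extra output bit doubles the expected work.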

    We can infer three correspondences. One-wayness is equivalent to preimage resistance. A weak one-way hash function is second-preimage resistant. A strong one-way hash function is collision resistant. These properties will be used later, when we discuss hash function construction.

    One of the first standard hash functions was MD4 (designed by Ron Rivest [7]), which was followed by an improved version called MD5. Then came the first NIST-approved hash function, SHA-0 (Secure Hash Algorithm) [9]. It adopted the general structure of MD4 and was replaced two years later by a new version, SHA-1 [4]. Currently, only two hash functions are in widespread use: MD5 and SHA-1.

    One of the first researchers to study the construction of hash functions was Ivan Damgard of Aarhus University in Denmark. His theory of collision-free hash function construction considers families of hash functions instead of a single hash function, in order to make a complexity-theoretic treatment possible [14]. In this sense, we can say that MD4, MD5, SHA-0 and SHA-1 belong to one family.


    Most hash functions have a similar iterative structure, which is based around what is termed a compression function [12]. In summary, the computation of the hash value for a message depends on what is called a chaining variable. At the start of hashing, this chaining variable has a fixed initial value, which is specified as part of the algorithm. The compression function is then used to update the value of this chaining variable in a suitably complex way, under the action and influence of the part of the message being hashed. This process continues recursively, with the chaining variable being updated under the action of different parts of the message, until the entire message has been used. The final value of the chaining variable is then output as the hash value corresponding to that message. Later we will study some of the approaches to the basic construction block of a collision-resistant compression function.

    Figure 1. The use of a compression function in an iterative hash function. MDx and SHAx follow the same design.

    5.2 MD4.

    MD4 is a message-digest algorithm developed by Rivest in 1990 that operates on 32-bit words. The message is padded to ensure that its length in bits plus 64 is divisible by 512. A 64-bit binary representation of the original length of the message is then concatenated to the message [17]. The message is processed in 512-bit blocks, and each block is processed in three distinct rounds.
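    The iterative structure of Figure 1, together with the MD4 padding rule, can be sketched in Python as follows. The compression function toy_compress is a hypothetical stand-in with no security value. Borrowing MD4's first initial word as the default IV is an arbitrary choice. Only the padding and the block-by-block chaining mirror the description above.

```python
import struct

BLOCK_BYTES = 64  # 512-bit message blocks, as in MD4


def md_pad(message: bytes) -> bytes:
    """MD4-style padding: a single 1 bit, zero bits up to 448 mod 512, then the
    64-bit (little-endian) length of the original message in bits."""
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", bit_len)


def toy_compress(chaining: int, block: bytes) -> int:
    """Hypothetical 32-bit compression function, for illustration only.
    It is NOT collision resistant."""
    for (word,) in struct.iter_unpack("<I", block):
        chaining = ((chaining ^ word) * 0x01000193) & 0xFFFFFFFF
        chaining = ((chaining << 5) | (chaining >> 27)) & 0xFFFFFFFF
    return chaining


def iterated_hash(message: bytes, iv: int = 0x67452301) -> int:
    """Iterate the compression function over the padded message, block by block."""
    chaining = iv  # fixed initial value, specified as part of the algorithm
    padded = md_pad(message)
    for start in range(0, len(padded), BLOCK_BYTES):
        chaining = toy_compress(chaining, padded[start:start + BLOCK_BYTES])
    return chaining  # the final chaining value is the hash value


print(len(md_pad(b"abc")), hex(iterated_hash(b"abc")))
```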


    Attacks on versions of MD4 with either the first or the last round missing were developed very quickly by den Boer, Bosselaers et al. [18]. Also, [19] has shown how collisions for the full version of MD4 can be found in under a minute on a typical PC.

    5.3 MD5.

    Some weaknesses that might lead to a compromise were discovered in MD4, so RSA had to improve it, and MD5 was born in 1991 (also designed by Rivest). It is basically MD4 with safety belts: it is slightly slower than MD4, but more secure [17]. The algorithm consists of four distinct rounds, which have a slightly different design from those of MD4. The message-digest size, as well as the padding requirements, remain the same.

    Attacking MD5 is a much more involved proposition than attacking MD4, since it is a far more complicated algorithm to analyze.

    5.4 Overview of the attacks on hash functions.

    The attacks on the MDx family of hash algorithms are very similar. They can be summarized as finding a 512-bit message $M$ such that $H(IV, M) = H(IV, M \circ \Delta)$. Here $H$ is the compression function, $IV$ is the initial chaining value, $\circ$ stands for either xor or word-wise addition modulo $2^{32}$, and the difference $\Delta$ is fixed [4]. As noted in [2], the complexity of a generic attack is $2^{n/2}$, where $n = 128$ or $160$ is the length of the hash value. The trick lies in choosing a good $\Delta$ and in using techniques that exploit weaknesses of the compression function to find an $M$ with fewer than $2^{n/2}$ evaluations of the hash function.
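    The generic bound can be observed experimentally on a toy scale. The Python sketch below runs a birthday search against the first 24 bits of MD5; this truncation is an assumption made only so that the search finishes quickly. The expected number of evaluations is about sqrt(π/2 · 2^24), roughly 5,000, far below the 2^24 needed to hit one specific value. h_trunc and birthday_collision are hypothetical helper names.

```python
import hashlib
from itertools import count


def h_trunc(message: bytes, bits: int) -> int:
    """Toy n-bit hash: the leading `bits` bits of MD5."""
    digest = int.from_bytes(hashlib.md5(message).digest(), "big")
    return digest >> (128 - bits)


def birthday_collision(bits: int):
    """Generic collision search: remember every hash value until one repeats.

    By the birthday paradox about 2**(bits/2) evaluations are expected,
    which is the bound a dedicated attack has to beat.
    """
    seen = {}
    for evaluations in count(1):
        m = b"msg-%d" % evaluations
        v = h_trunc(m, bits)
        if v in seen:
            return seen[v], m, evaluations
        seen[v] = m


m1, m2, evaluations = birthday_collision(24)
print(m1, m2, evaluations)
```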

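    In more detail, these attacks are divided into four stages [2]:

    1. Choose a message difference $\Delta = (\delta_0, \ldots, \delta_{15})$, one 32-bit word per message word. Here $\circ$ stands for both xor and addition modulo $2^{32}$.

    2. Choose a differential path $(\Delta_1, \ldots, \Delta_r)$, where $r$ is the number of steps ($r = 48$, $64$ or $80$ for MD4, MD5 and SHA-0/SHA-1, respectively).

    3. Find a set of sufficient conditions on the message $M = (m_0, \ldots, m_{15})$ and the intermediate variables $(q_1, \ldots, q_r)$. These conditions must guarantee that the message pair $M$, $M' = (m_0 \circ \delta_0, \ldots, m_{15} \circ \delta_{15})$ follows the differential path $(\Delta_1, \ldots, \Delta_r)$.

    4. Choose a message $M$ such that all sufficient conditions hold.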
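    Stage 1 mentions two notions of difference because they behave differently. The short Python sketch below (helper names are ours) shows that the same pair of 32-bit words can have a one-bit modular difference but a 17-bit xor difference, because adding 1 propagates a carry.

```python
MASK32 = 0xFFFFFFFF


def xor_diff(x: int, x_prime: int) -> int:
    """XOR difference of two 32-bit words."""
    return (x ^ x_prime) & MASK32


def mod_diff(x: int, x_prime: int) -> int:
    """Modular difference x' - x (mod 2**32)."""
    return (x_prime - x) & MASK32


def apply_mod_diff(x: int, delta: int) -> int:
    """Return x' = x + delta (mod 2**32)."""
    return (x + delta) & MASK32


x = 0x0000FFFF
x_prime = apply_mod_diff(x, 1)
print(hex(mod_diff(x, x_prime)))   # 0x1: a one-bit modular difference ...
print(hex(xor_diff(x, x_prime)))   # 0x1ffff: ... is a 17-bit xor difference (carries)
```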


    An equivalent formulation is to say that each clause should have at least one literal that is true under the assignment. Such a clause is then said to be satisfied. If there is no assignment satisfying all clauses, the CNF formula is said to be unsatisfiable.

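    An example of a SAT instance is the CNF formula $(x_1 \lor \lnot x_2) \land (\lnot x_1 \lor x_2 \lor x_3) \land \lnot x_3$. It is satisfied, for instance, by the assignment $x_1 = x_2 = \text{true}$, $x_3 = \text{false}$.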

    SAT is a typical search problem. We are given an instance I: some input data specifying the problem at hand, in this case a Boolean formula in conjunctive normal form. We are asked to find a solution S: an object that meets a particular specification, in this case an assignment that satisfies each clause. If no such solution exists, we must say so.
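    The search-problem view can be made concrete with the most naive possible solver, which simply tries every assignment. The Python sketch below uses the common convention, also used by the DIMACS format, of writing the literal x_i as the integer i and not(x_i) as -i. It is exponential in the number of variables and is shown only to illustrate the definition.

```python
from itertools import product


def satisfies(clauses, assignment):
    """A clause is satisfied if at least one of its literals is true."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause) for clause in clauses)


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {variable: bool}, or None if there is none.

    Tries all 2**num_vars assignments, so it only works for tiny instances.
    """
    for values in product([False, True], repeat=num_vars):
        assignment = {v: values[v - 1] for v in range(1, num_vars + 1)}
        if satisfies(clauses, assignment):
            return assignment
    return None


# (x1 or not x2) and (not x1 or x2 or x3) and (not x3), literals as signed integers
instance = [[1, -2], [-1, 2, 3], [-3]]
print(brute_force_sat(instance, 3))      # {1: False, 2: False, 3: False}
print(brute_force_sat([[1], [-1]], 1))   # None: unsatisfiable
```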

    5.6 SAT-Solver Algorithms.

    A wide variety of techniques have been developed to solve SAT instances, and all of them can be classified as either complete or approximate. Complete methods systematically explore the entire search space. In bounded time, they either find a solution (if one exists) or report that the formula is unsatisfiable. In this thesis we are going to focus on software that implements a complete method, but later we may also analyze the behavior of some approximate methods.

    5.6.1 DPLL Algorithm.

    The original Davis-Putnam procedure was based on a resolution rule that eliminated the variables one by one and added all possible resolvents to the set of clauses; this was known as the DP method. Unfortunately, this procedure requires exponential space. The resolution rule was therefore soon replaced with a splitting rule that divides the problem into two smaller subproblems. The resulting procedure was introduced in 1962 by Martin Davis, George Logemann and Donald Loveland [25], and is known as DPLL (Davis-Putnam-Logemann-Loveland).

    DPLL is both sound and complete, and it remains the basis of the fastest known complete satisfiability solvers. In summary, DPLL is depth-first search with backtracking and unit propagation.
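    A minimal Python sketch of DPLL, as summarized above, follows. It uses signed integers for literals and a naive branching rule: the first variable of the first open clause. Real solvers add clause learning and better heuristics, so this is only meant to show the structure of splitting, unit propagation and backtracking.

```python
def unit_propagate(clauses, assignment):
    """Assign the literal of every unit clause until nothing changes.

    Mutates `assignment`; returns the remaining (unsatisfied) clauses with
    false literals removed, or None if some clause became empty (a conflict).
    """
    changed = True
    while changed:
        changed = False
        remaining_clauses = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                                 # clause already satisfied
            free = [l for l in clause if abs(l) not in assignment]
            if not free:
                return None                              # all literals false: conflict
            if len(free) == 1:
                assignment[abs(free[0])] = free[0] > 0   # forced by the unit clause
                changed = True
            remaining_clauses.append(free)
        clauses = remaining_clauses
    return clauses


def dpll(clauses, assignment=None):
    """Depth-first search with backtracking and unit propagation.

    Returns a (possibly partial) satisfying assignment {variable: bool},
    or None if the formula is unsatisfiable.
    """
    assignment = dict(assignment or {})
    clauses = unit_propagate(clauses, assignment)
    if clauses is None:
        return None                       # conflict: backtrack
    if not clauses:
        return assignment                 # every clause is satisfied
    var = abs(clauses[0][0])              # naive choice of branching variable
    for value in (True, False):           # splitting rule: two subproblems
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None


print(dpll([[1, -2], [-1, 2, 3], [-3]]))
print(dpll([[1, 2], [1, -2], [-1, 2], [-1, -2]]))   # None: unsatisfiable
```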


    With good data structures, we can implement unit propagation to take linear time in the

    size of the input set of clauses.
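    One widely used data structure of this kind is the two-watched-literal scheme, which MiniSAT also relies on. The Python sketch below is a simplified version that rests on several assumptions. It has no backtracking, and it assumes every clause has at least two literals; unit clauses can simply be passed to assign. The class and method names are hypothetical.

```python
from collections import defaultdict


class WatchedPropagator:
    """Unit propagation with two watched literals per clause (simplified sketch:
    no backtracking, and every clause is assumed to have at least two literals).

    A clause is only visited when one of its two watched literals becomes false,
    so clauses that are not affected by an assignment cost nothing.
    """

    def __init__(self, clauses):
        self.clauses = [list(c) for c in clauses]
        self.watches = defaultdict(list)      # literal -> indices of clauses watching it
        self.value = {}                       # variable -> bool
        for i, clause in enumerate(self.clauses):
            for lit in clause[:2]:            # watch the first two literals
                self.watches[lit].append(i)

    def lit_value(self, lit):
        """True/False for an assigned literal, None for an unassigned one."""
        v = self.value.get(abs(lit))
        return None if v is None else v == (lit > 0)

    def assign(self, lit):
        """Make `lit` true and propagate; return False if a conflict is found."""
        queue = [lit]
        while queue:
            lit = queue.pop()
            current = self.lit_value(lit)
            if current is False:
                return False                  # the literal is already false: conflict
            if current is True:
                continue
            self.value[abs(lit)] = lit > 0
            false_lit = -lit                  # this literal has just become false
            for i in list(self.watches[false_lit]):
                clause = self.clauses[i]
                if clause[0] == false_lit:    # keep the false watch in position 1
                    clause[0], clause[1] = clause[1], clause[0]
                if self.lit_value(clause[0]) is True:
                    continue                  # clause is already satisfied
                for k in range(2, len(clause)):
                    if self.lit_value(clause[k]) is not False:
                        # move the watch to a literal that is not false
                        clause[1], clause[k] = clause[k], clause[1]
                        self.watches[false_lit].remove(i)
                        self.watches[clause[1]].append(i)
                        break
                else:
                    queue.append(clause[0])   # no replacement: clause is unit on clause[0]
        return True


p = WatchedPropagator([[1, 2, 3], [-1, 2], [-2, 3]])
print(p.assign(1), p.value)
```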

    5.6.2 Stochastic Local Search Algorithms.

    Approximate SAT algorithms have gained widespread attention because they offer a computationally feasible approach to finding high-quality solutions to NP-hard problems in a scalable and efficient manner [26].

    SLS (stochastic local search) algorithms generally involve taking a candidate solution and performing some sort of perturbation, which results in one or more new candidate solutions. An evaluation function is then used to determine which of the candidate solutions should be accepted. These algorithms also include two operations called intensification and diversification [27]. Intensification greedily improves solution quality within a small area of the search space in search of a local optimum. Diversification helps to prevent stagnation by ensuring that the search does not remain confined to regions that contain only suboptimal solutions. Incorporating some form of randomness has proved to be an efficient diversification mechanism. Intensification can be achieved through a variety of techniques, such as iterative improvement or the selection step in a genetic algorithm.
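    The interplay of intensification and diversification can be illustrated with a WalkSAT-style procedure, one well-known SLS algorithm for SAT. In this Python sketch, the greedy step picks the flip that leaves the fewest unsatisfied clauses. This is a simplification of WalkSAT's usual break-count rule. The function name and the default parameters (max_flips, noise, seed) are our own assumptions.

```python
import random


def walksat(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """WalkSAT-style stochastic local search (a sketch).

    Starts from a random assignment and repeatedly flips one variable of a
    randomly chosen unsatisfied clause: a random variable with probability
    `noise` (diversification), otherwise the variable whose flip leaves the
    fewest unsatisfied clauses (intensification). Returns a satisfying
    assignment, or None if none is found: the method is incomplete.
    """
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}

    def unsatisfied():
        return [c for c in clauses if not any(assign[abs(l)] == (l > 0) for l in c)]

    def score_after_flip(var):
        assign[var] = not assign[var]      # tentative flip
        score = len(unsatisfied())
        assign[var] = not assign[var]      # undo
        return score

    for _ in range(max_flips):
        unsat = unsatisfied()
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))                              # random walk step
        else:
            var = min((abs(l) for l in clause), key=score_after_flip)  # greedy step
        assign[var] = not assign[var]
    return assign if not unsatisfied() else None


instance = [[1, -2], [-1, 2, 3], [-3], [2, 3, 4]]
print(walksat(instance, 4))
```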

    5.7 MiniSAT.

    MiniSAT was described in the paper "An Extensible SAT-solver" by Niklas Een and Niklas Sorensson from Chalmers University of Technology in Sweden [5]. Because of the growing number of problems encoded into SAT, they found that modifying an existing solver was very difficult, since it requires an understanding of both the problem domain and modern SAT techniques. For this reason, they developed a small, complete and efficient SAT-solver. They described it in enough implementation detail to enable researchers around the world to build their own solvers in a very short time, tailored to the needs of a particular application area.

    The ideas behind MiniSAT are based on conflict-driven backtracking, watched literals and dynamic variable ordering. MiniSAT was implemented in C++. Later, we will analyze MiniSAT's internal algorithms in more depth.
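    In practice, MiniSAT reads problems in the DIMACS CNF format and, when invoked as minisat <input-file> <result-output-file>, writes its answer to the second file. The Python sketch below produces DIMACS text and parses a result file. The result-file layout ('SAT' followed by the model terminated by 0, or 'UNSAT') and the exit codes 10/20 are our understanding of MiniSAT's behavior and should be checked against the installed version. run_minisat is a hypothetical helper and is not called here.

```python
import os
import subprocess
import tempfile


def to_dimacs(clauses, num_vars):
    """Serialize clauses (lists of signed integers) in DIMACS CNF format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(str(l) for l in clause) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


def parse_minisat_result(text):
    """Parse a MiniSAT result file, assumed to contain either 'UNSAT' or
    'SAT' followed by the model as signed literals terminated by 0.
    Returns None for anything other than 'SAT'."""
    tokens = text.split()
    if not tokens or tokens[0] != "SAT":
        return None
    return {abs(int(t)): int(t) > 0 for t in tokens[1:] if t != "0"}


def run_minisat(clauses, num_vars, binary="minisat"):
    """Hypothetical wrapper; needs a `minisat` executable on the PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        cnf_path = os.path.join(tmp, "problem.cnf")
        out_path = os.path.join(tmp, "result.txt")
        with open(cnf_path, "w") as f:
            f.write(to_dimacs(clauses, num_vars))
        # Assumed: MiniSAT exits with 10 (SAT) or 20 (UNSAT), so a non-zero
        # exit status is not treated as an error here.
        subprocess.run([binary, cnf_path, out_path], capture_output=True)
        with open(out_path) as f:
            return parse_minisat_result(f.read())


print(to_dimacs([[1, -2], [-1, 2, 3], [-3]], 3))
```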


    6. Bibliography already reviewed.

    [2] Dejan Jovanovic and Predrag Janicic. Logical analysis of hash functions. Pages 200-215. Springer Verlag, 2005.

    [3] RSA Laboratories - 2.1.6 What is a hash function?.

    http://www.rsa.com/rsalabs/node.asp?id=2176.

    [4] Ilya Mironov, Lintao Zhang. Applications of SAT Solvers to cryptanalysis of hash functions.

    [6] SAT Competitions. http://www.satcompetition.org/.

    [7] RFC 1186 (rfc1186) - MD4 Message Digest Algorithm. http://www.faqs.org/rfcs/rfc1186.html.

    [8] RFC 1321 (rfc1321) - The MD5 Message-Digest Algorithm.

    http://www.faqs.org/rfcs/rfc1321.html.

    [9] RFC 3174 (rfc3174) - US Secure Hash Algorithm 1 (SHA1).

    http://www.faqs.org/rfcs/rfc3174.html.

    [11] MiniSat Page. http://minisat.se/.

    [12] Ivan Damgard. Collision free hash functions and public key signature schemes. In David Chaum and Wyn L. Price, editors, Advances in Cryptology. Springer, 1988.

    [14] Ivan Damgard. A design principle for hash functions. In Advances in Cryptology. Springer, 1990.

    [15] Brassard, Gilles. One way hash functions and DES. Advances in Cryptology. Berlin: Springer-Verlag, 1990.

    [16] MD4 reference.

    [18] Crypto FAQ RSA http://www.rsa.com/rsalabs/node.asp?id=2253

    6. Bibliography partially reviewed.

    [1] Menezes, A. et al. Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.

    [5] Niklas Een and Niklas Sorensson. An extensible SAT Solver. SAT 2003.

    [10] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding collisions in the full SHA-1.

    [17] Ilya Mironov. Hash functions, theory, attacks and applications.

    [19] B. den Boer and A. Bosselaers, An attack on the last two rounds of MD4, Advances in Cryptology - Crypto '91, Springer-Verlag (1992), 194-203.

    [20] H. Dobbertin, Alf Swindles Ann, CryptoBytes (3) 1 (Autumn 1995).


    [21] Gabbay, Dov and Christopher Hogger. Handbook of Logic in Artificial Intelligence and Logic

    Programming. Oxford: Clarendon Press, 1993.

    [22] I.P. Gent and T. Walsh, "The search for Satisfaction", Internal Report, Dept. of Computer

    Science, University of Strathclyde, 1999

    [23] Algorithms for the Satisfiability (SAT) Problem: A survey. J. Gu, P. W. Purdom, J. Franco, and

    B. W. Wah, in "Satisfiability Problem: Theory and Applications", DIMACS Series in Discrete

    Mathematics and Theoretical Computer Science, American Mathematical Society, 1997, pp. 19-152.

    [24] Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani. Algorithms. McGrawHill.

    [25] Davis, M., G. Logemann, and D. Loveland (1962, July). A machine program for theorem-

    proving. Commun. ACM 5 (7), 394-397.

    [26] Holger H. Hoos and Thomas Stützle: Stochastic local search: foundations and applications (2005).

    [27] Li, C.M., and Anbulagan. Heuristics based on unit propagation for satisfiability problems. In

    Proc. 15th IJCAI. 1997.

    7. Bibliography to Review.

    * Philip Hawkes, Michael Paddon, Gregory G. Rose: Musings on the Wang et al. MD5 Collision, Cryptology ePrint Archive, Report 2004/264, 13 October 2004.

    * M.J.B. Robshaw, On Recent Results for MD2, MD4 and MD5. RSA Laboratories Bulletin, News and advice from RSA Laboratories. Number 4. November 12, 1996.

    * Hans Dobbertin, The Status of MD5 after a Recent Attack. RSA Laboratories, CryptoBytes, The

    technical newsletter of RSA laboratories, a division of RSA Data Security, INC. Number 2, Summer

    1996.

    * Ilya Mironov. Hash functions: Theory, attacks and applications. November 14, 2005.

    * Ilya Mironov. Hash Functions: From Merkle-Damgard to Shoup. Computer Science Department,

    Stanford University.

    * Preneel Bart. Analysis and Design of Cryptographic Hash Functions. February 2003.

    * Klima Vlastimil. Finding MD5 Collisions on a Notebook PC Using Multi-message Modifications.

    March 31, 2005.

    * Propositional Logic, Class Notes for CS264A, UCLA.

    * Goldberg Evgueni and Yakov Novikov. BerkMin: a Fast and Robust Sat-Solver.


    * Niklas Een and Armin Biere. Effective Preprocessing in SAT through Variable and Clause

    Elimination.

    * Marques-Silva Joao et al. GRASP: A Search Algorithm for Propositional Satisfiability.

    * Irina Rish and Rina Dechter. Resolution versus Search: Two strategies for SAT.

    * Fabio Massacci. Using WALK-SAT and Rel-SAT for Cryptographic Key Search.

    Appendix 1.

    MD4 Algorithm Description

    We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of 8, and it may be arbitrarily large. We imagine the bits of the message

    written down as follows:

    m_0 m_1 ... m_{b-1} .

    The following five steps are performed to compute the message digest of the message.

    Step 1. Append padding bits

    The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512 (in which case 512 bits of padding are added).

    Padding is performed as follows: a single "1" bit is appended to the message, and then enough zero bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512.

    Step 2. Append length

    A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)

    At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.

    Step 3. Initialize MD buffer


    [B C D A 11 19]
    [A B C D 12 3]  [D A B C 13 7]  [C D A B 14 11]  [B C D A 15 19]

    Round 2. Let [A B C D i s] denote the operation

    A = (A + g(B,C,D) + X[i] + 5A827999) <<< s


    [D A B C 9 9]  [C D A B 5 11]  [B C D A 13 15]
    [A B C D 3 3]  [D A B C 11 9]  [C D A B 7 11]  [B C D A 15 15]

    Then perform the following additions:

    A = A + AA
    B = B + BB
    C = C + CC
    D = D + DD

    (That is, each of the four registers is incremented by the value it had before this block was started.)

    end /* of loop on i */

    Step 5. Output

    The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.
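    The following Python sketch assembles Steps 1 to 5 into a complete MD4 for study purposes. The message-word orders, shift amounts and constants of rounds 1 to 3 are taken from the MD4 specification (RFC 1320, which obsoletes RFC 1186 [7]). The register rotation a, b, c, d = d, a, b, c is our own implementation choice. It cycles through the operation patterns [A B C D ...], [D A B C ...], [C D A B ...] and [B C D A ...] used in the listings above.

```python
import struct

MASK = 0xFFFFFFFF


def rotl(x, s):
    """Rotate a 32-bit word left by s bits (the <<< operation)."""
    return ((x << s) | (x >> (32 - s))) & MASK


def md4(message: bytes) -> str:
    # Steps 1-2: padding and 64-bit length, low-order word first (little-endian).
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack("<Q", bit_len)

    # Step 3: initialize the MD buffer (01 23 45 67 ... read as little-endian words).
    A, B, C, D = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    f = lambda x, y, z: (x & y) | (~x & z)             # round 1
    g = lambda x, y, z: (x & y) | (x & z) | (y & z)    # round 2
    h = lambda x, y, z: x ^ y ^ z                      # round 3
    rounds = [
        (f, 0x00000000, range(16), (3, 7, 11, 19)),
        (g, 0x5A827999, (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
        (h, 0x6ED9EBA1, (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
    ]

    # Step 4: process each 16-word block.
    for off in range(0, len(msg), 64):
        X = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = A, B, C, D
        for func, const, order, shifts in rounds:
            for j, k in enumerate(order):
                # [a b c d k s]: a = (a + func(b,c,d) + X[k] + const) <<< s
                a = rotl((a + func(b, c, d) + X[k] + const) & MASK, shifts[j % 4])
                a, b, c, d = d, a, b, c   # the next operation updates the next register
        A, B, C, D = (A + a) & MASK, (B + b) & MASK, (C + c) & MASK, (D + d) & MASK

    # Step 5: output A, B, C, D, low-order byte first.
    return struct.pack("<4I", A, B, C, D).hex()


print(md4(b"abc"))
```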

    MD5 Algorithm Description

    We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of eight, and it may be arbitrarily large. We imagine the bits of the message written down as follows:

    m_0 m_1 ... m_{b-1}

    The following five steps are performed to compute the message digest of the message.

    3.1 Step 1. Append Padding Bits

    The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512.

    Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.

    3.2 Step 2. Append Length

    A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b is greater than 2^64, then only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words and appended low-order word first in accordance with the previous conventions.)

    At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.

    3.3 Step 3. Initialize MD Buffer

    A four-word buffer (A,B,C,D) is used to compute the message digest. Here each of A, B, C, D is a 32-bit register. These registers are initialized to the following values in hexadecimal (low-order bytes first):

    word A: 01 23 45 67
    word B: 89 ab cd ef
    word C: fe dc ba 98
    word D: 76 54 32 10

    3.4 Step 4. Process Message in 16-Word Blocks

    We first define four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word.

    F(X,Y,Z) = XY v not(X) Z
    G(X,Y,Z) = XZ v Y not(Z)
    H(X,Y,Z) = X xor Y xor Z
    I(X,Y,Z) = Y xor (X v not(Z))

    In each bit position F acts as a conditional: if X then Y else Z. (The function F could have been defined using + instead of v since XY and not(X)Z will never have 1's in the same bit position.) It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, then each bit of F(X,Y,Z) will be independent and unbiased.

    The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z. They do so in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that the function H is the bit-wise "xor" or "parity" function of its inputs.

    This step uses a 64-element table T[1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix.
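    The auxiliary functions and the sine table translate directly into Python, as in the sketch below. Storing the 1-based table T[1 ... 64] in a 0-based Python list is our own convention. The first and last table entries shown in the comment are the values listed in the MD5 specification.

```python
import math

MASK = 0xFFFFFFFF


def F(x, y, z):
    return (x & y) | (~x & z)          # bitwise "if x then y else z"


def G(x, y, z):
    return (x & z) | (y & ~z)          # bitwise "if z then x else y"


def H(x, y, z):
    return x ^ y ^ z                   # bitwise parity


def I(x, y, z):
    return y ^ (x | (~z & MASK))       # y xor (x or not z), kept to 32 bits


# T[i] = integer part of 4294967296 * abs(sin(i)), i in radians, for i = 1 .. 64.
# The Python list is 0-based, so T[0] here is T[1] in the text.
T = [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

print(hex(T[0]), hex(T[63]))   # 0xd76aa478 0xeb86d391
```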

    Do the following:

    /* Process each 16-word block. */


    For i = 0 to N/16-1 do

    /* Copy block i into X. */
    For j = 0 to 15 do
      Set X[j] to M[i*16+j].
    end /* of loop on j */

    /* Save A as AA, B as BB, C as CC, and D as DD. */
    AA = A
    BB = B
    CC = C
    DD = D

    /* Round 1. */
    /* Let [abcd k s i] denote the operation
    a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */


    end /* of loop on i */

    3.5 Step 5. Output

    The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.

    This completes the description of MD5. A reference implementation in C is given in the appendix.
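    As a compact alternative to the C reference implementation, here is a Python sketch of the same five steps, checked against Python's built-in hashlib.md5. The message-index formulas (5i + 1, 3i + 5 and 7i, modulo 16) and the shift table are a common compact way of writing the operation lists of the MD5 specification [8]. They are our restatement, not text from this description.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
T = [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]
SHIFTS = ((7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21))


def rotl(x, s):
    """Rotate a 32-bit word left by s bits (the <<< operation)."""
    return ((x << s) | (x >> (32 - s))) & MASK


def md5(message: bytes) -> str:
    # Steps 1-2: padding, then the 64-bit length, low-order word first.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    msg = message + b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack("<Q", bit_len)

    # Step 3: initialize the MD buffer.
    A, B, C, D = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Step 4: process each 16-word block in four rounds of 16 operations.
    for off in range(0, len(msg), 64):
        X = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = A, B, C, D
        for i in range(64):
            rnd = i // 16
            if rnd == 0:
                fval, k = (b & c) | (~b & d), i                  # F, words in order
            elif rnd == 1:
                fval, k = (b & d) | (c & ~d), (5 * i + 1) % 16   # G
            elif rnd == 2:
                fval, k = b ^ c ^ d, (3 * i + 5) % 16            # H
            else:
                fval, k = c ^ (b | (~d & MASK)), (7 * i) % 16    # I
            # [abcd k s i]: a = b + ((a + f(b,c,d) + X[k] + T[i]) <<< s)
            a = (b + rotl((a + fval + X[k] + T[i]) & MASK, SHIFTS[rnd][i % 4])) & MASK
            a, b, c, d = d, a, b, c
        A, B, C, D = (A + a) & MASK, (B + b) & MASK, (C + c) & MASK, (D + d) & MASK

    # Step 5: output A, B, C, D, low-order byte first.
    return struct.pack("<4I", A, B, C, D).hex()


for m in (b"", b"abc", b"a" * 1000):
    print(md5(m) == hashlib.md5(m).hexdigest())
```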

    4. Summary

    The MD5 message-digest algorithm is simple to implement, and provides a "fingerprint" or message digest of a message of arbitrary length. It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2^64 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2^128 operations. The MD5 algorithm has been carefully scrutinized for weaknesses. It is, however, a relatively new algorithm and further security analysis is of course justified, as is the case with any new proposal of this sort.