Catena - eprint.iacr.orgCatena: A Memory-Consuming Password-Scrambling Framework Christian Forler⋆, Stefan Lucks, and Jakob Wenzel Bauhaus-Universit¨at Weimar, Germany {Christian.Forler,

Catena: A Memory-Consuming Password-Scrambling

Framework

Christian Forler⋆, Stefan Lucks, and Jakob Wenzel

Bauhaus-Universitat Weimar, GermanyChristian.Forler, Stefan.Lucks, [email protected]

Abstract. It is a common wisdom that servers should store the one-way hash of their clients’passwords, rather than storing the password in the clear. In this paper we introduce a set of func-tional properties a key-derivation function (password scrambler) should have. Unfortunately,none of the existing algorithms satisfies our requirements and therefore, we introduce a noveland provably secure password scrambling framework (PSF) called Catena. Furthermore, we in-troduce two instantiations of Catena based on a memory-consuming one-way functions. Thus,Catena excellently thwarts massively parallel attacks on cheap memory-constrained hardware,such as recent graphical processing units (GPUs). Additionally, we show that Catena is alsoa good key-derivation function, since – in the random oracle model – it is indistinguishablefrom a random function. Furthermore, the memory-access pattern of both instantiations ispassword-independent and therefore, Catena provides resistance against cache-timing attacks.Moreover, Catena is the first PSF which naturally supports (1) client-independent updates(the server can increase the security parameters and update the password hash without userinteraction or knowing the password), (2) an optional server relief protocol (saving the server’sresources at the cost of the client), and (3) a variant Catena-KG for secure key derivation (tosecurely generate many cryptographic keys of arbitrary lengths such that compromising somekeys does not help to break others). We denote a password scrambler as a PSF with a certaininstantiation.

Keywords: password hashing, memory-hard, cache-timing attack

⋆ The research leading to these results received funding from the Sillicon Valley Community Foundation,under the Cisco Systems project Misuse Resistant Authenticated Encryption for Complex and Low-EndSystems (MIRACLE).

Table of Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Frequently used Password Scramblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Properties of Modern Password Scramblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

4 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 Catena – A Memory-Hard Password-Scrambling Framework . . . . . . . . . . . . . . . . . . . 11

6 Security Analysis of the Catena Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

7 Instantiations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.1 Catena-BRG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

7.2 Catena-DBG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

8 Security Analysis of Catena-BRG and Catena-DBG . . . . . . . . . . . . . . . . . . . . . . . . 18

8.1 Resistance Against Side-Channel Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

8.2 Memory-Hardness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

8.3 Pseudorandomness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

9 Implementation and Benchmarking of Catena . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

10 Catena for Proof of Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

11 Catena in Different Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

11.1 Using Catena with Multiple Number of Cores . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

11.2 Application of Server Relief . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

12 The Key-Derivation Function Catena-KG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

13 Conclusion and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

14 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

A The scrypt Password Scrambler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

A.1 Brief Analysis of ROMix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

A.2 Cache-Timing Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

A.3 The Garbage-Collector Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

A.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

1 Introduction

Passwords1 are user-memorizable secrets, commonly used for user authentication and cryp-tographic key derivation. Typical (user-chosen) passwords often suffer from low entropy andcan be attacked by trying out all possible password candidates in likelihood-order until theright one has been found. In some scenarios, when a password is used to open an interactivesession, the security of password-based authentication and key derivation can be enhanced bydedicated cryptographic protocols defeating “offline” password guessing, see [6] for an earlyexample. Otherwise, the best protection are cryptographic password scramblers performing“key stretching”. The basic idea of such schemes is to use an intentionally slow one-wayfunction for hashing a password. Therefore, the password processing takes some time forboth kinds of users legitimate ones and adversaries. A good password scrambler PS has tosatisfy at least the following basic conditions:

(1) Given a password pwd , computing PS(pwd) should be “fast enough” for the user.

1 In our context, “passphrases” and “personal identification numbers” (PINs) are also “passwords”.

(2) Computing PS(pwd) should be “as slow as possible”, without contradicting Condi-tion (1).

(3) Given y = PS(pwd), there must be no significantly faster way to test q password candi-dates x1, x2, . . . , xq for PS(xi) = y than by actually computing PS(xi) for each xi.

Traditionally, most password scramblers realize “as slow as possible” by iterating a cryp-tographic primitive (a block cipher or a hash function) many times, e.g., md5crypt [25] orsha512crypt [13]. However, an adversary who happens to have b computing units (“cores”)can easily try out b different passwords in parallel. With recent technological trends, such asthe availability of graphical processing units (GPUs) with hundreds of cores [38], the ques-tion of how to slow down such adversaries becomes a pressing one. Memory is expensive;so, a typical GPU or other cheap and massively-parallel hardware with lots of cores canonly have a limited amount of memory for each single core. More importantly, each core willhave only a very limited amount of fast (“cache”) memory. So the way to prevent b-coreadversaries from gaining some close-to-b-times speed-up is by making a password scram-bler PS not only intentionally slow on standard sequential computers, but also intentionallymemory-consuming. Thus, any adversary using b cores in parallel with less than about btimes the memory of a sequential implementation must experience a strong slow-down. Thefirst password scrambler that took this condition into account was scrypt [43]. To the bestof our knowledge, it is also the only one – up to now.

However, a memory-consuming password scrambler may suffer from new problems: (1)If the memory-access pattern depends on the password, and the adversary can observe thatpattern, this may open the way to another kind of shortcut attack. For example, a spyprocess, running on the same machine as the password scrambler (without access to theinternal memory of the password scrambler) may gather information about the passwordscrambler’s memory-access pattern by measuring cache-timings. This information can beused to greatly speed-up massively parallel attacks with low memory for each core. (2)Assume that the memory which is directly derived from the secret input (password) is rarelyor never overwritten. Then, a malicious garbage collector can built an efficient passwordcandidate filter by using the knowledge gained from this memory.

To analyze memory-consuming algorithms regarding to its memory-hardness, one canapply the so called “pebble game” approach, which dates back to the early days of TheoreticalComputer Science [10,23,26,31,42].

Background. As observed by Wilkes in the late 1960s [57], storing plain authenticationpasswords is insecure. About 10 years later, the UNIX system integrated some of Wilkesideas [35] by deploying the DES-based one-way encryption function crypt, to “encrypt” agiven password. Actually, there is no efficient way to recover the original password from theresult of the “encryption”, i.e., crypt is a one-way hash function, or a password scrambler,as we call it.

Since the introduction of crypt, storing the hash of a password and avoiding to store theplain password itself has become the minimum standard for secure password-based user au-thentication. But, even as late as 2012, major players like Yahoo and CSDN (China SoftwareDeveloper Network) seem to store plain user-passwords [32].

Two important innovations from crypt were key stretching and salts. Key stretching isthe answer to the typically low entropy of user-chosen passwords: The password scrambler isintentionally slow, but not too slow for the regular operation, e.g., a password-based log-in.This makes exhaustively searching through all likely passwords more expensive.

3

A salt refers to an additional random input value for the password scrambler, storedtogether with the password hash. It enables a password scrambler to derive lots of differentpassword hashes from a single password like an initialization vector enables an encryptionscheme to derive lots of different ciphertexts from a single plaintext. Since the salt must bechosen uniformly at random, it is most likely that different users have different salts. Thus,it hinders against attacks where password hashes from many different users are known tothe adversary, e.g., against the usage of rainbow tables [39].

Note that there are further ways to thwart adversaries, i.e., ways to perform key stretch-ing. One is to keep p bits of the salt secret, turning them into pepper [33]. Both adversariesand legitimate users have to try out all 2p values the pepper can have (or 2p−1 on the av-erage). Note that a careless implementation of this approach could leak a few bits of thepepper via timing information, when trying out all possible values in a specific order. Thus,a better approach would be to start at a random value and wrap around at 2p. Kelsey et al.[27] analyzed another key stretching approach where a cryptographic operation is iterated ntimes. Boyen proposed in [8] a user-defined implicit choice for n by iterating until the userpresses a “halt” button. Further, the proposed technique allows the user to forget the valueof n, too.

Contribution. In this paper we introduce the password scrambling framework (PSF)Catena (Latin for “chain”, due the sequential structure of its core function). Catena

provides innovative features like client-independent update, i.e., if the security parameter ofthe password scrambler is increased, the new hash value can be computed without requiringuser interaction. For Catena, this can for example be done by increasing the security pa-rameter garlic or by turning some bits of the salt into pepper. The notion of garlic reflectsthe property that incrementing this parameter by ’1’ doubles the memory usage and at leastdoubles the computational time. Further, Catena provides built-in support for server relief,which allows to shift the main effort for computing a password hash to the client. Note thatthe idea of server relief is not new (see [12] and [37]), but Catena is the first PSF whichsupports this technique in the first place.

We present two instantiations (1) Catena-BRG– based on a structure called (G, λ)-bit-reversal graph and (2) Catena-DBG– based on a structure called (G, λ)-double-butterflygraph. Further, we introduce the notion of λ-Memory-Hardness and discuss why Catena-

DBG has the properties of a λ−memory-hard function, whereas Catena-BRG only satisfiesthe properties of a memory-hard function. For the analysis of both instantiations we referto the work of Lengauer and Tarjan [30], and restate as well as discuss their results. Forboth instantiations it holds that Catena thwarts back massively parallelized attacks usingGPUs and similar hardware. Additionally, it has been designed to be resistant against (1)cache-timing attacks and (2) garbage-collector attacks (GCA) – a notion which we introducein this work. Resistance against GCA is naturally provided by Catena-DBG, whereasCatena-BRG only satisfies this kind of resistance for λ ≥ 2. The need to address theseissues has been inspired by revealing vulnerability of scrypt against cache-timing as well asgarbage-collector attacks. To substantiate these vulnerability of scrypt, we present a ratheracademical cache-timing attack and a GCA. To the best of our knowledge, such attacks forscrypt have not been known before.

Moreover, Catena fits very well in a proof of work scenario, and we show that it is wellsuited for usage as a password-based key-derivation function – called Catena-KG. Notethat we often refer Catena to as a password scrambler. In this situations it is meant thatthe considered property holds for both presented instantiations.

4

Outline. Section 2 briefly overviews common password scramblers and their properties.In Section 3 we present desired properties of modern password-scrambling algorithms. InSection 4 we introduce some notations and preliminaries which help to understand the paper.Section 5 provides the specification and novel properties of Catena. In Section 6 we discussthe security properties of the Catena framework. Section 7 introduces two instantiationsof Catena. A comprehensive security analysis of these instantiations is given in Section 8.The choice of the parameters and the inner hash function for an implementation of Catena

are presented in Section 9. The application of Catena for the scenario of proof of work, adiscussion about Catena in different environment, and the application of Catena as a key-derivation function are given in Section 10, Section 11, and Section 12. Section 13 concludesthe paper.

2 Frequently used Password Scramblers

Table 1 provides an overview of password scramblers that are or have been in frequent use,compared with Catena. Note that this comparison holds for both presented instantiations.It indicates the amount of memory used and the cost factor to generate a password hash,as well as if the certain algorithm supports server relief and client-independent updates.Furthermore, it points out issues from which the considered password scrambler may suffer.

Hash Function Based Password Scramblers. Not long ago, md5crypt [25] was used innearly all Free-BSD and Linux-based systems to scramble user passwords. It is based on thewell-known MD5 [46] hash function with a fixed number of 1,000 iterations. Due to the factthat CPUs and GPUs become more and more powerful, md5crypt can now be computed toofast, e.g., over 5 million times per second on a AMD HD 6990 graphic card [52]. Additionally,its own author does not consider md5crypt secure anymore [25]. Common Linux distributionsnowadays employ sha512crypt [13], e.g., Debian, Ubuntu, or Arch Linux. It provides similarfeatures as md5crypt, but uses SHA-512 [40] instead of MD5. Furthermore, the number ofiterations can be chosen by the user. NTLMv1 [20] is a fast password scrambler, which isdeployed to generate hash values for several versions of Microsoft Windows passwords. It isvery efficient to compute: one can check over nine billion password candidates per secondon a single COTS graphic card [52]. For this and other reasons, we recommend that NTLMv1should not be used anymore. The “Password-Based Key Derivation Function 2” (PBKDF2)has been specified by the National Institute of Standards and Technology (NIST) [56]. It iswidely used either as a KDF (e.g., in WPA, WPA2, OpenOffice, or WinZip) or as a passwordscrambler (e.g., in Mac OS X, LastPass). The security of PBKDF2 is based on c iterations ofHMAC-SHA-1 [29], where c is a user-chosen value, which is given by default with c = 1000.

bcrypt. The bcrypt [44] algorithm is built upon the Blowfish block cipher [50]. Internally,Blowfish uses a slow key scheduler to generate an internal state of 4,168 bytes for the key-dependent S-boxes (4 × 1, 024 bytes) and the round keys (72 bytes). Thus, while bcrypt

was not designed with the intention to thwart parallelized adversaries by exhaustive memoryusage, the state is sufficiently large to slow down bcrypt significantly on current GPUs, e.g.,it can only computed about 4,000 times per second on an AMD HD 7970 graphic card [52].However, the state size is fixed – so if future GPUs have a larger cache, it may actuallyrun much faster. There is no tunable parameter to increase the memory requirement of thispassword scrambler. For key stretching, bcrypt invokes the Blowfish key scheduler 2c times,e.g., OpenBSD uses c = 6 for users and c = 8 for the superuser.

5

Algorithm Cost Factor Memory Server Client-Indep. Issues(default) Relief Updates

crypt [35] 25 small - - “too fast”md5crypt [25] 1,000 small - - “too fast”sha512crypt [13] 1,000–999,999 (5,000) small - - (small memory)NTLMv1 [20] 1 small - - “too fast”PBKDF2 [56] 1–∞ (1,000) small - - (small memory)bcrypt [44] 24–299 (26, 28) 4,168 bytes - - (constant memory)scrypt [43] 1–∞ (214, 220) flexible, big - - cache-timing attacks

Catena-BRG (this paper) 21–∞ (217, 220) flexible, big X X performanceCatena-DBG (this paper) 21–∞ (214, 216) flexible, big X X performance

Table 1. Comparison of state-of-the-art password scramblers and Catena. Note that all of the mentionedalgorithms support salt values. The term 21 −∞ denotes that the cost factor can be chosen arbitrarily large.

scrypt. Occupying a lot of memory hinders attacks using special-purpose hardware (storageis expensive) and GPUs. We are aware of one single password scrambler that has beendesigned to occupy a lot of memory: scrypt [43]. (There was HEKS [45], but it has beenbroken by the author of scrypt.) As its core, scrypt uses the sequentially memory-hardfunction ROMix (see [43] for a formal definition of sequential memory-hardness), which cantake G units of memory and performs 2G operations. With only G/K units of memory,the number of operations goes up to 2G · K. Unfortunately, ROMix is vulnerable againstcache-timing attacks, due to its password-dependent memory-access pattern. In [43], Percivalrecommends G = 214 and G = 220 for password hashing and key derivation, respectively.

3 Properties of Modern Password Scramblers

In this section we introduce a listing of desired properties a modern password scramblershould have.

Memory-Hardness. To describe memory requirements, we adopt and slightly change thenotion from [43]. The intuition is that for any parallelized attack, using b cores, the requiredmemory per core is decreased by a factor of 1/b, and vice versa.

Definition 1 (Memory-Hard Function).Let g denote the memory cost factor. For all α > 0, a memory-hard function f can becomputed on a Random Access Machine using S(g) space and T (g) operations, where S(g) ∈Ω(T (g)1−α).

Thus, for S · T = G2 with G = 2g, using b cores, we have(

1

b· S

)

· (b · T ) = G2.

A formal generalization of this notion is given in the following.

Definition 2 (λ−Memory-Hard Function).Let g denote the memory cost factor. For a λ−memory-hard function f , which is computedon a Random Access Machine using S(g) space and T (g) operations with G = 2g, it holdsthat

T (g) = Ω

(

Gλ+1

S(g)λ

)

.

6

Thus, we have(

1

b· Sλ

)

· (b · T ) = Gλ+1.

Remark 1. Note that for a λ-memory-hard function f , the relation S(g) · T (g) is always inΩ(Gλ+1), i.e., it holds that if S decreases, T has to increase, and vice versa.

λ-Memory-Hard vs. Sequential Memory-Hard. In [43], Percival introduced the notion ofsequential memory-hardness (SMH), which is satisfied by his introduced password scramblerscrypt. Bases on this notion, an algorithm is sequential memory-hard, if an adversary hasno computational advantage in using multiple CPUs, i.e., using b cores requires b times theeffort used for one core. It is easy to see that, in the parallel computation setting, SMH isa stronger notion than that of λ-memory-hardness (λMH). Thus, SMH is a desirable goalwhen designing a memory-consuming password scrambler. In this section we discuss whyour presented password scrambler Catena satisfies at most λMH instead of SMH, withoutreferring to details of Catena, which are presented in Section 5 and 7.

Note that a further goal of our design was to provide resistance against cache-timingattacks, i.e., both instantiations of Catena should satisfy a password-independent memory-access pattern. This goal can be achieved by providing a control flow which is independentof its input. It follows that Catena can be seen as a straight-line program, which on theother hand can be represented by a directed acyclic graph (DAG, see Definition 3).

Definition 3 (Directed Acyclic Graph). Let Π(V , E) be a graph consisting of a set ofvertices V = (v0, v1, . . . , vn−1) and a set of edges E = (e0, e1, . . . , eℓ−1), where E = ∅ is avalid variant. Π(V , E) is a directed acyclic graph, if every edge in E consists of a startingvertex vi and an ending vertex vj, with i 6= j. A path through Π(V , E) beginning at vertex vimust never reach vi again (else, there would be a cycle). If there exists a path from a vertexvi to a vertex vj in the graph with i 6= j, we will write vi ≤ vj.

Usually, a DAG can be at least partially computed in parallel. Assuming that one has bprocessors to compute a graph Π(V , E), one can partition Π(V , E) into b disjunct subgraphsπ0, . . . , πb−1. Let Ri,j denote the set of crossing edges between two subgraphs πi and πj . Ifthe available shared memory units are at least equal to the order of Ri,j , one can compute πiand πj in parallel. More detailed, in the first step one computes each vertex correspondingto a crossing edge and stores them in the global shared memory. Next, both subgraphs canbe processed in parallel by accessing this memory. It follows that if the available memory is

b−1∑

i=0

b−1∑

j=0

|Ri,j |,

then, one can compute all subgraphs π0, . . . , πb−1 in parallel. Due to the structure of Catena,or more specifically, the structure of the two proposed instantiations, one can always parti-tion its corresponding DAGs into such subgraphs and hence, Catena can be at least par-tially computed in parallel, which is a contradiction to the definition of sequential memory-hardness. Thus, we introduced the notion of λMH as described above, which is a weakernotion in the parallel computing setting but a stronger notion in the single-core setting. Tothe best of our knowledge, Catena is the first password scrambler which satisfies both to

7

be memory-consuming (by satisfying λMH) and providing resistance against cache-timingattacks.

Password Recovery (Preimage Security). For a modern password scrambler it shouldhold that the advantage of an adversary (modeled as a computationally unbounded butalways-halting algorithm) for guessing a valid password should be reasonable small, i.e., nothigher than for trying out all possible candidates. Therefore, given a password scramblerPS, we define the password-recovery advantage of an adversary A as follows.

Definition 4 (Password-Recovery Advantage). Let s denote a randomly chosen saltvalue and pwd a password randomly chosen from a source Q with e bits of min-entropy. LetPS denote a password-scrambling algorithm. Then, given a hash value h with PS(s, pwd) =h, we define the password-recovery advantage of an adversary A as

AdvRECPS (A) = Pr

s,h

[

APS,s,h ⇒ x : PS(s, x)?= h

]

.

Furthermore, by AdvRECPS (q) we denote the maximum advantage taken over all adversaries

asking at most q queries to PS.

In Section 6 we show that for Catena it holds that for guessing a valid password, anadversary either has to try all possible candidates or it has to find a preimage for theunderlying hash function.

Client-Independent Update. According to Moore’s Law [34], the available resources of anadversary increase continually over time – and so do the legitimate user’s resources. Thus, asecurity parameter chosen once may be too weak after some time and needs to be updated.This can easily be done immediately after the user has entered its password the next time.However, in many cases, a significant number of user accounts are inactive or rarely used,e.g., 70.1% of all Facebook accounts experience zero updates per month [36] and 73% of allTwitter accounts do not have at least one tweet per month [47]. It is desirable to be ableto compute a new password hash (with some higher security parameter) from the old one(with the old and weaker security parameter), without having to involve user interaction, i.e.,without having to know the password. We call this feature a client-independent update of thepassword hash. When key stretching is done by iterating an operation, client-independentupdates may or may not be possible, depending on the details of the operation, e.g., whenthe original password is one of the inputs for every operation, client-independent updatesare impossible.

Server Relief. A slow and – even worse – memory-demanding password-based log-in pro-cess may be too much of a burden for many service providers. A way to overcome this prob-lem, i.e., to shift the effort from the side of the server to the side of the client, can be foundin [37] and more recent in [12]. We realized this idea by splitting the password-scramblingprocess into two parts: (1) a slow (and possibly memory-demanding) one-way function Fand (2) an efficient one-way function H. By default, the server computes the password hashh = H(F (pwd , s)) from a password pwd and a salt s. Alternatively, the server sends s to theclient who responds y = F (pwd , s). Finally, the server just computes h = H(y). While it isprobably easy to write a generic server relief protocol using any password scrambler, noneof the existing password scramblers has been designed to naturally support this property.Note that this property is optional, e.g., for the proof of work scenario the server relief ideamakes no sense, since the whole effort should be already on the side of the client.

8

Resistance against Cache-Timing Attacks. Consider the implementation of a passwordscrambler, where data is read from or written to a password-dependent address a = f(pwd).If, for another password pwd ′, we would get f(pwd ′) 6= a and the adversary could observewhether we access the data at address a or not, then it could use this information to filterout certain passwords. Under certain circumstances, timing information related to a givenmachine’s cache behavior may enable the adversary to observe which addresses have beenaccessed. Thus, we formally introduce resistance against cache-timing attacks, which werecommend to be fulfilled by upcoming password scramblers.

Definition 5 (Resistance against Cache-Timing Attacks). Let F : 0, 1∗×0, 1k →0, 1n denote a function processing arbitrary large data together with a secret value K ∈0, 1k, and outputs a fixed length value of size n bits. We call F resistant against cache-timing attacks iff its control flow does not depend on the secret input K.

Key-Derivation Function (KDF). Beyond authentication, passwords are also used toderive symmetric keys. Obviously, one can just use the output of the password scrambler asa symmetric key – perhaps after truncating it to the required key size. This is a disadvantageif one either needs a key longer than the password hash or has to derive more than one key.Thus, it is prudent to consider a KDF as a tool of its own right – with the option to derivemore than one key and with the security requirement that compromising some of the keysdoes not endanger the other ones. Note that it is required for a KDF that the input andoutput behaviour cannot be distinguished from a set of random functions. Thus, we definethe Random-Oracle Security of a password scrambler as follows:

Definition 6 (Random-Oracle Security). Let PS : 0, 1∗ → 0, 1n be password scram-bler, which gets an input of arbitrary length and produces a fixed-length output. Let Abe a fixed adversary which is allowed to ask at most q queries to an oracle. Further, let$ : 0, 1∗ → 0, 1n be a random function which, given an input of arbitrary length, al-ways returns randomly chosen values from 0, 1n. Then, the Random-Oracle Security of apassword scrambler PS is defined by

Adv$PS(A) =

∣

∣

∣Pr

[

APS ⇒ 1]

− Pr[

A$ ⇒ 1]∣

∣

∣.

Furthermore, by Adv$PS(q) we denote the maximum advantage taken over all adversaries

asking at most q queries to an oracle.

Note that the input (of arbitrary length) of PS contains the password, the salt, and someother (optional) parameters, e.g., parameters to adjust the memory-consumption or thecomputational time.

4 Preliminaries

In this section we first introduce the notions used throughout this paper (see Table 2).Next, to increase the understanding of the proofs given in [30], we describe a technique foranalyzing directed acyclic graphs, called pebble game. Then, we introduce the definition of agarbage-collector attack.

9

Identifier Description

pwd password

λ depth of F

s salt (public random value)

p pepper (secret bits of the salt)

t tweak

d domain (application specifier) of Catena

g0, g minimum garlic; current garlic with G = 2g

PS/PSF Password Scrambler/Password-Scrambling Framework

$ function returning a fixed-size random value

h, y password hash (or intermediate hash)

S(g) memory (space) consumption; depends on the garlic

T (g) time consumption; depends on the garlic

Π(V , E) graph based on V vertices and E edges

ri i-th row

vki,j j-th vertex of the i-th row of the k-th DBG

b number of cores

AO0,...,Oℓ adversary A with access to the oracles O0, . . . , Oℓ

q number of total queries A is allowed to ask

τ Bit-Reversal Permutation

σ function determining the index of the diagonal edges (DBG)

AD associated data

K secret key

|X| size of X in bits or size of a set X

Table 2. Notions used throughout this paper.

Pebble Game. Hellman presented in [22] a possibility to trade memory/space S againsttime T in attacking cryptographic algorithms, i.e., he has introduced the idea of a time-memory trade-off (TMT) in terms of generic attacks. Hence, we can assume that an adversarywith access to this algorithm and restricted resources is always looking for a sweet spot tooptimize S ·T . To analyze the effort for a given adversary, one needs to choose a certain modelfor studying the TMT. In 1970, Hewitt and Paterson introduced a method for analyzingTMTs on directed acyclic graphs (DAG, see Definition 3, Section 3) [41], called pebble game.It has been occasionally used in cryptographic context, see, e.g. [19] for a recent example.

The pebble game model is restricted to DAGs with bounded in-degree and can be seen asa single-player game. Let Π(V , E) be a DAG and let N = |V| be the number of nodes withinΠ(V , E). For our scheme, we restrict the in-degree to 2. In the setup phase of the game, theplayer gets S pebbles (tokens) with S ≤ N . A pebble can be placed on a node (mark) orbe removed from a node (unmark) under certain requirements (where v ∈ V denotes a nodewithin the set V of all nodes of Π(V , E)):

1. A pebble may be removed from a vertex v at any time.

2. A pebble can be placed on a node v if all predecessors of the node v are marked.

10

3. If all immediate predecessors of an unpebbled node v are marked, a pebble may be movedfrom a predecessor of v to v.

A move is the application of either the second or the third action stated above. Thegoal of the game is to mark (to pebble) all nodes of a graph Π(V , E) at least once. Thetime-memory trade-off (TMT) is then defined by counting the minimum number of moves(T ) and the maximum simultaneously placed pebbles on the graph (S), which are necessaryto reach the goal. Based on the following two trivial observations (see [30]), we can define alower and an upper bound for the time-memory trade-off S · T . On the one hand, any graphof size N can be pebbled with N pebbles in time N (in topological order). On the otherhand, if a graph Π(V , E) of size N can be pebbled with S pebbles at all, it can be pebbledwith S pebbles in time

T ≤∑

0≤i≤S

(

N

i

)

≤ 2N .

Therefore, the interest of T is bounded to the range N ≤ T (S) ≤ 2N . As the TMT has alwaysto be satisfied, T (S) has to increase if S decreases, and vice versa. In general, a pebble gameis a common model to derive and analyze TMTs as shown in [48,49,51,53,54].

The Garbage-Collector (GC) Attack. The basic idea of this attack is to exploit themanagement of the memory and the internal state a password-hashing algorithm. Moredetailed, the goal of an adversary is to find out a valid preimage (password) for a given hashvalue without taking the whole effort of computing the corresponding password-hashingalgorithm for each candidate (shortcut attack). Next, we formally define the term Garbage-Collector Attack.

Definition 7 (Garbage-Collector Attack). Let PS be a memory-demanding passwordscrambler depending on a memory-cost parameter g with G = 2g. Furthermore, let v0, . . . , vG−1

denote the internal state of PS after its termination. Let A be a computationally unboundedbut always halting adversary conducting a garbage-collector attack. We say that A is suc-cessful if the knowledge about v0, . . . , vG−1 reduces the runtime of A for testing a passwordcandidate x from O(PS(x)) to O(f(x)) with O(f(x)) < O(PS(x)).

5 Catena – A Memory-Hard Password-Scrambling Framework

In this section we introduce our password-scrambling framework Catena. First, we specifyCatena and explain its properties regarding to password hashing, i.e., client-independent up-date and server relief. Thereupon, we present two instantiations of Catena, called Catena-

BRG and Catena-DBG. Both instances are designed to provide a high resilience againstcache-timing attacks, and the latter naturally defends against garbage-collector attacks,whereas the former provides this kind of resistance only for λ ≥ 2.

Specification. A formal definition is shown in Algorithm 1, where the function Fλ (seeLine 3) is a placeholder for a certain instantiation. The password-dependent input of H isappended to a prefix c, which denotes the iteration counter (garlic factor). For the initialdeployment of Catena, we recommend to set the initial garlic value g0 to g to achieve thebest ratio between running time and memory usage.

A secure password scrambler must satisfy preimage security. Catena inherits the preim-age security from the underlying hash functionH. Next, we discuss the tweak and two furthernovel features of Catena.

11

Algorithm 1 Catena

Require: λ Depth, pwd Password, t Tweak s Salt, g0 Min. Garlic, g Garlic, Fλ InstanceEnsure: x Hash of the Password1: x← H(t || pwd || s)2: for c = g0, . . . , g do3: x← Fλ(c, x)4: x← H(c || x)5: end for6: return x

Tweak. The parameter t is an additional multi-byte value which is given by:

t← d || λ || |s| || H(AD),

The first byte d specifies the application (domain) of Catena. More detailed, we use d = 0for password hashing and d = 1 for key derivation. The other values of d are reservedfor future applications. The byte value λ defines together with the value g (see above) thesecurity parameters for Catena. The 32-bit value |s| denotes the total length of the saltin bits. Finally, the n-bit value H(AD) is the hash of the associated data AD, which cancontain additional information like hostname, user-ID, name of the company, or the IP ofthe host, with the goal to customize the password hashes. Note that the order of the valuesdoes not matter as long as they are fixed for a certain application.

The tweak is processed together with the secret password and the salt (see Line 1 ofAlgorithm 1). Thus, t can be seen as a weaker version of a salt increasing the additionalcomputational effort for an adversary when using different values. Furthermore, it allows todifferentiate between different applications of Catena, and can depend on all possible inputdata. Note that one can easily provide unique tweak values (per user) when including theuser-ID in the associated data.

Client-Independent Update. Its sequential structure enables Catena to provide client-independent updates. Let h ← Catenaλ(pwd, t, s, g, Fλ) be the hash of a specific passwordpwd , where t, s, g, and Fλ denote tweak, the salt, the garlic, and the instantiation, respec-tively. After increasing the security parameter from g to g′ = g + 1, we can update the hashvalue h without user interaction by computing:

h′ = H(g′ || Fλ(g′, h)).

It is easy to see that the equation h′ = Catenaλ(pwd , t, s, g′, Fλ) holds.

Server Relief. In the last iteration of the for-loop in Algorithm 1, the client has to omit thelast invocation of the hash function H (see Line 4). The current output of Catenaλ is thentransmitted to the server. Next, the server computes the password hash by applying the hashfunction H. Thus, the vast majority of the effort (memory usage and computational time)for computing the password hash is handed over to the client, freeing the server. This enablessomeone to deploy Catena even under restricted environments or when using constraineddevices – or when a single server has to handle a huge amount of authentication requests.

12

6 Security Analysis of the Catena Framework

We denote a password scrambler to be secure if it provides at least 1-memory-hardness andpreimage security. Furthermore, it should be resistant against cache-timing attacks. Catena-

DBG (see Section 7) inherits its λ-memory-hardness (see Definition 2) from F , whereasCatena-BRG provides only 1-memory-hardness, i.e., memory-hardness (see Definition 1).

Since the memory-access pattern of Catena is static and therefore, independent from thepassword, it provides resistance against cache-timing attacks. Finally, we show that Catena

is a secure password scrambler that behaves like a good random function, which is useful forusing Catena as a secure key-derivation function (KDF).

Password-Recovery Resistance. In this section we show that Catena is a good passwordscrambler, i.e., given the hash value h it is infeasible for an adversary to do better than tryingout password candidates in likelihood order to obtain the correct password.

Theorem 1 (Catena is Password-Recovery Resistant). Let m denote the min-entropyof a password source Q. Then, it holds that

AdvRECCatena,Q(q) ≤

q

2m+Advpre

H (q, t).

Proof. Note that an adversary A can always guess a (weak) password by trying out about2m password candidates. For a maximum of q queries, it holds that the success probability isgiven by q/2m. Instead of guessing 2m password candidates, an adversary can also try to finda preimage for a given hash value h. It is easy to see from Algorithm 1 that an adversary thushas to find a preimage for H in Line 4. More detailed, for a given value h with h← H(g, x),A has to find a valid value for x. The success probability for this can be upper bounded byAdvpre

H (q, t). Our claim follows by adding up the individual terms. ⊓⊔

Pseudorandomness. In the following we analyze the advantage of an adversary A indistinguishing the output of Catena from a random bitstring of the same length as theoutput of Catena. Therefore, we model the internally used hash function H : 0, 1∗ →0, 1n as a random oracle. Note that the output length m, the depth λ, and the value g0(minimum garlic) are constant values which are set once when initializing a system the firsttime.

Theorem 2 (PRF Security of Catena). Let q denote the number of queries made by anadversary and s a randomly chosen salt value. Furthermore, let H be modelled as a randomoracle and g ≥ g0 ≥ 1. Then, it holds that

AdvPRFCatenaλ

(q, t) ≤(q · g + q)2

2n+Advcoll

Fλ(g · q).

Proof. Let ai = (pwd i || ui || si || g) represent the i-th query, where pwd i denotes thepassword, ui denotes the tweak, si the salt, and g the garlic. For this proof, we impose thereasonable condition that all queries of an adversary are distinct, i.e., ai 6= aj for i 6= j.

Suppose that yj denotes the output of Fλ(g, aj) of the j-th query (cf. Algorithm 1, Line

3). Then, H(g || yj) is the output of Catenaλ(aj). In the case that y1, . . . , yq are pairwise

distinct, an adversary A cannot distinguish H(g || ·) from a random function $(·) since inthe random-oracle model, both functions return a value chosen uniformly at random from0, 1n.

13

Therefore, we have to upper bound the probability of the event yi = yj with i 6= j. Dueto the assumption that A′s queries are pairwise distinct, there must be at least one collisionfor H or F . For q queries, we have at most q(g + 1) invocations of H. Thus, we can upperbound the collision probability by

(q · g + q)2

2n.

Furthermore, we have q · g invocations of the memory-consuming function F . We can upperbound the probability of a collision by Advcoll

Fλ(g · q). Our claim follows from the union

bound. ⊓⊔

7 Instantiations

In this section we introduce two concrete instantiations of Catena: Catena-BRG andCatena-DBG.

7.1 Catena-BRG

For Catena-BRG, Fλ is implemented by the (g, λ)-Bit-Reversal Hashing (BRHgλ) algo-

rithm, which is based on the bit-reversal permutation.

Definition 8 (Bit-Reversal Permutation τ). Fix a number k ∈ G and represent i ∈ Z2k

as a binary k-bit number, (i0, i1, . . . , ik−1). The bit-reversal permutation τ : Z2k → Z2k isdefined by

τ(i0, i1, . . . , ik−1) = (ik−1, . . . , i1, i0).

The bit-reversal permutation τ defines the (G, λ)-Bit-Reversal Graph.

Definition 9 ((G, λ)-Bit-Reversal Graph). Fix a natural number g, let V denote the setof vertices, and E the set of edges within this graph. Then, a (g, λ)-bit-reversal graph Πλ

g (V , E)consists of (λ+ 1) · 2g vertices

v00, . . . , v02g−1 ∪ v

10, . . . , v

12g−1 ∪ · · · ∪ v

λ−10 , . . . , vλ−1

2g−1 ∪ vλ0 , . . . , v

λ2g−1,

and (2λ+ 1) · 2g − 1 edges as follows:

– (λ+ 1) · (2g − 1) edges vji−1 → vji for i ∈ 1, . . . , 2g − 1 and j ∈ 0, 1, . . . , λ.

– λ · 2g edges vji → vj+1τ(i) for i ∈ 0, . . . , 2g − 1 and j ∈ 0, 1, . . . , λ− 1.

– λ additional edges vj2g−1 → vj+10 where j ∈ 0, . . . , λ− 1.

For example, Figure 1 illustrates an (8,1)-BRG. Note that this graph is almost identical –except for one additional edge e = (v07, v

10) – to the bit-reversal graph presented by Lengauer

and Tarjan in [30].

Bit-Reversal Hashing. The (g, λ)-Bit-Reversal Hashing function is defined in Algorithm 2.It requires O(2g) invocations of a given hash function H for a fixed value of x. The threeinputs g, x, and λ of BRHg

λ represent the garlic g = log2(G), the value to process, and thedepth, respectively. Thus, g specifies the required units of memory. Moreover, incrementingg by one doubles the time and memory effort for computing the password hash.

14

Output

Input

v00 v01 v02 v03 v04 v05 v06 v07

v10 v11 v12 v13 v14 v15 v16 v17

Fig. 1. An (8, 1)-BRG.

Algorithm 2 (g, λ)-Bit-Reversal Hashing (BRHgλ)

Require: g Garlic, x Value to Hash, λ Depth, H Hash FunctionEnsure: x Password Hash1: v0 ← H(x)2: for i = 1, . . . , 2g − 1 do3: vi ← H(vi−1)4: end for5: for k = 1, . . . , λ do6: r0 ← H(v0 || v2g−1)7: for i = 1, . . . , 2g − 1 do8: ri ← H(ri−1 || vτ(i))9: end for10: v ← r11: end for12: return r2g−1

7.2 Catena-DBG

Note that a (G, λ)-Double-Butterfly Graph is based on a stack of λ G-superconcentrators.The following definition of a G-superconcentrator is a slightly adapted version of that intro-duced in [30].

Definition 10 (G-Superconcentrator). A directed acyclic graph Π(V , E) with a set ofvertices V and a set of edges E, a bounded indegree, G inputs, and G outputs is called aG-superconcentrator if for every k such that 1 ≤ k ≤ G and for every pair of subsets V1 ⊂ Vof k inputs and V2 ⊂ V of k outputs, there are k vertex-disjoint paths connecting the verticesin V1 to the vertices in V2.

A double-butterfly graph (DBG) is a special form of a G-superconcentrator which is definedby the graph representation of two back-to-back placed Fast Fourier Transformations [9].More detailed, it is a representation of twice the Cooley-Tukey FFT algorithm [11] omittingone row in the middle (see Figure 2 for an example where G = 8). Therefore, a DBG consistsof 2 · g rows.Based on the DBG, we define the sequential and stacked (G, λ)-double-butterfly graph. Inthe following, we denote vki,j as the j-th vertex in the i-th row of the k-th double-butterflygraph.

Definition 11 ((G, λ)-Double-Butterfly Graph). Fix a natural number g ≥ 1 and letG = 2g. Then, a (G, λ)-double-butterfly graph Π(V , E) consists of 2g ·(λ ·(2g−1)+1) vertices

– vk0,0, . . . , vk0,2g−1 ∪ . . . ∪ vk2g−2,0, . . . , v

k2g−2,2g−1 for 1 ≤ k ≤ λ and

15

Fig. 2. A Cooley-Tukey FFT graph with eight input and output vertices.

vertical sequential + connecting layerdiagonal

Fig. 3. Types of edges as we use them in our definitions.

– vλ2g−1,0, . . . , vλ2g−1,2g−1,

and λ · (2g − 1) · (3 · 2g) + 2g − 1 edges

– vertical: 2g · (λ · (2g − 1)) edges

(vki,j , vki+1,j) for 0 ≤ i ≤ 2g − 2, 0 ≤ j ≤ 2g − 1, and 1 ≤ k ≤ λ,

– diagonal: 2g · λ · g + 2g · λ · (g − 1) edges

(vki,j , vki+1,j⊕2g−1−i) for 0 ≤ i ≤ g − 1, 0 ≤ j ≤ 2g − 1, and 1 ≤ k ≤ λ.

(vki,j , vki+1,j⊕2i−(g−1)) for g ≤ i ≤ 2g − 2, 0 ≤ j ≤ 2g − 1, and 1 ≤ k ≤ λ.

– sequential: (2g − 1) · (λ · (2g − 1) + 1) edges

(vki,j , vki,j+1) for 1 ≤ i ≤ 2g − 1, 0 ≤ j ≤ 2g − 2, 1 ≤ k ≤ λ, and

(vλ2g−1,j , vλ2g−1,j+1) for 0 ≤ j ≤ 2g − 2

– connecting layer: λ · (2g − 1) edges

(vki,2g−1, vki+1,0) for 1 ≤ k ≤ λ, 0 ≤ i ≤ 2g − 2.

Figure 3 illustrates the individual types of edges we used in our definition above. Moreover,an example for G = 8 and λ = 1 can be seen in Figure 4.

16

H

H

o

i

Fig. 4. An (8, 1)-double-butterfly graph.

Algorithm 3 (G, λ)-Double-Butterfly Hashing (DBHGλ )

Require: g Garlic, x Value to Hash, λ Depth, H Hash FunctionEnsure: x Password Hash1: v0 ← H(x)2: for i = 1, . . . , 2g − 1 do3: vi ← H(vi−1)4: end for5: for k = 1, . . . , λ do6: for i = 1, . . . , 2g − 1 do7: r0 ← H(v2g−1 ⊕ v0 || vσ(g,i−1,0))8: for j = 1, . . . , 2g − 1 do9: ri ← H(ri−1 ⊕ vi || vσ(g,i−1,j))10: end for11: v ← r12: end for13: end for14: return v2g−1

Double-Butterfly Hashing. The (G, λ)-double-butterfly hashing operation is defined inAlgorithm 3. The structure is based on a (G, λ)-double-butterfly graph. Note that the func-tion σ (see Lines 7 and 9) is given by

σ(g, i, j) =

j ⊕ 2g−1−i if 0 ≤ i ≤ g − 1,

j ⊕ 2i−(g−1) otherwise.

Thus, σ determines the indices of the vertices of the diagonal edges (see Figure 3).

Since the security of Catena in terms of password hashing is based on a time-memorytradeoff, it is desired to implement it in an efficient way, making it possible to increase therequired memory. We recommend to use BLAKE2b [4] as the underlying hash function,implying a block size of 1024 bits with 512 bits of output. Thus, it can process two inputblocks within one compression function call. This is suitable for Catena-BRG since a bit-reversal graph satisfies a fixed indegree of at most 2. When considering Catena-DBG, wecannot simply concatenate the inputs to H while keeping the same performance per hashfunction call, i.e., three inputs to H require two compression function calls, which is a strong

17

slow-down in comparison to BRGgλ. Therefore, we compute H(X,Y, Z) = H(X ⊕ Y || Z)

instead of H(X,Y, Z) = H(X || Y || Z) obtaining the same performance as Catena-BRG

per hash function call. Obviously, this doubles the probability of input collision. Nevertheless,for a 512-bit hash function the advantage for an adversary is still negligible.

Based on the approach above, the number of hash function calls to compute Row ri fromRow ri−1 is the same for Catena-BRG and Catena-DBG. Moreover, for both instantia-tions it holds that the number of hash function calls is equal to the number of compressionfunction calls (when used with BLAKE2b). More detailed, the BRGg

λ requires 2g − 1+λ · 2g

calls to H and the (G, λ)-DBG requires 2g − 1 + λ · (2g − 1) · 2g calls to H. It is easy to see,that the performance of Catena-DBG in comparison to Catena-BRG is decreased by alogarithmic factor.

8 Security Analysis of Catena-BRG and Catena-DBG

In this section we discuss the security of Catena-BRG and Catena-DBG against side-channel attacks. Furthermore, we discuss the memory-hardness and pseudorandomness ofboth instantiations.

8.1 Resistance Against Side-Channel Attacks

Straightforward implementations of either Catena-BRG or Catena-DBG provide neithera password-dependent memory-access pattern nor password-dependent branches. Therefore,both instantiations are resistant against cache-timing attacks (see Definition 5).

Considering a malicious garbage collector, each of Algorithms 2 and 3 exposes the arraysv and r. Both arrays are overwritten multiple times. Therefore, Catena-DBG is resistantagainst garbage-collector attacks. It follows that any variant of Catena with some fixedλ ≥ 2 is at least as resistant to garbage-collector attacks as the same variant with λ − 1 inthe absence of a malicious garbage collector.

8.2 Memory-Hardness

The memory-hardness of an algorithm which can be represented as a DAG with boundedindegree, can be shown by “playing” the pebble game (see Section 4). Here, we restate anddiscuss the results presented by Lengauer and Tarjan in [30].

Catena-BRG. In [30], Lengauer and Tarjan have proven the lower bound of pebble move-ments for a (G, 1)-bit-reversal graph.

Theorem 3 (Lower Bound for a BRGG1 [30]). If S ≥ 2, then, pebbling the bit-reversal

graph Πg(V , E) consisting of G = 2g input nodes with S pebbles takes time

T >G2

16S.

Biryukov and Khovratovich have shown in [7] that stacking more than one bit-reversal graphonly adds some linear factor to the quadratic time-memory tradeoff. Hence, a BRGg

λ withλ > 1 does not achieve the properties of a λ-memory-hard function.

18

Catena-DBG. Likewise, the authors of [30] analyzed the time-memory tradeoff for a stackof λ G-superconcentrators. Since the double-butterfly is a special form of a G-superconcen-trators, their bound also holds for DBGG

λ .

Theorem 4 (Lower Bound for a (G, λ)-Superconcentrator [30]). Pebbling a (G, λ)-superconcentrator using S ≤ G/20 black and white pebbles requires T placements such that

T ≥ G

(

λG

64S

)λ

.

Discussion. For scenarios where a quadratic time-memory tradeoff is sufficient, we recom-mend the efficient Catena-BRG with either λ = 1 or – if garbage-collector attacks pose arelevant threat – with λ = 2. Note that the benefit of greater values for λ is very limited sincethe costs for pebbling the bit-reversal graph remain quadratic. For scenarios that require ahigher time-memory tradeoff, we highly recommend the λ-memory-hard Catena-DBG withλ = 2 or λ = 3, which is sufficient for most practical applications.

We have to point out that the computational effort for DBHGλ with reasonable values for

G, e.g., G ∈ [217, 221], may stress the patience of many users since the number of verticesand edges grows logarithmic with G. Thus, it remains an open research problem to find a(G, λ)-superconcentrator – or any other λ-memory-hard function – that can be computedmore efficiently than a DBHG

λ .

8.3 Pseudorandomness.

For proving the pseudorandomness of Catena-BRG and Catena-DBG, we refer to thedefinition Random-Oracle Security which was introduced in Section 3 (see Definition 6).Therefore, we model the internally used hash function H : 0, 1∗ → 0, 1n as a randomoracle.

Theorem 5 (Collision Security of BRHGλ ). Let q denote the number of queries made

by an adversary and s a randomly chosen salt value. Furthermore, let H be modelled as arandom oracle. Then, we have

AdvcollBRHG

λ

(q, t) ≤q2 · (λ+ 1)2

2n−2g+

q2 · (g + 1)2

2n.

Proof. Let ai = (pwdi || ui || si || gi) represent the i-th query, where pwdi denotes thepassword, ui denotes the tweak, si the salt, and gi the garlic. For this proof, we impose thereasonable condition that all queries of an adversary are distinct, i.e., ai 6= aj for i 6= j.

Suppose that yj denotes the output of BRHGλ (g, x, λ) of the j-th query (Line 4 of Algo-

rithm 1, where F λH is replaced by the function BRHG

λ shown in Algorithm 2). Then, H(g || yj)is the output of Catena(aj). In the case that y1, . . . , yq are pairwise distinct, an adversaryA cannot distinguish H(g || ·) from $(·), since both functions are modeled as random oracles,returning a value chosen uniformly at random from the set 0, 1n.

Therefore, we have to upper bound the probability of the event yi = yj with i 6= j. Dueto the assumption that A′s queries are pairwise distinct, there must be at least one collisionfor H, i.e., z 6= z′ with H(z) = H(z′). We call such a collision a bad event

19

Then, we define a new game Catena′ which maintains two initially empty list Bin and

Bout and works as follows: When calling Catena, it stores all inputs and outputs of H whichare gained from the inner calls to the BRHG

λ operation (see Line 3 of Algorithm 1) in the listBin. All inputs and outputs which are gained from the outer calls to H (see Lines 1 and 4of Algorithm 1) are stored in the set Bout. Then, it returns a random value R and finally, anadversary wins iff one of the sets Bin or Bout contains bad event. Remark, the advantage ofwinning the game Catena

′ is higher than the advantage for distinguishing Catena(·) from$(·), since there are bad events that most likely lead to a list of unique outputs.

First, we upper bound the probability that Bin contains bad event. One call to BRHgλ

consists of (λ + 1) · 2c hashing operations, where c denotes the current garlic factor. Thus,the probability for a collision is given by

((λ+ 1) · 2c)2

2n,

where n denotes the output size of H. Second, we upper bound the probability that Bout

contains bad event. Since there are at most (c + 1) outer calls to the function H, this canbe upper bounded by

(c+ 1)2

2n.

Since an adversary is allowed to ask at most q queries, the probability that either Bin orBout contains bad event, can be upper bounded by

(q · (λ+ 1) · 2c)2

2n+

(q · (c+ 1))2

2n<

q2 · (λ+ 1)2

2n−2g+

q2 · (g + 1)2

2n.

Our claim follows from the union bound. ⊓⊔

Finally, we analyze the collision resistance of DBHGλ . Again, we model the internally used

hash function H : 0, 1∗ → 0, 1n as a random oracle.

Theorem 6 (Collision Security of DBHGλ ). Let q denote the number of queries. Further-

more, let H be modelled as a random oracle for some fixed integers g, g0, λ ≥ 1 with g ≥ g0and G = 2g. Then, it holds that

AdvcollDBHG

λ

(q, t) ≤(q · λ · g)2

2n−2g−3.

Proof. From Algorithm 3 it is easy to see that a collision DBHGλ (x) = DBHG

λ (x′) for x 6= x′

implies either an input or output collision for H.For our analysis, we replace the random oracle H by H ′(x) := H(truncaten(x)) that

truncates any input to n bits before hashing. Thus, any collision in the first n bits of theinput of H in Line 7 and 9 of Algorithm 3 leads to a collision of the output of H, regardlessof the remaining inputs.

Output Collision. In this case we upper bound the collision probability for H by deducingthe total amount of invocations of H ′ per query. There are 2g invocations of H ′ in Lines 1–4of Algorithm 3. In addition, there are λ · (2g − 1) · 2g invocations in Lines 5–13 leading to atotal of λ · 2g · 2g invocations of H. Since H is modelled as a random oracle, we can upperbound the collision probability for q queries by

(q · λ · 2g · 2g)2

2n≤

(q · λ · g)2

2n−2g−2.

20

Input Collision. In this case we have to take into account that an input collision for distinctqueries a and b in Line 7 and 9 can occur:

va2g−1 ⊕ va0 = vb2g−1 ⊕ vb0 (Algorithm 3, Line 7)

orrai−1 ⊕ vai = rbi−1 ⊕ vbi (Algorithm 3, Line 9).

For each query, this can happen λ · (2g − 1) · 2g times. Note that all values vi and ri areoutputs from the random oracle H ′, except the initial value v0. Hence, we can upper boundthe collision probability for this event by

(q · λ · (2g − 1) · 2g)2

2n≤

(q · λ · g)2

2n−2g−2.

Our claim follows from the union bound. ⊓⊔

9 Implementation and Benchmarking of Catena

In this section we (1) clarify our choice of the internally used hash function and (2) presentan optimized implementation of Catena. Recommendations for the parameters of Catena

can be found in Table 3.

Parameter Description Encoding Instantiations

BRHGλ DBHG

λ

gp garlic (password hashing) 1 byte 17 15

gk garlic (key derivation) 1 byte 20 18

λ depth 1 byte 2 2

d domain 1 byte - -

s salt byte string 16 bytes 16 bytes

|s| salt length UInt32 - -

Table 3. Parameter choices for the practical usage of Catena. By UInt32 we denote a 32-bit unsigned integerwhich is always encoded in little-endian way.

Hash Function Choice. For the practical application of Catena, we were looking fora hash function with a 512-bit (64 byte) output, since it often complies with the size of acache line on common CPUs. In any case, we assume that both the output size of H and thecache-line size are powers of two, so if they are not equal, the bigger number is a multiple ofthe smaller one. Moreover, the output of H should be byte-aligned. For Catena, we decidedto use the following hash functions: SHA2-512 [40], SHA3-512, and BLAKE2b [4]. Note thatSHA3-512 is not standardized yet and thus, we refer to Keccak-512 [?] with c = 1024, wherec denotes the capacity.

The advantage of SHA2-512 is that it is well-analyzed [2,21,28], standardized, and widelyused, e.g., in sha512crypt, the common password scrambler in several Linux distribu-tions [13]. We decided on the use of SHA3-512 as an alternative, since it will soon becomea standardized hash function, it is easy to analyze, and it maintains a 1600-bit state, whichcan provide, depending on the choice of the capacity, a high security margin. Additionally,

21

we point out that SHA3-512 has a high performance in hardware, which suits very wellfor adversaries using dedicated hardware. Our decision for BLAKE2b was motivated by itshigh performance in software, which allows to use a large value for the garlic parameter,resulting in a higher memory effort than for, e.g., SHA3-512.

Note that the security of Catena does not rely on the performance of a specific hashfunction, but on the size of F , i.e., the depth λ and the width g. Thus, even in the caseof a secure but very fast cryptographic hash function, which may be counter-intuitive inthe password-scrambling scenario, one can adapt the security parameter to reach the samecomputational effort.

10 Catena for Proof of Work

The concept of proofs of work was introduced by Dwork and Naor [15] in 1992. The principledesign goal was to combat junk mail under the usage of CPU-bounded functions, i.e., thegoal was to gain control over the access to shared resources. The main idea is “to require auser to compute a moderately hard, but not intractable, function in order to gain access to theresource ” [15]. Therefore, they introduced so called CPU-bound pricing functions based oncertain mathematical problems which may be hard to solve (depending on the parameters),e.g., extracting square roots modulo a prime. Tromp recently proposed the “first triviallyverifiable, scalable, memory-hard and tmto-hard proof-of-work system” in [55].

As an advancement to CPU-bound function,Abadi et al. [1], and Dwork et al. [14] consid-ered moderately hard, memory-bound functions, since memory access speeds do not vary somuch on different machines like CPU accesses. Therefore, they may behave more equitablythan CPU-bound functions. These memory-bound function base on a large table which israndomly accesses during the execution, causing a lot of cache misses. Dwork et al. presentedin [16] a compact representation for this table by using a time-memory trade-off for its gener-ation. Dziembowski et al. [18] as well as Ateniese et al. [3] put forward the concept of proofsof space, i.e., they do not consider the number of accesses to the memory (as memory-boundfunction do) but the amount of disk space the prover has to use. In [18], the authors proposeda new scheme using “graphs with high pebbling complexity and Merkle hash-trees”.

For Catena, there exist at least two possible attempts to be used for proofs of work.We denote by C the client which has to fulfill the challenge to gain access to a serverS. Furthermore, the methods explained below work for both introduced instantiations ofCatena and let λ and F be fix.

Guessing Secret Bits (Pepper). At the beginning, S chooses fixed values for pwd , t, sand g, where s denotes a randomly chosen k-bit salt value, where p bits of s are secret,i.e., p-bit pepper with p ≤ k. Then, S computes h = Catena(pwd , t, s, g) and sends thetuple (pwd , t, s[0,k−p−1], g, h, p) to C, where s[0,k−p−1] denote the k − p least significant bitsof s (the public part). Now, C has to guess the secret bits of the salt by computing h′ =Catena(pwd , t, s′, g) about 2p times and comparing if h = h′. If so, C gains access to S.The effort of C is given by about 2p computations of Catena (and about 2p comparisonsfor h = h′). Hence, the effort of C is scalable by adapting p.

Guessing the Correct Password. In this scenario S chooses an m-bit password pwd , t, s,and g. Then, S computes h = Catena(pwd , t, s, g) and sends the tuple (t, s, g,m, h) toC. The client C then has to guess the password by computing about 2m times h′ =Catena(pwd ′, t, s, g) for different values of pwd ′, and comparing if h′ = h. If so, C gains

22

access to S. The effort of C is given by about 2m computations of Catena (and about2m comparisons for h = h′). Hence, in this case the effort of C is scalable by adapting thelength m of the password. Furthermore, S can adjust the effort of C by excluding m fromthe tuple sent to C. Then, since C does not know the length of the original password, thetime for finding pwd ’ with pwd ′ = pwd highly depends on the way C performs passwordcracking. Note that the latter may not really be suitable for the proof-of-work scenario sincea prover with experience in password cracking can access the server significantly faster thana non-expert.

11 Catena in Different Environments

In this section we briefly discuss applications and usages of Catena.

11.1 Using Catena with Multiple Number of Cores

Catena is designed to initially thwart the possibility of being parallel computed. Neverthe-less, one can make use of a multiple number of cores when some bits of the salt are keptsecret, i.e., using pepper. Assume that |p| = 2, where p denotes the pepper. If a client iscapable of using only one core, it will compute hi = Catena(pwd , t, s0,...,|s|−2 || i, g) fori = 0, . . . , 3 sequentially until one of the values hi is verified by the server. In average, theclient will compute two out of four values. On the other hand, if a client is capable of usingfour cores, it will compute hi = Catena(pwd , t, s0,...,|s|−2 || i, g) for i = 0, . . . , 3 in parallelusing one core for each possible value of i. Thus, reducing the effort for the login phase tothe time of one call to Catena, in average. To keep the login time constant, the client willalways compute hi in parallel for all possible values of i.

Now, we briefly discuss the impact on the side of an adversary A. Assume that A isprovided with a database of ℓ leaked password hashes hj for j = 1, . . . , ℓ, where each value hjwas generated using a different value sj (salt), and a randomly chosen value for pj (pepper).To rule out a given password candidate, A has to try all possible values pj for each givenpair (sj , hj). Thus, if A needed ℓ cores to test one password candidate for all hash values hjin parallel, it now needs 2|p| · ℓ cores to reach the same speedup.

11.2 Application of Server Relief

The application of server relief leads to significantly reduced effort on the side of the serverfor computing the output of Catena by splitting it into two functions F and H, where F istime- and memory-demanding and H is efficient. Obviously, the application of this techniquemakes most sense when the server has to administrate a large amount of requests in littletime, e.g., social networks. Then, each client has to compute an intermediate hash y = F (·)and the server only has to compute h = H(y) for each y, i.e., for each user.

On the other hand, e.g., if Catena is used in the proof-of-work scenario, i.e., a clienthas to proof that it took a certain amount of time and memory to compute the output ofCatena, the application of server relief does not make sense.

12 The Key-Derivation Function Catena-KG

In this section we introduce Catena-KG – a mode of operation based on Catena, whichcan be used to generate different keys of different sizes (even larger than the natural output

23

Algorithm 4 Catena-KGRequire: pwd Password, t′ Tweak, s Salt, g Garlic, ℓ Key Size, I Key IdentifierEnsure: k ℓ-Bit Key Derived from the Password1: x← Catena(pwd, t′, s, g)2: k ← ∅3: for i = 1, . . . , ⌈ℓ/|n|⌉ do4: k ← k || H(i || I || ℓ || x)5: end for6: return Truncate(k, ℓ) truncate k to the first ℓ bits

size of Catena, see Algorithm 4). To provide uniqueness of the inputs, the domain value dof the tweak is set to 1, i.e., the tweak t′ is given by

t′ ← 0x01 || λ || |s| || H(AD).

The call to Catena is followed by an output transform that takes the output x of Catena,a one-byte key identifier I, and a parameter ℓ for the key length as the input, and generateskey material of the desired output size. Catena-KG is even able to handle the generationof extra-long keys (longer than the output size of H), by applying H in Counter Mode [17].Note that longer keys do not imply improved security, in that context.

The key identifier I is supposed to be used when different keys are generated from thesame password. For example, when Alice and Bob set up a secure connection, they mayneed four keys: an encryption and a message authentication key for messages from Alice toBob, and another two keys for the opposite direction. One could argue that I should alsobecome a part of the associated data. But actually, this would be a bad move, since settingup the connection would require legitimate users to run Catena several times. However, theadversary can search for the password for one key, and just derive the other keys, once thatpassword has been found. For a given budget for key derivation, one should rather employone single call to Catena with larger security parameters and then run the output transformfor each key.

In contrast to the password hashing scenario where a user want to log-in without no-ticeable delay, users may tolerate a delay of several seconds to derive an encryption keyfrom a password process [56], e.g., when setting up a secure connection, or when mountinga cryptographic file system. Thus, for Catena-KG, we recommend to use g = 20 (wheninstantiated with a BRGG

λ ) and g = 18 (when instantiated with a DBGGλ ).

Security Analysis. It is easy to see that Catena-KG inherits its memory-hardness fromCatena (see Section 8, Theorems 3 and 4) since it invokes Catena (Line 1 of Algorithm 4).Next, we show that Catena-KG a good pseudorandom function (PRF) in the random oraclemodel.

Theorem 7. In the random oracle model we have

AdvPRFCatena-KG(q, t) =

∣

∣

∣Pr[ACatena-KG ⇒ 1]− Pr[A$ ⇒ 1]

∣

∣

∣≤

(q · g + q)2

2n+Advcoll

F (q · g).

Proof. Suppose that H is modeled as random oracle. For the sake of simplification, we omitthe truncation step and let the adversary always get access to the untruncated key k. Supposexi denotes the output of Catena of the i-th query. In the case xi 6= xj for all values with1 ≤ i < j ≤ q, the output k is always a random value, since H is always invoked with

24

a fresh input (see Line 4, Algorithm 4). The only chance for an adversary to distinguishCatena-KG(·) from the random function $(·) is a collision in Catena. The probability forthis event can be upper bounded by similar arguments as in the proof of Theorem 2. ⊓⊔

13 Conclusion and Outlook

In this paper we have presented Catena, a novel password-scrambling framework which isbased on the memory-hard F function, where we presented two possible instantiations forF : (1) Catena-BRG– providing memory-hardness and (2) Catena-DBG– providing λ-memory-hardness. Catena is the first algorithm which naturally supports client-independentupdates and server relief. It consists of two security parameters λ (depth) and g (garlic),where λ reflects the memory-hardness (for Catena-DBG) and g the memory consumption.Note that Catena-BRG provides memory-hardness independent of λ.

Further, we have shown that Catena is provably secure in the random oracle model. Themotivation for the overall design is to thwart GPU-based attacks and to provide resistanceagainst cache-timing as well as garbage-collector attacks. Thereby, we were inspired by thediscovery of cache-timing attacks on scrypt. Nevertheless, the application of Catena is notrestricted to the generation of password hashes. It also fits very well for the usage as a key-derivation function, since we have shown that it behaves like a pseudorandom permutationin the random oracle model. Additionally, Catena is well-suitable for the usage in proofs ofwork.

Finally, based on its high flexibility and security properties, Catena (and its instantia-tions Catena-BRG and Catena-DBG) seems to be the reasonable choice for current andfuture applications in comparison with existing algorithms (see Section 2). Also, we expectthat future password scrambler designs will borrow the two novel features of Catena, i.e.,client-independent update and server relief.

14 Acknowledgement

Thanks to Bill Cox, Eik List, Colin Percival, Stephen Touset, Steve Thomas, AlexanderPeslyak, as well as the reviewers of the FSE’14 and ASIACRYPT’14 for their helpful hintsand comments, as well as fruitful discussions. Finally, we thank Alexander Biryukov andDimitry Khovratovich for pointing out a flaw in our proof of the time-memory tradeoff forthe BRHg

λ operation by providing a tradeoff cryptanalysis [7].

References

1. Martin Abadi, Michael Burrows, and Ted Wobber. Moderately Hard, Memory-Bound Functions. InNDSS, 2003.

2. Kazumaro Aoki, Jian Guo, Krystian Matusiewicz, Yu Sasaki, and Lei Wang. Preimages for Step-ReducedSHA-2. In Advances in Cryptology - ASIACRYPT 2009, 15th International Conference on the Theory andApplication of Cryptology and Information Security, Tokyo, Japan, December 6-10, 2009. Proceedings,pages 578–597, 2009.

3. Giuseppe Ateniese, Ilario Bonacina, Antonio Faonio, and Nicola Galesi. Proofs of Space: When Space isof the Essence. IACR Cryptology ePrint Archive, 2013:805, 2013.

4. Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O’Hearn, and Christian Winnerlein. BLAKE2:Simpler, Smaller, Fast as MD5. In Michael J. Jacobson Jr., Michael E. Locasto, Payman Mohassel, andReihaneh Safavi-Naini, editors, ACNS, volume 7954 of LNCS, pages 119–135. Springer, 2013.

5. Anne Barsuhn. Cache-Timing Attack on scrypt. Bauhaus-Universitat Weimar, Bachelor Dissertation,December 2013.

25

6. S.M. Bellovin and M. Merrit. Encrypted Key Exchange: Password-Based Protocols Secure Against Dic-tionary Attacks. Proceedings of the I.E.E.E. Symposium on Research in Security and Privacy (Oakland),1992.

7. Alex Biryukov and Dmitry Khovratovich. Tradeoff cryptanalysis of Catena. PHC mailing list:[email protected].

8. Xavier Boyen. Halting Password Puzzles – Hard-to-break Encryption from Human-memorable Keys. In16th USENIX Security Symposium—SECURITY 2007, pages 119–134. Berkeley: The USENIX Associa-tion, 2007. Available at http://www.cs.stanford.edu/~xb/security07/.

9. William F. Bradley. Superconcentration on a Pair of Butterflies. CoRR, abs/1401.7263, 2014.10. Stephen A. Cook. An Observation on Time-Storage Trade Off. J. Comput. Syst. Sci., 9(3):308–316, 1974.11. J. W. Cooley and J. W. Tukey. An Algorithm for the Machine Calculation of Complex Fourier Series.

Math. Comput., 19:297–301, 1965.12. Solar Designer. Enhanced challenge/response authentication algorithms.

http://openwall.info/wiki/people/solar/algorithms/challenge-response-authentication.Accessed January 22, 2014.

13. Ulrich Drepper. Unix crypt using SHA-256 and SHA-512. http://www.akkadia.org/drepper/SHA-crypt.txt.Accessed May 16, 2013.

14. Cynthia Dwork, Andrew Goldberg, and Moni Naor. On Memory-Bound Functions for Fighting Spam.In CRYPTO, pages 426–444, 2003.

15. Cynthia Dwork and Moni Naor. Pricing via Processing or Combatting Junk Mail. In CRYPTO, pages139–147, 1992.

16. Cynthia Dwork, Moni Naor, and Hoeteck Wee. Pebbling and Proofs of Work. In CRYPTO, pages 37–54,2005.

17. Morris Dworkin. Special Publication 800-38A: Recommendation for Block Cipher Modes of Operation.National Institute of Standards, U.S. Department of Commerce, December 2001.

18. Stefan Dziembowski, Sebastian Faust, Vladimir Kolmogorov, and Krzysztof Pietrzak. Proofs of Space.IACR Cryptology ePrint Archive, 2013:796, 2013.

19. Stefan Dziembowski, Tomasz Kazana, and Daniel Wichs. Key-Evolution Schemes Resilient to Space-Bounded Leakage. In CRYPTO, pages 335–353, 2011.

20. Eric Glass. The NTLM Authentication Protocol and Security Support Provider.http://davenport.sourceforge.net/ntlm.html. Accessed May 16, 2013.

21. Jian Guo, San Ling, Christian Rechberger, and Huaxiong Wang. Advanced Meet-in-the-Middle PreimageAttacks: First Results on full Tiger, and improved Results on MD4 and SHA-2. ASIACRYPT’10, volume6477 of Lecture Notes in Computer Science, 2010.

22. Martin E. Hellman. A Cryptanalytic Time-Memory Trade-Off. IEEE Transactions on InformationTheory, 26(4):401–406, 1980.

23. John Hopcroft, Wolfgang Paul, and Leslie Valiant. On Time Versus Space. J. ACM, 24(2):332–337, April1977.

24. B. Kaliski. RFC 2898 - PKCS #5: Password-Based Cryptography Specification Version 2.0. Technicalreport, IETF, 2000.

25. Poul-Henning Kamp. The history of md5crypt. http://phk.freebsd.dk/sagas/md5crypt.html. Ac-cessed May 16, 2013.

26. Takumi Kasai, Akeo Adachi, and Shigeki Iwata. Classes of Pebble Games and Complete Problems. InACM Annual Conference (2), pages 914–918, 1978.

27. John Kelsey, Bruce Schneier, Chris Hall, and David Wagner. Secure Applications of Low-Entropy Keys.In Eiji Okamoto, George I. Davida, and Masahiro Mambo, editors, ISW, volume 1396 of LNCS, pages121–134. Springer, 1997.

28. Dmitry Khovratovich, Christian Rechberger, and Alexandra Savelieva. Bicliques for Preimages: Attackson Skein-512 and the SHA-2 Family. In FSE, pages 244–263, 2012.

29. H. Krawczyk, M. Bellare, and R. Canetti. Hmac: Keyed-hashing for message authentication. RFC 2104(Informational), February 1997. Updated by RFC 6151.

30. Thomas Lengauer and Robert Endre Tarjan. Asymptotically Tight Bounds on Time-Space Trade-offs ina Pebble Game. J. ACM, 29(4):1087–1130, 1982.

31. Andrzej Lingas. A PSPACE Complete Problem Related to a Pebble Game. In ICALP, pages 300–321,1978.

32. T. Alexander Lystad. Leaked Password Lists and Dictionaries - The Password Project.http://thepasswordproject.com/leaked_password_lists_and_dictionaries. Accessed May 16, 2013.

33. Udi Manber. A Simple Scheme to Make Passwords Based on One-Way Functions Much Harder to Crack.Computers & Security, 15(2):171–176, 1996.

26

http://www.cs.stanford.edu/~xb/security07/

http://openwall.info/wiki/people/solar/algorithms/challenge-response-authentication

http://www.akkadia.org/drepper/SHA-crypt.txt

http://davenport.sourceforge.net/ntlm.html

http://phk.freebsd.dk/sagas/md5crypt.html

http://thepasswordproject.com/leaked_password_lists_and_dictionaries

34. Gordon E. Moore. Cramming more Components onto Integrated Circuits. Electronics, 38(8), April 1965.35. Robert Morris and Ken Thompson. Password Security - A Case History. Commun. ACM, 22(11):594–597,

1979.36. Krishna Neelamraju. Facebook Pages: Usage Patterns | Recommend.ly.

http://blog.recommend.ly/facebook-pages-usage-patterns/. Accessed May 16, 2013.37. C. Newman, A. Menon-Sen, A. Melnikov, and N. Williams. Salted challenge response authentication

mechanism (scram) sasl and gss-api mechanisms. RFC 5802 (Proposed Standard), July 2010.38. Nvidia. Nvidia GeForce GTX 680 - Technology Overview, 2012.39. Philippe Oechslin. Making a Faster Cryptanalytic Time-Memory Trade-Off. Advances in Cryptology-

CRYPTO 2003, 3:617–630, 2003.40. NIST National Institute of Standards and Technology. FIPS 180-2: Secure Hash Standard. April 1995.

See http://csrc.nist.gov.41. Michael S. Paterson and Carl E. Hewitt. Comparative Schematology. In Jack B. Dennis, editor, Record

of the Project MAC Conference on Concurrent Systems and Parallel Computation, chapter Computationschemata, pages 119–127. ACM, New York, NY, USA, 1970.

42. Wolfgang J. Paul and Robert Endre Tarjan. Time-Space Trade-Offs in a Pebble Game. In ICALP, pages365–369, 1977.

43. Colin Percival. Stronger Key Derivation via Sequential Memory-Hard Functions. presented at BSD-Can’09, May 2009, 2009.

44. Niels Provos and David Mazieres. A Future-Adaptable Password Scheme. In USENIX Annual TechnicalConference, FREENIX Track, pages 81–91. USENIX, 1999.

45. A.G. Reinhold. HEKS: A Family of Key Stretching Algorithms, 1999.46. Ronald L. Rivest. RFC 1321: The MD5 Message-Digest Algorithm. Internet Activities Board, April 1992.47. Semiocast SAS. Brazil becomes 2nd country on Twitter, Japan 3rd Netherlands most active country.

http://goo.gl/QOeaB. Accessed May 16, 2013.48. J. Savage and S. Swamy. Space-time trade-offs on the FFT algorithm. Information Theory, IEEE

Transactions on, 24(5):563 – 568, sep 1978.49. John E. Savage and Sowmitri Swamy. Space-Time Tradeoffs for Oblivious Interger Multiplications. In

ICALP, pages 498–504, 1979.50. Bruce Schneier. Description of a New Variable-Length Key, 64-bit Block Cipher (Blowfish). In FSE,

pages 191–204, 1993.51. Ravi Sethi. Complete Register Allocation Problems. SIAM J. Comput., 4(3):226–248, 1975.52. Jens Steube. oclHashcat-plus - Advanced Password Recovery. http://hashcat.net/oclhashcat-plus/.

Accessed May 16, 2013.53. Sowmitri Swamy and John E. Savage. Space-Time Tradeoffs for Linear Recursion. In POPL, pages

135–142, 1979.54. Martin Tompa. Time-Space Tradeoffs for Computing Functions, Using Connectivity Properties of their

Circuits. In STOC, pages 196–204, 1978.55. John Tromp. Cuckoo Cycle; a memory-hard proof-of-work system. Cryptology ePrint Archive, Report

2014/059, 2014. http://eprint.iacr.org/.56. Meltem Sonmez Turan, Elaine B. Barker, William E. Burr, and Lidong Chen. SP 800-132. Recom-

mendation for Password-Based Key Derivation: Part 1: Storage Applications. Technical report, NIST,Gaithersburg, MD, United States, 2010.

57. M.V. Wilkes. Time-Sharing Computer Systems. MacDonald computer monographs. American ElsevierPublishing Company, 1968.

A The scrypt Password Scrambler

Algorithm 5 describes the scrypt password scrambler and its core operation ROMix. For pre-and post-processing, scrypt invokes the one-way function PBKDF2 [24] to support inputs andoutputs of arbitrary length. ROMix uses a hash function H with n output bits, where n is thesize of a cache line (at current machines usually 64 bytes). To support hash functions withsmaller output sizes, [43] proposes to instantiate H by a function called BlockMix, which wewill not elaborate on. For our security analysis of ROMix, we model H as a random oracle.

ROMix takes two inputs: an initial state x, which depends on both salt and password, andthe array size G that defines the required storage. One can interpret log2(G) as the garlic

27

http://blog.recommend.ly/facebook-pages-usage-patterns/

http://goo.gl/QOeaB

http://hashcat.net/oclhashcat-plus/

http://eprint.iacr.org/

Algorithm 5 The scrypt algorithm and its core operation ROMix [43].scrypt

Require:pwd Passwords SaltG Cost Parameter

Ensure: x Password Hash10: x← PBKDF2(pwd , s, 1, 1)11: x← ROMix(x,G)12: x← PBKDF2(pwd , x, 1, 1)13: return x

ROMix

Require: x Initial State , G Cost ParameterEnsure: x Hash value20: for i = 0, . . . , G− 1 do21: vi ← x22: x← H(x)23: end for24: for i = 0, . . . , G− 1 do25: j ← x mod G26: x← H(x⊕ vj)27: end for28: return x

Algorithm 6 The algorithm ROMixMC, performing ROMix with K/G storage.Require:

x Initial State,G 1st Cost Parameter,K 2nd Cost Parameter

Ensure: x Hash Value1: for i = 0, . . . , G− 1 do2: if i mod K = 0 then3: vi ← x4: end if5: x← H(x)6: end for

7: for i = 0, . . . , G− 1 do8: j ← x mod G9: ℓ← K(j/K) “/” is the integer division 10: y ← vℓ11: for m = ℓ+ 1, . . . , j do12: y ← H(y) invariant: y ← vm 13: end for14: x← H(x⊕ y)15: end for16: return x

factor of scrypt. In the first phase (lines 20–23), ROMix initializes an array v. More detailed,the array variables v0, v1 . . . , vG−1 are set to x,H(x), . . . , H(. . . (H(x))), respectively. In thesecond phase (lines 24–27), ROMix updates x depending on vj . The sequential memory-hardness results from the way how the index j is computed, depending on the current valueof x, i.e., j ← x mod G. After G updates, the final value of x is returned and undergoes thepost-processing.

A minor issue is that scrypt uses the password pwd as one of the inputs for post-processing. Thus, it has to stay in storage during the entire password-scrambling process.This becomes a risk if there is any chance that the memory can be compromised during thetime scrypt is running. Compromising the memory should not happen, anyway, but thisissue could easily be fixed without any bad effect on the security of scrypt, e.g., one couldreplace Line 12 of Algorithm 5 by x← PBKDF2(x, s, 1, 1).

A.1 Brief Analysis of ROMix

The following part deals with a variant of ROMix which can be computed with less thanG units of memory. Suppose we only have S < G units of storage for the values in v.For convenience, we assume G is a multiple of S and set K ← G/S. As it will turn out,the memory-constrained algorithm ROMixMC (see Algorithm 6) generates the same result asROMix with less than G storage places and is Θ(K) times slower than ROMix. From the arrayv, we will only store the values v0, vK , v2k, . . . , v(S−1)K – using all the S memory unitsavailable.

At Line 9, the variable ℓ is assigned the biggest multiple of K less or equal j. By verifyingthe invariant at Line 12, it is clearly recognizable that ROMixMC computes the same hash

28

value as the original ROMix, except that vj is computed on-the-fly, beginning with vℓ. Thesecomputations call the random oracle on the average (K−1)/2 times. Thus, the second phaseof ROMixMC is about Θ(K) times slower than the second phase of ROMix, which dominatesthe workload for ROMixMC.

Next, we briefly discuss why ROMix is sequentially memory-hard (for the full proof see[43]). The intuition is as follows. The indices j are determined by the output of the randomoracle H and thus, essentially, uniformly distributed random values over 0, . . . , G−1. Withno way to anticipate the next j, the most suitable strategy is to minimize the size of the“gaps”, i.e., the number of consecutively unknown vj . This is indeed what ROMixMC does, bystoring one vi every K-th step.

A.2 Cache-Timing Attacks

Algorithm 5 (scrypt/ROMix) revisited. What could possibly go wrong?

The Spy Process. As it turns out, the idea to compute a “random” index j and then askfor the value vj , which is immensely useful for sequential memory-hardness, is also an issue.Consider a spy process, running on the same machine as scrypt. This spy process cannotread the internal memory of scrypt, but, as it is running on the same machine, it shares itscache memory with ROMix. The spy process interrupts the execution of ROMix twice:

1. When ROMix enters the second phase (Line 24 of Algorithm 5), the spy process readsfrom a bunch of addresses, to force out all the vi that are still in the cache. Thereupon,ROMix is allowed to run for another very short time.

2. Now, the spy process interrupts ROMix again. By measuring access times when readingfrom different addresses, the spy process can figure out which of the vi has been read byROMix, in between.

So, the spy process can tell us the indices j for which vj has been read, and with thisinformation, we can mount the following cache-timing attack.

Preliminary Cache-Timing Attack. Let x be the output of PBKDF2(pwd, s, 1, 1), wherepwd denotes the current password candidate and s the salt. Then, we can apply the followingpassword candidate sieve.

1. Run the first phase of ROMix, without storing the vi (i.e., skip Line 21 of Algorithm 5).2. Compute the index j ← x mod G.3. If vj is one of the values that have been read by ROMix, then store pwd in a list.4. Else, conclude that pwd is a wrong password.

This sieve can run in parallel on any number of cores, where each core tests another passwordcandidate pwd. Note that each core needs only a small and constant amount of memory – thedata structure to decide if j is one of the indices being read with vj , can be shared betweenall the cores. Thus, we can use exactly the kind of hardware, that scrypt has been designedto hinder.

Next, we discuss the gain of this attack. Let r denote the number of iterations the loopin lines 24–27 of ROMix has performed, before the second interrupt by the spy process. So,there are at most r indices j with vj being read. That means, we expect this approach tosort out all but r/G candidates. If our spy process manages to interrupt very soon, afterallowing it to run again, we have r ≪ G. This may enable us to use conventional hardwareto run full ROMix to search for the correct password among the candidates on the list.

29

Final Cache-Timing Attack. In this attack we allow the second interrupt to arrive verylate – maybe even as late as the termination time of ROMix. So, the loop in lines 24–27 ofROMix has been run r = G times. As it seems, each vi has been read once. But actually,this is only true on average; some vi have been read more then once, and we expect about(1/e)G ≈ 0.37G array elements vi not to have been read at all. So applying the basic attackallows us to eliminate about 37% of all password candidates – a rather small gain for suchhard work.

In the following we introduce a way to push the attack further, inspired by Algorithm 6,the memory-constrained ROMixMC. Our final cache-timing attack on scrypt only needs thesmallest possible amount of memory: S = 1,K = G/S = G, and thus, we only have tostore the single value v0. Like the second phase of ROMixMC, we will compute the values vjon-the-fly when needed. Unlike ROMixMC, we will stop execution whenever one of our valuesj is such that vj has not been read by ROMix (according to the info from our spy process).

Thus, if the first j has not been read, we immediately stop the execution without any on-the-fly computation; if the first j has been read, but not the second, we need one on-the-flycomputation of vj , and so forth.

Since a fraction (i.e., 1/e) of all values vi has not been read, we will need about 1/(1 −1/e) ≈ 1.58 on-the-fly computations of some vj , each at the average price of (G− 1)/2 timescalling H. Additionally, each iteration needs one call to H for computing x ← H(x ⊕ vj).With regards to the effort for the first phase, with G calls to H, the expected number ofcalls to reject a wrong password is about

G+ 1.58 ∗

(

1 +G− 1

2

)

≈ 1.79G.

As it turns out, rejecting a wrong password with constant memory is faster then computingordinary ROMix with all the required storage, which actually makes 2G calls to H, withoutcomputing any vi on-the-fly. We stress that the ability to abort the computation, thanks tothe information gathered by the spy process, is crucial. Meanwhile, we are working on animplementation to verify this attack.

A.3 The Garbage-Collector Attack

Memory-demanding password scramblers such as ROMix defend against a massively-parallelpassword-cracking approach, since the required memory is proportional to the number ofpasswords scrambled in parallel.

But, memory-demanding password scrambling may also open the gates for new attack op-portunities for the adversary. If we allocate a huge block of memory for password scrambling,holding v0, v1, . . . , vG−1, this memory becomes “garbage” after the password scrambler hasterminated, and will be collected for reuse, eventually. One usually assumes that the adver-sary learns the hash of the secret. The garbage-collector attack assumes that the adversaryadditionally learns the memory content, i.e., the values vi, after termination of the passwordscrambler.

For ROMix, the value v0 = H(x) is a plain hash of the original secret x. Hence, a maliciousgarbage collector can completely bypass ROMix and search directly for x with H(x) = v0,implying that each password candidate can be checked in time and memory complexity ofO(1). Furthermore, if the adversary fails to learn v0, but any of the other values vi = H(vi−1),the computational effort grows to O(i), but the memory complexity is still O(1).

30

Thus, ROMix does not provide much defense against garbage-collector attacks. A possiblecountermeasure would be to overwrite v0, . . . , vG−1 after running ROMix. But, this step mightbe removed by a compiler due to optimization, since it is algorithmically ineffective.

A.4 Discussion

Currently, the attacks above are of theoretical nature. The garbage-collector attack requiresthe adversary to be able to read the memory occupied by ROMix, after its usage. Whereasthe cache-timing attack requires to (1) run a spy process on the machine ROMix is running,(2) interrupt ROMix twice at the right points of time, and (3) precisely measure the timingsof memory reads. Moreover, other processes running on the same machine can add a hugeamount of noise to the cache timings. It is not clear if a “real” server can ever be attackedthat way.

However, in an idealized “laboratory” setting, the applicability of cache-timing attacksagainst ROMix has been demonstrated [5]. The idealized conditions included execution rightson the system.

Remark 2. Even without knowing the password hash at all,

1. the adversary is able to find out when the password has been changed,2. and the adversary can mount a password-guessing attack,

just from knowing the memory-access pattern.

Note that severe security issues can be caused by the second point. Consider any offlineattack on the password. When passwords are hashed using an old-style password hash func-tion, e.g., md5crypt [25], the adversary first needs to read the file containing the passwordhash. Without the password hash, mounting an offline attack is not possible. Even plain-text passwords are safe from offline adversaries which are not capable of reading the filecontaining the plaintext passwords. But, using the seemingly strong password hash functionscrypt may enable offline password cracking, even when the adversary fails to ever learn thepassword hash.

31

Documents

Catena - eprint.iacr.orgCatena: A Memory-Consuming Password-Scrambling Framework Christian Forler⋆, Stefan Lucks, and Jakob Wenzel Bauhaus-Universit¨at Weimar, Germany {Christian.Forler,