16
I Forgot Your Password: Randomness Attacks Against PHP Applications * George Argyros Dept. of Informatics & Telecom., University of Athens, [email protected] Aggelos Kiayias Dept. of Informatics & Telecom., University of Athens, [email protected] & Computer Science and Engineering, University of Connecticut, Storrs, USA. Abstract We provide a number of practical techniques and algorithms for exploiting randomness vulnerabilities in PHP applications.We focus on the predictability of password reset tokens and demonstrate how an attacker can take over user accounts in a web application via predicting or algorithmically derandomizing the PHP core randomness generators. While our techniques are designed for the PHP language, the principles behind our techniques and our algorithms are independent of PHP and can readily apply to any system that utilizes weak randomness generators or low entropy sources. Our results include: algorithms that reduce the entropy of time variables, identifying and exploiting vulnera- bilities of the PHP system that enable the recovery or reconstruction of PRNG seeds, an experimental analy- sis of the H˚ astad-Shamir framework for breaking trun- cated linear variables, an optimized online Gaussian solver for large sparse linear systems, and an algorithm for recovering the state of the Mersenne twister gen- erator from any level of truncation. We demonstrate the gravity of our attacks via a number of case studies. Specifically, we show that a number of current widely used web applications can be broken using our tech- niques including Mediawiki, Joomla, Gallery, osCom- merce and others. 1 Introduction Modern web applications employ a number of ways for generating randomness, a feature which is critical for their security. From session identifiers and pass- word reset tokens, to random filenames and password salts, almost every web application is relying on the unpredictability of these values for ensuring secure op- eration. However, usually programmers fail to under- stand the importance of using cryptographically secure pseudorandom number generators (PRNG) something that opens the potential for attacks. Even worse, the same trend holds for whole programming languages; * Research partly supported by ERC Project CODAMODA. PHP for example lacks a built-in cryptographically se- cure PRNG in its core and until recently, version 5.3, it tottaly lacked a cryptographically secure randomness generation function. This left PHP programmers with two options: They will either implement their own PRNG from scratch or they will employ whatever functions are offered by the API in a “homebrew” and ad-hoc fashion. In ad- dition, backwards compatibility and other issues (cf. section 2), often push the developers away even from the newly added randomness functions, making their use very limited. As we will demonstrate and heavily exploit in this work, this approach does not produce secure web applications. Observe that using a low entropy source or a crypto- graphically weak PRNG to produce randomness does not necessarily imply that an attack is feasible against a system. Indeed, so far there have been a very limited number of published attacks based on the insecure us- age of PRNG functions in PHP, while popular exploit databases 1 contain nearly zero exploits for such vul- nerabilities (and this may partially explain the delay in the PHP community adopting secure randomness gen- eration functions). Showing that such attacks are in fact very practical is the objective of our work. In this paper we develop generic techniques and al- gorithms to exploit randomness vulnerabilities in PHP applications. We describe implementation issues that allow one to either predict or completely recover the initial seed of the PRNGs used in most web applica- tions. We also give algorithms for recovering the in- ternal state of the PRNGs used by the PHP system, in- cluding the Mersenne twister generator and the glibc LFSR based generator, even when their output is trun- cated. These algorithms could be used in order to attack hardened PHP installations even when strong seeding is employed, as it is done by the Suhosin ex- tension for PHP and they may be of independent inter- est. We also conducted an extensive audit of several pop- ular PHP applications. We focused on the security of password reset implementations. Using our attack 1 e.g. http://www.exploit-db.com 1

I Forgot Your Password: Randomness Attacks Against PHP

  • Upload
    lynhi

  • View
    224

  • Download
    5

Embed Size (px)

Citation preview

Page 1: I Forgot Your Password: Randomness Attacks Against PHP

I Forgot Your Password: Randomness Attacks Against PHP Applications∗

George ArgyrosDept. of Informatics & Telecom.,

University of Athens,[email protected]

Aggelos KiayiasDept. of Informatics & Telecom.,

University of Athens,[email protected]

& Computer Science and Engineering,University of Connecticut, Storrs, USA.

Abstract

We provide a number of practical techniques andalgorithms for exploiting randomness vulnerabilitiesin PHP applications.We focus on the predictability ofpassword reset tokens and demonstrate how an attackercan take over user accounts in a web application viapredicting or algorithmically derandomizing the PHPcore randomness generators. While our techniques aredesigned for the PHP language, the principles behindour techniques and our algorithms are independent ofPHP and can readily apply to any system that utilizesweak randomness generators or low entropy sources.Our results include: algorithms that reduce the entropyof time variables, identifying and exploiting vulnera-bilities of the PHP system that enable the recovery orreconstruction of PRNG seeds, an experimental analy-sis of the Hastad-Shamir framework for breaking trun-cated linear variables, an optimized online Gaussiansolver for large sparse linear systems, and an algorithmfor recovering the state of the Mersenne twister gen-erator from any level of truncation. We demonstratethe gravity of our attacks via a number of case studies.Specifically, we show that a number of current widelyused web applications can be broken using our tech-niques including Mediawiki, Joomla, Gallery, osCom-merce and others.

1 Introduction

Modern web applications employ a number of waysfor generating randomness, a feature which is criticalfor their security. From session identifiers and pass-word reset tokens, to random filenames and passwordsalts, almost every web application is relying on theunpredictability of these values for ensuring secure op-eration. However, usually programmers fail to under-stand the importance of using cryptographically securepseudorandom number generators (PRNG) somethingthat opens the potential for attacks. Even worse, thesame trend holds for whole programming languages;

∗Research partly supported by ERC Project CODAMODA.

PHP for example lacks a built-in cryptographically se-cure PRNG in its core and until recently, version 5.3, ittottaly lacked a cryptographically secure randomnessgeneration function.

This left PHP programmers with two options: Theywill either implement their own PRNG from scratchor they will employ whatever functions are offered bythe API in a “homebrew” and ad-hoc fashion. In ad-dition, backwards compatibility and other issues (cf.section 2), often push the developers away even fromthe newly added randomness functions, making theiruse very limited. As we will demonstrate and heavilyexploit in this work, this approach does not producesecure web applications.

Observe that using a low entropy source or a crypto-graphically weak PRNG to produce randomness doesnot necessarily imply that an attack is feasible againsta system. Indeed, so far there have been a very limitednumber of published attacks based on the insecure us-age of PRNG functions in PHP, while popular exploitdatabases1 contain nearly zero exploits for such vul-nerabilities (and this may partially explain the delay inthe PHP community adopting secure randomness gen-eration functions). Showing that such attacks are infact very practical is the objective of our work.

In this paper we develop generic techniques and al-gorithms to exploit randomness vulnerabilities in PHPapplications. We describe implementation issues thatallow one to either predict or completely recover theinitial seed of the PRNGs used in most web applica-tions. We also give algorithms for recovering the in-ternal state of the PRNGs used by the PHP system, in-cluding the Mersenne twister generator and the glibcLFSR based generator, even when their output is trun-cated. These algorithms could be used in order toattack hardened PHP installations even when strongseeding is employed, as it is done by the Suhosin ex-tension for PHP and they may be of independent inter-est.

We also conducted an extensive audit of several pop-ular PHP applications. We focused on the securityof password reset implementations. Using our attack

1e.g. http://www.exploit-db.com

1

Page 2: I Forgot Your Password: Randomness Attacks Against PHP

framework we were able to mount attacks that takeover arbitrary user accounts with practical complex-ity. A number of widely used PHP applications areaffected (see Figure 7), while we believe that the im-pact is even larger in less known applications.

Our results suggest that randomness attacks shouldbe considered practical for PHP applications and ex-isting systems should be audited for these vulnerabili-ties. Weak randomness is a grave vulnerability in anysecure system as it was also recently demonstrated inthe widely publicized discovery of common primes inRSA public-keys by Lenstra et al. [14]. We finallystress that our techniques apply in any setting beyondPHP, whenever the same PRNG functions are used andthe attack vector relies on predicting a system definedrandom object.

This is only an extended abstract, a full version canbe found in [1].

1.1 Attack modelIn Figure 1 we present our general attack template. Anattacker is trying to predict the password reset token inorder to gain another user’s privileges (say an admin-istrator’s). Each time the attacker makes a request tothe web server, his request is handled by a web appli-cation instance, usually represented by a specific op-erating system process, which contains some processspecific state. The web application uses a number ofapplication objects with values depending on its in-ternal state, with some of these objects leaking to theattacker through the web server responses. Examplesof such objects are session identifiers and outputs ofPRNG functions. Although our focus is in passwordreset functions, the principles that we use and the tech-niques that we develop can be readily applied in othercontexts when the application relies on the generationof random values for security applications. Examplesof such applications are CAPTCHA’s and the produc-tion of random filenames.

Attack complexity. Since we present explicit practi-cal attacks, we define next the complexity under whichan attack should be consider practical. There are twomeasure of complexity of interest. The first is the timecomplexity and the second is the query or communi-cation complexity. For some of our attacks the maincompuational operation is the calculation of an MD5hash. With current GPU technologies an attacker canperform up to 230 MD5 calculations per second witha $250 GPU, while with an additional $500 can reachup to 232 calculations [9]. These figures suggest thatattacks that require up to 240 MD5 calculations canbe easilty mounted. In terms of communication com-plexity, most of our attacks have a query complexityof a few thousand requests at most, while some haveas little as a few tens of requests. Our most commu-nication intensive attacks (section 5) require less than

35K(≈ 215) requests. Sample benchmarks that we per-formed in various applications and server installationsshow that on average one can perform up to 222 re-quests in the course of a day.

2 PHP System

We will now describe functionalities of the PHP sys-tem that are relevant to our attacks. We first describethe different modes in which PHP might be running,and then we will do a description of the randomnessgeneration functions in PHP. We focus our analysis inthe Apache web server, the most popular web server atthe time of this writing, however our attacks are easilyported to any webserver that meets the configurationrequirements that we describe for each attack.

2.1 Proccess managementThere are different ways in which a PHP script is ex-ecuted. These ways affect its internal states, and thusthe state of its PRNGs. We will focus on the case whenPHP is running as an Apache module, which is the de-fault installation in most Linux distributions and is alsovery popular in Windows installations.

mod php: Under this installation the Apache webserver is responsible for the process management.When the server is started a number of child proccessesare created and each time the number of occupied pro-cesses passes a certain threshold a new process is cre-ated. Conversely, if the idle proccesses are too many,some processes are killed. One can specify a maxi-mum number of requests for each process although thisis not enabled by default. Under this setting each PHPscript runs in the context of one of the child processes,so its state is preserved under multiple connections un-less the process is killed by the web server processmanager. The configuration is similar in the case theweb server uses threads instead of processes.

Keep-Alive requests. The HTTP protocol offers arequest header, called Keep-Alive. When this headeris set in an HTTP request, the web server is instructedto keep the connection alive after the request is served.Under mod php installations this means that any sub-sequent request will be handled from the same process.This is a very important fact, that we will use in ourattacks. However in order to avoid having a processhang from one connection for infinite time, most webservers specify an upper bound on the number of con-sequent keep-alive requests. The default value for thisbound in the Apache web server is 100.

2.2 Randomness GenerationIn order to satisfy the need for generating randomnessin a web application, PHP offers a number of different

2

Page 3: I Forgot Your Password: Randomness Attacks Against PHP

Figure 1: Attack template.

randomness functions. We briefly describe each func-tion below.

php combined lcg()/lcg value(): thephp combined lcg() function is used internallyby the PHP system, while lcg value() is itspublic interface. This function is used in order tocreate sessions, as well as in the uniqid functiondescribed below to add extra entropy. It uses twolinear congruential generators (LCGs) which itcombines in order to get better quality numbers.The output of this function is 64 bits.uniqid(prefix, extra entropy): Thisfunction returns a string concatenation of theseconds and microseconds of the server time con-verted in hexadecimal. When given an additionalargument it will prefix the output string with theprefix given. If the second argument is set to true,the function will suffix the output string with anoutput from the php combined lcg() function.This makes the total output to have length up to15 bytes without the prefix.microtime(), time(): The functionmicrotime() returns a string concatenationof the current microseconds divided by 106 withthe seconds obtained from the server clock. Thetime() function returns the number of secondssince Unix Epoch.mt srand(seed)/mt rand(min, max):mt rand is the interface for the MersenneTwister (MT) generator [15] in the PHP system.In order to be compatible with the 31 bit output ofrand(), the LSB of the MT function is discarded.The function takes two optional arguments whichmap the 31 bit number to the [min,max] range.The mt srand() function is used to seed the MTgenerator with the 32 bit value seed; if no seedis provided then the seed is provided by the PHPsystem.srand(seed)/rand(min, max): rand is the in-terface function of the PHP system to the rand()function provided by libc. In unix, rand() ad-

ditive feedback generator (resembling a LinearFeedback Shift Register (LFSR)), while in Win-dows it is an LCG. The numbers generated byrand() are in the range [0,231−1] but like beforethe two optional arguments give the ability to mapthe random number to the range [min,max]. Likebefore the srand() function seeds the generatorsimilarly to the mt srand() function.openssl random pseudo bytes(length,

strong): This function is the only functionavailable in order to obtain cryptographicallysecure random bytes. It was introduced in version5.3 of PHP and its availability depends on theavailability of the openssl library in the system.In addition, until version 5.3.4 of PHP thisfunction had performance problems [2] runningin Windows operating systems. The strong

parameter, if provided, is set to true if thefunction returned cryptographically strong bytesand false otherwise. For these reasons, andfor backward compatibility, its use is still verylimited in PHP applications.

In addition the application can utilize an operating sys-tem PRNG (such as /dev/urandom). However, thisdoes not produce portable code since /dev/urandom

is unavailable in Windows OS.

3 The entropy of time measurements

Although ill-advised (e.g., [5]) many web applica-tions use time measurements as an entropy source.In PHP, time is accessed through the time() andmicrotime() functions. Consider the following prob-lem. At some point a script executing a request madeby the attacker makes a time measurement and use theresults to, say, generate a password reset token. Theattacker’s goal it to predict the output of the measure-ment made by the PHP script. The time() functionhas no entropy at all from an attacker point of view,since the server reveals its time in the HTTP responseheader as dictated by the HTTP protocol. On the other

3

Page 4: I Forgot Your Password: Randomness Attacks Against PHP

hand, microtime ranges from 0 to 106 giving a max-imum entropy of about 20 bits. We develop two dis-tinct attacks to reduce the entropy of microtime()

that have different advantages and mostly target twodifferent scenarios. The first one, Adversial Time Syn-chronization, aims to predict the output of a specifictime measurement when there is no access to othersuch measurements. The second, Request Twins, ex-ploits the fact that the script may enable the attacker togenerate a correlated leak to the target measurement.

Adversarial Time Synchronization (ATS). As wementioned above, in each HTTP response the webserver includes a header containing the full date of theserver including hour, minutes and seconds. The basicobservation is that although we get no leak regardingthe microseconds from the HTTP date header we knowthat when a second changes the microseconds are ze-roed. We use this observation to narrow down theirvalue.

The algorithm proceeds as follows: We connect tothe web server and issue pairs of HTTP requests R1and R2 in corresponding times T 1 and T 2 until a pairis found in which the date HTTP header of the cor-responding responses is different. At that point weknow that between the processing of the two HTTPrequests the microseconds of the server were zeroed.We proceed to approximate the time of this event S inlocaltime, denoted by the timestamp D, by calculatingthe average RTT of the two requests and offsetting themiddle point between T 2 and T 1 by this value dividedby two.

In the Apache web server the date HTTP header isset after processing the request of the user. If the at-tacker requests a non existent file, then the point theheader is set is approximatelly the point that a validrequest will start executing the PHP script. It fol-lows that if the attacker uses ATS with HTTP requeststo not existent files then he will synchronize approx-imately with the beggining of the script’s execution.Given a steady network where each request takes RT T

2time to reach the target server, our algorithm devia-tion depends only on the rate that the attacker can sendHTTP requests. In practice, we find that the algo-rithm’s main source of error is the network distancebetween the attacker’s system and the server cf. Fig-ure 3. The above implementation we described is aproof-of-concept and various optimizations can be ap-plied to improve its accuracy.

Request Twins. Consider the following setting: anapplication uses microtime() to generate a passwordtoken for any user of the system. The attacker has ac-cess to a user account of the application and tries totake over the account of another user. This allows theattacker to obtain password reset tokens for his accountand thus outputs of the microtime() function. Thekey observation is that if the attacker performs in rapid

succession two password reset requests, one for his ac-count and one for the target user’s account, then theserequests will be processed by the application with avery small time difference and thus the conditional en-tropy of the target user’s password reset token giventhe attacker’s token will be small. Thus, the attackercan generate a token for an account he owns and infast succession a token for the target account. Thenthe microtime() used for generating the token of hisaccount can be used to approximate the microtime()output that was used for the token of the target account.

Experiments. We conducted a series of experimentsfor both our algorithms using the following setup. Wecreated a PHP “time” script that prints out the currentseconds and microseconds of the server. To evaluatethe ATS algorithm we first performed synchronizationbetween a client and the server and afterwards we senta request to the time script and tried to predict the valueit would return. To evaluate the Request Twins algo-rithm we submitted two requests to the time script infast succession and measured the difference betweenthe output of the two responses.

In Figure 3 we show the time difference betweenthe server’s time and our client’s calculation for fourservers with different CPU’s and RTT parameters. Ourexperiments suggest that both algorithms significantlyreduce the entropy of microseconds (up to an averageof 11 bits with ATS and 14 bits with Request Twins)having different advantages each. Specifically, theATS algorithm seems to be affected by large RTT val-ues while it is less affected by differences in the CPUspeed. The situation is reversed for Request Twinswhere the algorithm is immune to changes in the RTThowever, it is less effective in old systems with lowprocessing speed.

4 Seed Attacks

In this section we describe attacks that allow either therecovery or the reconstruction of the seeds used for thePHP system’s PRNGs. This allows the attacker to pre-dict all future iterations of these functions and hencereduces the entropy of functions rand(),mt rand(),lcg value() as well as the extra entropy argumentof uniqid() to zero bits. We exploit two propertiesof the seeds used in these functions. The first one isthe reusage of entropy sources between different seeds.This enables us to reconstruct a seed without any ac-cess to outputs of the respective PRNG. The second isthe small entropy of certain seeds that allows one torecover its value by bruteforce.

We present three distinct attacks. The first attack al-lows one to recover the seed of the internal LCG seedused by the PHP system using a session identifier. Us-ing that seed our second attack reconstructs the seed ofrand() and mt rand() functions from the elementsof the LCG seed without any access to outputs of these

4

Page 5: I Forgot Your Password: Randomness Attacks Against PHP

Figure 2: ATS.

Configuration ATS Req. TwinsCPU(GHz) RTT(ms) min max avg min max avg

1×3.2 1.1 0 4300 410 0 1485 474×2.3 8.2 5 76693 4135 565 1669 11531×0.3 9 53 39266 2724 1420 23022 48492×2.6 135 73 140886 83573 2 1890 299

Figure 3: Effectiveness of our time entropy lowering techniques against four serversof different computational power and RTT. Time measurements are in microseconds.

functions. Finally, we exploit the fact that the seedused in these functions is small enough for someonewith access to the output of these functions to recoverits value by bruteforce.

Generating fresh processes. Our attacks on this sec-tion rely on the ability of the attacker to connect to aprocess with a newly initialized state. We describe ageneric technique against mod php in order to achievea connection to a fresh process. Recall that in mod phpwhen the number of occupied processes passes a cer-tain threshold new processes are created to handle thenew connections. This gives the attacker a way to forcethe creation of fresh processes: The attacker creates alarge number of connections to the target server withthe keep-alive HTTP header set. Having occupied alarge number of processes the web server will createa number of new processes to handle subsequent re-quests. The attacker, keeping the previous connectionsopen, makes a new one which, given that the attackercreated enough connections, will be handled by a freshprocess.

4.1 Recovering the LCG seed from Ses-sion ID’s

In this section we present a technique to recover thephp combined lcg() seed using a PHP session iden-tifier. In PHP, when a new session is created using therespective PHP function (session start()), a pseu-dorandom string is returned to the user in a cookie, inorder to identify that particular session. That string isgenerated using a conjuction of user specific and pro-cess specific information, and then is hashed using ahash function which is by default MD5, however thereis an option to use other hash functions such as SHA-1.The values contained in the hash are:

Client IP address (32 bits).A time measurement: Unix epoch and microsec-onds (32 + 20 bits).A value generated by php combined lcg() (64bits).

Notice now that in the context of our attack modelthe attacker controls each request thus he knows ex-

actly most of the values. Specifically, the client IP ad-dress is the attacker’s IP address and the Unix Epochcan be determined through the date HTTP header. Inaddition, if php combined lcg() is not initialized atthe time the session is created, as it happens when afresh process is spawned, then it is seeded. The stateof the php combined lcg() is two registers s1, s2 ofsize 32 bits each, which are initialized as follows. LetT1 and T2 be two subsequent time measurements. Thenwe have that

s1 = T1.sec⊕(T1.usec� 11) and s2 = pid⊕(T2.usec� 11)

where pid denotes the current process id, or if threadsare used the current thread id 2.

Process id’s have a range of 215 values in Linuxsystems In Windows systems the process id’s (resp.threads) are also at most 215 unless there are morethan 215 active processes (resp. threads) in the systemwhich is a very unlikely occurence.

Observe now that the session calculation involvesthree time measurements T0, T1 and T2. Given thatthese three measurements are conducted succesivellyit is advantageous to estimate their entropy by examin-ing the random variables T0,∆1 = T1−T0,∆2 = T2−T1.We conducted experiments in different systems to es-timate the range of values for ∆1 and ∆2. Our exper-iments suggest that ∆1 ∈ [1,4] while ∆2 ∈ [0,3]. Wealso found a positive linear correlation in the values ofthe two pairs. This enables a cutdown of the possiblevalid pairs. These results suggest that the additionallyentropy introduced by the two ∆ variables is at most 5bits.

To summarize, the total remaining entropy of thesession identifier hash is the sum of the microsecondsentropy from T0 (≈ 20 bits) the two ∆ variables (≈5 bits) and the process identifier(15 bits). These givea total of 40 bits which is tractable cf. section 1.1.Furthermore the following improvements can be made:(1) Using the ATS algorithm the microseconds entropycan be reduced as much as 11 bits on average. (2) Theattacker can make several connections to fresh pro-cesses instead of one, in rapid succession, obtaining

2In PHP versions before 5.3.2 the seed used only one time mea-surement which made it even weaker.

5

Page 6: I Forgot Your Password: Randomness Attacks Against PHP

session identifiers from each of the processes. Becausethe requests were made in a small time interval thepreimages of the hashes obtained belong into the samesearch space, thus improving the probability of invert-ing one of the preimages proportionally to the numberof session identifiers identifiers obtained. Our experi-ments with the request twins technique suggest that atleast 4 session identifiers can be obtained from withinthe same search space thus offering a reduction of atleast two bits. Adding these improvements reduces thesearch time up to 227 MD5 computations.

4.2 Reconstructing the PRNG Seed fromSession ID’s

In this section we exploit the fact that the PHP systemreuses entropy sources between different generators, inorder to reconstruct the PRNG seed used by rand()

and mt rand() functions from a PHP session identi-fier. In order to predict the seed we only need to finda preimage for the session id, using the methods de-scribed in the previous section. One advantage of thisattack is that it requires no outputs from the affectedfunctions.

When a new process is created the internal state ofthe functions rand() and mt rand() is uninitialized.Thus, when these functions are called for the first timewithin the script a seed is constructed as follows:

seed = (epoch× pid)⊕ (106×php combined lcg())

where epoch denotes the seconds since epoch and piddenotes the process id of the process handling the re-quest. It it easy to notice, that an attacker with access toa session id preimage has all the information needed inorder to calculate the seed used to initialize the PRNGssince:

epoch is obtained through the HTTP Date header.pid is known from the seed of thephp combined lcg() obtained through thepreimage of the session id from section 4.1.php combined lcg() is also known, since the at-tacker has access to its seed, he can easily predictthe next iteration after the initial value.

In summary the technique of this section allows the re-construction of the seed of the mt rand() and rand()functions given access to a PHP session id of a freshprocess. The time complexity of the attack is thesame as the one described in section 4.1 while thequery complexity is one request, given that the attackerspawned a fresh process (which itself requires only ahandful of requests).

4.3 Recovering the Seed from Applica-tion leaks

In contrast to the technique presented in the previoussection, the attack presented here recovers the seed of

the PRNG functions rand() and mt rand() when theattacker has access to the output of these functions. Weexploit the fact that the seed used by the PHP system isonly 32 bits. Thus, an attacker who connects to a freshprocess and obtains a number of outputs from thesefunctions can bruteforce the 32 bit seed that producesthe same output.

We emphasize that this attack works even if the out-puts are truncated or passed through transformationslike hash functions. The requirements of the attack isthat the attacker can define a function from the set ofall seeds to a sufficiently large range and can obtain asample of this function evaluated on the seed that theattacker tries to recover. Additionally for the attack towork this function should behave as a random map.

Consider the following example. The attacker hasaccess to a user account of an application which gen-erates a password reset token as 6 symbols where eachsymbol is defined as g(mt rand()) where g is a ta-ble lookup function for a table with 60 entries contain-ing alphanumeric characters. The attacker defines thefunction f to be the concatenation of two password re-set tokens generated just after the PRNG is initialized.The attacker samples the function by connecting to afresh process and resetting his password two times.Since the table of function g contains 60 entries, theattacker obtains 6 bits per token symbol, giving a totalrange to the function f of 72 bits.

The time complexity of the attack is 232 calculationsof f however, we can reduce the online complexityof the attack using a time-space tradeoff. In this casethe online complexity of the attack can be as little as216. The query complexity of the attack depends onthe number of requests needed to obtain a sample off . In the example given above the query complexity istwo requests.

5 State recovery attacks

One can argue that randomness attacks can be easilythwarted by increasing the entropy of the seeding forthe PRNG functions used by the PHP system. For ex-ample, the suhosin PHP hardening extension replacesthe rand() function with a Mersene Twister generatorwith separate state from mt rand() and offers a largerseed for both generators getting entropy from the oper-ating system3.

We show that this is not the case. We exploit thealgebraic structure of the PRNGs used in order to re-cover their internal state after a sufficient number ofpast outputs (leaks) have been observed by the attacker.Any such attack has to overcome two challenges. First,web applications usually need only a small range of

3The suhosin patch installed in some Unix operating systemsby default does not include the randomness patches, rather than itmainly offers protection from memory corruption attacks. The fullextension is usually installed separately from the PHP packages.

6

Page 7: I Forgot Your Password: Randomness Attacks Against PHP

Figure 4: Mapping a random number n ∈ [M] to 7buckets and the respective bits of n that are revealedgiven each bucket.

random numbers, for example to sample a random en-try from an array. To achieve that, the PHP systemmaps the output of the PRNG to the given range, anaction that may break the linearity of the generators.Second, in order to collect the necessary leaks the at-tacker may need to reconnect to the same process manytimes to collect the leaks from the same generator in-stance. Since, there could be many PHP processes run-ning in the system, this poses another challenge for theattacker.

In this section we present state recovery algo-rithms for the truncated PRNG functions rand() andmt rand(). The algorithm for the latter functionis novel, while regarding the former we implementand evaluate the Hastad-Shamir cryptanalytic frame-work [8] for truncated linearly related variables. Webegin by discussing the way truncation takes place inthe PHP system. Afterwards, we tackle the problem ofreconnecting into the same server process. Finally wepresent the two algorithms against the generators.

5.1 Truncating PRNG sequences in thePHP system

As mentioned in section 2.2 the rand() andmt rand() functions can map their output to a userdefined range. This has the effect of truncating thefunctions’ output. Here we discuss the process of trun-cating the output and its implications for the attacker.

Let n ∈ [M] = {0, . . . ,M− 1} be a random numbergenerated by rand() or mt rand(), where M = 231

in the PHP system. In order to map that number in therange [a,b] where a < b the PHP system maps n to anumber l ∈ [a,b] in the following way:

l = a+n · (b−a+1)

M

We can view the process above as a mapping fromthe set of numbers in the range [M] to b−a+1 “buck-ets.” Our goal is to recover as many bits as possibleof the original number n. Observe that given l it ispossible to recover immediately up to blog(b−a+1)cmost significant bits (MSB) of the original number nas follows:

Given that n belongs to bucket l we obtain the fol-

lowing range for possible values for n:

b (l−a) ·Mb−a+1

c ≤ n≤ b (l−a+1) ·Mb−a+1

c−1

Therefore, given a bucket number l we are able tofind an upper and lower bound for the original numberdenoted respectively by Ll and Ul . In order to recovera part of the original number n one can simply find thenumber of most significant bits of Ll and Ul that areequal and observe that these bits would be the samealso in the number n. Therefore, given a bucket l wecan compare the MSBs of both numbers and set theMSBs of n to the largest sequence of common mostsignificant bits of Ll ,Ul .

Notice that in some cases even the most signifi-cant bit of the two numbers are different, thus weare be unable to infer any bit of the original numbern with absolute certainty. For example, in Figure 4given that a number falls in bucket 3 we have that920350134≤ n≤ 1227133512. Because 920350134<230 and 1227133512 > 230 we are unable to infer anybit of the original number n.

Another important observation is that this specifictruncation algorithm allows the recovery of a fragmentof the MSBs of the original number. Therefore, in thefollowing sections we will assume that the truncationoccurs in the MSBs and we will describe our algo-rithms based on MSB truncated numbers. However, allalgorithms described work for any kind of truncation.

5.2 Process distinguisherAs we mentioned in section 2.1, if one wants to receivea number of leaks from the same PHP process one canuse keep-alive requests. However, there is an upperbound that limits the number of such requests (by de-fault 100). Therefore, if the attacker needs to observemore outputs beyond the keep-alive limit the connec-tion will drop and when the attacker connects back tothe server he may be served from a different processwith a different internal state. Therefore, in order toapply state recovery attacks (which typically requiremore than 100 requests), we must be able to submitall the necessary requests to the same process. In thissection, we will describe a generic technique that findsthe same process over and over using the PHP sessionleaks described in section 4.1.

While we cannot avoid disconnecting from a pro-cess after we have submitted the maximum number ofkeep-alive requests, we can start reconnecting back tothe server until we hit the process we were connectedbefore and continue to submit requests. The problemin applying this approach is that it is not apparent todistinguish whether the process we are currently con-nected to is the one that was serving us in the previ-ous connection. To distinguish between different pro-cesses, we can use the preimage from a session iden-tifier. Recall that the session id contains a value from

7

Page 8: I Forgot Your Password: Randomness Attacks Against PHP

the php combined lcg() function, which in turn usesprocess specific state variables. Thus, if the sessionis produced from the same process as before then thephp combined lcg() will contain the next state fromthe one it was before. This gives us a way to find thecorrect process among all the server processes runningin the server. In summary the algorithm will proceedas follows:

1. The attacker obtains a session identifier and apreimage for that id using the techniques dis-cussed in section 4.1.

2. The attacker submits the necessary requests to ob-tain leaks from the PRNG, using the keep-aliveHTTP header until the maximum number of re-quests is reached.

3. The attacker initiates connections to the server re-questing session identifiers. He attempts to ob-tain a preimage for every session identifier usingthe next value of the php combined lcg() fromthe one used before or, if the server has high traf-fic, the next few iterations. If a preimage is ob-tained the attacker repeats step 2, until all neces-sary leaks are obtained.

Notice that obtaining a preimage after disconnect-ing requires to bruteforce a maximum number of 20bits (the microseconds), and thus testing for the cor-rect session id is an efficient procedure. Even if theapplication is not using PHP sessions, or if a preimagecannot be obtained, there are other, application spe-cific, techniques in order to find the correct process.

A generic technique for Windows. In the case ofWindows systems the attacker can employ anothertechnique to collect the necessary leaks from the sameprocess in case the server has low traffic. In unixservers with apache preforked server + mod php allidle processes are in a queue waiting to handle an in-coming client. The first process in the queue handlesa client and then the process goes to the back of thequeue. Thus, if an attacker wants to reconnect to thesame process without using some process distinguisherhe will need to know exactly the number of processesin the system and if there are any intermediate requestsby other clients while the attacker tries to reconnect tothe same process. However, in Windows prethreadedserver with mod php things are slightly better for anattacker. Threads are in a priority queue and when athread in the first place of the queue handles a requestfrom a client it returns again in that first place and han-dles the first subsequent incoming request. Thus, anattacker which manages to connect to that first threadof the server, can rapidly close and reopen the connec-tions thus leaving a very small window in which thatthread could be occupied by another client. Of course,in high traffic servers the attacker would have a diffi-culty connecting in a time when the server is idle in

the first place. Nevertheless, techniques exist [16] toremotely determine the traffic of a server and thus al-low the attacker to find an appropriate time windowwithin which he will attempt this attack.

Based on the above, in the following sections wewill assume that the attacker is able to collect the nec-essary number of leaks from the targeted function.

5.3 State recovery for mt rand()The mt rand() function uses the Mersenne Twistergenerator in order to produce its output. In this sectionwe give a description of the Mersenne Twister genera-tor and present an algorithm that allows the recovery ofthe internal state of the generator even when the outputis truncated. Our algorithm also works in the presenceof non consecutive outputs as in the case resulting fromthe buckets truncation algorithm of the PHP system (cf.section 5.1).

Mersenne Twister. Mersenne Twister, and specifi-cally the widely used MT19937 variant, is a linearPRNG with a 624 32-bit word state. The MT algo-rithm is based on the following recursion: for all k,

xk+n = xk+m⊕((xk∧0x80000000)|(xk+1∧0x7fffffff))A

where n = 624 and m = 397. The logical AND oper-ation with 0x80000000 discards all but the most sig-nificant bit of xk while the logical AND with 0x7fffffffdiscards only the MSB of xk+1. A is a 32× 32 ma-trix for which multiplication by a vector x is defined asfollows:

xA =

{(x� 1) if x31 = 0(x� 1)⊕a if x31 = 1

Here a = (a0,a1, ...,a31) = 0x9908B0DF is a constant32-bit vector (note that we use x31 to denote the LSBof a vector x). The output of this recurrence is finallymultiplied by a 32× 32 non singular matrix T , calledthe tempering matrix, in order to produce the final out-put z = xT .

State recovery. Since the tempering matrix T is nonsingular, given 624 outputs of the MT generator onecan easily compute the original state by multiplyingthe output z with the inverse matrix T−1 thus obtain-ing the state variable used as xi = ziT−1. After recov-ering 624 state variables one can predict all future it-erations. However, when the output of the generator istruncated, predicting future iterations is not as straight-forward as before because it is not possible to locallyrecover all needed bits of the state variables given thetruncated output.

The key observation in recovering the internal stateis that due to the fact that the generator is in GF(2) thetruncation does not introduce non linearity even thoughthere are missing bits from the respective equations.

8

Page 9: I Forgot Your Password: Randomness Attacks Against PHP

Thus, we can express the output of the generator as aset of linear equations in GF(2) which, when solved,yield the initial state that produced the observed se-quence. From the basic recurrence of MT we can de-rive the following equations for each individual bit:

Lemma 5.1. Let x0,x1, . . . be an MT sequence and j >0. Then the following equations hold for any k ≥ 0:

1. x0jn+k = x0

( j−1)n+k+m⊕ (x31( j−1)n+k+1∧a0)

2. x1jn+k = x1

( j−1)n+k+m⊕ x0( j−1)n+k⊕ (x31

( j−1)n+k+1 ∧a1)

3. ∀i,2 ≤ i ≤ 31 : xijn+k = xi

( j−1)n+k+m ⊕xi−1( j−1)n+k+1⊕ (x31

( j−1)n+k+1∧ai)

Proof. The equations follow directly from the basic re-currence.

In addition since the tempering matrix is only a lin-ear transformation of the bits of the state variable xi,we can similarly express each bit of the final output ofMT as a linear equation of the bits of the respectivestate variable.

To recover the initial state of MT, we generate allequations over the state bit variables x0,x1, . . . ,x19936.To map any position in the MT sequence in an equationover this set of variables, we apply the equations of thelemma above recursively until all variables in the righthand side have index below 19937.

Depending on the positions observed in the MT se-quence the resulting linear system will be different.The question that remains is whether that system issolvable. Regarding the case of the 31-bit truncation,i.e. only the MSB of the output word is revealed, wecan use known properties of the generator in order toeasily prove the following:

Lemma 5.2. Suppose we obtain the MSB of 19937consecutive words from the MT generator. Then theresulting linear system is uniquely solvable.

Proof. It is known that the MT sequence is 19937-distributed to 1-bit accuracy4. The linear system isuniquely solvable iff the rows are linearly independent.Suppose that a set k ≤ 19937 of rows are lineary de-pendent. Then the last row of the set k obtained iscomputable from the other members of the k-set some-thing that contradicts the order of equidistribution ofMT.

The above result is optimal in the sense that this isthe minimum number of observed outputs needed forthe system to become fully determined. In the casewe obtain non consecutive outputs due to truncation

4Suppose that a sequence is k-distributed to u-bit accuracy. Thenknowledge of the u most significant bits of l words does not allowone to make any prediction for the u bits of the next word when l < k.This is the cryptographic interpretation of the “order of equidistribu-tion” whose exact definition can be found in [15].

or application behavior, linear dependencies may arisebetween the resulting equations and therefore we mayneed a larger number of observed outputs.

Because we cannot know in advance when the sys-tem will become solvable or the equations that will beincluded, we employ an online version of Gaussianelimination in order to form and solve the resultingsystem. In this way, the attacker can begin collectingleaks and gradually feed them to our Gaussian solveruntil he is notified that a sufficient number of indepen-dent equations have been collected. Note that regularGaussian elimination uses both elementary row and el-ementary column operations. However, because we donot have in advance the entire linear system we cannotuse elementary column operations. Instead we makeGaussian elimination using only elementary row oper-ations and utilize a bookkeeping system to enter equa-tions in their place as they are produced by the leakssupplied to the solver. Our solver employs a sparsevector representation and is capable of solving overde-termined sparse systems of tens of thousands of equa-tions in a few minutes.

We ran a sequence of experiments to determine thesolvability of the system when a different number ofbits is truncated from the output. In addition we ranexperiments when the outputs of the MT generator ispassed through the PHP truncation algorithm, with dif-ferent user defined ranges. All experiments were con-ducted in a 4×2.3 GHz machine with 4 GB of RAM.

In Figure 5 we present the number of equationsneeded when the PHP truncation algorithm is used.In the x-axis we have the logarithm of the numberof buckets. We also show the standard deviation ap-pearing as vertical bars. It can be seen that the num-ber of equations needed is much higher than the the-oretical lower bound of 19937 and fluctuates between27000 and 33000. Neverthless, the number of leaks re-quired is decreasing linearly to the number of bucketswe have. The reason is that although we have more lin-early dependend equations, the total number of equa-tions we obtain due to the larger number of buckets isbigger.

Implementation error in the PHP system. ThePHP system up to current version, 5.3.10, has an errorin the implementation of the Mersenne Twister gen-erator (we discovered this during the testing of oursolver). Specifically the following basic recurrence iseffectively used in the PHP system due to a program-ming error:

xk+n = xk+m⊕ ((xk ∧0x80000000)|(xk+1 ∧0x7ffffffe)|(xk ∧0x1))A

As a result the PHP system uses a different generatorwhich, as it turns out, has slightly more linear depen-dencies than the MT generator. This means that prob-ably the randomness properties of the PHP generatorare poorer compared to the original MT generator.

9

Page 10: I Forgot Your Password: Randomness Attacks Against PHP

Figure 5: Solving MT; y-axis:number of equations; x-axis: number of buckets (logarithm). Standard deviationshown as vertical bars.

5.4 State recovery for rand()We turn now to the problem of recovering the stateof rand() given a sequence of leaks from this gen-erator. While mt rand() is implemented within thePHP source code and thus is unchanged across differ-ent enviroments, the rand() function uses the respec-tive function defined from the standard library of theoperating system. This results in different implemen-tations across different operating systems. There aremainly two different implementations of rand() onefrom the glibc and one from the Windows library.

Windows rand(). The rand() function defined inWindows is a Linear Congruential Generator (LCG).An LCG is defined by a recurrence of the form

Xn+1 = (aXn + c) mod m

Although LCGs are fast and require a small memoryfootprint there are many problems which make theminsufficient for many uses, including of course cryp-tographic purposes. The parameters used by the Win-dows LCG are a = 214013,c = 2531011,m = 232. Inaddition, the output is truncated by default and onlythe top 15 bits are returned. If PHP is running in athreaded server in Windows then the parameters of theLCG used are a = 1103515245,c = 12345,m = 215.

Glibc rand(). In the past, glibc also used an LCG forthe rand() function. Subsequently an LFSR-like “ad-ditive feedback” design was adopted. The generatorhas a state of 31 words (of 32 bits each), over which itis defined by the following recurrence:

ri = (ri−3 + ri−31) mod 232

In addition the LSB of each word is discarded and theoutput returned to the user is oi = ri� 1. An interest-ing note is that the man page of rand() states that rand

is a non-linear generator. Nevertheless, the non linear-ity introduced by the truncation of the LSB is negligi-ble and one can easily recover the initial values givenenough outputs of the generator.

State recovery. Notice that if the generators usedhave a small state such as the Windows LCGs thenstate recovery is easy, by applying the attack from sec-tion 4 to bruteforce the entire state of the generator.However, on the Glibc generator, which has a state of992 bits, these attacks are infeasible assuming that thestate is random. Although LCGs and the Glibc gener-ators are different, they both fall into the same crypt-analytic framework introduced by Hastad and Shamirin 1985 for recovering values of truncated linear vari-ables. This framework allows one to uniquely solvean underdefined system of linear equations when thevalues of the variables are partially known. In this sec-tion we will discuss our experiences with applying thistechnique in the two aforementioned generators: TheLCG and the additive-feedback generator of glibc. Wewill briefly describe the algorithm for recovering thetruncated variables in order to discuss our experimentsand results. The interested reader can find more infor-mation about the algorithm in the original paper [8].

Suppose we are given a system with l linear equa-tions on k variables modulo m denoted by x1,x2, . . . ,xk,

a11x1 +a1

2x2 + · · ·+a1kxk = 0 mod m

a21x1 +a2

2x2 + · · ·+a2kxk = 0 mod m

. . .

al1x1 +al

2x2 + · · ·+alkxk = 0 mod m

where l < k and each variable xi is partially known.We want to solve the system uniquely by utilizing thepartial information of the k variables xi.

We use the coefficients of the l equations to create aset of l vectors, where each vector is of the form vi =

10

Page 11: I Forgot Your Password: Randomness Attacks Against PHP

(a1, . . . ,ak). In addition we add to this set the k vectorsm · ei,0 < i≤ k. The cryptanalytic framework exploitsproperties of the lattice L that is defined as the linearspan of these vectors. Observe that the dimension of Lis k and in addition for every vector v ∈ L we have that∑

ki=1 vixi = 0 mod m.Given the above the attack works as follows: first a

lattice is defined using the recurrence that defines thelinear generator; then, a lattice basis reduction algo-rithm is employed to create a set of linearly indepen-dent equations modulo m with small coefficients; fi-nally, using the partially known values for each vari-able, we convert this set of equations to equations overthe integers which can be solved uniquely. Specifi-cally, we use the LLL [13] algorithm in order to obtaina reduced basis B for the lattice L. Now because B ={w j} is a basis, the vectors of B are linearly indepen-dent. The key observation is that the lattice definitionimplies that w j ·x=w j ·(xunknown+xknown)= d j ·m forsome unknown d j. Now as long as xunknown ·w j < m/2(this is the critical condition for solvability) we cansolve for d j and hence recover k equations for xunknownwhich will uniquely determine it.

The original paper provided a relation between thesize of xknown and the number of leaks required fromthe generator so that the upper bound of m/2 is ensuredgiven the level of basis reduction achieved by LLL. Inthe case of LCGs the paper demanded the modulo m tobe squarefree. However, as shown above, in the gen-erators used it holds that m = 232 and thus their argu-ments do not apply. In addition, the lattice of the addi-tive generator of glibc is different than the one gener-ated by an LCG and thus needs a different analysis.

We conducted a thorough experimental analysis ofthe framework focusing on the two types of generatorsabove. In each case we tested the maximum possiblevalue of xknown to see if the m/2 bound holds for thereduced LLL basis. In the following paragraphs wewill briefly discuss the results of these experiments forthese types of generators.

In Figure 6 we show the relationship between thenumber of leaks required for recovering the state withthe lattice-attack and the number of leaks that are trun-cated for four LCGs: the Windows LCG, the glibcLCG (which are both 32 bits), the Visual Basic LCG(which is 24 bits) and an LCG used in the MMIX ofKnuth (which is 64 bits). It is seen that the numberof leaks required is very small but increases sharply asmore bits are truncated. In all cases the attack stopsbeing useful once the number of truncated bits leavesnone but the logw−1 most significant bits where w isthe size of the LCG state. The logarithm barrier seemsto be uniformly present and hints that the MSB’s ofa truncated LCG sequence may be hard to predict (atleast using the techniques considered here). A similarlogarithmic barrier was also found in the experimentalanalysis that was conducted by Contini and Shparlin-ski [3] when they were investigating Stern’s attack [17]

against truncated LCG’s with secret parameters.

Applying the attack in the glibc additive feedbackgenerator we found that the LLL algorithm became abottleneck in the algorithm running time; due to itslarge state the algorithm required a large number ofleaks to recover even small truncation levels there-fore increasing the lattice dimension that was givento the LLL algorithm. Our testing system (a 3.2GHzcpu with 2GB memory) ran out of memory when 7bits were truncated. The version of LLL we em-ployed (SageMath 4.8) has time complexity O(k5)where k is the dimension of the lattice (which repre-sents roughly the number of leaks). The best time-complexity known is O(k3 logk) derived from [12];this may enable much higher truncation levels to be re-covered for the glibc generator, however we were notable to test this experimentally as no implementationof this algorithm is publicly available.

We conclude that truncated LCG type of generatorscan be broken (in the sense of entirely recovering theirinternal state) for all but extremely high levels of trun-cation (e.g. in the case of 32-bit state LCG’s mod-ulo 232 when they are truncated to 16 buckets or less).For additive feedback type of generators, such as theone in glibc, the situation is similar, however higherrecursion depths require more leaks (with a linear re-lationship) that in turn affect the lattice dimension re-sulting in longer running times. Comparing the resultsbetween the LCGs and the additive feedback genera-tors one may find some justification for the adoptionof the latter in recent versions of glibc : it appears that- at least as far as lattice-based attacks are concerned -it is harder to predict truncated glibc sequences (com-pared to say, Windows LCG’s) due to the higher run-ning times of LLL reduction (note though that this doesnot mean that these are cryptographically secure).

6 Experimental results and Case studies

In order to evaluate the impact of our attacks on realapplications we conducted an audit to the passwordreset function implementations of popular PHP appli-cations. Figure 7 shows the results from our audit.In each case succesfully exploiting the application re-sulted in takeover of arbitrary user accounts5 and insome cases, when the administrator interface was af-fected, of the entire application. In addition to iden-tifiying these vulnerabilities we wrote sample exploitsfor some types of attack we presented, each on one af-fected application.

5The only exception to that is the HotCRP application wherepasswords were stored in cleartext thus there was no password resetfunctionality. However, in this case we were able to spoof registra-tions for arbitrary email accounts.

11

Page 12: I Forgot Your Password: Randomness Attacks Against PHP

Figure 6: Solving LCGs with LLL; y-axis:number of leaks; x-axis: number of bits truncated.

Application Attack Application Attackmediawiki 4.2 4.3 5.3 • Joomla 4.3 •

Open eClass 4.2 4.3 5.4 • MyBB ATSc 4.1c 5.3c ◦taskfreak 4.2 4.3 5.3 • IpBoard ATSc 4.1c 4.2c •zen-cart ATS RT • phorum 4.2 4.3 5.3 •

osCommerce 2.x ATS RT • HotCRP 4.2 4.3 5.3 •osCommerce 3.x 4.2 4.3 5.4 • gazelle 4.3 5.3 •

elgg ATSc 4.2 4.3 • tikiWiki 4.2 4.3 5.4 •Gallery RTc 4.1c 4.2c • SMF ATSc 4.3c ◦

Figure 7: Summary of audit results. The c superscript denotes that the attack need to be used in combination withother attacks with the same superscript. The • denotes a full attack while ◦ denotes a weakness for which thepractical exploitation is either unverified or requires very specific configurations. The number denotes the sectionin which the applied attack is described in the paper.

6.1 Selected Audit ResultsMany applications we audited where trivially vulnera-ble to our attacks since they used the affected PRNGfunctions in a straightforward manner, thus making itpretty easy for an attacker to apply our techniques andexploit them. However some applications attemptedto defend against randomness attacks by creating cus-tom token generators. We will describe some attacksthat resulted from using our framework against customgenerators.

Gallery. PHP Gallery is a very popular web basedphoto album organizer. In order for a user to reset hispassword he has to click to a link, which contains thesecurity token. The function that generates the token isthe following:

function hash($entropy="") {

return md5($entropy . uniqid(mt_rand(), true));

}

The token is generated using three entropy sources,

namely a time measurement from uniqid(), an out-put from the MT generator and an output from thephp combined lcg() through the extra argument inthe uniqid() function. In addition the output ispassed through the MD5 hash function so its infeasi-ble to recover the initial values given the output of thisfunction. Since we do not have access to the outputof the function, the state reconstruction attack seemsan appropriate choice for attacking this token gener-ation algorithm. Indeed, the Gallery application usesPHP sessions thus an attacker can use them to predictthe php combined lcg() and mt rand() outputs. Inaddition by utilizing the request twins technique fromsection 3 the attacker can further reduce the searchspace he has to cover to a few thousand requests.

Joomla. Joomla is one of the most popular CMS ap-plications, and it also have a long history of weak-nesses in its generation of password reset tokens [4,11]. Until recently, the code for the random token gen-eration was the following:

12

Page 13: I Forgot Your Password: Randomness Attacks Against PHP

function genRandomPassword( $length=8 ) {

$salt = abc...xyzABC...XYZ0123456789 ;

$len = strlen ( $salt );

$makepass = ‘‘’’;

- $stat = @stat ( FILE ) ;

- if (empty($stat) || !isarray($stat))

- $stat=array(phpuname());

- mt_srand(crc32(microtime().implode(|,$stat)));

for($i=0;$i<$length;$i++){

$makepass .= $salt[mt_rand(0,$len1)];

}

return $makepass;

}

In addition the output of this function ishashed using MD5 along a secret, 16 bytes, key(config.secret) which is created at installationusing the function above. The config.secret valuewas also used to create a “remember me” cookie in thefollowing way:

cookie = md5(config.secret+’JLOGIN REMEMBER’)

Since the second part of the string is constantand the config.secret is generated through the gen-RandomPassword function which has only 232 possi-ble values for each length, one could bruteforce allpossible values and recover config.secret. Allthat was left was the prediction of the output of thegenRandomPassword() function in order to predictthe security token used to reset a password. One thenobserves that although the contents of the $stat vari-able in the genRandomPassword() function are suf-ficiently random, the fact that crc is used to convertthis value to a 4 byte seed allows one to predict theseed generated and thus the token. This attack wasreported in 2010 in [11] and a year after, Joomla re-leased a patch for this vulnerability which removed thecustom seeding (dashed lines) from the token gener-ation function. The idea was that because the gener-ator is rolling constantly without reseeding one willbe unable to recover the config.secret and thus thegenerator will be secure due to its secret state. Un-fortunately, this may not be the case. If at the instal-lation time the process handling the installation scriptis fresh, a fact quite probable if we consider dedicatedservers that do not run other PHP applications, thenthe search space of the config.secret will be again232 and thus an attacker can use the same techniqueas before to recover it. After the config.secret isrecovered, exploitation of the password reset imple-mentation is straightforward using our seed recoveryattack from section 4.3. A similar attack also holdswhen mod cgi is used for script execution as each re-quest will be handled by a fresh process again reducingthe search space for config.secret in 232 values.

However, the low entropy of the config.secret

key is not the only problem of this implementation.Even if the key had enough entropy to be totally unpre-dictable, the generator would still be vulnerable. No-tice that in case the genRandomPassword() is called

with a newly initialized MT generator then there atmost 232 possible tokens, independently of the entropyof config.secret. This gives an interesting attackvector: We generate two tokens from a fresh processsequentially for a user account that we control. Thenwe start to connect to a fresh process and request a to-ken for our account. If the token matches the tokengenerated before then we can submit a second requestfor the target user’s account which, since the first to-ken matched the token we own, will match the secondtoken that we requested before (recall that the tokensare not bound to users). Observe that if we gener-ate only one pair of tokens this attack is expected tosucceed after 232 requests, assuming that the seed israndom. Nevertheless, we can request more than onepair of tokens thus increasing our success probability.Specifically, if we have n pairs of tokens then at thesecond phase the attack is expected to succeed after232/n requests. Therefore, if we denote by r(n) the ex-pected requests that the attack needs to hit a “good”token given n initial token pairs, then we have thatr(n) = 2n+ 232/n. Our goal is to minimize the func-tion r(n); this function obtains a positive minimum atn = 231/2, for which we have that r(231/2) ≈ 185000.A simple bruteforcing framework that we wrote wasable to achieve around 2500 requests per minute, a rateat which an attacker can compromise the applicationin a little more than one hour. To be fair, we have toadd the requests that are required to spawn new pro-cesses but even if we go as far as to double the neededrequests (and this is grossly overestimating) we stillhave a higly practical attack.

Gazelle. Gazelle is a torrent tracker application,which includes a frontend for building torrent shar-ing communities. It’s been under active developmentfor the last couple of years and its gaining increasingpopularity. The interesting characteristic of the appli-cation’s password reset implementation is that it usestwo generators of the PHP system (namely rand() andmt rand(). The code that generates a token is this:

function make_secret($Length = 32) {

$Secret = ’’;

$Chars=’abcdefghijklmnopqrstuvwxyz0123456789’;

for($i=0; $i<$Length; $i++) {

Rand = mt_rand(0, strlen($Chars)-1);

$Secret .= substr($Chars, $Rand, 1);

}

return str_shuffle($Secret);

}

The code generates a random string usingmt rand() and then shuffles the string using thestr shuffle() function which internally uses therand() function. If we try to apply directly the seedrecovery attack, i.e. try to ask a question of the form“which seed produces this token” then we will runinto problems because we have to take into accounttwo seeds, and a total search space of 64 bits which

13

Page 14: I Forgot Your Password: Randomness Attacks Against PHP

is infeasible. The normal action would be to followthe same path as we did in the Gallery applicationwhere we had a similar problem and utilize the seedreconstruction attack which does not require an outputof the PRNGs. However, the Gazelle application usescustom sessions (which are generated using the samefunction), and thus we cannot apply that attack either.The solution lies into slightly mofiying the seed recov-ery attack. Instead of asking the question “which seedproduces this mt rand() sequence”, which is shuffledand thus affected by the second PRNG, we instead askwhich seed produces the unsorted set which containsthe characters of our string. This set is not affected bythe shuffling and thus we can effectively bruteforcethe mt rand() seed independently. After recoveringthe mt rand() seed we know the initial sequence thatwas produced and we can subsequently recover theseed of rand() using the same attack.

6.2 Attacks ImplementationIn addition to auditing the applications, we imple-mented a number of our attacks targeting selected ap-plications. In particular, we implemented a seed re-covery attack against Mediawiki, a state reconstructionattack against the Phorum application and the requesttwins technique against Zen-cart. In the following sec-tions we will briefly describe each vulnerability andthe results of our attacks implementation.Mediawiki. Mediawiki is a very popular wiki appli-cation used, among others, by Wikipedia. Mediawikiuses mt rand() in order to generate a new passwordwhen the user requests a password reset. In order topredict the generated password we use the seed recov-ery attack of section 4.3. The function f that we sam-ple is the one used to generate a CSRF token which isthe following:

function generateToken( $salt = ’’ ) {

$token = dechex(mt_rand()).dechex(mt_rand());

return md5( $token . $salt );

}

Our function f given a seed s first seeds themt rand() generator and then uses that generator toproduce a token as the function above. To fully eval-uate the practicality of the attack we implemented theattack online, without any time-space tradeoff. Our im-plementation was able to cover around 1300000 seedevaluations of f per second in a dual-core laptop withtwo 2.3 GHz processors. This allowed us to cover thefull 232 range in about 70 minutes. Of course, usinga time-space tradeoff the search time could be furtherreduced to a few minutes.

Zen cart. Zen-Cart is a popular eCommerce applica-tion. At the time of this writing, a sample databasewhich shops enter volunterily numbers about 2500 ac-tive e-shops 6. In order to reset a user’s password

6www.zen-cart.com/index.php?main_page=showcase

zen-cart first seeds the mt rand() generator with themicrotime() function and then uses the mt rand()

function to produce a new password for the user. Thus,there at most 106 possible passwords which could beproduced. Our exploit used the request twins tech-nique to reset both our password and the target user’spassword. Afterwards, we bruteforced the generatedpassword for our account to recover the microtime()value that produced it. This takes at most a few sec-onds on any modern laptop. Then, our exploit brute-forces the passwords generated by microtime() val-ues close to the one that generated our own new pass-word. We ran our exploit in a network with RTTaround 9 ms, and Zen-Cart was installed in a 4× 2.3GHz server. The average difference of the two pass-words was about 3600 microseconds, and the exploitneeded at most two times that requests since we don’tknow which password was produced first. With therate of 2500 requests per minute that our implementa-tion achieves, the attack is completed in a few minutes.Phorum. Phorum is a classic bulletin board applica-tion. It was used, among others, by the eStream com-petition as an online discussion platform. In order fora user to reset his password the following function isused:

function phorum_gen_password($charpart=4, $numpart=3)

{

$vowels = ... //[char array];

$cons = ... //[char array];

$num_vowels = count($vowels);

$num_cons = count($cons);

$password="";

for($i = 0; $i < $charpart; $i++){

$password .= $cons[mt_rand(0, $num_cons - 1)]

. $vowels[mt_rand(0, $num_vowels - 1)];

}

$password = substr($password, 0, $charpart);

if($numpart){

$max=(int)str_pad("", $numpart, "9");

$min=(int)str_pad("1", $numpart, "0");

$num=(string)mt_rand($min, $max);

}

return strtolower($password.$num);

}

What makes this function interesting in the contextof state recovery is that at if called with no arguments(as it is in the application), at least four mt rand()

leaks are discarded in each call. We implementedthe attack having the application installed in a Win-dows server with the Apache web server and we usedour generic technique for Windows in order to recon-nect to the same process. On average, the attack re-quired around 1100 requests and 11 reconnections ofour client. The running time was about 30 minutes, andthe main source of overhead was the system solving.This fact is mainly explained from the small numberof buckets and the lost leaks of each iteration. Nev-erthless, the attack remained highly practical, as we

14

Page 15: I Forgot Your Password: Randomness Attacks Against PHP

were able to compormise any user account (includingthe administrator) within half an hour.

7 Defending against the Attacks

We believe that a major shortcoming of the PHP coreis that it does not provide a native cryptographicallysecure PRNG and token generator. In fact, a pseu-dorandom function (PRF) would be the most suitablecryptographic primitive for generating random tokensbased on program defined labels; PRF’s can be con-structed by PRNG’s [7]. We feel that this is a short-coming since developers tend to prefer functions fromthe core as they are compatible with every differentenviroment PHP is running in. A possible solutionwould be to introduce a secure PRNG in the PHPcore (as a new function). We proposed this solu-tion to the PHP development team which informed usthat the development overhead would be too big forsupporting such a function and the solution of usingopenssl random pseudo bytes() (which requiresOpenSSL) is their recommendation.

On the other hand, administrators can take a num-ber of precautions to defend against randomness at-tacks using current PHP versions. The Suhosin ex-tension provides a secure seed in the mt rand() andrand() functions. The seed exploits the fact that theMersenne Twister has a large state and fills that stateusing a hash function. Because rand() may have asmall state and is dependent from the operating sys-tem, the Suhosin extension replaces rand() with aMersenne twister generator with a different state frommt rand(). The hashed values of the seed used area concatenation of predictable values such as processidentifiers and timestamps, along with, potentially, un-predictable ones such as memory addresses of vari-ables and input from /dev/urandom. Because theaddresses in any modern operating system are ran-domized through ASLR, as a security precaution, us-ing them as a seed should provide enough additionalentropy to make the two seed attacks (sections 4.2,4.3) infeasible (assuming ASLR addresses are un-predictable). In addition, the suhosin extension ig-nores the calls to the seeding functions mt srand(),

srand() in order to defend against weak seeding fromthe application. Although this may introduce a state re-covery vulnerability, in the majority of our case stud-ies, custom seeding was pretty weak and this mea-sure (of securely seeding once and ignoring applica-tion based reseeding) increases security. We stronglybelieve that securely seeding the generators, when pos-sible, is a very useful exploit mitigation for the attackswe presented. Although state recovery attacks wouldstill be possible, these attacks are more complex thanthe seed attacks which require a handful of requestsand commodity hardware to compromise the applica-tions. Furthermore, creating a secure seed from suchsources has a negligible performance overhead. There-

fore, such measures should be employed by the PHPsystem as safeguards for applications that misuse thePHP core PRNGs.

Our session preimage attack (section 4.1) can bemitigated by utilizing an option (disabled by default)of PHP to add extra entropy, from a file, in the ses-sion identifier. By specifying /dev/urandom as theentropy file, a user can increase the entropy of asession arbitrarily thus making it infeasible for anattacker to obtain a preimage. In Windows, be-cause /dev/urandom is not available this optiongathers entropy using the same algorithm as in theopenssl random pseudo bytes() function. ThePHP developement team informed us that the aboveoption will be enabled by default in the upcoming ver-sion, PHP 5.4.

The above workarounds, if employed, will kill ourseed attacks and the generic process distinguisher wedevised. However, state recovery attacks would still bepossible either through some application specific leak,or using the generic technique described for Windowsoperating systems (section 5.2). In addition, we findthe possibility of the existence of other process distin-guishers very probable; after all, the process identifieris not considered a cryptographic secret and could beleaked either through the application or the web serveror even the operating system itself. Therefore, we feelthat even using these workarounds, one should con-sider state recovery attacks practical.

With the present state of the PHP system, developersshould avoid using directly the PRNGs of the PHP corefor security purposes. Any application that requiresa security token should employ a custom generator,that will either use the functions from the PHP exten-sions such as the openssl random pseudo bytes(),if available, or it will use other entropy sources. Wegive an example of one such function in [1].

8 Related Work

The first randomness attack in PHP that we are awareof appeared in a blog post by Stefan Esser [5, 6], wherehe described basic system properties such as keep-alive connection handling by web server processes,and described how misusing mt srand() could re-sult in security vulnerabilities that he demonstrated insome popular applications. Shortly after, the sameauthor released an update of the Suhosin extensionwhich included the randomness features for strongseeding mentioned above. Our preimage attack onPHP sessions was insipired by an attack introduced bySamy Kamkar [10], in which he described some caseswhere an adversary would be able to guess a PHP ses-sion. However these attacks assumed a side-channelof server information. Finally Gregor Kopf [11] de-scribed, along other attacks, the vulnerability in thepassword reset implementation of Joomla. This workdescribes some type of seed recovery attacks but only

15

Page 16: I Forgot Your Password: Randomness Attacks Against PHP

for the case that a fresh seeding occurs within the PHPscript executed.

9 Conclusions

We find the fact that the most popular programminglanguage in a domain that has a clear need for cryp-tographically strong randomness does not have such agenerator within its core system to be a security hazard.Still, even if such a generator existed in the language,the misuse of other functions would not disappear im-mediately as API misusage is a very common securityproblem in modern systems. Therefore, we believethat research in the practical exploitation of such in-secure functions should be continued and extended toother environments even if they do offer better secu-rity features in their API than PHP. In this paper weexplored the case of PHP installed in the Apache webserver along with mod php. We also showed the ap-plicability of some of our attacks in cgi mode whereeach request is handled by a new process. However,the case of fast cgi needs further investigation as itsbehavior depends highly on its configuration. In addi-tion, it would be interesting to check other languagesand web servers, such as PHP on an IIS web server, orPython and Ruby on Rails web applications in Apache.A problem that is also of theoretical interest is thedevelopment of faster algorithms for recovering trun-cated linear variables and finding an explanation forthe logarithmic barrier we encountered when experi-menting with the Hastad-Shamir framework. To con-clude, despite the fact that linear generators are cryp-tographically insecure, the fact that developers misusethem for security critical features makes the analysisof their practical security within a certain applicationcontext an interesting research question which we be-lieve needs further attention and awareness.

References

[1] George Argyros and Aggelos Kiayias. I forgotyour password: randomness attacks against phpapplications. http://crypto.di.uoa.gr/

CRYPTO.SEC/Randomness_Attacks.html,2012.

[2] Unknown Author.openssl random pseudo bytes() painfullyslow. PHP Bug # 51636, https:

//bugs.php.net/bug.php?id=51636, 2010.

[3] Scott Contini and Igor Shparlinski. On stern’sattack against secret truncated linear congru-ential generators. In Colin Boyd and JuanManuel Gonzalez Nieto, editors, ACISP, vol-ume 3574 of Lecture Notes in Computer Science,pages 52–60. Springer, 2005.

[4] Stefan Esser. Joomla weak random password re-set token vulnerability. SektionEins GmbH, Se-curity Advisory 2008/09/11, 2008.

[5] Stefan Esser. Lesser known security problems inphp applications. In Zend Conference, 2008.

[6] Stefan Esser. mt srand and notso random numbers. http://

www.suspekt.org/2008/08/17/mt_

srand-and-not-so-random-numbers/,2008.

[7] Oded Goldreich, Shafi Goldwasser, and SilvioMicali. How to construct random functions. J.ACM, 33(4):792–807, 1986.

[8] Johan Hastad and Adi Shamir. The cryptographicsecurity of truncated linearly related variables. InRobert Sedgewick, editor, STOC, pages 356–362.ACM, 1985.

[9] Robert ”Hackajar” Imhoff-Dousharm. Eco-nomics of password cracking in the gpu era. InDEFCON 19, 2011.

[10] Samy Kamkar. phpwn: Attacking sessions andpseudo-random numbers in php. In BlackhatUSA, Las Vegas, NV 2010, 2010.

[11] Gregor Kopf. Non-obvious bugs by example.In 27th Chaos Communication Congress CCC,2010.

[12] Henrik Koy and Claus-Peter Schnorr. Segmentlll-reduction of lattice bases. In Joseph H. Sil-verman, editor, CaLC, volume 2146 of Lec-ture Notes in Computer Science, pages 67–80.Springer, 2001.

[13] A.K. Lenstra, H.W.jun. Lenstra, and LaszloLovasz. Factoring polynomials with rational co-efficients. Math. Ann., 261:515–534, 1982.

[14] Arjen K. Lenstra, James P. Hughes, MaximeAugier, Joppe W. Bos, Thorsten Kleinjung, andChristophe Wachter. Ron was wrong, whitis right. IACR Cryptology ePrint Archive,2012:064, 2012.

[15] Makoto Matsumoto and Takuji Nishimura.Mersenne twister: A 623-dimensionally equidis-tributed uniform pseudo-random number genera-tor. ACM Trans. Model. Comput. Simul., 8(1):3–30, 1998.

[16] HD Moore and Valsmith. Tactical exploitation.In DEFCON 15, 2007.

[17] Jacques Stern. Secret linear congruential gener-ators are not cryptographically secure. In FOCS,pages 421–426. IEEE Computer Society, 1987.

16