Classical Hardness of Learning with Errors

Zvika Brakerski∗  Adeline Langlois†  Chris Peikert‡  Oded Regev§  Damien Stehlé¶

Abstract

We show that the Learning with Errors (LWE) problem is classically at least as hard as standard worst-case lattice problems, even with polynomial modulus. Previously this was only known under quantum reductions.

Our techniques capture the tradeoff between the dimension and the modulus of LWE instances, leading to a much better understanding of the landscape of the problem. The proof is inspired by techniques from several recent cryptographic constructions, most notably fully homomorphic encryption schemes.

1 Introduction

Over the last decade, lattices have emerged as a very attractive foundation for cryptography. The appeal of lattice-based primitives stems from the fact that their security can be based on worst-case hardness assumptions, that they appear to remain secure even against quantum computers, that they can be quite efficient, and that, somewhat surprisingly, for certain advanced tasks such as fully homomorphic encryption no other cryptographic assumption is known to suffice.

Virtually all recent lattice-based cryptographic schemes are based directly upon one of two natural average-case problems that have been shown to enjoy worst-case hardness guarantees: the short integer solution (SIS) problem and the learning with errors (LWE) problem. The former dates back to Ajtai's groundbreaking work [Ajt96], who showed that it is at least as hard as approximating several worst-case lattice problems, such as the (decision version of the) shortest vector problem, known as GapSVP, to within a polynomial factor in the lattice dimension. This hardness result was tightened in followup work (e.g., [MR04]), leading to a somewhat satisfactory understanding of the hardness of the SIS problem. The SIS problem has been the foundation for one-way [Ajt96] and collision-resistant hash functions [GGH96], identification schemes [MV03, Lyu08, KTX08], and digital signatures [GPV08, CHKP10, Boy10, MP12, Lyu12].

∗Stanford University, [email protected]. Supported by a Simons Postdoctoral Fellowship and DARPA.
†ENS de Lyon and Laboratoire LIP (U. Lyon, CNRS, ENS Lyon, INRIA, UCBL), 46 Allée d'Italie, 69364 Lyon Cedex 07, France. [email protected].
‡School of Computer Science, College of Computing, Georgia Institute of Technology. This material is based upon work supported by the National Science Foundation under CAREER Award CCF-1054495, by DARPA under agreement number FA8750-11-C-0096, and by the Alfred P. Sloan Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA or the U.S. Government, or the Sloan Foundation. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
§Courant Institute, New York University. Supported by a European Research Council (ERC) Starting Grant. Part of the work done while the author was with the CNRS, DI, ENS, Paris.
¶ENS de Lyon and Laboratoire LIP (U. Lyon, CNRS, ENS Lyon, INRIA, UCBL), 46 Allée d'Italie, 69364 Lyon Cedex 07, France. [email protected]. The author was partly supported by the Australian Research Council Discovery Grant DP110100628.


Our focus in this paper is on the latter problem, learning with errors. In this problem our goal is to distinguish with some non-negligible advantage between the following two distributions:

((a_i, ⟨a_i, s⟩ + e_i mod q))_i  and  ((a_i, u_i))_i,

where s is chosen uniformly from Z_q^n and so are the a_i ∈ Z_q^n, the u_i are chosen uniformly from Z_q, and the "noise" e_i ∈ Z is sampled from some distribution supported on small numbers, typically a (discrete) Gaussian distribution with standard deviation αq for α = o(1).
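To make the two distributions concrete, here is a minimal Python sketch (our illustration, not part of the paper) that generates either LWE samples or uniform pairs over Z_q; the toy parameter values are ours.

```python
import numpy as np

def lwe_or_uniform(n=64, q=4093, m=128, alpha=0.005, lwe=True, seed=None):
    """Generate m pairs (a_i, b_i) over Z_q^n x Z_q: either LWE samples
    b_i = <a_i, s> + e_i mod q with a discretized Gaussian error of
    standard deviation about alpha*q, or fully uniform pairs."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, q, size=(m, n))              # the a_i
    if lwe:
        s = rng.integers(0, q, size=n)               # uniform secret
        e = np.rint(rng.normal(0.0, alpha * q, size=m)).astype(int)
        b = (A @ s + e) % q
    else:
        b = rng.integers(0, q, size=m)               # the u_i
    return A, b
```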

The LWE problem has proved to be amazingly versatile, serving as the basis for a multitude of cryptographic constructions: secure public-key encryption under both chosen-plaintext [Reg05, PVW08, LP11] and chosen-ciphertext [PW08, Pei09, MP12] attacks, oblivious transfer [PVW08], identity-based encryption [GPV08, CHKP10, ABB10a, ABB10b], various forms of leakage-resilient cryptography (e.g., [AGV09, ACPS09, GKPV10]), fully homomorphic encryption [BV11, BGV12, Bra12] (following the seminal work of Gentry [Gen09]), and much more. It was also used to show hardness of learning problems [KS06].

Contrary to the SIS problem, however, the hardness of LWE is not sufficiently well understood. The main hardness reduction for LWE [Reg05] is similar to the one for SIS mentioned above, except that it is quantum. This means that the existence of an efficient algorithm for LWE, even a classical (i.e., non-quantum) one, only implies the existence of an efficient quantum algorithm for lattice problems. This state of affairs is quite unsatisfactory: even though one might conjecture that efficient quantum algorithms for lattice problems do not exist, our understanding of quantum algorithms is still in its infancy. It is therefore highly desirable to come up with a classical hardness reduction for LWE.

Progress in this direction was made by [Pei09] (with some simplifications in the followup by Lyubashevsky and Micciancio [LM09]). The main result there is that LWE with exponential modulus is as hard as some standard lattice problems, using a classical reduction. As that hardness result crucially relies on the exponential modulus, the open question remained as to whether LWE is hard for smaller moduli, in particular polynomial moduli. In addition to being an interesting question in its own right, this question is of special importance since many cryptographic applications, as well as the learning theory result of Klivans and Sherstov [KS06], are instantiated in this setting. Some additional evidence that reducing the modulus is a fundamental question comes from the Learning Parity with Noise (LPN) problem, which can be seen as LWE with modulus 2 (albeit with a different error distribution), and whose hardness is a long-standing open question. We remark that [Pei09] does include a classical hardness result for LWE with polynomial modulus, albeit one based on a non-standard lattice problem, whose hardness is arguably as debatable as that of the LWE problem itself.

To summarize, prior to our work, the existence of an efficient algorithm for LWE with polynomial modulus was only known to imply an efficient quantum algorithm for lattice problems, or an efficient classical algorithm for a non-standard lattice problem. While both consequences are unlikely, they are arguably not as earth-shattering as an efficient classical algorithm for lattice problems. Hence, some concern about the hardness of LWE persisted, tainting the plethora of cryptographic applications based on it.

Main result. We provide the first classical hardness reduction of LWE with polynomial modulus. Our reduction is the first to show that the existence of an efficient classical algorithm for LWE with any subexponential modulus would indeed have earth-shattering consequences: it would imply an efficient algorithm for worst-case instances of standard lattice problems.

Theorem 1.1 (Informal). Solving n-dimensional LWE with poly(n) modulus implies an equally efficient solution to a worst-case lattice problem in dimension √n.


As a result, we establish the hardness of all known applications of polynomial-modulus LWE based on classical worst-case lattice problems, previously only known under a quantum assumption.

Techniques. Even though our main theorem has the flavor of a statement in computational complexity, its proof crucially relies on a host of ideas coming from recent progress in cryptography, most notably recent breakthroughs in the construction of fully homomorphic encryption schemes.

At a high level, our main theorem is a "modulus reduction" result: we show a reduction from LWE with large modulus q and dimension n to LWE with (small) modulus p = poly(n) and dimension n log₂ q. Theorem 1.1 now follows from the main result in [Pei09], which shows that the former problem with q = 2^n is as hard as n-dimensional GapSVP. We note that the increase in dimension from n to n log₂ q is to be expected, as it essentially preserves the number of possible secrets (and hence the running time of the naive brute-force algorithm).

Very roughly speaking, the main idea in modulus reduction is to map Z_q into Z_p through the naive mapping that sends any a ∈ {0, . . . , q − 1} to ⌊pa/q⌋ ∈ {0, . . . , p − 1}. This basic idea is confounded by two issues. The first is that, if carried out naively, this transformation introduces rounding artifacts into LWE, ruining the distribution of the output. We resolve this issue by using a more careful Gaussian randomized rounding procedure (Section 3). A second serious issue is that in order for the rounding errors not to be amplified when multiplied by the LWE secret s, it is essential to assume that s has small coordinates. A major part of our reduction (Section 4) is therefore dedicated to showing a reduction from LWE (in dimension n) with arbitrary secret in Z_q^n to LWE (in dimension n log₂ q) with a secret chosen uniformly over {0, 1}. This follows from a careful hybrid argument (Section 4.3) combined with a hardness reduction to the so-called "extended-LWE" problem, which is a variant of LWE in which we have some control over the error vector (Section 4.2).
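For intuition, the naive mapping just described is a one-liner; the sketch below (ours, with illustrative parameter names) applies it to a sample (a, b), with a comment on where the rounding artifact enters. The Gaussian randomized rounding of Section 3 replaces this deterministic rounding by a suitably calibrated random perturbation.

```python
import numpy as np

def naive_modulus_reduce(a, b, q, p):
    """Map an LWE sample (a, b) mod q to one mod p by coordinate-wise
    rounding x -> floor(p*x/q). The deterministic rounding error
    (up to 1 in each coordinate of the reduced a) gets multiplied by
    the secret s, which is why the secret must first be made small
    ({0,1} entries) for the reduction to preserve the LWE distribution."""
    a_red = (np.floor(p * np.asarray(a) / q)).astype(int) % p
    b_red = int(np.floor(p * b / q)) % p
    return a_red, b_red
```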

We stress that even though our proof is inspired by and has analogues in the cryptographic literature, the details of the reductions are very different. In particular, the idea of modulus reduction plays a key role in recent work on fully homomorphic encryption schemes, giving a way to control the noise growth during homomorphic operations [BV11, BGV12, Bra12]. However, since the goal there is merely to preserve the functionality of the scheme, their modulus reduction can be performed in a rather naive way similar to the one outlined above, and so the output of their procedure does not constitute a valid LWE instance. In our reduction we need to perform a much more delicate modulus reduction, which we do using Gaussian randomized rounding, as mentioned above.

The idea of reducing LWE to have a {0, 1} secret also exists already in the cryptographic literature: precisely such a reduction was shown by Goldwasser et al. [GKPV10], who were motivated by questions in leakage-resilient cryptography. Their reduction, however, incurred a severe blow-up in the noise rate, making it useless for our purposes. In more detail, not being able to faithfully reproduce the LWE distribution in the output, they resort to hiding the faults in the output distribution under a huge independent fresh noise, in order to make it close to the correct one. The trouble with this "noise flooding" approach is that the amount of noise one has to add depends on the running time of the algorithm solving the target {0, 1}-LWE problem, which in turn forces the modulus to be equally big. So while in principle we could use the reduction from [GKPV10] (and shorten our proof by about a half), this would lead to a qualitatively much weaker result: the modulus and the approximation ratio for the worst-case lattice problem would both grow with the running time of the {0, 1}-LWE algorithm. In particular, we would not be able to show that for some fixed polynomial modulus, LWE is a hard problem; instead, in order to capture all polynomial-time algorithms, we would have to take a super-polynomial modulus, and rely on the hardness of worst-case lattice problems to within super-polynomial approximation factors. In contrast, with our reduction, the modulus and the approximation ratio both remain fixed independently of the target {0, 1}-LWE algorithm.


As mentioned above, our alternative to the reduction in [GKPV10] is based on a hybrid argument combined with a new hardness reduction for the "extended LWE" problem, which is a variant of LWE in which, in addition to the LWE samples, we also get to see the inner product of the vector of error terms with a vector z of our choosing. This problem has its origins in the cryptographic literature, namely in the work of O'Neill, Peikert, and Waters [OPW11] on (bi)deniable encryption and the later work of Alperin-Sheriff and Peikert [AP12] on key-dependent message security. The hardness reductions included in those papers are not sufficient for our purposes, as they cannot handle large moduli or error terms, which is crucial in our setting. We therefore provide an alternative reduction which is conceptually much simpler, and essentially subsumes both previous reductions. Our reduction works equally well with exponential moduli and correspondingly long error vectors, a case earlier reductions could not handle.

Broader perspective. As a byproduct of the proof of Theorem 1.1, we obtain several results that shed new light on the hardness of LWE. Most notably, our modulus reduction result in Section 3 is actually far more general, and can be used to show a "modulus expansion/dimension reduction" tradeoff. Namely, it shows a reduction from LWE in dimension n and modulus p to LWE in dimension n/k and modulus p^k (see Corollary 3.4). Combined with our modulus reduction, this has the following interesting consequence: the hardness of n-dimensional LWE with modulus q is a function of the quantity n log₂ q. In other words, varying n and q individually while keeping n log₂ q fixed essentially preserves the hardness of LWE.

Although we find this statement quite natural (since n log₂ q represents the number of bits in the secret), it has some surprising consequences. One is that n-dimensional LWE with modulus 2^n is essentially as hard as n²-dimensional LWE with polynomial modulus. As a result, n-dimensional LWE with modulus 2^n, which was shown in [Pei09] to be as hard as n-dimensional lattice problems using a classical reduction, is actually as hard as n²-dimensional lattice problems using a quantum reduction. The latter is presumably a much harder problem, requiring exp(Ω(n²)) time to solve. This corollary highlights an inherent quadratic loss in the classical reduction of [Pei09] (and as a result also our Theorem 1.1) compared to the quantum one in [Reg05].

A second interesting consequence is that 1-dimensional LWE with modulus 2^n is essentially as hard as n-dimensional LWE with polynomial modulus. The 1-dimensional version of LWE is closely related to the Hidden Number Problem of Boneh and Venkatesan [BV96]. It is also essentially equivalent to the Ajtai-Dwork-type [AD97] cryptosystem in [Reg03], as follows from simple reductions similar to the one in the appendix of [Reg10a]. Moreover, the 1-dimensional version can be seen as a special case of the Ring-LWE problem introduced in [LPR10] (for ring dimension 1, i.e., ring equal to Z). This allows us, via the ring switching technique from [GHPS12], to obtain the first hardness proof of Ring-LWE, with arbitrary ring dimension and exponential modulus, under the hardness of problems on general lattices (as opposed to just ideal lattice problems). In addition, this leads to the first hardness proof for the Ring-SIS problem [LM06, PR06] with exponential modulus under the hardness of general lattice problems, via the standard LWE-to-SIS reduction. (We note that since both results are obtained by scaling up from a ring of dimension 1, the hardness does not improve as the ring dimension increases.)

A final interesting consequence of our reductions is that (the decision form of) LWE is hard with an arbitrary huge modulus, e.g., a prime; see Corollary 3.3. Previous results (e.g., [Reg05, Pei09, MM11, MP12]) required the modulus to be smooth, i.e., all its prime divisors had to be polynomially bounded.

Open questions. As mentioned above, our Theorem 1.1 inherits from [Pei09] a quadratic loss in the dimension, which does not exist in the quantum reduction [Reg05] nor in the known hardness reductions for SIS.


At a technical level, this quadratic loss stems from the fact that the reduction in [Pei09] is not iterative. In contrast, the quantum reduction in [Reg05] as well as the reductions for SIS are iterative, and as a result do not incur the quadratic loss. We note that an additional side effect of the non-iterative reduction is that the hardness in Theorem 1.1 and [Pei09] is based only on the worst-case lattice problem GapSVP (and the essentially equivalent BDD and uSVP [LM09]), and not on problems like SIVP, which the quantum reduction of [Reg05] and the hardness reductions for SIS can handle. One case where this is very significant is when dealing with ideal lattices, as in the hardness reduction for Ring-LWE, since GapSVP turns out to be an easy problem there.

We therefore believe that it is important to understand whether there exists a classical reduction that does not incur the quadratic loss inherent in [Pei09] and in Theorem 1.1. In other words, is n-dimensional LWE with polynomial modulus classically as hard as n-dimensional lattice problems (as opposed to √n-dimensional)?

This would constitute the first full dequantization of the quantum reduction in [Reg05].

While it is natural to conjecture that the answer to this question is positive, a negative answer would be quite tantalizing. In particular, it is conceivable that there exists a (classical) algorithm for LWE with polynomial modulus running in time 2^{O(√n)}. Due to the quadratic expansion in Theorem 1.1, this would not lead to a faster classical algorithm for lattice problems; it would, however, lead to a 2^{O(√n)}-time quantum algorithm for lattice problems using the reduction in [Reg05]. The latter would be major progress in quantum algorithms, yet is not entirely unreasonable; in fact, a 2^{O(√n)}-time quantum algorithm for a somewhat related quantum task was discovered by Kuperberg [Kup05] (see also [Reg02]).

2 Preliminaries

Let T = R/Z denote the cycle, i.e., the additive group of reals modulo 1. We also denote by T_q its cyclic subgroup of order q, i.e., the subgroup given by {0, 1/q, . . . , (q − 1)/q}.

For two probability distributions P, Q over some discrete domain, we define their statistical distance as Σ_i |P(i) − Q(i)|/2, where i ranges over the domain, and extend this to continuous distributions in the obvious way. We recall the following easy fact (see, e.g., [AD87, Eq. (2.3)] for a proof).

Claim 2.1. If P and Q are two probability distributions such that P(i) ≥ (1 − ε)Q(i) holds for all i, then the statistical distance between P and Q is at most ε.
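In code, the definition and Claim 2.1 look as follows (a small Python illustration of ours; the distributions P and Q are toy examples):

```python
def statistical_distance(P, Q):
    """Sum of |P(i) - Q(i)| / 2 over the union of the supports."""
    support = set(P) | set(Q)
    return sum(abs(P.get(i, 0.0) - Q.get(i, 0.0)) for i in support) / 2

# Claim 2.1 sanity check: if P(i) >= (1 - eps) * Q(i) for all i,
# the statistical distance is at most eps.
P = {0: 0.52, 1: 0.48}
Q = {0: 0.50, 1: 0.50}
eps = max(1 - P[i] / Q[i] for i in Q)   # here eps = 0.04
assert statistical_distance(P, Q) <= eps
```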

We will use the following immediate corollary of the leftover hash lemma [HILL99].

Lemma 2.2. Let k, n, q ≥ 1 be integers, and let ε > 0 be such that n ≥ k log₂ q + 2 log₂(1/ε). For H ← T_q^{k×n}, z ← {0, 1}^n, u ← T_q^k, the distributions of (H, Hz) and (H, u) are within statistical distance at most ε.

A distinguishing problem P is defined by two distributions P_0 and P_1, and a solution to the problem is the ability to distinguish between these distributions. The advantage of an algorithm A with binary output on P is defined as

Adv[A] = |Pr[A(P_0)] − Pr[A(P_1)]|.

A reduction from a problem P to a problem Q is an efficient (i.e., polynomial-time) algorithm A^B that solves P given access to an oracle B that solves Q. Most of our reductions (in fact, all except the one in Lemma 2.13) are what we call "transformation reductions": these reductions perform some transformation on the input and then apply the oracle to the result.


2.1 Lattices

An n-dimensional (full-rank) lattice Λ ⊆ R^n is the set of all integer linear combinations of some set of n linearly independent basis vectors B = {b_1, . . . , b_n} ⊆ R^n:

Λ = L(B) = { Σ_{i∈[n]} z_i b_i : z ∈ Z^n }.

The dual lattice of Λ ⊂ R^n is defined as Λ* = {x ∈ R^n : ⟨Λ, x⟩ ⊆ Z}.

The minimum distance (or first successive minimum) λ_1(Λ) of a lattice Λ is the length of a shortest nonzero lattice vector, i.e., λ_1(Λ) = min_{0≠x∈Λ} ‖x‖. For an approximation ratio γ = γ(n) ≥ 1, GapSVP_γ is the problem of deciding, given a basis B of an n-dimensional lattice Λ = L(B) and a number d, between the case where λ_1(L(B)) ≤ d and the case where λ_1(L(B)) > γd. We refer to [Kho10, Reg10b] for a recent account of the computational complexity of GapSVP_γ.

2.2 Gaussian measures

For r > 0, the n-dimensional Gaussian function ρ_r : R^n → (0, 1] is defined as

ρ_r(x) := exp(−π‖x‖²/r²).

We extend this definition to sets, i.e., ρ_r(A) = Σ_{x∈A} ρ_r(x) ∈ [0, +∞] for any A ⊆ R^n. The (spherical) continuous Gaussian distribution D_r is the distribution with density function proportional to ρ_r. More generally, for a matrix B, we denote by D_B the distribution of Bx where x is sampled from D_1. When B is nonsingular, its probability density function is proportional to

exp(−π x^T (BB^T)^{−1} x).

A basic fact is that for any matrices B_1, B_2, the sum of a sample from D_{B_1} and an independent sample from D_{B_2} is distributed like D_C for C = (B_1 B_1^T + B_2 B_2^T)^{1/2}.
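This convolution fact is easy to check empirically; the following Python snippet (ours) compares the covariance of a sum of samples from D_{B_1} and D_{B_2} against that of D_C. Recall that D_1 has coordinate variance 1/(2π), since its density is proportional to exp(−π‖x‖²).

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 2, 200_000
B1 = np.array([[1.0, 0.3], [0.0, 0.7]])
B2 = np.array([[0.5, 0.0], [0.2, 1.1]])

# D_B is the distribution of B @ x with x ~ D_1, so its covariance is B B^T / (2*pi).
x = B1 @ rng.normal(0, 1 / np.sqrt(2 * np.pi), (n, N))
y = B2 @ rng.normal(0, 1 / np.sqrt(2 * np.pi), (n, N))

emp = np.cov(x + y)                          # empirical covariance of the sum
thy = (B1 @ B1.T + B2 @ B2.T) / (2 * np.pi)  # covariance of D_C
assert np.allclose(emp, thy, atol=1e-2)
```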

For an n-dimensional lattice Λ and a vector u ∈ R^n, we define the discrete Gaussian distribution D_{Λ+u,r} as the discrete distribution with support on the coset Λ + u whose probability mass function is proportional to ρ_r. There exists an efficient procedure that samples within negligible statistical distance of any (not too narrow) discrete Gaussian distribution ([GPV08, Theorem 4.1]; see also [Pei10]). In the next lemma, proved in Section 5, we modify this sampler so that the output is distributed exactly as a discrete Gaussian. This also allows us to sample from slightly narrower Gaussians. Strictly speaking, the lemma is not needed for our results, and we could instead use the original sampler from [GPV08]. Using our exact sampler leads to slightly cleaner proofs as well as a (minuscule) improvement in the parameters of our reductions, and we include it here mainly in the hope that it finds further applications in the future.

Lemma 2.3. There is a probabilistic polynomial-time algorithm that, given a basis B of an n-dimensional lattice Λ = L(B), c ∈ R^n, and a parameter r ≥ ‖B̃‖ · √(ln(2n + 4)/π), outputs a sample distributed according to D_{Λ+c,r}.

Here, B̃ denotes the Gram-Schmidt orthogonalization of B, and ‖B̃‖ is the length of the longest vector in it. We recall the definition of the smoothing parameter from [MR04].

Definition 2.4. For a lattice Λ and positive real ε > 0, the smoothing parameter η_ε(Λ) is the smallest real s > 0 such that ρ_{1/s}(Λ* \ {0}) ≤ ε.


Lemma 2.5 ([GPV08, Lemma 3.1]). For any ε > 0 and n-dimensional lattice Λ with basis B,

η_ε(Λ) ≤ ‖B̃‖ · √(ln(2n(1 + 1/ε))/π).

We now collect some known facts on Gaussian distributions and lattices.

Lemma 2.6 ([MR04, Lemma 4.1]). For any n-dimensional lattice Λ, ε > 0, and r ≥ η_ε(Λ), the distribution of x mod Λ where x ← D_r is within statistical distance ε/2 of the uniform distribution on cosets of Λ.

Lemma 2.7 ([Reg05, Claim 3.8]). For any n-dimensional lattice Λ, ε > 0, r ≥ η_ε(Λ), and c ∈ R^n, we have ρ_r(Λ + c) ∈ [(1−ε)/(1+ε), 1] · ρ_r(Λ).

Lemma 2.8 ([Reg05, Claim 3.9]). Let Λ be an n-dimensional lattice, let u ∈ R^n be arbitrary, let r, s > 0, and let t = √(r² + s²). Assume that rs/t = 1/√(1/r² + 1/s²) ≥ η_ε(Λ) for some ε < 1/2. Consider the continuous distribution Y on R^n obtained by sampling from D_{Λ+u,r} and then adding a noise vector taken from D_s. Then the statistical distance between Y and D_t is at most 4ε.

Lemma 2.9 ([Reg05, Corollary 3.10]). Let Λ be an n-dimensional lattice, let u, z ∈ R^n be arbitrary, and let r, α > 0. Assume that (1/r² + (‖z‖/α)²)^{−1/2} ≥ η_ε(Λ) for some ε < 1/2. Then the distribution of ⟨z, v⟩ + e, where v ← D_{Λ+u,r} and e ← D_α, is within statistical distance 4ε of D_β for β = √((r‖z‖)² + α²).

Lemma 2.10 (Special case of [Pei10, Theorem 3.1]). Let Λ be a lattice and r, s > 0 be such that s ≥ η_ε(Λ) for some ε ≤ 1/2. Then if we choose x from the continuous Gaussian D_r and then choose y from the discrete Gaussian D_{Λ−x,s}, the sum x + y is within statistical distance 8ε of the discrete Gaussian D_{Λ,(r²+s²)^{1/2}}.

2.3 Learning with Errors

For integers n, q ≥ 1, an integer vector s ∈ Z^n, and a probability distribution φ on R, let A_{q,s,φ} be the distribution over T_q^n × T obtained by choosing a ∈ T_q^n uniformly at random and an error term e from φ, and outputting the pair (a, b = ⟨a, s⟩ + e) ∈ T_q^n × T.

Definition 2.11. For integers n, q ≥ 1, an error distribution φ over R, and a distribution D over Z^n, the (average-case) decision variant of the LWE problem, denoted LWE_{n,q,φ}(D), is to distinguish, given arbitrarily many independent samples, the uniform distribution over T_q^n × T from A_{q,s,φ} for a fixed s sampled from D. The variant where the algorithm only gets a bounded number of samples m ∈ N is denoted LWE_{n,m,q,φ}(D).

Notice that the distribution A_{q,s,φ} only depends on s mod q, and so one can assume without loss of generality that s ∈ {0, . . . , q − 1}^n. Moreover, using a standard random self-reduction, for any distribution over secrets D, one can reduce LWE_{n,q,φ}(D) to LWE_{n,q,φ}(U({0, . . . , q − 1}^n)), and we will occasionally use LWE_{n,q,φ} to denote the latter (as is common in previous work). When the noise is a Gaussian with parameter α > 0, i.e., φ = D_α, we use the shorthand LWE_{n,q,α}(D). We note that by discretizing the error using Lemma 2.10 and using the so-called "normal form" of LWE (see [ACPS09]), one can efficiently reduce LWE_{n,q,α} to LWE_{n,q,α}(D) where D is the discrete Gaussian distribution D_{Z^n,√2·αq}, as long as αq ≥ √n. Finally, since the case when D is uniform over {0, 1}^n plays an important role in this paper, we will denote it by binLWE_{n,q,φ} (and by binLWE_{n,m,q,φ} when the algorithm only gets m samples).


Unknown (Bounded) Noise Rate. We also consider a variant of LWE in which the amount of noise is some unknown β ≤ α (as opposed to exactly α), with β possibly depending on the secret s. As the following lemma shows, this does not make the problem significantly harder.

Definition 2.12. For integers n, q ≥ 1 and α ∈ (0, 1), LWE_{n,q,≤α} is the problem of solving LWE_{n,q,β} for any β = β(s) ≤ α.

Lemma 2.13. Let A be an algorithm for LWE_{n,m,q,α} with advantage at least ε > 0. Then there exists an algorithm B for LWE_{n,m′,q,≤α} using oracle access to A and with advantage ≥ 1/3, where both m′ and its running time are poly(m, 1/ε, n, log q).

The proof is standard (see, e.g., [Reg05, Lemma 3.7] for the analogous statement for the search version of LWE). The idea is to use a Chernoff bound to estimate A's success probability on the uniform distribution, and then add noise in small increments to our given distribution and estimate A's behavior on the resulting distributions. If there is a gap between any of these and the uniform behavior, the input distribution is deemed non-uniform. The full proof is omitted.
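A minimal sketch (ours) of the estimation step used in this argument: the number of trials is chosen so that, by the Hoeffding/Chernoff bound, the empirical acceptance probability of A is within ε/4 of the truth except with probability fail_prob. The names A and sample_oracle are hypothetical.

```python
import math

def estimate_acceptance(A, sample_oracle, eps, fail_prob=2**-40):
    """Estimate Pr[A accepts] on the distribution behind sample_oracle,
    to within eps/4 except with probability fail_prob. By Hoeffding,
    Pr[|estimate - truth| >= t] <= 2*exp(-2*trials*t^2), so the trial
    count below (with a safety margin) suffices for t = eps/4."""
    trials = math.ceil(32 * math.log(2 / fail_prob) / eps**2)
    hits = sum(1 for _ in range(trials) if A(sample_oracle()))
    return hits / trials
```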

Relation to Lattice Problems. Regev [Reg05] and Peikert [Pei09] showed quantum and classical reductions (respectively) from the worst-case hardness of the GapSVP problem to the search version of LWE. (We note that the quantum reduction in [Reg05] also shows a reduction from SIVP.) As mentioned in the introduction, the classical reduction only works when the modulus q is exponential in the dimension n. This is summarized in the following theorem, which is derived from [Reg05, Theorem 3.1] and [Pei09, Theorem 3.1].

Theorem 2.14. Let n, q ≥ 1 be integers and let α ∈ (0, 1) be such that αq ≥ 2√n. Then there exists a quantum reduction from worst-case n-dimensional GapSVP_{Õ(n/α)} to LWE_{n,q,α}. If in addition q ≥ 2^{n/2}, then there is also a classical reduction between those problems.

In order to obtain hardness of the decision version of LWE, which is the one we consider throughout the paper, one employs a search-to-decision reduction. Several such reductions appear in the literature (e.g., [Reg05, Pei09, MP12]). The most recent reduction, by Micciancio and Peikert [MP12], essentially subsumes all previous reductions and requires the modulus q to be smooth. Below we give the special case where the modulus is a power of 2, which suffices for our purposes. It follows from our results that (decision) LWE is hard not just for a smooth modulus q, as follows from [MP12], but actually for all moduli q, including prime moduli, with only a small deterioration in the noise (see Corollaries 3.2 and 3.3).

Theorem 2.15 (Special case of [MP12, Theorem 3.1]). Let q be a power of 2, and let α satisfy 1/q < α < 1/ω(√(log n)). Then there exists an efficient reduction from search LWE_{n,q,α} to (decision) LWE_{n,q,α′} for α′ = α · ω(log n).

3 Modulus-Dimension Switching

The main results of this section are Corollaries 3.2 and 3.4 below. Both are special cases of the following technical theorem. We say that a distribution D over Z^n is (B, δ)-bounded for some reals B, δ ≥ 0 if the probability that x ← D has norm greater than B is at most δ.


Theorem 3.1. Let m, n, n′, q, q′ ≥ 1 be integers, let G ∈ Z^{n′×n} be such that the lattice Λ = (1/q′)·G^T Z^{n′} + Z^n has a known basis B, and let D be an arbitrary (B, δ)-bounded distribution over Z^n. Let α, β > 0 and ε ∈ (0, 1/2) satisfy

β² ≥ α² + (4/π) ln(2n(1 + 1/ε)) · (max{q^{−1}, ‖B̃‖} · B)².

Then there is an efficient (transformation) reduction from LWE_{n,m,q,≤α}(D) to LWE_{n′,m,q′,≤β}(G · D) that reduces the advantage by at most δ + 14εm.

Here we use the notation ‖B̃‖ from Lemma 2.3. We also note that, if needed, the distribution on secrets produced by the reduction can always be turned into the uniform distribution on Z_{q′}^{n′}, as mentioned after Definition 2.11. Also, we recall that there exists an elementary reduction from LWE_{n′,q′,≤β} to LWE_{n′,q′,β} (see Lemma 2.13).

Here we state two important corollaries of the theorem. The first corresponds to just modulus reduction (the LWE dimension is preserved), and is obtained by letting n′ = n, G = I be the n-dimensional identity matrix, and B = I/q′. For example, we can take q ≥ q′ ≥ √(2 ln(2n(1 + 1/ε))) · (B/α) and β = √2·α, which corresponds to reducing an arbitrary modulus to almost B/α while increasing the initial error rate α by just a small constant factor.

Corollary 3.2. For any m, n ≥ 1, q ≥ q′ ≥ 1, (B, δ)-bounded distribution D over Z^n, and α, β > 0, ε ∈ (0, 1/2) such that

β² ≥ α² + (4/π) ln(2n(1 + 1/ε)) · (B/q′)²,

there is an efficient reduction from LWE_{n,m,q,≤α}(D) to LWE_{n,m,q′,≤β}(D) that reduces the advantage by at most δ + 14εm.

In particular, by using the normal form of LWE, in which the secret has distribution D = D_{Z^n,αq}, we can switch to a power-of-2 modulus with only a small loss in the noise rate, as described in the following corollary. Together with the known search-to-decision reduction (Theorem 2.15), this extends the known hardness of (decision) LWE to any modulus q. Here we use that D = D_{Z^n,r} is (Cr·√(n log(n/δ)), δ)-bounded for some universal constant C > 0, which follows by taking a union bound over the n coordinates. (Alternatively, one could use that it is (r√n, 2^{−n})-bounded, as follows from [Ban93, Lemma 1.5], leading to a slightly tighter statement for large n.)

Corollary 3.3. Let α > 0, δ ∈ (0, 1/2), m ≥ n ≥ 1, and q′ ≥ 1, and let q ∈ [q′, 2q′) be the smallest power of 2 not smaller than q′. There exists an efficient (transformation) reduction from LWE_{n,m,q,≤α} to LWE_{n,m,q′,≤β}, where

β = Cα√n · √(log(n/δ) log(nm/δ))

for some universal constant C > 0, losing at most δ in the advantage.

Another corollary illustrates a modulus-dimension tradeoff. Assume n = kn′ for some k ≥ 1, and let q′ = q^k. Let G = I_{n′} ⊗ g, where g = (1, q, q², . . . , q^{k−1})^T ∈ Z^k. We then have Λ = q^{−k} G^T Z^{n′} + Z^n. A basis of Λ is given by

B = I_{n′} ⊗ \begin{pmatrix} q^{-1} & q^{-2} & \cdots & q^{-k} \\ & q^{-1} & \cdots & q^{1-k} \\ & & \ddots & \vdots \\ & & & q^{-1} \end{pmatrix} ∈ R^{n×n};


this is since the column vectors of B belong to Λ and the determinants match. Orthogonalizing from left to right, we have B̃ = q^{−1}·I and so ‖B̃‖ = q^{−1}. We therefore obtain the following corollary, showing that we can trade off the dimension against the modulus, holding n log q = n′ log q′ fixed. For example, letting D = D_{Z^n,αq} (corresponding to a secret in normal form), which is (αq√n, 2^{−n})-bounded, the reduction increases the error rate by about a √n factor.
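Both claims about B (membership of its columns in Λ, and ‖B̃‖ = q^{−1}) can be checked numerically; the following Python snippet (ours, with toy parameters n = 6, k = 3, q = 5) does so, representing G by its transpose G^T = I_{n′} ⊗ g.

```python
import numpy as np

n, k, q = 6, 3, 5
n2 = n // k                                     # n' in the text
g = np.array([q**i for i in range(k)])          # (1, q, ..., q^{k-1})^T
Gt = np.kron(np.eye(n2, dtype=int), g[:, None])  # G^T, an n x n' integer matrix

# Upper-triangular k x k block T with T[i, j] = q^{-(j - i + 1)} for j >= i.
T = np.array([[float(q) ** -(j - i + 1) if j >= i else 0.0
               for j in range(k)] for i in range(k)])
B = np.kron(np.eye(n2), T)            # claimed basis of q^{-k} G^T Z^{n'} + Z^n

# Each column of B equals q^{-k} G^T z plus an integer vector, for integer z:
for j in range(n):
    z = np.zeros(n2)
    z[j // k] = q ** (k - 1 - (j % k))
    diff = B[:, j] - (float(q) ** -k) * (Gt @ z)
    assert np.allclose(diff, np.rint(diff))     # difference is integral

# Gram-Schmidt orthogonalization of B has all vectors of length exactly 1/q:
Q, R = np.linalg.qr(B)
assert np.allclose(np.abs(np.diag(R)), 1.0 / q)
```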

Corollary 3.4. For any n, m, q ≥ 1, k ≥ 1 that divides n, (B, δ)-bounded distribution D over Z^n, and α, β > 0, ε ∈ (0, 1/2) such that

β² ≥ α² + (4/π) ln(2n(1 + 1/ε)) · (B/q)²,

there is an efficient reduction from LWE_{n,m,q,≤α}(D) to LWE_{n/k,m,q^k,≤β}(G · D) that reduces the advantage by at most δ + 14εm, where G = I_{n/k} ⊗ (1, q, q², . . . , q^{k−1})^T.

Theorem 3.1 follows immediately from the following lemma.

Lemma 3.5. Adopt the notation of Theorem 3.1, and let

r ≥ max{q^{−1}, ‖B̃‖} · √(2 ln(2n(1 + 1/ε))/π).

There is an efficient mapping from T_q^n × T to T_{q′}^{n′} × T which has the following properties:

• If the input is uniformly random, then the output is within statistical distance 4ε of the uniform distribution.

• If the input is distributed according to A_{q,s,D_α} for some s ∈ Z^n with ‖s‖ ≤ B, then the output distribution is within statistical distance 10ε of A_{q′,Gs,D_{α′}}, where (α′)² = α² + r²(‖s‖² + B²) ≤ α² + 2(rB)².

Proof. The main idea behind the reduction is to encode T_q^n into T_{q′}^{n′}, so that the mod-1 inner products between vectors in T_q^n and a short vector s ∈ Z^n, and between vectors in T_{q′}^{n′} and Gs ∈ Z^{n′}, are nearly equivalent. In a bit more detail, the reduction will map its input vector a ∈ T_q^n (from the given LWE-or-uniform distribution) to a vector a′ ∈ T_{q′}^{n′}, so that

⟨a′, Gs⟩ = ⟨G^T a′, s⟩ ≈ ⟨a, s⟩ mod 1

for any (unknown) s ∈ Z^n. To do this, it randomly samples a′ so that G^T a′ ≈ a mod Z^n, where the approximation error will be a discrete Gaussian of parameter r.

We can now formally define the reduction, which works as follows. On an input pair (a, b) ∈ T_q^n × T, it does the following:

• Choose f ← D_{Λ−a,r} using Lemma 2.3 with basis B, and let v = a + f ∈ Λ/Z^n. (The coset Λ − a is well defined since a = a + Z^n is some coset of Z^n ⊆ Λ.) Choose a uniformly random solution a′ ∈ T_{q′}^{n′} to the equation G^T a′ = v mod Z^n. This can be done by computing a basis of the solution set of G^T a′ = 0 mod Z^n, and adding a uniform element from that set to an arbitrary solution of the equation G^T a′ = v mod Z^n.

• Choose e′ ← D_{rB} and let b′ = b + e′ ∈ T.

• Output (a′, b′).
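For the special case G = I of Corollary 3.2 (pure modulus reduction, where Λ = (1/q′)Z^n and solving G^T a′ = v is trivial), these steps admit a direct implementation. The following Python sketch is our own illustration with toy parameters, not the general reduction: the coset sampler simply enumerates the coset points carrying non-negligible mass instead of invoking Lemma 2.3, and r_times_B stands for rB, with B the bound on ‖s‖ from the lemma.

```python
import numpy as np

rng = np.random.default_rng(1)

def discrete_gauss_coset(c, spacing, r):
    """Sample from D_{spacing*Z - c, r} by enumerating the coset points
    within +-12r of the origin (a stand-in for the Lemma 2.3 sampler)."""
    ks = np.arange(np.floor((c - 12 * r) / spacing),
                   np.ceil((c + 12 * r) / spacing) + 1)
    pts = ks * spacing - c
    w = np.exp(-np.pi * (pts / r) ** 2)
    return rng.choice(pts, p=w / w.sum())

def reduce_modulus(a, b, q, q2, r, r_times_B):
    """One step of the reduction for G = I: Gaussian randomized rounding
    of a in T_q^n to a' in T_{q2}^n, plus the smoothing term e' ~ D_{rB}."""
    f = np.array([discrete_gauss_coset(ai, 1.0 / q2, r) for ai in a])
    a2 = (a + f) % 1.0                  # v = a + f lies in (1/q2)Z^n mod 1
    e2 = rng.normal(0.0, r_times_B / np.sqrt(2 * np.pi))  # D_{rB}: std rB/sqrt(2*pi)
    return a2, (b + e2) % 1.0
```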


We now analyze the reduction. First, if the distribution of the input is uniform, then it suffices to show that a′ is (nearly) uniformly random, because both b and e′ are independent of a′, and b ∈ T is uniform. To prove this claim, notice that it suffices to show that the coset v ∈ Λ/Z^n is (nearly) uniformly random, because each v has the same number of solutions a′ to G^T a′ = v mod Z^n. Next, observe that for any a ∈ T_q^n and f ∈ Λ − a, we have by Lemma 2.7 (using that r ≥ η_ε(Λ) by Lemma 2.5) that

Pr[a = a ∧ f = f] = q^{−n} · ρ_r(f)/ρ_r(Λ − a) ∈ C · [1, (1+ε)/(1−ε)] · ρ_r(f),    (3.1)

where C = q^{−n}/ρ_r(Λ) is a normalizing value that does not depend on a or f. Therefore, by summing over all a, f satisfying a + f = v, we obtain that for any v ∈ Λ/Z^n,

Pr[v = v] ∈ C · [1, (1+ε)/(1−ε)] · ρ_r(q^{−1}Z^n + v).

Since r ≥ η_ε(q^{−1}Z^n) (by Lemma 2.5), Lemma 2.7 implies that Pr[v = v] ∈ [(1−ε)/(1+ε), (1+ε)/(1−ε)] · C′ for a constant C′ that is independent of v. By Claim 2.1, this shows that a′ is within statistical distance 1 − ((1−ε)/(1+ε))² ≤ 4ε of the uniform distribution.

It remains to show that the reduction maps A_{q,s,D_α} to A_{q′,Gs,D_β}. Let the input sample from the former distribution be (a, b = ⟨a, s⟩ + e), where e ← D_α. As argued above, the output a′ is (nearly) uniform over T_{q′}^{n′}. So condition now on any fixed value a′ ∈ T_{q′}^{n′} of a′, and let v = G^T a′ mod Z^n. We have

b′ = ⟨a, s⟩ + e + e′ = ⟨a′, Gs⟩ + e + ⟨−f, s⟩ + e′ mod 1.

By Claim 2.1 and (3.1) (and noting that if f = f then a = v − f mod Z^n), the distribution of −f is within statistical distance 1 − (1 − ε)/(1 + ε) ≤ 2ε of D_{q^{−1}Z^n−v,r}. By Lemma 2.9 (using r ≥ √2·η_ε(q^{−1}Z^n) and ‖s‖ ≤ B), the distribution of ⟨−f, s⟩ + e′ is within statistical distance 6ε of D_t, where t² = r²(‖s‖² + B²). It therefore follows that e + ⟨−f, s⟩ + e′ is within statistical distance 6ε of D_{(t²+α²)^{1/2}}, as required.

4 Hardness of LWE with Binary Secret

The following is the main theorem of this section.

Theorem 4.1. Let k, q ≥ 1 and m ≥ n ≥ 1 be integers, and let ε ∈ (0, 1/2), α, δ > 0 be such that n ≥ (k + 1) log₂ q + 2 log₂(1/δ) and α ≥ √(ln(2n(1 + 1/ε))/π)/q. There exist three (transformation) reductions from LWE_{k,m,q,α} to binLWE_{n,m,q,≤√(10n)·α} such that, for any algorithm for the latter problem with advantage ζ, at least one of the reductions produces an algorithm for the former problem with advantage at least

(ζ − δ)/(3m) − 41ε/2 − Σ_{p|q, p prime} p^{−k−1}.    (4.1)

By combining Theorem 4.1 with the reduction in Corollary 3.4 (and noting that {0, 1}^n is (√n, 0)-bounded), we can replace the binLWE problem above with binLWE_{n,m,q′,β} for any q′ ≥ 1 and ξ > 0, where

β := (10nα² + (4n/(πq′²)) · ln(2n(1 + 1/ξ)))^{1/2},


while decreasing the advantage in (4.1) by 14ξm. Recalling that LWE of dimension k = √n and modulus q = 2^{k/2} (assume k is even) is known to be classically as hard as √n-dimensional lattice problems (Theorems 2.14 and 2.15), this gives a formal statement of Theorem 1.1. The modulus q′ can be taken almost as small as √n.

For most purposes the sum over prime factors of q in (4.1) is negligible. For instance, in deriving the formal statement of Theorem 1.1 above, we used a q that is a power of 2, in which case the sum is 2^{−k−1} = 2^{−√n−1}, which is negligible. If needed, one can improve this by applying the modulus switching reduction (Corollary 3.3) before applying Theorem 4.1 in order to make q prime. (Strictly speaking, one also needs to apply Lemma 2.13 to replace the "unknown noise" variant of LWE given by Corollary 3.3 with the fixed-noise variant.) This improves the advantage loss to q^{−√n−1}, which is roughly 2^{−n}.

At a high level, the proof of the theorem follows by combining three main steps. The first, given in Section 4.1, reduces LWE to a variant in which the first equation is errorless. The second, given in Section 4.2, reduces the latter to the intermediate problem extLWE, another variant of LWE in which some information on the noise elements is leaked. Finally, in Section 4.3, we reduce extLWE to LWE with {0, 1} secret. We note that the first reduction is relatively standard; it is the other two that we consider the main contribution of this section. We now proceed with more details (see also Figure 1).

Proof. First, since m ≥ n, Lemma 4.3 provides a transformation reduction from LWE_{k,m,q,α} to first-is-errorless LWE_{k+1,n,q,α}, while reducing the advantage by at most 2^{−k+1}. Next, Lemma 4.7 with Z = {0, 1}^n, which is of quality ξ = 2 by Claim 4.6, reduces the latter problem to extLWE_{k+1,n,q,√5·α,{0,1}^n} while reducing the advantage by at most 33ε/2. Then, Lemma 4.8 reduces the latter problem to extLWE^m_{k+1,n,q,√5·α,{0,1}^n}, while losing a factor of m in the advantage. Finally, Lemma 4.9 provides three reductions to binLWE_{n,m,q,≤√(10n)·α}: two from the latter problem, and one from LWE_{k+1,m,q,√(5n)·α}, guaranteeing that the sum of advantages is at least the original advantage minus 4mε + δ. Together with the trivial reduction from LWE_{k,m,q,α} to LWE_{k+1,m,q,√(5n)·α} (which incurs no loss in advantage), this completes the proof.

[Figure 1: Summary of reductions used in Theorem 4.1. The chain is: LWE_{k,m,q,α} reduces via Lemma 4.3 to first-is-errorless LWE_{k+1,n,q,α}, via Lemma 4.7 to extLWE_{k+1,n,q,√5·α,{0,1}^n}, via Lemma 4.8 to extLWE^m_{k+1,n,q,√5·α,{0,1}^n}, and via Lemma 4.9 (together with LWE_{k+1,m,q,√(5n)·α}) to binLWE_{n,m,q,≤√(10n)·α}.]


4.1 First-is-errorless LWE

We first define a variant of LWE in which the first equation is given without error, and then show in Lemma 4.3 that it is still hard.

Definition 4.2. For integers n, q ≥ 1 and an error distribution φ over R, the "first-is-errorless" variant of the LWE problem is to distinguish between the following two scenarios. In the first, the first sample is uniform over T_q^n × T_q and the rest are uniform over T_q^n × T. In the second, there is an unknown uniformly distributed s ∈ {0, . . . , q − 1}^n; the first sample we get is from A_{q,s,{0}} (where {0} denotes the distribution that is deterministically zero) and the rest are from A_{q,s,φ}.

Lemma 4.3. For any n ≥ 2, m, q ≥ 1, and error distribution φ, there is an efficient (transformation) reduction from LWE_{n−1,m,q,φ} to the first-is-errorless variant of LWE_{n,m,q,φ} that reduces the advantage by at most Σ_p p^{−n}, with the sum going over all prime factors of q.

Notice that if q is prime, the loss in advantage is at most q^{−n}. Alternatively, for any number q we can bound it by

Σ_{k≥2} k^{−n} ≤ 2^{−n} + ∫_2^∞ t^{−n} dt ≤ 2^{−n+2},

which might be good enough when n is large.

Proof. The reduction starts by choosing a vector a′ uniformly at random from {0, . . . , q − 1}^n. Let r be the greatest common divisor of the coordinates of a′. If it is not coprime to q, we abort. The probability that this happens is at most

Σ_{p prime, p|q} p^{−n}.

Assuming we do not abort, we proceed by finding a matrix U ∈ Z^{n×n} that is invertible modulo q and whose leftmost column is a′. Such a matrix exists, and can be found efficiently. For instance, using the extended GCD algorithm, we find an n × n unimodular matrix R such that Ra′ = (r, 0, . . . , 0)^T. Then R^{−1} · diag(r, 1, . . . , 1) is the desired matrix. We also pick a uniform element s_0 ∈ {0, . . . , q − 1}. The reduction now proceeds as follows. The first sample it outputs is (a′/q, s_0/q). The remaining samples are produced by taking a sample (a, b) from the given oracle, picking a fresh uniformly random d ∈ T_q, and outputting (U(d|a), b + s_0 · d), with the vertical bar denoting concatenation. It is easy to verify correctness: given uniform samples, the reduction outputs uniform samples (with the first sample's b component uniform over T_q), up to statistical distance 2^{−n+1}; and given samples from A_{q,s,φ}, the reduction outputs one sample from A_{q,s′,{0}} and the remaining samples from A_{q,s′,φ}, up to statistical distance 2^{−n+1}, where s′ = (U^{−1})^T (s_0|s) mod q. This proves correctness since U, being invertible modulo q, induces a bijection on Z_q^n, and so s′ is uniform over {0, . . . , q − 1}^n.

4.2 Extended LWE

We next define the intermediate problem extLWE. (This definition is of an easier problem than the one considered in previous work [AP12], which makes our hardness result stronger.)

Definition 4.4. For n, m, q, t ≥ 1, Z ⊆ Z^m, and a distribution χ over (1/q)Z^m, the extLWE^t_{n,m,q,χ,Z} problem is as follows. The algorithm gets to choose z ∈ Z and then receives a tuple

(A, (b_i)_{i∈[t]}, (⟨e_i, z⟩)_{i∈[t]}) ∈ T_q^{n×m} × (T_q^m)^t × ((1/q)Z)^t.


Its goal is to distinguish between the following two cases. In the first, A ∈ T_q^{n×m} is chosen uniformly, the e_i ∈ (1/q)Z^m are chosen from χ, and b_i = A^T s_i + e_i mod 1, where the s_i ∈ {0, . . . , q − 1}^n are chosen uniformly. The second case is identical, except that the b_i are chosen uniformly in T_q^m, independently of everything else.

When t = 1, we omit the superscript t. Also, when χ is D_{q^{−1}Z^m,α} for some α > 0, we replace the subscript χ by α. We note that a discrete version of LWE can be defined as a special case of extLWE by setting Z = {0^m}. We next define a measure of quality of sets Z.

Definition 4.5. For a real ξ > 0 and a set Z ⊆ Z^m, we say that Z is of quality ξ if, given any z ∈ Z, we can efficiently find a unimodular matrix U ∈ Z^{m×m} such that, if U′ ∈ Z^{m×(m−1)} is the matrix obtained from U by removing its leftmost column, then all of the columns of U′ are orthogonal to z and its largest singular value is at most ξ.

The idea in this definition is that the columns of U′ form a basis of the lattice of integer points that are orthogonal to z, i.e., the lattice {b ∈ Z^m : ⟨b, z⟩ = 0}. The quality measures how "short" we can make this basis.

Claim 4.6. The set Z = {0, 1}^m is of quality 2.

Proof. Let z ∈ Z and assume without loss of generality that its first k ≥ 1 coordinates are 1 and the remaining m − k are 0. Then consider the upper bidiagonal matrix U whose diagonal is all 1s and whose diagonal above the main diagonal is (−1, . . . , −1, 0, . . . , 0), with −1 appearing k − 1 times. This matrix is clearly unimodular, and all of its columns except the first are orthogonal to z. Moreover, by the triangle inequality, we can bound the operator norm of U by the sum of that of the identity and that of the off-diagonal part, both of which clearly have norm at most 1.
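The claim is easy to confirm numerically; this Python snippet (ours) builds U for a small example and checks unimodularity, the orthogonality of the last m − 1 columns to z, and the singular value bound.

```python
import numpy as np

m, k = 6, 4
z = np.array([1] * k + [0] * (m - k))            # z in {0,1}^m, first k ones

# Upper bidiagonal U: ones on the diagonal, -1 at (i, i+1) for i < k-1.
U = np.eye(m, dtype=int)
for i in range(k - 1):
    U[i, i + 1] = -1

assert round(np.linalg.det(U)) == 1              # unimodular
assert np.all(U[:, 1:].T @ z == 0)               # all columns but the first are orthogonal to z
assert np.linalg.norm(U, 2) <= 2                 # largest singular value at most 2
```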

Lemma 4.7. Let Z ⊆ Z^m be of quality ξ > 0. Then for any n, q ≥ 1, ε ∈ (0, 1/2), and α, r ≥ (ln(2m(1 + 1/ε))/π)^{1/2}/q, there is a (transformation) reduction from the first-is-errorless variant of LWE_{n,m,q,α} to extLWE_{n,m,q,(α²ξ²+r²)^{1/2},Z} that reduces the advantage by at most 33ε/2.

Proof. We first describe the reduction. Assume we are asked to provide samples for some z ∈ Z. We compute a unimodular U ∈ Z^{m×m} for z as in Definition 4.5, and let U′ ∈ Z^{m×(m−1)} be the matrix formed by removing the first column of U. We then take m samples from the given distribution, resulting in (A, b) ∈ T_q^{n×m} × (T_q × T^{m−1}). We also sample a vector f from the m-dimensional continuous Gaussian distribution D_{α(ξ²I−U′U′^T)^{1/2}}, which is well defined since ξ²I − U′U′^T is a positive semidefinite matrix by our assumption on U. The output of the reduction is the tuple

(A′ = AU^T, b′ + c, ⟨z, f + c⟩) ∈ T_q^{n×m} × T_q^m × (1/q)Z,    (4.2)

where b′ = Ub + f, and c is chosen from the discrete Gaussian distribution D_{q^{−1}Z^m−b′,r} (using Lemma 2.3). (As before, notice that the coset q^{−1}Z^m − b′ is well defined because b′ is a coset of Z^m ⊆ q^{−1}Z^m.)

We now prove the correctness of the reduction. Consider first the case where we get valid LWE equations, i.e., A is uniform in T_q^{n×m} and b = A^T s + e ∈ T^m, where s ∈ {0, . . . , q − 1}^n is uniformly chosen, the first coordinate of e ∈ R^m is 0, and the remaining m − 1 coordinates are chosen from D_α. Since U is unimodular, A′ = AU^T is uniformly distributed in T_q^{n×m}, as required. From now on we condition on an arbitrary A′ and analyze the distribution of the remaining two components of (4.2). Next,

b′ = Ub + f = A′^T s + Ue + f.


Since Ue is distributed as a continuous Gaussian D_{αU′}, the vector Ue + f is distributed as a spherical continuous Gaussian D_{αξ}. Moreover, since A′^T s ∈ T_q^m, the coset q^{−1}Z^m − b′ is identical to q^{−1}Z^m − (Ue + f), so we can see c as being chosen from D_{q^{−1}Z^m−(Ue+f),r}. Therefore, by Lemma 2.10, and using that r ≥ η_ε(q^{−1}Z^m) by Lemma 2.5, the distribution of Ue + f + c is within statistical distance 8ε of D_{q^{−1}Z^m,(α²ξ²+r²)^{1/2}}. This shows that the second component of (4.2) is also distributed correctly. Finally, for the third component, by our assumption on U and the fact that the first coordinate of e is zero,

⟨z, f + c⟩ = ⟨z, Ue + f + c⟩,

and so the third component gives the inner product of the noise with z, as desired.

We now consider the case where the input is uniform, i.e., A is uniform in T_q^{n×m} and b is independent and uniform in T_q × T^{m−1}. We first observe that by Lemma 2.6, since α ≥ η_{ε/m}(q^{−1}Z) (by Lemma 2.5), the distribution of (A, b) is within statistical distance ε/2 of the distribution of (A, e′ + e), where e′ is chosen uniformly in T_q^m, the first coordinate of e is zero, and its remaining m − 1 coordinates are chosen independently from D_α. So from now on assume our input is (A, e′ + e). The first component of (4.2) is uniform in T_q^{n×m} as before, and moreover, it is clearly independent of the other two. Moreover, since b′ = Ue′ + Ue + f and Ue′ ∈ T_q^m, the coset q^{−1}Z^m − b′ is identical to q^{−1}Z^m − (Ue + f), and so c is distributed identically to the case of a valid LWE equation, and in particular is independent of e′. This establishes that the third component of (4.2) is correctly distributed; moreover, since e′ is independent of the first and third components, and Ue′ is uniform in T_q^m (since U is unimodular), we get that the second component is uniform and independent of the other two, as desired.

We end this section by stating the standard reduction to the multi-secret (t ≥ 1) case of extended LWE.

Lemma 4.8. Let n, m, q, χ, Z be as in Definition 4.4 with χ efficiently sampleable, and let t ≥ 1 be an integer. Then there is an efficient (transformation) reduction from extLWE_{n,m,q,χ,Z} to extLWE^t_{n,m,q,χ,Z} that shrinks the advantage by a factor of t.

The proof is by a standard hybrid argument, and we include it here for the sake of completeness. We note that the distribution of the secret vector s needs to be sampleable, but otherwise it plays no role in the proof. The lemma therefore naturally extends to any (sampleable) distribution of s.

Proof. Let A be an algorithm for extLWE^t_{n,m,q,χ,Z}, let z be the vector output by A in the first step (note that this is a random variable), and let H_i denote the distribution

(A, b_1, . . . , b_i, u_{i+1}, . . . , u_t, z, (⟨z, e_j⟩)_{j∈[t]}),

where u_{i+1}, . . . , u_t are sampled independently and uniformly in T_q^m. Then by definition Adv[A] = |Pr[A(H_0)] − Pr[A(H_t)]|.

We now describe an algorithm B for extLWE_{n,m,q,χ,Z}. First, B runs A to obtain z and sends it to the challenger as its own z. Then, given an input (A, d, z, y) for extLWE_{n,m,q,χ,Z}, the distinguisher B samples i* ← [t], and in addition s_1, . . . , s_{i*−1} ← Z_q^n, errors e_1, . . . , e_{i*−1}, e_{i*+1}, . . . , e_t ← χ, and u_{i*+1}, . . . , u_t ← T_q^m. It sets b_i = A^T · s_i + e_i (mod 1), and sends the following to A:

(A, b_1, . . . , b_{i*−1}, d, u_{i*+1}, . . . , u_t, z, ⟨z, e_1⟩, . . . , ⟨z, e_{i*−1}⟩, y, ⟨z, e_{i*+1}⟩, . . . , ⟨z, e_t⟩).

Finally, B outputs the same output as A did.


Note that when the input to B is distributed as P_0 = (A, b, z, z^T · e) with b = A^T · s + e (mod 1), then B feeds A exactly the distribution H_{i*}. On the other hand, if the input to B is P_1 = (A, u, z, z^T · e) with u ← T_q^m, then B feeds A with H_{i*−1}.

Since i* is uniform in [t], we get that

t · Adv[B] = t · |Pr[B(P_0)] − Pr[B(P_1)]|
           = |Σ_{i*∈[t]} Pr[A(H_{i*})] − Σ_{i*∈[t]} Pr[A(H_{i*−1})]|
           = |Pr[A(H_t)] − Pr[A(H_0)]|
           = Adv[A],

and the result follows.

4.3 Reducing to binary secret

Lemma 4.9. Let k, n,m, q ∈ N, ε ∈ (0, 1/2), and δ, α, β, γ > 0 be such that n ≥ k log2 q + 2 log2(1/δ),β ≥

√2 ln(2n(1 + 1/ε))/π/q, α =

√2nβ, γ =

√nβ. Then there exist three efficient (transformation)

reductions to binLWEn,m,q,≤α from extLWEmk,n,q,β,0,1n , LWEk,m,q,γ , and extLWEmk,n,q,β,0n, such that ifB1, B2, and B3 are the algorithms obtained by applying these reductions (respectively) to an algorithm A,then

Adv[A] ≤ Adv[B1] + Adv[B2] + Adv[B3] + 4mε+ δ .

Pointing out the trivial (transformation) reduction from extLWEmk,n,q,β,0,1n to extLWEmk,n,q,β,0n,the lemma implies the hardness of binLWEn,m,q,≤α based on the hardness of extLWEmk,n,q,β,0,1n andLWEk,m,q,γ .

We note that our proof is actually more general, and holds for any binary distribution of min-entropy atleast k log2 q + 2 log2(1/δ), and not just a uniform binary secret as in the definition of binLWE.

Proof. The proof proceeds by a sequence of hybrids. Let k, n, m, q, ε, α, β, γ be as in the lemma statement. We consider z ← {0,1}^n and e ← D^m_{α′} for α′ = √(β²‖z‖² + γ²) ≤ √(2n)·β = α. In addition, we let A ← T_q^{n×m}, u ← T^m, and define b := A^T·z + e (mod 1). We consider an algorithm A that distinguishes between (A, b) and (A, u).

We let H_0 denote the distribution (A, b), and H_1 the distribution

$$H_1 = (\mathbf{A},\, \mathbf{A}^T\mathbf{z} - \mathbf{N}^T\mathbf{z} + \mathbf{e} \bmod 1),$$

where N ← D^{n×m}_{q⁻¹Z,β} and e ← D^m_γ is a fresh error vector. Using ‖z‖ ≤ √n and β ≥ √2·η_ε(Z^n)/q (by Lemma 2.5), it follows by Lemma 2.9 that the statistical distance between −N^T z + e and D^m_{α′} is at most 4mε. It thus follows that

$$\big|\Pr[\mathcal{A}(H_0)] - \Pr[\mathcal{A}(H_1)]\big| \le 4m\varepsilon. \qquad (4.3)$$

We define a distribution H_2 as follows. Let B ← T_q^{k×m} and C ← T_q^{k×n}, and let Ã := qC^T·B + N (mod 1). Finally,

$$H_2 = (\tilde{\mathbf{A}},\, \tilde{\mathbf{A}}^T\mathbf{z} - \mathbf{N}^T\mathbf{z} + \mathbf{e}) = (\tilde{\mathbf{A}},\, q\mathbf{B}^T\mathbf{C}\mathbf{z} + \mathbf{e}).$$

We now argue that there exists an adversary B_1 for the problem extLWE^m_{k,n,q,β,{0,1}^n} such that

$$\mathrm{Adv}[\mathcal{B}_1] = \big|\Pr[\mathcal{A}(H_1)] - \Pr[\mathcal{A}(H_2)]\big|. \qquad (4.4)$$


This is because H_1 and H_2 can be viewed as applying the same efficient transformation to the distributions (C, A, N^T z) and (C, Ã, N^T z), respectively. Since distinguishing the latter distributions is exactly the extLWE^m_{k,n,q,β,{0,1}^n} problem (where the columns of q·B are interpreted as the m secret vectors), the distinguisher B_1 follows by first applying the aforementioned transformation and then applying A.

For the next hybrid, we define H_3 = (Ã, B^T·s + e) for s ← Z_q^k. It follows that

$$\big|\Pr[\mathcal{A}(H_2)] - \Pr[\mathcal{A}(H_3)]\big| \le \delta \qquad (4.5)$$

by the leftover hash lemma (see Lemma 2.2), since H_2 and H_3 can be derived from (C, qC·z) and (C, s), respectively, whose statistical distance is at most δ.

Our next hybrid makes the second component uniform: H_4 = (Ã, u). There exists an algorithm B_2 for LWE_{k,m,q,γ} such that

$$\mathrm{Adv}[\mathcal{B}_2] = \big|\Pr[\mathcal{A}(H_3)] - \Pr[\mathcal{A}(H_4)]\big|, \qquad (4.6)$$

since H_3 and H_4 can be computed efficiently from (B, B^T·s + e) and (B, u), respectively.

Lastly, we change Ã back to uniform: H_5 = (A, u). There exists an algorithm B_3 for extLWE^m_{k,n,q,β,{0^n}} such that

$$\mathrm{Adv}[\mathcal{B}_3] = \big|\Pr[\mathcal{A}(H_4)] - \Pr[\mathcal{A}(H_5)]\big|. \qquad (4.7)$$

Eq. (4.7) is derived very similarly to Eq. (4.4): we notice that H_4 and H_5 can be viewed as applying the same efficient transformation to the distributions (C, Ã) and (C, A), respectively. Since distinguishing the latter distributions is exactly the extLWE^m_{k,n,q,β,{0^n}} problem (where the columns of q·B are interpreted as the m secret vectors), the distinguisher B_3 follows by first applying the aforementioned transformation and then applying A.

Putting together Eqs. (4.3)–(4.7), the lemma follows.
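For the reader's convenience, the hybrid chain above can be summarized as follows (a restatement of Eqs. (4.3)–(4.7); the arrow notation, labeled by the corresponding distinguishing bound, is ours and not the paper's):

$$H_0 \xrightarrow{\,4m\varepsilon\,} H_1 \xrightarrow{\,\mathrm{Adv}[\mathcal{B}_1]\,} H_2 \xrightarrow{\,\delta\,} H_3 \xrightarrow{\,\mathrm{Adv}[\mathcal{B}_2]\,} H_4 \xrightarrow{\,\mathrm{Adv}[\mathcal{B}_3]\,} H_5,$$

with H_0 = (A, b) and H_5 = (A, u); summing the five bounds by the triangle inequality yields the advantage bound of Lemma 4.9.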

5 Exact Gaussian Sampler

In this section we prove Lemma 2.3. As in [GPV08], the proof consists of two parts. In the first, we consider the one-dimensional case; in the second, we use it recursively to sample from arbitrary lattices. Our one-dimensional sampler is based on rejection sampling, just like the one in [GPV08]. Unlike [GPV08], we use the continuous normal distribution as the source distribution, which allows us to avoid truncation and, as a result, to obtain an exact sample. Our second part uses the same recursive routine as in [GPV08], but adds a rejection sampling step in order to correct the deviation of its output from the desired distribution.

Before proceeding to the proof, we remark that another approach to improving the sampler of [GPV08] would be to adapt the Markov chain Monte Carlo sampler of Sinclair and Jerrum [SJ89] (see also [HS10]). We expect that this would allow the bound on r in Lemma 2.3 to be improved to ‖B‖ at the expense of a larger (but still polynomial) running time.¹ Because this extra effort seems to offer small gain, we do not pursue this direction further here.

5.1 The one-dimensional case

Here we show how to sample from the discrete Gaussian distribution on arbitrary cosets of one-dimensional lattices. We use a standard rejection sampling procedure (see, e.g., [Dev86, Page 117] for a very similar procedure).

¹We thank Ronnie Barequet for discussions that led to this idea.


By scaling, we can restrict without loss of generality to the lattice Z; i.e., we consider the task of sampling from D_{Z+c,r} for a given coset representative c ∈ [0, 1) and parameter r > 0. The sampling procedure is as follows. Let Z_0 = ∫_c^∞ ρ_r(x) dx and Z_1 = ∫_{−∞}^{c−1} ρ_r(x) dx. These two numbers can be computed efficiently by expressing them in terms of the error function. Let Z = Z_0 + Z_1 + ρ_r(c) + ρ_r(c − 1). The algorithm repeats the following until it outputs an answer:

• With probability ρ_r(c)/Z, it outputs c;

• With probability ρ_r(c − 1)/Z, it outputs c − 1;

• With probability Z_0/Z, it chooses x from the restriction of the continuous normal distribution D_r to the interval [c, ∞). Let y be the smallest element of Z + c that is larger than x. With probability ρ_r(y)/ρ_r(x) it outputs y, and otherwise it repeats;

• With probability Z_1/Z, it chooses x from the restriction of the continuous normal distribution D_r to the interval (−∞, c − 1]. Let y be the largest element of Z + c that is smaller than x. With probability ρ_r(y)/ρ_r(x) it outputs y, and otherwise it repeats.

Consider now one iteration of the procedure. The probability of outputting c is ρ_r(c)/Z, that of outputting c − 1 is ρ_r(c − 1)/Z, and that of outputting c + k for some k ≥ 1 is

$$\frac{Z_0}{Z} \cdot \frac{1}{Z_0} \int_{c+k-1}^{c+k} \rho_r(x) \cdot \frac{\rho_r(c+k)}{\rho_r(x)}\, dx = \frac{\rho_r(c+k)}{Z},$$

and similarly, that of outputting c − 1 − k for some k ≥ 1 is ρ_r(c − 1 − k)/Z. From this it follows immediately that, conditioned on outputting something, the output distribution has support Z + c and probability mass function proportional to ρ_r, and is therefore the desired discrete Gaussian distribution D_{Z+c,r}. Moreover, the probability of outputting something is

$$\frac{\rho_r(\mathbb{Z}+c)}{Z} = \frac{\rho_r(\mathbb{Z}+c)}{Z_0 + Z_1 + \rho_r(c) + \rho_r(c-1)} \ge \frac{\rho_r(\mathbb{Z}+c)}{\rho_r(\mathbb{Z}+c) + \rho_r(c) + \rho_r(c-1)} \ge \frac{1}{2}.$$

Therefore, at each iteration the procedure terminates with probability at least 1/2. As a result, the probability that the number of iterations exceeds t is at most 2^{−t}; in particular, the expected number of iterations is at most 2.
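For concreteness, the following is a minimal Python sketch of this one-dimensional sampler. It uses floating-point arithmetic for readability, whereas the procedure above is analyzed over exact reals (with Z_0, Z_1 computed to arbitrary precision via the error function); the function names are ours.

```python
import math
import random

def rho(x, r):
    """Gaussian weight rho_r(x) = exp(-pi * x^2 / r^2)."""
    return math.exp(-math.pi * x * x / (r * r))

def sample_dgauss(c, r):
    """Sample from the discrete Gaussian D_{Z+c,r} for c in [0,1), r > 0."""
    sigma = r / math.sqrt(2.0 * math.pi)  # std. deviation of D_r
    # Z0 = integral of rho_r over [c, inf); Z1 over (-inf, c-1].
    Z0 = (r / 2.0) * math.erfc(math.sqrt(math.pi) * c / r)
    Z1 = (r / 2.0) * math.erfc(math.sqrt(math.pi) * (1.0 - c) / r)
    Z = Z0 + Z1 + rho(c, r) + rho(c - 1.0, r)

    while True:
        p = random.uniform(0.0, Z)
        if p < rho(c, r):
            return c
        if p < rho(c, r) + rho(c - 1.0, r):
            return c - 1.0
        if p < rho(c, r) + rho(c - 1.0, r) + Z0:
            # x ~ D_r restricted to [c, inf); y = smallest point of Z+c above x
            x = random.gauss(0.0, sigma)
            while x < c:
                x = random.gauss(0.0, sigma)
            y = c + math.ceil(x - c)
        else:
            # x ~ D_r restricted to (-inf, c-1]; y = largest point of Z+c below x
            x = random.gauss(0.0, sigma)
            while x > c - 1.0:
                x = random.gauss(0.0, sigma)
            y = c + math.floor(x - c)
        if random.random() < rho(y, r) / rho(x, r):
            return y
        # otherwise reject and repeat the whole iteration
```

In this sketch the restricted normal variates are drawn by naive resampling, which is correct but can be slow for very small r; an exact implementation would instead invert the Gaussian CDF on the restricted interval.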

5.2 The general case

For completeness, we start by recalling the SampleD procedure described in [GPV08]. This is a recursive procedure that gets as input a basis B = (b_1, …, b_n) of an n-dimensional lattice Λ = L(B), a parameter r > 0, and a vector c ∈ R^n, and outputs a vector in Λ + c whose distribution is close to that of D_{Λ+c,r}. Let b̃_1, …, b̃_n be the Gram–Schmidt orthogonalization of b_1, …, b_n, and let b̄_1, …, b̄_n be the normalized Gram–Schmidt vectors, i.e., b̄_i = b̃_i/‖b̃_i‖. The procedure is the following.

1. Let c_n ← c. For i ← n, …, 1, do:

   (a) Choose v_i from D_{‖b̃_i‖Z + ⟨c_i, b̄_i⟩, r} using the exact one-dimensional sampler.

   (b) Let c_{i−1} ← c_i + (v_i − ⟨c_i, b̄_i⟩) · b_i/‖b̃_i‖ − v_i·b̄_i.

2. Output v := Σ_{i=1}^n v_i·b̄_i.

It is easy to verify that the procedure always outputs vectors in the coset Λ + c. Moreover, the probability of outputting any v ∈ Λ + c is

$$\prod_{i=1}^n \frac{\rho_r(v_i)}{\rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z} + \langle \mathbf{c}_i, \bar{\mathbf{b}}_i\rangle)} = \frac{\rho_r(\mathbf{v})}{\prod_{i=1}^n \rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z} + \langle \mathbf{c}_i, \bar{\mathbf{b}}_i\rangle)},$$

where the c_i are the values computed by the procedure when it outputs v. Notice that by Lemma 2.5 and our assumption on r, we have r ≥ η_{1/(n+1)}(‖b̃_i‖Z) for all i. Therefore, by Lemma 2.7, we have that for all c ∈ R,

$$\rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z} + c) \in \Big[1 - \frac{2}{n+2},\, 1\Big] \cdot \rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z}).$$

In order to get an exact sample, we combine the above procedure with rejection sampling. Namely, we apply SampleD to obtain some vector v, and output v with probability

$$\frac{\prod_{i=1}^n \rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z} + \langle \mathbf{c}_i, \bar{\mathbf{b}}_i\rangle)}{\prod_{i=1}^n \rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z})} \in \Big(\Big(1 - \frac{2}{n+2}\Big)^n,\, 1\Big] \subseteq (e^{-2}, 1], \qquad (5.1)$$

and otherwise repeat. This probability can be computed efficiently, as we show below. As a result, in any given iteration the probability of outputting the vector v ∈ Λ + c is

$$\frac{\rho_r(\mathbf{v})}{\prod_{i=1}^n \rho_r(\|\tilde{\mathbf{b}}_i\|\mathbb{Z})}.$$

Since the denominator is independent of v, we obtain that in any given iteration, conditioned on outputting something, the output is distributed according to the desired distribution D_{Λ+c,r}; therefore this is also the overall output distribution of our sampler. Moreover, by (5.1), the probability of outputting something in any given iteration is at least e^{−2}. Therefore, the probability that the number of iterations exceeds t is at most (1 − e^{−2})^t; in particular, the expected number of iterations is at most e².
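The two steps combine into the following floating-point Python sketch (again with names of our own; it reuses `sample_dgauss` from the one-dimensional sketch above, together with a helper `rho_Z_plus_c` for ρ_r(Z + c) that is sketched after the discussion below):

```python
import numpy as np
import random

def gram_schmidt(B):
    """Return the (unnormalized) Gram-Schmidt vectors of the columns of B."""
    n = B.shape[1]
    Bt = np.array(B, dtype=float)
    for i in range(n):
        for j in range(i):
            Bt[:, i] -= (Bt[:, j] @ B[:, i]) / (Bt[:, j] @ Bt[:, j]) * Bt[:, j]
    return Bt

def sample_lattice_gauss(B, c, r):
    """Floating-point sketch of the exact sampler for D_{L(B)+c,r}.

    Columns of B are the basis vectors b_1, ..., b_n.  Relies on
    sample_dgauss (above) and rho_Z_plus_c (below).
    """
    n = B.shape[1]
    Bt = gram_schmidt(B)                 # tilde-b_i
    s = np.linalg.norm(Bt, axis=0)       # ||tilde-b_i||
    Bbar = Bt / s                        # bar-b_i (orthonormal)

    while True:
        ci = np.array(c, dtype=float)
        v, accept = np.zeros(n), 1.0
        for i in range(n - 1, -1, -1):
            mu = Bbar[:, i] @ ci         # <c_i, bar-b_i>
            # v_i ~ D_{s_i Z + mu, r}, via the 1-D sampler on Z + (mu/s_i mod 1)
            v[i] = s[i] * sample_dgauss((mu / s[i]) % 1.0, r / s[i])
            # acceptance factor rho_r(s_i Z + mu) / rho_r(s_i Z), cf. Eq. (5.1)
            accept *= (rho_Z_plus_c((mu / s[i]) % 1.0, r / s[i])
                       / rho_Z_plus_c(0.0, r / s[i]))
            # recursion step: c_{i-1} = c_i + (v_i - mu) b_i / s_i - v_i bar-b_i
            ci = ci + (v[i] - mu) * B[:, i] / s[i] - v[i] * Bbar[:, i]
        if random.random() < accept:     # rejection step for exactness
            return Bbar @ v              # v = sum_i v_i bar-b_i
```

The one-dimensional call is rescaled so that sampling from D_{sZ+μ,r} reduces to sampling from D_{Z+c′, r/s} with c′ = (μ/s) mod 1, and likewise ρ_r(sZ + μ) = ρ_{r/s}(Z + μ/s).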

It remains to show how to efficiently compute the probability in (5.1). By scaling, it suffices to show how to compute

$$\rho_r(\mathbb{Z}+c) = \sum_{k \in \mathbb{Z}} \exp(-\pi(k+c)^2/r^2)$$

for any r > 0 and c ∈ [0, 1). If r < 1, the sum decays very fast, and we can achieve any desired t bits of accuracy in time poly(t), which agrees with our notion of efficiently computing a real number (following, e.g., the treatment in [Lov86, Section 1.4]). For r ≥ 1, we use the Poisson summation formula (see, e.g., [MR04, Lemma 2.8]) to write

$$\rho_r(\mathbb{Z}+c) = r \cdot \sum_{k \in \mathbb{Z}} \exp(-\pi k^2 r^2 + 2\pi i c k) = r \cdot \sum_{k \in \mathbb{Z}} \exp(-\pi k^2 r^2)\cos(2\pi c k),$$

which again decays fast enough so we can compute it to within any desired t bits of accuracy in time poly(t).
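A floating-point sketch of this case analysis (the truncation parameter and the function name are ours; an exact implementation would instead bound the truncation error rigorously to obtain t bits of accuracy):

```python
import math

def rho_Z_plus_c(c, r, terms=64):
    """Approximate rho_r(Z + c) = sum_k exp(-pi (k + c)^2 / r^2).

    For r < 1 the defining sum converges rapidly; for r >= 1 we switch to
    the Poisson summation form r * sum_k exp(-pi k^2 r^2) cos(2 pi c k),
    which converges rapidly in that regime.  `terms` truncates the series.
    """
    if r < 1.0:
        return sum(math.exp(-math.pi * (k + c) ** 2 / (r * r))
                   for k in range(-terms, terms + 1))
    return r * (1.0 + 2.0 * sum(math.exp(-math.pi * k * k * r * r)
                                * math.cos(2.0 * math.pi * c * k)
                                for k in range(1, terms + 1)))
```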

Acknowledgments: We thank Elette Boyle, Adam Klivans, Vadim Lyubashevsky, Sasha Sherstov, VinodVaikuntanathan and Gilles Villard for useful discussions.


References

[ABB10a] S. Agrawal, D. Boneh, and X. Boyen. Efficient lattice (H)IBE in the standard model. In EUROCRYPT, pages 553–572. 2010.

[ABB10b] S. Agrawal, D. Boneh, and X. Boyen. Lattice basis delegation in fixed dimension and shorter-ciphertext hierarchical IBE. In CRYPTO, pages 98–115. 2010.

[ACPS09] B. Applebaum, D. Cash, C. Peikert, and A. Sahai. Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In CRYPTO, pages 595–618. 2009.

[AD87] D. Aldous and P. Diaconis. Strong uniform times and finite random walks. Adv. in Appl. Math., 8(1):69–97, 1987. ISSN 0196-8858. doi:10.1016/0196-8858(87)90006-6.

[AD97] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence. In STOC, pages 284–293. 1997.

[AGV09] A. Akavia, S. Goldwasser, and V. Vaikuntanathan. Simultaneous hardcore bits and cryptography against memory attacks. In TCC, pages 474–495. 2009.

[Ajt96] M. Ajtai. Generating hard instances of lattice problems. In Complexity of computations and proofs, volume 13 of Quad. Mat., pages 1–32. Dept. Math., Seconda Univ. Napoli, Caserta, 2004. Preliminary version in STOC 1996.

[AP12] J. Alperin-Sheriff and C. Peikert. Circular and KDM security for identity-based encryption. In Public Key Cryptography, pages 334–352. 2012.

[Ban93] W. Banaszczyk. New bounds in some transference theorems in the geometry of numbers. Mathematische Annalen, 296(4):625–635, 1993.

[BGV12] Z. Brakerski, C. Gentry, and V. Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. In ITCS, pages 309–325. 2012.

[Boy10] X. Boyen. Lattice mixing and vanishing trapdoors: A framework for fully secure short signatures and more. In Public Key Cryptography, pages 499–517. 2010.

[Bra12] Z. Brakerski. Fully homomorphic encryption without modulus switching from classical GapSVP. In CRYPTO, pages 868–886. 2012.

[BV96] D. Boneh and R. Venkatesan. Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes. In N. Koblitz, editor, CRYPTO, volume 1109 of Lecture Notes in Computer Science, pages 129–142. Springer, 1996. ISBN 3-540-61512-1.

[BV11] Z. Brakerski and V. Vaikuntanathan. Efficient fully homomorphic encryption from (standard) LWE. In FOCS, pages 97–106. 2011.

[CHKP10] D. Cash, D. Hofheinz, E. Kiltz, and C. Peikert. Bonsai trees, or how to delegate a lattice basis. In EUROCRYPT, pages 523–552. 2010.

[Dev86] L. Devroye. Nonuniform random variate generation. Springer-Verlag, New York, 1986. Available at http://luc.devroye.org/rnbookindex.html.


[Gen09] C. Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages 169–178. 2009.

[GGH96] O. Goldreich, S. Goldwasser, and S. Halevi. Collision-free hashing from lattice problems. Electronic Colloquium on Computational Complexity (ECCC), 3(42), 1996.

[GHPS12] C. Gentry, S. Halevi, C. Peikert, and N. P. Smart. Ring switching in BGV-style homomorphic encryption. In SCN, pages 19–37. 2012.

[GKPV10] S. Goldwasser, Y. T. Kalai, C. Peikert, and V. Vaikuntanathan. Robustness of the learning with errors assumption. In ICS, pages 230–240. 2010.

[GPV08] C. Gentry, C. Peikert, and V. Vaikuntanathan. Trapdoors for hard lattices and new cryptographic constructions. In STOC, pages 197–206. 2008.

[HILL99] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM J. Comput., 28(4):1364–1396, 1999.

[HS10] T. P. Hayes and A. Sinclair. Liftings of tree-structured Markov chains. In APPROX-RANDOM, pages 602–616. 2010.

[Kho10] S. Khot. Inapproximability results for computational problems on lattices. In P. Nguyen and B. Vallee, editors, The LLL Algorithm: Survey and Applications. Springer-Verlag, New York, 2010.

[KS06] A. R. Klivans and A. A. Sherstov. Cryptographic hardness for learning intersections of halfspaces. In FOCS, pages 553–562. 2006.

[KTX08] A. Kawachi, K. Tanaka, and K. Xagawa. Concurrently secure identification schemes based on the worst-case hardness of lattice problems. In ASIACRYPT, pages 372–389. 2008.

[Kup05] G. Kuperberg. A subexponential-time quantum algorithm for the dihedral hidden subgroup problem. SIAM J. Comput., 35(1):170–188, 2005.

[LM06] V. Lyubashevsky and D. Micciancio. Generalized compact knapsacks are collision resistant. In ICALP (2), pages 144–155. 2006.

[LM09] V. Lyubashevsky and D. Micciancio. On bounded distance decoding, unique shortest vectors, and the minimum distance problem. In CRYPTO, pages 577–594. 2009.

[Lov86] L. Lovász. An algorithmic theory of numbers, graphs and convexity, volume 50 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1986.

[LP11] R. Lindner and C. Peikert. Better key sizes (and attacks) for LWE-based encryption. In CT-RSA, pages 319–339. 2011.

[LPR10] V. Lyubashevsky, C. Peikert, and O. Regev. On ideal lattices and learning with errors over rings. In EUROCRYPT, pages 1–23. 2010.

[Lyu08] V. Lyubashevsky. Lattice-based identification schemes secure under active attacks. In Public Key Cryptography, pages 162–179. 2008.


[Lyu12] V. Lyubashevsky. Lattice signatures without trapdoors. In EUROCRYPT, pages 738–755. 2012.

[MM11] D. Micciancio and P. Mol. Pseudorandom knapsacks and the sample complexity of LWE search-to-decision reductions. In CRYPTO, pages 465–484. 2011.

[MP12] D. Micciancio and C. Peikert. Trapdoors for lattices: Simpler, tighter, faster, smaller. In EUROCRYPT, pages 700–718. 2012.

[MR04] D. Micciancio and O. Regev. Worst-case to average-case reductions based on Gaussian measures. SIAM J. Comput., 37(1):267–302, 2007. Preliminary version in FOCS 2004.

[MV03] D. Micciancio and S. P. Vadhan. Statistical zero-knowledge proofs with efficient provers: Lattice problems and more. In CRYPTO, pages 282–298. 2003.

[OPW11] A. O'Neill, C. Peikert, and B. Waters. Bi-deniable public-key encryption. In CRYPTO, pages 525–542. 2011.

[Pei09] C. Peikert. Public-key cryptosystems from the worst-case shortest vector problem. In STOC, pages 333–342. 2009.

[Pei10] C. Peikert. An efficient and parallel Gaussian sampler for lattices. In CRYPTO, pages 80–97. 2010.

[PR06] C. Peikert and A. Rosen. Efficient collision-resistant hashing from worst-case assumptions on cyclic lattices. In TCC, pages 145–166. 2006.

[PVW08] C. Peikert, V. Vaikuntanathan, and B. Waters. A framework for efficient and composable oblivious transfer. In CRYPTO, pages 554–571. 2008.

[PW08] C. Peikert and B. Waters. Lossy trapdoor functions and their applications. In STOC, pages 187–196. 2008.

[Reg02] O. Regev. Quantum computation and lattice problems. SIAM J. Comput., 33(3):738–760, 2004. Preliminary version in FOCS 2002.

[Reg03] O. Regev. New lattice-based cryptographic constructions. J. ACM, 51(6):899–942, 2004. Preliminary version in STOC 2003.

[Reg05] O. Regev. On lattices, learning with errors, random linear codes, and cryptography. J. ACM, 56(6):1–40, 2009. Preliminary version in STOC 2005.

[Reg10a] O. Regev. The learning with errors problem. In Proc. of 25th IEEE Annual Conference on Computational Complexity (CCC), pages 191–204. 2010.

[Reg10b] O. Regev. On the complexity of lattice problems with polynomial approximation factors. In P. Nguyen and B. Vallee, editors, The LLL Algorithm: Survey and Applications. Springer-Verlag, New York, 2010.

[SJ89] A. Sinclair and M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Inform. and Comput., 82(1):93–133, 1989.
