“fermat” and (“last theorem” or “great theorem”)

Secure Computation of Constant-Depth Circuits with Applications to Database Search Problems

Omer Barkol Yuval Ishai

Technion

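Motivation: private database search

[Figure: a client holding the query q = “fermat” and (“last theorem” or “great theorem”) interacts with a server holding a database D and obtains f(q,D), e.g., an article on Fermat’s Last Theorem. The server wonders “What is he working on?” (q?), and the client should not learn D beyond the answer (D?).]

PIR [CGKS95]: f(q,D) = D_q

OT/SPIR: in addition, the client learns nothing about D beyond f(q,D)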

Want:
• Server work: O(|D|)
• Client work: O(|q|)
• Communication: O(|q|)

omerb

- Think about the example.
- "Encryption doesn't protect against a corrupt party."
- Use the word "feature" for "server privacy". Define the latter.
- After stating the efficiency requirements, say that "In the PIR setting these requirements are fulfilled."
Our main motivating application is private database search. Assume someone is secretly working on Fermat's Last Theorem. He wants to send a query to Google without anyone knowing what he is after. Notice that encryption does not protect against a corrupt party. Formally, a client holds a query q and a server holds a large database D. The client wants to compute f(q,D) without revealing q. The special case f(q,D) = D_q is PIR. One can also add the feature of server privacy, where the client learns nothing but f(q,D); in the PIR setting this is called OT (or SPIR). Ideally, we would like the work of each party to be at most linear in its input, and the communication to be linear in the smaller input. These efficiency goals are fulfilled in the PIR setting.

Current approaches

• Send all of D to the client
  – Too much communication (|D|)
  – No server privacy
• Use general-purpose secure computation [Yao86, GMW87]
  – Communication > circuit size > |D|
• Use PIR as a building block:
  – PIR + data structures [CGN97, FIPR05, OS05]
    • Applies to a very limited class of problems: set membership / keyword search, approximate nearest neighbor
  – Communication-preserving protocol compiler [NN01]
    • Generally requires exponential computation

Benchmark: partial match?
Nothing efficient known
f(*1*0, {0010, 0110, 1111}) = 1

[Cartoon: “Oh no! This might take me 7 years!”]

omerb

- For general-purpose secure computation, add that it provides server privacy, but with even more communication.
- When giving the basic problem, add that nothing efficient is known for the Google search problem from the first slide either.
Going over existing approaches: as I mentioned, the server could send the whole database, but this would exceed the communication limits. Alternatively, one can use general-purpose secure computation protocols such as Yao or GMW, which also give server privacy but involve even higher communication. A different approach is to use PIR as a building block: take an efficient data structure for the problem and search it using PIR. A data structure allows a client to search the database with a small number of probes. This works only for problems that have good data structures (can add that for some of these the computation is still highly infeasible). Alternatively, Naor and Nissim's scheme for compiling non-private protocols into private ones with efficient communication generally requires exponential computation. So let's look at what can be done for the basic but important problem of partial match. Here the database is a set of points, and the query is a point of the same length in which some entries are "wildcards". The question is whether some point agrees with the query on all indices that are not wildcards. Here the second point matches the query, so f(q,D) is 1. This basic problem is not known to have a good data structure, and therefore nothing efficient is known. Notice that nothing efficient is known for the Google search problem from the first slide either.
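For concreteness, the partial-match functionality f(q,D) fits in a few lines when privacy is ignored. This is only the plaintext target of the private protocols; the function name `partial_match` and the representation of points as strings over {0, 1, *} are illustrative assumptions of this sketch, not taken from the talk.

```python
def partial_match(q, D):
    """f(q, D) for partial match: 1 iff some point p in D agrees with the
    query q on every position where q is not the wildcard '*'."""
    return int(any(
        len(p) == len(q) and all(c == "*" or c == b for c, b in zip(q, p))
        for p in D
    ))


# The slide's example: the second point, 0110, matches *1*0.
print(partial_match("*1*0", ["0010", "0110", "1111"]))  # 1
```

The hard part is not computing this function but computing it without revealing q, with communication O(|q|) instead of O(|D|).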

Observation: Many database search problems can be implemented by constant-depth circuits

• Gates: OR, AND, NOT and XOR
• Unbounded fan-in and fan-out
• Depth: length of the longest input→output path

[Figure: a depth-2 circuit with inputs x1, x2, …, xm and a single output.]

omerb

Our departure point is the observation that many database search problems can be implemented by constant-depth circuits. First, recall what these are. We consider circuits with OR, AND, NOT and exclusive-OR gates, with unbounded fan-in and fan-out. The parameters we use are the depth c (a constant), the number of input bits m, and the size s, which is the number of wires in the circuit.
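A minimal way to model such circuits in code, assuming a formula-style representation in which fan-out is modeled by reusing a sub-gate object; the `Gate` class and the helpers `evaluate` and `depth` are hypothetical names for this sketch. Depth is counted as the number of gates on the longest input-to-output path, with NOT gates counted like any other gate.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import xor
from typing import List, Union


@dataclass
class Gate:
    """A gate with unbounded fan-in; op is one of AND, OR, XOR, NOT."""
    op: str
    inputs: List[Union[int, "Gate"]] = field(default_factory=list)  # int = index of an input bit


def evaluate(node, x):
    """Evaluate the circuit rooted at node on the input bits x."""
    if isinstance(node, int):
        return x[node]
    vals = [evaluate(child, x) for child in node.inputs]
    if node.op == "AND":
        return int(all(vals))
    if node.op == "OR":
        return int(any(vals))
    if node.op == "XOR":
        return reduce(xor, vals, 0)
    if node.op == "NOT":
        return 1 - vals[0]
    raise ValueError(f"unknown gate {node.op}")


def depth(node):
    """Length of the longest input-to-output path (input bits have depth 0)."""
    if isinstance(node, int):
        return 0
    return 1 + max((depth(child) for child in node.inputs), default=0)


# A depth-2 circuit (a DNF): (x0 AND x1) OR (x1 AND x2 AND x3).
c = Gate("OR", [Gate("AND", [0, 1]), Gate("AND", [1, 2, 3])])
print(depth(c), evaluate(c, [0, 1, 1, 1]))  # 2 1
```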


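[Figure: the server translates its database D into a circuit C; the client translates its query q into an input x (bits x1, x2, …, x6 shown) and obtains f(q,D).]

C(x) = f(q,D)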

omerb

- Use the word "translate".
- The client performs its translation independently.
- Mention the large size of C compared to x.
In the client-server setting, the server translates the database into a circuit C and, independently, the client translates the query into an input x for the circuit, in a way that f(q,D) = C(x). This reduces our problem to evaluating a large circuit held by a server on a much smaller input held by the client. Communication-efficient circuit evaluation seems to be a very difficult problem. Recently, Boneh, Goh and Nissim showed an efficient way to do this for the simple special case where the circuit is a 2-DNF. A similar solution for general DNF, or even for 3-DNF, would be extremely interesting.

Example: partial match

Query: *1*0
Database: 1010, 0110, 1110

Preprocess:
0 → 10
1 → 01
* → 11

Encoded query: 1 1 0 1 1 1 1 0

omerb

- Say: important special case.
- Mention the problem and the goal.
- Explain the size of the DNF.
To make this more concrete, let us look at the important special case of partial match. As we said, the query may contain wildcards, and the database has n points. The second point matches this query (in fact, so does the last one). First, the client preprocesses its input as shown: each position i becomes two bits (x_{i,0}, x_{i,1}). Independently, the server translates its database into a DNF: each point creates one term, and each index i has two variables, x_{i,0} and x_{i,1}. The term includes x_{i,0} if the point's i-th bit is 0 and x_{i,1} if it is 1. If the query agrees with the point at index i, it satisfies the corresponding variable, and a wildcard satisfies it in either case. So if a point matches, the corresponding term is satisfied. By the way, the search problem can also be computed by a low-depth circuit.
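The two translations can be checked in plaintext with a short sketch. It assumes that variable x_{i,b} is stored at index 2i + b of a flat bit list; the names `encode_query`, `database_to_dnf` and `eval_dnf` are hypothetical. The resulting DNF has one term per database point, so n terms, each with one variable per coordinate.

```python
# Client-side encoding from the slide: symbol -> (x_{i,0}, x_{i,1}).
ENCODE = {"0": (1, 0), "1": (0, 1), "*": (1, 1)}


def encode_query(q):
    """Flatten the encoding so that x[2*i + b] holds x_{i,b}."""
    return [bit for symbol in q for bit in ENCODE[symbol]]


def database_to_dnf(D):
    """Server side: one AND term per point p, containing x_{i, p_i} for each i.
    A term is stored as the list of variable indices 2*i + p_i."""
    return [[2 * i + int(bit) for i, bit in enumerate(p)] for p in D]


def eval_dnf(terms, x):
    """C(x): OR over the terms of the AND of each term's variables."""
    return int(any(all(x[v] for v in term) for term in terms))


D = ["1010", "0110", "1110"]
x = encode_query("*1*0")
print(x)                                # [1, 1, 0, 1, 1, 1, 1, 0]
print(eval_dnf(database_to_dnf(D), x))  # 1
```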


• “Computing on encrypted data” – longstanding question
• Case of 2-DNF recently solved [BGN05]

[Figure: an example 2-DNF, an OR of terms that are each an AND of two literals.]


Relaxation: multiple servers

• Used in information-theoretic PIR
• Replicated databases are common
  – p2p networks
  – Web content delivery (e.g., Akamai)
• t-privacy
  – Client can choose servers he trusts

[Figure: the client sends x to several servers, each holding a copy of C, and recovers C(x); a coalition of t servers cannot learn x.]

omerb

- Mention that by default t = 1.
- The redundancy of DNS and Akamai provides robustness, but is motivated mainly by caching.
- Mention that this relaxation does not change the situation for the previous results.
- Collusion of t servers.
To make things easier, we use a relaxed model in which the database is replicated among several servers. In this model all servers hold D, and during the protocol no coalition of t or fewer servers can learn q. By default t = 1. This model is used in information-theoretic PIR and is common in practice for reasons such as availability and robustness; for instance, emerging peer-to-peer networks involve massive duplication of information. The model is weaker, since privacy depends on the absence of large coalitions, but the client can choose a subset of servers it trusts. Note that this relaxation does not change the situation for the previous approaches.
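To illustrate why replication helps, here is the textbook 2-server XOR scheme used as a warm-up in information-theoretic PIR, with t = 1 and D a bit vector. It is only a sketch of the privacy mechanism: its communication is linear in |D|, so it does not achieve the efficiency targets of this talk. The function names are hypothetical.

```python
import secrets


def make_queries(n, i):
    """Client: a uniformly random subset S of {0..n-1} (as a bit vector) for
    server 1, and S with position i flipped for server 2. Each query alone
    is uniformly random, so neither server learns anything about i."""
    s1 = [secrets.randbelow(2) for _ in range(n)]
    s2 = list(s1)
    s2[i] ^= 1
    return s1, s2


def answer(D, subset):
    """Server: XOR of the database bits selected by the subset."""
    a = 0
    for bit, chosen in zip(D, subset):
        a ^= bit & chosen
    return a


def retrieve(D, i):
    """Both servers hold a copy of D; the two answers differ exactly by D[i]."""
    s1, s2 = make_queries(len(D), i)
    return answer(D, s1) ^ answer(D, s2)


D = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(D, i) for i in range(len(D))] == D)  # True
```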

Main results

t-secure protocol with:
– Servers: $t \cdot (\log|C|)^{\mathrm{depth}-1}$
– Communication: Õ(|x|)
– Client computation: Õ(|x|)
– Server computation: Õ(|C|)
– Rounds: 1

Communication and work are optimal up to polylog factors.

[Figure: the client and the servers, each holding C, cheer “Yeh!”]

omerb

Consider a circuit with the parameters mentioned above: size s, depth c, m input bits, and security threshold t. We get a protocol with a polylogarithmic number of servers, in which communication and client computation are proportional to |x| and server computation to |C|, up to polylogarithmic factors. The protocol has a single round, and both correctness and server privacy are statistical, depending on a security parameter σ. You can see the parties are happy with the efficiency ;-)
- Use the phrase: statistical security.

Main results: DNF/CNF/partial match

• n-term DNF / database with n entries
• Security threshold 1
• Secure protocol with:
  – Servers: ½ log n
  – Communication: Õ(|x|)
  – Client computation: Õ(|x|)
  – Server computation: Õ(n)

D has $2^{30}$ entries → we need ~15 servers

omerb

For the important special case of an n-term DNF, or equivalently a CNF, or partial match (which, as we saw, has a depth-2 circuit), the number of servers drops to ½ log n. This means that even for a database of a billion entries (each possibly kilobytes long), only about 15 servers are needed.

Second model: multiparty computation

[Figure: several parties, party i holding input x_i, jointly compute a public constant-depth circuit C on x = x1∘x2∘…∘xk and learn C(x).]

• General-purpose secure computation [GMW87, BGW88, CCD88]
  – Communication > circuit size
• Communication-efficient multiparty computation [BFKR90]
  – Computation exponential in |x|
  – Large number of parties

omerb

We also consider the standard model of multiparty computation. Here the functionality is computed by a constant-depth circuit known to all parties, and the input to the circuit is partitioned among the parties, who securely compute the circuit on their joint input. Privacy of all parties should hold against a coalition of up to t malicious parties. In this setting the database-query problem is less relevant, but the model is still natural. Ideally, we want the communication to be proportional to the small input and not to the large circuit. Here too, general-purpose secure computation protocols lead to high communication, while the communication-efficient protocol of Beaver et al. requires exponential computation and a large number of parties.

Results: multiparty setting

t-secure multiparty protocol with
– Parties: $t \cdot (\log|C|)^{\mathrm{depth}-1}$
– Communication: Õ(|x|·poly(#parties))
– Computation: Õ(|C|)
– Rounds: O(1)

Optimal up to polylog factors

omerb

Our protocol requires only a polylogarithmic number of parties, communication complexity proportional to the input size, and computational complexity proportional to the circuit size, both up to polylogarithmic factors. The number of rounds is a small constant.

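Roadmap: From database search to protocol

[Figure: (1) the server turns its database D with n entries into a constant-depth circuit C, and the client turns its query into an input x; (2) the server turns C into a vector of polynomials p1(x), p2(x), …, pj(x); (3) the client and several servers run a protocol that evaluates the polynomials on x.]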

omerb

- Say that the polynomials are low-degree.
- Say that the number of servers is proportional to the degree, and that the communication is proportional to the vector size.
- Add the client in all stages?
Let's see how we go from a query on a database to a secure protocol. We use a chain of reductions. At first, the server holds the database and the client holds the query. In the first step, they translate the database into a constant-depth circuit and the query into an input x to the circuit, as I explained. Then, the server constructs a vector of low-degree polynomials over the input variables of the circuit, such that C(x) can be recovered from the values of these polynomials on x. So the problem, first reduced to computing a circuit on its input, is now reduced to computing low-degree polynomials on a point. Finally, the client and the servers run a protocol to evaluate these polynomials.
- Mention the privacy of the polynomials?

Roadmap (step 1): From database search to circuit

omerb

As I said, the first step is to translate a database problem into a constant-depth circuit evaluation problem.

Roadmap (step 2): From circuit to polynomials

omerb

We turn to the second reduction. The server now holds a constant-depth circuit and the client holds an input to this circuit. The server wants to translate the circuit into a vector of low-degree polynomials such that C(x) can be recovered from their values on the client's point.


Step A:
• Represent a circuit by a low-degree randomized multivariate polynomial
• Field = GF(2)
• Rely on technique of [Raz87, Smo87]

Goal: $\forall x:\ \Pr_r[p_r(x) \neq C(x)] \le 2^{-\sigma}$

[Figure: an XOR gate on x1, x2, x4 is computed exactly by $x_1 + x_2 + x_4$ (degree 1, no error).]

omerb

The starting point is a technique of Razborov and Smolensky for representing a circuit by a low-degree randomized multivariate polynomial over GF(2). Their goal was to prove lower bounds on circuit complexity. While their polynomial only had to be correct on most inputs, our goal is that for every input x, the probability that the polynomial's output differs from the circuit's output is exponentially small in the security parameter σ. For this we add a random input string r. Let us first look at the simple case where the circuit is an exclusive-OR of some input bits. Here the sum is a degree-1 polynomial with no error.

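OR gate on inputs x_1, …, x_t, deterministic (degree t, no error):

$$\mathrm{OR}(x_1,\dots,x_t) = 1 - \prod_{j=1}^{t} (1 - x_j)$$

One random inner product (degree 1, one-sided error ½):

$$\sum_{j=1}^{t} r_j x_j$$

γ independent random rows $r_{i1}, \dots, r_{it}$, $i = 1, \dots, \gamma$ (degree γ, error $2^{-\gamma}$):

$$1 - \prod_{i=1}^{\gamma} \Bigl(1 - \sum_{j=1}^{t} r_{ij} x_j\Bigr)$$

AND gate by De Morgan (degree γ, error $2^{-\gamma}$):

$$\prod_{i=1}^{\gamma} \Bigl(1 - \sum_{j=1}^{t} r_{ij} (1 - x_j)\Bigr)$$

Set γ = σ. The random bits r can be generated by an ε-biased PRG.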

omerb

When the circuit is an OR gate, this cannot be done by a low-degree deterministic polynomial, so we use random bits. The inner product is always correct if all x_j are 0, and is correct with probability ½ if some x_j = 1. Thus we have a degree-1 polynomial with one-sided error ½. This can be improved by taking γ independent copies and combining them through the product formula. The error drops to $2^{-\gamma}$, but the degree grows to γ. Thus, by choosing γ = σ, we fulfill our goal. An AND gate can be implemented by De Morgan's rules, producing a polynomial with similar parameters. One remark: we use here randomness proportional to the fan-in. In some settings this later affects the communication complexity, so we want to reduce its amount substantially. Using ε-biased pseudorandom generators, we can reduce it to logarithmic. This suffices because the random bits are only used to fool linear tests.
- Say we get a "uniform random bit".
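The OR and AND polynomials above can be checked directly over GF(2). This sketch uses truly random bits rather than an ε-biased PRG, and the helper names are hypothetical. Enumerating every random matrix r confirms the two properties claimed: no error when all inputs are 0 (the error is one-sided), and error exactly $2^{-\gamma}$ otherwise.

```python
import itertools
import random


def or_poly(x, r):
    """Randomized GF(2) polynomial for OR(x_1..x_t): 1 - prod_i (1 - sum_j r_ij x_j).
    r is a gamma-by-t 0/1 matrix, so the degree is gamma = len(r)."""
    prod = 1
    for row in r:
        s = sum(rij & xj for rij, xj in zip(row, x)) % 2  # inner product <r_i, x>
        prod &= 1 ^ s                                     # over GF(2), 1 - s == 1 XOR s
    return 1 ^ prod


def and_poly(x, r):
    """De Morgan: AND(x) = 1 - OR(1 - x) = prod_i (1 - sum_j r_ij (1 - x_j))."""
    return 1 ^ or_poly([1 ^ xj for xj in x], r)


def random_matrix(gamma, t, rng):
    """Truly random bits; the talk notes an epsilon-biased PRG can replace them."""
    return [[rng.randrange(2) for _ in range(t)] for _ in range(gamma)]


def exact_or_error(x, gamma):
    """Fraction of all matrices r on which or_poly(x, r) != OR(x), by enumeration."""
    t = len(x)
    truth = int(any(x))
    errors = total = 0
    for bits in itertools.product((0, 1), repeat=gamma * t):
        r = [list(bits[i * t:(i + 1) * t]) for i in range(gamma)]
        errors += or_poly(x, r) != truth
        total += 1
    return errors / total


print(exact_or_error([0, 1, 1], gamma=2))  # 0.25, i.e. 2**-gamma
print(exact_or_error([0, 0, 0], gamma=2))  # 0.0: no error when all inputs are 0
```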

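[Figure: an n-term DNF over x1, …, x6; each AND gate and the output OR gate is replaced by a polynomial of degree γ with error $2^{-\gamma}$.]

$\Pr[p_r(x) \neq C(x)] \le (n+1) \cdot 2^{-\gamma}$

For error $2^{-\sigma}$, set $\gamma = \sigma + \log(n+1)$.

Total degree $\gamma^2 = (\sigma + \log(n+1))^2$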


omerb

Our next example is the more complicated case of an n-term DNF. Every AND and OR gate can be replaced by the polynomial just described. Notice that the degree becomes γ², since the polynomial of the second level takes the polynomials of the first level as its inputs. To keep the error below $2^{-\sigma}$, we need to choose γ slightly larger than σ: by a union bound the error probability is at most $(n+1) \cdot 2^{-\gamma}$, so choosing γ = σ + log(n+1) gives the desired error probability. The degree of the resulting polynomial is thus larger than σ². This is what a direct application of the technique achieves, and the degree is too large: in our application it would mean thousands of servers.
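Plugging numbers into this bound shows why the direct approach is too costly. The sketch below assumes a hypothetical security parameter σ = 40 (the talk fixes no value) and the slide's database size n = 2^30; the function name is hypothetical. For comparison, the optimized construction on the following slides reaches degree log n + 2 = 32 for the same n.

```python
def naive_dnf_params(sigma, n):
    """Parameters when every gate of an n-term DNF gets the degree-gamma polynomial.
    gamma = sigma + ceil(log2(n + 1)) makes the union bound (n + 1) * 2**-gamma
    at most 2**-sigma; composing two levels gives total degree gamma**2."""
    gamma = sigma + n.bit_length()  # for n >= 1, n.bit_length() == ceil(log2(n + 1))
    return gamma, gamma * gamma


gamma, degree = naive_dnf_params(sigma=40, n=2**30)  # sigma = 40 is a hypothetical choice
print(gamma, degree)  # 71 5041
```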

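From circuit to polynomials
Step B: Optimizations – example for n-term DNF

Goal: vector $p_r(x)$ such that $\forall x:\ \Pr_r[R(p_r(x)) \neq C(x)] \le 2^{-\sigma}$

[Figure: the n-term DNF over x1, …, x6 giving one entry $p_{r_1}(x)$; each AND gate has degree γ and error $2^{-\gamma}$, and the output OR gate has degree 3 and error ⅛.]

$\Pr[p_r(x) \neq C(x)] \le n \cdot 2^{-\gamma} + \tfrac{1}{8} \le \tfrac{1}{4}$

For error ¼, set $\gamma = \log n + 3$.

Total degree $3\gamma = 3(\log n + 3)$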

omerb

To reduce the degree, we change our goal. Instead of a single polynomial that errs with small probability, we want a vector of polynomials such that, for every x, the probability that a reconstruction algorithm fails to recover C(x) is exponentially small. We produce a vector of polynomials, each of which agrees with C(x) with constant probability. We give the output gate a constant degree, say 3, so it errs with probability ⅛. Now, if we choose γ = log n + 3 for the other gates, a union bound shows that this polynomial errs on C(x) with probability at most ¼. The resulting degree is 3γ, i.e., proportional to log n. To achieve the goal, we now repeat this independently.

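[Figure: the input x is combined with independent strings r_1, r_2, …, r_{O(σ)}, giving entries $p_{r_1}(x), p_{r_2}(x), \dots, p_{r_{O(\sigma)}}(x)$, each of degree 3 log n and error ¼.]

Recover C(x) using Majority.

More careful analysis: degree log n + 2
• C(x) = 0: Pr[p(x) = 1] ≤ ⅛
• C(x) = 1: Pr[p(x) = 1] ≥ ⅜
Recover C(x) using Threshold_¼.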


omerb

So, by repeating this procedure O(σ) times, we can recover C(x) using a majority function. The degree here is about 3 log n. A more careful analysis allows us to use degree 1 for the output gate, and thus a total degree of log n + 2. In this case, if C(x) = 0 then Pr[p(x) = 1] ≤ ⅛, and if C(x) = 1 then Pr[p(x) = 1] ≥ ⅜. Thus, using the threshold function that outputs 1 if more than ¼ of the entries are 1, we recover C(x) with high probability and achieve our goal.
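The "more careful analysis" can be simulated in plaintext for a monotone DNF such as the partial-match DNF. This sketch assumes that the AND gates use the De Morgan polynomial (which is exact whenever a term is true, so the case C(x) = 1 gives Pr[p(x) = 1] = ½, above the ⅜ bound) and that the output OR gate is a single random linear combination of degree 1. Choosing γ = ⌈log n⌉ + 2 keeps $n \cdot 2^{-\gamma} \le \tfrac14$, hence Pr[p(x) = 1] ≤ ⅛ when C(x) = 0. Helper names are hypothetical, and the randomness is sampled locally rather than supplied as public input to a protocol.

```python
import random


def dnf_copy(x, terms, gamma, rng):
    """One entry p_r(x) for a monotone DNF (each term an AND of variables), over GF(2).
    AND gates: degree-gamma De Morgan polynomial, exact whenever the term is true.
    Output OR gate: one random linear combination (degree 1). Total degree: gamma."""
    term_values = []
    for term in terms:
        prod = 1
        for _ in range(gamma):
            s = 0
            for j in term:
                s ^= rng.randrange(2) & (1 ^ x[j])  # r_ij * (1 - x_j)
            prod &= 1 ^ s
        term_values.append(prod)
    out = 0
    for y in term_values:
        out ^= rng.randrange(2) & y                 # sum_k r_k * y_k
    return out


def recover(x, terms, copies, rng):
    """Evaluate `copies` independent entries and apply the threshold-1/4 function."""
    n = len(terms)
    gamma = (n - 1).bit_length() + 2                # ceil(log2 n) + 2, so n * 2**-gamma <= 1/4
    outputs = [dnf_copy(x, terms, gamma, rng) for _ in range(copies)]
    return int(sum(outputs) > copies / 4)


terms = [[0, 1], [1, 2], [3]]                       # (x0 AND x1) OR (x1 AND x2) OR x3
rng = random.Random(0)
print(recover([0, 1, 1, 0], terms, copies=200, rng=rng))  # here C(x) = 1
print(recover([1, 0, 1, 0], terms, copies=200, rng=rng))  # here C(x) = 0
```

With 200 copies the empirical fraction concentrates near ½ or below ⅛, far from the threshold ¼, so a wrong answer is extremely unlikely.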

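O(σ) polynomials of degree log n + 2

$\Pr[\mathrm{th}_{1/4}(p_r(x)) \neq C(x)] \le 2^{-\sigma}$

[Figure: on a scale marked 0, ⅛, ¼, ⅜, the threshold ¼ separates the case C(x) = 0 (at most ⅛) from the case C(x) = 1 (at least ⅜). The server, holding the n-entry database, complains: “I have no privacy!”]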

omerb

Summing up, we have O(σ) polynomials of degree log n + 2 from which we can recover C(x) with high probability. But we do not have server privacy. Even if two circuits have the same output (say 0) on an input x, their polynomials may have different error probabilities (recall that ⅛ and ⅜ were only crude bounds). This means that the distribution of the outputs reflects not only the circuit's output but also its structure.

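From circuit to polynomials
Step C: Server privacy

Randomizing polynomials for threshold [IK00]: $\mathrm{th}_{1/4}: \{0,1\}^{O(\sigma)} \to \{0,1\}$

[Figure: the server replaces $p_{r_1}(x), \dots, p_{r_{O(\sigma)}}(x)$ by $p_{r_1}(x,\rho), \dots, p_{r_{\sigma^{O(1)}}}(x,\rho)$, where ρ is private randomness.]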

omerb

Consider the situation: we have a vector on which we want to compute the threshold function. We want an output from which one can deduce the value of this function but nothing else. This is exactly what the notion of randomizing polynomials provides. By enlarging the output polynomially (polynomial only in the security parameter) and without increasing the degree at all, we get a vector of polynomials that reveals nothing but the output of the threshold. Notice that, unlike the random bits r, these new random bits ρ must remain private. We also note that this adds only an additive term to the computation and communication, independent of the size of the database. In any case, it is not expected to be the bottleneck of the system.

Roadmap (step 3): From polynomials to protocol

omerb

- Mention the starting point of this phase.
In the third phase, the servers hold a vector of low-degree polynomials and the client holds a point in the space. The client's goal is to privately evaluate the polynomials on its point.

Client-server protocols from polynomials
• Goal: evaluate multivariate polynomials held by the servers on a point held by the client
• Standard techniques for secure computation [BGW88, CCD88, BF90]
• Number of servers proportional to the degree
• Communication proportional to the number of polynomials (and the client’s input)
• Enhancements:
  – Protecting server privacy [GIKM98]
  – Reducing the number of servers [WY05]

[Figure: the client sends Shamir shares of x to the servers; each server evaluates $p_r$ on its shares using the public randomness r; the client recovers $p_r(x)$ by interpolation.]

omerb

- Mention: "information theoretic".
- Stress: the main goal is to reduce the number of servers (even at the cost of increasing the communication).
As I said, the goal is to evaluate multivariate polynomials held by the servers on a point held by the client. To do this we use standard techniques for secure computation, such as BGW. To protect server privacy, we use secret-disclosure techniques from GIKM (Gertner, Ishai, Kushilevitz and Malkin). As you saw, we reduce the number of servers down to ½ log n using techniques from a recent PIR protocol of WY (Woodruff and Yekhanin). Remember that the number of servers is proportional to the degree of the polynomials, and the communication complexity is proportional to the number of polynomials. Still, the main goal is to reduce the number of servers (even at the cost of increasing the communication).
- When mentioning Shamir, say this is over an extension field of GF(2).
- For GIKM, say that shared randomness (a CRS) is needed.
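A minimal sketch of the basic Shamir-sharing evaluation shown on the slide, protecting the client only. Two simplifications are assumed: it works over a prime field F_P (with P = 2^61 − 1) instead of the extension field of GF(2) used in the talk, and it omits the public randomness r and the server-privacy enhancement of [GIKM98]; without the latter, the client sees all values of g, which can reveal more than p(x). The function names are hypothetical. Each coordinate x_i is shared with a random degree-t polynomial f_i, so any t servers see only uniformly random values; g(z) = p(f_1(z), …, f_m(z)) has degree at most deg(p)·t, so deg(p)·t + 1 servers suffice, matching "number of servers proportional to the degree".

```python
import random

P = 2**61 - 1  # a Mersenne prime; the talk works over an extension field of GF(2) instead


def share(secret, t, k, rng):
    """Shamir sharing: random degree-t f with f(0) = secret; server j gets f(j), j = 1..k."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [sum(c * pow(j, e, P) for e, c in enumerate(coeffs)) % P for j in range(1, k + 1)]


def interpolate_at_zero(points):
    """Lagrange interpolation: value at 0 of the polynomial through the (j, y_j) points."""
    total = 0
    for j, yj in points:
        num, den = 1, 1
        for m, _ in points:
            if m != j:
                num = num * (-m) % P
                den = den * (j - m) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def evaluate_privately(poly, x, degree, t, rng):
    """Client shares each x_i; every server applies poly to its shares; client interpolates."""
    k = degree * t + 1                              # number of servers needed
    shares = [share(xi, t, k, rng) for xi in x]     # shares[i][j]: server j+1's share of x_i
    answers = [(j + 1, poly([s[j] for s in shares])) for j in range(k)]
    return interpolate_at_zero(answers)


def p(v):
    """Example degree-3 polynomial over F_P: v0*v1*v2 + 3*v0 + 5."""
    return (v[0] * v[1] * v[2] + 3 * v[0] + 5) % P


print(evaluate_privately(p, [2, 3, 4], degree=3, t=1, rng=random.Random(0)))  # 35
```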

Multiparty protocols from polynomials
• Goal: evaluate multivariate polynomials known to all on distributed input and randomness
• Standard techniques for secure computation [BGW88, CCD88, GRR98]
• Number of parties proportional to the degree
• Communication proportional to the number of polynomials (and input length)
• Randomness:
  – Public randomness (r) independent of the inputs
  – Private randomness (ρ) should remain secret

omerb

- Say the standard techniques are applied to the variant for low-degree polynomials.
- Say the number of parties and the communication are as before.
- Say the public randomness should be generated after the inputs were committed (but then it can be known to all).
- Say the private randomness is produced in a distributed manner.

Roadmap: Secure computation of constant-depth circuits with applications to database search problems

[Figure: database D → circuit C → polynomials p_r1(x,ρ), p_r2(x,ρ), …, p_rj(x,ρ) → protocol between the client and the servers.]

Conclusions
• Practically feasible solutions to large-scale database search problems, e.g., partial match
  – Nearly optimal communication and computation
  – Reasonable number of servers (½ log n for partial match)
  – No expensive crypto (e.g., public-key operations)
• Challenge: obtain similar protocols in the 2-party setting
  – Extend [BGN05] from degree 2 to degree log n?
• Multiparty setting:
  – Nearly optimal communication and computation for a useful class of functions (AC0)
  – Communication almost does not grow with circuit size
• Challenge: higher complexity classes, e.g., NC1

omerb

Summing up: by a chain of reductions, from a database problem to circuit evaluation and then to polynomial evaluation, we constructed feasible solutions for large-scale database search problems such as partial match. Notice that our solution is nearly optimal in communication and computation, and that we did not use any computationally expensive cryptographic mechanisms such as public-key encryption.
- AC0: say constant-depth.
- NC1: say log-depth.

Questions?