
Searchable Encryption: A Survey

David Cash, University of Chicago

cloud service

Information retrieval in cloud services

…

term records

while 4, 9,37

return 9,37,93,94,95

goto 1,8,89,90,94

foreach 4,37,62,75

search for “goto”

Addressing service compromise: End-to-end encryption

‣ Files never leave client in plaintext

‣ Key retained only at client

‣ … but searching is incompatible with the privacy goals of traditional encryption

Searchable encryption: 3 parts

‣ Special protocols to enable the provider to “search without decrypting”

‣ All searching in this talk is for single keywords

1 Encrypted index generation

Client → cloud provider: upload encrypted records + extra helper info

[Song-Wagner-Perrig, S&P’00] [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]


2 Search protocol

Client wants all docs containing “kos”; the returned records are decrypted locally.

3 Update protocol

Client needs to add a new record: upload updated records + helper info

‣ Searches should still “work” on the added document

Outline

1. An example SE construction

2. Evaluating SE: Security and usability

3. A survey of SE constructions

4. Empirical security analysis of SE

5. Locality lower bound







cloud service

Simple search on encrypted documents

key K

term records

while 4, 9,37

return 9,37,93,94,95

goto 1,8,89,90,94

foreach 4,37,62,75

‣ replace each term with H(K, term)

‣ e.g. H(K, while) = 45e8a
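The term-hashing step can be sketched in a few lines. This is a minimal illustration, assuming H is instantiated with HMAC-SHA256 (the slides do not say which keyed hash is used); the helper names `term_token` and `build_hashed_index` and the demo key are hypothetical.

```python
import hashlib
import hmac

def term_token(key: bytes, term: str) -> str:
    """Keyed hash H(K, term); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()

def build_hashed_index(key: bytes, index: dict[str, list[int]]) -> dict[str, list[int]]:
    """Replace every plaintext term with its token; record pointers stay as they are."""
    return {term_token(key, term): list(docs) for term, docs in index.items()}

index = {
    "while": [4, 9, 37],
    "return": [9, 37, 93, 94, 95],
    "goto": [1, 8, 89, 90, 94],
    "foreach": [4, 37, 62, 75],
}
key = b"demo key, not for real use"
hashed = build_hashed_index(key, index)

# Searching: the client recomputes the token, the server does a plain lookup.
print(hashed[term_token(key, "while")])  # [4, 9, 37]
```

The server never sees the word "while", only its token, but everything else about the index (row lengths, shared record pointers) is still visible.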

Hashed index stored at the cloud service (client keeps key K and encrypts the records):

term records

45e8a 4, 9,37

092ff 9,37,93,94,95

f61b5 1,8,89,90,94

cc562 4,37,62,75

‣ To search for “while”: client computes H(K, while) = 45e8a and asks the service to search for “45e8a”

unmodified cloud service

Equivalent “Efficiently Deployable” Version

key K

term records

45e8a 4, 9,37

092ff 9,37,93,94,95

f61b5 1,8,89,90,94

cc562 4,37,62,75

H(K, w1) = 45e8a, H(K, w2) = f61b5

Attached to the encrypted document: 45e8a, f61b5

‣ Simply attach hashes of terms to the encrypted document

‣ Let unmodified service build index

‣ Index built using existing search system







Goal #1: Security

‣ Security: Confidentiality against service compromise

‣ Persistent adversary, controls provider and observes many interactions

‣ Some information always leaked

‣ Thus goal is to minimize leakage

Security Definition

Def. An SE scheme Π is ℒ-secure if ∀𝒜 ∃𝒮 :

Pr[𝒜 outputs 1 in REAL] ≈ Pr[𝒜 outputs 1 in IDEAL_{ℒ,𝒮}]

[Curtmola-Garay-Kamara-Ostrovsky,CCS’06]

Formal security model: Game [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]

REAL (adversary 𝒜 interacts with challenger):

‣ Init: Docs D → 𝒜 receives (server view)

‣ search: query q → 𝒜 receives (server view)

‣ update: inputs u → 𝒜 receives (server view)

‣ (many queries)

‣ Output: Bit b

IDEAL_{ℒ,𝒮} (adversary 𝒜 interacts with challenger):

Parameters:

‣ “Leakage function” ℒ (stateful, randomized)

‣ Simulator 𝒮 (stateful, randomized)

‣ Init: Docs D → 𝒜 receives (sim. server view) ← 𝒮(ℒ(D))

‣ search: query q → 𝒜 receives (sim. server view) ← 𝒮(ℒ(q))

‣ update: inputs u → 𝒜 receives (sim. server view) ← 𝒮(ℒ(u))

‣ (many queries)

‣ Output: Bit b

Leakage Function L for Simple SE

term records

45e8a 4, 9,37

092ff 9,37,93,94,95

f61b5 1,8,89,90,94

cc562 4,37,62,75

‣ Leaked info on initialize:

1. Number of documents, and sizes

2. Number of terms

3. Frequency of all terms

4. Co-occurrence information for all terms

‣ Leaked info on search:

1. Pointers to documents matching search

2. “Equality pattern” of searches

‣ Leaked info on update:

1. Pointer to document being changed

2. Whether or not changes match old searches
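To make the leakage list concrete, here is a rough sketch of what a curious server could compute from the hashed index alone. It is an illustration under assumptions: document sizes and update leakage are left out, and the function names `init_leakage` and `search_leakage` are hypothetical rather than taken from any paper.

```python
from itertools import combinations

def init_leakage(hashed_index: dict[str, list[int]]) -> dict:
    """Information readable from the hashed index before any search happens."""
    tokens = sorted(hashed_index)
    docs = set().union(*hashed_index.values())
    frequency = {t: len(hashed_index[t]) for t in tokens}
    co_occurrence = {
        (a, b): len(set(hashed_index[a]) & set(hashed_index[b]))
        for a, b in combinations(tokens, 2)
    }
    return {
        "num_docs": len(docs),          # documents referenced by the index
        "num_terms": len(tokens),
        "frequency": frequency,         # per-token row length
        "co_occurrence": co_occurrence, # shared documents per token pair
    }

def search_leakage(hashed_index: dict[str, list[int]], queries: list[str]):
    """Pointers to matching documents, plus the equality pattern of the queries."""
    access = [sorted(hashed_index.get(q, [])) for q in queries]
    equality = [[a == b for b in queries] for a in queries]
    return access, equality

hashed_index = {
    "45e8a": [4, 9, 37],
    "092ff": [9, 37, 93, 94, 95],
    "f61b5": [1, 8, 89, 90, 94],
    "cc562": [4, 37, 62, 75],
}
leak = init_leakage(hashed_index)
access, equality = search_leakage(hashed_index, ["45e8a", "f61b5", "45e8a"])
print(leak["num_docs"], leak["num_terms"])  # 12 4
```

None of this needs the key K, which is why the simple scheme leaks frequency and co-occurrence information already at initialization.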

Goal #2: Usability (mostly ignored today)


‣ Usability: Query support, deployability

‣ Single keyword, multi-keyword, phrase

‣ Ranked versus unranked

‣ How much server modification needed?

cloud provider

Goal #3: Efficiency


cloud provider

‣ Usability: Query support, information retrieval

‣ Efficiency: Computation, storage, bandwidth

‣ Client storage/computation

‣ Server storage/computation

‣ Bandwidth, rounds of interaction







Construction 1 for reducing leakage

key K

term records

45e8a 4, 9,37

092ff 9,37,93,94,95

f61b5 1,8,89,90,94

cc562 4,37,62,75

[Song-Wagner-Perrig, S&P’00]

‣ Hash terms as before

‣ Derive a key for each row and encrypt

‣ Leaked info from encrypted index:

1. Number of documents

2. Number of terms

‣ Frequency info still leaked

Many works followed, hiding frequency info:

[Curtmola-Garay-Kamara-Ostrovsky, CCS’06] [Chang-Mitzenmacher, ACNS’05] [Chase-Kamara, ASIACRYPT’10] [Kamara-Papamanthou-Roeder, CCS’12] [Kurosawa-Ohtaki, FC’12] [Kamara-Papamanthou, FC’13] [Naveed-Prabhakaran-Gunter, S&P’14] [P-K-V-M-C-G-K-B, S&P’14] [Stefanov-Papamanthou-Shi, NDSS’14]

Generic key/value store


term records

while 4, 9,37

return 9,37,93,94,95

goto 8,37,89,90

foreach 4,37,62,75

[Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]

To process the i-th row:

‣ Derive two keys for each row: (Ki, K′i) ← H(K, termi)

‣ Compute “labels” by hashing a counter: ℓi,ctr ← H(Ki, ctr)

‣ Encrypt each docid: ci ← Enc(K′i, docidi)

‣ Insert the (label, ciphertext) pairs into the generic key/value store in random order

Label/Ciphertext Set: H(K1,1), H(K1,2), H(K1,3), H(K2,1), …, H(K2,5), H(K3,1), …, H(K3,4), H(K4,1), …, H(K4,4), each stored with its ciphertext

‣ Client must store final counter values:

term count

while 3

return 5

goto 4

foreach 4

Searching for “while” (client holds key K and the counter table, cloud service holds the key/value store):

‣ Client derives (K1, K′1) ← H(K, while) and sends (K1, K′1)

‣ Compute labels: ℓctr ← H(K1, ctr)

‣ Query each label to get a ciphertext, decrypt: get(ℓctr)

Adding a doc containing “while” (counter for “while” is currently 3):

(K1, K′1) ← H(K, while)

ℓ ← H(K1, 4)

c ← Enc(K′1, docid)

‣ Client sends (ℓ, c) to the cloud service

‣ Add (ℓ, c) to the store

Deleting doc #34:

‣ Not gracefully supported

‣ One solution: Revocation list (but does not recover space)

Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]




2. Number of terms



Practical advantages:

1. Simple implementation

2. Parallel server search processing

Practical disadvantages:

1. Client-side state for counters

2. No true support for deletions

Construction 2 is not “forward private”

‣ On a search for “while”, the client computes (Ki, K′i) ← H(K, while) and sends (Ki, K′i) to the cloud service

‣ The service can remember (Ki, K′i) for this unknown query

‣ Later the client wants to add a doc containing “while”: it computes (K1, K′1) ← H(K, while), y ← H(K1, 4), c ← Enc(K′1, docid) and sends (y, c)

‣ If y ∈ {H(Ki,1), H(Ki,2), …}, then the new document contains the previous query
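The server-side check described here is short enough to write out. A minimal sketch, assuming H is HMAC-SHA256 and that the server bounds the counters it tries by a hypothetical `max_ctr`; `matches_old_query` is an illustrative name.

```python
import hashlib
import hmac

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def label(k_i: bytes, ctr: int) -> bytes:
    return prf(k_i, ctr.to_bytes(8, "big"))

def matches_old_query(remembered: dict[str, bytes], new_label: bytes,
                      max_ctr: int = 1000) -> list[str]:
    """Server side: test a freshly uploaded label against every remembered K_i."""
    return [
        name for name, k_i in remembered.items()
        if any(label(k_i, ctr) == new_label for ctr in range(1, max_ctr + 1))
    ]

master = b"demo master key"
k_while = prf(master, b"\x01while")   # K_i revealed during an earlier search
k_goto = prf(master, b"\x01goto")     # never searched, so unknown to the server
remembered = {"query #1": k_while}

y = label(k_while, 4)                 # update for a new doc containing "while"
y_other = label(k_goto, 6)            # update for a term that was never queried
print(matches_old_query(remembered, y))        # ['query #1']
print(matches_old_query(remembered, y_other))  # []
```

The server still does not learn the word behind query #1, only that the new document matches it, which is exactly the linkage forward privacy is meant to rule out.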

Adding Forward Privacy to Construction 2 [Bost-Minaud-Ohrimenko, CCS’17]

‣ Forward Privacy Requirement: Old queries cannot be re-run against newly added documents.

‣ Idea: Replace H with a “Constrained PRF” for range constraints [Boneh-Waters, ASIACRYPT’13]

‣ H comes with an additional “key constraining” algorithm: Ka,b ← Constrain(K, a, b)

‣ Constrained key Ka,b allows evaluation of H(K, x) only for x ∈ [a, b]

‣ Efficient constructions from blockciphers (log number of evaluations)
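One well-known way to get a range-constrained PRF is a GGM-style binary tree, where a constrained key is the set of subtree keys that exactly cover [a, b]. The sketch below assumes that approach with HMAC-SHA256 as the tree's expansion step and a fixed 16-bit input domain; it illustrates the interface and is not necessarily the construction used in the cited papers.

```python
import hashlib
import hmac

DEPTH = 16  # inputs are integers in [0, 2**DEPTH); an arbitrary demo choice

def _child(key: bytes, bit: int) -> bytes:
    """One step down the tree; HMAC stands in for a length-doubling PRG."""
    return hmac.new(key, bytes([bit]), hashlib.sha256).digest()

def _descend(key: bytes, x: int, nbits: int) -> bytes:
    """Follow the low `nbits` bits of x, most significant first."""
    for i in reversed(range(nbits)):
        key = _child(key, (x >> i) & 1)
    return key

def evaluate(master: bytes, x: int) -> bytes:
    """H(K, x): the leaf key reached by following all DEPTH bits of x."""
    return _descend(master, x, DEPTH)

def constrain(master: bytes, a: int, b: int) -> list[tuple[int, int, bytes]]:
    """Subtree keys covering exactly [a, b], as (prefix, height, key) triples."""
    nodes: list[tuple[int, int, bytes]] = []

    def cover(prefix: int, height: int, key: bytes) -> None:
        lo = prefix << height
        hi = lo + (1 << height) - 1
        if hi < a or lo > b:
            return                              # subtree disjoint from range
        if a <= lo and hi <= b:
            nodes.append((prefix, height, key)) # subtree fully inside range
            return
        cover(2 * prefix, height - 1, _child(key, 0))
        cover(2 * prefix + 1, height - 1, _child(key, 1))

    cover(0, DEPTH, master)
    return nodes

def evaluate_constrained(nodes: list[tuple[int, int, bytes]], x: int) -> bytes:
    for prefix, height, key in nodes:
        if x >> height == prefix:
            return _descend(key, x, height)
    raise ValueError("x is outside the constrained range")

master = b"demo master key"
k_1_3 = constrain(master, 1, 3)
print(all(evaluate_constrained(k_1_3, x) == evaluate(master, x) for x in (1, 2, 3)))  # True
```

The constrained key holds at most about 2 · DEPTH subtree keys, and each evaluation costs at most DEPTH hash calls, which matches the "log number of evaluations" remark.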

Forward Privacy with a Constrained PRF [Bost-Minaud-Ohrimenko, CCS’17]

‣ To search for “while” (count 3), the client computes (Ki, K′i) ← H(K, while) and K1,3 ← Constrain(K, 1, 3), then sends (K1,3, K′i) to the cloud service

‣ Labels are computed as ℓctr ← H(K1,3, ctr) and fetched with get(ℓctr)

‣ Can compute labels only for 1, 2, 3

‣ Can’t mount the previous attack

Other Constructions

‣ Constructions gracefully supporting deletes via ORAM-like techniques [Stefanov-Papamanthou-Shi, NDSS’14] [Naveed-Prabhakaran-Gunter, S&P’14]

‣ UC-security definitions and constructions [Kurosawa-Ohtaki, FC’12]

‣ Boolean query support [Cash-Jarecki-Jutla-Krawczyk-Rosu-Steiner, CRYPTO’13]

‣ Graph query support [Chase-Kamara, ASIACRYPT’10]

‣ Several constructions optimizing “locality” - covered later

‣ Other application: Exact match queries in encrypted database tables (CryptDB) [Popa-Redfield-Zeldovich-Balakrishnan, SOSP’11]







SE security definition: What about the IDEAL game?

Def. An SE scheme Π is ℒ-secure if ∀𝒜 ∃𝒮 :

Pr[𝒜 outputs 1 in REAL] ≈ Pr[𝒜 outputs 1 in IDEAL_{ℒ,𝒮}]

What does ℒ-security mean for practice?

‣ The definition may allow damaging attacks that exist in the IDEAL game

‣ This part: Attacks in IDEAL game with “real-world” data

‣ Several attacks exist, but broad conclusions are elusive

‣ In most constructions, curious server learns which documents each query matched (rows: queries, columns: documents, x ⟹ “Query matched this doc”):

Q1 x x x

Q2 x x x

Q3 x x

Q4 x x x x x

…

‣ Now: two attacks that try to recover queries

Attack 1: When documents are “known” [Islam-Kuzu-Kantarcioglu, NDSS’12]

‣ Suppose adversary knows document distribution well enough to estimate “co-occurrence probabilities” of all possible terms in documents:

         while   goto    return  for
while    1       0.26    0.19    0.44
goto             1       0.13    0.20
return                   1       0.11
for                              1
…

‣ Adversary can compute empirical co-occurrence probabilities of the observed queries:

         Q1      Q2      Q3      Q4
Q1       1       0.14    0.59    0.10
Q2               1       0.38    0.25
Q3                       1       0.06
Q4                               1
…

‣ Then attempt to find mapping from Qi to text terms.

‣ IKK further assume some of the Qi are “known” to adversary (!)
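The observed matrix comes straight from the access pattern. A small sketch, assuming co-occurrence is normalized as the fraction of all documents matched by both queries (the slides do not state the exact normalization); the function name and the toy access pattern are made up for illustration.

```python
def query_cooccurrence(matches: dict[str, set[int]],
                       num_docs: int) -> dict[tuple[str, str], float]:
    """Empirical co-occurrence for every unordered pair of observed queries."""
    names = sorted(matches)
    return {
        (a, b): len(matches[a] & matches[b]) / num_docs
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Toy access pattern: which document ids each opaque query matched.
observed = {
    "Q1": {1, 2, 3},
    "Q2": {2, 3, 4},
    "Q3": {5, 6},
}
cooc = query_cooccurrence(observed, num_docs=10)
print(cooc[("Q1", "Q2")])  # 0.2
```

The attacker would compute the same statistic for plaintext terms on training documents and then look for the assignment of queries to terms that makes the two matrices agree best.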

IKK Attack Sketch [Islam-Kuzu-Kantarcioglu, NDSS’12]

‣ Inputs: training co-occurrences (the term × term matrix) and observed co-occurrences (the query × query matrix)

‣ Find assignment from Qi to plaintext terms that minimizes “error” defined by integer program

‣ Some Qi are “known” to help optimization (!)

‣ The integer programming formulation given is NP-complete, so approximate solutions are used

IKK Attack Experiment Setup [Islam-Kuzu-Kantarcioglu, NDSS’12]

‣ Document dataset: Enron emails sent folder

‣ 30,109 documents

‣ Parsed terms, stemmed, removed 200 “stop-words”, 77,000 unique terms remained

‣ Repeated experiments with top-N most common terms for different N

‣ Queries:

‣ Drawn i.i.d. from Zipfian distribution (probability of term w inversely proportional to its rank)

‣ Varied number of queries

‣ Experiment:

‣ Take top-N most common terms, and draw Q queries

‣ Fix random subset as “known”

‣ Run attack on leakage and measure accuracy (number of queries correctly predicted)

IKK Attack Results [Islam-Kuzu-Kantarcioglu, NDSS’12]

‣ 150 Queries, 22 known

‣ Varied number of terms in index

‣ Very strong for small numbers of possible terms, but drop-off was not explored

‣ Other experiments varied number of queries and number of known queries

Attack 2 setting: Encrypted Searchable Email

‣ Client-side SE index; each incoming email runs the update protocol

‣ Adversary observes: “Leakage induced by my crafted email!”

Attack 2: Query recovery via document injection [Cash-Grubbs-Perry-Ristenpart, CCS’15] [Zhang-Katz-Papamanthou, USENIX’16]

‣ Another example: Inject rows into account databases by creating accounts

‣ Adversary inserts chosen documents at the SE server

‣ K queries for random terms are issued

‣ Adversary sees its chosen docs … and query leakage!

‣ Adversary outputs guesses for queries

Document injection attack details [Zhang-Katz-Papamanthou, USENIX’16]

Stage 1: Choosing documents to inject

‣ Compute terms present in training documents, say w1, w2, …, wn

‣ Construct log n documents D1, D2, …, Dlog n, each with n/2 keywords: Di := {wj : i-th bit of j is 0}

Stage 2: Guessing query from leakage

‣ Some wj is queried and attack wants to learn j

‣ For each i, check if Di was returned and learn i-th bit of j

Example: n = 8 terms w1, w2, …, w8 and injected documents D1, D2, D3
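Both stages fit in a few lines. This sketch indexes terms from 0 rather than 1 so that the bit arithmetic stays clean (an assumption that differs from the w1, …, wn numbering on the slide), and it simulates the leakage directly instead of running an SE scheme.

```python
def build_injected_docs(terms: list[str]) -> list[set[str]]:
    """Stage 1: D_i contains every term w_j whose i-th bit of j is 0."""
    nbits = max(1, (len(terms) - 1).bit_length())
    return [
        {w for j, w in enumerate(terms) if not (j >> i) & 1}
        for i in range(nbits)
    ]

def recover_query(terms: list[str], returned: list[bool]) -> str:
    """Stage 2: returned[i] is True iff injected document D_i matched the query."""
    j = sum((0 if hit else 1) << i for i, hit in enumerate(returned))
    return terms[j]

terms = [f"w{j}" for j in range(8)]
docs = build_injected_docs(terms)

# Simulated leakage: for a secret query, which injected documents come back?
secret = "w5"
returned = [secret in d for d in docs]
print(len(docs), recover_query(terms, returned))  # 3 w5
```

With 8 terms the attacker injects 3 documents of 4 keywords each, and one observed query result is enough to pin down the queried term.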

Document injection attack results[Zhang-Katz-Papamanthou, USENIX’16]

‣ Setup: Use same Enron data, index top 5000 terms. Select queries uniformly.

‣ Reveal varying percentage of documents for training

‣ Attack typically injects 10 documents (log2(2000) ≈ 10)

Other attacks

‣ IKK extended to case where training data was imperfect, but results are unconvincing

‣ More devastating attacks against leakier SE constructions

‣ Countermeasures against document injection have been suggested and broken

[Cash-Grubbs-Perry-Ristenpart, CCS’15]

[Islam-Kuzu-Kantarcioglu, NDSS’12]

[Zhang-Katz-Papamanthou, USENIX’16]

[Pouliot-Wright, CCS’16]



Memory locality of searchable encryption

Recall the search procedure at the cloud service:

‣ Compute labels by hashing counter

‣ Query each label to get a ciphertext, decrypt: get(label)

‣ One random key/value store query per document

‣ Contrast with plaintext search: Read predictable blocks of memory

➡ Runtime bottleneck: disk latency, not crypto processing.

➡ True for all known frequency-hiding constructions


A memory locality lower bound [Cash-Tessaro, EUROCRYPT’14]

‣ Let ℒ be the leakage profile from Construction 2

“Theorem”: Any ℒ-secure searchable encryption must either:

(1) Have a very large encrypted index (super-linear size),

or

(2) Read memory in a highly “non-local” fashion (super-constant locality),

or

(3) Read more memory than a plaintext search (super-linear).

‣ Unconditional (no complexity assumptions)

‣ Different type of locality lower bound: security vs. correctness

Memory locality of SE constructions

Scheme                                  Enc Ind Size   ExtraRead   Locality
lower bound: 1 of                       ω(N)           ω(1)        ω(1)
Most schemes                            N              1           R
[Chase-Kamara, ASIACRYPT’10]            N^2            1           1
trivial “read all”                      N              N           1
[Cash-Tessaro, EUROCRYPT’14]            N log N        log N       log N
[Asharov-Naor-Segev-Shahaf, STOC’16]    N loglog N     loglog N    1

N = no. postings in input index, R = no. postings in search

Intuition for memory locality lower bound

[Figure: server memory holding lists of lengths 1, 1, 2, 4, 4, 7]

‣ Suppose a construction is “perfectly local”

‣ Curious server can remember which memory regions it touches during searches

‣ Server may infer number of documents associated with some other term by inference from patterns (which is not allowed by the security definition with ℒ)

‣ We can relate building local schemes to strategies in a new two-player game

‣ Lower bound follows by proving the game has a winning strategy for the other player

Local SE Construction ⟷ Interval-Packing Game Strategy

Participants: Player 1, Player 2, Referee

‣ Choose two multi-sets of intervals (lengths all integral, sum of lengths equal), e.g. {1, 1, 2, 4, 4, 7} and {1, 1, 1, 2, 2, 3, 4, 5}

‣ Choose one multi-set at random, send to Player 2

‣ Pack intervals into space

‣ Reveal to Player 1 the common-length intervals (ties chosen randomly). Common lengths in the example: 1, 1, 2, 4

‣ Player 1 tries to guess which multi-set was used

Thm: If allowed length is Ω(interval-sum), then Player 1 can win w. prob > 0.5+O(1)

Thm: Result is essentially tight. [Asharov-Naor-Segev-Shahaf, STOC’16]


Interval-Packing Game: Warm-up Case

‣ Let n = sum of lengths

‣ Left set: n blocks of length 1. Right set: one block of length 1 and one block of length n-1. Common lengths: 1

‣ Assume < 1.5n - 1 blocks of space

Show now: If < 1.5n space is used then Player 1 can win.

Two observations:

1. If right set packed, revealed block must leave large contiguous untouched region on one side (large block always fits)

2. If left set packed, ≥ 1/n chance this does not happen (no room for large block on either side of the revealed block)

‣ Proof: < 1.5n places to store n blocks, so one must be “close to center”, preventing large block fitting

➡ Player 1 checks if large block could fit, decides which set was packed.

➡ Wins with advantage > 1/2 + 1/n.

A Local SE Construction: First Attempt [Asharov-Naor-Segev-Shahaf, STOC’16]

term records

while 4, 9,37

return 8,22,93,94,95

goto 1,8,89,90,94

foreach 4,37,62,75

Setup (buckets numbered 1 2 3 4 5 6 7 8 9 10)

For each term w:

‣ j ← H(K, w)

‣ Store first entry in bucket j, next in j+1, etc

Pad all buckets to a max-size

Example: 5 ← H(K, while), 6 ← H(K, return), 1 ← H(K, goto), 3 ← H(K, foreach)

Resulting buckets:

bucket 1: 1

bucket 2: 8

bucket 3: 89, 4

bucket 4: 90, 37

bucket 5: 4, 94, 62

bucket 6: 9, 8, 75

bucket 7: 37, 22

bucket 8: 93

bucket 9: 94

bucket 10: 95

Search(w)

‣ j ← H(K, w)

‣ Retrieve entirety of buckets j, j+1,…, j+countw

‣ Max-bucket size will be O(log(N)), where N is number of entries of input index

‣ Locality 1, but storage O(N log(N)) and O(L log(N)) bits read for list of size L
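The first-attempt layout is easy to simulate. In this sketch the starting buckets are supplied by a lookup function so the slide's example values can be reproduced (a real scheme would derive them from H(K, w)); buckets are 0-indexed, runs wrap around the end of the array, and padding and encryption are omitted. All names are illustrative.

```python
from typing import Callable

def build_buckets(index: dict[str, list[int]], start_of: Callable[[str], int],
                  num_buckets: int) -> list[list[int]]:
    """Store the k-th entry of each list in bucket start + k (mod num_buckets)."""
    buckets: list[list[int]] = [[] for _ in range(num_buckets)]
    for term, docs in index.items():
        j = start_of(term)
        for offset, docid in enumerate(docs):
            buckets[(j + offset) % num_buckets].append(docid)
    return buckets

def search(buckets: list[list[int]], start_of: Callable[[str], int],
           term: str, count: int) -> list[list[int]]:
    """Read one contiguous run of `count` whole buckets (locality 1)."""
    j = start_of(term)
    return [buckets[(j + o) % len(buckets)] for o in range(count)]

index = {"while": [4, 9, 37], "return": [8, 22, 93, 94, 95],
         "goto": [1, 8, 89, 90, 94], "foreach": [4, 37, 62, 75]}
slide_start = {"while": 5, "return": 6, "goto": 1, "foreach": 3}  # 1-indexed

def start_of(term: str) -> int:
    return slide_start[term] - 1

buckets = build_buckets(index, start_of, num_buckets=10)
print(max(len(b) for b in buckets))           # 3
print(search(buckets, start_of, "while", 3))  # [[4, 94, 62], [9, 8, 75], [37, 22]]
```

The search result contains entries of other terms as well; in the real scheme those are ciphertexts the client cannot decrypt, and every bucket is padded to the maximum load so bucket sizes reveal nothing.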

A Local SE Construction: The Power of Two Choices [Asharov-Naor-Segev-Shahaf, STOC’16]

term records

while 4, 9,37

return 8,22,93,94,95

goto 1,8,89,90,94

foreach 4,37,62,75

Setup (buckets numbered 1 2 3 4 5 6 7 8 9 10)

For each term w:

‣ (j, j′) ← H(K, w)

‣ Use whichever of j or j′ results in lower total load

Example: (5,1) ← H(K, while), (6,9) ← H(K, return), (1,4) ← H(K, goto), (7,4) ← H(K, foreach)

Resulting buckets (lists wrap around past bucket 10):

bucket 1: 93

bucket 2: 94

bucket 3: 95

bucket 4: 1

bucket 5: 4, 8

bucket 6: 9, 89

bucket 7: 37, 90, 4

bucket 8: 94, 37

bucket 9: 8, 62

bucket 10: 22, 75

Search(w)

‣ (j, j′) ← H(K, w)

‣ Retrieve entirety of buckets j, j+1,…, j+countw and buckets j′, j′+1,…, j′+countw

‣ Need to show: Less padding required to prevent overflows and hide number stored in each bucket.

Theorem: The Power-of-Two-Choices SE construction achieves:

(1) O(N loglog(N)) index size,

(2) O(1) locality,

and

(3) Read O(L loglog(N)) bits to retrieve a list of size L.

‣ Non-trivial proof uses techniques from prior power-of-two-choices work [Azar-Broder-Karlin-Upfal, STOC’94].

‣ Note: Actual results stated with different parameter regime.
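The two-choice placement rule can be simulated the same way. This sketch assumes "total load" means the summed occupancy of the buckets a list would cover, breaks ties in favor of the first candidate, uses 0-indexed buckets with wrap-around, and takes the candidate pairs from the slide example rather than from a hash. Padding and encryption are again left out.

```python
from typing import Callable

def build_two_choice(index: dict[str, list[int]],
                     choices_of: Callable[[str], tuple[int, int]],
                     num_buckets: int):
    """Place each list at whichever candidate start covers less-loaded buckets."""
    buckets: list[list[int]] = [[] for _ in range(num_buckets)]
    chosen: dict[str, int] = {}
    for term, docs in index.items():
        def load(j: int) -> int:
            return sum(len(buckets[(j + o) % num_buckets]) for o in range(len(docs)))
        j = min(choices_of(term), key=load)   # ties go to the first candidate
        chosen[term] = j
        for offset, docid in enumerate(docs):
            buckets[(j + offset) % num_buckets].append(docid)
    return buckets, chosen

index = {"while": [4, 9, 37], "return": [8, 22, 93, 94, 95],
         "goto": [1, 8, 89, 90, 94], "foreach": [4, 37, 62, 75]}
slide_choices = {"while": (5, 1), "return": (6, 9),
                 "goto": (1, 4), "foreach": (7, 4)}   # 1-indexed, as on the slide

def choices_of(term: str) -> tuple[int, int]:
    j, j_alt = slide_choices[term]
    return j - 1, j_alt - 1

buckets, chosen = build_two_choice(index, choices_of, num_buckets=10)
print(chosen)  # {'while': 4, 'return': 8, 'goto': 3, 'foreach': 6}
```

At search time the client cannot tell the server which candidate was used without leaking it, so both runs j and j′ are fetched, which keeps locality constant while roughly doubling the data read.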

Open Problems

1. Lower bounds for SE with updates

• Existing SE constructions with single-round search and update need substantial client memory and do not fully support deletes

• Such SE might imply ORAM, where lower bounds are known

2. Get a better understanding of real-world security

• Known attacks do not scale to very large document sets (or document sets with a very large number of terms). Better attacks likely exist.

• Identify properties of text that make attacks harder or easier.

• Heuristic countermeasures (like random “dummy documents”) have been explored but are inconclusive.

Thanks!