Searchable Encryption: A Survey
David Cash, University of Chicago
Information retrieval in cloud services
term      records
while     4, 9, 37
return    9, 37, 93, 94, 95
goto      1, 8, 89, 90, 94
foreach   4, 37, 62, 75
…
search for “goto”
Addressing service compromise: End-to-end encryption
‣ Files never leave client in plaintext
‣ Key retained only at client
‣ … but searching incompatible with privacy goals of traditional encryption
Searchable encryption: 3 parts [Song-Wagner-Perrig, S&P’00] [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]
‣ special protocols to enable provider to “search without decrypting”
‣ all searching in this talk is for single keywords
1. Encrypted index generation
‣ client uploads encrypted records + extra helper info to cloud provider
2. Search protocol
‣ client wants all docs containing “kos”
‣ client decrypts the returned records locally
3. Update protocol
‣ client needs to add new record
‣ client sends updated records + helper info
‣ searches should still “work” on added document
Outline
1. An example SE construction
2. Evaluating SE: Security and usability
3. A survey of SE constructions
4. Empirical security analysis of SE
5. Locality lower bound
1. An example SE construction
Simple search on encrypted documents
Client holds key K; cloud service stores the index:
term      records
while     4, 9, 37
return    9, 37, 93, 94, 95
goto      1, 8, 89, 90, 94
foreach   4, 37, 62, 75
‣ replace each term with H(K, term)
‣ e.g. H(K, while) = 45e8a
Index after replacing every term:
term      records
45e8a     4, 9, 37
092ff     9, 37, 93, 94, 95
f61b5     1, 8, 89, 90, 94
cc562     4, 37, 62, 75
‣ encrypt the documents themselves with key K
‣ client wants all docs containing “while”: computes H(K, while) = 45e8a and sends search for “45e8a”
Equivalent “Efficiently Deployable” Version
Client holds key K; the cloud service is unmodified.
‣ Simply attach hashes of terms to encrypted document
‣ e.g. with H(K, w1) = 45e8a and H(K, w2) = f61b5, a document containing w1 and w2 is uploaded with 45e8a, f61b5 attached
‣ Let unmodified service build index
‣ Index built using existing search system:
term      records
45e8a     4, 9, 37
092ff     9, 37, 93, 94, 95
f61b5     1, 8, 89, 90, 94
cc562     4, 37, 62, 75
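The hashed-term index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the keyed hash H (the slides do not name one), the full digest is used as the token instead of the short 45e8a-style values, and the function names `H`, `build_index`, and `search` are hypothetical.

```python
import hmac
import hashlib

def H(key: bytes, term: str) -> str:
    """Keyed hash of a term. Assumption: HMAC-SHA256 plays the role of H(K, term)."""
    return hmac.new(key, term.encode(), hashlib.sha256).hexdigest()

def build_index(key: bytes, plaintext_index: dict) -> dict:
    """Client side: replace each term with H(K, term); record lists are unchanged."""
    return {H(key, term): list(records) for term, records in plaintext_index.items()}

def search(key: bytes, hashed_index: dict, term: str) -> list:
    token = H(key, term)                 # client computes the search token
    return hashed_index.get(token, [])   # service does an ordinary lookup

K = b"client-secret-key"
index = {
    "while": [4, 9, 37],
    "return": [9, 37, 93, 94, 95],
    "goto": [1, 8, 89, 90, 94],
    "foreach": [4, 37, 62, 75],
}
hashed = build_index(K, index)
print(search(K, hashed, "while"))
```

The service only ever sees tokens and record pointers, which is exactly why the leakage discussed next is so large: equal terms always map to equal tokens.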
2. Evaluating SE: Security and usability
Goal #1: Security
‣ Security: Confidentiality against service compromise
‣ Persistent adversary, controls provider and observes many interactions
‣ Some information always leaked
‣ Thus goal is to minimize leakage
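Security Definition [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]
Def. An SE scheme $\Pi$ is $\mathcal{L}$-secure if $\forall \mathcal{A}\ \exists \mathcal{S}$:
$\Pr[\mathcal{A} \text{ outputs } 1 \text{ in } \mathrm{REAL}] \approx \Pr[\mathcal{A} \text{ outputs } 1 \text{ in } \mathrm{IDEAL}_{\mathcal{L},\mathcal{S}}]$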
Formal security model: Game REAL [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]
‣ Init: adversary sends docs D to challenger, receives (server view)
‣ search: adversary sends query q, receives (server view)
‣ update: adversary sends inputs u, receives (server view)
‣ (many queries)
‣ Output: Bit b
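Formal security model: Game $\mathrm{IDEAL}_{\mathcal{L},\mathcal{S}}$ [Curtmola-Garay-Kamara-Ostrovsky, CCS’06]
Parameters:
‣ “Leakage function” $\mathcal{L}$ (stateful, randomized)
‣ Simulator $\mathcal{S}$ (stateful, randomized)
Game:
‣ Init: adversary sends docs D to challenger, receives (sim. server view) $\leftarrow \mathcal{S}(\mathcal{L}(D))$
‣ search: adversary sends query q, receives (sim. server view) $\leftarrow \mathcal{S}(\mathcal{L}(q))$
‣ update: adversary sends inputs u, receives (sim. server view) $\leftarrow \mathcal{S}(\mathcal{L}(u))$
‣ (many queries)
‣ Output: Bit b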
Leakage Function L for Simple SE
term records
45e8a 4, 9,37
092ff 9,37,93,94,95
f61b5 1,8,89,90,94
cc562 4,37,62,75
‣ Leaked info on initialize:
1. Number of documents, and sizes
2. Number of terms
3. Frequency of all terms
4. Co-occurrence information for all terms
‣ Leaked info on search:
1. Pointers to documents matching search
2. “Equality pattern” of searches
‣ Leaked info on update:
1. Pointer to document being changed
2. Whether or not changes match old searches
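To make the initialization leakage concrete, the sketch below computes it directly from the server's view of the hashed index shown on this slide. It is an illustration, not part of the talk: the function name `init_leakage` and the dictionary layout are assumptions, and "co-occurrence" is taken here to mean the number of records two hashed terms share.

```python
from itertools import combinations

# Server's view: hashed terms and plaintext record pointers (values from the slide).
hashed_index = {
    "45e8a": [4, 9, 37],
    "092ff": [9, 37, 93, 94, 95],
    "f61b5": [1, 8, 89, 90, 94],
    "cc562": [4, 37, 62, 75],
}

def init_leakage(index: dict) -> dict:
    """Everything below is computable by the server without knowing the key K."""
    docs = set().union(*index.values())
    frequency = {token: len(records) for token, records in index.items()}
    co_occurrence = {
        (a, b): len(set(index[a]) & set(index[b]))
        for a, b in combinations(index, 2)
    }
    return {
        "num_docs_seen": len(docs),
        "num_terms": len(index),
        "frequency": frequency,
        "co_occurrence": co_occurrence,
    }

leak = init_leakage(hashed_index)
print(leak["frequency"])
print(leak["co_occurrence"])
```

None of these statistics requires a search to be issued, which is the point of the "leaked info on initialize" list.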
Goal #2: Usability (mostly ignored today)
‣ Security: Confidentiality against service compromise
‣ Usability: Query support, deployability
‣ Single keyword, multi-keyword, phrase
‣ Ranked versus unranked
‣ How much server modification needed?
Goal #3: Efficiency
‣ Security: Confidentiality against service compromise
‣ Usability: Query support, information retrieval
‣ Efficiency: Computation, storage, bandwidth
‣ Client storage/computation
‣ Server storage/computation
‣ Bandwidth, rounds of interaction
3. A survey of SE constructions
Construction 1 for reducing leakage [Song-Wagner-Perrig, S&P’00]
Client holds key K.
term      records
45e8a     4, 9, 37
092ff     9, 37, 93, 94, 95
f61b5     1, 8, 89, 90, 94
cc562     4, 37, 62, 75
‣ hash terms as before
‣ derive key for each row and encrypt
‣ Leaked info from encrypted index:
1. Number of documents
2. Number of terms
3. Frequency of all terms
4. Co-occurrence information for all terms
‣ Frequency info still leaked
Many works followed, hiding frequency info:
[Curtmola-Garay-Kamara-Ostrovsky, CCS’06] [Chang-Mitzenmacher, ACNS’05] [Chase-Kamara, ASIACRYPT’10] [Kamara-Papamanthou-Roeder, CCS’12] [Kurosawa-Ohtaki, FC’12] [Kamara-Papamanthou, FC’13] [Naveed-Prabhakaran-Gunter, S&P’14] [P-K-V-M-C-G-K-B, S&P’14] [Stefanov-Papamanthou-Shi, NDSS’14]
Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]
term      records
while     4, 9, 37
return    9, 37, 93, 94, 95
goto      8, 37, 89, 90
foreach   4, 37, 62, 75
To process i-th row:
‣ derive two keys for each row: $(K_i, K'_i) \leftarrow H(K, \mathtt{term}_i)$
‣ encrypt each docid: $c_i \leftarrow \mathrm{Enc}(K'_i, \mathrm{docid}_i)$
‣ compute “labels” by hashing a counter: $\ell_{i,\mathrm{ctr}_i} \leftarrow H(K_i, \mathrm{ctr}_i)$
‣ insert (label, ciphertext) pairs in random order into a generic key/value store
Label/Ciphertext Set (labels shown; each is stored with its ciphertext):
$H(K_1,1), H(K_1,2), H(K_1,3)$
$H(K_2,1), H(K_2,2), H(K_2,3), H(K_2,4), H(K_2,5)$
$H(K_3,1), H(K_3,2), H(K_3,3), H(K_3,4)$
$H(K_4,1), H(K_4,2), H(K_4,3), H(K_4,4)$
‣ client must store final counter values:
term      count
while     3
return    5
goto      4
foreach   4
Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]
Client (key K) wants all docs containing “while”:
‣ Derive $(K_1, K'_1) \leftarrow H(K, \mathtt{while})$ and send $(K_1, K'_1)$ to the cloud service
‣ Compute labels: $\ell_{\mathrm{ctr}} \leftarrow H(K_1, \mathrm{ctr})$
‣ Query each label, get($\ell_{\mathrm{ctr}}$), against the generic key/value store to get a ciphertext, decrypt
Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]
Client (key K) wants to add doc containing “while” (stored count for “while” is 3):
‣ $(K_1, K'_1) \leftarrow H(K, \mathtt{while})$
‣ $\ell \leftarrow H(K_1, 4)$
‣ $c \leftarrow \mathrm{Enc}(K'_1, \mathtt{docid})$
‣ Add to store (ℓ, c)
Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]
Client (key K) wants to delete doc #34:
‣ Not gracefully supported
‣ One solution: Revocation list (but does not recover space)
Construction 2 for reducing leakage [Cash-Jaeger-Jarecki-Jutla-Krawczyk-Rosu-Steiner, NDSS’14]
‣ Leaked info from encrypted index:
1. Number of documents
2. Number of terms
3. Frequency of all terms
4. Co-occurrence information for all terms
Practical advantages:
1. Simple implementation
2. Parallel server search processing
Practical disadvantages:
1. Client-side state for counters
2. No true support for deletions
Construction 2 is not “forward private”
‣ Client (key K) wants all docs containing “while”: sends $(K_i, K'_i) \leftarrow H(K, \mathtt{while})$
‣ Cloud service: Remember $(K_i, K'_i)$ for this unknown query
‣ Later, client wants to add doc containing “while”: computes $(K_1, K'_1) \leftarrow H(K, \mathtt{while})$, $y \leftarrow H(K_1, 4)$, $c \leftarrow \mathrm{Enc}(K'_1, \mathtt{docid})$ and sends $(y, c)$
‣ If $y \in \{H(K_i,1), H(K_i,2), \ldots\}$, then new document contains previous query
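The linking attack above can be simulated from the server's side in a few lines. This is a sketch under assumptions: HMAC-SHA256 stands in for H, the class `CuriousServer` and its `max_ctr` search bound are hypothetical, and ciphertexts are opaque placeholder bytes because the attack never needs to decrypt.

```python
import hmac
import hashlib

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def label(Ki: bytes, ctr: int) -> bytes:
    return prf(Ki, ctr.to_bytes(8, "big"))

class CuriousServer:
    """Follows the protocol but remembers every label key it was ever sent."""

    def __init__(self, max_ctr: int = 1000):
        self.store = {}
        self.remembered = []      # K_i values revealed by earlier searches
        self.max_ctr = max_ctr    # assumed upper bound on counters to try

    def on_search(self, Ki: bytes) -> None:
        self.remembered.append(Ki)

    def on_add(self, y: bytes, c: bytes):
        """Store (y, c); return the index of an earlier query that y matches, if any."""
        self.store[y] = c
        for query_no, Ki in enumerate(self.remembered):
            if any(y == label(Ki, ctr) for ctr in range(1, self.max_ctr + 1)):
                return query_no
        return None

K = b"client master key"
K_while = prf(K, b"label|while")
K_goto = prf(K, b"label|goto")

server = CuriousServer()
server.on_search(K_while)                                    # client searched "while"
linked = server.on_add(label(K_while, 4), b"ciphertext")     # new doc contains "while"
unlinked = server.on_add(label(K_goto, 5), b"ciphertext")    # new doc contains "goto"
print(linked, unlinked)
```

The server never learns the word "while", but it does learn that the new document matches its first recorded query, which is exactly what forward privacy is meant to rule out.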
Adding Forward Privacy to Construction 2 [Bost-Minaud-Ohrimenko, CCS’17]
‣ Forward Privacy Requirement: Old queries cannot be re-run against newly added documents.
‣ Idea: Replace H with a “Constrained PRF” for range constraints [Boneh-Waters, ASIACRYPT’13]
‣ H comes with additional “key constraining” algorithm: $K_{a,b} \leftarrow \mathrm{Constrain}(K, a, b)$
‣ Constrained key $K_{a,b}$ allows evaluation of $H(K, x)$ only for $x \in [a, b]$
‣ Efficient constructions from blockciphers (log number of evaluations)
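One well-known way to obtain a range-constrained PRF is a GGM-style binary tree, where the constrained key is the set of subtree seeds that exactly cover $[a, b]$. The sketch below illustrates that idea and is not claimed to be the construction used in the cited papers; HMAC-SHA256 stands in for the length-doubling generator, and `N_BITS`, `constrain`, and `eval_constrained` are names chosen here.

```python
import hmac
import hashlib

N_BITS = 16  # assumption: inputs x lie in [0, 2**N_BITS)

def G(seed: bytes, bit: int) -> bytes:
    """Child seed for the given branch; HMAC-SHA256 is an assumed stand-in."""
    return hmac.new(seed, bytes([bit]), hashlib.sha256).digest()

def walk(seed: bytes, x: int, nbits: int) -> bytes:
    """Descend nbits levels, following the low nbits bits of x, most significant first."""
    for i in reversed(range(nbits)):
        seed = G(seed, (x >> i) & 1)
    return seed

def H(K: bytes, x: int) -> bytes:
    return walk(K, x, N_BITS)

def constrain(K: bytes, a: int, b: int) -> list:
    """Return (prefix, depth, seed) for the maximal subtrees lying inside [a, b]."""
    nodes = []

    def cover(seed: bytes, prefix: int, depth: int) -> None:
        lo = prefix << (N_BITS - depth)
        hi = lo + (1 << (N_BITS - depth)) - 1
        if hi < a or lo > b:
            return                                   # subtree entirely outside the range
        if a <= lo and hi <= b:
            nodes.append((prefix, depth, seed))      # subtree entirely inside the range
            return
        cover(G(seed, 0), prefix << 1, depth + 1)
        cover(G(seed, 1), (prefix << 1) | 1, depth + 1)

    cover(K, 0, 0)
    return nodes

def eval_constrained(nodes: list, x: int) -> bytes:
    for prefix, depth, seed in nodes:
        if x >> (N_BITS - depth) == prefix:
            return walk(seed, x, N_BITS - depth)
    raise ValueError("x is outside the constrained range")

K = b"per-term label key"
K_1_3 = constrain(K, 1, 3)
print(len(K_1_3), all(eval_constrained(K_1_3, x) == H(K, x) for x in (1, 2, 3)))
```

The constrained key holds at most about 2·N_BITS seeds, matching the "log number of evaluations" remark, and a holder of `K_1_3` has no seed on the path to counter 4, so it cannot recompute the label of a later addition.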
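Forward Privacy with a Constrained PRF [Bost-Minaud-Ohrimenko, CCS’17]
Client (key K) wants all docs containing “while” (stored count for “while” is 3):
‣ $(K_i, K'_i) \leftarrow H(K, \mathtt{while})$
‣ $K_{1,3} \leftarrow \mathrm{Constrain}(K_i, 1, 3)$
‣ client sends $(K_{1,3}, K'_i)$ to the cloud service
Cloud service:
‣ computes $\ell_{\mathrm{ctr}} \leftarrow H(K_{1,3}, \mathrm{ctr})$ and runs get($\ell_{\mathrm{ctr}}$) against the generic key/value store
‣ Can compute labels only for 1,2,3
‣ Can’t mount prev attack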
Other Constructions
‣ Construction gracefully supporting deletes via ORAM-like techniques [Stefanov-Papamanthou-Shi, NDSS’14] [Naveed-Prabhakaran-Gunter, S&P’14]
‣ UC-security definitions and constructions [Kurosawa-Ohtaki, FC’12]
‣ Boolean query support [Cash-Jarecki-Jutla-Krawczyk-Rosu-Steiner, CRYPTO’13]
‣ Graph query support [Chase-Kamara, ASIACRYPT’10]
‣ Several constructions optimizing “locality” - covered later
‣ Other application: Exact match queries in encrypted database tables (CryptDB) [Popa-Redfield-Zeldovich-Balakrishnan, SOSP’11]
4. Empirical security analysis of SE
SE security definition: What about the IDEAL game?
Def. An SE scheme $\Pi$ is $\mathcal{L}$-secure if $\forall \mathcal{A}\ \exists \mathcal{S}$:
$\Pr[\mathcal{A} \text{ outputs } 1 \text{ in } \mathrm{REAL}] \approx \Pr[\mathcal{A} \text{ outputs } 1 \text{ in } \mathrm{IDEAL}_{\mathcal{L},\mathcal{S}}]$
What does $\mathcal{L}$-security mean for practice?
‣ Definition may allow damaging attacks to exist in IDEAL game
‣ This part: Attacks in IDEAL game with “real-world” data
‣ Several attacks exist, but broad conclusions are elusive
‣ In most constructions, curious server learns which documents each query matched
(rows: Queries; columns: Documents; x ⟹ “Query matched this doc”; column positions were lost in extraction)
Q1 x x x
Q2 x x x
Q3 x x
Q4 x x x x x
…
‣ Now: two attacks that try to recover queries
Attack 1: When documents are “known” [Islam-Kuzu-Kantarcioglu, NDSS’12]
‣ Suppose adversary knows document distribution well enough to estimate “co-occurrence probabilities” of all possible terms in documents
         while   goto   return   for
while    1       0.26   0.19     0.44
goto             1      0.13     0.20
return                  1        0.11
for                              1
…
‣ Adversary can compute empirical co-occurrence probabilities:
      Q1   Q2     Q3     Q4
Q1    1    0.14   0.59   0.10
Q2         1      0.38   0.25
Q3                1      0.06
Q4                       1
…
‣ Then attempt to find mapping from Qi to text terms.
‣ IKK further assume some of the Qi are “known” to adversary (!)
IKK Attack Sketch [Islam-Kuzu-Kantarcioglu, NDSS’12]
‣ Inputs: Training co-occurrences (term × term matrix) and Observed co-occurrences (query × query matrix)
‣ Find assignment from Qi to plaintext terms that minimizes “error” defined by integer program
‣ Some Qi are “known” to help optimization (!)
‣ Integer programming formulation given is NP-complete, so approximate solutions used
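The matching step can be illustrated on a toy instance. The sketch below is not the IKK solver: it uses exhaustive search over assignments instead of an approximate method, it assumes no “known” queries, it assumes the adversary's training data equals the indexed documents, and the error is a plain sum of squared differences. All data and names (`cooccurrence`, `best_assignment`, `known_docs`) are made up for the example.

```python
from itertools import permutations

def cooccurrence(postings: dict, n_docs: int) -> dict:
    """(a, b) -> fraction of documents matching both a and b (the diagonal is frequency)."""
    return {(a, b): len(postings[a] & postings[b]) / n_docs
            for a in postings for b in postings}

def best_assignment(train: dict, observed: dict, terms: list, queries: list) -> dict:
    """Try every injective mapping query -> term; keep the one with least squared error."""
    def error(assign: dict) -> float:
        return sum((observed[q1, q2] - train[assign[q1], assign[q2]]) ** 2
                   for q1 in queries for q2 in queries)
    candidates = (dict(zip(queries, perm))
                  for perm in permutations(terms, len(queries)))
    return min(candidates, key=error)

N_DOCS = 10
# Adversary's background knowledge: which documents contain which term.
known_docs = {
    "while": {0, 1, 2, 3, 4, 5},
    "goto": {0, 1},
    "return": {4, 5, 6, 7},
    "for": {2, 8, 9},
}
# Search leakage: document pointers returned for three unknown queries.
leaked = {"Q1": known_docs["goto"], "Q2": known_docs["while"], "Q3": known_docs["return"]}

train = cooccurrence(known_docs, N_DOCS)
observed = cooccurrence(leaked, N_DOCS)
guess = best_assignment(train, observed, list(known_docs), list(leaked))
print(guess)
```

Exhaustive search is only feasible for a handful of terms, which is why the attack on realistic vocabularies has to settle for approximate solutions.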
IKK Attack Experiment Setup [Islam-Kuzu-Kantarcioglu, NDSS’12]
‣ Document dataset: Enron emails sent folder
‣ 30,109 documents
‣ Parsed terms, stemmed, removed 200 “stop-words”, 77,000 unique terms remained
‣ Repeated experiments with top-N most common terms for different N
‣ Queries:
‣ Drawn i.i.d. from Zipfian distribution (probability of term w inversely proportional to its rank)
‣ Varied number of queries
‣ Experiment:
‣ Take top-N most common terms, and draw Q queries
‣ Fix random subset as “known”
‣ Run attack on leakage and measure accuracy (number of queries correctly predicted)
IKK Attack Results [Islam-Kuzu-Kantarcioglu, NDSS’12]
‣ 150 Queries, 22 known
‣ Varied number of terms in index
‣ Very strong for small numbers of possible terms, but drop-off was not explored
‣ Other experiments varied number of queries and number of known queries
Attack 2 setting: Encrypted Searchable Email
[Figure: incoming email is indexed by the client-side SE index and pushed to the server through the update protocol; the adversary observes “Leakage induced by my crafted email!”]
Attack 2: Query recovery via document injection [Cash-Grubbs-Perry-Ristenpart, CCS’15] [Zhang-Katz-Papamanthou, USENIX’16]
‣ Another example: Inject rows into account databases by creating accounts
‣ Insert chosen documents: the adversary's chosen docs end up on the SE server
‣ K queries for random terms
‣ Adversary sees leakage from its injected documents … and query leakage!
‣ Adversary outputs guesses for queries
Document injection attack details [Zhang-Katz-Papamanthou, USENIX’16]
Stage 1: Choosing documents to inject
‣ Compute terms present in training documents, say $w_1, w_2, \ldots, w_n$
‣ Construct $\log n$ documents $D_1, D_2, \ldots, D_{\log n}$, each with n/2 keywords:
$D_i := \{ w_j : i\text{-th bit of } j \text{ is } 0 \}$
Stage 2: Guessing query from leakage
‣ Some $w_j$ is queried and attack wants to learn j
‣ For each i, check if $D_i$ was returned and learn i-th bit of j
[Figure: example with injected documents $D_1, D_2, D_3$ and keywords $w_1, w_2, \ldots, w_8$]
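Both stages of the injection attack fit in a short sketch. The assumptions are labeled here and in the comments: keyword positions j are 0-based and bits are counted from the least significant one, so each position has a distinct bit pattern, and the function names are hypothetical.

```python
def injected_documents(keywords: list) -> list:
    """Stage 1: D_i = { w_j : i-th bit of j is 0 }, with j the 0-based position."""
    n_bits = max(1, (len(keywords) - 1).bit_length())   # ceil(log2 n) documents
    return [{w for j, w in enumerate(keywords) if not (j >> i) & 1}
            for i in range(n_bits)]

def recover_query(keywords: list, returned: list) -> str:
    """Stage 2: returned[i] is True iff injected document D_i matched the query.
    A match means bit i of j is 0; a miss means bit i of j is 1."""
    j = sum((0 if hit else 1) << i for i, hit in enumerate(returned))
    return keywords[j]

keywords = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8"]
docs = injected_documents(keywords)

# Simulate the search leakage for a secret query: which injected docs match it.
secret = "w6"
returned = [secret in doc for doc in docs]
print(len(docs), returned, recover_query(keywords, returned))
```

With 8 keywords the adversary injects only 3 documents of 4 keywords each, and the match pattern of a single query pins down the keyword exactly.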
Document injection attack results [Zhang-Katz-Papamanthou, USENIX’16]
‣ Setup: Use same Enron data, index top 5000 terms. Select queries uniformly.
‣ Reveal varying percentage of documents for training
‣ Attack typically injects 10 documents ($\log_2(2000) \approx 10$)
Other attacks
‣ IKK extended to case where training data was imperfect, but results are unconvincing
‣ More devastating attacks against leakier SE constructions
‣ Countermeasures against document injection have been suggested and broken
[Cash-Grubbs-Perry-Ristenpart, CCS’15]
[Islam-Kuzu-Kantarcioglu, NDSS’12]
[Zhang-Katz-Papamanthou, USENIX’16]
[Pouliot-Wright, CCS’16]
5. Locality lower bound
Memory locality of searchable encryption
‣ Compute labels by hashing counter
‣ Query each label, get(label), against the generic key/value store to get a ciphertext, decrypt
‣ One random key/value store query per document
‣ Contrast with plaintext search: Read predictable blocks of memory
➡ Runtime bottleneck: disk latency, not crypto processing.
➡ True for all known frequency-hiding constructions
A memory locality lower bound [Cash-Tessaro, EUROCRYPT’14]
‣ Let $\mathcal{L}$ be the leakage profile from construction 2
“Theorem”: Any $\mathcal{L}$-secure searchable encryption must either:
(1) Have a very large encrypted index (super-linear size),
or
(2) Read memory in a highly “non-local” fashion (super-constant locality),
or
(3) Read more memory than a plaintext search (super-linear).
‣ unconditional (no complexity assumptions)
‣ different type of locality lower bound: security vs. correctness
Memory locality of SE constructions
Scheme                                  Enc Ind Size   ExtraRead   Locality
lower bound: 1 of                       ω(N)           ω(1)        ω(1)
Most schemes                            N              1           R
[Chase-Kamara, ASIACRYPT’10]            N^2            1           1
trivial “read all”                      N              N           1
[Cash-Tessaro, EUROCRYPT’14]            N log N        log N       log N
[Asharov-Naor-Segev-Shahaf, STOC’16]    N loglog N     loglog N    1
N = no. postings in input index, R = no. postings in search
Intuition for memory locality lower bound
‣ Suppose a construction is “perfectly local”
‣ Curious server can remember which memory regions it touches during searches
‣ By inference from patterns, server may infer number of documents associated with some other term (which is not allowed by security definition with $\mathcal{L}$)
‣ We can relate building local schemes to strategies in new two-player game
‣ Lower bound follows by proving game has winning strategy for other player
[Figure: server memory holding contiguous lists of lengths 1, 1, 2, 4, 4, 7]
Local SE Construction ⟷ Interval-Packing Game Strategy
Player 1:
‣ Choose two multi-sets of intervals
‣ lengths all integral
‣ sum of lengths equal
‣ Example: lengths {1, 1, 2, 4, 4, 7} and {1, 1, 1, 2, 2, 3, 4, 5}; Common lengths: 1, 1, 2, 4
Referee:
‣ Choose one multi-set at random, send to Player 2
Player 2:
‣ Pack intervals into space
‣ Reveal to Player 1 the common-length intervals (ties chosen randomly)
‣ In the example, the positions of intervals of lengths 1, 1, 2, 4 are revealed
Player 1:
‣ Player 1 tries to guess which multi-set was used
Thm: If allowed length is O(interval-sum), then Player 1 can win w. prob > 0.5 + Ω(1)
Thm [Asharov-Naor-Segev-Shahaf, STOC’16]: Result is essentially tight.
Interval-Packing Game: Warm-up Case
‣ let n = sum of lengths
‣ Player 1 chooses one multi-set with lengths 1 and n-1, and one multi-set of n intervals of length 1 (as the observations below imply, the “right set” is {1, n-1} and the “left set” is the n unit intervals)
‣ Common lengths: 1, so Player 2 reveals one length-1 block
‣ Assume < 1.5n - 1 blocks of space
Show now: If < 1.5n space is used then Player 1 can win.
Two observations:
1. If right set packed, revealed block must leave large contiguous untouched region on one side (large block always fits)
2. If left set packed, ≥ 1/n chance this does not happen (no room for large block)
‣ Proof: < 1.5n places to store n blocks, so one must be “close to center”, preventing large block fitting
➡ Player 1 checks if large block could fit, decides which set was packed.
➡ Wins with advantage > 1/2 + 1/n.
A Local SE Construction: First Attempt [Asharov-Naor-Segev-Shahaf, STOC’16]
term      records
while     4, 9, 37
return    8, 22, 93, 94, 95
goto      1, 8, 89, 90, 94
foreach   4, 37, 62, 75
Setup
For each term w:
‣ j ← H(K, w)
‣ Store first entry in bucket j, next in j+1, etc
Pad all buckets to a max-size
Example with buckets 1 to 10: 5 ← H(K, while), 6 ← H(K, return), 1 ← H(K, goto), 3 ← H(K, foreach)
bucket    1    2    3       4        5           6          7        8    9    10
entries   1    8    89, 4   90, 37   94, 62, 4   75, 9, 8   37, 22   93   94   95
Search(w)
‣ j ← H(K, w)
‣ Retrieve entirety of buckets j, j+1, …, j+count_w−1
‣ Max-bucket size will be O(log(N)), where N is number of entries of input index
‣ Locality 1, but storage O(N log(N)) and O(L log(N)) bits read for list of size L
A Local SE Construction: The Power of Two Choices [Asharov-Naor-Segev-Shahaf, STOC’16]
term      records
while     4, 9, 37
return    8, 22, 93, 94, 95
goto      1, 8, 89, 90, 94
foreach   4, 37, 62, 75
Setup
For each term w:
‣ (j, j′) ← H(K, w)
‣ Use whichever of j or j′ results in lower total load
Pad all buckets to a max-size
Example with buckets 1 to 10: (5,1) ← H(K, while), (6,9) ← H(K, return), (1,4) ← H(K, goto), (7,4) ← H(K, foreach)
‣ while is stored starting at bucket 5, return at bucket 9 (wrapping around to buckets 1 to 3), goto at bucket 4, foreach at bucket 7
bucket    1    2    3    4    5      6       7           8        9       10
entries   93   94   95   1    4, 8   9, 89   37, 90, 4   94, 37   8, 62   22, 75
Search(w)
‣ (j, j′) ← H(K, w)
‣ Retrieve entirety of buckets j, j+1, …, j+count_w−1 and buckets j′, j′+1, …, j′+count_w−1
‣ Need to show: Less padding required to prevent overflows and hide number stored in each bucket.
A Local SE Construction: The Power of Two Choices [Asharov-Naor-Segev-Shahaf, STOC’16]
Theorem: The Power-of-Two-Choices SE construction achieves:
(1) O(N loglog(N)) index size,
(2) O(1) locality,
and
(3) Read O(L loglog(N)) bits to retrieve a list of size L.
‣ Non-trivial proof uses techniques from prior power-of-two-choices work [Azar-Broder-Karlin-Upfal, STOC’94].
‣ Note: Actual results stated with different parameter regime.
Open Problems
1. Lower bounds for SE with updates
• Existing SE constructions with single-round search and update need substantial client memory and do not fully support deletes
• Such SE might imply ORAM, where lower bounds are known
2. Get a better understanding of real-world security
• Known attacks do not scale to very large document sets (or document sets with a very large number of terms). Better attacks likely exist.
• Identify properties of text that make attacks harder or easier.
• Heuristic countermeasures (like random “dummy documents”) have been explored but are inconclusive.
Thanks!