Anti-Persistence or
History Independent Data Structures
Moni Naor (Weizmann Institute)
Vanessa Teague (Stanford)
Why hide your history?
Core dumps; losing your laptop
– The entire memory representation of data structures is exposed
Emailing files
– The editing history may be exposed (e.g. Word)
Maintaining lists of people
– Sports teams, party invitees
Making sure that nobody learns from history
A data structure has:
– A “legitimate” interface: the set of operations allowed to be performed on it
– A memory representation
The memory representation should reveal no information that cannot be obtained from the “legitimate” interface
History of history independence
Issue dealt with in Cryptographic and Data Structures communities
Micciancio (1997): history independent trees
– Motivation: incremental cryptography
– Based on the “shape” of the data structure, not including the memory representation
– Stronger performance model!
Uniquely represented data structures
– Treaps (Seidel & Aragon), uniquely represented dictionaries
– Ordered hash tables (Amble & Knuth 1974)
More History
Persistent data structures: possible to reconstruct all previous states of the data structure (Sarnak and Tarjan)
– We want the opposite: anti-persistence
Oblivious RAM (Goldreich and Ostrovsky)
Overview
Definitions
History independent open addressing hashing
History independent dynamic perfect hashing
– Memory management
(Union find)
Open problems
Precise Definitions
A data structure is
– history independent: if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on memory representations
– strongly history independent: if, given any two sets of breakpoints along S1 and S2 such that corresponding points have identical contents, S1 and S2 induce the same probability distributions on memory representations at those points
Relaxations
Statistical closeness
Computational indistinguishability
– Example where helpful: erasing
Allow some information to be leaked
– Total number of operations
– n-history independent: identical distributions if the last n operations were identical as well
Under-defined data structures: the same query can yield several legitimate answers
– e.g. approximate priority queue
– Define identical content: no suffix T such that the set of permitted results returned by S1·T differs from the one returned by S2·T
History independence is easy (sort of)
If it is possible to decide on the (lexicographically) “first” sequence of operations that produces a given contents, just store the result of applying that sequence
This gives a history independent version of a huge class of data structures
Efficiency is the problem…
Dictionaries
Operations are insert(x), lookup(x) and possibly delete(x)
The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted)
Elements x ∈ U, for some universe U
Size of table/memory: N
Goal
Find a history independent implementation of dictionaries with good provable performance.
Develop general techniques for history independence
Approaches
Unique representation
– e.g. array in sorted order
– Yields strong history independence
Secret randomness
– e.g. array in random order
– Yields only (weak) history independence
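As a toy illustration of the unique-representation approach (a minimal Python sketch, not from the talk): keeping an array in sorted order makes the memory layout a deterministic function of the contents alone, so any two insertion orders produce identical representations.

```python
import bisect

def insert_sorted(table, x):
    """Insert x keeping the array sorted: the layout depends only on
    the current set of elements, never on the insertion order."""
    i = bisect.bisect_left(table, x)
    if i == len(table) or table[i] != x:   # ignore duplicates
        table.insert(i, x)
    return table

# Two different histories, same contents, same representation:
a = []
for x in [3, 1, 2]:
    insert_sorted(a, x)
b = []
for x in [2, 3, 1]:
    insert_sorted(b, x)
assert a == b == [1, 2, 3]
```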
Open addressing: traditional version
Each element x has a probe sequence h1(x), h2(x), h3(x), ...
– Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ...
– Double hashing
– Uniform hashing
Element is inserted into the first free space in its probe sequence
– Search ends unsuccessfully at a free space
Efficient space utilization
– Almost all of the table can be full
[Figure: insertion animation. y probes a cell occupied by x; x arrived before y, so y moves on to the next cell in its probe sequence, where there is no clash, so y is inserted.]
Not history independent because later-inserted elements move further along in their probe sequences
History independent version
At each cell i, decide elements’ priorities independently of insertion order
– Call the priority function pi(x,y)
If there is a clash, move the element of lower priority
At each cell, priorities must form a total order
Insertion
[Figure: y sits at a cell probed by x. p2(x,y)? No, so x moves on to the next cell in its probe sequence.]
Search
Same as in the traditional algorithm
In an unsuccessful search, can quit as soon as you find a lower-priority element
No deletions
Problematic in open addressing anyway
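A minimal Python sketch of priority-based open addressing as described above. The probe sequences (a pseudorandom permutation per element) and the particular per-cell priority function are illustrative choices, not the talk's; the point is that with any fixed per-cell total order, the final table does not depend on insertion order.

```python
import random

N = 16                                   # table size (illustrative)

def probes(x):
    """x's probe sequence: for this sketch, a pseudorandom
    permutation of the cells, derived deterministically from x."""
    return random.Random(x).sample(range(N), N)

def priority(i, x, y):
    """A fixed total order at each cell i, independent of insertion
    order: True means x has priority over y at cell i."""
    rx = random.Random(i * 1_000_003 + x).random()
    ry = random.Random(i * 1_000_003 + y).random()
    return (rx, x) < (ry, y)             # tie-broken total order

def insert(table, x):
    seq = probes(x)
    step = 0
    while True:
        cell = seq[step]
        if table[cell] is None:
            table[cell] = x              # free cell: settle here
            return
        y = table[cell]
        if y == x:
            return                       # already present
        if priority(cell, x, y):         # clash: lower priority moves
            table[cell] = x
            x, seq = y, probes(y)        # keep inserting the loser...
            step = seq.index(cell)       # ...from where it stood
        step += 1

def lookup(table, x):
    for cell in probes(x):
        if table[cell] is None:
            return False                 # a free cell ends the search
        if table[cell] == x:
            return True
    return False

# The final layout is independent of the insertion order:
t1, t2 = [None] * N, [None] * N
for x in [11, 22, 33, 44, 55]:
    insert(t1, x)
for x in [55, 44, 33, 22, 11]:
    insert(t2, x)
assert t1 == t2
assert lookup(t1, 33) and not lookup(t1, 99)
```

Note the design choice in `insert`: when x evicts y, the loop simply continues inserting y from the position it occupied, so displacement chains resolve without recursion.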
Strong history independence
Claim:
For all hash functions and priority functions, the final configuration of the table is independent of the order of insertion.
Conclusion:
Strongly history independent
Proof of history independence
A static insertion algorithm (clearly history independent):
[Figure: static algorithm animation on elements x1, ..., x6. In the first round every element tries its first probe location: p1(x2,x1), so insert x2; p1(x6,x4) and p6(x3,x6), so insert x3; insert x5. Gather up the rejects and restart them at their next probe locations: p3(x4,x5) and p3(x4,x6), so insert x4 and remove x5. Repeat until every element has settled.]
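The round-based static algorithm can be sketched in Python. This version uses the "global" priority rule (a single order, here "larger element wins") for simplicity; the probe sequences are illustrative. Because the input is a set, the result trivially cannot depend on any insertion order.

```python
import random

N = 16                                    # table size (illustrative)

def probes(x):
    """x's probe sequence: a pseudorandom permutation of the cells,
    derived deterministically from x (illustrative choice)."""
    return random.Random(x).sample(range(N), N)

def static_insert(elements):
    """Round-based static algorithm with the global priority rule.
    All unsettled elements probe in parallel; at each cell the
    highest-priority contender (including a previously settled
    occupant) stays, and every loser advances one step along its
    probe sequence."""
    table = [None] * N
    pos = {x: 0 for x in elements}        # current probe index of x
    unsettled = set(elements)
    while unsettled:
        by_cell = {}
        for x in unsettled:               # group contenders by cell
            by_cell.setdefault(probes(x)[pos[x]], set()).add(x)
        unsettled = set()
        for cell, xs in by_cell.items():
            if table[cell] is not None:
                xs.add(table[cell])       # the occupant defends
            winner = max(xs)              # global priority: max wins
            table[cell] = winner
            for x in xs - {winner}:       # losers advance one step
                pos[x] = probes(x).index(cell) + 1
                unsettled.add(x)
    return table

# The layout is a function of the contents alone:
t = static_insert({11, 22, 33, 44, 55})
assert sorted(v for v in t if v is not None) == [11, 22, 33, 44, 55]
assert t == static_insert({55, 44, 33, 22, 11})
```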
Proof of history independence
Nothing moves further in the static algorithm than in the dynamic one
– By induction on rounds of the static algorithm
Vice versa
– By induction on the steps of the dynamic algorithm
Strongly history independent
Some priority functions
Global
– A single priority order, independent of the cell
Random
– Choose a random order at each cell
Youth-rules
– Call an element “younger” if it has moved less far along its probe sequence; younger elements get higher priority
Youth-rules
[Figure: x and y clash at a cell; p2(x,y) holds because x has taken fewer steps than y, so y moves on.]
Use a tie-breaker if steps are equal
This is a priority function
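The youth-rules comparison can be written as a tiny predicate (a sketch; the particular tie-breaker is an illustrative choice, any fixed one works):

```python
def youth_rules(step_x, x, step_y, y):
    """True if x beats y at a clash: the element that has taken
    fewer probe steps ('younger') gets higher priority; ties are
    broken by the elements themselves, so the priorities at each
    cell form a total order."""
    return (step_x, x) < (step_y, y)

# x at its 2nd probe beats y at its 4th probe:
assert youth_rules(1, 7, 3, 5)
# equal steps: the tie-breaker decides, deterministically:
assert youth_rules(2, 3, 2, 8) and not youth_rules(2, 8, 2, 3)
```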
Specifying a scheme
Priority rule
– Choice of priority functions
– In Youth-rules: determined by the probe sequence
Probe functions
– How are they chosen, maintained and computed
Implementing Youth-rules
Let each hi be chosen from a pairwise independent collection
– For any two x and y, the random variables hi(x) and hi(y) are uniform and independent
Let h1, h2, h3, ... be chosen independently
Example: hi(x) = ((ai·x + bi) mod U) mod N
Space: 2 elements per function
Need only log N functions
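A sketch of this implementation in Python, following the slide's formula hi(x) = ((ai·x + bi) mod U) mod N; the constants U and N and the seed are illustrative:

```python
import random

U = 2_147_483_647        # a prime at least the universe size (illustrative)
N = 1024                 # table size (illustrative)

def make_hash(rng):
    """One pairwise-independent function hi(x) = ((ai*x + bi) mod U) mod N.
    Only the two coefficients ai, bi need to be stored."""
    a = rng.randrange(1, U)
    b = rng.randrange(U)
    return lambda x: ((a * x + b) % U) % N

# Youth-rules needs only about log N independently chosen functions:
rng = random.Random(2001)
hs = [make_hash(rng) for _ in range(N.bit_length())]

x = 123_456
probe_sequence = [h(x) for h in hs]
assert all(0 <= c < N for c in probe_sequence)
```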
Performance Analysis
Based on worst-case insertion sequences
The important parameter: α, the fraction of the table that is used (α·N elements)
Analysis of expected insertion time and search time (number of probes to the table)
– Have to distinguish successful and unsuccessful search
Analysis via the Static Algorithm
For insertions, the total numbers of probes in the static and dynamic algorithms are identical
– Easier to analyze the static algorithm
Key point for Youth-rules: in phase i, all unsettled elements are at the ith probe in their sequence
– Assures fresh randomness of hi(x)
Performance
For Youth-rules, implemented as specified:
For any sequence of insertions, the expected probe-time for insertion is at most 1/(1−α)
For any sequence of insertions, the expected probe-time for successful or unsuccessful search is at most 1/(1−α)
Analysis based on the static algorithm
α is the fraction of the table that is used
Comparison to double hashing
Analysis of double hashing with truly random functions [Guibas & Szemeredi, Lueker & Molodowitch]
Can be replaced by log n wise independent functions (Schmidt & Siegel)
log n-wise independence is relatively expensive:
– either a lot of space or log n evaluation time
Youth-rules is a simple and provably efficient scheme with very little extra storage
– An extra benefit of considering history independence
Other Priority Functions
[Amble & Knuth]: log(1/(1−α)) for the global rule
– Truly random hash functions
Experiments show about log(1/(1−α)) for most priority functions tried
Other types of data structures
Memory management (dealing with pointers)
– Memory allocation
Other state-related issues
Dynamic perfect hashing:FKS scheme, dynamized
[Figure: two-level table. A top-level function h maps the n elements to be inserted into buckets of sizes s0, s1, ..., sk; each bucket i has its own function hi hashing into a sub-table of about si² cells.]
Top-level table: O(n) space
Low-level tables: O(n) space total; each gets about si² cells
The hi are perfect on their respective sets. Rechoose h or some hi to maintain perfection and linear space.
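A minimal static sketch of the two-level scheme in Python. The space threshold 4n, the prime p, and the rejection-sampling loops are illustrative, not the paper's tuned constants:

```python
import random

def make_hash(rng, p, size):
    """h(x) = ((a*x + b) mod p) mod size, for a prime p."""
    a, b = rng.randrange(1, p), rng.randrange(p)
    return lambda x: ((a * x + b) % p) % size

def is_perfect(h, elems, size):
    return len({h(x) for x in elems}) == len(elems)   # no collisions

def build_fks(elems, rng, p=2_147_483_647):
    """Static FKS sketch: a top-level h splits elems into buckets;
    bucket i gets a sub-table of size s_i**2 and a function h_i,
    rechosen until it is perfect on its bucket."""
    n = max(len(elems), 1)
    while True:                        # choose a "good" top level:
        h = make_hash(rng, p, n)
        buckets = [[] for _ in range(n)]
        for x in elems:
            buckets[h(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:  # linear space
            break
    subs = []
    for b in buckets:
        size = max(len(b) ** 2, 1)
        while True:                    # rechoose h_i until perfect
            hi = make_hash(rng, p, size)
            if is_perfect(hi, b, size):
                break
        sub = [None] * size
        for x in b:
            sub[hi(x)] = x
        subs.append((hi, sub))
    return h, subs

def lookup(table, x):                  # two probes, worst case
    h, subs = table
    hi, sub = subs[h(x)]
    return sub[hi(x)] == x

rng = random.Random(7)
elems = [10, 22, 37, 41, 58, 66]
t = build_fks(elems, rng)
assert all(lookup(t, x) for x in elems)
assert not lookup(t, 99)
```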
A subtle problem:the intersection bias problem
Suppose we have:
– a set of states {σ1, σ2, ...}
– a set of objects {h1, h2, ...}
– a way to decide whether hi is “good” for a given state
Keep a “current” h as states change
Change h only if it is no longer “good”
– Choose uniformly from the ones that are “good” for the current state
Then this is not history independent
– h is biased towards the intersection of the objects good for the current and for previous states
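A toy simulation of the intersection bias (the objects, the "goodness" predicate and the state sequences here are all hypothetical, chosen only to make the bias visible): two histories end in the same state, yet the distributions of the kept object differ.

```python
import random
from collections import Counter

H = range(4)                               # candidate objects
def good(h, state):                        # toy goodness predicate:
    return h != state                      # h is "good" unless h == state

def run(states, rng):
    """Keep the current h across state changes; rechoose uniformly
    among the good ones only when h stops being good."""
    h = None
    for s in states:
        if h is None or not good(h, s):
            h = rng.choice([x for x in H if good(x, s)])
    return h

rng = random.Random(0)
trials = 30_000
direct = Counter(run([1], rng) for _ in range(trials))
via_2  = Counter(run([2, 1], rng) for _ in range(trials))

# Both histories end in state 1, but the distributions differ:
# going via state 2 biases h towards {0, 3}, the objects good for
# *both* states -- the intersection bias.
assert direct[2] / trials > 0.25           # about 1/3 when direct
assert via_2[2] / trials < 0.20            # about 1/9 via state 2
```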
Dynamized FKS is not history independent
Does not erase upon deletion
Uses history-dependent memory allocation
Hash functions (h, h1, h2, ...) are changed whenever they cease to be “good”
– Hence they suffer from the intersection bias problem: they are biased towards functions that were “good” for previous sets of elements
– Hence they leak information about past sets of elements
Making it history independent
Use history independent memory allocation
Upon deletion, erase the element and rechoose the appropriate hi
– This solves the low-level intersection bias problem
Some other minor changes
Solve the top-level intersection bias problem...
Solving the top-level intersection bias problem
Can’t afford a top-level rehash on every deletion
Generate two “potential” top-level functions h(1) and h(2) at the beginning
– Always use the first “good” one
– If neither is good, rehash at every deletion
– If not using h(1), keep a top-level table for it for easy “goodness” checking (likewise for h(2))
Proof of history independence
The table’s state is defined by:
The current set of elements
Top-level hash function
– Always the first “good” one of the potential functions, or rechosen at each step
Low-level hash functions
– Uniformly chosen from the perfect functions
Arrangement of sub-tables in memory
– Use history-independent memory allocation
Some other history independent things
Performance
Lookup takes two steps
Insertion and deletion take expected amortized O(1) time
– There is a 1/poly chance that they will take more
Open Problems
Better analysis for youth-rules as well as other priority functions with no random oracles.
Efficient memory allocation – ours is O(s log s)
Separations
– Between strong and weak history independence
– Between history independent and traditional versions, e.g. for union find
Can persistence and (computational) history independence co-exist efficiently?
Conclusion
History independence can be subtle
We have two history independent hash tables:
– Based on open addressing: very space efficient, but no deletion
– Dynamic perfect hashing: allows deletion and constant-time lookup
Open addressing: implementing hash functions
For all i, generate random independent ai, bi
hi(x) = ((ai·x + bi) mod U) mod N
– U: size of the universe; prime
– N: size of the hash table
x’s probe sequence is h1(x), h2(x), h3(x), ...
We need log n hash functions
– n is the number of elements in the table