Improved Sketching of Hamming Distance with Error Correcting
Ely Porat
Bar-Ilan University
Google Inc
Ohad Lipsky
Bar-Ilan University
Check Point Inc
December 2003
Problem Definition (1)
[Figure: Alice holds $T_A$ and Bob holds $T_B$, each of length n]
$\mathrm{hamm}(T_A, T_B)$
Given k - bound on the number of mismatches
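Problem Definition (2)
[Figure: Alice reduces $T_A$ to a sketch $S_A$ and Bob reduces $T_B$ to a sketch $S_B$; both texts have length n]
Calculate $\mathrm{hamm}(T_A, T_B)$ given only $S_A$, $S_B$
Finding the mistakes
Given k - bound on the number of mismatches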
Motivations
• Databases
• Internet
• Error Correcting
[Figure: Router A, Router B, Router C, Router D]
Outline:
• Simple Solution
• Error Correcting
• Improved Solution
• Improve more
• Recursion
• File sharing
Simplest Solution - $O(k^2 \log(1/\delta))$
• Binary alphabet.
• Allocate $k^2$ cells.
• Take the input array and hash each bit to one of the cells.
• In each cell, remember the XOR of all the values hashed to it.
[Figure: example sketch 0 1 1 0]
Simplest Solution - $O(k^2 \log(1/\delta))$
[Figure: two example sketches, 1 1 0 0 and 0 1 0 0]
Simplest Solution - $O(k^2 \log(1/\delta))$
• Due to the birthday principle: the probability that two errors fall into the same cell is < 1/2.
• The $\log(1/\delta)$ factor: repeat to get a failure probability of $\delta$.
[Figure: example sketch 0 1 1 0]
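The bullets above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `xor_sketch` and `estimate_distance`, the use of a shared seed as the hash function, and the choice to combine repetitions by taking the maximum are all assumptions. The maximum is safe here because a collision can only hide mismatches, never invent them.

```python
import random

def xor_sketch(bits, k, seed):
    """Hash every position to one of k*k cells and XOR the bit into it."""
    m = k * k
    rng = random.Random(seed)  # shared seed: both parties get the same hash
    cells = [0] * m
    for bit in bits:
        cells[rng.randrange(m)] ^= bit
    return cells

def estimate_distance(bits_a, bits_b, k, repetitions):
    """Largest number of differing cells over independent repetitions."""
    best = 0
    for seed in range(repetitions):
        sketch_a = xor_sketch(bits_a, k, seed)  # computed by Alice alone
        sketch_b = xor_sketch(bits_b, k, seed)  # computed by Bob alone
        differing = sum(x != y for x, y in zip(sketch_a, sketch_b))
        best = max(best, differing)
    return best

a = [0, 1, 1, 0, 1, 0, 0, 1]
b = [0, 1, 0, 0, 1, 0, 0, 1]  # one mismatch, at position 2
one_mismatch = estimate_distance(a, b, k=2, repetitions=4)
```

With a single mismatch exactly one cell differs in every repetition, so `one_mismatch` is 1 regardless of the hash. With more mismatches the estimate never exceeds the true distance.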
Alphabet
• Denote by S the size of the alphabet.
• We can encode each letter by its unary representation:
  0 - 1000000…0
  1 - 0100000…0
  …
  S-1 - 0000000…1
• The only effect is that each mistake will be counted twice, for example:
  0 - 1000000…0
  5 - 0000010…0
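A small Python sketch of the unary encoding, to make the "counted twice" remark concrete. The helper names `unary_encode` and `hamming` are illustrative assumptions.

```python
def unary_encode(text, alphabet_size):
    """Replace each symbol s by a block of alphabet_size bits with a single 1 at index s."""
    bits = []
    for symbol in text:
        block = [0] * alphabet_size
        block[symbol] = 1
        bits.extend(block)
    return bits

def hamming(x, y):
    """Number of positions where two equal-length sequences differ."""
    return sum(p != q for p, q in zip(x, y))

# One symbol mismatch (0 versus 5) becomes two bit mismatches.
doubled = hamming(unary_encode([0], 8), unary_encode([5], 8))
```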
Error correcting - $O(k^2 \log NS)$
• Here we allocate two kinds of cells: $k^2$ cells of $\log S$ bits and $k^2$ cells of $\log NS$ bits.
[Figure: example cell rows 5 8 3 2 and 15 6 7 8]
C1[h(i)] += A[i]
C2[h(i)] += i·A[i]
Error correcting - $O(k^2 \log NS)$
• As before, with probability > 1/2 no two errors fall into the same cell.
[Figure: example cell rows 5 8 3 2 and 15 6 7 8]
C1[h(i)] += A[i]
C2[h(i)] += i·A[i]
Error correcting - $O(k^2 \log NS)$
• We get from the red cells, C1[h(i)] += A[i]:
  5 8 3 2
  5 6 3 2
  8 - 6 = 5 - 3 (the mismatched symbols are 5 and 3)
Error correcting - $O(k^2 \log NS)$
• We get from the blue cells, C2[h(i)] += i·A[i]:
  15 11 7 5
  15 9 7 5
  11 - 9 = i·(5 - 3) => i = 1 (positions are numbered 0, 1, 2)
Error correcting - $O(k^2 \log NS)$
• The probability of success is about 1/2.
• To lower the failure probability we run it 3 times.
• We get a list of possible mistakes each time.
• Output all the mistakes that appear in at least 2 of the 3 runs.
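The two kinds of cells and the 2-of-3 vote can be sketched in Python as below. This is an illustration under assumptions: the function names are hypothetical, a seeded generator stands in for the hash h, positions are numbered from 0, and a mistake is reported as a pair (position, A[i] - B[i]).

```python
import random
from collections import Counter

def ec_sketch(text, k, seed):
    """C1 accumulates the symbols, C2 the symbols weighted by their position."""
    m = k * k
    rng = random.Random(seed)  # stands in for the shared hash h
    c1 = [0] * m
    c2 = [0] * m
    for i, a in enumerate(text):
        cell = rng.randrange(m)
        c1[cell] += a
        c2[cell] += i * a
    return c1, c2

def mistakes_one_run(text_a, text_b, k, seed):
    """Candidate (position, difference) pairs from one pair of sketches."""
    a1, a2 = ec_sketch(text_a, k, seed)
    b1, b2 = ec_sketch(text_b, k, seed)
    found = set()
    for x1, x2, y1, y2 in zip(a1, a2, b1, b2):
        d1, d2 = x1 - y1, x2 - y2
        # A cell holding a single mistake satisfies d2 == position * d1.
        if d1 != 0 and d2 % d1 == 0 and 0 <= d2 // d1 < len(text_a):
            found.add((d2 // d1, d1))
    return found

def mistakes(text_a, text_b, k):
    """Keep the candidates reported by at least 2 of 3 independent runs."""
    votes = Counter()
    for seed in range(3):
        votes.update(mistakes_one_run(text_a, text_b, k, seed))
    return {mistake for mistake, count in votes.items() if count >= 2}

found = mistakes([5, 1, 5, 2], [5, 1, 3, 2], k=2)
```

Here the texts differ only at position 2 (5 versus 3), so every run reports the pair (2, 2) and `found` is `{(2, 2)}`.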
$O(k \log^2 k)$ - Solution
• The idea is two-stage hashing: the first stage hashes into $k/\log k$ buckets, and w.h.p. each bucket gets $O(\log k)$ errors.
Bar-Yossef, Jayram, Kumar, Sivakumar 03
$O(k \log^2 k)$ - Solution
• Inside a bucket: $O(\log k)$ errors, $O(\log^2 k)$ cells; keep the accumulated XOR in each cell.
• The probability to fail is less than 1/2.
• Run it $2\log k$ times and take the max => failure probability less than $1/k^2$.
• Space = $O(\log^3 k)$
Bar-Yossef, Jayram, Kumar, Sivakumar 03
$O(k \log^2 k)$ - Solution
• $k/\log k$ buckets of $O(\log^3 k)$ space each: $O(k \log^2 k)$ in total.
• $P(\text{failure}) \le k/\log k \cdot 1/k^2 < 1/k$
Bar-Yossef, Jayram, Kumar, Sivakumar 03
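A rough Python sketch of the two-stage idea: positions are first spread over about k/log k buckets, and each bucket gets 2 log k small XOR sketches whose maximum is taken. The function name, the seeded generators used as hash functions, and the exact constants (exactly log²k cells, base-2 logarithms) are assumptions made for illustration. Both sketches are built in one loop only for brevity; each one depends only on its own text and the shared randomness.

```python
import math
import random

def two_stage_estimate(bits_a, bits_b, k):
    """Estimate the Hamming distance with bucket-level XOR sketches."""
    log_k = max(1, math.ceil(math.log2(k)))
    buckets = max(1, k // log_k)   # first stage: about k / log k buckets
    cells = log_k * log_k          # second stage: about log^2 k cells
    runs = 2 * log_k               # repetitions per bucket
    top = random.Random(0)         # shared first-stage hash
    bucket_of = [top.randrange(buckets) for _ in bits_a]
    total = 0
    for bucket in range(buckets):
        positions = [i for i, bk in enumerate(bucket_of) if bk == bucket]
        best = 0
        for run in range(runs):
            rng = random.Random(1 + bucket * runs + run)  # shared second-stage hash
            sketch_a = [0] * cells
            sketch_b = [0] * cells
            for i in positions:
                cell = rng.randrange(cells)
                sketch_a[cell] ^= bits_a[i]
                sketch_b[cell] ^= bits_b[i]
            differing = sum(x != y for x, y in zip(sketch_a, sketch_b))
            best = max(best, differing)  # collisions only hide mismatches
        total += best
    return total

a = [0, 1, 1, 0, 1, 0, 0, 1]
b = [0, 1, 0, 0, 1, 0, 0, 1]  # one mismatch
single = two_stage_estimate(a, b, k=4)
```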
$O(k\,2^{\log^* k} \log k)$ - Idea (recursion)
[Figure: $k/\log k$ buckets at the first level, $\log k/\log\log k$ at the next level]
• $\Pr(F) < 1/\log^c k$
• $\log k/\log\log k$ runs, take max
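Error Correcting $O(k \log NS)$
[Figure: Alice holds $T_A = a_0 a_1 \ldots a_{n-1}$, Bob holds $T_B = b_0 b_1 \ldots b_{n-1}$; both share $r_0, r_1, r_2, \ldots$]
$p = \Theta(N^3 S)$
$r_i = \begin{cases} \text{random} & \text{w.p. } 1/k \\ 0 & \text{o.w.} \end{cases}$
$\varphi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p$
$\varphi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$
$\varphi_1(T_A) - \varphi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake (constant probability)} \\ \text{random} & \text{more than one} \end{cases}$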
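Error Correcting $O(k \log NS)$
[Figure: Alice holds $T_A = a_0 a_1 \ldots a_{n-1}$, Bob holds $T_B = b_0 b_1 \ldots b_{n-1}$]
$\varphi_1(T_A) = \sum_{i=0}^{n-1} r_i a_i \bmod p$
$\varphi_1(T_B) = \sum_{i=0}^{n-1} r_i b_i \bmod p$
$\varphi_1(T_A) - \varphi_1(T_B) = \begin{cases} 0 & \text{no mistake} \\ r_j (a_j - b_j) & \text{one mistake} \\ \text{random} & \text{more than one} \end{cases}$
$\varphi'_1(T_A) = \sum_{i=0}^{n-1} i\, r_i a_i \bmod p$
$\varphi'_1(T_B) = \sum_{i=0}^{n-1} i\, r_i b_i \bmod p$
$\dfrac{\varphi'_1(T_A) - \varphi'_1(T_B)}{\varphi_1(T_A) - \varphi_1(T_B)} = \dfrac{j\, r_j (a_j - b_j)}{r_j (a_j - b_j)} = j$
If we are wrong, then w.h.p. $j > n$.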
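Error Correcting $O(k \log NS)$
[Figure: Alice holds $T_A$, Bob holds $T_B$, each of length n]
$\dfrac{\varphi'_1(T_A) - \varphi'_1(T_B)}{\varphi_1(T_A) - \varphi_1(T_B)} = j$
Knowing $j$ gives $r_j$, and hence $a_j - b_j$.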
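The pair of sums and the division that recovers j can be sketched in Python as follows. This is an illustration under assumptions: the names `sample_r`, `phi_pair` and `locate_mistake` are hypothetical, the prime $2^{31}-1$ is a stand-in for a prime of order $N^3 S$, positions run from 0 to n-1, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
import random

P = 2**31 - 1  # a prime; the slides ask for p on the order of N^3 * S

def sample_r(n, k, seed, p=P):
    """r_i is a random non-zero residue with probability 1/k, otherwise 0."""
    rng = random.Random(seed)
    return [rng.randrange(1, p) if rng.random() < 1 / k else 0 for _ in range(n)]

def phi_pair(text, r, p=P):
    """Return (sum r_i a_i, sum i r_i a_i), both modulo p."""
    plain = sum(ri * a for ri, a in zip(r, text)) % p
    weighted = sum(i * ri * a for i, (ri, a) in enumerate(zip(r, text))) % p
    return plain, weighted

def locate_mistake(pair_a, pair_b, r, n, p=P):
    """Return (j, a_j - b_j) if exactly one sampled mistake is detected, else None."""
    d = (pair_a[0] - pair_b[0]) % p
    d_weighted = (pair_a[1] - pair_b[1]) % p
    if d == 0:
        return None  # no sampled mistake
    j = d_weighted * pow(d, -1, p) % p  # modular division
    if j >= n or r[j] == 0:
        return None  # inconsistent: more than one sampled mistake
    diff = d * pow(r[j], -1, p) % p
    if diff > p // 2:
        diff -= p  # report the difference as a signed value
    return j, diff

text_a = [3, 1, 4, 1, 5, 9, 2, 6]
text_b = [3, 1, 4, 1, 5, 7, 2, 6]   # differs at position 5
r = [0, 0, 5, 0, 0, 11, 0, 0]       # fixed here so the example is reproducible
result = locate_mistake(phi_pair(text_a, r), phi_pair(text_b, r), r, len(text_a))
```

In this example the plain sums differ by 11·(9 - 7) = 22 and the weighted sums by 5·22 = 110, so `result` is `(5, 2)`.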
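Error Correcting $O(k \log NS)$
[Figure: Alice holds $T_A$, Bob holds $T_B$, each of length n]
$\varphi_1(T_A), \varphi'_1(T_A)$
$\varphi_2(T_A), \varphi'_2(T_A)$
$\vdots$
$\varphi_{ck \ln k}(T_A), \varphi'_{ck \ln k}(T_A)$
$O(k \ln k)$ pairs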
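One way to see why $O(k \ln k)$ pairs are enough is a coupon-collector style bound. The derivation below is a sketch under assumptions that are not spelled out on the slides: there are $m \le k$ mistakes, each $r_i$ is non-zero independently with probability $1/k$, and the pairs use independent randomness. The constant $c$ is the one from the slide; the bound $1/(ek)$ comes from this calculation.

```latex
% A fixed pair isolates a fixed mistake j when r_j \ne 0 and every other mistake has r_i = 0:
\Pr[\text{pair isolates } j]
  = \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)^{m-1}
  \ge \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr)^{k-1}
  \ge \frac{1}{ek}.

% Over c k \ln k independent pairs:
\Pr[j \text{ is never isolated}]
  \le \Bigl(1 - \frac{1}{ek}\Bigr)^{c k \ln k}
  \le e^{-(c/e)\ln k}
  = k^{-c/e}.

% Union bound over at most k mistakes:
\Pr[\text{some mistake is never isolated}] \le k^{\,1 - c/e}.
```

For any constant $c > e$ this tends to 0 as $k$ grows.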
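Recursion
[Figure: Alice holds $T_A$, Bob holds $T_B$, each of length n]
Stage 1: $r_i = \begin{cases} \text{random} & \text{w.p. } 1/k \\ 0 & \text{o.w.} \end{cases}$
$\varphi_1(T_A), \varphi'_1(T_A)$
$\varphi_2(T_A), \varphi'_2(T_A)$
$\vdots$
$\varphi_{ck}(T_A), \varphi'_{ck}(T_A)$
($ck$ pairs)
Stage 2: $r_i = \begin{cases} \text{random} & \text{w.p. } 2/k \\ 0 & \text{o.w.} \end{cases}$
$\varphi_1(T_A), \varphi'_1(T_A)$
$\varphi_2(T_A), \varphi'_2(T_A)$
$\vdots$
$\varphi_{ck/2}(T_A), \varphi'_{ck/2}(T_A)$
($ck/2$ pairs)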
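Recursion
[Figure: Alice holds $T_A$, Bob holds $T_B$, each of length n]
• $r_i$ random w.p. $1/k$, 0 o.w.: $ck$ pairs
• $r_i$ random w.p. $2/k$, 0 o.w.: $ck/2$ pairs
• $r_i$ random w.p. $4/k$, 0 o.w.: $ck/4$ pairs
$ck + \dfrac{ck}{2} + \dfrac{ck}{4} + \cdots \le 2ck$
$O(k \log NS)$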
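The stage schedule can be tabulated with a few lines of Python. This is only an illustration of the halving pattern: the function name `stage_schedule`, the stopping rule (stop once the sampling probability reaches 1) and the sample values of k and c are assumptions, not taken from the slides.

```python
def stage_schedule(k, c):
    """List (sampling probability, number of pairs) for each stage."""
    stages = []
    rate = 1
    while rate <= k:
        # Stage with probability rate/k uses about c*k/rate pairs.
        stages.append((rate / k, max(1, c * k // rate)))
        rate *= 2
    return stages

schedule = stage_schedule(k=8, c=4)
total_pairs = sum(pairs for _, pairs in schedule)
```

For k = 8 and c = 4 the stages use 32, 16, 8 and 4 pairs, 60 in total, which stays below 2ck = 64.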
Complexity
[Figure: $T_A$ and $T_B$, each of length n, are reduced to the sketches $S_A$ and $S_B$]
• Size: $O(k \log NS)$
• Computing sketch: $O(n \log k)$
• Comparing sketches: $O(k \log k)$
$O(k \log k)$ - Solution
• We can just encode in unary and hash the input to $k^3$ cells, and then run the $O(k \log NS) = O(k \log k)$ algorithm.
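Reed-Solomon Codes
$$(a_0, a_1, a_2, \ldots, a_{n-1}) \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & 2k \\ 1 & 2^2 & 3^2 & \cdots & (2k)^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 2^{n-1} & 3^{n-1} & \cdots & (2k)^{n-1} \end{pmatrix} = (p(1), p(2), \ldots, p(2k))$$
$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$
We managed to develop a deterministic algorithm based on that, but the encoding and the decoding are slower.
Amir, Farach 95
Feigenbaum, Ishai, Malkin, Nissim, Strauss, Wright 01
Bar-Yossef, Jayram, Kumar, Sivakumar 03
Efremenko, Porat, Rothschild 06
Efremenko, Porat 07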
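The matrix product above amounts to evaluating $p$ at the points 1, …, 2k. A minimal Python sketch follows, assuming arithmetic modulo a prime (the slide does not name the field); the name `rs_sketch` and the prime are illustrative. Decoding up to k mistakes from the 2k values is the slower part mentioned on the slide and is not shown.

```python
P = 2**31 - 1  # assumed prime modulus

def rs_sketch(text, k, p=P):
    """Evaluate p(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} at x = 1, ..., 2k."""
    sketch = []
    for x in range(1, 2 * k + 1):
        value = 0
        for coeff in reversed(text):  # Horner's rule
            value = (value * x + coeff) % p
        sketch.append(value)
    return sketch

sketch_a = rs_sketch([3, 1, 4], k=1)  # p(1) = 8, p(2) = 3 + 2 + 16 = 21
sketch_b = rs_sketch([3, 3, 4], k=1)  # differs at position 1 by d = 2
difference = [(y - x) % P for x, y in zip(sketch_a, sketch_b)]
```

The sketch is linear in the text, so a single mistake of size d at position j changes the value at x by d·x^j. Here `difference` is `[2, 4]`, and the ratio 4/2 = 2^1 points at position 1.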
File Sharing
[Figure: a source holding a file of n blocks; Napster]
The source needs to stay until someone has the whole file (and is willing to stay).
There is a bottleneck at the end.
File Sharing
[Figure: a source holding a file of n blocks; emule/kazaa/torrent]
The source has to send $n \ln n$ blocks before disconnecting.
Sometimes there are some bottlenecks.
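Improved File Sharing - Ver 1
[Figure: the source holds a file of n blocks $a_0 a_1 a_2 \ldots a_{n-1}$]
$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$, with $a_i \in F_{2^b}$
Points: $(0, p(0)), (1, p(1)), (2, p(2)), \ldots, (n^6, p(n^6))$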
Improved File Sharing - Ver 1
Each client that gets n points can recreate the file.
There is no more $n \ln n$.
Almost no bottlenecks.
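To illustrate why any n points are enough, here is a small Python sketch that evaluates the file polynomial and recovers the blocks by Lagrange interpolation. It is a sketch under assumptions: it works modulo a prime instead of the field $F_{2^b}$ used on the slides, the names `evaluate` and `recover` are hypothetical, and the cubic-time interpolation is chosen for readability, not speed.

```python
P = 2**31 - 1  # assumed prime modulus, swapped in for F_{2^b}

def evaluate(blocks, x, p=P):
    """p(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} mod p, by Horner's rule."""
    value = 0
    for coeff in reversed(blocks):
        value = (value * x + coeff) % p
    return value

def recover(points, p=P):
    """Recover a_0..a_{n-1} from n points (x, p(x)) with distinct x."""
    n = len(points)
    coeffs = [0] * n
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # coefficients of prod_{j != i} (x - xj), lowest degree first
        denom = 1    # prod_{j != i} (xi - xj)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):
                nxt[d] = (nxt[d] - xj * c) % p      # the -xj * c term
                nxt[d + 1] = (nxt[d + 1] + c) % p   # the x * c term
            basis = nxt
            denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p  # modular inverse, Python 3.8+
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % p
    return coeffs

blocks = [7, 0, 42, 5]
received = [(x, evaluate(blocks, x)) for x in (3, 10, 11, 20)]  # any 4 distinct points
restored = recover(received)
```

Any other choice of four distinct points gives the same `restored` list, which equals `blocks`.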
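Improved File Sharing - Ver 2
[Figure: the source holds a file of n blocks $a_0 a_1 a_2 \ldots a_{n-1}$, with $a_i \in F_{2^b}$]
Send linear equations on the file.
$$\begin{pmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,n-1} \\ r_{1,0} & r_{1,1} & \cdots & r_{1,n-1} \\ \vdots & \vdots & & \vdots \\ r_{n-1,0} & r_{n-1,1} & \cdots & r_{n-1,n-1} \end{pmatrix}$$
$$\Pr[\text{success}] = \left(1 - \frac{2^{b(n-1)}}{2^{bn}}\right)\left(1 - \frac{2^{b(n-2)}}{2^{bn}}\right) \cdots \left(1 - \frac{2^{b(n-i)}}{2^{bn}}\right) \cdots \left(1 - \frac{1}{2^{bn}}\right) \ge 1 - 2^{-b+1}$$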
Improved File Sharing - Ver 2
[Figure: the source holds a file of n blocks $a_0 a_1 a_2 \ldots a_{n-1}$]
Problems:
1. Heavy to encode: for each packet we need to go over the whole file.
2. Very heavy to decode: $O(n^2)$ block operations + $O(n^3)$ field operations.
Facts:
1. If you get n(1/2-ε) random combinations of two blocks, you won't have dependencies w.h.p.
2. If you have d pair combinations, you can easily reduce your system to n-d variables.
Solution: Use sparse functionals
Improved File Sharing - Ver 2
[Figure: the source holds a file of n blocks $a_0 a_1 a_2 \ldots a_{n-1}$]
Features:
1. Backward compatibility.
2. Even if you don't have the whole file you can mix functionals.