The beauty of prime numbers vs. the beauty of the random
Ely Porat
Bar-Ilan University, Israel
Outline
• Applications
• Prime Numbers Group Testing
• De-randomized approach for group testing
• Applications getting into details
• Length Reduction
Pattern Matching
• Given a text T and a pattern P, the problem is to find all substrings of T that are equal to P.
T=
P=
Streaming Model
• The characters of T arrive one by one
• We can’t save T
T=
P=
Our goal is to do that without saving P
Φ(P)
Automata?
Hamming distance with wildcards
• Find a pattern in a text with 2 complications:
  – Don’t cares (wildcards Ø)
  – Mismatches
Text:
Pattern:
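To make the problem statement concrete, here is a minimal brute-force sketch of the quantity being computed: the number of mismatches at every alignment, where a wildcard on either side matches anything. The function name `hamming_with_wildcards` and the sample strings are illustrative assumptions, and this O(nm) loop is only a definition of the output, not one of the algorithms from the talk.

```python
WILDCARD = "Ø"  # don't-care symbol, written as on the slide


def hamming_with_wildcards(text, pattern, wildcard=WILDCARD):
    """Return the mismatch count of the pattern at every alignment of the text.

    A position counts as a mismatch only if neither symbol is a wildcard
    and the two symbols differ.
    """
    m = len(pattern)
    distances = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        mismatches = sum(
            1
            for t, p in zip(window, pattern)
            if t != wildcard and p != wildcard and t != p
        )
        distances.append(mismatches)
    return distances


# Hypothetical example: the middle pattern symbol is a don't-care.
print(hamming_with_wildcards("abcabd", "aØd"))
```

An alignment is reported as a k-mismatch occurrence when its entry in the returned list is at most k.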
Summary of results
• Offline
  – $O(nk \log^2 m)$: Hamming distance with wildcards
• Online Pattern Matching
  – $O(\sqrt{k \log k}\,\log m)$: Hamming distance
  – $O(k \log^2 m)$: Hamming distance with wildcards
  – $O(k \log m)$: Edit distance
• Streaming
  – $O(\log^2 m)$ space, $O(\log m)$ time: Exact match
  – $O(k^3 \log^5 m)$ space, $O(k^2 \log^2 m)$ time: Hamming distance
Open problem
• Online convolution in $o(\log^2 m)$ time per symbol.
• Offline is done by FFT in $O(n \log m)$.
Example with m=5:
t1 t2 t3 t4 t5 t6 . . . tn
p1 p2 p3 p4 p5 → t1p1+t2p2+…+t5p5
   p1 p2 p3 p4 p5 → t2p1+t3p2+…+t6p5
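The sliding inner products in the picture can be sketched directly. The code below is a naive O(nm) baseline written only to pin down what "convolution" means here; it is not the FFT method nor an online algorithm, and the function name and sample values are illustrative assumptions.

```python
def sliding_dot_products(text, pattern):
    """Inner product of the pattern with every length-m window of the text.

    Entry i equals t[i]*p[0] + t[i+1]*p[1] + ... + t[i+m-1]*p[m-1]
    (0-indexed), matching the sums shown on the slide.
    """
    m = len(pattern)
    return [
        sum(text[i + j] * pattern[j] for j in range(m))
        for i in range(len(text) - m + 1)
    ]


# Hypothetical numeric example with m = 5 and n = 6.
print(sliding_dot_products([1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1]))
```

In the online setting the i-th value has to be reported as soon as the last symbol of its window arrives, which is what makes the open problem harder than the offline FFT computation.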
Problem Definition
• m people
• At most k are sick
• Query: Is someone in this set sick?
• Goal: identify the sick people using only a few tests
• Non-adaptive
Motivations
• Syphilis, HIV [Dor43]
• Mapping genomes [BLC91, BBK+95, TJP00]
• Quality control in product testing [SG59]
• Searching files in storage systems [KS64]
• Sequential screening of experimental variables [Li62]
• Efficient contention resolution algorithms for multiple access communication [KS64, Wol85]
• Data compression [HL00]
• Software testing [BG02, CDFP97]
• DNA sequencing [PL94]
• Molecular biology [DH00, FKKM97, ND00, BBKT96]
Background
• Same conditions:
  – Deterministic [KS64]
  – Random [KS64]
  – Heavy deterministic [AMS06]
• Lower bound:
  – [CR96]
• Relaxed conditions:
  – Fully adaptive
  – Two staged group testing and selectors [CGR00, Kni95, BGV03, CMS01, BV03, BGV05]
  – Optimal monotone encoding [AH08]
• Similar problems:
  – Inhibitors [FKKM97, Dam98, BV98, BGV03]
  – Bayesian case [Kni95, BL02, BL03, A.J98, BGV03]
  – Errors [BGV98]
• DIMACS 2006

Scheme size:
• Deterministic: $O(k^2 \log^2_{k \ln n} n)$
• Random and heavy deterministic: $O(k^2 \ln n)$
• Lower bound: $\Omega(k^2 \log_k n)$
Our Results
• Deterministic
• Size: $O(k^2 \ln n)$
• Fast construction: $O(nk \ln n)$
Prime Numbers Group Testing
• Tests: $T_{i,p} = \{x \in U \mid x \bmod p = i\}$
• Scheme: $\{T_{0,p_1}, T_{1,p_1}, \ldots, T_{p_1-1,p_1}, \ldots, T_{0,p_r}, T_{1,p_r}, \ldots, T_{p_r-1,p_r}\}$
• Primes: $p_1 p_2 \cdots p_r > n^k$
• $x_1, x_2, \ldots, x_k$: positions of the sick people
• Bad event: there exists a y such that $\forall i\ \exists j:\ y \bmod p_i = x_j \bmod p_i$
• In the bad event there is a dot below each prime, where a dot in row $x_j$ below $p_i$ means $y \bmod p_i = x_j \bmod p_i$.
• So there exists an $x_i$ whose dotted primes satisfy $p_{i_1} p_{i_2} \cdots p_{i_d} > n$, with $y \bmod p_{i_j} = x_i \bmod p_{i_j}$ for every j.
• By CRT, $x_i = y$.
• This gives a group testing scheme of size $p_1 + p_2 + \cdots + p_r$.
• By choosing good enough primes we get $O(k^2 \log^2 m)$.
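The construction above can be checked on a small instance. The following is a minimal sketch under stated assumptions: people are numbered 0..n-1, the primes are simply the first primes whose product exceeds n^k (the slides only say "good enough primes"), and the helper names `primes_with_product_above`, `run_tests` and `decode` are made up for illustration.

```python
from itertools import combinations


def primes_with_product_above(bound):
    """First primes 2, 3, 5, ... whose product exceeds `bound`."""
    primes, product, candidate = [], 1, 2
    while product <= bound:
        # `primes` holds every prime below `candidate`, so trial division works.
        if all(candidate % p for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def run_tests(sick, primes):
    """Positive tests: T_{i,p} is positive iff some sick x has x mod p == i."""
    return {(x % p, p) for x in sick for p in primes}


def decode(positive, primes, n):
    """Declare y sick iff every test containing y came back positive."""
    return [y for y in range(n) if all((y % p, p) in positive for p in primes)]


n, k = 100, 2
primes = primes_with_product_above(n ** k)
num_tests = sum(primes)  # scheme size p_1 + p_2 + ... + p_r

# Exhaustively check every sick set of size at most k.
all_correct = all(
    decode(run_tests(sick, primes), primes, n) == list(sick)
    for size in range(k + 1)
    for sick in combinations(range(n), size)
)
print(primes, num_tests, all_correct)
```

The exhaustive check succeeds for the reason given in the bad-event argument: a healthy y that passed every test would agree with some sick x on primes whose product exceeds n, which forces y = x.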
Randomized Group Testing
• Just choose $O(k^2 \log n)$ random sets of size n/k.
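A small simulation can illustrate the randomized scheme. This is a sketch under assumptions: the constant 3 in the number of tests, the fixed seed, and the helper names are all illustrative choices, and the decoder is the standard one that clears everybody who appears in a negative test.

```python
import math
import random


def random_scheme(n, k, num_tests, rng):
    """Pick `num_tests` random subsets of {0..n-1}, each of size n/k."""
    size = max(1, n // k)
    return [set(rng.sample(range(n), size)) for _ in range(num_tests)]


def decode(scheme, outcomes, n):
    """Everyone who appears in some negative test is healthy; return the rest."""
    cleared = set()
    for test, positive in zip(scheme, outcomes):
        if not positive:
            cleared |= test
    return set(range(n)) - cleared


rng = random.Random(0)  # fixed seed only to make the run repeatable
n, k = 200, 3
num_tests = 3 * k * k * math.ceil(math.log(n))  # constant 3 is an assumption
scheme = random_scheme(n, k, num_tests, rng)

sick = set(rng.sample(range(n), k))
outcomes = [bool(test & sick) for test in scheme]
candidates = decode(scheme, outcomes, n)
print(len(scheme), sorted(sick), sorted(candidates))
```

The decoder never misses a sick person, since a sick person cannot appear in a negative test. Whether any healthy person survives as a false candidate depends on the random choice, which is the part the derandomization replaces with a guarantee.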
Overall derandomization plan
Error correction codes
• $(m, Rm, \delta m)_q$-ECC, $|\Sigma| = q$
• Length of words = $m$
• Number of words = $q^{Rm}$
• Distance = $\delta m$
• Rate = $R$
• Relative distance = $\delta$
• Linear code: $[m, Rm, \delta m]_q$-LC
Good random linear error correction codes
• GV bound: There exists an $(m, Rm, \delta m)_q$-ECC with $R \ge 1 - H_q(\delta) - o(1)$, where
  $H_q(p) = p \log_q \frac{q-1}{p} + (1-p) \log_q \frac{1}{1-p}$
• Linear codes: faster construction
• Algorithm: Pick the entries of the generating matrix uniformly and independently.
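The q-ary entropy function in the GV bound is easy to evaluate numerically. The sketch below ignores the o(1) term, and the function names `q_ary_entropy` and `gv_rate` are illustrative assumptions.

```python
import math


def q_ary_entropy(p, q):
    """H_q(p) = p*log_q((q-1)/p) + (1-p)*log_q(1/(1-p)), for 0 <= p <= 1."""
    if p == 0:
        return 0.0  # limit value; avoids division by zero
    if p == 1:
        return math.log(q - 1, q)  # the second term vanishes in the limit
    return p * math.log((q - 1) / p, q) + (1 - p) * math.log(1 / (1 - p), q)


def gv_rate(delta, q):
    """Rate promised by the GV bound, ignoring the o(1) term."""
    return 1 - q_ary_entropy(delta, q)


# Hypothetical parameters: relative distance 0.1 over a binary alphabet.
print(gv_rate(0.1, 2))
```

The rate drops to zero at relative distance 1 - 1/q, where $H_q$ reaches 1, so the bound is only informative for $\delta < 1 - 1/q$.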
Method of conditional probabilities
• Algorithm: Pick the entries of the generating matrix one by one.
• In each step minimize the expected number of collisions between code words.
Reduction from Error correction codes to group testing schemes

C = $[3,2,2]_3$-RS:
1: 0 0 0
2: 1 1 1
3: 2 2 2
4: 0 1 2
5: 1 2 0
6: 2 0 1
7: 0 2 1
8: 2 1 0
9: 1 0 2

GT scheme:
{1,4,7} {2,5,9} {3,6,8}
{1,6,9} {2,4,8} {3,5,7}
{1,5,8} {2,6,7} {3,4,9}
Why should it work?
• Theorem: Let C be an $(m, \log_q n, \delta m)_q$-ECC. Then F(C) is a group testing scheme for n people with up to $\lceil \frac{1}{1-\delta} \rceil - 1$ sick people.
• Example: C = $[3,2,2]_3$-RS has $\delta = 2/3$, so the GT scheme above handles up to 2 sick people.
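The reduction can be replayed on the $[3,2,2]_3$-RS example. This is a minimal sketch: one test per (position, symbol) pair, containing the people whose codeword has that symbol at that position. The names `scheme_from_code` and `decode` are illustrative, and the decoder (clear everyone who appears in a negative test) is an assumption about how the scheme is read out.

```python
from itertools import combinations

# Codewords of the [3,2,2]_3 code from the slide, keyed by person number.
CODEWORDS = {
    1: (0, 0, 0), 2: (1, 1, 1), 3: (2, 2, 2),
    4: (0, 1, 2), 5: (1, 2, 0), 6: (2, 0, 1),
    7: (0, 2, 1), 8: (2, 1, 0), 9: (1, 0, 2),
}


def scheme_from_code(codewords, q):
    """F(C): for each position and symbol, test the people with that symbol there."""
    m = len(next(iter(codewords.values())))
    return [
        {person for person, word in codewords.items() if word[pos] == symbol}
        for pos in range(m)
        for symbol in range(q)
    ]


def decode(scheme, sick):
    """Run all tests on the sick set and clear everyone in a negative test."""
    people = set().union(*scheme)
    cleared = set()
    for test in scheme:
        if not test & sick:  # negative test: nobody in it is sick
            cleared |= test
    return people - cleared


scheme = scheme_from_code(CODEWORDS, q=3)

# Check every sick set with at most 2 people.
all_correct = all(
    decode(scheme, set(sick)) == set(sick)
    for size in range(3)
    for sick in combinations(CODEWORDS, size)
)
print(scheme, all_correct)
```

With 3 sick people the guarantee no longer holds for this code: for instance, if people 1, 2 and 3 are sick, every test is positive and nobody can be cleared.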
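Why should it work? Proof
• A codeword representing a healthy person agrees with each codeword representing a sick person in at most $(1-\delta)m$ positions.
• Worst case: the k codewords representing sick people cover $k(1-\delta)m < m$ positions of the healthy codeword, so at least one test containing the healthy person is negative.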
What we got?
Scheme size:
• Deterministic: $O(k^2 \log^2_{k \ln n} n)$
• Random and heavy deterministic: $O(k^2 \ln n)$
• Lower bound: $\Omega(k^2 \log_k n)$
Applications getting into details
• Streaming
• Up to 1 mismatch:
  – Assume we have a black box for searching for exact match.

$P$: p1 p2 p3 p4 p5 … pm
$P_{1,2}$: p1 p3 p5 … pm
$P_{2,2}$: p2 p4 …

• If both $P_{1,2}$ and $P_{2,2}$ fail to match, there is more than one mistake.
• The other way around isn’t true.
Streaming: Up to 1 mismatch
$P$: p1 p2 p3 p4 p5 … pm
$P_{1,2}$: p1 p3 p5 … pm
$P_{2,2}$: p2 p4 …
$P_{1,3}$: p1 p4 … pm
$P_{2,3}$: p2 p5 …
$P_{3,3}$: p3 …
…
$P_{q,q}$
$2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdots q > m$
With the CRT we are able to find the position of the mismatch.
In order to support more mistakes, we add the prime numbers group testing on top of that.
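The single-mismatch idea can be sketched offline on one alignment. The assumptions are stated plainly: plain subsequence equality stands in for the exact-match black box, positions are 0-indexed (so the slide's $P_{r,q}$ corresponds to residue r-1), and the function names are made up for illustration. This is not the streaming algorithm itself, only the CRT step.

```python
def primes_with_product_above(bound):
    """First primes 2, 3, 5, ... whose product exceeds `bound`."""
    primes, product, candidate = [], 1, 2
    while product <= bound:
        if all(candidate % p for p in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def locate_single_mismatch(pattern, window):
    """Return the position of the only mismatch, or None on an exact match.

    For each prime q, the subpatterns pattern[r::q] are compared with the
    matching subsequences of the window; the one failing residue r gives
    position mod q, and the CRT combines the residues.
    """
    m = len(pattern)
    position, modulus = 0, 1
    for q in primes_with_product_above(m):
        failing = [r for r in range(q) if pattern[r::q] != window[r::q]]
        if not failing:
            return None  # every subpattern matches: exact match
        if len(failing) > 1:
            raise ValueError("more than one mismatch")
        # Incremental CRT: adjust position until it has the right residue mod q.
        while position % q != failing[0]:
            position += modulus
        modulus *= q
    return position


# Hypothetical example: one substituted character at index 7.
print(locate_single_mismatch("abracadabra", "abracadxbra"))
```

Two distinct mismatch positions below m cannot share a residue modulo every prime, because the product of the primes exceeds m, so the "more than one mismatch" case is always detected by some prime.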