Ely Porat Bar-Ilan University Group Testing and New Algorithmic Applications

Page 1: Group Testing and New Algorithmic Applications

Ely Porat

Bar-Ilan University


Page 2: Group Testing and New Algorithmic Applications

Theory of Big data

Pattern matching

Game theory

Coding theory

Compressive sensing

Group testing

Distributed

Page 3: Group Testing and New Algorithmic Applications

Bloom filters

Theory of Big data

Succinct data structures

Streaming algorithms

Sketching & LSH

Big Databases

Page 4: Group Testing and New Algorithmic Applications

Group Testing Overview

Test a soldier for a disease

WWII example: syphilis

Page 5: Group Testing and New Algorithmic Applications

Group Testing Overview

Test an army for a disease

WWII example: syphilis

What if only one soldier has the disease?

We can pool blood samples and check whether at least one soldier has the disease.
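To make the pooling idea concrete for the single-sick-soldier case, here is a minimal Python sketch of adaptive binary splitting. The oracle `test(group)` is a hypothetical stand-in for a pooled blood test (True if anyone in the group is sick), and the sketch assumes exactly one soldier is sick; with n soldiers it uses about ⌈log2 n⌉ pooled tests instead of n individual ones.

```python
def find_single_sick(n, test):
    """Adaptive binary splitting, assuming exactly one of n soldiers is sick.

    `test(group)` is a hypothetical pooled-test oracle: it returns True
    if at least one soldier in `group` is sick.
    """
    lo, hi = 0, n  # invariant: the sick soldier is in range(lo, hi)
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if test(range(lo, mid)):  # pool the left half into one test
            hi = mid
        else:
            lo = mid
    return lo, tests


sick = 37
soldier, used = find_single_sick(100, lambda group: sick in group)
print(soldier)  # 37
```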

Page 6: Group Testing and New Algorithmic Applications

More Motivations

• Syphilis, HIV [Dor43]
• Mapping genomes [BLC91, BBK+95, TJP00]
• Quality control in product testing [SG59]
• Searching files in storage systems [KS64]
• Sequential screening of experimental variables [Li62]
• Efficient contention resolution algorithms for multiple-access communication [KS64, Wol85]
• Data compression [HL00]
• Software testing [BG02, CDFP97]
• DNA sequencing [PL94]
• Molecular biology [DH00, FKKM97, ND00, BBKT96]

Page 7: Group Testing and New Algorithmic Applications

Adaptive group testing

Number of sick soldiers: d ≤ 2

Page 8: Group Testing and New Algorithmic Applications

Adaptive general case

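Number of sick soldiers: at most d

Split the n soldiers into 2d groups and test each group. At most d groups are positive, so at most n/2 soldiers remain.

Run in recursion.

Total number of tests: $O(d\log(n/d))$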
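A minimal sketch of the recursive scheme above, assuming at most d ≥ 1 sick soldiers and a hypothetical pooled-test oracle `test(group)`. Each round splits the surviving candidates into at most 2d groups and keeps only the positive groups; once at most 2d candidates remain, they are tested one by one.

```python
import math


def adaptive_group_test(n, d, test):
    """Halving scheme for at most d sick soldiers (d >= 1).

    `test(group)` is a hypothetical oracle returning True iff some
    soldier in `group` is sick. Returns (sick soldiers, tests used).
    """
    candidates = list(range(n))
    tests = 0
    while len(candidates) > 2 * d:
        size = math.ceil(len(candidates) / (2 * d))  # gives at most 2d groups
        groups = [candidates[i:i + size] for i in range(0, len(candidates), size)]
        survivors = []
        for g in groups:
            tests += 1
            if test(g):  # at most d groups can be positive
                survivors.extend(g)
        candidates = survivors
    # At most 2d soldiers left: test each one individually.
    sick = []
    for c in candidates:
        tests += 1
        if test([c]):
            sick.append(c)
    return sick, tests


sick_set = {3, 500, 777}
found, used = adaptive_group_test(1000, 3, lambda g: any(x in sick_set for x in g))
print(found)  # [3, 500, 777]
```

Each round spends at most 2d tests and roughly halves the candidate set, which is where the $O(d\log(n/d))$ total comes from.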

Page 9: Group Testing and New Algorithmic Applications

Non adaptive group testing

• All the tests are set in advance: a t × n test matrix (t tests over n soldiers).

Page 10: Group Testing and New Algorithmic Applications

Non adaptive group testing

[Figure: a t × n binary test matrix multiplied by a 0/1 vector x over n = 12 soldiers, two of them sick, giving the observed test results 110101.]

(AND, OR) matrix-vector multiplication
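The (AND, OR) product can be written in a few lines of Python. The 3 × 4 matrix below is a made-up toy example, not the one in the figure: test i is positive exactly when some sick soldier j has M[i][j] = 1.

```python
def boolean_matvec(M, x):
    """(AND, OR) product: r[i] = OR over j of (M[i][j] AND x[j])."""
    return [int(any(m and xj for m, xj in zip(row, x))) for row in M]


M = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
x = [0, 1, 0, 0]  # soldier 1 (0-based) is sick
print(boolean_matvec(M, x))  # [0, 1, 0]
```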

Page 11: Group Testing and New Algorithmic Applications

Non adaptive group testing

[Figure: the t × n test matrix (rows 1, …, t; columns 1, …, n), to be designed, multiplied by the unknown vector (x1, …, xn), gives the observed results (r1, …, rt).]

Upper bound: $t = O(d^2\log n)$ [PR08]

Lower bound: $t = \Omega(d^2\log_d n)$ [DR82]
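One standard sufficient condition for non-adaptive decoding of up to d sick soldiers is that the test matrix be d-disjunct: no column is covered by the union (OR) of any d other columns. The brute-force checker below is only a sketch for tiny matrices (it is exponential in d), and the name `is_d_disjunct` is hypothetical.

```python
from itertools import combinations


def is_d_disjunct(M, d):
    """Brute force: True iff no column of M is covered by the OR of any d other columns."""
    t, n = len(M), len(M[0])
    # Represent each column as the set of tests (rows) it participates in.
    cols = [frozenset(i for i in range(t) if M[i][j]) for j in range(n)]
    for j in range(n):
        others = [k for k in range(n) if k != j]
        # Checking groups of exactly d suffices: unions only grow with more columns.
        for group in combinations(others, min(d, len(others))):
            union = frozenset().union(*(cols[k] for k in group))
            if cols[j] <= union:
                return False
    return True


identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(is_d_disjunct(identity, 2))                   # True
print(is_d_disjunct([[1, 1, 0], [0, 1, 1]], 1))     # False: column 0 lies inside column 1
```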

Page 12: Group Testing and New Algorithmic Applications

Non adaptive group testing

Page 13: Group Testing and New Algorithmic Applications

2-Stage group testing

Page 14: Group Testing and New Algorithmic Applications

2-Stage group testing

We misclassified 2 soldiers.

Using $O(d\log(n/d))$ measurements, we misclassify only O(d) soldiers, which we can easily test one by one in a second stage.

This follows from a property of unbalanced expanders.
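A minimal Python sketch of the two-stage idea. As an assumption, stage 1 here uses random pools (each soldier joins each pool with probability `p`) rather than the unbalanced-expander construction mentioned above; any soldier who appears in a negative pool is cleared, and stage 2 tests the remaining candidates individually. The function name `two_stage` is hypothetical, and the set `sick` is used only to simulate test outcomes.

```python
import random


def two_stage(n, sick, t, p, seed=0):
    """Stage 1: t random pools fixed in advance. Stage 2: individual tests."""
    rng = random.Random(seed)
    pools = [[j for j in range(n) if rng.random() < p] for _ in range(t)]
    # Day 1: all pooled tests run in parallel.
    results = [any(j in sick for j in pool) for pool in pools]
    candidates = set(range(n))
    for pool, positive in zip(pools, results):
        if not positive:
            candidates.difference_update(pool)  # everyone in a negative pool is healthy
    # Day 2: test each remaining candidate on its own.
    found = {c for c in candidates if c in sick}
    return found, len(candidates)


sick = {5, 123, 456, 789}
found, n_candidates = two_stage(1000, sick, t=60, p=0.2)
print(sorted(found))  # [5, 123, 456, 789]
print(n_candidates)   # soldiers left for the second day
```

Because a sick soldier never appears in a negative pool, the candidate set always contains every sick soldier, so the second stage is exact; the design only affects how many candidates are left for day 2.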

Page 15: Group Testing and New Algorithmic Applications

Adaptive vs non-adaptive

If one test takes a day to perform, adaptive testing might take a month.

2-stage group testing takes only 2 days.

Store less to be checked later.

Page 16: Group Testing and New Algorithmic Applications

Group testing for Pattern Matching

Text: length n

Pattern: length m

Page 17: Group Testing and New Algorithmic Applications

Group testing for Pattern Matching

Part of a 20M€ consortium project supported by the MOI (cyber security).

Page 18: Group Testing and New Algorithmic Applications

Motivation…

• Stock market

Page 19: Group Testing and New Algorithmic Applications

Motivation…

• Espionage

The rest we monitor

Page 20: Group Testing and New Algorithmic Applications

Motivation…

• Viruses and malware

Software solutions: Snort 73.5 Mb, ClamAV 1.48 Gb

Using TCAMs: Snort 680 Kb, ClamAV 25 Mb

Our solution (software): Snort 51 Kb, ClamAV 216 Kb

Page 21: Group Testing and New Algorithmic Applications

Group testing for Pattern Matching

Text: length n

Pattern: length m

• Pattern matching with wildcards: $O(n\log m)$ [CH02]

• Up to k mismatches [CEPR07, CEPR09]

• Sketching Hamming distance [PL07, AGGP13]

• Pattern matching in the streaming model [PP09]
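For the wildcard item above, one standard algebraic formulation (not necessarily the one in [CH02]) maps wildcards to 0 and every other character to a positive integer, and reports a match at position i when $\sum_j p_j t_{i+j}(p_j - t_{i+j})^2 = 0$: every term is non-negative, and it is zero only if one side is a wildcard or the characters are equal. The sketch computes the three correlations in that sum naively in O(nm) time; computing them with FFTs is what brings the cost down to $O(n\log m)$. The `?` wildcard symbol and the name `wildcard_match` are assumptions.

```python
def wildcard_match(text, pattern, wildcard="?"):
    """Positions i where pattern matches text[i:i+m]; `wildcard` matches anything on either side."""

    def enc(c):
        return 0 if c == wildcard else ord(c)  # non-wildcard chars map to positive ints

    t = [enc(c) for c in text]
    p = [enc(c) for c in pattern]
    n, m = len(t), len(p)
    hits = []
    for i in range(n - m + 1):
        # p*t*(p - t)^2 = p^3 t - 2 p^2 t^2 + p t^3; each term is a correlation
        # that an FFT could compute for all i at once.
        s = sum(pj**3 * t[i + j] - 2 * pj**2 * t[i + j]**2 + pj * t[i + j]**3
                for j, pj in enumerate(p))
        if s == 0:
            hits.append(i)
    return hits


print(wildcard_match("abcabd", "ab?"))  # [0, 3]
```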

Page 22: Group Testing and New Algorithmic Applications

Group testing for Pattern Matching

Text:

Pattern:

• Up to k mismatches using group testing

Group testing scheme

Performing the tests is easy. However, how can we analyze the results?

Page 23: Group Testing and New Algorithmic Applications

Fast Decoding

The naïve decoding takes O(nt) time.
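The naive O(nt) decoder can be sketched as follows: a soldier is declared sick unless he appears in some negative test. For a d-disjunct matrix and at most d sick soldiers this recovers the sick set exactly. The 4 × 4 matrix is a toy 1-disjunct example chosen for illustration, and `naive_decode` is a hypothetical name.

```python
def naive_decode(M, r):
    """O(n*t): soldier j is sick iff every test that contains j came back positive."""
    t, n = len(M), len(M[0])
    return [j for j in range(n) if all(r[i] for i in range(t) if M[i][j])]


# Toy 1-disjunct design: every column has two 1s and no column contains another.
M = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
r = [0, 1, 1, 0]  # results when only soldier 2 is sick
print(naive_decode(M, r))  # [2]
```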

Page 24: Group Testing and New Algorithmic Applications

Fast Decoding

We perform 3 GT schemes:

1. The original.
2. First projection.
3. Second projection.

Page 25: Group Testing and New Algorithmic Applications

Fast Decoding

We first decode the projections. Then we check the $d^2$ candidate combinations naively.

In [NPR11] we managed to obtain a scheme with an optimal number of measurements and decoding time $O(d^2\log^2 n)$ (using recursion and 2-stage GT).

If we use the 2-stage GT scheme, we will have $4d^2$ candidates to check.
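A toy sketch of decoding via two projections, under stated assumptions: soldiers are indexed i = a·s + b with n = s², the hypothetical projected tests `test_high(a)` and `test_low(b)` report whether some sick soldier has high part a or low part b (each projection here simply tests every value instead of using a real GT scheme), and the original scheme is the identity matrix. The point is only the structure: decode two problems of size √n, form at most d² candidates, then check each candidate naively against the original scheme.

```python
def projection_decode(M, r, s, test_high, test_low):
    """Decode both projections, then check the |A| * |B| <= d^2 candidates naively."""
    A = [a for a in range(s) if test_high(a)]       # first projection
    B = [b for b in range(s) if test_low(b)]        # second projection
    candidates = [a * s + b for a in A for b in B]  # at most d*d options
    t = len(M)
    # Naive check against the original scheme, but only for the candidates.
    return [c for c in candidates if all(r[i] for i in range(t) if M[i][c])]


n, s = 16, 4
sick = {5, 10}  # 5 = (1, 1) and 10 = (2, 2); the fake candidates 6 and 9 get rejected
M = [[int(i == j) for j in range(n)] for i in range(n)]  # toy original scheme
r = [int(any(M[i][j] for j in sick)) for i in range(n)]
found = projection_decode(M, r, s,
                          lambda a: any(j // s == a for j in sick),
                          lambda b: any(j % s == b for j in sick))
print(found)  # [5, 10]
```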

Page 26: Group Testing and New Algorithmic Applications

Faster Decoding

According to the Loomis-Whitney (LW) theorem, the number of candidates in the join is at most $d^{1.5}$. In [NPRR12] we show how to compute the join in optimal time (Best Paper Award).

This gives a scheme with an optimal number of measurements that can be decoded in time $O(d^{1+\varepsilon}\,\mathrm{poly}(\log n))$.

Page 27: Group Testing and New Algorithmic Applications

Compressive Sensing

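[Figure: as in group testing, a t × n matrix is applied to the n-dimensional signal, but the measurements are sums: 2, 2, 0, 1, 0, 1.]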

Page 28: Group Testing and New Algorithmic Applications

Compressive Sensing

[Figure: the same binary t × n matrix and the same 0/1 vector x (n = 12, two nonzero entries); with ordinary matrix-vector multiplication the measurements are 220101, i.e. (2, 2, 0, 1, 0, 1).]

Page 29: Group Testing and New Algorithmic Applications

Compressive Sensing

[Figure: the same matrix applied to an approximately sparse vector x (large entries 13.7, 5.8 and 7.3; all other entries between 0.1 and 0.3), giving noisy measurements 13.9, 0.7, 6.4, 1.0, 8.2.]

Page 30: Group Testing and New Algorithmic Applications

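Compressive Sensing

Problem definition

Find a matrix $\Phi$ and an algorithm $A$ such that for every $x \in \mathbb{R}^n$, with measurements $y = \Phi x$ and output $x^* = A(y)$:

$$\|x - x^*\|_p \le C\,\|x - x_d\|_q, \qquad x_d = \arg\min_{x' :\, |\mathrm{support}(x')| \le d} \|x - x'\|_q$$

In [PS12] we gave the first scheme with an optimal number of measurements and sublinear decoding time, for $p = q = 1$. In [GLPS09, GNPRS13] we gave a randomized (for-each) solution for $p = q = 2$ with sublinear decoding time.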
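To make the guarantee concrete, the following sketch computes the best d-term approximation $x_d$ (keep the d largest-magnitude entries, which minimizes $\|x - x'\|_q$ over d-sparse $x'$) and checks whether a decoder output `x_star` satisfies $\|x - x^*\|_p \le C\,\|x - x_d\|_q$. The vector is the approximately sparse example from the figure above; all function names are hypothetical.

```python
def best_d_term(x, d):
    """x_d: keep the d largest-magnitude entries of x, zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:d])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def lp_norm(v, p):
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)


def tail_error(x, d, q):
    """||x - x_d||_q, the benchmark on the right-hand side of the guarantee."""
    xd = best_d_term(x, d)
    return lp_norm([a - b for a, b in zip(x, xd)], q)


def meets_guarantee(x, x_star, d, p, q, C):
    """Does ||x - x*||_p <= C * ||x - x_d||_q hold for this output?"""
    err = lp_norm([a - b for a, b in zip(x, x_star)], p)
    return err <= C * tail_error(x, d, q)


x = [13.7, 0.1, 0.2, 0.1, 5.8, 0.1, 0.3, 0.1, 0.2, 0.1, 7.3, 0.1, 0.2]
print(round(tail_error(x, 3, 1), 6))  # 1.5: the mass outside the 3 large entries
```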

Page 31: Group Testing and New Algorithmic Applications

How Compressive Sensing Helps Massive Recommender Systems

• Consider designing a recommender system for web pages:
  – The time a user spends examining a page is an implicit rating
  – Millions of users
  – Each user examines thousands of pages throughout the year
  – Hard to store and process the information

Page 32: Group Testing and New Algorithmic Applications

Fingerprint Based Approach

[Figure: users a1, a2, …, an with data C1, C2, …, Cn and fingerprints F1, F2, …, Fn.]

Similarity(ai, aj) ...

Page 33: Group Testing and New Algorithmic Applications

Sampling Approach

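User a1: C1 = {a, c, d, f, h, l, m, n, p, r, s, t}; sample: {c, l, t}

User a2: C2 = {a, b, c, f, h, l, m, n, o, p, r, s}; sample: {f, m, s}

Regular sampling doesn't work: C1 and C2 share 10 of their 14 distinct elements, yet the two samples have no element in common.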

Page 34: Group Testing and New Algorithmic Applications

Minwise hashing approach

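The same hash function h is applied to both users' sets:

User a1: {a, c, d, f, h, l, m, n, p, r, s, t} → h(x): 5, 3, 7, 9, 2, 8

User a2: {a, b, c, f, h, l, m, n, o, p, r, s} → h(x): 5, 4, 3, 7, 2, 8

[BHP09, BPR09, BP10, FPS11, FPS12, T13]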

Page 35: Group Testing and New Algorithmic Applications

Min wise hash function

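For sets A and B and a min-wise independent hash function h:

$$\Pr_h\left[\arg\min_{x \in A \cup B} h(x) = \arg\min_{x \in A \cap B} h(x)\right] = \frac{|A \cap B|}{|A \cup B|}$$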

Page 36: Group Testing and New Algorithmic Applications

Min wise hash function


Page 37: Group Testing and New Algorithmic Applications

Similarity

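The Jaccard similarity $J = |A \cap B| / |A \cup B|$ is estimated by the fraction of hash functions on which the minima of A and B coincide. With $O(\varepsilon^{-2}\log(1/\delta))$ independent min-wise independent hash functions, we get a $\pm\varepsilon$ approximation with probability $1-\delta$.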
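A minimal sketch of the similarity estimate, assuming random permutations of a small universe as the min-wise independent hash functions (a uniformly random permutation is exactly min-wise independent). The sets are C1 and C2 from the sampling example, and the estimate is the fraction of the k permutations on which the two minima coincide; the function names are hypothetical.

```python
import random


def minhash_sketch(s, perms):
    """One minimum rank per permutation; perm maps element -> rank."""
    return [min(perm[x] for x in s) for perm in perms]


def estimate_jaccard(sa, sb):
    return sum(a == b for a, b in zip(sa, sb)) / len(sa)


universe = "abcdefghijklmnopqrstuvwxyz"
rng = random.Random(1)
k = 400  # number of hash functions
perms = []
for _ in range(k):
    order = list(universe)
    rng.shuffle(order)
    perms.append({c: rank for rank, c in enumerate(order)})

A = set("acdfhlmnprst")  # C1
B = set("abcfhlmnoprs")  # C2
print(len(A & B) / len(A | B))  # exact J = 10/14, about 0.714
print(estimate_jaccard(minhash_sketch(A, perms), minhash_sketch(B, perms)))
```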

Page 38: Group Testing and New Algorithmic Applications

Reducing sketching space [BP10]

Instead of storing each minimum hash value in full, we apply an additional pairwise-independent hash that maps it to a single bit.

This was discovered independently by Ping Li and Christian König.

Page 39: Group Testing and New Algorithmic Applications

Reducing sketching space [BP10]

Our algorithm estimates
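The estimator itself is not legible in the source. Below is a sketch of one natural 1-bit variant, stated as an assumption: each minimum is mapped to a single random bit, so the two bits agree with probability p = J + (1 − J)/2 = (1 + J)/2, and J is estimated as 2p − 1. This relation is consistent with the implication J > 1 − t ⇒ p > 1 − t/2 used for very similar sets. A random bit table stands in for the pairwise-independent hash, and the function names are hypothetical.

```python
import random


def one_bit_sketch(s, perms, bits):
    """Min-hash the set, then keep one bit of every minimum.

    bits[k] is a random 0/1 table standing in for the extra hash."""
    return [bits[k][min(perm[x] for x in s)] for k, perm in enumerate(perms)]


def estimate_jaccard_1bit(sa, sb):
    p = sum(a == b for a, b in zip(sa, sb)) / len(sa)  # fraction of agreeing bits
    return 2 * p - 1                                    # assumes p = (1 + J) / 2


universe = "abcdefghijklmnopqrstuvwxyz"
rng = random.Random(2)
k = 1000
perms, bits = [], []
for _ in range(k):
    order = list(universe)
    rng.shuffle(order)
    perms.append({c: rank for rank, c in enumerate(order)})
    bits.append([rng.randrange(2) for _ in universe])

A, B = set("acdfhlmnprst"), set("abcfhlmnoprs")
est = estimate_jaccard_1bit(one_bit_sketch(A, perms, bits), one_bit_sketch(B, perms, bits))
print(round(est, 2))  # close to 10/14, about 0.71
```

Each sketch entry now costs one bit instead of a full hash value, at the price of a somewhat noisier estimate for the same k.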

Page 40: Group Testing and New Algorithmic Applications

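Reducing sketching space even further [BP10]

We are usually interested in the case where the sets are very similar. Assume $J > 1 - t \Rightarrow p > 1 - t/2$ (p is the probability that the two sketch bits agree).

Bit sketch of A: 0110100101

Bit sketch of B: 0100101101

A − B: 0 0 1 0 0 0 −1 0 0 0

CS measurements of A − B: 2, 0, −2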

Page 41: Group Testing and New Algorithmic Applications

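Reducing sketching space even further [BP10]

We are usually interested in the case where the sets are very similar. Assume $J > 1 - t \Rightarrow p > 1 - t/2$.

Bit sketch of A: 0110100101

Bit sketch of B: 0100101101

A xor B: 0010001000

CS measurements of A xor B: 1, 0, 1

This gives an improvement by a factor of $\frac{2}{t\log_2(2/t)}$.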

Page 42: Group Testing and New Algorithmic Applications

Removing the min-wise independence requirement [BP11]

• [KNW10] gave an $O\left(\frac{1}{\varepsilon^2}\log\frac{1}{\delta}\right)$-bit sketch for distinct count ($F_0$).

• Their sketch is not linear. However, given S(A) and S(B), one can compute S(A+B), which gives the size of the union.

Page 43: Group Testing and New Algorithmic Applications

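Removing the min-wise independence requirement [BP11]

$$J = \frac{|A \cap B|}{|A \cup B|} = \frac{|A| + |B| - |A \cup B|}{|A \cup B|}$$

so sketches that estimate |A|, |B| and |A ∪ B| also yield an estimate $\tilde{J}$ of J.

Using $F_2$ instead of $F_0$, we managed to reduce the sketch size further.

Using more randomness, we managed to remove a factor of $\log(1/t)$.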

Page 44: Group Testing and New Algorithmic Applications

File sharing

The naïve way

Page 45: Group Testing and New Algorithmic Applications

File sharing

Torrent/Emule/Kazaa

Page 46: Group Testing and New Algorithmic Applications

File sharing

Source:

Clients:

Coupon collector: collecting all n pieces at random takes O(n log n) downloads. In practice, it could be 7 Gb instead of 1 Gb.
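Under the idealized assumption that each downloaded piece is uniform and independent, the expected number of downloads to collect all n pieces is $n H_n = n(1 + 1/2 + \dots + 1/n) \approx n(\ln n + 0.5772)$. A quick computation (the function name is hypothetical):

```python
import math


def expected_downloads(n):
    """Coupon collector: expected uniform random draws to see all n pieces = n * H_n."""
    return n * sum(1.0 / i for i in range(1, n + 1))


n = 1000
print(expected_downloads(n) / n)  # H_1000, about 7.485
print(math.log(n) + 0.5772)       # ln n + Euler-Mascheroni constant, about 7.485
```

For n = 1000 pieces, that is roughly 7.5 times the file size.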

Page 47: Group Testing and New Algorithmic Applications

Network coding

Page 48: Group Testing and New Algorithmic Applications

Network coding

Source: packets X1, X2, …, Xi, …, Xn

Client 1: 3X7 + 2X17, 5X2 + X5 + 4X10, ...

Client 2: 2X1 + 3X3 + X17, ...

Client 3:

Client 4:

In a big field, n random linear combinations will suffice (with high probability). We require 1 Gb of upload for a 1 Gb file.
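A minimal sketch of random linear network coding over a prime field, assuming the field GF(2³¹ − 1) and coding only at the source (in a real swarm, clients also recombine the coded packets they hold). Any n coded packets whose coefficient vectors are linearly independent, which in a big field happens with high probability, are decoded by Gauss-Jordan elimination. The names `encode` and `decode` are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; field choice is an assumption


def encode(packets, rng):
    """One coded packet: random coefficients and the matching linear combination."""
    coeffs = [rng.randrange(P) for _ in packets]
    payload = [sum(c * pk[j] for c, pk in zip(coeffs, packets)) % P
               for j in range(len(packets[0]))]
    return coeffs, payload


def decode(coded, n):
    """Gauss-Jordan elimination over GF(P) on rows [coefficients | payload]."""
    rows = [list(c) + list(y) for c, y in coded]
    for col in range(n):
        # Raises StopIteration if the coefficients are singular (unlikely for large P).
        pivot = next(r for r in range(col, len(rows)) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # inverse via Fermat's little theorem
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[n:] for row in rows[:n]]  # coefficient part is now the identity


rng = random.Random(7)
packets = [[rng.randrange(P) for _ in range(4)] for _ in range(5)]  # n = 5 packets
received = [encode(packets, rng) for _ in range(5)]                 # any 5 coded packets
print(decode(received, 5) == packets)  # True
```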

Page 49: Group Testing and New Algorithmic Applications

Poison

Torrent/Emule/Kazaa

Page 50: Group Testing and New Algorithmic Applications

Signatures against poison

Each packet i (of packets 1, 2, …, n) has an MD5 signature Si, and the .torrent file lists S1, S2, ..., Sn.

We might receive a poisoned packet, but we won't forward it.

Page 51: Group Testing and New Algorithmic Applications

Signatures in network coding

With MD5 signatures, the .torrent file would have to list S1, S2, ..., Sn, S(X1+X2), S(X1+X3), ... for packets 1, 2, …, n and all of their combinations.

There is an exponential number of options.

Page 52: Group Testing and New Algorithmic Applications

Zhao - Homomorphic signature

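Augment each packet $X_i$ (of length m) with the i-th unit vector, giving the n × (m + n) matrix

$$M = \begin{pmatrix} X_1 & 1 & 0 & \cdots & 0 \\ X_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ X_n & 0 & 0 & \cdots & 1 \end{pmatrix}$$

We can find a vector u such that Mu = 0.

A correct packet v is a linear combination of the rows of M, so it will be orthogonal to u: ⟨v, u⟩ = 0.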
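A sketch of this check over a prime field (the modulus is an assumption, and this is the unkeyed version whose weakness the next slide points out). M is built by appending the identity to the packets; u is chosen with Mu = 0 by picking a random data part w and setting the coefficient part to $-\langle X_i, w\rangle$; a received packet v is accepted iff ⟨v, u⟩ = 0. All function names are hypothetical.

```python
import random

P = 2_147_483_647  # prime modulus (assumption)


def augment(packets):
    """Row i of M: packet X_i followed by the i-th unit vector."""
    n = len(packets)
    return [list(pk) + [int(i == j) for j in range(n)] for i, pk in enumerate(packets)]


def null_vector(packets, rng):
    """u = (w, c) with c_i = -<X_i, w>, so row i of M dotted with u is 0."""
    m = len(packets[0])
    w = [rng.randrange(P) for _ in range(m)]
    return w + [(-sum(a * b for a, b in zip(pk, w))) % P for pk in packets]


def is_valid(v, u):
    return sum(a * b for a, b in zip(v, u)) % P == 0


rng = random.Random(3)
packets = [[rng.randrange(P) for _ in range(4)] for _ in range(3)]  # n = 3, m = 4
M = augment(packets)
u = null_vector(packets, rng)
c = [rng.randrange(P) for _ in M]  # a client's random combination
v = [sum(ci * row[j] for ci, row in zip(c, M)) % P for j in range(len(M[0]))]
print(is_valid(v, u))  # True

poisoned = v[:]
poisoned[0] = (poisoned[0] + 1) % P  # corrupt one symbol
print(is_valid(poisoned, u))  # False, unless u[0] happens to be 0
```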

Page 53: Group Testing and New Algorithmic Applications

Zhao - Homomorphic signature

We can find a vector u such that Mu = 0.

A correct packet v will be orthogonal to u: ⟨v, u⟩ = 0.

But if Eve knows u, she can find a v that is orthogonal to u.

Solution: instead of sending u to everyone, send the vector

Page 54: Group Testing and New Algorithmic Applications

Zhao - Homomorphic signature

Given v, which is a linear combination of the file's packets, verification requires n + m exponentiations. In practice, it takes more time than downloading.

Page 55: Group Testing and New Algorithmic Applications

Selective verification [PW12]

Each packet i carries two signatures, S'i and S''i.

If we have both signatures, we can randomly choose which one to check.

Page 56: Group Testing and New Algorithmic Applications

Problem

Eve can combine signatures

Page 57: Group Testing and New Algorithmic Applications

Solution

Use a linear error-correcting code.

[Figure: the augmented packet matrix, rows 1, 2, …, n, with the identity matrix appended.]

We apply the Zhao signature to each block.

Page 58: Group Testing and New Algorithmic Applications

Analysis

$q^n$: true combinations

[Figure: the augmented packet matrix, rows 1, 2, …, n, with the identity matrix appended; marked entries = defective (for our GT).]

Page 59: Group Testing and New Algorithmic Applications

Analysis

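$\Pr[\text{one block passes the test}] < q^{n}/q^{dn} = q^{-(d-1)n}$

$\Pr[r/2 \text{ out of } r \text{ blocks pass the test}] < 2^{r}\,q^{-(d-1)nr/2}$ (there are at most $2^r$ ways to choose which r/2 blocks pass)

[Figure: blocks 1, 2, …, r; dimensions dn and n + m.]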

Page 60: Group Testing and New Algorithmic Applications

Analysis

Using the union bound over all $q^{n+m}$ possible packets, the probability that a bad packet exists is bounded by $q^{n+m}\cdot 2^{r}\,q^{-(d-1)nr/2} = q^{(n+m) + r/\log_2 q - (d-1)nr/2}$.

In practice, we improve on the Zhao signature by a factor of 60.

Page 61: Group Testing and New Algorithmic Applications

Conclusion

• Group testing and compressive sensing are very effective tools.

• We improved both constructions and achieved sublinear decoding time.

• Surprising, important applications.