Bar Ilan University And Georgia Tech Artistic Consultant: Aviya Amir

Two Glass Balls and a Tower


Amihood Amir, Costas Iliopoulos, Oren Kapah, Ely Porat.

Page 1: Two Glass Balls and a Tower

Bar Ilan University and Georgia Tech

Artistic Consultant: Aviya Amir

Page 2: Two Glass Balls and a Tower

Given: A glass ball. An n storied building.

Find: The floor k such that the ball breaks when dropped from it, but does not break if dropped from floor k-1.

Page 3: Two Glass Balls and a Tower

STRATEGY 1: Only one ball given to experiment with.

O(n) experiments necessary.

Sequential search.

[Figure: a tower with floors numbered 1, 2, 3, 4, ..., n-1, n.]

Page 4: Two Glass Balls and a Tower

STRATEGY 2: As many balls as necessary are given to experiment with.

O(log n) experiments necessary.

Binary search.

Page 5: Two Glass Balls and a Tower

STRATEGY 3: Only two balls given to experiment with.

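$O(\sqrt{n})$ experiments necessary.

Bounded divide-and-conquer.

[Figure: the tower split into $\sqrt{n}$ groups of $\sqrt{n}$ floors each. Experiments with the 1st ball: $O(\sqrt{n})$, one per group. Experiments with the 2nd ball: $O(\sqrt{n})$, within a single group.]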
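The two-ball strategy can be sketched as a small simulation. This is only an illustration: the oracle `breaks_at` and the function name are hypothetical, and the sketch assumes the ball does break somewhere in floors 1..n.

```python
import math

def two_ball_search(n, breaks_at):
    """Find the lowest breaking floor in 1..n with two balls.

    breaks_at(f) is a hypothetical oracle: True if a ball dropped
    from floor f breaks. Returns (floor, number_of_drops).
    """
    step = max(1, math.isqrt(n))   # group size, about sqrt(n)
    drops = 0
    safe = 0                       # highest floor known to be safe
    floor = min(step, n)
    # First ball: jump a whole group of floors at a time.
    while True:
        drops += 1
        if breaks_at(floor):
            break
        safe = floor
        if floor == n:
            return None, drops     # the ball never breaks
        floor = min(floor + step, n)
    # Second ball: sequential search inside the group that broke the first ball.
    for f in range(safe + 1, floor):
        drops += 1
        if breaks_at(f):
            return f, drops
    return floor, drops

# Example: 100 floors, the ball breaks from floor 37 upward.
print(two_ball_search(100, lambda f: f >= 37))
```

The first loop spends at most about n/step drops and the second at most step - 1, which is why a step near the square root of n balances the two.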

Page 6: Two Glass Balls and a Tower

Meaning of two Balls

“Bounded” Divide-and-Conquer.

In reality: Different paradigms.

1. Works on large groups.

2. Works within a group.

Page 7: Two Glass Balls and a Tower

In Pattern Matching

1. Works on large groups:

Convolutions:

O(n log m) using FFT

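[Figure: the convolution of text $a_0 a_1 a_2 a_3 a_4$ with pattern $b_0 b_1 b_2$, laid out like long multiplication. Row $j$ holds the products $a_i b_j$, and summing along each alignment of the pattern against the text gives the results $r_0, r_1, r_2$.]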
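As a rough illustration of how a convolution is computed with the FFT, here is a minimal pure-Python sketch. The function names are illustrative, and the code convolves the two sequences in one piece, which costs O((n + m) log(n + m)); the O(n log m) bound on the slide comes from additionally cutting the text into pieces of length O(m), which is not shown here.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def convolve(a, b):
    """Return c with c[k] = sum over i of a[i] * b[k - i], for integer inputs."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    res = fft(product, invert=True)
    # The inverse transform needs a division by size; round back to integers.
    return [round((v / size).real) for v in res[:result_len]]

print(convolve([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```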

Page 8: Two Glass Balls and a Tower

Problem: O(n log m) only in algebraically closed fields, e.g. C.

Solution: Reduce problem to (Boolean/integer/real) multiplication.

This reduction costs!

Example: Hamming distance.

Counting mismatches is equivalent to Counting matches

A B A B C

A B B B A

Page 9: Two Glass Balls and a Tower

Example:

Count all “hits” of 1 in pattern and 1 in text.

Text: 0 1 0 0 1

Pattern: 1 0 1

The pattern slides along the text. The number of hits at alignments 0, 1, 2 is 0, 1, 1.
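The hit counts above are a correlation, which can be read off a convolution of the text with the reversed pattern. A minimal sketch, using a quadratic-time convolution as a stand-in for the FFT (function names are illustrative):

```python
def convolve_naive(a, b):
    """Quadratic-time convolution, standing in for an FFT-based one."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def count_hits(text_bits, pattern_bits):
    """For every alignment, count positions where text and pattern are both 1."""
    m = len(pattern_bits)
    full = convolve_naive(text_bits, pattern_bits[::-1])
    # Entry m - 1 + s of the convolution corresponds to the pattern
    # aligned at text location s.
    return full[m - 1:len(text_bits)]

print(count_hits([0, 1, 0, 0, 1], [1, 0, 1]))   # [0, 1, 1]
```

Reversing the pattern is what turns the "sliding" sum into a standard convolution, which is why the pattern is reversed before convolving.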

Page 10: Two Glass Balls and a Tower

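For $a \in \Sigma$, define:

$$\chi_a(b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}$$

$$\chi_a(S_1 S_2 S_3 \cdots S_n) = \chi_a(S_1)\,\chi_a(S_2)\,\chi_a(S_3) \cdots \chi_a(S_n)$$

Example:

$$\chi_a(abbaabb) = 1001100$$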

Page 11: Two Glass Balls and a Tower

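For $a, b, c \in \Sigma$, do:

$$\chi_a(T) \otimes \chi_a(P^R) \;+\; \chi_b(T) \otimes \chi_b(P^R) \;+\; \chi_c(T) \otimes \chi_c(P^R)$$

where $\otimes$ denotes convolution and $P^R$ is the reversed pattern.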

Result: The number of times a in pattern matches a in text + the number of times b in pattern matches b in text + the number of times c in pattern matches c in text.
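A minimal sketch of this per-symbol counting, assuming a direct sliding sum in place of the FFT convolution. The names `chi`, `correlate` and `count_matches` are illustrative:

```python
def chi(a, s):
    """Indicator sequence of symbol a in s: 1 where s has a, else 0."""
    return [1 if ch == a else 0 for ch in s]

def correlate(t_bits, p_bits):
    """Direct sliding sum; an FFT convolution with the reversed pattern gives the same values."""
    n, m = len(t_bits), len(p_bits)
    return [sum(t_bits[i + j] * p_bits[j] for j in range(m))
            for i in range(n - m + 1)]

def count_matches(text, pattern):
    """Number of matching positions for every alignment of pattern in text."""
    total = [0] * (len(text) - len(pattern) + 1)
    for a in set(pattern):               # one correlation per symbol
        hits = correlate(chi(a, text), chi(a, pattern))
        total = [x + y for x, y in zip(total, hits)]
    return total

def hamming_distances(text, pattern):
    """Mismatches are the pattern length minus the matches."""
    return [len(pattern) - k for k in count_matches(text, pattern)]

print(chi("a", "abbaabb"))                   # [1, 0, 0, 1, 1, 0, 0]
print(hamming_distances("ABABC", "ABBBA"))   # [2]
```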

Page 12: Two Glass Balls and a Tower

So for alphabet with a symbols (a fixed) the time is:

O(n a log m) = O(n log m)

Problem: Infinite alphabets.

Page 13: Two Glass Balls and a Tower

Without loss of generality:

$|\Sigma| = m + 1$, since every element of T not in P is replaced by some symbol x not in P.

Example: text ABCDEFGH against pattern ABBBBBGH has the same number of errors as text ABXXXXGH against pattern ABBBBBGH.
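The alphabet reduction is a single pass over the text. A small sketch, where the function name and the choice of filler symbol are assumptions made for illustration:

```python
def reduce_alphabet(text, pattern, filler="X"):
    """Replace every text symbol that does not occur in the pattern by filler."""
    symbols = set(pattern)
    if filler in symbols:
        raise ValueError("filler must not occur in the pattern")
    return "".join(ch if ch in symbols else filler for ch in text)

def mismatches(s, t):
    """Hamming distance between two strings of equal length."""
    return sum(1 for x, y in zip(s, t) if x != y)

reduced = reduce_alphabet("ABCDEFGH", "ABBBBBGH")
print(reduced)                                                              # ABXXXXGH
print(mismatches("ABCDEFGH", "ABBBBBGH"), mismatches(reduced, "ABBBBBGH"))  # 4 4
```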

Page 14: Two Glass Balls and a Tower

Divide and Conquer Idea (Wrong)

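Split $\Sigma$ into $\Sigma_1 \cup \Sigma_2$ of size m/2 each.

Construct $T_1, P_1$ and $T_2, P_2$, where for $S \in \{T, P\}$ and $e \in \{1, 2\}$:

$$S_e[i] = \begin{cases} S[i] & \text{if } S[i] \in \Sigma_e \\ 0 & \text{otherwise} \end{cases}$$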

Page 15: Two Glass Balls and a Tower

The Algorithm

1. Find num1 = number of matches of P1 in T1

2. Find num2 = number of matches of P2 in T2

3. matches = num1 + num2

Time: O(n) every iteration for changing alphabet.

Page 16: Two Glass Balls and a Tower

Time: $T(m) = 2T(m/2) + n$.

Closed form:

$$T(m) = 2^i\, T(m/2^i) + (2^{i-1} + 2^{i-2} + \cdots + 2 + 1)\, n$$

With $i = \log m$ the sum is at most $(2^{\log m} + \cdots + 2 + 1)\, n < 2mn$, so

$$T(m) = O(mn)$$

THIS IS BAD !!!

Page 17: Two Glass Balls and a Tower

Needed: Faster way to compute matches of x to itself.

Such a method exists if x appears in the pattern a very small number of times.

Assume: x appears in pattern c times.

For every occurrence of x in text, update just the appropriate counters of the c occurrences of x in the pattern.

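[Figure: a text containing several occurrences of x, with the pattern (here XXX) aligned beneath it; each text occurrence updates the counters of the alignments it can match.]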

Time: O(nc).
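The counter update can be sketched as follows. The function name and the `rare_symbols` argument are illustrative; the sketch assumes each symbol in `rare_symbols` occurs at most c times in the pattern, which gives the O(nc) bound.

```python
from collections import defaultdict

def count_rare_matches(text, pattern, rare_symbols):
    """Count, for every alignment, the matches contributed by the rare symbols."""
    n, m = len(text), len(pattern)
    positions = defaultdict(list)        # symbol -> its positions in the pattern
    for j, ch in enumerate(pattern):
        if ch in rare_symbols:
            positions[ch].append(j)
    counters = [0] * (n - m + 1)
    for i, ch in enumerate(text):
        # A text symbol at i matches pattern position j exactly when
        # the pattern is aligned at text location i - j.
        for j in positions.get(ch, ()):
            start = i - j
            if 0 <= start <= n - m:
                counters[start] += 1
    return counters

print(count_rare_matches("XAXXB", "XAX", {"X"}))   # [2, 1, 1]
```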

Page 18: Two Glass Balls and a Tower

Problem: In general it could be that x occurs in the pattern O(m) times, then total time becomes O(nm). BAD again.

Tradeoff: If x appears in the pattern more than c times, count matches by FFT, in time O(n log m), per x.

For all x’s that appear in the pattern less than c times, count matches (simultaneously) in time O(nc).

Page 19: Two Glass Balls and a Tower

How many elements appear at least c times? At most $m/c$.

For these elements, time: $O((m/c)\, n \log m)$.

For all other elements, time: $O(nc)$.

The optimal case is when the two are equal, i.e.

$$\frac{m}{c}\, n \log m = nc \;\Longrightarrow\; m \log m = c^2 \;\Longrightarrow\; c = \sqrt{m \log m}$$

Total time: $O(n \sqrt{m \log m})$ (A-87, K-87)
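Putting the two cases together gives the following sketch. It is an illustration under stated assumptions, not the authors' implementation: `correlate` is a direct sliding sum standing in for the FFT, the threshold is rounded to an integer, and the function names are made up for the example.

```python
import math
from collections import Counter, defaultdict

def correlate(t_bits, p_bits):
    """Direct sliding sum; stands in for one FFT convolution."""
    n, m = len(t_bits), len(p_bits)
    return [sum(t_bits[i + j] * p_bits[j] for j in range(m))
            for i in range(n - m + 1)]

def count_matches_tradeoff(text, pattern, c=None):
    """Matches per alignment: frequent symbols by correlation, rare ones by counters."""
    n, m = len(text), len(pattern)
    if c is None:
        # Threshold from balancing the two costs: c = sqrt(m log m).
        c = max(1, round(math.sqrt(m * max(1.0, math.log2(m)))))
    freq = Counter(pattern)
    matches = [0] * (n - m + 1)
    # Frequent symbols (more than c occurrences): one correlation each.
    for a, count in freq.items():
        if count > c:
            hits = correlate([int(ch == a) for ch in text],
                             [int(ch == a) for ch in pattern])
            matches = [x + y for x, y in zip(matches, hits)]
    # Rare symbols (at most c occurrences): all handled in one scan of the text.
    positions = defaultdict(list)
    for j, ch in enumerate(pattern):
        if freq[ch] <= c:
            positions[ch].append(j)
    for i, ch in enumerate(text):
        for j in positions.get(ch, ()):
            if 0 <= i - j <= n - m:
                matches[i - j] += 1
    return matches

print(count_matches_tradeoff("ABCABC", "ABC"))   # [3, 0, 0, 3]
```

Passing `c=0` forces every symbol down the correlation path, which is a convenient way to check that both paths agree.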

Page 20: Two Glass Balls and a Tower

In our Tower Metaphor:

Symbols appearing more than c times (> c): a separate convolution for each group of floors (repetitions of a symbol x).

Symbols appearing fewer than c times (< c): every element within the group is taken care of individually. However, all groups are "scanned" together.

Page 21: Two Glass Balls and a Tower
Page 22: Two Glass Balls and a Tower

Weighted Sequences

Alignment of “similar” sequences – one of the challenges of string matching.

Assume: from a set of sequences over alphabet $\Sigma = \{a_1, \ldots, a_k\}$, a set of "probabilities" is constructed as follows:

Text location $i$ holds the column $P_i(a_1), P_i(a_2), \ldots, P_i(a_k)$,

where $P_i(a_j)$ is the probability that symbol $a_j$ occurs in text location $i$, and

$$\sum_{j=1}^{k} P_i(a_j) = 1$$

Page 23: Two Glass Balls and a Tower

This text of probabilities is called a weighted sequence.

Our problem:

Given: Weighted sequence T, pattern $P = s_1, \ldots, s_m$, and probability $\varepsilon$.

Find: All text locations i such that P occurs there with probability $> \varepsilon$, i.e.

$$P_i(s_1)\, P_{i+1}(s_2) \cdots P_{i+m-1}(s_m) > \varepsilon$$

Example:

[Table: weighted sequence T, with one row of probabilities for each of the symbols A, B, C, D.]

Pattern ACDB occurs at location 2 of the text with probability $P_2(A)\, P_3(C)\, P_4(D)\, P_5(B)$.
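To make the definition concrete, here is a direct sketch that multiplies the probabilities at each location. The weighted sequence below is hypothetical (it is not the table from the slide), the threshold is called `eps`, and locations are 0-indexed.

```python
from fractions import Fraction as F

def occurrence_probability(weighted_text, pattern, i):
    """Probability that pattern occurs at 0-indexed text location i."""
    prob = F(1)
    for j, s in enumerate(pattern):
        prob *= weighted_text[i + j].get(s, F(0))   # absent symbol: probability 0
    return prob

def occurrences(weighted_text, pattern, eps):
    """All locations where the pattern occurs with probability greater than eps."""
    n, m = len(weighted_text), len(pattern)
    return [i for i in range(n - m + 1)
            if occurrence_probability(weighted_text, pattern, i) > eps]

# Hypothetical weighted sequence: one dict of symbol probabilities per location.
T = [
    {"A": F(1, 2), "B": F(1, 2)},
    {"A": F(1)},
    {"C": F(2, 3), "D": F(1, 3)},
    {"B": F(3, 4), "A": F(1, 4)},
]
print(occurrence_probability(T, "ACB", 1))   # 1/2
print(occurrences(T, "ACB", F(1, 3)))        # [1]
```

This direct version costs O(nm). The logarithm and convolution approach replaces the inner product by sums that an FFT can compute.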

Page 24: Two Glass Balls and a Tower

Iliopoulos et al., in a number of recent papers, address the following problems on weighted sequences:

1. Do exact matching.

2. Construct weighted suffix tree for indexing.

Exact Matching:

1. Convert probabilities to logarithms. Now we use sums rather than products.

2. Consider every text row separately. Let $T_a$ be the text row of $a$, for some $a \in \Sigma$. Then the log probability of the pattern at every location is given by the formula:

$$\sum_{a \in \Sigma} T_a \otimes (\chi_a(P))^R$$

Page 25: Two Glass Balls and a Tower

Example: P = ABABCAB

$1010010 \otimes T_A$ gives the sum of the log probabilities of A.

$0101001 \otimes T_B$ gives the sum of the log probabilities of B.

$0000100 \otimes T_C$ gives the sum of the log probabilities of C.

Add them all up and get the result.

Time: O(n log m).
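The row-by-row computation can be sketched as below. Two assumptions are worth flagging: a masked sliding sum stands in for the FFT convolution, and a zero probability is represented as minus infinity, which works here because the sketch only adds selected entries (an FFT-based version would need a finite stand-in value). The weighted sequence is hypothetical.

```python
import math

def log_row(weighted_text, a):
    """Row T_a: log probability of symbol a at every text location."""
    return [math.log(col[a]) if col.get(a, 0) > 0 else float("-inf")
            for col in weighted_text]

def masked_correlate(row, mask):
    """Sum row entries under the 1s of mask, for every alignment."""
    n, m = len(row), len(mask)
    return [sum(row[i + j] for j in range(m) if mask[j])
            for i in range(n - m + 1)]

def log_probabilities(weighted_text, pattern):
    """Log probability of the pattern at every text location."""
    total = [0.0] * (len(weighted_text) - len(pattern) + 1)
    for a in set(pattern):
        mask = [1 if s == a else 0 for s in pattern]   # indicator of a in the pattern
        part = masked_correlate(log_row(weighted_text, a), mask)
        total = [x + y for x, y in zip(total, part)]
    return total

# Hypothetical weighted sequence: one dict of symbol probabilities per location.
T = [
    {"A": 0.5, "B": 0.5},
    {"A": 1.0},
    {"C": 2 / 3, "D": 1 / 3},
    {"B": 0.75, "A": 0.25},
]
logs = log_probabilities(T, "ACB")
# A location qualifies when its log probability exceeds log(eps).
print([i for i, v in enumerate(logs) if v > math.log(1 / 3)])   # [1]
```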

Page 26: Two Glass Balls and a Tower

Weighted Hamming Distance (A, Iliopoulos, Kapah – 06)

Compute the smallest number of mismatches for every location.

Mismatches are not symmetric.

If errors are assumed to be in the text:

How many text elements need to be changed (so that they will have probability 1 matching the corresponding pattern symbol) to produce a match at each location?

Example: Text T, pattern ACDB, $\varepsilon = 1/3$.

There exists a match at location 2 with 1 mismatch.

[Table: weighted sequence T, with one row of probabilities for each of the symbols A, B, C, D.]

Page 27: Two Glass Balls and a Tower

If errors are assumed to be in the pattern:

How many pattern symbols need to be replaced in order to have a match at a given location?

Example:

For text T, pattern ACDB, and $\varepsilon = 1/3$, no match exists in location 2 even with 4 mismatches, since every element already has the highest probability.

So changing the pattern letter D to A, B, or C will leave the same probability.

[Table: weighted sequence T, with one row of probabilities for each of the symbols A, B, C, D.]

Page 28: Two Glass Balls and a Tower

We solve both types of mismatch problems on weighted sequences (as well as a few flavors of edit distance). Here we show the simpler of the two mismatch definitions: changes to the text.

We solve a more general problem:

Input: Text $T = t_0, \ldots, t_n$ where $t_i \in \mathbb{N}$.

Pattern $P = p_0, \ldots, p_m$; $p_i \in \{0, 1\}$.

Natural number e.

Find: For every text location i, the smallest number of text locations that, when changed to 0, bring the convolution result to be no greater than e.

(We changed the negatives to positives and dropped the requirement that the numbers be log probabilities of weighted sequences.)
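A brute-force reference for this general problem helps pin down what is being computed. It is only a specification-level sketch (the function name is illustrative) and runs in O(nm log m), far slower than the method developed on the following slides.

```python
def min_changes(text, pattern, e):
    """For every location, the fewest aligned text values to zero out
    so that the sum under the pattern's 1s is at most e."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        # Text values aligned with a 1 in the pattern, largest first.
        aligned = sorted((text[i + j] for j in range(m) if pattern[j] == 1),
                         reverse=True)
        total = sum(aligned)
        changes = 0
        for v in aligned:
            if total <= e:
                break
            total -= v          # zeroing the largest value helps the most
            changes += 1
        result.append(changes)
    return result

print(min_changes([5, 1, 4, 2, 3], [1, 0, 1], 4))   # [1, 0, 1]
```

Zeroing the largest remaining value first is optimal because all values are natural numbers, so each change removes as much of the sum as possible.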

Page 29: Two Glass Balls and a Tower

We use the Tower Metaphor.

Assumption: n<2m+1.

Observation:

For every text location we need to sort all O(m) text elements, and find the precise point where the sum of the elements becomes less than e.

[Figure: the text covered by overlapping segments of length 2m.]

Page 30: Two Glass Balls and a Tower

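Text elements sorted (biggest at bottom, smallest on top)

[Figure: the sorted text elements stacked as a tower and divided into blocks of $\sqrt{m}$ elements each; the threshold e is marked at the seam.]

First find the block where the sum is still $\le e$.

Then find where that change occurs within the block.

Need to know: for each text location, how many text elements come from each block.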

Page 31: Two Glass Balls and a Tower

How many text elements in each block?

One convolution per block.

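Let $T_j$ be such that

$$T_j[i] = \begin{cases} 1 & \text{if } T[i] \text{ is in the } j\text{-th block} \\ 0 & \text{otherwise} \end{cases}$$

Do the convolution $P \otimes T_j$ for every block j and save the result in each text location.

Time: $O(n \sqrt{m} \log m)$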

Page 32: Two Glass Balls and a Tower

Let $T_j$ be

$$T_j[i] = \begin{cases} T[i] & \text{if } T[i] \text{ is in the } j\text{-th block} \\ 0 & \text{otherwise} \end{cases}$$

For every block j do: $P \otimes T_j$.

We now know, for each text location, the sum of the block values less than e and how many such values exist. All we need to do is find the exact number within the seam block.

For every text location: for every element in the seam block, from top to bottom: if the element matches a 1 in the pattern, multiply. Stop when the number exceeds e.

Page 33: Two Glass Balls and a Tower

Example:

1 1 11 1

8 9 9 7 6 8 9

Implementation:

Keep index for every element in every block, subtract from it the index of text location i and check if it hits a 1.

Total time for correction: $O(n \sqrt{m})$
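The block scheme can be sketched end to end as follows. This is an illustration under assumptions, not the authors' code: a direct sliding sum stands in for the two convolutions per block, the running total is accumulated by addition, blocks hold about sqrt(m) elements, and all names are made up for the example.

```python
import math

def correlate(values, pattern):
    """Direct sliding sum; stands in for one FFT convolution."""
    n, m = len(values), len(pattern)
    return [sum(values[i + j] * pattern[j] for j in range(m))
            for i in range(n - m + 1)]

def min_changes_blocks(text, pattern, e):
    """Fewest aligned text values to zero out per location so the sum is at most e."""
    n, m = len(text), len(pattern)
    size = max(1, math.isqrt(m))                       # block size, about sqrt(m)
    order = sorted(range(n), key=lambda k: text[k])    # text positions, smallest value first
    blocks = [order[s:s + size] for s in range(0, n, size)]
    counts, sums = [], []
    for blk in blocks:
        indicator = [0] * n
        values = [0] * n
        for k in blk:
            indicator[k] = 1
            values[k] = text[k]
        counts.append(correlate(indicator, pattern))   # aligned elements per location
        sums.append(correlate(values, pattern))        # their sum per location
    ones = sum(pattern)
    result = []
    for i in range(n - m + 1):
        kept, acc = 0, 0
        for j, blk in enumerate(blocks):
            if acc + sums[j][i] <= e:                  # the whole block still fits
                acc += sums[j][i]
                kept += counts[j][i]
                continue
            # Seam block: scan its elements one by one, smallest first.
            for k in blk:
                if 0 <= k - i < m and pattern[k - i] == 1:
                    if acc + text[k] > e:
                        break
                    acc += text[k]
                    kept += 1
            break
        result.append(ones - kept)                     # everything not kept is zeroed
    return result

print(min_changes_blocks([5, 1, 4, 2, 3], [1, 0, 1], 4))   # [1, 0, 1]
```

The index test `pattern[k - i] == 1` is the "subtract the index of text location i and check if it hits a 1" step described above.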

Page 34: Two Glass Balls and a Tower

As always, taking block sizes $O(\sqrt{m \log m})$ rather than $O(\sqrt{m})$ will make the total time $O(n \sqrt{m \log m})$.