An Approximation Algorithm for Binary Searching in Trees Marco Molinaro Carnegie Mellon University joint work with Eduardo Laber (PUC-Rio)


Searching in sorted lists

Sorted list of numbers (e.g., 3, 5, 6, 10, 14, ...) with a marked number m. Goal: find the marked number using queries of the form ‘x ≤ m?’.

Search strategy: a procedure that indicates which number should be queried next.

A strategy can be represented by a decision tree (DT); the number of queries needed to find m equals the length of the root-to-leaf path that ends at m.

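[Figure: a decision tree (DT) for the list 3, 5, 6, 10, 14; each internal node queries a number, its outgoing edges are labeled ≤ and >, and each leaf is an element of the list.]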
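To make the query model concrete, here is a minimal sketch (not from the talk) of plain binary search that only ever asks ‘x ≤ m?’. The function name `search_sorted` is hypothetical, and it assumes m occurs in the list.

```python
from typing import List, Tuple

def search_sorted(a: List[int], m: int) -> Tuple[int, int]:
    """Find the marked number m in the sorted list a using only queries 'x <= m?'.
    Returns (m, number of queries asked). Assumes m occurs in a."""
    lo, hi = 0, len(a) - 1          # invariant: m is one of a[lo..hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2    # query x = a[mid], with lo < mid <= hi
        queries += 1
        if a[mid] <= m:             # answer "yes": m is in a[mid..hi]
            lo = mid
        else:                       # answer "no": m is in a[lo..mid-1]
            hi = mid - 1
    return a[lo], queries
```

Each run follows one root-to-leaf path of the decision tree implied by this strategy, so the returned query count is exactly that path's length.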


We are given the probability of each number being the marked one.

The expected number of queries of a strategy equals the expected path length of the corresponding decision tree.

An efficient strategy is one with minimum expected path length.

[Figure: the same list and decision tree, annotated with the probabilities 0.05, 0.1, 0.2, 0.1, 0.5, ... of each number being the marked one.]
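As a worked illustration of "minimum expected path length", the sketch below computes the optimal expected number of ‘x ≤ m?’ queries with a plain O(n^3) interval dynamic program. It is an assumption-laden sketch: the name `optimal_expected_queries` is hypothetical, and it omits the monotonicity speed-up behind the O(n^2) bound of Knuth listed under related work.

```python
from functools import lru_cache
from itertools import accumulate
from typing import Sequence

def optimal_expected_queries(p: Sequence[float]) -> float:
    """Minimum expected number of 'x <= m?' queries to identify the marked element
    of a sorted list whose i-th element is marked with probability p[i]."""
    n = len(p)
    prefix = [0.0, *accumulate(p)]      # prefix[i] = p[0] + ... + p[i-1]

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> float:
        # Optimal cost of searching among elements i..j (inclusive).
        if i == j:
            return 0.0                  # one candidate left: it is m, no query needed
        weight = prefix[j + 1] - prefix[i]
        # Querying x = a[k] splits the candidates into i..k-1 ("no") and k..j ("yes");
        # this query is paid once by every element in i..j.
        return weight + min(cost(i, k - 1) + cost(k, j) for k in range(i + 1, j + 1))

    return cost(0, n - 1)
```

For probabilities [0.5, 0.25, 0.25] the optimum is 1.5: isolate the first element with one query, then separate the other two with a second query.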

Searching in trees

We are given a tree with exactly one marked node m. We can query an arc and find out which of its endpoints is closer to the marked node.
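One way to see how an arc query is answered: removing the arc (u, v) splits the tree into two components, and the endpoint lying in m's component is the closer one. The sketch below assumes adjacency lists and a hypothetical function name `closer_endpoint`.

```python
from collections import deque
from typing import Dict, Hashable, List

Tree = Dict[Hashable, List[Hashable]]   # adjacency lists of an undirected tree

def closer_endpoint(tree: Tree, u, v, m):
    """Answer the query on arc (u, v): which endpoint is closer to the marked node m?"""
    seen = {u}
    queue = deque([u])
    while queue:                      # BFS from u without crossing the arc (u, v)
        x = queue.popleft()
        if x == m:
            return u                  # m is on u's side of the arc
        for y in tree[x]:
            if y not in seen and not (x == u and y == v):
                seen.add(y)
                queue.append(y)
    return v                          # m was not reachable from u, so it lies on v's side
```

In a tree the arc (u, v) is the only route from u to v, so blocking it once at u is enough to separate the two sides.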


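Searching in trees

Search strategy: a procedure that indicates which arc should be queried next.

A strategy can be represented by a decision tree; the number of queries needed to find m equals the corresponding path length.

[Figure: an input tree on the nodes a, b, c, d, f, h and a decision tree (DT) whose internal nodes query the arcs (c,d), (a,b), (d,f) and (f,h), each outgoing edge labeled by the endpoint found to be closer (e.g. ~a, ~f); the queries (c,d), (d,f) and (f,h) on the path to leaf f are highlighted.]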
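To illustrate "number of queries = path length", the sketch below runs one adaptive strategy until a single candidate remains and counts the queries. The balancing rule (query the arc whose two sides are closest in size) is a hypothetical example strategy chosen for brevity, not the strategy from the talk; the names `side` and `search_tree` are also hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]   # adjacency lists of an undirected tree

def side(tree: Tree, cand: Set[str], u: str, v: str) -> Set[str]:
    """Candidates reachable from u inside cand without using the arc (u, v)."""
    comp, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in tree[x]:
            if y in cand and y not in comp and not (x == u and y == v):
                comp.add(y)
                stack.append(y)
    return comp

def search_tree(tree: Tree, m: str) -> Tuple[str, int]:
    """Find m by repeatedly querying the arc (inside the current candidate subtree)
    that splits the candidates most evenly by count. Returns (m, #queries)."""
    cand = set(tree)
    queries = 0
    while len(cand) > 1:
        arcs = [(u, v) for u in cand for v in tree[u] if v in cand and u < v]
        u, v = min(arcs, key=lambda a: abs(2 * len(side(tree, cand, *a)) - len(cand)))
        queries += 1
        cand_u = side(tree, cand, u, v)
        cand = cand_u if m in cand_u else cand - cand_u   # keep the closer endpoint's side
    return cand.pop(), queries
```

The sequence of queries made for a given m is exactly one root-to-leaf path in the decision tree of this strategy.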

Searching in trees

We are given the probability of each node being the marked one. The expected number of queries is the expected path length of the corresponding decision tree. The goal is to find a DT with minimum expected path length.

[Figure: the input tree on the nodes a, b, c, d, f, h, each node annotated with its probability of being marked, next to the corresponding decision tree.]
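The expected path length of a given decision tree can be computed by weighting each leaf's depth by its probability. The sketch assumes a hypothetical encoding: a leaf is a node name, and an internal node is a tuple `(u, v, if_u_closer, if_v_closer)` for a query on the arc (u, v).

```python
from typing import Dict, Tuple, Union

# Hypothetical DT encoding: a leaf is the name of the node found, an internal node
# is (u, v, subtree if u is closer, subtree if v is closer).
DT = Union[str, Tuple[str, str, "DT", "DT"]]

def expected_queries(dt: DT, w: Dict[str, float], depth: int = 0) -> float:
    """Expected number of queries = sum over leaves of w(leaf) * depth(leaf)."""
    if isinstance(dt, str):
        return w[dt] * depth
    _, _, if_u, if_v = dt
    return expected_queries(if_u, w, depth + 1) + expected_queries(if_v, w, depth + 1)
```

For the path a – b – c with weights 0.5, 0.25, 0.25, querying (a,b) first and then (b,c) gives 0.5·1 + 0.25·2 + 0.25·2 = 1.5.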

Searching in trees

Def: Given a tree T and weights w, compute a decision tree for searching in T with minimum expected path length from the root to the leaves with respect to w.
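For very small trees the definition can be checked by exhaustive search: the optimal cost of a candidate subtree is its weight (everyone pays for the next query) plus the best split over its arcs. This brute-force sketch is exponential in general and is only meant to make the objective concrete; `optimal_cost` is a hypothetical name and this is not the talk's algorithm.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List

def optimal_cost(tree: Dict[str, List[str]], w: Dict[str, float]) -> float:
    """Exact minimum expected number of queries for searching in a small tree."""

    def split(cand: FrozenSet[str], u: str, v: str) -> FrozenSet[str]:
        # Nodes of cand on u's side of the arc (u, v).
        comp, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in tree[x]:
                if y in cand and y not in comp and not (x == u and y == v):
                    comp.add(y)
                    stack.append(y)
        return frozenset(comp)

    @lru_cache(maxsize=None)
    def opt(cand: FrozenSet[str]) -> float:
        if len(cand) == 1:
            return 0.0                       # the marked node is identified
        weight = sum(w[x] for x in cand)     # every node in cand pays for the next query
        best = float("inf")
        for u in cand:
            for v in tree[u]:
                if v in cand:
                    side_u = split(cand, u, v)
                    best = min(best, opt(side_u) + opt(cand - side_u))
        return weight + best

    return opt(frozenset(tree))
```

On a star with center c and three leaves, all of weight 1/4, every query isolates one leaf, and the optimum is 2.25.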

Motivation: generalizes searching in totally ordered structures to (one type of) partially ordered structures. Applications to software testing and filesystem synchronization.

Related work: searching in sorted lists

Worst case: binary search is optimal.

Average case: Knuth [Acta Informatica 71]: O(n^2); de Prisco, de Santis [IPL 93]: good approximation in linear time.

Related work: searching in trees

Worst case: Ben-Asher et al. [SIAM J. Comput. 99]: O(n^4 log^3 n); Onak, Parys [FOCS 06]: O(n^3); Mozes et al. [SODA 08]: O(n).

Average case: Kosaraju et al. [WADS 99]: O(log n)-approximation.

Related work: searching in posets

Worst case: Arkin et al. [Int. J. Comput. Geometry Appl. 98]: O(log n)-approximation; Carmo et al. [TCS 04]: finding an optimal strategy is NP-hard; constant-factor approximation for random posets.

Average case: Kosaraju et al. [WADS 99]: O(log n)-approximation.

Our results

First constant-factor approximation for searching in trees (average-case metric)

Linear running time

Overview

We know how to search in sorted lists with probabilities.

Searching in a path is equivalent to searching in a sorted list.
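Since a path behaves like a sorted list, any list-search strategy applies. As an illustration, the sketch below uses a simple weight-balancing heuristic (query the element that splits the remaining probability mass most evenly). This is a swapped-in technique chosen for brevity, not the linear-time approximation cited under related work, and it runs in quadratic time in the worst case; `balanced_cost` is a hypothetical name.

```python
from itertools import accumulate
from typing import Sequence

def balanced_cost(p: Sequence[float]) -> float:
    """Expected number of 'x <= m?' queries of a weight-balancing strategy on a
    sorted list (equivalently, a path) with marking probabilities p."""
    prefix = [0.0, *accumulate(p)]      # prefix[i] = p[0] + ... + p[i-1]

    def cost(i: int, j: int) -> float:
        if i == j:
            return 0.0                  # single candidate: no more queries
        weight = prefix[j + 1] - prefix[i]
        # Querying x = a[k] leaves i..k-1 or k..j; pick k with the most balanced masses.
        k = min(range(i + 1, j + 1),
                key=lambda s: abs((prefix[s] - prefix[i]) - (prefix[j + 1] - prefix[s])))
        return weight + cost(i, k - 1) + cost(k, j)

    return cost(0, len(p) - 1)
```

On small inputs such as uniform weights or [0.5, 0.25, 0.25] this heuristic happens to match the optimum, but in general it only approximates it.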

Overview

Algorithm (search strategy):

1. Find a (heavy) path.

2. Compute a decision tree for this path.

3. Append decision trees for querying the hanging arcs.

4. Recursively find strategies for the hanging subtrees and append them.
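The slides do not spell out how the heavy path is chosen. The sketch below uses one common reading, stated here as an assumption: root the tree, then repeatedly descend into the child whose subtree is heaviest, and record the remaining children as roots of hanging subtrees together with their weights. The function name `heavy_path` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def heavy_path(tree: Dict[str, List[str]], w: Dict[str, float], root: str
               ) -> Tuple[List[str], Dict[str, List[Tuple[str, float]]]]:
    """Return a heavy path from root and, for each path node, its hanging subtrees
    as (subtree root, subtree weight) pairs."""
    # Root the tree at `root`; BFS order lists every parent before its children.
    parent: Dict[str, Optional[str]] = {root: None}
    order = [root]
    for x in order:                      # the list grows while we iterate over it
        for y in tree[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
    # sub[x] = weight of the subtree rooted at x, accumulated bottom-up.
    sub = dict(w)
    for x in reversed(order):
        p = parent[x]
        if p is not None:
            sub[p] += sub[x]
    # Walk down, always entering the heaviest child; other children root hanging subtrees.
    path: List[str] = [root]
    hanging: Dict[str, List[Tuple[str, float]]] = {}
    while True:
        x = path[-1]
        kids = [y for y in tree[x] if parent[y] == x]
        heavy = max(kids, key=lambda y: sub[y], default=None)
        hanging[x] = [(y, sub[y]) for y in kids if y != heavy]
        if heavy is None:
            return path, hanging
        path.append(heavy)
```

With this choice, a hanging subtree is never heavier than the heavy sibling it was compared against, so each hanging subtree weighs at most half of the total weight.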

Analysis

T – input tree

w(u) – likelihood of node u being the marked one; $w(T') = \sum_{u \in T'} w(u)$

$T_i^j$ – hanging subtrees of T

Cost of a decision tree – its expected path length

[Figure: input tree T and its hanging subtrees $T_i^j$.]

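Analysis – upper bound

ALGO(T) = expected path length of the computed DT:

$$\mathrm{ALGO}(T) = \mathrm{cost}(\text{path DT}) + \mathrm{cost}(\text{hanging-arc DTs}) + \mathrm{cost}(\text{subtree DTs}) \le H + w(T) + \sum_{i,j} j\, w(T_i^j) + \sum_{i,j} \mathrm{ALGO}(T_i^j),$$

where H is the entropy of $\{w(u)\}$. The three cost terms are the parts of the decision tree built in steps 2, 3 and 4 of the algorithm.

[Figure: input tree T and the computed decision tree.]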

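Analysis – lower bounds

UB: the upper bound on ALGO(T).

LB1: the entropy lower bound on OPT(T); useful only when H is large.

LB2: the alternative lower bound on OPT(T).

When H >> w(T): combine UB and LB1.

When H ≤ w(T): combine UB and (LB1 + LB2).

Hence, for all H, ALGO(T) ≤ α OPT(T) for a constant α.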

Analysis – entropy lower bound

$\mathrm{OPT}(T)$ = (from the root to (■)) + (from (■) to (■)) + (from (■) to the leaves), where (■) marks the highlighted nodes of an optimal decision tree D* in the figure.

From the root to (■): using Shannon's lossless coding theorem, this part is at least $H / \log 3 - w(T)$.

From (■) to (■): there are at most 2 purple nodes per level of D*, which bounds the cost of these paths from below.

From (■) to the leaves: every query to an arc in the trees $T_i^j$ is a descendant of a purple node, so this part costs at least as much as searching inside the trees $T_i^j$, namely $\sum_{i,j} \mathrm{OPT}(T_i^j)$.
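The entropy term can be evaluated numerically. The sketch assumes the weights are probabilities (summing to 1) and uses base-2 logarithms for both H and log 3; the ratio $H / \log 3$ does not depend on the base as long as both logarithms use the same one. Function names are hypothetical.

```python
import math
from typing import Iterable

def entropy(weights: Iterable[float]) -> float:
    """Shannon entropy H = sum_u w(u) * log2(1 / w(u)), skipping zero weights."""
    return sum(x * math.log2(1 / x) for x in weights if x > 0)

def entropy_term(weights: Iterable[float]) -> float:
    """The quantity H / log 3 - w(T) from the entropy lower bound (base-2 logs)."""
    ws = list(weights)
    return entropy(ws) / math.log2(3) - sum(ws)
```

For four nodes of weight 1/4 each, H = 2 bits, so the term is 2 / log2(3) - 1 ≈ 0.26.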

Analysis – alternative lower bound

$\mathrm{OPT}(T) \ge$ (from the root to (■)) + (from (■) to the leaves)

From the root to (■): the cost is $\sum_{i,j} d_i \cdot w(T_i^j)$, where $d_i$ is the distance from the root of D* to the i-th purple node. At most one purple node can have distance 0, and $w(T_i^j) \le w(T)/2$, so this part costs at least $w(T)/2$.

From (■) to the leaves: this part costs at least as much as searching inside the trees $T_i^j$, namely $\sum_{i,j} \mathrm{OPT}(T_i^j)$.

Efficient implementation

Most steps take linear time. To find a good strategy, the algorithm sorts weights; replacing this with linear-time approximate sorting suffices.

The algorithm can be implemented in linear time.
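The talk does not say which approximate sorting is used. One standard way to sort weights "up to a factor of 2" in linear time, shown here as an assumption, is bucketing by powers of two: index i goes to bucket $k = \lfloor \log_2(W / w_i) \rfloor$ (capped at n), where W is the total weight. The function name `approx_sort_desc` is hypothetical.

```python
import math
from typing import List, Sequence

def approx_sort_desc(w: Sequence[float]) -> List[int]:
    """Return indices ordered by weight, non-increasing up to a factor of 2, in O(n).
    Bucket k holds weights in (W / 2**(k+1), W / 2**k]; the last bucket (k = n)
    also collects zero and very small weights."""
    n = len(w)
    total = sum(w)
    buckets: List[List[int]] = [[] for _ in range(n + 1)]
    for i, x in enumerate(w):
        k = n if x <= 0 else min(n, math.floor(math.log2(total / x)))
        buckets[k].append(i)
    return [i for bucket in buckets for i in bucket]
```

Items within the same bucket keep their input order, so the output is sorted exactly across buckets and only approximately (within a factor of 2) inside a bucket.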

Conclusions

First constant-factor approximation for searching in trees (average-case)

Linear running time

Open questions: Is searching in trees polynomially solvable? Improved approximations for more general posets.

Thank you!
