An Approximation Algorithm for Binary Searching in Trees
Marco Molinaro
Carnegie Mellon University
joint work with
Eduardo Laber (PUC-Rio)
Searching in sorted lists
Sorted list of numbers
Marked number m
Find the marked number using queries ‘x ≤ m?’
[Figure: sorted list 3 5 6 10 14 ...]
Searching in sorted lists
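Search strategy: procedure that indicates which number should be queried next
Can be represented by a decision tree (DT)
# queries to find m = path length
[Figure: sorted list 3 5 6 10 14 ... and a decision tree (DT) for it; internal nodes are the queried numbers, branches are labeled ≤ and >, leaves are the numbers of the list]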
Searching in sorted lists
We are given the probability of each number being the marked one
Expected number of queries of a strategy = expected path length of the corresponding decision tree
An efficient strategy is one with minimum expected path length
[Figure: the same sorted list and decision tree, with probabilities 0.05, 0.1, 0.2, 0.1, 0.5, ... attached to the numbers and to the leaves]
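The objective on this slide can be made concrete with a small sketch. It assumes that a query splits the remaining candidates into a left and a right block, and it uses a plain cubic-time interval recursion rather than any of the faster methods cited later in the talk. The function name `optimal_expected_queries` and the example probabilities are illustrative and do not come from the slides.

```python
from functools import lru_cache


def optimal_expected_queries(probs):
    """Minimum expected number of queries over all decision trees.

    probs[i] is the probability that the i-th number of the sorted
    list is the marked one (hypothetical input format).
    """
    n = len(probs)
    prefix = [0.0]
    for p in probs:
        prefix.append(prefix[-1] + p)

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Numbers i..j (inclusive) are still candidates for the marked one.
        if i == j:
            return 0.0  # a single candidate needs no further query
        # Every remaining candidate pays for the query asked at this node.
        mass = prefix[j + 1] - prefix[i]
        # A query splits the candidates into i..k and k+1..j.
        return mass + min(cost(i, k) + cost(k + 1, j) for k in range(i, j))

    return cost(0, n - 1)


print(optimal_expected_queries([0.5, 0.25, 0.25]))  # 1.5
```

The recursion relies on the fact that the expected path length equals the sum, over internal nodes of the decision tree, of the probability mass that reaches that node.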
Searching in trees
Tree with exactly one marked node m
We can query an arc and find out which endpoint is closer to the marked node
Searching in trees
Search strategy: procedure that indicates which arc should be queried next
Can be represented by a decision tree
# queries to find m = path length
[Figure: a tree on nodes a, b, c, d, f, h and a decision tree (DT) for it; internal nodes are queried arcs such as (c,d), (a,b), (f,h), (d,f), branches are labeled ~a, ~b, ~c, ~d, ~f, ~h, leaves are nodes of the tree]
Searching in trees
We are given the probability of each node being the marked one
Expected number of queries is the expected path length of the corresponding decision tree
The goal is to find a DT with minimum expected path
[Figure: the same tree and decision tree, with a weight attached to each node (the values .1, .2 and .3 appear in the figure)]
Searching in trees
Def: Given a tree T and weights w, compute a decision tree for searching in T with minimum expected path from root to leaves w.r.t. w
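To make the definition concrete, here is a brute-force sketch that computes the minimum expected number of arc queries on small trees. It takes exponential time and is not the algorithm of the talk; the function name, the edge-list input format and the example weights are assumptions chosen for illustration.

```python
from functools import lru_cache


def optimal_tree_search_cost(edges, weight):
    """Brute-force minimum expected number of arc queries (tiny trees only)."""
    edges = [tuple(e) for e in edges]

    def component(nodes, start, banned):
        # Nodes of `nodes` reachable from `start` without crossing `banned`.
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for a, b in edges:
                if (a, b) == banned:
                    continue
                for x, y in ((a, b), (b, a)):
                    if x == u and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        return frozenset(seen)

    @lru_cache(maxsize=None)
    def cost(nodes):
        # `nodes` is a connected set of candidates for the marked node.
        if len(nodes) == 1:
            return 0.0
        # Every remaining candidate pays for the query asked at this node.
        mass = sum(weight[u] for u in nodes)
        best = float("inf")
        for a, b in edges:
            if a in nodes and b in nodes:
                # Querying arc (a, b) splits the candidates into two sides.
                side_a = component(nodes, a, (a, b))
                side_b = nodes - side_a
                best = min(best, cost(side_a) + cost(side_b))
        return mass + best

    return cost(frozenset(weight))


# Star with center c and three leaves, all weights equal (made-up instance).
star = [("c", "x"), ("c", "y"), ("c", "z")]
print(optimal_tree_search_cost(star, {"c": 0.25, "x": 0.25, "y": 0.25, "z": 0.25}))  # 2.25
```

On a path the recursion reduces to the sorted-list case, which matches the observation that searching in paths is the same as searching in ordered lists.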
Motivation
Generalizes searches in totally ordered structures to (one type of) partially ordered structures
Application to software testing and filesystem synchronization
Related work
Searching in sorted lists
Worst-case: binary search is optimal
Average-case:
Knuth [Acta Informatica 71]: O(n^2)
de Prisco, de Santis [IPL 93]: good approximation in linear time
Related work
Searching in trees
Worst-case:
Ben-Asher et al. [SIAM J. Comput. 99]: O(n^4 log^3 n)
Onak, Parys [FOCS 06]: O(n^3)
Mozes et al. [SODA 08]: O(n)
Average-case:
Kosaraju et al. [WADS 99]: O(log n)-approximation
Related work
Searching in posets
Worst-case:
Arkin et al. [Int. J. Comput. Geometry Appl. 98]: O(log n)-approximation
Carmo et al. [TCS 04]: finding an optimal strategy is NP-hard; constant-factor approximation for random posets
Average-case:
Kosaraju et al. [WADS 99]: O(log n)-approximation
Our results
First constant-factor approximation for searching in trees (average-case metric)
Linear running time
Overview
We know how to search in sorted lists with probabilities
Searching in paths = searching in ordered lists
Algorithm
1. Find a (heavy) path
2. Compute a decision tree for this path
3. Append decision trees for querying the hanging arcs
4. Recursively find strategies for the hanging subtrees and append them
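Step 1 and the hanging subtrees used in steps 3 and 4 can be sketched as follows. The slides do not say how the heavy path is chosen; this sketch assumes the tree is rooted and that the path always continues into the child whose subtree has the largest total weight. The function name and the `children` dictionary format are hypothetical.

```python
def heavy_path_decomposition(children, weight, root):
    """Return (heavy path, roots of the hanging subtrees per path node).

    Assumption: the path starts at `root` and always moves to the child
    whose subtree carries the largest total weight.
    """
    # Visit parents before descendants, then accumulate weights bottom-up.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    subtree = {}
    for u in reversed(order):
        subtree[u] = weight[u] + sum(subtree[c] for c in children.get(u, []))

    path, hanging = [root], {}
    u = root
    while children.get(u):
        heaviest = max(children[u], key=lambda c: subtree[c])
        # Every other child is the root of a subtree hanging from the path.
        hanging[u] = [c for c in children[u] if c != heaviest]
        path.append(heaviest)
        u = heaviest
    hanging[u] = []
    return path, hanging


children = {"r": ["a", "b"], "a": ["c"]}
weight = {"r": 0.1, "a": 0.2, "b": 0.3, "c": 0.4}
path, hanging = heavy_path_decomposition(children, weight, "r")
print(path)     # ['r', 'a', 'c']
print(hanging)  # {'r': ['b'], 'a': [], 'c': []}
```

Steps 2 to 4 would then build a decision tree for `path`, attach queries for the arcs leading to the nodes in `hanging`, and recurse on each hanging subtree.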
Analysis
T – input tree
w(u) – likelihood of node u being the marked one
w(T') = ∑_{u ∈ T'} w(u)
T_i^j – hanging subtrees of T
Cost of a decision tree – expected path length
[Figure: input tree T with hanging subtrees T_i^j]
Analysis – upper bound
ALGO(T) = expected path of the computed DT
= cost(■) + cost(■) + cost(■)
≤ H + w(T) + ∑_{i,j} j · w(T_i^j) + ∑_{i,j} ALGO(T_i^j)
H – entropy of {w(u)}
[Figure: input tree T and the computed decision tree]
Analysis – lower bounds
UB: upper bound on ALGO(T)
LB1: entropy lower bound
LB2: alternative lower bound
When H >> w(T): UB and LB1 (works only when H is large)
When H ≤ w(T): UB and (LB1 + LB2)
Hence, for all H, ALGO(T) ≤ α OPT(T)
Analysis – entropy lower bound
OPT(T) = from root to (■) + from (■) to (■) + from (■) to leaves
From root to (■): using Shannon’s lossless coding theorem, we can lower bound by H / log 3 − w(T)
From (■) to (■): there are at most 2 purple nodes per level
From (■) to leaves: every query to an arc in the trees T_i^j is a descendant of a purple node
Costs at least as much as searching inside the trees T_i^j, namely ∑_{i,j} OPT(T_i^j)
[Figure: optimal decision tree D*, with a callout lower-bounding the cost of the highlighted paths]
Analysis – alternative lower bound
OPT(T) ≥ from root to (■) + from (■) to leaves
From root to (■):
Costs = ∑_{i,j} (distance to i-th purple node) · w(T_i^j)
At most one purple node can have distance 0
w(T_i^j) ≤ w(T)/2
Costs at least w(T)/2
From (■) to leaves: costs at least as much as searching inside the trees T_i^j, namely ∑_{i,j} OPT(T_i^j)
[Figure: optimal decision tree D*]
Efficient implementation
Most steps take linear time
To find a good strategy, the algorithm sorts the weights
Use linear-time approximate sorting
The algorithm can be implemented in linear time
Conclusions
First constant-factor approximation for searching in trees (average-case)
Linear running time
Open questions
Is searching in trees polynomially solvable?
Improved approximations for more general posets