Searching for Extremes Among Distributed Data Sources with Optimal Probing Zhenyu (Victor) Liu [email protected] Computer Science Department, UCLA


Why Extremes?

Central Server

Sensor 1 Sensor 2 Sensor n

query: highest raindrop

Sensor i (the highest one), plus its value

Identifying severe weather conditions (flood / drought)

Central Server

link 1 link 2 link n

query: slowest link

link i (the slowest one), plus its transfer speed

a network path from L.A. to N.Y.

Identifying the network bottleneck

Central Server

Amazon, Barnes & Noble, CampusI.com

query: best Web site for “Computer Algorithms”

Website i (the best one), plus the matching Web pages

Identifying the best Web database for a user’s query

What Is the Challenge?

Constant communication between sensors and the central server is too expensive

Can the central server contact only a few sensors (i.e. use probing) to find out the maximum?


A Motivating Example

[Figure, two panels, each showing the central server and Sensor 1, Sensor 2, …, Sensor n connected by links with an expensive communication cost.]

a) The central server without the latest sensor updates: for each sensor the server knows only the possible value range; the actual value is unknown

b) Probing sensors' readings to reduce uncertainty: a probed sensor returns its actual reading (1000 in the figure)

Data Model

The reading of each source is a random variable: X1, …, Xn

[li, ui] is Xi's value range
  Bounded model: li, ui are real numbers
  Unbounded model: (-∞, ui], [li, +∞), (-∞, +∞)

Xi's probability distribution in [li, ui] is given: fi(x), Fi(x)

X1, …, Xn are independent

Probing Xi results in xi, costs ci
  uniform-cost model: c1 = c2 = … = cn = 1
  non-uniform-cost model
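A minimal sketch of this data model may make the notation concrete. It assumes uniform distributions (as in the two-variable examples of this talk); the class name `UniformSource`, the `probe` method, and the ranges [0, 1000] and [800, 900] are illustrative choices, the ranges being read off the axis marks of the slides' figures rather than stated in the text.

```python
import random
from dataclasses import dataclass


@dataclass
class UniformSource:
    """One data source X_i with value range [lower, upper] and a probing cost."""
    name: str
    lower: float        # l_i
    upper: float        # u_i
    cost: float = 1.0   # c_i; equal to 1 for every source in the uniform-cost model

    def pdf(self, x: float) -> float:
        """f_i(x): density of a uniform distribution on [lower, upper]."""
        if self.lower <= x <= self.upper:
            return 1.0 / (self.upper - self.lower)
        return 0.0

    def cdf(self, x: float) -> float:
        """F_i(x) = P(X_i <= x)."""
        if x <= self.lower:
            return 0.0
        if x >= self.upper:
            return 1.0
        return (x - self.lower) / (self.upper - self.lower)

    def probe(self, rng: random.Random) -> float:
        """Simulated probe: draws a reading x_i (a real probe would contact the source)."""
        return rng.uniform(self.lower, self.upper)


# Assumed ranges, taken from the figure marks of the two-variable examples.
x1 = UniformSource("X1", 0.0, 1000.0)
x2 = UniformSource("X2", 800.0, 900.0)
reading = x2.probe(random.Random(0))
print(x1.cdf(880.0), x2.cdf(850.0), x2.lower <= reading <= x2.upper)
```

The unbounded model would need a distribution with infinite support instead of the uniform one used here.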

Uncertainty in The Answer

Two variables X1 and X2, uniform distribution

[Figure: densities f1(x) of X1 and f2(x) of X2; axis marks at 0, 600, 800, 880, 900 and 1000; an area of 0.12 is marked.]

U(<X2, 880>) = 0.12, cost: 1 probe
U(<X2, 880>) = 0, cost: 2 probes
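The two uncertainty values can be reproduced with a short calculation. This sketch assumes that the uncertainty of an answer <Xi, xi> is the probability that at least one still-unprobed variable exceeds xi, and that X1 is uniform on [0, 1000]; both are inferred from the numbers on the slide, not stated there. The function names are illustrative.

```python
def uniform_cdf(x, lower, upper):
    """F(x) for a uniform distribution on [lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def uncertainty(candidate_value, unprobed_ranges):
    """P(at least one unprobed variable exceeds candidate_value), assuming independence."""
    prob_none_exceeds = 1.0
    for lower, upper in unprobed_ranges:
        prob_none_exceeds *= uniform_cdf(candidate_value, lower, upper)
    return 1.0 - prob_none_exceeds


# X2 was probed and returned 880; X1 (assumed uniform on [0, 1000]) is still unprobed.
after_one_probe = uncertainty(880.0, [(0.0, 1000.0)])
# Both variables probed: nothing is left that could exceed the answer.
after_two_probes = uncertainty(880.0, [])
print(round(after_one_probe, 2), after_two_probes)  # 0.12 0.0
```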

Uncertainty / Probing Cost Tradeoff

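[Figure: uncertainty in the answer (vertical axis, starting at 0) plotted against probing cost (horizontal axis). Less probing gives high uncertainty; more probing gives low uncertainty. The tradeoff point is where the curve reaches the user-specified uncertainty threshold.]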

The Problem

Given the uncertainty data model, design a probing policy

P: X1 → X2 → … → Xn

that
  incurs the least probing cost
  finds the maximum variable with an uncertainty lower than the user-specified threshold δ

Brute-force searching takes n! (one candidate policy per ordering of the n variables)
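As a rough illustration of the brute-force search, the sketch below enumerates all n! orderings and estimates the expected number of probes of each by simulation. It rests on several assumptions that are not in the slides: uniform distributions, the uniform-cost model, an uncertainty defined as the probability that some unprobed variable exceeds the largest reading so far, and a stopping test of "uncertainty <= δ" (so that δ = 0 is usable). All function names are hypothetical.

```python
import itertools
import random


def uniform_cdf(x, lower, upper):
    """F(x) for a uniform distribution on [lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def probes_used(order, readings, ranges, delta):
    """Number of probes needed for one scenario when probing in `order`."""
    best = float("-inf")
    for count, i in enumerate(order, start=1):
        best = max(best, readings[i])
        # Probability that no still-unprobed variable exceeds the current maximum.
        prob_none_exceeds = 1.0
        for j in order[count:]:
            prob_none_exceeds *= uniform_cdf(best, *ranges[j])
        if 1.0 - prob_none_exceeds <= delta:
            return count
    return len(order)


def brute_force_best_order(ranges, delta, trials=20000, seed=0):
    """Estimate the expected probe count of every ordering; return the cheapest one."""
    rng = random.Random(seed)
    scenarios = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(trials)]
    costs = {}
    for order in itertools.permutations(range(len(ranges))):
        total = sum(probes_used(order, s, ranges, delta) for s in scenarios)
        costs[order] = total / trials
    return min(costs, key=costs.get), costs


# Index 0 is X1 on [0, 1000], index 1 is X2 on [800, 900] (assumed ranges).
best_order, costs = brute_force_best_order([(0.0, 1000.0), (800.0, 900.0)], delta=0.15)
print(best_order, costs)
```

The loop over `itertools.permutations` is what makes this approach factorial in n, which is the motivation for pruning the search space analytically.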

Optimal Probing under Zero-Uncertainty

Threshold δ = 0, i.e. return an absolutely correct answer

Two policies
  P1: X1 → X2
  P2: X2 → X1

[Figure: densities f1(x) of X1 and f2(x) of X2; axis marks at 0, 800, 900 and 1000.]

Optimal Probing under Zero-Uncertainty

Theorem 1: If X1, …, Xn are ranked in descending order of their upper bounds, i.e., u1 > … > un, then

P: X1 → X2 → … → Xn

is optimal in the zero-uncertainty case

The upper bound ui acts as a “representative point” for Xi
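The policy of Theorem 1 can be sketched as a loop that probes in descending order of upper bound and stops as soon as no remaining variable can exceed the largest reading seen so far. The function name, the tuple layout, the third source X3 and all readings below are made up for illustration.

```python
def probe_for_max_zero_uncertainty(sources, probe):
    """Find the maximum with certainty.

    sources: list of (name, lower, upper) tuples.
    probe:   callable mapping a source name to its actual reading.
    Returns (winner_name, winner_value, names_probed).
    """
    # Theorem 1: probe in descending order of the upper bounds.
    order = sorted(sources, key=lambda s: s[2], reverse=True)
    best_name, best_value = None, float("-inf")
    probed = []
    for name, lower, upper in order:
        if best_value >= upper:
            break  # neither this source nor any later one can exceed the current maximum
        value = probe(name)
        probed.append(name)
        if value > best_value:
            best_name, best_value = name, value
    return best_name, best_value, probed


sources = [("X1", 0.0, 1000.0), ("X2", 800.0, 900.0), ("X3", 100.0, 700.0)]

# Hypothetical readings: X1 is low, so X2 must be probed; X3 can then be skipped.
readings_a = {"X1": 600.0, "X2": 880.0, "X3": 650.0}
result_a = probe_for_max_zero_uncertainty(sources, readings_a.__getitem__)

# Hypothetical readings: X1 already exceeds every other upper bound.
readings_b = {"X1": 950.0, "X2": 880.0, "X3": 650.0}
result_b = probe_for_max_zero_uncertainty(sources, readings_b.__getitem__)

print(result_a)  # ('X2', 880.0, ['X1', 'X2'])
print(result_b)  # ('X1', 950.0, ['X1'])
```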

Optimal Probing under Non-Zero-Uncertainty

Threshold δ = 0.15

Two policies
  P1: X1 → X2, saves the 2nd probing if X1 > 885
  P2: X2 → X1, saves the 2nd probing if X2 > 850

[Figure: densities f1(x) of X1 and f2(x) of X2; axis marks at 0, 800, 850, 885, 900 and 1000.]
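The two stopping thresholds, and the expected cost of each policy, can be worked out directly. This is a sketch assuming X1 is uniform on [0, 1000] and X2 is uniform on [800, 900] (ranges inferred from the figure marks), the uniform-cost model, and a threshold of 0.15; the helper names are illustrative.

```python
def uniform_cdf(x, lower, upper):
    """F(x) for a uniform distribution on [lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def stop_threshold(other_range, delta):
    """Reading above which P(other variable > reading) drops below delta (uniform case)."""
    lower, upper = other_range
    return upper - delta * (upper - lower)


def expected_probes(first_range, second_range, delta):
    """Expected probes: 1, plus 1 more whenever the first reading is not above the threshold."""
    threshold = stop_threshold(second_range, delta)
    return 1.0 + uniform_cdf(threshold, *first_range)


X1, X2, DELTA = (0.0, 1000.0), (800.0, 900.0), 0.15

threshold_p1 = stop_threshold(X2, DELTA)   # P1 stops after X1 if X1 > 885
threshold_p2 = stop_threshold(X1, DELTA)   # P2 stops after X2 if X2 > 850
cost_p1 = expected_probes(X1, X2, DELTA)   # 1 + P(X1 <= 885)
cost_p2 = expected_probes(X2, X1, DELTA)   # 1 + P(X2 <= 850)
print(threshold_p1, threshold_p2, round(cost_p1, 3), round(cost_p2, 3))
```

Under these assumptions P2 needs about 1.5 probes on average against about 1.885 for P1, so probing X2 first is the cheaper choice.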

Critical Point

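Critical point θi ∈ [li, ui] s.t. P(Xi > θi) = δ

Lemma 1: With two variables X1 and X2, the optimal policy always probes first the one with the larger critical point

[Figure: densities f1(x) of X1 and f2(x) of X2 with axis marks at 0, 800, 850, 885, 900 and 1000; below, the cumulative distributions F1(x) and F2(x) against x, where the level 0.85 (1 - δ) gives the critical points θ1 = 850 and θ2 = 885.]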
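For a general continuous distribution, the critical point can be located numerically as the point where the cumulative distribution reaches 1 - δ. The sketch below uses bisection, which is one possible method and not necessarily the one used in this work; it assumes a continuous, non-decreasing CDF, and the two example CDFs are the uniform ones inferred from the figure.

```python
def critical_point(cdf, lower, upper, delta, tol=1e-9):
    """Find theta in [lower, upper] with P(X > theta) = delta, i.e. cdf(theta) = 1 - delta."""
    target = 1.0 - delta
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < target:
            lo = mid   # too far left: more than delta of the mass lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cdf_x1(x):
    """Assumed F1: uniform on [0, 1000]."""
    return min(1.0, max(0.0, x / 1000.0))


def cdf_x2(x):
    """Assumed F2: uniform on [800, 900]."""
    return min(1.0, max(0.0, (x - 800.0) / 100.0))


theta1 = critical_point(cdf_x1, 0.0, 1000.0, delta=0.15)
theta2 = critical_point(cdf_x2, 800.0, 900.0, delta=0.15)

# Lemma 1: with two variables, probe first the one with the larger critical point.
probe_first = "X1" if theta1 > theta2 else "X2"
print(round(theta1, 3), round(theta2, 3), probe_first)
```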

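Deriving The Optimal Policy from The Critical Points?

Theorem 2: The optimal policy should always place Xi before Xj if:
  Cond1: θi > θj
  Cond2: ∀x > θj, Fi(x) < Fj(x)

[Figure: Fi(x) and Fj(x) plotted against x, with the levels 1 - δ and 1 marked; the critical points θj and θi are marked on the x axis.]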
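The two conditions can be checked numerically for a given pair. The following sketch assumes uniform distributions, tests Cond2 only on a finite grid between θj and the larger of the two upper bounds (beyond that point both CDFs equal 1, so the strict inequality cannot hold), and uses hypothetical names and ranges throughout.

```python
def uniform_cdf(x, lower, upper):
    """F(x) for a uniform distribution on [lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def critical_point_uniform(lower, upper, delta):
    """theta with P(X > theta) = delta for a uniform distribution on [lower, upper]."""
    return upper - delta * (upper - lower)


def must_precede(range_i, range_j, delta, grid=1000):
    """True if Cond1 and Cond2 (checked on a grid) both hold for Xi against Xj."""
    theta_i = critical_point_uniform(*range_i, delta)
    theta_j = critical_point_uniform(*range_j, delta)
    if not theta_i > theta_j:            # Cond1
        return False
    top = max(range_i[1], range_j[1])
    for k in range(1, grid):             # Cond2 on grid points strictly inside (theta_j, top)
        x = theta_j + (top - theta_j) * k / grid
        if not uniform_cdf(x, *range_i) < uniform_cdf(x, *range_j):
            return False
    return True


# Hypothetical pair whose CDFs do not cross above theta_j: the theorem applies.
applies = must_precede((500.0, 1000.0), (0.0, 900.0), delta=0.15)
# The assumed two-variable example: F2 crosses F1 near x = 889, so Cond2 fails.
crossing = must_precede((800.0, 900.0), (0.0, 1000.0), delta=0.15)
print(applies, crossing)  # True False
```

When the check returns False in both directions, the theorem gives no preference for that pair and both relative orders stay in the candidate set.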

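Applying Theorem 2 to Derive The Optimal Policy

Case 1:

[Figure: F1(x), F2(x), …, Fn(x) plotted against x, with the levels 1 - δ and 1 marked and the critical points θ1, θ2, …, θn marked on the x axis.]

Optimal policy: P: X1 → X2 → … → Xn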

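Applying Theorem 2 to Derive The Optimal Policy

Case 2: Possible candidate policies: {X1, X2, X3} before {X4, X5}, and X1 must be before X2

  X1 → X2 → X3 → X4 → X5
  X1 → X2 → X3 → X5 → X4
  X1 → X3 → X2 → X4 → X5
  X1 → X3 → X2 → X5 → X4
  X3 → X1 → X2 → X4 → X5
  X3 → X1 → X2 → X5 → X4

[Figure: F1(x), F2(x), F3(x), F4(x) and F5(x) plotted against x, with the levels 1 - δ and 1 marked.]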
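The reduction of the search space in Case 2 can be reproduced by filtering all 5! orderings with the precedence constraints listed above. The function name `candidate_policies` is illustrative; the constraints are the ones stated on the slide.

```python
import itertools
import math


def candidate_policies(variables, precedence):
    """All orderings of `variables` that respect every (before, after) pair in `precedence`."""
    result = []
    for order in itertools.permutations(variables):
        position = {name: index for index, name in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedence):
            result.append(order)
    return result


variables = ["X1", "X2", "X3", "X4", "X5"]
# {X1, X2, X3} before {X4, X5}, and X1 before X2.
precedence = [(a, b) for a in ("X1", "X2", "X3") for b in ("X4", "X5")]
precedence.append(("X1", "X2"))

policies = candidate_policies(variables, precedence)
for policy in policies:
    print(" -> ".join(policy))
print(len(policies), "candidates out of", math.factorial(len(variables)))
```

Only 6 of the 120 orderings survive, so the expected cost needs to be compared for those 6 alone.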

Experimental Set-up

166 rainfall sensors across Washington State

Recording the rainfall at each sensor location, on every day over the past 46 years

Probability Distribution

From the historical data, generate one distribution per sensor per day

Distinguish two kinds of historical data:
  Yesterday was dry
  Yesterday was rainy
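One way such conditional distributions might be built is as empirical distributions over the historical readings, split by the previous day's condition. The sketch below is a simplification under stated assumptions: it works on a single rainfall series rather than on one calendar day across 46 years, it treats a day as dry when its rainfall is 0, and the series itself is made-up toy data, not the Washington State dataset.

```python
from bisect import bisect_right
from collections import defaultdict


def build_conditional_samples(daily_rainfall, dry_threshold=0.0):
    """Group each day's rainfall by whether the previous day was dry or rainy."""
    samples = defaultdict(list)
    for yesterday, today in zip(daily_rainfall, daily_rainfall[1:]):
        key = "dry" if yesterday <= dry_threshold else "rainy"
        samples[key].append(today)
    return {key: sorted(values) for key, values in samples.items()}


def empirical_cdf(sorted_samples, x):
    """Fraction of historical readings that are <= x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)


# Made-up toy series for one sensor (units unspecified).
toy_series = [0.0, 0.0, 5.0, 12.0, 0.0, 3.0, 8.0, 0.0]
samples = build_conditional_samples(toy_series)

print(samples["dry"], samples["rainy"])
print(empirical_cdf(samples["dry"], 3.0), empirical_cdf(samples["rainy"], 5.0))
```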

Preliminary Results

Complexity of optimal-policy searching

Future Experimental Study

The behavior of the optimal policy on the rainfall sensor data
  Uncertainty threshold vs. number of sensor probes

The behavior of the optimal policy on synthetic datasets
  Reduction in the search space vs. number of sensor probes

Summary

Under the proposed data model, find the maximum variable with uncertainty less than the threshold δ

Optimal probing policy
  δ = 0: sort variables according to their upper bounds
  δ > 0: derive probing preferences (Xi before Xj) and reduce the search space
