1
DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks?
Deepak Ganesan, Deborah Estrin (UCLA), John Heidemann (USC/ISI)
Presenter: Vijay Sundaram
2
Deployment: Microclimate monitoring at James Reserve Park (UC Riverside)
How well does data fit model <M> of variation of temperature with altitude?
Send robotic agent to edge between low and high precipitation regions.
Get detailed data from node with maximum precipitation from Sept to Dec 2003.
Hmm… I wonder why packet loss is so high. Get a connectivity map of the network for all transmit power settings.
Weather Sensor Network
3
Goals
Flexible spatio-temporal querying
  Provide ability to mine for interesting patterns and features in data.
  Drill down on details.
Distributed long-term networked data storage
  Preserve ability for long-term data mining, while catering to node storage constraints.
Performance
  Reasonable accuracy for a wide range of queries.
  Low communication (energy) overhead.
4
How can we achieve goals?
Exploit redundancy in data
  Potentially huge gains from lossy compression exploiting spatio-temporal correlation.
Exploit rarity of interesting features
  Preserve only interesting features.
Exploit scale of sensor network
  Large distributed storage, although limited local storage.
Exploit low cost of approximate query processing
  Allow approximate query processing that obtains sufficiently accurate responses.
5
Can existing systems satisfy design goals?
[Figure: Data Correlation vs. Decentralization. x-axis: Degree of Decentralization (Centralized, Hierarchical, Fully Distributed); y-axis: Exploited Data Correlation (None, Temporal, Spatial). Systems plotted: P2P (DHT, Gnutella); Web Caches; Centralized Data Collection; Geo-Spatial Data Mining; Streaming Media (MPEG-2); Wireless Sensor Networks.]
6
DIMENSIONS Design: Key Ideas
Construct hierarchy of lossy compressed summaries of data using wavelet compression.
Queries “drill-down” from root of hierarchy to focus search on small portions of the network.
Progressively age lossy data along spatio-temporal hierarchy to enable long-term storage.
[Figure: hierarchy with Level 0, Level 1, and Level 2; labels: "Progressively age", "Progressively lossy".]
7
Roadmap
Why wavelets?
Example precipitation hierarchy
Spatial and temporal processing internals
Initial results: precipitation dataset
8
Enabling Technique: Wavelets
Very popular signal processing approach that provides good time and frequency localization.
  Used in JPEG2000 and geo-spatial data mining.
Preserves spatio-temporal features (edges, discontinuities) while providing a good approximation of long-term trends in data.
Efficient distributed implementation possible.
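The localization property can be illustrated with a minimal sketch of one level of the Haar transform in plain Python. The function names `haar_step` and `haar_inverse` and the sample readings are illustrative assumptions, not part of DIMENSIONS. The averages approximate the coarse trend, while the detail coefficients are non-zero only where the data changes sharply.

```python
def haar_step(signal):
    """One level of the (unnormalized) Haar transform.

    Returns (averages, details): pairwise means capture the coarse trend,
    pairwise half-differences capture local change.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original samples from averages and details."""
    signal = []
    for a, d in zip(averages, details):
        signal.extend([a + d, a - d])
    return signal


# Hypothetical readings: flat, then a sharp jump (an edge) inside the third pair.
readings = [2.0, 2.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
averages, details = haar_step(readings)
print(averages)  # coarse trend: [2.0, 2.0, 5.0, 8.0]
print(details)   # non-zero only at the edge: [0.0, 0.0, -3.0, 0.0]
```

Only one detail coefficient is non-zero here, so dropping the zeros loses nothing while the edge stays recoverable.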
9
Sample Architecture: Precipitation Hierarchy
Local Processing: Construct lossy time-series summary (zero communication cost).
Spatial Data Processing: Hierarchical Lossy Compression
  Organize network into hierarchy. At each higher level, reduce number of participating nodes by a factor of 4.
  At each step of the hierarchy, summarize data from 4 quadrants, and propagate.
Example query: What is the maximum precipitation between Sept-Dec 2002?
  Direct query to quadrant that best matches query.
[Figure: hierarchy of wavelet coefficients; labels: "decreasing temporal resolution", "decreasing spatial resolution".]
10
Spatial Decomposition
Recursively split network into non-overlapping square grids.
At each level of the hierarchy:
  Elect cluster-head.
  Cluster-head combines and summarizes data from 4 quadrants.
  Cluster-head propagates compressed data to the next level of the hierarchy.
Routing protocol: GPSR variant (DCS, Ratnasamy et al.)
[Figure: Hierarchy construction]
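The quadrant-by-quadrant construction can be sketched as follows. This is a minimal illustration under stated assumptions: the network is a square grid with a power-of-two side, and each cluster-head stores a plain (mean, maximum) pair as a stand-in for the wavelet summary. The name `build_hierarchy` and the sample grid are hypothetical; cluster-head election and routing are not modelled.

```python
def build_hierarchy(grid):
    """Return a list of grids, level 0 (finest) first.

    Each level merges 2x2 blocks of the level below, so the number of
    cells (participating nodes) shrinks by a factor of 4 per level.
    Each cell stores (mean, maximum) as a stand-in summary.
    """
    size = len(grid)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    level = [[(v, v) for v in row] for row in grid]
    levels = [level]
    while len(level) > 1:
        half = len(level) // 2
        parent = []
        for r in range(half):
            row = []
            for c in range(half):
                # The four quadrants feeding this cluster-head.
                quad = [level[2 * r + dr][2 * c + dc]
                        for dr in (0, 1) for dc in (0, 1)]
                mean = sum(m for m, _ in quad) / 4
                peak = max(p for _, p in quad)
                row.append((mean, peak))
            parent.append(row)
        levels.append(parent)
        level = parent
    return levels


# Hypothetical 4x4 grid of readings.
grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
levels = build_hierarchy(grid)
print([len(lv) ** 2 for lv in levels])  # cells per level: [16, 4, 1]
print(levels[-1][0][0])                 # root summary: (8.5, 16)
```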
11
Wavelet Compression Internals
Filters: Haar filter, Daubechies 9/7 filter
Cost metric: communication budget, error bound
[Figure: compression pipeline. Input data (x, y, time) -> wavelet subband decomposition -> thresholding + quantization + drop subbands -> lossless encoder -> compressed output.]
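The thresholding and quantization stages can be sketched in a few lines. This is a minimal illustration, assuming a hard threshold and a uniform quantizer; the cutoff, step size, and coefficient values are hypothetical, and the subband-dropping and lossless-encoding stages are omitted.

```python
def threshold(coeffs, cutoff):
    """Zero out coefficients whose magnitude is below the cutoff."""
    return [c if abs(c) >= cutoff else 0.0 for c in coeffs]


def quantize(coeffs, step):
    """Map each coefficient to an integer bin index (uniform quantizer)."""
    return [round(c / step) for c in coeffs]


def dequantize(bins, step):
    """Approximate the coefficients from their bin indices."""
    return [b * step for b in bins]


# Hypothetical wavelet coefficients: a few large values, many small ones.
coeffs = [0.05, -3.0, 0.4, 1.26, -0.02, 7.9]
bins = quantize(threshold(coeffs, cutoff=0.5), step=0.5)
print(bins)                    # [0, -6, 0, 3, 0, 16]
print(dequantize(bins, 0.5))   # [0.0, -3.0, 0.0, 1.5, 0.0, 8.0]
```

A larger cutoff or step shrinks the output (more zeros, fewer distinct bins) at the cost of a larger reconstruction error, which is the trade-off between communication budget and error bound.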
12
Initial Results with Precipitation Dataset: Communication Overhead
15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest†. Gridded before processing.
Handpicked choice of threshold, quantization intervals, and subbands to drop. Huffman encoder at output.
Very large compression ratio up the hierarchy.

Level   Raw data size (Kb), R   Mean data sent to next level (Kb), M   Compression ratio, R/M
1       262.5                   5.6                                    46.6
2       984.4                   3.8                                    257.2
3       3937.7                  7.4                                    987
4       11813.2                 2.5                                    2286.2

† M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949-94.
13
Find maximum annual precipitation for each year.
Exact Answer for 89% of queries. Within 90% of answer for >95% of queries.
Queries require less than 3% of network.
Good performance on average with very low lookup overhead
[Figure: Drill Down Query: Error in Max Annual Precipitation from 1949-1994. x-axis: Year Queried; y-axis: Fraction Error, 1 - (DrillDown Answer / Real Answer).]
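A drill-down maximum query can be sketched as follows. This is a minimal illustration, not the DIMENSIONS implementation: summaries are 2x2-merged maxima rounded to a multiple of `step` (a crude stand-in for lossy wavelet summaries), and the query follows the largest summary from the root to a single leaf. The function names and the sample grid are hypothetical.

```python
def summarize_max(grid, step):
    """Build levels of 2x2-merged maxima, rounded to `step` above level 0."""
    levels = [[list(row) for row in grid]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        half = len(below) // 2
        levels.append([
            [round(max(below[2 * r + dr][2 * c + dc]
                       for dr in (0, 1) for dc in (0, 1)) / step) * step
             for c in range(half)]
            for r in range(half)
        ])
    return levels


def drill_down_max(levels):
    """Follow the largest summary from the root down to one leaf.

    Returns (value at the leaf reached, number of cells inspected).
    """
    r = c = 0
    inspected = 1  # the root
    for depth in range(len(levels) - 2, -1, -1):
        children = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
        inspected += len(children)
        r, c = max(children, key=lambda rc: levels[depth][rc[0]][rc[1]])
    return levels[0][r][c], inspected


# Hypothetical 4x4 grid of integer precipitation readings.
grid = [[3, 1, 4, 1],
        [5, 9, 2, 6],
        [5, 3, 58, 9],
        [7, 9, 3, 23]]
levels = summarize_max(grid, step=5)
print(drill_down_max(levels))  # (58, 9)
```

Because the summaries are lossy, two quadrants with similar maxima can round to the same value and the query may descend into the wrong one, so the answer is approximate in general.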
14
Locate boundary in annual precipitation between Low and High Precipitation Areas
Error Metric: Number of nodes greater than 1 pixel distance from drill-down boundary
Accuracy: Within 25% error for 93% of the queries (or within 13% error for 75% of the queries)
Less than 5% of the network queried.
[Figure: Drill Down Edge Query: Number of Nodes Missed Per Year. x-axis: Year Queried; y-axis: Fraction of boundary nodes missed.]
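One way to read the error metric is sketched below: a true boundary node counts as missed when it lies more than 1 pixel from every node of the drill-down boundary. The use of Chebyshev distance, the function name `fraction_missed`, and the sample coordinates are assumptions made for illustration; the slides do not specify the distance measure.

```python
def fraction_missed(true_boundary, found_boundary, tolerance=1):
    """Fraction of true boundary cells farther than `tolerance` pixels
    (Chebyshev distance) from every cell of the drill-down boundary."""
    if not true_boundary:
        raise ValueError("true boundary is empty")

    def near(cell):
        return any(max(abs(cell[0] - r), abs(cell[1] - c)) <= tolerance
                   for r, c in found_boundary)

    missed = sum(1 for cell in true_boundary if not near(cell))
    return missed / len(true_boundary)


# Hypothetical (row, column) boundary cells.
true_boundary = [(0, 2), (1, 2), (2, 3), (3, 5)]
found_boundary = [(0, 2), (1, 3), (2, 3), (3, 3)]
print(fraction_missed(true_boundary, found_boundary))  # 0.25
```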
15
Open Issues
Load Balancing and Robustness
  Hierarchical model vs. peer model: lot of work in p2p systems…
Irregular Node Placement
  Use wavelet extensions for irregular node placement. Computationally more expensive.
  Gridify dataset with interpolation.
Providing Query Guarantees
  Can we bound error in response obtained for a drill-down query at a particular level of hierarchy?
Implementation on iPAQ/mote network
16
Summary
DIMENSIONS provides a holistic data handling architecture for sensor networks that can:
  Support a wide range of sensor-network usage and query models (using drill-down querying of wavelet summaries).
  Provide a gracefully degrading lossy storage model (by progressively ageing summaries).
  Offer ability to tune energy expended for query performance (tunable lossy compression).
17
Different optimization metrics
[Figure: comparison of systems (Internet-based Peer-to-Peer Systems, Geo-Spatial Data Mining, Web Caches, Streaming Media (MPEG-2), Wireless Sensor Networks) against metrics (Energy, Latency, Bandwidth, Approximate Results OK, Spatio-temporal Query Performance, Lookup Cost, Resource Constraints).]
18
Other Examples: Packet Loss
Different example of dataset that exhibits spatial correlation
Throughput from one transmitter to proximate receivers is correlated
Throughput from multiple proximate transmitters to one receiver is correlated.
Typically, what we want to query is the deviations from normal and average throughput.
[Figure: Contour Map. Both axes: Distance (ft), 0 to 15; contour levels 0.2, 0.4, 0.6, 0.8.]
19
Packet-Loss Dataset: Get Throughput vs. Distance Map
Involves expensive transfer of 12x14 map from each node.
Good approximate results can be obtained from querying compressed data.
20
Long-term Storage: Concepts
Data is progressively aged, both locally and along the hierarchy.
Summaries that cover larger areas and longer time periods are retained for much longer than raw time-series.
[Figure: hierarchy of wavelet coefficients; labels: "Slower Ageing", "Wavelet Coefficients".]
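The ageing idea can be sketched as a retention schedule that grows with the hierarchy level. This is a minimal illustration only: the base period of 7 days, the growth factor of 4, and the names `retention` and `age_out` are hypothetical and do not come from the slides.

```python
def retention(level, base=7, growth=4):
    """Days a summary at `level` is kept (hypothetical schedule)."""
    return base * growth ** level


def age_out(store, today):
    """Keep only summaries still inside their level's retention window.

    `store` is a list of (level, day_created) pairs.
    """
    return [(lvl, day) for lvl, day in store if today - day <= retention(lvl)]


# Hypothetical store: raw data is level 0, coarser summaries are levels 1 and 2.
store = [(0, 0), (1, 0), (2, 0), (0, 95), (1, 80)]
print(age_out(store, today=100))  # [(2, 0), (0, 95), (1, 80)]
```

On day 100 the old level 0 and level 1 entries have expired, while the level 2 summary created on the same day is still retained.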
21
Load Balancing and Robustness: Concepts
Hierarchical Model
  Naturally fits wavelet processing.
  Strict hierarchies are vulnerable to node failures. Failures near root of hierarchy can be expensive to repair.
Decentralized Peer Model
  Summaries communicated to multiple nodes probabilistically.
  Better robustness, but incurs greater communication overhead.