Data Persistence in Sensor Networks: Towards Optimal
Encoding for Data Recovery in Partial Network Failures
Abhinav Kamra, Jon Feldman, Vishal Misra, and Dan Rubenstein (DNA Research Group, Columbia University)
Motivation and Model
Typical scenario of sensor networks:
- Large number of nodes deployed to "sense" the environment
- Data collected periodically, pulled/pushed through a sink/gateway node
- Nodes prone to failure (disaster, battery life, targeted attack)
- Want data to survive individual node failures: "data persistence"
Overview
- Erasure codes
- LT-Codes and the Soliton distribution
- Coding for failure-prone sensor networks
- Major results
- A brief sketch of proofs
- A case study of failure-prone sensor networks
Erasure Codes
[Diagram: a message of n blocks is run through the encoding algorithm to produce cn encoded blocks; after transmission, the decoding algorithm recovers the message from any received subset of at least n encoded blocks.]
Luby Transform Codes
- Simple linear codes
- Improvement over "Tornado codes"
- Rateless codes
Erasure Codes: LT-Codes
F = (b1, b2, b3, b4, b5): n = 5 input blocks
LT-Codes: Encoding
F = (b1, b2, b3, b4, b5)

1. Pick degree d1 from a pre-specified distribution (here d1 = 2).
2. Select d1 input blocks uniformly at random (here b1 and b4).
3. Compute their sum (XOR): c1 = b1 ⊕ b4.
4. Output the sum together with the IDs of the chosen blocks.
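The four steps above can be sketched in Python (a minimal illustration, not the authors' code; blocks are modeled as integers, and the degree distribution is passed as explicit (degree, probability) pairs):

```python
import random

def lt_encode_symbol(blocks, degree_dist, rng=random):
    """Generate one LT-encoded symbol from the input blocks.

    degree_dist: list of (degree, probability) pairs to sample from.
    Returns (ids, value): the chosen block IDs and their XOR.
    """
    degrees, probs = zip(*degree_dist)
    d = rng.choices(degrees, weights=probs, k=1)[0]  # step 1: pick degree d
    ids = rng.sample(range(len(blocks)), d)          # step 2: pick d blocks at random
    value = 0
    for i in ids:                                    # step 3: XOR the chosen blocks
        value ^= blocks[i]
    return ids, value                                # step 4: output sum and block IDs
```

Because the degree and the block IDs are drawn fresh for every symbol, an encoder can emit an unbounded stream of such symbols, which is what makes LT-Codes rateless.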
LT-Codes: Encoding
E(F) = (c1, c2, c3, c4, c5, c6, c7): encoded symbols generated from F = (b1, b2, b3, b4, b5)
LT-Codes: Decoding
F = (b1, b2, b3, b4, b5); received E(F) = (c1, c2, c3, c4, c5, c6, c7)

Decoding proceeds iteratively:
1. Find an encoded symbol of degree 1 (here c4 = b5) and recover its block directly.
2. Subtract (XOR) the recovered block out of every symbol that contains it, reducing their degrees:
   c3 ← c3 − b5, c4 ← c4 − b5, c5 ← c5 − b5
3. Repeat with the newly created degree-1 symbols (next, b2 is recovered), until no degree-1 symbol remains or all of F is decoded.

[Animation frames: b5 is recovered from c4 and peeled out of the remaining symbols, after which b2 becomes recoverable.]
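The peeling process in these frames can be sketched in Python (an illustration under the slide's model, with each received symbol given as a (block-ID set, XOR value) pair; the names are ours, not the paper's):

```python
def lt_decode(symbols):
    """Iterative (peeling) LT decoder.

    symbols: list of (iterable of block IDs, XOR value) pairs.
    Returns {block_id: value} for every block it could recover.
    """
    # Mutable copies, so recovered blocks can be peeled out in place.
    symbols = [[set(ids), val] for ids, val in symbols]
    recovered = {}
    while True:
        # Find any symbol of degree 1 (it directly reveals one block).
        ripple = next((s for s in symbols if len(s[0]) == 1), None)
        if ripple is None:
            break                         # no degree-1 symbol left: stop
        b = next(iter(ripple[0]))
        val = ripple[1]
        recovered[b] = val
        # Subtract (XOR) the recovered block out of every symbol containing it.
        for s in symbols:
            if b in s[0]:
                s[0].discard(b)
                s[1] ^= val
    return recovered
```

Unlike all-or-nothing decoders, this loop simply stops when no degree-1 symbol remains, returning whatever subset of the original blocks it managed to recover, which is exactly the partial-recovery setting this talk is about.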
Degree Distribution for LT-Codes

Soliton distribution:
π(1) = 1/N
π(i) = 1/(i(i−1)) for 1 < i ≤ N

- Average degree H(N) ≈ ln(N)
- In expectation, exactly one degree-1 symbol in each round of decoding
- Distribution very fragile in practice
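The Soliton distribution is easy to construct and check numerically. The sketch below (our illustration) builds it with exact fractions; the telescoping sum 1/N + Σ 1/(i(i−1)) collapses to exactly 1, and the average degree 1/N + H(N−1) grows like ln(N):

```python
from fractions import Fraction

def soliton(N):
    """Ideal Soliton distribution over degrees 1..N:
    pi(1) = 1/N, pi(i) = 1/(i*(i-1)) for 1 < i <= N."""
    pi = {1: Fraction(1, N)}
    for i in range(2, N + 1):
        pi[i] = Fraction(1, i * (i - 1))
    return pi
```

Its fragility comes from having only 1/N mass on degree 1: in expectation exactly one degree-1 symbol per decoding round, so any deviation from expectation can stall the decoder (which is why Luby's robust variant adds extra low-degree mass).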
Failure-prone Sensor Networks

All earlier works ask: how many encoded symbols are needed to recover all original symbols (all-or-nothing decoding)?
Failure-prone networks ask instead: how many original symbols can be recovered from a given set of surviving encoded symbols?
Iterative Decoder
[Diagram: four received symbols built from x1 … x5; iterative decoding recovers x3, x1, and x4.]

- 5 original symbols x1 … x5
- 4 encoded symbols received
- Each encoded symbol is the XOR of its component original symbols
Sensor Network Model
- k encoded symbols remaining after failures
- Want to maximize r, the number of recovered original data symbols
- No idea a priori what k will be

Coding is bad for small k:
- N original symbols, k encoded symbols received
- If k ≤ 0.75N, no coding is required
[Plot: symbols recovered vs. k, N = 128]
Proof Sketch

Theorem: To recover the first N/2 symbols, it is best not to do any encoding.

Proof:
1. Let C(i, j) = expected number of symbols recovered from i degree-1 symbols and j symbols of degree 2 or more.
2. C(i, j) ≤ C(i+1, j−1) whenever C(i, j) ≤ N/2:
   a. Sort the given symbols in decoding order.
   b. All degree-1 symbols are decoded before the other symbols.
   c. By (b), the last symbol in decoding order has degree > 1.
   d. Replace this symbol with a random degree-1 symbol.
   e. The new degree-1 symbol is more likely to be useful.
3. Hence more degree-1 symbols give better output.
4. So no coding is best for recovering any first N/2 symbols.
5. With all degree-1 symbols, the Coupon Collector's problem gives ≈ 3N/4 symbols to recover N/2 distinct symbols.
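The Coupon Collector's step can be checked by simulation (a quick sketch of ours, not from the talk): drawing degree-1 symbols uniformly from N blocks, the expected number of draws needed to see N/2 distinct blocks is N(H(N) − H(N/2)) ≈ N·ln 2 ≈ 0.69N, in line with the ≈ 3N/4 figure above.

```python
import random

def draws_to_collect(N, target, rng=random):
    """Draw coupons uniformly from N kinds until `target` distinct ones are seen;
    return the number of draws taken."""
    seen = set()
    draws = 0
    while len(seen) < target:
        seen.add(rng.randrange(N))  # one uniform degree-1 symbol
        draws += 1
    return draws
```

Averaging many trials for N = 128 gives roughly 0.69N draws to collect N/2 distinct symbols.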
Ideal Degree Distribution
Theorem: To recover r data units with r < jN/(j+1), the optimal degree distribution has symbols of degree j or less only.

- Lower degrees are better for small k: if k ≤ k_j, use symbols of degree up to j
- So use k_j − k_{j−1} degree-j symbols in the close-to-optimal distribution
[Plot: symbols recovered vs. k, N = 128]
Case Study: Single-sink Sensor Network
[Diagram: four sensor nodes (1-4) with local storage and a sink; nodes exchange symbols, and nodes 2 and 3 transfer new symbols to the sink.]
Case Study: Single-sink Sensor Network
- Network prone to failure
- Nodes store unencoded symbols at first, and higher-degree symbols as time goes on
- The sink receives low-degree symbols first and higher-degree symbols later
[Diagram: nodes 1-4 and the sink]
Distributed Simulation: Clique Topology
- N = 128 nodes in a clique topology
- The sink receives one symbol per unit time
Distributed Simulation: Chain Topology
- N = 128 nodes in a chain topology: 1 - 2 - 3 - … - N
Related Work

Bulk data distribution (coding is useful):
- Tornado codes: "Efficient Erasure Correcting Codes" by M. Luby et al., IEEE Transactions on Information Theory, vol. 47, no. 2, 2001
- LT-Codes: "LT Codes" by M. Luby, FOCS 2002

Reliable storage in sensor networks:
- Decentralized erasure codes: "Ubiquitous Access to Distributed Data in Large-Scale Sensor Networks through Decentralized Erasure Codes" by A. Dimakis et al., IPSN 2005
- Random linear coding: "How Good is Random Linear Coding Based Distributed Networked Storage?" by M. Medard et al., NetCod 2005