Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes
Yunfeng Lin, Ben Liang, Baochun Li
INFOCOM 2007


Page 1: Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes

Yunfeng Lin, Ben Liang, Baochun Li

INFOCOM 2007

Page 2: Outline

• Introduction
• Preliminaries
• Persistent Data Access
  • Two-way random walks
  • EDFC and ADFC
• Discussion of Multiple Encoded Blocks
• Performance Evaluation
• Conclusion

Page 3: Introduction (1/5)

• It has been a conventional assumption that the data measured by individual sensors are gathered and processed at powered sinks.
  • The sinks have Internet connections.
  • Data reach the sinks via data aggregation.
• This assumption may not hold in practice, for example in:
  • large-scale sensor networks;
  • networks deployed in inaccessible geographical regions.

Page 4: Introduction (2/5)

Our proposed vision:
• Ask the sensors to collaboratively store measured data over a historical period of time.
• At a later, convenient time, a collector gathers the measured data directly from the sensors.

PUSH model vs. PULL model:
• PUSH model: sensors send data periodically.
• PULL model: sensors are passively polled by the collector.

Page 5: Introduction (3/5)

We propose a novel decentralized implementation of fountain codes in sensor networks, in which data are encoded in a distributed fashion:
• A sensor disseminates its data to a random subset of sensors in the network.
• Each sensor encodes only the data it has received.
• The collector is able to decode the original data by collecting a sufficient number of encoded data blocks.

Page 6: Introduction (4/5)

Our decentralized implementation of fountain codes does not require the support of a generic routing layer:
• No routing tables or geographical routing protocols are needed.
• Random walks are used to disseminate data.

Page 7: Introduction (5/5)

[Figure: sensing nodes disseminate their sensed data (source blocks) to caching nodes, which store encoded blocks. Some nodes fail; a collector gathers encoded blocks from the surviving caching nodes and decodes the original data.]

Page 8: Preliminaries: Why Fountain Codes?

• Replication: store copies on backup sensors. However, a large number of replicas is required.
• Error-correcting codes: implemented in a centralized fashion.
• Random linear codes: can be decentralized, but the decoding process is computationally expensive, $O(K^3)$.
• Fountain codes: low decoding complexity, $O(K\ln K)$, and superior decoding performance.

(Source: S. K. Chang, "Digital Fountain Codes V.S. Reed-Solomon Code For Streaming Applications.")

Page 9: Preliminaries: LT Codes

• In LT codes, the K source blocks can be decoded from any subset of $K + O(\sqrt{K}\ln^2(K/\delta))$ encoded blocks with probability $1 - \delta$.
• Degree: the number of source blocks used to generate an encoded block.
• The degree distribution of encoded blocks in LT codes follows the Robust Soliton distribution.
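A minimal sketch of LT encoding and the standard peeling (belief-propagation) decoder may help make "degree" concrete. It assumes source blocks are modeled as Python integers combined with bitwise XOR, and that every encoded block carries the IDs of the source blocks it contains. The degrees are passed in by the caller (in LT codes they would be drawn from the Robust Soliton distribution); the names `lt_encode` and `peel_decode` are illustrative, not taken from the paper.

```python
import random
from typing import Dict, List, Set, Tuple

# An encoded block is (set of source IDs, XOR of those source blocks).
EncodedBlock = Tuple[Set[int], int]


def lt_encode(source: List[int], degrees: List[int], rng: random.Random) -> List[EncodedBlock]:
    """Build one encoded block per requested degree d by XOR-ing d distinct source blocks."""
    encoded = []
    for d in degrees:
        ids = set(rng.sample(range(len(source)), d))
        value = 0
        for i in ids:
            value ^= source[i]
        encoded.append((ids, value))
    return encoded


def peel_decode(encoded: List[EncodedBlock], k: int) -> Dict[int, int]:
    """Peeling decoder: release degree-1 blocks and XOR recovered sources out of the rest."""
    blocks = [(set(ids), value) for ids, value in encoded]  # work on copies
    recovered: Dict[int, int] = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for idx, (ids, value) in enumerate(blocks):
            for i in list(ids):          # strip source blocks that are already known
                if i in recovered:
                    ids.discard(i)
                    value ^= recovered[i]
            blocks[idx] = (ids, value)
            if len(ids) == 1:            # a degree-1 block reveals one source block
                (i,) = ids
                recovered[i] = value
                progress = True
    return recovered
```

Decoding stops with a partial result as soon as no degree-1 block remains, which is why the shape of the degree distribution matters so much for LT codes.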

Page 10: Preliminaries: LT Codes

Ideal Soliton distribution:

$$\rho(i) = \begin{cases} \dfrac{1}{K}, & i = 1,\\[4pt] \dfrac{1}{i(i-1)}, & i = 2, 3, \ldots, K. \end{cases}$$

Let $R = c\ln(K/\delta)\sqrt{K}$ and

$$\tau(i) = \begin{cases} \dfrac{R}{iK}, & i = 1, \ldots, K/R - 1,\\[4pt] \dfrac{R\ln(R/\delta)}{K}, & i = K/R,\\[4pt] 0, & i = K/R + 1, \ldots, K. \end{cases}$$

Robust Soliton distribution:

$$\mu(i) = \frac{\rho(i) + \tau(i)}{\sum_{j=1}^{K}\left(\rho(j) + \tau(j)\right)}.$$
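The Robust Soliton distribution can be computed directly from the definitions of ρ, τ, and μ. This is a sketch under one stated assumption: since K/R is generally not an integer, the spike position is taken as K/R rounded to the nearest integer (the slides treat K/R as an integer). The helper name `robust_soliton` is hypothetical.

```python
import math
from typing import List


def robust_soliton(k: int, c: float, delta: float) -> List[float]:
    """Return mu where mu[i] = Pr(degree = i) for i = 1..k (mu[0] is unused and 0)."""
    r = c * math.log(k / delta) * math.sqrt(k)      # R = c ln(K/delta) sqrt(K)
    spike = int(round(k / r))                        # assumption: K/R rounded to an integer
    rho = [0.0] * (k + 1)
    tau = [0.0] * (k + 1)
    rho[1] = 1.0 / k                                 # Ideal Soliton, i = 1
    for i in range(2, k + 1):
        rho[i] = 1.0 / (i * (i - 1))                 # Ideal Soliton, i >= 2
    for i in range(1, min(spike, k + 1)):
        tau[i] = r / (i * k)                         # i = 1 .. K/R - 1
    if 1 <= spike <= k:
        tau[spike] = r * math.log(r / delta) / k     # the spike at i = K/R
    beta = sum(rho) + sum(tau)                       # normalizing constant
    return [(rho[i] + tau[i]) / beta for i in range(k + 1)]


mu = robust_soliton(10_000, 0.2, 0.05)
print(max(range(3, len(mu)), key=lambda i: mu[i]))   # location of the spike among i >= 3
```

The ρ part alone already sums to 1 (the telescoping sum of 1/(i(i−1)) gives 1 − 1/K), so the normalizing constant β is 1 plus the total mass of τ.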

Page 11: Preliminaries: LT Codes

Example of the Robust Soliton distribution with K = 10000, c = 0.2, and δ = 0.05:

[Figure: the Robust Soliton distribution for these parameters, with a spike at K/R = 41.]

• Encoded blocks with a degree higher than K/R are not essential in decoding.
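A quick arithmetic check of where the spike falls for the example parameters, using R = c ln(K/δ)√K:

```python
import math

# Parameters of the example slide.
K, c, delta = 10_000, 0.2, 0.05

R = c * math.log(K / delta) * math.sqrt(K)   # R = c ln(K/delta) sqrt(K)
spike = K / R                                # position of the spike in tau(i)

print(f"R = {R:.2f}, K/R = {spike:.2f}")     # R = 244.12, K/R = 40.96
```

K/R is about 40.96, which rounds to the spike at 41 shown on the slide.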

Page 12: Preliminaries: Random Walks on Graphs

We describe random walks in the context of disseminating a source block.
• Each sensor is a node in the graph; the next hop is randomly chosen from the neighbors of the current node.
• A random walk corresponds to a time-reversible Markov chain.
• In this paper, we choose a variant of the Metropolis algorithm, a generalization of the natural random walk to Markov chains with a non-uniform steady-state distribution.

Page 13: Preliminaries: Metropolis Algorithm

The Metropolis algorithm computes the transition matrix $P = [P_{ij}]$ for the steady-state distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $N(i)$ denotes the neighbors of node i and M the maximal node degree in the graph:

$$P_{ij} = \begin{cases} \min\left(\dfrac{1}{M}, \dfrac{\pi_j}{M\pi_i}\right), & i \ne j,\ j \in N(i),\\[4pt] 0, & i \ne j,\ j \notin N(i),\\[4pt] 1 - \sum_{j \ne i} P_{ij}, & i = j. \end{cases}$$
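A sketch of the Metropolis transition matrix for a small undirected graph given as a symmetric adjacency list, followed by a one-step check that π is stationary (πP = π). The toy graph, the target π, and the helper names are illustrative assumptions. The construction works because π_i P_ij = min(π_i, π_j)/M is symmetric in i and j (detailed balance), and the self-loop P_ii absorbs the remaining probability.

```python
from typing import Dict, List


def metropolis_matrix(adj: Dict[int, List[int]], pi: List[float]) -> List[List[float]]:
    """P_ij = min(1/M, pi_j / (M pi_i)) for neighbors, 0 otherwise, P_ii = 1 - sum_j P_ij."""
    n = len(pi)
    m = max(len(nbrs) for nbrs in adj.values())        # M: maximal node degree
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            p[i][j] = min(1.0 / m, pi[j] / (m * pi[i]))
        p[i][i] = 1.0 - sum(p[i][j] for j in adj[i])   # self-loop keeps the row stochastic
    return p


def one_step(p: List[List[float]], pi: List[float]) -> List[float]:
    """Return the row vector pi P."""
    n = len(pi)
    return [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]


# Toy undirected graph and a non-uniform target steady-state distribution.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
pi = [0.1, 0.2, 0.3, 0.4]
P = metropolis_matrix(adj, pi)
print([round(v, 6) for v in one_step(P, pi)])
```

Row i uses only node i's neighbors, their π values, and M, so each node can compute its own probabilistic forwarding table locally.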

Page 14: Persistent Data Access: Decentralized Fountain Codes

[Figure: K sensing nodes and N caching nodes. A caching node with degree d requests source blocks, which are delivered to it based on two-way random walks, and stores an encoded block of degree d.]

Page 15: Persistent Data Access: Decentralized Fountain Codes

• We seek to construct decentralized fountain codes with only one traversal of random walks, from the sensing nodes to the caching nodes.
  • Caching nodes: encode and store the source blocks.
  • Collector: decodes the source blocks.
• We propose two heuristic algorithms, EDFC and ADFC, which aim to achieve the Robust Soliton degree distribution of LT codes.

Page 16: Persistent Data Access: Exact Decentralized Fountain Codes

• Because of the randomization introduced by random walks, the number of distinct source blocks received by a node is uncertain.
• We must therefore disseminate more than d source blocks to each node of degree d.
• Redundancy coefficient $x_d$: assume each node of degree d receives $x_d d$ blocks, with $x_d$ chosen so that Pr(receiving fewer than d distinct blocks) is small.

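Page 17: Persistent Data Access: Exact Decentralized Fountain Codes

The number of random walks per source block:

$$b = \frac{N}{K}\sum_{d=1}^{K} x_d\, d\, \mu(d).$$

Probabilistic forwarding tables, computed by the Metropolis algorithm with the steady-state probability of a node of degree d chosen so that it expects to receive $bK\pi_d = x_d d$ blocks:

$$\pi_d = \frac{x_d\, d}{bK} = \frac{x_d\, d}{N\sum_{i=1}^{K} x_i\, i\, \mu(i)}.$$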
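The number of random walks b = (N/K) Σ x_d d μ(d) and the steady-state probabilities π_d = x_d d/(bK) follow directly from μ(d) and the redundancy coefficients x_d. A minimal sketch with a toy two-degree distribution; the function name is hypothetical, and b generally comes out non-integer, so a real deployment would have to round it, which is not addressed here.

```python
from typing import Dict, Tuple


def edfc_walks_and_steady_state(
    n: int, k: int, mu: Dict[int, float], x: Dict[int, float]
) -> Tuple[float, Dict[int, float]]:
    """Return (b, pi) for EDFC.

    n: total number of nodes N; k: number of sensing nodes K;
    mu: code-degree distribution mu(d); x: redundancy coefficients x_d.
    """
    e = sum(x[d] * d * mu[d] for d in mu)       # sum_d x_d d mu(d)
    b = n * e / k                                # random walks per source block
    pi = {d: x[d] * d / (b * k) for d in mu}     # pi_d = x_d d / (bK)
    return b, pi


b, pi = edfc_walks_and_steady_state(10, 5, {1: 0.5, 2: 0.5}, {1: 2.0, 2: 1.0})
print(b, pi)
```

Summed over all N nodes (N μ(d) of them have degree d), the π_d add up to 1, which is what makes them a valid steady-state distribution for the Metropolis construction.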

Page 18: Persistent Data Access: Exact Decentralized Fountain Codes

[Figure: EDFC overview. K sensing nodes disseminate their source blocks to N caching nodes using the probabilistic forwarding table and the number of random walks; each caching node stores an encoded block of degree d, and the collector decodes.]

Page 19: Persistent Data Access: Exact Decentralized Fountain Codes

The steps of EDFC are:
Step 1. Degree generation: each node draws its degree d from the Robust Soliton distribution.
Step 2. Compute the steady-state distribution $\pi_d$.
Step 3. Compute the probabilistic forwarding table by the Metropolis algorithm.
Step 4. Compute the number of random walks b.
Step 5. Block dissemination based on the probabilistic forwarding table.
Step 6. Encoding by bitwise XOR of a subset of d source blocks.

The source node IDs are attached to the encoded block.
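Step 6 can be sketched as follows, assuming a caching node keeps its received source blocks in a dict keyed by source-node ID and that blocks are modeled as integers. The slide does not say what EDFC does when fewer than d distinct blocks arrive (a violation); this sketch simply XORs all of them, which is a labeled assumption. `edfc_encode` is a hypothetical name.

```python
import random
from typing import Dict, List, Tuple


def edfc_encode(received: Dict[int, int], d: int, rng: random.Random) -> Tuple[List[int], int]:
    """XOR a random subset of d distinct received source blocks.

    received: source-node ID -> source block (modeled as an int).
    Returns (attached source-node IDs, encoded block).
    """
    if len(received) < d:
        # Assumption: on a violation (fewer than d distinct blocks), XOR everything received.
        d = len(received)
    ids = sorted(rng.sample(sorted(received), d))
    encoded = 0
    for sid in ids:
        encoded ^= received[sid]
    return ids, encoded


ids, block = edfc_encode({3: 0b1010, 7: 0b0110, 9: 0b0001}, 2, random.Random(0))
print(ids, bin(block))
```

The returned ID list plays the role of the source node IDs attached to the encoded block, which the collector needs for peeling.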

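Page 20: Persistent Data Access: Exact Decentralized Fountain Codes

Overhead ratio, where $b_0$ is the number of random walks in the ideal algorithm ($x_d = 1$ for all d):

$$g_1 = \frac{b - b_0}{b_0} = \frac{\sum_{d=1}^{K} x_d\, d\, \mu(d)}{\sum_{d=1}^{K} d\, \mu(d)} - 1.$$

Violation probability, where X is the degree chosen by a node and Y is the number of distinct source blocks it receives. A given source block reaches a node of degree d with probability

$$p_d = 1 - (1 - \pi_d)^{b} = 1 - \left(1 - \frac{x_d d}{NE}\right)^{NE/K} \approx 1 - e^{-x_d d/K} \approx \frac{x_d d}{K}, \qquad E = \sum_{i=1}^{K} x_i\, i\, \mu(i),$$

so

$$\Pr(Y < d \mid X = d) = \sum_{j=0}^{d-1}\binom{K}{j} p_d^{\,j}(1 - p_d)^{K-j} \approx \sum_{j=0}^{d-1}\frac{e^{-x_d d}(x_d d)^j}{j!}.$$

Optimization problem (a trade-off between coding performance and communication overhead):

$$\begin{aligned} \text{minimize}\quad & g_1 = \frac{\sum_{d=1}^{K} x_d\, d\, \mu(d)}{\sum_{d=1}^{K} d\, \mu(d)} - 1\\ \text{subject to}\quad & \Pr(Y < d \mid X = d) \le \delta, \quad d = 1, \ldots, K/R,\\ & x_d \ge 1, \quad d = 1, \ldots, K. \end{aligned}$$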
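Under the Poisson approximation, each violation constraint involves only its own x_d, and Pr(Y < d | X = d) decreases as x_d grows, while g_1 increases in every x_d. So the minimum can be reached by taking, for each d ≤ K/R, the smallest x_d that meets its constraint (and x_d = 1 above K/R). The sketch below does exactly that by bisection, reading the overhead ratio as g_1 = Σ x_d d μ(d)/Σ d μ(d) − 1. It is an illustrative decomposition, not the authors' MATLAB procedure, and it is not claimed to reproduce the reported 1.4508; function names are hypothetical.

```python
import math
from typing import Dict


def poisson_cdf_below(d: int, lam: float) -> float:
    """Pr(Poisson(lam) < d) = sum_{j=0}^{d-1} e^{-lam} lam^j / j!, terms built in log space."""
    log_term = -lam                      # log of the j = 0 term
    total = 0.0
    for j in range(d):
        total += math.exp(log_term)
        log_term += math.log(lam / (j + 1))
    return total


def min_redundancy(d: int, delta: float, tol: float = 1e-9) -> float:
    """Smallest x_d >= 1 with Pr(Y < d | X = d) <= delta (Poisson approximation)."""
    if poisson_cdf_below(d, d * 1.0) <= delta:
        return 1.0
    lo, hi = 1.0, 2.0
    while poisson_cdf_below(d, d * hi) > delta:   # grow until the constraint holds
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:                          # invariant: lo infeasible, hi feasible
        mid = 0.5 * (lo + hi)
        if poisson_cdf_below(d, d * mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


def edfc_overhead(mu: Dict[int, float], spike: int, delta: float) -> float:
    """g_1 = sum_d x_d d mu(d) / sum_d d mu(d) - 1, with x_d = 1 for d > K/R (= spike)."""
    num = 0.0
    den = 0.0
    for d, p in mu.items():
        x_d = min_redundancy(d, delta) if d <= spike else 1.0
        num += x_d * d * p
        den += d * p
    return num / den - 1.0


print(min_redundancy(1, 0.05))   # a degree-1 node needs e^{-x_1} <= 0.05, i.e. x_1 = ln 20
```

Low degrees dominate the overhead: a degree-1 node needs roughly three blocks on average to receive at least one with probability 0.95, while for large d the required x_d approaches 1.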

Page 21: Persistent Data Access: Exact Decentralized Fountain Codes

We solve the optimization problem with MATLAB. Parameter setting:
• Constraint on the violation probabilities: 0.05
• N (the number of total nodes) = 2000
• K (the number of sensing nodes) = 1000
• c = 0.01, δ = 0.05

Further numerical computation gives an overhead ratio of 1.4508.

Page 22: Persistent Data Access: Approximate Decentralized Fountain Codes

• Design a new distribution $\upsilon(d)$ to serve as a hypothetical chosen degree distribution, in an attempt to avoid the redundant random walks of EDFC.
• Number of random walks:

$$b = \frac{N}{K}\sum_{d=1}^{K} d\,\upsilon(d).$$

• Steady-state distribution of the random walks:

$$\pi_d = \frac{d}{N\sum_{i=1}^{K} i\,\upsilon(i)} = \frac{d}{N E(\upsilon)}, \qquad E(\upsilon) = \sum_{i=1}^{K} i\,\upsilon(i).$$

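Page 23: Persistent Data Access: Approximate Decentralized Fountain Codes

Because blocks arrive randomly, the actual degree distribution of a node, $\upsilon'(d')$, differs from the chosen distribution $\upsilon(d)$. A given source block reaches a node of chosen degree d with probability

$$p_d = \Pr(Y_j = 1 \mid X = d) = 1 - \left(1 - \frac{d}{NE(\upsilon)}\right)^{NE(\upsilon)/K},$$

so the actual degree distribution of a node is

$$\upsilon'(d') = \sum_{d=1}^{K}\upsilon(d)\Pr(Y = d' \mid X = d) = \sum_{d=1}^{K}\upsilon(d)\binom{K}{d'}p_d^{\,d'}(1 - p_d)^{K-d'}.$$

Optimization problem: choose $\upsilon$ to minimize the mean-square error between $\upsilon'$ and $\mu$:

$$\begin{aligned} \text{minimize}\quad & \sum_{i=1}^{K/R}\left(\upsilon'(i) - \mu(i)\right)^2\\ \text{subject to}\quad & \sum_{i=1}^{K}\upsilon(i) = 1,\\ & \upsilon(i) \ge 0, \quad i = 1, \ldots, K. \end{aligned}$$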
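A sketch of how the actual degree distribution υ′ follows from a chosen υ under the binomial model, with p_d = 1 − (1 − d/(NE(υ)))^{NE(υ)/K}, together with the mean-square-error objective. It only evaluates the objective; solving the constrained minimization would need a numerical optimizer, which is outside this sketch. Function names are hypothetical, and binomial terms are computed in log space (an implementation choice) so that large K does not overflow.

```python
import math
from typing import Dict, List


def binom_pmf(k: int, j: int, p: float) -> float:
    """C(k, j) p^j (1-p)^(k-j), computed in log space to avoid overflow for large k."""
    if p <= 0.0:
        return 1.0 if j == 0 else 0.0
    if p >= 1.0:
        return 1.0 if j == k else 0.0
    log_c = math.lgamma(k + 1) - math.lgamma(j + 1) - math.lgamma(k - j + 1)
    return math.exp(log_c + j * math.log(p) + (k - j) * math.log1p(-p))


def actual_degree_distribution(upsilon: Dict[int, float], n: int, k: int) -> List[float]:
    """upsilon'(d') = sum_d upsilon(d) * Binomial(K, p_d) pmf at d'; index 0 = nothing received."""
    e = sum(d * u for d, u in upsilon.items())      # E(upsilon)
    b = n * e / k                                    # random walks per source block
    out = [0.0] * (k + 1)
    for d, u in upsilon.items():
        pi_d = d / (n * e)                           # steady-state probability of the node
        p_d = 1.0 - (1.0 - pi_d) ** b                # Pr(a given source block reaches it)
        for dp in range(k + 1):
            out[dp] += u * binom_pmf(k, dp, p_d)
    return out


def mse_objective(upsilon_prime: List[float], mu: Dict[int, float], spike: int) -> float:
    """sum_{i=1}^{K/R} (upsilon'(i) - mu(i))^2."""
    return sum((upsilon_prime[i] - mu.get(i, 0.0)) ** 2 for i in range(1, spike + 1))


ups = {1: 0.5, 3: 0.5}
actual = actual_degree_distribution(ups, n=40, k=20)
print(round(actual[0], 4), round(mse_objective(actual, ups, spike=3), 4))
```

Setting υ = μ shows the problem ADFC addresses: some mass leaks to degree 0 and to neighboring degrees, so υ has to be reshaped for υ′ to land close to μ.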

Page 24: Persistent Data Access: Approximate Decentralized Fountain Codes

The steps of ADFC are:
Step 1. Degree generation: each node draws its degree d from the chosen degree distribution $\upsilon(d)$.
Step 2. Compute the steady-state distribution $\pi_d$.
Step 3. Compute the probabilistic forwarding table by the Metropolis algorithm.
Step 4. Compute the number of random walks b.
Step 5. Block dissemination based on the probabilistic forwarding table.
Step 6. Encoding by bitwise XOR of all received source blocks.

The source node IDs are attached to the encoded block.

Page 25: Persistent Data Access: Approximate Decentralized Fountain Codes

Overhead ratio of ADFC, where b is the number of random walks in ADFC and $b_0$ the number of random walks in the ideal algorithm:

$$g_2 = \frac{b - b_0}{b_0} = \frac{\sum_{d=1}^{K} d\,\upsilon(d)}{\sum_{d=1}^{K} d\,\mu(d)} - 1.$$

By further numerical computation, the overhead ratio $g_2$ is only 0.2326, so less transmission cost is required than in EDFC. But…

Page 26: Persistent Data Access: Approximate Decentralized Fountain Codes

Parameter setting: N (number of total nodes) = 2000, K (number of sensing nodes) = 1000, c = 0.01, δ = 0.05.

[Figure: the Robust Soliton distribution, the chosen degree distribution υ(d), and the resulting actual degree distribution. The actual distribution deviates from the Robust Soliton distribution: ADFC is inaccurate.]

Page 27: Discussion of Multiple Encoded Blocks

Does maintaining multiple different encoded blocks on each node improve the coding performance?

[Figure: sensing nodes send source blocks to a cache node, which keeps several encoded blocks; combining everything into a single encoded block may lose some information.]

Page 28: Discussion of Multiple Encoded Blocks

Theorem 2. When the code-degree distribution conforms to the Robust Soliton distribution, even if the source blocks on each node are not encoded, the collector must visit $\Omega(K)$ nodes in order to collect all source blocks with probability $1 - \delta$.
• δ is a small positive number.
• $Y_{i,j}$ is a random variable that takes the value 1 if source block j is collected when visiting the ith node.

$$\Pr(Y_{i,j} = 1) = \sum_{d=1}^{K}\Pr(Y_{i,j} = 1 \mid X = d)\,\mu(d) = \frac{1}{K}\sum_{d=1}^{K} d\,\mu(d) = \frac{c\ln(K/\delta)}{K},$$

since the average degree of an encoded block, $\sum_{d} d\,\mu(d)$, is $O(\ln(K/\delta))$ [3], written here as $c\ln(K/\delta)$.

Page 29: Discussion of Multiple Encoded Blocks

• $Z_j$ has value 1 if source block j is collected after visiting M nodes, i.e., $Z_j = 1 - \prod_{i=1}^{M}(1 - Y_{i,j})$, so

$$\Pr(Z_j = 0) = \prod_{i=1}^{M}\Pr(Y_{i,j} = 0) = \prod_{i=1}^{M}\left(1 - \Pr(Y_{i,j} = 1)\right) = \left(1 - \frac{c\ln(K/\delta)}{K}\right)^{M}.$$

• Let E denote the event that all blocks are collected after visiting M nodes. All blocks are collected with probability

$$\Pr(E) = \prod_{j=1}^{K}\Pr(Z_j = 1) = \left(1 - \left(1 - \frac{c\ln(K/\delta)}{K}\right)^{M}\right)^{K} \ge 1 - \delta.$$

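Page 30: Discussion of Multiple Encoded Blocks

Applying the logarithm to both sides:

$$K\ln\left(1 - \left(1 - \frac{c\ln(K/\delta)}{K}\right)^{M}\right) \ge \ln(1 - \delta) \approx -\delta.$$

By using the similar approximation $\ln(1 - x) \approx -x$ on the left-hand side, we obtain

$$\left(1 - \frac{c\ln(K/\delta)}{K}\right)^{M} \le \frac{\delta}{K}.$$

Taking logarithms once more and applying the same approximation gives $M\,\dfrac{c\ln(K/\delta)}{K} \ge \ln(K/\delta)$, i.e., $M \ge K/c$, so $M = \Omega(K)$.

The collector needs to visit $\Omega(K)$ nodes to collect all K source blocks.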
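The bound can also be evaluated numerically. Assuming, as in the derivation, independence across source blocks and q = Pr(Y_{i,j} = 1) = (average encoded-block degree)/K, the smallest M with (1 − (1 − q)^M)^K ≥ 1 − δ has a closed form. In the usage line the average degree is set to ln(K/δ), i.e. c = 1, an illustrative choice not taken from the slides; M then comes out close to K.

```python
import math


def min_nodes_to_visit(k: int, avg_degree: float, delta: float) -> int:
    """Smallest M with (1 - (1 - q)^M)^K >= 1 - delta, where q = avg_degree / K.

    Equivalent to (1 - q)^M <= 1 - (1 - delta)^(1/K), solved for M in closed form.
    """
    q = avg_degree / k                                    # Pr(Y_ij = 1)
    target = -math.expm1(math.log1p(-delta) / k)          # 1 - (1 - delta)^(1/K), computed stably
    return math.ceil(math.log(target) / math.log1p(-q))   # both logarithms are negative


K, delta = 10_000, 0.05
M = min_nodes_to_visit(K, math.log(K / delta), delta)    # illustrative average degree (c = 1)
print(M, M / K)
```

Doubling the average degree roughly halves M, matching the M ≥ K/c form of the bound, but M stays linear in K.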

Page 31: Performance Evaluation

• To evaluate effectiveness and performance, we implement both the original centralized and the decentralized implementations of fountain codes.
• Centralized implementation of fountain codes: about 1000 lines of C++ code, with optimized implementations of the encoding and decoding algorithms.
• Decentralized implementation of fountain codes: also simulated in C++.

Page 32: Performance Evaluation

• We use a two-dimensional geometric random graph as the topological model.
  • N sensors are uniformly distributed on a unit disk.
  • K sensing nodes are uniformly distributed among the N sensors.
  • Radio range: r.
• We set K = 10000, N = 20000, and r = 0.033 in most experiments.
• The average number of neighbors for each node is 21.
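A sketch that samples this topological model (N points uniform on a unit disk, an edge between any two points within distance r) and measures the mean number of neighbors, using grid bucketing to avoid an O(N²) scan. Before boundary effects, the expected count is (N − 1)·πr²/π = (N − 1)r² ≈ 21.8 for N = 20000 and r = 0.033; nodes near the edge of the disk have fewer neighbors, which pulls the average down toward the reported 21. The seed and helper name are illustrative.

```python
import math
import random
from collections import defaultdict


def unit_disk_grg_avg_degree(n: int, r: float, seed: int = 0) -> float:
    """Mean degree of a geometric random graph with n points uniform on the unit disk."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        rad = math.sqrt(rng.random())             # sqrt makes the density uniform on the disk
        ang = 2.0 * math.pi * rng.random()
        pts.append((rad * math.cos(ang), rad * math.sin(ang)))
    grid = defaultdict(list)                      # bucket points into r x r cells
    for idx, (x, y) in enumerate(pts):
        grid[(math.floor(x / r), math.floor(y / r))].append(idx)
    r2 = r * r
    degree_sum = 0
    for idx, (x, y) in enumerate(pts):
        cx, cy = math.floor(x / r), math.floor(y / r)
        for dx in (-1, 0, 1):                     # neighbors can only be in adjacent cells
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j != idx:
                        px, py = pts[j]
                        if (px - x) ** 2 + (py - y) ** 2 <= r2:
                            degree_sum += 1
    return degree_sum / n


avg = unit_disk_grg_avg_degree(20_000, 0.033, seed=1)
print(f"average number of neighbors: {avg:.1f}")
```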

Page 33: Performance Evaluation: Communication Cost and Decoding Ratio

Two main performance metrics:
• Communication cost: determined by the length of random walks and the number of random walks.
• Decoding ratio: the number of nodes that a collector needs to visit for decoding, normalized by the number of sensing nodes. The decoding ratio reflects fault tolerance.

Page 34: Performance Evaluation: Communication Cost and Decoding Ratio

[Figure: the impact of the length of random walks on the decoding ratio. Each data point shows the average and the 95% confidence interval from 10 experiments.]

Page 35: Performance Evaluation: Communication Cost and Decoding Ratio

[Figure: the ratio of the dissemination costs of EDFC and ADFC to that of the two-way algorithm.]

Page 36: Performance Evaluation: Multiple Encoded Blocks Cannot Do Better

• Theorem 2: keeping multiple encoded blocks on each node does not offer any asymptotic performance advantage over keeping a single encoded block.
• The collector needs to visit close to K nodes even if the source blocks are not encoded.

[Figure: the number of nodes to be visited before collecting all source blocks.]

Page 37: Performance Evaluation: Overestimation of K and N

• Sensor failures are common events in large-scale sensor networks.
• It is not feasible to propagate updated values of K and N to all nodes in the network whenever they change. Instead, K and N are updated periodically, so each node may overestimate K and N.

Page 38: Performance Evaluation: Overestimation of K and N

[Figure: the consequence of overestimating N (the number of total nodes); actual N = 20000.]

Page 39: Performance Evaluation: Overestimation of K and N

[Figure: the impact of overestimating K (the number of sensing nodes); estimated K = 10000.]

EDFC is more robust.

Page 40: Conclusion

• In this paper, we seek to improve the fault tolerance and data persistence of sensor networks.
• We propose a decentralized implementation of fountain codes that disseminates the original data throughout the network with random walks.
• It retains the superior decoding performance and low decoding complexity of fountain codes as the number of nodes scales up.
• The proposed algorithms are able to provide near-optimal fault tolerance with minimal demand on local storage.