On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Page 1: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

On Fault Tolerance in Wireless Ad Hoc Networks

Seth Gilbert

Nancy Lynch Celebration, 2008

Page 2: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Nancy Lynch

Late 1980’s?? 1994 1997 2002-2008

Through the years…

Page 3: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

1980 1984 1988 1992 1996 2000 2004 2008

FLP: Impossibility of distributed consensus with one faulty process

DLS: Consensus in the Presence of Partial Synchrony

LT: An Introduction to Input / Output Automata

Fault tolerance

Replication

Consistency

Formal Methods

Simulation Relations, Invariant-based Arguments

Timing

Increasingly complex, increasingly dynamic:

• Group communication / membership

• Publish / Subscribe

• Peer-to-peer systems

• Wireless ad hoc networks

Page 4: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

The Virtual Infrastructure Project


Page 5: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

The Virtual Infrastructure Project

Papers:

GeoQuorums: Implementing Atomic Memory in Mobile Ad Hoc Networks, DGLSW, DISC’03, DC’05

Virtual Mobile Nodes for Mobile Ad Hoc Networks, DGLSSW, DISC’03

Consensus and Collision Detectors in Wireless Ad Hoc Networks, CDGNN, PODC’05, DC’08

Timed Virtual Stationary Automata for Mobile Networks, DGLLN, Allerton’05, OPODIS’05

Autonomous Virtual Mobile Nodes, DGSSW, DIALM-POMC’05

A Middleware Framework for Robust Applications in Wireless Ad Hoc Networks, CDGN, Allerton’05

Reconciling the theory and practice of unreliable wireless broadcast, CDGLNN, ADSN’05

Self-Stabilizing Mobile Node Location Management and Message Routing, DLLN, SSS’05

Motion Coordination Using Virtual Nodes, LMN, CDC’05

The Virtual Node Layer: A Programming Abstraction for Wireless Sensor Networks, BGLNNS, WWWSNA’07

A Virtual Node-Based Tracking Algorithm for Mobile Networks, NL, ICDCS’07

Self-stabilization and Virtual Node Layer Emulations, NL, SSS’07

Secret Swarm Unit: Reactive k-Secret Sharing, DLY, IndoCrypt’07

Virtual Infrastructure for Collision-Prone Wireless Networks, CGL, PODC’08

Theses:

Virtual Infrastructure for Wireless Ad Hoc Networks, G, PhD 2007

Air Traffic Control Using Virtual Stationary Automata, B, MEng 2007

Simulation and Evaluation of the Reactive Virtual Node Layer, S, MEng 2008

Virtual Stationary Timed Automata for Mobile Networks, N, PhD 2008

In Progress:

Self-Stabilizing Robot Formations over Unreliable Networks, GLMN

Using Virtual Infrastructure to Adapt Wireline Protocols to MANET, W

Virtual Infrastructure Routing for Mobile Ad Hoc Networks, DN

Page 6: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Wireless Ad Hoc Networks

Scenarios:
• Sensor networks
• Social networks
• Coordination

Page 7: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Wireless Ad Hoc Networks

Scenarios:
• Sensor networks
  — environmental monitoring
  — intrusion detection
  — border monitoring
  — fire detection
• Social networks
• Coordinated applications

Page 8: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Wireless Ad Hoc Networks

Scenarios:
• Sensor networks
• Social networks
  — messaging
  — conferences / events
  — HikingNet
  — TrafficNet
• Coordinated applications

Page 9: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Wireless Ad Hoc Networks

Scenarios:
• Sensor networks
• Social networks
• Coordination: emergency response & military
  — firefighting
  — police response
  — terrorism


Page 11: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Wireless ad hoc networks are really hard to use.

• Unreliable communication
• Unknown availability
• Noise
• Collisions
• Lost messages
• Dynamic
• Fault prone
• Unknown participants
• Unknown topology

Page 12: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Fixed Infrastructure

Deploy:
— Base stations
— Cell towers
— Servers

Problems:
— Too expensive
— Not feasible

Page 13: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Virtual Infrastructure

[Figure: an unreliable ad hoc net contrasted with a reliable fixed net]

Page 14: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Network Layers

[Figure: layer stack, top to bottom: Application; Service, Service; Middleware; Wireless Ad Hoc Network]

Page 15: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Network Layers

[Figure: layer stack, top to bottom: Application; Routing, Tracking; Virtual Infrastructure; Wireless Ad Hoc Network]


Page 17: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Building Virtual Infrastructure

Basic idea: replicated state machine

1. Each participant is a replica.

2. Replicas execute a consistency protocol.

3. One replica is the leader; the others are backups.

4. The leader sends & receives messages for the virtual node.
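
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the project's emulator: the class names `Replica` and `VirtualNode`, the counter used as the virtual node's state, and the "lowest id is leader" rule are all assumptions made for the example.

```python
class Replica:
    """One physical device emulating the virtual node (step 1)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.state = 0  # hypothetical virtual-node state: a simple counter

    def apply(self, message):
        # Step 2: every replica applies the same messages in the same order,
        # so all copies of the state stay identical.
        self.state += message


class VirtualNode:
    """Hypothetical wrapper tying together the replicas of one region."""

    def __init__(self, replica_ids):
        self.replicas = {rid: Replica(rid) for rid in replica_ids}

    def leader(self):
        # Step 3: assumed rule, the lowest id leads and the rest are backups.
        return min(self.replicas)

    def receive(self, message):
        # Step 4: the leader receives on behalf of the virtual node; the
        # message is applied at every replica, and only the leader replies.
        for replica in self.replicas.values():
            replica.apply(message)
        return self.replicas[self.leader()].state

    def fail(self, replica_id):
        # A failed device disappears; the backups already hold the state.
        del self.replicas[replica_id]


vn = VirtualNode([3, 1, 2])
vn.receive(5)
vn.fail(1)             # the current leader fails
reply = vn.receive(2)  # replica 2 takes over with the state intact
```

In this toy run the virtual node keeps its state across the leader failure, which is the property the replication is meant to provide.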

Page 18: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Today’s Questions

1. What is virtual infrastructure?

2. What can you do with it?
   — Dynamic distributed coordination.
   — Air traffic control

3. Does it really work?
   — Two simulation studies: routing and address allocation.

Page 19: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Challenging problem:

o Highly dynamic environment

o Unreliable network

o Safety-critical applications

Ideal for Virtual Infrastructure solution:

o Static overlay

o Simpler, verifiable algorithms

o Fate-sharing

Page 20: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008
Page 21: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008
Page 22: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Note:
• Number of (non-failed) robots unknown.
• Location of other robots unknown.
• Pattern may change over time.

Page 23: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

In each round:

1. All robots stop.

2. All robots send location info.

3. Coordinators exchange info.

4. Coordinators calculate.

5. Coordinators send out targets.

6. Robots move to target.
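
The six steps of a round could be sketched as a single function. This is only an illustration under simplifying assumptions: the names `run_round`, `coordinator_of`, and `compute_targets` are hypothetical, messages are never lost, and step 3 is modelled by letting every coordinator see all reports rather than only those of its neighbours.

```python
def run_round(robots, coordinator_of, compute_targets):
    """One round of the loop; all names here are illustrative.

    robots:          {robot id: (x, y)} current positions
    coordinator_of:  function mapping a position to a coordinator (region) id
    compute_targets: function mapping {region: {robot: pos}} to {robot: pos}
    """
    # Steps 1-2: robots stop and report their locations to their coordinator.
    reports = {}
    for robot, pos in robots.items():
        reports.setdefault(coordinator_of(pos), {})[robot] = pos

    # Step 3: coordinators exchange info (here: everyone sees `reports`).
    # Steps 4-5: coordinators calculate and send out targets.
    targets = compute_targets(reports)

    # Step 6: robots move to their targets; robots without a target stay put.
    return {robot: targets.get(robot, pos) for robot, pos in robots.items()}


# Toy usage: regions are 10-unit-wide columns, and every robot is sent to
# the centre line of its own column.
def column(pos):
    return int(pos[0] // 10)


def centre(reports):
    return {robot: (region * 10 + 5, pos[1])
            for region, members in reports.items()
            for robot, pos in members.items()}


after = run_round({"a": (2, 0), "b": (17, 3)}, column, centre)
```

Because every round starts from freshly reported locations, nothing computed in an earlier round needs to be remembered.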

Page 24: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Rule 1: If only 1 robot, keep it.

Calculating new targets

Page 25: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Rule 2: If not on the curve and no neighbors on the curve: distribute evenly all but one.

Dynamic Distributed Coordination

Calculating new targets

Page 26: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Rule 3: If not on the curve: distribute among less populated neighbors on the curve.

Dynamic Distributed Coordination

Calculating new targets

Page 27: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Rule 4: If on the curve: distribute among less dense neighbors on the curve.

Dynamic Distributed Coordination

Calculating new targets


Page 29: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Rule 5: Distribute robots evenly on the curve in each region.

Dynamic Distributed Coordination

Calculating new targets
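
Rule 5 can be illustrated with a small helper that spaces robots evenly along the part of the curve inside one region. This is a sketch under assumptions that are not in the slides: the curve segment is given as a polyline of at least two distinct points, "evenly" is read as equal arc-length spacing, and the function name `spread_on_curve` is hypothetical.

```python
import math


def spread_on_curve(points, k):
    """Return k positions evenly spaced by arc length along a polyline.

    points: list of (x, y) vertices of the curve segment inside the region
    k:      number of robots to place
    """
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    targets = []
    for i in range(k):
        # Robot i sits in the middle of the i-th of k equal stretches.
        d = total * (i + 0.5) / k
        for (a, b), seg in zip(segments, lengths):
            if seg > 0 and d <= seg:
                t = d / seg
                targets.append((a[0] + t * (b[0] - a[0]),
                                a[1] + t * (b[1] - a[1])))
                break
            d -= seg
    return targets


straight = spread_on_curve([(0, 0), (10, 0)], 2)
bent = spread_on_curve([(0, 0), (4, 0), (4, 4)], 2)
```

Rules 1 to 4 decide how many robots each region keeps or hands to its neighbours; a helper like this one would only be applied afterwards, to the robots that remain in the region.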

Page 30: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Step 1: Eventually, robots cease moving from regions “off the curve” to regions “on the curve”.

Step 2: If neighbor g is the most dense neighbor of u after time t, then u is less dense than g after time t+1.

Step 3: Eventually, robots remain always in the same region.

Correctness

Page 31: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Self-stabilization

What happens when something goes wrong?

Too many lost messages

Too much churn

INCONSISTENT REPLICAS

Option 1: Design for the very, very worst case.

Option 2: Design a system that can recover from faults.

Page 32: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Emulating Virtual Infrastructure

Self-stabilization techniques

Leader Election:

o Heartbeats, timeouts

o Resolve leader competitions

Replica Consistency:

o Leader sends “checksums” of the state.

o If out-of-synch, then re-join.
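
The two techniques above could look roughly like this on a backup replica. It is a sketch only: the timeout value, the SHA-256 checksum, the "lowest id wins" rule for leader competitions, and the names `Backup` and `checksum` are assumptions, not details taken from the emulator.

```python
import hashlib

HEARTBEAT_TIMEOUT = 3.0  # seconds; an illustrative value, not from the talk


def checksum(state):
    """Digest of a replica's state (assumes a dict of simple values)."""
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()


class Backup:
    def __init__(self, my_id, state):
        self.my_id = my_id
        self.state = dict(state)
        self.leader_id = None
        self.last_heartbeat = 0.0
        self.needs_rejoin = False

    def on_heartbeat(self, now, leader_id, leader_checksum):
        # Resolve leader competitions: assumed rule, the lowest id wins.
        if self.leader_id is None or leader_id <= self.leader_id:
            self.leader_id = leader_id
            self.last_heartbeat = now
            # Replica consistency: a mismatch means this replica is out of
            # sync and should re-join to fetch a fresh copy of the state.
            self.needs_rejoin = checksum(self.state) != leader_checksum

    def on_tick(self, now):
        # Timeout: no heartbeat for too long, so this replica stands as leader.
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.leader_id = self.my_id


b = Backup(5, {"x": 1})
b.on_heartbeat(0.0, 2, checksum({"x": 1}))
in_sync = not b.needs_rejoin
b.on_heartbeat(1.0, 2, checksum({"x": 2}))
out_of_sync = b.needs_rejoin
b.on_tick(10.0)
```

Nothing here depends on how the replica got into its current state, which is what allows recovery from an arbitrary starting point.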

Page 33: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Building Virtual Infrastructure

Self-stabilization claims

Assume that:

o A is a self-stabilizing algorithm.

o A is designed for the virtual infrastructure abstraction.

o A is executed with the emulator.

o The system begins in an arbitrary (corrupt) state.

Then if the system is eventually well-behaved:

o From some point on, the state of A is as if it had really executed on a fixed infrastructure.

Page 34: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Summary

Coordination algorithm is self-stabilizing.

o In each round, all state is recalculated.

o Underlying virtual infrastructure emulation is self-stabilizing.

Implications:

o Converges to changing curve.

o Recovers from network instability, lost messages, etc.

Page 35: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Additional comments

Tina Nolte, Virtual Stationary Timed Automata for Mobile Networks, PhD 2008

Page 36: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Air traffic control

Free Flight

o No flight plan, no control towers!

o Each pilot chooses a route independently.

o More efficient:

  — Adapt to wind currents.

  — Avoid turbulence / bad weather.

Page 37: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Air traffic control

Goal: Free Flight

o Each pilot chooses a route independently.

o More efficient:

  — Adapt to wind currents.

  — Avoid turbulence / bad weather.

In the USA, minimum separation: 3 miles lateral distance OR 1000 feet altitude
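
The separation rule is a disjunction, so two aircraft are safely separated if either condition holds. A minimal check, assuming positions are given on a flat plane in miles with altitude in feet, and treating the thresholds as inclusive (both are assumptions; the function name `separated` is hypothetical):

```python
import math

MIN_LATERAL_MILES = 3.0     # from the slide
MIN_VERTICAL_FEET = 1000.0  # from the slide


def separated(a, b):
    """a, b: (x_miles, y_miles, altitude_feet) for two aircraft."""
    lateral = math.hypot(a[0] - b[0], a[1] - b[1])
    vertical = abs(a[2] - b[2])
    # Either condition alone is enough.
    return lateral >= MIN_LATERAL_MILES or vertical >= MIN_VERTICAL_FEET


close_but_stacked = separated((0, 0, 30000), (1, 1, 31000))
too_close = separated((0, 0, 30000), (1, 1, 30500))
far_apart = separated((0, 0, 30000), (3, 4, 30000))
```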

Page 38: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Dynamic Distributed Coordination

Additional comments

Matthew D. Brown, Air Traffic Control Using Virtual Stationary Automata, MEng, 2008

Page 39: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Today’s Questions

1. What is virtual infrastructure?

2. What can you do with it?
   — Dynamic distributed coordination.

3. Does it really work?
   — Two simulation studies.

Page 40: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulating Virtual Infrastructure

Study #1 — Routing / Geocast

— Custom-built simulator (Python)

— Simple communication model

Study #2 — Address allocation (i.e., DHCP)

— ns2 simulator

— 802.11 MAC layer

Page 41: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

GeoCast

Location-based routing

Source Destination


Page 43: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Location Service

Store current location at home

[Figure labels: Target, geocast, geocast, hash(id, 1), hash(id, 2)]

Page 44: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Location Service

Where are you?

[Figure labels: Source, Target, geocast, hash(id, 1), hash(id, 2)]
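
The two Location Service slides can be read as an update path and a lookup path over "home" regions chosen by hashing the device id. The sketch below assumes that reading. The region count, the SHA-256 based stand-in for hash(id, i), and the names `home_region` and `LocationService` are illustrative assumptions, and geocasts are modelled as direct table writes and reads.

```python
import hashlib

NUM_REGIONS = 16  # illustrative number of regions (virtual nodes)
NUM_HOMES = 2     # the slides show hash(id, 1) and hash(id, 2)


def home_region(device_id, i):
    """Hypothetical stand-in for hash(id, i): maps a device to a region."""
    digest = hashlib.sha256(f"{device_id}:{i}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_REGIONS


class LocationService:
    def __init__(self):
        # One table per region, standing in for state held by its virtual node.
        self.tables = [dict() for _ in range(NUM_REGIONS)]

    def update(self, device_id, current_region):
        # "Store current location at home": one geocast per home region.
        for i in range(1, NUM_HOMES + 1):
            self.tables[home_region(device_id, i)][device_id] = current_region

    def lookup(self, device_id):
        # "Where are you?": ask the home regions in turn.
        for i in range(1, NUM_HOMES + 1):
            table = self.tables[home_region(device_id, i)]
            if device_id in table:
                return table[device_id]
        return None


svc = LocationService()
svc.update("n7", 3)
found = svc.lookup("n7")
unknown = svc.lookup("n8")
```

Using more than one home region means a lookup can still succeed when one of the home virtual nodes has failed.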

Page 45: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Routing

Point-to-point communication

Two step process:

1. Lookup destination location.

2. Geocast message to destination’s region.
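
The two-step process could be composed from a lookup function and a geocast function, as in this sketch. The name `route` and the callback parameters `lookup` and `geocast` are hypothetical; the toy usage replaces the network with an in-memory table and a list.

```python
def route(message, dest_id, lookup, geocast):
    """Send `message` to `dest_id`; returns False if it cannot be located."""
    region = lookup(dest_id)             # step 1: look up destination location
    if region is None:
        return False
    geocast(region, (dest_id, message))  # step 2: geocast to that region
    return True


# Toy usage: a dict acts as the location service, a list as the network.
locations = {"n7": 3}
sent = []


def record(region, payload):
    sent.append((region, payload))


ok = route("hello", "n7", locations.get, record)
missing = route("hello", "n9", locations.get, record)
```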

Page 46: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Setup

Basic settings

Number of devices:
• 25 / 50 / 100

Velocity:
• 0-20 meters / second

Mobility model:
• Random waypoint
• Pause time: 100-900s

Simulation time:
• 1000 seconds

[Figure labels: 400 m, 400 m, 250 m]
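
For readers unfamiliar with the random waypoint model, a minimal generator is sketched below. It is not the simulator used in the study. It assumes a 400 m by 400 m area (read from the figure labels), a single pause time per run, and a small positive lower bound on speed to avoid division by zero; the name `random_waypoint` and its parameters are hypothetical.

```python
import math
import random


def random_waypoint(rng, size=400.0, max_speed=20.0, pause_time=400.0,
                    duration=1000.0, min_speed=0.1):
    """Yield (time, x, y) each time one device arrives at a waypoint."""
    t = 0.0
    x, y = rng.uniform(0, size), rng.uniform(0, size)
    yield t, x, y
    while t < duration:
        # Pick a uniformly random destination and speed, then travel there.
        nx, ny = rng.uniform(0, size), rng.uniform(0, size)
        speed = rng.uniform(min_speed, max_speed)
        t += math.hypot(nx - x, ny - y) / speed
        x, y = nx, ny
        yield t, x, y
        # Pause at the waypoint before choosing the next one.
        t += pause_time


trace = list(random_waypoint(random.Random(1)))
```

A longer pause time means devices sit still for longer, so pause time acts as an inverse measure of mobility.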

Page 47: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Setup

Application settings

GeoCast:
• 10 send/receive pairs
• 1 msg every 5 secs

Routing:
• 10 send/receive pairs
• 1 msg every 0.5 secs
• 15 second simulation

[Figure labels: 400 m, 400 m, 250 m]

Page 48: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Mobility and Density

[Figure: percent of time non-failed (20%-100%) versus pause time (200-800), with one curve each for 25, 50, and 100 devices.]

When density is sufficient, virtual nodes work.

Page 49: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Leadership Changes

[Figure: leadership changes per region (2-10) versus pause time (200-800), for 100 devices.]

There is continuous turn-over in the leader.

Page 50: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Message Overhead

[Figure: messages per region per second (0.01-0.5) versus pause time (200-800), with curves for Heartbeat, Join, and Leader messages.]

Most overhead is heartbeats. (Overhead is negligible.)

Page 51: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Geocast Latency Overhead

[Figure: latency in seconds (0.1-0.5) versus pause time (200-800), for 100 devices, compared against simple Geocast.]

VN-GeoCast is 2-3 times slower than simple GeoCast.

Page 52: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Routing

End-to-end performance

Delivery Rate: 79%
Median Latency: 0.46 seconds
Average Latency: 0.58 seconds

Each message requires 3 GeoCast messages.

** devices=50, pausetime=400

Page 53: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Summary

Virtual nodes are stable if:
— sufficient density (e.g., 4/region), OR
— low-enough churn

Message overhead: negligible.

GeoCast latency overhead: factor of 2.

Routing: relatively slow.

Page 54: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Summary

Additional comments

Mike Spindel, Simulation and Evaluation of the Reactive Virtual Node Layer, MEng 2008

Page 55: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulating Virtual Infrastructure

Study #1 — Routing / Geocast

— Custom-built simulator (Python)

— Simple communication model

Study #2 — Address allocation (i.e., DHCP)

— ns2 simulator

— 802.11 MAC layer

Page 56: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Address Allocation

Basic problem

— Mobile devices join and leave.

— Each device needs an address.

— Addresses should be assigned dynamically.

— Addresses should be unique.

Challenges: Highly dynamic. No central authority. Unreliable network. Limited address pool.

Page 57: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simple Scheme

Each region is allocated a cache of addresses.

Basic protocol:
• Client sends REQUEST
• Server replies OFFER
• Client sends ACQUIRE
• Server replies ACK

Renew protocol:
• Client sends RENEW
• Server replies RACK

Message forwarding…

[Figure: message exchange between Virtual Node and Client: REQUEST, OFFER, ACQUIRE, ACK, RENEW, RACK]
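
The server side of the basic and renew protocols could be sketched as below, with the region's virtual node playing the server. This is an illustration, not the simulated implementation: the class name `AddressServer`, its method names, and the handling of expired leases are assumptions, message forwarding between regions is left out, and offers that are never acquired are not reclaimed.

```python
class AddressServer:
    """Virtual-node side of the simple scheme for one region (illustrative)."""

    def __init__(self, addresses, lease_time=400.0):
        self.free = list(addresses)   # the region's cache of addresses
        self.offered = {}             # client -> address offered, not yet acquired
        self.leases = {}              # client -> (address, expiry time)
        self.lease_time = lease_time

    def _expire(self, now):
        # Return addresses whose lease has run out to the free pool.
        for client, (addr, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[client]
                self.free.append(addr)

    def on_request(self, client, now):
        self._expire(now)
        if client not in self.offered:
            if not self.free:
                return None           # cache exhausted: no OFFER
            self.offered[client] = self.free.pop(0)
        return ("OFFER", self.offered[client])

    def on_acquire(self, client, now):
        addr = self.offered.pop(client, None)
        if addr is None:
            return None               # nothing was offered to this client
        self.leases[client] = (addr, now + self.lease_time)
        return ("ACK", addr)

    def on_renew(self, client, now):
        self._expire(now)
        if client not in self.leases:
            return None               # lease already expired
        addr, _ = self.leases[client]
        self.leases[client] = (addr, now + self.lease_time)
        return ("RACK", addr)


server = AddressServer(["10.0.0.1"])
offer = server.on_request("c1", 0)
refused = server.on_request("c2", 0)
ack = server.on_acquire("c1", 0)
rack = server.on_renew("c1", 300)
late = server.on_renew("c1", 800)
reused = server.on_request("c2", 800)
```

Because the lease table is ordinary virtual-node state, it is replicated across the region's devices like any other state of the virtual node.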

Page 58: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Setup

Basic settings

Number of devices:
• 160

MAC Layer:
• 802.11
• Models collisions

Mobility model:
• Random waypoint

Simulation time:
• 40000 seconds

[Figure labels: 700 m, 700 m, 250 m]

Page 59: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Setup

Application settings

Number of addresses: 30 per region

Lease time: 400 seconds

Forwarding limit:
• 2 hop - REQUEST
• 2 hop - RACK
• Varying - RENEW

[Figure labels: 700 m, 700 m, 250 m]

Page 60: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Setup

Simulation settings

                          Very Slow   Slow   Medium Slow   Medium Fast   Fast
Min. Speed (m/s)              0.365   0.73          1.46          2.92    7.3
Max. Speed (m/s)               1.48   2.92          5.84         11.68   29.2
Average Pause Time (s)         4400   2200          1100           550    220
Average Cross Time (s)        82.20  41.10         20.55         10.27   4.11

Page 61: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Message Overhead

                          Messages per 400 secs   Percent
Heartbeats                                  360        76
Leader Request                               24         5
Leader Reply                                 50        11
Synch-Request                                20         4
Synch-Reply                                  20         4
Total Message Overhead                      474

Maximum observed: Less than 2-4.5kbps
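
The percentages in the table follow directly from the message counts, as this short check shows. The derived rate in messages per second is computed here for illustration and is not a figure from the slides.

```python
# Message counts per 400 seconds, copied from the table above.
counts = {
    "Heartbeats": 360,
    "Leader Request": 24,
    "Leader Reply": 50,
    "Synch-Request": 20,
    "Synch-Reply": 20,
}

total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
per_second = total / 400  # average emulator messages per second

print(total, shares, round(per_second, 3))
```

Heartbeats account for roughly three quarters of the emulator traffic, which matches the earlier observation that most overhead is heartbeats.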

Page 62: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Message Overhead

Different speeds

[Figure: emulator messages per region (0-6000) at each speed setting (very slow, slow, medium slow, medium fast, fast), with series for LeaderRequest, LeaderReply, SYN_REQUEST, SYN_ACK, and other emulator messages.]

Page 63: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Message Overhead

Different densities

[Figure: emulator messages (0-10000) versus total number of nodes (40-120), with series for LeaderRequest, LeaderReply, SYN_REQUEST, and SYN_ACK msgs per region, and other emulator messages per node.]

Page 64: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Protocol Performance

Different speeds

[Figure: allocations per client and messages per region (0-10000) at each speed setting (very slow, slow, medium slow, medium fast, fast).]

Page 65: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Protocol Performance

Renewal cost

[Figure: delay per renewal (0-0.09) at each speed setting (very slow, slow, medium slow, medium fast, fast).]

Page 66: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Summary

Message overhead: still negligible.

— Even with collisions…

— Backoff…

— Bigger simulations…

Simple address allocation scheme:

— Reasonably efficient…

— Scales well…

Page 67: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Simulation Summary

Additional comments

Jiang Wu, Using Virtual Infrastructure to Adapt Wireline Protocols to MANET

Page 68: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Summary

What is virtual infrastructure? Dynamic distributed coordination

Robotic motion coordination

Self-stabilization

(Preliminary) simulation results.

The Virtual Infrastructure Project

Page 69: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Distributed Algorithms

Focus on fault-tolerance

— Replication

— Consistency

— Agreement

Design principles

— Abstraction / layered design

— IOA / TIOA formalism

Classical techniques, modern networks

Page 70: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

Seth Gilbert

George Varghese

Boaz Patt-Shamir

Jennifer Welch

Brian Coan Kenneth Goldman

Shinya Umeno

Alex Cornejo

Mark Tuttle

Joshua Tauber

Eugene Stark

Rainer Gawlick

Alan Fekete

Victor Luchangco

Roberto Segala

Rui Fan

Tina Nolte

Sayan Mitra

Calvin Newport

Carl Livadas

Jim Burns

Roger Khazan

Roberto DePrisco

Congratulations, Nancy, and thank you!!

Page 71: On Fault Tolerance in Wireless Ad Hoc Networks Seth Gilbert Nancy Lynch Celebration, 2008

The End