47
Geographic Hash Table S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin and F. Yu

Geographic Hash Table S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin and F. Yu

Embed Size (px)

Citation preview

Geographic Hash Table

S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin and F. Yu

Overview

Data Centric Storage Data is stored inside the network – each name

corresponds to a location in space All data with the same name will be stored at the same

sensor network location E.g an elephant sighting

Why Data centric Storage? Energy efficiency Robustness against mobility and node failures Scalability

Keywords and Terminology Observation ♦ low-level readings from sensors

♦ e.g. Detailed temperature readings

Events ♦ Predefined constellations of low-level observations

♦ e.g. temperature greater than 75 F

Queries ♦Used to elicit information from sensor network

Performance Metric:Total Usage /Hotspot Usage

Use communication as a cost function for energy consumption

Total Usage Total number of packets sent in the Sensor network

Hotspot Usage The maximal number of packets send by a particular sensor

node Costs used in the evaluation

Message flooding cost O(n) Point-to-point routing cost n is the number of nodes nO

Alternative Storage Schemes

External Storage (ES) Events propagated and stored at an external location

Local Storage (LS) Events stored locally at the detecting node Queries are flooded to all nodes and the events are

sent back Data Centric Storage (DCS)

Data for an event stored within the sensor network Queries are directed to the node that stores the data

External Storage (ES)

External

storage

event

Local Storage (LS)

EventData

EventData

event

event

Queries flooded at allthe nodes

Why do we need DCS?

Scalability Robustness against Node failures and Node

mobility To achieve Energy-efficiency

Design Criterial: Scalability & Robustness

Node failures Topology changes System scale to large number of nodes Energy Constraints Persistence

(k,v) pair must remain available to queries, despite sensor node failures and changes in sensor network topology

Consistency A query k must be routed correctly to a node where (k,v) pairs

are stored – if these node change, then they should do this consisently

Scaling in Database Size Topological generality – system should scale well on a large

number of topologies

Assumptions in DCS

Large Scale networks whose approximate geographic boundaries are known

Nodes have short range communication and are within the radio range of several other nodes

Nodes know their own locations by GPS or some localization scheme

Communication to the outside world takes place by one or more access points

Data Centric Storage

Relevant Data are stored by “name” at nodes within the Sensor network

All data with the same general name will be stored at the same sensor-net node.

e.g. (“elephant sightings”) Queries for data with a particular name are

then sent directly to the node storing those named data

Data centric Storage

EventData

Elephant Sighting

source:lass.cs.umass.edu

Geographic Hash Table Events are named with keys and both the

storage and the retrieval are performed using keys

GHT provides (key, value) based associative memory

Geographic Hash Table Operations

GHT supports two operations

♦ Put(k,v)-stores v (observed data) according to the key k

♦ Get(k)-retrieve whatever value is associated with key k

Hash function

♦ Hash the key in to the geographic coordinates

♦ Put() and Get() operations on the same key “k” hash k to the same location

Storing Data in GHT Put (“elephant”, data)

(12,24)

Hash (‘elephant’)=(12,24)

source:lass.cs.umass.edu

Retrieving data in GHT

Get (“elephant”)

Hash (‘elephant’)=(12,24)

(12,24)

Geographic Hash Table

PDA

Node A

Node B

Algorithms Used By GHT

Geographic hash Table uses GPSR for Routing(Greedy Perimeter Stateless Routing)

PEER-TO-PEER look up system (data object is associated with key and each

node in the system is responsible for storing a certain range of keys)

Algorithm (Contd) GPSR- Packets are marked with position of

destinations and each node is aware of its position

Greedy forwarding algorithm Perimeter forwarding algorithm

A

B

A

B

GPSR: Right-Hand Rule In Perimeter Forwarding

x

y

z

1

2

3

Home Node and Home perimeter Home node: Node geographically nearest to the destination

coordinates of the packet Serves as the rendezvous point for Get() and Put() operations on the

same key In GHT packet is not addressed to specific node but only to a specific

location Use GPSR to find the home node only perimeter mode of GPSR to find Home Perimeter

Home Perimeter – perimeter that encloses the destination Start from the home node, and use perimeter mode to make a cycle

and return to the home node

Problems

Robustness could be affected» Nodes could move (i.d. of Home node?)» Node failure can Occur» Deployment of new Nodes

Not Scalable» Storage capacity of the home nodes» Bottleneck at Home nodes

Solutions to the problems

Perimeter refresh protocol mostly addresses the robustness issue

Structured Replication address the scalability issue how to handle storage of many events

Perimeter refresh protocol Replicates stored data for key k at nodes around the location to which k

hashes Stores a copy of the key value pair at each node on the home perimeter Each node on the perimeter is called a replica node

How do you ensure consistency & persistence A node becomes the home node if a packet for a particular key arrives at

that node The perimeter refresh protocols periodically sends out refresh packets

After a time period Th generate a refresh packet that contains the data for that key

Packet forwarded on the home perimeter in the same way as Get() and Put() The refresh packet will take a tour of the home perimeter regardless the

changes in the network topology since the key’s insertion This property maintains the perimeter

Perimeter Refresh Protocol How do you guard against node failures

When a replica node receives a packet it did not originate, it caches the data in the refresh and sets up a takeover timer Tt

Timer is reset each time a refresh from another node arrives

If the timer expires the replica node initiates a refresh packet addressed to the key’s hashed location

Note: That particular node does not determine a new home node. The GHT routing causes the refresh to reach a node home node

Perimeter Refresh Protocol

E

F

B

D

A

C

L

home

Replica

ReplicaAssume key k hashes at location L

A is closest to L so it becomesthe home node

Perimeter Refresh Protocol

ED

F

C

B

L

ReplicaReplica

home

Replica

Replica

Suppose the node A dies

Time Specifications Refresh time (Th)

Take over time (Tt)

Death time (Td)

General rule

Td>Th and Tt>Th

In GHT Td=3Th and Tt=2Th

Characteristics Of Refresh Packet

Refresh packet is addressed to the hashed location of the key

Every (Th) secs the home node will generate refresh packet

Refresh packet contains the data stored for the key and routed exactly as get() and put() operations

Refresh packet always travels along the home perimeter

Structured Replication

Too many events are detected then home node will become the hotspot of communication.

Structured replication is used to address the scaling problem

Hierarchical decomposition of the key space Event names have a certain hierarchy depth

Structured Replication

Structured Replication A node that detects a new event, stores that event

to its closest mirror this is easily computable

This reduces the storage cost, but increases the query cost

GHT has to route the queries to all mirror nodes Queries are routes recursively

First route query to the root, then to the first level and then to the second level mirrors

Structured replication becomes more useful for frequently detected events

Evaluation

Simulation to test if the protocol is functioning correctly

Done in the ns-2 network simulator using an IEEE 802.11 mac This is a well known event driven simulator for ad-

hoc networks Larger scale simulations for the comparative

study where done with a custom simulator

Comparative Study

Simulation compares the following schemes External Storage (ES) Local Storage (LS) Normal DCS – A query returns a separate message for

each detected event Summarized DCS(S-DCS): A query returns a single

message regardless of the number of detected events Structured Replication DCS (SR_DCS) – Assuming an

optimal level of SR Comparison based on Cost Comparison based on Total usage and Hot spot

usage

Assumptions in comparison

Asymptotic costs of O(n) for floods and O( n) for point to point routing

Event locations are distributed randomly Event locations are not known in advance No more than one query for each event type

(Q –Queries in total) Assume access points to be the most heavily used area

of the sensor network

Comparison based onHot-spot/Total Usage

n - Number of nodes T - Number of Event types Q – Number Of Event types queried for Dtotal – Total number of detected events

DQ- Number of detected events for queries

DCS TYPES Normal DCS – Query returns a separate

message for each detected event Summarized DCS – Query returns a single

message regardless of the number of detected events

(usually summary is preferred)

Comparison Study – contd..

ES LS DCS

Total

Hot

spot

nDtotal nDQn q nDnDnQ qtotal

totalDqDQ qDQ

)(summarynQnDnQ total

)(2 summaryQ

Observations from the Comparison

DCS is preferable only in cases where

Sensor network is Large There are many detected events and not all

event types queried

Dtotal>>max(Dq,Q)

Simulations

To check the Robustness of GHT

To compare the Storage methods in terms of total and hot spot usage

Simulation Setup

ns-2 Node Density – 1node/256m2

Radio Range – 40 m Number of Nodes -50,100,150,200 Mobility Rate -0,0.1,1m/s Query generation Rate -2qps Event types – 20 Events detected -10/type Refresh interval -10 s

Performance metrics

Availability of data stored to Queriers

(In terms of success rate)

Loads placed on the nodes participating in GHT (hotspot usage)

Simulation Results for Robustness

GHT offers perfect availability of stored events in static case

It offers high availability when nodes are subjected to mobility and failures

Simulation Results under varying Q

Number of nodes is

constant= 10000

Simulation results under varying N

Number of Queries Q =50

Simulation Results for comparison of 3-storage methods

S-DCS have low hot-spot usage under varying “Q”

S-DCS is has the lowest hot-spot usage under varying “n”

Conclusion

Data centric storage entails naming of data and storing data at nodes within the sensor network

GHT- hashes the key (events) in to geographical co-ordinates and stores a key-value pair at the sensor node geographically nearest to the hash

GHT uses Perimeter Refresh Protocol and structured replication to enhance robustness and scalability

DCS is useful in large sensor networks and there are many detected events but not all event types are Queried