
Distributed Hash Tables CPE 401 / 601 Computer Network Systems Modified from Ashwin Bharambe and Robert Morris


Page 1

Distributed Hash Tables

CPE 401 / 601 Computer Network Systems

Modified from Ashwin Bharambe and Robert Morris

Page 2

P2P Search

How do we find an item in a P2P network?

Unstructured
• Query flooding
• Not guaranteed to find an item
• Designed for the common case: popular items

Structured
• DHTs – a kind of indexing
• Guaranteed to find an item
• Designed for both popular and rare items

Page 3

The lookup problem

[Figure: nodes N1–N6 connected through the Internet; a Publisher calls Put(key="title", value=file data…) and a Client calls Get(key="title"); which node holds the item?]

• The lookup problem is at the heart of all DHTs

Page 4

Centralized lookup (Napster)

[Figure: the Publisher at N4, holding key="title", value=file data…, registers SetLoc("title", N4) with a central DB; the Client sends Lookup("title") to the DB and then fetches the data from N4]

Simple, but O(N) state and a single point of failure

Page 5

Flooded queries (Gnutella)

[Figure: the Client floods Lookup("title") across neighbors N1–N9; the Publisher at N4, holding key="title", value=file data…, answers]

Robust, but worst case O(N) messages per lookup

Page 6

Routed queries (Freenet, Chord, etc.)

[Figure: the Client's Lookup("title") is routed hop by hop through N1–N9 toward the Publisher at N4, which holds key="title", value=file data…]

Page 7

Hash Table

Name-value pairs (or key-value pairs)
• E.g., "Mehmet Hadi Gunes" and [email protected]
• E.g., "http://cse.unr.edu/" and the Web page
• E.g., "HitSong.mp3" and "12.78.183.2"

Hash table
• Data structure that associates keys with values

lookup(key) → value  (a minimal sketch follows below)
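For reference, an ordinary single-machine hash table already provides exactly this interface. A minimal Python sketch using the slide's own key examples (the stored values here are placeholders):

    # An ordinary, non-distributed hash table: keys map to values.
    table = {
        "HitSong.mp3": "12.78.183.2",         # file name -> IP address holding the file
        "http://cse.unr.edu/": "<web page>",  # URL -> (placeholder for) the page contents
    }

    def lookup(key):
        # lookup(key) -> value, the same operation a DHT must offer across many machines
        return table.get(key)

    print(lookup("HitSong.mp3"))              # 12.78.183.2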

Page 8

Distributed Hash Table

Hash table spread over many nodes
• Distributed over a wide area

Main design goals
• Decentralization: no central coordinator
• Scalability: efficient even with a large # of nodes
• Fault tolerance: tolerate nodes joining and leaving

Page 9

Distributed hash table (DHT)

[Figure: layered architecture. A distributed application calls put(key, data) and get(key) → data on the distributed hash table, which spans many nodes; underneath, a lookup service maps lookup(key) → node IP address]

• Application may be distributed over many nodes
• DHT distributes data storage over many nodes (sketched below)
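A minimal in-process sketch of this layering: put/get sit on top of a lookup service that maps each key to the node responsible for it. The node addresses and the naive hash-based lookup below are illustrative stand-ins, not part of the original design:

    # Sketch of the slide's layering: application -> DHT (put/get) -> lookup service.
    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical node addresses
    storage = {addr: {} for addr in NODES}         # each node's local hash table

    def lookup(key):
        # Lookup service: key -> IP address of the responsible node.
        # Here a naive hash; Chord, Plaxton, etc. replace this with overlay routing.
        return NODES[hash(key) % len(NODES)]

    def put(key, data):
        storage[lookup(key)][key] = data           # store the data on the responsible node

    def get(key):
        return storage[lookup(key)].get(key)       # fetch it from the same node

    put("HitSong.mp3", "12.78.183.2")
    print(get("HitSong.mp3"))                      # -> 12.78.183.2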

Page 10

Distributed Hash Table

Two key design decisions
• How do we map names onto nodes?
• How do we route a request to that node?

Page 11

Hash Functions

Hashing
• Transform the key into a number
• Use the number to index an array

Example hash function
• Hash(x) = x mod 101, mapping to 0, 1, …, 100

Challenges
• What if there are more than 101 nodes? Fewer?
• Which nodes correspond to each hash value?
• What if nodes come and go over time? (sketched below)
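The last challenge is the crucial one: with plain modulo hashing, changing the number of nodes remaps almost every key. A quick illustration (key names and node counts are made up):

    # Naive placement: key -> hash(key) mod (number of nodes).
    def node_for(key, num_nodes):
        return hash(key) % num_nodes

    keys = [f"file-{i}" for i in range(1000)]
    before = {k: node_for(k, 101) for k in keys}   # 101 nodes
    after = {k: node_for(k, 102) for k in keys}    # one node joins

    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"{moved} of 1000 keys changed nodes")   # nearly all of them move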

Page 12

Definition of a DHT

Hash table supports two operations
• insert(key, value)
• value = lookup(key)

Distributed
• Map hash buckets to nodes

Requirements
• Uniform distribution of buckets
• Cost of insert and lookup should scale well
• Amount of local state (routing table size) should scale well

Page 13

Plaxton Trees

Motivation
• Access nearby copies of replicated objects
• Time-space trade-off
  – Space = routing table size
  – Time = access hops

Page 14

Plaxton Trees Algorithm

1. Assign labels to objects and nodes
   - using randomizing hash functions
   Each label is log_{2^b} n digits long (digits in base 2^b)

[Figure: example labels: object 9 A E 4, node 2 4 7 B]

Page 15

Plaxton Trees Algorithm

2. Each node knows about other nodes with varying prefix matches

[Figure: routing table of node 2 4 7 B, with neighbor entries grouped by shared-prefix length: length 0 (no digits shared), length 1 (labels starting 2 …), length 2 (labels starting 2 4 …), and length 3 (labels starting 2 4 7 …)]

Page 16

Plaxton Trees Object Insertion and Lookup

Given an object, route successively towards nodes with greater prefix matches

[Figure: starting at node 2 4 7 B, the route for object 9 A E 4 passes through 9 F 1 0, 9 A 7 6, and 9 A E 2, each sharing a longer prefix with the object's label]

Store the object at each of these locations

Page 17

Plaxton Trees Object Insertion and Lookup

Given an object, route successively towards nodes with greater prefix matches

[Figure: the same route, 2 4 7 B → 9 F 1 0 → 9 A 7 6 → 9 A E 2, for object 9 A E 4]

Store the object at each of these locations

log(n) steps to insert or locate an object (a routing sketch follows below)
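A rough sketch of the routing rule behind these figures: at every hop, forward to any known node whose label shares a strictly longer prefix with the object's label. The tiny neighbor tables are made up for illustration and are not a full Plaxton routing table:

    # Plaxton-style prefix routing: greedily increase the shared prefix with the object.
    def prefix_len(a, b):
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(current, obj, neighbors):
        best = prefix_len(current, obj)
        for cand in neighbors:
            if prefix_len(cand, obj) > best:
                return cand        # any neighbor with a longer match works
        return None                # no better neighbor: current node is the root for obj

    # Toy walk matching the slides: node 247B routes toward object 9AE4.
    neighbors_of = {               # made-up tables, one useful entry per node
        "247B": ["3A21", "9F10"],
        "9F10": ["247B", "9A76"],
        "9A76": ["9F10", "9AE2"],
        "9AE2": [],
    }
    node = "247B"
    while node is not None:
        print(node)                # prints 247B, 9F10, 9A76, 9AE2
        node = next_hop(node, "9AE4", neighbors_of[node])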

Page 18

Plaxton Trees Why is it a tree?

[Figure: lookups for the same object issued from different nodes (2 4 7 B, 9 F 1 0, 9 A 7 6, 9 A E 2) converge as prefix matches grow, so the routes form a tree rooted at the node whose label best matches the object]

Page 19

Plaxton Trees Network Proximity

Overlay tree hops could be totally unrelated to the underlying network hops

[Figure: overlay neighbors may sit in the USA, Europe, and East Asia, so one overlay hop can cross continents]

Plaxton trees guarantee a constant-factor approximation!
• Only when the topology is uniform in some sense

Page 20

Chord

Map nodes and keys to identifiers
• using randomizing hash functions
Arrange them on a circle

[Figure: identifier circle; for x = 010110110, succ(x) = 010111110 and pred(x) = 010110000]

Page 21

Consistent Hashing

[Figure: a circle with hash buckets at positions 0, 4, 8, and 12; an object hashing to position 14 is assigned to the closest clockwise bucket]

• Construction
  – Assign each of C hash buckets to random points on a mod 2^n circle; hash key size = n
  – Map each object to a random position on the circle
  – Hash of object = closest clockwise bucket (sketched below)
• Desired features
  – Balanced: no bucket is responsible for a large number of objects
  – Smoothness: addition of a bucket does not cause major movement among existing buckets
  – Spread and load: a small set of buckets lies near each object
• Similar to the scheme later used in P2P Distributed Hash Tables (DHTs)
• In DHTs, each node only has a partial view of its neighbors
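A compact sketch of consistent hashing using Python's bisect module, treating "clockwise" as increasing positions on the circle. The 16-bit ring, the md5-based hash, and the bucket names are illustrative choices only:

    # Consistent hashing: buckets sit at points on a mod-2^16 circle, and each
    # object maps to the closest clockwise bucket.
    import bisect
    import hashlib

    RING = 2 ** 16

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % RING

    buckets = sorted((h(f"bucket-{i}"), f"bucket-{i}") for i in range(4))
    points = [p for p, _ in buckets]

    def bucket_for(obj):
        i = bisect.bisect_left(points, h(obj)) % len(points)   # wrap around the circle
        return buckets[i][1]

    print(bucket_for("HitSong.mp3"))
    # Adding or removing one bucket only moves objects between that bucket and its
    # ring neighbor, which is the "smoothness" property listed above.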

Page 22

Consistent Hashing

Large, sparse identifier space (e.g., 128 bits)
• Hash a set of keys uniformly into the large id space
• Hash nodes into the same id space

Hash(name) → object_id, Hash(IP_address) → node_id

[Figure: the id space represented as a ring from 0 to 2^128 − 1]

Page 23

Where to Store (Key, Value) Pair?

Mapping keys in a load-balanced way
• Store the key at one or more nodes
• Nodes with identifiers "close" to the key
  – where distance is measured in the id space

Advantages
• Even distribution
• Few changes as nodes come and go…

Hash(name) → object_id, Hash(IP_address) → node_id

Page 24

Hash a Key to Successor

• Successor: the node with the next-highest ID (checked in the sketch below)

[Figure: circular ID space with nodes N10, N32, N60, N80, N100; keys map to their successors: K5, K10 → N10; K11, K30 → N32; K33, K40, K52 → N60; K65, K70 → N80; K100 → N100]
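The figure's mapping follows directly from the successor rule; a small check using the slide's node and key IDs:

    # Successor rule: a key lives on the first node whose ID is >= the key's ID,
    # wrapping around the circular ID space.
    import bisect

    nodes = [10, 32, 60, 80, 100]                  # N10, N32, N60, N80, N100

    def successor(key_id):
        i = bisect.bisect_left(nodes, key_id) % len(nodes)
        return nodes[i]

    for k in (5, 10, 11, 30, 33, 40, 52, 65, 70, 100):
        print(f"K{k} -> N{successor(k)}")
    # K5, K10 -> N10; K11, K30 -> N32; K33, K40, K52 -> N60; K65, K70 -> N80; K100 -> N100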

Page 25

Joins and Leaves of Nodes

Maintain a circularly linked list around the ring
• Every node has a predecessor and a successor

[Figure: a node on the ring with its pred and succ pointers]

Page 26

Successor Lists Ensure Robust Lookup

• Each node remembers r successors
• Lookup can skip over dead nodes to find blocks

[Figure: ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110, each storing its next r = 3 successors, e.g., N5 → (10, 20, 32) and N110 → (5, 10, 20)]

Page 27

Joins and Leaves of Nodes

When an existing node leaves
• The node copies its <key, value> pairs to its successor
• Its predecessor is pointed at the node's successor in the ring

When a node joins
• The node does a lookup on its own id
• and learns the node responsible for that id
• This node becomes the new node's successor
• and the new node can learn that node's predecessor
  – which will become the new node's predecessor

(both procedures are sketched below)
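A sketch of these join/leave steps on an in-memory ring. Real Chord interleaves them with a stabilization protocol; here the pointer splicing is done sequentially and key hand-off is only indicated in comments:

    # Ring maintenance sketch: every node keeps predecessor/successor pointers.
    def between(x, a, b):
        # True if x lies in the half-open ring interval (a, b].
        return (a < x <= b) if a < b else (x > a or x <= b)

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.pred = self.succ = self        # a single node forms a one-node ring

        def lookup(self, key_id):
            # Walk the ring until we reach the node responsible for key_id
            # (its successor). O(n) hops without finger tables.
            node = self
            while not between(key_id, node.pred.id, node.id):
                node = node.succ
            return node

        def join(self, bootstrap):
            succ = bootstrap.lookup(self.id)    # node currently responsible for our id
            self.succ, self.pred = succ, succ.pred
            succ.pred.succ = self               # splice ourselves into the ring
            succ.pred = self
            # (keys in (self.pred.id, self.id] would now be copied over from succ)

        def leave(self):
            # (our keys would now be copied to self.succ)
            self.pred.succ, self.succ.pred = self.succ, self.pred

    # Build a small ring and look up a key.
    a = Node(10)
    for n in (32, 60, 80):
        Node(n).join(a)
    print(a.lookup(19).id)                      # -> 32, the successor of key 19 here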

Page 28

Nodes Coming and Going

Small changes when nodes come and go
• Only affects the keys mapped to the node that comes or goes

Hash(name) → object_id, Hash(IP_address) → node_id

Page 29

How to Find the Nearest Node?

Need to find the closest node
• To determine who should store a (key, value) pair
• To direct a future lookup(key) query to that node

Strawman solution: walk through the linked list
• Circular linked list of nodes in the ring
• O(n) lookup time with n nodes in the ring

Alternative solution: jump further around the ring
• "Finger" table of additional overlay links

Page 30

Links in the Overlay Topology

Trade-off between # of hops vs. # of neighbors
• E.g., log(n) for both, where n is the number of nodes
• E.g., overlay links spanning 1/2, 1/4, 1/8, … of the way around the ring
• Each hop then traverses at least half of the remaining distance

[Figure: a node's overlay links reaching 1/2, 1/4, and 1/8 of the ring]

Page 31

Chord “Finger Table” Accelerates Lookups

[Figure: node N80's fingers reach nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]

Page 32

Chord lookups take O(log N) hops

[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; a Lookup(K19) hops along fingers until it reaches N20, the successor of K19 (see the sketch below)]
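A sketch of a complete lookup path, assuming a 7-bit id space and the node ids from the figure; starting the lookup at N80 is an arbitrary choice for illustration. finger[i] of node n is succ(n + 2^i), and each hop jumps to the closest finger preceding the key, which is what bounds the path at O(log N) hops:

    # Chord-style lookup over a simulated 7-bit ring (ids 0..127).
    import bisect

    M = 7
    nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]      # N5 .. N110, kept sorted

    def succ(x):
        i = bisect.bisect_left(nodes, x % (1 << M)) % len(nodes)
        return nodes[i]

    def fingers(n):
        # finger[i] = successor of (n + 2^i) mod 2^M
        return [succ((n + (1 << i)) % (1 << M)) for i in range(M)]

    def between(x, a, b):          # x in the open ring interval (a, b)
        return (a < x < b) if a < b else (x > a or x < b)

    def in_range(x, a, b):         # x in the half-open ring interval (a, b]
        return (a < x <= b) if a < b else (x > a or x <= b)

    def lookup(start, key):
        n, path = start, [start]
        while not in_range(key, n, succ((n + 1) % (1 << M))):
            nxt = n
            for f in reversed(fingers(n)):            # farthest finger first
                if between(f, n, key):                # closest preceding finger of key
                    nxt = f
                    break
            if nxt == n:                              # safety net: fall back to successor
                nxt = succ((n + 1) % (1 << M))
            n = nxt
            path.append(n)
        return succ(key), path

    owner, path = lookup(80, 19)
    print(owner, path)   # -> 20 [80, 5, 10]: three finger hops to K19's successor, N20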

Page 33

Chord Self-organization

Node join
• Set up finger i: route to succ(n + 2^i)
• log(n) fingers ⇒ O(log^2 n) cost

Node leave
• Maintain a successor list for ring connectivity
• Update successor lists and finger pointers

Page 34

Chord lookup algorithm properties

Interface: lookup(key) → IP address
Efficient: O(log N) messages per lookup
• N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive failures
Simple to analyze

Page 35

Comparison of Different DHTs

                   # links per node          Routing hops
Pastry/Tapestry    O(2^b · log_{2^b} n)      O(log_{2^b} n)
Chord              log n                     O(log n)
CAN                d                         d · n^(1/d)
SkipNet            O(log n)                  O(log n)
Symphony           k                         O((1/k) · log^2 n)
Koorde             d                         log_d n
Viceroy            7                         O(log n)

Page 36

What can DHTs do for us?

Distributed object lookup
• Based on object ID

Decentralized file systems
• CFS, PAST, Ivy

Application-layer multicast
• Scribe, Bayeux, SplitStream

Databases
• PIER

Page 37

Where are we now?

Many DHTs offering efficient and relatively robust routing

Unanswered questions
• Node heterogeneity
• Network-efficient overlays vs. structured overlays
  – Conflict of interest!
• What happens with a high user churn rate?
• Security

Page 38

Are DHTs a panacea?

Useful primitive

Tension between network-efficient construction and uniform key-value distribution

Does every non-distributed application use only hash tables?
• Many rich data structures cannot be built on top of hash tables alone
• Exact-match lookups are not enough

Does any P2P file-sharing system use a DHT?