Lecture 10
Naming services for flat namespaces
EECE 411: Design of Distributed Software Applications
Logistics / reminders
Project: send Samer and me your group membership by the end of the week
Quizzes: Q1 next time; Q2 on 11/16
Implementation options: Flat namespace
Problem: given an essentially unstructured name, how can we design a scalable solution that associates names with addresses?
Possible designs:
- [last time] Simple solutions (broadcasting, forwarding pointers)
- Hash-table-like approaches: consistent hashing, Distributed Hash Tables (DHTs)
Functionality to implement
Map: names → access points (addresses)
Similar to a hash table: manage a (huge) list of (name, address), i.e. (key, value), pairs
- put(key, value)
- lookup(key) → value
Key idea: partitioning. Allocate parts of the list to different nodes, as in the sketch below.
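As a toy illustration of this interface and the partitioning idea, here is a minimal sketch; the class name, the node list, and the modulo-based placement are illustrative assumptions, not any particular system's design:

```python
import hashlib

class PartitionedTable:
    """Toy (key, value) store whose entries are partitioned across nodes."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.stores = {n: {} for n in node_ids}  # one local table per node

    def node_for(self, key):
        # Hash the key, then pick a node; naive modulo placement.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.node_ids[h % len(self.node_ids)]

    def put(self, key, value):
        self.stores[self.node_for(key)][key] = value

    def lookup(self, key):
        return self.stores[self.node_for(key)].get(key)

table = PartitionedTable(["N1", "N2", "N3"])
table.put("title", "file data...")
print(table.lookup("title"))  # -> 'file data...'
```

Note that modulo placement moves almost every key when a node joins or leaves; consistent hashing, introduced below, fixes exactly that.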
Why the put()/get() interface?
- The API supports a wide range of applications: it imposes no structure/meaning on keys
- Key/value pairs are persistent and global
- Keys can be stored in other values (indirection), and thus complex data structures can be built
Why Might The Design Be Hard?
- Decentralized: no central authority
- Scalable: low network-traffic overhead
- Efficient: find items quickly (low latency)
- Dynamic: nodes fail, new nodes join
- General-purpose: flexible naming
The Lookup Problem
[Figure: a publisher somewhere on the Internet calls put(key=“title”, value=file data…); a client elsewhere calls get(key=“title”). Which of the nodes N1..N6 holds the value?]
• At the heart of all these services
Motivation: Centralized Lookup (Napster)
[Figure: the publisher N4 registers SetLoc(“title”, N4) with a central DB; the client calls Lookup(“title”) on the DB and then fetches key=“title”, value=file data… directly from N4]
Simple, but O(N) state and a single point of failure
Motivation: Flooded Queries (Gnutella)
[Figure: the client floods Lookup(“title”) to its neighbors, which forward it until it reaches the publisher N4 holding key=“title”, value=file data…]
Robust, but worst case O(N) messages per lookup
Motivation: FreeDB, Routed DHT Queries (Chord, &c.)
[Figure: the client routes Lookup(H(audio data)) through the overlay to the node storing key=H(audio data), value={artist, album title, track title}]
Hash-table-like approaches: consistent hashing, Distributed Hash Tables
Partition Solution: Consistent hashing
Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.
[Figure: circular ID space from 0 to 128 with node IDs N10, N32, N60, N80, N100 and key IDs K5, K11, K30, K33, K52, K99 placed around the ring]
Partition Solution: Consistent hashing
Mapping keys to nodes. Advantages: incremental scalability, load balancing.
[Figure: each key is stored at its successor on the ring: N10 holds K5, K10; N32 holds K11, K30; N60 holds K33, K40, K52; N80 holds K65, K70; N100 holds K99]
Consistent hashing
How do store & lookup work?
[Figure: the same ring; the question “What node stores K5?” is answered “Key 5 is at N10”]
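A minimal runnable sketch of store/lookup placement under consistent hashing, assuming node IDs come from hashing the nodes' addresses into the same ID space as keys (all names are illustrative):

```python
import bisect
import hashlib

def ring_id(s):
    # 160-bit ring position derived from SHA-1.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, addresses):
        # Node IDs: hashes of the nodes' addresses, kept sorted.
        self.ids = sorted(ring_id(a) for a in addresses)
        self.addr = {ring_id(a): a for a in addresses}

    def successor(self, key):
        """First node whose id is equal to or follows hash(key) on the ring."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.addr[self.ids[i % len(self.ids)]]  # wrap past the top

ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.successor("K5"))  # the node responsible for key K5
```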
Additional trick: Virtual Nodes
Problem: how do we balance load when nodes are heterogeneous?
Solution idea: each node owns a share of the ID space proportional to its ‘power’.
Virtual nodes: each physical node hosts multiple (similar) virtual nodes; virtual nodes are all treated the same.
Advantages: load balancing, incremental scalability, dealing with failures.
- Dealing with heterogeneity: the number of virtual nodes a physical node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
- When a node joins (if it hosts many virtual nodes), it accepts a roughly equal amount of load from each of the existing nodes.
- If a node becomes unavailable, the load it handled is dispersed evenly across the remaining available nodes.
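A sketch of one way to implement this, assuming each physical node registers a number of virtual ring positions proportional to its capacity; the class, names, and capacity map are illustrative:

```python
import bisect
import hashlib

def ring_id(s):
    # 160-bit ring position derived from SHA-1.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, capacities):  # e.g. {"big-node": 4, "small-node": 1}
        # Each physical node registers `capacity` virtual positions.
        self.ring = sorted(
            (ring_id(f"{node}#{i}"), node)
            for node, cap in capacities.items()
            for i in range(cap)
        )

    def successor(self, key):
        k = ring_id(key)
        i = bisect.bisect_left(self.ring, (k,))  # first virtual id >= k
        return self.ring[i % len(self.ring)][1]  # wrap around; return host

ring = VirtualNodeRing({"big-node": 4, "small-node": 1})
print(ring.successor("K5"))
# 'big-node' owns roughly four times the ID space of 'small-node', and if a
# node leaves, its load spreads over all remaining virtual nodes.
```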
Consistent Hashing – Summary so far
Mechanism:
- Nodes get an identity by hashing their IP address; keys are hashed into the same space
- A key whose id (hashed into) is k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k)
Advantages: incremental scalability, load balancing
Theoretical results (N = number of nodes, K = number of keys in the system; each holds with high probability):
- Each node is responsible for at most (1 + ε)K/N keys
- A node joining or leaving relocates O(K/N) keys (and only to or from the responsible node)
BUT: Consistent hashing – problem
How large is the state maintained at each node? O(N), where N is the number of nodes.
[Figure: the same ring; answering “Key 5 is at N10” in one step requires each node to know every other node]
Basic Lookup (non-solution)
[Figure: “Where is key 50?” is forwarded around the ring N5, N10, N20, N32, N40, N60, N80, N99, N110 one successor at a time, until N40 answers “Key 50 is at N60”]
• Lookups find the ID’s successor
• Correct if successors are correct
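On a small numeric ring this non-solution is just a walk along successor pointers; a runnable sketch with the node IDs from the figure (O(N) hops in the worst case):

```python
def between(x, a, b):
    """True if x lies in the ring interval (a, b], wrapping past zero."""
    return a < x <= b if a < b else x > a or x <= b

def naive_lookup(start, key, succ):
    """Follow successor pointers one node at a time: correct but O(N) hops."""
    n = start
    while not between(key, n, succ[n]):
        n = succ[n]
    return succ[n]

nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]
succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
print(naive_lookup(5, 50, succ))  # -> 60: "Key 50 is at N60"
```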
Successor Lists Ensure Robust Lookup
N32
N10
N5
N20
N110
N99
N80
N60
• Each node remembers r successors• Lookup can skip over dead nodes
N40
10, 20, 32
20, 32, 40
32, 40, 60
40, 60, 80
60, 80, 99
80, 99, 110
99, 110, 5
110, 5, 10
5, 10, 20
“Finger Table” Accelerates Lookups
[Figure: node N80’s fingers point at the nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring ahead of it]
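Concretely, finger i of node n points at successor((n + 2^i) mod 2^m). A sketch on the toy 128-position ring from the earlier figures (m = 7 is an assumption to match that figure):

```python
M = 7          # bits in the toy ID space
RING = 2 ** M  # ring of 128 positions, as in the earlier figures

def finger_targets(n, m=M):
    """IDs that node n's finger entries aim at: (n + 2^i) mod 2^m."""
    return [(n + 2 ** i) % RING for i in range(m)]

print(finger_targets(80))  # [81, 82, 84, 88, 96, 112, 16]
# Finger i is stored as successor(target): the farthest finger points
# halfway around the ring, the next one a quarter of the way, and so on.
```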
Lookups take O(log N) hops
[Figure: Lookup(K19) hops across the ring via fingers, roughly halving the remaining distance at each step, until it reaches N20, the successor of K19]
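A sketch of the routed lookup loop, continuing the earlier sketches (between, naive_lookup, nodes, succ, and finger_targets as defined above): each hop jumps to the closest finger that precedes the key, which roughly halves the remaining ring distance.

```python
def find_successor(n, key, succ, fingers):
    """Chord-style routed lookup sketch; returns the node storing `key`."""
    while not between(key, n, succ[n]):
        # Closest preceding finger: farthest forward without passing the key.
        cands = [f for f in fingers[n] if between(f, n, key)]
        n = max(cands, key=lambda f, n=n: (f - n) % RING, default=succ[n])
    return succ[n]

# Build each node's finger table: successor((n + 2^i) mod 2^m) for each i.
fingers = {n: sorted({naive_lookup(n, t, succ) for t in finger_targets(n)})
           for n in nodes}
print(find_successor(80, 19, succ, fingers))  # -> 20: N20 stores K19
```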
Summary of Performance Characteristics
- Efficient: O(log N) messages per lookup
- Scalable: O(log N) state per node
- Robust: survives massive membership changes
Joining the Ring
Three-step process:
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node
Two invariants to maintain to ensure correctness:
- Each node’s successor list is maintained
- successor(k) is responsible for monitoring k
Join: Initialize New Node’s Finger Table
- Locate any node p in the ring
- Ask node p to look up the fingers of the new node
[Figure: new node N36 asks an existing node to perform Lookup(37, 38, 40, …, 100, 164), one lookup per finger target]
Join: Update Fingers of Existing Nodes
- The new node calls an update function on existing nodes
- Existing nodes recursively update the fingers of other nodes
[Figure: after N36 joins the ring N5, N20, N40, N60, N80, N99, existing nodes adjust their fingers to point at it]
Join: Transfer Keys
- Only keys in the new node’s range are transferred
[Figure: keys in 21..36, here K30, are copied from N40 to N36; the others, here K38, stay at N40]
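The transfer step from the figure as a runnable sketch; plain dicts stand in for each node's local store, and wrap-around at zero is ignored for brevity:

```python
def transfer_on_join(new_id, pred_id, succ_store):
    """Move keys in the range (pred_id, new_id] from the successor's store
    to the joining node; everything else stays put."""
    moved = {k: v for k, v in succ_store.items() if pred_id < k <= new_id}
    for k in moved:
        del succ_store[k]
    return moved

n40_store = {30: "value of K30", 38: "value of K38"}
n36_store = transfer_on_join(36, 20, n40_store)
print(n36_store)  # {30: ...}: K30 moves to N36
print(n40_store)  # {38: ...}: K38 stays at N40
```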
Handling Failures
Problem: failures could cause incorrect lookups
Solution: fallback, i.e. keep track of the successor’s successor (keep a list of r successors)
[Figure: Lookup(90) must be routed past a failed node on the ring N10, N80, N85, N102, N113, N120]
Choosing Successor List Length
r = length of the successor list; N = number of nodes in the system. Assume 50% of the nodes fail, independently.
P(all r successors of a specific node are dead) = (1/2)^r, i.e., the probability that this node breaks the ring (this relies on the independent-failure assumption)
P(no broken nodes) = (1 − (1/2)^r)^N
Choosing r = 2 log2(N) makes this probability ≈ 1 − 1/N
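The arithmetic is easy to check numerically; a sketch (N = 1,000 is chosen here to match the experiment on the later slides):

```python
from math import log2

def p_ring_intact(n, r):
    """P(no node loses all r successors) when half of n nodes fail,
    assuming independent failures."""
    return (1 - 0.5 ** r) ** n

N = 1000
r = round(2 * log2(N))          # r = 2 log2(N) -> 20
print(r, p_ring_intact(N, r))   # ~0.999, i.e. roughly 1 - 1/N
```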
DHT – Summary so far
Mechanism:
- Nodes get an identity by hashing their IP address; keys are hashed into the same space
- A key whose id (hashed into) is k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k)
Properties:
- Incremental scalability, good load balancing
- Efficient: O(log N) messages per lookup
- Scalable: O(log N) state per node
- Robust: survives massive membership changes
Some experimental results
Chord Lookup Cost Is O(log N)
[Plot: average messages per lookup vs. number of nodes; lookup cost grows as O(log N), with a constant of about 1/2]
Failure Experimental Setup
- Start 1,000 CFS/Chord servers; successor lists have 20 entries
- Wait until they stabilize
- Insert 1,000 key/value pairs, five replicas of each
- Stop X% of the servers
- Immediately perform 1,000 lookups
DHash Replicates Blocks at r Successors
[Figure: Block 17 is stored on its successor N20 and replicated on the following nodes N40 and N50 of the ring N5, N10, N20, N40, N50, N60, N68, N80, N99, N110]
• Replicas are easy to find if the successor fails
• Hashed node IDs ensure independent failure
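Replica placement is simple on a sorted ring; a sketch using the node IDs from the figure (r = 3 and the bisect-based successor search are assumptions for illustration):

```python
import bisect

def replica_nodes(key_id, node_ids, r=3):
    """Place a block on successor(key) and the next r-1 nodes on the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id) % len(ids)
    return [ids[(i + j) % len(ids)] for j in range(r)]

print(replica_nodes(17, [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]))
# -> [20, 40, 50]: Block 17 is lost only if r consecutive nodes all die
```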
Massive Failures Have Little Impact
[Plot: failed lookups (percent) vs. failed nodes (percent), for 5% to 50% failed nodes; even at 50%, only about (1/2)^6 ≈ 1.6% of lookups fail]
Applications
An Example Application: The CD Database
[Figure: the client computes a disc fingerprint and sends it to the service; if the service recognizes the fingerprint, it returns the album and track titles]
An Example Application: The CD Database
[Figure: if the service answers “no such fingerprint”, the user types in the album and track titles, which are submitted to the service]
A DHT-Based FreeDB Cache
- FreeDB is a volunteer service: it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors
- Idea: build a cache of FreeDB with a DHT, to add to the availability of the main service
- Goal: explore how easy this is to do
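The cache logic itself is just the cache-aside pattern over the DHT's put/get interface. A sketch where DictDHT and freedb_lookup are hypothetical stand-ins for a real DHT client and the FreeDB protocol:

```python
class DictDHT:
    """Dict-backed stand-in for a real DHT client (illustrative only)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value

def freedb_lookup(fingerprint):
    # Hypothetical stand-in for querying the real (slow, sometimes
    # unavailable) FreeDB service.
    return {"artist": "...", "album": "...", "tracks": ["..."]}

def disc_info(dht, fingerprint):
    """Cache-aside: try the DHT first; on a miss, ask FreeDB and cache."""
    info = dht.get(fingerprint)
    if info is None:
        info = freedb_lookup(fingerprint)
        dht.put(fingerprint, info)
    return info

print(disc_info(DictDHT(), "disc-fingerprint-123"))
```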
Cache Illustration
[Figure: clients send a disc fingerprint to the DHT and receive the disc info back; new albums are inserted into the DHT]
Trackerless BitTorrent
In classic BitTorrent, a client that wants to download a file:
- Contacts the tracker identified in the .torrent file (using HTTP)
- The tracker sends the client a (random) list of peers who have, or are downloading, the file
- The client contacts the peers on the list to see which segments of the file they have
- The client requests segments from those peers
- The client reports to the other peers it knows about that it now has the segment
- Other peers start contacting the client to get that segment (while the client is fetching other segments)
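What makes BitTorrent “trackerless” is that a DHT takes over the tracker’s one job: mapping a torrent’s info-hash to a peer list. A sketch of that substitution (the function name is illustrative, and dht.get/dht.put stand in for the real DHT client interface):

```python
def announce(dht, info_hash, my_addr):
    """Use the DHT where classic BitTorrent would contact the tracker:
    the peer list lives in the DHT under the torrent's info-hash."""
    peers = dht.get(info_hash) or []
    dht.put(info_hash, peers + [my_addr])  # register ourselves in the swarm
    return peers                           # peers to request segments from
```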
Next
A distributed system is: a collection of independent computers that appears to its users as a single coherent system
Components need to communicate and cooperate => support needed:
- Naming: enables some resource sharing
- Synchronization