Lecture 10
Naming services for flat namespaces
EECE 411: Design of Distributed Software Applications
Logistics / reminders
Project: send Samer and me your group membership by the end of the week
Quizzes: Q1 next time; Q2 on 11/16
Implementation options: Flat namespace
Problem: given an essentially unstructured name, how can we design a scalable solution that associates names with addresses?
Possible designs:
- [last time] Simple solutions (broadcasting, forwarding pointers)
- Hash-table-like approaches: consistent hashing, Distributed Hash Tables (DHTs)
Functionality to implement
Map: names → access points (addresses)
Similar to a hash table: manage a (huge) list of (name, address), i.e. (key, value), pairs
- put(key, value)
- lookup(key) → value
Key idea: partitioning. Allocate parts of the list to different nodes, as in the sketch below.
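As a toy illustration of this interface and the partitioning idea, here is a minimal sketch; the class name, the node list, and the modulo-based placement are illustrative assumptions, not any particular system's design:

```python
import hashlib

class PartitionedTable:
    """Toy (key, value) store whose entries are partitioned across nodes."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.stores = {n: {} for n in node_ids}  # one local table per node

    def node_for(self, key):
        # Hash the key, then pick a node; naive modulo placement.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.node_ids[h % len(self.node_ids)]

    def put(self, key, value):
        self.stores[self.node_for(key)][key] = value

    def lookup(self, key):
        return self.stores[self.node_for(key)].get(key)

table = PartitionedTable(["N1", "N2", "N3"])
table.put("title", "file data...")
print(table.lookup("title"))  # -> 'file data...'
```

Note that modulo placement moves almost every key when a node joins or leaves; consistent hashing, introduced below, fixes exactly that.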
Why the put()/get() interface?
- The API supports a wide range of applications: it imposes no structure/meaning on keys
- Key/value pairs are persistent and global
- Keys can be stored in other values (indirection), and thus complex data structures can be built
Why Might The Design Be Hard?
- Decentralized: no central authority
- Scalable: low network-traffic overhead
- Efficient: find items quickly (low latency)
- Dynamic: nodes fail, new nodes join
- General-purpose: flexible naming
The Lookup Problem
[Figure: a publisher somewhere on the Internet calls put(key=“title”, value=file data…); a client elsewhere calls get(key=“title”). Which of the nodes N1..N6 holds the value?]
• At the heart of all these services
Motivation: Centralized Lookup (Napster)
[Figure: the publisher N4 registers SetLoc(“title”, N4) with a central DB; the client calls Lookup(“title”) on the DB and then fetches key=“title”, value=file data… directly from N4]
Simple, but O(N) state and a single point of failure
Motivation: Flooded Queries (Gnutella)
[Figure: the client floods Lookup(“title”) to its neighbors, which forward it until it reaches the publisher N4 holding key=“title”, value=file data…]
Robust, but worst case O(N) messages per lookup
Motivation: FreeDB, Routed DHT Queries (Chord, &c.)
[Figure: the client routes Lookup(H(audio data)) through the overlay to the node storing key=H(audio data), value={artist, album title, track title}]
Hash-table-like approaches: consistent hashing, Distributed Hash Tables
Partition Solution: Consistent hashing
Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.
[Figure: circular ID space from 0 to 128 with node IDs N10, N32, N60, N80, N100 and key IDs K5, K11, K30, K33, K52, K99 placed around the ring]
Partition Solution: Consistent hashing
Mapping keys to nodes. Advantages: incremental scalability, load balancing.
[Figure: each key is stored at its successor on the ring: N10 holds K5, K10; N32 holds K11, K30; N60 holds K33, K40, K52; N80 holds K65, K70; N100 holds K99]
Consistent hashing
How do store & lookup work?
[Figure: the same ring; the question “What node stores K5?” is answered “Key 5 is at N10”]
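A minimal runnable sketch of store/lookup placement under consistent hashing, assuming node IDs come from hashing the nodes' addresses into the same ID space as keys (all names are illustrative):

```python
import bisect
import hashlib

def ring_id(s):
    # 160-bit ring position derived from SHA-1.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, addresses):
        # Node IDs: hashes of the nodes' addresses, kept sorted.
        self.ids = sorted(ring_id(a) for a in addresses)
        self.addr = {ring_id(a): a for a in addresses}

    def successor(self, key):
        """First node whose id is equal to or follows hash(key) on the ring."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.addr[self.ids[i % len(self.ids)]]  # wrap past the top

ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.successor("K5"))  # the node responsible for key K5
```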
Additional trick: Virtual Nodes
Problem: how do we balance load when nodes are heterogeneous?
Solution idea: each node owns a share of the ID space proportional to its ‘power’.
Virtual nodes: each physical node hosts multiple (similar) virtual nodes; virtual nodes are all treated the same.
Advantages: load balancing, incremental scalability, dealing with failures.
- Dealing with heterogeneity: the number of virtual nodes a physical node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
- When a node joins (if it hosts many virtual nodes), it accepts a roughly equal amount of load from each of the existing nodes.
- If a node becomes unavailable, the load it handled is dispersed evenly across the remaining available nodes.
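A sketch of one way to implement this, assuming each physical node registers a number of virtual ring positions proportional to its capacity; the class, names, and capacity map are illustrative:

```python
import bisect
import hashlib

def ring_id(s):
    # 160-bit ring position derived from SHA-1.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, capacities):  # e.g. {"big-node": 4, "small-node": 1}
        # Each physical node registers `capacity` virtual positions.
        self.ring = sorted(
            (ring_id(f"{node}#{i}"), node)
            for node, cap in capacities.items()
            for i in range(cap)
        )

    def successor(self, key):
        k = ring_id(key)
        i = bisect.bisect_left(self.ring, (k,))  # first virtual id >= k
        return self.ring[i % len(self.ring)][1]  # wrap around; return host

ring = VirtualNodeRing({"big-node": 4, "small-node": 1})
print(ring.successor("K5"))
# 'big-node' owns roughly four times the ID space of 'small-node', and if a
# node leaves, its load spreads over all remaining virtual nodes.
```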
Consistent Hashing – Summary so far
Mechanism:
- Nodes get an identity by hashing their IP address; keys are hashed into the same space
- A key whose id (hashed into) is k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k)
Advantages: incremental scalability, load balancing
Theoretical results (N = number of nodes, K = number of keys in the system; each holds with high probability):
- Each node is responsible for at most (1 + ε)K/N keys
- A node joining or leaving relocates O(K/N) keys (and only to or from the responsible node)
BUT: Consistent hashing – problem
How large is the state maintained at each node? O(N), where N is the number of nodes.
[Figure: the same ring; answering “Key 5 is at N10” in one step requires each node to know every other node]
Basic Lookup (non-solution)
[Figure: “Where is key 50?” is forwarded around the ring N5, N10, N20, N32, N40, N60, N80, N99, N110 one successor at a time, until N40 answers “Key 50 is at N60”]
• Lookups find the ID’s successor
• Correct if successors are correct
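On a small numeric ring this non-solution is just a walk along successor pointers; a runnable sketch with the node IDs from the figure (O(N) hops in the worst case):

```python
def between(x, a, b):
    """True if x lies in the ring interval (a, b], wrapping past zero."""
    return a < x <= b if a < b else x > a or x <= b

def naive_lookup(start, key, succ):
    """Follow successor pointers one node at a time: correct but O(N) hops."""
    n = start
    while not between(key, n, succ[n]):
        n = succ[n]
    return succ[n]

nodes = [5, 10, 20, 32, 40, 60, 80, 99, 110]
succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
print(naive_lookup(5, 50, succ))  # -> 60: "Key 50 is at N60"
```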
Successor Lists Ensure Robust Lookup
N32
N10
N5
N20
N110
N99
N80
N60
• Each node remembers r successors• Lookup can skip over dead nodes
N40
10, 20, 32
20, 32, 40
32, 40, 60
40, 60, 80
60, 80, 99
80, 99, 110
99, 110, 5
110, 5, 10
5, 10, 20
“Finger Table” Accelerates Lookups
[Figure: node N80’s fingers point at the nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring ahead of it]
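Concretely, finger i of node n points at successor((n + 2^i) mod 2^m). A sketch on the toy 128-position ring from the earlier figures (m = 7 is an assumption to match that figure):

```python
M = 7          # bits in the toy ID space
RING = 2 ** M  # ring of 128 positions, as in the earlier figures

def finger_targets(n, m=M):
    """IDs that node n's finger entries aim at: (n + 2^i) mod 2^m."""
    return [(n + 2 ** i) % RING for i in range(m)]

print(finger_targets(80))  # [81, 82, 84, 88, 96, 112, 16]
# Finger i is stored as successor(target): the farthest finger points
# halfway around the ring, the next one a quarter of the way, and so on.
```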
Lookups take O(log N) hops
[Figure: Lookup(K19) hops across the ring via fingers, roughly halving the remaining distance at each step, until it reaches N20, the successor of K19]
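A sketch of the routed lookup loop, continuing the earlier sketches (between, naive_lookup, nodes, succ, and finger_targets as defined above): each hop jumps to the closest finger that precedes the key, which roughly halves the remaining ring distance.

```python
def find_successor(n, key, succ, fingers):
    """Chord-style routed lookup sketch; returns the node storing `key`."""
    while not between(key, n, succ[n]):
        # Closest preceding finger: farthest forward without passing the key.
        cands = [f for f in fingers[n] if between(f, n, key)]
        n = max(cands, key=lambda f, n=n: (f - n) % RING, default=succ[n])
    return succ[n]

# Build each node's finger table: successor((n + 2^i) mod 2^m) for each i.
fingers = {n: sorted({naive_lookup(n, t, succ) for t in finger_targets(n)})
           for n in nodes}
print(find_successor(80, 19, succ, fingers))  # -> 20: N20 stores K19
```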
Summary of Performance Characteristics
- Efficient: O(log N) messages per lookup
- Scalable: O(log N) state per node
- Robust: survives massive membership changes
Joining the Ring
Three-step process:
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node
Two invariants to maintain to ensure correctness:
- Each node’s successor list is maintained
- successor(k) is responsible for monitoring k
Join: Initialize New Node’s Finger Table
- Locate any node p in the ring
- Ask node p to look up the fingers of the new node
[Figure: new node N36 asks an existing node to perform Lookup(37, 38, 40, …, 100, 164), one lookup per finger target]
Join: Update Fingers of Existing Nodes
- The new node calls an update function on existing nodes
- Existing nodes recursively update the fingers of other nodes
[Figure: after N36 joins the ring N5, N20, N40, N60, N80, N99, existing nodes adjust their fingers to point at it]
Join: Transfer Keys
- Only keys in the new node’s range are transferred
[Figure: keys in 21..36, here K30, are copied from N40 to N36; the others, here K38, stay at N40]
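The transfer step from the figure as a runnable sketch; plain dicts stand in for each node's local store, and wrap-around at zero is ignored for brevity:

```python
def transfer_on_join(new_id, pred_id, succ_store):
    """Move keys in the range (pred_id, new_id] from the successor's store
    to the joining node; everything else stays put."""
    moved = {k: v for k, v in succ_store.items() if pred_id < k <= new_id}
    for k in moved:
        del succ_store[k]
    return moved

n40_store = {30: "value of K30", 38: "value of K38"}
n36_store = transfer_on_join(36, 20, n40_store)
print(n36_store)  # {30: ...}: K30 moves to N36
print(n40_store)  # {38: ...}: K38 stays at N40
```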
Handling Failures
Problem: failures could cause incorrect lookups
Solution: fallback, i.e. keep track of the successor’s successor (keep a list of r successors)
[Figure: Lookup(90) must be routed past a failed node on the ring N10, N80, N85, N102, N113, N120]
Choosing Successor List Length
r = length of the successor list; N = number of nodes in the system. Assume 50% of the nodes fail, independently.
P(all r successors of a specific node are dead) = (1/2)^r, i.e., the probability that this node breaks the ring (this relies on the independent-failure assumption)
P(no broken nodes) = (1 − (1/2)^r)^N
Choosing r = 2 log2(N) makes this probability ≈ 1 − 1/N
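The arithmetic is easy to check numerically; a sketch (N = 1,000 is chosen here to match the experiment on the later slides):

```python
from math import log2

def p_ring_intact(n, r):
    """P(no node loses all r successors) when half of n nodes fail,
    assuming independent failures."""
    return (1 - 0.5 ** r) ** n

N = 1000
r = round(2 * log2(N))          # r = 2 log2(N) -> 20
print(r, p_ring_intact(N, r))   # ~0.999, i.e. roughly 1 - 1/N
```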
DHT – Summary so far
Mechanism:
- Nodes get an identity by hashing their IP address; keys are hashed into the same space
- A key whose id (hashed into) is k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k)
Properties:
- Incremental scalability, good load balancing
- Efficient: O(log N) messages per lookup
- Scalable: O(log N) state per node
- Robust: survives massive membership changes
Some experimental results
Chord Lookup Cost Is O(log N)
[Plot: average messages per lookup vs. number of nodes; lookup cost grows as O(log N), with a constant of about 1/2]
Failure Experimental Setup
- Start 1,000 CFS/Chord servers; successor lists have 20 entries
- Wait until they stabilize
- Insert 1,000 key/value pairs, five replicas of each
- Stop X% of the servers
- Immediately perform 1,000 lookups
DHash Replicates Blocks at r Successors
[Figure: Block 17 is stored on its successor N20 and replicated on the following nodes N40 and N50 of the ring N5, N10, N20, N40, N50, N60, N68, N80, N99, N110]
• Replicas are easy to find if the successor fails
• Hashed node IDs ensure independent failure
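Replica placement is simple on a sorted ring; a sketch using the node IDs from the figure (r = 3 and the bisect-based successor search are assumptions for illustration):

```python
import bisect

def replica_nodes(key_id, node_ids, r=3):
    """Place a block on successor(key) and the next r-1 nodes on the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id) % len(ids)
    return [ids[(i + j) % len(ids)] for j in range(r)]

print(replica_nodes(17, [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]))
# -> [20, 40, 50]: Block 17 is lost only if r consecutive nodes all die
```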
Massive Failures Have Little Impact
[Plot: failed lookups (percent) vs. failed nodes (percent), for 5% to 50% failed nodes; even at 50%, only about (1/2)^6 ≈ 1.6% of lookups fail]
Applications
An Example Application: The CD Database
[Figure: the client computes a disc fingerprint and sends it to the service; if the service recognizes the fingerprint, it returns the album and track titles]
An Example Application: The CD Database
[Figure: if the service answers “no such fingerprint”, the user types in the album and track titles, which are submitted to the service]
A DHT-Based FreeDB Cache
- FreeDB is a volunteer service: it has suffered outages as long as 48 hours, and service costs are borne largely by volunteer mirrors
- Idea: build a cache of FreeDB with a DHT, to add to the availability of the main service
- Goal: explore how easy this is to do
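The cache logic itself is just the cache-aside pattern over the DHT's put/get interface. A sketch where DictDHT and freedb_lookup are hypothetical stand-ins for a real DHT client and the FreeDB protocol:

```python
class DictDHT:
    """Dict-backed stand-in for a real DHT client (illustrative only)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value

def freedb_lookup(fingerprint):
    # Hypothetical stand-in for querying the real (slow, sometimes
    # unavailable) FreeDB service.
    return {"artist": "...", "album": "...", "tracks": ["..."]}

def disc_info(dht, fingerprint):
    """Cache-aside: try the DHT first; on a miss, ask FreeDB and cache."""
    info = dht.get(fingerprint)
    if info is None:
        info = freedb_lookup(fingerprint)
        dht.put(fingerprint, info)
    return info

print(disc_info(DictDHT(), "disc-fingerprint-123"))
```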
Cache Illustration
[Figure: clients send a disc fingerprint to the DHT and receive the disc info back; new albums are inserted into the DHT]
Trackerless BitTorrent
In classic BitTorrent, a client that wants to download a file:
- Contacts the tracker identified in the .torrent file (using HTTP)
- The tracker sends the client a (random) list of peers who have, or are downloading, the file
- The client contacts the peers on the list to see which segments of the file they have
- The client requests segments from those peers
- The client reports to the other peers it knows about that it now has the segment
- Other peers start contacting the client to get that segment (while the client is fetching other segments)
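What makes BitTorrent “trackerless” is that a DHT takes over the tracker’s one job: mapping a torrent’s info-hash to a peer list. A sketch of that substitution (the function name is illustrative, and dht.get/dht.put stand in for the real DHT client interface):

```python
def announce(dht, info_hash, my_addr):
    """Use the DHT where classic BitTorrent would contact the tracker:
    the peer list lives in the DHT under the torrent's info-hash."""
    peers = dht.get(info_hash) or []
    dht.put(info_hash, peers + [my_addr])  # register ourselves in the swarm
    return peers                           # peers to request segments from
```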
Next
A distributed system is: a collection of independent computers that appears to its users as a single coherent system
Components need to communicate and cooperate => support needed:
- Naming: enables some resource sharing
- Synchronization