DHT* Applications
Jeffrey Pang, CMU NetTalk, Dec. 5, 2003
* and DOLR

Brief Review of DHTs
Many DHTs:
  PRR Trees, Pastry, Tapestry
  Chord, Symphony
  CAN
  SkipNet, Kademlia, Koorde, Viceroy, etc.
Good Properties:
  Distributed construction/maintenance
  Load-balanced with uniform identifiers
  O(log n) hops / neighbors per node
  Provides underlying network proximity
  [Figures: prefix routing among node IDs 247B, 9F10, 9A76, 9AE2; an identifier circle with x = 010110110, succ(x) = 010111110, pred(x) = 010110000; a zone-based space with a source and a key]
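
The succ(x)/pred(x) relation on an identifier circle can be sketched as follows. This is a minimal illustration, not any particular DHT's lookup: the 16-bit id space, the SHA-1 mapping, and the helper names `ident`, `successor`, and `predecessor` are all assumptions, and a real system reaches the successor in O(log n) hops rather than by scanning a sorted list.

```python
import bisect
import hashlib

BITS = 16  # toy identifier space of 2**16 ids (assumption)

def ident(name):
    """Hash a name onto the identifier circle [0, 2**BITS)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** BITS)

def successor(node_ids, x):
    """First node id at or after x, wrapping past the top of the circle."""
    ids = sorted(node_ids)
    return ids[bisect.bisect_left(ids, x) % len(ids)]

def predecessor(node_ids, x):
    """Last node id strictly before x, wrapping below zero."""
    ids = sorted(node_ids)
    # Index -1 (when x is at or below the smallest id) wraps to the largest id.
    return ids[bisect.bisect_left(ids, x) - 1]
```

A key k would be stored at `successor(nodes, ident(k))` in this toy model.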

Overview of Talk
  Review of DHTs
  DHT vs DOLR
  Storage
  Multicast
  Database
  Misc.
  API and Infrastructure Proposals

DHT vs. DOLR
Distributed Hash Table Paradigm:
  Location of objects determined by overlay
  put(key, object)
  object = get(key)
Distributed Object Location and Routing Paradigm:
  Location of objects determined by application
  Application publishes pointers in overlay
  publish(key, id)
  locate(key)

DHT Paradigm
  [Figure: the object is stored in the overlay under its key]
  put(key, object); object = get(key)

DOLR Paradigm
  [Figure: the overlay holds back pointers for the key]
  publish(key, id); locate(key)
  Of course, many apps use a little bit of both paradigms...
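
The difference between the two paradigms can be sketched in a single process. This is an illustration under assumptions, not either system's API: `home_node`, `ToyDHT`, and `ToyDOLR` are hypothetical names, and `home_node` stands in for real overlay routing.

```python
import hashlib

def home_node(key, node_ids):
    """Stand-in for overlay routing: map a key to one node deterministically."""
    ids = sorted(node_ids)
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
    return ids[h % len(ids)]

class ToyDHT:
    """DHT paradigm: the overlay decides which node stores the object."""
    def __init__(self, node_ids):
        self.stores = {n: {} for n in node_ids}

    def put(self, key, obj):
        self.stores[home_node(key, self.stores)][key] = obj

    def get(self, key):
        return self.stores[home_node(key, self.stores)].get(key)

class ToyDOLR:
    """DOLR paradigm: the application keeps the object; the overlay keeps pointers."""
    def __init__(self, node_ids):
        self.pointers = {n: {} for n in node_ids}

    def publish(self, key, holder_id):
        table = self.pointers[home_node(key, self.pointers)]
        table.setdefault(key, set()).add(holder_id)

    def locate(self, key):
        table = self.pointers[home_node(key, self.pointers)]
        return set(table.get(key, set()))
```

In the DHT the object itself moves to the key's home node; in the DOLR only the holder's id does, so `locate` returns where to fetch the object from.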

Storage Systems
Mnemosyne [Hand & Roscoe, IPTPS02]
  steganographic storage
PAST [Rowstron & Druschel, SOSP01]
  file-based storage substrate
CFS [Dabek, et al., SOSP01]
  single writer cooperative storage
Ivy [Muthitacharoen, et al., OSDI02]
  small group read/write storage
OceanStore [Kubiatowicz, et al., ASPLOS00, FAST03]
  global-scale persistent storage

Mnemosyne
Target:
  Data that requires privacy and plausible deniability
Uses:
  Tapestry as DHT
Basic idea:
  Compute n hashes for a block: h_0, h_1 = H(h_0), ..., h_{n-1} = H(h_{n-2})
  Store the (encrypted) block at the addresses h_0, ..., h_{n-1} (mod X, the size of the store)
  Given h_0 and key, try to look up and decrypt each replica in turn (success if it passes the validity check)
  In a p2p overlay, use part of the hash value as a node address, the other part as the block address on that node
Importance:
  Simple. Only uses the basic get/put operators.
  ... but requires end nodes to obey block addresses
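
A minimal sketch of the hash-chain addressing, under stated assumptions: the store is a plain dict, X and n are arbitrary constants, and encryption is replaced by a short hash tag that serves only as the validity check. `seal`, `unseal`, `write`, and `read` are hypothetical helper names, not Mnemosyne's interface.

```python
import hashlib

STORE_SIZE = 1 << 12   # X, the number of block slots (assumed value)
N_REPLICAS = 4         # n, the length of the hash chain (assumed value)

def H(data):
    return hashlib.sha256(data).digest()

def addresses(h0, n=N_REPLICAS, size=STORE_SIZE):
    """Addresses h_0, H(h_0), H(H(h_0)), ... each reduced mod size."""
    out, h = [], h0
    for _ in range(n):
        out.append(int.from_bytes(h, "big") % size)
        h = H(h)
    return out

def seal(block, h0):
    """Prefix a validity tag. A real system would encrypt here; this does not."""
    return H(h0 + block)[:8] + block

def unseal(blob, h0):
    tag, block = blob[:8], blob[8:]
    return block if H(h0 + block)[:8] == tag else None

def write(store, h0, block):
    for addr in addresses(h0):
        store[addr] = seal(block, h0)   # may overwrite someone else's block

def read(store, h0):
    """Try each replica in turn; succeed on the first that passes the check."""
    for addr in addresses(h0):
        blob = store.get(addr)
        if blob is not None:
            block = unseal(blob, h0)
            if block is not None:
                return block
    return None
```

Because writers may overwrite each other's slots, a read only fails when every replica in the chain has been lost.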

PAST
Target:
  Wide area heterogeneous storage (e.g., web)
Uses:
  Pastry as DHT
Basic Idea:
  Store a file at h = H(file); lookup with h
  Replicate file at leaf-set of root (l nearest nodes in id-space)
  Cache file along lookup paths
  Deal with heterogeneity using virtual nodes and replica diversion
Importance:
  Graceful degradation under high utilization
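
Placement on the l nodes nearest to h in id-space can be sketched as below. This follows the slide's simplification h = H(file) and assumes a 32-bit circular id space; `replica_nodes`, `insert`, and `lookup` are hypothetical names, and caching, virtual nodes, and replica diversion are left out.

```python
import hashlib

ID_BITS = 32               # toy id space (assumption)
SPACE = 1 << ID_BITS

def file_id(content):
    """h = H(file), reduced to the toy id space."""
    return int.from_bytes(hashlib.sha1(content).digest(), "big") % SPACE

def ring_distance(a, b):
    """Distance between two ids on the circular id space."""
    d = abs(a - b)
    return min(d, SPACE - d)

def replica_nodes(node_ids, fid, l):
    """The l nodes whose ids are nearest to fid."""
    return sorted(node_ids, key=lambda n: (ring_distance(n, fid), n))[:l]

def insert(stores, content, l=3):
    fid = file_id(content)
    for n in replica_nodes(stores, fid, l):
        stores[n][fid] = content
    return fid

def lookup(stores, fid):
    """Ask nodes in order of closeness to fid until one has the file."""
    for n in replica_nodes(stores, fid, len(stores)):
        if fid in stores[n]:
            return stores[n][fid]
    return None
```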

PAST Space Management
  [Figure not preserved]

PAST Caching
  [Figure not preserved]

CFS
Target:
  Single writer, multiple readers (e.g., FTP)
Uses:
  Chord as DHT
Basic Idea:
  FS implemented on top of DHash layer
  DHash replication, caching, load balancing same as PAST
  Secure updates and deletion using signed root block and cryptographic hashes to identify directory and file blocks
  Pre-fetch blocks of the same file/directory
Importance:
  “Real-life” evaluation comparable to FTP

CFS File System Structure
  [Figure: a root block, labeled with public key and signature, points to directory block D by H(D); D points to file block F by H(F); F points to data blocks B1 and B2 by H(B1) and H(B2)]
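
The hash-linked structure can be sketched with a dict standing in for DHash. This is an illustration under assumptions: blocks are serialized as JSON, the block size is tiny, and the signed root block is omitted (in the structure above it would hold H(D) and be verified with the publisher's public key). `put_file`, `put_dir`, and `read_file` are hypothetical names.

```python
import hashlib
import json

def H(data):
    return hashlib.sha1(data).hexdigest()

def put_block(store, data):
    """Store a block under the hash of its contents and return that key."""
    key = H(data)
    store[key] = data
    return key

def get_block(store, key):
    """Fetch a block and check that it still matches its key."""
    data = store[key]
    if H(data) != key:
        raise ValueError("block does not match its key")
    return data

def put_file(store, content, block_size=4):
    """Store data blocks B1, B2, ... and a file block F listing their hashes."""
    keys = [put_block(store, content[i:i + block_size])
            for i in range(0, len(content), block_size)]
    return put_block(store, json.dumps(keys).encode())

def put_dir(store, entries):
    """Store a directory block D mapping names to H(F)."""
    return put_block(store, json.dumps(entries, sort_keys=True).encode())

def read_file(store, dir_key, name):
    entries = json.loads(get_block(store, dir_key))
    block_keys = json.loads(get_block(store, entries[name]))
    return b"".join(get_block(store, k) for k in block_keys)
```

Since every pointer is a content hash, a reader who trusts the root can detect a modified block anywhere below it.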

CFS “Real-Life” Evaluation
  [Figure: plot series labeled CFS and Pair-wise TCP]

Ivy
Target:
  Read/write storage for small groups (e.g., CVS)
Uses:
  Chord as DHT
Basic Idea:
  Implemented on top of DHash layer (identical to CFS)
  Each FS has a view consisting of n logs, one per writer
  Write operations go to personal log
  Reads reconstruct data by reading all logs in view; occasionally snapshot FS to prevent long traversals
  Consistency using version vectors (application resolvers for concurrent versions; e.g., created during partition)
Importance:
  Another “real-life” evaluation, but disappointing
  Practical model for read/write in a p2p environment
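
The version-vector check that separates ordered writes from concurrent ones can be sketched as follows. It shows only the comparison, not Ivy's log format or its resolvers; `stamp` and `compare` are hypothetical names, and a vector is modeled as a dict from writer to the number of that writer's records seen.

```python
def stamp(seen, writer):
    """Version vector for a new record by `writer`, given the vector it has seen."""
    vv = dict(seen)
    vv[writer] = vv.get(writer, 0) + 1
    return vv

def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for vectors a and b."""
    writers = set(a) | set(b)
    a_le_b = all(a.get(w, 0) <= b.get(w, 0) for w in writers)
    b_le_a = all(b.get(w, 0) <= a.get(w, 0) for w in writers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Two records that compare as "concurrent", for example writes made on both sides of a partition, are the case handed to an application resolver.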

Ivy Log Structure
  [Figure: a view pointing to the log heads of Alice and Bob; log records labeled create, write, link, ex-create, delete]

Ivy Wide Area Performance
  [Figures: Modified Andrew Benchmark; CVS]

OceanStore
Target:
  Global storage as a “utility”
Uses:
  Tapestry as DOLR
Basic Idea:
  Use Tapestry for (all) object and service location.
  Writes go to an Inner-Ring, serialized using Byzantine Agreement
  Writes create new versions of blocks, which are permanently dispersed into archive using erasure codes
  Reads go to closest replica in a dissemination tree rooted at Inner-Ring
Importance:
  Wide area Byzantine commit
  Performance of strong crypto in critical path
  Caching in a DOLR (only participating nodes involved)

OceanStore Update Path
  [Figure not preserved]

OceanStore Object Model
  [Figure not preserved]

OceanStore Inner Ring Perf.
  [Figure not preserved]

OceanStore Read Perf.
  [Figures: Archive Read; Streaming Reads from Replicas]

Multicast Applications
Bayeux [Zhuang, et al., NOSSDAV01]
  Simple single tree per source on DOLR
Scribe [Rowstron, et al., NGC01, INFOCOM03]
  Simple single tree per source on DHT
SplitStream [Castro, et al., SOSP03]
  Multiple disjoint trees per source
i3 [Stoica, et al., SIGCOMM02]
  Internet Indirection Infrastructure (mobility, {multi,any}cast, service composition)

Bayeux
Target:
  Multimedia Streaming
Uses:
  Tapestry as DOLR
Basic Idea:
  Advertise session with fake file in Tapestry
  Clients join by routing message to source id (after learning of it by looking up the session)
  All intermediate routers on path join tree
  Support multiple roots by having multiple sources advertise a session (lookups converge to “closest”)
  Take advantage of routing redundancy to provide best performance (shortest link) / tolerate faults (predict link reliability)
Importance:
  Relatively simple (no “frills”) multicast on a DOLR

Scribe
Target:
  Event notification / pubsub systems (e.g., IM)
Uses:
  Pastry or CAN as DHTs
Basic Idea:
  Publications routed to root in Pastry
  Recursively forwarded to all children in tree
  Subscriptions cause all nodes on path to root to join tree
  When your parent dies, repair by routing to a new parent
  More complex ways to load balance (e.g., make children into grandchildren) described in later JSAC article
Importance:
  Another simple multicast on a DHT
  Building block for more complex applications
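
Tree construction from subscription paths can be sketched as below. This is a toy model, not Scribe's implementation: `ScribeTree` is a hypothetical class, and the `next_hop` table (each node's next node on the overlay route toward the root) is assumed to be given rather than computed by Pastry. Parent failure and repair are not modeled.

```python
class ScribeTree:
    def __init__(self, root, next_hop):
        self.root = root
        self.next_hop = next_hop   # node -> next node toward the root (assumed)
        self.children = {}         # forwarder -> set of children
        self.members = set()       # non-root nodes already attached to the tree
        self.subscribers = set()

    def subscribe(self, node):
        """Walk toward the root, joining every node until the tree is reached."""
        self.subscribers.add(node)
        child = node
        while child != self.root and child not in self.members:
            self.members.add(child)
            parent = self.next_hop[child]
            self.children.setdefault(parent, set()).add(child)
            child = parent

    def publish(self, event):
        """Forward from the root to all children; return who receives the event."""
        delivered, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node in self.subscribers:
                delivered.append(node)
            stack.extend(sorted(self.children.get(node, ())))
        return delivered
```

A node on the path that never subscribed still forwards events but does not receive them itself.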

SplitStream
Target:
  P2P streaming / bulk file transfer
Uses:
  Pastry
Basic Idea:
  Split content into k stripes
  Construct k interior-node disjoint Scribe trees
  Distribute one stripe per tree
  Receivers choose number of stripes to receive (e.g., trade off quality for inbound capacity)
  Limit out-degree of nodes with join-heuristics (later)
Importance:
  All nodes share in forwarding of data (w.h.p.)
  Nifty use of Pastry ids to construct forest (next slide)

SplitStream Forest Construction
Notice that all interior nodes of a tree must have the same first digit in their node id
  Pastry routing: first hop will match first digit
Source sends stripes to k different trees
  Root trees at nodes with different first digits
If each digit is b bits, make k = 2^b stripes
  Each node will be interior node of at most one tree (the tree that matches their first digit)
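
The first-digit argument can be checked with a few lines of arithmetic. This sketch assumes b = 4 (hex digits) and 8-digit ids, and it assigns stripe i the id whose first digit is i; `first_digit`, `stripe_ids`, and `trees_where_interior` are hypothetical names.

```python
B = 4                        # bits per digit (assumption: hex digits)
K = 2 ** B                   # number of stripes, k = 2^b
DIGITS = 8                   # digits per id in this toy id space (assumption)
SHIFT = B * (DIGITS - 1)

def first_digit(ident):
    """Most significant digit of an id."""
    return ident >> SHIFT

def stripe_ids():
    """One tree id per stripe, each with a different first digit."""
    return [d << SHIFT for d in range(K)]

def trees_where_interior(node_id):
    """Stripes whose tree may use this node as an interior (forwarding) node."""
    return [i for i, s in enumerate(stripe_ids())
            if first_digit(s) == first_digit(node_id)]
```

For node id 9AE20000 this gives only stripe 9, so the node forwards in one tree and is a leaf in the rest.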

SplitStream Limiting Out-Degree
If too many children, kick one out
First, orphaned child tries “push-down”
  Can I join a sibling?
  And continue recursively on sibling’s children
Second, use the spare capacity group
  Independent Scribe multicast tree
  Composed of nodes that have spare capacity
  Orphan anycasts message to this group
  Receiver of anycast starts DFS of spare capacity tree until it finds a node that has the desired stripe
  Orphan joins that node
  If in-degree = k, this never fails*

SplitStream Overhead
  [Figures: Forest Construction; Control Message Overhead under High Churn]

i3 - Internet Indirection Infrastructure
Target:
  Rendezvous-based communication (IP indirection)
Uses:
  Chord
Basic Idea:
  Receivers insert triggers (id, receiver_id) into DHT
  Senders send to id, meet at triggers, which send to receivers
Supports:
  Mobility: reinsert your trigger when you move
  Multicast & anycast: use longest-prefix match on ids to build tree
  Service composition: use stacks of triggers, which act like source routing in IP
Importance:
  Very low level, best-effort service built on DHT
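
Trigger-based rendezvous can be sketched with a single table standing in for the DHT. This is a simplification under assumptions: `ToyI3` is a hypothetical class, matching is exact rather than longest-prefix, and trigger stacks are not modeled.

```python
class ToyI3:
    def __init__(self):
        self.triggers = {}   # trigger id -> set of receiver addresses

    def insert(self, trig_id, addr):
        """A receiver inserts the trigger (id, addr)."""
        self.triggers.setdefault(trig_id, set()).add(addr)

    def remove(self, trig_id, addr):
        self.triggers.get(trig_id, set()).discard(addr)

    def send(self, trig_id, data):
        """A sender addresses data to an id; each matching trigger forwards it."""
        return [(addr, data) for addr in sorted(self.triggers.get(trig_id, ()))]
```

Mobility is a `remove` plus an `insert` with the new address; multicast is several receivers inserting triggers with the same id. In both cases the sender's call does not change.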

PIER [Huebsch, et al., VLDB03]
Target:
  In situ distributed querying (e.g., network monitoring)
Uses:
  CAN as DHT
Basic Idea:
  Tables named by (namespace, resourceID); e.g., (application, primary_key)
  Store tables in DHT keyed by this pair
  Look up tuples by routing to a table's key (or several tables' keys) and having the end nodes do an lscan for you
  Join NR and NS by creating a new namespace NQ in DHT and rehashing tuples to NQ, which determines matches
Importance:
  Another simple multicast on a DHT
  Building block for more complex applications
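
The rehash join can be sketched with a dict standing in for the temporary namespace NQ. This is an illustration, not PIER's operator: `rehash_join` is a hypothetical name, tuples are modeled as dicts, and each bucket plays the role of the node responsible for one join-key value.

```python
from collections import defaultdict

def rehash_join(r_tuples, s_tuples, r_attr, s_attr):
    """Equi-join R and S by rehashing both on the join attribute."""
    nq = defaultdict(lambda: {"R": [], "S": []})   # stands in for namespace NQ
    for t in r_tuples:
        nq[t[r_attr]]["R"].append(t)               # rehash R tuples into NQ
    for t in s_tuples:
        nq[t[s_attr]]["S"].append(t)               # rehash S tuples into NQ
    results = []
    for bucket in nq.values():
        # Matching tuples meet in the same bucket, so matching is local.
        for r in bucket["R"]:
            for s in bucket["S"]:
                results.append({**r, **s})
    return results
```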

PIER Performance
  [Figure not preserved]

Misc. Applications
POST [Mislove, et al., HotOS03]
  Collaborative Applications
Approximate Object Location [Zhou, et al., Middleware03]
  Collaborative Spam Filtering

POST
Target:
  Toolbox for collaborative apps (e.g., email, IM, etc.)
Uses:
  Pastry as DHT
Basic Idea:
  Use PAST as storage substrate
  Use Scribe as notification system
  Assume certificate authority for assigning user IDs, keys
  Example: Email
    Insert new mail into PAST (encrypted)
    Notify recipient using Scribe (delegate if not online)
Importance:
  Use second level systems as substrate for more complex applications (see also OceanStore: email, nfs, web cache)

Approximate Object Location
Target:
  Collaborative filtering (e.g., spam detection)
Uses:
  Tapestry as DOLR
Basic Idea:
  Calculate checksums of all strings of length L in message. Select N of them deterministically (“feature” vector)
  Two messages match if enough features match
  To mark spam, insert my node into Tapestry keyed by each feature
  To detect spam, look up its features. Will get back a set of nodes that marked each feature as spam (“votes”).
Importance:
  Scary, but looking more and more useful, e.g., recent DoS attacks on RBLs.
  They have a plug-in for Outlook that works
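
Feature extraction and voting can be sketched as follows. Several choices here are assumptions rather than the paper's: CRC32 as the checksum, "the N smallest checksums" as the deterministic selection rule, the values of L and N, and the match threshold. `SpamIndex` is a hypothetical class whose dict stands in for Tapestry.

```python
import zlib

def features(message, L=8, N=4):
    """Checksum every length-L substring and keep the N smallest checksums."""
    data = message.encode()
    sums = {zlib.crc32(data[i:i + L]) for i in range(max(len(data) - L + 1, 0))}
    return sorted(sums)[:N]

class SpamIndex:
    def __init__(self):
        self.index = {}   # feature -> set of node ids that marked it as spam

    def mark(self, node_id, message):
        """Publish this node under each feature of the message."""
        for f in features(message):
            self.index.setdefault(f, set()).add(node_id)

    def votes(self, message):
        """For each feature of the message, the nodes that marked it."""
        return [self.index.get(f, set()) for f in features(message)]

    def is_spam(self, message, threshold=3):
        """Match if at least `threshold` features have one or more votes."""
        return sum(1 for v in self.votes(message) if v) >= threshold
```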

API & Infrastructure Proposals
One Ring to Rule them All [Castro, et al., SIGOPS02]
  Bootstrapping multiple overlays
Common P2P API [Dabek, et al., IPTPS03]
  DHT/DOLR as a library
OpenHash [Karp, et al., IPTPS04*]
  DHT as a service
*submitted

One Ring to Rule them All
Goal:
  Bootstrap multiple overlays
Basic Idea:
  Everyone joins a “universal” Pastry ring
  This ring implements PAST, Scribe, and distributed search (see Harren, et al., IPTPS02)
  Advertise your overlay service in the search engine
  Store your code and certificates in PAST
  Upgrades disseminated through Scribe
Importance:
  How one might use an overlay to manage overlays
  Interesting title for a Microsoft paper :)

Common P2P API
Goal:
  Common API for structured overlays
Basic Idea:
  First, described a common layer that both DHT and DOLR could be implemented on
  Second, looked at applications developed so far
    See what abstractions can be derived
    Described what DHT “library” functions might be
Importance:
  How much has to be exposed to application developers?
  Any DHT App can be implemented on any DHT

Common P2P API Classification
  [Figure not preserved]

Common P2P API: API
  void route(key, msg, nodeHint)
  void forward(key, msg, nextHop)
  void deliver(key, msg)
  node[] localLookup(key, num, safeFlag)
  node[] neighborSet(num)
  node[] replicaSet(key, maxRank)
  void update(node, joinedFlag)
  bool range(node, rank, keyRange)
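
How a DHT might sit on top of route/deliver can be sketched in a single process. This is a simplification under assumptions: `ToyOverlay` and `DHTApp` are hypothetical classes, `route` drops the nodeHint argument and returns the upcall's result (the listed calls are void), and the root of a key is taken to be the numerically closest node id.

```python
class ToyOverlay:
    """Single-process stand-in for the key-based routing layer."""
    def __init__(self, node_ids):
        self.apps = {n: None for n in node_ids}   # node id -> application object

    def root(self, key):
        """Node responsible for key: the numerically closest id (assumption)."""
        return min(self.apps, key=lambda n: (abs(n - key), n))

    def route(self, key, msg):
        """Deliver msg at the key's root by invoking that node's deliver upcall."""
        return self.apps[self.root(key)].deliver(key, msg)

class DHTApp:
    """A DHT as an application of the routing layer: put/get become messages."""
    def __init__(self):
        self.table = {}

    def deliver(self, key, msg):
        op, value = msg
        if op == "put":
            self.table[key] = value
            return None
        return self.table.get(key)
```

A put is then `overlay.route(key, ("put", value))` and a get is `overlay.route(key, ("get", None))`.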

OpenHash
Goal:
  DHT as a single service multiple apps can use
Basic Idea:
  Some simple apps only require get/put. Support these “out of box”
  App operations can be classified as “endpoint” operators (at root/successor) or “hop-by-hop” operators (on path to root)
  Support endpoint operators
    App specific code lives on nodes “outside” the main DHT
    Route app specific requests only to nodes that have the app’s code
  Argue that hop-by-hop operators need not be supported
    Most functionality can be achieved another way
Importance:
  How one might deploy DHT as an active service
  Allow people other than academics to deploy these apps?

OpenHash ReDir Algorithm
  [Figure: rendezvous points for X; find successor for k]

Conclusion
DHT Apps not going away
  Are they still struggling to find a purpose?
  Would any of these apps be better off not on top of a DHT?
Using basic apps to build more complex ones:
  CFS, Ivy build on DHash
  POST, OneRing build on PAST, Scribe
  SplitStream builds on Scribe
Starting to notice that no one besides researchers is using DHTs
  3+ years of research...
  How to make them useful to real people?