Sharding and the Isis2 DHT
Did you understand how the Isis2 distributed hash table works?
DHT Basics
• Suppose we have a group containing 500 members, and decide to store data in shards of size 3.
A. This isn’t going to work: the shard size needs to divide evenly into the group size.
B. In Isis2 some shards can be a little too big, or too small. The value you specify is more of a target.
You do get to specify a minimum size for the group as a whole, below which Isis2 temporarily disables the DHT functionality.
DHT Basics
• With a DHT storing data
A. Both Put and Get operations have costs roughly proportional to the time to do a remote procedure call: one RTT to each participant, issued in parallel.
B. Like other DHTs, the Isis2 DHT has costs proportional to the log of the group size. This relates to needing to route requests in a binary search manner: halfway, then a quarter of the way, etc.
Isis2 offers a so-called “1-hop” DHT. No indirect routing occurs and none of these log(N) delays arise in this approach. Indirect routing is perceived as a problem with many other DHTs, like Chord or Pastry, but doesn’t apply to the Isis2 version.
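To make the cost difference concrete, here is a rough sketch comparing hop counts. The function names are hypothetical, and the log2 figure is only an estimate of binary-search style routing, not a measurement of Chord or Pastry.

```python
import math

def routed_dht_hops(group_size: int) -> int:
    # Rough estimate for binary-search style routing: each hop halves
    # the remaining distance, so about log2(N) hops are needed.
    return max(1, math.ceil(math.log2(group_size)))

def one_hop_dht_hops(group_size: int) -> int:
    # 1-hop DHT: the sender contacts the shard members directly,
    # so the hop count does not depend on the group size.
    return 1

for n in (8, 500, 100_000):
    print(n, routed_dht_hops(n), one_hop_dht_hops(n))
```

For the 500-member group used in these slides, the estimate is 9 routed hops versus a single direct hop.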
DHT Basics
• Within a group, shard membership
A. Counts off by rank: 0…NS-1, 0…NS-1, etc. (where NS is the number of shards: the group size divided by the target shard size).
B. The first shard will be on the left and includes members 0..S-1, where S is the shard size. The second shard will include members S..(2*S-1).
C. Membership is pretty random: you need to hash the address of the member onto a ring and then the shard is that member and the next S-1 along the edge.
We use this scheme because it has low cost when a failure occurs.
We ruled this approach out because “churn” after a failure is too expensive: large numbers of members might need to be reinitialized.
This is how Chord and Pastry work, but not the way that the Isis2 DHT works.
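The two rank-based options (A and B) can be sketched as follows. The function names are hypothetical, and integer division for NS is an assumption; the slides only say "the group size divided by the target shard size". The example uses the 500-member group with target shard size 3 from the first question.

```python
from typing import Dict

def num_shards(group_size: int, target_shard_size: int) -> int:
    # NS = group size divided by the target shard size
    # (integer division is an assumption of this sketch).
    return max(1, group_size // target_shard_size)

def shard_by_count_off(rank: int, ns: int) -> int:
    # Option A: ranks count off 0..NS-1, 0..NS-1, ...
    return rank % ns

def shard_by_block(rank: int, shard_size: int) -> int:
    # Option B: members 0..S-1 form shard 0, S..2S-1 form shard 1, ...
    return rank // shard_size

def count_off_shard_sizes(group_size: int, target_shard_size: int) -> Dict[int, int]:
    # Tally how many members land in each shard under option A.
    ns = num_shards(group_size, target_shard_size)
    sizes: Dict[int, int] = {}
    for rank in range(group_size):
        shard = shard_by_count_off(rank, ns)
        sizes[shard] = sizes.get(shard, 0) + 1
    return sizes

sizes = count_off_shard_sizes(500, 3)
print(len(sizes), min(sizes.values()), max(sizes.values()))  # prints: 166 3 4
```

With 500 members and a target of 3, counting off gives 166 shards, two of which hold 4 members. This illustrates why the shard size is a target rather than an exact value.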
When inserting an item…
The key is first hashed by computing the hashcode modulo the number of shards. This gives the desired shard number. Then…
A. A multicast is sent to just the shard members.
B. The value is multicast to the entire group and those members that are in the matching shard retain it. Others ignore the multicast.
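The hashing step can be sketched like this. CRC32 is only a deterministic stand-in for the key's hashcode, and the membership rule (rank modulo the number of shards) is an assumption of the sketch; all function names are hypothetical and are not part of the Isis2 API.

```python
import zlib
from typing import List

def key_hashcode(key: str) -> int:
    # Stand-in for the key's hashcode: CRC32 is used only because it
    # gives the same value on every run (assumption, not the Isis2 hash).
    return zlib.crc32(key.encode("utf-8"))

def shard_of_key(key: str, ns: int) -> int:
    # Desired shard number = hashcode modulo the number of shards.
    return key_hashcode(key) % ns

def shard_members(shard: int, group_size: int, ns: int) -> List[int]:
    # Assumed membership rule: a member belongs to shard (rank % ns).
    return [rank for rank in range(group_size) if rank % ns == shard]

ns = 166  # 500 members, target shard size 3
shard = shard_of_key("alice", ns)
print(shard, shard_members(shard, 500, ns))
```

Every member can run the same computation locally, which is what lets a sender address the right shard without asking anyone else.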
When inserting a list of items
The list of keys is first hashed by computing the hashcode item by item modulo the number of shards. This yields a list of shards.
A. Now each item is inserted in a separate parallel action, independently.
B. Now all items are inserted using a single multicast that goes to only the full set of shard members in the list.
By using a single multicast, we guarantee all-or-nothing behavior. But this is a special protocol that only reaches the subset of members that are in the target shards. Moreover, it is an exceptionally fast protocol, like a parallel unicast.
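A minimal sketch of how the destination set for such a batch could be computed. It only plans the multicast and does not send anything. The hash (CRC32), the membership rule (rank modulo the number of shards) and the function names are assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Tuple

def shard_of_key(key: str, ns: int) -> int:
    # CRC32 is a deterministic stand-in for the key's hashcode (assumption).
    return zlib.crc32(key.encode("utf-8")) % ns

def plan_batch_put(items: Dict[str, object], group_size: int, ns: int
                   ) -> Tuple[Dict[int, Dict[str, object]], List[int]]:
    # Hash item by item to get the list of shards touched by this batch.
    by_shard: Dict[int, Dict[str, object]] = {}
    for key, value in items.items():
        by_shard.setdefault(shard_of_key(key, ns), {})[key] = value
    # One multicast goes to the union of the members of those shards only.
    # Assumed membership rule: a member belongs to shard (rank % ns).
    targets = [rank for rank in range(group_size) if rank % ns in by_shard]
    return by_shard, targets

by_shard, targets = plan_batch_put({"alice": 1, "bob": 2, "carol": 3}, 500, 166)
print(sorted(by_shard), targets)
```

Members outside `targets` never see the message, which is the difference from multicasting to the whole group and letting non-members of the shard ignore it.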
Consistency
To obtain strong consistency guarantees from the Isis2 DHT
A. The insert and get should be done using the DHTOrderedPut and DHTOrderedGet methods.
B. An OrderedSend must be done targeting the entire group.
C. The Isis2 system doesn’t offer consistency for DHT operations.
DHT Fault Tolerance
• True or False. The Isis2 DHT automatically masks faults, retrieving data redundantly in DHTGet and deduplicating to return just one key-value pair for any given key.
DHT Fault Tolerance
• True. If DHTGet or DHTOrderedGet lacks a response from some participant, the Isis2 DHT automatically retries.
An exception is thrown if an entire shard fails, since that would prevent the system from getting even a single response for keys mapped to that shard.
When using DHTOrderedGet, the entire DHTOrderedGet will be reissued if necessary, to ensure that the responses are collected along a consistent cut.
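The masking and deduplication behavior can be modeled with a small simulation. This is not the Isis2 implementation: the exception type, the function name and the representation of a failed member as `None` are all hypothetical, and the retry step is left out.

```python
from typing import Dict, List, Optional, Tuple

class ShardFailedError(Exception):
    """Hypothetical error: no member of the target shard responded."""

def simulated_dht_get(key: str,
                      shard_replicas: List[Optional[Dict[str, object]]]
                      ) -> Optional[Tuple[str, object]]:
    # Each entry is one shard member's local store, or None if that
    # member has failed and therefore sends no reply.
    live = [store for store in shard_replicas if store is not None]
    if not live:
        # An entire shard failing means not even one response is possible.
        raise ShardFailedError(f"no live member holds shard for key {key!r}")
    # Redundant replies are collapsed to a single key-value pair.
    replies = [store[key] for store in live if key in store]
    return (key, replies[0]) if replies else None

print(simulated_dht_get("k", [{"k": 1}, None, {"k": 1}]))  # prints: ('k', 1)
```

One failed member is masked by the surviving replicas; only when every entry is `None` does the caller see an exception.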