Sharding and the Isis2 DHT

Did you understand how the Isis2 distributed hash table works?

DHT Basics

• Suppose we have a group containing 500 members, and decide to store data in shards of size 3.
  A. This isn’t going to work: the shard size needs to divide evenly into the group size.
  B. In Isis2 some shards can be a little too big, or too small. The value you specify is more of a target.

You do get to specify a minimum size for the group as a whole, below which Isis2 temporarily disables the DHT functionality.
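
As a rough illustration of the arithmetic, the sketch below deals 500 members into shards with a target size of 3. The round-robin assignment by rank, the function name `shard_sizes`, and the `min_group_size` parameter are assumptions made for this example; they are not taken from the Isis2 API.

```python
from collections import Counter


def shard_sizes(group_size, target_shard_size, min_group_size=0):
    """Return a list of shard sizes, or None when the group is too small.

    Assumption: members are dealt into shards round-robin by rank. The
    names and the round-robin rule are illustrative, not the Isis2 API.
    """
    if group_size < min_group_size:
        return None  # models "DHT functionality temporarily disabled"
    num_shards = max(1, group_size // target_shard_size)
    counts = Counter(rank % num_shards for rank in range(group_size))
    return [counts[s] for s in range(num_shards)]


sizes = shard_sizes(500, 3)
print(len(sizes), Counter(sizes))
```

Under these assumptions, 500 members with a target of 3 give 166 shards: 164 of them hold 3 members and 2 hold 4, so the target is met only approximately.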

DHT Basics

• With a DHT storing data
  A. Both Put and Get operations have costs roughly proportional to the time to do a remote procedure call: one RTT to each participant, issued in parallel.
  B. Like other DHTs, the Isis2 DHT has costs proportional to the log of the group size. This relates to needing to route requests in a binary search manner: halfway, then a quarter of the way, etc.

Isis2 offers a so-called “1-hop” DHT. No indirect routing occurs, and none of these log(N) delays arise in this approach. Indirect routing is perceived as a problem with many other DHTs, like Chord or Pastry, but doesn’t apply to the Isis2 version.
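
To make the difference concrete, here is a small cost model, not a measurement. It assumes a routed DHT needs about log2(N) sequential hops in the worst case, while a 1-hop DHT pays about one round trip because the requests go out in parallel. The function names and the 1 ms RTT are made up for the example.

```python
import math


def routed_hops(group_size):
    """Worst-case hops when each hop halves the remaining distance."""
    return max(1, math.ceil(math.log2(group_size)))


def one_hop_cost(rtt_ms):
    """Parallel RPCs to all shard members finish in about one RTT."""
    return rtt_ms


n, rtt_ms = 500, 1.0  # hypothetical group size and round-trip time
print("routed:", routed_hops(n) * rtt_ms, "ms; 1-hop:", one_hop_cost(rtt_ms), "ms")
```

For a group of 500 the routed model gives 9 hops, against a single round trip for the 1-hop case.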

DHT Basics

• Within a group, shard membership
  A. Counts off by rank: 0… NS-1, 0… NS-1, etc. (where NS is the number of shards: the group size divided by the target shard size).
  B. The first shard will be on the left and includes members 0..S-1, where S is the shard size. The second shard will include members S..(2*S-1).
  C. Membership is pretty random: you need to hash the address of the member onto a ring, and then the shard is that member and the next S-1 along the edge.

We use this scheme because it has low cost when a failure occurs.

We ruled this approach out because “churn” after a failure is too expensive: large numbers of members might need to be reinitialized.

This is how Chord and Pastry work, but not the way that the Isis2 DHT works.
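
The three candidate layouts in options A to C can be sketched side by side. This is only an illustration of what each option describes: the function names, the member addresses, and the use of SHA-1 for the ring are assumptions, and the sketch makes no claim about how Isis2, Chord, or Pastry implement any of them.

```python
import hashlib


def shard_by_count_off(rank, num_shards):
    """Option A: ranks count off 0..NS-1, 0..NS-1, ..."""
    return rank % num_shards


def shard_by_block(rank, shard_size):
    """Option B: ranks 0..S-1 form shard 0, S..2S-1 form shard 1, ..."""
    return rank // shard_size


def ring_position(address):
    """Option C helper: hash a member address to a point on a ring."""
    digest = hashlib.sha1(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_on_ring(address, members, shard_size):
    """Option C: the member plus the next S-1 members along the ring."""
    ring = sorted(members, key=ring_position)
    start = ring.index(address)
    return [ring[(start + i) % len(ring)] for i in range(shard_size)]


members = [f"10.0.0.{i}:9000" for i in range(6)]  # hypothetical addresses
print([shard_by_count_off(r, 2) for r in range(6)])  # [0, 1, 0, 1, 0, 1]
print([shard_by_block(r, 3) for r in range(6)])      # [0, 0, 0, 1, 1, 1]
print(shard_on_ring(members[0], members, 3))
```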

When inserting an item…

The key is first hashed by computing the hashcode modulo the number of shards. This gives the desired shard number. Then…
A. A multicast is sent to just the shard members.
B. The value is multicast to the entire group and those members that are in the matching shard retain it. Others ignore the multicast.
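
The hashing step can be sketched as follows. The slide does not say which hash function Isis2 uses, so CRC32 stands in for "the hashcode" here, and `shard_for_key` and `put_targets` are hypothetical names. The sketch models option A, where only the members of the chosen shard are addressed.

```python
import zlib


def shard_for_key(key, num_shards):
    """Hash the key and reduce it modulo the number of shards."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def put_targets(key, shard_members):
    """Return the members that would receive the Put for this key.

    shard_members: list where entry i holds the members of shard i.
    """
    return shard_members[shard_for_key(key, len(shard_members))]


shard_members = [["m0", "m2", "m4"], ["m1", "m3", "m5"]]  # made-up layout
print(put_targets("alice", shard_members))
```

Because every member can compute the same shard number from the key alone, no lookup or forwarding step is needed before sending.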

When inserting a list of items

The list of keys is first hashed by computing the hashcode item by item modulo the number of shards. This yields a list of shards.
A. Now each item is inserted in a separate parallel action, independently.
B. Now all items are inserted using a single multicast that goes to only the full set of shard members in the list.

By using a single multicast, we guarantee all-or-nothing behavior. But this is a special protocol that only reaches the subset of members that are in the target shards. Moreover, it is an exceptionally fast protocol, like a parallel unicast.
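
A minimal sketch of the planning step for a list insert: hash each key to a shard, then take the union of the members of the shards that were hit as the destination set for one multicast. The names `plan_bulk_put` and `shard_for_key`, and the use of CRC32 as the hash, are assumptions for illustration; the multicast itself and its all-or-nothing delivery are not modeled.

```python
import zlib
from collections import defaultdict


def shard_for_key(key, num_shards):
    """Hash the key and reduce it modulo the number of shards."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def plan_bulk_put(items, shard_members):
    """Group items by shard and compute one destination set for all of them."""
    by_shard = defaultdict(dict)
    for key, value in items.items():
        by_shard[shard_for_key(key, len(shard_members))][key] = value
    # Only members of shards that actually received a key are addressed.
    destinations = sorted({m for s in by_shard for m in shard_members[s]})
    return dict(by_shard), destinations


shard_members = [["m0", "m3"], ["m1", "m4"], ["m2", "m5"]]  # made-up layout
by_shard, destinations = plan_bulk_put({"a": 1, "b": 2, "c": 3}, shard_members)
print(by_shard, destinations)
```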

Consistency

To obtain strong consistency guarantees from the Isis2 DHT
A. The insert and get should be done using the DHTOrderedPut and DHTOrderedGet methods.
B. An OrderedSend must be done targeting the entire group.
C. The Isis2 system doesn’t offer consistency for DHT operations.

DHT Fault Tolerance

• True or False. The Isis2 DHT automatically masks faults, retrieving data redundantly in DHTGet and deduplicating to return just one key-value pair for any given key.

• True. If DHTGet or DHTOrderedGet lacks a response from some participant, the Isis2 DHT automatically retries.

An exception is thrown if an entire shard fails, since that would prevent the system from getting even a single response for keys mapped to that shard.

When using OrderedGet, the entire OrderedGet will be reissued if necessary, to ensure that the responses are collected along a consistent cut.
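
The retry and deduplication behavior described above can be approximated with a toy simulation. Everything here is hypothetical: `dht_get`, `ShardFailedError`, the `max_rounds` limit, and the use of `TimeoutError` to stand for a missing response are illustrative choices, not the Isis2 API. The sketch also does not model the consistent-cut reissue of OrderedGet.

```python
class ShardFailedError(Exception):
    """Raised when no member of the target shard answers."""


def dht_get(key, replicas, max_rounds=3):
    """Query every replica, retry the silent ones, return one value.

    replicas: dict mapping member name -> callable(key) that returns a
    value or raises TimeoutError. All names here are hypothetical.
    """
    answers = {}
    pending = set(replicas)
    for _ in range(max_rounds):
        for member in sorted(pending):
            try:
                answers[member] = replicas[member](key)
            except TimeoutError:
                continue  # leave the member in `pending` for a retry
        pending -= set(answers)
        if not pending:
            break
    if not answers:
        raise ShardFailedError(f"no replica answered for {key!r}")
    # Deduplicate: replicas hold copies, so report a single pair.
    return key, next(iter(answers.values()))


def alive(key):
    return 42


def dead(key):
    raise TimeoutError


print(dht_get("k", {"m0": alive, "m1": dead}))  # ('k', 42)
```

If every replica in the shard is silent, as in `dht_get("k", {"m0": dead})`, the sketch raises `ShardFailedError`, mirroring the exception thrown when an entire shard fails.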