
Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores


DESCRIPTION

These slides were presented at ACM/IFIP/USENIX Middleware 2013 for the paper "Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores". The abstract of the paper is shown below.

Abstract. Distributed key-value stores (KVSs) have become an important component for data management in cloud applications. Since resources can be provisioned on demand in the cloud, there is a need for efficient node bootstrapping and decommissioning, i.e. to incorporate or eliminate the provisioned resources as members of the KVS. It requires the data to be handed over and the load to be shifted across the nodes quickly. However, the data partitioning schemes in current shared-nothing KVSs are not efficient at quick bootstrapping. In this paper, we have designed a middleware layer that provides a decentralised scheme of auto-sharding with a two-phase bootstrapping. We experimentally demonstrate that our scheme reduces bootstrap time and improves load-balancing, thereby increasing scalability of the KVS.


Page 1: Efficient node bootstrapping for decentralised shared-nothing Key-Value Stores

Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores

Han Li, Srikumar Venugopal


School of Computer Science and Engineering

Agenda

• Motivations for Node Bootstrapping
• Research Gap
• Challenges and Solutions
• Evaluations
• Conclusion


On-demand Provisioning

The Capacity versus Utilisation Curve


Key-value Stores

• The standard component for cloud data management

• Increasing workload: node bootstrapping
  – Incorporate a new, empty node as a member of the KVS

• Decreasing workload: node decommissioning
  – Remove an existing member that holds redundant data from the KVS


Goals for Efficient Node Bootstrapping

• Minimise the overhead of data movement
  – How to partition/store data?

• Balance the load at node bootstrapping
  – Both data volume and workload
  – How to place/allocate data?

• Maintain data consistency and availability
  – How to execute data movement?


Background: Storage model

• Shared Storage
  – Access same storage
    • Distributed file systems
    • Network-attached storage
  – E.g. GFS, HDFS
  – Simply exchange metadata
    • Albatross, by S. Das, UCSB

• Shared Nothing
  – Use individual local storage
  – Decentralised, peer-to-peer
  – E.g. Dynamo, Cassandra, Voldemort, etc.
  – Require data movement
    • Lightweight solutions?


Background: Split-Move Approach

Partition at node bootstrapping


Background: Virtual-Node Approach

Partition at system startup

Data skew: e.g., the majority of data is stored in a minority of partitions. Moving around giant partitions is not a good idea.


Research Gap

• Shared Storage vs. Shared Nothing
  – Require data movement

• Centralised vs. Decentralised
  – Require coordination

• Split-Move vs. Virtual-node Based
  – Partition at node bootstrapping is heavyweight
  – Partition at system startup causes data skew

• The Gap: A scheme of data partitioning and placement that improves the efficiency of bootstrapping in shared-nothing KVS


Our Solution

• Virtual-node based movement
  – Each partition of data is stored in separate files
  – Reduced overhead of data movement
  – Many existing nodes can participate in bootstrapping

• Automatic sharding
  – Split and merge partitions at runtime
  – Each partition stores a bounded volume of data
    • Easy to reallocate data
    • Easy to balance the load


The timing for data partitioning

• Shard partitions at writes (insert and delete)
  – Split to keep Size(P_i) ≤ Θ_max
  – Merge to keep Size(P_i) + Size(P_{i+1}) ≥ Θ_min

• Θ_max ≥ 2Θ_min: avoid oscillation!
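The two size bounds can be read as invariants that are re-checked after writes. The sketch below is only an illustration under assumptions that are not on the slide: partitions are modelled as a plain list of sizes, a split cuts a partition into two halves, and the names `THETA_MIN`, `THETA_MAX`, and `reshard` are hypothetical.

```python
THETA_MIN = 64    # hypothetical lower bound (arbitrary size units)
THETA_MAX = 128   # hypothetical upper bound; must satisfy THETA_MAX >= 2 * THETA_MIN


def reshard(sizes):
    """Return partition sizes after enforcing both bounds.

    Assumption: a split halves a partition; a merge joins two neighbours.
    """
    assert THETA_MAX >= 2 * THETA_MIN, "bounds too close: splits and merges could oscillate"
    result = []
    for size in sizes:
        # Merge with the previous partition if their combined size is below THETA_MIN.
        if result and result[-1] + size < THETA_MIN:
            size += result.pop()
        # Split repeatedly until every piece is at most THETA_MAX.
        pending = [size]
        while pending:
            part = pending.pop()
            if part > THETA_MAX:
                half = part // 2
                pending.extend([part - half, half])
            else:
                result.append(part)
    return result
```

With these bounds, a freshly merged partition is smaller than `THETA_MIN`, which is at most half of `THETA_MAX`, so it is never split straight away; the two halves of a split together exceed `THETA_MAX`, so they are never merged straight back.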


Challenge 1: Sharding coordination

• Issues
  – Totally decentralised
  – Each partition has multiple replicas
  – Each replica is split or merged locally

• Question
  – How to guarantee that all the replicas of a given partition are sharded simultaneously?


Challenge 1: Sharding coordination

• Solution: Election-based coordination


Challenge 2: Node failover during sharding


Challenge 3: Data consistency during sharding

• Use two sets of replicas at sharding
  – Original partition and future partition
  – Data from different partitions is stored in separate files

• Approach 1
  – Write to future partition, roll back at failure
  – Read from both partitions

• Approach 2
  – Write to both partitions, abandon future partition at failure
  – Read from original partition
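Approach 2 can be sketched in a few lines. This is a simplified illustration, not the ElasCass implementation: each replica set is modelled as an in-memory dictionary, the future partition is assumed to start as a copy of the original, and the class and method names are hypothetical.

```python
class ShardingPartition:
    """Approach 2: write to both replica sets, read from the original."""

    def __init__(self, original):
        self.original = dict(original)  # replica set of the partition being sharded
        self.future = dict(original)    # assumption: future partition seeded with a copy
        self.sharding = True

    def write(self, key, value):
        self.original[key] = value
        if self.sharding:
            self.future[key] = value    # mirror the write while sharding is in progress

    def read(self, key):
        return self.original.get(key)   # reads never touch the future partition

    def abort(self):
        self.future = {}                # on failure, abandon the future partition
        self.sharding = False

    def commit(self):
        self.original = self.future     # on success, the future partition takes over
        self.sharding = False
```

Because reads are served only by the original partition, abandoning the future partition after a failure needs no rollback of visible state, at the cost of writing every update twice while sharding is in progress.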


Challenge 3: Data consistency during movement

• Use a pair of tokens for each partition
  – A Boolean token to approve and disapprove read/write
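One way to picture the token pair is as two flags checked before a replica serves a request. The slide does not say how the pair is divided, so the sketch below assumes one token for reads and one for writes; the names `PartitionTokens` and `allowed`, and the lifecycle shown at the end, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartitionTokens:
    """Hypothetical pair of Boolean tokens guarding one partition replica."""
    read_ok: bool = False
    write_ok: bool = False


def allowed(tokens: PartitionTokens, op: str) -> bool:
    """Return True if the replica may serve the given operation."""
    if op == "read":
        return tokens.read_ok
    if op == "write":
        return tokens.write_ok
    raise ValueError(f"unknown operation: {op}")


# Assumed lifecycle of a replica that is being moved to a new node:
incoming = PartitionTokens(read_ok=False, write_ok=True)  # still copying data
# ... bulk transfer finishes ...
incoming.read_ok = True  # replica is now complete, so reads are approved
```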


Replica Placement at Node Bootstrap

• Partition re-allocation and sharding are mutually exclusive

• Maintain data availability
  – Each partition has at least R replicas

• Balance the load (e.g., number of requests)
  – Heavily loaded nodes have higher priority to “move out” data

• Balance the data
  – Balance the number of partitions across nodes
    • Each partition, via sharding, is of similar size

• Two-phase bootstrap
  – Phase 1: guarantee R replicas, shift load from heavily loaded nodes
  – Phase 2: achieve load and data balancing in low-priority threads
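The two phases can be illustrated with a small placement planner. This is a sketch under stated assumptions rather than the scheme used in ElasCass: every partition already has R replicas, a move transfers one replica (so replica counts are preserved), load is a single request rate per node, and the function names and the `phase1_moves` parameter are hypothetical.

```python
def move_one(placement, donor, new_node):
    """Move one partition that new_node does not yet hold; return it, or None."""
    candidates = placement[donor] - placement[new_node]
    if not candidates:
        return None
    part = min(candidates)  # deterministic choice for the example
    placement[donor].remove(part)
    placement[new_node].add(part)
    return part


def bootstrap(placement, load, new_node, phase1_moves=1):
    """placement: node -> set of partition ids; load: node -> request rate."""
    placement[new_node] = set()
    others = [n for n in placement if n != new_node]
    moves = []
    # Phase 1: relieve the most heavily loaded nodes first.
    for donor in sorted(others, key=lambda n: load[n], reverse=True)[:phase1_moves]:
        part = move_one(placement, donor, new_node)
        if part is not None:
            moves.append((1, part, donor))
    # Phase 2: even out partition counts (the slide runs this in low-priority threads).
    target = sum(len(p) for p in placement.values()) // len(placement)
    while len(placement[new_node]) < target:
        donor = max(others, key=lambda n: len(placement[n]))
        part = move_one(placement, donor, new_node)
        if part is None:
            break
        moves.append((2, part, donor))
    return moves
```

Counting partitions is a reasonable proxy for data volume here only because sharding keeps every partition at a similar size.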


Evaluation Setup

• ElasCass: an implementation of auto-sharding, built on Apache Cassandra (version 1.0.5), which uses the Split-Move approach.

• Key-value stores: ElasCass vs. Cassandra (v1.0.5)
• Test bed: Amazon EC2, m1.large type, 2 CPU cores, 8 GB RAM
• Benchmark: YCSB
• System scale: Start from 1 node, with 100 GB of data, R=2. Scale up to 10 nodes.


Evaluation – Bootstrap Time

• In Split-Move, data volume transferred reduces by half from 3 nodes onwards.

• In ElasCass, data volume transferred remains below 10GB from 2 nodes.

• Bootstrap time is determined by the data volume transferred. ElasCass exhibits consistent performance at all scales.


Evaluation – Data Volume

• ElasCass uses two-phase bootstrap. More data is pulled in at phase 2.
• Imbalance Index = standard deviation / average. Data is well balanced in ElasCass.
• ElasCass occupies less storage space than the Split-Move approach.
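The imbalance index defined above is the coefficient of variation of a per-node quantity such as stored data volume. A minimal sketch follows; the slide does not say whether the population or the sample standard deviation is meant, so the population form (`statistics.pstdev`) is an assumption, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def imbalance_index(values):
    """Standard deviation divided by average; 0 means perfectly balanced."""
    avg = mean(values)
    if avg == 0:
        return 0.0  # no data anywhere: treat as balanced
    return pstdev(values) / avg
```

For example, two nodes holding 5 and 15 units give an index of 0.5, while nodes holding identical volumes give 0.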


Evaluation – Query Processing

• ElasCass is scalable, while Split-Move is not.

• Write throughput is higher than read throughput.

• ElasCass has better resource utilisation.

• ElasCass achieves balanced load.


Key Takeaways

• Using virtual nodes introduces less overhead in data movement, and reduces the bootstrap time to below 10 minutes.
  – Apache Cassandra v1.2 uses virtual nodes

• Consolidating the partitions into bounded ranges simplifies replica placement and facilitates load-balancing
  – MySQL and MongoDB have started to auto-shard partitions

• A balanced load leads to 80% resource utilisation and throughput that scales with the number of nodes.


Contributions and Acknowledgments

• We have designed and implemented a decentralised auto-sharding scheme that
  – consolidates each partition replica into a single transferable unit to provide efficient data movement;
  – automatically shards the partitions into bounded ranges to address data skew;
  – reduces the time to bootstrap nodes, and achieves a more balanced load and better query-processing performance.

• The authors would like to thank Smart Services CRC Pty Ltd for the grant of the Services Aggregation project that made this work possible.


Thank You!