viet-trung-tran
BlobSeer: Architecture
- Clients: perform fine-grained accesses to BLOBs
- Providers: store the pages of a BLOB
- Provider manager: monitors the providers; favours data load balancing
- Metadata providers: store information about page locations
- Version manager: ensures concurrency control
BlobSeer: What may be refined
- Hotspots / fault tolerance: a fixed, single version manager; a fixed provider manager; fixed metadata providers
- Load balancing: the version manager and the provider manager may become hotspots
BlobSeer: What I am thinking of
Background: a lightweight DHT (terminology may not be exact)
- Uses consistent hashing to distribute keys, giving load balancing, fault tolerance, and elasticity
- Lookup cost: O(1)
  - Based on a gossip overlay (borrowed from the NoSQL world), or on the Kelips P2P prototype (which I have only just learned about)
  - Given a key, a node knows the destination exactly in most cases
  - Overhead is acceptable; see the NoSQL world (Facebook Cassandra, Amazon Dynamo, Voldemort)
- I will try to solve the problems above by building BlobSeer on top of this DHT
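The consistent-hashing idea above can be sketched as follows. This is a minimal illustrative ring, not BlobSeer or Cassandra code; the class name, the MD5 hash, and the virtual-node count are all assumptions made for the sketch.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring: keys and nodes share one hash space,
    and a key belongs to the first node clockwise from its hash."""

    def __init__(self, nodes=(), vnodes=8):
        self.vnodes = vnodes   # virtual nodes per server smooth the load
        self.ring = []         # sorted list of (hash, node) points
        for n in nodes:
            self.add(n)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # each server appears vnodes times on the ring
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def lookup(self, key):
        # first ring point at or after the key's hash, wrapping around
        idx = bisect_right(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Adding or removing a server only moves the keys adjacent to its ring points, which is where the elasticity and load-balancing properties on this slide come from.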
Distributed version managers
- A two-level splitting of the namespace:
  - Splitting the BLOB_ID namespace, DHT-based: fortunately, BLOBs are independent of each other, so hash(BLOB_ID) => ID of the responsible version-manager server
  - Splitting the version-ID space per BLOB is easy; rely on DHT replication: hash(BLOB_ID) => {neighbouring version managers}
- Lookup cost = O(1), equal to BlobSeer's
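The hash(BLOB_ID) => {neighbouring version managers} mapping might look like this. It is a sketch: the sorted-list "ring", SHA-1, and the replica count are assumptions standing in for the real DHT.

```python
import hashlib

def version_managers_for(blob_id, managers, replicas=3):
    """hash(BLOB_ID) picks a primary version manager; its successors on a
    (fixed, sorted) toy ring stand in for the DHT's neighbouring replicas."""
    ring = sorted(managers)            # toy stand-in for the DHT ring order
    h = int(hashlib.sha1(blob_id.encode()).hexdigest(), 16)
    start = h % len(ring)              # primary: hash(BLOB_ID) -> server ID
    return [ring[(start + i) % len(ring)] for i in range(replicas)]
```

Because BLOBs are independent, each blob's version history can live entirely on its own small replica group, with one O(1) hash to find it.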
- Concurrent writes/appends must be serialized on the master: Blob.getlatest(), Blob.write(), Blob.append()
- Access to historical versions goes randomly to {master, slaves}: Blob.read(), Blob.getsize(); ask the master only when necessary
- For serialization, the master periodically PUTs versions, or the slaves PULL them; version info is quite small
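A toy model of this master/slave split is below. The class and method names are assumptions for illustration; the real BlobSeer API differs.

```python
class VersionMaster:
    """Serializes concurrent writes/appends by being the only version assigner."""
    def __init__(self):
        self.versions = []             # [(version, data_ref)] in write order

    def write(self, data_ref):
        v = len(self.versions) + 1     # serialization point: a single counter
        self.versions.append((v, data_ref))
        return v

    def get_latest(self):
        return self.versions[-1][0] if self.versions else 0


class VersionSlave:
    """Serves reads of historical versions from a periodically pulled copy."""
    def __init__(self, master):
        self.master = master
        self.versions = {}

    def pull(self):
        # version info is tiny, so copying the whole log is cheap
        self.versions = dict(self.master.versions)

    def read(self, v):
        return self.versions.get(v)
```

Only the master mutates the version log, so writes stay totally ordered; slaves can lag slightly, which is acceptable for reads of already-published history.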
Eliminating the provider manager
- The provider manager keeps cluster state to answer clients' requests; lookup costs O(1), but it becomes a hotspot as the number of clients and providers grows
- Providers can instead learn the system state themselves (but what about load and load balancing?); lookup still costs O(1)
  - Use the DHT overlay presented above to propagate the providers' states
  - Gossip-based (limited to cluster sizes around 1000, but that is still good), or a lightweight P2P overlay (e.g. Kelips)
- A client randomly asks any provider
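One push-style gossip round over provider states could be sketched as follows. The fanout, the (timestamp, load) state format, and the seeded randomness are assumptions for the sketch.

```python
import random

def gossip_round(views, fanout=2, rng=None):
    """Each provider pushes its view to `fanout` random peers; receivers merge,
    keeping the freshest (highest-timestamp) entry per provider."""
    rng = rng or random.Random(0)          # seeded for a reproducible sketch
    nodes = list(views)
    merged = {n: dict(views[n]) for n in nodes}
    for sender in nodes:
        peers = [p for p in nodes if p != sender]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            for prov, (ts, load) in views[sender].items():
                if prov not in merged[peer] or merged[peer][prov][0] < ts:
                    merged[peer][prov] = (ts, load)
    return merged
```

With a constant fanout, state spreads in O(log N) rounds, which is why gossip scales to the ~1000-node clusters mentioned above without any central provider manager.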
However!
We will now want to use consistent hashing.
Architecture
- Version managers, metadata managers, providers, clients
- DHT with consistent hashing
- Distributed membership management: gossip-based
- ZooKeeper (similar to Google's Chubby): replication, fault tolerance, leader election
Access scenarios
- Reading: hash the BLOB ID to find its version manager, go down the metadata tree, then access the providers; each step is O(1), equal to the current BlobSeer design
- Writing: the same as in BlobSeer, but with no provider manager
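The three read steps can be traced in a sketch where plain dicts stand in for the DHT nodes, the metadata tree, and the providers; all names and data shapes here are assumptions.

```python
import hashlib

def read_path(blob_id, version, version_managers, metadata_tree, providers):
    """Proposed read path: hash(BLOB_ID) -> version manager -> metadata -> pages.
    Each step is one lookup, matching the O(1)-per-step claim."""
    # Step 1: hash the BLOB ID to find its version manager
    h = int(hashlib.sha1(blob_id.encode()).hexdigest(), 16)
    vm = version_managers[h % len(version_managers)]
    root = vm[(blob_id, version)]            # metadata root of this version
    # Step 2: go down the metadata tree to the page descriptors
    page_refs = metadata_tree[root]          # flattened to one level here
    # Step 3: fetch each page from the provider that stores it
    return b"".join(providers[prov][key] for prov, key in page_refs)
```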
Overview of the implementation
- Gossip-based DHT
- We need three hash namespaces: version managers, metadata providers, providers
- Elasticity: inherent if we use consistent hashing for the DHT
- Fault tolerance: DHT-based
- Load balancing: DHT-based
Advantages
- Keeps the current nice features of BlobSeer
- Monolithic design: every node provides all capabilities, acting as a client, a version manager, a metadata manager, and a provider; simpler and easier configuration and deployment (an autonomic feature?)
- Load balancing
- Fault tolerance
- Elasticity
- Compared to NoSQL key/value stores: efficient at mapping one key to a value of TB size (versioning, throughput)
Some more discussions
- If a client is outside the BlobSeer storage cloud, it randomly chooses one node to communicate with; that node acts as a proxy server (as in Cassandra)
- We may need only a small number of version managers and metadata managers, chosen by leader election (which can be based on Apache ZooKeeper); if we fix them, we reduce overhead at the DHT level
BlobSeer in the NoSQL paradigm
- Document stores
- Column stores
{pages} distribution
- BlobSeer's approach: distribute {pages} over different providers; {pages} are mapped directly to the providers' physical addresses
- DHT's approach: the DHT is used only to find who has the {pages}, not to route the {pages} themselves
  - We must find a good way: should the {pages} of a single write be distributed over different providers? [YES or NO] Hopefully yes, since page keys are picked by the client in BlobSeer
  - DHT load balancing, DHT fault tolerance, lookup cost: O(1)
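Because the client picks the page keys, it can choose them so that the pages of one write land on different providers. A sketch of one such scheme; the key format and the round-robin placement are assumptions, not BlobSeer's actual policy.

```python
import hashlib

def place_pages(blob_id, version, n_pages, providers):
    """Client-side placement: hash the write's identity once, then assign the
    write's pages round-robin so they spread over different providers."""
    h = int(hashlib.sha1(f"{blob_id}:{version}".encode()).hexdigest(), 16)
    placement = {}
    for i in range(n_pages):
        page_key = f"{blob_id}/{version}/{i}"   # keys picked by the client
        placement[page_key] = providers[(h + i) % len(providers)]
    return placement
```

Spreading a single write this way preserves BlobSeer's write parallelism while still letting the DHT answer "who has this page" in O(1).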
Eliminating the provider manager (recap)
- As above: providers learn the system state themselves by propagating their states over the DHT overlay, gossip-based (limited to cluster sizes around 1000, but still good) or a lightweight P2P overlay (e.g. Kelips); lookup costs O(1), avoiding the hotspot that a central provider manager becomes as clients and providers grow
- Open question: we need a good way to distribute the {pages} of each separate write operation over the DHT