
Page 1: Zing Database

Zing Database – Distributed Key-Value Database

Nguyễn Quang Nam
Zing Web-Technical Team

Page 2: Zing Database

Content

1. Introduction
2. Why
3. Overview architecture
4. Single Server/Storage
5. Distribution

Page 3: Zing Database

Introduction

Page 4: Zing Database

Some statistics:

- Feeds: 1.6 B, 700 GB on disk across 4 DB instances, 8 caching servers, 136 GB of memory cache in use.

- User Profiles: 44.5 M registered accounts, 2 database instances, 30 GB of memory cache.

- Comments: 350 M, 50 GB on disk across 2 DB instances, 20 GB of memory cache.

Page 5: Zing Database

Why

Page 6: Zing Database

Access time

L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns
Mutex lock/unlock: 100 ns
Main memory reference: 100 ns
Compress 1K bytes with Zippy: 10,000 ns
Send 2K bytes over 1 Gbps network: 20,000 ns
Read 1 MB sequentially from memory: 250,000 ns
Round trip within same datacenter: 500,000 ns
Disk seek: 10,000,000 ns
Read 1 MB sequentially from network: 10,000,000 ns
Read 1 MB sequentially from disk: 30,000,000 ns
Send packet CA -> Netherlands -> CA: 150,000,000 ns

by Jeff Dean (http://labs.google.com/people/jeff)

Page 7: Zing Database

Standards & Real Requirements

- Time to load a page: < 200 ms
- Read rate: ~12K ops/sec
- Write rate: ~8K ops/sec
- Caching service/database recovery time: < 5 mins

Page 8: Zing Database

Existing solutions

- RDBMS (MySQL, MSSQL): writes are too slow; reads are acceptable on a small DB but very poor on a huge one

- Cassandra (by Facebook): difficult to operate and maintain, and performance is not good enough

- HBase/Hadoop: we use this for the log system

- MongoDB, Membase, Tokyo Tyrant, ...: fine, we use these in several cases, but not suitable for all

Page 9: Zing Database

Overview architecture

Page 10: Zing Database

[Architecture diagram] API requests enter over TCP and pass through three layers inside a storage server:

- Transport layer: ZNonblockingServer receives and decodes requests.
- Model (Business) layer: loads configuration, creates & manages the backend storages, implements business rules.
- Storage layer:
  - Memory storage: LRU ICache (RW)
  - Persistent storage: the local database on disk, i.e. Commitlog Storage (W) and ZiDB Storage (RW)
  - Remote storage (RW): a remote system

Page 11: Zing Database

Server/Storage

Page 12: Zing Database

ZNonblockingServer

- Based on TNonblockingServer (Apache Thrift); a baseline setup is sketched below
- 185K reqs/sec (the original TNonblockingServer reaches just 45K reqs/sec)
- Serializes/deserializes data
- Prevents server overload
- Data is not secured while transferring
- Protects the service from invalid requests
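For context, a minimal setup of the stock Thrift C++ TNonblockingServer that ZNonblockingServer builds on might look like the sketch below, assuming a Thrift 0.8/0.9-era API (boost::shared_ptr) and a hypothetical Thrift-generated ZiDBService; the real service IDL is not shown in the slides.

    #include <boost/shared_ptr.hpp>
    #include <thrift/server/TNonblockingServer.h>
    #include "ZiDBService.h"  // hypothetical Thrift-generated service header

    using boost::shared_ptr;
    using namespace apache::thrift;
    using namespace apache::thrift::server;

    // ZiDBServiceHandler (user-written, implementing the generated
    // ZiDBServiceIf with the get/set RPCs) is assumed to exist.
    int main() {
      shared_ptr<ZiDBServiceHandler> handler(new ZiDBServiceHandler());
      shared_ptr<TProcessor> processor(new ZiDBServiceProcessor(handler));

      // libevent-driven, non-blocking I/O, framed transport; this is the
      // ~45K reqs/sec baseline that ZNonblockingServer improves on with
      // overload protection and request validation layered on top.
      TNonblockingServer server(processor, 9090);
      server.serve();
      return 0;
    }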

Page 13: Zing Database

ICache

- Least Recently Used / time-based expiration strategy
- zlru_table<key_type, value_type>: a hash-table data structure
- Custom malloc/free implementations replace the standard glibc ones to reduce memory fragmentation
- Supports dirty-item marking => enables lazy DB flush (see the sketch below)
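A minimal sketch of the idea behind zlru_table, using std::list plus std::unordered_map; the real implementation replaces glibc malloc/free with custom allocators and adds time-based expiration, both omitted here.

    #include <cstddef>
    #include <list>
    #include <unordered_map>

    // Minimal LRU table with dirty-item marking, in the spirit of
    // zlru_table<key_type, value_type>.
    template <typename K, typename V>
    class LruTable {
      struct Entry { K key; V value; bool dirty; };
      std::list<Entry> lru_;  // front = most recently used
      std::unordered_map<K, typename std::list<Entry>::iterator> index_;
      size_t capacity_;

     public:
      explicit LruTable(size_t capacity) : capacity_(capacity) {}

      // Insert or update; writes mark the entry dirty for a later lazy flush.
      void set(const K& key, const V& value) {
        auto it = index_.find(key);
        if (it != index_.end()) lru_.erase(it->second);
        lru_.push_front(Entry{key, value, /*dirty=*/true});
        index_[key] = lru_.begin();
        if (lru_.size() > capacity_) evict();
      }

      // Lookup moves the entry to the front (most recently used).
      V* get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        lru_.splice(lru_.begin(), lru_, it->second);
        return &it->second->value;
      }

     private:
      void evict() {
        Entry& victim = lru_.back();
        // A dirty victim must be flushed to ZiDB before it is dropped.
        if (victim.dirty) { /* flush_to_db(victim.key, victim.value); */ }
        index_.erase(victim.key);
        lru_.pop_back();
      }
    };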

Page 14: Zing Database

ZiDB

- Separated into a DataFile & an IndexFile
- 1 seek for a read, 1-2 seeks for a write
- The IndexFile (a hash structure) is loaded into memory as a mapped file (shared memory) to reduce system calls (see the sketch below)
- Write-ahead log to avoid data loss
- Data magic-padding
- Checksum & checkpoint for data repair
- Partitioned DB for easier maintenance
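A sketch of how the one-seek read path can work when the IndexFile is mmap'ed: the index probe touches only memory, so the single disk seek is the pread() on the DataFile. The slot layout and names below are illustrative assumptions, not ZiDB's actual on-disk format.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    // Illustrative index slot: key hash -> record location in the DataFile.
    struct IndexSlot {
      uint64_t key_hash;  // 0 = empty slot
      uint64_t offset;    // record offset in the DataFile
      uint32_t length;    // record length in bytes
    };

    // mmap the whole IndexFile once; later lookups read it as plain
    // memory, so index probes cost no system calls.
    IndexSlot* map_index(int index_fd, size_t& slot_count) {
      struct stat st;
      fstat(index_fd, &st);
      slot_count = st.st_size / sizeof(IndexSlot);
      void* p = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, index_fd, 0);
      return static_cast<IndexSlot*>(p);
    }

    // One disk seek per read: probe the in-memory index (linear probing),
    // then a single pread() on the DataFile.
    bool read_record(IndexSlot* index, size_t slot_count, int data_fd,
                     uint64_t key_hash, std::vector<char>& out) {
      for (size_t i = key_hash % slot_count, n = 0; n < slot_count;
           i = (i + 1) % slot_count, ++n) {
        if (index[i].key_hash == 0) return false;  // miss
        if (index[i].key_hash == key_hash) {
          out.resize(index[i].length);
          return pread(data_fd, out.data(), index[i].length,
                       index[i].offset) == (ssize_t)index[i].length;
        }
      }
      return false;
    }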

Page 15: Zing Database

Distribution

Page 16: Zing Database

Key requirements:
- Scalability
- Load balancing
- Availability
- Consistency

Page 17: Zing Database

2 models:
- Centralized: 1 addressing server & multiple storage servers => bottleneck & single point of failure
- Peer-to-peer: each server includes an addressing module & storage

2 types of routing:
- Client routing: each client does the addressing itself and queries the data directly
- Server routing: the addressing is done at the server

Page 18: Zing Database

Operation Flows

[Diagram] The Business Logic Server first consults the Addressing Server (a DHT), then talks to the storage layer, which consists of Storage Nodes 1..N, each containing an ICache, ZiDB, and a Storage Module:

(1) Request key locations
(2) Key locations returned
(3) Get & Set operations on the owning storage node
(4) Operation returns

* In the peer-to-peer model, the addressing module is moved into each storage node.

Page 19: Zing Database

Addressing:

- Provides key locations of resources
- Basically a Distributed Hash Table, using consistent hashing (a minimal lookup is sketched below)
- Hashing: Jenkins, Murmur, or any algorithm that satisfies two conditions:
  - Uniform distribution of generated keys in the key space
  - Consistency
  (MD5 and SHA are bad choices for performance reasons)
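A minimal ring lookup under those two conditions; the mix function below is a stand-in (splitmix64-style finalizer), not the team's Jenkins/Murmur choice.

    #include <cstdint>
    #include <map>
    #include <string>

    // Stand-in 64-bit mixer; production code would use Jenkins or Murmur
    // as the slide suggests.
    uint64_t hash64(uint64_t x) {
      x += 0x9e3779b97f4a7c15ULL;
      x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
      x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
      return x ^ (x >> 31);
    }

    // Ring position -> server id. Each server owns the continuous range
    // of IDs between its predecessor's position and its own.
    using Ring = std::map<uint64_t, std::string>;

    // Consistent lookup: first node clockwise from the key's position,
    // wrapping around to the start of the ring.
    const std::string& locate(const Ring& ring, uint64_t key) {
      auto it = ring.lower_bound(hash64(key));
      if (it == ring.end()) it = ring.begin();
      return it->second;
    }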

Page 20: Zing Database

Addressing - Node location:

Each node is assigned a continuous range of IDs in the hashed key space.

Page 21: Zing Database

Addressing - Node location: Golden ratio principle (a/b = (a+b)/a ≈ 1.618)

- Initial ratio = 1.618
- Max ratio ~ 2.6
- Easy to implement
- Easy for routing from the client (see the sketch below)

[Diagram: nodes 1-5 placed on the ring]
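A sketch of the principle as described here: when a node joins, split the widest owned range at its golden-ratio point, so the ratio between any two range lengths stays bounded (max ~2.6 ≈ φ²). The helper below is an illustration under that reading, not the production routine.

    #include <cstdint>
    #include <map>
    #include <string>

    // Ring position -> server id (same shape as the lookup sketch above).
    using Ring = std::map<uint64_t, std::string>;

    // Place a new node by splitting the widest range on the ring at its
    // golden-ratio point (the two pieces get a length ratio of ~1.618).
    void add_node_golden(Ring& ring, const std::string& node_id) {
      const double kInvPhi = 0.6180339887;  // 1/phi
      if (ring.empty()) { ring[0] = node_id; return; }
      if (ring.size() == 1) {  // one owner: split the full circle
        uint64_t p = ring.begin()->first;
        ring[p + (uint64_t)((double)UINT64_MAX * kInvPhi)] = node_id;
        return;
      }

      // Find the widest range (prev, pos] on the ring; unsigned
      // subtraction handles the wraparound across zero.
      uint64_t best_start = 0, best_len = 0;
      uint64_t prev = ring.rbegin()->first;
      for (const auto& kv : ring) {
        uint64_t len = kv.first - prev;
        if (len > best_len) { best_len = len; best_start = prev; }
        prev = kv.first;
      }

      // New boundary at the golden-ratio point of the widest range.
      ring[best_start + (uint64_t)(best_len * kInvPhi)] = node_id;
    }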

Page 22: Zing Database

Addressing - Node location: Virtual nodes

[Diagram: Server 1 owns virtual nodes 1, 2, 3; Server 2 owns 4, 5, 6, 7; Server 3 owns 8, 9]

- Each real server has multiple virtual nodes on the ring (see the sketch below)
- More virtual nodes => better load balance
- Harder to maintain the table of nodes
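Extending the ring sketch above with virtual nodes; the replica count of 128 per real server is an arbitrary illustrative choice, and FNV-1a again stands in for Jenkins/Murmur.

    #include <cstdint>
    #include <map>
    #include <string>

    using Ring = std::map<uint64_t, std::string>;

    // FNV-1a 64-bit string hash (stand-in for Jenkins/Murmur).
    uint64_t hash_str(const std::string& s) {
      uint64_t h = 0xcbf29ce484222325ULL;
      for (unsigned char c : s) { h ^= c; h *= 0x100000001b3ULL; }
      return h;
    }

    // Each real server appears at many pseudo-random positions on the
    // ring; more virtual nodes smooth the load, at the cost of a larger
    // node table to maintain.
    void add_server(Ring& ring, const std::string& server_id,
                    int virtual_nodes = 128) {
      for (int i = 0; i < virtual_nodes; ++i)
        ring[hash_str(server_id + "#" + std::to_string(i))] = server_id;
    }

    void remove_server(Ring& ring, const std::string& server_id,
                       int virtual_nodes = 128) {
      for (int i = 0; i < virtual_nodes; ++i)
        ring.erase(hash_str(server_id + "#" + std::to_string(i)));
    }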

Page 23: Zing Database

Addressing - Multi-layer rings

[Diagram: stacked ring layers holding nodes A, B, C]

- Store the change history of the system
- Provide availability/reconfigurability
- A node can be placed on a ring manually

* Write: data is located on the highest ring.
* Read: data is located on the highest ring, then on lower rings if not found (see the sketch below).
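A minimal sketch of that write/read rule, reusing the Ring type and locate() helper from the addressing sketch earlier; the node_has callback stands in for what would be an RPC existence check in practice, and the newest-first layer ordering is my assumption.

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    using Ring = std::map<uint64_t, std::string>;
    const std::string& locate(const Ring& ring, uint64_t key);  // as above

    struct RingStack {
      std::vector<Ring> layers;  // layers[0] = highest (newest) ring

      // Writes always address the highest ring.
      const std::string& write_target(uint64_t key) const {
        return locate(layers[0], key);
      }

      // Reads fall through from the highest ring to lower (older) ones.
      // 'node_has' abstracts the existence check (an RPC in practice).
      std::string read_target(
          uint64_t key,
          const std::function<bool(const std::string&, uint64_t)>& node_has)
          const {
        for (const Ring& ring : layers) {
          const std::string& node = locate(ring, key);
          if (node_has(node, key)) return node;
        }
        return "";  // key not found on any layer
      }
    };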

Page 24: Zing Database

Replication & Backup

- Each node has one primary range of IDs and some secondary ranges of IDs
- Each real node needs a backup instance that can take over if it goes down

* Data is queried from the primary node first, then from the secondary nodes.

Page 25: Zing Database

Configuration: finding the best parameters to configure the DB, or choosing the most suitable DB type.

- How many reads/writes per second?
- Deviation of data length: are records roughly the same size, or widely different?
- Are there updates/deletions?
- How important is the data: is loss acceptable or not?
- Can old data be recycled?

Page 26: Zing Database

Q & A

Contact: Nguyễn Quang Nam
[email protected]
http://me.zing.vn/nam.nq