55
Scaling and High Performance Storage System Yosuke Hara - @yosukehara A Researcher of R.I.T. and Tech Lead LeoFS with Hiroki Matsue, LeoFS Support and Rakuten Software Engineer

Scaling and High Performance Storage System: LeoFS

Embed Size (px)

DESCRIPTION

LeoFS is an Unstructured Object Storage for the Web and a highly available, distributed, eventually consistent storage system.

Citation preview

Page 1: Scaling and High Performance Storage System: LeoFS

Scaling and High Performance Storage System

Yosuke Hara - @yosukehara A Researcher of R.I.T. and Tech Lead LeoFS with Hiroki Matsue, LeoFS Support and Rakuten Software Engineer

Page 2: Scaling and High Performance Storage System: LeoFS

LeoFS is "Unstructured Big Data Storage for the Web" and a highly available, distributed, eventually consistent storage system. !Organizations can use LeoFS to store lots of data efficently, safely and inexpensively. !

LeoFS was published as OSS on July of 2012

leo-project.net/leofs

Page 3: Scaling and High Performance Storage System: LeoFS

Overview

Brief Benchmark Report

Multi Data Center Replication

LeoFS Administration at Rakuten

Future Plans “NFS” Support and more

Page 4: Scaling and High Performance Storage System: LeoFS

Overview

Tokyo, Japan

Page 5: Scaling and High Performance Storage System: LeoFS

� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �

� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �

� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � �

The Lion of Storage Systems

HIGH Availability

HIGH Cost Performance Ratio

HIGH Scalability

LeoFS Non Stop

Velocity: Low LatencyMinimum Resources

Volume: Petabyte / ExabyteVariety: Photo, Movie, Unstructured-data

3 Vs in 3 HIGHs

Page 6: Scaling and High Performance Storage System: LeoFS

Metadata Object Storage

Storage Engine/Router

Monitor

GUI Console

( Erlang RPC)

LeoFS Overview

Storage

Manager

( Erlang RPC)

Gateway

( TCP/IP,SNMP )

Request from Web Applications / Browsers

w/HTTP over REST-API / S3-API

Load Balancer

Keeping High Availability Keeping High Performance Easy Administration

Metadata Object Storage

Storage Engine/Router

Metadata Object Storage

Storage Engine/Router

Page 7: Scaling and High Performance Storage System: LeoFS

Gateway

Page 8: Scaling and High Performance Storage System: LeoFS

LeoFS Overview - Gateway

Stateless Proxy + Object Cache

REST-API / S3-API

Use Consistent Hashing for decision of a primary node

[ Memory Cache, Disc Cache ]

Storage C

lusterG

ateway(s)

Clients

HTTP Request and Response Built in Object Cache Mechanism

Storage Cluster

Fast HTTP Server - Cowboy API Handler Object Cache Mechanism

Page 9: Scaling and High Performance Storage System: LeoFS

Storage

Page 10: Scaling and High Performance Storage System: LeoFS

Storage (S

torage Cluster)

Gatew

ay

LeoFS Overview - Storage

Use "Consistent Hashing" for Data Operation

in the Storage Cluster

Choosing Replica Target Node(s)

RING2 ^ 128 (MD5)

# of replicas = 3

KEY = “bucket/leofs.key”Hash = md5(Filename)

Secondary-1

Secondary-2

Primary Node

"P2P"

WRITE: Auto Replication READ : Auto Repair of an Inconsistent Object with Async

Page 11: Scaling and High Performance Storage System: LeoFS

Request From Gateway

LeoFS Overview - Storage

...

LeoFS Storage

Replicator Recoverer

...

Storage Engine

Storage E

ngine, Metadata + O

bject Storage

Gatew

ay

Storage consists of Object Storage and Metadata Storage Includes Replicator and Recoverer for the eventual consistency

Metadata Storage Object

Storage

Page 12: Scaling and High Performance Storage System: LeoFS

LeoFS Overview - Storage - Data Structure

Metadata

Storage

Object S

torage

Robust and High Performance Necessary for GC

Offset Version Time-stamp Key

<Metadata>

Checksum

for Sync

KeySize CustomMeta Size File Size

for retrieving an object

Footer (8B)

Checksum KeySize DataSize Offset Version Time-stamp Key User-Meta Footer

Header (Metadata - Fixed length) Body (Variable Length)

User-MetaSize

ActualFile

<Needle>

Supe

r-bl

ock

Nee

dle-

1

Nee

dle-

2

Nee

dle-

3

<Object Container>N

eedl

e-4

Nee

dle-

5

Page 13: Scaling and High Performance Storage System: LeoFS

To Equalize Disk Usage in Every Storage NodeTo Realize High I/O efficiency and High Availability

LeoFS Overview - Storage - Large Object Support

chunk-0

chunk-1

chunk-2

chunk-3

An Original Object’s Metadata

Original Object Name Original Object Size # of Chunks

Storage ClusterGatewayClient(s)

[ WRITE Operation ]

Chunked Objects

Every chunked object and metadata are replicated

in the cluster

Page 14: Scaling and High Performance Storage System: LeoFS

Manager

Page 15: Scaling and High Performance Storage System: LeoFS

Storage Cluster

LeoFS Overview - Manager

Monitor

Operate

RING, Node State

status, suspend, resume, detach, whereis, ...

Gateway(s)

Storage C

lusterG

ateway(s)

Manager(s)

Operate LeoFS - Gateway and Storage Cluster "RING Monitor" and "NodeState Monitor"

Page 16: Scaling and High Performance Storage System: LeoFS

Brief Benchmark Report

Hokkaido, Japan

Page 17: Scaling and High Performance Storage System: LeoFS

LeoFS kept in a stable performance through the benchmark

Brief Benchmark Report

Bottleneck is Disk I/O

The cache mechanism contributed to reduce network traffic between Gateway and Storage

Summary of the benchmark results

Page 18: Scaling and High Performance Storage System: LeoFS

Brief Benchmark Report

1st Case: Group of Value Ranges (HDD) Storage:5, Gateway:1, Manager:2 R:W = 9:1

2nd Case: Group of Value Ranges (HDD) Storage:5, Gateway:1, Manager:2 R:W = 8:2

source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r9w1_240min

source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r8w2_120min

Page 19: Scaling and High Performance Storage System: LeoFS

Brief Benchmark Report

CPU Intel(R) Xeon(R) CPU X5650 @ 2.67GHz * 2 (12 cores / 24 threads)

Memory 96GBDisk HDD - 240GB RAID0

Network 10G-Ether

Server Spec - Gateway:

CPU Intel(R) Xeon(R) CPU X5650 @ 2.67GHz * 2 (12 cores / 24 threads)

Memory 96GB

DiskHDD - 240GB RAID0 (System)

HDD - 2TB RAID0 (Data)

Network 10G-Ether

Server Spec - Storage x5:

Page 20: Scaling and High Performance Storage System: LeoFS

Network 10GbpsOS CentOS release 6.5 (Final)

Erlang OTP R16B03-1LeoFS v1.0.2

Environment:

System Consistency Level: [ N:3, W:2, R:1, D:2 ]

Duration 4.0hR:W 9:1

# of Concurrent Processes 64

# of Keys 100,000

Value Size

!!!!!

Benchmark Configuration:

Range (byte) Percentage

1024 10240 24%

10241 102400 30%

10241 819200 30%

819201 1572864 16%

Brief Benchmark Report - 1st Case (HDD / R:W=9:1)

Page 21: Scaling and High Performance Storage System: LeoFS

source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140601/tests/1m_r9w1_240min

50ms

Brief Benchmark Report - 1st Case (HDD / R:W=9:1)

50ms

1,500ops

No Errors

Page 22: Scaling and High Performance Storage System: LeoFS

0

50,000

100,000

150,000

200,000

250,000

300,000

350,000

400,000

450,000

500,000

550,000

600,000

650,000

700,000

750,000

800,000

850,000

900,000

950,000

1,000,000

1,050,000

1,100,000

1,150,000

1,200,000

1,250,000

1,300,000

1,350,000

1,400,000

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

7500s

8000s

8500s

9000s

9500s

10000s

10500s

11000s

11500s

12000s

12500s

13000s

13500s

14000s

gateway rxbyt/s gateway txbyt/sstorage-1 rxbyt/s storage-1 txbyt/sstorage-2 rxbyt/s storage-2 txbyt/sstorage-3 rxbyt/s storage-3 txbyt/sstorage-4 rxbyt/s storage-4 txbyt/sstorage-5 rxbyt/s storage-5 txbyt/s

Brief Benchmark Report - 1st Case / Network Traffic

10.0Gbps

7.0Gbps

5.0Gbps

6.0Gbps

StorageG

ateway

60%

Page 23: Scaling and High Performance Storage System: LeoFS

0.00.10.30.40.60.70.91.01.11.31.41.61.71.92.0

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

7500s

8000s

8500s

9000s

9500s

10000s

10500s

11000s

11500s

12000s

12500s

13000s

13500s

14000s

Memory Usage

CPU Load 5min

Brief Benchmark Report - 1st Case / Memory and CPU

1.0

0

10

20

30

40

50

60

70

80

90

100

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

7500s

8000s

8500s

9000s

9500s

10000s

10500s

11000s

11500s

12000s

12500s

13000s

13500s

14000s

gateway storage-1 storage-2 storage-3 storage-4 storage-5

Page 24: Scaling and High Performance Storage System: LeoFS

Network 10GbpsOS CentOS release 6.5 (Final)

Erlang OTP R16B03-1LeoFS v1.0.2

Environment:

System Consistency Level: [ N:3, W:2, R:1, D:2 ]

Duration 2.0hR:W 8:2

# of Concurrent Processes 64

# of Keys 100,000

Value Size

!!!!!

Benchmark Configuration:

Brief Benchmark Report - 2nd Case (HDD / R:W=8:2)

Range (byte) Percentage

1024 10240 24%

10241 102400 30%

10241 819200 30%

819201 1572864 16%

Page 25: Scaling and High Performance Storage System: LeoFS

Brief Benchmark Report - 2nd Case (HDD / R:W=8:2)

60-70ms 80-90ms

1,000ops

No Errors

Page 26: Scaling and High Performance Storage System: LeoFS

Compare 1st case with 2nd case

Page 27: Scaling and High Performance Storage System: LeoFS

0100,000200,000300,000400,000500,000600,000700,000800,000900,000

1,000,0001,100,0001,200,0001,300,0001,400,000

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000sgateway rxbyt/s gateway txbyt/sstorage-1 rxbyt/s storage-1 txbyt/sstorage-2 rxbyt/s storage-2 txbyt/sstorage-3 rxbyt/s storage-3 txbyt/sstorage-4 rxbyt/s storage-4 txbyt/sstorage-5 rxbyt/s storage-5 txbyt/s

0100,000200,000300,000400,000500,000600,000700,000800,000900,000

1,000,0001,100,0001,200,0001,300,0001,400,0001,500,0001,600,0001,700,0001,800,0001,900,0002,000,0002,100,0002,200,000

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

6.0Gbps

Brief Benchmark Report7.0Gbps

6.0Gbps7.0Gbps

minus 0.7Gbps

1st Case - Network Traffic

2nd Case - Network Traffic

Page 28: Scaling and High Performance Storage System: LeoFS

0.0

50.0

100.0

150.0

200.0

250.0

300.0

350.0

400.0

450.0

500.0

550.0

600.0

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

0.0

50.0

100.0

150.0

200.0

250.0

300.0

350.0

400.0

450.0

500.0

550.0

600.0

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

storage-1storage-2storage-3storage-4storage-5

100

100

Brief Benchmark Report

2nd Case - Disk util%

200

200

1st Case - Disk util%

1.8x high

Page 29: Scaling and High Performance Storage System: LeoFS

0.00.20.40.60.81.01.21.41.61.82.02.22.42.62.83.0

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000sBrief Benchmark Report

0.00.20.40.60.81.01.21.41.61.82.02.22.42.62.83.0

0s 500s

1000s

1500s

2000s

2500s

3000s

3500s

4000s

4500s

5000s

5500s

6000s

6500s

7000s

1.00

1.00

1.6x high2nd Case - CPU Load 5min

1st Case - CPU Load 5min

Page 30: Scaling and High Performance Storage System: LeoFS

LeoFS kept in a stable performance through the benchmark

Brief Benchmark Report

Bottleneck is Disk I/O

The cache mechanism contributed to reduce network traffic between Gateway and Storage

Conclusion:

Page 31: Scaling and High Performance Storage System: LeoFS

Multi Data Center Replication

Hokkaido, Japan

Page 32: Scaling and High Performance Storage System: LeoFS

TokyoEurope

US

Multi Data Center Replication

HIGH-ScalabilityHIGH-Availability

Easy Operation for Admins+

NO SPOF

NO Performance Degradation

Singapore

Page 33: Scaling and High Performance Storage System: LeoFS

1. Easy Operation to build multi clusters.

2. Asynchronous data replication between clusters

Stacked data is transferred to remote cluster(s)

3. Eventual consistency

Multi Data Center Replication

Designed it as simple as possible

Page 34: Scaling and High Performance Storage System: LeoFS

DC-3DC-2

Storage cluster

Manager cluster

Client

DC-1

Monitors and Replicates each “RING” and “System Configuration”

"Leo Storage Platform"

[# of replicas:1] [# of replicas:1][# of replicas:3]

"join cluster DC-2 and DC-3"

leo_rpcleo_rpc

Multi Data Center Replication

Executing “Join Cluster” on Manager Console

Preparing MDC Replication

Page 35: Scaling and High Performance Storage System: LeoFS

DC-3DC-2

Storage cluster

Manager cluster

Client

Monitors and Replicates each “RING” and “System Configuration”

"Leo Storage Platform"

[# of replicas:1] [# of replicas:1]

Request to the Target Region

Application(s)

DC-1

[# of replicas:3]

Temporally Stacking objects - One container's capacity is *32MB - When capacity is full,

send it to remote cluster(s)* 32MB: default capacity - able to set optional value

leo_rpcleo_rpc

Multi Data Center Replication

Stacking objects

Page 36: Scaling and High Performance Storage System: LeoFS

DC-3DC-2

Storage cluster

Manager cluster

Client

Monitors and Replicates each “RING” and “System Configuration”

"Leo Storage Platform"

DC-1

Stacked an object with a metadata

Compress it with LZ4

Replicated an object

Request to the Target Region

Application(s)

leo_rpc

leo_rpcleo_rpc

Multi Data Center Replication

Transferring stacked objects

Stacked objects

Page 37: Scaling and High Performance Storage System: LeoFS

DC-3DC-2

Storage cluster

Manager cluster

Client

Monitor and Replicate each “RING” and “System Configuration”

"Leo Storage Platform"

Request to the Target Region

Application(s)

DC-1

1) Receive metadata of stored objects 2) Compare them at the local cluster 3) Fix inconsistent objects

leo_rpcleo_rpc

leo_rpcleo_rpc

Multi Data Center Replication

Investigating stored objects

Page 38: Scaling and High Performance Storage System: LeoFS

LeoFS Administration at Rakuten

Presented by Hiroki Matsue Rakuten Software Engineer

Tokyo, JapanKyoto, Japan

Page 39: Scaling and High Performance Storage System: LeoFS

Storage Platform File Sharing Service Others Portal Site

Photo Storage

Background Storage of OpenStack

LeoFS Administration at Rakuten

Page 40: Scaling and High Performance Storage System: LeoFS

Storage Platform

Page 41: Scaling and High Performance Storage System: LeoFS

Storage Platform - Scaling the Storage Platform

(Movie)

Reduce Costs High Reliability Easy to Scale S3-API

Page 42: Scaling and High Performance Storage System: LeoFS

Using Various Services Total Usage: 450TB # of Files: 600Million Daily Growth: 100GB Daily Reqs: 13Million

Storage Platform - Scaling the Storage Platform

E-Commerce

Blog

Insurance Calendar

Recruiting

Review Photoshare

Portal &Contents

Bookmark

B

Storage Platform

(Movie)

Page 43: Scaling and High Performance Storage System: LeoFS

Monitor

GUI Console

( Erlang RPC)

( Erlang RPC) ( TCP/IP,SNMP )

Gatew

ay x 4Storage x 14

Manager x 2

Requests from Web Applications / Browsers

w/HTTP over S3-API

Load Balancer / Cache Servers

Storage Platform - System LayoutTotal disk space: 600TB Number of Files: 600Million Access Stats: 800Mbps (MAX) 400Mbps (AVG)

Page 44: Scaling and High Performance Storage System: LeoFS

Monitor

GUI Console

( Erlang RPC)

( Erlang RPC) ( TCP/IP,SNMP )

Gatew

ay x 4Storage x 14

Manager x 2

Storage Platform - Monitor

Send Mail Alert

Ganglia and Nagios Agent

Status Collection (Ganglia) Status Check (Nagios) Port + Threshold Check

Page 45: Scaling and High Performance Storage System: LeoFS

Storage Platform - Spreading Globally

Covering All Services with Multi DC Replication

Page 46: Scaling and High Performance Storage System: LeoFS

File Sharing Service

+https://owncloud.com/

Page 47: Scaling and High Performance Storage System: LeoFS

+

File Sharing Service - Required Targets

Reduce Costs Handle Confidential Files

Store Large Files Scale Easily

Page 48: Scaling and High Performance Storage System: LeoFS

+

Share Docs and Movies with Group Companies Over 20 Companies, Over 10 Countries

Over 4,000 Users, Over 10,000 Teams

File Sharing Service - Usage

Page 49: Scaling and High Performance Storage System: LeoFS

LDAP

Monitor

GUI Console

( Erlang RPC)

( Erlang RPC) ( TCP/IP,SNMP )

Manager x 2

Authenticate Users

Manage Configurations

Manage Login Session (KVS)

File Sharing Service - System Layout

Web GUI File Browser

Page 50: Scaling and High Performance Storage System: LeoFS

Cover 25 Countries/Regions Over 20,000 Users

+

File Sharing Service - Future Plans

Page 51: Scaling and High Performance Storage System: LeoFS

Empowering the Services and the Users Through the Cloud Storage

Page 52: Scaling and High Performance Storage System: LeoFS

Future Plans

Tokyo, JapanHokkaido, Japan

Page 53: Scaling and High Performance Storage System: LeoFS

NFS SupportFuture Plans

Data-HUB: Centralize unstructured data in LeoFS

Search / AnalysisPaaS / IaaS Photo-Storage

Many Kind of Data PhotoLog / Event Data

Loading Data

Analysis Data

Stream Processing

Page 54: Scaling and High Performance Storage System: LeoFS

SavannaDB for Statistics Data

Retrieve metrics and stats from SavannaDB's Agents

Storage Cluster

ManagerGateway

The Lion of Storage Systems

REST-API (JSON)

Operate LeoFS

Notify a message of over # of req threshold

SavannaDB's Agent Insight LeoFS

LeoInsight

Future Plans

+

Page 55: Scaling and High Performance Storage System: LeoFS

Set Sail for “Cloud Storage”Website: leo-project.net Twitter: @LeoFastStorage Facebook: www.facebook.com/org.leofs