A Reliable Memory-Centric Distributed Storage System

Haoyuan Li, Tachyon Nexus [email protected]

September 22, 2015 @ SDC 2015




•  Team consists of Tachyon creators, top contributors, people from UC Berkeley, Google, CMU, VMware, Stanford, Facebook, etc.

•  $7.5 million Series A from Andreessen Horowitz

•  Committed to Tachyon Open Source


Outline

•  Overview
   – Motivation
   – Tachyon Architecture
   – Using Tachyon

•  Open Source
   – Status
   – Production Use Cases

•  Roadmap


Tachyon: Born in UC Berkeley AMPLab

Cluster manager

Parallel computation framework

Reliable, distributed memory-centric storage system


Why Tachyon?


Memory is Fast

•  RAM throughput increasing exponentially

•  Disk throughput increasing slowly

Memory-locality key to interactive response times


Memory is Cheaper

source: jcmit.com

Realized by many…


Is the Problem Solved?


Missing a Solution for the Storage Layer


An Example: Spark

•  Fast, in-memory data processing framework
   – Keep one in-memory copy inside JVM
   – Track lineage of operations used to derive data
   – Upon failure, use lineage to recompute data

[Figure: Lineage Tracking, a graph of map, filter, join, and reduce operations]
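The lineage idea on this slide can be illustrated with a minimal Python sketch. This is not Spark's API: `LineageDataset`, `source`, `map_ds`, and `filter_ds` are hypothetical names chosen for illustration. Each dataset keeps one in-memory copy plus the operation that derived it, so a lost copy can be recomputed from its parents.

```python
from typing import Callable, List, Optional


class LineageDataset:
    """Hypothetical dataset that records the operation that produced it."""

    def __init__(self, name: str, parents: List["LineageDataset"],
                 compute: Callable[..., list]) -> None:
        self.name = name
        self.parents = parents          # datasets this one is derived from
        self.compute = compute          # operation applied to the parents' data
        self._cache: Optional[list] = None  # the single in-memory copy

    def data(self) -> list:
        # Recompute from lineage only when the in-memory copy is missing.
        if self._cache is None:
            inputs = [parent.data() for parent in self.parents]
            self._cache = self.compute(*inputs)
        return self._cache

    def lose_cache(self) -> None:
        """Simulate a failure that drops the in-memory copy."""
        self._cache = None


def source(name: str, values) -> LineageDataset:
    # Stands in for durable input data that can always be re-read.
    return LineageDataset(name, [], lambda: list(values))


def map_ds(name: str, parent: LineageDataset, fn) -> LineageDataset:
    return LineageDataset(name, [parent], lambda rows: [fn(r) for r in rows])


def filter_ds(name: str, parent: LineageDataset, pred) -> LineageDataset:
    return LineageDataset(name, [parent],
                          lambda rows: [r for r in rows if pred(r)])


nums = source("nums", range(10))
evens = filter_ds("evens", nums, lambda x: x % 2 == 0)
squares = map_ds("squares", evens, lambda x: x * x)

first = squares.data()
evens.lose_cache()      # pretend the process holding these copies failed
squares.lose_cache()
recomputed = squares.data()  # rebuilt by replaying filter, then map
```

The sketch trades recovery time for memory: nothing is replicated, and a failure costs a replay of the recorded operations.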


Issue 1

Data Sharing is the bottleneck in analytics pipeline: slow writes to disk

[Figure: Spark Job1 and Spark Job2, each with its own Spark memory block manager holding blocks 1 and 3, exchange data through HDFS / Amazon S3 (blocks 1 to 4)]

Storage engine & execution engine same process (slow writes)

Issue 1

Data Sharing is the bottleneck in analytics pipeline: slow writes to disk

[Figure: a Spark job (Spark memory block manager holding blocks 1 and 3) and a Hadoop MR job on YARN exchange data through HDFS / Amazon S3 (blocks 1 to 4)]

Storage engine & execution engine same process (slow writes)

Issue 2

Cache loss when process crashes

[Figure, three frames: (1) a Spark task with its Spark memory block manager caching blocks 1 and 3, backed by HDFS / Amazon S3 (blocks 1 to 4); (2) the task crashes; (3) the block manager and its cached blocks are gone, leaving only the copies in HDFS / Amazon S3]

Execution engine & storage engine same process

Issue 3

In-memory Data Duplication & Java Garbage Collection

[Figure: Spark Task1 and Spark Task2 each hold their own copies of blocks 1 and 3 in separate Spark memory block managers, backed by HDFS / Amazon S3 (blocks 1 to 4)]

Execution engine & storage engine same process (duplication & GC)

Tachyon

Reliable data sharing at memory-speed within and across cluster frameworks/jobs
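A toy Python model can make the contrast with the three issues concrete. It is a sketch under stated assumptions, not Tachyon's API: `SharedBlockStore` and `Job` are hypothetical names, and the "separate process" is only simulated by an object that outlives the jobs. Blocks written to the shared store survive a job crash and are read by every job from the same single copy, while blocks kept only in a job's own cache are lost.

```python
from typing import Dict, Optional


class SharedBlockStore:
    """Hypothetical stand-in for a storage service outside the job process."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = data

    def get(self, block_id: str) -> Optional[bytes]:
        return self._blocks.get(block_id)


class Job:
    """Hypothetical job whose private cache lives and dies with it."""

    def __init__(self, name: str, store: SharedBlockStore) -> None:
        self.name = name
        self.store = store
        self.local_cache: Dict[str, bytes] = {}

    def write(self, block_id: str, data: bytes, shared: bool = True) -> None:
        if shared:
            self.store.put(block_id, data)      # visible to other jobs
        else:
            self.local_cache[block_id] = data   # private to this job

    def read(self, block_id: str) -> Optional[bytes]:
        if block_id in self.local_cache:
            return self.local_cache[block_id]
        return self.store.get(block_id)

    def crash(self) -> None:
        # Everything held inside the job process is lost.
        self.local_cache.clear()


store = SharedBlockStore()
job1 = Job("job1", store)
job2 = Job("job2", store)

job1.write("block-1", b"shared data")                # goes to the shared store
job1.write("block-3", b"private data", shared=False)  # stays inside job1
job1.crash()

survived = job2.read("block-1")   # still available after job1 crashed
lost = job2.read("block-3")       # None: it only existed inside job1
```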


Technical Overview

Ideas
•  A memory-centric storage architecture
•  Push lineage down to storage layer
•  Manage tiered storage

Facts
•  One data copy in memory
•  Re-computation for fault-tolerance
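As one way to picture "manage tiered storage", the Python sketch below keeps blocks in an ordered list of tiers and moves the least recently used block down a tier when a tier overflows. The class `TieredStore`, the LRU policy, the promote-on-read behaviour, and the block-count capacities are all assumptions made for illustration; the slides do not describe Tachyon's actual eviction policy or API.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TieredStore:
    """Hypothetical tiered block store, fastest tier first."""

    def __init__(self, capacities: List[Tuple[str, int]]) -> None:
        # Each tier: (name, max number of blocks, blocks in LRU order).
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.dropped: List[str] = []  # blocks evicted from the last tier

    def _insert(self, level: int, block_id: str, data: bytes) -> None:
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        blocks.move_to_end(block_id)  # mark as most recently used
        if len(blocks) > cap:
            victim_id, victim = blocks.popitem(last=False)  # least recent
            if level + 1 < len(self.tiers):
                self._insert(level + 1, victim_id, victim)  # demote
            else:
                self.dropped.append(victim_id)

    def put(self, block_id: str, data: bytes) -> None:
        for _, _, blocks in self.tiers:
            blocks.pop(block_id, None)  # avoid holding two copies
        self._insert(0, block_id, data)

    def get(self, block_id: str) -> Optional[bytes]:
        for level, (_, _, blocks) in enumerate(self.tiers):
            if block_id in blocks:
                if level == 0:
                    blocks.move_to_end(block_id)
                    return blocks[block_id]
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote to fastest tier
                return data
        return None

    def tier_of(self, block_id: str) -> Optional[str]:
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for i in (1, 2, 3):
    store.put(f"block-{i}", f"data-{i}".encode())

tier_before = store.tier_of("block-1")  # pushed out of MEM by block-3
value = store.get("block-1")            # read promotes it back to MEM
tier_after = store.tier_of("block-1")
```

A block that falls out of the last tier is only recorded in `dropped` here; in a system with an under filesystem or lineage it could be re-read or recomputed instead.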


Eco-System  


Tachyon Memory-Centric Architecture


Lineage in Tachyon


Issue 1 revisited

Memory-speed data sharing among jobs in different frameworks

[Figure: a Spark job and a Hadoop MR job on YARN share blocks 1, 3, and 4 held in Tachyon in-memory storage, backed by HDFS / Amazon S3 (blocks 1 to 4)]

Execution engine & storage engine same process (fast writes)

Issue 2 revisited

Keep in-memory data safe, even when a job crashes.

[Figure, two frames: (1) a Spark task with its Spark memory block manager works on blocks 1, 3, and 4 held in Tachyon in-memory storage, backed by HDFS / Amazon S3 (blocks 1 to 4); (2) the task crashes and the blocks remain in Tachyon]

Execution engine & storage engine same process

Issue 3 revisited

No in-memory data duplication, much less GC

[Figure: two Spark tasks, each with its own Spark memory, read blocks 1, 3, and 4 from a single copy in Tachyon in-memory storage, backed by HDFS / Amazon S3 (blocks 1 to 4)]

Execution engine & storage engine same process (no duplication & GC)

Comparison with In-Memory HDFS


Outline

•  Overview
   – Motivation
   – Tachyon Architecture
   – Using Tachyon

•  Open Source
   – Status
   – Production Use Cases

•  Roadmap


Open Source Status

•  Started at UC Berkeley AMPLab in Summer 2012

•  Apache License 2.0, Version 0.7.1 (August 2015)

•  Deployed at > 50 companies (July 2014)

•  30+ Companies Contributing


Contributors Growth

Release (date): contributors

•  v0.1 (Dec ’12): 1
•  v0.2 (Apr ’13): 3
•  v0.3 (Oct ’13): 15
•  v0.4 (Feb ’14): 30
•  v0.5 (Jul ’14): 46
•  v0.6 (Mar ’15): 70
•  v0.7 (Jul ’15): 111

Codebase Growth

Release (date): commits

•  v0.2 (Apr ’13): 465
•  v0.3 (Oct ’13): 696
•  v0.4 (Feb ’14): 1080
•  v0.5 (Jul ’14): 1610
•  v0.6 (Mar ’15): 2884
•  v0.7 (Jul ’15): 5021

Thanks to Our Contributors!


Reported Tachyon Usage


Under Filesystem Choices (Big Data, Cloud, HPC, Enterprise)


Use Case: Baidu

•  Framework: SparkSQL
•  Under Storage: Baidu’s File System
•  Storage Media: MEM + HDD
•  100+ nodes deployment
•  1PB+ managed space
•  30x Performance Improvement


Use Case: a SaaS Company

•  Framework: Impala

•  Under Storage: S3

•  Storage Media: MEM + SSD

•  15x Performance Improvement


Use Case: an Oil Company

•  Framework: Spark

•  Under Storage: GlusterFS

•  Storage Media: MEM only

•  Analyzing data in traditional storage


Use Case: a SaaS Company

•  Framework: Spark

•  Under Storage: S3

•  Storage Media: SSD only

•  Elastic Tachyon deployment


Outline

•  Overview
   – Motivation
   – Tachyon Architecture
   – Using Tachyon

•  Open Source
   – Status
   – Production Use Cases

•  Roadmap


New Features

•  Lineage in Storage (alpha)
•  Tiered Storage (alpha)
•  Data Serving
•  Support for New Hardware
•  …
•  Your New Feature!


Tachyon’s Goal?


Distributed Memory-Centric Storage: Better Assist Other Components

Welcome Collaboration!


JIRA New Contributor Tasks


•  Website: http://tachyon-project.org

•  Github: https://github.com/amplab/tachyon

•  Meetup: http://www.meetup.com/Tachyon

•  News Letter Subscription: http://goo.gl/mwB2sX

•  Email: [email protected]