
Bigtable


Page 1: Bigtable

Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang et al.

Presenter: Zafar Gilani


Page 2: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 3: Bigtable

A distributed storage system ..

• .. for managing structured data.

• Used for demanding workloads, such as:

– Throughput-oriented batch processing.

– Serving latency-sensitive data to end users.

• Dynamic control over data layout and format, instead of a full relational model.

• Data locality properties (revisited briefly later).


Page 4: Bigtable

Bigtable has achieved several goals

• Wide applicability: used for 60+ Google products, including:

– Google Analytics, Google Code, Google Earth, Google Maps and Gmail.

• Scalability (explained later in the evaluation).

• High performance.

• High availability.


Page 5: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 6: Bigtable

Data model

• Essentially a sparse, distributed, persistent multi-dimensional sorted map.

• The map is indexed by a row key, a column key, and a timestamp.

• Atomic reads and writes over a single row.
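A minimal sketch of this model in Python (a toy illustration; the class and method names are mine, not Bigtable's API): the map goes from (row, column, timestamp) to an uninterpreted string, and a read returns the most recent version of a cell.

from collections import defaultdict

class ToyBigtable:
    """Toy model: (row:string, column:string, time:int64) -> string."""
    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        # Return the most recent version of the cell, or None.
        cells = self._rows[row][column]
        return cells[max(cells)] if cells else None

t = ToyBigtable()
t.put("com.bbc.www", "anchor:bbcworld.com", 1, "BBC")
t.put("com.bbc.www", "anchor:bbcworld.com", 2, "BBC World")
assert t.get("com.bbc.www", "anchor:bbcworld.com") == "BBC World"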

[Figure: the example table, indexed by rows and columns.]

Page 7: Bigtable

Row and column ranges

• Row range is dynamically partitioned into tablets.

• Data is kept in lexicographic order by row key.

• Allows data locality (see the sketch below).

• Column keys are grouped into column families.

• Data in a family is usually of the same type.

• Allows access control and disk or memory accounting.
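For instance, storing web pages under reversed hostnames keeps pages from the same domain adjacent in the sorted row space. A small illustration (the URLs are made up):

def row_key(url):
    # Reverse the hostname so pages from the same domain sort together.
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path

urls = ["maps.google.com/index.html", "www.bbc.com/news", "mail.google.com/u/0"]
print(sorted(row_key(u) for u in urls))
# ['com.bbc.www/news', 'com.google.mail/u/0', 'com.google.maps/index.html']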


Page 8: Bigtable

Row and column ranges (continued)

Enables reasoning about data locality

Page 9: Bigtable

[Figure: the example table again, highlighting rows and columns.]

Page 10: Bigtable

[Figure: the example table; "anchor" is a column family.]

Page 11: Bigtable

[Figure: row "com.bbc.www" with columns "anchor:bbcworld.com" and "anchor:weather", holding the anchor values "BBC" and "BBC.com"; the row range is split into tablets.]

Page 12: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 13: Bigtable

Bigtable uses several other technologies

• Google File System to store log and data files.

• SSTable file format to store Bigtable data.

• Chubby, a distributed lock service.

For more details on these technologies, refer to section 4 of the paper.
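As a rough illustration of the SSTable idea (an immutable, sorted key-to-value map; this sketch is simplified and is not the on-disk format):

import bisect

class ToySSTable:
    """Immutable, sorted map from keys to values (simplified)."""
    def __init__(self, items):
        self._items = sorted(items)                  # sort once, never mutate
        self._keys = [k for k, _ in self._items]

    def get(self, key):
        # Binary search, standing in for the SSTable's block index.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._items[i][1]
        return None

sst = ToySSTable([("b", "2"), ("a", "1"), ("c", "3")])
assert sst.get("b") == "2" and sst.get("z") is None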


Page 14: Bigtable

Implementation

Master responsibilities:

– Assign tablets to tablet servers.

– Detect the addition and removal of tablet servers.

– Balance tablet-server load.

– Garbage collection (GC) of files.

– Handle schema changes.

[Figure: architecture. Clients reach Bigtable over the Internet; the master coordinates the tablet servers, while clients communicate directly with tablet servers for data.]

Page 15: Bigtable

How is data stored?

A three-level hierarchy, similar to a B+-tree.

Page 16: Bigtable

Location hierarchy

A Chubby file contains the location of the root tablet.

Page 17: Bigtable

Location hierarchy

The root tablet contains the locations of all tablets in the METADATA table.

Page 18: Bigtable

Location hierarchy

The METADATA table stores the locations of the actual user tablets.

Page 19: Bigtable

Location hierarchy

The client moves up the hierarchy (METADATA -> root -> Chubby) if a tablet's location is unknown or turns out to be stale.
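A toy version of that lookup path, with client-side caching (all names and data structures here are illustrative assumptions, not the real interfaces):

# All tablet contents as one toy dict: location -> {row_key: next_location}.
tablets = {
    "root_tablet@server0": {"com.bbc.www": "meta_tablet_1@server1"},    # root tablet
    "meta_tablet_1@server1": {"com.bbc.www": "user_tablet_7@server2"},  # METADATA
}
chubby_file = "root_tablet@server0"   # level 0: the Chubby file names the root tablet

location_cache = {}

def locate(row_key):
    """Find the tablet serving row_key, walking down from Chubby on a miss."""
    if row_key not in location_cache:
        meta_loc = tablets[chubby_file][row_key]              # level 1: root tablet
        location_cache[row_key] = tablets[meta_loc][row_key]  # level 2: METADATA
    return location_cache[row_key]

print(locate("com.bbc.www"))   # user_tablet_7@server2, cached for next time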

Page 20: Bigtable

How is data served?


Page 21: Bigtable

Tablet serving

[Figure: tablet representation. Writes go to a commit log and an in-memory memtable; older data lives in persistent SSTables in GFS, and reads see a merged view of both.]
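A compressed sketch of those two paths (a toy, not Google's code):

class ToyTablet:
    def __init__(self):
        self.log = []          # commit log: persistent record for recovery
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # older data: immutable, persistent maps

    def write(self, key, value):
        self.log.append((key, value))   # 1. append the mutation to the log
        self.memtable[key] = value      # 2. apply it to the memtable

    def read(self, key):
        # Merged view: memtable first (newest), then SSTables newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sstables):
            if key in sst:
                return sst[key]
        return None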

Page 22: Bigtable

Tablet serving

Compactions occur regularly; their advantages:

– Shrink the memory usage of the tablet server.

– Reduce the amount of data read from the commit log during recovery.
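Continuing the ToyTablet sketch above, a minor compaction freezes the memtable into a new immutable SSTable (again, just an illustration):

    def minor_compaction(self):
        # Freeze the memtable as a new immutable SSTable and start fresh;
        # the commit log can then be truncated, which shortens recovery.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.log.clear()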

Page 23: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 24: Bigtable

Benchmarks for performance evaluation

• Scan:

– Scans over values in a row range.

• Random reads from memory.

• Random reads/writes:

– R keys, chosen at random, to be read/written, spread over N clients.

• Sequential reads/writes:

– Keys 0 to R-1 to be read/written, partitioned over N clients (see the sketch below).
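A rough sketch of the difference between the two key distributions (the R and N values are placeholders):

import random

R, N = 1_000_000, 4   # R keys spread over N clients (placeholder values)

def sequential_keys(client_id):
    # Sequential benchmarks: client i works on a contiguous slice of 0..R-1.
    return range(client_id * R // N, (client_id + 1) * R // N)

def random_keys(count):
    # Random benchmarks: each client draws keys uniformly from 0..R-1.
    return (random.randrange(R) for _ in range(count))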


Page 25: Bigtable

Performance evaluation

Scans show the best performance, since a tablet server can return many values over a single client RPC, amortizing the RPC overhead.

Page 26: Bigtable

Performance evaluation

Sequential reads are better than random reads, since each fetched block is cached and used to serve the next requests.

Page 27: Bigtable

Performance evaluation

Random reads show the worst performance: fetching a 64 KB block to read a single 1000-byte value transfers roughly 64x more data than is used.

Page 28: Bigtable

Performance evaluation

Aggregate throughput does not grow linearly with the number of tablet servers, but it scales well.

Page 29: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 30: Bigtable

Conclusions

• Bigtable: highly scalable and available, without compromising performance.

• Flexibility for Google – designed using their own data model.

• Custom design gives Google the ability to remove or minimize bottlenecks.

• Related work:

– Apache HBase (open source).

– Boxwood (though targeted at a lower, file-system level).


Page 31: Bigtable

Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang et al.

Presenter: Zafar Gilani

Page 32: Bigtable

B+ Trees

• A tree with sorted data for:

– Efficient insertion, retrieval and removal of records.

• All records are stored at the leaf level; only keys are stored in interior nodes (see the sketch below).
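A minimal B+-tree-flavored lookup (a fixed two-level toy, illustrative only): interior keys route the search to a leaf, and the leaves hold the actual records.

import bisect

interior = ["g", "p"]            # interior node holds keys only
leaves = [                       # all records live in the leaves
    [("a", 1), ("c", 2)],        # keys < "g"
    [("g", 3), ("k", 4)],        # "g" <= keys < "p"
    [("p", 5), ("t", 6)],        # keys >= "p"
]

def lookup(key):
    # Route through the interior keys to the right leaf, then search it.
    leaf = leaves[bisect.bisect_right(interior, key)]
    return dict(leaf).get(key)

assert lookup("k") == 4 and lookup("z") is None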
