
Bigtable


Page 1: Bigtable

Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang et al.

Presenter: Zafar Gilani


Page 2: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 3: Bigtable

A distributed storage system ..

• .. for managing structured data.

• Used for demanding workloads, such as:

– Throughput-oriented batch processing.

– Serving latency-sensitive data to end users.

• Dynamic control over data layout and format, instead of a full relational model.

• Data locality properties (revisited briefly later).


Page 4: Bigtable

Bigtable has achieved several goals

• Wide applicability: used for 60+ Google products, including:

– Google Analytics, Google Code, Google Earth, Google Maps and Gmail.

• Scalability (explained later in the evaluation).

• High performance.

• High availability.


Page 5: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 6: Bigtable

Data model

• Essentially a sparse, distributed, persistent multi-dimensional sorted map.

• The map is indexed by a row key, a column key, and a timestamp.

• Atomic reads and writes over a single row.
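A minimal sketch of this model in Python (a toy illustration; the class and method names are mine, not Bigtable's API): the map goes from (row, column, timestamp) to an uninterpreted string, and a read returns the most recent version of a cell.

from collections import defaultdict

class ToyBigtable:
    """Toy model: (row:string, column:string, time:int64) -> string."""
    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        # Return the most recent version of the cell, or None.
        cells = self._rows[row][column]
        return cells[max(cells)] if cells else None

t = ToyBigtable()
t.put("com.bbc.www", "anchor:bbcworld.com", 1, "BBC")
t.put("com.bbc.www", "anchor:bbcworld.com", 2, "BBC World")
assert t.get("com.bbc.www", "anchor:bbcworld.com") == "BBC World"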

[Figure: the example table, indexed by rows and columns.]

Page 7: Bigtable

Row and column ranges

• Row range is dynamically partitioned into tablets.

• Data is kept in lexicographic order by row key.

• Allows data locality (see the sketch below).

• Column keys are grouped into column families.

• Data in a family is usually of the same type.

• Allows access control and disk or memory accounting.
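For instance, storing web pages under reversed hostnames keeps pages from the same domain adjacent in the sorted row space. A small illustration (the URLs are made up):

def row_key(url):
    # Reverse the hostname so pages from the same domain sort together.
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path

urls = ["maps.google.com/index.html", "www.bbc.com/news", "mail.google.com/u/0"]
print(sorted(row_key(u) for u in urls))
# ['com.bbc.www/news', 'com.google.mail/u/0', 'com.google.maps/index.html']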


Page 8: Bigtable

Row and column ranges (continued)

Enables reasoning about data locality

Page 9: Bigtable

[Figure: the example table again, highlighting rows and columns.]

Page 10: Bigtable

[Figure: the example table; "anchor" is a column family.]

Page 11: Bigtable

[Figure: row "com.bbc.www" with columns "anchor:bbcworld.com" and "anchor:weather", holding the anchor values "BBC" and "BBC.com"; the row range is split into tablets.]

Page 12: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 13: Bigtable

Bigtable uses several other technologies

• Google File System to store log and data files.

• SSTable file format to store Bigtable data.

• Chubby, a distributed lock service.

For more details on these technologies, refer to section 4 of the paper.
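As a rough illustration of the SSTable idea (an immutable, sorted key-to-value map; this sketch is simplified and is not the on-disk format):

import bisect

class ToySSTable:
    """Immutable, sorted map from keys to values (simplified)."""
    def __init__(self, items):
        self._items = sorted(items)                  # sort once, never mutate
        self._keys = [k for k, _ in self._items]

    def get(self, key):
        # Binary search, standing in for the SSTable's block index.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._items[i][1]
        return None

sst = ToySSTable([("b", "2"), ("a", "1"), ("c", "3")])
assert sst.get("b") == "2" and sst.get("z") is None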


Page 14: Bigtable

Implementation

Master responsibilities:

– Assign tablets to tablet servers.

– Detect the addition and removal of tablet servers.

– Balance tablet-server load.

– Garbage collection (GC) of files.

– Handle schema changes.

[Figure: architecture. Clients reach Bigtable over the Internet; the master coordinates the tablet servers, while clients communicate directly with tablet servers for data.]

Page 15: Bigtable

How is data stored?

A three-level hierarchy, similar to a B+-tree.

Page 16: Bigtable

Location hierarchy

A Chubby file contains the location of the root tablet.

Page 17: Bigtable

Location hierarchy

The root tablet contains the locations of all tablets in the METADATA table.

Page 18: Bigtable

Location hierarchy

The METADATA table stores the locations of the actual user tablets.

Page 19: Bigtable

Location hierarchy

The client moves up the hierarchy (METADATA -> root -> Chubby) if a tablet's location is unknown or turns out to be stale.
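A toy version of that lookup path, with client-side caching (all names and data structures here are illustrative assumptions, not the real interfaces):

# All tablet contents as one toy dict: location -> {row_key: next_location}.
tablets = {
    "root_tablet@server0": {"com.bbc.www": "meta_tablet_1@server1"},    # root tablet
    "meta_tablet_1@server1": {"com.bbc.www": "user_tablet_7@server2"},  # METADATA
}
chubby_file = "root_tablet@server0"   # level 0: the Chubby file names the root tablet

location_cache = {}

def locate(row_key):
    """Find the tablet serving row_key, walking down from Chubby on a miss."""
    if row_key not in location_cache:
        meta_loc = tablets[chubby_file][row_key]              # level 1: root tablet
        location_cache[row_key] = tablets[meta_loc][row_key]  # level 2: METADATA
    return location_cache[row_key]

print(locate("com.bbc.www"))   # user_tablet_7@server2, cached for next time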

Page 20: Bigtable

How is data served?


Page 21: Bigtable

Tablet serving

[Figure: tablet representation. Writes go to a commit log and an in-memory memtable; older data lives in persistent SSTables in GFS, and reads see a merged view of both.]
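A compressed sketch of those two paths (a toy, not Google's code):

class ToyTablet:
    def __init__(self):
        self.log = []          # commit log: persistent record for recovery
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # older data: immutable, persistent maps

    def write(self, key, value):
        self.log.append((key, value))   # 1. append the mutation to the log
        self.memtable[key] = value      # 2. apply it to the memtable

    def read(self, key):
        # Merged view: memtable first (newest), then SSTables newest-to-oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sstables):
            if key in sst:
                return sst[key]
        return None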

Page 22: Bigtable

Tablet serving

Compactions occur regularly; their advantages:

– Shrink the memory usage of the tablet server.

– Reduce the amount of data read from the commit log during recovery.
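Continuing the ToyTablet sketch above, a minor compaction freezes the memtable into a new immutable SSTable (again, just an illustration):

    def minor_compaction(self):
        # Freeze the memtable as a new immutable SSTable and start fresh;
        # the commit log can then be truncated, which shortens recovery.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.log.clear()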

Page 23: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 24: Bigtable

Benchmarks for performance evaluation

• Scan:

– Scans over values in a row range.

• Random reads from memory.

• Random reads/writes:

– R keys, chosen at random, to be read/written, spread over N clients.

• Sequential reads/writes:

– Keys 0 to R-1 to be read/written, partitioned over N clients (see the sketch below).
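A rough sketch of the difference between the two key distributions (the R and N values are placeholders):

import random

R, N = 1_000_000, 4   # R keys spread over N clients (placeholder values)

def sequential_keys(client_id):
    # Sequential benchmarks: client i works on a contiguous slice of 0..R-1.
    return range(client_id * R // N, (client_id + 1) * R // N)

def random_keys(count):
    # Random benchmarks: each client draws keys uniformly from 0..R-1.
    return (random.randrange(R) for _ in range(count))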


Page 25: Bigtable

Performance evaluation

Scans show the best performance, since a tablet server can return many values over a single client RPC, amortizing the RPC overhead.

Page 26: Bigtable

Performance evaluation

Sequential reads are better than random reads, since each fetched block is cached and used to serve the next requests.

Page 27: Bigtable

Performance evaluation

Random reads show the worst performance: fetching a 64 KB block to read a single 1000-byte value transfers roughly 64x more data than is used.

Page 28: Bigtable

Performance evaluation

Aggregate throughput does not grow linearly with the number of tablet servers, but it scales well.

Page 29: Bigtable

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions


Page 30: Bigtable

Conclusions

• Bigtable: highly scalable and available, without compromising performance.

• Flexibility for Google – designed using their own data model.

• Custom design gives Google the ability to remove or minimize bottlenecks.

• Related work:

– Apache HBase (open source).

– Boxwood (though targeted at a lower, file-system level).


Page 31: Bigtable

Bigtable: A Distributed Storage System for Structured Data

Authors: Fay Chang et al.

Presenter: Zafar Gilani

Page 32: Bigtable

B+ Trees

• A tree with sorted data for:

– Efficient insertion, retrieval and removal of records.

• All records are stored at the leaf level; only keys are stored in interior nodes (see the sketch below).
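A minimal B+-tree-flavored lookup (a fixed two-level toy, illustrative only): interior keys route the search to a leaf, and the leaves hold the actual records.

import bisect

interior = ["g", "p"]            # interior node holds keys only
leaves = [                       # all records live in the leaves
    [("a", 1), ("c", 2)],        # keys < "g"
    [("g", 3), ("k", 4)],        # "g" <= keys < "p"
    [("p", 5), ("t", 6)],        # keys >= "p"
]

def lookup(key):
    # Route through the interior keys to the right leaf, then search it.
    leaf = leaves[bisect.bisect_right(interior, key)]
    return dict(leaf).get(key)

assert lookup("k") == 4 and lookup("z") is None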
