Bigtable

A Distributed Storage System for Structured Data

Authors: Fay Chang et al.

Presenter: Zafar Gilani

Outline

• Introduction

• Data model

• Implementation

• Performance evaluation

• Conclusions

A distributed storage system ..

• .. for managing structured data.

• Used for demanding workloads, such as:

– Throughput oriented batch processing.

– Serving latency-sensitive data to the client.

• Dynamic control over data layout and format, instead of a full relational model.

• Data locality properties (revisited briefly later).

Bigtable has achieved several goals

• Wide applicability: used for 60+ Google products, including:

– Google Analytics, Google Code, Google Earth, Google Maps and Gmail.

• Scalability (explained later under evaluation).

• High performance.

• High availability.

Data model

• Essentially a sparse, distributed, persistent multi-dimensional sorted map.

• The map is indexed by a row key, column key and a timestamp.

• Atomic reads and writes over a single row.
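The map described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`SparseTable`, `put`, `get`, `scan_rows`) are hypothetical and are not Bigtable's API, and the sketch ignores distribution and persistence.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}.
        # Cells that were never written take no space (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan_rows(self):
        """Return row keys in lexicographic order, as a sorted map would."""
        return sorted(self._rows)
```

Keeping every version under its timestamp is what makes the third index dimension visible: a read without a timestamp simply picks the newest one.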

Row and column range

• Row range dynamically partitioned into tablets.

• Data is kept in lexicographic order by row key.

• Allows clients to exploit data locality.

• Column keys grouped into column families.

• Data in a family is usually of the same type.

• Allows access control and disk or memory accounting per family.
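A minimal sketch of the two ideas on this slide, assuming a `family:qualifier` column key syntax and tablets described by their inclusive end keys. `TabletMap`, `tablet_for` and `split_column_key` are hypothetical names chosen for illustration.

```python
import bisect


def split_column_key(column_key):
    """Split 'family:qualifier' into (family, qualifier)."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


class TabletMap:
    """Maps a row key to the index of the tablet whose row range contains it."""

    def __init__(self, split_keys):
        # Assumption: split_keys are the inclusive end keys of every tablet
        # except the last one, which extends to the end of the table.
        self._split_keys = sorted(split_keys)

    def tablet_for(self, row_key):
        # bisect_left finds the first tablet whose end key is >= row_key.
        return bisect.bisect_left(self._split_keys, row_key)
```

Because rows are sorted, keys that share a prefix (for example reversed host names such as "com.bbc.www") tend to land in the same tablet, which is where the locality comes from.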

Enables reasoning about data locality

[Figure: row “com.bbc.www” with two columns, anchor:bbcworld.com and anchor:weather, holding the values “BBC” and “BBC.com”. Anchor is a column family. Row ranges are grouped into tablets.]

Bigtable uses several other technologies

• Google File System to store log and data files.

• SSTable file format to store Bigtable data.

• Chubby, a distributed lock service.

For more details on these technologies, refer to section 4 of the paper.

Implementation

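• Master responsibilities:

– Assign tablets to tablet servers.

– Track the addition and removal of tablet servers.

– Balance tablet server load.

– Garbage-collect files in GFS.

– Handle schema changes.

[Figure: architecture with MASTER, TABLET SERVERS, INTERNET and CLIENT. Clients communicate directly with tablet servers.]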

How is data stored?

A three-level hierarchy, similar to B+ trees.

Location hierarchy

• A Chubby file contains the location of the root tablet.

• The root tablet contains the locations of all tablets of the Metadata table.

• The Metadata table stores the locations of the actual (user) tablets.

• If the location of a tablet is unknown or incorrect, the client moves up the hierarchy (Metadata -> Root -> Chubby).
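The lookup path can be sketched as a client that caches locations and only climbs the hierarchy on a miss. This is a simplified illustration, not Bigtable's client library: `LocationClient` and its methods are hypothetical, plain dictionaries stand in for Chubby and for tablet contents, and a single Metadata tablet is assumed.

```python
class LocationClient:
    """Toy client resolving tablet locations: Chubby -> Root -> Metadata."""

    def __init__(self, chubby_file, tablets):
        self.chubby_file = chubby_file  # {"root": location of the root tablet}
        self.tablets = tablets          # location -> contents of the tablet there
        self.cache = {}                 # name -> cached location (may be stale)

    def _root_location(self):
        if "root" not in self.cache:
            # Top of the hierarchy: ask Chubby.
            self.cache["root"] = self.chubby_file["root"]
        return self.cache["root"]

    def _metadata_location(self):
        if "metadata" not in self.cache:
            # One level down: read the root tablet.
            root = self.tablets[self._root_location()]
            self.cache["metadata"] = root["metadata"]
        return self.cache["metadata"]

    def locate(self, user_tablet):
        if user_tablet not in self.cache:
            # Bottom level: read the Metadata tablet.
            metadata = self.tablets[self._metadata_location()]
            self.cache[user_tablet] = metadata[user_tablet]
        return self.cache[user_tablet]

    def invalidate(self, user_tablet):
        """Drop a location found to be incorrect, forcing a fresh lookup."""
        self.cache.pop(user_tablet, None)
```

Usage: after a tablet moves, the cached entry is stale until the client calls `invalidate` and looks it up again through the Metadata tablet.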

How is data served?

Tablet serving

• Compactions occur regularly. Advantages:

– Shrinks memory usage.

– Reduces the amount of data read from the log during recovery.
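Both advantages can be seen in a toy model of a tablet, assuming recent writes sit in an in-memory table backed by a commit log and are periodically frozen into an immutable sorted file. The `Tablet` class, its field names and the entry-count threshold are hypothetical and chosen only for illustration.

```python
class Tablet:
    """Toy tablet: commit log + in-memory table + immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # illustrative threshold, in entries
        self.commit_log = []  # records that would be replayed on recovery
        self.memtable = {}    # recent writes, held in memory
        self.sstables = []    # sorted lists of (key, value), newest last

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.compact()

    def compact(self):
        """Freeze the in-memory table into a sorted, immutable table."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}    # shrinks memory usage
        self.commit_log = []  # less log to read during recovery

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None
```

After a compaction the in-memory table and the log are both empty, so recovery only has to replay writes made since then.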

Benchmarks for performance evaluation

• Scan:

– Scans over values in a row range.

• Random reads from memory.

• Random reads/writes:

– R keys to be read/written spread over N clients.

• Sequential reads/writes:

– Keys 0 to R-1 are read/written, spread over N clients.

Performance evaluation

• Scan returns many values per RPC and shows the best performance.

• Sequential reads are better than random reads, since each fetched block is used to serve the next requests.

• Random reads show the worst performance: fetching a 64KB block for every 1000-byte value is expensive.

• Aggregate throughput does not scale linearly with the number of tablet servers, but it scales well.
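The cost of random reads can be made concrete with a little arithmetic. The sketch below assumes 64KB means 64 * 1024 bytes and ignores any per-entry overhead inside a block; the function names are hypothetical.

```python
BLOCK_SIZE_BYTES = 64 * 1024  # block fetched per uncached read (assumed 65536 bytes)
VALUE_SIZE_BYTES = 1000       # bytes actually needed per read


def read_amplification(block_bytes, value_bytes):
    """Bytes transferred per useful byte for one uncached random read."""
    return block_bytes / value_bytes


def values_served_per_block(block_bytes, value_bytes):
    """How many consecutive values one fetched block can serve sequentially."""
    return block_bytes // value_bytes
```

Under these assumptions a random read moves roughly 65 times more data than it uses, while a sequential reader gets about 65 values out of the same fetched block.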

Conclusions

• Bigtable: highly scalable and available, without compromising performance.

• Flexibility for Google – designed using their own data model.

• Custom design gives Google the ability to remove or minimize bottlenecks.

• Related work:

– Apache HBase (open source)

– Boxwood (though targeted at a lower/FS level)

B+ Trees

• A tree with sorted data for:

– Efficient insertion, retrieval and removal of records.

• All records are stored at the leaf level; only keys are stored in interior nodes.
