Omid: Efficient Transaction Management and Incremental Processing for HBase (Hadoop Summit Europe 2013)

Omid

Efficient Transaction Management and Incremental Processing for HBase

Copyright © 2013 Yahoo! All rights reserved. No reproduction or distribution allowed without express written permission.

Daniel Gómez Ferro 21/03/2013

About me

§ Research engineer at Yahoo!

§ Main Omid developer

§ Apache S4 committer

§ Contributor to: • ZooKeeper • HBase

Omid - Yahoo! Research 2

Outline

§ Motivation

§ Goal: transactions on HBase

§ Architecture

§ Examples

§ Future work

3 Omid - Yahoo! Research

Motivation

§ Traditional DBMS: • SQL • Transactions • Scalable

§ NoSQL stores: • SQL • Transactions • Scalable

§ Some use cases require • Scalability • Consistent updates


Motivation

§ Requested feature • Support for transactions in HBase is a common request

§ Percolator • Google implemented transactions on BigTable

§ Incremental Processing • MapReduce latencies measured in hours • Reduce latency to seconds • Requires transactions


Incremental Processing

State0 State1 Shared State

Offline MapReduce

Online Incremental Processing

Fault tolerance Scalability

Fault tolerance Scalability

Shared state

Using Percolator, Google dramatically reduced crawl-to-index time


Transactions on HBase

Omid

§ ‘Hope’ in Persian

§ Optimistic Concurrency Control system • Lock free • Aborts transactions at commit time

§ Implements Snapshot Isolation

§ Low overhead

§ http://yahoo.github.com/omid Omid - Yahoo! Research 8

Snapshot Isolation

§ Transactions read from their own snapshot

§ Snapshot: •  Immutable •  Identified by creation time • Values committed before creation time

§ Transactions conflict if two conditions apply: A)  Overlap in time B)  Write to the same row


Transaction conflicts

r1 r2 r3 r4 Time

§ Transaction B overlaps with A and C • B ∩ A = ∅ • B ∩ C = { r4 } • B and C conflict

§ Transaction D doesn’t conflict Omid - Yahoo! Research 10

Write set:

r3 r4

r2 r4

r1 r3

Id:

A

B

C

D

Time overlap:

Omid Architecture

Omid Clients Omid

Clients

Architecture

1.  Centralized server 2.  Transactional metadata replicated to clients 3.  Store transaction ids in HBase timestamps

HBase Region Servers

Status Oracle (SO)




(1) (2)

(3)


Omid Clients

SO

Status Oracle

§ Limited memory • Evict old metadata • … maintaining consistency guarantees!

§ Keep LowWatermark • Updated on metadata eviction • Abort transactions older than LowWatermark • Necessary for Garbage Collection


Transactional Metadata

Status Oracle

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4


Metadata Replication

§ Transactional metadata is replicated to clients

§ Status Oracle not in critical path • Minimize overhead

§ Clients guarantee Snapshot Isolation •  Ignore data not in their snapshot • Talk directly to HBase • Conflicts resolved by the Status Oracle

§ Expensive, but scalable up to 1000 clients


Omid Clients Omid

Clients

Architecture recap






Omid Clients

SO

Row LastCommitTS r1 T20 … …

StartTS CommitTS T10 T20 … …

Aborted T2 T18 …

LowWatermark T4

Status Oracle

Architecture + Metadata

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

ST CT 10 20

Abort …

LowW …

Client


Conflict detection (not replicated)

HBase

row TS val r1 10 hi

HBase values tagged with Start TimeStamp

Comparison with Percolator


Omid

Clients HBase

Status Oracle

Percolator

Clients BigTable

TS Server

Simple API

§ TransactionManager: Transaction begin(); void commit(Transaction t) throws RollbackException; void rollback(Transaction t);

§ TTable:

Result get(Transaction t, Get g); void put(Transaction t, Put p); ResultScanner getScanner(Transaction t, Scan s);


Example Transaction

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Status Oracle

R LC r1 20

ST CT 10 20

Abort …

LowW …

HBase

row TS val r1 10 hi

>> |


Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >>

t ST: 25

|


HBase

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >>

t ST: 25

|


HBase

row TS val r1 10 hi

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 20

ST CT 10 20

Abort …

LowW …

>> t = TM.begin() Transaction { ST: 25 } >> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >>

t ST: 25 R: {r1}

|


HBase

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

R LC r1 30

ST CT 25 30

Abort …

LowW 20

>> TTable.get(t, ‘r1’) ‘hi’ >> TTable.put(t, ‘r1’, ‘bye’) >> TM.commit(t) Committed{ CT: 30 } >>

t ST: 25 R: {r1}

|


HBase

row TS val r1 10 hi r1 25 bye

Status Oracle

ST CT 10 20

Abort …

LowW …

Client

Transaction Example

Abort 15

>> t_old Transaction{ ST: 15 } >> TT.put(t_old, ‘r1’, ‘yo’) >> TM.commit(t_old) Aborted >>

t_old ST: 15 R: {r1}


HBase

row TS val r1 15 yo r1 25 bye

R LC r1 30

ST CT 25 30

LowW …

|

Centralized Approach


Omid Performance

§ Centralized server = bottleneck? • Focus on good performance •  In our experiments, it never became the bottleneck

0

5

10

15

20

25

30

1K 10K 100K

Late

ncy

in m

s

Throughput in TPS

2 rows4 rows8 rows

16 rows32 rows

128 rows512 rows


512 rows/t 1K TPS 10 ms

2 rows/t 100K TPS

5 ms

Fault Tolerance

§ Omid uses BookKeeper for fault tolerance • BookKeeper is a distributed Write Ahead Log

§ Recovery: reads log’s tail (bounded time)

0

3000

6000

9000

12000

15000

18000

5900 5950 6000 6050 6100

Throughput in TPS

Time in s

Downtime < 25 s


Example Use Case News Recommendation System

News Recommendation System

§ Model-based recommendations • Similar users belong to a cluster • Each cluster has a trained model

§ New article • Evaluate against clusters • Recommend to users

§ User actions • Train model • Split cluster


Transactions Advantages

§ All operations are consistent • Updates and queries

§ Needed for a recommendation system? • We avoid corner cases •  Implementing existing algorithms is straightforward

§ Problems we solve • Two operations concurrently splitting a cluster • Query while cluster splits • …


Example overview


Index

Articles & User actions Queries

Index Data Store Wor

kers

Updates

Other use cases

§ Update indexes incrementally • Original Percolator use case

§ Generic graph processing

§ Adapt existing transactional solutions to HBase


Future Work

Current Challenges

§ Load isn’t distributed automatically

§ Complex logic requires big transactions • More likely to conflict • More expensive to retry

§ API is low level for data processing


Future work

§ Framework for Incremental Processing

• Simpler API

• Trigger-based, easy to decompose operations

• Auto scalable

§ Improve user friendliness • Debug tools, metrics, etc

§ Hot failover


Questions?

All time contributors Ben Reed

Flavio Junqueira Francisco Pérez-Sorrosal

Ivan Kelly Matthieu Morel

Maysam Yabandeh

Partially supported by CumuloNimbo project (ICT-257993)

funded by the European Commission

Try it! yahoo.github.com/omid

Backup


Commit Algorithm

§ Receive commit request for txn_i

§ Set CommitTimestamp(txn_i) <= new timestamp

§ For each modified row: •  if LastCommit(row) > StartTimestamp(txn_i) => abort

§ For each modified row: • Set LastCommit(row) <= CommitTimestamp(txn_i)


Technology

Omid: Efficient Transaction Management and Incremental Processing for HBase (Hadoop Summit Europe 2013)