62
T RANSACTIONS OVER HB ASE Alex Baranau @abaranau Gary Helmling @gario Continuuity

Transactions Over Apache HBase

Embed Size (px)

Citation preview

Page 1: Transactions Over Apache HBase

TRANSACTIONS OVER HBASEAlex Baranau @abaranau Gary Helmling @gario

Continuuity

Page 2: Transactions Over Apache HBase

WHO WE ARE • We’ve built Continuuity Reactor: the world’s first scale-out

application server for Hadoop

• Fast, easy development, deployment and management of Hadoop and HBase apps

• Continuuity team has years of experience in using and contributing to Open Source, and we intend to continue doing so.

���2

Page 3: Transactions Over Apache HBase

AGENDA • Transactions in stream processing: Why? What?

• Implementation: How? • Omid-style transactions explained

• Transaction Manager

• What’s next?

���3

Page 4: Transactions Over Apache HBase

THE REACTOR • Continuuity Reactor is an app platform built on Hadoop and HBase

• Collect, Process, Store, and Query data.

• A Flow is a real-time processor with exactly-once guarantee

• A flow is composed of flowlets, connected via queues

• All processing happens with ACID guarantees in transactions

���4

Page 5: Transactions Over Apache HBase

HBase Table

PROCESSING IN A FLOW

���5

...Queue ...

...

Flowlet

... ...

Page 6: Transactions Over Apache HBase

HBase Table

PROCESSING IN A FLOW

���6

...Queue ...

...

Flowlet

... ...

Page 7: Transactions Over Apache HBase

HBase Table

PROCESSING IN A FLOW

���7

...Queue ...

...

Flowlet

Page 8: Transactions Over Apache HBase

TRANSACTIONS: WHAT?• Atomic - Entire transaction is committed as one

• Consistent - No partial state change due to failure

• Isolated - No dirty reads, transaction is only visible after commit

• Durable - Once committed, data is persisted reliably

���8

Page 9: Transactions Over Apache HBase

WHAT ABOUT HBASE?• Atomic operations on cell value:

checkAndPut, checkAndDelete, increment, append

• Atomic batch of operations on rows within region

���9

• No cross region atomic operations support

• No cross table atomic operations support

• No multi-RPC atomic operations support

Page 10: Transactions Over Apache HBase

IMPLEMENTATION OVERVIEW

���10

Page 11: Transactions Over Apache HBase

OMID-STYLE TRANSACTIONS • Multi-Version Concurrency Control

• Cell version (timestamp) = transaction ID

• All writes in the same transaction use the transaction ID as timestamp

• Reads exclude other, uncommitted transactions (for isolation)

• Optimistic Concurrency Control

• Conflict detection at commit of transaction

• Write Conflict: two overlapping transactions write the same row

• Rollback of one transaction in case of conflict (whichever commits later)

���11

Page 12: Transactions Over Apache HBase

OPTIMISTIC CONCURRENCY CONTROL

• Avoids cost of locking rows and tables

• No deadlocks or lock escalations

• Cost of conflict detection and possible rollback is higher

• Good if conflicts are rare: short transaction, disjoint partitioning of work

���12

Page 13: Transactions Over Apache HBase

ZooKeeper

TRANSACTIONS IN CONTEXT

���13

Tx Manager (standby)

HBase

Master 1

Master 2

RS 1

RS 2 RS 4

RS 3

Client 1

Client 2

Client N

Tx Manager (active)

Page 14: Transactions Over Apache HBase

TRANSACTION LIFE CYCLE

time out

try abort

failed

roll back in HBase

write to

HBasedo work

Client Tx Manager

none

complete Vabortsucceeded

in progress

start txstart

start tx

committry commit check conflicts

RPC API

invalid Xinvalidate

failed

Page 15: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���15

Cell TS Value

row1:col1 1001 10

Tx Manager

Client 1

Client 2

write = 1002 read = 1001

Page 16: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���16

Cell TS Value

row1:col1 1001 10

Tx Manager

Client 1

start

write = 1002 read = 1001

Client 2

write = 1002 read = 1001

Page 17: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���17

Cell TS Value

row1:col1 1001 10

Tx Manager

Client 1

start

write = 1002 read = 1001

Client 2

write = 1003 read = 1001

inprogress=[1002]

Page 18: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���18

Cell TS Value

row1:col1 1001 10

Tx Manager

Client 1increment

write = 1002 read = 1001

Client 2

write = 1003 read = 1001

inprogress=[1002]

Page 19: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���19

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

Tx Manager

Client 1 increment

write = 1002 read = 1001

Client 2

write = 1003 read = 1001

inprogress=[1002]

Page 20: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���20

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

Tx Manager

Client 1 start

write = 1002 read = 1001

Client 2

write = 1003 read = 1001

inprogress=[1002]

write = 1003 read = 1001

excluded=[1002]

Page 21: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���21

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

Tx Manager

Client 1 start

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002, 1003]

write = 1003 read = 1001

excluded=[1002]

Page 22: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���22

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

increment

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002, 1003]

write = 1003 read = 1001

excluded=[1002]

Page 23: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���23

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

commit

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002, 1003]

write = 1003 read = 1001

excluded=[1002]

Page 24: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���24

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

commit

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

write = 1003 read = 1001

excluded=[1002]

Page 25: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���25

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

commit

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 26: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���26

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

conflict!

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 27: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���27

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1 rollback

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 28: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���28

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1 rollback

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 29: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���29

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1

abort

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 30: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���30

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1

abort

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[]

Page 31: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���31

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1

abort

write = 1002 read = 1001

Client 2

write = 1004 read = 1003

inprogress=[]

Page 32: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���32

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1 start

Client 2

write = 1005 read = 1003

inprogress=[]

write = 1004 read = 1003

Page 33: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���33

Cell TS Value

row1:col1 1001 10

row1:col1 1003 11

Tx Manager

Client 1

read

Client 2

write = 1005 read = 1003

inprogress=[]

write = 1004 read = 1003

Page 34: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���34

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

conflict!

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 35: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���35

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1 rollback

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 36: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���36

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1 rollback failed

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 37: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���37

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

invalidate

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 38: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���38

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

invalidate

write = 1002 read = 1001

Client 2

write = 1004 read = 1003

inprogress=[] invalid=[1002]

Page 39: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���39

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1 start

Client 2

write = 1005 read = 1003

inprogress=[] invalid=[1002]

write = 1004 read = 1003

exclude = [1002]

Page 40: Transactions Over Apache HBase

HBase

CLIENT SIDE: TX AWARE

���40

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1

read

Client 2

write = 1005 read = 1003

inprogress=[] invalid=[1002]

write = 1004 read = 1003

exclude = [1002]

invisible!

Page 41: Transactions Over Apache HBase

TRANSACTION MANAGER• Create new transactions

• Provides monotonically increasing write pointers

• Maintains all in-progress, committed, and invalid transactions

• Detect conflicts

• Transaction = Write Pointer: Timestamp for HBase writes

Read pointer: Upper bound timestamp for reads

Excludes: List of timestamps to exclude from reads

���41

Page 42: Transactions Over Apache HBase

TRANSACTION MANAGER• Simple & Fast

• All required state is in-memory

• Single point of failure? • Persist all state to a write-ahead log

• Secondary Tx Manager watches for failure of Primary

• Failover can happen quickly

���42

Page 43: Transactions Over Apache HBase

TRANSACTION MANAGER

���43

Tx ManagerCurrent State

in progress

committed

invalid

read point

write point

start()

Page 44: Transactions Over Apache HBase

TRANSACTION MANAGER

���44

Tx ManagerCurrent State

in progress (+)

committed

invalid

read point

write point ++

start()Tx Log

started, <write pt>

HDFS

Page 45: Transactions Over Apache HBase

TRANSACTION MANAGER

���45

Tx ManagerCurrent State

in progress (-)

committed (+)

invalid

read point

write pointcommit()

Tx Logstart, <write pt>

commit, <write pt>

HDFS

Page 46: Transactions Over Apache HBase

TRANSACTION SNAPSHOTS

• Write-ahead log provides persistence • Guarantees point-in-time recovery

• Longer the log grows, longer recovery takes

• Periodically write snapshot of full transaction state • Snapshot + all new logs provides full state

���46

Page 47: Transactions Over Apache HBase

Tx ManagerCurrent State

TRANSACTION SNAPSHOTS

���47

Tx Log A

in progress

committed

invalid

read point

write point

HDFS

Page 48: Transactions Over Apache HBase

Tx ManagerCurrent State

TRANSACTION SNAPSHOTS

���48

Tx Log A

in progress

committed

invalid

read point

write point Tx Log B1

HDFS

Page 49: Transactions Over Apache HBase

TRANSACTION SNAPSHOTS

���49

Tx Log ATx Manager

in progress

committed

invalid

read point

write point

Current State

State Snapshot

in progress

committed

invalid

read point

write point

Tx Log B2

HDFS

Page 50: Transactions Over Apache HBase

TRANSACTION SNAPSHOTS

���50

Tx Log ATx Manager

in progress

committed

invalid

read point

write point

Current State

State Snapshot

in progress

committed

invalid

read point

write point

Tx Log B

Tx Snapshot

in progress

committed

invalid

read point

write point

3

HDFS

Page 51: Transactions Over Apache HBase

TRANSACTION SNAPSHOTS

���51

Tx Log ATx Manager

in progress

committed

invalid

read point

write point

Current State

State Snapshot

in progress

committed

invalid

read point

write point

Tx Log B

Tx Snapshot

in progress

committed

invalid

read point

write point

4

HDFS

Page 52: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP

���52

Cell TS Value

row1:col1 1001 10

row1:col1 1002 11

row1:col1 1003 11

Tx Manager

Client 1 rollback failed

write = 1002 read = 1001

Client 2

write = 1004 read = 1001

inprogress=[1002]

Page 53: Transactions Over Apache HBase

TRANSACTION CLEANUP: DATA JANITOR

• RegionObserver coprocessor

• Maintains in-memory snapshot of recent invalid & in-progress sets

• Periodically updates from transaction snapshot in HDFS

• Purges data from invalid transactions and older versions on flush & compaction

���53

Page 54: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���54

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

refresh

Data Janitor (RegionObserver)

MemStore

preFlush()

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Cell TS Valuerow1:col1 1004 12

1003 111002 11

Custom RegionScanner

HFileCell TS Value

Page 55: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���55

Data Janitor (RegionObserver)

HFileCell TS Value

Custom RegionScanner

MemStore

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

preFlush()

Cell TS Valuerow1:col1 1004 12

1003 111002 11

Page 56: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���56

Data Janitor (RegionObserver)

HFileCell TS Value

row1:col1 1004 12Custom RegionScanner

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

MemStore

preFlush()

Cell TS Valuerow1:col1 1004 12

1003 111002 11

Page 57: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���57

Data Janitor (RegionObserver)

Custom RegionScanner

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

MemStore

preFlush()

Cell TS Valuerow1:col1 1004 12

1003 111002 11

HFileCell TS Value

row1:col1 1004 12

Page 58: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���58

Data Janitor (RegionObserver)

Custom RegionScanner

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

MemStore

preFlush()

Cell TS Valuerow1:col1 1004 12

1003 111002 11

HFileCell TS Value

row1:col1 1004 121003 11

Page 59: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���59

Data Janitor (RegionObserver)

Custom RegionScanner

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

MemStore

preFlush()

Cell TS Valuerow1:col1 1004 12

1003 111002 11

HFileCell TS Value

row1:col1 1004 121003 11

Page 60: Transactions Over Apache HBase

HBase

TRANSACTION CLEANUP: DATA JANITOR

���60

Data Janitor (RegionObserver)

read point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Tx Snapshotread point = 1003

write point = 1005

in progress = [1004]

committed = []

invalid = [1002]

Custom RegionScanner

preFlush()

MemStore

Cell TS Valuerow1:col1 1004 12

1003 111002 11

HFileCell TS Value

row1:col1 1004 121003 11

Page 61: Transactions Over Apache HBase

WHAT’S NEXT?• Open Source

• Continue Scaling Tx Manager • Transaction Groups?

• Integration across other transactional stores

���61

Page 62: Transactions Over Apache HBase

QS?Looking for the chance to work with a team that is

defining a new category within Big Data?

!

We are hiring! http://continuuity.com/careers

[email protected]

���62