Lock-free transactional support for large-scale storage systems Flavio Junqueira, Benjamin Reed, Maysam Yabandeh Yahoo! Research June 2011

Talk at the 7th Workshop on Hot Topics in Systems Dependability (HotDep 2011)

Page 1: Retso hotdep-2011

Lock-free transactional support for large-scale storage systems

Flavio Junqueira, Benjamin Reed, Maysam Yabandeh

Yahoo! Research

June 2011

Page 2: Retso hotdep-2011

Big data

• Large data sets

✓ Unstructured, semi-structured data

✓ Critical for business logic

• Examples of such data

✓ Web logs, server logs, social media, etc.

Page 3: Retso hotdep-2011

Big data

+43% clicks vs. editor selected

+160% clicks vs. one-size-fits-all

Eric Baldeschwieler @IBM Big Data, May 2011

Page 4: Retso hotdep-2011

Big data: Hadoop

Eric Baldeschwieler @IBM Big Data, May 2011

Page 5: Retso hotdep-2011

Background

• Database generations in batches

• Online concurrent updates

[Diagram. Batch: InputDB → hours of MapReduce → OutputDB, repeated for each new Input. Online: concurrent Input txns update the OutputDB.]

Require transactional support

e.g., HBase, HDFS

Page 6: Retso hotdep-2011

Examples

• Mutable tables

• Various indexes: Web, news, shopping, coupons

• User and content models

• Characteristics

✓ Concurrency

✓ Losing updates is undesirable

✓ There are concurrent reads and they must be consistent

Page 7: Retso hotdep-2011

Semantics

• Read only previously committed values

[Timeline diagram: writes w(x,v) and w(x,v’); the transaction (Txn) reads r(x) = v]

Page 8: Retso hotdep-2011

Semantics

• No concurrent writes to the same row

[Timeline diagram: concurrent transactions (Txn) issue w(x,v’) and w(x,v) on the same row]

At least one must abort

Page 9: Retso hotdep-2011

Snapshot Isolation

• Known in the database realm

• Conflicting transactions

✓ Write to the same element (e.g., row)

✓ Their time ranges between start and commit overlap

• Efficient implementation by versioning
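The conflict rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Txn` record and the `conflicts` helper are hypothetical names, and timestamps are assumed to be integers taken from a single logical clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record: timestamps plus the set of rows written."""
    start_ts: int
    commit_ts: int
    write_set: frozenset


def conflicts(a: Txn, b: Txn) -> bool:
    """Return True if a and b conflict under snapshot isolation.

    Both conditions from the slide must hold: the transactions write to a
    common row, and their start-to-commit time ranges overlap.
    """
    share_row = bool(a.write_set & b.write_set)
    overlap = a.start_ts < b.commit_ts and b.start_ts < a.commit_ts
    return share_row and overlap


t1 = Txn(start_ts=1, commit_ts=5, write_set=frozenset({"x"}))
t2 = Txn(start_ts=3, commit_ts=7, write_set=frozenset({"x", "y"}))
t3 = Txn(start_ts=6, commit_ts=8, write_set=frozenset({"x"}))

print(conflicts(t1, t2))  # True: both write x and their ranges overlap
print(conflicts(t1, t3))  # False: t3 starts after t1 commits
```

Read-only transactions have an empty write set in this model, so they never conflict with anything.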

Page 10: Retso hotdep-2011

Locks?

• Previous approaches: Lock data to modify [Percolator, OSDI’10]

✓ Convoy effect

✓ Delays of several seconds

✓ Higher overhead on data servers

• Our approach

✓ Lock-free, centralized transaction manager

✓ Single point of failure, potential bottleneck?

Page 16: Retso hotdep-2011

Transaction Status Oracle

• Single process

✓ Processes client inquiries about transactions

✓ Includes a timestamp oracle

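[Diagram: a Client interacts with the TSO, which contains the timestamp oracle (TO), and with two data servers, DB1 and DB2. The TSO keeps state about committed rows.]

① Client obtains its start timestamp, Ts, from the TSO

② Client reads r(r1) from DB1, which returns v1, Ts(txnw), ⊥, where Ts(txnw) < Ts(txnr); client writes w(r2, v2, Ts(txnr)) to DB2, which returns ACK

③ Client asks the TSO: Tc(txnw) < Ts(txnr)?

④ Client asks the TSO to commit r2

⑤ Client sends Cleanup(r2, txnr)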
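The oracle side of steps ①, ③, and ④ can be sketched as a toy, single-process Python class. This is an illustration under assumptions rather than the ReTSO implementation: the class and method names are hypothetical, the commit check is the usual first-committer-wins rule for snapshot isolation (the slide does not spell it out), and the client-side reads, writes, and cleanup against DB1 and DB2 are not modeled.

```python
class StatusOracle:
    """Toy status oracle with an embedded timestamp oracle (hypothetical API)."""

    def __init__(self):
        self._clock = 0          # timestamp oracle state
        self._last_commit = {}   # row -> commit timestamp of its last writer
        self._commit_ts = {}     # start timestamp -> commit timestamp

    def _next_ts(self):
        self._clock += 1
        return self._clock

    def begin(self):
        """Step 1: hand out a start timestamp."""
        return self._next_ts()

    def committed_before(self, writer_start_ts, reader_start_ts):
        """Step 3: did the writer commit before the reader started?"""
        tc = self._commit_ts.get(writer_start_ts)
        return tc is not None and tc < reader_start_ts

    def commit(self, start_ts, written_rows):
        """Step 4: commit unless a written row was committed after start_ts."""
        for row in written_rows:
            if self._last_commit.get(row, 0) > start_ts:
                return None      # write-write conflict: abort
        tc = self._next_ts()
        for row in written_rows:
            self._last_commit[row] = tc
        self._commit_ts[start_ts] = tc
        return tc


tso = StatusOracle()
t1 = tso.begin()
t2 = tso.begin()
c1 = tso.commit(t1, ["r2"])   # succeeds
c2 = tso.commit(t2, ["r2"])   # aborts: r2 was committed after t2 started
t3 = tso.begin()
print(c1, c2, tso.committed_before(t1, t3), tso.committed_before(t1, t2))
# 3 None True False
```

No locks are taken on the data servers in this sketch; the only shared decision point is the `commit` call on the oracle.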

Page 17: Retso hotdep-2011

ReTSO: Design choices

• TSO

✓ Keeps state of modified rows

• In-memory state

✓ Highest commit timestamp of all garbage-collected rows

• Auto-GC Hash map

✓ Lazy garbage-collection

✓ Upon a hit
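One way to read these design choices is as a fixed-size table that evicts an old entry only when a new row lands on its slot, while remembering the highest commit timestamp it has thrown away. The sketch below is an assumption-laden illustration of that reading, not the ReTSO data structure: `AutoGcCommitTable`, `record`, `may_conflict`, and `t_max` are hypothetical names, and the one-entry-per-slot layout is chosen only to keep the example short.

```python
class AutoGcCommitTable:
    """Fixed-size table of row -> last commit timestamp with lazy eviction."""

    def __init__(self, slots=1024):
        self._slots = [None] * slots   # each slot holds (row, commit_ts) or None
        self.t_max = 0                 # highest commit ts among evicted rows

    def _index(self, row):
        return hash(row) % len(self._slots)

    def record(self, row, commit_ts):
        """Store a commit; hitting a slot owned by another row evicts it."""
        i = self._index(row)
        old = self._slots[i]
        if old is not None and old[0] != row:
            self.t_max = max(self.t_max, old[1])   # lazy garbage collection
        self._slots[i] = (row, commit_ts)

    def may_conflict(self, row, start_ts):
        """True if row may have been committed after start_ts."""
        entry = self._slots[self._index(row)]
        if entry is not None and entry[0] == row:
            return entry[1] > start_ts
        # No exact state for this row: answer conservatively from t_max.
        return self.t_max > start_ts


table = AutoGcCommitTable(slots=1)   # one slot forces every row to collide
table.record("a", 5)
table.record("b", 9)                 # evicts "a" and raises t_max to 5
print(table.t_max)                   # 5
print(table.may_conflict("a", 3))    # True: "a" was evicted, but 5 > 3
print(table.may_conflict("a", 6))    # False: nothing evicted is newer than 6
```

The trade-off in this reading is that memory stays bounded at the cost of occasional unnecessary aborts: a row that was never written can still be reported as a possible conflict when `t_max` exceeds the transaction's start timestamp.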

Page 18: Retso hotdep-2011

ReTSO: Increasing dependability

• Remote write-ahead log

[Diagram: ReTSO receives inquiries and sends updates to the WAL (e.g., NFS, BookKeeper [http://zookeeper.apache.org/bookkeeper]), alongside a backup ReTSO (warm or cold)]

Writes to WAL are synchronous but do not block other txns
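One possible reading of "synchronous but do not block other txns" is that each transaction waits for its own log record before it gets a reply, while the oracle keeps serving other transactions in the meantime. The sketch below illustrates that reading with Python's `asyncio`; it is not the ReTSO or BookKeeper API. `SimulatedRemoteWal`, `commit`, and `run_demo` are hypothetical names, and the remote write is simulated with a sleep.

```python
import asyncio
import time


class SimulatedRemoteWal:
    """Stand-in for a remote write-ahead log (hypothetical, in-memory)."""

    def __init__(self, delay_s=0.05):
        self.delay_s = delay_s
        self.records = []

    async def append(self, record):
        await asyncio.sleep(self.delay_s)   # stands in for the network round trip
        self.records.append(record)


async def commit(wal, txn_id, replies):
    await wal.append(("commit", txn_id))    # this txn waits for its own log write
    replies.append(txn_id)                  # reply only after the record is logged


async def run_demo(n_txns=5, delay_s=0.05):
    wal = SimulatedRemoteWal(delay_s)
    replies = []
    started = time.monotonic()
    # All commits are in flight at once; none waits for another's log write.
    await asyncio.gather(*(commit(wal, i, replies) for i in range(n_txns)))
    return wal.records, replies, time.monotonic() - started


records, replies, elapsed = asyncio.run(run_demo())
print(len(records), sorted(replies), f"{elapsed:.2f}s")
```

In this simulation the five commits finish in roughly one log-write delay rather than five, because the waits overlap.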

Page 19: Retso hotdep-2011

Preliminary results

• Coded in Java

✓ Except for hash map (C++ with JNI interface)

• Uses BookKeeper for WAL

• 10 identical servers

✓ 2.13 GHz dual-core Intel Xeon

✓ 4 GB of RAM

✓ 1 Gigabit interfaces

Page 20: Retso hotdep-2011

Preliminary results

• Average throughput observed

✓ 3 clients, 1,000 concurrent transactions

✓ 81k TPS

• Average latency

✓ 1 client, 1 txn

✓ 0.87 ms (with WAL)

✓ 0.17 ms (without WAL)

Page 21: Retso hotdep-2011

Preliminary results

• Increasing the load of the system

✓ 1 to 16 clients

✓ Max is 72k TPS

[Plot: latency in ms (y-axis, 0 to 18) vs. throughput in TPS (x-axis, 20000 to 120000); series: ReTSO, WAL-disabled]

Page 22: Retso hotdep-2011

What’s baking?

• Integration

✓ HBase

✓ Query engine

• Real workloads

Page 23: Retso hotdep-2011

Summary

• Transaction management for large-scale data repositories

• Lock-based vs. Lock-free

✓ ReTSO is lock-free and dependable

✓ Reduced load on storage nodes

✓ Low latency despite faults

• Performance sufficient for realistic applications
