36
Improved Single Global Lock Fallback for Besteffort Hardware Transactional Memory Irina Calciu Tatiana Shpeisman Gilles Pokam Maurice Herlihy TRANSACT 2014

Improved Single Global Lock Fallback for Best-effort Hardware

  • Upload
    buidat

  • View
    216

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Improved Single Global Lock Fallback for Best-effort Hardware

Improved  Single  Global  Lock  Fallback  for  Best-­effort  Hardware  

Transactional  Memory

Irina  CalciuTatiana  ShpeismanGilles  PokamMaurice  Herlihy

TRANSACT   2014

Page 2: Improved Single Global Lock Fallback for Best-effort Hardware

Multicore  Performance  Scaling

2

Page 3: Improved Single Global Lock Fallback for Best-effort Hardware

Intel’s  Haswell TSX: RTM  &  HLE

3

Low  overhead  (cache  based)

IBM’s  Blue  Gene/Q  &  System  Z  &  Power  Architecture

Hardware  Transactional  Memory  (HTM)

Page 4: Improved Single Global Lock Fallback for Best-effort Hardware

Haswell RTM

if  (_xbegin()  ==  _XBEGIN_STARTED)

_xend()

Speculate  Execution

Speculate  Execution,  without  any  locks

Read  and  Write  Sets

4

Abort  on  memory  conflict

else

Abort  Handler

Page 5: Improved Single Global Lock Fallback for Best-effort Hardware

Haswell RTM

5

_xbegin()

_xend()

Read  X

Write  Y

Add  to  Read  Set

Add  to  Write  Set

_xbegin()

_xend()

Write  X

Write  YAdd  to  Write  Set

Make  the  change  to  Y  visibleCOMMIT

Add  to  Write  SetABORT

if  (_xbegin()  ==  _XBEGIN_STARTED)

_xend()

Speculate  Execution

Page 6: Improved Single Global Lock Fallback for Best-effort Hardware

Lock Elision

<HLE_Aquire_Prefix>  Lock(L)

<HLE_Release_Prefix>  Release(L)

Atomic  region  executed  as  a  transaction or  mutually  exclusive on  L

Execute  optimistically,  without  any  locks

Track  Read  and  Write  Sets

6

Abort  on  memory  conflict:  rollback  acquire  lock

Page 7: Improved Single Global Lock Fallback for Best-effort Hardware

[Anand Tech]7

Page 8: Improved Single Global Lock Fallback for Best-effort Hardware

Best-­effort

OverflowUnsupported  InstructionsInterrupts

Conflicts

8

Small  &  Medium  Transactions

Haswell RTM

Needs  software  fallback

Page 9: Improved Single Global Lock Fallback for Best-effort Hardware

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

9

Page 10: Improved Single Global Lock Fallback for Best-effort Hardware

Try_SPEC:Wait until Lock is freeTransactional_Read(Lock)If Lock is taken ABORTSpeculate critical sectionEnd speculation

Single Global Lock HyTM (simple and common)

10

EndHW txn

BeginHW txnRead L

Begin SW txn

Acquire L

Release LEnd

SW txn

On_ABORT:If try_lock(Lock)

Critical sectionRelease(Lock)

Else Try_SPEC

Does not abort!

Page 11: Improved Single Global Lock Fallback for Best-effort Hardware

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txnRead L

EndHW txn

(1)

BeginHW txnRead L

EndHW txn

(2)

BeginHW txnRead L

BeginHW txnRead L

EndHW txn

(3) EndHW txn

(4)

XX

X

X

Legend:X = ABORT

Single Global Lock HyTM (simple and common)

Time

11

Page 12: Improved Single Global Lock Fallback for Best-effort Hardware

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Acquire(L)

Release(L)

CRITICAL  SECTION(SW  TXN)

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Time

Thread  1 Thread  2

Execution   Time  1 12

Page 13: Improved Single Global Lock Fallback for Best-effort Hardware

Thread  1 Thread  2

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Acquire(L)

Release(L)

CRITICAL  SECTION(SW  TXN)

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Begin_HW_TXN (L)

End_HW_TXN (L)

CRITICAL  SECTION

Execution   Time  1

Time

Execution   Time  2

13

Page 14: Improved Single Global Lock Fallback for Best-effort Hardware

Try_SPEC:Speculate critical sectionTransactional_Read(Lock)If Lock is taken ABORTEnd speculation

Lazy SGL

1414

Begin SW txn

Acquire L

Release LEnd

SW txn

On_ABORT:If try_lock(Lock)

Critical sectionRelease(Lock)

Else Try_SPEC

Does not abort!

Read LEnd

HW txn

BeginHW txn

Page 15: Improved Single Global Lock Fallback for Best-effort Hardware

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Read LEnd

HW txn(1)

BeginHW txn

Read LEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

XX

Legend:X = ABORT

COMMIT

COMMIT

Lazy SGL

Time

15

Page 16: Improved Single Global Lock Fallback for Best-effort Hardware

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

16

Page 17: Improved Single Global Lock Fallback for Best-effort Hardware

Transactional Memory Correctness

Transaction 1SW

Transaction 2HW

Time

Order T2 AFTER T1

Order T2 BEFORE T1

COMMIT

COMMIT

17

Page 18: Improved Single Global Lock Fallback for Best-effort Hardware

Thread  1(SW)

Acquire Lock…

X = a

Release Lock

TXN_BEGIN

X = b…

TXN_END

Thread  2(HW)

Correct: a Actual: b

Time

Case  1:  HW  begins  SW  begins  HW  ends  SW  ends

X value:

Check Lock

ABORT

18

Page 19: Improved Single Global Lock Fallback for Best-effort Hardware

Acquire Lock…

X = a

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Thread  1(SW)

Thread  2(HW)

Case  2:  SW  beginsHW  beginsHW  endsSW  ends

Correct: a Actual: b

Time

Check Lock

ABORT

X value:

19

Page 20: Improved Single Global Lock Fallback for Best-effort Hardware

Acquire Lock…

X = a…

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Case  3:  SW  beginsHW  beginsSW  endsHW  ends

Thread  1(SW)

Thread  2(HW)

TimeX value:

Correct: b Actual: b

Check Lock

COMMIT

20

Page 21: Improved Single Global Lock Fallback for Best-effort Hardware

Acquire Lock…

X = a…

Release Lock

TXN_BEGIN

X = b…

TXN_END

Case  4:  HW  beginsSW  beginsSW  endsHW  ends

Thread  1(SW)

Thread  2(HW)

TimeX value:

Correct: b Actual: b

Check Lock

COMMIT21

Page 22: Improved Single Global Lock Fallback for Best-effort Hardware

22

Thread  1(SW)

X = 5; Y = 6Acquire Lock

…++X

++Y…

Release Lock

TXN_BEGIN

Z = 1/(Y-X)

TXN_END

Thread  2(HW)

Z = 1/0 !!!Time

Hardware  Sandboxing

Page 23: Improved Single Global Lock Fallback for Best-effort Hardware

Indirect  Jumps

Thread  1(SW)

X = 5; Y = 6Acquire Lock

…++X

++Y…

Release Lock

_xbegin

if (X == Y) *p = garbagep()

…if (lock) abort_xend

Thread  2(HW)

_xend

Time

23

Page 24: Improved Single Global Lock Fallback for Best-effort Hardware

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

24

Page 25: Improved Single Global Lock Fallback for Best-effort Hardware

0

0.5

1

1.5

2

2.5

3

3.5

1 2 4 8

Spee

dup

Threads

Ssca2  (small  txns)

00.51

1.52

2.53

3.54

1 2 4 8

Spee

dup

Threads

Labyrinth  (large  txns)

25

Intruder  (medium  txns)

0

0.5

1

1.5

2

2.5

3

1 2 4 8

Speedup

Threads

TL2SGLHLEE-­SGLL-­SGL

Better

Page 26: Improved Single Global Lock Fallback for Best-effort Hardware

Improved  Lock  Acquisition  Rate

26

Vacation  Low  (medium  txns)

Kmeans High  (small  txns)

Intruder  (medium  txns)

Labyrinth  (large  txns)

0

5

10

15

20

25

30

1 2 4 8

%  lock  acquisitions

Threads

0

10

20

30

40

50

60

70

1 2 4 8

%  lock  acquisitions

Threads

051015202530354045

1 2 4 8

%  lock  acquisitions

Threads

HLE

E-­SGL

L-­SGL

0

10

20

30

40

50

60

70

80

1 2 4 8

%  lock  acquisitions

Threads

HLE

E-­SGL

L-­SGL

Better

Page 27: Improved Single Global Lock Fallback for Best-effort Hardware

No  single  thread  overhead

27

Slowdown  relative  to  sequential  for  1  thread

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Slowdown

TL2SGLHLEE-­SGLL-­SGL

Page 28: Improved Single Global Lock Fallback for Best-effort Hardware

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results

28

Page 29: Improved Single Global Lock Fallback for Best-effort Hardware

Bloom Filters

• Efficient  probabilistic  data  structure  to  compute  fast  set  intersection  

• Can  admit  false  positives

• No  false  negatives

• Used  in  TM  for  Conflict  Detection

29

Page 30: Improved Single Global Lock Fallback for Best-effort Hardware

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Read LEnd

HW txn(1)

BeginHW txn

Read LEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

XX

Legend:X = ABORT

COMMIT

COMMIT

Lazy SGL

Time

30

Page 31: Improved Single Global Lock Fallback for Best-effort Hardware

Begin SW txn

Acquire L

Release LEnd

SW txn

BeginHW txn

Check BFEnd

HW txn(1)

BeginHW txn

Check BFEnd

HW txn(2)

BeginHW txn

BeginHW txn

Read LEnd

HW txn(3)

Read LEnd

HW txn(4)

Legend:X = ABORT

COMMIT

COMMIT

BF SGL

Time

31

Page 32: Improved Single Global Lock Fallback for Best-effort Hardware

Thread  1(SW)

Acquire Lock…

X = a

Release Lock

TXN_BEGIN

X = b…

TXN_END

Thread  2(HW)

Time

Case  1:  HW  begins  SW  begins  HW  ends  SW  ends

X value:a

Correct: a Actual: a

Check BF

If BFs intersect: ABORTElse: COMMIT

32

Page 33: Improved Single Global Lock Fallback for Best-effort Hardware

Acquire Lock…

X = a

Release Lock

TXN_BEGIN…

X = b…

TXN_END

Thread  1(SW)

Thread  2(HW)

Case  2:  SW  beginsHW  beginsHW  endsSW  ends

Correct: a Actual: b

TimeX value:

Check BF

If BFs intersect: ABORTElse: COMMIT 33

Page 34: Improved Single Global Lock Fallback for Best-effort Hardware

Conclusions

• HTMs  are  becoming  more  available

• Best-­effort  – need  software  fallback

• Eager  SGL• simple  and  fast  fallback,  • often  preferred  to  more  efficient  solutions

34

Page 35: Improved Single Global Lock Fallback for Best-effort Hardware

Conclusions

• Lazy  SGL  • as  simple  as  Eager  SGL• more  efficient

• Bloom  Filter  SGL  • more  accurate  conflict  detection• Slower

• Can  be  implemented  directly  in  hardware

35

Page 36: Improved Single Global Lock Fallback for Best-effort Hardware