TxLinux: Using and Managing Hardware Transactional Memory in an Operating System
Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett Witchel
Presentation by Sathish P

Discussion Topics

◦ What is TxLinux?
◦ What are the contributions of this paper?
◦ Locks vs. Transactions
◦ Transactional Memory
◦ Hardware Transactional Memory
◦ MetaTM
◦ The Output Commit Problem
◦ Cooperative Transactional Locking
◦ Implementing Cxspinlocks
◦ Priority and Policy Inversion
◦ Transaction-Aware Scheduling
◦ Contention Management Performance
◦ Conclusion

What is TxLinux?

TxLinux is a variant of Linux. It is the first OS to use Hardware Transactional Memory (HTM) as a synchronization primitive.

It is also the first OS to manage HTM in the OS scheduler.

Contributions of this Paper

The paper introduces cooperation between locks and transactions, through a primitive called cxspinlocks.

The paper integrates HTM with the OS scheduler.

Locks

    spin_lock(&aryList);
    offset = aryList[first];
    if (aryList[first] == aryList[last])
        aryList[last] = 0;
    spin_unlock(&aryList);
    if (!(calculateIfAnyZero()))
        goto failed;
    spin_lock(&aryList);
    list_add_tail(val, &aryList);
    spin_unlock(&aryList);

◦ Only one thread holds the lock at a time; other threads spin, waiting for it.
◦ Critical regions must be kept small, which limits concurrency.
◦ Locks work well under high contention and with I/O.
◦ Locking is pessimistic: it serializes the critical section even when no conflict would have occurred.
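To make the locking pattern concrete outside the kernel, here is a minimal user-space sketch in C11: a test-and-set spinlock built on atomic_flag, protecting a shared counter. The names tas_lock, tas_acquire, tas_release, and run_locked_counter are hypothetical stand-ins for the kernel's spin_lock/spin_unlock; the real Linux spinlock implementation is more elaborate.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's spin_lock/spin_unlock. */
typedef struct { atomic_flag held; } tas_lock;

static void tas_acquire(tas_lock *l)
{
    /* test_and_set returns the previous value: true means another thread holds it. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin: waiting threads do no useful work */
}

static void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static tas_lock counter_lock = { ATOMIC_FLAG_INIT };
static long counter; /* shared data, only touched while holding counter_lock */

static void *locked_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        tas_acquire(&counter_lock);
        counter++;                  /* critical section: one thread at a time */
        tas_release(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads (1..16) workers and returns the final counter value. */
static long run_locked_counter(int nthreads, long iters)
{
    pthread_t tids[16];
    int started = 0;
    if (nthreads < 1 || nthreads > 16)
        return -1;
    counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, locked_worker, &iters) != 0)
            break;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```

Every increment is serialized, so four threads of 100000 iterations each end at 400000. The price is that waiting threads burn CPU while spinning, which is why critical regions should stay short.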

Transactions

    xbegin;
    offset = aryList[first];
    if (aryList[first] == aryList[last])
        aryList[last] = 0;
    xend;
    if (!(calculateIfAnyZero()))
        goto failed;
    xbegin;
    list_add_tail(val, &aryList);
    xend;

◦ Many threads can run the critical section concurrently; when they conflict, one wins and the others roll back.
◦ More concurrency, and critical regions can be large.
◦ Not suitable under high contention.
◦ Cannot roll back once I/O has been performed.
◦ Transactions are optimistic: they assume conflicts are rare and detect them when they occur.
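Portable C has no xbegin/xend, so the sketch below illustrates the optimistic style with a software analogy instead: a compare-and-swap retry loop. Each attempt computes on a snapshot and publishes only if no other thread changed the value in between; otherwise the attempt is discarded and retried, much as a conflicting transaction rolls back. This covers a single word and is not HTM; optimistic_add and run_optimistic_counter are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Optimistically add delta to *x: publish snapshot+delta only if *x still
   holds the snapshot; otherwise discard the attempt and retry.
   Returns the number of retries ("rollbacks"). */
static long optimistic_add(atomic_long *x, long delta)
{
    long retries = 0;
    long seen = atomic_load(x);
    /* On failure, compare_exchange_strong copies the current value into seen. */
    while (!atomic_compare_exchange_strong(x, &seen, seen + delta))
        retries++;
    return retries;
}

static atomic_long opt_counter;
static atomic_long opt_retries;

static void *optimistic_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++)
        atomic_fetch_add(&opt_retries, optimistic_add(&opt_counter, 1));
    return NULL;
}

/* Runs nthreads (1..16) workers; stores the total retry count in *retries. */
static long run_optimistic_counter(int nthreads, long iters, long *retries)
{
    pthread_t tids[16];
    int started = 0;
    if (nthreads < 1 || nthreads > 16)
        return -1;
    atomic_store(&opt_counter, 0);
    atomic_store(&opt_retries, 0);
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, optimistic_worker, &iters) != 0)
            break;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    *retries = atomic_load(&opt_retries);
    return atomic_load(&opt_counter);
}
```

With no competition an update never retries; under contention the retry count grows, which mirrors why optimistic schemes degrade when contention is high.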

Transactional Memory

Transactional memory applies the concept of transactions to memory operations. Conflict detection works in three steps:
◦ Step 1: Check whether the same memory location is part of another transaction.
◦ Step 2: If it is, abort the current transaction.
◦ Step 3: If it is not, record the location referenced by the current transaction so that other transactions can find it in Step 1.
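The three steps can be traced with a small, single-threaded toy model in C. Each transaction keeps a table of the locations it has referenced, and toy_access performs Steps 1 to 3 on every access. This is a hypothetical illustration (toy_tx, toy_begin, toy_access, toy_end, and the fixed table sizes are invented here); it treats every reference as a potential conflict without distinguishing reads from writes, and it simply aborts when its fixed table fills up.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TX   4
#define MAX_REFS 32

/* Toy model, not a real TM: each transaction records the locations it references. */
typedef struct {
    bool active;
    size_t nrefs;
    const void *refs[MAX_REFS];
} toy_tx;

static toy_tx txs[MAX_TX];

static bool references(const toy_tx *t, const void *addr)
{
    for (size_t i = 0; i < t->nrefs; i++)
        if (t->refs[i] == addr)
            return true;
    return false;
}

static void toy_begin(int id) { txs[id].active = true;  txs[id].nrefs = 0; }
static void toy_end(int id)   { txs[id].active = false; txs[id].nrefs = 0; }

/* Performs Steps 1-3 for one access by transaction id (0 <= id < MAX_TX).
   Returns true if the access proceeds, false if the transaction aborted. */
static bool toy_access(int id, const void *addr)
{
    /* Step 1: is this location part of another active transaction? */
    for (int other = 0; other < MAX_TX; other++) {
        if (other != id && txs[other].active && references(&txs[other], addr)) {
            toy_end(id);   /* Step 2: conflict, so abort the current transaction */
            return false;
        }
    }
    /* Step 3: record the location so other transactions find it in Step 1. */
    if (!references(&txs[id], addr)) {
        if (txs[id].nrefs == MAX_REFS) {
            toy_end(id);   /* toy limit: table full, so abort */
            return false;
        }
        txs[id].refs[txs[id].nrefs++] = addr;
    }
    return true;
}
```

If transaction 0 touches a and transaction 1 then touches a as well, transaction 1 is aborted at that very access. Detecting the conflict at the access, rather than at commit time, is what eager conflict detection means.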

Hardware Transactional Memory

HTM implements transactions with hardware support.

Transactional data is held in hardware registers and the cache, so that all actions are performed atomically in hardware; the data is written to main memory when the transaction commits.

If two hardware transactions access the same memory and at least one of them writes it, a conflict occurs and the HTM aborts one of the transactions.
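The commit-time write-back can be pictured with a toy software model: speculative stores go into a private buffer (standing in for the cache), loads check that buffer first, commit copies the buffer to memory, and abort simply discards it. This sketch is single-threaded and hypothetical (write_buf and the wb_* functions are invented names); real HTM also makes the commit itself atomic with respect to other processors, which this model does not attempt.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_SLOTS 16

/* Toy model of HTM write buffering: speculative stores stay private until commit. */
typedef struct {
    size_t n;
    int *addr[BUF_SLOTS];
    int val[BUF_SLOTS];
} write_buf;

static void wb_begin(write_buf *b) { b->n = 0; }

/* Buffer a speculative store; returns false if the buffer is full. */
static bool wb_write(write_buf *b, int *addr, int v)
{
    for (size_t i = 0; i < b->n; i++)
        if (b->addr[i] == addr) { b->val[i] = v; return true; }
    if (b->n == BUF_SLOTS)
        return false;
    b->addr[b->n] = addr;
    b->val[b->n] = v;
    b->n++;
    return true;
}

/* A transactional load sees the transaction's own buffered stores first. */
static int wb_read(const write_buf *b, const int *addr)
{
    for (size_t i = 0; i < b->n; i++)
        if (b->addr[i] == addr)
            return b->val[i];
    return *addr;
}

/* Commit: make all buffered stores visible in memory. */
static void wb_commit(write_buf *b)
{
    for (size_t i = 0; i < b->n; i++)
        *b->addr[i] = b->val[i];
    b->n = 0;
}

/* Abort: discard the buffer; memory was never modified. */
static void wb_abort(write_buf *b) { b->n = 0; }
```

Because memory is untouched until commit, an abort costs nothing to undo; that same property is what makes I/O, which cannot be buffered and discarded this way, a problem for transactions.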

MetaTM

MetaTM is the HTM architectural model on which TxLinux runs.

MetaTM uses eager conflict detection: the first conflicting read or write to the same address causes a transaction to restart, rather than waiting until commit time to detect and handle conflicts.

MetaTM provides the instructions:
◦ xbegin, xend, xpush, xpop

MetaTM Commands

MetaTM uses xbegin and xend to start and commit a transaction.

xpush suspends a transaction, saving its state so that it can continue later without restarting. Instructions executed after xpush are independent of the suspended transaction. A suspended transaction can still incur conflicts, just like a running transaction; if it does, it restarts when it is resumed.

xpop resumes an xpushed transaction.
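The suspend/resume behavior described above can be summarized with a toy model. It is hypothetical and not MetaTM's interface: toy_txstate, toy_xpush, toy_conflict, and toy_xpop are invented names, an integer progress field stands in for the saved transactional state, and the last-in, first-out order of suspended transactions is an assumption of this sketch.

```c
#include <stdbool.h>

#define MAX_SUSPENDED 8

/* Toy model only: not MetaTM's real interface. */
typedef struct {
    int id;
    int progress;     /* stands in for the saved transactional state */
    bool conflicted;  /* set if a conflict hits it while suspended */
} toy_txstate;

typedef enum { POP_EMPTY, POP_RESUMED, POP_RESTART } pop_result;

static toy_txstate suspended[MAX_SUSPENDED]; /* assumed LIFO, like a stack */
static int nsuspended;

/* xpush: save the running transaction's state instead of discarding it. */
static bool toy_xpush(toy_txstate t)
{
    if (nsuspended == MAX_SUSPENDED)
        return false;
    t.conflicted = false;
    suspended[nsuspended++] = t;
    return true;
}

/* Suspended transactions still see conflicts, just like running ones. */
static void toy_conflict(int id)
{
    for (int i = 0; i < nsuspended; i++)
        if (suspended[i].id == id)
            suspended[i].conflicted = true;
}

/* xpop: resume the most recently pushed transaction. If a conflict hit it
   while suspended, its saved work is discarded and it must restart. */
static pop_result toy_xpop(toy_txstate *out)
{
    if (nsuspended == 0)
        return POP_EMPTY;
    *out = suspended[--nsuspended];
    if (out->conflicted) {
        out->progress = 0;      /* restart from the beginning */
        out->conflicted = false;
        return POP_RESTART;
    }
    return POP_RESUMED;
}
```

The key behavior is in toy_xpop: a transaction left undisturbed while suspended continues with its progress intact, while one hit by a conflict restarts from the beginning.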

The Output Commit Problem

A transaction that performs operations such as I/O cannot be rolled back if it has to abort because of a conflict.

Transactions also perform poorly under high contention.

Hence the need to mix locks and transactions.

Cooperative Transactional Locking

To allow both transactions and locks in the OS, the authors propose a synchronization API called cxspinlocks.

Cxspinlocks allow different executions of a single critical section to be synchronized with either locks or transactions.

This combines the concurrency of transactions with the safety of locks.

Cxspinlocks support both transactional and non-transactional code while maintaining fairness and high concurrency.

Properties of Cxspinlocks

Multiple transactional threads can enter a critical region without conflicting on the lock variable.

Transactional threads poll the cxspinlock using the xtest instruction, which allows a transaction to avoid restarting when the lock is released.

Non-transactional threads acquire cxspinlocks using the hardware instruction xcas, which lets the contention manager arbitrate the lock (for example, favoring transactional threads, mutually exclusive threads, or readers).

This enables fairness between transactional and non-transactional threads.

Acquiring a Cxspinlock

A cxspinlock is acquired using one of two functions: cx_optimistic and cx_exclusive.

cx_optimistic optimistically attempts to protect a critical section with a transaction, and reverts to locking on a conflict or I/O.

cx_exclusive is used for critical sections that always perform I/O.

cx_optimistic

    void cx_optimistic(lock) {
        status = xbegin;
        // Use mutual exclusion if required
        if (status == NEED_EXCLUSIVE) {
            xend;
            // xrestart for closed nesting
            if (xgettxid) xrestart(NEED_EXCLUSIVE);
            else cx_exclusive(lock);
            return;
        }
        // Spin waiting for lock to be free (==1)
        while (xtest(lock, 1) == 0); // spin
        disable_interrupts();
    }

The status word returned by xbegin is checked to determine whether this transaction has restarted with NEED_EXCLUSIVE. If so, the critical section is entered exclusively using cx_exclusive; if the code is nested inside an enclosing transaction, that enclosing transaction is restarted with xrestart instead.

If mutual exclusion is not needed, the thread waits for the spinlock to be unlocked, which indicates that there are zero non-transactional threads in the critical section.

The code that polls the lock uses xtest to avoid adding the lock variable to its read set, which prevents the transaction from restarting.

cx_exclusive

    void cx_exclusive(lock) {
        // Only for non-transactional threads
        if (xgettxid) xrestart(NEED_EXCLUSIVE);
        while (1) {
            // Spin waiting for lock to be free
            while (lock != 1); // spin
            disable_interrupts();
            // Acquire lock by setting it to 0
            // Contention manager arbitrates lock
            if (xcas(lock, 1, 0)) break;
            enable_interrupts();
        }
    }

cx_exclusive uses xgettxid to detect an active transaction. If there is one, that transaction is made exclusive: the code issues xrestart with the status code NEED_EXCLUSIVE, indicating that exclusion is required.

If there is no active transaction, the non-transactional thread enters the critical section by acquiring the cxspinlock with the xcas instruction. Interrupts are disabled before each xcas attempt and re-enabled if it fails, so the thread never spins with interrupts off.
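The non-transactional path of cx_exclusive is a test-and-test-and-set lock, which can be sketched in C11 using the slide's convention that 1 means free and 0 means held. This is a sketch under explicit assumptions: atomic_compare_exchange_strong stands in for xcas, so there is no contention manager; there is no interrupt masking in user space; and cx_lock_sketch, cx_exclusive_sketch, cx_release_sketch, and run_cx_counter are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Lock word following the slide's convention: 1 = free, 0 = held. */
typedef struct { atomic_int value; } cx_lock_sketch;

/* Spin until the lock looks free, then try to swap 1 -> 0.
   A plain CAS stands in for xcas (no contention manager here). */
static void cx_exclusive_sketch(cx_lock_sketch *lock)
{
    for (;;) {
        while (atomic_load_explicit(&lock->value, memory_order_relaxed) != 1)
            ; /* spin with plain loads */
        int expected = 1;
        if (atomic_compare_exchange_strong_explicit(&lock->value, &expected, 0,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            break; /* acquired */
        /* Another thread won the race: go back to spinning. */
    }
}

/* Release by marking the lock free again. */
static void cx_release_sketch(cx_lock_sketch *lock)
{
    atomic_store_explicit(&lock->value, 1, memory_order_release);
}

static cx_lock_sketch demo_lock;
static long demo_count; /* protected by demo_lock */

static void *cx_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        cx_exclusive_sketch(&demo_lock);
        demo_count++;
        cx_release_sketch(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads (1..16) workers and returns the final count. */
static long run_cx_counter(int nthreads, long iters)
{
    pthread_t tids[16];
    int started = 0;
    if (nthreads < 1 || nthreads > 16)
        return -1;
    atomic_init(&demo_lock.value, 1); /* start free */
    demo_count = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, cx_worker, &iters) != 0)
            break;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return demo_count;
}
```

Spinning on plain loads and attempting the swap only when the lock looks free keeps waiting threads from hammering the lock word with atomic writes, which is the same shape as the inner while loop in cx_exclusive.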

Priority Inversion

Locks can invert OS scheduling priority, so that a higher-priority thread waits for a lower-priority thread.

The contention manager of an HTM system offers a solution to priority inversion.

Whenever transactions conflict, the contention manager can resolve the conflict in favor of the thread with higher priority.

Another simple hardware contention-management policy uses timestamps: the oldest transaction wins.

Policy Inversion

Operating systems provide real-time threads, which may need to synchronize with non-real-time threads. Such synchronization can cause policy inversion, where a real-time thread waits for a non-real-time thread.

Contention Management Using the OS

MetaTM implements a contention-management policy called os_prio to address priority and policy inversion.

When transactions conflict, os_prio favors the transaction whose thread has the greatest OS scheduling value.

If the scheduling values tie, os_prio falls back to SizeMatters.

If the transaction sizes are also equal, os_prio falls back to timestamps.
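The os_prio tie-breaking chain can be written as a comparator. This is a sketch under stated assumptions: tx_info and os_prio_wins are hypothetical names, a larger os_priority is assumed to mean a more important thread, SizeMatters is assumed to favor the transaction with the larger read/write set, and a smaller timestamp means an older transaction, which wins as in the timestamp policy.

```c
#include <stdbool.h>

/* Hypothetical per-transaction data used to arbitrate a conflict. */
typedef struct {
    int os_priority;          /* OS scheduling value; assumed larger = more important */
    unsigned long size;       /* read/write-set size, used by SizeMatters */
    unsigned long timestamp;  /* start time; smaller = older */
} tx_info;

/* Returns true if transaction a should win a conflict against b:
   OS priority first, then SizeMatters, then timestamp. */
static bool os_prio_wins(const tx_info *a, const tx_info *b)
{
    if (a->os_priority != b->os_priority)
        return a->os_priority > b->os_priority; /* greatest scheduling value wins */
    if (a->size != b->size)
        return a->size > b->size;               /* SizeMatters (assumed: larger wins) */
    return a->timestamp < b->timestamp;         /* oldest transaction wins */
}
```

Putting OS priority first is what removes priority inversion: a lower-priority transaction can win only when the priorities are equal, no matter how large or old it is.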

Transaction-Aware Scheduling

The operating system's scheduler uses a process's transaction state to mitigate the effects of high contention.

MetaTM reports the status of the current transaction through the transaction status word.

Using this status information, the scheduler dynamically adjusts priorities or deschedules processes, preventing them from restarting repeatedly.
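A scheduler that reacts to transaction state might follow a policy like the sketch below. Everything here is hypothetical: the tx_status_word layout, the action names, the direction of the priority adjustment, and the BOOST_AFTER and DESCHEDULE_AFTER thresholds are invented for illustration and are not taken from TxLinux. The sketch only shows the shape of the decision: leave ordinary tasks alone, raise the priority of a task that keeps restarting, and deschedule it if restarts persist.

```c
/* Hypothetical view of a task's transaction status, as a scheduler might read it. */
typedef struct {
    int in_transaction;   /* nonzero while the task has an active transaction */
    unsigned restarts;    /* consecutive restarts of that transaction */
} tx_status_word;

typedef enum { TXS_KEEP, TXS_BOOST, TXS_DESCHEDULE } tx_sched_action;

/* Illustrative thresholds only. */
#define BOOST_AFTER      2u
#define DESCHEDULE_AFTER 8u

static tx_sched_action tx_aware_decision(const tx_status_word *s)
{
    if (!s->in_transaction || s->restarts < BOOST_AFTER)
        return TXS_KEEP;        /* no sign of repeated restarts */
    if (s->restarts < DESCHEDULE_AFTER)
        return TXS_BOOST;       /* raise priority so it is favored in conflicts */
    return TXS_DESCHEDULE;      /* stop running it for now instead of restarting again */
}
```

Combined with a priority-first contention manager, boosting a repeatedly restarting task makes it more likely to win its next conflict, while descheduling stops it from wasting cycles under very high contention.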

Contention Management Performance

With the simple SizeMatters contention-management policy, an average of 9.5% of all transactional conflicts are resolved in favor of the thread with lower OS priority.

Using OS priority in contention management eliminates these inversions entirely, at a performance cost of 2.5% with the default Linux scheduler and 1.0% with a modified scheduler.

Conclusion

The cxspinlock primitive is a solution to the long-standing problem of I/O in transactions.

The cxspinlock API eases the conversion from locking primitives to transactions.

HTM-aware scheduling eliminates priority inversion and provides better management of very high contention.