View
216
Download
0
Tags:
Embed Size (px)
Citation preview
Software Transactions: A Programming-Languages Perspective
Dan GrossmanUniversity of Washington
28 February 2008
28 February 2008 Dan Grossman, Software Transactions 2
Atomic
An easier-to-use and harder-to-implement primitive
lock acquire/release (behave as if)no interleaved computation
void deposit(int x){synchronized(this){ int tmp = balance; tmp += x; balance = tmp;}}
void deposit(int x){atomic { int tmp = balance; tmp += x; balance = tmp; }}
28 February 2008 Dan Grossman, Software Transactions 3
Relation to “The View”
• Focus assumes explicit threads + shared memory– Will be one of a few widespread models
• Seems well suited for “branch-and-bound”– And other less-structured data-access patterns
• A better synchronization primitive is a great thing– Much of today will motivate…– … without overselling
28 February 2008 Dan Grossman, Software Transactions 4
Viewpoints
Software transactions good for:• Software engineering (avoid races & deadlocks)• Performance (optimistic “no conflict” without locks)
Research should be guiding:• New hardware with transactional support• Software support
– Semantic mismatch between language & hardware– Prediction: hardware for the common/simple case– May be fast enough without hardware– Lots of nontransactional hardware exists
28 February 2008 Dan Grossman, Software Transactions 5
PL Perspective
Complementary to lower-level implementation work
Motivation:– What is the essence of the advantage over locks?
Language design: – Rigorous high-level semantics– Interaction with rest of the language
Language implementation: – Interaction with modern compilers – New optimization needs
A key improvement for the shared-memory puzzle piece
28 February 2008 Dan Grossman, Software Transactions 6
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 7
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 8
Code evolution
void deposit(…) { synchronized(this) { … }}void withdraw(…) { synchronized(this) { … }}int balance(…) { synchronized(this) { … }}
28 February 2008 Dan Grossman, Software Transactions 9
Code evolution
void deposit(…) { synchronized(this) { … }}void withdraw(…) { synchronized(this) { … }}int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) {
if(from.balance()>=amt && amt < maxXfer) { from.withdraw(amt); this.deposit(amt); } }
28 February 2008 Dan Grossman, Software Transactions 10
Code evolution
void deposit(…) { synchronized(this) { … }}void withdraw(…) { synchronized(this) { … }}int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) { synchronized(this) { //race if(from.balance()>=amt && amt < maxXfer) { from.withdraw(amt); this.deposit(amt); } }}
28 February 2008 Dan Grossman, Software Transactions 11
Code evolution
void deposit(…) { synchronized(this) { … }}void withdraw(…) { synchronized(this) { … }}int balance(…) { synchronized(this) { … }}
void transfer(Acct from, int amt) { synchronized(this) { synchronized(from) { //deadlock (still) if(from.balance()>=amt && amt < maxXfer) { from.withdraw(amt); this.deposit(amt); } }}}
28 February 2008 Dan Grossman, Software Transactions 12
Code evolution
void deposit(…) { atomic { … }}void withdraw(…) { atomic { … }}int balance(…) { atomic { … }}
28 February 2008 Dan Grossman, Software Transactions 13
Code evolution
void deposit(…) { atomic { … }}void withdraw(…) { atomic { … }}int balance(…) { atomic { … }}
void transfer(Acct from, int amt) { //race if(from.balance()>=amt && amt < maxXfer) { from.withdraw(amt); this.deposit(amt); } }
28 February 2008 Dan Grossman, Software Transactions 14
Code evolution
void deposit(…) { atomic { … }}void withdraw(…) { atomic { … }}int balance(…) { atomic { … }}
void transfer(Acct from, int amt) { atomic { //correct and parallelism-preserving! if(from.balance()>=amt && amt < maxXfer){ from.withdraw(amt); this.deposit(amt); } }}
28 February 2008 Dan Grossman, Software Transactions 15
But can we generalize
So transactions sure looks appealing…
But what is the essence of the benefit?
Transactional Memory (TM) is to shared-memory concurrency
as Garbage Collection (GC) is to
memory management
28 February 2008 Dan Grossman, Software Transactions 16
GC in 60 seconds
Automate deallocation via reachability approximation
roots heap objects
• Allocate objects in the heap• Deallocate objects to reuse heap space
– If too soon, dangling-pointer dereferences– If too late, poor performance / space exhaustion
28 February 2008 Dan Grossman, Software Transactions 17
GC Bottom-line
Established technology with widely accepted benefits
• Even though it can perform arbitrarily badly in theory
• Even though you can’t always ignore how GC works (at a high-level)
• Even though an active research area after 40 years
Now about that analogy…
28 February 2008 Dan Grossman, Software Transactions 18
The problem, part 1
Why memory management is hard:
Balance correctness (avoid dangling pointers)
And performance (space waste or exhaustion)
Manual approaches require whole-program protocols Example: Manual reference count for each object
• Must avoid garbage cycles
race conditions
concurrent programming
loss of parallelism deadlock
lock
lock acquisition
28 February 2008 Dan Grossman, Software Transactions 19
The problem, part 2
Manual memory-management is non-modular:
• Caller and callee must know what each other access or deallocate to ensure right memory is live
• A small change can require wide-scale code changes – Correctness requires knowing what data
subsequent computation will access
synchronization
locks are heldrelease
concurrent
28 February 2008 Dan Grossman, Software Transactions 20
The solution
Move whole-program protocol to language implementation• One-size-fits-most implemented by experts
– Usually inside the compiler and run-time
• GC system uses subtle invariants, e.g.:– Object header-word bits
– No unknown mature pointers to nursery objectsthread-shared thread-local
TM
28 February 2008 Dan Grossman, Software Transactions 21
So far…
memory management concurrencycorrectness dangling pointers racesperformance space exhaustion deadlockautomation garbage collection transactional memorynew objects nursery data thread-local data
28 February 2008 Dan Grossman, Software Transactions 22
Incomplete solution
GC a bad idea when “reachable” is a bad approximation of “cannot-be-deallocated”
Weak pointers overcome this fundamental limitation– Best used by experts for well-recognized idioms
(e.g., software caches)
In extreme, programmers can encode
manual memory management on top of GC– Destroys most of GC’s advantages
TM memory conflict
run-in-parallelOpen nested txns
unique id generation
locking TM
TM
28 February 2008 Dan Grossman, Software Transactions 23
Circumventing TM
class SpinLock { private boolean b = false; void acquire() { while(true)
atomic { if(b) continue; b = true; return; } } void release() { atomic { b = false; } }}
28 February 2008 Dan Grossman, Software Transactions 24
It really keeps going (see the essay)
memory management concurrencycorrectness dangling pointers racesperformance space exhaustion deadlockautomation garbage collection transactional memorynew objects nursery data thread-local data
key approximation reachability memory conflictsmanual circumvention weak pointers open nestingcomplete avoidance object pooling lock libraryuncontrollable approx. conservative collection false memory conflictseager approach reference-counting update-in-placelazy approach tracing update-on-commitexternal data I/O of pointers I/O in transactionsforward progress real-time obstruction-freestatic optimizations liveness analysis escape analysis
28 February 2008 Dan Grossman, Software Transactions 25
Lesson
Transactional memory is to
shared-memory concurrency
as
garbage collection is to
memory management
Huge but incomplete help for correct, efficient software
Analogy should help guide transactions research
28 February 2008 Dan Grossman, Software Transactions 26
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
[[Katherine Moore]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 27
“Weak” isolation
Widespread misconception:
“Weak” isolation violates the “all-at-once” property only if corresponding lock code has a race
(May still be a bad thing, but smart people disagree.)
atomic { y = 1; x = 3; y = x;}
x = 2;print(y); //1? 2? 666?
initially y==0
28 February 2008 Dan Grossman, Software Transactions 28
It’s worse
Privatization: One of several examples where lock code works and weak-isolation transactions do not
(Example adapted from [Rajwar/Larus] and [Hudson et al])
sync(lk) { r = ptr; ptr = new C();}assert(r.f==r.g);
sync(lk) { ++ptr.f; ++ptr.g;}
initially ptr.f == ptr.gptr
f g
28 February 2008 Dan Grossman, Software Transactions 29
It’s worse
(Almost?) every published weak-isolation system lets the assertion fail!
• Eager-update or lazy-update
atomic { r = ptr; ptr = new C();}assert(r.f==r.g);
atomic { ++ptr.f; ++ptr.g;}
initially ptr.f == ptr.g
ptr
f g
28 February 2008 Dan Grossman, Software Transactions 30
The need for semantics
• Which is wrong: the privatization code or the transactions implementation?
• What other “gotchas” exist? – What language/coding restrictions suffice to avoid
them?
• Can programmers correctly use transactions without understanding their implementation?– What makes an implementation correct?
Only rigorous source-level semantics can answer
28 February 2008 Dan Grossman, Software Transactions 31
What we did
Formal operational semantics for a collection of similar languages that have different isolation properties
Program state allows at most one live transaction:
a;H;e1|| … ||en a’;H’;e1’|| … ||en’
Multiple languages, including:
28 February 2008 Dan Grossman, Software Transactions 32
What we did
Formal operational semantics for a collection of similar languages that have different isolation properties
Program state allows at most one live transaction:
a;H;e1|| … ||en a’;H’;e1’|| … ||en’
Multiple languages, including:
1. “Strong”: If one thread is in a transaction, no other thread may use shared memory or enter a transaction
28 February 2008 Dan Grossman, Software Transactions 33
What we did
Formal operational semantics for a collection of similar languages that have different isolation properties
Program state allows at most one live transaction:
a;H;e1|| … ||en a’;H’;e1’|| … ||en’
Multiple languages, including:
2. “Weak-1-lock”: If one thread is in a transaction, no other thread may enter a transaction
28 February 2008 Dan Grossman, Software Transactions 34
What we did
Formal operational semantics for a collection of similar languages that have different isolation properties
Program state allows at most one live transaction:
a;H;e1|| … ||en a’;H’;e1’|| … ||en’
Multiple languages, including:
3. “Weak-undo”: Like weak, plus a transaction may abort at any point, undoing its changes and restarting
28 February 2008 Dan Grossman, Software Transactions 35
A family
Now we have a family of languages:
“Strong”: … other threads can’t use memory or start transactions
“Weak-1-lock”: … other threads can’t start transactions “Weak-undo”: like weak, plus undo/restart
So we can study how family members differ and conditions under which they are the same
Oh, and we have a kooky, ooky name:
The AtomsFamily
28 February 2008 Dan Grossman, Software Transactions 36
Easy Theorems
Theorem: Every program behavior in strong is possible in weak-1-lock
Theorem: weak-1-lock allows behaviors strong does not
Theorem: Every program behavior in weak-1-lock is possible in weak-undo
Theorem: (slightly more surprising): weak-undo allows behavior weak-1-lock does not
28 February 2008 Dan Grossman, Software Transactions 37
Hard theorems
Consider a (formally defined) type system that ensures any mutable memory is either:– Only accessed in transactions– Only accessed outside transactions
Theorem: If a program type-checks, it has the same possible behaviors under strong and weak-1-lock
Theorem: If a program type-checks, it has the same possible behaviors under weak-1-lock and weak-undo
28 February 2008 Dan Grossman, Software Transactions 38
A few months in 1 picture
strong weak-1-lock weak-undo
strong-undo
28 February 2008 Dan Grossman, Software Transactions 39
Lesson
Weak isolation has surprising behavior;
formal semantics let’s us model the behavior and
prove sufficient conditions for avoiding it
In other words: With a (too) restrictive type system, get semantics of strong and performance of weak
28 February 2008 Dan Grossman, Software Transactions 40
Today, part 1
Language design, semantics:
1. Motivation: Example + the GC analogy [OOPSLA07]
2. Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
3. Interaction w/ other features [ICFP05][SCHEME07][POPL08]
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 41
What if…
Real languages need precise semantics for all feature interactions. For example:
• Native Calls [Ringenburg]• Exceptions [Ringenburg, Kimball]• First-class continuations [Kimball]• Thread-creation [Moore]• Java-style class-loading [Hindman]
• Open: Bad interactions with memory-consistency model
See joint work with Manson and Pugh [MSPC06]
28 February 2008 Dan Grossman, Software Transactions 42
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
[Michael Ringenburg, Aaron Kimball]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 43
Interleaved execution
The “uniprocessor (and then some)” assumption:
Threads communicating via shared memory don't execute in “true parallel”
Important special case:• Uniprocessors still exist• Many language implementations assume it
(e.g., OCaml, Scheme48)• Multicore may assign one core to an application
28 February 2008 Dan Grossman, Software Transactions 44
Implementing atomic
Key pieces:
• Execution of an atomic block logs writes
• If scheduler pre-empts during atomic, rollback the thread
• Duplicate code or bytecode-interpreter dispatch so non-atomic code is not slowed by logging
28 February 2008 Dan Grossman, Software Transactions 45
Logging example
Executing atomic block:• build LIFO log of old values:
y:0 z:? x:0 y:2
Rollback on pre-emption:• Pop log, doing assignments• Set program counter and
stack to beginning of atomic
On exit from atomic: • Drop log
int x=0, y=0;void f() { int z = y+1; x = z;}void g() { y = x+1;}void h() { atomic { y = 2; f(); g(); }}
28 February 2008 Dan Grossman, Software Transactions 46
Logging efficiency
Keep the log small:• Don’t log reads (key uniprocessor advantage)• Need not log memory allocated after atomic entered
– Particularly initialization writes• Need not log an address more than once
– To keep logging fast, switch from array to hashtable after “many” (50) log entries
y:0 z:? x:0 y:2
28 February 2008 Dan Grossman, Software Transactions 47
Evaluation
Strong isolation on uniprocessors at little cost – See papers for “in the noise” performance
• Memory-access overhead
Recall initialization writes need not be logged
• Rare rollback
not in atomic in atomic
read none none
write none log (2 more writes)
28 February 2008 Dan Grossman, Software Transactions 48
Lesson
Implementing transactions in software for a uniprocessor is so efficient it deserves special-casing
Note: Don’t run other multicore services on a uniprocessor either
28 February 2008 Dan Grossman, Software Transactions 49
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
[Steven Balensiefer, Benjamin Hindman]
6. Multithreaded transactions
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 50
Strong performance problem
Recall uniprocessor overhead:
not in atomic in atomic
read none none
write none some
With parallelism:
not in atomic in atomic
read none iff weak some
write none iff weak some
28 February 2008 Dan Grossman, Software Transactions 51
Optimizing away strong’s cost
Thread local
Immutable
Not accessed in transaction
New: static analysis for not-accessed-in-transaction …
28 February 2008 Dan Grossman, Software Transactions 52
Not-accessed-in-transaction
Revisit overhead of not-in-atomic for strong isolation, given information about how data is used in atomic
Yet another client of pointer-analysis
in atomic
no atomic access
no atomic write
atomic write
read none none some some
write none some some some
not in atomic
28 February 2008 Dan Grossman, Software Transactions 53
Analysis details
• Whole-program, context-insensitive, flow-insensitive– Scalable, but needs whole program
• Can be done before method duplication– Keep lazy code generation without losing precision
• Given pointer information, just two more passes
1. How is an “abstract object” accessed transactionally?
2. What “abstract objects” might a non-transactional access use?
28 February 2008 Dan Grossman, Software Transactions 54
Collaborative effort
• UW: static analysis using pointer analysis– Via Paddle/Soot from McGill
• Intel PSL: high-performance STM – Via compiler and run-time
Static analysis annotates bytecodes, so the compiler back-end knows what it can omit
28 February 2008 Dan Grossman, Software Transactions 55
Benchmarks
0.00
1.00
2.00
3.00
4.00
5.00
6.00
1 2 4 8 16
# Threads
Tim
e (
s)
Synch Weak Atom Strong Atom No Opts +JIT Opts +DEA +Static Opts
Tsp
28 February 2008 Dan Grossman, Software Transactions 56
Benchmarks
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
1 2 4 8 16
# Threads
Ave
rag
e ti
me
per
10,
000
op
s (s
)
Synch Weak Atom Strong Atom No Opts +JIT Opts +DEA +Static Opts
JBB
28 February 2008 Dan Grossman, Software Transactions 57
Lesson
The cost of strong isolation is in the nontransactional code; compiler optimizations help a lot
28 February 2008 Dan Grossman, Software Transactions 58
Today, part 2
Implementation:
4. On one core [ICFP05] [SCHEME07]
5. Static optimizations for strong isolation [PLDI07]*
6. Multithreaded transactions [Aaron Kimball]
Caveat: ongoing work
* Joint work with Intel PSL
28 February 2008 Dan Grossman, Software Transactions 59
Multithreaded Transactions
• Most implementations (hw or sw) assume code inside a transaction is single-threaded– But isolation and parallelism are orthogonal– And Amdahl’s Law will strike with manycore
• Language design: need nested transactions
• Currently modifying Microsoft’s Bartok STM– Key: correct logging without sacrificing parallelism
• Work perhaps ahead of the technology curve– like concurrent garbage collection
28 February 2008 Dan Grossman, Software Transactions 60
Credit
Semantics: Katherine MooreUniprocessor: Michael Ringenburg, Aaron KimballOptimizations: Steven Balensiefer, Ben HindmanImplementing multithreaded transactions: Aaron Kimball
Memory-model issues: Jeremy Manson, Bill PughHigh-performance strong STM: Tatiana Shpeisman,
Vijay Menon, Ali-Reza Adl-Tabatabai, Richard Hudson, Bratin Saha
wasp.cs.washington.edu
28 February 2008 Dan Grossman, Software Transactions 61
Please read
High-Level Small-Step Operational Semantics for Transactions (POPL08) Katherine F. Moore and Dan Grossman
The Transactional Memory / Garbage Collection Analogy (OOPSLA07) Dan Grossman
Software Transactions Meet First-Class Continuations (SCHEME07) Aaron Kimball and Dan Grossman
Enforcing Isolation and Ordering in STM (PLDI07) Tatiana Shpeisman, Vijay Menon, Ali-Reza Adl-Tabatabai, Steve Balensiefer, Dan Grossman, Richard Hudson, Katherine F. Moore, and Bratin Saha Atomicity via Source-to-Source Translation (MSPC06) Benjamin Hindman and Dan GrossmanWhat Do High-Level Memory Models Mean for Transactions?
(MSPC06) Dan Grossman, Jeremy Manson, and William PughAtomCaml: First-Class Atomicity via Rollback (ICFP05) Michael F. Ringenburg and Dan Grossman
28 February 2008 Dan Grossman, Software Transactions 62
Lessons
1. Transactions: the garbage collection of shared memory
2. Semantics lets us prove sufficient conditions for avoiding weak-isolation anomalies
3. Must define interaction with features like exceptions
4. Uniprocessor implementations are worth special-casing
5. Compiler optimizations help remove the overhead in nontransactional code resulting from strong isolation
6. Amdahl’s Law suggests multithreaded transactions, which we believe we can make scalable