Two Ways of Speeding Up Transactional Memory Algorithms
Vincent Gramoli
Joint work with Pascal Felber, Rachid Guerraoui, Derin Harmanci
Roadmap
1. Motivations
2. Transactional Memory
3. Problems of Efficiency
4. Input Acceptance
5. Elastic Transactions
6. Conclusion
Single CPU Limitations
• Transistor size still decreases [Moore’s law]
• Induced overheating disturbs computation
• Clock speed has stopped doubling since 2004
[“The free lunch is over” by Herb Sutter]
Manufactured Multicores
[Figure: timeline of multicore announcements]
• Intel COO announces Multicore revolution
• AMD announces the 2-core Opteron
• AMD announces the 4-core Opteron
• Intel announces 4-core Xeon 5000 series
• Intel announces 8-core Nehalem EX
• SUN Niagara 2 w/ 8 cores & 64 HW threads
• Intel announces 6-core Xeon 7000 series
• SUN announces the 8-core Niagara
Concurrent Programming
• Difficult task:
  – Using locks, how to avoid deadlock?
    Thread1 {lock(x); lock(y);} // Thread2 {lock(y); lock(x);}
  – Using lock-free (LF) primitives, how can composition preserve atomicity?
    LF-move(x,y) ≠ LF-delete(x) + LF-insert(y)
• Dedicated to expert programmers:
  – Database programmers
  – Scientific computing programmers
  – What about other programmers?
• Democratizing multicores requires new programming abstractions
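The deadlock in the Thread1/Thread2 example comes from the two threads taking the same locks in opposite orders. One common convention for avoiding it, sketched below, is to give every lock a fixed global rank and always acquire in rank order. The `Account` type, its `rank` field, and `run_opposite_transfers` are hypothetical names used only for illustration; the slides do not prescribe this fix.

```python
import threading


class Account:
    """A shared balance protected by its own lock (hypothetical example type)."""
    _next_rank = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.rank = Account._next_rank  # fixed global rank used to order locks
        Account._next_rank += 1


def transfer(src, dst, amount):
    # Always lock the lower-ranked account first, whatever the direction of
    # the transfer, so two opposite transfers cannot wait on each other.
    first, second = sorted((src, dst), key=lambda a: a.rank)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_opposite_transfers(rounds=1000):
    """Run x->y and y->x transfers concurrently and return both balances."""
    x, y = Account(100), Account(100)
    t1 = threading.Thread(target=lambda: [transfer(x, y, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(y, x, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return x.balance, y.balance
```

The convention only works if every piece of code that touches these locks follows it, which is part of why lock-based code is hard to compose.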
Transactional Memory
An abstraction: a black box that encapsulates all synchronizations
– all read/write accesses to shared data are protected transparently

Assume we want to read (R) and write (W) a shared bank account ‘act’ atomically. We simply have to label the region of the sequential code using the transaction delimiters BEGIN_TX and END_TX:
BEGIN_TX R(act) W(act,v) END_TX

Walk-through, with a second transaction BEGIN_TX R(act) W(act,v’) END_TX running concurrently:
1. BEGIN_TX: after this point, operations will be handled by the TM.
2. The first transaction asks: “read through the TM?” The TM answers: “Sounds good, I keep track of your read”, then “you can return v1”.
3. The second transaction asks: “write through the TM?” The TM answers: “Sounds good, I keep track of your write”, then “write has been scheduled”.
4. The first transaction asks: “write through the TM?” The TM answers: “No way, there is a risk of safety violation”: abort, roll-back, and restart the whole transaction later on.
5. END_TX: after this point, all operations become unprotected again.
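The exchange above between two transactions and the TM can be replayed with a toy model. This is a minimal sketch, not the design of any STM named in these slides: the `ToyTM` class, its conflict rule (at most one transaction may have a pending write per location), and the account values are assumptions chosen only to reproduce the dialogue.

```python
class Abort(Exception):
    """Raised when the TM refuses an access and rolls the transaction back."""


class ToyTM:
    def __init__(self, memory):
        self.memory = dict(memory)  # committed values
        self.readers = {}           # location -> ids of transactions that read it
        self.writer = {}            # location -> id of the transaction writing it
        self.pending = {}           # transaction id -> deferred writes

    def begin(self, tx):
        self.pending[tx] = {}

    def read(self, tx, loc):
        # "Sounds good, I keep track of your read"
        self.readers.setdefault(loc, set()).add(tx)
        return self.pending[tx].get(loc, self.memory[loc])

    def write(self, tx, loc, value):
        owner = self.writer.get(loc)
        if owner is not None and owner != tx:
            # "No way, there is a risk of safety violation": roll back.
            self._release(tx)
            raise Abort(f"{tx} conflicts with {owner} on {loc}")
        # "I keep track of your write": the write is scheduled, not applied.
        self.writer[loc] = tx
        self.pending[tx][loc] = value

    def commit(self, tx):
        self.memory.update(self.pending[tx])
        self._release(tx)

    def _release(self, tx):
        self.pending.pop(tx, None)
        for readers in self.readers.values():
            readers.discard(tx)
        for loc in [l for l, owner in self.writer.items() if owner == tx]:
            del self.writer[loc]


def replay():
    """Replay the slide dialogue; return the final balance and the abort count."""
    tm = ToyTM({"act": 10})
    aborts = 0
    tm.begin("T1")
    tm.begin("T2")
    v1 = tm.read("T1", "act")          # T1 reads through the TM
    tm.write("T2", "act", 50)          # T2's write is scheduled
    try:
        tm.write("T1", "act", v1 + 5)  # T1's write is refused
    except Abort:
        aborts += 1
    tm.commit("T2")
    tm.begin("T1")                     # restart the whole transaction later on
    tm.write("T1", "act", tm.read("T1", "act") + 5)
    tm.commit("T1")
    return tm.memory["act"], aborts
```

In this replay the restarted transaction reads the value committed by the other one, so the aborted attempt's work is redone from scratch.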
Transactional Memory
An abstraction: a black box that encapsulates all synchronizations
– all read/write accesses to shared data are protected transparently
– atomicity is preserved under transaction composition
delete(acc, amt) { BEGIN_TX v = R(acc) W(acc, v-amt) END_TX }
+
insert(acc, amt) { BEGIN_TX v = R(acc) W(acc, v+amt) END_TX }
=
move(acc1, acc2, amt) { BEGIN_TX delete(acc1, amt) insert(acc2, amt) END_TX }
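The composition of delete and insert into move can be sketched with nested transaction scopes. In the sketch below a single re-entrant lock stands in for the TM, which is a deliberate simplification: it gives the flat-nesting behavior (inner transactions merge into the outer one) but none of the optimism or roll-back of a real STM. The `transaction` context manager, the `accounts` dictionary, and `total` are hypothetical names.

```python
import threading
from contextlib import contextmanager

_tm_lock = threading.RLock()  # stand-in for the TM; not a real STM


@contextmanager
def transaction():
    # Re-entrant: a transaction opened inside another one joins it.
    with _tm_lock:
        yield


accounts = {"a": 100, "b": 0}


def delete(acc, amt):
    with transaction():
        accounts[acc] = accounts[acc] - amt


def insert(acc, amt):
    with transaction():
        accounts[acc] = accounts[acc] + amt


def move(acc1, acc2, amt):
    # Composing the two transactional operations yields one atomic move.
    with transaction():
        delete(acc1, amt)
        insert(acc2, amt)


def total():
    with transaction():
        return sum(accounts.values())
```

An observer calling `total()` while moves are in flight never sees money that has left one account but not yet reached the other, which is the property the lock-free composition LF-delete + LF-insert cannot provide.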
1st Problem: Wasted Effort
Transactions waste effort when they abort and roll back
BEGIN_TX W(x) END_TX
BEGIN_TX R(x) END_TX
[Figure: timeline of the two transactions, with points (1) to (4)]
Although both transactions could commit safely, one is aborted by common STMs:
TL2, WSTM, DSTM, TinySTM
Some aborts are unnecessary
2nd Problem: Lack of Concurrency
Transactions ensure stronger guarantees than necessary
Example: sorted linked list implementation of integer set
[Figure: sorted linked list with nodes h, x, y, z, t, on which insert(x) and search(z) run concurrently]
search(z): BEGIN_TX R(h) R(y) R(z) END_TX
insert(x): BEGIN_TX … W(h) END_TX
Both transactions could commit w/o violating linked list linearizability, but transactional models consider read/write atomicity.
A Metric for Input Acceptance
• TM efficiency depends on
  – Execution speed
  – Number of successful (committed) transactions
• Input acceptance is the ability of a TM to commit transactions
• The commit-abort ratio is σ = # committed tx / # complete tx
How do STMs perform w.r.t. this metric?
• Ideal goal: no abort (σ = 1)
• A TM accepts an input if σ = 1
• What is accepted by the existing STMs?
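The metric can be written down in a few lines. This sketch assumes that a "complete" transaction is one that has either committed or aborted; the function names are hypothetical.

```python
def commit_abort_ratio(committed, aborted):
    """sigma = # committed tx / # complete tx (complete = committed + aborted)."""
    complete = committed + aborted
    if complete == 0:
        raise ValueError("sigma is undefined without any complete transaction")
    return committed / complete


def accepts(committed, aborted):
    """A TM accepts an input if no transaction aborts, i.e. sigma = 1."""
    return commit_abort_ratio(committed, aborted) == 1.0
```

For example, `commit_abort_ratio(3, 1)` gives 0.75, so a run with three commits and one abort is not accepted.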
Identifying TM designs
Design   Meaning                           TM examples
VWVR     Visible write, visible read       SXM
VWIR     Visible write, invisible read     DSTM, TinySTM
IWIR     Invisible write, invisible read   WSTM, TL2
CTR      Commit-time relaxation            TSTM
RTR      Real-time relaxation              SSTM
Formalizing Workload as an Input
Events (i.e., an alphabet):
$s_i$: start event of transaction i
$w^x_i$: write request of transaction i on location x
$r^x_i$: read request of transaction i on location x
$\pi^{(x)}_i$: any event of transaction i (on location x)
$c_i$: commit request of transaction i

An input pattern is a totally ordered set of events (i.e., a word)
An input class is a set of input patterns (i.e., a language):
| represents the choice (e.g., “a | b” means “a” or “b”)
* represents the Kleene closure (e.g., “a*” means “ε|a|aa|…”)
¬ represents the complement (e.g., “¬a” means “any event but a”)
Input Acceptance Upper-bound of VWIR
Theorem. There is no VWIR design that accepts the following input class:
$C_2 = \pi^* \, ( r^x_i \, \neg c_i^* \, w^x_j \, \neg c_i^* \, c_j \;|\; w^x_j \, \neg c_j^* \, r^x_i ) \, \pi^*$
BEGIN_TX W(x) END_TX
BEGIN_TX R(x) END_TX
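Since an input class is a language over events, membership can be checked with an ordinary regular expression. The sketch below assumes that C2 reads as π* (r_i^x ¬c_i* w_j^x ¬c_i* c_j | w_j^x ¬c_j* r_i^x) π*, with ¬c* meaning any run of events other than c. The token encoding ("s1", "r1x", "w2x", "c2") and the helper names are hypothetical, and the transactions i, j and the location x are fixed by parameters rather than quantified over.

```python
import re

ANY = r"(?:\S+ )"  # pi: any single event (each token is followed by a space)


def not_(event):
    """Regex for one event that is anything except `event`."""
    return rf"(?:(?!{re.escape(event)} )\S+ )"


def c2_regex(i, j, x):
    """Compile C2 for fixed transactions i, j and location x."""
    r_i, w_j = f"r{i}{x} ", f"w{j}{x} "
    c_i, c_j = f"c{i}", f"c{j}"
    return re.compile(
        rf"{ANY}*(?:{r_i}{not_(c_i)}*{w_j}{not_(c_i)}*{c_j} "
        rf"|{w_j}{not_(c_j)}*{r_i}){ANY}*"
    )


def in_c2(events, i=1, j=2, x="x"):
    """Return True if the input pattern (a list of event tokens) is in C2."""
    word = "".join(event + " " for event in events)
    return c2_regex(i, j, x).fullmatch(word) is not None
```

With this encoding, the pattern `["s2", "w2x", "s1", "r1x", "c2", "c1"]` (a write on x that is still uncommitted when the other transaction reads x) is in C2, whereas `["s2", "w2x", "c2", "s1", "r1x", "c1"]` (the writer commits before the read) is not.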
Going further
Other classes:
$C_1 = \pi^* \, ( \pi^x_i \, \neg c_i^* \, w^x_j \;|\; w^x_j \, \neg c_j^* \, \pi^x_i ) \, \pi^*$
$C_3 = \pi^* \, ( r^x_i \, \neg c_i^* \, w^x_j \;|\; w^x_j \, \neg c_j^* \, r^x_i ) \, \neg c_i^* \, c_j \, \pi^*$
$C_4 = (\neg w^x)^* \, r^x_i \, \neg c_i^* \, w^x_j \, \neg c_i^* \, c_j \, \neg c_i^* \, s_k \, \neg(c_i | c_k | r^x_k)^* \, w^y_k \, \neg(c_i | c_k | r^x_k)^* \, c_k \, \neg c_i^* \, r^y_i \, \pi^*$

Other impossibility results:
Theorem 1. VWVR design does not accept input class C1.
Theorem 3. IWIR design does not accept input class C3.
Theorem 4. CTR design does not accept input class C4.
Input Acceptance Classification
[Figure: classification diagram, built up one design at a time; each design is labeled with ~Ck, the complement of an input class]
VWVR (e.g., SXM): ~C1
VWIR (e.g., DSTM, TinySTM): ~C2
IWIR (e.g., WSTM, TL2): ~C3
CTR (e.g., TSTM): ~C4
RTR (e.g., SSTM): ~C5
C5 = Ø
Serializable STM needs to track all conflicts
Experimental Validation: Scalability
20% Update operations: 10% linked-list insert, 10% linked-list delete
80% Other operations: linked-list contains
Dual quad-core Intel Xeon
Software Transactional Memories
• TinySTM, LSA-STM, SSTM, SwissTM: efficient?
[Figure: sorted linked list with nodes h, x, y, z, t, on which insert(x) and search(z) run concurrently]
search(z): BEGIN_TX R(h) R(y) R(z) END_TX
insert(x): BEGIN_TX … W(h) END_TX
The two transactions cannot both commit, because read/write atomicity would be violated, even though linked list linearizability is guaranteed.
Elastic Transactional Memory (ε-STM)
• Elastic transactions: weaker than normal ones
The goal is to cut transactions into sub-parts
[Figure: sorted linked list with nodes h, x, y, z, t, on which insert(x) and search(z) run concurrently]
search(z): BEGIN_TX R(h) R(y) R(z) END_TX
insert(x): BEGIN_TX … W(h) END_TX
Elastic Transactional Memory (ε-STM)
• Elastic transactions: weaker than normal ones
An elastic transaction is cut in 2 parts, w/ resp. ops π(x,*) and π(y,*), if:
- there are no two writes on x and y in between;
- all writes are in the same part;
- the first op of any part is a read.
Normal transactions:
search(z): BEGIN_TX R(h) R(y) R(z) END_TX
insert(x): BEGIN_TX … W(h) END_TX
Elastic transactions:
search(z): BEGIN_EL_TX R(h) R(y) R(z) END_TX
insert(x): BEGIN_EL_TX … W(h) END_TX
[Figure: a “Cut” marker on the elastic transactions]
Elastic Transactional Memory (ε-STM)
• Elastic transactions: weaker than normal ones
The key idea is that, when reading element e, either:
• the predecessor has not changed since it was read,
• or e has not changed since the predecessor was read.
This ensures that the parsing is always consistent although atomicity is relaxed.
[Figure: sorted linked list with nodes h, x, y, z, t, on which insert(x) and search(z) run concurrently]
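The key idea can be illustrated on the linked-list example with a single-threaded simulation. This is a sketch under stated assumptions, not ε-STM's implementation: the global clock, the per-node timestamp `ts`, the `hook` used to interleave a concurrent insert, and the integer keys standing for h, x, y, z, t are all hypothetical. `elastic_search` validates only the last two reads, while `normal_search` validates its whole read set.

```python
class Abort(Exception):
    """Raised when a read cannot be validated."""


class Node:
    def __init__(self, key, next=None):
        self.key, self.next = key, next
        self.ts = 0  # global clock value at this node's last write


class SortedList:
    def __init__(self, keys):
        self.clock = 0  # hypothetical global version clock
        self.head = None
        for key in reversed(keys):
            self.head = Node(key, self.head)

    def insert_after(self, pred, key):
        """Stand-in for a committed insert: a single write on `pred`."""
        self.clock += 1
        node = Node(key, pred.next)
        node.ts = pred.ts = self.clock
        pred.next = node


def elastic_search(lst, key, hook=None):
    pred = lst.head
    pred_ts, pred_read_at = pred.ts, lst.clock
    curr = pred.next
    while curr is not None:
        if hook:
            hook(pred, curr)  # lets the demo interleave a concurrent write
        pred_unchanged = pred.ts == pred_ts       # pred unchanged since read
        curr_unchanged = curr.ts <= pred_read_at  # e unchanged since pred read
        if not (pred_unchanged or curr_unchanged):
            raise Abort("inconsistent elastic read")
        if curr.key >= key:
            return curr.key == key
        pred, pred_ts, pred_read_at = curr, curr.ts, lst.clock
        curr = curr.next
    return False


def normal_search(lst, key, hook=None):
    read_set = [(lst.head, lst.head.ts)]
    curr = lst.head.next
    while curr is not None:
        if hook:
            hook(read_set[-1][0], curr)
        if any(node.ts != ts for node, ts in read_set):
            raise Abort("read set invalidated")
        if curr.key >= key:
            return curr.key == key
        read_set.append((curr, curr.ts))
        curr = curr.next
    return False


def demo(search):
    """search(z) reads h, then insert(x) writes h, then the search resumes."""
    lst = SortedList([0, 20, 30, 99])  # h, y, z, t
    fired = []

    def concurrent_insert(pred, curr):
        if not fired:
            fired.append(True)
            lst.insert_after(lst.head, 10)  # x

    return search(lst, 30, concurrent_insert)
```

In this scenario `demo(elastic_search)` finds z because y has not changed since h was read, whereas `demo(normal_search)` raises `Abort` because h, which is still in its read set, was overwritten.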
Elastic Transactional Memory (ε-STM)
• Elastic transactions:
  – Weaker than normal ones (cannot implement sum)
  – Compatible with normal ones (retain simplicity)
• Apply to various search structures:
  – Red-black tree, skip list, hash table…
• Could be applied to counter increment transactions as well… and others?
μBenchmarks (5% insert, 5% delete, 90% search)
(HT w/ 256 buckets)
μBenchmarks (Cont’d.)
(10% move, 10% sum, 80% search)
(5% insert, 5% delete, 90% search)
Conclusion
• Transactional Memory is promisingly simple
• But its efficiency can be improved:
  – By increasing input acceptance;
  – By weakening the transactional model.
• Input Acceptance:
  – Maximal input acceptance is not practical
  – The best tradeoff (input acceptance vs. practicality) is an open question.
• Elastic transactions:
  – Allow more concurrency than locking techniques.
  – We should characterize all their applications.
Related Work
• Permissiveness [Guerraoui et al. DISC 2008]:
  – Indicates the variety of output/history
  – Does not depend on the input
• Open Nesting [E. Moss, WMPI 2006]:
  – Each sub-transaction commits independently from its parent transaction(s)
  – Complex roll-back mechanism [Ni et al. PPoPP’07]
• Early Release [Herlihy et al. PODC 2003]:
  – Some reads may be forgotten (removed from r-set)
  – Programmer has to decide which objects can be released and when; this cannot be automatic [Harris et al. TRANSACT 2007]
• Transactional Boosting [Herlihy et al., PPoPP 2007]:
  – Transforms linearizable objects into transactional objects
  – Requires defining commutative and inverse operations
Thank you
• On the Input Acceptance of Transactional Memory,
Parallel Processing Letters, Dec. 2009
• Elastic Transactions,
EPFL Technical Report LPD-REPORT-2009-002