Distributed Systems Day 13: Distributed Transaction
“To Be or Not to Be… Distributed Transactions”
Summary
• Background on transactions
• ACID semantics
• Distributed transactions
• Terminology: transaction manager, coordinator, participant
• Two-phase commit
• Adding isolation with locks: optimistic vs. pessimistic
• Performance issues
• Consistency models
• Serializability versus linearizability
https://cloud.google.com/spanner/docs/transactions
https://blog.couchbase.com/optimistic-or-pessimistic-locking-which-one-should-you-pick/
All Facebook Data
[Figure: the “All Facebook Data” hash table (keys k0 to k5) is partitioned into Shard 1, replicated on Nodes A, B, C, and Shard 2, replicated on Nodes D, E, F. Every replica of a shard holds the same keys (one shard's replicas hold k4/k5, the other's k2/k3). Clients reach the replicas through front ends (FE).]
• Partition data into shards; map shards to servers with consistent hashing
• Maintain multiple copies for fault tolerance and to reduce latency
• Clients send requests to all replicas
• Replication
  – Lazy, passive, active
  – Consistency for a single shard
• Distributed transactions
  – Consistent/atomic change to data in multiple shards
  – Multiple shards → can use traditional replication techniques
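The slide maps shards to servers with consistent hashing. Below is a minimal sketch of that idea, assuming an MD5-based ring, no virtual nodes, and hypothetical server names (NodeA, NodeD, NodeG); production systems add virtual nodes and place several replicas per key.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a point on the hash ring (0 .. 2**32 - 1)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    """Minimal ring: a key belongs to the first server clockwise from its hash."""

    def __init__(self, servers):
        self._ring = sorted((ring_position(s), s) for s in servers)
        self._points = [p for p, _ in self._ring]

    def server_for(self, key: str) -> str:
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._ring[idx % len(self._ring)][1]  # wrap past the last point

    def add_server(self, server: str) -> None:
        bisect.insort(self._ring, (ring_position(server), server))
        self._points = [p for p, _ in self._ring]


ring = ConsistentHashRing(["NodeA", "NodeD"])  # hypothetical: one server per shard
for key in ["k0", "k1", "k2", "k3", "k4", "k5"]:
    print(key, "->", ring.server_for(key))
```

The design point: when a server joins, only the keys whose successor becomes the new server move; every other key keeps its placement.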
What Is a Transaction?
• A set of operations that need to be performed together
  – Example 1: transferring money between accounts
  – Example 2: shopping cart checkout
Initially, Theo and Rodrigo each have $100. Goal: transfer $50 from Rodrigo to Theo.
Operations: Read(R), Update(R, $50), Read(T), Update(T, $150)
[Figure: Rodrigo's account is in Shard 1 (Nodes A, B, C); Theo's account is in Shard 2 (Nodes D, E, F).]
Ideal: either all four operations happen or none happen. Worst case: only a subset occurs.
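To make the “all or none” ideal concrete, here is a single-process sketch of the transfer that keeps an undo log and rolls back partial writes if a failure interrupts it. It is an illustration only: both accounts live in one dict here, whereas in the deck they sit on different shards, and `fail_after` is a hypothetical knob for simulating a crash.

```python
class TransferFailed(Exception):
    """Raised when a simulated failure interrupts the transfer."""


def transfer(accounts, src, dst, amount, fail_after=None):
    """Move `amount` from src to dst: either every update happens or none does.

    `fail_after` (hypothetical) simulates a crash before update number
    `fail_after`, so the rollback path can be observed.
    """
    undo_log = []  # (account, old_value) pairs, newest last
    try:
        steps = [(src, -amount), (dst, +amount)]
        for i, (acct, delta) in enumerate(steps):
            if fail_after is not None and i == fail_after:
                raise TransferFailed(f"crash before update {i}")
            old = accounts[acct]            # Read(...)
            undo_log.append((acct, old))
            accounts[acct] = old + delta    # Update(...)
    except TransferFailed:
        for acct, old in reversed(undo_log):  # undo any partial writes
            accounts[acct] = old
        return False
    return True


accounts = {"Rodrigo": 100, "Theo": 100}
print(transfer(accounts, "Rodrigo", "Theo", 50), accounts)
# True {'Rodrigo': 50, 'Theo': 150}
accounts = {"Rodrigo": 100, "Theo": 100}
print(transfer(accounts, "Rodrigo", "Theo", 50, fail_after=1), accounts)
# False {'Rodrigo': 100, 'Theo': 100}
```

Rolling back locally is only possible because one process sees every write; once the two updates live on different shards, the nodes must agree on the outcome, which is what the rest of the deck addresses.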
Transaction Background
• ACID semantics
  – Atomicity: all-or-nothing semantics; either all operations succeed or all fail
  – Consistency: the transaction moves the data from one consistent state to another consistent state
  – Isolation: intermediate states are not exposed to the outside world (no partial writes are visible)
  – Durability: the results of a committed transaction persist after the transaction (and through failures)
• Transactions are easy for traditional databases
  – A traditional database runs on a single server → failure is “all-or-nothing”: all components of the transaction fail together
• Distributed transactions → different components can fail independently
  – We need to provide transaction semantics even when only a subset of the components fail
Distributed Transaction Semantics
• Transaction Manager (TM): the server in charge of orchestrating the transaction
• Steps for a transaction:
  – Client initiates a transaction
  – TM gives the client a transaction ID (TID)
  – Client submits operations to the TM
  – TM relays the operations to the replicas
  – Client commits the transaction
  – TM performs two-phase commit
[Figure: the client front end (FE) talks to the Transaction Manager, which coordinates Shard 1 (Nodes A, B, C) and Shard 2 (Nodes D, E, F).]
Client-side code:
    tid = openTransaction();
    RVal = a.get(tid, Rodrigo);
    a.update(tid, Rodrigo, RVal - 50);
    TVal = b.get(tid, Theo);
    b.update(tid, Theo, TVal + 50);
    closeTransaction(tid);   // or abortTransaction(tid)
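The client-side pseudocode can be run as Python against a small in-memory stand-in. This is a sketch under loud assumptions: `Shard` and `TransactionManager` are hypothetical classes, `a` and `b` are shard handles, writes are buffered per TID and applied only when the transaction closes, and `close_transaction` commits directly instead of running two-phase commit.

```python
import itertools


class Shard:
    """Hypothetical shard server: buffers writes per transaction until commit."""

    def __init__(self, data):
        self.data = dict(data)
        self.pending = {}  # tid -> {key: new_value}

    def get(self, tid, key):
        # A transaction sees its own uncommitted writes first.
        return self.pending.get(tid, {}).get(key, self.data[key])

    def update(self, tid, key, value):
        self.pending.setdefault(tid, {})[key] = value

    def commit(self, tid):
        self.data.update(self.pending.pop(tid, {}))

    def abort(self, tid):
        self.pending.pop(tid, None)


class TransactionManager:
    """Hypothetical TM: hands out TIDs and finishes the transaction on every shard."""

    def __init__(self, shards):
        self.shards = shards
        self._next_tid = itertools.count(1)

    def open_transaction(self):
        return next(self._next_tid)

    def close_transaction(self, tid):
        # Simplification: commits everywhere directly; a real TM runs 2PC here.
        for shard in self.shards:
            shard.commit(tid)

    def abort_transaction(self, tid):
        for shard in self.shards:
            shard.abort(tid)


a = Shard({"Rodrigo": 100})  # shard 1
b = Shard({"Theo": 100})     # shard 2
tm = TransactionManager([a, b])

# The client-side code from the slide:
tid = tm.open_transaction()
r_val = a.get(tid, "Rodrigo")
a.update(tid, "Rodrigo", r_val - 50)
t_val = b.get(tid, "Theo")
b.update(tid, "Theo", t_val + 50)
tm.close_transaction(tid)  # or tm.abort_transaction(tid)
print(a.data, b.data)  # {'Rodrigo': 50} {'Theo': 150}
```

Buffering writes until close is what makes `abort_transaction` cheap: aborting just discards the pending map.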
Transaction Manager
• TM hands out TIDs
• TM manages and relays operations to replica leaders
• TM keeps track of all replicas involved in the transaction
Replica Manager
• Prepares operations
• Stores operations locally in a log
• But does not commit the operations
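A minimal sketch of the two roles above, assuming hypothetical class names and an in-memory log: the TM remembers which replica leaders each TID touched, and each replica manager only appends operations to its log without applying them to its committed state.

```python
from collections import defaultdict


class ReplicaManager:
    """Hypothetical replica leader: logs operations but does not apply them yet."""

    def __init__(self, name):
        self.name = name
        self.log = []    # (tid, op) entries, in arrival order
        self.state = {}  # committed key/value data, untouched until commit

    def prepare(self, tid, op):
        self.log.append((tid, op))  # stored locally, NOT committed


class TransactionManager:
    """Hypothetical TM: hands out TIDs, relays ops, remembers the participants."""

    def __init__(self, replicas):
        self.replicas = replicas              # leader name -> ReplicaManager
        self.participants = defaultdict(set)  # tid -> names of involved leaders
        self._next_tid = 0

    def open_transaction(self):
        self._next_tid += 1
        return self._next_tid

    def relay(self, tid, leader, op):
        self.participants[tid].add(leader)  # needed later to know whom to poll in 2PC
        self.replicas[leader].prepare(tid, op)


replicas = {"shard1": ReplicaManager("shard1"), "shard2": ReplicaManager("shard2")}
tm = TransactionManager(replicas)
tid = tm.open_transaction()
tm.relay(tid, "shard1", ("update", "Rodrigo", 50))
tm.relay(tid, "shard2", ("update", "Theo", 150))
print(sorted(tm.participants[tid]))  # ['shard1', 'shard2']
```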
Two-Phase Commit
• Provides atomicity and consistency, NOT isolation and durability
• Assumptions: each server maintains a transaction log
  – The transaction log is stored in persistent memory
  – If a failure occurs → the items in the transaction log survive
• Terminology changes:
  – Coordinator ← the Transaction Manager
  – Participant ← the leader of a replica
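The 2PC assumption is a transaction log that survives failures. One minimal way to sketch that in Python is an append-only file of JSON lines, with each record forced to disk via `os.fsync` before `append` returns. The record fields (`tid`, `type`, `ops`, `vote`) and the file name are hypothetical.

```python
import json
import os
import tempfile


class TransactionLog:
    """Append-only on-disk log; each record is forced to disk before returning."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the record to stable storage

    def replay(self):
        """Read back every record, e.g. after restarting from a crash."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "participant.log")
log = TransactionLog(path)
log.append({"tid": 7, "type": "prepare", "ops": [["update", "Theo", 150]]})
log.append({"tid": 7, "type": "vote", "vote": "yes"})

# Simulated restart: a fresh object sees only what was persisted.
recovered = TransactionLog(path).replay()
print([r["type"] for r in recovered])  # ['prepare', 'vote']
```

Logging the prepare record and the vote before replying is what lets a restarted participant know it is still uncertain about transaction 7.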
[Figure: the FE talks to the Transaction Manager, now labeled Coordinator; the leaders of Shard 1 (Nodes A, B, C) and Shard 2 (Nodes D, E, F) are labeled Participants.]
Two-Phase Commit
• Phase 1:
  – Coordinator sends a request for votes
  – Participants vote
• Phase 2:
  – Coordinator counts the votes
  – Coordinator informs the participants of the transaction outcome
• At the end of Phase 2: either all participants commit or all abort
[Figure: message flow between the coordinator and two participants: “Ready to commit?” → “Vote [Yes/No]” → coordinator counts votes → “Abort/Commit?”.]
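The two phases can be sketched as a single function, assuming no failures or timeouts (those come later in the deck). `Participant` is a hypothetical stand-in whose `can_commit` flag represents its local decision.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant; `can_commit` stands in for its local checks."""
    name: str
    can_commit: bool = True
    outcome: str = ""

    def vote(self, tid):
        return "yes" if self.can_commit else "no"

    def finish(self, tid, decision):
        self.outcome = decision


def two_phase_commit(tid, participants):
    # Phase 1: the coordinator requests votes and every participant votes.
    votes = [p.vote(tid) for p in participants]
    # Phase 2: the coordinator counts the votes and announces the outcome.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:
        p.finish(tid, decision)
    return decision


ps = [Participant("shard1"), Participant("shard2")]
print(two_phase_commit(1, ps))  # commit
ps = [Participant("shard1"), Participant("shard2", can_commit=False)]
print(two_phase_commit(2, ps), [p.outcome for p in ps])  # abort ['abort', 'abort']
```

A single “no” vote is enough to abort everyone, which is how the protocol guarantees that all participants commit or all abort.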
Coordinator State Diagram for Two-Phase Commit
[Figure: coordinator states Init, Wait, Commit, Abort. Init → Wait on “app commit / vote req”; Wait → Commit on “all commit / commit”; Wait → Abort on “any abort / abort”. The coordinator sends “Ready?” to the participants, and each participant responds with a vote: commit or abort.]
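The coordinator diagram can be encoded as a transition table. This sketch reads each edge label “x / y” as “on event x, move and send y” (the usual notation for such diagrams); the event and action strings are hypothetical spellings of the labels.

```python
from enum import Enum


class CState(Enum):
    INIT = "Init"
    WAIT = "Wait"
    COMMIT = "Commit"
    ABORT = "Abort"


# (state, event) -> (next_state, message sent to the participants)
COORDINATOR_TRANSITIONS = {
    (CState.INIT, "app_commit"): (CState.WAIT, "vote_req"),
    (CState.WAIT, "all_commit"): (CState.COMMIT, "commit"),
    (CState.WAIT, "any_abort"): (CState.ABORT, "abort"),
}


def coordinator_step(state, event):
    """Follow one labeled edge of the diagram; reject edges the diagram lacks."""
    try:
        return COORDINATOR_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


state, action = coordinator_step(CState.INIT, "app_commit")
print(state.value, action)  # Wait vote_req
state, action = coordinator_step(state, "any_abort")
print(state.value, action)  # Abort abort
```

Commit and Abort have no outgoing edges, so once the coordinator decides, the decision is final.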
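Participant State Diagram for Two-Phase Commit
[Figure: participant states Init, Uncertain, Commit, Abort. Init → Uncertain on “vote req / commit”; Init → Abort on “vote req / abort”; Uncertain → Commit on “commit / ack”; Uncertain → Abort on “abort / ack”. The participant makes a change, responds to the coordinator, and then waits for “commit or abort”.]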
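The participant side, sketched the same way and under the same reading of “x / y”. Here `local_ok` is a hypothetical flag for whether the participant can commit its part when the vote request arrives; it selects between the two edges leaving Init.

```python
from enum import Enum


class PState(Enum):
    INIT = "Init"
    UNCERTAIN = "Uncertain"
    COMMIT = "Commit"
    ABORT = "Abort"


def participant_step(state, event, local_ok=True):
    """Return (next_state, reply) for one edge of the participant diagram."""
    if state is PState.INIT and event == "vote_req":
        # Voting "commit" moves to Uncertain; voting "abort" can abort at once.
        return (PState.UNCERTAIN, "commit") if local_ok else (PState.ABORT, "abort")
    if state is PState.UNCERTAIN and event == "commit":
        return PState.COMMIT, "ack"
    if state is PState.UNCERTAIN and event == "abort":
        return PState.ABORT, "ack"
    raise ValueError(f"no transition from {state.value} on {event!r}")


s, reply = participant_step(PState.INIT, "vote_req")
print(s.value, reply)  # Uncertain commit
s, reply = participant_step(s, "commit")
print(s.value, reply)  # Commit ack
```

Note the asymmetry: after voting “commit”, the participant can only leave Uncertain when the coordinator tells it the outcome.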
State Diagrams for Two-Phase Commit
[Figure: the coordinator and participant state diagrams shown side by side.]
Two-Phase Commit with Failures
• What is the impact of failures on 2PC?
• 2PC is synchronous
  – Failure == node failure or network failure
  – Failure → the protocol blocks/stalls
[Figure: the 2PC message flow, annotated: if the coordinator fails, the participants are left uncertain (waiting); if a participant fails, the coordinator is left waiting.]
Crash Points
[Figure: the coordinator and participant state diagrams for two-phase commit.]
• Detect failure using timeouts
  – Is this model synchronous or asynchronous?
• Coordinator detects participant failure and assumes ABORT → the transaction terminates
• Participant detects coordinator failure
  – Why can't the participant automatically ABORT?
Participant Recovery from Coordinator Failure
[Figure: participant state diagram (Init, Uncertain, Commit, Abort).]
• A participant in the Uncertain state is waiting for the coordinator to say “commit” or “abort”
• It detects the coordinator's failure using a timeout
• A participant in the Uncertain state can't assume either outcome
  – BAD things happen if the participant makes the wrong assumption
  – It waits for the coordinator to restart; on restart, it contacts the coordinator for the final outcome
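A sketch of what a participant might do when its timeout on the coordinator fires, following the rules above. `ask_coordinator` is a hypothetical callable that returns "commit" or "abort" once the coordinator is back, or None while it is down. `max_tries` exists only so the sketch can be tested; an Uncertain participant really keeps waiting.

```python
import time


def on_coordinator_timeout(state, ask_coordinator, retry_delay=0.01, max_tries=None):
    """Decide what a participant does after timing out on the coordinator."""
    if state == "Init":
        return "abort"          # has not voted yes: safe to abort unilaterally
    if state in ("Commit", "Abort"):
        return state.lower()    # outcome already known
    # Uncertain: it voted yes, so it must NOT guess; block until the coordinator answers.
    tries = 0
    while max_tries is None or tries < max_tries:
        outcome = ask_coordinator()
        if outcome is not None:
            return outcome
        tries += 1
        time.sleep(retry_delay)
    return None                 # still blocked (only reachable when max_tries is set)


replies = iter([None, None, "commit"])  # coordinator down twice, then recovered
print(on_coordinator_timeout("Uncertain", lambda: next(replies)))  # commit
```

The loop is the “transaction is stalled” case: progress depends entirely on the coordinator coming back.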
Coordinator Recovery from Participant Failure
• The coordinator in the Wait state is waiting for each participant to vote “commit” or “abort”
• It detects a participant's failure using a timeout
• The coordinator assumes the participant said “no”
  – Takes no response as an abort
  – Aborts the transaction!!
• If a participant fails, the coordinator can still make progress
[Figure: coordinator state diagram (Init, Wait, Commit, Abort).]
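On the coordinator side, Phase 1 with a timeout can be sketched using Python's standard `queue.Queue` as a stand-in inbox of `(participant, vote)` messages. Any participant that has not answered by the deadline counts as a “no”. The participant names are hypothetical.

```python
import queue
import threading
import time


def collect_votes(expected, inbox, timeout):
    """Coordinator side of Phase 1: a missing vote by the deadline counts as "no"."""
    deadline = time.monotonic() + timeout
    votes = {}
    while len(votes) < len(expected):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            name, vote = inbox.get(timeout=remaining)
        except queue.Empty:
            break  # timed out waiting for the remaining participants
        votes[name] = vote
    decision = "commit" if all(votes.get(p) == "yes" for p in expected) else "abort"
    return decision, votes


inbox = queue.Queue()
threading.Thread(target=lambda: inbox.put(("shard1", "yes"))).start()
# "shard2" has crashed, so it never sends a vote.
print(collect_votes(["shard1", "shard2"], inbox, timeout=0.2))
# ('abort', {'shard1': 'yes'})
```

Unlike the participant, the coordinator never blocks indefinitely here: aborting is always safe for it because no commit decision has been sent yet.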
Two-Phase Commit with Failures
• Participant detects coordinator failure
  – The participant must wait for the coordinator
  – The transaction is stalled!
Two-Phase Commit and the CAP Theorem!
• During a partition, does 2PC pick availability or consistency?
[Figure: the coordinator sends “Ready to commit?” to two participants; a network partition separates them.]
CAP Theorem
• C: Consistency (linearizable)
• A: Availability
• P: Partition tolerance
• Given a “partition”, you must pick between “availability” and “consistency”
  – Pick consistency: some clients (not all) can change data consistently
  – Pick availability: all clients can change data, but “inconsistently”
Two-Phase Commit with Failures: Why Can't a Participant Automatically ABORT?
• The coordinator detects a participant failure and assumes ABORT → the transaction ends
• If a participant simply ABORTed on timeout, the transaction would also eventually end
• However, only a participant that voted NO can abort on its own; a participant that voted YES cannot ABORT, because the coordinator may already have decided to commit
Two-Phase Commit: Adding Isolation with Locks
https://github.com/facebook/rocksdb/wiki/Transactions
https://apacheignite.readme.io/docs/concurrency-modes-and-isolation-levels
Pessimistic Versus Optimistic Locking
• Trade-off: concurrency versus isolation
Pessimistic:
• Get all locks before the transaction
• Release all locks after the transaction (after commit/abort)
• Prevents anyone else from using the data during the transaction
  – Locks prevent reads/writes of the data
  – Locks stop other transactions
Optimistic:
• No locks
• Get a copy of the data before the transaction
• After the transaction, check that the data has not changed
• If the data changed, then ABORT!!!!
  – Changed data means someone else modified it
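Both strategies can be sketched on one hypothetical versioned key-value store. Optimistic transactions read a snapshot of (value, version) pairs without locking and, at commit, briefly lock the touched keys to check that no version changed. Pessimistic transactions hold the key locks for their whole read-modify-write. Acquiring locks in sorted key order is an added assumption to avoid deadlock.

```python
import threading
from contextlib import ExitStack


class VersionedStore:
    """Hypothetical key-value store where each key carries a version number."""

    def __init__(self, data):
        self.values = dict(data)
        self.versions = {k: 0 for k in data}
        self.locks = {k: threading.Lock() for k in data}

    def _lock_all(self, keys):
        stack = ExitStack()
        for k in sorted(set(keys)):  # one global order prevents deadlock
            stack.enter_context(self.locks[k])
        return stack

    def _apply(self, writes):
        for k, v in writes.items():
            self.values[k] = v
            self.versions[k] += 1

    # Optimistic: no locks while working; validate versions at commit time.
    def read_snapshot(self, keys):
        return {k: (self.values[k], self.versions[k]) for k in keys}

    def commit_optimistic(self, snapshot, writes):
        with self._lock_all([*snapshot, *writes]):  # locks held only during validation
            if any(self.versions[k] != ver for k, (_, ver) in snapshot.items()):
                return False  # someone else changed the data: ABORT
            self._apply(writes)
            return True

    # Pessimistic: lock first, release only after the writes are applied.
    def run_pessimistic(self, keys, update_fn):
        with self._lock_all(keys):  # update_fn should only write the keys it locked
            self._apply(update_fn({k: self.values[k] for k in keys}))


store = VersionedStore({"Rodrigo": 100, "Theo": 100})
snap = store.read_snapshot(["Rodrigo", "Theo"])
# Another transaction changes Theo before the optimistic one commits.
store.run_pessimistic(["Theo"], lambda cur: {"Theo": cur["Theo"] + 10})
ok = store.commit_optimistic(snap, {"Rodrigo": snap["Rodrigo"][0] - 50,
                                    "Theo": snap["Theo"][0] + 50})
print(ok, store.values)  # False {'Rodrigo': 100, 'Theo': 110}
```

The optimistic transfer aborts because Theo's version moved from 0 to 1 after the snapshot, exactly the “data changed → ABORT” rule.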
Pessimistic Versus Optimistic Locking
• Trade-off: concurrency versus isolation
Optimistic:
• High level of concurrency
• High throughput, especially if all operations are reads
• Many transactions will abort if there are many writes
Pessimistic:
• Low level of concurrency
• Low performance
• Sequential ordering of transactions
Two-Phase Commit: Practical Issues
Practical Performance Issues with 2PC
• Synchronization: 2PC overheads
  – Multiple “rounds” of communication: three rounds
  – 3N messages (for N participants)
  – During these rounds, resources are frozen
[Figure: the three rounds between the coordinator and two participants: “Ready to commit?”, “Vote [Yes/No]”, “Abort/Commit?”.]
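A worked count behind the “3N messages” figure, assuming N participants and one message per participant in each of the three rounds (vote request, vote, decision). If participants also acknowledge the decision, as the participant state diagram's “/ ack” labels suggest, that adds another N.

```latex
\text{messages}(N) = \underbrace{N}_{\text{vote requests}} + \underbrace{N}_{\text{votes}} + \underbrace{N}_{\text{commit/abort}} = 3N,
\qquad N = 4 \;\Rightarrow\; 3 \cdot 4 = 12 \text{ messages}
```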
Practical Performance Issues with 2PC
• Blocking:
  – During a failure → 2PC can block
  – When 2PC blocks → other transactions are unable to progress
Distributed Transactions No Longer Considered Dead!
• 2007: Avoid distributed transactions (https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf)
• 2012: Google's Spanner distributed transactions!!!
• 2019: Microsoft's Orleans brings distributed transactions to the cloud!!! (https://thenewstack.io/microsoft-orleans-brings-distributed-transactions-to-cloud/)
Distributed Transactions and ACID
How do you get ACID in distributed transactions?
• Two-Phase Commit → A + C
  – 2PC: an atomic and consistent change from one state to another state
• Two-Phase Commit + Locks → A + C + I
  – Locks provide isolation by preventing concurrent transactions from accessing the data
• Two-Phase Commit + Locks + Logs → A + C + I + D
  – Logs ensure the transactions persist
Consistency Models Revisited
Consistency Spectrum
[Figure: a spectrum from STRONG CONSISTENCY (slower but easier to program) to WEAK CONSISTENCY (fast but harder to program): Strict Serializability → Linearizable → Sequential → Causal+ → Eventual.]
Strict Serializability
• Total order + FIFO + “time” → for a transaction
  – After a transaction commits, all future reads will see the committed data
• Requires 2PC + pessimistic locks
  – Low performance: reads/writes have high latency and low throughput
• Strict serializability vs. linearizability
  – Strict serializability = linearizability for transactions
  – Linearizability = total order + FIFO + real time for individual operations
  – Strict serializability = total order + FIFO + real time for transactions (groups of operations)