Distributed Systems - Brown University

Distributed Systems Day 13: Distributed Transactions

“To Be or Not to Be… Distributed… Transactions”

Summary

• Background on Transactions
  • ACID Semantics
• Distributed Transactions
  • Terminology: Transaction Manager, Coordinator, Participant
  • Two-Phase Commit
• Adding Isolation with Locks: optimistic vs. pessimistic
• Performance Issues
• Consistency Models
  • Serializability versus Linearizability

https://cloud.google.com/spanner/docs/transactions

https://blog.couchbase.com/optimistic-or-pessimistic-locking-which-one-should-you-pick/

[Figure: all Facebook data split into two shards. Shard 1 is replicated on Nodes A, B, and C; Shard 2 is replicated on Nodes D, E, and F.]

Partition data into shards and map shards to servers with consistent hashing.
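As a rough illustration of the consistent-hashing step, the sketch below places servers on a hash ring and assigns each key to the first server point at or after the key's hash. The class name `ConsistentHashRing`, the `points_per_server` parameter, and the server names are hypothetical and not from the slides; MD5 is used only because it gives the same hash on every run.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Hash a string to an integer that is identical on every run and machine."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps keys to servers by walking clockwise around a hash ring (illustrative only)."""

    def __init__(self, servers, points_per_server=100):
        # Each server owns several points on the ring to even out the load.
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = ConsistentHashRing(["server-1", "server-2"])
owner_of_rodrigo = ring.lookup("Rodrigo")
owner_of_theo = ring.lookup("Theo")
```

The property that makes this attractive for sharding is that adding a server leaves the existing ring points untouched, so a key either keeps its owner or moves to the new server.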

[Figure: a hash table with entries k0→v0 through k5→v5; the entries k4 and k5 appear in three identical copies.]

Maintain multiple copies for fault tolerance and to reduce latency.

Clients send requests to all replicas.

• Replication
  • Lazy, Passive, Active
  • Consistency for a single shard
• Distributed Transaction
  • Consistent/atomic change to data in multiple shards
  • Multiple shards → can use traditional replication techniques

[Figure: two FE nodes in front of three replicas, each holding k2→v2 and k3→v3.]

What is a Transaction?

• A set of operations that need to be performed together.
  • Example 1: transferring money between accounts
  • Example 2: shopping cart checkout

Read(R)
Update(R, $50)
Read(T)
Update(T, $150)

Initially Theo and Rodrigo each have $100. Goal: transfer $50 from Rodrigo to Theo.

Rodrigo's account is in shard 1 (Nodes A, B, C); Theo's account is in shard 2 (Nodes D, E, F).

Ideal: either all four operations happen or none of them happen.
Worst case: only a subset of the operations occur.
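To make the worst case concrete, here is a small sketch that runs the four operations one at a time with no transaction around them and simulates a crash partway through. The function name, the `crash_after_ops` parameter, and the `CrashError` type are made up for this illustration.

```python
class CrashError(Exception):
    """Simulates a node or network failure in the middle of the transfer."""


def transfer_without_transaction(accounts, crash_after_ops):
    """Run the four operations one by one; a crash leaves earlier writes in place."""
    ops_done = 0

    def checkpoint():
        # Crash once the requested number of operations has completed.
        nonlocal ops_done
        if ops_done == crash_after_ops:
            raise CrashError(f"crashed after {ops_done} operations")
        ops_done += 1

    checkpoint()
    rodrigo = accounts["Rodrigo"]          # Read(R)
    checkpoint()
    accounts["Rodrigo"] = rodrigo - 50     # Update(R, $50)
    checkpoint()
    theo = accounts["Theo"]                # Read(T)
    checkpoint()
    accounts["Theo"] = theo + 50           # Update(T, $150)


accounts = {"Rodrigo": 100, "Theo": 100}
try:
    transfer_without_transaction(accounts, crash_after_ops=2)
except CrashError:
    pass
total_after_crash = sum(accounts.values())  # 150: Rodrigo was debited, Theo never credited
```

With the crash after the second operation, $50 has left Rodrigo's account and never arrived in Theo's, which is exactly the partial outcome a transaction is meant to rule out.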


Transaction Background

• ACID Semantics
  • Atomicity: all-or-nothing semantics; either all operations succeed or all fail.
  • Consistency: a transaction moves the data from one consistent state to another consistent state.
  • Isolation: intermediate states are not exposed to the outside world (no partial writes are exposed).
  • Durability: the results of a “committed” transaction persist after the transaction (and through failures).
• Transactions are easy for traditional databases
  • Traditional databases are on a single server → failure is “all-or-nothing”
  • All components of the transaction fail together
• Distributed transactions → different components can fail
  • Need to provide transaction semantics when only a subset of the components fail

Distributed Transactions Semantics

• Transaction Manager (TM)
  • Server in charge of orchestrating the transaction
• Steps for a transaction
  • Client initiates a transaction
    • TM gives the client a Transaction ID (TID)
  • Client submits operations to the TM
    • TM relays operations to replicas
  • Client commits the transaction
    • TM performs two-phase commit

[Figure: the FE, the Transaction Manager, and the two replicated shards (Shard 1 on Nodes A, B, C; Shard 2 on Nodes D, E, F).]

Client-Side Code:

    tid = openTransaction();
    RVal = a.get(tid, Rodrigo);
    a.update(tid, Rodrigo, RVal - 50);
    TVal = b.get(tid, Theo);
    b.update(tid, Theo, TVal + 50);
    closeTransaction(tid);   // or abortTransaction(tid)
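The client-side calls above could be backed by something like the following toy, single-process transaction manager. This is only a sketch under stated assumptions: the class and method names are invented, the shard is passed as an argument instead of calling `a.get` and `b.get` on separate handles, and `close_transaction` simply applies the buffered writes where a real TM would run two-phase commit.

```python
import itertools


class TransactionManager:
    """Toy stand-in for the TM; `shards` maps a shard name to a plain dict."""

    def __init__(self, shards):
        self._shards = shards
        self._next_tid = itertools.count(1)
        self._pending = {}  # tid -> {(shard, key): new value}

    def open_transaction(self):
        tid = next(self._next_tid)
        self._pending[tid] = {}
        return tid

    def get(self, tid, shard, key):
        # A transaction sees its own buffered writes before the stored value.
        writes = self._pending[tid]
        if (shard, key) in writes:
            return writes[(shard, key)]
        return self._shards[shard][key]

    def update(self, tid, shard, key, value):
        # Buffer the write; nothing reaches the shards until close_transaction.
        self._pending[tid][(shard, key)] = value

    def close_transaction(self, tid):
        # A real TM would run two-phase commit here; this toy just applies the writes.
        for (shard, key), value in self._pending.pop(tid).items():
            self._shards[shard][key] = value

    def abort_transaction(self, tid):
        self._pending.pop(tid)  # discard the buffered writes


shards = {"shard1": {"Rodrigo": 100}, "shard2": {"Theo": 100}}
tm = TransactionManager(shards)
tid = tm.open_transaction()
r_val = tm.get(tid, "shard1", "Rodrigo")
tm.update(tid, "shard1", "Rodrigo", r_val - 50)
t_val = tm.get(tid, "shard2", "Theo")
tm.update(tid, "shard2", "Theo", t_val + 50)
tm.close_transaction(tid)
```

Buffering writes under the TID is one simple way to make abort cheap: nothing has touched the shards yet, so aborting only discards the buffer.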


Transaction Manager
• TM hands out TIDs
• TM manages and relays operations to replica leaders
• TM keeps track of all replicas involved in the transaction

Replica Manager
• Prepares operations
• Stores operations locally in a log
• But does not commit operations

Two-Phase Commit

• Provides Atomicity and Consistency
  • NOT Isolation and Durability
• Assumptions: each server maintains a transaction log
  • The transaction log is kept in persistent storage
  • On failure → items in the transaction log survive
• Terminology changes:
  • Coordinator ← the Transaction Manager
  • Participant ← the leader of a shard's replicas

[Figure: the Transaction Manager acts as the Coordinator; one node in each shard acts as a Participant.]

Two-Phase Commit

• Phase 1:
  • Coordinator sends a request for votes
  • Participants vote
• Phase 2:
  • Coordinator counts the votes
  • Coordinator informs participants of the transaction status
• At the end of Phase 2: either all participants commit or all abort

[Figure: 2PC message flow. (1) Coordinator → Participants: “Ready to commit?” (2) Participants → Coordinator: Vote [Yes/No]. (3) Coordinator counts the votes. (4) Coordinator → Participants: Abort/Commit.]
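The two phases can be sketched in a few lines for the failure-free case. The `Participant` class, its `will_vote_yes` flag, and the function name `two_phase_commit` are illustrative choices, and direct method calls stand in for network messages.

```python
class Participant:
    """Toy participant: votes in phase 1, then applies the coordinator's decision."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def vote_request(self):
        # Phase 1: a "yes" vote moves the participant into the uncertain state.
        if self.will_vote_yes:
            self.state = "uncertain"
            return True
        self.state = "abort"
        return False

    def decide(self, commit):
        # Phase 2: follow the coordinator; a participant that voted no stays aborted.
        if self.state == "uncertain":
            self.state = "commit" if commit else "abort"
        return "ack"


def two_phase_commit(participants):
    """Return True if the transaction committed, False if it aborted."""
    # Phase 1: request a vote from every participant.
    votes = [p.vote_request() for p in participants]
    # Phase 2: commit only on a unanimous "yes", then tell everyone the outcome.
    commit = all(votes)
    for p in participants:
        p.decide(commit)
    return commit


all_yes = [Participant("shard1"), Participant("shard2")]
committed = two_phase_commit(all_yes)
one_no = [Participant("shard1"), Participant("shard2", will_vote_yes=False)]
aborted = not two_phase_commit(one_no)
```

A single "no" vote is enough to abort everyone, which is how the protocol guarantees that all participants end in the same state.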

Coordinator State Diagram for Two-Phase Commit

States: Init, Wait, Commit, Abort. Transitions are labeled event / action:
• Init → Wait: app commit / send vote-req
• Wait → Commit: all vote commit / send commit
• Wait → Abort: any vote abort / send abort

Participant State Diagram for Two-Phase Commit

States: Init, Uncertain, Commit, Abort. Transitions are labeled event / action:
• Init → Uncertain: vote-req / vote commit
• Init → Abort: vote-req / vote abort
• Uncertain → Commit: commit / ack
• Uncertain → Abort: abort / ack
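One way to make the coordinator and participant state diagrams executable is to write each as a transition table keyed by (state, event). The event spellings such as `"vote-req:yes"` are an encoding chosen for this sketch; the diagrams only give the event/action labels.

```python
# (state, event) -> (next state, action), following the event/action arrow labels.
COORDINATOR = {
    ("init", "app-commit"): ("wait", "send vote-req"),
    ("wait", "all-commit"): ("commit", "send commit"),
    ("wait", "any-abort"): ("abort", "send abort"),
}

# The participant's own vote decides which arrow it takes out of "init".
PARTICIPANT = {
    ("init", "vote-req:yes"): ("uncertain", "send commit vote"),
    ("init", "vote-req:no"): ("abort", "send abort vote"),
    ("uncertain", "commit"): ("commit", "send ack"),
    ("uncertain", "abort"): ("abort", "send ack"),
}


def run(table, events, state="init"):
    """Feed events through a transition table; reject events the diagram does not allow."""
    for event in events:
        if (state, event) not in table:
            raise ValueError(f"no transition from {state!r} on {event!r}")
        state, _action = table[(state, event)]
    return state


coordinator_end = run(COORDINATOR, ["app-commit", "all-commit"])
participant_end = run(PARTICIPANT, ["vote-req:yes", "abort"])
```

Note that the tables have no arrow out of `uncertain` other than a message from the coordinator, which is the root of the blocking behavior discussed with failures.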



Two-Phase Commit With Failures

• What is the impact of failures on 2PC?
  • 2PC is synchronous
  • Failure == node failure or network failure
  • Failure → the protocol blocks/stalls

If the coordinator fails, the participants will be uncertain (waiting).

If a participant fails, the coordinator will be waiting.

Crash Points

[Figure: the coordinator and participant state diagrams annotated with possible crash points.]


Two-Phase Commit With Failures

• Detect failure using timeouts
  • Is this model synchronous or asynchronous?


Two-Phase Commit With Failures

• Detect failure using timeouts
  • Coordinator detects participant failure and assumes ABORT → the transaction terminates
  • Participant detects coordinator failure
    • Why can't the participant automatically ABORT?


Participant Recovery from Coordinator Failure

• Participant is in the Uncertain state
  – waiting for the coordinator to say “commit” or “abort”
• It detects failure of the coordinator
  – using a timeout
• A participant in the Uncertain state
  – can't assume either outcome
    - BAD things happen if the participant makes the wrong assumption
  – waits for the coordinator to restart
    - once the coordinator restarts, contacts it for the final outcome


Coordinator Recovery from Participant Failure

• Coordinator is in the Wait state
  – waiting for participants to say “commit” or “abort”
• It detects failure of a participant
  – using a timeout
• Coordinator assumes the participant said “no”
  – takes no response as an abort
  – aborts the transaction!
• If a participant fails, the coordinator can still make progress
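A minimal sketch of the coordinator's timeout rule, assuming votes arrive on a thread-safe queue: the function name `collect_votes` and its parameters are hypothetical, and the timeout here applies to each vote separately instead of to the whole round.

```python
import queue


def collect_votes(vote_queue, expected, timeout_seconds):
    """Wait for `expected` votes; a vote that never arrives counts as a 'no'."""
    votes = []
    for _ in range(expected):
        try:
            votes.append(vote_queue.get(timeout=timeout_seconds))
        except queue.Empty:
            # Timeout: assume the silent participant failed and abort the transaction.
            return "abort"
    return "commit" if all(votes) else "abort"


# Two participants are expected, but only one vote ever arrives.
arrived = queue.Queue()
arrived.put(True)
decision = collect_votes(arrived, expected=2, timeout_seconds=0.05)
```

Because the coordinator is the one that decides, treating silence as "no" is always safe: no participant can have committed before the coordinator announces the outcome.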

Two-Phase Commit With Failures

• Participant detects coordinator failure
  • The participant must wait for the coordinator
  • The transaction is stalled!


Two-Phase Commit and the CAP Theorem!

• During a partition
  • Does 2PC pick Availability or Consistency?

[Figure: a network partition between the coordinator and the participants during the “Ready to commit?” round.]

CAP Theorem

• C: Consistency (linearizable)
• A: Availability
• P: Partition tolerance

• Given a “Partition”, you must pick between “Availability” and “Consistency”
  • Pick Consistency: some clients (not all) can change data, and the changes are consistent
  • Pick Availability: all clients can change data, but inconsistently


Two-Phase Commit With Failures

• Participant fails → the coordinator will be waiting
  • Coordinator assumes ABORT; the transaction ends
• Coordinator fails → the participants will be waiting
  • A participant that voted NO can ABORT; eventually the transaction ends
  • However, a participant that voted YES cannot ABORT
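The asymmetry between a NO voter and a YES voter can be written as a small decision function over the participant states from the state diagram. The function name and the returned strings are illustrative; the `init` case (no vote sent yet) is not on the slide and is included as the standard extension of the same reasoning.

```python
def on_coordinator_timeout(participant_state):
    """What a participant may safely do when the coordinator goes silent."""
    if participant_state == "init":
        # No vote was sent, so the coordinator cannot have decided to commit.
        return "abort"
    if participant_state == "uncertain":
        # Voted yes: the coordinator may already have told others to commit,
        # so aborting alone could break atomicity. The only safe move is to wait.
        return "wait"
    # "abort" (voted no) or "commit": the outcome is already known locally.
    return participant_state


voted_no = on_coordinator_timeout("abort")
voted_yes = on_coordinator_timeout("uncertain")
```

The `wait` branch is what makes 2PC a blocking protocol: a YES voter holds its resources until the coordinator comes back.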


Two-Phase Commit: Adding Isolation with Locks

https://github.com/facebook/rocksdb/wiki/Transactions

https://apacheignite.readme.io/docs/concurrency-modes-and-isolation-levels

Pessimistic Versus Optimistic Locking

• Trade-off: concurrency versus isolation

Pessimistic:
• Get all locks before the transaction
• Release all locks after the transaction
  • Release locks after commit/abort
• Prevents anyone else from using the data during the transaction
  • Locks prevent reads/writes of the data
  • Locks stop other transactions

Optimistic:
• No locks
• Get a copy of the data before the transaction
• After the transaction, check to make sure the data has not changed
• If the data changed, then ABORT!
  • A data change means someone else changed the data
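The two strategies can be contrasted on a single value. This is a sketch under assumptions: the version counter is one common way to detect "the data has changed", the `interfere` hook exists only so the example can simulate a concurrent writer, and the optimistic path still uses the lock as a brief latch around its snapshot and its validate-and-write step, not for the duration of the transaction.

```python
import threading


class VersionedValue:
    """A value with a version counter that is bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


def pessimistic_update(cell, change):
    # Hold the lock for the whole transaction; other transactions wait here.
    with cell.lock:
        cell.value = change(cell.value)
        cell.version += 1
    return True


def optimistic_update(cell, change, interfere=None):
    # Take a private copy, then do the work without holding the lock.
    with cell.lock:
        seen_version, copy = cell.version, cell.value
    new_value = change(copy)
    if interfere is not None:
        interfere()  # hypothetical hook: simulates another transaction committing
    # Validate at commit time: abort if someone else changed the data meanwhile.
    with cell.lock:
        if cell.version != seen_version:
            return False  # ABORT
        cell.value = new_value
        cell.version += 1
        return True


cell = VersionedValue(100)
first_ok = optimistic_update(cell, lambda v: v - 50)
second_ok = optimistic_update(
    cell,
    lambda v: v + 1,
    interfere=lambda: pessimistic_update(cell, lambda v: v * 2),
)
```

The second optimistic update aborts because the simulated concurrent writer bumped the version between the snapshot and the commit check; a caller would typically retry it.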

Pessimistic Versus Optimistic Locking

• Trade-off: concurrency versus isolation

Pessimistic
• Low level of concurrency
• Low performance
• Sequential ordering of transactions

Optimistic
• High level of concurrency
• High throughput, especially if all operations are reads
• Many transactions will abort if there are many writes

Two-Phase Commit: Practical Issues

Practical Performance Issues with 2PC

• Synchronization: 2PC overheads
  • Multiple “rounds” of communication
  • Three rounds of communication
    • 3N messages (N = number of participants)
  • During these rounds, resources are frozen
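As a quick check on the 3N figure, the count below assumes a failure-free run with one vote request, one vote, and one decision per participant, and leaves out the acks shown in the participant state diagram. The function name is made up for this illustration.

```python
def two_pc_message_count(num_participants):
    """Messages in a failure-free 2PC run: vote-req, vote, and decision per participant."""
    rounds = 3
    return rounds * num_participants


messages_for_two_shards = two_pc_message_count(2)    # 6
messages_for_ten_shards = two_pc_message_count(10)   # 30
```

The message count grows linearly with the number of participants, but the latency cost is the three sequential rounds, during which the involved data stays frozen.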


• Blocking: 2PC
  • During failure → 2PC can block
  • When 2PC blocks → other transactions are unable to progress


https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

Distributed Transactions No Longer Considered Dead!

• 2007: Avoid distributed transactions
  https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
• 2012: Google's Spanner: distributed transactions!
• 2019: Microsoft's Orleans brings distributed transactions to the cloud!
  https://thenewstack.io/microsoft-orleans-brings-distributed-transactions-to-cloud/

Distributed Transactions and ACID

How do you get ACID in distributed transactions?
• Two-Phase Commit → A + C
  • 2PC: atomic and consistent change from one state to another state
• Two-Phase Commit + Locks → A + C + I
  • Locks provide isolation by preventing concurrent transactions from accessing the data
• Two-Phase Commit + Locks + Logs → A + C + I + D
  • Logs ensure the transaction persists
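To illustrate how a log supplies the "D", the sketch below appends each committed transaction to a file and forces it to disk before applying it in memory, then rebuilds the state from the log on restart. The class name `LoggedStore`, the JSON-lines record format, and the file name are assumptions made for this example, and it does not handle a torn final record.

```python
import json
import os
import tempfile


class LoggedStore:
    """Key-value store that logs each committed transaction before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Recovery: rebuild the in-memory state from the log after a restart.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    self.data.update(json.loads(line)["writes"])

    def commit(self, tid, writes):
        # Log first and force it to disk; only then apply the writes in memory.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"tid": tid, "writes": writes}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data.update(writes)


log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
store = LoggedStore(log_path)
store.commit(1, {"Rodrigo": 50, "Theo": 150})
recovered = LoggedStore(log_path)  # simulates a restart: state comes back from the log
```

Writing the log before applying the change means a crash between the two steps loses nothing, because replay applies the logged writes again.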

Consistency Models Revisited

Consistency Spectrum

STRONG CONSISTENCY (slower but easy to program)
• Strict Serializability
• Linearizable
• Sequential
• Causal+
• Eventual
WEAK CONSISTENCY (fast but harder to program)

Strict Serializability

• Total order + FIFO + “Time” → for a transaction
  • After a transaction commits, all future reads will see the committed data
• Requires 2PC + pessimistic locks
  • Low performance: reads/writes have high latency and low throughput
• Strict Serializability vs. Linearizability
  • Strict Serializability = Linearizability for transactions
  • Linearizability = total order + FIFO + real time for individual operations
  • Strict Serializability = total order + FIFO + real time for transactions (groups of operations)


