Distributed Systems - Brown University

  • Distributed Systems Day 13: Distributed Transactions

    “To Be or Not to Be… Distributed Transactions”

  • Summary

    • Background on Transactions
      • ACID Semantics

    • Distributed Transactions
      • Terminology: Transaction manager, Coordinator, Participant
      • Two Phase Commit

    • Adding Isolation with Locks: optimistic vs. pessimistic
      • Performance Issues

    • Consistency Models
      • Serializability Versus Linearizability

  • https://cloud.google.com/spanner/docs/transactions

  • https://blog.couchbase.com/optimistic-or-pessimistic-locking-which-one-should-you-pick/

  • All Facebook Data

    [Figure: a hash table of key/value pairs (k0 v0 … k5 v5) is split into two shards. Shard 1 is replicated on Node A, Node B, and Node C; Shard 2 is replicated on Node D, Node E, and Node F. Clients reach the nodes through front ends (FE).]

    • Partition data into shards; map shards to servers with consistent hashing
    • Maintain multiple copies for fault tolerance and to reduce latency
    • Clients send requests to all replicas

    • Replication
      • Lazy, Passive, Active
      • Consistency for a single shard

    • Distributed Transaction
      • Consistent/atomic change to data in multiple shards
      • Multiple shards → can use traditional replication techniques
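
The key-to-shard mapping can be sketched with a small consistent-hash ring. This is a minimal illustration, not the system's actual scheme: the shard names, the `REPLICAS` placement table, the SHA-256 hash, and the number of virtual points per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() for strings changes between runs.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps keys to shards by walking clockwise on a hash ring."""

    def __init__(self, shards, vnodes=64):
        # Each shard gets several virtual points to even out the load.
        self._points = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


# Hypothetical replica placement, mirroring the figure.
REPLICAS = {
    "shard1": ["NodeA", "NodeB", "NodeC"],
    "shard2": ["NodeD", "NodeE", "NodeF"],
}

ring = ConsistentHashRing(REPLICAS)
for key in ["k0", "k1", "k2"]:
    shard = ring.shard_for(key)
    print(key, "->", shard, REPLICAS[shard])
```

The property that matters for resharding is that adding a shard only moves keys onto the new shard; keys never shuffle between the existing ones.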

  • What is a Transaction?

    • A set of operations that need to be performed together.
      • Example 1: transferring money between accounts
      • Example 2: shopping cart checkout

    Read(R)
    Update(R, $50)
    Read(T)
    Update(T, $150)

    Initially Theo and Rodrigo each have $100. Goal: transfer $50 from Rodrigo to Theo.
    Ideal: either all 4 operations happen or none happen.
    Worst case: only a subset occur.

    [Figure: Shard 1 on Node A, Node B, Node C; Shard 2 on Node D, Node E, Node F. Rodrigo is in shard 1; Theo is in shard 2.]
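
To make the worst case concrete, the sketch below applies only a prefix of the two Update steps and checks the total amount of money. The helper name `apply_updates` and the idea of checking the sum as an invariant are assumptions added for illustration.

```python
def apply_updates(updates_applied):
    """Apply the first `updates_applied` Update steps of the transfer."""
    balances = {"Rodrigo": 100, "Theo": 100}
    updates = [("Rodrigo", 50), ("Theo", 150)]  # Update(R, $50), Update(T, $150)
    for account, new_value in updates[:updates_applied]:
        balances[account] = new_value
    return balances


for n in range(3):
    balances = apply_updates(n)
    print(n, "updates applied:", balances, "total =", sum(balances.values()))
```

With zero or two updates the total stays at $200; with only the first update applied, $50 has vanished, which is the partial outcome a transaction must rule out.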

  • Transaction Background

    • ACID Semantics
      • Atomicity: all-or-nothing semantics; either all operations succeed or all fail.
      • Consistency: transitions from one consistent state to another consistent state.
      • Isolation: intermediate states are not exposed to the outside world (no partial writes are exposed).
      • Durability: results of a "committed" transaction persist after the transaction (and through failures).

    • Transactions are easy for traditional databases
      • Traditional databases are on a single server → failure is "all-or-nothing"
      • All components of the transaction fail together

    • Distributed transactions → different components can fail
      • Need to provide transaction semantics when only a subset of the components fail

  • Distributed Transactions Semantics

    • Transaction Manager (TM)
      • Server in charge of orchestrating the transaction

    • Steps for a transaction
      • Client initiates a transaction
        • TM gives client a Transaction ID (TID)
      • Client submits operations to TM
        • TM relays operations to replicas
      • Client commits transaction
        • TM performs two phase commit

    Client-side code:
        tid = openTransaction();
        RVal = a.get(tid, Rodrigo);
        a.update(tid, Rodrigo, RVal - 50);
        Tval = b.get(tid, Theo);
        b.update(tid, Theo, Tval + 50);
        closeTransaction(tid) or abortTransaction(tid)

    [Figure: the front end (FE) talks to the Transaction Manager, which talks to Shard 1 (Node A, Node B, Node C) and Shard 2 (Node D, Node E, Node F).]

    Transaction Manager
      • TM hands out TIDs
      • TM manages and relays operations to Replica Leaders
      • TM keeps track of all Replicas involved in the transactions

    Replica Manager
      • Prepare operations
      • Store operations locally in log
      • But do not commit operations
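
The division of labour between the Transaction Manager and the Replica Managers can be sketched in memory as follows. This is an illustration under assumptions, not the lecture's implementation: the class and method names are hypothetical, each shard is a single object rather than a replicated group, and `close_transaction` simply applies the staged writes where a real TM would run two-phase commit.

```python
import itertools


class ReplicaManager:
    """Holds one shard; stages operations per TID without committing them."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = {}  # tid -> list of (key, value) staged updates

    def get(self, tid, key):
        # Read this transaction's own staged writes first, then committed data.
        for k, v in reversed(self.log.get(tid, [])):
            if k == key:
                return v
        return self.data[key]

    def update(self, tid, key, value):
        self.log.setdefault(tid, []).append((key, value))

    def commit(self, tid):
        for key, value in self.log.pop(tid, []):
            self.data[key] = value

    def abort(self, tid):
        self.log.pop(tid, None)


class TransactionManager:
    def __init__(self, replicas):
        self._next_tid = itertools.count(1)
        self.replicas = replicas
        self.involved = {}  # tid -> names of the shards touched

    def open_transaction(self):
        tid = next(self._next_tid)
        self.involved[tid] = set()
        return tid

    def get(self, tid, shard, key):
        self.involved[tid].add(shard)
        return self.replicas[shard].get(tid, key)

    def update(self, tid, shard, key, value):
        self.involved[tid].add(shard)
        self.replicas[shard].update(tid, key, value)

    def close_transaction(self, tid):
        # Assumption: a real TM would run two-phase commit here.
        for shard in self.involved.pop(tid):
            self.replicas[shard].commit(tid)

    def abort_transaction(self, tid):
        for shard in self.involved.pop(tid):
            self.replicas[shard].abort(tid)


tm = TransactionManager({"a": ReplicaManager({"Rodrigo": 100}),
                         "b": ReplicaManager({"Theo": 100})})
tid = tm.open_transaction()
r_val = tm.get(tid, "a", "Rodrigo")
tm.update(tid, "a", "Rodrigo", r_val - 50)
t_val = tm.get(tid, "b", "Theo")
tm.update(tid, "b", "Theo", t_val + 50)
tm.close_transaction(tid)
print(tm.replicas["a"].data, tm.replicas["b"].data)
```

Because updates sit in the per-TID log until commit, aborting is just a matter of discarding that log.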

  • Two Phase Commit

  • Two Phase Commit

    • Provides Atomicity and Consistency
      • NOT Isolation and Durability

    • Assumptions: each server maintains a transaction log
      • Transaction log is stored in persistent memory
      • If failure → items in the transaction log survive

    • Terminology changes:
      • Coordinator

  • Two Phase Commit

    • Phase 1:
      • Coordinator sends request for votes
      • Participants vote

    • Phase 2:
      • Coordinator counts votes
      • Coordinator informs participants of the transaction status

    • At the end of Phase 2: either all participants commit or all abort

    [Figure: message flow between the Coordinator and two Participants: "Ready to Commit?" (coordinator to participants), "Vote [Yes/No]" (participants to coordinator), "Count Votes!" (at the coordinator), "Abort/Commit?" (coordinator to participants).]
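
The two phases can be sketched as a failure-free, single-process simulation. The `Participant` class, its `will_vote_yes` flag, and the in-memory `log` list (standing in for the persistent transaction log) are assumptions made for illustration; real participants are separate servers reached over the network.

```python
class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.log = []          # stands in for the persistent transaction log
        self.state = "init"

    def prepare(self, tid):
        # Phase 1: record the vote before replying to the coordinator.
        vote = "yes" if self.will_vote_yes else "no"
        self.log.append((tid, "vote", vote))
        self.state = "uncertain" if vote == "yes" else "abort"
        return vote

    def finish(self, tid, decision):
        # Phase 2: apply the coordinator's decision.
        self.log.append((tid, "decision", decision))
        self.state = decision


def two_phase_commit(tid, participants):
    # Phase 1: request votes ("Ready to Commit?").
    votes = [p.prepare(tid) for p in participants]
    # Count votes: commit only if every participant voted yes.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # Phase 2: inform every participant of the outcome.
    for p in participants:
        p.finish(tid, decision)
    return decision


happy = [Participant("P1"), Participant("P2")]
print(two_phase_commit(1, happy), [p.state for p in happy])

mixed = [Participant("P1"), Participant("P2", will_vote_yes=False)]
print(two_phase_commit(2, mixed), [p.state for p in mixed])
```

A single "no" vote is enough to abort the transaction everywhere, which is what gives the all-or-nothing outcome at the end of Phase 2.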

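  • Coordinator State Diagram for Two-Phase Commit

    States: Init, Wait, Commit, Abort

    Transitions (event / action):
      • Init → Wait: app commit / vote-req
      • Wait → Commit: all commit / commit
      • Wait → Abort: any abort / abort

    [Figure: the Coordinator sends "Ready?" to each participant; each participant responds with its vote: Commit or Abort.]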

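  • Participant State Diagram for Two-Phase Commit

    States: Init, Uncertain, Commit, Abort

    Transitions (event / action):
      • Init → Uncertain: vote-req / commit
      • Init → Abort: vote-req / abort
      • Uncertain → Commit: commit / ack
      • Uncertain → Abort: abort / ack

    [Figure: the Coordinator tells each participant to make a change (Commit or Abort); each participant sends a response.]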

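  • State Diagrams for Two Phase Commit

    Coordinator (event / action):
      • Init → Wait: app commit / vote-req
      • Wait → Commit: all commit / commit
      • Wait → Abort: any abort / abort

    Participant (event / action):
      • Init → Uncertain: vote-req / commit
      • Init → Abort: vote-req / abort
      • Uncertain → Commit: commit / ack
      • Uncertain → Abort: abort / ack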
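
The two state diagrams can be written down as transition tables. This is a sketch with assumed event names: in the participant diagram both Init transitions are triggered by the same vote-req message, so the participant's local choice is folded into the event name (`vote_req_yes` / `vote_req_no`) to keep the table deterministic.

```python
# (state, event) -> (next state, action sent)
COORDINATOR = {
    ("init", "app_commit"): ("wait", "send vote-req"),
    ("wait", "all_commit"): ("commit", "send commit"),
    ("wait", "any_abort"): ("abort", "send abort"),
}

PARTICIPANT = {
    ("init", "vote_req_yes"): ("uncertain", "vote commit"),
    ("init", "vote_req_no"): ("abort", "vote abort"),
    ("uncertain", "commit"): ("commit", "ack"),
    ("uncertain", "abort"): ("abort", "ack"),
}


def step(table, state, event):
    """Return (next_state, action), rejecting transitions the diagram lacks."""
    try:
        return table[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(table, events, state="init"):
    for event in events:
        state, _action = step(table, state, event)
    return state


print(run(COORDINATOR, ["app_commit", "all_commit"]))
print(run(PARTICIPANT, ["vote_req_yes", "abort"]))
```

Note that Commit and Abort have no outgoing transitions in either table: once a decision is reached it is final.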

  • Two Phase Commit With Failures

    • What is the impact of failures on 2PC?
      • 2PC is synchronous
      • Failure == node failure or network failure
      • Failure → the protocol blocks/stalls

    [Figure: 2PC message flow between the Coordinator and two Participants.]

    • Coordinator fails: participants will be uncertain (waiting)
    • Participant fails: coordinator will be waiting

  • Crash Points

    [Figure: the coordinator state diagram (Init → Wait → Commit or Abort) next to the participant state diagram (Init → Uncertain → Commit or Abort), with the same transitions as on the state-diagram slide.]

  • Two Phase Commit With Failures

    • What is the impact of failures on 2PC?
      • 2PC is synchronous
      • Failure == node failure or network failure
      • Failure → the protocol blocks/stalls

    • Detect failure using timeouts
      • Is this model synchronous or asynchronous?

    [Figure: 2PC message flow between the Coordinator and two Participants.]

    • Coordinator fails: participants will be uncertain (waiting)
    • Participant fails: coordinator will be waiting

  • Two Phase Commit With Failures

    • What is the impact of failures on 2PC?
      • 2PC is synchronous
      • Failure == node failure or network failure
      • Failure → the protocol blocks/stalls

    • Detect failure using timeouts
      • Coordinator detects participant failure and assumes ABORT → transaction terminates

    • Participant detects coordinator failure
      • Why can't the participant automatically ABORT?

    [Figure: 2PC message flow between the Coordinator and two Participants.]

    • Coordinator fails: participants will be uncertain (waiting)
    • Participant fails: coordinator will be waiting

  • Participant Recovery from Coordinator Failure

    [Figure: participant state diagram: Init → Uncertain (vote-req / commit), Init → Abort (vote-req / abort), Uncertain → Commit (commit / ack), Uncertain → Abort (abort / ack).]

    • Participant in the Uncertain state
      • Waiting for the coordinator to say "commit" or "abort"

    • It detects the failure of the coordinator
      • Using a timeout

    • Participant in the Uncertain state
      • Can't assume either outcome
      • BAD things happen if the participant makes wrong assumptions
      • Waits for the coordinator to restart
      • On restart, contacts the coordinator for the final outcome

  • Coordinator Recovery from Participant Failure

    • Coordinator in the Wait state
      • Waiting for the participant to say "commit" or "abort"

    • It detects the failure of the participant
      • Using a timeout

    • Coordinator assumes the participant said "no"
      • Takes no response as an abort
      • Aborts the transaction!

    • If a participant fails, the coordinator can make progress

    [Figure: coordinator state diagram: Init → Wait (app commit / vote-req), Wait → Commit (all commit / commit), Wait → Abort (any abort / abort).]
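
The coordinator's timeout rule can be sketched with a message queue standing in for the network. The function name `collect_votes`, the `(sender, vote)` message shape, and the per-message timeout (rather than one overall deadline) are assumptions made to keep the example short.

```python
import queue


def collect_votes(inbox, expected, timeout_s):
    """Wait for a vote from every expected participant; silence means abort."""
    votes = {}
    while len(votes) < len(expected):
        try:
            sender, vote = inbox.get(timeout=timeout_s)
        except queue.Empty:
            # No response in time: treat the missing vote as "no".
            return "abort"
        if sender in expected:
            votes[sender] = vote
    return "commit" if all(v == "yes" for v in votes.values()) else "abort"


inbox = queue.Queue()
inbox.put(("P1", "yes"))
inbox.put(("P2", "yes"))
print(collect_votes(inbox, {"P1", "P2"}, timeout_s=0.05))

inbox = queue.Queue()
inbox.put(("P1", "yes"))  # P2 never answers
print(collect_votes(inbox, {"P1", "P2"}, timeout_s=0.05))
```

Aborting on a timeout is always safe for the coordinator in the Wait state, because no participant can have committed before the coordinator announces a decision.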

  • Two Phase Commit With Failures

    • What is the impact of failures on 2PC?
      • 2PC is synchronous
      • Failure == node failure or network failure
      • Failure → the protocol blocks/stalls

    • Detect failure using timeouts
      • Coordinator detects participant failure and assumes ABORT → transaction terminates

    • Participant detects coordinator failure
      • The participant must wait for the coordinator
      • The transaction is stalled!

    [Figure: 2PC message flow between the Coordinator and two Participants.]

    • Coordinator fails: participants will be uncertain (waiting)
    • Participant fails: coordinator will be waiting

  • Two Phase Commit and CAP Theorem!

    • During a partition
      • Does 2PC pick Availability or Consistency?

    [Figure: a network partition separates the Coordinator, which has sent "Ready to Commit?", from its two Participants.]

  • CAP Theorem

    • C: Consistency (Linearizable)
    • A: Availability
    • P: Partition tolerance

    • Given a "Partition", you must pick between "Availability" and "Consistency"
      • Pick Consistency: some clients (not all) can change data, and do so consistently
      • Pick Availability: all clients can change data, but inconsistently

  • Two Phase Commit With Failures

    • What is the impact of failures on 2PC?
      • 2PC is synchronous
      • Failure == node failure or network failure
      • Failure → the protocol blocks/stalls

    • Detect failure using timeouts
      • Coordinator detects participant failure and assumes ABORT

    • Participant detects coordinator failure
      • Why can't the participant automatically ABORT?

    [Figure: 2PC message flow between the Coordinator and two Participants.]

    • Participant fails: coordinator will be waiting
      • Coordinator assumes ABORT; the transaction ends

    • Coordinator fails: participants will be waiting
      • Participant aborts; eventually the transaction ends
      • A participant that voted NO can abort. However, one that voted YES cannot ABORT
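
The participant's side of the timeout rule can be sketched as a small decision function. The function name and the `my_vote` encoding (`None` for "has not voted yet") are assumptions; treating a participant that has not voted like one that voted NO follows the standard reading of 2PC, where it can still refuse the transaction.

```python
def on_coordinator_timeout(my_vote, decision_received=None):
    """What a participant may do when it stops hearing from the coordinator."""
    if decision_received is not None:
        # The outcome is already known; nothing to decide.
        return decision_received
    if my_vote == "yes":
        # Uncertain: the coordinator may already have told others to commit,
        # so aborting alone could break atomicity. Block until it recovers.
        return "wait"
    # Voted NO (or never voted): the transaction cannot commit without
    # this participant's yes, so aborting unilaterally is safe.
    return "abort"


for vote in (None, "no", "yes"):
    print(vote, "->", on_coordinator_timeout(vote))
```

The asymmetry is the whole point: a YES vote is a promise to commit if asked, and only the coordinator knows whether everyone else made the same promise.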

  • Two Phase Commit: Adding Isolation with Locks

  • https://github.com/facebook/rocksdb/wiki/Transactions

  • https://apacheignite.readme.io/docs/concurrency-modes-and-isolation-levels

  • Pessimistic Versus Optimistic Locking

    • Trade-off: concurrency versus isolation

    Pessimistic:
      • Get all locks before the transaction
      • Release all locks after the transaction
        • Release locks after commit/abort
      • Prevents anyone else from using the data during the transaction
        • Locks prevent read/write of data
        • Locks stop other transactions

    Optimistic:
      • No locks
      • Get a copy of the data before the transaction
      • After the transaction, check to make sure the data has not changed
      • If the data changed, then ABORT!
        • A change in the data means someone else changed it
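
The two approaches can be contrasted in a short single-process sketch. Everything here is an assumption for illustration: `LockedStore` uses one `threading.Lock` per key for the pessimistic case, and `VersionedStore` uses a per-key version counter to detect changes in the optimistic case. The optimistic check-and-write is not atomic in this sketch; a real system would have to make it so.

```python
import threading


class LockedStore:
    """Pessimistic: take every lock up front, release after the transaction."""

    def __init__(self, data):
        self.data = dict(data)
        self._locks = {key: threading.Lock() for key in data}

    def transact(self, keys, body):
        ordered = sorted(keys)  # fixed acquisition order avoids deadlock
        for key in ordered:
            self._locks[key].acquire()
        try:
            body(self.data)
        finally:
            for key in reversed(ordered):
                self._locks[key].release()


class VersionedStore:
    """Optimistic: no locks held; validate at commit time, abort on change."""

    def __init__(self, data):
        self._data = {key: (value, 0) for key, value in data.items()}

    def read(self, key):
        return self._data[key]  # (value, version)

    def write_if_unchanged(self, key, new_value, version_seen):
        # Assumption: single-threaded sketch, so this check-and-write is safe.
        _, current_version = self._data[key]
        if current_version != version_seen:
            return False  # someone else changed the data: ABORT
        self._data[key] = (new_value, current_version + 1)
        return True


def transfer(balances):
    balances["Rodrigo"] -= 50
    balances["Theo"] += 50


locked = LockedStore({"Rodrigo": 100, "Theo": 100})
locked.transact(["Rodrigo", "Theo"], transfer)

store = VersionedStore({"Rodrigo": 100})
value, version = store.read("Rodrigo")
first = store.write_if_unchanged("Rodrigo", value - 50, version)   # fresh version
second = store.write_if_unchanged("Rodrigo", value - 10, version)  # stale version
print(locked.data, first, second)
```

The second optimistic write fails because the first one bumped the version, which is the "data changed, so abort" rule in miniature.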

  • Pessimistic Versus Optimistic Locking

    • Trade-off: concurrency versus isolation

    Pessimistic
      • Low level of concurrency
      • Low performance
      • Sequential ordering of transactions

    Optimistic
      • High level of concurrency
      • High throughput, especially if all operations are reads
      • Many transactions will abort if there are many writes

  • Two Phase Commit: Practical Issues

  • Practical Performance Issues with 2PC

    • Synchronization: 2PC overheads
      • Multiple "rounds" of communication
      • Three rounds of communication
        • 3(N) messages
      • During these rounds, resources are frozen

    [Figure: the three rounds between the Coordinator and two Participants: "Ready to Commit?", "Vote [Yes/No]", "Abort/Commit?".]
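
The 3(N) figure follows from one message per participant in each of the three rounds. A tiny sketch, assuming N is the number of participants and that acknowledgements of the final decision are not counted:

```python
ROUNDS = ["ready to commit?", "vote", "abort/commit"]


def two_pc_messages(n_participants):
    """Messages for one transaction: one per participant per round."""
    return len(ROUNDS) * n_participants


for n in (2, 10, 100):
    print(n, "participants ->", two_pc_messages(n), "messages")
```

The message count grows linearly, but every round is a full network round trip during which the resources involved stay frozen.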

  • Practical Performance Issues with 2PC

    • Blocking: 2PC
      • During failure → 2PC can block
      • When 2PC blocks → other transactions are unable to progress

    https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

  • Distributed Transactions No Longer Considered Dead!

    2007: Avoid distributed transactions
    https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

    2012: Google's Spanner: distributed transactions!

    2019: Microsoft's Orleans brings distributed transactions to the cloud!
    https://thenewstack.io/microsoft-orleans-brings-distributed-transactions-to-cloud/

  • Distributed Transactions and ACID

  • How do you get ACID in Distributed Transactions?

    • Two Phase Commit → A + C
      • 2PC: atomic and consistent change from one state to another state

    • Two Phase Commit + Locks → A + C + I
      • Locks provide isolation by preventing concurrent transactions from accessing data

    • Two Phase Commit + Locks + Logs → A + C + I + D
      • Logs ensure the transaction persists
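
How a log provides durability can be sketched with an append-only file that is flushed to disk before the write is acknowledged. The `TransactionLog` class, the JSON-lines record format, and the `tid`/`status` field names are assumptions made for this example.

```python
import json
import os
import tempfile


class TransactionLog:
    """Append-only log; each record is forced to disk before returning."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # survive a crash right after this call

    def recover(self):
        """Replay the log; the last record for each TID wins."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                state[record["tid"]] = record["status"]
        return state


with tempfile.TemporaryDirectory() as tmp:
    log = TransactionLog(os.path.join(tmp, "tx.log"))
    log.append({"tid": 7, "status": "prepared"})
    log.append({"tid": 7, "status": "committed"})
    log.append({"tid": 8, "status": "prepared"})
    # Simulate a restart: a fresh object reads the same file.
    recovered = TransactionLog(log.path).recover()

print(recovered)
```

After the simulated restart, transaction 8 is still only prepared: exactly the case where a recovering participant must ask the coordinator for the final outcome.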

  • Consistency Models Revisited

  • Consistency Spectrum

    STRONG CONSISTENCY: slower but easy to program

      • Strict Serializability
      • Linearizable
      • Sequential
      • Causal+
      • Eventual

    WEAK CONSISTENCY: fast but harder to program

  • Strict Serializability

    • Total order + FIFO + "Time" → for a transaction
      • After a transaction commits, all future reads will see the committed data

    • Requires 2PC + pessimistic locks
      • Low performance: reads/writes have high latency and low throughput

    • Strict Serializability vs. Linearizability
      • Strict Serializability = Linearizability for transactions
      • Linearizability = Total order + FIFO + Real time for individual operations
      • Strict Serializability = Total order + FIFO + Real time for transactions (groups of operations)

  • Summary

    • Background on Transactions
      • ACID Semantics

    • Distributed Transactions
      • Terminology: Transaction manager, Coordinator, Participant
      • Two Phase Commit

    • Adding Isolation with Locks: optimistic vs. pessimistic
      • Performance Issues

    • Consistency Models
      • Serializability Versus Linearizability