Reversible Phase Transitions in a Structured Overlay Network with Churn
Ruma R. Paul (1,2), Peter Van Roy (1), and Vladimir Vlassov (2)
May 18, 2016
(1) Université catholique de Louvain, Belgium, {ruma.paul, peter.vanroy}@uclouvain.be
(2) KTH Royal Institute of Technology, Sweden, {rrpaul, vladv}@kth.se
International Conference on Networked Systems, NETYS 2016, Marrakech, Morocco
Introduction
• Applications are exposed to increasingly stressful environments
– Out of data centers, to georeplication and edge computing
– Node and communication failures are increasing as nodes increase in number
• We would like applications to survive such stressful environments and to have predictable behavior
– We introduce the concept of Reversibility to define what survival means in arbitrarily stressful environments
– We introduce the concept of Phase to allow applications to observe the hostility of their environment and behave accordingly
• We evaluate these concepts on a large realistic system
– A structured overlay network with simulated environment and high churn
– We investigate how to make it Reversible and how to build applications on top
NETYS 2016 2
Reversibility
Why we need Reversibility
• Suppose a distributed system running on n nodes providing a specific set of services
• From time t to t+T, the system experiences external stress
– Ex. k nodes crash and j nodes join the system
– Ex. a system partition due to a connectivity problem of the underlying physical network
• Can we ensure that the system will eventually regain its full functionality after time t+T?
• Does the system have a well-defined behavior during the interval [t, t+T]?
Reversibility (informal)
• With Reversibility we can give affirmative answers to both questions!
• Informally, Reversibility means that the system's functionality depends only on the current stress experienced by the system and not on the history of the stress
Reversibility (formal)
• Given a function S(t) that returns the system stress in some arbitrary but well-defined units
– Ex. S(t) can explain how the system is partitioned as a function of time, or give churn as a function of time
• A system is Reversible if there exists a function Fop(id, S(t)) of node identifier id and stress S(t) such that the set of system operations available at node id is Fop(id, S(t))
– An operation is available for a given stress if the operation will eventually succeed (it will fail only a finite number of times if tried repeatedly and then succeed)
– Note that when S(t) = 0 the system provides full functionality
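The formal definition can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: stress is reduced to a single number in [0, 1], the operation names and thresholds are invented, and `StickyNode` is a made-up counter-example. The point is only that a Reversible node's available operations are a function of the current stress, so two stress histories that end at the same stress must yield the same operation set.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical operation names; the slides do not list concrete operations here.
ALL_OPS: FrozenSet[str] = frozenset({"lookup", "put", "transaction"})


def f_op(node_id: int, stress: float) -> FrozenSet[str]:
    """Toy Fop(id, S(t)); node_id is ignored and the thresholds are invented."""
    if stress == 0.0:
        return ALL_OPS  # S(t) = 0 gives full functionality
    if stress < 0.5:
        return frozenset({"lookup", "put"})
    if stress < 1.0:
        return frozenset({"lookup"})
    return frozenset()


class ReversibleNode:
    """Available operations depend only on the current stress."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.stress = 0.0

    def apply_stress(self, stress: float) -> None:
        self.stress = stress

    def available_ops(self) -> FrozenSet[str]:
        return f_op(self.node_id, self.stress)


class StickyNode(ReversibleNode):
    """Counter-example: after stress >= 0.5 it never offers 'transaction' again."""

    def __init__(self, node_id: int) -> None:
        super().__init__(node_id)
        self.damaged = False

    def apply_stress(self, stress: float) -> None:
        super().apply_stress(stress)
        if stress >= 0.5:
            self.damaged = True

    def available_ops(self) -> FrozenSet[str]:
        ops = super().available_ops()
        return ops - {"transaction"} if self.damaged else ops


def history_independent(make_node: Callable[[], ReversibleNode],
                        histories: Sequence[Sequence[float]]) -> bool:
    """True if all histories ending at the same stress give the same operations."""
    seen: Dict[float, FrozenSet[str]] = {}
    for history in histories:
        node = make_node()
        for stress in history:
            node.apply_stress(stress)
        ops = node.available_ops()
        if seen.setdefault(history[-1], ops) != ops:
            return False
    return True
```

Replaying the histories `[0.0]`, `[0.9, 0.0]` and `[0.3, 1.0, 0.0]` against both node types separates the two cases: the first node type ends with full functionality every time, the second does not.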
Comparison with related concepts
• Reversibility versus Fault Tolerance
– A fault-tolerant system is resilient for a given fault model, but its behavior outside that model is undefined
– Reversibility is a stronger property because it guarantees that the system will recover functionality if the stress is removed
• Reversibility versus Self Stabilization
– A self-stabilizing system survives any temporary perturbation of its internal state; it returns to a valid state when there are no perturbations
– Reversibility is more useful in practice: it gives information about functionality even during nonzero stress
Evaluation
• Investigate Reversibility in the context of a realistic system
– Representative system: a structured overlay network
• Simulated environment running on Mozart-Oz 2.0 platform
– Simulated message delays follow Internet distribution
– Network size of 1024 peers
• Experiments
– First story: achieving Reversibility during high churn
– Second story: deducing system functionality by observing structure
– Third story: designing Reversible applications
Structured Overlay Network (SON)
• P2P Systems: Dual client/server role of each node of the system.
• Due to local cooperation of peers, an overall network routing view emerges, known as an overlay network, on top of the underlay network.
• Structured Overlay Network: A structure is induced through the pointers maintained by each peer of the system.
Overlay Network: A P2P System with nodes a, f, i, p and x forms the overlay network on top of the underlay network
Beernet
• Beernet [3] is a representative example of the design class as per the reference architecture proposed by Aberer et al.
• Why Beernet?
– Similar to Chord, but with a correct lock-free join operation.
– Join/leave in Chord requires coordination of three peers, which is not guaranteed due to non-transitive connectivity on the Internet.
  • Non-Transitive Connectivity: A can talk to B and B can talk to C does not imply that A can talk to C.
– Beernet does not assume transitive connectivity, so it is more resilient on the Internet. Join/leave is a three-step operation, and each step requires coordination among only two peers (guaranteed with point-to-point communication).
  • Consequence: Natural Branching structure. A stable core ring and transient branches.
Branches on a relaxed ring. Peers p and s consider u as successor, but u only considers s as predecessor. Peer q has not established a connection with its predecessor p yet.
[3] B. Mejías, “Beernet: A relaxed approach to the design of scalable systems with self-managing behaviour and transactional robust storage,” Ph.D. dissertation, UCL, Belgium, 2010.
Maintenance Strategies
• A Maintenance Strategy maintains correct structure of a SON
– We investigate the Maintenance Strategies needed for Reversibility
• Several strategies are proposed in the literature:
– Correction-on-Change/Use (used by DKS, Beernet);
– Periodic Stabilization (used by Chord);
– Gossip-based strategies, e.g., T-MAN (building overlay topology).
• These strategies are complementary
– Correction-on-change is much more efficient than gossip, whereas gossip is much more resilient
Maintenance Strategies (cont.)
• We cover a complete space of possible maintenance strategies:

| Maintenance Strategy | Local/Global | Reactive/Proactive | Fast/Slow | Safety | Bandwidth Consumption |
| Correction-on-Change (for self-healing) and Correction-on-Use (provides self-optimization and self-configuration) | Local | Reactive | Fast | Yes | Small |
| Periodic Stabilization: correction using periodic probing | Local | Proactive | Slow | Lookup inconsistencies and uncorrected false suspicions can be introduced | High |
| Overlay Merger with Passive List: trigger merger using falsely suspected nodes [2] | Global | Reactive | Adaptable | Yes | Adaptable |
| Gossip-based Maintenance, e.g., Overlay Merger [2] with Knowledge Base: proactive approach to trigger merger using the gathered knowledge at each node | Global | Proactive | Adaptable | Yes | Adaptable |

Slide annotation alongside the table: the rows range from Efficiency to Resiliency.
[2] T. M. Shafaat, “Partition tolerance and data consistency in structured overlay networks,” Ph.D. dissertation, KTH, Sweden, 2013.
Stories and Their Contributions
• First Story: “Can the system be made reversible against churn using the Maintenance Strategies?”
– We show experimentally the need for both efficient and resilient maintenance
• Second Story: “Can we deduce the system’s functionality by examining its structure at high churn?” YES! Phase concept.
– Insight on how to observe global structure;
– Insight on how the phase of each node is related to the functionality of the system;
– Experimental demonstration that reversible phase transitions happen in a reversible system as the stress varies
• Third Story: “Can we help applications to be reversible and predictable?” YES! Expose the Phase of each node through an API.
– Introduction of Phase API;
– Insight on how the application can use the phase concept to manage its behavior
First Story: Churn & Reversibility
Are the Maintenance Strategies Reversible? (1)
Correction-on-*
Churn: % of node turnover per second. Metric: % of nodes on core ring as a function of time
[Figure: Percentage of Nodes on Core Ring versus Time (in sec), with curves for Churn = 10%, 50% and 100%]
Correction-on-* is insufficient to achieve Reversibility due to lack of liveness!!
To achieve Reversibility, the percentage of nodes on the core ring should eventually approach 100%
Are the Maintenance Strategies Reversible? (2)
[Figure, two panels: (a) Correction-on-* and Periodic Stabilization; (b) Correction-on-* and Periodic Stabilization and Merger with passive list. Each panel plots Percentage of Nodes on Core Ring versus Time (in sec), with curves for Churn = 10%, 50% and 100%]
Still not Reversible. Why?
Are the Maintenance Strategies Reversible? (3)
• High churn makes the overlay unstable, which does not allow new peers to complete a join
– The churn rapidly invalidates the join reference of the new peer
• In order to make these isolated peers part of the overlay, we need to re-trigger join by providing a new valid join reference.
– A Knowledge Base is required to get knowledge about an alive peer of the overlay
• Proactive triggering of merger using the Knowledge Base to avoid partition of the system after isolated nodes complete their join procedures.
Correction-on-*, Periodic Stabilization, Merger with Knowledge Base.
[Figure: Percentage of Nodes on Core Ring versus Time (in sec), with curves for Churn = 10%, 50% and 100%]
A Perfect Ring with 100% nodes!!
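The idea of re-triggering a join with a fresh reference can be sketched as a simple retry loop. This is only a minimal illustration under stated assumptions: the function name `rejoin`, the `try_join` callback and the `max_rounds` parameter are hypothetical, the knowledge base is modelled as a plain list of peer names, and nothing here reflects how Beernet or the Overlay Merger actually implement it.

```python
from typing import Callable, Iterable, Optional


def rejoin(knowledge_base: Iterable[str],
           try_join: Callable[[str], bool],
           max_rounds: int = 3) -> Optional[str]:
    """Retry the join with each known peer until one attempt succeeds.

    knowledge_base: peers this isolated node has heard of (assumed already gathered).
    try_join: hypothetical callback that attempts a join through the given peer
              and reports whether it completed.
    Returns the reference that worked, or None if every attempt failed.
    """
    candidates = list(knowledge_base)
    for _ in range(max_rounds):
        for ref in candidates:
            if try_join(ref):
                return ref
    return None
```

A peer whose original join reference has crashed would keep calling `rejoin` with whatever its knowledge base currently holds, which is what restores liveness in this simplified picture.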
Summary of First Story
• Repeated join using Knowledge Base is required to achieve Reversibility against extremely high Churn.
• Proactive merger using Knowledge Base is required to avoid partitioning of the system.
Second Story: Phase and Phase Transitions
Phase, Phase Transition & Critical Point
• System = An aggregate entity composed of a large number of interacting parts
– Each part is a node of the SON
• A Phase is a subset of a system for which the qualitative properties (e.g., functional guarantees) are essentially the same
– Different parts can be in different phases, depending on the local environment observed by the part
• Why is this interesting?
– System functionality depends on these qualitative properties
• Use phase for observing system functionality, but it should work without extra computation and even when communication is broken
– Useful to applications running on top of SON in stressful environments
Phase, Phase Transition & Critical Point (cont.)
• A Phase Transition occurs when a significant fraction of a system’s parts changes phase
– This can happen if the local environment changes at many parts
• A Critical Point occurs when more than one phase exists simultaneously in significant fractions of a system
• Reversibility and Phase:
– Stress is a global condition that cannot be easily measured by individual nodes
– Phase Pi at each node i is a well-defined local property
– Phase configuration of system, Pc = (P1, P2, P3, ..., Pn).
– The set of available operations of the system, namely Fdet(id, Pc(t)).
– Important property: Fdet(id, Pc(t)) approximates Fop(id, S(t))
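The definitions of Phase Transition and Critical Point can be sketched over a phase configuration Pc represented as a list of per-node phase names. The slides do not quantify "significant fraction", so the `threshold` of 0.25 below is an arbitrary assumption, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def phase_fractions(pc: Sequence[str]) -> Dict[str, float]:
    """Fraction of nodes in each phase for a configuration Pc = (P1, ..., Pn)."""
    counts = Counter(pc)
    return {phase: n / len(pc) for phase, n in counts.items()}


def changed_fraction(before: Sequence[str], after: Sequence[str]) -> float:
    """Fraction of nodes whose phase differs between two configurations."""
    return sum(a != b for a, b in zip(before, after)) / len(before)


def is_phase_transition(before: Sequence[str], after: Sequence[str],
                        threshold: float = 0.25) -> bool:
    """A significant fraction of nodes changed phase (threshold is assumed)."""
    return changed_fraction(before, after) >= threshold


def is_critical_point(pc: Sequence[str], threshold: float = 0.25) -> bool:
    """More than one phase is present in a significant fraction of nodes."""
    significant = [p for p, f in phase_fractions(pc).items() if f >= threshold]
    return len(significant) > 1
```

For instance, going from eight solid nodes to four solid and four liquid nodes changes half of the configuration, which this sketch reports as both a phase transition and a critical point.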
Can we observe the global structure? YES! Phase concept!!
• In the case of Beernet, we can identify a qualitative property depending on neighbor behavior
• Phases of a node are analogous to solid, liquid and gaseous phases in a physical system (e.g., water)
– Solid: neighbors do not change (core ring).
– Liquid: neighbors changing (branches).
– Gaseous: no neighbors (isolated nodes).
• Three liquid sub-phases in terms of available functionalities and probability of facing an immediate phase transition.
– liquid-1: if peer is on a branch with depth <= 2 and holds a stable finger table;
– liquid-2: if peer is on a branch with depth > 2, but not tail of a branch. The finger table holds > 50% valid fingers;
– liquid-3: if peer is on a branch with depth > 2, and it is tail of a branch. Most fingers are invalid or crashed.
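The phase rules above can be written as a small classifier over a node's local view. This is a sketch under assumptions: the `LocalView` fields are hypothetical names for information a peer would already hold, and the classifier keys only on ring membership, branch depth and tail position. The finger-table conditions from the slide are treated as accompanying properties and are not checked, since the slides do not say how to classify a peer whose branch position and finger-table state disagree.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    SOLID = "solid"
    LIQUID_1 = "liquid-1"
    LIQUID_2 = "liquid-2"
    LIQUID_3 = "liquid-3"
    GASEOUS = "gaseous"


@dataclass
class LocalView:
    """Hypothetical summary of what a peer knows locally."""
    has_neighbors: bool
    on_core_ring: bool
    branch_depth: int = 0        # only meaningful when on a branch
    is_branch_tail: bool = False


def classify(view: LocalView) -> Phase:
    """Map a local view to a phase, following the slide's branch-position rules."""
    if not view.has_neighbors:
        return Phase.GASEOUS          # isolated node
    if view.on_core_ring:
        return Phase.SOLID            # stable neighbors on the core ring
    if view.branch_depth <= 2:
        return Phase.LIQUID_1
    if view.is_branch_tail:
        return Phase.LIQUID_3
    return Phase.LIQUID_2
```

Because the inputs are purely local, the classification needs no extra messages, which matches the requirement that phase observation should work even when communication is broken.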
[Figure: Phase Transitions in SON, two panels over Time (in sec): "Under increasing churn during 5 minutes" (0 to 300 sec) and "After withdrawing churn" (320 to 420 sec). Red, green and blue areas correspond to % of nodes on ring (solid), branches (liquid) and isolation (gaseous) respectively.]
What are Phase Transitions good for?
✓ Give useful information to the application.
✓ Can be used for efficient self-management.
[Figure: Increasing churn with time up to 100%, then decreasing churn with time; x-axis: Time (in sec), 0 to 300]
Summary of Second Story
• The Phase of each node has a direct correlation with the overall functionalities (e.g., routing, availability of keys, transactions) of the system.
– The current phase and phase transition at each node can be determined with high confidence, without any global synchronization.
• Reversible Phase Transitions in the system with varying stress can be observed as a by-product of making the system Reversible.
– The system “boils” to the gaseous state (becomes disconnected) when churn increases and “condenses” from gaseous back to solid phase as churn intensity goes down.
– Can provide useful information to the application layer using APIs.
– Can be used for efficient self-management of the system.
Third Story: Phase API and Applications
Phase API
• An API exists on each node to expose its phase to the application layer
• Push and pull methods to communicate the current phase of a node
– getPhase(?Pcur): Binds Pcur to the current phase of the peer.
– setPhaseNotify(f): Sets a user-defined function, f(?Pnew), to be executed when the phase changes. Pnew is bound to the next phase of the peer and f is executed. Executions of f are serialized in the same thread over a stream of successive phases.
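The pull and push methods can be mimicked in Python as a rough analogue. This is not the Oz API from the slides: the class `PhaseAPI`, the methods `phase_changed` and `close`, and the use of a queue with one worker thread are all assumptions, chosen to reproduce the property that executions of f are serialized in a single thread over a stream of successive phases.

```python
import queue
import threading
from typing import Callable, Optional


class PhaseAPI:
    """Python analogue of getPhase / setPhaseNotify (names are hypothetical)."""

    def __init__(self, initial: str) -> None:
        self._phase = initial
        self._callback: Optional[Callable[[str], None]] = None
        self._lock = threading.Lock()
        # Stream of successive phases; None is used as an end-of-stream marker.
        self._stream: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def get_phase(self) -> str:
        """Pull: return the current phase of the peer."""
        with self._lock:
            return self._phase

    def set_phase_notify(self, f: Callable[[str], None]) -> None:
        """Push: register f to be called with each new phase."""
        with self._lock:
            self._callback = f

    def phase_changed(self, new_phase: str) -> None:
        """Hook the overlay layer would call; repeated phases are ignored."""
        with self._lock:
            if new_phase == self._phase:
                return
            self._phase = new_phase
            self._stream.put(new_phase)  # enqueue under the lock to keep order

    def _drain(self) -> None:
        # Single consumer thread: callbacks run one at a time, in phase order.
        while True:
            new_phase = self._stream.get()
            if new_phase is None:
                return
            with self._lock:
                f = self._callback
            if f is not None:
                f(new_phase)

    def close(self) -> None:
        """Stop the worker after all queued phases have been delivered."""
        self._stream.put(None)
        self._worker.join()
```

An application would typically call `get_phase()` once at start-up and then rely on the registered callback, so it never needs to poll.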
Phase-Aware Applications
• Predictable behavior for the users: an indicator that changes color to indicate the current phase of the underlying node.
– Allow users to work productively offline and prevent any potential data loss.
• Reversibility for the application:
– Can increase the replication factor of critical data, based on the phase of the underlying node;
– Can improve throughput by adapting the philosophy of exponential back-off, as in the TCP congestion algorithm;
– Can manage its behavior for congestion avoidance, thus helping the system to recover quickly.
• Empirical Demonstration of Phase-Aware Application design (future work)
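Two of these adaptations, phase-dependent replication and exponential back-off, can be sketched as pure functions of the current phase. The slides mark the empirical demonstration as future work, so every number and name below is a hypothetical choice: the base replication factor, the extra replicas per phase, the delay bounds, and the decision to add no replicas in the gaseous phase (an isolated node has no peers to replicate to).

```python
# Hypothetical policy values; none of these figures come from the slides.
EXTRA_REPLICAS = {
    "solid": 0,
    "liquid-1": 1,
    "liquid-2": 2,
    "liquid-3": 3,
    "gaseous": 0,   # isolated: nothing reachable, keep data locally instead
}


def replication_factor(phase: str, base: int = 3) -> int:
    """Replicas to request for critical data, given the node's current phase."""
    return base + EXTRA_REPLICAS[phase]


def next_backoff(current_delay: float, phase: str,
                 base_delay: float = 0.1, max_delay: float = 30.0) -> float:
    """Exponential back-off between requests, driven by phase.

    The delay doubles (up to max_delay) while the node is not solid,
    and resets to base_delay as soon as the node is back on the core ring.
    """
    if phase == "solid":
        return base_delay
    return min(current_delay * 2, max_delay)
```

Wiring these to a phase-change callback lets the application slow down under churn and resume its normal request rate once the node returns to the solid phase.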
Conclusion and Future Work
Conclusion
• In order to design provably correct decentralized networked systems, it is required to ensure their reversibility against stressful environments.
– Build systems that are both predictable (hence, useful in practice) and reversible (hence, they survive)
• We define the concept of Reversibility to make precise what survival means in stressful environments
• We define the concept of Phase to allow applications to observe their stressful environment and act accordingly
Summary of Our Stories
• First Story: Repeated join and merger using Knowledge Base is required to achieve Reversibility against extremely high Churn
• Second Story: We observe Phases and Phase Transitions in the system as a by-product of making the system Reversible (give useful information to applications using APIs)
• Third Story: We introduce a Phase API to give useful information to applications and use it for phase-aware application design: predictable behavior and reversibility in the application-level semantics.
Future Work
• Continuing the work directly:
– Deepen the analogy between phase in SONs and in physical systems;
– Design applications that take advantage of the Phase API to survive in extremely stressful environments;
– Gain more insights about the maintenance strategies.
• Other topics:
– Investigate other application architectures;
– Investigate other stresses and stress interactions;
– Move to real environment, not simulated.
Thank You!!