
Part9 Ch3 Data Replic


Copyright 2007 Koren & Krishna, Morgan-Kaufman

FAULT TOLERANT SYSTEMS

http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems

Part 9 - Data Replication

Chapter 3 – Information Redundancy

Data Replication for Fault Tolerance

♦Identical copies of data are held at multiple nodes in a distributed system
♦Improved performance and fault tolerance
♦Data replicas must be kept consistent despite failures in the system
♦Example - five copies in five nodes:
♦If A is disconnected and a write updates the copy in A - the rest are no longer consistent with A
♦Any read of their data will result in stale data
♦How many copies should we read (write)?


Simple Voting Scheme - Non-Hierarchical

♦Assign $v_i$ votes to copy i of the data
♦S - set of all nodes with copies of the data
♦V - sum of all votes - $V = \sum_{i \in S} v_i$
♦r, w - variables such that r + w > V ; w > V/2
♦V(X) - total number of votes assigned to copies in set X - $V(X) = \sum_{i \in X} v_i$
♦Strategy ensuring that all reads use the latest data
♦To complete a read - read nodes of a set R ⊆ S such that V(R) ≥ r
♦To complete a write - write on every node of a set W ⊆ S such that V(W) ≥ w
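A minimal sketch of the definitions above. The function names (`votes_of`, `valid_thresholds`, `is_quorum`) and the one-vote-per-node table are illustrative assumptions, not part of the slides.

```python
# Hypothetical vote assignment: five nodes with one vote each (V = 5).
votes = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}

def votes_of(nodes, votes):
    """V(X): total number of votes held by the copies in set X."""
    return sum(votes[n] for n in nodes)

def valid_thresholds(r, w, votes):
    """Check the two constraints r + w > V and w > V/2."""
    V = votes_of(votes, votes)
    return r + w > V and 2 * w > V

def is_quorum(nodes, threshold, votes):
    """True if the set holds at least `threshold` votes."""
    return votes_of(nodes, votes) >= threshold

r, w = 2, 4
print(valid_thresholds(r, w, votes))          # True
print(is_quorum({"A", "B"}, r, votes))        # True: two votes reach r
print(is_quorum({"A", "B", "C"}, w, votes))   # False: three votes miss w
```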

Procedure Justification

♦A set R such that V(R) ≥ r is called a read quorum
♦A set W such that V(W) ≥ w is called a write quorum
♦For any sets R and W such that V(R) ≥ r and V(W) ≥ w, R ∩ W ≠ ∅ (since r + w > V)
♦Any read operation is guaranteed to read the value of at least one copy which has been updated by the latest write
♦Furthermore - for any two sets $W_1, W_2$ such that $V(W_1), V(W_2) \ge w$: $W_1 \cap W_2 \ne \emptyset$ (since w > V/2)
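The argument is a counting one: if R and W were disjoint, they would together hold at least r + w > V votes, which is more than exist. The sketch below checks both intersection properties by brute force. The vote table and the choice (r, w) = (3, 5) are illustrative assumptions.

```python
from itertools import combinations

def quorums(votes, threshold):
    """All subsets of nodes whose votes sum to at least `threshold`."""
    nodes = sorted(votes)
    return [set(c)
            for k in range(1, len(nodes) + 1)
            for c in combinations(nodes, k)
            if sum(votes[n] for n in c) >= threshold]

# Illustrative uneven vote assignment with V = 7.
votes = {"A": 1, "B": 3, "C": 2, "D": 1}
V = sum(votes.values())
r, w = 3, 5                      # satisfies r + w > V and w > V/2

reads, writes = quorums(votes, r), quorums(votes, w)
read_write_ok = all(R & W for R in reads for W in writes)
write_write_ok = all(W1 & W2 for W1 in writes for W2 in writes)
print(read_write_ok, write_write_ok)   # True True
```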


Performance vs. Availability

♦Selected values of r and w affect the system performance and availability
♦If there are many more reads than writes - we choose a low r to speed up the read operations
♦With V=5: r=1 requires w=5 - a write cannot be done if even one node is disconnected
♦Selecting r=2 allows w=4, and a write can still be done if four out of the five nodes are connected
♦Trade-off between performance and availability
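The trade-off can be tabulated for the five-node case. This is a small sketch assuming one vote per node; the helper name `min_write_quorum` is hypothetical.

```python
def min_write_quorum(r, V):
    """Smallest w satisfying both r + w > V and w > V/2."""
    return max(V - r + 1, V // 2 + 1)

V = 5  # five nodes, one vote each
for r in range(1, V + 1):
    w = min_write_quorum(r, V)
    # With one vote per node, an operation that needs q votes
    # still succeeds with up to V - q nodes disconnected.
    print(f"r={r} w={w}: reads survive {V - r} disconnected nodes, "
          f"writes survive {V - w}")
```

A lower r makes reads cheaper and more available but pushes w up, so writes tolerate fewer disconnected nodes.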

Reliability and Availability Markov Models

♦Assumptions:
  ∗ Failures occur at each node according to a Poisson process with rate λ (links do not fail)
  ∗ When a node fails, it is repaired and up-to-date data is loaded
  ∗ Repair time is an exponentially distributed random variable with mean 1/µ
♦Example: (r,w) = (3,3) - both read and write operations can take place if at least three of the five nodes are up
♦To compute reliability and availability, we use Markov chain models


Reliability and Long-Term Availability for (r,w)=(3,3)

♦Markov chain for reliability:
♦State - number of nodes down; F - the failure state
♦Reliability at time t: $R(t) = 1 - P_F(t)$
♦Markov chain for availability:
♦State - number of nodes down
♦Long-term availability = P0+P1+P2
♦Complete analysis in Exercises
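A sketch of the long-term availability computation for the availability chain. The chain diagram is not reproduced in this text, so the transition rates are an assumption: every failed node is repaired independently, giving rate (n-k)λ from state k to k+1 and rate kµ from state k to k-1. The numeric rates are hypothetical.

```python
from math import comb

def steady_state(n, lam, mu):
    """Steady-state probabilities P_k that k of the n nodes are down.

    Assumed birth-death chain (independent repairs):
    k -> k+1 at rate (n-k)*lam, k -> k-1 at rate k*mu.
    """
    weights = [1.0]
    for k in range(n):
        # Balance equation: P_{k+1} * (k+1)*mu = P_k * (n-k)*lam
        weights.append(weights[-1] * (n - k) * lam / ((k + 1) * mu))
    total = sum(weights)
    return [x / total for x in weights]

lam, mu = 0.01, 1.0              # hypothetical failure and repair rates
P = steady_state(5, lam, mu)
print(f"long-term availability = {P[0] + P[1] + P[2]:.8f}")

# Cross-check: under the independent-repair assumption each node is down
# with probability lam/(lam+mu), so the number of down nodes is binomial.
p = lam / (lam + mu)
binomial = [comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(6)]
```

If the chain in the exercises uses a different repair policy (for example a single repair facility), only the repair rates in `steady_state` change.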

Vote Assignment for Maximizing Availability

♦In general, nodes have different reliabilities and availabilities
♦Links can fail as well
♦Point availability - the probability that at time t the system is up - read and write quorums exist
♦Problem: assigning votes to nodes to maximize point availability
♦Optimal assignment is difficult - heuristics are necessary
♦Notations: for some fixed point in time t (t is omitted for simplicity)
  ∗Point availability of node i - an(i)
  ∗Point availability of link j - al(j)
  ∗L(i) - set of links incident on node i


Heuristics for Vote Assignment

♦Heuristic 1: set $v(i) = an(i) \sum_{j \in L(i)} al(j)$
♦(rounded to the nearest integer)
♦If sum of votes is even, give an extra vote to one of the nodes with the maximum number of votes
♦Heuristic 2:
♦k(i,j) - node connected to node i by link j; set $v(i) = an(i) + \sum_{j \in L(i)} al(j) \cdot an(k(i,j))$
♦(rounded to the nearest integer)
♦If sum is even - an extra vote is added as above
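Both heuristics can be sketched as follows. The three-node network, its availabilities, and the helper names are hypothetical; links are represented as `frozenset` pairs of node names. Ties for the extra vote are broken by taking the first maximum, which is one choice among the nodes the slide allows.

```python
import math

def round_half_up(x):
    """Round to the nearest integer, with halves rounding up."""
    return math.floor(x + 0.5)

def heuristic1(an, al, links):
    """v(i) = an(i) * sum of al(j) over the links j incident on i."""
    return {i: round_half_up(an[i] * sum(al[j] for j in links[i]))
            for i in an}

def heuristic2(an, al, links):
    """v(i) = an(i) + sum over j in L(i) of al(j) * an(k(i, j))."""
    votes = {}
    for i in an:
        total = an[i]
        for j in links[i]:
            (k,) = set(j) - {i}          # node at the other end of link j
            total += al[j] * an[k]
        votes[i] = round_half_up(total)
    return votes

def make_odd(votes):
    """If the vote sum is even, give one extra vote to a top node."""
    votes = dict(votes)
    if sum(votes.values()) % 2 == 0:
        votes[max(votes, key=votes.get)] += 1
    return votes

# Hypothetical path network n1 - n2 - n3.
an = {"n1": 0.9, "n2": 0.8, "n3": 0.4}
al = {frozenset({"n1", "n2"}): 0.9, frozenset({"n2", "n3"}): 0.5}
links = {i: [j for j in al if i in j] for i in an}   # L(i)

print(make_odd(heuristic1(an, al, links)))   # {'n1': 2, 'n2': 1, 'n3': 0}
print(make_odd(heuristic2(an, al, links)))   # {'n1': 2, 'n2': 2, 'n3': 1}
```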

Heuristic 1 - Example

♦Node A is unreliable compared to the rest - gets no votes
♦Sum of votes is 3 - quorums must satisfy r + w > 3 ; w > 3/2 ⇒ w = 2 or 3
♦If w=2 - r=2 is the smallest read quorum
♦Possible read (or write) quorums - BC, CD, BD
♦If w=3 - r=1 is the smallest read quorum
♦Possible read quorums - B, C, D
♦One write quorum: BCD
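The quorum lists above can be reproduced by enumerating minimal vote-holding sets. This is a sketch: the function name `minimal_quorums` is hypothetical, and "minimal" here means that no proper subset is itself a quorum.

```python
from itertools import combinations

def minimal_quorums(votes, threshold):
    """Sets holding at least `threshold` votes that contain no smaller quorum."""
    nodes = sorted(votes)
    found = []
    for size in range(1, len(nodes) + 1):      # smaller sets are found first
        for combo in combinations(nodes, size):
            s = set(combo)
            if sum(votes[n] for n in s) < threshold:
                continue
            if not any(q < s for q in found):  # skip supersets of a quorum
                found.append(s)
    return ["".join(sorted(q)) for q in found]

h1 = {"A": 0, "B": 1, "C": 1, "D": 1}   # Heuristic 1 votes from the slide
print(minimal_quorums(h1, 2))   # ['BC', 'BD', 'CD']
print(minimal_quorums(h1, 1))   # ['B', 'C', 'D']
print(minimal_quorums(h1, 3))   # ['BCD']

h2 = {"A": 1, "B": 3, "C": 2, "D": 1}   # Heuristic 2 votes from the slides
print(minimal_quorums(h2, 4))   # ['AB', 'BC', 'BD', 'ACD']
```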


Heuristic 2 - Example

♦Sum of votes even - B gets an extra vote
♦Final vote assignment - v(A)=1, v(B)=3, v(C)=2, v(D)=1
♦Sum of votes is 7 - read and write quorums must satisfy:
♦r + w > 7
♦w > 7/2
♦w = 4 or 5 or 6 or 7

Quorums for Heuristic 2

♦Possible quorums for r+w=8 (v(A)=1, v(B)=3, v(C)=2, v(D)=1)
♦Every (r,w) pair has an availability associated with it - the probability that at least one read and one write quorum exist despite node and/or link failures
♦(r,w)=(4,4) - identical lists of read and write quorums
♦Other (r,w) - lists are different


Calculating Availability for (r,w)=(4,4)

♦Availability A is the probability that at least one of the quorums AB, BC, BD, ACD can be used
♦Denote events: E1, E2, E3, E4 - AB, BC, BD, ACD are up, respectively
♦Events are not mutually exclusive
♦A=P(E1∪E2∪E3∪E4)

$$P(E_1 \cup E_2 \cup E_3 \cup E_4) = \sum_{i} P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - P(E_1 \cap E_2 \cap E_3 \cap E_4)$$

♦Some calculations:
P(E1)=P(AB is up)=0.7×0.7×0.8=0.392
P(E2∩E3)=P(BC and BD are up)=0.8×0.9×0.9×0.2×0.7=0.091
♦Exercise: Complete the calculation of the availability A for (r,w)=(4,4)
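A sketch of the inclusion-exclusion bookkeeping, assuming independent component failures. Each event is written as the set of nodes and links that must all be up, so an intersection of events is the union of their component sets. The availability table is one assignment consistent with the two products on the slide; which value belongs to which component is an assumption, since the network figure is not reproduced here. Event E4 (ACD) is left out because its component set depends on that figure.

```python
from itertools import combinations
from math import prod

def p_all_up(components, avail):
    """Probability that every listed node/link is up (independent failures)."""
    return prod(avail[c] for c in components)

def p_union(events, avail):
    """P(E1 ∪ ... ∪ En) by inclusion-exclusion over component sets."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 else -1
        for group in combinations(events, k):
            total += sign * p_all_up(set().union(*group), avail)
    return total

# Assumed assignment of the factors that appear on the slide.
avail = {"A": 0.7, "B": 0.8, "C": 0.9, "D": 0.9,
         "AB": 0.7, "BC": 0.2, "BD": 0.7}
E1 = {"A", "B", "AB"}
E2 = {"B", "C", "BC"}
E3 = {"B", "D", "BD"}

print(round(p_all_up(E1, avail), 3))        # 0.392
print(round(p_all_up(E2 | E3, avail), 3))   # 0.091
print(p_union([E1, E2, E3], avail))         # partial union, without E4
```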

Different Method for Availability Calculation

♦System has 8 modules (4 nodes and 4 links) - each can be up or down
♦Total of $2^8 = 256$ mutually exclusive states
♦Probability of each state is a product of 8 terms, either an(i) or 1-an(i) or al(j) or 1-al(j)
♦Methodical (but long) way of computing availability - list all states and add up the probabilities of those where a quorum exists
♦For any other value of (r,w) - read and write quorums are different
♦Availability - sum of probabilities of states in which both read and write quorums exist
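The state enumeration can be sketched as below. The topology (links AB, BC, BD, CD) and every availability value are placeholders, since the network figure is not reproduced in this text. A quorum is taken to exist when some connected group of up nodes, joined by up links, holds enough votes.

```python
from itertools import product

def has_quorum(up, votes, links, threshold):
    """True if some connected group of up nodes holds >= threshold votes."""
    seen = set()
    for start in votes:
        if not up[start] or start in seen:
            continue
        group, stack = {start}, [start]      # depth-first search
        while stack:
            n = stack.pop()
            for link, (a, b) in links.items():
                if up[link] and n in (a, b):
                    other = b if n == a else a
                    if up[other] and other not in group:
                        group.add(other)
                        stack.append(other)
        seen |= group
        if sum(votes[n] for n in group) >= threshold:
            return True
    return False

def availability(avail, votes, links, r, w):
    """Sum the probabilities of all states with a read and a write quorum."""
    names = list(avail)
    total = 0.0
    for state in product([True, False], repeat=len(names)):
        up = dict(zip(names, state))
        p = 1.0
        for name in names:
            p *= avail[name] if up[name] else 1 - avail[name]
        if has_quorum(up, votes, links, r) and has_quorum(up, votes, links, w):
            total += p
    return total

votes = {"A": 1, "B": 3, "C": 2, "D": 1}            # Heuristic 2 votes
links = {"AB": ("A", "B"), "BC": ("B", "C"),
         "BD": ("B", "D"), "CD": ("C", "D")}         # assumed topology
avail = {"A": 0.7, "B": 0.8, "C": 0.9, "D": 0.9,    # placeholder values
         "AB": 0.7, "BC": 0.2, "BD": 0.7, "CD": 0.5}
print(f"{availability(avail, votes, links, 4, 4):.4f}")
```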


Dynamic Vote Assignment

♦If repair is not fast enough - system can degrade
♦If system degrades enough - no connected cluster with a majority of total votes exists
♦Solution - adjustable quorums instead of static ones
♦Assumption - each node has exactly one vote
♦For each data item, a version number is maintained - incremented with every update
♦An update can only be executed if a write quorum can be gathered

Dynamic Vote Assignment - Notations

♦$VN_i$ - version number of data at node i
♦$SC_i$ - update sites cardinality at node i - number of nodes which participated in the $VN_i$-th update of this data
♦When system starts operation, $SC_i$ is initialized to the total number of nodes in the system
♦$S_i$ - set of nodes with which node i can communicate
♦M - maximum version number in $S_i$
♦I - partial set of $S_i$ with nodes whose version number is M
♦N - maximum update sites cardinality ($SC_j$) of nodes in I


Dynamic Vote Assignment Algorithm

Dynamic Vote Assignment - Example

♦Seven nodes - same data - state at time $t_0$
♦Failure disconnects the network into two components - {A,B,C,D} and {E,F,G}
♦E receives an update request
  ∗ $SC_E = 7$ - E must find more than 7/2 nodes (including itself)
  ∗ can find only 3
  ∗ update request is rejected
♦A receives an update request
  ∗ can be accepted
  ∗ A,B,C,D are updated
♦New state


Example - Cont.

♦Another failure - components become {A,B,C}, {D}, {E,F,G}
♦An update request arrives at C
  ∗write quorum at C is 3
  ∗update successful
♦New state
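The two-step example can be traced with a short sketch. The algorithm slide itself is not reproduced in this text, so the acceptance rule used here (accept when |I| > N/2) is inferred from the notation and the example. Updating every reachable node, including stale ones, and the starting version number 5 are further assumptions.

```python
def try_update(reachable, VN, SC):
    """Attempt an update at a node whose reachable set S_i is `reachable`.

    VN and SC map node names to version numbers and update sites
    cardinalities; both are modified in place when the update is accepted.
    """
    M = max(VN[j] for j in reachable)            # newest version in S_i
    I = {j for j in reachable if VN[j] == M}     # nodes holding version M
    N = max(SC[j] for j in I)                    # sites in the last update
    if len(I) <= N / 2:                          # no majority of those sites
        return False
    for j in reachable:      # assumption: stale reachable nodes catch up too
        VN[j] = M + 1
        SC[j] = len(reachable)
    return True

nodes = "ABCDEFG"
VN = {n: 5 for n in nodes}   # hypothetical starting version number
SC = {n: 7 for n in nodes}

print(try_update(set("EFG"), VN, SC))    # False: 3 is not more than 7/2
print(try_update(set("ABCD"), VN, SC))   # True:  4 is more than 7/2
print(try_update(set("ABC"), VN, SC))    # True:  3 is more than 4/2
print(VN["C"], SC["C"], VN["D"], SC["D"])   # 7 3 6 4
```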

Voting - Hierarchical Organization

♦If V is large, r+w is large - data operations take a long time
♦Possible solution - hierarchical voting scheme
♦Construct an m-level tree
♦All the nodes holding copies of the data are leaves at level m-1
♦Add virtual nodes at the higher levels up to the root at level 0 - added nodes are virtual groupings of the real nodes
♦Each node at level i will have exactly $L_{i+1}$ children


Example - a Tree for Hierarchical Quorum Generation

$m = 3$, $L_1 = L_2 = 3$

Quorum Generation Algorithm

♦Assign one vote to each node in the tree
♦Set read quorum and write quorum sizes at level i, $r_i$ and $w_i$, so that: $r_i + w_i > L_i$ ; $w_i > L_i/2$
♦The following algorithm is used recursively:
♦Read-mark the root at level 0
♦At level 1 - read-mark $r_1$ nodes
♦Proceeding from level i to level i+1 - read-mark $r_{i+1}$ children of each of the nodes read-marked at level i
♦You cannot read-mark a node which does not have at least $r_{i+1}$ non-faulty children
♦Proceed until i = m-1
♦The read-marked leaves form a read quorum
♦Forming a write quorum is similar


Algorithm - Example

♦$w_i = 2$, $r_i = L_i - w_i + 1 = 2$ for i=1,2
♦Starting at the root - read-mark two of its children - say X and Y
♦Read-mark two children for X and Y - say A, B for X, and D, E for Y
♦Read quorum is the set of read-marked leaves - A, B, D, E

Example - cont.

♦Suppose D is faulty - cannot be part of the read quorum
♦We have to pick another child of Y - say F - to be in the read quorum
♦If two of Y's children are faulty - we cannot read-mark Y - we have to backtrack and try read-marking Z instead
♦Exercise: List read quorums generated by $r_1 = 1$, $w_1 = 3$, $r_2 = 2$, $w_2 = 2$
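The marking procedure with backtracking can be sketched recursively. The tree layout is an assumption reconstructed from the names used in the example (root with children X, Y, Z); the leaf names C, G, H, I are hypothetical fill-ins. Children are tried left to right, which is only one of the orders the algorithm permits.

```python
def mark(node, level, tree, need, faulty):
    """Return the leaves marked under `node`, or None if it cannot be marked.

    tree: maps each internal node to its list of children.
    need: need[i] is how many children to mark at level i (r_i or w_i).
    """
    children = tree.get(node)
    if children is None:                 # leaf: usable only if non-faulty
        return None if node in faulty else [node]
    leaves, marked = [], 0
    for child in children:               # try children left to right
        sub = mark(child, level + 1, tree, need, faulty)
        if sub is not None:
            leaves += sub
            marked += 1
            if marked == need[level + 1]:
                return leaves
    return None                          # too few markable children

tree = {"root": ["X", "Y", "Z"],
        "X": ["A", "B", "C"], "Y": ["D", "E", "F"], "Z": ["G", "H", "I"]}
r = {1: 2, 2: 2}                         # r_1 = r_2 = 2

print(mark("root", 0, tree, r, faulty=set()))        # ['A', 'B', 'D', 'E']
print(mark("root", 0, tree, r, faulty={"D"}))        # ['A', 'B', 'E', 'F']
print(mark("root", 0, tree, r, faulty={"D", "E"}))   # ['A', 'B', 'G', 'H']
```

In the last call Y has only one non-faulty child, so it cannot be marked and the search moves on to Z, matching the backtracking step described above.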


Hierarchical vs. Non-Hierarchical Approach

♦Read quorum consists of just 4 copies
♦Similarly, we can have a write quorum with 4 copies
♦For the non-hierarchical approach with one vote per node, r + w > 9 ; w > 9/2
♦w is at least 5, compared to 4 in the tree approach
♦To prove that the hierarchical approach works, we show that every possible read quorum has to intersect every possible write quorum in at least one node
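For the nine-leaf example the intersection claim can be checked exhaustively. This sketch is a check of one instance, not a proof; the tree layout and the leaf names C, G, H, I are assumptions.

```python
from itertools import combinations, product

# Assumed tree: three groups of three leaves each.
tree = {"X": "ABC", "Y": "DEF", "Z": "GHI"}

def all_quorums(n1, n2):
    """Every leaf set obtained by marking n1 groups and n2 leaves in each."""
    result = []
    for groups in combinations(tree, n1):
        per_group = [list(combinations(tree[g], n2)) for g in groups]
        for choice in product(*per_group):
            result.append({leaf for part in choice for leaf in part})
    return result

reads = all_quorums(2, 2)    # r_1 = r_2 = 2
writes = all_quorums(2, 2)   # w_1 = w_2 = 2

print(len(reads), {len(q) for q in reads})          # 27 {4}
print(all(R & W for R in reads for W in writes))    # True
```

Any two choices of two groups out of three share a group, and within that group any two choices of two leaves out of three share a leaf, which is why every pair intersects.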

Primary Backup Approach

♦A node is designated as the primary - all accesses are through that node
♦Other nodes are designated as backups
♦Under normal operation - all writes to the primary are also copied to the functional backups
♦When the primary fails - one of the backup nodes is chosen to take its place
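A toy in-memory sketch of the scheme above. The class names and the promotion rule (first functional backup) are assumptions; the slides do not specify how the new primary is chosen or how failures are detected.

```python
class Replica:
    def __init__(self, name):
        self.name, self.data, self.up = name, {}, True

class PrimaryBackup:
    """Toy model: one primary, the remaining replicas act as backups."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.primary = self.replicas[0]

    def _ensure_primary(self):
        if not self.primary.up:
            # Assumption: promote the first functional backup.
            self.primary = next(r for r in self.replicas if r.up)

    def write(self, key, value):
        self._ensure_primary()
        self.primary.data[key] = value
        for rep in self.replicas:        # copy to the functional backups
            if rep.up and rep is not self.primary:
                rep.data[key] = value

    def read(self, key):
        self._ensure_primary()
        return self.primary.data[key]

pb = PrimaryBackup(["P", "B1", "B2"])
pb.write("x", 1)
pb.primary.up = False                    # the primary fails
print(pb.read("x"), pb.primary.name)     # 1 B1
```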