Consistency without consensus
Linearizable Resilient Data Types (LRDT)
Kaushik Rajan, Sagar Chordia, Kapil Vaswani
Ganesan Ramalingam, Sriram Rajamani
Consistency & consensus
[Figure: timeline with operations Add(The Hobbit), Add(Kindle) and two GetCart() calls]
Processes agree on ordering of operations
No deterministic algorithm in the presence of failures [FLP]
Commuting updates
• What if all update operations commute?
– Ordering of updates doesn’t matter!
– Eventual consistency reduces to eventual message delivery
– Single round trip latency
• What if we desire linearizability?
– Updates don’t commute with arbitrary reads
– Reads must be consistently ordered with updates
– Semantics of queries like the current top(k) elements well understood
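The claim that ordering does not matter for commuting updates can be checked with a minimal sketch, assuming a shopping cart modelled as a Python set whose only update is Add. The helper `apply_all` and the item list are hypothetical illustrations, not part of the talk.

```python
from itertools import permutations

def apply_all(updates):
    """Apply Add updates in the given order, starting from an empty cart."""
    cart = set()
    for item in updates:
        cart.add(item)
    return frozenset(cart)

updates = ["The Hobbit", "Kindle", "Lumia"]

# Collect the final cart for every possible delivery order of the updates.
final_states = {apply_all(order) for order in permutations(updates)}
```

Because every delivery order produces the same cart, `final_states` holds a single element, which is why replicas only need eventual message delivery in this case.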
Commuting updates
[Figure: timeline with Add(The Hobbit), Add(Kindle) and two GetCart() calls; the reads return {} and {The Hobbit, Kindle}]
Reads must observe comparable sets of operations
Linearizable resilient data types
Possible / Impossible / Don’t know
[Figure: state diagrams for two properties of a pair of updates op1, op2 at a state s]
P1 : commutes(s,op1,op2)
P2 : nullify(s,op1,op2)
Examples
• Read write register: every pair of writes nullify
• Read write memory: writes to the same location nullify, writes to different locations commute
Examples
• Set: add, remove and read the whole set
– Add(u), Remove(v) commute
– Add(u), Remove(u) nullify
– Add(*), Add(*) commute
– Remove(*), Remove(*) commute
• Counter: IncrBy(x), DecrBy(x), SetTo(v), Read()
– SetTo(v) nullifies all other operations
– Other pairs of updates commute
• Other examples: heaps, union-find, atomic snapshot objects…
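The set and counter examples above can be checked mechanically. The sketch below is a hedged illustration: the slides name the predicates `commutes` and `nullify` but their definitions did not survive extraction, so the reading of nullification used here (op1 followed by op2 gives the same state as op2 alone) is an assumption, as are the helper names `add`, `remove`, `incr_by` and `set_to`.

```python
def commutes(s, op1, op2):
    """True if applying op1 and op2 at state s in either order gives the same state."""
    return op2(op1(s)) == op1(op2(s))

def nullifies(s, op1, op2):
    """Assumed reading: op2 nullifies op1 at s if op1 followed by op2 equals op2 alone."""
    return op2(op1(s)) == op2(s)

# Set operations on immutable frozensets (hypothetical helpers).
def add(u):
    return lambda s: s | {u}

def remove(u):
    return lambda s: s - {u}

# Counter operations on plain integers (hypothetical helpers).
def incr_by(x):
    return lambda c: c + x

def set_to(v):
    return lambda c: v

cart = frozenset({"Kindle"})
checks = {
    "Add(u), Remove(v) commute": commutes(cart, add("Hobbit"), remove("Kindle")),
    "Remove(u) nullifies Add(u)": nullifies(cart, add("Hobbit"), remove("Hobbit")),
    "Add(u) nullifies Remove(u)": nullifies(cart, remove("Hobbit"), add("Hobbit")),
    "Add(u), Remove(u) commute": commutes(cart, add("Hobbit"), remove("Hobbit")),
    "IncrBy, IncrBy commute": commutes(5, incr_by(3), incr_by(4)),
    "SetTo nullifies IncrBy": nullifies(5, incr_by(3), set_to(10)),
}
```

Under this reading, Add(u) and Remove(u) fail to commute but nullify each other in both directions, which matches the bullets above.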
Lattice agreement
• Consistency reduces to lattice agreement
– Weaker problem than consensus
– Solvable in an asynchronous distributed system
• Assumptions
– t < n/2 failures
– Eventual message delivery
Lattice agreement
• n processes, each process starts with a value belonging to a join semilattice
• Each non-faulty process outputs a value
– (Validity) Each process’ output is a join of one or more input values including its own
– (Consistency) Any two output values are comparable
– (Liveness) Every correct process eventually outputs a value
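The validity and consistency conditions can be written as executable checks. This is a minimal sketch, assuming the lattice is a power set ordered by inclusion with union as join; the function names and the sample inputs are hypothetical. Liveness is a property of executions over time and is not checked here.

```python
from itertools import combinations

def join(values):
    """Join in the power set lattice: the union of all given sets."""
    result = frozenset()
    for v in values:
        result |= v
    return result

def validity(inputs, outputs):
    """Each output contains the process's own input and is a join of input values."""
    for pid, out in outputs.items():
        contained = [v for v in inputs.values() if v <= out]
        if not (inputs[pid] <= out and join(contained) == out):
            return False
    return True

def consistency(outputs):
    """Any two outputs are comparable, so the outputs form a chain."""
    return all(a <= b or b <= a for a, b in combinations(outputs.values(), 2))

inputs = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("c")}
good = {"p1": frozenset("a"), "p2": frozenset("ab"), "p3": frozenset("abc")}
bad = {"p1": frozenset("ab"), "p2": frozenset("bc"), "p3": frozenset("abc")}
```

Here `good` satisfies both conditions, while `bad` is valid but inconsistent because {a, b} and {b, c} are incomparable.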
Lattice agreement
[Figure: Hasse diagram of the power set lattice over {a, b, c}, with nodes labelled by processes p1, p2 and p3]
{}
{a} {b} {c}
{a, b} {b, c} {a, c}
{a, b, c}
a = Add(The Hobbit)
b = Add(Kindle)
c = Add(Lumia)
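[Figure: flowchart of the lattice agreement protocol, with a proposer column and an acceptor column]
PROPOSERS
– Send to all acceptors
– Wait for majority of acceptors to respond
– All Acks? Y: Output. N: $v_i \leftarrow \bigvee_{\mathrm{Nack}(a_j)} a_j$
ACCEPTORS
– Initially
– On receiving: test $a_i \le v_j$ (Y / N)
– In both branches: $a_i = a_i \vee v_j$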
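The proposer and acceptor steps in the flowchart can be simulated in a single thread. The sketch below is an illustration under stated assumptions, not the authors' implementation: acceptors start from the empty set, an acceptor answers Ack when its accepted value is below the proposal and Nack (carrying its updated value) otherwise, there are no failures so every acceptor replies, and a proposer that sees a Nack proposes again in the next round. All class and function names are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.accepted = frozenset()  # assumed initial value: the empty set

    def on_receive(self, proposal):
        """Compare a_i with v_j, then fold the proposal in via a_i = a_i v v_j."""
        ack = self.accepted <= proposal
        self.accepted = self.accepted | proposal
        return ("ack", None) if ack else ("nack", self.accepted)

def propose_round(value, acceptors):
    """One round trip for one proposer; returns (done, value)."""
    replies = [acc.on_receive(value) for acc in acceptors]
    nacks = [a for kind, a in replies if kind == "nack"]
    if not nacks:
        return True, value  # all Acks: output the proposed value
    # Otherwise adopt the join of the values carried by the Nacks.
    return False, frozenset().union(*nacks)

def run(inputs, n_acceptors=3):
    """Interleave proposers round by round until every one has output a value."""
    acceptors = [Acceptor() for _ in range(n_acceptors)]
    values = dict(inputs)
    outputs = {}
    rounds = 0
    while len(outputs) < len(inputs):
        rounds += 1
        for pid in inputs:
            if pid in outputs:
                continue
            done, values[pid] = propose_round(values[pid], acceptors)
            if done:
                outputs[pid] = values[pid]
    return outputs, rounds

outputs, rounds = run({"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("c")})
```

In this run every output contains its proposer's input and the outputs form a chain. A real deployment would wait only for a majority of acceptors and would have to tolerate message delays, which this sketch leaves out.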
Safety and liveness
• Safety always guaranteed
• Lattice agreement is t-resilient
– Liveness guaranteed if quorum of processes are non-faulty and communication is reliable
– Processes output value in at most n round trips, where n is the number of processes
Generalized lattice agreement
• Generalization of lattice agreement
– Processes receive sequence of values
– Values belong to an infinite lattice
• Processes output a sequence of values
– (Validity) Every output value is a join of some received values
– (Consistency) Any two output values are comparable (i.e. output values form a chain)
– (Liveness) Every value received by a correct process is eventually included in an output value
GLA algorithm
• Liveness (t-resilient)
– Every received value is eventually included in some output in n round trips
– Adaptive, complexity depends on contention
• Fast path
– Received values output in one round trip
• Reconfigurable
– Replicas can be added/removed dynamically
From GLA to linearizability
• Update commands form a power set lattice
• Updates return once a majority of processes have learnt a command set that includes the update command
• Reads are performed by an ABD-style algorithm:
1. Reading the learnt command set from a quorum of processes
2. Writing back the largest among these to a quorum
3. Constructing state corresponding to the largest command set by exploiting commutativity and nullification
• Multi-master replication
– Does not require a single primary/leader
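The three read steps above can be sketched over in-memory replicas. This is a simplified illustration with several assumptions: commands are tuples such as `("Add", "Kindle")`, only commuting Add commands appear (so state construction can apply them in any order and nullification is not exercised), the write-back goes to the same quorum that was read, and the learnt sets already form a chain as GLA guarantees. The `Replica` class and `abd_read` function are hypothetical names.

```python
class Replica:
    def __init__(self, learnt=()):
        self.learnt = frozenset(learnt)  # command set learnt so far

    def read_learnt(self):
        return self.learnt

    def write_back(self, commands):
        self.learnt = self.learnt | commands

def abd_read(replicas, quorum):
    """ABD-style read over the replicas whose indices are listed in quorum."""
    # 1. Read the learnt command set from a quorum of processes.
    seen = [replicas[i].read_learnt() for i in quorum]
    # 2. Learnt sets form a chain, so the biggest set is the maximum; write it back.
    largest = max(seen, key=len)
    for i in quorum:
        replicas[i].write_back(largest)
    # 3. Construct the state; Add commands commute, so any order works.
    cart = set()
    for name, item in sorted(largest):
        if name == "Add":
            cart.add(item)
    return cart

replicas = [
    Replica({("Add", "The Hobbit")}),
    Replica({("Add", "The Hobbit"), ("Add", "Kindle")}),
    Replica(),
]
cart = abd_read(replicas, quorum=[0, 1])
```

After the read, replica 0 has caught up with replica 1, so a later read through any quorum that overlaps this one observes a command set at least as large.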
Impossibility
• Consensus reduction
Consensus(b)
    if (b) then op1 else op2
    s = read()
    if (s = S1, S12) return true
    else return false
Pair of idempotent update operations that neither commute nor nullify at some state s0
[Figure: state diagram. Op* takes Si to S0; from S0, op1 leads to S1 and op2 leads to S2; from S1, op2 leads to S12; from S2, op1 leads to S21]
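The reduction can be exercised on a concrete object. The sketch below is a hedged illustration: it assumes the shared object is linearizable (modelled here by applying steps one at a time to a single state), and it picks a sequence with two "append once" updates as the pair of idempotent operations that neither commute nor nullify at the empty sequence. The operation definitions and the `run` and `check` helpers are hypothetical.

```python
from itertools import permutations

def op1(s):
    """Append 'a' unless it is already present (idempotent)."""
    return s if "a" in s else s + ("a",)

def op2(s):
    """Append 'b' unless it is already present (idempotent)."""
    return s if "b" in s else s + ("b",)

S0 = ()
S1, S2 = op1(S0), op2(S0)
S12, S21 = op2(S1), op1(S2)

def run(schedule, inputs):
    """Run one interleaving; each process id appears twice: update, then read."""
    state = S0
    step = {pid: 0 for pid in inputs}
    decided = {}
    for pid in schedule:
        if step[pid] == 0:
            state = op1(state) if inputs[pid] else op2(state)
        else:
            decided[pid] = state in (S1, S12)
        step[pid] += 1
    return decided

def check():
    """Verify agreement and validity over all inputs and all interleavings."""
    schedules = sorted(set(permutations([0, 0, 1, 1])))
    for b0 in (False, True):
        for b1 in (False, True):
            for schedule in schedules:
                decided = run(schedule, {0: b0, 1: b1})
                assert decided[0] == decided[1]   # agreement
                assert decided[0] in (b0, b1)     # validity
    return True
```

Calling `check()` runs the two-process protocol under every interleaving. Both processes always decide the input of whichever process updated first, which is why a wait-free linearizable implementation of such an object would solve consensus.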
Implications for designing ADTs
Most commands commute
Implications for designing ADTs
neither commute nor nullify at
;
The Gap: Open problems
Doubly saturating counter
[Figure: chain of states 0, 1, 2, …, n; Incr() moves one state up and Decr() one state down, with Decr() at 0 and Incr() at n looping back to the same state]
Incr() and Decr() commute at 1 … n-1
Incr() and Decr() nullify at 0 and n
Don’t know if this is possible or impossible
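The two claims about the doubly saturating counter can be verified by brute force. This is a small sketch, assuming n = 5 and reading "nullify" as each operation erasing the effect of the other when applied second; the bound `N` and the helper names are hypothetical.

```python
N = 5  # assumed upper bound of the counter

def incr(s):
    """Increment, saturating at N."""
    return min(s + 1, N)

def decr(s):
    """Decrement, saturating at 0."""
    return max(s - 1, 0)

def commute_at(s):
    return decr(incr(s)) == incr(decr(s))

def nullify_at(s):
    # Assumed reading: the second operation alone gives the same state.
    return decr(incr(s)) == decr(s) and incr(decr(s)) == incr(s)

commuting_states = [s for s in range(N + 1) if commute_at(s)]
nullifying_states = [s for s in range(N + 1) if nullify_at(s)]
```

The pair commutes exactly at the interior states and nullifies exactly at the two ends, so every state satisfies one of the two properties, but not the same one throughout.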
Summary
Possible: graph, RW mem…
Impossible: queues, sequences
??: Saturating counter