Accountable systems, or how to catch a liar?
Jinyang Li (with slides from the authors of SUNDR and PeerReview)
What we have learnt so far
• Use BFT replication to handle (a few) malicious servers
• Strengths:
  – Transparent (service goes on as if there were no faults)
  – Minimal client modifications
• Weaknesses:
  – Breaks down completely if >1/3 of the servers are bad
Hold (bad) servers accountable
• What is possible when >1/3 of the servers fail?
  – E.g., 1 out of 1 servers fails?
• Let us consider a bad file server:
  – Can it delete a client's data?
  – Can it leak the content of a client's data?
  – Can it show client A garbage and claim it's from B?
  – Can it show client A old data of client B?
  – Can it show A (op1, op2) and B (op2, op1)?
Hold (bad) servers accountable
• SUNDR's observations:
  – Cannot prevent a bad server from misbehaving at least once
  – Trustworthy clients can detect inconsistencies due to server misbehavior!
• Useful?
Fool me once, shame on you; fool me twice, shame on …
Case study I: SUNDR
• What's SUNDR?
  – A (single-server) network file system
  – Handles potential Byzantine server behavior
  – Can run on an untrusted server
• Useful properties:
  – Tamper-evident
    • Unauthorized operations will be immediately detected
  – Can detect past misbehavior
    • If the server drops operations, it can be caught eventually
Ideal file system semantics
• Represent FS calls as fetch/modify operations
  – Fetch: client downloads new data
  – Modify: client makes a new change visible to others
• Ideal (sequential) consistency:
  – A fetch reflects the sequence of modifications that happen before it
  – Impossible when the server is malicious
    • A is only aware of B's latest modification via the server
• Goal: get as close to ideal consistency as possible
Strawman File System
[Figure: strawman file system. userA runs: echo "A was here" >> /share/aaa; userB runs: cat /share/aaa. The file server appends each operation to a single log of signed entries: (A: Modify f1, sig1), (B: Fetch f4, sig2), (A: Modify f2, sig3), …]
Log specifies the total order
• The total order: LogA ≤ LogB iff LogA is a prefix of LogB
[Figure: A's latest log (A: Modify f1, sig1; B: Fetch f4, sig2; A: Modify f2, sig3) is a prefix of B's latest log, which additionally ends with (B: Fetch f2, sig4), so LogA ≤ LogB.]
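A minimal sketch of the consistency check this total order implies (illustrative Python, not SUNDR's code; the entry tuples are made up):

    # Strawman ordering check: two clients' logs are consistent iff one
    # is a prefix of the other; otherwise the server has forked them.
    def is_prefix(log_a, log_b):
        return len(log_a) <= len(log_b) and log_b[:len(log_a)] == log_a

    def consistent(log_a, log_b):
        return is_prefix(log_a, log_b) or is_prefix(log_b, log_a)

    log_a = [("A", "modify f1"), ("B", "fetch f4"), ("A", "modify f2")]
    log_b = log_a + [("B", "fetch f2")]
    assert consistent(log_a, log_b)                             # LogA <= LogB
    assert not consistent(log_a + [("A", "modify f3")], log_b)  # forked views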
Detecting attacks by the server
• A runs: echo "A was here" >> /share/aaa; B then runs: cat /share/aaa and gets a stale result
[Figure: the server drops A's (Modify f2, sig3a) entry from the log it shows B, so B appends (Fetch f2, sig3b) to a log that lacks A's write.]
• A's log and B's log can no longer be ordered: LogA ≰ LogB and LogB ≰ LogA
What the strawman has achieved
• Tamper-evident
  – A bad server can't make up ops users didn't do
• Achieves fork consistency
  – A bad server can conceal users' ops from each other, but such misbehavior can be detected later
• Downside: high overhead, and no concurrency
Fork Consistency: A tale of two worlds
[Figure: once the file server forks A and B, A's view and B's view continue as two separate worlds that never reflect each other's subsequent operations.]
Fork consistency is useful
• Best possible alternative to sequential consistency
  – If the server lies only once, it has to lie forever
• Enables detection of misbehavior:
  – users periodically gossip to check for violations
  – or deploy a trusted online "timestamp" box
SUNDR's tricks to make the strawman practical
1. Store the FS as a hash tree
   – No need to reconstruct the FS image from the log
2. Use version vectors to totally order ops
   – No need to fetch the entire log to check for misbehavior
Trick #1: Hash tree
[Figure: a hash tree over data blocks D0–D12; each internal node hi stores the hashes of its children.]
• Key property: h0, the root hash, verifies the entire tree of data
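A minimal hash-tree sketch of this property (Python with SHA-256 here; the real SUNDR uses 20-byte SHA-1 hashes):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(blocks):
        # Bottom level: hash every data block; then hash pairs upward.
        level = [h(b) for b in blocks]
        while len(level) > 1:
            level = [h(b"".join(level[i:i + 2]))
                     for i in range(0, len(level), 2)]
        return level[0]

    blocks = [b"D0", b"D1", b"D2", b"D3"]
    root = merkle_root(blocks)
    # Tampering with any one block changes the root hash:
    assert merkle_root([b"D0", b"D1", b"XX", b"D3"]) != root

Changing any leaf changes every hash on the path to the root, which is why committing to just h0 commits to the whole tree.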
Trick #2: version vector
• Each client keeps track of its own version number
• The server stores the (freshest) version vector
  – The version vector orders all operations
    • E.g., (0, 1, 2) ≤ (1, 1, 2)
• A client remembers the latest version vector given to it by the server
  – If the new vector does not succeed the old one, the client detects an order violation!
    • E.g., (0, 1, 2) ≰ (0, 2, 1)
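In code, the check a client could run is just a component-wise comparison (an illustrative Python sketch):

    def vv_leq(v1, v2):
        # v1 <= v2 iff every component of v1 is <= the matching one in v2
        return all(a <= b for a, b in zip(v1, v2))

    assert vv_leq((0, 1, 2), (1, 1, 2))   # properly ordered: ok
    old, new = (0, 1, 2), (0, 2, 1)
    if not vv_leq(old, new):              # incomparable vectors
        print("order violation detected!")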
SUNDR architecture
• Block server stores blocks retrievable by content hash
• Consistency server orders all events
[Figure: userA's and userB's SUNDR clients talk over an untrusted network to the SUNDR server side, which consists of a consistency server and a block server.]
SUNDR data structures: hash tree
• Each file is writable by one user or group
• Each user/group manages its own pool of i-nodes
  – A pool of i-nodes is represented as a hash tree
Hash files
• Blocks are stored and indexed by hash on the block server
[Figure: an i-node holds metadata plus H(data1), H(data2), and H(iblk1); the indirect block iblk1 holds H(data3) and H(data4). The 20-byte hash of the i-node itself serves as the file handle.]
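A toy content-addressed block store capturing this idea (a sketch, not SUNDR's implementation; the class and method names are made up):

    import hashlib

    class BlockServer:
        # Blocks are indexed by their own hash, so a client can verify
        # everything it fetches; the server cannot substitute data.
        def __init__(self):
            self.blocks = {}

        def put(self, data: bytes) -> bytes:
            key = hashlib.sha1(data).digest()   # 20-byte handle, as in SUNDR
            self.blocks[key] = data
            return key

        def get(self, key: bytes) -> bytes:
            data = self.blocks[key]
            assert hashlib.sha1(data).digest() == key  # client-side check
            return data

    bs = BlockServer()
    handle = bs.put(b"file contents")
    assert bs.get(handle) == b"file contents"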
Hash a pool of i-nodes
• Hash all files writable by each user/group
  – The i-table maps each i-number (2, 3, 4, …) to the 20-byte hash of its i-node
• From this digest, a client can retrieve and verify any block of any file
[Figure: hashing the i-table yields a single 20-byte digest for the whole pool.]
SUNDR State
• Each principal (Superuser, UserA, UserB, GroupG, …) has its own i-table and digest
• How to fetch "/share/aaa"?
  1. Look up "/" in the Superuser's i-table: its directory entry (share, Superuser, 3) points to the Superuser's i-node 3
  2. Look up "/share" via the Superuser's i-node 3: its directory entry (aaa, UserA, 4) points to UserA's i-node 4
  3. Fetch "/share/aaa" via UserA's i-node 4
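A hypothetical Python sketch of this lookup walk (the table contents mirror the example above; real SUNDR verifies each fetched block against the owner's signed digest, which is elided here):

    # Directory i-nodes map a name to (principal, i-number); each
    # principal's i-table maps i-numbers to i-nodes. Root i-number assumed.
    ITABLES = {
        "Superuser": {2: {"share": ("Superuser", 3)},   # "/" directory
                      3: {"aaa": ("UserA", 4)}},        # "/share" directory
        "UserA":     {4: "contents of /share/aaa"},     # the file itself
    }

    def resolve(path, root=("Superuser", 2)):
        principal, inum = root
        for name in path.strip("/").split("/"):
            principal, inum = ITABLES[principal][inum][name]
        return ITABLES[principal][inum]

    assert resolve("/share/aaa") == "contents of /share/aaa"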
SUNDR data structures: version vector
• Server orders users' fetch/modify ops
• Client signs a version vector along with its digest
• Version vectors will expose ordering failures
Version structure
• Each user has its own version structure (VST)
• The consistency server keeps the latest VSTs of all users
• Clients fetch all other users' VSTs from the server before each operation and cache them
• We order VSTA ≤ VSTB iff all the version numbers in VSTA are less than or equal to those in VSTB
[Figure: VSTA = (Digest A; A-1, B-1, G-1), signed by A; VSTB = (Digest B; A-1, B-2, G-2), signed by B; here VSTA ≤ VSTB.]
Update VST: An example
[Figure: A runs echo "A was here" >> /share/aaa; A bumps its own entry and signs VSTA = (A-1, B-1) with digest DigA, which the consistency server stores. B then runs cat /share/aaa; B fetches the latest VSTs, bumps its own entry, and signs VSTB = (A-1, B-2) with digest DigB. Result: VSTA ≤ VSTB.]
Detect attacks
[Figure: A runs echo "A was here" >> /share/aaa and signs VSTA = (A-1, B-1, DigA), but the consistency server conceals it from B. B runs cat /share/aaa (stale!) and signs VSTB = (A-0, B-2, DigB).]
• A's latest VST and B's can no longer be ordered: VSTA ≰ VSTB and VSTB ≰ VSTA
Support concurrent operations
• Clients may issue operations concurrently
  – While the first client is still working on its operation, how does the second client know what vector to sign?
• Idea: if the operations don't conflict, include the first user's forthcoming version number in the VST
• Solution: pre-declare operations in signed updates
  – The server returns the latest VSTs and all pending updates, thereby ordering them before the current operation
  – The user computes a new VST including the pending updates
  – The user signs and commits the new VST
Concurrent update of VSTs
[Figure: A (echo "A was here" >> /share/aaa) pre-declares update A-1 and B (cat /share/bbb) pre-declares update B-2 at the consistency server. A signs VSTA = (A-1, B-1, DigA); B, seeing A's pending update, signs VSTB = (A-1, B-2, DigB). The two VSTs remain ordered: VSTA ≤ VSTB.]
SUNDR is practical
[Chart: time in seconds (0–12) for Create (1K), Read (1K), and Unlink, comparing NFSv2, NFSv3, SUNDR, and SUNDR/NVRAM.]
Case study II: PeerReview [SOSP07]
Motivations for PeerReview
• Large distributed systems consist of many nodes
• Some nodes become Byzantine
  – Software compromise
  – Malicious/careless administrator
• Goal: detect past misbehavior of nodes
  – Applies to more general apps than FS
Challenges of general fault detection
• How to detect faults?
• How to convince others that a node is (not) faulty?
Overall approach
• Fault := node deviates from expected behavior
• Obtain a signature for every action from each node
• If a node misbehaves, the signature serves as a proof of misbehavior against the node
Can we detect all faults?
• No
  – e.g., faults affecting only a node's internal state
• Detect observable faults
  – E.g., bad nodes send a message that correct nodes would not send
Can we always get a proof?
• No
  – A says "I sent X!"; B says "I never received X!?!"; C cannot tell whether A sent X
• Generate verifiable evidence:
  – a proof of misbehavior (A sent the wrong X)
  – a challenge (C asks A to send X again)
• Nodes not answering challenges are suspects
PeerReview Overview
• Treat each node as a deterministic state machine
• Nodes sign every output message
• A witness checks that another node outputs correct messages
  – using a reference implementation and the signed inputs
PeerReview architecture
• All nodes keep a log of their inputs & outputs
  – Including all messages
• Each node has a set of witnesses, who audit its log periodically
• If the witnesses detect misbehavior, they
  – generate evidence
  – make the evidence available to other nodes
• Other nodes check the evidence and report the fault
[Figure: node A sends message M to node B; both log it. A's witnesses (C, D, E) periodically audit A's log.]
PeerReview detects tampering
• What if a node modifies its log entries?
• Log entries form a hash chain
• The signed hash is included with every message
  – The node thereby commits to having received all prior messages
[Figure: B's log (Send(X), Recv(Y), Send(Z), Recv(M)) is chained via hashes H0–H4; each message and ACK carries the signed hash of the log.]
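A Python sketch of the hash chain (SHA-256 in place of PeerReview's actual primitives; the genesis value is assumed):

    import hashlib

    def extend(prev_hash: bytes, entry: bytes) -> bytes:
        # Chain each new log entry onto the hash of everything before it.
        return hashlib.sha256(prev_hash + entry).digest()

    h = b"\x00" * 32                           # H0: assumed genesis value
    for entry in [b"send(X)", b"recv(Y)", b"send(Z)", b"recv(M)"]:
        h = extend(h, entry)                   # h is what the node signs
    # Rewriting any earlier entry changes every later hash, so a single
    # signed top-of-chain hash commits the node to its entire log.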
PeerReview detects inconsistency
• What if a node
  – keeps multiple logs?
  – forks its log?
• The witness checks whether the signed hashes form a single chain
[Figure: two signed hash chains share H0–H2 but then diverge into "View #1" (H3, H4, with Read X returning "Not found") and "View #2" (H3', H4', after Create X); the conflicting signed hashes expose the fork.]
PeerReview detects faults
• Witness audits a node:
  – Replay the logged inputs on a reference implementation of the state machine
  – Check the outputs against the log
[Figure: the witness feeds the node's logged network inputs into a reference copy of the state machine (modules A and B) and checks whether the produced outputs equal (=?) the logged outputs; a mismatch (≠) is a fault.]
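A minimal sketch of the audit loop in Python, with a toy "echo" state machine standing in for the reference implementation (all names here are made up):

    from collections import namedtuple

    Entry = namedtuple("Entry", ["kind", "message"])  # kind: input/output

    class EchoSM:
        # Toy deterministic state machine: outputs every input unchanged.
        def handle(self, msg):
            return [msg]

    def audit(log, make_sm=EchoSM):
        sm, expected = make_sm(), []
        for e in log:
            if e.kind == "input":
                expected.extend(sm.handle(e.message))   # replay the input
            elif not expected or expected.pop(0) != e.message:
                return "proof of misbehavior"  # logged output != replayed one
        return "ok"

    good = [Entry("input", "X"), Entry("output", "X")]
    bad  = [Entry("input", "X"), Entry("output", "Z")]
    assert audit(good) == "ok" and audit(bad) == "proof of misbehavior"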
PeerReview's guarantees
• Faults will be detected (eventually)
  – If a node is faulty:
    • Its witness has a proof of misbehavior, or
    • Its witness generates a challenge that the node cannot answer
  – If a node is correct:
    • There is no proof of misbehavior against it
    • It can answer any challenge
PeerReview applications
• App #1: NFS server
  – Tampering with files
  – Lost updates
  – Metadata corruption
  – Incorrect access control
• App #2: Overlay multicast
  – Freeloading
  – Tampering with content
• App #3: P2P email
  – Denial of service
  – Dropping emails
PeerReview's performance penalty
• Cost increases with the number of witnesses per node, W
[Chart: average traffic (Kbps/node, 0–100) versus number of witnesses (baseline, then W = 1–5); the baseline traffic is shown for comparison.]
What have we learnt?
• Put constraints on what faulty servers can do
  – Clients sign data, so a bad SUNDR server cannot fake data
  – Clients sign version vectors, so a bad server cannot hide past inconsistency
• Fault detection
  – Needs a proof of misbehavior (obtained by signing actions)
  – Uses challenges to tell slow nodes apart from bad ones