George Saad
University of New Mexico
Department of Computer Science
Selfishness and Malice in Distributed Systems
Selfishness and Malice
• Selfishness and malice have a negative influence on the performance of distributed systems.
• Selfishness of players in a game can reduce social welfare.
• Malicious nodes can seriously disrupt the network.
• In this dissertation, we provide algorithms to address these issues.
Selfishness and Malice
• Selfishness (El-Farol game): we characterize the BCE for a game of positive and negative network effects.
  “The Power of Mediation in an Extended El Farol Game”, SAGT’13 (2013)
• Malice: we develop algorithms to recover networks from Byzantine faults.
  “Self-Healing Communication”, SSS’13 (2013)
  “Self-Healing Computation”, SSS’14 (2014)
Part I : Selfishness
El-Farol Game
• A set of n selfish players
• Actions:
  • go to the bar
  • stay home
• The cost function:
  • cost to stay = 1,
  • cost to go: f(x)
Objective: find an equilibrium which minimizes Social Cost, where
Our El Farol Extension
We extend the cost function:
• The cost to stay can be any constant t > 0,
• The cost to go, f(x):
Positive and Negative Network Effects
“Many real situations in fact display both kinds of [positive and negative] externalities … an on-line social media site with limited infrastructure might be most enjoyable if it has a reasonably large audience, but not so large that connecting to the Web site becomes very slow due to the congestion.”
[Easley and Kleinberg, 2010]
Solution Concepts
How to minimize my own cost unilaterally?
• Nash Equilibrium (NE)
  • Unfortunately, NE has high social cost.
• Correlated Equilibrium (CE)
  • Mediator implements CE.
Mediator
• A trusted coordinator that
  • gives recommendations to the players,
  • implements a correlated equilibrium.
• Note that all players have free will.
• A mediator is optimal when it implements the best correlated equilibrium.
Let the mediator have a probability distribution over k ≥ 1 strategy profiles.
• The players know the probability distribution and the strategy profiles.
• The mediator secretly selects one strategy profile according to the probability distribution.
• The mediator advises each player privately and separately.
• No player has an incentive to deviate unilaterally from the advice.
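The mediator's steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: each profile is summarized as a pair (x, p), where x is the fraction of players told to go and p is the profile's probability, and the function name `advise` is hypothetical.

```python
import random

def advise(profiles, n, rng):
    """Secretly pick one profile (x, p) and return one private recommendation per player.

    profiles: list of (x, p) pairs; x is the fraction of players advised to go,
    p is the probability of that profile (the p values are assumed to sum to 1).
    """
    fractions = [x for x, _ in profiles]
    weights = [p for _, p in profiles]
    # The draw is secret: players only ever see their own recommendation.
    x = rng.choices(fractions, weights=weights, k=1)[0]
    goers = set(rng.sample(range(n), round(x * n)))
    return ["go" if i in goers else "stay" for i in range(n)]

rng = random.Random(0)
advice = advise([(0.0, 1 / 3), (0.5, 2 / 3)], n=10, rng=rng)
print(advice.count("go"))  # either 0 or 5, depending on the secret draw
```

The sketch only produces the advice; whether following it is an equilibrium depends on the cost function and on the chosen (x, p) pairs.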
How to design such a mediator?
( [s11,…,s1n], p1 )
( [s21,…,s2n] , p2 )
( [sk1,…,skn] , pk )
( x1 , p1 )
( x2 , p2 )
( xk , pk )
Example for (c, s1, s2)-El Farol Game
• For a (2, 4, 4)-El Farol game:
• Best Nash Equilibrium:
  • ¼-fraction of players go.
  • Social cost = n.
• An optimal mediator:
  • Strategy profile 1: (x1 = 0, p1 = 1/3)
  • Strategy profile 2: (x2 = ½, p2 = 2/3)
  • Expected social cost = ⅔ n.
• The optimal social cost (no selfishness):
  • ½-fraction of players go.
  • Social cost = ½ n.
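The three social costs in this example can be checked with exact arithmetic. The sketch below ASSUMES a piecewise-linear cost to go, f(x) = c − s1·x for x ≤ c/s1 and f(x) = s2·(x − c/s1) otherwise, with cost to stay 1. That form is not given in the text here; it is chosen because it reproduces the stated numbers. The function names are hypothetical.

```python
from fractions import Fraction as F

def go_cost(x, c, s1, s2):
    """ASSUMED cost to go: falls with slope s1 until x = c/s1, then rises with slope s2."""
    knee = F(c, s1)
    return c - s1 * x if x <= knee else s2 * (x - knee)

def social_cost_per_player(x, c, s1, s2, stay_cost=1):
    """Average cost per player when a fraction x goes and the rest stay."""
    return x * go_cost(x, c, s1, s2) + (1 - x) * stay_cost

c, s1, s2 = 2, 4, 4
nash = social_cost_per_player(F(1, 4), c, s1, s2)
optimum = social_cost_per_player(F(1, 2), c, s1, s2)
# Mediator: profile (x1 = 0, p1 = 1/3) and profile (x2 = 1/2, p2 = 2/3).
mediator = sum(p * social_cost_per_player(x, c, s1, s2)
               for x, p in [(F(0), F(1, 3)), (F(1, 2), F(2, 3))])
print(nash, mediator, optimum)  # 1 2/3 1/2
```

Multiplying each value by n gives the social costs n, ⅔ n and ½ n listed on the slide.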
How efficient is our mediator?
Our Contributions
• For a game of positive and negative network effects, we characterize:
  • Optimal Social Cost,
  • Best Nash Equilibrium (BNE), and
  • Best Correlated Equilibrium (BCE).
• Efficiency of the optimal mediator for this game:
  • When is BCE = BNE?
  • The Mediation Value (MV) and the Enforcement Value (EV) can be unbounded!
Optimal Social Cost
We characterize x* as a function of parameters of our game.
Best Nash Equilibrium
Optimal Mediator
- p is a function of c, s1 and s2.
- p can be 0 or 1 for some values of c, s1 and s2.
When is BCE = BNE?
If c ≤ 1, then all players would rather stay, if f(1) ≥ 1; all players would rather go, if f(1) < 1.
If c > 1 and λ(c, s1, s2) ≥ 1, then all players would rather go, where:
When is BCE = BNE?
BCE is advantageous over BNE when c > 1 and λ < 1.
Can MV be unbounded?
[Figure; labels: c, s1, s2, c/s1, 1]
Can EV be unbounded?
[Figure; labels: c, s1, s2, c/s1, 1]
Related Work
• Linear Congestion Games [CK’05]:
  • 1.577 ≤ EV ≤ 1.6 and MV ≤ 1.015.
• Ranking Games [BFHS’07]:
  • EV = n-1 and MV = n-1 for n>3.
• Virus Inoculation Game [DMNS’09]:
  • EV = and MV = .
Conclusion
• We extended the El-Farol game to have both positive and negative network effects.
• For this extension, we have characterized:
  • the optimal social cost,
  • the BNE, and
  • the BCE.
• We characterized the MV and the EV for this game.
  • We show when BCE = BNE.
  • We show that MV and EV can be unbounded in this game.
Open Problems
• Multi-Site El-Farol Game (> 2 actions):
  • The bar has k > 2 sites.
  • Each player chooses which site to go to.
  • How many strategy profiles are required for the BCE?
• If f(x) is polynomial in x, with degree > 1, then
  • what is the characterization of the BCE?
  • Is the number of strategy profiles related to the degree of f(x)?
Self-Healing Communication Self-Healing Computation
Part II : Malice
Malice
• We consider the presence of an adversary.
• Adversary takes over a subset of nodes to cause faults.
• Byzantine Faults vs Fail-Stop Faults
• Fault Tolerance:
• Replication
• Self-healing (automatic recovery)
Fault Tolerance
• Non-self-healing algorithms for Byzantine model: [NW’03, HK’04, FSY’05, AS’06, AJR’06, AS’07, JY’08, GKKY’10, GKKY’13].
• Self-healing algorithms for fail-stop model: [BSAS’06, ST’06, HRST’08, HST’09, PT’11, ST’11].
• Self-healing Algorithms for Byzantine faults?
• We develop self-healing algorithms to recover from Byzantine faults.
How to recover from Byzantine faults?
Self-Healing Communication: Message is sent through a path of nodes.
Self-Healing Computation: Computation is performed through circuits.
Our Model
• A network of n nodes
• Static and computationally bounded adversary
• The adversary controls up to ¼ of the nodes.
• Partially synchronous communication: there is an upper bound on the number of time steps between sending and receiving a message.
• Rushing adversary: bad nodes wait until they have received all messages from good nodes before responding.
• After the bad nodes are selected, the quorum graph is built [KLST’10]:
  • Any quorum is a set of Θ(log n) nodes; and
  • Each node is in Θ(log n) quorums.
  • At most ¼ of the nodes in any quorum are bad.
KLST’10 : Valerie King, Steve Lonargan, Jared Saia and Amitabh Trehan, “Load balanced Scalable Byzantine Agreement through Quorum Building, with Full Information”, ICDCN 2010.
Naïve Communication (no self-healing)
• All-to-all communication between quorums
• Message cost O(l log² n), and latency O(l)
• However, we can do better by self-healing.
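With all-to-all forwarding, each of the l hops sends Θ(log n) × Θ(log n) messages between consecutive quorums, which matches the O(l log² n) message cost. The sketch below assumes (this is not stated on the slide) that a receiving node keeps the value sent by a strict majority of the previous quorum; the function name is hypothetical.

```python
from collections import Counter

def majority_value(received):
    """Return the value sent by a strict majority of the previous quorum, else None."""
    value, count = Counter(received).most_common(1)[0]
    return value if 2 * count > len(received) else None

# 8 senders, 2 of them bad (the model's 1/4 bound): the good value still wins.
print(majority_value(["m"] * 6 + ["forged"] * 2))  # m
```

Under the model's bound of at most ¼ bad nodes per quorum, the good nodes always form a strict majority, so a filter of this kind never accepts a forged value.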
Our Contribution
• We developed a self-healing algorithm that detects message corruptions and marks bad nodes.
• Each bad node causes O((log* n)²) corruptions, in expectation.
“Fool me once, shame on you. Fool me ω((log* n)²) times, shame on me.”
Iterated Logarithm
e.g. log* 10¹⁰ = 5
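The iterated logarithm counts how many times the logarithm must be applied before the result drops to at most 1. A minimal sketch, assuming base-2 logarithms (base 2 is the base that reproduces the example value; the function name is hypothetical):

```python
import math

def log_star(n, base=2.0):
    """Number of times log must be applied to n before the result is at most 1."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log(x, base)
        count += 1
    return count

print(log_star(10**10))  # 5
```

The intermediate values are roughly 33.2, 5.05, 2.34, 1.22 and 0.29, so five applications are needed.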
               Naïve Communication   Our Algorithm
Message cost   O(l log² n)           O(l + log n)
Latency        O(l)                  O(l)
Corruptions    No corruptions        O(t (log* n)²)
Our Algorithm (SEND)
SEND-PATH
SEND
CHECK
CHECK1 CHECK2
HEAL
HEAL is triggered O(t) times before all bad nodes are marked.
CHECK1
• SEND triggers CHECK1 with probability 1/(log log n)².
• Subquorum size is O(log log n).
• Latency is O(l) and message cost is O(l (log log n)²).
• Detects corruptions with constant probability for l = O(log² n).
CHECK2
• SEND triggers CHECK2 with probability 1/(log* n)².
• CHECK2 has O(log* n) rounds.
• Incremental subquorum size, up to O(log* n).
• Latency is O(l log* n) and message cost is O(l (log* n)²).
CHECK2 Analysis
• Deception interval: a substring of bad nodes, where a corruption occurs.
• Key points of detecting corruptions:
  • The deception interval shrinks logarithmically with probability ≥ ½.
  • O(log* n) rounds to shrink the deception interval to size zero.
CHECK2 Analysis
• The deception interval shrinks logarithmically from round to round.
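The shrinking argument can be illustrated with a toy model. The assumptions here are mine, not the slides': a successful round replaces the interval size s by ⌊log₂ s⌋ (so size 1 shrinks to 0), an unsuccessful round leaves it unchanged, and the success probability is exactly the stated lower bound. The function name is hypothetical.

```python
import random

def rounds_until_detected(size, shrink_prob=0.5, rng=None):
    """Count rounds until a toy deception interval of the given size reaches zero."""
    rng = rng or random.Random(0)
    rounds = 0
    while size > 0:
        rounds += 1
        if rng.random() < shrink_prob:
            # floor(log2(size)) for a positive integer; size 1 shrinks to 0.
            size = size.bit_length() - 1
    return rounds

print(rounds_until_detected(1024, shrink_prob=1.0))  # 4
```

With every round succeeding, the sizes go 1024, 10, 3, 1, 0. With success probability ½, the expected number of rounds in this toy model doubles, which is still proportional to the iterated logarithm.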
HEAL
• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict.
  * A pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict has at least one bad node
HEAL
• If ½ of the nodes in any quorum are marked, they are set unmarked.
• HEAL is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b,g) is monotonically increasing,
  • Δf(b,g) is at least some positive constant.
  • When f(b,g) = t, we are done.
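The marking rule of HEAL can be sketched as follows. The data layout is an assumption for illustration only: accusations are given as a set of (accuser, accused) pairs and quorums as a mapping from a quorum name to its set of nodes. The function name `heal` is hypothetical and the sketch omits how accusations are derived from the inspected messages.

```python
def heal(accusations, quorums):
    """Mark nodes in mutual conflict, then unmark quorums where at least half are marked."""
    marked = set()
    for accuser, accused in accusations:
        # A pair is in conflict only if the accusation goes both ways.
        if (accused, accuser) in accusations:
            marked.update((accuser, accused))
    for members in quorums.values():
        marked_here = marked & members
        if 2 * len(marked_here) >= len(members):
            marked -= marked_here
    return marked

accusations = {(1, 2), (2, 1), (3, 4)}
print(sorted(heal(accusations, {"Q1": {1, 2, 3, 4, 5, 6}})))  # [1, 2]
print(sorted(heal(accusations, {"Q1": {1, 2, 3}})))           # []
```

Since a good node never accuses another good node falsely in this model, every marked pair contains at least one bad node, which is the property the potential function argument relies on.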
Empirical Results
• Our simulation runs:
  • over butterfly networks of quorums,
  • for different network sizes, up to n=30k, and
  • for different fractions of bad nodes.
• Simulation terminates after all bad nodes are marked.
• The results are taken over 3000 experiments.
Empirical Results: # Messages reduces by a factor of 60 (n~30k)
• CHECK1: # messages is improved by a factor of 60 (39,100 vs. 649).
• CHECK2: # messages is improved by a factor of 33 (39,100 vs. 1,177).
Empirical Results: Latency increases by 1½ times (n~30k)
• CHECK1: latency increases by 1½ times (18 vs. 13).
• CHECK2: latency increases by 2 times (25 vs. 13).
Empirical Results: Corruption Probability 0
[Figure panels: CHECK1, CHECK2]
Empirical Results: # Messages reduces by O(log² n) times
Empirical Results: Latency increases by Θ(1) times
How to recover from Byzantine faults?
Self-Healing Communication: Message is sent through a path of nodes.
Self-Healing Computation: Computation is performed through circuits.
Quorum Graph
• Quorum Graph has:
  • n input quorums;
  • m quorum gates; and
  • one output quorum
Naïve Computation
• No self-healing
• All nodes in each quorum (gate) perform the same computation
• Results are sent between quorums via all-to-all communication
• Expensive resource cost
Our Contribution
We develop a self-healing algorithm for computation networks.
                   Naïve Computation    Our Algorithm
Message cost       O((n+m) log² n)      O(m + n log n)
Computation cost   O((n+m) log² n)      O(m + n log n)
Latency            O(l)                 O(l)
Corruptions        No corruptions       O(t (log* n)²)
Our Algorithm (COMPUTE)
COMPUTE
CHECK
EVALUATE
RECOVER
CHECK Algorithm
• CHECK has O(log* n) rounds.
• In each round, nodes are selected uniformly at random, and the same computation is performed.
Round 1
Round 2
CHECK Algorithm
• Adversary corrupts computation in a Deception Subgraph.
• Key points of corruption detection:
  • We prove that the deception subgraph shrinks logarithmically in each round with constant probability.
  • Once the deception subgraph shrinks to size zero, the corruption is detected.
Shrinks Logarithmically
[Figures: the deception subgraph from Round 1 to Round 2, Round 2 to Round 3, and Round 3 to Round 4]
RECOVER
• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict.
  * A pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict has at least one bad node
RECOVER
• If ½ of the nodes in any quorum are marked, they are set unmarked.
• RECOVER is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b,g) is monotonically increasing, and
  • when it reaches t, we are done.
Empirical Results
• Our simulation runs:
  • over perfect binary trees of quorums,
  • for different network sizes, up to 8k, and
  • for different fractions of bad nodes.
• Simulation terminates after all leaders are good.
• The results are taken over 3000 experiments.
Empirical Results: # Messages reduces by a factor of 65 (n~8k)
• Reduced by a factor of 65: 66M vs. 1.01M.
Empirical Results: Latency increases by 1.75 times (n~8k)
• Increases 1.75 times: 63 time steps vs. 36.
Empirical Results: Corruption Probability 0
Empirical Results: # Messages reduces by O(log² n) times!
Empirical Results: Latency increases by Θ(1) times
Conclusion
• We developed self-healing algorithms to recover networks from Byzantine faults.
• Message cost is reduced polylogarithmically in n, compared to non-self-healing algorithms.
• Experiments show that message cost is reduced by
  • up to a factor of 60 for communication networks,
  • up to a factor of 65 for computation networks.
• For t < n/4, the expected total number of corruptions is O(t (log* n)²).
Open Problems
• Can we limit the number of corruptions to O(t)?
• How to self-heal networks with churn? With an adaptive adversary?
• How to self-heal asynchronous networks?
• We trigger CHECK and select the nodes in a centralized manner. How can we make CHECK decentralized?
  • We propose a decentralized CHECK for future work.
  • We implement a simulation that suggests interesting results.
Thanks! Any Questions?