Selfishness and Malice in Distributed Systems

George Saad
University of New Mexico, Department of Computer Science

Selfishness and Malice
• Selfishness and malice have a negative influence on the performance of distributed systems.
• Selfishness of players in a game can reduce social welfare.
• Malicious nodes can seriously disrupt the network.
• In this dissertation, we provide algorithms to address these issues.

Selfishness and Malice
• Selfishness (El Farol game): we characterize the best correlated equilibrium (BCE) for a game with positive and negative network effects.
  “The Power of Mediation in an Extended El Farol Game”, SAGT’13
• Malice: we develop algorithms to recover networks from Byzantine faults.
  “Self-Healing Communication”, SSS’13
  “Self-Healing Computation”, SSS’14

Part I: Selfishness

El Farol Game
• A set of n selfish players
• Actions:
  • go to the bar
  • stay home
• The cost function, where x is the fraction of players who go:
  • cost to stay = 1
  • cost to go = f(x)

Objective: find an equilibrium that minimizes the social cost, i.e., the total cost incurred by all players.

Our El Farol Extension

We extend the cost function:
• The cost to stay can be any constant t > 0.
• The cost to go, f(x), captures both positive and negative network effects.
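The exact form of f(x) did not survive extraction. As a purely illustrative sketch, assume f is piecewise linear: it equals c when nobody goes, falls with slope s1 to zero at x = c/s1 (positive network effect), and then rises with slope s2 (congestion). This shape is an assumption read off the axis labels of the later MV/EV figures (c, s1, s2, c/s1, 1), not a formula taken from the dissertation, but it reproduces the (2, 4, 4) numbers of the example slide. The names `go_cost` and `social_cost` are hypothetical.

```python
from fractions import Fraction


def go_cost(x, c, s1, s2):
    """Hypothetical piecewise-linear cost of going when a fraction x of players go."""
    knee = Fraction(c, s1)  # assumed point where the cost of going bottoms out at 0
    if x <= knee:
        return c - s1 * x  # positive network effect: more company, lower cost
    return s2 * (x - knee)  # negative network effect: congestion raises the cost


def social_cost(x, n, c, s1, s2, t=1):
    """Total cost of n players when x*n of them go and the rest stay at cost t."""
    return n * (x * go_cost(x, c, s1, s2) + (1 - x) * t)


# (2, 4, 4)-game with n = 100 players and stay cost t = 1
print(social_cost(Fraction(1, 2), 100, 2, 4, 4))  # half go: 50
print(social_cost(Fraction(1, 4), 100, 2, 4, 4))  # a quarter go: 100
```

Exact `Fraction` arithmetic is used so that equalities such as f(1/4) = 1 hold exactly rather than up to floating-point error.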

Positive and Negative Network Effects

“Many real situations in fact display both kinds of [positive and negative] externalities … an on-line social media site with limited infrastructure might be most enjoyable if it has a reasonably large audience, but not so large that connecting to the Web site becomes very slow due to the congestion.”

[Easley and Kleinberg, 2010]

Solution Concepts
How can I minimize my own cost unilaterally?
• Nash Equilibrium (NE)
  • Unfortunately, an NE can have a high social cost.
• Correlated Equilibrium (CE)
  • A mediator implements a CE.

Mediator
• A trusted coordinator that
  • gives recommendations to the players, and
  • implements a correlated equilibrium.
• Note that all players have free will.
• A mediator is optimal when it implements the best correlated equilibrium.

• Let the mediator have a probability distribution over k ≥ 1 strategy profiles.
• The players know the probability distribution and the strategy profiles.
• The mediator secretly selects one strategy profile according to the probability distribution.
• The mediator advises each player privately and separately.
• No player has an incentive to deviate unilaterally from the advice.

How can we design such a mediator?

(Figure: the mediator's distribution over strategy profiles, ( [s11,…,s1n], p1 ), ( [s21,…,s2n], p2 ), …, ( [sk1,…,skn], pk ), with each profile also shown as a pair ( xi, pi ) for i = 1, …, k.)
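The protocol above can be sketched for symmetric players, where each profile is summarized as (x, p): x is the fraction of players told to go and p is the profile's probability. The sketch assumes, for illustration only, that the mediator realizes a profile by telling a uniformly random set of round(x·n) players to go; `mediate` and its arguments are hypothetical names.

```python
import random


def mediate(profiles, n, rng=None):
    """Secretly draw one profile (x, p) with probability p, then advise each player privately.

    profiles: list of (x, p) pairs, where x is the fraction of players told to go.
    Returns a list whose i-th entry is the advice sent only to player i.
    """
    rng = rng or random.Random()
    fractions = [x for x, _ in profiles]
    weights = [p for _, p in profiles]
    x = rng.choices(fractions, weights=weights, k=1)[0]  # never revealed to the players
    goers = set(rng.sample(range(n), round(x * n)))  # uniformly random set of goers
    return ["go" if i in goers else "stay" for i in range(n)]


# A (2, 4, 4) mediator as in the example that follows:
# nobody goes w.p. 1/3, half the players go w.p. 2/3
advice = mediate([(0, 1 / 3), (0.5, 2 / 3)], n=10, rng=random.Random(7))
print(advice.count("go"))  # either 0 or 5, depending on the secret draw
```

Choosing the goers uniformly at random keeps the advice symmetric: a player who is told "go" learns only a conditional distribution over profiles, which is exactly what the incentive constraints reason about.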

Example for the (c, s1, s2)-El Farol Game
For a (2, 4, 4)-El Farol game:
• Best Nash equilibrium:
  • ¼ of the players go.
  • Social cost = n.
• An optimal mediator:
  • Strategy profile 1: (x1 = 0, p1 = 1/3)
  • Strategy profile 2: (x2 = ½, p2 = 2/3)
  • Expected social cost = ⅔ n.
• The optimal social cost (no selfishness):
  • ½ of the players go.
  • Social cost = ½ n.
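The incentive constraints behind this example can be checked mechanically. The sketch below assumes a hypothetical piecewise-linear cost of going, f(x) = c − s1·x up to x = c/s1 and s2·(x − c/s1) afterwards (the slide does not give f), a stay cost t = 1, and a continuum of players so that one deviator does not change x. A player told "go" compares the conditional expected cost of going with t; a player told "stay" compares t with the conditional expected cost of going anyway.

```python
from fractions import Fraction as F


def f(x, c=2, s1=4, s2=4):
    """Hypothetical cost of going, with the (2, 4, 4) parameters as defaults."""
    knee = F(c, s1)
    return c - s1 * x if x <= knee else s2 * (x - knee)


def is_correlated_equilibrium(profiles, cost_go, t=1):
    """profiles: list of (x, p). Continuum approximation: one deviator leaves x unchanged."""
    # Told "go": profile i has weight p*x; following costs f(x), deviating costs t.
    go_mass = sum(p * x for x, p in profiles)
    go_follow = sum(p * x * cost_go(x) for x, p in profiles)
    # Told "stay": profile i has weight p*(1 - x); following costs t, deviating costs f(x).
    stay_mass = sum(p * (1 - x) for x, p in profiles)
    stay_deviate = sum(p * (1 - x) * cost_go(x) for x, p in profiles)
    return go_follow <= t * go_mass and t * stay_mass <= stay_deviate


def expected_social_cost(profiles, cost_go, n, t=1):
    return sum(p * n * (x * cost_go(x) + (1 - x) * t) for x, p in profiles)


mediator = [(F(0), F(1, 3)), (F(1, 2), F(2, 3))]
print(is_correlated_equilibrium(mediator, f))  # True (the stay constraint is tight)
print(expected_social_cost(mediator, f, n=300))  # 200, i.e. (2/3) n
print(is_correlated_equilibrium([(F(1, 2), F(1))], f))  # False: stayers would rather go
```

The last line shows why the mediator cannot simply announce the socially optimal x = ½: with f(½) = 0, every player told to stay would deviate, so the mediator must mix in the costlier "nobody goes" profile.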

How efficient is our mediator?
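For reference, the standard cost-game measures are the mediation value (MV), the ratio of the best Nash equilibrium's social cost to the best correlated equilibrium's, and the enforcement value (EV), the ratio of the best correlated equilibrium's social cost to the optimal social cost. Plugging in the (2, 4, 4) example numbers, where the optimal mediator implements the BCE:

```latex
\[
\mathrm{MV} = \frac{\mathrm{SC}(\mathrm{BNE})}{\mathrm{SC}(\mathrm{BCE})}
            = \frac{n}{\tfrac{2}{3}n} = \frac{3}{2},
\qquad
\mathrm{EV} = \frac{\mathrm{SC}(\mathrm{BCE})}{\mathrm{SC}(\mathrm{OPT})}
            = \frac{\tfrac{2}{3}n}{\tfrac{1}{2}n} = \frac{4}{3}.
\]
```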

Our Contributions
• For the game of positive and negative network effects, we characterize:
  • the optimal social cost,
  • the best Nash equilibrium (BNE), and
  • the best correlated equilibrium (BCE).
• Efficiency of the optimal mediator for this game:
  • When is BCE = BNE?
  • The mediation value (MV) and the enforcement value (EV) can be unbounded!

Optimal Social Cost

We characterize x*, the fraction of players who go in the optimal solution, as a function of the parameters of our game.
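Under a hypothetical piecewise-linear cost of going (f(x) = c − s1·x up to x = c/s1, then s2·(x − c/s1); an assumption, not the dissertation's stated f), x* can be found exactly. On [0, c/s1] the average cost x·f(x) + (1 − x)·t is a concave quadratic, so its minimum lies at an endpoint; on [c/s1, 1] it is a convex quadratic with vertex at (s2·c/s1 + t)/(2·s2). The sketch evaluates these few candidates; the function names are hypothetical and arguments should be integers or `Fraction`s.

```python
from fractions import Fraction


def per_player_cost(x, c, s1, s2, t):
    """Average cost x*f(x) + (1 - x)*t under the hypothetical piecewise-linear f."""
    knee = Fraction(c, s1)
    f = c - s1 * x if x <= knee else s2 * (x - knee)
    return x * f + (1 - x) * t


def optimal_fraction(c, s1, s2, t=1):
    """Return (x*, minimum average cost) by checking the candidate points."""
    knee = Fraction(c, s1)
    candidates = {Fraction(0), Fraction(1), min(knee, Fraction(1))}
    if knee < 1:
        vertex = (s2 * knee + t) / (2 * s2)  # minimizer of the convex piece
        candidates.add(min(max(vertex, knee), Fraction(1)))
    best_cost, best_x = min((per_player_cost(x, c, s1, s2, t), x) for x in candidates)
    return best_x, best_cost


print(optimal_fraction(2, 4, 4))  # (Fraction(1, 2), Fraction(1, 2)): x* = 1/2, cost n/2
```

For the (2, 4, 4) game this recovers the example's optimum: half the players go and the social cost is ½ n.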

Best Nash Equilibrium
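With the same hypothetical piecewise-linear f and a continuum of players, a fraction x is a Nash equilibrium when nobody gains by switching: x = 0 needs f(0) ≥ t, x = 1 needs f(1) ≤ t, and an interior x needs f(x) = t. The illustrative sketch below enumerates these points and keeps the cheapest one; the names are hypothetical and arguments should be integers or `Fraction`s.

```python
from fractions import Fraction


def f(x, c, s1, s2):
    """Hypothetical piecewise-linear cost of going."""
    knee = Fraction(c, s1)
    return c - s1 * x if x <= knee else s2 * (x - knee)


def nash_equilibria(c, s1, s2, t=1):
    """Fractions x in [0, 1] where no player gains by switching (continuum approximation)."""
    knee = Fraction(c, s1)
    eqs = set()
    if f(0, c, s1, s2) >= t:
        eqs.add(Fraction(0))  # nobody wants to be the first to go
    if f(1, c, s1, s2) <= t:
        eqs.add(Fraction(1))  # nobody wants to leave
    x_down = Fraction(c - t, s1)  # f(x) = t on the decreasing piece
    if 0 < x_down < 1 and x_down <= knee:
        eqs.add(x_down)
    x_up = knee + Fraction(t, s2)  # f(x) = t on the increasing piece
    if knee < x_up < 1:
        eqs.add(x_up)
    return sorted(eqs)


def best_nash_cost(c, s1, s2, n, t=1):
    return min(n * (x * f(x, c, s1, s2) + (1 - x) * t) for x in nash_equilibria(c, s1, s2, t))


print(nash_equilibria(2, 4, 4))  # [Fraction(0, 1), Fraction(1, 4), Fraction(3, 4)]
print(best_nash_cost(2, 4, 4, n=100))  # 100, i.e. social cost n
```

For (2, 4, 4) all three equilibria cost n, matching the example's best Nash equilibrium, in which ¼ of the players go.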

Optimal Mediator

• p is a function of c, s1, and s2.
• p can be 0 or 1 for some values of c, s1, and s2.
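As an illustration of how p can depend on c, s1 and s2, assume a hypothetical piecewise-linear cost of going (f(x) = c − s1·x up to x = c/s1, where f = 0, then rising with slope s2), a stay cost of 1, and a mediator that, as in the (2, 4, 4) example, mixes "nobody goes" (probability p) with "a c/s1 fraction goes" (probability 1 − p). Write k = c/s1 and assume k < 1 < c. Players told to go pay 0 and never deviate; players told to stay must find going at least as costly as staying:

```latex
\[
p\,c + (1-p)(1-k)\cdot 0 \;\ge\; p + (1-p)(1-k)
\quad\Longleftrightarrow\quad
p \;\ge\; \frac{1-k}{c-k}, \qquad k = \frac{c}{s_1}.
\]
\[
(c, s_1, s_2) = (2, 4, 4):\quad k = \tfrac{1}{2},\qquad
p = \frac{1/2}{3/2} = \frac{1}{3}.
\]
```

Because the second profile is cheaper (social cost (1 − k)·n versus n), the best mediator within this family takes the smallest feasible p, which matches p1 = 1/3 in the example.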

When is BCE = BNE?

• If c ≤ 1, then all players would rather stay if f(1) ≥ 1, and all players would rather go if f(1) < 1.
• If c > 1 and λ(c, s1, s2) ≥ 1, then all players would rather go, where λ is a function of c, s1, and s2.
• BCE is advantageous over BNE when c > 1 and λ < 1.

Can MV be unbounded?
(Figure annotated with c, s1, s2, c/s1, and 1.)

Can EV be unbounded?
(Figure annotated with c, s1, s2, c/s1, and 1.)

Related Work
• Linear Congestion Games [CK’05]: 1.577 ≤ EV ≤ 1.6 and MV ≤ 1.015.
• Ranking Games [BFHS’07]: EV = n − 1 and MV = n − 1 for n > 3.
• Virus Inoculation Game [DMNS’09]: EV = and MV = .

Conclusion

• We extended the El Farol game to have both positive and negative network effects.
• For this extension, we have characterized:
  • the optimal social cost,
  • the BNE, and
  • the BCE.
• We characterized the MV and the EV for this game:
  • We show when BCE = BNE.
  • We show that MV and EV can be unbounded in this game.

Open Problems

• Multi-Site El Farol Game (more than two actions):
  • The bar has k > 2 sites.
  • Each player chooses which site to go to.
  • How many strategy profiles are required for the BCE?
• If f(x) is a polynomial in x of degree > 1:
  • What is the characterization of the BCE?
  • Is the number of strategy profiles related to the degree of f(x)?

Part II: Malice
• Self-Healing Communication
• Self-Healing Computation

Malice
• We consider the presence of an adversary.
• The adversary takes over a subset of nodes to cause faults.
• Byzantine faults vs. fail-stop faults
• Fault tolerance:
  • Replication
  • Self-healing (automatic recovery)

Fault Tolerance
• Non-self-healing algorithms for the Byzantine model: [NW’03, HK’04, FSY’05, AS’06, AJR’06, AS’07, JY’08, GKKY’10, GKKY’13].
• Self-healing algorithms for the fail-stop model: [BSAS’06, ST’06, HRST’08, HST’09, PT’11, ST’11].
• Self-healing algorithms for Byzantine faults?
• We develop self-healing algorithms to recover from Byzantine faults.

How to recover from Byzantine faults?
• Self-Healing Communication: a message is sent through a path of nodes.
• Self-Healing Computation: computation is performed through circuits.

Our Model
• A network of n nodes
• Static and computationally bounded adversary
• The adversary controls up to ¼ of the nodes.
• Partially synchronous communication: there is an upper bound on the number of time steps between sending and receiving a message.
• Rushing adversary: the adversary waits until it receives all messages from good nodes before responding.
• After the bad nodes are selected, a quorum graph is built [KLST’10]:
  • each quorum is a set of Θ(log n) nodes;
  • each node is in Θ(log n) quorums; and
  • at most ¼ of the nodes in any quorum are bad.

KLST’10: Valerie King, Steve Lonargan, Jared Saia and Amitabh Trehan, “Load balanced Scalable Byzantine Agreement through Quorum Building, with Full Information”, ICDCN 2010.
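To get a feel for these quorum-graph guarantees, the sketch below builds quorums by plain random sampling: quorum i is a uniform sample of about c·log2 n nodes (c = 4 is an arbitrary choice). This is a hypothetical stand-in, not the agreement-based construction of [KLST’10]; it only shows that n quorums of size Θ(log n) give each node Θ(log n) memberships on average, and lets one measure the worst fraction of bad nodes in any quorum.

```python
import math
import random


def build_quorums(n, c=4, rng=None):
    """Hypothetical: quorum i is a uniform random sample of round(c*log2(n)) nodes."""
    rng = rng or random.Random()
    size = max(1, round(c * math.log2(n)))
    return [rng.sample(range(n), size) for _ in range(n)]


def max_bad_fraction(quorums, bad):
    """Largest fraction of bad nodes found in any single quorum."""
    return max(sum(v in bad for v in q) / len(q) for q in quorums)


n = 1024
bad = set(range(n // 10))  # static adversary: 10% of the nodes are fixed before quorums exist
quorums = build_quorums(n, rng=random.Random(0))
avg_load = sum(len(q) for q in quorums) / n  # average number of quorums per node
print(len(quorums[0]), avg_load, max_bad_fraction(quorums, bad))
```

Because the bad set is chosen independently of the random sampling, the bad fraction in each quorum concentrates around 10%; the maximum over all n quorums is what the "at most ¼ bad" guarantee has to bound.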

Naïve Communication (no self-healing)
• All-to-all communication between quorums
• Message cost O(l log² n) and latency O(l), where l is the length of the path
• However, we can do better by self-healing.
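Naïve all-to-all forwarding with majority filtering can be sketched as follows. The model is a simplification (hypothetical): every node of quorum i sends its value to every node of quorum i + 1, bad nodes always send a corrupted value, and each receiver keeps the majority. With at most ¼ bad nodes per quorum the majority is always correct, and a path of l quorums of size q costs (l − 1)·q² messages, which is O(l log² n) for q = Θ(log n).

```python
from collections import Counter


def naive_send(message, path, bad):
    """Forward message along a path of quorums using all-to-all links and majority voting.

    path: list of quorums (lists of node ids); bad: set of Byzantine node ids.
    Returns (value decided by the good nodes of the last quorum, messages sent).
    """
    held = {v: message for v in path[0]}  # the first quorum starts with the message
    messages = 0
    for prev, cur in zip(path, path[1:]):
        new_held = {}
        for v in cur:
            votes = Counter()
            for u in prev:  # every node of prev sends to every node of cur
                messages += 1
                votes["CORRUPTED" if u in bad else held[u]] += 1
            new_held[v] = votes.most_common(1)[0][0]  # keep the majority value
        held = new_held
    decided = Counter(held[v] for v in path[-1] if v not in bad)
    return decided.most_common(1)[0][0], messages


# 5 quorums of 8 nodes each, one bad node per quorum
path = [list(range(8 * i, 8 * i + 8)) for i in range(5)]
print(naive_send("hello", path, bad={0, 9, 18, 27, 36}))  # ('hello', 256)
```

The q² factor per hop is the cost that self-healing tries to avoid on the common, uncorrupted path.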

Our Contribution
• We developed a self-healing algorithm that detects message corruptions and marks bad nodes.
• Each bad node causes O((log* n)²) corruptions, in expectation.

“Fool me once, shame on you. Fool me ω((log* n)²) times, shame on me.”

Iterated logarithm: e.g., log* 10¹⁰ = 5.

                Naïve Communication    Our Algorithm
Message cost    O(l log² n)            O(l + log n)
Latency         O(l)                   O(l)
Corruptions     No corruptions         O(t (log* n)²)
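The iterated logarithm in these bounds grows extremely slowly. A minimal sketch, assuming base-2 logarithms (the slide does not state the base); `math.log2` is used because it is exact on powers of two, so the loop does not drift above 1 through rounding.

```python
import math


def log_star(x):
    """Iterated logarithm: how many times log2 must be applied before the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


print(log_star(10**10))  # 5, as in the slide's example
print(log_star(30_000), log_star(30_000) ** 2)  # 4 16: per-bad-node bound at n ~ 30k, constants ignored
```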

Our Algorithm (SEND)
(Figure: SEND-PATH, SEND, CHECK (CHECK1 and CHECK2), and HEAL.)
• HEAL is triggered O(t) times before all bad nodes are marked.
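The control flow can be illustrated with a deliberately crude toy model in which every parameter is hypothetical: while some bad node is unmarked, each SEND is corrupted; SEND triggers CHECK with probability p_check, CHECK detects the corruption with probability p_detect, and each detection triggers one HEAL that marks one more bad node. HEAL then runs exactly t times, and the expected number of corruptions is t/(p_check·p_detect), which is O(t (log* n)²) when p_check = 1/(log* n)² and p_detect is a constant.

```python
import random


def simulate(t, p_check, p_detect, rng):
    """Toy SEND/CHECK/HEAL loop. Returns (corruptions, heals)."""
    unmarked = t
    corruptions = heals = 0
    while unmarked > 0:
        corruptions += 1  # an unmarked bad node corrupts this SEND
        if rng.random() < p_check and rng.random() < p_detect:  # CHECK runs and catches it
            heals += 1
            unmarked -= 1  # HEAL marks (at least) one bad node
    return corruptions, heals


rng = random.Random(42)
runs = [simulate(t=10, p_check=1 / 16, p_detect=0.5, rng=rng) for _ in range(200)]
print(sum(c for c, _ in runs) / len(runs))  # close to t / (p_check * p_detect) = 320
```

The real algorithm's HEAL may mark good nodes too and may need several detections per bad node; the toy only shows how a small trigger probability trades expected corruptions for message cost.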

CHECK1
• SEND triggers CHECK1 with probability 1/(log log n)².
• The subquorum size is O(log log n).
• Latency is O(l) and message cost is O(l (log log n)²).
• CHECK1 detects corruptions with constant probability for l = O(log² n).


CHECK2
• SEND triggers CHECK2 with probability 1/(log* n)².
• CHECK2 has O(log* n) rounds.
• The subquorum size grows incrementally, up to O(log* n).
• Latency is O(l log* n) and message cost is O(l (log* n)²).

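Multiplying each check's trigger probability by its message cost shows why these probabilities are chosen: both CHECK1 and CHECK2 add O(l) expected messages per SEND. The back-of-the-envelope sketch below ignores constant factors and uses base-2 logarithms (both assumptions), and compares the result with the naïve all-to-all cost l·log² n.

```python
import math


def log_star(x):
    """Iterated base-2 logarithm."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def expected_extra_messages(n, l):
    """Expected CHECK messages per SEND, constants ignored (hypothetical cost model)."""
    lln = math.log2(math.log2(n))
    ls = log_star(n)
    check1 = (1 / lln**2) * (l * lln**2)  # Pr[CHECK1] x O(l (log log n)^2)
    check2 = (1 / ls**2) * (l * ls**2)  # Pr[CHECK2] x O(l (log* n)^2)
    naive = l * math.log2(n) ** 2  # all-to-all along the path: O(l log^2 n)
    return check1, check2, naive


c1, c2, naive = expected_extra_messages(n=30_000, l=20)
print(round(c1, 6), round(c2, 6), round(naive))
```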

CHECK2 Analysis
• Deception interval: a substring of bad nodes on the path where a corruption occurs.
• Key points of detecting corruptions:
  • The deception interval shrinks logarithmically with probability ≥ ½.
  • O(log* n) rounds suffice to shrink the deception interval to size zero.

CHECK2 Analysis
• The deception interval shrinks logarithmically from round to round.
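The O(log* n) round bound can be illustrated with a toy process (a hypothetical model, not the actual CHECK2 rounds): in each round a deception interval of length L shrinks to ⌊log2 L⌋ with probability p (the slides give p ≥ ½) and otherwise stays the same; a length-1 interval shrinks to 0. At most log* L + 1 shrinks are needed, and each takes 1/p rounds in expectation.

```python
import math
import random


def rounds_to_zero(length, rng, p_shrink=0.5):
    """Toy model: count rounds until a deception interval of the given length vanishes."""
    rounds = 0
    while length > 0:
        rounds += 1
        if rng.random() < p_shrink:  # this round's check shrinks the interval
            length = int(math.log2(length)) if length > 1 else 0
    return rounds


print(rounds_to_zero(65536, random.Random(0), p_shrink=1.0))  # 5: 65536 -> 16 -> 4 -> 2 -> 1 -> 0
rng = random.Random(1)
trials = [rounds_to_zero(65536, rng) for _ in range(2000)]
print(sum(trials) / len(trials))  # close to 5 / 0.5 = 10
```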

HEAL
• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict: a pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict contains at least one bad node.
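The marking rule can be sketched as follows, assuming (for illustration) that HEAL has collected every accusation as an (accuser, accused) pair; `mark_conflicts` is a hypothetical name. Only mutual accusations mark nodes, so a bad node cannot get a good node marked with a one-sided accusation.

```python
def mark_conflicts(accusations):
    """Mark both endpoints of every mutual accusation.

    accusations: set of (accuser, accused) pairs reported during HEAL.
    """
    marked = set()
    for a, b in accusations:
        if (b, a) in accusations:  # a and b accuse each other: they are in conflict
            marked.update((a, b))
    return marked


# Bad node 9 corrupted what it forwarded to 2; 2 accuses 9, and 9 accuses 2 back.
# 9's one-sided accusation of good node 3 does not mark 3.
print(sorted(mark_conflicts({(2, 9), (9, 2), (9, 3)})))  # [2, 9]
```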

HEAL
• If ½ of the nodes in any quorum are marked, they are unmarked.
• HEAL is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b, g) is monotonically increasing,
  • Δf(b, g) is at least some positive constant, and
  • when f(b, g) = t, we are done.
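The unmarking rule can be sketched as a post-processing step over the quorums. Reading "½ nodes" as "at least half of the nodes" is an assumption, as is processing quorums in a fixed order when they overlap; `apply_unmark_rule` is a hypothetical name.

```python
def apply_unmark_rule(marked, quorums):
    """Unmark the marked members of any quorum in which at least half the nodes are marked."""
    result = set(marked)
    for quorum in quorums:
        marked_here = [v for v in quorum if v in result]
        if 2 * len(marked_here) >= len(quorum):
            result.difference_update(marked_here)
    return result


quorums = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(sorted(apply_unmark_rule({1, 2, 5}, quorums)))  # [5]: quorum [1, 2, 3, 4] was half marked
```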

Empirical Results
• Our simulation runs:
  • over butterfly networks of quorums,
  • for different network sizes, up to n = 30k, and
  • for different fractions of bad nodes.
• The simulation terminates after all bad nodes are marked.
• The results are taken over 3000 experiments.

Empirical Results: the number of messages is reduced by a factor of 60 (n ≈ 30k)
• CHECK1: from 39,100 to 649 messages (a factor of 60)
• CHECK2: from 39,100 to 1,177 messages (a factor of 33)

Empirical Results: latency increases by about 1½ times (n ≈ 30k)
• CHECK1: latency increases about 1½ times (from 13 to 18)
• CHECK2: latency increases about 2 times (from 13 to 25)

Empirical Results: Corruption Probability
(Figure: corruption probability for CHECK1 and CHECK2.)

Empirical Results: the number of messages is reduced by a factor of O(log² n)

Empirical Results: latency increases by a factor of Θ(1)

How to recover from Byzantine faults?
• Self-Healing Communication: a message is sent through a path of nodes.
• Self-Healing Computation: computation is performed through circuits.

Quorum Graph
• The quorum graph has:
  • n input quorums;
  • m quorum gates; and
  • one output quorum.

Naïve Computation
• No self-healing
• All nodes in each quorum (gate) perform the same computation.
• Results are sent between quorums via all-to-all communication.
• Expensive resource cost
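Naïve computation can be sketched in the same spirit as naïve communication. In this hypothetical model, each gate's quorum of q nodes receives each input bit from all q nodes of each of its two input quorums, bad nodes send flipped bits, and every receiver takes the majority before applying the gate; the sketch also counts the 2·q² messages per two-input gate.

```python
import operator
from collections import Counter


def majority_of_quorum(bit, q, bad_per_quorum):
    """Value a receiver adopts when q senders send `bit` but bad_per_quorum of them flip it."""
    votes = Counter([bit] * (q - bad_per_quorum) + [1 - bit] * bad_per_quorum)
    return votes.most_common(1)[0][0]


def naive_compute(inputs, gates, q, bad_per_quorum):
    """inputs: {wire: bit}; gates: (output wire, op, in1, in2) in topological order.

    Returns (all wire values, number of messages).
    """
    values = dict(inputs)
    messages = 0
    for out, op, a, b in gates:
        messages += 2 * q * q  # all-to-all from two input quorums into this gate's quorum
        values[out] = op(majority_of_quorum(values[a], q, bad_per_quorum),
                         majority_of_quorum(values[b], q, bad_per_quorum))
    return values, messages


gates = [("g1", operator.and_, "x", "y"), ("out", operator.or_, "g1", "z")]
values, messages = naive_compute({"x": 1, "y": 1, "z": 0}, gates, q=8, bad_per_quorum=2)
print(values["out"], messages)  # 1 256
```

With q = Θ(log n) this reproduces the Θ(log² n) messages per gate that the naïve column of the next table charges.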

Our Contribution
We develop a self-healing algorithm for computation networks.

                    Naïve Computation      Our Algorithm
Message cost        O((n + m) log² n)      O(m + n log n)
Computation cost    O((n + m) log² n)      O(m + n log n)
Latency             O(l)                   O(l)
Corruptions         No corruptions         O(t (log* n)²)

Our Algorithm (COMPUTE)
(Figure: COMPUTE, CHECK, EVALUATE, and RECOVER.)

CHECK Algorithm
• CHECK has O(log* n) rounds.
• In each round, nodes are selected uniformly at random, and the same computation is performed.
(Figure: Round 1 and Round 2.)

CHECK Algorithm
• The adversary corrupts the computation in a deception subgraph.
• Key points of corruption detection:
  • We prove that the deception subgraph shrinks logarithmically in each round with constant probability.
  • Once the deception subgraph shrinks to size zero, the corruption is detected.

Shrinks Logarithmically
(Figure sequence: the deception subgraph from Round 1 through Round 4.)

RECOVER
• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict: a pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict contains at least one bad node.

RECOVER
• If ½ of the nodes in any quorum are marked, they are unmarked.
• RECOVER is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b, g) is monotonically increasing, and
  • when it reaches t, we are done.

Empirical Results
• Our simulation runs:
  • over perfect binary trees of quorums,
  • for different network sizes, up to n = 8k, and
  • for different fractions of bad nodes.
• The simulation terminates after all leaders are good.
• The results are taken over 3000 experiments.

Empirical Results: the number of messages is reduced by a factor of 65 (n ≈ 8k)
• From 66M messages to 1.01M messages

Empirical Results: latency increases by 1.75 times (n ≈ 8k)
• From 36 to 63 time steps

Empirical Results: Corruption Probability

Empirical Results: the number of messages is reduced by a factor of O(log² n)!

Empirical Results: latency increases by a factor of Θ(1)

Conclusion

• We developed self-healing algorithms to recover networks from Byzantine faults.
• Message cost is reduced polylogarithmically in n, compared to non-self-healing algorithms.
• Experiments show that message cost is reduced by
  • up to a factor of 60 for communication networks, and
  • up to a factor of 65 for computation networks.
• For t < n/4, the expected total number of corruptions is O(t (log* n)²).

Open Problems
• Can we limit the number of corruptions to O(t)?
• How can we self-heal networks with churn, or against an adaptive adversary?
• How can we self-heal asynchronous networks?
• We trigger CHECK and select the nodes in a centralized manner. How can we make CHECK decentralized?
  • We propose a decentralized CHECK for future work.
  • We implement a simulation that suggests interesting results.

Thanks! Any Questions?