40
CS43 4/534: Topics in Network Systems High-Level Programming for Programmable Networks: Distributed Network OS: Replicated Data Store Yang (Richard) Yang Computer Science Department Yale University 208A Watson Email: [email protected] http://zoo.cs.yale.edu/classes/cs434/ Acknowledgements: Paxos slides are based on RAFT usanility slides linked on the Schedule page.

CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

CS434/534: Topics in Network Systems

High-Level Programming for Programmable Networks: Distributed Network OS: Replicated Data Store

Yang (Richard) YangComputer Science Department

Yale University208A Watson

Email: [email protected]

http://zoo.cs.yale.edu/classes/cs434/

Acknowledgements: Paxos slides are based on RAFT usanility slides linked on the Schedule page.

Page 2: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

2

Outline

q Admin and recapq Controller software framework (network OS)

supporting programmable networkso architectureo data model and operations: OpenDaylight as an exampleo distributed data store

• overview• basic Paxos• multi-paxos

Page 3: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Recap: Proactive HL Programmingq Objective: Proactive, fully generate multi-

table datapath pipeline

q Basic ideas: o A function f consists of a sequence of

instructions I1, I2, …, In, and one can • (conceptually) consider each instruction in the dataflow

as a table• identify instructions with compact representation

(symbolic analysis and direct state plug in)• use state propagation to compute compact tables

Page 4: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

4

L1, EMPTY

L2, dstSw=sw1, dstMac=11

L2, dstSw=sw2, dstMac=22

L2, dstSw=sw2, dstMac=33

L2, dstSw=swu, dstMac=*

L4, dstSw=sw1, dstCond=S

L4, dstSw=sw2, dstCond=C

Pri p.dstMac Action2 11 dstSw=sw1

2 22 dstSw=sw2

2 33 dstSw=sw2

1 * dstSw=swu

Pri p.dstMac Action2 11 dstCond=S

2 22 dstCond=C

2 33 dstCond=C

1 * dstCond=UK

L4, dstSw=swu, dstCond=UK

L∞, aRoutes(sw1, S)

L∞, aRoutes(sw2, C)

L∞,aRoutes(swu, UK)

Map<MAC, Switch> hostTable; // {11->sw1, 22->sw2, 33->sw2, others swu} Map<MAC, Cond> condTable; // {11->S, 22->C, 33->C, others UK}

0. Route onPacketIn(Packet p) {L1. Switch dstSw = hostTable.get( p.dstMac() );L2. Cond dstCond = condTable.get( p.dstMac() );L3. Route aRoutes = AllPairCondRoutes();L4. return aRoutes.get(dstSw, dstCond);

Page 5: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

g1

!g1

srcMac

dstPort

dstMac

g1 = dstPort < 1025

dstSw = SW_FW dstCond = V

g2=openPorts.contain(dstPort)

egress=DropdstSw =

hostTable(dstMac)dstCond =

condTable(dstMac)

g1 g2 g2

phi(dstSw) phi(dstCond)

!g2

srcSw = hostTable(srcMac)

egress=f(dstCond, dstSw, srcSw)

Pipeline design for constraints and optimization

Page 6: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Recap: HL NOS Architecture

NetworkView

NEDatapath

Service/Policy

NEDatapath

logically centralized data store

Program

6

Key component - data store:• Data model• Data access model• Data store availability

Key goal: provide applications w/ high-level views and make the views highly available (e.g., 99.99%), scalable.

Page 7: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Recap: OpenDaylight Software Architecture

7

Controller

Model-Driven SAL (MD-SAL)

Protocol Plugin RESTCONF

NETCONF SERVER

Network Devices Applications

App/Service Plugin

App/Service Plugin

......

Protocol Plugin

Config Subsystem

Messaging Data Store

Remote Controller Instance

Remote Controller Instance

Network DevicesNetwork Devices

ApplicationsApplications

Page 8: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Recap: Data Model - Yang Data Tree

https://wiki.opendaylight.org/view/OpenDaylight_Controller:Architectural_Framework

8

Page 9: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

ReadWriteTransaction transaction = dataBroker.newReadWriteTransaction();Optional<Node> nodeOptional;nodeOptional = transaction.read(

LogicalDataStore.OPERATIONAL,n1InstanceIdentifier);

transaction.put(LogicalDataStore.CONFIG,n2InstanceIdentifier,topologyNodeBuilder.build());

transaction.delete(LogicalDataStore.CONFIG,n3InstanceIdentifier);

CheckedFuture future;future = transaction.submit();

n1

/operational /config

network-topo

BGPv4 overlay1

nodesnodes

Datastore

n3n1

transaction

n2 n3

Recap: Data Operation Model - Transactions

- Operations included in transactions

9

Page 10: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Recap: Multi-Server NOS

10

Servers

Clients

request/response

Discussion: why multiple servers?

Page 11: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Replicated operations log => replicated state machineo All servers execute same commands in same order

q Consensus module ensures proper log replicationq System makes progress as long as any majority of servers are up

Recap: A Common Multi-Server (Data Store) Arch: Replicated Log

add jmp mov shlLog

ConsensusModule

StateMachine

add jmp mov shlLog

ConsensusModule

StateMachine

add jmp mov shlLog

ConsensusModule

StateMachine

Servers

Clientsshl

11

Page 12: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

12

Outline

q Admin and recapq Controller framework supporting

programmable networkso architectureo data model and operations: OpenDaylight as an

exampleo distributed data store

Page 13: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Two general approaches to achieving replicated log (consensus in general):q Symmetric, leader-less:

o All servers have equal roleso Clients can contact any servero We will see Basic Paxos as an example

q Asymmetric, leader-based:o At any given time, one server is in charge, others accept

its decisionso Clients communicate with the leadero We will see Raft, which uses a leadero OpenDaylight is based on Raft

Roadmap Today

13

Page 14: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

14

Outline

q Admin and recapq Controller framework supporting

programmable networkso architectureo data model and operations: OpenDaylight as an

exampleo distributed data store

• overview• basic Paxos

Page 15: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Leslie Lamport, 1989q Nearly synonymous with consensus

Warning

“The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-).”—NSDI reviewer

“There are significant gaps between the description of the Paxos algorithm and the needs of a real-worldsystem...the final system will be based on an unproven protocol.”—Chubby authors

15

Page 16: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Basic Paxos (“single decree”):o One or more servers propose values (what is a

value in our context?)o System must agree on a single value (what is

chosen in our context?)• Only one value is ever chosen

q Multi-Paxos:o Combine several instances of Basic Paxos to

agree on a series of values forming the log

The Paxos Approach

16

Page 17: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Safety:o Only a single value may be choseno A server never learns that a value has been chosen

unless it really has been

q Liveness (as long as majority of servers up and communicating with reasonable timeliness):o Some proposed value is eventually choseno If a value is chosen, servers eventually learn about it

Requirements for Basic Paxos

17

Page 18: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Proposers:o Handle client requestso Active: put forth particular values to be chosen

q Acceptors:o Passive: respond to messages from proposerso Responses represent votes that form consensuso Store chosen value, state of the decision

processFor this class:

o Each Paxos server contains both components

Paxos Components

18

Page 19: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

19

Outline

q Admin and recapq Controller framework supporting

programmable networkso architectureo data model and operations: OpenDaylight as an

exampleo distributed data store

• overview• Paxos

– basic paxos» structure» initial design options

Page 20: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q A single acceptor chooses value

q Problem: o What if acceptor crashes

after choosing?q Solution:

o Quorum: multiple acceptors (3, 5, ...)

o As long as majority of acceptors available, system can make progress

Design I: A Single AcceptorProposers

Acceptor

add jmp shl sub

jmp

20

Page 21: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Design decision: which value to accept?o Design: Acceptor accepts only first value it receives

q Problem: o Split votes: if simultaneous proposals, no value might

have majority

=> An Acceptor must be able to sometimes “change mind” to accept multiple (different) values

Design II

time

s1s2s3s4s5

accept?(red)

accept?(blue)

accept?(green)

accepted(red)

accepted(blue)

accepted(green)

accepted(red)

accepted(blue)

21

Page 22: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Design decision: which value to accept?o Design: Acceptor accepts every value it receives

q Problem: o Conflicting choices

Solution: If a value has been chosen => future proposals propose/choose that same value

Design III

time

s1s2s3s4s5

accept?(red)

accept?(blue)

accepted(red)

accepted(red)

accepted(blue)

accepted(red)

accepted(blue)

accepted(blue)

Red Chosen

Blue Chosen

22

Page 23: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

24

Outlineq Admin and recapq Controller framework supporting

programmable networkso architectureo data model and operations: OpenDaylight as an

exampleo distributed data store

• overview• Paxos

– basic paxos» structure» initial design options» design details

Page 24: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Two-phase approach:q Phase 1: broadcast Prepare RPCs

o Find out about any chosen valueso Block older proposals that have not yet

completedq Phase 2: broadcast Accept RPCs

o Ask acceptors to accept a specific value

Basic Paxos Protocol Structure

25

Page 25: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Each proposal has a unique numbero Higher numbers take priority over lower numberso It must be possible for a proposer to choose a new proposal

number higher than anything it has seen/used beforeq One simple approach:

o Each server stores maxRound: the largest Round Number it has seen so far

o To generate a new proposal number:• Increment maxRound• Concatenate with Server Id

o Proposers must persist maxRound on disk: must not reuse proposal numbers after crash/restart

Proposal Numbers

Server IdRound Number

Proposal Number

26

Page 26: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Basic Paxos ProtocolAcceptors

3) Respond to Prepare(n):o If n > minProposal then minProposal = no Return(acceptedProposal, acceptedValue)

6) Respond to Accept(n, value):o If n ≥ minProposal then

acceptedProposal = minProposal = nacceptedValue = value

o Return(minProposal)

Acceptors must record minProposal, acceptedProposal, and acceptedValue on stable storage (disk)

Proposers1) Choose new proposal number n2) Broadcast Prepare(n) to all

servers

4) When responses received from majority:

o If any acceptedValues returned, replace value with acceptedValuefor highest acceptedProposal

5) Broadcast Accept(n, value) to all servers

6) When responses received from majority:

o Any rejections (result > n)? goto (1)o Otherwise, value is chosen

27

Page 27: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Basic setting: single proposero s1 proposes X

Basic Paxos Protocol: Exercise

time

s1s2s3s4s5

X

Page 28: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Q: Who knows that a value has been chosen?

q Q: If other servers want to know the chosen value, what should they do?

Question

Page 29: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Setting: a new proposal arrives after a value is already chosen

Basic Paxos Protocol: Exercise

time

s1

s2

s3

s4

s5

P 4.5

A 3.1 XP 3.1

P 3.1

P 3.1

A 3.1 X

A 3.1 X

P 4.5

P 4.5

A 4.5 X

A 4.5 X

A 4.5 X

“Prepare proposal 3.1 (from s1)”

X

Y

“Accept proposal 4.5with value X (from s5)”

values

Page 30: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Previous value not chosen, but new proposer sees it:o New proposer will use existing valueo Both proposers can succeed

Basic Paxos Protocl: Race Condition I

time

s1s2s3s4s5

P 4.5

A 3.1 XP 3.1

P 3.1

P 3.1

A 3.1 X

A 3.1 X

P 4.5

P 4.5

A 4.5 X

A 4.5 X

A 4.5 X

X

Y

values

Page 31: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Previous value not chosen, new proposer doesn’t see it:o New proposer chooses its own valueo Older proposal blocked

Basic Paxos Protocl: Race Condition II

time

s1s2s3s4s5

P 4.5

A 3.1 XP 3.1

P 3.1

P 3.1

A 3.1 X

A 3.1 X

P 4.5

P 4.5

A 4.5 Y

A 4.5 Y

A 4.5 Y

X

Y

values

Page 32: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Summary of Cases

q Three possibilities when later proposal prepares:1. Previous value already chosen:

• New proposer will find it and use it2. Previous value not chosen, but new proposer sees it:

• New proposer will use existing value• Both proposers can succeed

3. Previous value not chosen, new proposer doesn’t see it:• New proposer chooses its own value• Older proposal blocked

33

Page 33: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Competing proposers can livelock:

q Potential solutions: o randomized delay before restarting, give other

proposers a chance to finish choosingo use leader election instead

Liveness

time

s1s2s3s4s5

A 3.1 XP 3.1

P 3.5

A 3.5 Y

P 3.1

P 3.1

P 3.5

P 3.5

A 3.1 X

A 3.1 X

P 4.1

P 4.1

P 4.1

A 3.5 Y

A 3.5 Y

P 5.5

P 5.5

P 5.5 A 4.1 X

A 4.1 X

A 4.1 X

Page 34: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

35

Outline

q Admin and recapq Controller framework supporting programmable

networkso architectureo data model and operations: OpenDaylight as an exampleo distributed data store

• overview• Paxos

– basic Paxos– multi-Paxos

Page 35: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Separate instance of Basic Paxos for each entry in the log:o Add index argument to Prepare and Accept

(selects entry in log)

Multi-Paxos

add jmp mov shlLog

ConsensusModule

StateMachine Server

Client

shl

OtherServers

1. Client sends command to server

2. Server uses Paxos to choose command as value for a log entry

3. Server waits for previous log entries to be applied, then applies new command to state machine

4. Server returns result from state machine to client

Page 36: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q Which log entry to use for a given client request?

q Performance optimizations:q Ensuring full replicationq Client protocolq Configuration changes

Note: Multi-Paxos not specified precisely in literature

Multi-Paxos Issues

Page 37: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Selecting Log Entries

38

q When request arrives from client:o Find first log entry not known to be choseno Run Basic Paxos to propose client’s command for this indexo Prepare returns acceptedValue?

• Yes: finish choosing acceptedValue, start again• No: choose client’s command

cmpmov add

cmp

ret

1 2 3 4 5 6 7

s1

submov add rets2

cmpmov add rets3

cmpmov add

shl

ret

1 2 3 4 5 6 7

s1

submov add rets2

cmpmov add rets3

cmp

sub jmp

jmp

jmp Known Chosen

Logs Before Logs After

Page 38: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Selecting Log Entries, cont’d

39

q Servers can handle multiple client requests concurrently:o Select different log entries for each

q Must apply commands to state machine in log order

Page 39: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

q What are inefficiencies of the concurrent, leaderless multi-paxos:o With multiple concurrent proposers, conflicts and

restarts are likely (higher load → more conflicts)o 2 rounds of RPCs for each value chosen (Prepare, Accept)

Leaderless Multi-Paxos Discussion

Page 40: CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Prepare for Next Class

q Stare at Paxos and play with examples to better understand it

q Read Raftq Start to think about projects

54