
Page 1:

Consensus as a Network Service

Huynh Tu Dang, Pietro Bressana, Han Wang, Ki Suh Lee, Hakim Weatherspoon, Marco Canini, Fernando Pedone, and Robert Soulé
Università della Svizzera italiana (USI), Cornell University, and KAUST

1

Page 2:

Consensus is a Fundamental Problem

Many distributed problems can be reduced to consensus

E.g., Atomic broadcast, atomic commit

Consensus protocols are the foundation for fault-tolerant systems

E.g., OpenReplica, Ceph, Chubby

Any improvement in performance would have a HUGE impact

2

Page 3:

Key Idea: Move Consensus Into Network Hardware

This work focuses on Paxos

One of the most widely used consensus protocols

Has been proven correct

Enabling technology trends:

Hardware is becoming more flexible: e.g. PISA, FlexPipe, NFP-6xxx

Hardware is becoming more programmable: e.g., POF, PX, and P4

3

Page 4:

Outline of This Talk

Introduction

Consensus Background

Design, Implementation & Evaluation

Conclusions

4

Page 5:

Paxos Roles and Communication

5

Proposers propose values

A distinct proposer assumes the role of Coordinator

Acceptors accept a proposal and promise not to accept proposals with lower round numbers

Learners wait for a quorum of messages from Acceptors, then “deliver” a value

[Figure: Paxos message flow. A Proposer sends a proposal to the Coordinator; the Coordinator sends Phase 2A messages to Acceptors 1, 2, and 3; the Acceptors send Phase 2B messages to the Learners (up to n).]

Page 6:

Design

6

Page 7:

Design Goals 1: Be a Drop-In Replacement

István et al. [NSDI ’16] implement ZAB in FPGAs, but require that the application be written in a hardware description language

High-level languages make hardware development easier

Implementing LevelDB in P4 might still be tricky….

7

Page 8:

Standard Paxos API

8

void submit(struct paxos_ctx *ctx,
            char *value,
            int size);

void (*deliver)(struct paxos_ctx *ctx,
                int instance,
                char *value,
                int size);

void recover(struct paxos_ctx *ctx,
             int instance,
             char *value,
             int size);

Figure 4: CAANS application-level API.

paxos_ctx struct. When a learner learns a value, it calls the application-specific deliver function. The deliver function returns a buffer containing the learned value, the size of the buffer, and the instance number for the learned value.

The recover function is used by the application to discover a previously agreed upon value for a particular instance of consensus. The recover function results in the same sequence of Paxos messages as the submit function. The difference in the API, though, is that the application must pass the consensus instance number as a parameter, as well as an application-specific no-op value. The resulting deliver callback will either return the accepted value, or the no-op value if no value had been previously accepted for the particular instance number.

Hardware/Software divide. An important question for offering consensus as a network service is: exactly what logic should be implemented in network hardware, and what logic should be implemented in software?

In the CAANS architecture, network hardware executes the logic of coordinators and acceptors. This choice allows CAANS to address the bottlenecks identified in Section 2. Moreover, since the proposer and learner code are implemented in software, the design facilitates the simple application-level interface described above. The logic of each of the roles is neatly encapsulated by communication boundaries.

Figure 3 illustrates the CAANS architecture for a switch-based deployment. In the figure, switch hardware is shaded grey, and commodity servers are colored white. Note that a backup coordinator can execute on either a second switch, or a commodity server, as we’ll discuss below. We should also point out that CAANS could be deployed on other devices, such as the programmable NICs that we use in the evaluation.

Paxos header. Network hardware is optimized to process packet headers. Since CAANS targets network hardware, it is a natural choice to map Paxos messages into a Paxos-protocol header. The Paxos header follows the transport protocol header (e.g., UDP), allowing CAANS messages

struct paxos_t {
    uint8_t msgtype;
    uint8_t inst[INST_SIZE];
    uint8_t rnd;
    uint8_t vrnd;
    uint8_t swid[8];
    uint8_t value[VALUE_SIZE];
};

Figure 5: Paxos packet header.

to co-exist with standard network hardware.

In a traditional Paxos implementation, each participant receives messages of a particular type (e.g., Phase 1A, 2A), executes some processing logic, and then synthesizes a new message that it sends to the next participant in the protocol.

However, network hardware, in general, cannot craft new messages; it can only modify fields in the header of the packet that it is currently processing. Therefore, a network-based Paxos needs to map participant logic into forwarding and header-rewriting decisions (e.g., the message from proposer to coordinator is transformed into a message from coordinator to each acceptor by rewriting certain fields). Because the message size cannot be changed at the switch, each packet must contain the union of all fields in all Paxos messages, which fortunately are still a small set.

Figure 5 shows the CAANS packet header for Paxos messages, written as a C struct. To keep the header small, the semantics of some of the fields change depending on which participant sends the message. The fields are as follows: (i) msgtype distinguishes the various Paxos messages (e.g., phase 1A, 2A); (ii) inst is the consensus instance number; (iii) rnd is either the round number computed by the proposer or the round number for which the acceptor has cast a vote; (iv) vrnd is the round number in which an acceptor has cast a vote; (v) swid identifies the sender of the message; and (vi) value contains the request from the proposer or the value for which an acceptor has cast a vote.

A CAANS proposer differs from a standard Paxos proposer because before forwarding messages to the coordinator, it must first encapsulate the message in a Paxos header. Through standard sockets, the Paxos header is then encapsulated inside a UDP datagram, and we rely on the UDP checksum to ensure data integrity.

Memory limitations. CAANS aims to support practical systems that use Paxos as a building block to achieve fault tolerance. Prominent examples are services that rely on a replicated log to persistently record the sequence of all consensus values. The Paxos algorithm does not specify how to handle the ever-growing replicated log that is stored at acceptors. On any system, this can cause problems, as the log would require unbounded disk
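As a rough illustration of the proposer-side encapsulation described above (the Paxos header placed at the front of a UDP payload and sent through a standard socket), the sketch below is an assumption rather than the CAANS code: the PAXOS_PROPOSAL constant, the coordinator address and port arguments, and the INST_SIZE/VALUE_SIZE values are made up for the example.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define INST_SIZE  4
#define VALUE_SIZE 32
#define PAXOS_PROPOSAL 0                 /* msgtype for a client proposal (assumed value) */

struct paxos_t {                         /* mirrors the Figure 5 layout */
    uint8_t msgtype;
    uint8_t inst[INST_SIZE];
    uint8_t rnd;
    uint8_t vrnd;
    uint8_t swid[8];
    uint8_t value[VALUE_SIZE];
} __attribute__((packed));

/* Build a proposal header and send it as the payload of a UDP datagram. */
int send_proposal(const char *coordinator_ip, uint16_t paxos_port,
                  const void *val, size_t len)
{
    struct paxos_t hdr;
    memset(&hdr, 0, sizeof hdr);
    hdr.msgtype = PAXOS_PROPOSAL;
    memcpy(hdr.value, val, len < VALUE_SIZE ? len : VALUE_SIZE);

    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(paxos_port) };
    inet_pton(AF_INET, coordinator_ip, &dst.sin_addr);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);     /* the UDP checksum covers the header */
    ssize_t n = sendto(fd, &hdr, sizeof hdr, 0,
                       (struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return n == (ssize_t)sizeof hdr ? 0 : -1;
}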


submit: send a value
deliver: deliver a value
recover: discover a prior value
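Putting the calls together, here is a hedged usage sketch of the Figure 4 API. The on_deliver callback name, the way the context is obtained, and the stub submit (which simply "agrees" locally so the example compiles and runs without the CAANS library) are all assumptions for illustration; in CAANS the value travels through the in-network coordinator and acceptors before the learner invokes the deliver callback.

#include <stdio.h>
#include <string.h>

struct paxos_ctx { int unused; };          /* stand-in for the opaque CAANS context */

/* Application-specific deliver callback (matches the deliver signature in Figure 4). */
void on_deliver(struct paxos_ctx *ctx, int instance, char *value, int size)
{
    (void)ctx;
    printf("instance %d delivered: %.*s\n", instance, size, value);
}

/* Stub submit() so the sketch runs stand-alone; the real submit() hands the value to
   the in-network coordinator/acceptors, and the learner later calls the callback.    */
void submit(struct paxos_ctx *ctx, char *value, int size)
{
    static int next_instance = 0;
    on_deliver(ctx, next_instance++, value, size);
}

int main(void)
{
    struct paxos_ctx ctx = {0};
    char cmd[] = "PUT key value";
    submit(&ctx, cmd, (int)strlen(cmd));   /* proposer side: send a value for consensus */
    return 0;
}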

Page 9:

Design Goals 2: Alleviate Bottlenecks

9

[Figure: CPU utilization (%) per Paxos role (proposer, coordinator, acceptor, learner) as the number of learners grows from 4 to 20.]

Coordinator and acceptors are to blame!

Page 10:

Hardware/Software

10

[Figure: CAANS deployment. Proposers and learners run in software on commodity servers; the coordinator and three acceptors run in network hardware, with a backup coordinator.]

Challenge: map Paxos logic into stateful forwarding decisions

Software roles facilitate the API; hardware roles alleviate the bottlenecks

Page 11:

NetPaxos: Header Definition & Parser

header_type paxos_t {
    fields {
        msgtype  : 16;
        inst     : 32;
        rnd      : 16;
        vrnd     : 16;
        acptid   : 16;
        paxosval : 256;
    }
}

parser parse_ethernet {
    extract(ethernet);
    return parse_ipv4;
}

parser parse_ipv4 {
    extract(ipv4);
    return parse_udp;
}

parser parse_udp {
    extract(udp);
    return select(udp.dstPort) {
        PAXOS_PROTOCOL : parse_paxos;
        default : ingress;
    }
}

parser parse_paxos {
    extract(paxos);
    return ingress;
}

11

Page 12:

Acceptor Control Flow

12

[Figure: Acceptor control flow. Ingress → isIPv4? (drop if not) → forward_tbl → isPaxos? → round_tbl (load the acceptor's rnd stored in registers) → packet's rnd >= acceptor's rnd? → acceptor_tbl (update the registers' state, msgtype, acptid, and the UDP destination port) → Egress; non-matching packets are dropped.]

Page 13:

13

[Figure: the acceptor control flow diagram repeated, with the ingress control program shown below.]

control ingress {
    if (valid(ipv4)) {
        apply(forward_tbl);
    }
    if (valid(paxos)) {
        apply(round_tbl);
        if (paxos.rnd >= current.rnd) {
            apply(acceptor_tbl);
        }
    }
}

Page 14:

Acceptor Control Flow

14

[Figure: the acceptor control flow diagram repeated, highlighting the table detailed on the next slide (round_tbl).]

Page 15:

round_tbl

// uint16_t rounds_reg[64000];
register rounds_reg {
    width : 16;
    instance_count : 64000;
}

action read_round() {
    // uint16_t current.round = rounds_reg[paxos.inst]
    register_read(current.round, rounds_reg, paxos.inst);
}

table round_tbl {
    actions { read_round; }
    size : 1;
}

15

Page 16:

Acceptor Control Flow

16

[Figure: the acceptor control flow diagram repeated, highlighting the table detailed on the next slide (acceptor_tbl).]

Page 17:

acceptor_tbl

action handle_2a(learner_port) {
    // rounds_reg[paxos.inst] = paxos.rnd
    register_write(rounds_reg, paxos.inst, paxos.rnd);
    // vrounds_reg[paxos.inst] = paxos.rnd
    register_write(vrounds_reg, paxos.inst, paxos.rnd);
    // values_reg[paxos.inst] = paxos.paxosval
    register_write(values_reg, paxos.inst, paxos.paxosval);

    register_read(paxos.acptid, acceptor_id, 0);
    modify_field(paxos.msgtype, PAXOS_2B);
    modify_field(udp.dstPort, learner_port);
}

table acceptor_tbl {
    reads { paxos.msgtype : exact; }
    actions { handle_1a; handle_2a; }
}

17
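The acceptor_tbl above also lists handle_1a, whose body is not shown on the slide. The C model below is a guess at what a Phase 1A (promise) handler does in standard Paxos, written in the same register-array style as the slide's comments; the register names mirror the slides, but the out-parameter reply shape is an assumption, not the CAANS implementation. As in the ingress control flow, the round check (paxos.rnd >= the stored round) is assumed to have happened before this action runs.

#include <stdint.h>
#include <string.h>

#define VALUE_SIZE 32                           /* placeholder; the real size is a constant  */
static uint16_t rounds_reg[64000];              /* promised round per instance (P4 register) */
static uint16_t vrounds_reg[64000];             /* round in which a vote was cast, if any    */
static uint8_t  values_reg[64000][VALUE_SIZE];  /* value voted for, if any                   */
static uint16_t acceptor_id = 1;

/* Phase 1A: promise the proposer's round and report any earlier vote in the 1B reply. */
void handle_1a(uint32_t inst, uint16_t rnd,
               uint16_t *reply_vrnd, uint8_t reply_value[VALUE_SIZE], uint16_t *reply_acptid)
{
    rounds_reg[inst] = rnd;                     /* remember the promise                  */
    *reply_vrnd = vrounds_reg[inst];            /* previously voted round (0 if none)    */
    memcpy(reply_value, values_reg[inst], VALUE_SIZE);
    *reply_acptid = acceptor_id;                /* identify this acceptor in the 1B      */
}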

Page 18:

Implementation

Source code

Proposer and learner written in C

Coordinator and acceptor written in P4

4 Compilers

P4C

P4FPGA

Xilinx SDNet

Netronome SDK

18

4 Hardware target platforms

NetFPGA SUME (4x10G)

Netronome Agilio-CX (1x40G)

Alpha Data ADM-PCIE-KU3 (2x40G)

Xilinx VCU109 (4x100G)

2 Software target platforms

Bmv2

DPDK (work in progress)

Page 19:

P4 Compilers

19

Compiler           | Target           | Remark
P4C                | Software switch  | Supports most of the P4 constructs
P4@ELTE            | DPDK             | Does not support register operations; limits field length to 32 bits
P4FPGA             | FPGAs            | Must write modules for unsupported P4 constructs
Xilinx SDNet       | FPGAs            | Does not support register operations; requires a wrapper for the packet stream
Netronome SDK      | Netronome ISAs   | Works only with Netronome devices; custom actions can be written in Micro-C
Barefoot Capilano  | Barefoot Tofino  | Tbps switch

Page 20:

Evaluation

20

Page 21:

Experiment: What is the Absolute Performance?

Run Coordinator / Acceptor in isolation

Testbed:

NetFPGA SUME board in a SuperMicro Server

A packet generator for offered load

21

Page 22:

Absolute Performance

22

[Figure: latency (µs, 0 to 0.8) for plain forwarding, the coordinator, and the acceptor.]

Measured on NetFPGA SUME using P4FPGA

Throughput is over 9 million consensus messages / second (close to line rate)

Little added latency compared to simply forwarding packets

Page 23:

Experiment: What is the End-to-End Performance?

Comparing NetPaxos to a software-based Paxos (Libpaxos)

Testbed:

4 NetFPGA SUME boards in SuperMicro Servers

An OpenFlow-enabled 10 Gbps switch (Pica8 P-3922)

23

Page 24:

End-to-End Performance

24

[Figure: latency (µs) versus throughput (msgs/s) for CAANS and Libpaxos.]

2.24x throughput improvement over software implementation

75% reduction in latency

Similar results when replicating LevelDB as application

Page 25:

Next Steps

We make consensus great again!

The ball is now in the application developer’s court

Suggests direction for future work

25

[Figure: CPU utilization (%) of the remaining software roles, the proposer and the learner.]

Page 26:

Lessons Learned

26

Page 27:

Outlook

The performance of consensus protocols has a dramatic impact on the performance of data center applications

Moving consensus logic into network hardware results in significant performance improvements

27

“a HUGE wave of consensus messages is approaching”

Page 28:

28

http://www.inf.usi.ch/faculty/soule/netpaxos.html

Page 29:

29

Questions & Answers

Page 30:

Performance After Failure

30

[Figure: throughput (msgs/s) over time (s) during a coordinator failure with a software backup (left) and during an acceptor failure (right).]

Page 31:

End-to-End Experiment: NetPaxos Setup

31

[Figure: end-to-end testbed. Application clients and application servers are connected through a programmable device that runs the Paxos protocol.]