Building Compilers for Reconfigurable Switches

Lavanya Jose, Lisa Yan, Nick McKeown, and George Varghese

Research funded by AT&T, Intel, Open Networking Research Center.

In the next 20 minutes

•  Fixed-function switch chips will be replaced by reconfigurable switch chips.

•  We will program them using languages like P4.

•  We need a compiler to compile P4 programs to reconfigurable switch chips.

Fixed-Function Switch Chips

[Figure: a packet passes through the Parser, the L2, IPv4, IPv6, and ACL stages, and the Queues.]

Control Flow Graph

[Figure: the control flow graph (L2, v4, v6, ACL) beside the switch pipeline: Parser; L2, IPv4, IPv6, and ACL stages, each holding one table and a fixed action; Queues.]

Fixed-Function Switch Chips Are Limited

1.  Can’t add new forwarding functionality
2.  Can’t add new monitoring functionality

Fixed-Function Switch Chips

[Figure: the control flow graph (L2, v4, v6, ACL, plus a new MyEncap node) beside the fixed switch pipeline: Parser; L2, IPv4, IPv6, and ACL stages, each holding one table and a fixed action; Queues.]

Fixed-Function Switch Chips Are Limited

1.  Can’t add new forwarding functionality
2.  Can’t add new monitoring functionality
3.  Can’t move resources between functions

[Figure: control flow graph beside the fixed switch pipeline: Parser; L2, IPv4, IPv6, and ACL stages, each holding one table and a fixed action; Queues.]

Reconfigurable Switch Chips

[Figure: the fixed pipeline (Parser; L2, IPv4, IPv6, and ACL tables with fixed actions; Queues) next to a reconfigurable pipeline in which every stage is a match table plus an action macro. Control flow graph: L2, v4, v6, ACL.]

Mapping Control Flow to Reconfigurable Chip

[Figure: the L2, IPv4, IPv6, and ACL tables of the control flow graph are placed into the match tables of the switch pipeline (Parser, one match table and action macro per stage, Queues), with L2, v4, v6, and ACL action macros.]

Reconfigurable Switch Chips

[Figure: the control flow graph (L2, v4, v6, ACL, MyEncap) mapped onto the switch pipeline: Parser; L2, IPv4, IPv6, ACL, and MyEncap tables with their actions; Queues.]

Protocol Independent Switch

[Figure: Parser followed by stages of match memory and action ALUs, holding the L2, IPv4, IPv6, and ACL tables and their action macros.]

[Figure: Parser followed by the L2 Table, the IPv4 Table (shown three times), IPv6, and the ACL Table, with L2 and v4 action macros and three further actions.]

Match+Action Processor: pipelined and in-parallel

Reconfigurability: the norm in 5 years

•  Reconfigurability adds mostly to logic.
•  Logic is getting relatively smaller.
•  The cost of reconfigurability is going down.
•  Fixed switch chip area today:
   – I/O (40%), Memory (40%),
   – Wires, Logic

[Figure: two chip-area charts. The first is labeled Switch I/O (30%), Memory (30%), Wires (20%), Logic (20%); the second is labeled Switch I/O, Memory, Wires, Logic.]

Fixed function: Broadcom Tomahawk, 3.2 Tbps
Reconfigurable: Cavium XPliant, 3.2 Tbps

Reconfigurable chips are inevitable.

Configuring Switch Chips

[Figure: P4 code → Compiler → Compiler Target. The target is a reconfigurable pipeline: Parser, match tables with action macros, Queues.]

P4 (http://p4.org/)

Parser (ANCS’13):

    parser parse_ethernet {
        extract(ethernet);
        // Branch on the EtherType of the header just extracted.
        return select(latest.etherType) {
            0x800  : parse_ipv4;
            0x86DD : parse_ipv6;
        }
    }

Match-action tables:

    table ipv4_lpm {
        reads {
            ipv4.dstAddr : lpm;   // longest-prefix match on the destination
        }
        actions {
            set_next_hop;
            drop;
        }
    }

Control flow graph:

    control ingress {
        apply(l2_table);
        if (valid(ipv4)) {
            apply(ipv4_table);
        }
        if (valid(ipv6)) {
            apply(ipv6_table);
        }
        apply(acl);
    }

[Figure: the control flow graph (L2, v4, v6, ACL) beside the compiler target: Parser, match tables with action macros, Queues.]

What does reconfigurability buy us?

Benefits of Reconfigurability

•  Use resources efficiently
   – Multiple tables per stage
   – Big table in multiple stages

•  Use fewer stages

[Figure: pipeline holding the L2, IPv4, IPv6, and ACL tables.]
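A minimal sketch of the second sub-bullet, a big table spread over multiple stages. The function name, the entry counts, and the per-stage free capacities are illustrative assumptions, not from the talk, and memory is counted in single entries for simplicity.

```python
def split_across_stages(entries, free_per_stage):
    """Spread `entries` table entries over stages, filling earlier stages first.

    `free_per_stage[i]` is the hypothetical number of entries still free in stage i.
    Returns a list of (stage, entries_placed), or None if the table does not fit.
    """
    placement = []
    for stage, free in enumerate(free_per_stage):
        take = min(entries, free)
        if take:
            placement.append((stage, take))
            entries -= take
    if entries:
        return None  # ran out of stages before the whole table was placed
    return placement


# A 10-entry table; stage 1 is already full because other tables live there.
placement = split_across_stages(10, [4, 0, 4, 4])
print(placement)

# The same table does not fit in a 2-stage pipeline with 4 free entries each.
too_small = split_across_stages(10, [4, 4])
print(too_small)
```

Because a table can take whatever is left in a stage, several small tables can share one stage and one large table can span several, which is what a fixed-function pipeline cannot do.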

Naïve Mapping: Control Flow Graph

[Figure: the control flow graph (L2, v4, v6, ACL) mapped table by table onto the switch pipeline: Parser; L2, IPv4, IPv6, and ACL tables in successive match tables, each with its action macro; Queues.]

Table Dependency Graph (TDG)

[Figure: the control flow graph (L2, v4, v6, ACL) and the table dependency graph derived from it.]
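One way to picture how a table dependency graph could be derived: describe each table by the fields it matches on and the fields its actions may write, then add an edge whenever an earlier table writes a field that a later table reads or writes. The field names and this single dependency rule are simplifying assumptions for illustration; they are not taken from the slides.

```python
# Hypothetical tables in control-flow order: (name, fields matched, fields written).
TABLES = [
    ("L2",  {"eth.dst"},                              {"meta.vrf"}),
    ("v4",  {"meta.vrf", "ipv4.dst"},                 {"meta.nexthop_v4"}),
    ("v6",  {"meta.vrf", "ipv6.dst"},                 {"meta.nexthop_v6"}),
    ("ACL", {"meta.nexthop_v4", "meta.nexthop_v6"},   {"meta.drop"}),
]


def build_tdg(tables):
    """Return edges (earlier, later): `later` touches a field that `earlier` writes."""
    edges = []
    for i, (a, _a_reads, a_writes) in enumerate(tables):
        for b, b_reads, b_writes in tables[i + 1:]:
            if a_writes & (b_reads | b_writes):
                edges.append((a, b))
    return edges


edges = build_tdg(TABLES)
print(edges)
```

In this made-up program v4 and v6 share no written field, so no edge joins them even though the control flow lists one after the other.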

Efficient Mapping: TDG

[Figure: the control flow graph and the table dependency graph (L2, v4, v6, ACL) beside the switch pipeline: Parser; L2, IPv4, IPv6, and ACL tables with their action macros; Queues.]
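A rough sketch of why the TDG helps: if the only rule is that a table must sit in a later stage than every table it depends on, the earliest stage of each table is its longest-path depth in the TDG. The edge list below is a hypothetical TDG, and resource limits are ignored entirely.

```python
from collections import defaultdict

# Hypothetical TDG: an edge (a, b) means table b must be placed after table a.
TABLES = ["L2", "v4", "v6", "ACL"]  # listed in control-flow (topological) order
EDGES = [("L2", "v4"), ("L2", "v6"), ("v4", "ACL"), ("v6", "ACL")]


def earliest_stages(tables, edges):
    """Longest-path level of each table; assumes `tables` is topologically ordered."""
    preds = defaultdict(list)
    for a, b in edges:
        preds[b].append(a)
    stage = {}
    for t in tables:
        # One stage after the deepest predecessor, or stage 0 if there is none.
        stage[t] = 1 + max((stage[p] for p in preds[t]), default=-1)
    return stage


stage = earliest_stages(TABLES, EDGES)
num_stages = 1 + max(stage.values())
print(stage, num_stages)
```

With these assumed edges v4 and v6 land in the same stage, so the program needs 3 stages instead of the 4 used by a one-table-per-stage mapping.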

Resource constraints

[Figure: the control flow graph (L2, v4, v6, ACL) mapped onto the switch pipeline (Parser, L2 Table, IPv4, IPv6, ACL Table, Queues) with the L2, v4, and v6 action macros.]

More resource constraints

•  Header widths
•  Action ALU input
•  Memory type
•  Table parallelism
•  Action memory

The Compiler Problem

Map match-action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.

Step 1: P4 Program

Step 2: Control Flow Graph

Step 3: Table Dependency Graph

Step 4: Table Configuration

Is that it?

Two Switches We Studied

•  RMT: 32 stages (SIGCOMM 2013)
•  FlexPipe: 5 stages (Intel FM6000)

Additional switch features

•  Table shaping in RMT
•  Table sharing in FlexPipe

[Figure: L2, v4, v6, and ACL tables illustrating table shaping (RMT) and table sharing (FlexPipe).]

The Compiler Problem

Map match-action tables in a TDG to a switch pipeline while respecting dependency and resource constraints.

•  Table shaping
•  Table sharing
•  Header widths
•  Action ALU input
•  Memory type
•  Table parallelism
•  Action memory

First approach: Greedy

•  Prioritize one constraint
•  Sort tables
•  Map tables one at a time

Example sort orders: by number of dependencies, or by match width.
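The three bullets can be sketched as follows. This is a toy version under stated assumptions, not the heuristic from the talk: each table is a single indivisible block count, every stage has the same capacity, and the table names and sizes are made up. The sort metric is passed in as `sort_key`, so different greedy variants differ only in that one function.

```python
def greedy_map(sizes, edges, capacity, num_stages, sort_key):
    """Place tables one at a time, best `sort_key` first, in the earliest stage that fits.

    sizes: table -> memory blocks; edges: (a, b) means b must come after a.
    Returns table -> stage, or None if some table cannot be placed.
    """
    preds = {t: [a for a, b in edges if b == t] for t in sizes}
    order, remaining = [], set(sizes)
    while remaining:
        # Only tables whose predecessors are already ordered are candidates.
        ready = [t for t in sorted(remaining)
                 if all(p not in remaining for p in preds[t])]
        pick = min(ready, key=sort_key)
        order.append(pick)
        remaining.remove(pick)

    free = [capacity] * num_stages
    mapping = {}
    for t in order:
        first = 1 + max((mapping[p] for p in preds[t]), default=-1)
        for s in range(first, num_stages):
            if free[s] >= sizes[t]:
                free[s] -= sizes[t]
                mapping[t] = s
                break
        else:
            return None  # no stage had room for this table
    return mapping


SIZES = {"A": 4, "B": 3, "C": 3, "D": 2}  # hypothetical, independent tables
small_first = greedy_map(SIZES, [], capacity=6, num_stages=2,
                         sort_key=lambda t: SIZES[t])
large_first = greedy_map(SIZES, [], capacity=6, num_stages=2,
                         sort_key=lambda t: -SIZES[t])
print(small_first)
print(large_first)
```

On this input, sorting smallest-first fails to fit the last table, while sorting largest-first fits all four in two stages: the outcome depends on which single metric the heuristic prioritizes.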

Too many constraints for Greedy

•  Any greedy heuristic must sort tables by a metric that is a fixed function of the constraints.

•  As the number of constraints grows, it gets harder for one fixed function to capture the interplay between all of them.

•  Can we do better than greedy?

Second approach: Integer Linear Programming (ILP)

Find an optimal mapping.

Pros:
•  Takes in all constraints
•  Different objectives
•  Solvers exist (CPLEX)

Cons:
•  Black box solver
•  Encoding is an art
•  Slow

ILP Setup

min # stages

subject to:
•  dependency constraints
•  table sizes assigned ≥ table sizes specified
•  memories assigned ≤ memories in physical stage
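To make the objective and constraints concrete without a solver, here is a brute-force stand-in that enumerates every table-to-stage assignment and keeps one using the fewest stages. It is not an ILP encoding and does not call CPLEX; the table sizes, the TDG edges, and the per-stage capacity are hypothetical, and each table is assumed to live in exactly one stage.

```python
from itertools import product

SIZES = {"L2": 2, "v4": 3, "v6": 3, "ACL": 4}  # hypothetical sizes in memory blocks
EDGES = [("L2", "v4"), ("L2", "v6"), ("v4", "ACL"), ("v6", "ACL")]  # hypothetical TDG
CAPACITY = 6     # hypothetical memory blocks per physical stage
MAX_STAGES = 4


def feasible(mapping):
    """Check the dependency constraints and the per-stage memory capacity."""
    if any(mapping[a] >= mapping[b] for a, b in EDGES):
        return False
    for s in range(MAX_STAGES):
        if sum(SIZES[t] for t in mapping if mapping[t] == s) > CAPACITY:
            return False
    return True


def min_stage_mapping():
    """Exhaustively search all assignments; return (stages used, mapping) or None."""
    tables = list(SIZES)
    best = None
    for stages in product(range(MAX_STAGES), repeat=len(tables)):
        mapping = dict(zip(tables, stages))
        if feasible(mapping):
            used = 1 + max(stages)
            if best is None or used < best[0]:
                best = (used, mapping)
    return best


best = min_stage_mapping()
print(best)
```

The search space here is MAX_STAGES to the power of the number of tables, which is fine for four tables and hopeless for a real program; that gap is what an ILP solver is meant to close.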

Experiment Setup

•  4 datacenter use cases from Intel, Barefoot

•  Differ in tables, table sizes, and dependencies

Example Use Case

A Typical TDG

[Figure: TDG with nodes IPv6-Mcast, EG-ACL1, EG-Phy-Meta, IG-Agg-Int, IG-Dmac, IPv4-Mcast, IPv4-Nexthop, IPv6-Nexthop, IG-Props, IG-Router-Mac, Ipv4-Ecmp, IG-Smac, Ipv4-Ucast-LPM, Ipv4-Ucast-Host, Ipv6-Ucast-Host, Ipv6-Ucast-LPM, Ipv6-Ecmp, IG_ACL2, IG_Bcast_Storm, Ipv4_Urpf, Ipv6_Urpf, IG_ACL1, EG_Props, IG_Phy_Meta.]

Configuration for RMT

Metrics: Greedy vs ILP

1.  Ability to fit program in chip

2.  Optimality

3.  Run time

Setup: Greedy vs ILP

1.  Ability to fit: FlexPipe
    – Variants of use cases in 5-stage pipeline.

2.  Optimality: RMT
    – Minimum stage, pipeline latency, power

3.  Run time: both switches

Results: Greedy vs ILP

1.  Can Greedy fit my program?
    – Yes, if resources are plentiful (RMT, 32 stages)
    – No, if resources are constrained (FlexPipe, 5 stages): can’t fit 25% of programs.

2.  How close to optimal is Greedy?
    – 30% more time for a packet to get through the RMT pipeline.

3.  Hmm.. looks like I need ILP. How slow is it?
    – 100x slower than Greedy
    – Reasonable if programs don’t change often.

If we have time, we should run ILP.

Use ILP to suggest best Greedy for program type.

Critical constraints
•  Dependency critical: 16 → 13 stages
•  Additional resource constraints less important

Critical resources
•  TCAM memories critical: 16 → 14 stages
   – Results for one of our datacenter L2/L3 use cases

Conclusion

•  Challenge: Parallelism and constraints in reconfigurable chips make compiling difficult.

•  TDG: highlights parallelism in program.

•  ILP: better if there is enough time, fitting is critical, or objectives are complicated.

•  Best Greedy: ILP can choose via notion of critical constraints and critical resources.

Thank you!

ILP Run Time

•  Number of constraints? Not obvious. E.g., RMT
   – Min. stage: few secs.
   – Min. power: few secs.
   – Min. pipeline latency: 10x slower

•  Number of variables? How fine-grained is the resource assignment? E.g., FlexPipe
   – One match entry at a time: many days
   – 100-500 match entries at a time: < 1 hr
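A back-of-the-envelope sketch of why granularity matters for the variable count. It assumes a hypothetical encoding with one decision variable per (table, stage, chunk of entries); the talk does not spell out its encoding, and the table sizes below are made up.

```python
import math


def num_assignment_vars(table_entries, num_stages, chunk):
    """Variables in a hypothetical encoding: one per (table, stage, chunk of entries)."""
    return sum(math.ceil(n / chunk) * num_stages for n in table_entries.values())


ENTRIES = {"L2": 16000, "v4": 64000, "ACL": 8000}  # made-up table sizes

fine = num_assignment_vars(ENTRIES, num_stages=5, chunk=1)      # one entry at a time
coarse = num_assignment_vars(ENTRIES, num_stages=5, chunk=500)  # 500 entries at a time
print(fine, coarse, fine // coarse)
```

Under this assumption, assigning 500 entries at a time shrinks the problem by a factor of 500, which is in the spirit of the run-time gap reported on the slide.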
