VPP Host Stack: Transport and Session Layers
Florin Coras, Dave Barach, Keith Burns, Dave Wallace
EFFICIENCY
PERFORMANCE
SOFTWARE DEFINED NETWORKING
CLOUD NETWORK SERVICES
LINUX FOUNDATION
VPP - A Universal Terabit Network Platform for Native Cloud Network Services
Superior Performance
Most Efficient on the Planet
Flexible and Extensible
Open Source
Cloud Native
Breaking the Barrier of Software Defined Network Services
1 Terabit Services on a Single Intel® Xeon® Server!
Motivation: Container networking
FD.io Mini-Summit at KubeCon 2017
[Diagram: PID 1234 calls send() and PID 4321 calls recv() via glibc; in the kernel, each side has its own FIFO, TCP, IP (routing), and device layers]
Motivation: Container networking
[Diagram: the same two stacks (PID 1234 send(), PID 4321 recv(); FIFO, TCP, IP (routing), device) with VPP between them, reached through af_packet and dpdk devices; VPP layers: Ethernet, IP4/6, MPLS, ACL, SR, VXLAN, LISP, etc.]
Why not this?
[Diagram: PID 1234 send() and PID 4321 recv() each use a FIFO attached to VPP, which runs Session, TCP, IP, and DPDK]
VPP Host Stack
[Diagram: App connected to VPP over the binary API and a shm segment holding rx/tx fifos; VPP layers: Session, TCP, IP, DPDK]
VPP Host Stack: Session Layer
§ Maintains per-app state and conveys session events to/from apps
§ Allocates and manages sessions/segments/fifos
§ Isolates network resources via namespacing
§ Session lookup tables (5-tuple) and local/global session rule tables (filters)
§ Support for pluggable transport protocols
§ Binary/native C API for external/builtin applications
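The 5-tuple lookup can be illustrated with a small sketch. This is not the VPP implementation: the `FiveTuple` and `SessionTable` names, the integer session handles, and the fallback from an established session to a listener are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Hypothetical key type: the classic connection 5-tuple."""
    proto: str
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class SessionTable:
    """Toy lookup table mapping 5-tuples to session handles (illustrative only)."""

    def __init__(self) -> None:
        self._established = {}  # FiveTuple -> session handle
        self._listeners = {}    # (proto, local_ip, local_port) -> listener handle

    def add_listener(self, proto: str, ip: str, port: int, handle: int) -> None:
        self._listeners[(proto, ip, port)] = handle

    def add_session(self, key: FiveTuple, handle: int) -> None:
        self._established[key] = handle

    def lookup(self, key: FiveTuple) -> Optional[int]:
        # Prefer an exact 5-tuple match; otherwise fall back to a listener
        # bound to the local (proto, ip, port) triple.
        handle = self._established.get(key)
        if handle is not None:
            return handle
        return self._listeners.get((key.proto, key.local_ip, key.local_port))
```

A packet for an established connection resolves to that connection's session, while a first SYN for a bound port resolves to the listener.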
VPP Host Stack: SVM FIFOs
§ Allocated within shared memory segments
§ Fixed position and size
§ Lock-free enqueue/dequeue but atomic size increment
§ Option to dequeue/peek data
§ Support for out-of-order data enqueues
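A minimal sketch of a fixed-size ring FIFO with enqueue, dequeue, and peek follows. It is an illustration under assumptions, not the SVM FIFO code: the `RingFifo` class and its field names are hypothetical, plain Python integers stand in for the atomically updated size, and out-of-order enqueue is left out to keep it short.

```python
class RingFifo:
    """Toy single-producer/single-consumer byte FIFO (illustrative only)."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)
        self._size = size
        self._head = 0     # read index, touched only by the consumer
        self._tail = 0     # write index, touched only by the producer
        self._cursize = 0  # shared byte count; a real shared-memory
                           # implementation would update it atomically

    def enqueue(self, data: bytes) -> int:
        """Copy in as many bytes as fit; return the number accepted."""
        n = min(len(data), self._size - self._cursize)
        for i in range(n):
            self._buf[(self._tail + i) % self._size] = data[i]
        self._tail = (self._tail + n) % self._size
        self._cursize += n
        return n

    def peek(self, offset: int, length: int) -> bytes:
        """Read up to `length` bytes at `offset` without consuming them."""
        if offset >= self._cursize:
            return b""
        n = min(length, self._cursize - offset)
        start = self._head + offset
        return bytes(self._buf[(start + i) % self._size] for i in range(n))

    def dequeue(self, length: int) -> bytes:
        """Read and consume up to `length` bytes."""
        data = self.peek(0, length)
        self._head = (self._head + len(data)) % self._size
        self._cursize -= len(data)
        return data
```

Because the producer only moves the tail and the consumer only moves the head, the byte count is the only field both sides modify, which is why a single atomic update can replace a lock.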
VPP Host Stack: TCP
§ Clean-slate implementation
§ “Complete” state machine implementation
§ Connection management and flow control (window management)
§ Timers and retransmission, fast retransmit, SACK
§ NewReno congestion control, SACK-based fast recovery
§ Checksum offloading
§ Linux compatibility tested with IWL TCP protocol tester
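The NewReno window arithmetic can be sketched as below. This is a simplified reading of the textbook algorithm (slow start, congestion avoidance, fast retransmit on three duplicate ACKs, partial-ACK handling), not the VPP TCP code. The class and parameter names are hypothetical, sequence-number wraparound is ignored, and the SACK-based recovery listed on the slide is not modeled.

```python
class NewRenoSketch:
    """Toy NewReno congestion window bookkeeping (illustrative only)."""

    def __init__(self, mss: int = 1000, init_cwnd_segments: int = 2) -> None:
        self.mss = mss
        self.cwnd = init_cwnd_segments * mss
        self.ssthresh = 1 << 30
        self.dup_acks = 0
        self.in_recovery = False
        self.recover = 0  # highest sequence sent when loss was detected

    def on_dup_ack(self, flight_size: int, snd_nxt: int) -> bool:
        """Return True when a fast retransmit should be triggered."""
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += self.mss  # inflate for each further duplicate ACK
            return False
        if self.dup_acks == 3:
            self.ssthresh = max(flight_size // 2, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss
            self.recover = snd_nxt
            self.in_recovery = True
            return True
        return False

    def on_new_ack(self, ack_seq: int, bytes_acked: int) -> bool:
        """Return True when a partial ACK calls for another retransmit."""
        self.dup_acks = 0
        if self.in_recovery:
            if ack_seq >= self.recover:
                # Full ACK: deflate the window and leave recovery.
                self.cwnd = self.ssthresh
                self.in_recovery = False
                return False
            # Partial ACK: deflate by the acked amount, add back one MSS.
            self.cwnd = max(self.cwnd - bytes_acked + self.mss, self.mss)
            return True
        if self.cwnd < self.ssthresh:
            self.cwnd += min(bytes_acked, self.mss)  # slow start
        else:
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)  # avoidance
        return False
```

The partial-ACK branch is what distinguishes NewReno from plain Reno: the sender stays in recovery and retransmits the next hole instead of waiting for a timeout.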
VPP Host Stack: Comms Library (VCL)
§ Comms library (VCL) apps can link against
§ LD_PRELOAD library for legacy apps
§ epoll
Application Attachment
[Diagram: the App uses the binary API to attach, then bind (server) or connect (client); a shm segment sits between the App and VPP (Session, TCP, IP, DPDK)]
Session Establishment
[Diagram sequence: a Client and a Server app, each connected over the binary API to its own VPP instance (Session, TCP, IP, DPDK)]
§ Server: attach, bind; TCP: listen
§ Client: attach, connect; TCP: open
§ TCP handshake between the two VPP instances
§ Session layer notifications: new client (server side), connect succeeded (client side)
§ Client receives connect reply, server receives accept notify; each app gets a shm segment with rx/tx fifos
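The ordering of these steps can be made concrete with a toy model. It is a sketch under loud assumptions: `ToySessionLayer` and its methods are hypothetical names rather than the binary API, both VPP instances are collapsed into one object, and the TCP handshake is assumed to succeed.

```python
class ToySessionLayer:
    """Illustrative stand-in for the session layer; not the VPP API."""

    def __init__(self) -> None:
        self.attached = set()  # apps that completed attach
        self.listeners = {}    # (ip, port) -> server app name
        self.events = []       # notifications delivered to apps, in order

    def attach(self, app: str) -> None:
        self.attached.add(app)

    def bind(self, app: str, ip: str, port: int) -> None:
        self._require_attached(app)
        self.listeners[(ip, port)] = app  # transport starts listening

    def connect(self, app: str, ip: str, port: int) -> bool:
        self._require_attached(app)
        server = self.listeners.get((ip, port))
        if server is None:
            self.events.append((app, "connect reply", "failed"))
            return False
        # The handshake is assumed to succeed; both sides are then notified
        # and would receive their rx/tx fifos.
        self.events.append((server, "accept notify", "fifos allocated"))
        self.events.append((app, "connect reply", "fifos allocated"))
        return True

    def _require_attached(self, app: str) -> None:
        if app not in self.attached:
            raise RuntimeError(f"{app} must attach before bind/connect")
```

Running attach and bind for a server, then attach and connect for a client, leaves an accept notify followed by a connect reply in `events`.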
Data Transfer
[Diagram: Client and Server apps, each with rx/tx fifos shared with its VPP instance (Session, TCP, IP, DPDK). Labels: write, read, copy to fifo, copy to buffer, tx write evt and rx write evt over the binary API; TCP provides congestion control and reliable transport]
Not yet part of CSIT, but some rough numbers on an E2690: ~200k CPS and ~12 Gbps/core!
Redirected Connections (Cut-through)
[Diagram sequence: a Client and a Server app sharing one VPP instance (Session, TCP, IP, DPDK)]
§ Server: bind over the binary API
§ Client: connect; the session layer answers with a redirect over the binary API
Throughput is memory bandwidth constrained: ~120 Gbps!
Multi-threading
[Diagram: App1 connected over the binary API; Core 0 and Core 1 each run their own Session, TCP, and IP with rx/tx fifos, above DPDK]
§ Connections/sessions ’pinned’ to a thread
§ Per-thread data structures/state
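One way to picture pinning is a per-thread session pool indexed by a deterministic function of the connection. The sketch below is hypothetical: the `PerThreadSessions` class and the CRC32-based thread choice are assumptions for illustration, and the slides do not say how a thread is actually selected.

```python
import zlib


class PerThreadSessions:
    """Toy per-thread session pools (illustrative only)."""

    def __init__(self, n_threads: int) -> None:
        # One private dict per worker thread, so lookups need no locking
        # as long as each thread only touches its own pool.
        self.pools = [dict() for _ in range(n_threads)]

    def thread_for(self, five_tuple: tuple) -> int:
        # Assumed policy: hash the 5-tuple so every packet of a connection
        # maps to the same thread.
        key = "|".join(map(str, five_tuple)).encode()
        return zlib.crc32(key) % len(self.pools)

    def add(self, five_tuple: tuple, session: object) -> int:
        idx = self.thread_for(five_tuple)
        self.pools[idx][five_tuple] = session
        return idx

    def lookup(self, five_tuple: tuple):
        return self.pools[self.thread_for(five_tuple)].get(five_tuple)
```

Since a connection always resolves to the same pool, its state is never shared between threads.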
Features: Namespaces
[Diagram: App connected to VPP over the binary API; namespaces ns1, ns2, ns3, each with its own Session, TCP, and IP; fib tables fib1 and fib2]
Request access to vpp ns + secret
Namespaces are configured independently and associate applications to network layer resources like interfaces and fib tables
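The "ns + secret" request can be sketched as a registry check. Everything here is an assumption for illustration: the `AppNamespace` fields, the `NamespaceRegistry` class, and the use of an integer secret are hypothetical and do not reflect the actual attach message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppNamespace:
    """Hypothetical namespace record tying an id to network resources."""
    ns_id: str
    secret: int
    fib_table: str
    interface: Optional[str] = None


class NamespaceRegistry:
    """Toy registry that grants access on a matching ns id and secret."""

    def __init__(self) -> None:
        self._namespaces = {}

    def configure(self, ns: AppNamespace) -> None:
        # Namespaces are configured independently of any application.
        self._namespaces[ns.ns_id] = ns

    def attach(self, app_name: str, ns_id: str, secret: int) -> AppNamespace:
        ns = self._namespaces.get(ns_id)
        if ns is None or ns.secret != secret:
            raise PermissionError(f"{app_name}: access to {ns_id!r} denied")
        # The returned record tells the caller which fib table and
        # interface the application is associated with.
        return ns
```

An application that presents the right secret is associated with that namespace's fib table; any other request is refused.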
Features: Session Tables
[Diagram: App1 connected over the binary API; one NS local session table per namespace (ns1, ns2), each with TCP, and a global session table for fib1]
Request access to global and/or local scope
§ Both tables have a “rules table” that can be used for filtering
§ Local tables are namespace specific and can be used for egress filtering
§ Global tables are fib table specific and can be used for ingress filtering
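The split between local (egress) and global (ingress) rules can be sketched as two instances of the same table type. This is a hypothetical model: the `RulesTable` class, first-match-wins ordering, the default "allow", and port 0 as a wildcard are all assumptions, not the session rule table semantics.

```python
import ipaddress


class RulesTable:
    """Toy filter: ordered (prefix, port, action) rules, first match wins."""

    def __init__(self) -> None:
        self._rules = []

    def add_rule(self, prefix: str, port: int, action: str) -> None:
        # Port 0 is treated as a wildcard in this sketch.
        self._rules.append((ipaddress.ip_network(prefix), port, action))

    def check(self, ip: str, port: int) -> str:
        addr = ipaddress.ip_address(ip)
        for net, rule_port, action in self._rules:
            if addr in net and rule_port in (0, port):
                return action
        return "allow"  # assumed default when nothing matches


# Local tables are keyed by namespace and consulted on connect (egress);
# global tables are keyed by fib table and consulted on accept (ingress).
local_tables = {"ns1": RulesTable()}
global_tables = {"fib1": RulesTable()}
local_tables["ns1"].add_rule("10.0.0.0/8", 80, "deny")
global_tables["fib1"].add_rule("192.168.0.0/16", 0, "deny")
```

With these rules, an app in ns1 cannot open connections to port 80 in 10.0.0.0/8, and nothing arriving on fib1 from 192.168.0.0/16 is accepted.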
Ongoing work
• Overall integration with k8s
  • Istio/Envoy
• TCP
  • Rx policer/tx pacer
  • TSO
  • New congestion control algorithms
  • PMTU discovery
  • Optimization/hardening/testing
• VCL/LD_PRELOAD
  • Iperf, nginx, wget, curl
Next steps – Get involved
• Get the Code, Build the Code, Run the Code
  • Session layer: src/vnet/session
  • TCP: src/vnet/tcp
  • SVM: src/svm
  • VCL: src/vcl
• Read/Watch the Tutorials
• Read/Watch VPP Tutorials
• Join the Mailing Lists