©2015 Check Point Software Technologies Ltd.
Overview
Kirill Tsym, Next Generation Enforcement team
FD.IO VECTOR PACKET PROCESSING
CHECK POINT SOFTWARE TECHNOLOGIES
The largest pure-play security vendor in the world
Protecting more than 100,000 companies with millions of users worldwide
$1.63B annual revenues in 2015
Over 4,300 employees
Partners in over 95 countries
Lecture agenda
– Linux networking stack vs. user space networking initiatives
  – Why user space networking? Why so many projects around it?
– Introduction to FD.io and VPP
  – Architecture, vectors, graph, etc.
– VPP data path
  – Typical graphs
  – Examples of supported topologies
– VPP threads and scheduling
– Single and multicore support
– Supported topologies
LINUX KERNEL STACK
01
Linux kernel data path

[Diagram: NIC1/NIC2 (HW, Rx/Tx) → kernel-space drivers → TCP/IP stack / forwarding → user-space applications]
Design goals, or why the stack is in the kernel:
– Linux is designed as an Internet Host (RFC 1122), an "End-System" OS
– It needs to service multiple applications
– It separates user applications from sensitive kernel code
– It makes applications as simple as possible
– It gets direct access to HW drivers

Cost:
– Not optimized for forwarding
– Every change requires a new kernel version
– The code is too generic
– The networking stack today is a huge part of the kernel
[Diagram: application path vs. pass-through path across layers L1–L7 — drivers (L1/L2), L3/L4, sockets (L5), applications (L7)]

Reference: Kernel Data Path
Linux stack whole picture
Reference: Network_data_flow_through_kernel
Linux stack packet processing

Packets are processed in the kernel one by one
– A lot of code is involved in processing each packet
– The processing path is monolithic; it is impossible to change it or load new stack modules
– Impossible to achieve instruction-cache optimization in this model
– There are techniques to hijack kernel routines or define hooks, but no simple, standard way to replace tcp_input(), for example

skb processing is not cache optimized
– The sk_buff struct includes too much information
– Ideally, all needed sk_buff's would be loaded into cache before processing
– But an skb neither fits in a cache line nor is placed in a chain
– As a result there is no data-cache optimization, and usually many cache misses

Every change requires a new kernel version
– Upstreaming a new protocol takes a very long time
– Standardization moves much faster than implementation
USER SPACE NETWORKING PROJECTS
02
Netmap

[Diagram: the application talks to the netmap API; netmap rings shadow the NIC rings in user space, while the Linux networking stack stays in kernel space]
Pros
– BSD, Linux and Windows ports
– Good scalability
– The data path is detached from the host stack
– Widely adopted

Cons
– No networking stack
– Routing is done in the host stack, which slows down initial processing
Performance

Packet forwarding    Mpps
FreeBSD bridging     0.690
Netmap + libpcap     7.5
Netmap               14.88

Reference: netmap - the fast packet I/O framework
DPDK / Forwarding engine

[Diagram: NIC1/NIC2 → DPDK fast path in user space; a slow path passes packets through the Kernel Networking Interface to the Linux networking stack and back]
Pros
– Kernel independent
– All packet processing done in user space
– The DPDK fast path is cache optimized and uses a minimum of instructions

Cons
– No networking stack
– No routing stack
– Needs to send packets to the kernel for routing decisions
– Doesn't perform well in scaling tests
– No external API
– No integration with management
– Out-of-tree drivers
OpenFastPath
– BSD networking stack on top of DPDK and ODP
– OpenDataPlane (ODP) is a cross-platform, open source data plane API for SoC networking
– Supported by Nokia, ARM, Cavium and ENEA
– Includes optimized IP, UDP and TCP stacks
– Routes and MACs are kept in sync with Linux through Netlink
Other projects

OpenSwitch
– OS with main component: DPDK-based Open vSwitch
– Various management and CLI daemons
– Routing decisions made by the Linux kernel (Ouch!)
– REST API
– Good for inter-VM communications

OpenOnload
– A user-level network stack from Solarflare
– Depends on Solarflare NICs (Ouch!)

• IO Visor
– XDP, or eXpress Data Path
– Not user space networking!
– Tries to bring performance into the existing kernel with BPF
– No need for 3rd-party code
– Allows the option of busy polling
– No need to allocate large pages
– No need for dedicated CPUs
FD.IO
03
FD.io project overview
• FD.io is a Linux Foundation project
– A collection of several projects based on the Data Plane Development Kit (DPDK)
– Distributed under the Apache license
– A key project, Vector Packet Processing (VPP), was donated by Cisco
– A proprietary version of VPP runs in the Cisco CRS1 router
– There is no tool chain, OS, etc. in the open-sourced VPP version
– VPP is about 300K lines of code
– Major contributor: Cisco Chief Technology and Architecture Office team

• Three main components
– Management Agent
– Packet Processing
– IO

• VPP roadmap
– First release, June 16, includes 14 Mpps single-core L3 performance
– 16.09 release includes integration with containers and orchestration
– 17.01 release will include dpdk-16.11, DPDK CryptoDev, enhanced NAT, etc.
VPP ideas
• CPU cycle budget
– 14 Mpps on a 3.5 GHz CPU = a budget of 250 cycles per packet
– A memory access is ~67 ns, the cost of fetching one cache line (64 bytes), or 134 CPU cycles
– A cache miss is unacceptable

• Solution
– Perform all the processing with a minimum of code
– Process more than one packet at a time
– Grab all available packets from the Rx ring on every cycle
– Perform each atomic task in a dedicated node

• VPP optimization techniques
– Branch prediction hints
– Use of vector instructions (SSE, AVX)
– Prefetching: do not prefetch too much, so the cache stays warm
– Speculation: guess the packet's destination instead of doing a full lookup
– Dual loops
VPP architecture

[Diagram: NIC1/NIC2 → DPDK in user space → VPP IP stack plus VPP plugins; kernel space is bypassed]
VPP Pros
– Kernel independent
– All packet processing done in user space
– DPDK based (or netmap, virtio, host, etc.)
– Includes a full-scale L2/L3 networking stack
– Routing decisions are made by VPP
– Also includes a bridge implementation
– Good plugin framework
– Integrated with external management: Honeycomb

Cons
– Young project
– First stable release ~06/16
– Many open areas:
  – OpenStack integration / Neutron
  – Lack of transport layer integration
  – Control plane API & stack

But what about L4/L7?
– The TLDK project
[Diagram labels: HW fast path; VPP I/O tasks — I/O polling logic + L2; L3 tasks; user-defined tasks]
Performance
– VPP data plane throughput is not impacted by large IPv4 FIB size
– OVS-DPDK data plane throughput is heavily impacted by IPv4 FIB size
– VPP and OVS-DPDK were tested on a Haswell x86 platform with E5-2698v3 2x16C 2.3 GHz (Ubuntu 14.04 trusty)

Reference: FD.io intro
TLDK
VPP TLDK application layers (project)

[Diagram: DPDK fast path (NIC1/NIC2, HW) feeds VPP in user space; on top sit a purpose-built TLDK application, a socket application on a BSD socket layer, and a native Linux application via an LD_PRELOAD socket layer]
TLDK Application Layer
– Uses the TLDK library to process TCP and UDP packets

Purpose-Built Application
– Uses the TLDK API directly (as a VPP node)
– Provides the highest performance

BSD Socket Layer
– A standard BSD socket layer for applications that use sockets by design
– Lower performance, but good compatibility

LD_PRELOAD Socket Layer
– Used to allow a native Linux binary application to be ported into the system
– Allows an existing application to work without any change
VPP Nodes and Graph

[Diagram: Node 1 through Node 6 connected in a directed graph; a vector of packets flows between them]

– Processing is divided per node
– A node works on a vector of packets
– Nodes are connected in a graph
– The graph can be changed dynamically
DATA PATH
Data path - ping

• Full zero copy
• Data always resides in huge-pages memory
• The vector is passed from graph node to node during processing

[Graph, on Core 0: dpdk-input → ethernet-input → ipv4-input → ipv4-local → ipv4-icmp-input → ipv4-icmp-echo-request → ipv4-rewrite-local → GigabitEthernet-output → GigabitEthernet-tx]

– Packets are placed into huge-pages memory by the NIC
– The vector of packet pointers is created during input-device work
Vector processing – split example

[Graph: input-device (DPDK) → ethernet-input; the input vector is split into output vector A → ipv4-input and output vector B → ipv6-input, then on to GigabitEthernet-output → GigabitEthernet-tx]

– Transmit queue: packets are reordered
– The next node is called twice by the threads scheduler
Vector processing – cloning example

[Graph: dpdk-input → ethernet-input → ipv4-input → ipv4-frag → GigabitEthernet-output → GigabitEthernet-tx → transmit queue; ipv4-frag's output vector holds ~2x the input packets]

– Max vector size is 256; if the output vector is full, two vectors will be created
Rx features example: IPsec flow

[Graph, inbound: dpdk-input → ethernet-input → ipv4-input → ipsec-if-input → esp-decrypt → ipv4-local; outbound: ipsec-if-output → esp-encrypt → ipv4-rewrite-local → GigabitEthernet-output → GigabitEthernet-tx]

– The ipsec-if node is dynamically registered to receive IPsec traffic using Rx Features when the interface comes up
– This is done through the rewrite adjacency
THREADS AND SCHEDULING
Threads scheduling

[Restricted] ONLY for designated groups and individuals

One VPP scheduling cycle:

PRE-INPUT
– Purpose: Linux input and system control
– Examples: unix_epoll_input, dhcp-client, management stack interface

INPUT
– Purpose: packet input
– Examples: dpdk_io_input, dpdk_input, tuntap_rx

INTERRUPTS
– Purpose: run suspended processes
– Example: expired timers

PENDING NODES DISPATCH
– Purpose: process all vectors that need additional processing after changes
– Runs on: worker threads / main

INTERNAL NODES DISPATCH
– Purpose: process all pending vectors on the VPP graph
– Runs on: worker threads / main
– Main work: L2/L3 stack processing and Tx
Threads zoom-in
vpp# show run
Time 9.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
  vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
             Name                    State      Calls  Vectors  Suspends   Clocks  Vectors/Call
admin-up-down-process             event wait        0        0         1   6.52e3      0.00
api-rx-from-ring                  active            0        0         6   1.04e5      0.00
cdp-process                       any wait          0        0         1   1.10e5      0.00
cnat-db-scanner                   any wait          0        0         1   5.34e3      0.00
dhcp-client-process               any wait          0        0         1   6.58e3      0.00
dpdk-process                      any wait          0        0         3   2.73e6      0.00
flow-report-process               any wait          0        0         1   6.19e3      0.00
gmon-process                      time wait         0        0         2   5.36e8      0.00
ip6-icmp-neighbor-discovery-ev    any wait          0        0        10   1.81e4      0.00
startup-config-process            done              1        0         1   2.64e5      0.00
unix-cli-stdin                    event wait        0        0         1   3.05e9      0.00
unix-epoll-input                  polling    24811921        0         0   9.48e2      0.00
vhost-user-process                any wait          0        0         1   3.24e4      0.00
vpe-link-state-process            event wait        0        0         1   7.10e3      0.00
vpe-oam-process                   any wait          0        0         5   1.37e4      0.00
vpe-route-resolver-process        any wait          0        0         1   9.52e3      0.00
vpp# exit
# ps -elf | grep vpp
4 R root 20566     1 92 80 0 - 535432 -      16:10 ?      00:00:27 vpp -c /etc/vpp/startup.conf
0 S root 20582  1960  0 80 0 -   4293 pipe_w 16:10 pts/34 00:00:00 grep --color=auto vpp
SINGLE AND MULTICORE MODES
VPP Threading modes

[Diagrams: per-mode core layouts, showing Rx/Tx placement across Core 0..Core 3]

• Single-threaded
– Both control and the forwarding engine run on a single thread

• Multi-threaded with workers only
– Control runs on the main thread (API, CLI)
– Forwarding is performed by one or more worker threads

• Multi-threaded with IO and workers
– Control on the main thread (API, CLI)
– IO threads handle input and dispatch to worker threads
– Worker threads do the actual work, including interface Tx
– RSS is in use

• Multi-threaded with Main and IO on a single thread
– Workers separated by core

Legend: Control / IO / Worker
SUPPORTED TOPOLOGIES
Router and Switch for namespaces
Reference
QUESTIONS?
VPP Capabilities
• Why VPP?
– The Linux kernel is good, but it moves too slowly because of backward compatibility
– Standardization today moves faster than implementations
– The main reason for VPP's speed is optimal usage of the ICACHE
– It does not trash the cache with packet-by-packet processing like the standard IP stack
– Separation of data plane and control plane; VPP is a pure data plane

• Main ideas
– Separation of data plane and control plane
– API generation; bindings available for Java, C and Python
– OpenStack integration
  – Neutron ML2 driver
  – OPNFV / ODL-GBP / ODL-SFC (service chaining: firewalls, NAT, QoS)

• Containers
– VPP could run in the host, connecting containers to each other
– VPP could run inside containers, which then talk to each other
Connection between various layers

[Graph: dpdk-input → ethernet-input → ip-input → udp-local → plugin]

– ethernet_register_input_type() registers the IPv4 handler with ethernet-input
– ip4_register_protocol() registers the UDP handler with ip-input
– vnet_hw_interface_rx_redirect_to_node() is defined in plugin code
– The next node is hardcoded in dpdk-input/handoff-dispatch
– [Legend: callback edges vs. data edges]
Output attachment point

[Graph: ipv4-input → ipv4-lookup → ipv4-rewrite-transit; L3 nodes → various L4 nodes → various post-routing nodes]

VPP Adjacency: a mechanism to add and rewrite the next node dynamically after the routing lookup. Available nodes:
– miss, drop, punt, local, rewrite, classify, map, map_t, sixrd, hop_by_hop
*Possible place for a POSTROUTING hook

VPP Rx features: a mechanism to add and rewrite the next node dynamically after ipv4-input. Available nodes:
– input acl (*Prerouting), source check rx, source check any, ipsec, vpath, lookup
*Currently impossible to do this from plugins