Tackling Parallelization Challenges of Kernel Network Stack for Container Overlay Networks
Jiaxin Lei*, Kun Suo+, Hui Lu*, Jia Rao+
*SUNY at Binghamton, +University of Texas at Arlington
Containers Are Widely Adopted by Industry
• OS-level virtualization
• Lightweight
• Higher consolidation density
• Lower operational cost
Overlay Networks Are the Technique for Container Connectivity
• Typical overlay network solutions: Docker Overlay, Flannel, Calico, Weave
• They are generally built on a tunneling approach, most commonly the VxLAN protocol.

[Figure: VxLAN encapsulation. The original L2 frame (Inner Ethernet Header | Inner IP Header | Payload) is wrapped into a VxLAN-encapsulated packet: Outer Ethernet Header | Outer IP Header | Outer UDP Header | VxLAN Header | original L2 frame | FCS. The VxLAN header carries flags, reserved bits, and the 24-bit VxLAN Network Identifier (VNI); a sketch of this layout follows.]
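The 8-byte header in the figure can be written down concretely. Below is a minimal C sketch of the layout defined in RFC 7348; the struct and helper names are illustrative, not taken from the kernel source.

```c
#include <stdint.h>

/* Sketch of the 8-byte VxLAN header (RFC 7348). Field layout only;
 * a real receiver must also validate the I flag and handle the outer
 * Ethernet/IP/UDP headers shown in the figure. */
struct vxlan_hdr {
    uint8_t flags;        /* bit 0x08 = I flag: VNI field is valid */
    uint8_t reserved1[3]; /* reserved, must be zero */
    uint8_t vni[3];       /* 24-bit VxLAN Network Identifier */
    uint8_t reserved2;    /* reserved, must be zero */
};

/* Extract the 24-bit VNI from a received header. */
static inline uint32_t vxlan_vni(const struct vxlan_hdr *h)
{
    return ((uint32_t)h->vni[0] << 16) |
           ((uint32_t)h->vni[1] << 8)  |
            (uint32_t)h->vni[2];
}
```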
Network Packet Processing Path
• Prolonged network packet processing path
• Additional overhead from virtual devices

Native (receiving side):
NIC → IRQ → Network Stack (SoftIRQ) → Container Applications
(kernel space → user space)

Overlay (receiving side):
NIC → IRQ → Network Stack (SoftIRQ) → VxLAN (packet decapsulation, SoftIRQ) → vBridge → Veth → Network Stack (SoftIRQ) → Container Applications
(kernel space → user space)
Existing Optimizations for Packet Processing
• NIC level: IRQ coalescing and multi-queue reduce and spread interrupt load.
• Kernel level: GRO (Generic Receive Offload) merges packets, and RPS (Receive Packet Steering) distributes SoftIRQ work across cores (see the RPS sketch below).

[Figure: the overlay receive path above, annotated with where each optimization applies.]
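As one concrete example of these knobs, RPS is configured per receive queue by writing a hexadecimal CPU bitmask to sysfs. A minimal sketch follows; the interface name eth0, queue rx-0, and the 10-core mask 0x3ff are assumptions to adjust for your hardware. Requires root.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Mask 0x3ff = allow receive SoftIRQ work on 10 CPUs. */
    const char *path = "/sys/class/net/eth0/queues/rx-0/rps_cpus";
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return EXIT_FAILURE;
    }
    fprintf(f, "3ff\n");
    fclose(f);
    return EXIT_SUCCESS;
}
```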
Experimental Settings

Hardware: CPU 2.2 GHz, 10 cores | Memory 64 GB | NIC 40 Gbps, multi-queue
Software: iPerf3 (throughput) | mpstat (CPU usage; a sampling sketch follows)

Three setups, each spanning Host1 and Host2:
• S1 - Native: iPerf3 → eth0 → eth0 → iPerf3
• S2 - Linux Overlay: iPerf3 → VxLAN → eth0 → eth0 → VxLAN → iPerf3
• S3 - Docker Overlay: iPerf3 (in container) → Veth0 → vBridge → VxLAN → eth0 → eth0 → VxLAN → vBridge → Veth0 → iPerf3 (in container)
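For context on the CPU-usage numbers that follow, the user/system/softIRQ breakdown can be sampled from /proc/stat, in the spirit of what mpstat reports. This is a minimal sketch, not the measurement code used in the study.

```c
#include <stdio.h>
#include <unistd.h>

struct cpu_sample {
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;
};

/* Read the aggregate "cpu" line (all cores) from /proc/stat. */
static int read_cpu(struct cpu_sample *s)
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->sys, &s->idle,
                   &s->iowait, &s->irq, &s->softirq);
    fclose(f);
    return n == 7 ? 0 : -1;
}

int main(void)
{
    struct cpu_sample a, b;
    if (read_cpu(&a))
        return 1;
    sleep(1); /* two samples, one second apart */
    if (read_cpu(&b))
        return 1;

    double total = (double)((b.user - a.user) + (b.nice - a.nice) +
                            (b.sys - a.sys) + (b.idle - a.idle) +
                            (b.iowait - a.iowait) + (b.irq - a.irq) +
                            (b.softirq - a.softirq));
    printf("user %.1f%%  system %.1f%%  softirq %.1f%%\n",
           100.0 * (b.user - a.user) / total,
           100.0 * (b.sys - a.sys) / total,
           100.0 * (b.softirq - a.softirq) / total);
    return 0;
}
```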
Single Flow Performance

[Figure: TCP and UDP throughput (Gb/s) under the three cases. TCP: S1 = 23, S2 = 6.5, S3 = 6.4. UDP: S1 = 9.3, S2 = 4.7, S3 = 3.9.]

• TCP throughput in the Docker overlay case drops by 72% compared with the native case.
• UDP throughput drops by 58%.
Single Flow Performance

[Figure: Total CPU usage (user / system / softIRQ) under the three cases for TCP; both overlay cases plateau at ~5%.]
* 5% indicates one CPU core is fully saturated.

• Packet processing overhead fully saturates one CPU core in both overlay cases.
• Current solutions cannot scale single-flow performance.
Multiple Flows Performance

[Figure: TCP throughput (Gb/s) vs. number of iPerf connection pairs (1 to 80) for Native, Linux Overlay, and Docker Overlay.]

• The native case quickly reaches ~37 Gbps under TCP with only 2 pairs.
• In the two overlay cases, TCP throughput grows slowly.

[Figure: Total CPU usage (user / system / softIRQ) vs. number of iPerf connection pairs for the three cases.]

• At the same throughput (e.g., 40 Gbps), overlay networks consume far more CPU (around 2.5x) than the native case.
• The poor scalability is largely due to the inefficient interplay among the many packet processing tasks.
Small Packet Performance

[Figure: Packets processed per second vs. packet size (64 B to 8 KB) for Native, Linux Overlay, and Docker Overlay.]

• Docker overlay achieves as little as 50% of the native packet processing rate.
Interrupt Number with Varying Packet Sizes

[Figure: IRQ/s and softIRQ/s vs. packet size (64 B to 8 KB) for Native, Linux Overlay, and Docker Overlay, under TCP (left) and UDP (right).]

• The IRQ count increases dramatically in the Docker overlay UDP case, reaching 10x that of the TCP case.
• In the Docker overlay case, softIRQ counts are 3x the IRQ counts (see the sampling sketch below).
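Rates like these can be reproduced by sampling kernel counters. A minimal sketch, assuming a Linux host: it reads the NET_RX row of /proc/softirqs twice, one second apart, and sums the per-CPU counters; hardware IRQ rates can be taken from /proc/interrupts the same way.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sum the per-CPU NET_RX counters from /proc/softirqs. */
static unsigned long long net_rx_total(void)
{
    char line[8192];
    unsigned long long total = 0;
    FILE *f = fopen("/proc/softirqs", "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        char *p = strstr(line, "NET_RX:");
        if (!p)
            continue;
        p += strlen("NET_RX:");
        unsigned long long v;
        int n;
        /* Parse every per-CPU count on the NET_RX row. */
        while (sscanf(p, "%llu%n", &v, &n) == 1) {
            total += v;
            p += n;
        }
        break;
    }
    fclose(f);
    return total;
}

int main(void)
{
    unsigned long long a = net_rx_total();
    sleep(1); /* two samples, one second apart */
    unsigned long long b = net_rx_total();
    printf("NET_RX softIRQs/s: %llu\n", b - a);
    return 0;
}
```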
Insights and Conclusions
• The kernel does not provide per-packet parallelization.
• The kernel does not efficiently handle the various packet processing tasks.
• These bottlenecks become more severe for small packets.

Thinking about future work:
• Is it feasible to provide packet-level parallelization for a single network flow?
• How can the kernel better isolate multiple flows, especially to efficiently utilize shared hardware resources?
• Can packets be further coalesced, with an optimized network path, to reduce interrupts and context switches?