Network Virtualization Case Study: NVP




Case Study: NVP

This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14). ISBN 978-1-931971-09-6.

Open access to the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14) is sponsored by USENIX.

Network Virtualization in Multi-tenant Datacenters

Teemu Koponen, Keith Amidon, Peter Balland, Martín Casado, Anupam Chanda, Bryan Fulton, Igor Ganichev, Jesse Gross, Natasha Gude, Paul Ingram, Ethan Jackson, Andrew Lambeth, Romain Lenglet, Shih-Hao Li, Amar Padmanabhan, Justin Pettit, Ben Pfaff, and Rajiv Ramanathan, VMware; Scott Shenker, International Computer Science Institute and the University of California, Berkeley; Alan Shieh, Jeremy Stribling, Pankaj Thakkar, Dan Wendlandt, Alexander Yip, and Ronghua Zhang, VMware

https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/koponen


NVP Approach to Virtualization

[Figure: an arbitrary logical topology of L2 switches interconnected by L3 routers]

1. Service: Arbitrary network topology



NVP Approach to Virtualization

Service: Arbitrary network topology

Physical Network: Any standard layer 3 network

[Figure: the Network Hypervisor sits between the logical topology (L2 switches and L3 routers) and the standard layer 3 physical network]


Virtual network service

[Figure: a virtual network service built from logical L2 switches and an L3 router (Koponen et al.)]



[Figure: a packet from a VM is mapped into Logical Datapath 1, Logical Datapath 2, and Logical Datapath 3 in turn (each a pipeline of ACL, L2, and L3 stages), with Map steps between datapaths and a tunnel at the end; the logical pipeline is realized by physical flows.]

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flows prepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath. After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

…datapaths. In this case, the OVS flow table would hold flow entries for all interconnected logical datapaths, and the packet would traverse each logical datapath by the same principles as it traverses the pipeline of a single logical datapath: instead of encapsulating the packet and sending it over a tunnel, the final action of a logical pipeline submits the packet to the first table of the next logical datapath. Figure 4 depicts how a packet originating at a source VM first traverses through a logical switch (with ACLs) to a logical router before being forwarded by a logical switch attached to the destination VM (on the other side of the tunnel). This is a simplified example: we omit the steps required for failover, multicast/broadcast, ARP, and QoS, for instance.
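
The chaining of pipelines is easiest to see in code. The sketch below is a minimal, hypothetical model of this handoff (the LogicalDatapath class, the stage functions, and the traverse helper are invented for illustration and are not NVP's flow-table format): the final step of each locally hosted pipeline either hands the packet to the first table of the next logical datapath or, when no further local datapath exists, encapsulates it for the tunnel.

```python
# Hypothetical sketch of chained logical datapath pipelines (names invented).
from dataclasses import dataclass, field
from typing import Callable, Optional

Stage = Callable[[dict], bool]  # a pipeline stage returns False to drop the packet

@dataclass
class LogicalDatapath:
    name: str
    stages: list = field(default_factory=list)    # e.g. [acl_in, l2_forward, acl_out]
    next_hop: Optional["LogicalDatapath"] = None  # set if the next datapath is hosted locally

def traverse(packet: dict, datapath: LogicalDatapath) -> None:
    """Run a packet through chained local pipelines, then tunnel it out."""
    while datapath is not None:
        for stage in datapath.stages:
            if not stage(packet):                 # an ACL stage may drop the packet
                print(f"dropped in {datapath.name}")
                return
        if datapath.next_hop is not None:
            # Final action: submit the packet to the first table of the next
            # logical datapath instead of encapsulating and tunnelling it.
            datapath = datapath.next_hop
        else:
            print(f"encapsulate and send over tunnel after {datapath.name}")
            return

# Example: logical switch -> logical router, both hosted on the sending hypervisor.
allow_all: Stage = lambda pkt: True
ls1 = LogicalDatapath("logical-switch-1", [allow_all, allow_all])
lr = LogicalDatapath("logical-router", [allow_all])
ls1.next_hop = lr
traverse({"src": "VM-A", "dst_ip": "10.0.1.5"}, ls1)
```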

As an optimization, we constrain the logical topology such that logical L2 destinations can only be present at its edge.⁶ This restriction means that the OVS flow table of a sending hypervisor needs only to have flows for logical datapaths to which its local VMs are attached as well as those of the L3 routers of the logical topology; the receiving hypervisor is determined by the logical IP destination address, leaving the last logical L2 hop to be executed at the receiving hypervisor. Thus, in Figure 4, if the sending hypervisor does not host any VMs attached to the third logical datapath, then the third logical datapath runs at the receiving hypervisor and there is a tunnel between the second and third logical datapaths instead.
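
Under this restriction, the rule for which pipelines a sending hypervisor must install can be stated in a few lines. This is only an illustrative sketch with invented names (datapaths_to_install and its arguments are not NVP APIs):

```python
# Hypothetical sketch of the flow-installation rule described above: a sending
# hypervisor installs pipelines only for logical switches with local VMs plus
# all logical routers; remaining L2 hops run on the receiving hypervisor.
def datapaths_to_install(local_vm_switches: set[str], all_datapaths: dict[str, str]) -> set[str]:
    """all_datapaths maps datapath name -> 'switch' or 'router'."""
    routers = {name for name, kind in all_datapaths.items() if kind == "router"}
    return local_vm_switches | routers

topology = {"ls-1": "switch", "lr-1": "router", "ls-2": "switch"}
# Contains 'ls-1' and 'lr-1'; 'ls-2' runs at the receiving hypervisor.
print(datapaths_to_install({"ls-1"}, topology))
```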

3.2 Forwarding Performance

OVS, as a virtual switch, must classify each incoming packet against its entire flow table in software. However, flow entries written by NVP can contain wildcards for any irrelevant parts of a packet header. Traditional physical switches generally classify packets against wildcard flows using TCAMs, which are not available on the standard x86 hardware where OVS runs, and so OVS must use a different technique to classify packets quickly.⁷
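
For intuition, a wildcard flow entry can be modeled as a partial match over header fields, with a priority to break ties. The toy classifier below does a simple linear scan, whereas OVS's real userspace classifier uses more sophisticated data structures; the field names and table layout here are invented:

```python
# Minimal sketch of wildcard flow matching in software (illustrative only).
def matches(packet: dict, flow_match: dict) -> bool:
    # A field absent from flow_match is wildcarded; present fields must match exactly.
    return all(packet.get(field) == value for field, value in flow_match.items())

def classify(packet: dict, flow_table: list) -> str:
    # flow_table entries are (priority, match, action); highest priority wins.
    for _, flow_match, action in sorted(flow_table, key=lambda entry: -entry[0]):
        if matches(packet, flow_match):
            return action
    return "drop"

table = [
    (100, {"dl_dst": "aa:bb:cc:dd:ee:01"}, "output:port1"),
    (10, {}, "send_to_controller"),  # match-all fallback entry
]
print(classify({"dl_src": "aa:bb:cc:dd:ee:02", "dl_dst": "aa:bb:cc:dd:ee:01"}, table))
```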

To achieve efficient flow lookups on x86, OVS exploits traffic locality: the fact that all of the packets belonging to a single flow of traffic (e.g., one of a VM's TCP connections) will traverse exactly the same set of flow entries. OVS consists of a kernel module and a userspace program; the kernel module sends the first packet of each new flow into userspace, where it is matched against the full flow table, including wildcards, as many times as the logical datapath traversal requires. Then, the userspace program installs exact-match flows into a flow table in the kernel, which contain a match for every part of the flow (L2-L4 headers). Future packets in this same flow can then be matched entirely by the kernel. Existing work considers flow caching in more detail [5, 22].
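
Below is a minimal sketch of this fast-path/slow-path split, assuming an invented MicroflowCache class standing in for the kernel table and a callback standing in for the userspace wildcard lookup; it is not OVS's actual kernel/userspace interface.

```python
# Hypothetical microflow-cache sketch (not OVS's real kernel/userspace interface).
def flow_key(packet: dict) -> tuple:
    # Exact match over the whole flow key (stand-in for the L2-L4 headers).
    return tuple(sorted(packet.items()))

class MicroflowCache:
    def __init__(self, wildcard_classify):
        self.exact_match = {}                        # "kernel" table: full key -> action
        self.wildcard_classify = wildcard_classify   # "userspace" slow path

    def handle(self, packet: dict) -> str:
        key = flow_key(packet)
        action = self.exact_match.get(key)
        if action is None:                   # first packet of the flow: slow path
            action = self.wildcard_classify(packet)
            self.exact_match[key] = action   # install an exact-match entry
        return action                        # later packets match here directly

cache = MicroflowCache(lambda pkt: "output:tunnel0")
pkt = {"dl_src": "aa:bb:cc:dd:ee:02", "dl_dst": "aa:bb:cc:dd:ee:01",
       "nw_src": "10.0.0.2", "nw_dst": "10.0.1.5", "tp_dst": 443}
print(cache.handle(pkt))   # slow path: wildcard lookup, entry installed
print(cache.handle(pkt))   # fast path: exact-match hit
```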

While exact-match kernel flows alleviate the challenges of flow classification on x86, NVP's encapsulation of all traffic can introduce significant overhead. This overhead does not tend to be due to tunnel header insertion, but to the operating system's inability to enable standard NIC hardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant to this discussion. TCP Segmentation Offload (TSO) allows the operating system to send TCP packets larger than the physical MTU to a NIC, which then splits them into MSS-sized packets and computes the TCP checksums for each packet on behalf of the OS. Large Receive Offload (LRO) does the opposite and collects multiple incoming packets into a single large TCP packet and, after verifying the checksum, hands it to the OS. The combination of these mechanisms provides a significant reduction in CPU usage for high-volume TCP transfers. Similar mechanisms exist for UDP traffic; the generalization of TSO is called Generic Segmentation Offload (GSO).
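
To see what the sender would otherwise have to do in software, the toy code below performs the segmentation step that TSO pushes down to the NIC: an oversized send is split into MSS-sized segments, each of which needs its own checksum. The checksum function is a placeholder, not a real TCP checksum implementation.

```python
# Toy illustration of the segmentation work offloaded by TSO (not a real TCP stack).
def segment(payload: bytes, mss: int) -> list:
    """Split an oversized payload into MSS-sized segments, as a TSO-capable NIC would."""
    return [payload[off:off + mss] for off in range(0, len(payload), mss)]

def pseudo_checksum(chunk: bytes) -> int:
    # Placeholder for the per-segment TCP checksum the NIC computes under TSO.
    return sum(chunk) & 0xFFFF

big_send = bytes(64 * 1024)            # a 64 KB buffer handed down by the OS
segments = segment(big_send, mss=1460)
print(len(segments), "segments; checksum of first:", pseudo_checksum(segments[0]))
```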

Current Ethernet NICs do not support offloading in the presence of any IP encapsulation in the packet. That is, even if a VM's operating system would have enabled TSO (or GSO) and handed over a large frame to the virtual NIC, the virtual switch of the underlying hypervisor would have to break up the packets into standard MTU-sized packets and compute their checksums before encapsulating them and passing them to the NIC; today's NICs are simply not capable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardware offloading for encapsulated traffic with existing NICs, NVP uses an encapsulation method called STT [8].⁸ STT places a standard, but fake, TCP header after the physical IP header. After this, there is the actual encapsulation header including contextual information that specifies, among other things, the logical destination of the packet. The actual logical packet (starting with its Ethernet header) follows. As a NIC processes an STT packet, …
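
The resulting wire format can be sketched as a simple concatenation of headers. The framing below is hypothetical and simplified (the field widths, the context layout, and the port values are placeholders rather than the actual STT specification); it only illustrates the ordering: outer headers, a TCP-like header the NIC can parse, an encapsulation header with logical context, and then the logical packet starting with its Ethernet header.

```python
# Simplified, hypothetical STT-like framing (field widths and values are placeholders,
# not the real STT header format).
import struct

def build_encapsulated_frame(outer_eth_ip: bytes, logical_packet: bytes,
                             context_id: int, src_port: int, dst_port: int) -> bytes:
    # 1. A standard-looking TCP header right after the physical IP header,
    #    so NIC offload logic that parses IP/TCP headers still applies.
    fake_tcp = struct.pack("!HHIIBBHHH",
                           src_port, dst_port,  # ports
                           0, 0,                # sequence/ack fields
                           5 << 4, 0,           # data offset, flags
                           0xFFFF, 0, 0)        # window, checksum, urgent pointer
    # 2. Encapsulation header with contextual information, e.g. an identifier
    #    for the logical destination the packet belongs to.
    encap_header = struct.pack("!QI", context_id, len(logical_packet))
    # 3. The logical packet itself, starting with its Ethernet header.
    return outer_eth_ip + fake_tcp + encap_header + logical_packet

frame = build_encapsulated_frame(outer_eth_ip=b"\x00" * 34,    # stand-in outer Ethernet + IP
                                 logical_packet=b"\x00" * 60,  # stand-in inner Ethernet frame
                                 context_id=0x1234, src_port=49152, dst_port=7471)
print(len(frame), "bytes on the wire")
```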

⁶ We have found little value in supporting logical routers interconnected through logical switches without tenant VMs.
⁷ There is much previous work on the problem of packet classification without TCAMs. See for instance [15, 37].
⁸ NVP also supports other tunnel types, such as GRE [9] and VXLAN [26], for reasons discussed shortly.


Page 8: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 9: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 10: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 11: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 12: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 13: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 14: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 15: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation 207

ACL L2 ACL

Map

VM

Map

ACL L2 L3 ACL

Map

ACLACL L2

Map Tunnel

Logical

Physical

Logical Datapath 1 Logical Datapath 2 Logical Datapath 3

Figure 4: Processing steps of a packet traversing through two logical switches interconnected by a logical router (in the middle). Physical flowsprepare for the logical traversal by loading metadata registers: first, the tunnel header or source VM identity is mapped to the first logical datapath.After each logical datapath, the logical forwarding decision is mapped to the next logical hop. The last logical decision is mapped to tunnel headers.

datapaths. In this case, the OVS flow table would holdflow entries for all interconnected logical datapaths andthe packet would traverse each logical datapath by thesame principles as it traverses the pipeline of a singlelogical datapath: instead of encapsulating the packet andsending it over a tunnel, the final action of a logicalpipeline submits the packet to the first table of thenext logical datapath. Figure 4 depicts how a packetoriginating at a source VM first traverses through alogical switch (with ACLs) to a logical router beforebeing forwarded by a logical switch attached to thedestination VM (on the other side of the tunnel). Thisis a simplified example: we omit the steps required forfailover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topologysuch that logical L2 destinations can only be present atits edge.6 This restriction means that the OVS flow tableof a sending hypervisor needs only to have flows forlogical datapaths to which its local VMs are attached aswell as those of the L3 routers of the logical topology;the receiving hypervisor is determined by the logical IPdestination address, leaving the last logical L2 hop to beexecuted at the receiving hypervisor. Thus, in Figure 4, ifthe sending hypervisor does not host any VMs attached tothe third logical datapath, then the third logical datapathruns at the receiving hypervisor and there is a tunnelbetween the second and third logical datapaths instead.

3.2 Forwarding PerformanceOVS, as a virtual switch, must classify each incomingpacket against its entire flow table in software. However,flow entries written by NVP can contain wildcards for anyirrelevant parts of a packet header. Traditional physicalswitches generally classify packets against wildcard flowsusing TCAMs, which are not available on the standardx86 hardware where OVS runs, and so OVS must use adifferent technique to classify packets quickly.7

To achieve efficient flow lookups on x86, OVS exploitstraffic locality: the fact that all of the packets belongingto a single flow of traffic (e.g., one of a VM’s TCPconnections) will traverse exactly the same set of flowentries. OVS consists of a kernel module and a userspaceprogram; the kernel module sends the first packet of each6We have found little value in supporting logical routersinterconnected through logical switches without tenant VMs.7There is much previous work on the problem of packetclassification without TCAMs. See for instance [15, 37].

new flow into userspace, where it is matched against thefull flow table, including wildcards, as many times as thelogical datapath traversal requires. Then, the userspaceprogram installs exact-match flows into a flow table inthe kernel, which contain a match for every part of theflow (L2-L4 headers). Future packets in this same flowcan then be matched entirely by the kernel. Existing workconsiders flow caching in more detail [5, 22].

While exact-match kernel flows alleviate the challengesof flow classification on x86, NVP’s encapsulation of alltraffic can introduce significant overhead. This overheaddoes not tend to be due to tunnel header insertion, but tothe operating system’s inability to enable standard NIChardware offloading mechanisms for encapsulated traffic.

There are two standard offload mechanisms relevant tothis discussion. TCP Segmentation Offload (TSO) allowsthe operating system to send TCP packets larger thanthe physical MTU to a NIC, which then splits them intoMSS-sized packets and computes the TCP checksums foreach packet on behalf of the OS. Large Receive Offload(LRO) does the opposite and collects multiple incomingpackets into a single large TCP packet and, after verifyingthe checksum, hands it to the OS. The combinationof these mechanisms provides a significant reductionin CPU usage for high-volume TCP transfers. Similarmechanisms exist for UDP traffic; the generalization ofTSO is called Generic Segmentation Offload (GSO).

Current Ethernet NICs do not support offloading in thepresence of any IP encapsulation in the packet. That is,even if a VM’s operating system would have enabled TSO(or GSO) and handed over a large frame to the virtual NIC,the virtual switch of the underlying hypervisor would haveto break up the packets into standard MTU-sized packetsand compute their checksums before encapsulating themand passing them to the NIC; today’s NICs are simply notcapable of seeing into the encapsulated packet.

To overcome this limitation and re-enable hardwareoffloading for encapsulated traffic with existing NICs,NVP uses an encapsulation method called STT [8].8 STTplaces a standard, but fake, TCP header after the physicalIP header. After this, there is the actual encapsulationheader including contextual information that specifies,among other things, the logical destination of the packet.The actual logical packet (starting with its Ethernetheader) follows. As a NIC processes an STT packet,8NVP also supports other tunnel types, such as GRE [9] andVXLAN [26] for reasons discussed shortly.

5

Page 16: Network Virtualization Case Study: NVP...Case Study: NVP This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI

Virtual network service

[Figure 4: Koponen et al. Processing steps of a packet traversing two logical switches interconnected by a logical router.]

In this case, the OVS flow table would hold flow entries for all interconnected logical datapaths, and the packet would traverse each logical datapath by the same principles as it traverses the pipeline of a single logical datapath: instead of encapsulating the packet and sending it over a tunnel, the final action of a logical pipeline submits the packet to the first table of the next logical datapath. Figure 4 depicts how a packet originating at a source VM first traverses through a logical switch (with ACLs) to a logical router before being forwarded by a logical switch attached to the destination VM (on the other side of the tunnel). This is a simplified example: we omit the steps required for failover, multicast/broadcast, ARP, and QoS, for instance.

As an optimization, we constrain the logical topology such that logical L2 destinations can only be present at its edge. (We have found little value in supporting logical routers interconnected through logical switches without tenant VMs.) This restriction means that the OVS flow table of a sending hypervisor needs only to have flows for the logical datapaths to which its local VMs are attached, as well as those of the L3 routers of the logical topology; the receiving hypervisor is determined by the logical IP destination address, leaving the last logical L2 hop to be executed at the receiving hypervisor. Thus, in Figure 4, if the sending hypervisor does not host any VMs attached to the third logical datapath, then the third logical datapath runs at the receiving hypervisor and there is a tunnel between the second and third logical datapaths instead.
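As a small illustration of this restriction (a hypothetical helper and names, not NVP code), the set of logical datapaths a sending hypervisor must install flows for is just the datapaths its local VMs attach to plus the logical routers:

    # Illustrative sketch of the edge optimization (hypothetical data model).
    def datapaths_needed(local_vm_switches, logical_routers):
        # A sending hypervisor installs flows only for the logical switches its
        # local VMs attach to, plus the logical (L3) routers of the topology.
        return set(local_vm_switches) | set(logical_routers)

    # Mirroring Figure 4: the sender hosts a VM only on logical datapath 1, so
    # logical datapath 3 (the destination's L2 hop) runs at the receiving side.
    sender_side = datapaths_needed({"ldp-1"}, {"ldp-2-router"})
    assert "ldp-3" not in sender_side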


Packet abstraction

Control abstraction (sequence of OpenFlow flow tables)
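To make the "sequence of flow tables" abstraction concrete, here is a toy model (not OpenFlow or OVS syntax) of chained logical pipelines: the final action of one logical datapath either resubmits the packet to the first table of the next logical datapath or maps it to a tunnel, as described above.

    # Toy model of chained logical pipelines (not OpenFlow/OVS syntax).
    def run_pipeline(tables, packet):
        # Each table returns ("next",) to continue within the datapath,
        # ("resubmit", dp) to enter the next logical datapath, or ("tunnel", hv).
        for table in tables:
            verdict = table(packet)
            if verdict[0] != "next":
                return verdict
        return ("drop",)

    def traverse(logical_datapaths, start, packet):
        dp = start
        while True:
            verdict = run_pipeline(logical_datapaths[dp], packet)
            if verdict[0] == "resubmit":      # next hop is local: no tunnel needed
                dp = verdict[1]
            else:                             # ("tunnel", receiving_hypervisor), etc.
                return verdict

    # Example: an ACL table that passes everything, then an L2 table that hands
    # the packet to logical datapath 2 (the logical router) for the next hop.
    ldps = {
        "ldp-1": [lambda p: ("next",), lambda p: ("resubmit", "ldp-2")],
        "ldp-2": [lambda p: ("tunnel", "receiving-hypervisor")],
    }
    print(traverse(ldps, "ldp-1", {"dst": "vm-b"}))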

Page 17: Network Virtualization Case Study: NVP

[Architecture diagram: tenant VMs on servers, each server running Open vSwitch; a physical L3 network connects the servers through tunnels (GRE, VXLAN); the Network Hypervisor (controllers) maps the tenant control abstraction (a logical L2/L3/L2 topology) onto this physical network.]


Page 18: Network Virtualization Case Study: NVP

Challenge: Performance

Large amount of state to compute

• Full virtual network state at every host with a tenant VM!
• O(n²) tunnels for tenant with n VMs
• Solution 1: Automated incremental state computation with nlog declarative language

• Solution 2: Logical controller computes a single set of universal flows per tenant, which "physical controllers" nearer the hypervisors translate into host-specific physical flows (see the sketch below)
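The sketch below is a minimal illustration of the idea behind Solution 2, assuming a made-up placeholder and flow format; it is not nlog output or NVP's actual flow representation. The logical controller emits location-independent "universal" flows once per tenant, and each physical controller substitutes host-local facts (VM placement, local tunnel ports) when pushing flows to the hosts it manages.

    # Minimal sketch of "universal flows" specialized per hypervisor (illustrative;
    # the placeholder and flow formats are assumptions, not NVP's actual output).
    UNIVERSAL_FLOWS = [
        # Location-independent: logical match plus a generic "tunnel to the host
        # of logical port X" action, computed once per tenant.
        {"match": {"logical_dst_port": "lport-web-1"},
         "actions": ["set_tunnel_key:0x17", ("TUNNEL_TO_HOST_OF", "lport-web-1")]},
    ]

    def specialize(universal_flows, port_locations, local_tunnels):
        """Physical-controller step: replace placeholders with host-local facts."""
        physical = []
        for flow in universal_flows:
            actions = []
            for act in flow["actions"]:
                if isinstance(act, tuple) and act[0] == "TUNNEL_TO_HOST_OF":
                    dest_hv = port_locations[act[1]]             # which host runs the VM
                    actions.append(f"output:{local_tunnels[dest_hv]}")  # local OVS port
                else:
                    actions.append(act)
            physical.append({"match": flow["match"], "actions": actions})
        return physical

    # Each physical controller holds only local facts and specializes the single
    # universal set, instead of recomputing full tenant state per hypervisor.
    print(specialize(UNIVERSAL_FLOWS,
                     port_locations={"lport-web-1": "hv-b"},
                     local_tunnels={"hv-b": "stt-to-hv-b"}))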

Page 19: Network Virtualization Case Study: NVP

Challenge: Performance

Pipeline processing in virtual switch can be slow

• Solution: Send the first packet of a flow through the full pipeline; thereafter, install an exact-match flow entry in the kernel

Tunneling interferes with TCP Segmentation Offload (TSO)

• NIC can’t see TCP outer header• Solution: STT tunnels adds “fake” outer TCP header