Linux Networking: The RISE of the congestion window, the FALL of the routing cache, and the LOCALITY of packets.

David S. Miller
Red Hat Inc.
IBM Watson Research Center, 2010
Source: vger.kernel.org/~davem/davem_ibm2010.pdf · 2010-12-06

Outline:
- Death of the Routing Cache
- Rise of the TCP Congestion Window
- Locality of Packets
- The End

ROUTING CACHE: WHAT IS IT?

- Hash table based cache of routing lookups.
- Keyed on many attributes:
  - src and dest address
  - TOS
  - device index
  - etc.
- Assumes real route lookup is (relatively) slow
- Real route lookup is layered (e.g. policy routing)
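
As a rough illustration of the idea on this slide, the sketch below puts a dictionary in front of a deliberately slow lookup. The names `ROUTES`, `slow_route_lookup`, and `RouteCache` and the sample routes are hypothetical and are not taken from the kernel source; the point is only the keying on several packet attributes.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop). Not real kernel data.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "gw-a"),
    (ipaddress.ip_network("10.1.0.0/16"), "gw-b"),
    (ipaddress.ip_network("0.0.0.0/0"), "gw-default"),
]


def slow_route_lookup(dst):
    """Stand-in for the 'real' layered lookup: linear longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in ROUTES:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


class RouteCache:
    """Hash table of lookup results keyed on several packet attributes."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, src, dst, tos, ifindex):
        key = (src, dst, tos, ifindex)
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        hop = slow_route_lookup(dst)  # only reached on a cache miss
        self.entries[key] = hop
        return hop


cache = RouteCache()
first = cache.lookup("192.0.2.1", "10.1.2.3", 0, 1)   # miss, fills the cache
second = cache.lookup("192.0.2.1", "10.1.2.3", 0, 1)  # hit, no slow lookup
```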

ROUTING CACHE: PROBLEMS

- Routing table to routing cache is one to many
- Entries are created in response to packets
- Prime target or focus for DOS attacks
- Mitigation strategies:
  - Secure hash keys
  - Garbage collection
- GC is very non-deterministic and hard to tune
- Routing table changes require careful cache flushing
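
The "one to many" problem can be made concrete with a toy model. In the sketch below (all names and addresses are made up for illustration), a single default route backs the cache, yet every distinct source address seen on the wire creates a new cache entry.

```python
# One routing table entry (a default route) ...
routing_table = {"0.0.0.0/0": "gw-default"}

# ... but the cache is keyed on packet attributes, so every distinct
# (src, dst, tos, ifindex) tuple creates its own entry.
cache = {}


def on_packet(src, dst, tos=0, ifindex=1):
    key = (src, dst, tos, ifindex)
    if key not in cache:
        cache[key] = routing_table["0.0.0.0/0"]
    return cache[key]


# An attacker only has to vary the source address to grow the cache.
for i in range(10_000):
    src = f"198.51.{i // 256}.{i % 256}"
    on_packet(src, "10.0.0.1")

entries_per_route = len(cache) // len(routing_table)
```

Because entries are created by traffic rather than by configuration, the cache size is under the sender's control, which is why the slide lists garbage collection as a mitigation rather than a cure.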

ROUTING CACHE: WHAT BACKS IT?

- Original algorithm, array of hash tables
  - One hash table per prefix length (0 to 32)
  - Not the most optimal, but routing cache makes this OK
  - Relatively simple
- New algorithm, LC-trie
  - Multi-dimensional trie
  - Close to what’s known to be optimal
  - Complicated
  - Performance tied to trie balancing heuristics
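
A minimal sketch of the "array of hash tables" scheme, assuming IPv4 and a plain dictionary per prefix length. The class name `HashFib` and the sample routes are hypothetical; the kernel's actual data structures differ. A lookup probes from the longest prefix down, so a miss on specific routes can cost up to 33 probes, which is one way to read the remark that the routing cache is what makes this acceptable.

```python
import ipaddress


class HashFib:
    """One hash table per prefix length, searched from /32 down to /0."""

    def __init__(self):
        self.tables = [dict() for _ in range(33)]  # prefix lengths 0..32

    def add(self, prefix, nexthop):
        net = ipaddress.ip_network(prefix)
        self.tables[net.prefixlen][int(net.network_address)] = nexthop

    def lookup(self, dst):
        addr = int(ipaddress.ip_address(dst))
        for plen in range(32, -1, -1):
            # Keep only the top `plen` bits of the destination address.
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hit = self.tables[plen].get(addr & mask)
            if hit is not None:
                return hit  # first hit is the longest matching prefix
        return None


fib = HashFib()
fib.add("10.0.0.0/8", "gw-a")
fib.add("10.1.0.0/16", "gw-b")
fib.add("0.0.0.0/0", "gw-default")
```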

ROUTING CACHE: BARRIERS TO REMOVAL

- Mainly performance, cache handled DOS attacks better
- No longer true after Eric Dumazet’s work
- Handling of metrics
  - Move to existing inetpeer cache
  - Issues of metric granularity
- Storing of route lookup “result”
- IPSEC and route stacking
- And again, performance...

ROUTING CACHE: SIDE TOPICS

- What does BSD do?
  - Uses a patricia tree.
  - Clones are created for specific routes.
- What does our IPV6 stack do?
  - See BSD above.
  - But with support for source address keying.
  - Thus two tiered tree layout.

TCP CWND: HOW TO KILL THE INTERNET

- CWND == congestion window
- Ironically by keeping things as they are now.
- Initial CWND has stayed constant for more than a decade.
- Meanwhile net capacity has increased dramatically.
- Current situation is a bit of a joke.

TCP CWND: SHORT TUTORIAL

- Connections start with an initial CWND
- Increased until loss is detected
- CWND is reduced at loss events
- Process repeats
- Critical aspect: aggressive probing of network capacity
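
The probe, lose, back off cycle can be sketched with a toy model. This is not the Linux implementation: `simulate_cwnd` and its `capacity` parameter are hypothetical, loss is assumed to happen exactly when the window exceeds the path capacity, and the window is halved on loss.

```python
def simulate_cwnd(initial_cwnd, capacity, rtts):
    """Toy model: `capacity` is the number of packets per round trip the
    path carries before dropping. Returns the cwnd used in each round trip."""
    cwnd = initial_cwnd
    ssthresh = float("inf")
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd > capacity:
            # Loss detected: remember half the window and back off.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2  # slow start: exponential probing
        else:
            cwnd += 1  # congestion avoidance: linear probing
    return history


trace = simulate_cwnd(initial_cwnd=3, capacity=20, rtts=12)
# trace == [3, 6, 12, 24, 12, 13, 14, 15, 16, 17, 18, 19]
```

The window overshoots the assumed capacity once (24 packets), is cut to 12, and then climbs back one packet per round trip.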

TCP CWND: THE BIG MYTH

- That we actually have an initial CWND
- Actually there is no real limit
- Applications can have as large a one as they want
- By opening up several connections at once
- N connections == “initial CWND x N”
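
The arithmetic behind "N connections" is trivial but worth writing down. The function name and the example values (an initial window of 3 packets and 6 parallel connections) are illustrative assumptions, not figures from the talk.

```python
def effective_initial_window(initial_cwnd, connections):
    """Packets an application can have in flight during the first round
    trip when it opens several connections to the same peer at once."""
    return initial_cwnd * connections


single = effective_initial_window(initial_cwnd=3, connections=1)    # 3
parallel = effective_initial_window(initial_cwnd=3, connections=6)  # 18
```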

TCP CWND: GOOGLE’S PROPOSAL

- “draft-hkchu-tcpm-initcwnd-00”
- Increase initial CWND to 10 packets
- Most web objects do not fit into existing initial CWND
- With 10 packets, most will fit
- Works well with technologies such as SPDY
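
To see why "fitting" matters, the sketch below counts the round trips needed to deliver an object during loss-free slow start. The function `round_trips`, the 1460-byte MSS, the 14,000-byte object, and the old initial window of 3 are assumptions chosen for illustration; connection setup and the request itself are not counted.

```python
def round_trips(object_bytes, initial_cwnd, mss=1460):
    """Round trips needed to send an object during loss-free slow start,
    with the window doubling every round trip."""
    segments = -(-object_bytes // mss)  # ceiling division
    cwnd, sent, rtts = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


old = round_trips(14_000, initial_cwnd=3)   # 3 round trips
new = round_trips(14_000, initial_cwnd=10)  # 1 round trip
```

Under these assumptions a 10-segment object needs three round trips with a window of 3 and a single round trip with a window of 10.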

TCP CWND: KNEE JERK REACTIONS

- This increase will cause congestion collapse
  - FALSE: Congestion avoidance still at work
  - TCP will still back off in the event of loss
- It will hurt clients with smaller pipes
  - FALSE: Smaller pipe end hosts get better performance
  - The key is ACK clocking and how fast recovery works
  - 3+ duplicate ACKs are necessary to trigger fast recovery
  - With old initial CWND that never happened at start
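
The fast recovery argument can be checked with a small counting model. It assumes a window of segments sent back to back, exactly one lost segment, one duplicate ACK per segment arriving after the hole, and the usual threshold of three duplicate ACKs; delayed ACKs, SACK, and other refinements are ignored. The function names and the old window of 3 are illustrative.

```python
DUPACK_THRESHOLD = 3  # duplicate ACKs needed before fast retransmit


def duplicate_acks(window, lost_index):
    """Duplicate ACKs generated when one segment of a back-to-back window
    is lost: each segment that arrives after the hole produces one."""
    return window - 1 - lost_index


def fast_recovery_possible(window, lost_index):
    return duplicate_acks(window, lost_index) >= DUPACK_THRESHOLD


def recoverable_positions(window):
    """Loss positions that can be repaired without waiting for a timeout."""
    return [i for i in range(window) if fast_recovery_possible(window, i)]


old = recoverable_positions(3)   # [] : every loss ends in a timeout
new = recoverable_positions(10)  # [0, 1, 2, 3, 4, 5, 6]
```

In this model a 3-segment window can never produce three duplicate ACKs, so any loss in the first flight has to wait for the retransmission timer, while a 10-segment window recovers from a loss in any of its first seven positions.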

TCP CWND: ANYWAYS...

- Linux will adopt larger initial CWND real soon
- Nothing IETF can do about it (sorry Chicken Little, the sky is not falling)
- You heard it here first

LOCALITY: SYSTEM HIERARCHY

- Welcome to the NUMA world
- Memory “distance” matters more than ever
- No longer a quaint optimization for “huge” servers
- NUMA is pervasive even on desktops
- Heck, even laptops...

LOCALITY: MULTIQUEUE NETWORKING

- Old systems, single RX queue, single TX queue
- Limited by event signaling in old PCI
- Welcome PCI-E and MSI-X interrupts
- Networking cards begin to have multi-Q functionality
- Now it’s pervasive

LOCALITY: LINUX SUPPORT FOR HARDWARE MULTI-Q

- Stephen Hemminger’s NAPI split-up work
  - Pull NAPI state out of struct netdev
  - One NAPI instance per HW interrupt source
- Making TX path multi-Q capable
  - Pull queue flow control state out of struct netdev
  - Dealing with qdiscs.... ugh...
  - Only simplest qdiscs are fully multi-Q
  - Complex qdiscs force synchronization at the qdisc
  - In the future token based qdiscs (SFQ, etc.) can be multi-Q too
  - Hierarchical qdiscs fundamentally cannot (HFSC, HTB, etc.)
  - Create new multi-Q qdisc for high level flow management

LOCALITY: SOFTWARE MULTI-Q

- Use software facilities to implement multi-Q
- CPU cross calls and packet processing job placement
- Why even bother?
  - Lots of non-multi-Q capable hardware out there
  - Hardware multi-Q is stateless (as it should be)
  - Software schemes provide more flexibility
  - Possibility to optimize for application locality
- Initially I was against.
- Happily, Tom Herbert was able to convince me.

LOCALITY: STAGE ONE: RPS

- Receive Packet Steering by Tom Herbert
- Stateless flow separation
- Perfectly mimics hardware multi-Q on RX
- Each hardware RX queue has a configurable cpumask
- Packets received on RX queue hash to CPU in that mask
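
A minimal sketch of stateless steering, assuming the flow is identified by its address and port tuple. `rps_select_cpu` is a hypothetical name, `zlib.crc32` is only a stand-in for whatever hash the kernel or the NIC provides, and the modulo reduction is a simplification.

```python
import zlib


def rps_select_cpu(flow, cpumask):
    """Pick a CPU for a packet from its flow tuple alone (stateless).

    flow    -- (src_ip, dst_ip, src_port, dst_port)
    cpumask -- list of CPU ids allowed for this RX queue
    """
    key = "|".join(str(field) for field in flow).encode()
    flow_hash = zlib.crc32(key)  # stand-in hash, not the kernel's
    return cpumask[flow_hash % len(cpumask)]


mask = [2, 3, 6, 7]  # hypothetical cpumask for one RX queue
flow = ("192.0.2.1", "10.0.0.1", 40000, 80)
cpu_first = rps_select_cpu(flow, mask)
cpu_again = rps_select_cpu(flow, mask)  # same flow, same CPU
```

Because the choice depends only on the packet headers and the mask, no per-flow state is kept, and all packets of one flow land on the same CPU.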

LOCALITY: STAGE TWO: RFS

- Receive Flow Steering, again from Tom Herbert
- Hash table of flow to CPU mappings
- Dynamically updated
- Kernel spies on application I/O calls
- CPU of I/O call becomes flow CPU mapping
- Table is sized and enabled via sysctl
- Issue: out-of-order packet delivery avoidance
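
The flow table idea might look roughly like this. `FlowTable`, `record_io`, and `steer` are hypothetical names, Python's built-in `hash` stands in for the real flow hash, and the out-of-order avoidance mentioned on the slide is deliberately left out.

```python
class FlowTable:
    """Toy flow-to-CPU table in the spirit of RFS."""

    def __init__(self, size):
        self.size = size
        self.cpu_for_slot = [None] * size

    def _slot(self, flow):
        return hash(flow) % self.size

    def record_io(self, flow, cpu):
        """Called when an application does I/O on `flow` while on `cpu`."""
        self.cpu_for_slot[self._slot(flow)] = cpu

    def steer(self, flow, fallback_cpu):
        """CPU that should process the next received packet of `flow`."""
        cpu = self.cpu_for_slot[self._slot(flow)]
        return fallback_cpu if cpu is None else cpu


table = FlowTable(size=256)
flow = ("192.0.2.1", "10.0.0.1", 40000, 80)
before = table.steer(flow, fallback_cpu=0)  # no I/O seen yet: fallback
table.record_io(flow, cpu=5)                # application read on CPU 5
after = table.steer(flow, fallback_cpu=0)   # packets now follow the reader
```

Switching CPUs the moment the table changes could let packets of one flow be processed out of order on two CPUs; a real implementation has to guard against that, which this sketch does not attempt.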

LOCALITY: STAGE THREE: XPS

- Tom now gives us Transmit Packet Steering
- Transmit side locality
- Maps cpus to transmit queues, reverse of RPS
- Data structure locality
- Likelihood packet free happens near sending thread
- Eric Dumazet’s Transmit Completion Steering patch
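
The "reverse of RPS" mapping can be sketched as an inverted table from CPUs to transmit queues. The function names, the per-queue CPU lists, and the fallback to hashing over all queues are assumptions made for illustration, not a description of the actual patch.

```python
def build_xps_map(queue_cpus):
    """Invert a {tx_queue: [cpus]} configuration into {cpu: [tx_queues]}."""
    cpu_to_queues = {}
    for queue, cpus in sorted(queue_cpus.items()):
        for cpu in cpus:
            cpu_to_queues.setdefault(cpu, []).append(queue)
    return cpu_to_queues


def xps_select_queue(cpu_to_queues, sending_cpu, flow_hash, num_queues):
    """Choose a TX queue associated with the sending CPU, else hash over all."""
    queues = cpu_to_queues.get(sending_cpu)
    if not queues:
        return flow_hash % num_queues
    return queues[flow_hash % len(queues)]


# Hypothetical layout: queue 0 serves CPUs 0-1, queue 1 serves CPUs 2-3.
xps_map = build_xps_map({0: [0, 1], 1: [2, 3]})
queue_for_cpu2 = xps_select_queue(xps_map, sending_cpu=2, flow_hash=12345, num_queues=2)
```

Keeping a sender on a queue associated with its own CPU is what makes it likely that the transmit completion, and therefore the packet free, happens near the sending thread.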

LOCALITY: FUTURES

- Hardware assist for RPS/RFS (Ben Hutchings)
  - Test patches exist for SFC chips
  - Makes use of on-chip flow table facilities
- Having lots and lots of hardware queues
  - Negative matching for things like GRO
  - Steering “queues” themselves instead of flows
- Better defaults (all SW stuff off by default at the moment)

THE END

Thanks to:
- Erich Nahum and IBM Watson Research Center
- Oren Laadan
- Stephen Hemminger
- Eric Dumazet (AKA: The Networking Ninja)
- Ben Hutchings
- Tom Herbert and Google
- Linus Torvalds