Linx88 IPv6 Neighbor Discovery Russell Heilling



IPv6 Neighbor Discovery

An IXP Perspective

Russell Heilling

Senior Network


We all understand ARP, right?

• Messages carried directly on Ethernet (EtherType 0x0806)

• Device sends broadcast request: "Who has x.x.x.x?"

• Receivers check target against local addresses

• If it matches they send a unicast reply

• Result is cached

All nodes on the network need to process all ARP requests. High levels of ARP and you are going to have a bad day.

• Defined in

• Messages are carried within ICMPv6

• Includes:
    • Router and prefix discovery
    • Address resolution and neighbor unreachability detection
    • Redirect function

• Address resolution is most relevant from IXP perspective

IPv6 Neighbor Discovery

Router and prefix discovery

• The main point on RD: “Don’t do it on the exchange”

• We have seen an increase in the number of members sending RAs

• Please check your config and make sure you have it disabled

• We are improving our instrumentation and will be getting more proactive

• This is an MoU violation, and will result in a chase

• Analogous to ARP query message

“I know your IP, what’s your MAC?”

• ICMPv6 Type 135, Code 0.

• Can be sent unicast to refresh neighbor cache

• Can be multicast to discover uncached neighbors

• Uses last 24 bits of target address to construct multicast destination
    Target: 2001:7f8:4::1553:2
    Destination: ff02::1:ff53:2
    Group MAC: 33:33:ff:53:00:02

• RFC recommends no more than 1 solicitation per second per target

• Unicast solicitation used to refresh stale entry before removing

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                       Target Address                          +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options ...
+-+-+-+-+-+-+-+-+-+-+-+-
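The mapping from target address to multicast destination and group MAC can be sketched in a few lines of Python. This is an illustration only: the function name solicited_node_group is made up for this example, and it assumes the usual solicited-node prefix ff02::1:ff00:0/104 and the 33:33 Ethernet prefix for IPv6 multicast.

```python
import ipaddress

def solicited_node_group(target: str) -> tuple[str, str]:
    """Return (solicited-node multicast address, Ethernet group MAC) for a target."""
    addr = int(ipaddress.IPv6Address(target))
    low24 = addr & 0xFFFFFF  # last 24 bits of the target address
    # Append those 24 bits to the solicited-node prefix ff02::1:ff00:0/104.
    group = int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24
    # Ethernet group MAC is 33:33 followed by the low 32 bits of the group address.
    low32 = (group & 0xFFFFFFFF).to_bytes(4, "big")
    mac = "33:33:" + ":".join(f"{b:02x}" for b in low32)
    return str(ipaddress.IPv6Address(group)), mac

# The example target from the slide.
print(solicited_node_group("2001:7f8:4::1553:2"))
# ('ff02::1:ff53:2', '33:33:ff:53:00:02')
```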

Neighbor Solicitation

Neighbor Advertisement

• Analogous to ARP reply message

• ICMPv6 Type 136, Code 0.

• R, S & O flags to indicate advertisement type
    R & O flags outside scope here

• Can be sent unsolicited [S=0] (like gratuitous ARP)
    In which case uses all-nodes multicast address

• IP source can be any address on same interface as target

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|S|O|                     Reserved                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                                                               |
+                       Target Address                          +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Options ...
+-+-+-+-+-+-+-+-+-+-+-+-
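As a rough illustration of the layout above, the fixed part of a Neighbor Advertisement can be decoded with Python's struct module. The function name and the sample packet are hypothetical; the sketch does not validate the checksum or parse options.

```python
import ipaddress
import struct

def parse_neighbor_advertisement(icmp: bytes) -> dict:
    """Decode the fixed 24-byte part of an ICMPv6 Neighbor Advertisement."""
    if len(icmp) < 24:
        raise ValueError("message too short")
    msg_type, code, _checksum, flags = struct.unpack("!BBHI", icmp[:8])
    if msg_type != 136 or code != 0:
        raise ValueError("not a Neighbor Advertisement")
    return {
        "router": bool(flags & 0x80000000),     # R flag, highest bit
        "solicited": bool(flags & 0x40000000),  # S flag
        "override": bool(flags & 0x20000000),   # O flag
        "target": str(ipaddress.IPv6Address(icmp[8:24])),
    }

# Hypothetical unsolicited advertisement (S=0, O=1); checksum left at zero.
sample = struct.pack("!BBHI", 136, 0, 0, 0x20000000)
sample += ipaddress.IPv6Address("2001:7f8:4::1553:1").packed
print(parse_neighbor_advertisement(sample))
```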

Broadcast, Unknown, Multicast

Unknown unicast

• VPLS is just a virtual switch – still needs to learn MAC addresses

• Ports going down immediately flush database entries causing short bursts of flooding while MAC is relearnt

• Unidirectional flows can result in longer term flooding if the destination ages out of the database

• Stale routes can direct traffic to unknown MACs leading to extended flooding

• ARP can flush FDB entries on XOS (bug)

• We are investigating ways to better mitigate.

So why use multicast if it goes everywhere?

• A well designed NIC will filter in hardware

• ARP queries go to a single (broadcast) destination and will always need to be punted up the stack

• Neighbor solicitations are distributed over a large number of multicast groups. Most of them can be filtered out in hardware

More on NIC Filtering

• Ideally a NIC would have enough filter space for all subscribed groups

• Reality is that space is limited

• Different cards take different approaches

    • Fallback to promiscuous mode
    • Promiscuous for all multicast
    • Hash the group address, accept any groups that hash to same value

• Caveat emptor. Know your hardware limits.
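The hashing approach can be sketched as below to show why unrelated groups leak through. Everything here is an assumption made for illustration: the 64-bucket table, the use of the top 6 bits of a CRC-32 over the destination MAC, and the HashFilter class itself. Real NICs differ in table size, CRC bit ordering and which bits they use.

```python
import zlib

BUCKET_BITS = 6  # assumed: 2**6 = 64 buckets

def bucket(mac: str) -> int:
    """Map a MAC address to a bucket using the top bits of its CRC-32."""
    raw = bytes.fromhex(mac.replace(":", ""))
    return zlib.crc32(raw) >> (32 - BUCKET_BITS)

class HashFilter:
    """Imperfect multicast filter: one bit per bucket, not per group."""

    def __init__(self) -> None:
        self.bits = 0

    def subscribe(self, mac: str) -> None:
        self.bits |= 1 << bucket(mac)

    def accepts(self, mac: str) -> bool:
        return bool((self.bits >> bucket(mac)) & 1)

nic = HashFilter()
own = "33:33:ff:53:00:01"
nic.subscribe(own)

# Every other group of the form 33:33:ff:XX:00:01.
others = [f"33:33:ff:{a:02x}:00:01" for a in range(256) if a != 0x53]
leaked = [m for m in others if nic.accepts(m)]
print(f"{len(leaked)} of {len(others)} other groups also pass the filter")
```

Any group that lands in the same bucket as a subscribed group is passed up the stack, so the host still has to discard it in software.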

[linx-ops] LINX London Juniper LAN weirdness

• Nov 19th 2014 22:28 – Massive increase in non-unicast traffic

• Investigation shows member with fibre issue

    • 2x10GE LAG, one link bouncing
    • Member router not happy, sending massive numbers of neighbor solicitations
    • Maxed out at around 3k packets/s
    • Caused instability for a number of other members

[linx-ops] LINX London Juniper LAN weirdness

• “IXPWatch” is good at spotting this for ARP

• Turns out not so good for IPv6 NS

• IPv6 NS stats were easy to add to the report

• Detection and alerting still have room for improvement

A note on addressing on LINX peering LANs

• LINX recommended IPv6 address: 2001:7f8:4:{LAN}::{ASN}:1/64

• LAN administered by LINX

• ASN converted to hex, not BCD

• Examples:

LINX (5459) on Juniper LAN: 2001:7f8:4::1553:1

LINX (8714) on IXCardiff: 2001:7f8:4:4::220a:1
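The addressing scheme can be reproduced with a short sketch. The function name linx_address is hypothetical, the LAN values (0 for the Juniper LAN, 4 for IXCardiff) are read off the two examples above, and the sketch only handles 16-bit ASNs because the slide does not say how 32-bit ASNs are encoded.

```python
import ipaddress

def linx_address(asn: int, lan: int = 0, host: int = 1) -> str:
    """Build 2001:7f8:4:{LAN}::{ASN}:{host} with the ASN written as plain hex."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("sketch only covers 16-bit ASNs")
    base = int(ipaddress.IPv6Address("2001:7f8:4::"))
    # LAN is the fourth 16-bit group, ASN the seventh, host the eighth.
    value = base | (lan << 64) | (asn << 16) | host
    return str(ipaddress.IPv6Address(value))

print(linx_address(5459))         # 2001:7f8:4::1553:1
print(linx_address(8714, lan=4))  # 2001:7f8:4:4::220a:1
```

Writing the decimal digits directly into the address (BCD style) would give 2001:7f8:4::5459:1 for AS5459, which is the mistake the slide warns against.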

So how does that work with Neighbor Solicitations?

• LINX recommended IPv6 address: 2001:7f8:4:{LAN}::{ASN}:1/64

• Solicited-node multicast group MAC: 33:33:ff:{A}:00:01

• A is the low order octet of the ASN

• 5th byte is almost always zero

• 550+ unique member ASNs share 229 last octets

• Most group addresses match at least 2 members

• Some as high as 7

• Still much better than ARP
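A small sketch shows how members end up sharing a group. The ASN list is made up for illustration (two LINX ASNs from the earlier examples plus arbitrary private-range ASNs) and is not member data; the helper name group_mac is hypothetical.

```python
from collections import defaultdict

def group_mac(asn: int) -> str:
    """Group MAC for the recommended address ...::{ASN}:1."""
    low_octet = asn & 0xFF  # only the low-order octet of the ASN survives
    return f"33:33:ff:{low_octet:02x}:00:01"

# Illustrative ASNs only: 5459 and 8714 from the examples, the rest private-range.
asns = [5459, 64595, 64851, 8714, 64522]

sharing = defaultdict(list)
for asn in asns:
    sharing[group_mac(asn)].append(asn)

for mac, members in sorted(sharing.items()):
    print(mac, members)
# 33:33:ff:0a:00:01 [8714, 64522]
# 33:33:ff:53:00:01 [5459, 64595, 64851]
```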

How busy is IPv6?

Hmmm. Wrong scale.

How busy is IPv6?

• Around 0.7% of traffic on Juniper LAN

• Follows very similar diurnal pattern to IPv4

• Not just BGP and monitoring – real traffic

How does ARP vs NS look?


There are more neighbor solicitations than ARP requests on the Juniper LAN

How do the distributions compare?

• Median interval between repeated ARP requests is 8s

• Median for NS is only 4s

• ARP intervals more distributed

• NS has strong peaks at 1s, 3-5s

• Smaller peak at approx 60s
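One way such interval figures could be derived from a capture is sketched below. The slides do not say how "repeated" was defined, so this assumes a repeat means the same sender asking for the same target again; the function name and the sample events are invented for the example.

```python
from collections import defaultdict
from statistics import median

def repeat_intervals(events):
    """Gaps in seconds between consecutive requests for each (sender, target) pair."""
    by_pair = defaultdict(list)
    for timestamp, sender, target in events:
        by_pair[(sender, target)].append(timestamp)
    gaps = []
    for times in by_pair.values():
        times.sort()
        gaps.extend(later - earlier for earlier, later in zip(times, times[1:]))
    return gaps

# Made-up events: (timestamp in seconds, sender, target).
events = [
    (0, "a", "x"), (1, "a", "x"), (2, "a", "x"), (5, "a", "x"),
    (0, "b", "y"), (60, "b", "y"),
]
gaps = repeat_intervals(events)
print(sorted(gaps), median(gaps))
# [1, 1, 3, 60] 2.0
```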

ND may attempt to be more efficient than ARP, but it sure seems chatty

• Repeat offenders? Maybe…
    Top 5% of senders account for 34% of requests*

• Down neighbors?
    Strong peak at 1s suggests retries
    About 80% of destinations down

• I think we have a winner…

* Based on analysis of peak hour flooded traffic

What is causing the difference?

Could we / Should we do something?

• Obvious reaction might be to suggest higher RETRANS_TIMER value

• Before jumping to that conclusion we should ask “Does it matter that there is more ND than ARP?”

• NS addressing makes it easier for nodes to cope

• Extending timer also makes unreachability detection slower
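The trade-off can be put into rough numbers. The sketch below is a simplification, assuming the RFC 4861 defaults DELAY_FIRST_PROBE_TIME = 5 s and MAX_UNICAST_SOLICIT = 3, and counting only the time from the first delay until the cache entry is given up on. The function name is hypothetical.

```python
DELAY_FIRST_PROBE_TIME = 5.0  # seconds, RFC 4861 default
MAX_UNICAST_SOLICIT = 3       # probes before giving up, RFC 4861 default

def detection_time(retrans_timer: float) -> float:
    """Approximate seconds to declare a neighbor unreachable once probing is due."""
    return DELAY_FIRST_PROBE_TIME + MAX_UNICAST_SOLICIT * retrans_timer

for timer in (1.0, 2.0, 4.0):
    retries_per_second = 1.0 / timer  # solicitation rate towards a down neighbor
    print(f"RETRANS_TIMER={timer:.0f}s: "
          f"{retries_per_second:.2f} NS/s per down target, "
          f"~{detection_time(timer):.0f}s to detect failure")
# RETRANS_TIMER=1s: 1.00 NS/s per down target, ~8s to detect failure
# RETRANS_TIMER=2s: 0.50 NS/s per down target, ~11s to detect failure
# RETRANS_TIMER=4s: 0.25 NS/s per down target, ~17s to detect failure
```

Under these assumptions, quadrupling the timer cuts retry traffic to a quarter but roughly doubles the time to notice a dead neighbor.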
