adventures in bug hunting


@joedamato, http://timetobleed.com

whoami

http://timetobleed.com

@joedamato

http://boundary.com (make use of it)

first, a confession.

debugging > programming

before we get this horror show rolling...

kernels, drivers, glibc, and everything else changes. code snips will differ from what you are running on your machines.

some things are simplified in the interest of time.

bprobe

boundary IPFIX flow meter. collects flow data by sniffing packets with libpcap. also collects low-level NIC data from the driver:

packets tx/rx, bytes tx/rx, ethernet collisions, ethernet errors

ethernet bonding (aka teaming)

combine a group of physical NICs (eth0, eth1, ...) into a single virtual device (bond0, bond1, ...).

different modes: active-passive, round robin, link aggregation


how does bonding work (on linux)?

at a high level...
the bonding driver creates a virtual device.
when a packet is sent, the bonding driver figures out which physical NIC to transmit the packet on.
when a packet comes in, the NICs pass the incoming packet up for the higher layers of the network stack to figure out.
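To make the transmit side concrete, here is a toy sketch of how a bond might pick the outgoing NIC in two of the modes listed above. The struct and function names (toy_bond, toy_bond_pick_tx) are hypothetical; the real bonding driver does much more (link monitoring, failover, hashing for link aggregation), and link aggregation is not modeled here at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified bond device: NOT the kernel's
 * struct bonding, just enough state to pick an outgoing slave NIC. */
enum bond_mode { MODE_ROUND_ROBIN, MODE_ACTIVE_BACKUP };

struct toy_bond {
    enum bond_mode mode;
    const char *slaves[4];  /* names of the physical NICs, e.g. "eth0" */
    size_t n_slaves;        /* how many entries of slaves[] are in use */
    size_t active;          /* active-backup: index of the active slave */
    size_t next;            /* round robin: index of the next slave to use */
};

/* Pick the physical NIC that should transmit the next outgoing packet. */
static const char *toy_bond_pick_tx(struct toy_bond *b)
{
    assert(b != NULL && b->n_slaves > 0);

    if (b->mode == MODE_ACTIVE_BACKUP)
        return b->slaves[b->active];        /* always the active NIC */

    const char *nic = b->slaves[b->next];   /* round robin: take the next */
    b->next = (b->next + 1) % b->n_slaves;  /* ...and rotate for next time */
    return nic;
}
```

with slaves eth0, eth1 and eth2 in round robin mode, successive calls return eth0, eth1, eth2, eth0, ...; in active-backup mode every call returns the active slave until something changes `active`.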

bprobe and bonding

bprobe discovers bonded network interfaces.

it uses libpcap to monitor the underlying physical NICs instead of the bond devices.

detecting link failures, etc

everything was looking good until....

Bug was filed... Debian Lenny, 64-bit. Bonded ethernet interfaces. No incoming packets are showing up.

Step 0

Take a step back. Breathe. Do not break the computer.

Step 1

Examine our assumptions:
- The packets are making it to the kernel.
- The packets are being handed up from the kernel to libpcap.
- libpcap doesn't lose any packets before bprobe examines them.
- bprobe has some weird bug in it.

packets are making it to the kernel?

watch -n 1 'cat /proc/net/dev'
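If you would rather not eyeball the raw file, a small parser can pull the counters out of each interface line. This is a sketch assuming the usual /proc/net/dev layout (an interface name, a colon, eight receive columns, then eight transmit columns); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The counters we care about from one /proc/net/dev line. */
struct netdev_counters {
    char name[32];
    unsigned long long rx_bytes, rx_packets;
    unsigned long long tx_bytes, tx_packets;
};

/* Parse one interface line of /proc/net/dev, for example:
 *   "  eth0: 1500 10 0 0 0 0 0 0 3000 20 0 0 0 0 0 0"
 * The 8 receive columns come first (bytes, packets, errs, drop, fifo,
 * frame, compressed, multicast), followed by the 8 transmit columns.
 * Returns 0 on success, -1 if the line is not an interface line. */
static int parse_netdev_line(const char *line, struct netdev_counters *out)
{
    assert(line != NULL && out != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                      /* header lines have no colon */

    const char *start = line;           /* skip the padding before the name */
    while (*start == ' ')
        start++;
    size_t len = (size_t)(colon - start);
    if (len == 0 || len >= sizeof out->name)
        return -1;
    memcpy(out->name, start, len);
    out->name[len] = '\0';

    unsigned long long skip;            /* columns we read but ignore */
    int n = sscanf(colon + 1,
                   "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &out->rx_bytes, &out->rx_packets,
                   &skip, &skip, &skip, &skip, &skip, &skip,
                   &out->tx_bytes, &out->tx_packets);
    return n == 10 ? 0 : -1;
}
```

to mimic the `watch` command, read /proc/net/dev with fopen/fgets once a second, run each line through the parser, and check whether rx_packets keeps climbing for eth0 and bond0.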

packets are making it to the kernel: confirmed.

packets are being handed up from the kernel?

Peel some layers away

bprobe is really libpcap + packet analysis + output.

if this is a bug in the kernel or libpcap, then other programs that use libpcap (like tcpdump) will also fail the same way.

so, do they?

tcpdump

bonded ethernet interfaces (on linux) are virtual devices created by combining other devices. for example:

bond0, made up of eth0, eth2, eth4, ...

First, sniff bond0...

    % sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
    12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
    12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
    ^C
    2 packets captured
    2 packets received by filter
    0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)...

    % sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
    ^C
    0 packets captured
    2 packets received by filter
    0 packets dropped by kernel

Everything is not cool.

incoming packets appear to be missing when sniffing the physical device.

(only on debian lenny)

outgoing packets show up regardless.

tcpdump mailing list

the only way to figure out where they are getting lost is to follow them through the kernel.

Step 2

Let's start digging.

Steps 3-5

3. Dig until you see something you haven't seen before.
4. Read all of the code and understand it.
5. Go to step 2.

how are packets received?

packets come in from the wire. there are a couple of different ways for the kernel to know about new packets (e.g. interrupts, or NAPI polling).

let's just look at the simple case: an interrupt is raised when a packet arrives. both paths hand data up to the higher layers in similar ways.

e1000


netif_rx queues packets up on a per-CPU backlog queue. later, the networking softirq (which may run in the ksoftirqd thread) pulls packets off and processes them.
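The split between the interrupt path and the later processing pass is a producer/consumer queue. The toy below models it with a small fixed-size ring; toy_netif_rx and toy_process_backlog are hypothetical names that only mimic the roles described above, and the tiny capacity is chosen so the drop-when-full behavior is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_MAX 4   /* deliberately tiny; real backlogs are much larger */

/* Toy backlog: packet ids stand in for sk_buffs. */
struct toy_backlog {
    int pkts[BACKLOG_MAX];
    size_t head;            /* index of the oldest queued packet */
    size_t count;           /* number of queued packets */
    unsigned long dropped;  /* packets rejected because the queue was full */
};

/* Interrupt side: queue the packet and return quickly.
 * Returns false (and counts a drop) when the queue is full. */
static bool toy_netif_rx(struct toy_backlog *q, int pkt)
{
    assert(q != NULL);
    if (q->count == BACKLOG_MAX) {
        q->dropped++;
        return false;
    }
    q->pkts[(q->head + q->count) % BACKLOG_MAX] = pkt;
    q->count++;
    return true;
}

/* Processing side: pull the oldest packet off the queue.
 * Returns false when there is nothing to process. */
static bool toy_process_backlog(struct toy_backlog *q, int *pkt)
{
    assert(q != NULL && pkt != NULL);
    if (q->count == 0)
        return false;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % BACKLOG_MAX;
    q->count--;
    return true;
}
```

packets come out in the order they went in; anything that arrives while the queue is full is counted in `dropped` rather than queued.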

OK, but how does pcap find out about these packets?

a more fundamental question... how does pcap actually work?

bprobe/tcpdump/etc (in userland)
libpcap (in userland)
packet protocol family (in the kernel)
network device agnostic layer (in the kernel)


bprobe/tcpdump/etc (userland)

call pcap_open_live, or pcap_create/pcap_activate, to initialize libpcap.

call pcap_next_ex to get packets from libpcap. examine the packets and do stuff.


libpcap (userland)

creates a socket of type PF_PACKET. there are two ways to get packets from the kernel:
- one by one (slow)
- via shared memory (fast)

libpcap tries to use the fast method; if it fails, it falls back to the slow one.
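The fast-then-slow fallback is a simple pattern. Here is a hedged sketch of it with hypothetical names and stub setup functions. On Linux, libpcap's fast path is, as far as we know, built on the PACKET_RX_RING socket option and the slow path on recvfrom(), but none of that is modeled here; only the decision logic is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which way packets will be pulled out of the kernel (hypothetical). */
enum capture_method { CAPTURE_NONE, CAPTURE_MMAP_RING, CAPTURE_RECVFROM };

struct toy_capture {
    enum capture_method method;
};

/* A setup step: returns true on success. */
typedef bool (*setup_fn)(struct toy_capture *);

/* Try the shared-memory ring first; if that fails, fall back to
 * one-packet-at-a-time reads; if both fail, report CAPTURE_NONE. */
static enum capture_method toy_activate(struct toy_capture *c,
                                        setup_fn try_ring,
                                        setup_fn try_recvfrom)
{
    assert(c != NULL);
    if (try_ring != NULL && try_ring(c))
        c->method = CAPTURE_MMAP_RING;      /* fast path worked */
    else if (try_recvfrom != NULL && try_recvfrom(c))
        c->method = CAPTURE_RECVFROM;       /* fell back to slow path */
    else
        c->method = CAPTURE_NONE;           /* nothing worked */
    return c->method;
}

/* Stub setup steps for exercising the fallback logic. */
static bool always_fail(struct toy_capture *c) { (void)c; return false; }
static bool always_ok(struct toy_capture *c)   { (void)c; return true; }
```

the caller never has to know which path was chosen; it just reads packets, and the capture handle remembers the method.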

libpcap creating the PF_PACKET socket.

the new, fast way is being set up.

the new way of pulling packets out.

the old way is set up when the new way fails to initialize.

pull packets out from the kernel the old way.


PF_PACKET (kernel)

libpcap creates the PF_PACKET socket, and the PF_PACKET code in the kernel (eventually) executes.

this code does some initialization and inserts a protocol hook...


network device agnostic layer

pulls packets off the backlog queue and calls netif_receive_skb().

netif_receive_skb() has some logic to work out which device the packet should be attributed to when bonding is enabled.

it then passes the packet through the protocol hooks.

(run through all of the protocol blocks, handing the packet over)
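The hook walk boils down to a filter on the packet's device. This toy version assumes, as in the kernel code of that era, that a hook bound to a specific interface (a PF_PACKET socket bound to eth0, say) only gets packets whose skb->dev is that interface, while an unbound hook gets everything. All type names are simplified, hypothetical stand-ins, not the kernel's structs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct net_device, struct sk_buff and
 * struct packet_type, reduced to the fields this check needs. */
struct toy_dev  { const char *name; };
struct toy_skb  { struct toy_dev *dev; };   /* device recorded on the packet */
struct toy_hook {
    struct toy_dev *dev;  /* NULL: all devices; else the device bound to */
    int delivered;        /* how many packets this hook has been handed */
};

/* Walk every registered hook and hand the packet to each hook whose
 * device filter matches the device currently recorded on the packet. */
static void toy_deliver_all(const struct toy_skb *skb,
                            struct toy_hook *hooks, size_t n)
{
    assert(skb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (hooks[i].dev == NULL || hooks[i].dev == skb->dev)
            hooks[i].delivered++;
    }
}
```

a packet recorded on eth0 reaches the unbound hook and the hook bound to eth0, but not a hook bound to eth1.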

we now know the path packets take so they can be examined by pcap apps.


back to the bug

so, the bug was that packet sniffing on the physical NICs of bonded hosts was not revealing incoming packets.

what do we now know about our environment? what would be the best place to look to track down this bug?

we know

assume the following setup: bond0, made up of eth0, eth1 and eth2.

a packet came in on eth0, thus:

    skb->dev = eth0
    skb->dev->master = bond0


before:

    skb->dev = eth0

after:

    skb->dev = bond0
    the code returns eth0 as orig_dev
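The before/after above can be modeled in a few lines. This is a sketch with hypothetical toy types: if the receiving NIC has a master, the packet is re-pointed at the master and the original NIC is handed back as orig_dev. (In kernels of that era this logic lived in a small helper, skb_bond(), called from netif_receive_skb(); treat that name as an assumption rather than something the slides show.)

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: a device may have a master (its bond), and a packet
 * records the device it is currently attributed to. */
struct toy_dev { const char *name; struct toy_dev *master; };
struct toy_skb { struct toy_dev *dev; };

/* If the receiving NIC is enslaved to a bond, re-point the packet at the
 * bond device and return the NIC the packet really arrived on. */
static struct toy_dev *toy_skb_bond(struct toy_skb *skb)
{
    assert(skb != NULL && skb->dev != NULL);
    struct toy_dev *orig_dev = skb->dev;
    if (orig_dev->master != NULL)
        skb->dev = orig_dev->master;    /* e.g. eth0 becomes bond0 */
    return orig_dev;
}
```

for a NIC that is not part of a bond, nothing changes: skb->dev stays put and orig_dev is the same device.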


LOOK


Did you see it?

Bug

We overwrite the packet's device with the bond device. The protocol hook check tests whether the hook is for the device on the packet. It isn't: we are sniffing eth0, but skb->dev was overwritten to bond0. That's why if you sniff bond0 you see packets, but if you sniff eth0 you see nothing.
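Combining the device rewrite with the hook check reproduces the bug in miniature. The sketch below uses hypothetical toy types; hook_matches is the check described above, and hook_matches_orig is one illustrative way to let a hook bound to eth0 match again, by also comparing against orig_dev. It is not necessarily the fix from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev  { const char *name; struct toy_dev *master; };
struct toy_skb  { struct toy_dev *dev; };
struct toy_hook { struct toy_dev *dev; };   /* device the socket is bound to */

/* The check as described: compare only against the (rewritten) skb->dev. */
static bool hook_matches(const struct toy_hook *h, const struct toy_skb *skb)
{
    return h->dev == NULL || h->dev == skb->dev;
}

/* Illustrative variant: also accept the NIC the packet really arrived on. */
static bool hook_matches_orig(const struct toy_hook *h,
                              const struct toy_skb *skb,
                              const struct toy_dev *orig_dev)
{
    return h->dev == NULL || h->dev == skb->dev || h->dev == orig_dev;
}

/* Receive a packet on rx_nic, rewrite skb->dev to the bond master if
 * there is one, and report whether a hook bound to hook_dev sees it. */
static bool sniffer_sees_packet(struct toy_dev *rx_nic,
                                struct toy_dev *hook_dev,
                                bool check_orig_dev)
{
    assert(rx_nic != NULL);
    struct toy_skb skb = { rx_nic };
    struct toy_dev *orig_dev = skb.dev;
    if (orig_dev->master != NULL)
        skb.dev = orig_dev->master;         /* eth0 becomes bond0 */

    struct toy_hook hook = { hook_dev };
    return check_orig_dev ? hook_matches_orig(&hook, &skb, orig_dev)
                          : hook_matches(&hook, &skb);
}
```

with eth0 enslaved to bond0, a hook bound to bond0 matches and a hook bound to eth0 does not, which is the tcpdump behavior shown earlier.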

packets are being handed up from the kernel... but not to anything sniffing eth0.

YYYYYyyyyYYyYYyyYYy YYYYYYYYYYYYYYYyY eeeEEeeEEeEEEeEEEEee eeeEEeEEEeeEEeEeEEaA AaaaAaaAaAAaAaAAaa AaAAAAAaaaAAa!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

EASY FIX

YYYYYyyyyYYyYYyyYYy YYYYYYYYYYYYYYYyY eeeEEeeEEeEEEeEEEEee eeeEEeEEEeeEEeEeEEaA AaaaAaaAaAAaAaAAaa AaAAAAAaaaAAa!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

reboot and try the new kernel...

First, sniff bond0...

    % sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
    12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
    12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
    ^C
    2 packets captured
    2 packets received by filter
    0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)...

    % sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
    ^C
    0 packets captured
    2 packets received by filter
    0 packets dropped by kernel

Everything is not cool.

NO


NEIN!

tcpdump/bprobe/other pcap apps STILL FAIL.

???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ??????????????????????????????????????