adventures in bug hunting

@joedamato

http://timetobleed.com

whoami

http://timetobleed.com

@joedamato

http://boundary.com (make use of it)

first, a confession.

debugging > programming

before we get this horror show rolling

• kernels, drivers, glibc, and everything else change.

• code snippets will differ from what you are running on your machines.

• some things are simplified in the interest of time.

bprobe

• boundary IPFIX flow meter

• collects flow data by sniffing packets with libpcap

• also collects low level NIC data from the driver

• packets tx/rx

• bytes tx/rx

• ethernet collisions

• ethernet errors

ethernet bonding (aka teaming)

• combine a group of physical NICs (eth0, eth1, ...) into a single virtual device (bond0, bond1, ...).

• different modes

• active-passive

• round robin

• link aggregation

how does bonding work (on linux) ?

• at a high level...

• the bonding driver creates a “virtual device”

• when a packet is sent, bonding driver figures out which physical NIC to transmit the packet on.

• when a packet comes in, the NICs pass the incoming packet up for the higher layers of the network stack to figure out.
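As a rough illustration of the transmit side described above, here is a toy Python model of how a bond might pick a physical NIC in two of the modes listed earlier. This is not the Linux bonding driver's code: the class name `Bond`, the mode strings, and the methods `pick_tx_slave` and `fail_over` are all made up for this sketch.

```python
from itertools import count


class Bond:
    """Toy model of a bonded device; every name here is hypothetical."""

    def __init__(self, name, slaves, mode="active-passive"):
        self.name = name
        self.slaves = list(slaves)    # physical NICs, e.g. ["eth0", "eth1"]
        self.mode = mode
        self.active = self.slaves[0]  # only used in active-passive mode
        self._counter = count()       # only used in round-robin mode

    def pick_tx_slave(self):
        """Return the physical NIC the next outgoing packet would use."""
        if self.mode == "active-passive":
            return self.active
        if self.mode == "round-robin":
            return self.slaves[next(self._counter) % len(self.slaves)]
        raise ValueError("mode not modelled in this sketch: " + self.mode)

    def fail_over(self):
        """Pretend the active NIC lost its link and promote the next one."""
        idx = self.slaves.index(self.active)
        self.active = self.slaves[(idx + 1) % len(self.slaves)]
```

In active-passive mode every packet leaves on the one active NIC until `fail_over()` is called. In round-robin mode consecutive packets cycle through the NICs.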

bprobe and bonding

• bprobe discovers bonded network interfaces.

• uses libpcap to monitor the underlying physical NICs instead of bond devices.

• detecting link failures, etc

everything was looking good until....

Bug was filed...

• Debian Lenny, 64bit.

• Bonded ethernet interfaces.

• No incoming packets are showing up.

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Step 1

• Examine our assumptions:

• The packets are making it to the kernel.

• The packets are being handed up from the kernel to libpcap.

• libpcap doesn’t lose any packets before bprobe examines them.

• bprobe has some weird bug in it.

packets are making it to the kernel?

watch -n 1 'cat /proc/net/dev'
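The same check can be scripted instead of eyeballed. The sketch below parses text in the /proc/net/dev layout and pulls out the per-interface packet and byte counters. The function names and the sample numbers are invented for illustration, and the column positions assume the usual two header lines followed by eight receive and eight transmit fields per interface.

```python
# Invented sample in the /proc/net/dev layout; the numbers are not real data.
SAMPLE_PROC_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  640000    5000    0    0    0     0          0         0   320000    2500    0    0    0     0       0          0
 bond0:  640000    5000    0    0    0     0          0         0   320000    2500    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Map interface name -> dict of rx/tx byte and packet counters."""
    counters = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)     # older kernels omit the space after ':'
        fields = data.split()
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters


def read_counters(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

Calling `read_counters()` twice a second apart and comparing `rx_packets` for eth0 shows whether packets are reaching the kernel.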

packets are making it to the kernel

packets are being handed up from the kernel?

Peel some layers away

• bprobe is really libpcap + packet analysis + output.

• if this is a bug in the kernel or libpcap then other programs that use libpcap (like tcpdump) will also fail the same way.

• so, do they?

tcpdump

• bonded ethernet interfaces (on linux) are virtual devices created by combining other devices.

• for example:

• bond0

• eth0

• eth2

• eth4

• ...

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is not cool.

incoming packets appear to be missing when sniffing the physical device.

(only on debian lenny)

outgoing packets show up regardless.

tcpdump mailing list

only way to figure out where they are getting lost is to follow them through the kernel.

Step 2

Let’s start digging.

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

how are packets received?

• packets come in from the wire.

• a couple different ways for the kernel to “know” about new packets.

• let’s just look at the simple case.

• an interrupt is raised when a packet arrives.

• both paths hand data up to the higher layers in similar ways.

e1000

netif_rx

• queues packets up.

• another thread pulls packets off and processes them.

OK, but how does pcap find out about these packets?

a more fundamental question...

how does pcap actually work?

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

bprobe/tcpdump/etc (userland)

• call pcap_open_live or pcap_create/pcap_activate to initialize libpcap.

• call pcap_next_ex to get packets from libpcap.

• examine the packets and do stuff.

libpcap (userland)

• creates a socket of type PF_PACKET

• two ways to get packets from the kernel:

• one by one (slow)

• via shared memory (fast)

• libpcap tries to use the fast method

• if it fails, it falls back to slow.
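The try-fast-then-fall-back behaviour described above can be sketched as a small pattern. This is only a model of the control flow, not libpcap's code: `RingUnsupported`, `activate`, and the two setup callables are hypothetical stand-ins for the real setup steps.

```python
class RingUnsupported(Exception):
    """Raised by the (hypothetical) fast-path setup when it cannot be used."""


def activate(setup_shared_memory, setup_one_by_one):
    """Try the shared-memory path first; fall back to one-by-one reads.

    Both arguments are callables supplied by the caller. They stand in for
    the real setup work, which this sketch does not implement.
    Returns the name of the method that ended up being used.
    """
    try:
        setup_shared_memory()
        return "shared-memory"
    except RingUnsupported:
        setup_one_by_one()
        return "one-by-one"
```

The point of the pattern is that the caller never has to know which path was taken, which is also why a silent fallback can hide a behaviour difference between the two paths.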

libpcap creating PF_PACKET socket

the new “fast” way is being set up.

the new way of pulling packets out.

the old way is being set up after the new way failed to initialize.

pull packets out from the kernel the old way.

PF_PACKET (kernel)

• libpcap creates the PF_PACKET socket

• the PF_PACKET code in the kernel (eventually) executes.

• this code does some initialization and inserts a protocol hook...

network device agnostic layer

• pulls packets off the backlog queue.

• calls netif_receive_skb()

• has some logic to determine who the real sender is when bonding is enabled.

• passes the packet through the protocol hooks.

(run through all protocol blocks handing the packet over)

we now know the path packets take so they can be examined by pcap apps.

back to the bug

• so, the bug was that packet sniffing physical NICs on bonded hosts was not revealing incoming packets.

• what do we now know about our environment?

• what would be the best place to look to track down this bug?

we know

assume the following setup

• bond0

• eth0

• eth1

• eth2

• packet came in on eth0

• thus:

• skb->dev = eth0

• skb->dev->master = bond0

we know

before

• skb->dev = eth0

after

• skb->dev = bond0

• code returns eth0 as orig_dev

we know

LOOK

we know

Did you see it?

Bug

• We overwrite the packet’s device with the bond device.

• The protocol hook check tests whether the hook is for the device on the packet.

• It isn’t:

• we are sniffing eth0

• skb->dev was overwritten to bond0.

• That’s why if you sniff “bond0” you see packets but if you sniff “eth0” you see nothing.
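The bug described above can be reproduced in a toy model. The Python below is not kernel code: `Dev`, `Skb`, `Hook`, and `receive` are invented names that mimic the shape of the logic (overwrite the packet's device with its master, then hand the packet only to hooks registered for that device). The `match_orig_dev` flag shows one plausible shape for a fix and is an assumption, not the patch from the talk.

```python
class Dev:
    def __init__(self, name, master=None):
        self.name = name
        self.master = master      # bond device this NIC belongs to, if any


class Skb:
    def __init__(self, dev):
        self.dev = dev            # device the packet arrived on


class Hook:
    """Stand-in for a protocol hook registered by a sniffing socket."""

    def __init__(self, dev=None):
        self.dev = dev            # None means "any device"
        self.seen = []            # names of arrival devices for delivered packets


def receive(skb, hooks, match_orig_dev=False):
    """Toy receive path: returns the original device, like orig_dev."""
    orig_dev = skb.dev
    if skb.dev.master is not None:
        skb.dev = skb.dev.master  # the overwrite described above
    for hook in hooks:
        wanted = hook.dev is None or hook.dev is skb.dev
        if match_orig_dev and hook.dev is orig_dev:
            wanted = True         # hypothetical fix: also honour the real NIC
        if wanted:
            hook.seen.append(orig_dev.name)
    return orig_dev


bond0 = Dev("bond0")
eth0 = Dev("eth0", master=bond0)
sniff_bond0, sniff_eth0 = Hook(bond0), Hook(eth0)

receive(Skb(eth0), [sniff_bond0, sniff_eth0])
# sniff_bond0.seen is now ["eth0"]; sniff_eth0.seen is still [].
```

With `match_orig_dev=True` the eth0 hook receives the packet as well, which is the behaviour a sniffer on the physical NIC expects.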

packets are being handed up from the kernel

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

EASY FIX

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

reboot and try the new kernel...

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is cool.

Now eth0 (the active NIC in bond0)

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Everything is not cool.

NO

нет

NEIN!

tcpdump/bprobe/other pcap apps STILL FAIL.

???

In real life I spent the next 4 days looking over the same kernel code, hundreds of times.

Every single day I searched from the moment I woke up (9am) until I collapsed with exhaustion (3am).

I got so wound up in trying to get my fix working, I lost track of the process.

It was a miserable 4 days.

Until I realized...

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Step 1

• Examine our assumptions:

• The kernel code is still broken.

• The incoming packets are being queued up for libpcap to pull out of PF_PACKET properly.

• There probably isn’t a bug in bprobe or tcpdump.

Step 2

Let’s start digging.

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

verify my assumption

modify libpcap to verify that the kernel really is still broken

i used apt-get source to retrieve the official source for debian lenny’s libpcap and I found something surprising.

old way of doing pcap

• debian lenny’s kernel supports the new way of getting packets out of the kernel via mmap.

• but, debian lenny’s libpcap is not new enough and therefore uses the old way to examine packets.

• this also means that unless i statically link the libpcap version i want, my app will just perform worse on lenny.

reading a packet the old way

that if statement is the problem.

• we are sniffing packets on a physical device

• BUT in the kernel we are changing the device a packet comes in on to the bond device (remember in netif_receive_skb?)

• the index of the bond device is different from the index of the physical device we are sniffing

• so this if statement evaluates to TRUE

• libpcap returns without processing the packet.

why?

this code exists to prevent a race condition when sniffing packets the old way in some kernels.
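The check described above boils down to comparing two interface indexes. The Python below models just that comparison. It is a sketch rather than libpcap's source: the function name, its parameters, and the index values 2 and 5 are invented for illustration.

```python
def read_packet_old_way(bound_ifindex, packet_ifindex, payload):
    """Return the payload if it is accepted, or None if it is discarded.

    bound_ifindex:  index of the device the capture is bound to, or None
                    when capturing on any device.
    packet_ifindex: index the kernel reported for this packet.
    """
    if bound_ifindex is not None and packet_ifindex != bound_ifindex:
        return None   # treated as a stray packet and dropped unprocessed
    return payload


# Made-up interface indexes for the example.
ETH0_IFINDEX, BOND0_IFINDEX = 2, 5
```

A capture bound to eth0 that is handed a packet stamped with bond0's index discards it, so the sniffer sees nothing even though the kernel delivered the packet.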

solution

• boot into our fixed debian lenny kernel.

• download a version of libpcap that is newer and supports the mmap method for packet sniffing.

• new method doesn’t have this race condition and has better performance.

• link bprobe/tcpdump/other pcap apps against it.

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Next, sniff eth0...

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

summarize

• kernel bug when overwriting the device the packet arrived on.

• fixed this bug, but bprobe/tcpdump still failed.

• libpcap bug when pulling packets out of the kernel the old way

• can avoid this bug and get better performance with a newer libpcap

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Steps 1-5

• Examine your assumptions.

• Start digging.

• Keep going until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Спасибо! (Thank you!)

Happy debugging!

questions?

twitter: @joedamato

blog: http://timetobleed.com

if there is extra time...

a warmup bug

cool operating system.

no, not really.

but, people use it.

ipfix_reader

• a test program

• links against yajl because it generates JSON output

• works on ubuntu, but not on centos5

TOO EASY, JOE.

but, wait.

here’s another program that links fine to a lib in /usr/local/lib

ON THE SAME SYSTEM.

W A T

• We have 2 programs:

• Both link against libraries in /usr/local/lib/

• Only one works.

• The broken program’s library is in /usr/local/lib/

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Step 1

• Examine our assumptions:

• The programs and libraries are both 64bit.

• /usr/local/lib/ is in the library search path

both programs and their libraries are 64bit?

program 1: ipfix_reader

program 2: bprobe

both programs and their libraries are 64bit.

/usr/local/lib/ is in the library search path?

Let’s check...

ldconfig -p
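Rather than scanning the ldconfig -p listing by eye, the directories it mentions can be collected with a few lines of Python. The sample text below is invented (the library names and paths are placeholders, not output from the machine in question), and `cached_dirs` is a made-up helper name. Note that a directory only shows up here if the cache holds at least one library from it.

```python
import posixpath

# Invented sample in the `ldconfig -p` layout; not real output.
SAMPLE_LDCONFIG = """\
3 libs found in cache `/etc/ld.so.cache'
\tlibz.so.1 (libc6,x86-64) => /usr/lib64/libz.so.1
\tlibpcap.so.0.9 (libc6,x86-64) => /usr/lib64/libpcap.so.0.9
\tlibc.so.6 (libc6,x86-64) => /lib64/libc.so.6
"""


def cached_dirs(ldconfig_output):
    """Return the set of directories that hold libraries in the cache."""
    dirs = set()
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue                      # header line or blank line
        path = line.split("=>", 1)[1].strip()
        dirs.add(posixpath.dirname(path))
    return dirs
```

If `"/usr/local/lib" in cached_dirs(output)` is false, the dynamic linker will not find libraries there through the cache.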

/usr/local/lib/ is not in the library search path.

So...

ipfix_reader doesn’t work because /usr/local/lib is not in the search path.

but...

how can bprobe be working fine?

Strange

• This is confusing.

• bprobe should fail.

• But, the list of shared libraries a particular binary dynamically links to at runtime is recorded in the binary itself.

• So....

Step 2

Let’s start digging.

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Let’s take a look with readelf

(let’s resize it)

rpath

ah ha!

• bprobe works and can link because the binary is storing the library path inside of itself.

• but, now there are 2 more questions:

• how did the rpath tag get there?

• why doesn’t ipfix_reader have one?
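To compare the two binaries, the NEEDED and RPATH/RUNPATH entries can be pulled out of `readelf -d` output. The sketch below parses that text. The two samples are invented for illustration: the library names, versions, and the rpath value are assumptions, not the real dynamic sections of bprobe or ipfix_reader, and `dynamic_info` is a made-up helper name.

```python
import re

_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[(.*?)\]")
_RPATH = re.compile(r"\((?:RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*?)\]")

# Invented samples in the `readelf -d` layout; not real output.
SAMPLE_WITH_RPATH = """\
Dynamic section at offset 0x1d28 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libpcap.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [/usr/local/lib]
"""

SAMPLE_WITHOUT_RPATH = """\
Dynamic section at offset 0x1d28 contains 23 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libyajl.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""


def dynamic_info(readelf_output):
    """Return (needed_libraries, rpath_directories) from `readelf -d` text."""
    needed = _NEEDED.findall(readelf_output)
    rpaths = []
    for entry in _RPATH.findall(readelf_output):
        rpaths.extend(p for p in entry.split(":") if p)   # rpath is ':'-separated
    return needed, rpaths
```

A binary whose second list is empty depends entirely on the system search path to find its libraries.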

how did the rpath tag get there?

why doesn’t ipfix_reader have rpath?

almost forgot...

a warmup bug feature
