adventures in bug hunting @joedamato http://timetobleed.com


Page 1: adventures in bug hunting

adventures in bug hunting

@joedamato

http://timetobleed.com

Page 2: adventures in bug hunting

whoami

Page 3: adventures in bug hunting
Page 4: adventures in bug hunting

http://timetobleed.com

Page 5: adventures in bug hunting

@joedamato

Page 6: adventures in bug hunting

http://boundary.com (make use of it)

Page 7: adventures in bug hunting

first, a confession.

Page 8: adventures in bug hunting

debugging > programming

Page 9: adventures in bug hunting
Page 10: adventures in bug hunting
Page 11: adventures in bug hunting
Page 12: adventures in bug hunting
Page 13: adventures in bug hunting

before we get this horror show rolling

• kernels, drivers, glibc, and everything else change.

• code snips will differ from what you are running on your machines.

• some things are simplified in the interest of time.

Page 14: adventures in bug hunting

bprobe

• boundary IPFIX flow meter

• collects flow data by sniffing packets with libpcap

• also collects low level NIC data from the driver

• packets tx/rx

• bytes tx/rx

• ethernet collisions

• ethernet errors

Page 15: adventures in bug hunting

ethernet bonding (aka teaming)

• combine a group of physical NICs (eth0, eth1, ...) into a single virtual device (bond0, bond1, ...).

• different modes

• active-passive

• round robin

• link aggregation

Page 16: adventures in bug hunting

ethernet bonding (aka teaming)

Page 17: adventures in bug hunting

how does bonding work (on linux) ?

• at a high level...

• the bonding driver creates a “virtual device”

• when a packet is sent, the bonding driver figures out which physical NIC to transmit the packet on.

• when a packet comes in, the NICs pass the incoming packet up for the higher layers of the network stack to figure out.
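To make the transmit side concrete, here is a toy Python model of how a virtual device might pick a physical NIC. It is not the Linux bonding driver: the `Bond` class, the mode strings, and the method names are all made up for this sketch, and only two of the modes are modeled.

```python
from itertools import cycle


class Bond:
    """Toy model of a bonded device; not the real Linux bonding driver."""

    def __init__(self, name, slaves, mode="active-passive"):
        if not slaves:
            raise ValueError("a bond needs at least one physical NIC")
        self.name = name
        self.slaves = list(slaves)
        self.mode = mode
        self.active = self.slaves[0]      # used by active-passive
        self._rr = cycle(self.slaves)     # used by round-robin

    def pick_tx_device(self):
        """Return the physical NIC the next outgoing packet would use."""
        if self.mode == "active-passive":
            return self.active
        if self.mode == "round-robin":
            return next(self._rr)
        raise ValueError(f"unsupported mode in this sketch: {self.mode}")

    def fail_over(self, new_active):
        """Switch the active NIC, e.g. after a link failure."""
        if new_active not in self.slaves:
            raise ValueError(f"{new_active} is not part of {self.name}")
        self.active = new_active
```

With `Bond("bond0", ["eth0", "eth1"], mode="round-robin")`, successive calls to `pick_tx_device()` alternate between the two NICs.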

Page 18: adventures in bug hunting

bprobe and bonding

• bprobe discovers bonded network interfaces.

• uses libpcap to monitor the underlying physical NICs instead of bond devices.

• detecting link failures, etc

Page 19: adventures in bug hunting

everything was looking good until....

Page 20: adventures in bug hunting
Page 21: adventures in bug hunting
Page 22: adventures in bug hunting

Bug was filed...

• Debian Lenny, 64bit.

• Bonded ethernet interfaces.

• No incoming packets are showing up.

Page 23: adventures in bug hunting

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Page 24: adventures in bug hunting

Step 1

• Examine our assumptions:

• The packets are making it to the kernel.

• The packets are being handed up from the kernel to libpcap.

• libpcap doesn’t lose any packets before bprobe examines them.

• bprobe has some weird bug in it.

Page 25: adventures in bug hunting

packets are making it to the kernel

?

Page 26: adventures in bug hunting
Page 27: adventures in bug hunting

watch -n 1 'cat /proc/net/dev'
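If eyeballing the counters gets tedious, the same check can be scripted. This is a minimal sketch that assumes the usual /proc/net/dev layout (two header lines, then one line per interface with 16 counters); the function names are made up here.

```python
def parse_net_dev(text):
    """Map interface name -> rx/tx packet and byte counters."""
    counters = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 16:
            continue                        # not a counter line we understand
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters


def rx_packet_delta(before, after):
    """Packets received per interface between two snapshots."""
    return {
        name: after[name]["rx_packets"] - before[name]["rx_packets"]
        for name in after
        if name in before
    }
```

On a Linux box, take two snapshots with `parse_net_dev(open("/proc/net/dev").read())` a second apart and compare them with `rx_packet_delta`. If the numbers for eth0 and bond0 keep growing, packets are reaching the kernel.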

Page 28: adventures in bug hunting
Page 29: adventures in bug hunting

packets are making it to the kernel

Page 30: adventures in bug hunting

packets are being handed up from the kernel

?

Page 31: adventures in bug hunting
Page 32: adventures in bug hunting

Peel some layers away

• bprobe is really libpcap + packet analysis + output.

• if this is a bug in the kernel or libpcap then other programs that use libpcap (like tcpdump) will also fail the same way.

• so, do they?

Page 33: adventures in bug hunting
Page 34: adventures in bug hunting

tcpdump

• bonded ethernet interfaces (on linux) are virtual devices created by combining other devices.

• for example:

• bond0

• eth0

• eth2

• eth4

• ...

Page 35: adventures in bug hunting

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 36: adventures in bug hunting

Everything is cool.

Page 37: adventures in bug hunting

Now eth0 (the active NIC in bond0)

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 38: adventures in bug hunting

Everything is not cool.

Page 39: adventures in bug hunting
Page 40: adventures in bug hunting

incoming packets appear to be missing when sniffing the physical device.

Page 41: adventures in bug hunting

(only on debian lenny)

Page 42: adventures in bug hunting

outgoing packets show up regardless.

Page 43: adventures in bug hunting
Page 44: adventures in bug hunting
Page 45: adventures in bug hunting

tcpdump mailing list

Page 46: adventures in bug hunting
Page 47: adventures in bug hunting
Page 48: adventures in bug hunting
Page 49: adventures in bug hunting

only way to figure out where they are getting lost is to follow them through the kernel.

Page 50: adventures in bug hunting

Step 2

Let’s start digging.

Page 51: adventures in bug hunting

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Page 52: adventures in bug hunting

how are packets received?

• packets come in from the wire.

• a couple different ways for the kernel to “know” about new packets.

• let’s just look at the simple case.

• an interrupt is raised when a packet arrives.

• both paths hand data up to the higher layers in similar ways.

Page 53: adventures in bug hunting

e1000

Page 54: adventures in bug hunting

e1000

Page 55: adventures in bug hunting

netif_rx

• queues packets up.

• another thread pulls packets off and processes them.
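The queue-then-process split can be pictured with a toy model. This is plain Python, not kernel code: the `Backlog` class and its `max_len` limit are illustrative stand-ins, and the interrupt and worker contexts are just two method calls here.

```python
from collections import deque


class Backlog:
    """Toy stand-in for the input queue that netif_rx fills."""

    def __init__(self, max_len=1000):
        self.queue = deque()
        self.max_len = max_len      # illustrative limit for this sketch
        self.dropped = 0

    def netif_rx(self, packet):
        """Called from the (simulated) interrupt path: enqueue and return."""
        if len(self.queue) >= self.max_len:
            self.dropped += 1       # queue full: the packet is dropped
            return False
        self.queue.append(packet)
        return True

    def process(self, handler):
        """Called later by the (simulated) worker: drain in arrival order."""
        handled = 0
        while self.queue:
            handler(self.queue.popleft())
            handled += 1
        return handled
```

The point of the split is that the interrupt path does as little as possible; everything interesting happens in whatever `handler` does later.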

Page 56: adventures in bug hunting
Page 57: adventures in bug hunting

OK, but how does pcap find out about these packets?

Page 58: adventures in bug hunting
Page 59: adventures in bug hunting
Page 60: adventures in bug hunting

a more fundamental question...

how does pcap actually work?

Page 61: adventures in bug hunting
Page 62: adventures in bug hunting
Page 63: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 64: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 65: adventures in bug hunting

bprobe/tcpdump/etc (userland)

• call pcap_open_live or pcap_create/pcap_activate to initialize libpcap.

• call pcap_next_ex to get packets from libpcap.

• examine the packets and do stuff.

Page 66: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 67: adventures in bug hunting

libpcap (userland)

• creates a socket of type PF_PACKET

• two ways to get packets from the kernel:

• one by one (slow)

• via shared memory (fast)

• libpcap tries to use the fast method

• if it fails, it falls back to slow.
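The try-fast-then-fall-back flow matters later in the story, so here it is as a tiny sketch. `setup_fast` and `setup_slow` are hypothetical callables passed in by the caller, and using `OSError` to signal "not supported" is an assumption of this sketch; none of this is libpcap's actual API.

```python
def open_capture(setup_fast, setup_slow):
    """Try the shared-memory path first; fall back to one-by-one reads.

    setup_fast and setup_slow are hypothetical callables supplied by the
    caller. setup_fast signals "not supported" by raising OSError.
    """
    try:
        return "fast", setup_fast()
    except OSError:
        return "slow", setup_slow()
```

The caller gets back which path was chosen, which is exactly the piece of information that is easy to lose track of when debugging.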

Page 68: adventures in bug hunting

libpcap creating PF_PACKET socket

Page 69: adventures in bug hunting

new “fast” way is being set up.

Page 70: adventures in bug hunting

the new way of pulling packets out.

Page 71: adventures in bug hunting

the old way gets set up when the new way fails to initialize.

Page 72: adventures in bug hunting

pull packets out from the kernel the old way.

Page 73: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 74: adventures in bug hunting

PF_PACKET (kernel)

• libpcap creates the PF_PACKET socket

• the PF_PACKET code in the kernel (eventually) executes.

• this code does some initialization and inserts a protocol hook...

Page 75: adventures in bug hunting
Page 76: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 77: adventures in bug hunting

network device agnostic layer

• pulls packets off the backlog queue.

• calls netif_receive_skb()

• has some logic to determine which device the packet really arrived on when bonding is enabled.

• passes the packet through the protocol hooks.
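A toy model of the hook dispatch helps when reading the kernel code. This is a Python sketch, not the kernel's data structures: `Packet`, `Hook`, and `NetLayer` are made-up names, and a hook with `dev=None` stands for "deliver packets from every device".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Packet:
    dev: str                 # device the packet is currently attributed to
    payload: bytes = b""


@dataclass
class Hook:
    func: Callable[[Packet], None]
    dev: Optional[str] = None    # None means "every device"


@dataclass
class NetLayer:
    hooks: List[Hook] = field(default_factory=list)

    def add_hook(self, func, dev=None):
        """Roughly what opening a packet socket (optionally bound to dev) leads to."""
        self.hooks.append(Hook(func, dev))

    def receive(self, pkt):
        """Hand pkt to every hook whose device filter matches pkt.dev."""
        delivered = 0
        for hook in self.hooks:
            if hook.dev is None or hook.dev == pkt.dev:
                hook.func(pkt)
                delivered += 1
        return delivered
```

Note that delivery depends entirely on what `pkt.dev` says at the moment `receive` runs.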

Page 78: adventures in bug hunting
Page 79: adventures in bug hunting
Page 80: adventures in bug hunting
Page 81: adventures in bug hunting

(run through all protocol blocks handing the packet over)

Page 82: adventures in bug hunting
Page 83: adventures in bug hunting

we now know the path packets take so they can be examined by pcap apps.

Page 84: adventures in bug hunting

bprobe/tcpdump/etc (in userland)

libpcap (in userland)

packet protocol family (in the kernel)

network device agnostic layer (in the kernel)

Page 85: adventures in bug hunting
Page 86: adventures in bug hunting
Page 87: adventures in bug hunting

back to the bug

• so, the bug was that packet sniffing physical NICs on bonded hosts was not revealing incoming packets.

• what do we now know about our environment?

• what would be the best place to look to track down this bug?

Page 88: adventures in bug hunting

we know

Page 89: adventures in bug hunting

assume the following setup

• bond0

• eth0

• eth1

• eth2

• packet came in on eth0

• thus:

• skb->dev = eth0

• skb->dev->master = bond0

Page 90: adventures in bug hunting

we know

Page 91: adventures in bug hunting

before

• skb->dev = eth0

after

• skb->dev = bond0

• code returns eth0 as orig_dev

Page 92: adventures in bug hunting

we know

Page 93: adventures in bug hunting

LOOK

Page 94: adventures in bug hunting

we know

Page 95: adventures in bug hunting

Did you see it?

Page 96: adventures in bug hunting
Page 97: adventures in bug hunting

Bug

• We overwrite the packet’s device with the bond device.

• The protocol hook check tests whether the hook is for the device on the packet.

• It isn’t.

• we are sniffing eth0

• skb->dev was overwritten to bond0.

• That’s why if you sniff “bond0” you see packets but if you sniff “eth0” you see nothing.
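The bug fits in a few lines of Python. This is a model, not the kernel source: the packet is a plain dict, `masters` is a made-up lookup table from physical NIC to bond device, and the "fixed" check is only one plausible shape for a fix (the actual patch is not shown in these slides).

```python
def skb_bond(skb, masters):
    """Model of the bonding step: re-attribute the packet to the bond.

    skb is a dict with a "dev" key; masters maps a physical NIC to its
    bond device. Returns the original device.
    """
    orig_dev = skb["dev"]
    if orig_dev in masters:
        skb["dev"] = masters[orig_dev]   # skb->dev is overwritten here
    return orig_dev


def hook_matches_buggy(hook_dev, skb, orig_dev):
    # Only the (already overwritten) packet device is compared.
    return hook_dev is None or hook_dev == skb["dev"]


def hook_matches_fixed(hook_dev, skb, orig_dev):
    # Assumed fix: also accept a hook bound to the original device.
    return hook_dev is None or hook_dev in (skb["dev"], orig_dev)
```

With `masters = {"eth0": "bond0"}` and a packet that arrived on eth0, the buggy check matches a hook on bond0 but never one on eth0, which is the behavior tcpdump showed.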

Page 98: adventures in bug hunting

packets are being handed up from the kernel

Page 99: adventures in bug hunting

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Page 100: adventures in bug hunting
Page 101: adventures in bug hunting

EASY FIX

Page 102: adventures in bug hunting

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Page 103: adventures in bug hunting
Page 104: adventures in bug hunting

reboot and try the new kernel...

Page 105: adventures in bug hunting

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 106: adventures in bug hunting

Everything is cool.

Page 107: adventures in bug hunting

Now eth0 (the active NIC in bond0)

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 108: adventures in bug hunting

Everything is not cool.

Page 109: adventures in bug hunting

NO

Page 110: adventures in bug hunting


Page 111: adventures in bug hunting

нет (no)

Page 112: adventures in bug hunting

NEIN!

Page 113: adventures in bug hunting

tcpdump/bprobe/other pcap apps STILL FAIL.

Page 114: adventures in bug hunting

?????

Page 115: adventures in bug hunting
Page 116: adventures in bug hunting

In real life I spent the next 4 days looking over the same kernel code, hundreds of times.

Page 117: adventures in bug hunting

Every single day, from the moment I woke up (9am), I searched until I collapsed with exhaustion (3am).

Page 118: adventures in bug hunting

I got so wound up in trying to get my fix working, I lost track of the process.

Page 119: adventures in bug hunting

It was a miserable 4 days.

Page 120: adventures in bug hunting

Until I realized...

Page 121: adventures in bug hunting

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Page 122: adventures in bug hunting

Step 1

• Examine our assumptions:

• The kernel code is still broken.

• The incoming packets are being queued up for libpcap to pull out of PF_PACKET properly.

• There probably isn’t a bug in bprobe and tcpdump.

Page 123: adventures in bug hunting

Step 2

Let’s start digging.

Page 124: adventures in bug hunting

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Page 125: adventures in bug hunting

verify my assumption

modify libpcap to verify that the kernel really is still broken

Page 126: adventures in bug hunting

I used apt-get source to retrieve the official source for debian lenny’s libpcap and I found something surprising.

Page 127: adventures in bug hunting

old way of doing pcap

• debian lenny’s kernel supports the new way of getting packets out of the kernel via mmap.

• but, debian lenny’s libpcap is not new enough and therefore uses the old way to examine packets.

• this also means that unless i statically link the libpcap version i want, my app will just perform worse on lenny.

Page 128: adventures in bug hunting
Page 129: adventures in bug hunting

reading a packet the old way

Page 130: adventures in bug hunting

that if statement fails.

• we are sniffing packets on a physical device

• BUT in the kernel we are changing the device a packet comes in on to the bond device (remember in netif_receive_skb?)

Page 131: adventures in bug hunting
Page 132: adventures in bug hunting

that if statement fails.

• the index of the bond device is different from the index of the physical device we are sniffing

• so this if statement evaluates to TRUE

• libpcap returns without processing the packet.
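The check on the old read path boils down to an interface index comparison. Here is a Python model of it; the function name and arguments are made up for this sketch, `-1` stands for "not bound to a particular device", and the index values used in the example are arbitrary.

```python
def read_packet_old_way(bound_ifindex, sll_ifindex, data):
    """Model of the interface check on libpcap's one-by-one read path.

    bound_ifindex: index of the device being sniffed, or -1 for "any".
    sll_ifindex:   index the kernel reported for this packet.
    Returns the packet data, or None when the packet is skipped.
    """
    if bound_ifindex != -1 and sll_ifindex != bound_ifindex:
        return None          # packet is attributed to another interface
    return data
```

If eth0 has index 2 and bond0 has index 5, then `read_packet_old_way(2, 5, pkt)` skips the packet: we are sniffing eth0, but the kernel says the packet belongs to bond0.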

Page 133: adventures in bug hunting

why?

this code exists to prevent a race condition when sniffing packets the old way in some kernels.

Page 134: adventures in bug hunting

solution

• boot into our fixed debian lenny kernel.

• download a version of libpcap that is newer and supports the mmap method for packet sniffing.

• new method doesn’t have this race condition and has better performance.

• link bprobe/tcpdump/other pcap apps against it.

Page 135: adventures in bug hunting

First, sniff bond0...

% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 136: adventures in bug hunting

Next, sniff eth0...

% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Page 137: adventures in bug hunting

YYYYYyyyyYYyYYyyYYyYYYYYYYYYYYYYYYyYeeeEEeeEEeEEEeEEEEeeeeeEEeEEEeeEEeEeEEaAAaaaAaaAaAAaAaAAaaAaAAAAAaaaAAa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Page 138: adventures in bug hunting
Page 139: adventures in bug hunting
Page 140: adventures in bug hunting

summarize

• kernel bug when overwriting the device the packet arrived on.

• fixed this bug, but bprobe/tcpdump still failed.

• libpcap bug when pulling packets out of the kernel the old way

• can avoid this bug and get better performance with a newer libpcap

Page 141: adventures in bug hunting

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Page 142: adventures in bug hunting

Steps 1-5

• Examine your assumptions.

• Start digging.

• Keep going until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Page 143: adventures in bug hunting

Спасибо! (Thank you!)

Happy debugging!

Page 144: adventures in bug hunting

questions?

twitter: @joedamato

blog: http://timetobleed.com

Page 145: adventures in bug hunting

if there is extra time...

Page 146: adventures in bug hunting

a warmup bug

Page 147: adventures in bug hunting
Page 148: adventures in bug hunting

cool operating system.

Page 149: adventures in bug hunting

no, not really.

Page 150: adventures in bug hunting

but, people use it.

Page 151: adventures in bug hunting

ipfix_reader

• a test program

• links against yajl because it generates JSON output

• works on ubuntu, but not on centos5

Page 152: adventures in bug hunting
Page 153: adventures in bug hunting
Page 154: adventures in bug hunting

TOO EASY, JOE.

Page 155: adventures in bug hunting
Page 156: adventures in bug hunting

but, wait.

Page 157: adventures in bug hunting

here’s another program that links fine to a lib in /usr/local/lib

ON THE SAME SYSTEM.

Page 158: adventures in bug hunting
Page 159: adventures in bug hunting

W A T

• We have 2 programs:

• Both link against libraries in /usr/local/lib/

• Only one works.

• The broken program’s library is in /usr/local/lib/

Page 160: adventures in bug hunting
Page 161: adventures in bug hunting

Step 0

• Take a step back.

• Breathe.

• Do not break the computer.

Page 162: adventures in bug hunting

Step 1

• Examine our assumptions:

• The programs and libraries are both 64bit.

• /usr/local/lib/ is in the library search path

Page 163: adventures in bug hunting

both programs and their libraries are 64bit.

?

Page 164: adventures in bug hunting

program 1: ipfix_reader

Page 165: adventures in bug hunting

program 2: bprobe

Page 166: adventures in bug hunting

both programs and their libraries are 64bit.

Page 167: adventures in bug hunting

/usr/local/lib/ is in the library search path

?

Page 168: adventures in bug hunting

Let’s check...

ldconfig -p
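Scanning that output by eye is error prone, so here is a small sketch that pulls the directories out of it. It assumes the usual `ldconfig -p` line format (`name (flags) => /path/to/lib`); the function name is made up here.

```python
import os


def cached_library_dirs(ldconfig_output):
    """Return the set of directories that appear in `ldconfig -p` output."""
    dirs = set()
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue                       # header or blank line
        path = line.split("=>", 1)[1].strip()
        dirs.add(os.path.dirname(path))
    return dirs
```

Feed it the text of `ldconfig -p` and test for `"/usr/local/lib" in cached_library_dirs(output)`. One caveat: the cache lists libraries, not directories, so a searched directory that holds no libraries will not show up. Absence is a strong hint, not proof.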

Page 169: adventures in bug hunting
Page 170: adventures in bug hunting

/usr/local/lib/ is not in the library search path

Page 171: adventures in bug hunting

So...

ipfix_reader doesn’t work because /usr/local/lib is not in the search path.

Page 172: adventures in bug hunting

but...

how can bprobe be working fine?

Page 173: adventures in bug hunting
Page 174: adventures in bug hunting

Strange

• This is confusing.

• bprobe should fail.

• But, the names of the shared libraries a particular binary dynamically links to at runtime are recorded in the binary itself.

• So....

Page 175: adventures in bug hunting

Step 2

Let’s start digging.

Page 176: adventures in bug hunting

Steps 3-5

• Dig until you see something you haven’t seen before.

• Read all of the code and understand it.

• Go to step 2.

Page 177: adventures in bug hunting
Page 178: adventures in bug hunting

Let’s take a look with readelf

Page 179: adventures in bug hunting
Page 180: adventures in bug hunting

(let’s resize it)

Page 181: adventures in bug hunting
Page 182: adventures in bug hunting

rpath

Page 183: adventures in bug hunting

ah ha!

• bprobe works and can link because the binary is storing the library path inside of itself.

• but, now there are 2 more questions:

• how did the rpath tag get there?

• why doesn’t ipfix_reader have one?
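Comparing two binaries is easier with the dynamic section boiled down to the interesting tags. This sketch parses the text printed by `readelf -d`, assuming its usual `(TAG) ... [value]` line layout; the function name and the returned dict shape are made up here.

```python
import re

# Matches e.g. "(RPATH)   Library rpath: [/usr/local/lib]"
_DYN_ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def dynamic_info(readelf_output):
    """Collect NEEDED, RPATH and RUNPATH entries from `readelf -d` text."""
    info = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _DYN_ENTRY.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "NEEDED":
            info[tag].append(value)
        else:
            # rpath/runpath values are colon-separated directory lists
            info[tag].extend(p for p in value.split(":") if p)
    return info
```

Run it on the `readelf -d` output of both binaries: a binary with `/usr/local/lib` in its `RPATH` list can find its library there even when the directory is not in the system search path.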

Page 184: adventures in bug hunting

how did the rpath tag get there?

Page 185: adventures in bug hunting
Page 186: adventures in bug hunting

why doesn’t ipfix_reader have rpath?

Page 187: adventures in bug hunting
Page 188: adventures in bug hunting
Page 189: adventures in bug hunting

almost forgot...

Page 190: adventures in bug hunting

a warmup bug feature

Page 191: adventures in bug hunting
Page 192: adventures in bug hunting