Debugging Distributed Systems - Devoxx Belgium 2016 [Extended]

Donny Nadolny, PagerDuty #Devoxx #distsys

What is ZooKeeper

• Distributed system for building distributed systems

• Small in-memory filesystem


ZooKeeper API

• create directory

• create file (ZooKeeper term: “node”)

• atomically update a file

• watch a file for changes

• create “ephemeral” file (goes away when client does)

• create sequential file (concurrent attempts to create are ordered)
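To make the ephemeral and sequential semantics above concrete, here is a toy, single-process model. It is a sketch, not ZooKeeper. The class `ToyZNodeTree` and its methods are hypothetical, and there is no networking, replication, or watches. The real client (`org.apache.zookeeper.ZooKeeper`) exposes these features through `create(...)` with a `CreateMode` and through a versioned `setData(...)`. The 10-digit zero-padded suffix mirrors how ZooKeeper names sequential nodes.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy, single-process model of ZooKeeper-style nodes (hypothetical class,
 * not the real ZooKeeper client API): no networking, watches, or replication.
 */
public class ToyZNodeTree {
    private final Map<String, byte[]> data = new TreeMap<>();
    private final Map<String, Integer> version = new HashMap<>();
    private final Map<String, Long> ephemeralOwner = new HashMap<>();
    private final Map<String, Integer> sequenceCounter = new HashMap<>();

    public ToyZNodeTree() {
        data.put("/", new byte[0]);
        version.put("/", 0);
    }

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Creates a node and returns its actual path (sequential nodes get a zero-padded suffix). */
    public synchronized String create(String path, byte[] value, long sessionId,
                                      boolean ephemeral, boolean sequential) {
        String parent = parentOf(path);
        if (!data.containsKey(parent)) {
            throw new IllegalStateException("parent does not exist: " + parent);
        }
        String actual = path;
        if (sequential) {
            // One counter per parent, so concurrent creators get distinct, ordered names.
            int seq = sequenceCounter.merge(parent, 1, Integer::sum) - 1;
            actual = path + String.format("%010d", seq);
        }
        if (data.containsKey(actual)) {
            throw new IllegalStateException("node exists: " + actual);
        }
        data.put(actual, value);
        version.put(actual, 0);
        if (ephemeral) {
            ephemeralOwner.put(actual, sessionId);
        }
        return actual;
    }

    /** Atomic update: applied only if the caller saw the current version. */
    public synchronized boolean setData(String path, byte[] value, int expectedVersion) {
        Integer current = version.get(path);
        if (current == null || current != expectedVersion) {
            return false;
        }
        data.put(path, value);
        version.put(path, current + 1);
        return true;
    }

    public synchronized boolean exists(String path) {
        return data.containsKey(path);
    }

    /** When a client's session ends, its ephemeral nodes go away. */
    public synchronized void closeSession(long sessionId) {
        Iterator<Map.Entry<String, Long>> it = ephemeralOwner.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() == sessionId) {
                data.remove(e.getKey());
                version.remove(e.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        ToyZNodeTree zk = new ToyZNodeTree();
        zk.create("/locks", new byte[0], 0L, false, false);
        String a = zk.create("/locks/lock-", new byte[0], 1L, true, true);
        String b = zk.create("/locks/lock-", new byte[0], 2L, true, true);
        System.out.println(a + " " + b);                       // /locks/lock-0000000000 /locks/lock-0000000001
        zk.closeSession(1L);
        System.out.println(zk.exists(a) + " " + zk.exists(b)); // false true
    }
}
```

Because the counter belongs to the parent, two clients racing to create `/locks/lock-` always end up with distinct, ordered names. Lock recipes rely on exactly that property.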

Current Talk: Debugging Distributed Systems

For Cassandra Consistency Issues, See:

ZooKeeper at PagerDuty

• Distributed locking

• Consistent, highly available

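[Diagram: ZooKeeper nodes ZK 1, ZK 2, and ZK 3 spread across data centers DC-A, DC-B, and DC-C, with links of 24 ms, 24 ms, and 3 ms between them]

… over a WAN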
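For the locking use case, the ZooKeeper recipes documentation describes a lock built from ephemeral sequential nodes. Each contender creates one under a lock node, and the contender with the lowest sequence number holds the lock. Every other contender watches only the node immediately before its own. The slides don't say which recipe PagerDuty used. The sketch below (hypothetical `LockRecipe` class, no ZooKeeper calls) shows only the ordering decision a client makes after listing the lock node's children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: the ordering logic of the ZooKeeper lock recipe, without ZooKeeper calls. */
public class LockRecipe {
    // Sequential node names end in a 10-digit counter, e.g. "lock-0000000007".
    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    static List<String> sortedBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort((x, y) -> Long.compare(sequenceOf(x), sequenceOf(y)));
        return sorted;
    }

    /** Empty if myNode holds the lock; otherwise the predecessor node to watch. */
    public static Optional<String> nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = sortedBySequence(children);
        int i = sorted.indexOf(myNode);
        if (i < 0) {
            throw new IllegalArgumentException("not a contender: " + myNode);
        }
        return i == 0 ? Optional.empty() : Optional.of(sorted.get(i - 1));
    }

    public static void main(String[] args) {
        List<String> children = List.of("lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(nodeToWatch(children, "lock-0000000001")); // Optional.empty
        System.out.println(nodeToWatch(children, "lock-0000000003")); // Optional[lock-0000000002]
    }
}
```

Watching only the predecessor, rather than the lock holder, avoids waking every waiter when the lock is released. Because the nodes are ephemeral, a crashed holder's node disappears with its session, and that releases the lock.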


ZooKeeper Overview

The Failure

• Network trouble, one follower falls behind

• ZooKeeper gets stuck - leader still up

[Chart: ZooKeeper DB size]


Recovery

• Restart all nodes

• Restart leader



First Hint

• Leader logs: “Too busy to snap, skipping”

Fault Injection

• Disk slow? Let’s test by putting the data directory on a network mount:

  • sshfs donny@some_server:/home/donny /mnt

• Similar failure profile

• Re-examine disk latency… nope, it was a red herring


Health Checks

• First warning: application monitoring

• High-level application checks are good because they catch many problems, but don’t tell you the cause

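• Monitoring ZooKeeper: used ruok (a four-letter-word command; the server answers imok if the process is running, which only shows that it is up)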


Deep Health Checks

• Added deep health check:

• write to one ZooKeeper key

• read the same key back from ZooKeeper
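A deep health check exercises the real write and read path instead of asking the process whether it is alive. Here is a minimal sketch. It assumes a hypothetical `KeyValueClient` interface standing in for a real ZooKeeper client and a caller-chosen timeout. The check fails if the round trip throws, returns different data, or does not finish in time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

/** Hypothetical deep health check: write a fresh value, read it back, within a timeout. */
public class DeepHealthCheck {
    /** Hypothetical minimal client interface; a real check would wrap a ZooKeeper client. */
    public interface KeyValueClient {
        void write(String key, byte[] value) throws Exception;
        byte[] read(String key) throws Exception;
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "deep-health-check");
        t.setDaemon(true); // a stuck check must not keep the JVM alive
        return t;
    });

    public boolean check(KeyValueClient client, String key, long timeoutMillis) {
        byte[] expected = UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        Future<Boolean> result = executor.submit(() -> {
            client.write(key, expected);
            return Arrays.equals(expected, client.read(key));
        });
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // A hung dependency counts as unhealthy. If the worker stays stuck,
            // later checks queue behind it and also time out (still "unhealthy").
            result.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the write or read threw
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new ConcurrentHashMap<>();
        KeyValueClient healthy = new KeyValueClient() {
            public void write(String k, byte[] v) { store.put(k, v); }
            public byte[] read(String k) { return store.get(k); }
        };
        KeyValueClient stuck = new KeyValueClient() {
            public void write(String k, byte[] v) throws InterruptedException { Thread.sleep(60_000); }
            public byte[] read(String k) { return null; }
        };
        DeepHealthCheck hc = new DeepHealthCheck();
        System.out.println(hc.check(healthy, "/healthcheck", 1000)); // true
        System.out.println(hc.check(stuck, "/healthcheck", 200));    // false
    }
}
```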

The Stack Trace

    "LearnerHandler-/123.45.67.89:45874" prio=10 tid=0x00000000024bb800 nid=0x3d0d runnable [0x00007fe6c3193000]
       java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        …
        at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:118)
        …
        at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115)
        - locked <0x00000000d4cd9e28> (a org.apache.zookeeper.server.DataNode)
        at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
        …
        at org.apache.zookeeper.server.ZKDatabase.serializeSnapshot(ZKDatabase.java:467)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:493)

Threads (Leader)

[Diagram, built up over several slides: the leader's threads, namely the request processors handling client requests and one learner handler per follower, with node locks (🔒) being taken and released (🔓)]


Write Snapshot Code (simplified)

    void serializeNode(OutputArchive output, String path) throws IOException {
        DataNode node = getNode(path);
        String[] children = {};
        synchronized (node) {
            output.writeString(path, "path");
            // Blocking network write, made while holding the node's lock
            output.writeRecord(node, "node");
            children = node.getChildren();
        }
        for (String child : children) {
            serializeNode(output, path + "/" + child);
        }
    }
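The effect of writing to the network inside `synchronized (node)` can be reproduced in plain Java. In this sketch, a `CountDownLatch` stands in for the stalled TCP write. That substitution is an assumption for illustration and is not how ZooKeeper's I/O works. A second thread that needs the same node, like a request processor applying an update, ends up in the `BLOCKED` state, which is what a thread dump would show. The class and thread names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of "lock and block": a thread holds a monitor while its write stalls. */
public class LockAndBlock {
    public static Thread.State contenderStateDuringStalledWrite() {
        final Object node = new Object();                    // stands in for a DataNode
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch networkHeals = new CountDownLatch(1); // stands in for the stalled write

        Thread learnerHandler = new Thread(() -> {
            synchronized (node) {
                lockHeld.countDown();
                try {
                    networkHeals.await();                    // "blocking network write" under the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "LearnerHandler");
        Thread requestProcessor = new Thread(() -> {
            synchronized (node) {
                // e.g. applying a client update to the same node; runs only once the lock is free
            }
        }, "RequestProcessor");

        try {
            learnerHandler.start();
            lockHeld.await();
            requestProcessor.start();
            // Poll (for at most 5 s) until the contender is parked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (requestProcessor.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = requestProcessor.getState();
            networkHeals.countDown();                        // "network heals": the lock is released
            learnerHandler.join();
            requestProcessor.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contenderStateDuringStalledWrite()); // BLOCKED
    }
}
```

With a real socket there is no latch to release. The wait lasts as long as TCP keeps the write pending.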


ZooKeeper Heartbeat

• Why didn’t a follower take over?

• restart all nodes - cluster recovers

• restart leader - cluster recovers

• ZK heartbeat: message from leader to follower

• follower gets heartbeat, everything is fine

• follower doesn’t get heartbeat: start an election
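The follower-side rule above amounts to a timeout-based failure detector. Here is a minimal sketch using a hypothetical `HeartbeatMonitor` class with an injected clock, which makes it testable. ZooKeeper's own timeouts are configured in ticks (`tickTime`, with limits such as `syncLimit`), and this sketch does not reproduce its follower code.

```java
import java.util.function.LongSupplier;

/** Hypothetical follower-side detector: start an election if heartbeats stop arriving. */
public class HeartbeatMonitor {
    private final LongSupplier clockMillis;
    private final long timeoutMillis;
    private long lastHeartbeat;

    public HeartbeatMonitor(LongSupplier clockMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeat = clockMillis.getAsLong();
    }

    /** Called whenever a heartbeat from the leader arrives. */
    public synchronized void onHeartbeat() {
        lastHeartbeat = clockMillis.getAsLong();
    }

    /** True if the follower should give up on the leader and start an election. */
    public synchronized boolean shouldStartElection() {
        return clockMillis.getAsLong() - lastHeartbeat > timeoutMillis;
    }

    public static void main(String[] args) {
        long[] now = {0};
        HeartbeatMonitor m = new HeartbeatMonitor(() -> now[0], 4000);
        now[0] = 3000;
        m.onHeartbeat();
        now[0] = 6000;
        System.out.println(m.shouldStartElection()); // false: 3 s since the last heartbeat
        now[0] = 7500;
        System.out.println(m.shouldStartElection()); // true: 4.5 s since the last heartbeat
    }
}
```

This detector can only tell that *something* on the leader is still sending heartbeats. It cannot tell whether the leader is still processing requests.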


Threads (Leader)

[Diagram: the same leader threads (request processors, client requests, one learner handler per follower, node locks), plus the Quorum Peer thread, which keeps sending heartbeats (❤) to the followers]


TCP

TCP Data Transmission

[Diagram, built up over several slides: follower and leader are both ESTABLISHED (after … SYN, SYN-ACK, ACK …). Normally Packet 1 is answered with an ACK. When Packet 1 is lost, it is retransmitted with a timeout that starts at ~200ms and doubles each time, up to ~120sec between attempts. After 15 retries the connection is CLOSED.]


TCP Retransmission (Linux Defaults)

• Retransmission timeout (RTO) is based on measured latency, and doubles after each retry

• TCP_RTO_MIN = 200 ms

• TCP_RTO_MAX = 2 minutes

• /proc/sys/net/ipv4/tcp_retries2 = 15 retries

• 0.2 + 0.4 + 0.8 + … + 102.4 + 6 × 120 = 924.6 seconds (~15.5 mins)
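The total can be checked with a short loop. This assumes the simplified model the Linux ip-sysctl documentation uses for `tcp_retries2`: the timeout starts at TCP_RTO_MIN, doubles after every transmission, and is capped at TCP_RTO_MAX. The connection is abandoned when the timeout after the last retry expires. Under that model, the default of 15 gives 924.6 seconds, which is also the figure the kernel documentation quotes. Real RTOs also include the measured round-trip time, so this is a lower bound. The class name is hypothetical.

```java
/** Hypothetical helper: lower-bound time before Linux TCP gives up on unacknowledged data. */
public class TcpRetransmitTimeout {
    /**
     * Sums the RTOs for the original send plus each retry; the RTO starts at
     * rtoMin, doubles after every attempt, and never exceeds rtoMax.
     */
    public static long totalTimeoutMillis(long rtoMinMillis, long rtoMaxMillis, int retries) {
        long total = 0;
        long rto = rtoMinMillis;
        for (int attempt = 0; attempt <= retries; attempt++) { // original send + retries
            total += rto;
            rto = Math.min(rto * 2, rtoMaxMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        long ms = totalTimeoutMillis(200, 120_000, 15);
        System.out.printf("%d ms = %.1f s = %.1f min%n", ms, ms / 1000.0, ms / 60000.0);
        // prints: 924600 ms = 924.6 s = 15.4 min
    }
}
```

With `retries = 8` (the FIN retries on the connection-close slides), the same model gives 102.2 s. That is in line with the ~1m40s shown there.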

[Diagram: the same retransmission sequence; from the first send of Packet 1 until the connection is CLOSED: 15.5 mins (or more)]

Timeline

1. Network trouble begins - packet loss / latency
2. Follower falls behind, restarts, requests snapshot
3. Leader begins to send snapshot
4. Snapshot transfer stalls
5. Follower ZooKeeper restarts, attempts to close connection
6. Network heals
7. … Leader still stuck


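TCP Close Connection

[Diagram: normal close. The follower sends FIN and enters FIN_WAIT1; the leader answers with FIN/ACK and enters LAST_ACK; the follower replies with ACK and stays in TIME_WAIT for 60 seconds; both sides end up CLOSED.]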

[Diagram: close under packet loss. The follower, in FIN_WAIT1, retransmits its FIN (8 retries) and gives up after ~1m40s, going to CLOSED. The leader never sees the FIN and stays ESTABLISHED.]

[Diagram: meanwhile the leader, still ESTABLISHED, keeps retransmitting Packet 1 and only gives up (CLOSED) after ~15.5 mins.]

[Diagram: what should happen. The follower, already CLOSED, answers the leader's Packet 1 with RST, and the leader's side is CLOSED too.]

syslog - Dropped Packets on Follower

    06:51:47 iptables: WARN: IN=eth0 OUT= MAC=00:0d:12:34:56:78:12:34:56:78:12:34:56:78 SRC=<leader_ip> DST=<follower_ip> LEN=54 TOS=0x00 PREC=0x00 TTL=44 ID=36370 DF PROTO=TCP SPT=3888 DPT=36416 WINDOW=227 RES=0x00 ACK PSH URGP=0

[Diagram: what actually happens. On the follower, iptables blocks (X) the leader's Packet 1 and its retransmissions, so no RST is sent.]

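iptables

    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 80 -j ACCEPT
    ... more rules to accept connections …
    iptables -A INPUT -j DROP

But: iptables connections != netstat connections. iptables decides what is ESTABLISHED from its own conntrack table, not from the kernel sockets that netstat lists.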
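The rules above are evaluated top to bottom, and the first match wins. The `--state ESTABLISHED,RELATED` match consults conntrack's table, not the kernel's sockets. Here is a toy model of just that decision. `StatefulFilter` is hypothetical, the IP addresses are made up (the ports come from the syslog line), and real netfilter has many more states and matches. Once conntrack forgets a connection, a packet for it falls through to `-j DROP`, even if the socket still exists.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a stateful INPUT chain: conntrack lookup, one open port, then DROP. */
public class StatefulFilter {
    /** Connections conntrack currently knows, keyed "srcIp:srcPort->dstIp:dstPort". */
    private final Set<String> conntrack = new HashSet<>();
    private final Set<Integer> openPorts = Set.of(80); // like "-p tcp --dport 80 -j ACCEPT"

    public void track(String connection) { conntrack.add(connection); }

    public void forget(String connection) { conntrack.remove(connection); } // e.g. entry timed out

    /** Evaluates the rules in order; the first matching rule decides. */
    public String input(String srcIp, int srcPort, String dstIp, int dstPort) {
        String conn = srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
        String reverse = dstIp + ":" + dstPort + "->" + srcIp + ":" + srcPort;
        if (conntrack.contains(conn) || conntrack.contains(reverse)) {
            return "ACCEPT"; // -m state --state ESTABLISHED,RELATED -j ACCEPT
        }
        if (openPorts.contains(dstPort)) {
            return "ACCEPT"; // -p tcp --dport 80 -j ACCEPT
        }
        return "DROP";       // -j DROP
    }

    public static void main(String[] args) {
        StatefulFilter follower = new StatefulFilter();
        String toLeader = "10.0.0.2:36416->10.0.0.1:3888"; // follower -> leader (made-up IPs)
        follower.track(toLeader);
        System.out.println(follower.input("10.0.0.1", 3888, "10.0.0.2", 36416)); // ACCEPT
        follower.forget(toLeader); // conntrack entry gone; the socket may still exist
        System.out.println(follower.input("10.0.0.1", 3888, "10.0.0.2", 36416)); // DROP
    }
}
```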


conntrack Timeouts

• From linux/net/netfilter/nf_conntrack_proto_tcp.c:

• [TCP_CONNTRACK_LAST_ACK] = 30 SECS

[Diagram (follower side), kernel TCP vs. conntrack: the kernel, in FIN_WAIT1, retransmits the FIN at growing intervals (~12.8s, ~25.6s, ~51.2s), and each FIN resets conntrack's LAST_ACK timeout of 30s. Once the gap between FINs exceeds 30s, the conntrack entry expires (CLOSED at ~81.2s) while the kernel is still retransmitting; the kernel side reaches CLOSED later (~102.4s).]
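The race in this close sequence can be replayed with the same simplified backoff model. The follower's FIN is retransmitted with a doubling timeout that starts at 200 ms (8 retries), and every FIN refreshes a 30 s conntrack timer. Both the model and the class name `FinVersusConntrack` are assumptions for illustration. Exact kernel timings differ slightly, so the results land close to the slide's approximations rather than exactly on them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: the follower's FIN retransmissions versus its conntrack entry timeout. */
public class FinVersusConntrack {
    /** Send times (ms) of the first FIN and each retransmission; the RTO doubles each time. */
    public static List<Long> finSendTimes(long rtoMinMillis, int retries) {
        List<Long> times = new ArrayList<>();
        long t = 0;
        long rto = rtoMinMillis;
        for (int i = 0; i <= retries; i++) {
            times.add(t);
            t += rto;
            rto *= 2; // no TCP_RTO_MAX cap needed: 8 doublings of 200 ms stay below 120 s
        }
        return times;
    }

    /** The socket gives up one (doubled) RTO after its last retransmission. */
    public static long socketClosedAt(long rtoMinMillis, int retries) {
        List<Long> sends = finSendTimes(rtoMinMillis, retries);
        long lastRto = rtoMinMillis << retries; // RTO after `retries` doublings
        return sends.get(sends.size() - 1) + lastRto;
    }

    /** Each FIN refreshes the conntrack timer; the entry expires after a silent gap of timeoutMillis. */
    public static long conntrackExpiresAt(List<Long> sends, long timeoutMillis) {
        for (int i = 1; i < sends.size(); i++) {
            if (sends.get(i) - sends.get(i - 1) > timeoutMillis) {
                return sends.get(i - 1) + timeoutMillis;
            }
        }
        return sends.get(sends.size() - 1) + timeoutMillis;
    }

    public static void main(String[] args) {
        List<Long> sends = finSendTimes(200, 8);
        System.out.println("conntrack entry expires at " + conntrackExpiresAt(sends, 30_000) + " ms"); // 81000 ms
        System.out.println("socket closes at " + socketClosedAt(200, 8) + " ms");                    // 102200 ms
    }
}
```

In this model, the conntrack entry is gone at 81.0 s while the socket lingers until 102.2 s. From that point on, packets from the leader no longer match an ESTABLISHED conntrack entry. They fall through to the DROP rule, and no RST is sent.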


The Full Story

• Packet loss

• Follower falls behind, requests snapshot

• (Packet loss continues) follower closes connection

• Follower conntrack forgets connection

• Leader now stuck for ~15 mins, even if network heals

Reproducing (1/3) - Setup

• Follower falls behind:

    tc qdisc add dev eth0 root netem delay 500ms 100ms loss 35%

  (Alternative: kill the follower)

• Wait for a few minutes

Reproducing (2/3) - Request Snapshot

• Remove latency / packet loss:

    tc qdisc del dev eth0 root netem

• Restrict bandwidth:

    tc qdisc add dev eth0 handle 1: root htb default 11
    tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps
    tc class add dev eth0 parent 1:1 classid 1:11 htb rate 100kbps

• Restart follower ZooKeeper process

Reproducing (3/3) - Close Connection

• Block traffic to leader:

    iptables -A OUTPUT -p tcp -d <leader ip> -j DROP

• Remove bandwidth restriction:

    tc qdisc del dev eth0 root

• Kill the follower ZooKeeper process; the kernel tries to close the connection

• Monitor conntrack status and wait for the entry to disappear (~80 seconds):

    conntrack -L | grep <leader ip>

• Allow traffic to leader:

    iptables -D OUTPUT -p tcp -d <leader ip> -j DROP


IPsec

[Diagram: TCP data between follower and leader is carried inside IPsec, as ESP (UDP) packets]

IPsec - Establish Connection

[Diagram: IPsec Phase 1, then IPsec Phase 2, then TCP data flows between follower and leader]

IPsec - Dropped Packets

[Diagram: IPsec Phase 1 and Phase 2 between follower and leader, with TCP data packets being dropped]

IPsec - Heartbeat

[Diagram: IPsec Phase 1, IPsec Phase 2, and TCP data between follower and leader, plus an IPsec heartbeat]


Lessons


Lesson 1

• Don’t lock and block

• TCP can block for a really long time

• Interfaces / abstract methods make analysis harder: nothing about output.writeRecord(...) shows that it can block on the network
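One common way to follow "don't lock and block" is to copy what you need while holding the lock and then do the slow I/O after releasing it. The sketch below applies that idea to a simplified tree walk. `Node`, `serialize`, and the one-line-per-node text format are hypothetical. This is not presented as the change that was made in ZooKeeper for ZOOKEEPER-2201.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: copy under the lock, write without it. */
public class CopyThenWrite {
    /** Hypothetical tree node, guarded by its own monitor like DataNode in the slides. */
    public static final class Node {
        public byte[] data;
        public final Map<String, Node> children = new TreeMap<>();
        public Node(byte[] data) { this.data = data; }
    }

    public static void serialize(Node node, String path, Writer out) throws IOException {
        byte[] dataCopy;
        Map<String, Node> childrenCopy;
        synchronized (node) {
            // Only cheap, in-memory copying while the lock is held.
            dataCopy = node.data.clone();
            childrenCopy = new TreeMap<>(node.children);
        }
        // The potentially blocking write happens after the lock is released.
        out.write((path.isEmpty() ? "/" : path) + " " + dataCopy.length + "\n");
        for (Map.Entry<String, Node> child : childrenCopy.entrySet()) {
            serialize(child.getValue(), path + "/" + child.getKey(), out);
        }
    }

    /** Convenience for in-memory use; a StringWriter stands in for a socket's output stream. */
    public static String serializeToString(Node root) {
        StringWriter out = new StringWriter();
        try {
            serialize(root, "", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = new Node(new byte[0]);
        Node locks = new Node(new byte[3]);
        root.children.put("locks", locks);
        locks.children.put("lock-0000000000", new Node(new byte[1]));
        System.out.print(serializeToString(root));
        // prints:
        // / 0
        // /locks 3
        // /locks/lock-0000000000 1
    }
}
```

Locking per node, as the slide's code already does, means the snapshot is not a single point-in-time view. Copying under the lock does not change that. It only shortens how long each lock is held, so a stalled socket no longer blocks writers to that node.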


Lesson 2

• Automate debug info collection (stack traces, heap dump, transaction logs, etc.)
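Part of this can be done from inside the JVM with the standard `java.lang.management` API, as an in-process alternative to running `jstack`. Here is a minimal sketch. `ThreadMXBean.dumpAllThreads(true, true)` returns a `ThreadInfo` for each live thread, including the lock a thread is waiting on and that lock's owner. That is the kind of information the stack trace above provided. `DebugDump` is a hypothetical name, and choosing when to trigger the dump and where to store it is left to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: capture a jstack-like thread dump from inside the JVM. */
public class DebugDump {
    public static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The object this thread is blocked or waiting on, and who holds it (if anyone).
                sb.append(" waiting on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(threadDump());
    }
}
```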


Lesson 3

• Application/dependency checks should be deep health checks!

• Leader/follower heartbeats should be deep health checks!
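One way to make a leader/follower heartbeat "deep" is to have it prove progress, not just liveness. This is a sketch under stated assumptions: a hypothetical `ProgressHeartbeat` class, a leader that reports a monotonically increasing count of applied requests plus its queue length, and stall thresholds you would tune. It is not ZooKeeper's heartbeat protocol. A heartbeat sent from a thread that is not blocked keeps arriving, but the applied count stops moving while work is queued.

```java
import java.util.function.LongSupplier;

/** Hypothetical follower-side check: the leader must be alive AND making progress. */
public class ProgressHeartbeat {
    private final LongSupplier clockMillis;
    private final long heartbeatTimeoutMillis;
    private final long stallTimeoutMillis;
    private long lastHeartbeatAt;
    private long lastProgressAt;
    private long lastApplied = -1;
    private long pending;

    public ProgressHeartbeat(LongSupplier clockMillis, long heartbeatTimeoutMillis, long stallTimeoutMillis) {
        this.clockMillis = clockMillis;
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
        this.stallTimeoutMillis = stallTimeoutMillis;
        long now = clockMillis.getAsLong();
        this.lastHeartbeatAt = now;
        this.lastProgressAt = now;
    }

    /** The leader reports how many requests it has applied and how many are waiting. */
    public synchronized void onHeartbeat(long applied, long pendingRequests) {
        long now = clockMillis.getAsLong();
        lastHeartbeatAt = now;
        if (applied != lastApplied || pendingRequests == 0) {
            lastProgressAt = now; // progress was made, or there was nothing to do
        }
        lastApplied = applied;
        pending = pendingRequests;
    }

    public synchronized boolean leaderLooksHealthy() {
        long now = clockMillis.getAsLong();
        boolean alive = now - lastHeartbeatAt <= heartbeatTimeoutMillis;
        boolean progressing = pending == 0 || now - lastProgressAt <= stallTimeoutMillis;
        return alive && progressing;
    }

    public static void main(String[] args) {
        long[] now = {0};
        ProgressHeartbeat hb = new ProgressHeartbeat(() -> now[0], 4000, 10_000);
        // Heartbeats arrive every second, but the leader is stuck at 42 applied with 5 queued.
        for (int s = 1; s <= 15; s++) {
            now[0] = s * 1000L;
            hb.onHeartbeat(42, 5);
        }
        System.out.println(hb.leaderLooksHealthy()); // false
    }
}
```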

Questions?

Link: “Network issues can cause cluster to hang due to near-deadlock” https://issues.apache.org/jira/browse/ZOOKEEPER-2201

“Mess With The Network” Cheat Sheet

    # add latency
    tc qdisc add dev eth0 root netem delay 500ms 100ms loss 25%

    # remove latency
    tc qdisc del dev eth0 root netem

    # restrict bandwidth
    tc qdisc add dev eth0 handle 1: root htb default 11
    tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps
    tc class add dev eth0 parent 1:1 classid 1:11 htb rate 100kbps

    # remove bandwidth restriction
    tc qdisc del dev eth0 root

    # tip: when doing latency/loss/bandwidth restriction,
    # run "sleep 60 && <tc delete command> & disown" in case you lose ssh access

    # capture packets, then open locally in wireshark
    tcpdump -n "src host 123.45.67.89 or dst host 123.45.67.89" -i eth0 -s 65535 -w /tmp/packet.dump

    iptables -A OUTPUT -p tcp --dport 4444 -j DROP   # block traffic
    iptables -D OUTPUT -p tcp --dport 4444 -j DROP   # allow traffic
    # can use INPUT/OUTPUT chain for incoming/outgoing traffic
    # other options: --dport <dest port>, --sport <src port>, -s <source ip>, -d <dest ip>

    # configure database/application local data directory to be /mnt, then:
    sshfs <user>@<host>:/tmp/data /mnt
    # alternative: nbd (network block device)

    netstat -peanut   # network connections, regular kernel view
    conntrack -L      # network connections, iptables view