Dumb Ways To Die: How Not To Write TCP-based Network Applications

Thanks to http://www.youtube.com/watch?v=IJNR2EpS0jw :-)

Artyom Gavrichenkov

HOW[NOT]TO Write TCP-based Network Applications

Based on a True Story

• NOT AN AD!
• Qrator: distributed network
● Custom TCP/IP at the bottom
● Custom management protocol at the top
● Interacting with plenty of Web servers and Web browsers on a daily basis
● 2 years of continuous debug^W Product Improvement™

Issue #1

• Message delivery is unreliable in TCP: there is no estimate of when (or whether) a message will arrive

• Timeouts!

• Limit all resources, including time

• No action is itself an action

Timeouts

• Between recvfrom() calls

• Between requests

• Request timeout

• Lifetime of a session

• Lifetime of %OBJECTNAME%

• Long polling may be a bad idea
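The timeout levels listed above can be combined in a single read loop. What follows is a minimal sketch, assuming Python's standard socket module; the helper name recv_with_deadline and its parameters are illustrative and do not come from the talk. It bounds both the idle gap between two reads and the total time allowed for the request.

```python
import socket
import time


def recv_with_deadline(sock, nbytes, idle_timeout, deadline):
    """Read exactly nbytes, bounding the idle gap and the total time.

    idle_timeout: max seconds to wait between two successful reads.
    deadline: absolute time.monotonic() value by which the read must finish.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        time_left = deadline - time.monotonic()
        if time_left <= 0:
            raise TimeoutError("request deadline exceeded")
        # Never wait longer than the idle timeout or the time left overall.
        sock.settimeout(min(idle_timeout, time_left))
        chunk = sock.recv(remaining)  # raises a timeout error when idle too long
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A session lifetime can be enforced the same way: compute one deadline when the session starts and pass it to every read that belongs to that session.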

Ex. 1

• Slowloris (Apache): DoS
● (not distributed, just denial of service)
• Slow HTTP POST
● Apache, IIS, Lighttpd: DoS
● Nginx: DDoS with a botnet

Ex. 2

● AJAX page update at 12 requests per minute
● Backup script switched the server off

Content-Length

– Limit resources for all actions

– Custom protocol should define limits on the input length
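One way a custom protocol can define a limit on input length is a length-prefixed frame with a hard cap. The sketch below is an assumption for illustration only: the 4-byte header, the MAX_FRAME value, and the function names are hypothetical, not part of any protocol described in the talk. The point is that the declared length is checked before any buffering happens.

```python
import struct

MAX_FRAME = 64 * 1024          # hypothetical cap; choose one that fits the protocol
HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order


class FrameTooLarge(ValueError):
    """Raised when a frame exceeds the protocol's length limit."""


def encode_frame(payload):
    if len(payload) > MAX_FRAME:
        raise FrameTooLarge(len(payload))
    return HEADER.pack(len(payload)) + payload


def decode_frame(buf):
    """Return (payload, rest), or (None, buf) if the frame is incomplete."""
    if len(buf) < HEADER.size:
        return None, buf
    (length,) = HEADER.unpack_from(buf)
    # The declared length is untrusted input: reject it before buffering more.
    if length > MAX_FRAME:
        raise FrameTooLarge(length)
    end = HEADER.size + length
    if len(buf) < end:
        return None, buf
    return buf[HEADER.size:end], buf[end:]
```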

errno(3)

– The connection may be closed for no good reason

– Check errno after recvfrom(), sendto(), etc.
● ENOMEM
● ECONNRESET
● EANYTHING
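In Python the errno value surfaces as the errno attribute of OSError rather than as a global variable. A minimal sketch of checking it after every receive, assuming the standard socket and errno modules (the helper name recv_status and its status strings are illustrative):

```python
import errno
import socket


def recv_status(sock, nbytes):
    """Return (data, status): status is 'ok', 'closed', or an errno name."""
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        # exc.errno holds the value C code would read from errno;
        # it is None for errors with no OS code, reported here as UNKNOWN.
        return b"", errno.errorcode.get(exc.errno, "UNKNOWN")
    if not data:
        return b"", "closed"  # recv returned 0 bytes: orderly shutdown by the peer
    return data, "ok"
```

Distinguishing "closed" from an error name such as ECONNRESET lets the caller decide whether a transfer actually completed.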

Ex. 3

● Internet Explorer: ECONNRESET means successful connection termination

– Download status is being ignored

– Content-Length is being ignored

Memory limits

– Resource limits:
● Maximum
– ENOMEM
● Minimum
– idle wait → ECONNRESET

Ex. 4

– DNS TTL
● Too big: days of downtime (continuous)
● Too small: days of downtime (total)

Latency

– 3-Way Handshake takes time
– Do implement persistent connections!

● Do it from the very beginning
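As an illustration of a persistent connection on the client side, the sketch below sends several HTTP requests over one TCP connection, so the handshake is paid once. It assumes Python's standard http.client module and a server that speaks HTTP/1.1 keep-alive; the function name fetch_all is hypothetical.

```python
import http.client


def fetch_all(host, port, paths, timeout=5.0):
    """Issue several GET requests over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the connection can be reused.
            results.append((response.status, response.read()))
    finally:
        conn.close()
    return results
```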

They haven't listened to me!

● TCP
– T/TCP
● HTTP/1.0
– HTTP/1.1

Optimization

– Measure!
– Profile!
– Emulate packet loss!

Optimization

– Text-based protocols are convenient to debug
● And you will debug
– Maybe even in production

– Making use of binary protocols is often a premature optimization

● BSON, Google Protocol Buffers

Optimization

● TCP socket options:

– TCP_NODELAY: disables Nagle's algorithm
● Speedup with small portions of data

– TCP_CORK (Linux): multiple portions of data in a single TCP segment

– "socket corking"
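Both options are set with setsockopt() at the IPPROTO_TCP level. A minimal sketch, assuming Python's standard socket module; the helper names are illustrative. TCP_CORK is exposed only on Linux, so the code checks for the constant instead of assuming it exists.

```python
import socket


def disable_nagle(sock):
    """Send small writes immediately instead of coalescing them."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def set_cork(sock, enabled):
    """Hold back partial segments while corked (Linux only).

    Returns False when the platform does not expose TCP_CORK.
    """
    cork = getattr(socket, "TCP_CORK", None)
    if cork is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1 if enabled else 0)
    return True
```

A typical corking pattern is to cork the socket, write the headers and the body, then uncork so that the queued data is flushed in as few segments as possible.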

Optimization

● TCP stack options:

– Linux: /proc/sys/net/**
● net.ipv4.tcp_fin_timeout
● net.ipv4.tcp_{,r,w}mem
● net.core.{r,w}mem_max

– Windows: HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
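On Linux, each dotted option name maps to a file under /proc/sys, which makes it easy to inspect the current values before tuning them. A minimal sketch, assuming a Linux-style /proc layout; the helper name read_sysctl is hypothetical and the root parameter exists only so the function can be pointed at another directory.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Read a dotted sysctl name such as net.ipv4.tcp_rmem.

    Returns the whitespace-separated fields as a list of strings,
    or None when the file is missing (non-Linux host or absent option).
    """
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().split()
    except OSError:
        return None
```

For example, read_sysctl("net.ipv4.tcp_rmem") is expected to yield three fields on Linux (minimum, default, and maximum receive buffer size in bytes).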

IPv6

● Accidental IPv6 deployment

• SO_REUSEADDR
• sendfile(2)
• select(2)/poll(2)/epoll(7)
• {n,h}to{n,h}{s,l}()
• int64_t vs long
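The last two items concern wire formats: integers should cross the network in a fixed byte order and with a fixed width. A minimal sketch of the same idea in Python, assuming the standard struct module; the record layout (port, id, timestamp) is a made-up example, not a format from the talk.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so "q" is always 8 bytes, unlike a C long whose width depends
# on the platform.
RECORD = struct.Struct("!HIq")  # uint16 port, uint32 id, int64 timestamp


def pack_record(port, ident, timestamp):
    return RECORD.pack(port, ident, timestamp)


def unpack_record(data):
    return RECORD.unpack(data)
```

In C the equivalent is to declare fields as uint16_t, uint32_t, and int64_t and to convert each one explicitly with the hton/ntoh family before sending and after receiving.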

This is it!

Artyom Gavrichenkov <ximaera@highloadlab.com>