TIME-WAIT Hack For High Performance Ephemeral Connection in Linux TCP Stack E A Faisal [email protected]


DESCRIPTION

Slides used for presentation during MOSC Q4 Meetup 2015


Page 1: TIME-WAIT Hack for High Performance Ephemeral Connection in Linux TCP Stack

TIME-WAIT Hack
For High Performance Ephemeral Connection in Linux TCP Stack

E A Faisal [email protected]


$ whoami

Engku Ahmad Faisal

⇛ github.com/efaisal
⇛ twitter.com/efaisal
⇛ facebook.com/eafaisal
⇛ plus.google.com/u/0/+EAFaisal

Linux user since 1996/1997

Attempted to contribute to open source projects: few accepted, most rejected ;-P


$ whoami

Worked with Nexo Prima Sdn Bhd

● Open Source Cloud Infrastructure
  ○ Virtualisation: oVirt/OpenStack
  ○ Storage: Gluster/Ceph
● High Availability & Scalability Infrastructure
  ○ Linux-based solutions
● System Performance Tuning & Profiling
  ○ Focusing on web-based applications on the Linux platform


TCP STATE MACHINE


TCP :: ACTIVE CLOSE

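[Diagram: active-close path of the TCP state machine, transitions labelled event/action]

3-way handshake → ESTABLISHED
ESTABLISHED → FIN_WAIT_1 (close()/fin)
FIN_WAIT_1 → FIN_WAIT_2 (ack/-)
FIN_WAIT_1 → CLOSING (fin/ack)
FIN_WAIT_1 → TIME_WAIT (fin+ack/ack)
FIN_WAIT_2 → TIME_WAIT (fin/ack)
CLOSING → TIME_WAIT (ack/-)
TIME_WAIT → CLOSED (2MSL timeout)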


TCP :: ACTIVE CLOSE

● By the initiator of close()
● TIME-WAIT & 2MSL are there for good reasons:
  ○ due to the nature of the Internet: packets get lost, are retransmitted, or arrive late
  ○ to ensure the other end has closed properly
● RFC 793 states 2MSL should be 4 minutes
● 2MSL:
  ○ MS Windows - 4 minutes
  ○ Linux - 1 minute (hard coded)

TIME-WAIT is good for TCP communication over the Internet
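The active-close path can be written down as a small lookup table. This is only an illustrative sketch: the states and event/action labels come from the diagram, while the table layout and the helper name `walk` are made up for this example.

```python
# Active-close transitions: (state, event) -> (next state, action sent).
# States and labels follow the slide's diagram; the structure is illustrative.
ACTIVE_CLOSE = {
    ("ESTABLISHED", "close()"): ("FIN_WAIT_1", "fin"),
    ("FIN_WAIT_1", "ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "fin"): ("CLOSING", "ack"),
    ("FIN_WAIT_1", "fin+ack"): ("TIME_WAIT", "ack"),
    ("FIN_WAIT_2", "fin"): ("TIME_WAIT", "ack"),
    ("CLOSING", "ack"): ("TIME_WAIT", None),
    ("TIME_WAIT", "2MSL timeout"): ("CLOSED", None),
}


def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state, _action = ACTIVE_CLOSE[(state, event)]
        path.append(state)
    return path


if __name__ == "__main__":
    # The common case: our FIN is acked first, then the peer's FIN arrives.
    print(walk(["close()", "ack", "fin", "2MSL timeout"]))
```

Every route from ESTABLISHED to CLOSED in this table passes through TIME_WAIT, which is why the side that calls close() first always pays the 2MSL wait.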


TCP :: PASSIVE CLOSE

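[Diagram: passive-close path of the TCP state machine, transitions labelled event/action]

3-way handshake → ESTABLISHED
ESTABLISHED → CLOSE_WAIT (fin/ack)
CLOSE_WAIT → LAST_ACK (close()/fin)
LAST_ACK → CLOSED (ack/-)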


TCP :: PASSIVE CLOSE

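● By the receiver of close()
● CLOSE-WAIT
  ○ lasts until the local application calls close(); the kernel does not time it out
  ○ tcp_fin_timeout (60 seconds by default in Linux) limits how long an orphaned FIN_WAIT_2 connection waits, not CLOSE-WAIT
● WARNING! Some resources on the Web wrongly advise their readers to tweak tcp_fin_timeout to tune TIME-WAIT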


WEB APPLICATION OF TODAY


SIMPLIFIED WEB APP STACK

Client

Load Balancer

Web App

Database | MQ | Cache | REST API


WEB APP STACK

● Supporting services for the Web App layer typically use TCP as the transport protocol
● The Web App layer is both:
  ○ a TCP server listening for connections from the client
  ○ a TCP client connecting to various supporting services
● Consider a LAMP stack + memcached server
  ○ Each HTTP request creates/opens a TCP connection to memcached
  ○ At the end of the request, the connection is closed
  ○ OMG! Ephemeral connection!
  ○ If we have more supporting services (MQ, REST API, etc), there might be more open/close operations for each request
  ○ HTTP is considered ephemeral by “nature”
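The open-use-close pattern per request looks roughly like the sketch below. It is not taken from the slides: the backend is a throwaway loopback server standing in for memcached, and the names `serve` and `ephemeral_requests` are hypothetical. The server waits for the client to close first, so the client is the active closer, as the Web App layer would be.

```python
import socket
import threading


def serve(listener, n):
    """Stand-in backend: answer n connections with one reply each."""
    for _ in range(n):
        conn, _addr = listener.accept()
        with conn:
            conn.recv(64)        # read the request
            conn.sendall(b"OK")  # send the reply
            conn.recv(64)        # returns b"" once the client has closed first


def ephemeral_requests(n):
    """Open, use and close one TCP connection per request.

    Returns the local (client-side) port used by each connection.
    """
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve, args=(listener, n), daemon=True)
    worker.start()

    ports = []
    for _ in range(n):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            ports.append(client.getsockname()[1])
            client.sendall(b"get key")
            client.recv(64)
        # Leaving the with-block closes the socket: the client is the active
        # closer, so its local port is the one that ends up in TIME_WAIT.

    worker.join(timeout=5)
    listener.close()
    return ports


if __name__ == "__main__":
    print(ephemeral_requests(5))
```

Each request consumes one local port on the client side, and that port is not free again the moment the request ends.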


IMPACT AND PROBLEMS


BUSY SERVER WITH EPHEMERAL CONNECTIONS

● Busy server, e.g. 1,000 HTTP requests/second
● The Web App layer also opens TCP connections to backend services at that rate or more
● Within 1 minute, we’re going to have thousands of lingering TCP connections in TIME_WAIT
● You can check using the netstat or ss command

$ ss -nt state time-wait
$ netstat -tn | grep TIME_WAIT
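The same count can be obtained programmatically. The sketch below is an addition, not part of the slides: it assumes the Linux /proc/net/tcp table, where the fourth column holds the socket state in hex and 06 means TIME_WAIT. The function names are hypothetical.

```python
TCP_TIME_WAIT = "06"  # hex state code used for TIME_WAIT in /proc/net/tcp


def count_time_wait(lines):
    """Count rows in /proc/net/tcp format whose state column is TIME_WAIT."""
    count = 0
    for line in lines:
        fields = line.split()
        # Data rows start with "<slot>:"; the header row does not.
        if len(fields) > 3 and fields[0].endswith(":") and fields[3] == TCP_TIME_WAIT:
            count += 1
    return count


def count_from_proc(path="/proc/net/tcp"):
    """Read the kernel's IPv4 TCP table (Linux only) and count TIME_WAIT rows."""
    with open(path) as handle:
        return count_time_wait(handle)


if __name__ == "__main__":
    print(count_from_proc())
```

IPv6 sockets live in a separate table (/proc/net/tcp6) with the same column layout, so the same helper can be pointed at that path as well.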


PROBLEMS: CONNECTION TABLE SLOT

A connection in TIME-WAIT state holds its local port for 1 minute

The local port range is finite - port numbers are 16-bit integers

In many distros, the default range gives around 30,000 ports

Can be changed: net.ipv4.ip_local_port_range

If local port range is exhausted, any connect() results in EADDRNOTAVAIL
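A rough back-of-the-envelope check shows how quickly this bites. The sketch below is illustrative only and makes two assumptions not stated in the slides: every connection goes to the same backend address and port, and the range 32768-60999 is used purely as an example value for net.ipv4.ip_local_port_range. The function names are hypothetical.

```python
def parse_port_range(text):
    """Parse the two numbers found in /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def seconds_until_exhaustion(rate_per_s, low, high, time_wait_s=60):
    """Estimate when the local ports run out for a single destination.

    Assumes each new connection takes a fresh port and holds it for
    time_wait_s after closing. Returns None if ports are released fast
    enough that the range is never exhausted.
    """
    ports = high - low + 1
    if rate_per_s * time_wait_s <= ports:
        return None
    return ports / rate_per_s


if __name__ == "__main__":
    low, high = parse_port_range("32768\t60999")  # example range, an assumption
    print(seconds_until_exhaustion(1000, low, high))
```

With these example numbers there are 28,232 usable ports, so at 1,000 connections per second the range is used up in about 28 seconds, well before the first TIME-WAIT socket expires at 60 seconds.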


PROBLEMS: ADDITIONAL MEMORY & CPU USAGE

● Memory usage to hold the socket structures
  ○ Not really significant, but annoying enough
● Additional CPU usage
  ○ Searching for a free port uses CPU
  ○ CPU cycles are wasted iteratively purging tons of TIME_WAIT connections


EXISTING & POTENTIAL SOLUTIONS


SOLUTION 1: tcp_tw_reuse

From the Linux doc: “Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.”

Commonly recommended to be enabled:
$ echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

Depends on another kernel parameter being enabled: net.ipv4.tcp_timestamps

Does it really work?
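Because tcp_tw_reuse depends on tcp_timestamps, it helps to check both settings together. The sketch below is an addition to the slides: it reads the two values from /proc/sys, the helper names are hypothetical, and the `root` parameter exists only so the check can be pointed at a test directory.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Read a sysctl value by its dotted name, e.g. net.ipv4.tcp_tw_reuse."""
    return Path(root, *name.split(".")).read_text().strip()


def tw_reuse_effective(root="/proc/sys"):
    """True if tcp_tw_reuse is set to 1 and TCP timestamps are not disabled."""
    reuse = int(read_sysctl("net.ipv4.tcp_tw_reuse", root))
    timestamps = int(read_sysctl("net.ipv4.tcp_timestamps", root))
    # tcp_tw_reuse == 1 is the setting written by the echo command above;
    # it only has an effect when timestamps are enabled (non-zero).
    return reuse == 1 and timestamps != 0


if __name__ == "__main__":
    print(tw_reuse_effective())
```

Timestamps also have to be in use on the remote end of the connection, which a local check like this cannot verify.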


SOLUTION 2: TIME-WAIT NEGOTIATION

Proposed by Theodore Faber, Joe Touch & Wei Yue from University of Southern California in 1999

No code is available; the authors claimed to have experimental code written for SunOS 4.1.3

Involves modifying TCP by adding a new TCP option called TW-Negotiate, negotiated during the three-way handshake

Not a viable solution, simply a theoretical one


INTRODUCING LINUXTCPTW


LINUXTCPTW

Implementation of an old idea

● Once discussed on the kernel core dev mailing list to make TIME-WAIT tunable
● Rejected by the kernel core devs - TIME-WAIT is there for good reasons
● Easily abused to make TCP non-compliant with the standard
● Open source project to create a patch set to the kernel for configurable TIME-WAIT
● Introduces a new kernel param - tcp_timewait_len
● A new entry in proc fs - /proc/sys/net/ipv4/tcp_timewait_len
● Able to use sysctl for configuration - net.ipv4.tcp_timewait_len
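To see why a shorter TIME-WAIT matters, the sketch below estimates the connection rate a fixed pool of local ports can sustain towards one destination. It is a rough illustration, not the project's code: the slides do not state the unit of tcp_timewait_len, so the TIME-WAIT length here is simply assumed to be given in seconds, and `max_sustained_rate` is a hypothetical helper.

```python
def max_sustained_rate(ports, time_wait_s):
    """Connections per second one destination can sustain without running out of ports.

    Each closed connection is assumed to hold its local port for time_wait_s
    seconds, so at most ports / time_wait_s new connections can start per second.
    """
    return ports / time_wait_s


if __name__ == "__main__":
    # 30,000 is the "around 30,000" default port count mentioned earlier.
    for time_wait_s in (60, 30, 10, 5):
        print(time_wait_s, max_sustained_rate(30000, time_wait_s))
```

With about 30,000 ports, the hard-coded 60 seconds caps the rate at roughly 500 connections per second per destination; at 5 seconds the same ports would sustain about 6,000.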


THE PROJECT

Project lives at https://github.com/efaisal/linuxtcptw/

Binary release available for CentOS 6 and 7 at https://github.com/efaisal/linuxtcptw/releases

Unfortunately, not battle-tested in a production environment yet - any volunteers?

Currently working on Ubuntu 14.04 LTS kernel


THANK YOU