TCP loss sensitivity analysis
ADAM KRAJEWSKI, IT-CS-CE

ADAM KRAJEWSKI - TCP CONGESTION CONTROL

Original problem: IT-DB was backing up some data from Meyrin to the Wigner computer centre. A high transfer rate was required in order to avoid service degradation.

But the transfer rate was:
◦ Initial: 1 Gbps
◦ After some tweaking: 4 Gbps

...out of an available 10 Gbps. The problem lies within TCP internals!


TCP introduction

TCP (Transmission Control Protocol) is a protocol providing reliable, connection-oriented transport over a network.

It operates at the Transport Layer (layer 4) of the OSI reference model.

Defined in RFC 793 from 1981. [Links ran at 10 Mb/s at that time!]

Its reliability means that it guarantees no data will be lost during communication.

It is connection-oriented because it creates and maintains a logical connection between sender and receiver.

OSI layers: Application, Presentation, Session, Transport, Network, Data Link, Physical.


TCP operation basics: connection init

Connection initiation: 3-way handshake

Sequence counters setup:
◦ Client sends a random SEQ number (ISN, Initial Sequence Number)
◦ Server confirms it (ACK = SEQ + 1) and sends its own SEQ number
◦ Client confirms and syncs to the server's SEQ number

Client → Server: SYN, SEQ=1
Server → Client: SYN, ACK=2, SEQ=7
Client → Server: ACK=8
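The counter arithmetic can be sketched as follows (illustrative only, not a real TCP stack; real stacks randomize the ISNs, and the values 1 and 7 are taken from the diagram above):

```python
# Minimal sketch of 3-way handshake sequence-counter arithmetic.
def three_way_handshake(client_isn: int, server_isn: int):
    # Client -> Server: SYN carrying the client's ISN
    syn = {"SEQ": client_isn}
    # Server -> Client: SYN+ACK confirms the client's ISN (ACK = SEQ + 1)
    # and carries the server's own ISN
    syn_ack = {"ACK": syn["SEQ"] + 1, "SEQ": server_isn}
    # Client -> Server: final ACK confirms the server's ISN
    ack = {"ACK": syn_ack["SEQ"] + 1}
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake(client_isn=1, server_isn=7)
print(syn, syn_ack, ack)  # {'SEQ': 1} {'ACK': 2, 'SEQ': 7} {'ACK': 8}
```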


TCP is reliable

TCP uses SEQ and ACK numbers to monitor which segments have been successfully transmitted.

Lost packets are easily detected and retransmitted. This provides reliability.

...but do we really need to wait for every ACK?

Client → Server: SEQ=1, Len=1000
Server → Client: ACK=1001
Client → Server: SEQ=1001, Len=1000 (lost)
(timeout expires)
Client → Server: SEQ=1001, Len=1000 (retransmission)


TCP window (1)
Let us assume that we have a client and a server connected with a 1G Ethernet link with 1 ms one-way delay...

RTT = 2 ms, B = 1 Gbps

[Diagram: client sends DATA, server returns an ACK; each exchange takes one RTT = 2 ms]

If we send packets of 1460 bytes each and wait for the acknowledgment, the effective transfer rate is 1460 B / 2 ms = 5.84 Mbps. So we end up using about 0.6% of the available bandwidth.
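The arithmetic above, as a quick sketch:

```python
# Stop-and-wait throughput: one MSS-sized segment per RTT.
# Numbers (1460 B segment, 2 ms RTT, 1 Gbps link) come from the slide.
mss_bytes = 1460
rtt_s = 0.002        # 2 ms round trip
link_bps = 1e9       # 1 Gbps link

throughput_bps = mss_bytes * 8 / rtt_s
utilization = throughput_bps / link_bps

print(f"{throughput_bps / 1e6:.2f} Mbps")  # 5.84 Mbps
print(f"{utilization:.2%}")                # 0.58%
```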


TCP window (2)
In order to use the full bandwidth we need to keep at least

W = B × RTT

unacknowledged bytes in flight. This way we make sure we will not just sleep and wait for ACKs to come, but keep sending traffic in the meantime.

The factor B × RTT is referred to as the Bandwidth-Delay Product.
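A quick sketch of the computation (the second example anticipates the 10G / 25 ms CERN-Wigner link discussed later):

```python
# Bandwidth-Delay Product: bytes that must be in flight to fill the pipe.
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    return bandwidth_bps * rtt_s / 8  # /8 converts bits to bytes

# 1 Gbps link with 2 ms RTT (the example above):
print(bdp_bytes(1e9, 0.002))    # 250000.0 bytes, ~244 KiB
# 10 Gbps link with 25 ms RTT (the CERN-Wigner case):
print(bdp_bytes(10e9, 0.025))   # 31250000.0 bytes, ~30 MiB
```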

[Diagram: client sends segments 1-5 back to back; ACK-1 ... ACK-5 return one RTT (2 ms) later]


TCP window (3)
The client sends a certain number of segments one after another. As soon as the first ACK is received it assumes the transmission is fine and continues to send segments.

Basically, at each time t0 there are W unacknowledged bytes on the wire.

W is referred to as TCP Window Size

W != const

[Diagram: client sends SEQ=1, 1461, 2921, 4381, 5841, 7301, 8761 (Len=1460 each) while ACK=1461, ACK=2921, ... return; W spans the unacknowledged bytes in flight]


Window growth

[Chart: window W over time t — starts at Wdef, grows linearly at slope α to W1, is halved to 0.5 W1 on a DROP, then grows again]

It starts with a default value: wnd ← Wdef

Then it grows by 1 every RTT: wnd ← wnd + 1

But if a packet loss is detected, the window is cut in half: wnd ← wnd/2

This is an AIMD algorithm (Additive Increase, Multiplicative Decrease).
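The window dynamics above can be sketched as a toy AIMD loop (a simplification: real Reno also has slow start and retransmission timeouts, and the loss positions here are arbitrary):

```python
# Toy AIMD trace; window measured in segments.
def aimd(wnd: float, losses: set, rtts: int):
    trace = []
    for t in range(rtts):
        if t in losses:
            wnd = wnd / 2   # multiplicative decrease on packet loss
        else:
            wnd = wnd + 1   # additive increase: +1 segment per RTT
        trace.append(wnd)
    return trace

# Start at 10 segments, losses at RTTs 5 and 12 -> sawtooth shape
trace = aimd(wnd=10, losses={5, 12}, rtts=15)
print(trace)
```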


Congestion control
Network congestion appears when there is more traffic than the switching devices can handle.

TCP prevents congestion!
◦ It starts with a small window, thus a low transfer rate
◦ Then the window grows linearly, thus increasing the transfer rate
◦ TCP assumes that congestion is happening when there are packet losses, so in case of a packet loss it cuts the window size in half, thus reducing the transfer rate.

[Chart: bandwidth B (Gbps) over time t (s) — sawtooth between 0.5 Bc and the congestion point Bc]


Long Fat Network problem

Long Fat Networks (LFNs):
◦ high bandwidth-delay product (B × RTT)
◦ Example: CERN link to Wigner.

To get full bandwidth on a 10G link with 25 ms RTT we need a window of W = 10 Gbps × 25 ms ≈ 31 MB.

If a drop occurs at peak throughput, the window falls to ~15.5 MB, and growing back by one segment per RTT takes more than 10,000 RTTs (several minutes).


TCP algorithms (1)The default TCP algorithm (reno) behaves poorly.

Other congestion control algorithms have been developed such as:

◦ BIC

◦ CUBIC

◦ Scalable

◦ Compound TCP

Each of them behaves differently and we want to see how they perform in our case...


Testbed Two servers connected back-to-back.

How does congestion control algorithm performance change with packet loss and delay?

Tested with iperf3.

Packet loss and delay emulated using NetEm in Linux:◦ tc qdisc add dev eth2 root netem loss 0.1 delay 12.5ms
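Not from the slides, but a useful yardstick for these measurements: the well-known Mathis et al. approximation relates Reno-style AIMD throughput to MSS, RTT and loss rate p as BW ≈ (MSS/RTT)·√1.5/√p. A quick sketch, with the 25 ms / 0.01% numbers picked to match the testbed:

```python
import math

# Mathis et al. approximation for Reno-style AIMD throughput:
#   BW ≈ (MSS / RTT) * sqrt(3/2) / sqrt(p)
def mathis_bw_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

# 1460-byte MSS, 25 ms RTT, 0.01% loss: far below 10 Gbps,
# which is why the lossy 25 ms charts look so bad.
bw = mathis_bw_bps(1460, 0.025, 0.0001)
print(f"{bw / 1e9:.3f} Gbps")
```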


0 ms delay, default settings

[Chart: bandwidth (Gbps, 0-10) vs packet loss rate (%, 0.0001-100, log scale) for cubic, reno, bic, scalable]


25 ms delay, default settings

[Chart: bandwidth (Gbps, 0-1.2) vs packet loss rate (%, log scale) for cubic, reno, bic, scalable — annotation: PROBLEM]


Growing the window (1)
We need a window of W = B × RTT bytes, but...

Default settings:
◦ net.core.rmem_max = 124K
◦ net.core.wmem_max = 124K
◦ net.ipv4.tcp_rmem = 4K 87K 4M
◦ net.ipv4.tcp_wmem = 4K 16K 4M
◦ net.core.netdev_max_backlog = 1K

WARNING: The real OS values are plain byte counts, so less readable.

The window can't grow bigger than the maximum TCP send and receive buffers!


Growing the window (2)
We tune TCP settings on both the server and the client.

Tuned settings:
◦ net.core.rmem_max = 67M
◦ net.core.wmem_max = 67M
◦ net.ipv4.tcp_rmem = 4K 87K 67M
◦ net.ipv4.tcp_wmem = 4K 65K 67M
◦ net.core.netdev_max_backlog = 30K

WARNING: TCP buffer size doesn't translate directly to window size, because TCP uses part of its buffers for operational data structures. So the effective window will be smaller than the configured maximum buffer size.

Helpful link: https://fasterdata.es.net/host-tuning/linux/

Now it’s fine


25 ms delay, tuned settings

[Chart: bandwidth (Gbps, 0-10) vs packet loss rate (%, log scale) for cubic, reno, scalable, bic]


0 ms delay, tuned settings

[Chart: bandwidth (Gbps, 0-10) vs packet loss rate (%, log scale) for cubic, scalable, reno, bic — annotation: DEFAULT CUBIC]


Bandwidth fairness
Goal: Each TCP flow should get a fair share of the available bandwidth.

[Diagram: two flows entering a 10G link each converge to 5G over time]
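The slides do not define a fairness metric; a standard one is Jain's fairness index, sketched here for illustration:

```python
# Jain's fairness index over per-flow rates:
#   1.0 = perfectly fair, 1/n = one flow takes everything.
def jain_index(rates):
    return sum(rates) ** 2 / (len(rates) * sum(r * r for r in rates))

print(jain_index([5, 5]))   # 1.0 — the fair 5G/5G outcome above
print(jain_index([9, 1]))   # ~0.61 — one flow dominates
```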


Bandwidth competition
How do congestion control algorithms compete?

Tests were done for:
◦ Reno, BIC, CUBIC, Scalable
◦ 0 ms and 25 ms delay

Two cases:
◦ 2 flows starting at the same time
◦ 1 flow delayed by 30 s

[Diagram: two 10G senders sharing one congested 10G link]


cubic vs cubic + offset


cubic vs bic

[Chart: congestion control, 0 ms — bandwidth (Gbps, 0-10) vs packet loss rate (%, log scale) for competing cubic and bic flows]


cubic vs scalable

[Chart: congestion control, 0 ms — bandwidth (Gbps, 0-12) vs packet loss rate (%, log scale) for competing cubic and scalable flows]


TCP (reno) friendliness


cubic vs reno

[Chart: congestion control, 25 ms delay — bandwidth (Gbps, 0-10) vs packet loss rate (%, log scale) for competing cubic and reno flows]


Why default cubic? (1)
◦ BIC was the default Linux TCP congestion algorithm until kernel version 2.6.19
◦ It was changed to CUBIC in kernel 2.6.19 with the commit message: "Change default congestion control used from BIC to the newer CUBIC which it the successor to BIC but has better properties over long delay links."
◦ Why?


Why default cubic? (2)


◦ It is more friendly to other congestion algorithms:
◦ to Reno...
◦ to CTCP (Windows default)...


Why default cubic? (3)
It has better RTT fairness properties.

[Diagram: flows between Meyrin and Wigner at 0 ms RTT and 25 ms RTT sharing the link]

Why default cubic? (4)

RTT fairness test:
[Charts: competing flows at 0 ms RTT and at 25 ms RTT]


The real problem

[Diagram: Meyrin and Wigner data centres interconnected over 100G links via ToR switches and LAGs — lots of network paths!]

Packet losses on:
• Links (dirty fibers, ...)
• NICs (hardware limitations, bugs, ...)

Difficult to troubleshoot!


Conclusions
To achieve a high transfer rate to Wigner, collaboration is essential!

User part:
• Tune TCP settings! Follow https://fasterdata.es.net/host-tuning/
• Make sure you use at least CUBIC for Linux and CTCP for Windows
• Avoid BIC: better performance on lossy links, but unfair to other congestion algorithms and unfair to itself (RTT fairness)
• Submit a ticket if you observe low network performance for your data transfer

Networking team part:
• Minimizing packet loss throughout the network
• Fiber cleaning


Setting TCP algorithms (1)

OS-wide setting (sysctl ...):
◦ net.ipv4.tcp_congestion_control = reno
◦ net.ipv4.tcp_available_congestion_control = cubic reno

Local setting:
◦ setsockopt(), TCP_CONGESTION optname
◦ net.ipv4.tcp_allowed_congestion_control = cubic reno

Add a new one (makes bic available): # modprobe tcp_bic
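A minimal sketch of the per-socket route mentioned above, using Python's socket module (the TCP_CONGESTION option is Linux-specific; reading is shown because setting some algorithms requires the kernel module to be loaded and may require privileges):

```python
import socket

def current_cc(sock):
    """Return the socket's congestion control algorithm name (Linux only)."""
    if not hasattr(socket, "TCP_CONGESTION"):
        return None  # non-Linux platforms don't expose this option
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    return raw.split(b"\x00", 1)[0].decode()

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(current_cc(s))  # e.g. 'cubic' on a default Linux host
# Switching is the mirror image, e.g.:
#   s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"reno")
s.close()
```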


Setting TCP algorithms (2)


OS-wide settings:

C:\> netsh interface tcp show global

◦ Add-On Congestion Control Provider: ctcp

Enabled by default in Windows Server 2008 and newer.

Needs to be enabled manually on client systems:

◦ Windows 7: C:\>netsh interface tcp set global congestionprovider=ctcp

◦ Windows 8/8.1 [PowerShell]: Set-NetTCPSetting -CongestionProvider ctcp


References [1] http://indico.cern.ch/event/36803/material/slides/0.pdf

[2] https://www.cnaf.infn.it/~ferrari/papers/myslides/2006-04-03-hepix-ferrari.pdf
