TCP – transmission control protocol

Suguru Yamaguchi

2014 Information Network 1

Functions that the transport layer provides
• Model: inter-process communication
  – Identification of processes
  – Communication between a pair of processes
• Interfaces for upper layers
  – Connection oriented (virtual circuit)
  – Connectionless (datagram)
• Contention and coordination of network resources
  – Flow control, maximizing peer benefit.
  – Congestion control, maximizing network welfare.

Transport protocols in the Internet protocol suite
• TCP
  – Connection-oriented
  – Used by almost all applications
  – Powerful functions
• UDP
  – Connectionless
  – Simple, less overhead: IP + process identification
• Others
  – Many implementations and standards
    • SCTP, RTP, DCCP, …

Process and connection
• Identification of a "process"
  – (IP, port)
• Identification of a TCP connection
  – (source IP, source port, destination IP, destination port)

[Figure: two connections between hosts 163.221.52.100 and 203.178.136.36. One connection joins (163.221.52.100, 1040) and (203.178.136.36, 25); the other joins ports 3175 and 80.]
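A minimal sketch of the two identifiers above, assuming loopback TCP connections are permitted on the machine that runs it. The function name `loopback_connection_ids` is hypothetical; the calls themselves come from Python's standard `socket` module. Both ends of one connection should report the same 4-tuple.

```python
import socket


def loopback_connection_ids():
    """Open one TCP connection over loopback and return it as seen by each end.

    Each view is (source IP, source port, destination IP, destination port),
    with "source" meaning the client side of the connection.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        listener.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(listener.getsockname())
            server, _ = listener.accept()
            with server:
                # (IP, port) identifies a process endpoint; two of them
                # together identify the connection.
                client_view = client.getsockname() + client.getpeername()
                server_view = server.getpeername() + server.getsockname()
    return client_view, server_view
```

Calling `loopback_connection_ids()` returns two tuples that compare equal, because the connection is named by the same four values no matter which end inspects it.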

Port
• Ports are defined separately for each transport protocol.
  – TCP/25 is NOT the same as UDP/25
  – The number has meaning.
    • IANA manages the numbers.
    • Well-known ports: 1 – 1023
      – www (world wide web) = 80
      – smtp (simple mail transfer protocol) = 25
    • Registered ports: 1024 – 49151
      – Registration with IANA
    • Private ports: 49152 – 65535
      – http://www.iana.org/assignments/port-numbers
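The three ranges can be written down as a small lookup. This is only an illustration of the ranges listed on the slide; the function name `classify_port` and the returned labels are hypothetical.

```python
def classify_port(port: int) -> str:
    """Return the range a port number falls into, using the ranges above."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private"
```

For example, `classify_port(80)` and `classify_port(25)` both fall in the well-known range, while the client port 1040 from the previous slide is in the registered range.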

TCP service model (1)
• Connection-oriented
• Byte-stream service
  – No explicit boundary between messages
  – Message structure defined by applications
• Full duplex
  – Independent streams for sending and receiving
• Reliable
  – Handles message ordering, duplication, loss, and bit errors.

[Figure: TCP viewed as a byte-stream service; the byte sequences "HELLO" and "OK" are shown in transit.]
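Because TCP delivers a byte stream with no message boundaries, the application has to define its own message structure. One common choice, shown here only as a sketch, is a 4-byte length prefix in front of every message. The helper names `frame` and `deframe` are hypothetical.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


def deframe(stream: bytes):
    """Split received bytes into complete messages plus any leftover bytes.

    The leftover is an incomplete message; a real receiver would keep it
    and append the next bytes that arrive from the socket.
    """
    messages = []
    offset = 0
    while len(stream) - offset >= 4:
        (length,) = struct.unpack_from("!I", stream, offset)
        if len(stream) - offset - 4 < length:
            break  # the body of this message has not fully arrived yet
        start = offset + 4
        messages.append(stream[start:start + length])
        offset = start + length
    return messages, stream[offset:]
```

The receiver may see the bytes of `frame(b"HELLO") + frame(b"OK")` in one read or spread over several; `deframe` recovers the two messages either way, as long as the leftover bytes are carried over between calls.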

Reliable stream, how?
• ACK: acknowledgement
  – Active acknowledgement
  – Duplicate ACK = notification of "packet drop"
• Timeout and retransmission
  – If the sender does not receive an ACK from its receiver: TIMEOUT!
  – If a transmission is presumed incomplete because of errors, the sender retransmits to its receiver.
    • Exponential back-off

ACK

[Figure: sender and receiver buffers holding the sample text "Nara Institute of Science and Technology". Labels: user data arrives; sent and acknowledged; sent but unacknowledged; packet in transit. Numbers shown: 10 and 16.]

Piggybacking: speed up for ACK

[Figure: sender and receiver each hold user data to send ("Nara Institute of Science and Technology" on one side, "Graduate School of Information Science" on the other). Labels: user data arrives; sent and acknowledged; sent but unacknowledged; packet in transit. Note on the slide: "Not accurate".]

Duplicate ACK

[Figure: sender and receiver buffers holding the sample text "Nara Institute of Science and Technology", with a packet loss among the outstanding packets. Labels: user data arrives; sent and acknowledged; sent but unacknowledged; outstanding packets; packet loss. Numbers shown: 10, 16 and 16.]

TCP header

IP header | TCP segment (TCP header + TCP data)

  16-bit source port               | 16-bit destination port
  32-bit sequence number
  32-bit acknowledgment number
  4-bit hlen | reserved | flags    | 16-bit window size
  16-bit TCP checksum              | 16-bit urgent pointer
  (options)
  (TCP data)

The header fields before the options occupy 20 octets.
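The 20-octet fixed header can be unpacked with Python's `struct` module. This is a sketch for reading the layout, not a packet parser: it ignores options and does not verify the checksum. The function name `parse_tcp_header` and the dictionary keys are hypothetical.

```python
import struct

# Flag names by bit position, least significant bit first.
TCP_FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-octet part of a TCP header."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 octets long")
    # "!" selects network byte order; H is 16 bits, I is 32 bits.
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # hlen counts 32-bit words, so multiply by 4 to get octets.
        "header_len": (offset_and_flags >> 12) * 4,
        "flags": [name for bit, name in enumerate(TCP_FLAG_NAMES)
                  if offset_and_flags & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

A header length larger than 20 octets means options follow the fixed fields.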

Nagle algorithm
• Q. The headers (20 bytes of IP + 20 bytes of TCP) are too large for 1 byte of data. How can we deal with this?
• Nagle algorithm
  – Only one unacknowledged small segment per connection
    • If the data to send is smaller than a full segment, hold it until a full segment accumulates or the outstanding data is acknowledged
    • Small RTT: short waiting time
    • Large RTT: fill up the buffer for good throughput
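The sender-side decision can be sketched as a single predicate. This follows the rule as it is usually stated (send a full segment at once; send a small one only when nothing is waiting for an ACK); the function and parameter names are hypothetical, and real stacks add further conditions.

```python
def nagle_should_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether the sender may transmit a segment right now.

    pending_bytes: data written by the application but not yet sent
    mss:           maximum segment size of the connection
    unacked_bytes: data already sent and still waiting for an ACK
    """
    if pending_bytes <= 0:
        return False
    if pending_bytes >= mss:
        return True  # a full-sized segment never has to wait
    # Small segment: allowed only if no earlier data is unacknowledged.
    return unacked_bytes == 0
```

With a small RTT the ACK returns quickly, so small writes wait only briefly; with a large RTT more data accumulates before the next segment goes out. Applications that need every small write sent immediately commonly disable the algorithm with the `TCP_NODELAY` socket option.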

TCP service model (2)
• Buffered transfer
  – Write messages as long as you want
  – No explicit synchronization needed in the application layer
  – The OS manages the status of processes.
• Virtual circuit
  – Connection setup & release
  – Detecting disconnection in communication

Buffered transfer

[Figure: two hosts joined by a TCP connection. On each host a process calls write() and read(); the OS kernel holds a send buffer and a recv buffer.]

TCP header again

  Sender port #                    | Receiver port #
  Sequence #
  ACK #
  Hdr len | rsv | flags            | Window size
  Checksum                         | Pointer to OOB
  TCP option

flags: URG, ACK, PSH, RST, SYN, FIN
The fields before the TCP option occupy 20 octets.

TCP connection setup - 3-way handshake

  Client (active open)                          Server (passive open)
                                                LISTEN
  SYN_SENT       ---- SYN J ----------------->
                                                SYN_RECEIVED
                 <--- SYN K, ACK J+1 ---------
  ESTABLISHED    ---- ACK K+1 --------------->
                                                ESTABLISHED
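The sequence and acknowledgement numbers of the three segments follow directly from the two initial sequence numbers J and K. The sketch below only computes those numbers; it sends nothing. The function name `three_way_handshake` and the tuple layout are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(j: int, k: int):
    """Return the three handshake segments as (sender, flags, seq, ack)."""
    return [
        ("client", "SYN", j, None),
        ("server", "SYN,ACK", k, (j + 1) % SEQ_SPACE),
        ("client", "ACK", (j + 1) % SEQ_SPACE, (k + 1) % SEQ_SPACE),
    ]
```

With J = 2196338486 and K = 2951392133, the server's segment carries ACK 2196338487, one more than the client's initial sequence number, because the SYN itself consumes one sequence number.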

TCP connection release

  close    ---- FIN ---------------->
           <--- ACK of FIN ----------
           <--- FIN -----------------    close
           ---- ACK of FIN --------->

TCP connection reset
• RST
  – Abortive release
  – Nonexistent port

Options
• TCP options in the 3-way handshake
  – Options are negotiated during the 3-way handshake
  – MSS option: Maximum Segment Size negotiation
  – Window scale option
    • For huge buffers, larger than 64 KB, using a bit shift
    • High-speed networks
  – Timestamp option
    • More accurate RTT measurement
    • With MSS option
  – Many options available
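The "bit shift" of the window scale option can be shown in a few lines: the 16-bit window field in the header is shifted left by the negotiated count to obtain the real window. The function name `scaled_window` is hypothetical; the maximum shift of 14 is the limit defined for the option.

```python
MAX_WINDOW_SHIFT = 14  # largest shift count the window scale option allows


def scaled_window(window_field: int, shift: int) -> int:
    """Return the window in bytes for a 16-bit window field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift <= MAX_WINDOW_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift
```

A shift of 0 (shown as `wscale 0` in tcpdump output) leaves the window unchanged, so it stays limited to 65535 bytes.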

TCP state transition

[Figure: TCP state transition diagram.
 States: CLOSED (start), LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED (data transmission), FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT (2MSL), CLOSE_WAIT, LAST_ACK.
 Paths: passive open (server), active open (client), simultaneous open, active close, passive close, simultaneous close.
 Edge labels: "appl: passive open / send: <nodata>"; "recv: SYN / send: SYN, ACK"; "recv: ACK / send: <nodata>"; "recv: CLOSE / send: FIN"; "recv: FIN / send: ACK"; "recv: FIN / send: <nodata>"; "appl: CLOSE or timeout".]
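The diagram can be read as a lookup table from (state, event) to the next state. The sketch below encodes the usual transitions of the standard TCP state machine for normal open and close; it is a teaching aid under that assumption, and it omits resets and the less common edges. The event strings and the `walk` helper are hypothetical names.

```python
# (current state, event) -> (next state, segment sent in response)
TRANSITIONS = {
    ("CLOSED", "appl: passive open"): ("LISTEN", None),
    ("CLOSED", "appl: active open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv: SYN"): ("SYN_RCVD", "SYN, ACK"),
    ("SYN_SENT", "recv: SYN, ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_SENT", "recv: SYN"): ("SYN_RCVD", "SYN, ACK"),  # simultaneous open
    ("SYN_RCVD", "recv: ACK"): ("ESTABLISHED", None),
    ("ESTABLISHED", "appl: close"): ("FIN_WAIT_1", "FIN"),  # active close
    ("ESTABLISHED", "recv: FIN"): ("CLOSE_WAIT", "ACK"),    # passive close
    ("FIN_WAIT_1", "recv: ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv: FIN"): ("CLOSING", "ACK"),      # simultaneous close
    ("FIN_WAIT_2", "recv: FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv: ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout: 2MSL"): ("CLOSED", None),
    ("CLOSE_WAIT", "appl: close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv: ACK"): ("CLOSED", None),
}


def walk(state: str, events):
    """Follow a sequence of events and return every state visited."""
    path = [state]
    for event in events:
        state, _segment = TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

Walking a client through an active open followed by an active close visits SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2 and TIME_WAIT before returning to CLOSED; a server doing a passive open and passive close visits LISTEN, SYN_RCVD, ESTABLISHED, CLOSE_WAIT and LAST_ACK.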

Summary
• Functions in the transport layer (L4)
• Internet transport protocols
• TCP service model
• High performance: ACK, piggybacking, Nagle algorithm
• Connection management

Tcpdump – 3-way handshake

# tcpdump tcp and host iplab.naist.jp
15:26:50.965563 IP rm.naist.jp.64868 > iplab.naist.jp.http: S 2196338486:2196338486(0) win 32120 <mss 1460,nop,wscale 0,nop,nop,timestamp 234659186 0,sackOK,eol>
15:26:51.013517 IP iplab.naist.jp.http > rm.naist.jp.64868: S 2951392133:2951392133(0) ack 2196338487 win 57344 <mss 1414,nop,wscale 0,nop,nop,timestamp 10980172 234659186>
15:26:51.013634 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 1 win 32246 <nop,nop,timestamp 234659187 10980172>

Line format: time src.port > dst.port flag [ from:to(nbytes) | ack # ] win # opt
  – from:to and ack #: 32-bit sequence number and acknowledgement number
  – flag: TCP flags

Tcpdump – connection release

15:26:51.149121 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5857 win 30554 <nop,nop,timestamp 234659188 10980187>
15:27:06.103280 IP iplab.naist.jp.http > rm.naist.jp.64868: F 5857:5857(0) ack 430 win 58296 <nop,nop,timestamp 10981679 234659188>
15:27:06.103372 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5858 win 32246 <nop,nop,timestamp 234659337 10981679>
15:27:10.938811 IP rm.naist.jp.64868 > iplab.naist.jp.http: F 430:430(0) ack 5858 win 32246 <nop,nop,timestamp 234659385 10981679>
15:27:10.961089 IP iplab.naist.jp.http > rm.naist.jp.64868: . ack 431 win 58296 <nop,nop,timestamp 10982169 234659385>

Play with tcpdump
• Tcpdump: a microscope for TCP communication
  – RST use
  – Packet transmission order
  – TCP options
    • MSS option
    • Window scale option

TCP – flow control & congestion control

Suguru Yamaguchi

Contention and coordination of resources
• Flow control
  – Negotiation of processing performance
  – Recovery from out-of-order messages
  – Recovery from message duplication, discard and bit errors
  – Maximizing performance of data transmission
• Congestion control
  – Sharing network bandwidth among connections, suppressing network congestion.
  – Fair sharing
  – Maximizing network welfare

Flow control
• Stop-and-wait
• Go-back-N
• Selective repeat
• Many schemes
  – ARQ (Automatic Repeat reQuest)

Stop-and-wait ARQ

[Figure: timing diagram between sender and receiver, marked with the intervals t1, t2, t3, t4, t5 and a second t1.]
  t1: transmission delay
  t2: frame transmission time
  t3: frame processing time
  t4: ACK transmission time
  t5: ACK processing time
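The timing diagram gives the cost of sending one frame. As a rough sketch, assume one cycle is the sum of all intervals on the slide with t1 counted twice (once for the frame, once for the ACK), and that only t2 is spent actually transmitting data. Both assumptions and both function names are illustrative.

```python
def stop_and_wait_cycle(t1: float, t2: float, t3: float, t4: float, t5: float) -> float:
    """Time from starting one frame until the sender may start the next."""
    return 2 * t1 + t2 + t3 + t4 + t5


def stop_and_wait_utilization(t1: float, t2: float, t3: float, t4: float, t5: float) -> float:
    """Fraction of the cycle during which the sender is transmitting data."""
    return t2 / stop_and_wait_cycle(t1, t2, t3, t4, t5)
```

When t1 is large compared with t2, the sender is idle for most of the cycle, which is the motivation for keeping several frames in flight as Go-back-N and sliding windows do.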

Go-back-N ARQ

[Figure: sender row with frames 1 2 3 4 5 3 4 5 6; receiver row with frames 1 2 4 5 3 4 5 6; marker: timeout on frame 3.]
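A small simulation makes the "go back" step concrete. It is a simplified model, not a protocol implementation: ACKs are assumed to return instantly, and the timeout fires as soon as the window is exhausted. The function name `go_back_n` and its parameters are hypothetical.

```python
def go_back_n(total: int, window: int, lost_once):
    """Return the order in which frames 1..total are transmitted.

    Frames listed in lost_once are dropped on their first transmission only.
    The receiver accepts frames strictly in order and discards the rest.
    """
    lost = set(lost_once)
    sent = []
    base = 1       # oldest frame not yet acknowledged
    next_seq = 1   # next frame the sender will transmit
    expected = 1   # next frame the receiver will accept
    while base <= total:
        if next_seq < base + window and next_seq <= total:
            sent.append(next_seq)
            if next_seq in lost:
                lost.discard(next_seq)   # dropped in the network
            elif next_seq == expected:
                expected += 1
                base = expected          # cumulative ACK slides the window
            # any other frame is out of order and the receiver discards it
            next_seq += 1
        else:
            next_seq = base              # timeout: go back to the oldest unacked frame
    return sent
```

With six frames, a window of three and frame 3 lost once, the sender transmits 1 2 3 4 5, times out on frame 3, and then sends 3 4 5 6. Frames 4 and 5 are transmitted twice even though their first copies arrived intact, which is the cost that selective repeat avoids.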

TCP flow control
• End to end
  – No global coordination
  – Works with available-bandwidth estimation at individual hosts
  – No interference with intermediate routers
  – Implicit signaling through packet drops
• Scalable
  – Works at each end host
    • Autonomous → less state management → scalable

End to end control in TCP

[Figure: data flow in one direction and ACK flow in the other; packets (both data and ACK) may be dropped in intermediate routers.]
• Receiver side
  – Timer & duplicate ACK
  – Delayed ACKing
  – Window size notification
  – Buffering for reordering packets
• Sender side
  – Timer & retransmission
  – Packet interval handling
  – On-the-fly packet control
  – Buffering for retransmission

Many contributions for TCP
• Very simple algorithm
  – Macroscopic self-stabilization
• Assumes no greedy nodes
  – No global control system
  – No greedy node eating as much bandwidth as possible
  – Rejects the idea of an intermediate policing system
• For many data links
  – General purpose
  – Modest performance on almost all data links
• Long-term tuning over the last 20 years

TCP flow control
• Bandwidth usage coordination
  – Sliding window
• Sequence number based control
  – Window size
• Packet gap control
  – ACK clocking
• Others
  – Error detection: TCP checksum
  – Discard detection: duplicate ACK, timeout

Sliding window

[Figure: sender and receiver buffers holding the sample text "Nara Institute of Science and Technology", laid out along the sequence number axis. Labels: window size; user data arrives; sent and acknowledged; sent but unacknowledged; packets in transit (on-the-fly packets, outstanding packets). Numbers shown: 10 and 16.]

Advertised window size from receivers
• Flow control of classic TCP
• rwnd: advertised window
  – Notification from the receiver of the maximum amount of data it can accept
  – Coordination with the sender's sliding window size
  – Too sensitive on a bottleneck link
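How the advertised window limits the sender can be written as one subtraction. In this sketch the sender tracks the last byte sent and the last byte acknowledged; the difference is the data in flight, and the rest of rwnd is what may still be sent. The function and parameter names are hypothetical, and the congestion window is left out on purpose.

```python
def usable_window(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

For instance, with rwnd = 16, bytes up to 16 sent and bytes up to 10 acknowledged, 6 bytes are in flight and 10 more may be sent. If the receiver shrinks rwnd to 4, the sender must stop until more ACKs arrive.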

ACK clocking

[Figure: data flow and ACK flow across a bottleneck link. Data packets leave the bottleneck separated by a packet gap T, and the ACKs generated by the receiver keep that gap.]
• Transmission paced by the arrival rate of ACKs (the bottleneck speed)
• Self-clocking in its balanced situation

TCP congestion control – TCP Tahoe
• Fair-share model: end to end
• Increase/decrease of window size
  – Additive increase
  – Multiplicative decrease
    • For self-stabilization (Jain et al.)
• Strategy for changing the window size
• Detect congestion through packet drops
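Why additive increase with multiplicative decrease leads to a fair share can be seen in a toy model with two flows on one link. The model is an assumption made for illustration: both flows see the same congestion signal at the same time, add 1 when the link is not full, and halve when it is. The function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int):
    """Evolve the rates of two flows under synchronized AIMD."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Multiplicative decrease: the gap between the flows halves too.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: the gap between the flows stays the same.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from very unequal rates such as 1 and 40 on a link of capacity 50, the difference between the two flows shrinks at every decrease and is never widened by an increase, so the rates converge toward each other.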

More control parameters – TCP Tahoe
• Parameters in the sender
  – cwnd
    • Congestion window
    • Initial value: 1
  – ssthresh
    • Slow start threshold
    • Initial value: large
  – tcprecvthresh
    • Number of duplicate ACKs that triggers fast retransmit (and, in Reno, fast recovery)
    • Initial value: 3 in many implementations

Increasing window size
• Increase the congestion window (cwnd) exponentially until it reaches the slow start threshold (ssthresh)
• Overview of the algorithm
  – On receiving an ACK:

    if (cwnd < ssthresh) {          /* slow start */
        send 2 packets on every ACK;    /* exponential growth */
        cwnd += 1;
    } else {                        /* congestion avoidance */
        send cwnd+1 packets in the next round trip;
        cwnd += 1 / cwnd;           /* linear behavior */
    }
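The per-ACK rule can be run to see the two growth phases. This sketch counts cwnd in packets and assumes every packet of a round trip is acknowledged individually with no loss; the names `on_ack` and `window_per_rtt` are hypothetical.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Return the new congestion window after one ACK arrives."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start: +1 per ACK doubles cwnd per RTT
    return cwnd + 1.0 / cwnd       # congestion avoidance: about +1 per RTT


def window_per_rtt(rtts: int, ssthresh: float, cwnd: float = 1.0):
    """Return cwnd at the start of each of the given number of round trips."""
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        for _ in range(int(cwnd)):  # one ACK for each packet sent this round
            cwnd = on_ack(cwnd, ssthresh)
    return history
```

With `ssthresh = 8` the window goes 1, 2, 4, 8 during slow start and then grows by roughly one packet per round trip.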

Increasing window size
• Slow start
  – Exponential increase
• Congestion avoidance
  – Additive increase
  – Linear growth

[Figure: number of packets versus time T, with a "slow start" phase followed by a "congestion avoidance" phase.]

Reducing window size (idea)
• When transmission exceeds the maximum throughput…
  – Packet drops may occur because of buffer overrun.
• When a packet is dropped…
  – Duplicate ACKs are returned
    • Congested, but not seriously (because ACKs still get through)
    • Retransmission is probably OK
  – Timeout!!
    • Retransmission Time Out (RTO)
    • ACKs cannot travel back, so congestion is serious…
    • It is better to wait a while.

Reducing window size (overview of algorithm)
• On detecting packet drop:

    if (dup ACK # == tcprecvthresh) {   /* fast retransmit */
        retransmission;
        ssthresh = cwnd / 2;
        cwnd = 1;                       /* slow start again */
    }
    if (timeout) {
        retransmission;
        timeout *= 2;                   /* exponential backoff */
        cwnd = 1;
    }
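The two reactions to loss can be expressed as methods on a small sender object. This is a sketch that mirrors the pseudocode on the slide; the class name `TahoeSender` and its fields are hypothetical, and cwnd is counted in packets. Implementations commonly also lower ssthresh on a timeout, which the slide does not show and this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class TahoeSender:
    cwnd: float = 1.0
    ssthresh: float = 64.0
    rto: float = 1.0              # retransmission timeout in seconds
    dup_acks: int = 0
    dup_ack_threshold: int = 3    # "tcprecvthresh" on the slide
    retransmissions: int = 0

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == self.dup_ack_threshold:
            self.retransmissions += 1     # fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = 1.0               # back to slow start

    def on_timeout(self) -> None:
        self.retransmissions += 1
        self.rto *= 2                     # exponential backoff
        self.cwnd = 1.0

    def on_new_ack(self) -> None:
        self.dup_acks = 0                 # new data acknowledged
```

Starting from `TahoeSender(cwnd=16.0)`, the third duplicate ACK sets ssthresh to 8 and cwnd to 1, so the sender climbs back with slow start up to 8 and continues with congestion avoidance.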

Overall, TCP behaves like this…

[Figure: number of packets versus time T, with labels "slow start" and "Max throughput (may change)".]

RTO calculation
• Err = M – A
  A ← A + g·Err
  D ← D + h·(|Err| – D)
  RTO = A + 4D
  – M: measured RTT
  – A: smoothed RTT
  – D: smoothed mean deviation
  – g: gain for the average (1/8)
  – h: gain for the deviation (1/4)
• Simply…
  – RTO = {average RTT} + 4 × {smoothed mean deviation}
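The update rules translate directly into a small estimator. The slide does not say how A and D start, so this sketch assumes the first measurement sets A to that value and D to half of it; the class name `RtoEstimator` is hypothetical.

```python
class RtoEstimator:
    """Smoothed RTT (A), smoothed mean deviation (D) and the resulting RTO."""

    def __init__(self, first_rtt: float, g: float = 1 / 8, h: float = 1 / 4):
        self.g = g                # gain for the average
        self.h = h                # gain for the deviation
        self.a = first_rtt        # assumed initial value for A
        self.d = first_rtt / 2    # assumed initial value for D

    @property
    def rto(self) -> float:
        return self.a + 4 * self.d

    def update(self, m: float) -> float:
        """Feed one measured RTT (M) and return the new RTO."""
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto
```

A steady RTT lets D decay, so the RTO moves down toward the smoothed RTT; a sudden jump in the measured RTT raises D quickly, so the RTO grows faster than the average alone would.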

More improvement – TCP Reno
• Issues with Tahoe
  – Too much penalty from doing slow start after fast retransmit
  – Better control of cwnd is needed
• Fast recovery

    if (dup ACK # == tcprecvthresh) {
        retransmission;                 /* fast retransmit */
        ssthresh = cwnd / 2;
        cwnd = cwnd / 2 + tcprecvthresh;
    }
    if (dup ACK # > cwnd / 2)
        send one new packet on every dup ACK;
    if (ACK on retransmission)
        cwnd = ssthresh;
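Fast recovery can be sketched the same way as an object with ACK handlers. Note one substitution: instead of the slide's "dup ACK # > cwnd/2" test, this sketch uses window inflation, adding 1 to cwnd for every further duplicate ACK, which is how Reno is commonly described. The class name `RenoSender` and its fields are hypothetical, and cwnd is counted in packets.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    cwnd: float = 1.0
    ssthresh: float = 64.0
    dup_acks: int = 0
    in_fast_recovery: bool = False
    dup_ack_threshold: int = 3    # "tcprecvthresh" on the slide

    def on_duplicate_ack(self) -> bool:
        """Handle one duplicate ACK; return True if it triggers a retransmission."""
        self.dup_acks += 1
        if self.dup_acks == self.dup_ack_threshold:
            self.ssthresh = self.cwnd / 2                     # fast retransmit
            self.cwnd = self.ssthresh + self.dup_ack_threshold
            self.in_fast_recovery = True
            return True
        if self.in_fast_recovery:
            self.cwnd += 1    # each extra dup ACK means a packet left the network
        return False

    def on_new_ack(self) -> None:
        """Handle an ACK for new data, such as the ACK of the retransmission."""
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh     # deflate; continue without slow start
            self.in_fast_recovery = False
        self.dup_acks = 0
```

Starting from `RenoSender(cwnd=16.0)`, the third duplicate ACK gives ssthresh = 8 and cwnd = 11, and the ACK of the retransmitted packet leaves cwnd at 8. Tahoe would restart from cwnd = 1 at the same point.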

Less penalty…

[Figure: number of packets versus time T, with labels "slow start", "Maximum throughput (may change)" and "cwnd/2".]

More improvement
• Selective Acknowledgement (SACK)
• Rate-based flow control: TCP Vegas
• TFRC: TCP Friendly Rate Control (RFC4828)
• Explicit Congestion Notification (ECN)
• Interaction with RED
• TCP extensions for wireless links
• …

Summary
• Flow control
  – Stop-and-wait
  – Go-back-N
  – Sliding window
• Congestion control
  – Slow start
  – Congestion avoidance
  – Fast retransmit
  – Fast recovery