TCP – Transmission Control Protocol
Suguru Yamaguchi
2014 Information Network 1
Functions that the transport layer provides
• Model: inter-process communication
  – Identification of processes
  – Communication between a pair of processes
• Interfaces for upper layers
  – Connection-oriented (virtual circuit)
  – Connectionless (datagram)
• Contention and coordination of network resources
  – Flow control: maximizing the benefit to the peers
  – Congestion control: maximizing network welfare
Transport protocols in the Internet protocol suite
• TCP
  – Connection-oriented
  – Used by almost all applications
  – Powerful functions
• UDP
  – Connectionless
  – Simple, less overhead: IP + process identification
• Others
  – Many implementations and standards
    • SCTP, RTP, DCCP, ...
Process and connection
• Identification of a "process"
  – (IP address, port)
• Identification of a TCP connection
  – (source IP, source port, destination IP, destination port)
[Figure: two connections between hosts 163.221.52.100 and 203.178.136.36, with port labels 80, 3175, 25, and 1040; one connection has the endpoints (163.221.52.100, 1040) and (203.178.136.36, 25).]
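The 4-tuple identification can be sketched as a lookup table keyed by both endpoints. This is only an illustration: `ConnKey`, `demultiplex`, and the session names are hypothetical, and pairing port 3175 with port 80 is an assumption read off the figure labels.

```python
from typing import Dict, NamedTuple, Optional


class ConnKey(NamedTuple):
    """The 4-tuple that identifies one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def demultiplex(table: Dict[ConnKey, str], key: ConnKey) -> Optional[str]:
    """Return the session registered for this 4-tuple, or None if unknown."""
    return table.get(key)


# Two connections between the same pair of hosts stay distinct
# because their port numbers differ (session names are made up).
table = {
    ConnKey("163.221.52.100", 1040, "203.178.136.36", 25): "mail session",
    ConnKey("163.221.52.100", 3175, "203.178.136.36", 80): "web session",
}

print(demultiplex(table, ConnKey("163.221.52.100", 1040, "203.178.136.36", 25)))
```

Changing any one of the four fields yields a different key, which is why a single server port can serve many clients at once.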
Port
• Ports are defined separately for each transport protocol.
  – TCP/25 is NOT the same as UDP/25
  – The number has a meaning.
    • IANA manages the numbers.
    • Well-known ports: 1 – 1023
      – www (World Wide Web) = 80
      – smtp (Simple Mail Transfer Protocol) = 25
    • Registered ports: 1024 – 49151
      – Registered with IANA
    • Private ports: 49152 – 65535
      – http://www.iana.org/assignments/port-numbers
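The three ranges can be checked with a few comparisons. The function name `classify_port` is hypothetical, and treating port 0 as "reserved" is an addition that the slide does not mention.

```python
def classify_port(port: int) -> str:
    """Classify a port number into the ranges listed on the slide."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    if port == 0:
        return "reserved"      # assumption: not listed on the slide
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private"


for p in (25, 80, 3175, 64868):
    print(p, classify_port(p))
```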
TCP service model (1)
• Connection-oriented
• Byte-stream service
  – No explicit boundaries between messages
  – Message structure is defined by applications
• Full duplex
  – Independent streams for sending and receiving
• Reliable
  – Handles message ordering, duplication, loss, and bit errors.
[Figure: TCP viewed as a byte-stream service; the bytes of the messages "HELLO" and "OK" travel as one continuous stream.]
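Because TCP delivers bytes without message boundaries, the application has to define its own message structure. One common choice, shown here as an assumption rather than anything the slides prescribe, is a 2-byte length prefix. The helpers `frame` and `extract_messages` are hypothetical names.

```python
import struct
from typing import List, Tuple


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message


def extract_messages(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received bytes into complete messages plus leftover bytes."""
    messages = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break                      # message not fully received yet
        start = offset + 2
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]


stream = frame(b"HELLO") + frame(b"OK")   # arrives as one run of bytes
print(extract_messages(stream))
```

The leftover bytes are kept and prepended to whatever the next read returns, so a message split across two reads is still recovered.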
Reliable stream, how?
• ACK: acknowledgement
  – Active acknowledgement
  – Duplicate ACK = notification of a "packet drop"
• Timeout and retransmission
  – If the sender does not receive an ACK from its receiver: TIMEOUT!
  – If a transmission did not complete because of an error, the sender retransmits the data to its receiver.
    • Exponential back-off
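Exponential back-off means each successive retransmission waits twice as long as the previous one. A minimal sketch follows; the function name, the 1-second starting value, and the 60-second cap are illustrative assumptions, not values from the slides.

```python
from typing import List


def backoff_schedule(initial_rto: float, retries: int,
                     max_rto: float = 60.0) -> List[float]:
    """Return the timeout used for each successive retransmission."""
    timeouts = []
    rto = initial_rto
    for _ in range(retries):
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)   # double, but never exceed the cap
    return timeouts


print(backoff_schedule(1.0, 5))
```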
[Figure: ACK. User data arrives at the sender, packets are in transit toward the receiver, and data at the sender is marked either "sent but unacknowledged" or "sent and acknowledged". Numeric labels: 10, 16.]
Piggybacking: speeding up ACKs
[Figure: user data arrives at both the sender and the receiver, with packets in transit in both directions. Labels: sent but unacknowledged, sent and acknowledged. Slide note: "*** Not accurate".]
Duplicate ACK
[Figure: outstanding packets between sender and receiver with one packet loss. Labels: sent but unacknowledged, sent and acknowledged, user data arrives. Numeric labels: 10, 16, 16.]
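How a loss produces duplicate ACKs can be sketched with a receiver that always acknowledges the next segment it expects. For simplicity this sketch numbers whole segments rather than bytes, which is an assumption; real TCP acknowledges byte positions. The name `cumulative_acks` is hypothetical.

```python
from typing import Iterable, List


def cumulative_acks(arrivals: Iterable[int], first_seq: int = 0) -> List[int]:
    """Return the ACK number the receiver sends for each arriving segment."""
    expected = first_seq
    buffered = set()                 # out-of-order segments held back
    acks = []
    for seq in arrivals:
        if seq == expected:
            expected += 1
            while expected in buffered:      # deliver any buffered run
                buffered.remove(expected)
                expected += 1
        elif seq > expected:
            buffered.add(seq)
        acks.append(expected)        # always ACK the next expected segment
    return acks


# Segment 2 is delayed: segments 3, 4, 5 each trigger a duplicate ACK for 2.
print(cumulative_acks([0, 1, 3, 4, 5, 2]))
```

The sender sees the same ACK number repeated and can infer that the segment with that number went missing while later ones got through.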
TCP header
IP header | TCP header | TCP data    (TCP header + TCP data = TCP segment)

  16-bit source port              | 16-bit destination port
  32-bit sequence number
  32-bit acknowledgment number
  4-bit hlen | reserved | flags   | 16-bit window size
  16-bit TCP checksum             | 16-bit urgent pointer
  (options)
  (TCP data)

The fixed part of the header, up to the urgent pointer, is 20 octets.
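The fixed 20-octet layout maps directly onto a `struct` format string. The sketch below parses only the fixed part and ignores options; `parse_tcp_header` and the dictionary keys are hypothetical names, and the 6-bit flag mask assumes the six classic flags (URG, ACK, PSH, RST, SYN, FIN).

```python
import struct

# ports (2+2), sequence (4), acknowledgment (4),
# hlen/reserved/flags (2), window (2), checksum (2), urgent pointer (2)
TCP_HEADER = struct.Struct("!HHIIHHHH")


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-octet part of a TCP header."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # hlen counts 32-bit words
        "flags": off_flags & 0x3F,             # low 6 bits
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# Build a SYN header by hand (checksum left at 0 for the illustration).
syn = TCP_HEADER.pack(1040, 25, 100, 0, (5 << 12) | 0x02, 32120, 0, 0)
print(parse_tcp_header(syn))
```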
Nagle algorithm
• Q. The headers (20 bytes + 20 bytes) are too large for 1 byte of data. How can we deal with this?
• Nagle algorithm
  – Only one unacknowledged small segment is allowed per connection
    • If the data waiting to be sent is smaller than a full-size segment, hold it until enough accumulates or until the outstanding segment is acknowledged
    • Small RTT: short waiting time
    • Large RTT: the buffer fills up, which is good for throughput
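The sending rule can be written as a single decision. This is a simplified sketch of the classic Nagle rule (send a full-size segment at once; send a small one only when nothing is unacknowledged). The function and parameter names are hypothetical, and real stacks add further conditions.

```python
def nagle_can_send(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether the sender may transmit now under Nagle's rule."""
    if queued_bytes >= mss:
        return True                  # a full-size segment is always sent
    # A small segment goes out only if no earlier data is unacknowledged.
    return queued_bytes > 0 and unacked_bytes == 0


print(nagle_can_send(queued_bytes=1, mss=1460, unacked_bytes=0))   # first keystroke
print(nagle_can_send(queued_bytes=1, mss=1460, unacked_bytes=1))   # must wait for ACK
```

With a small RTT the ACK returns quickly and the wait is short; with a large RTT more data accumulates in the meantime, so each segment carries more payload.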
TCP service model (2)
• Buffered transfer
  – Write messages of any length
  – No explicit synchronization needed in the application layer
  – The OS manages the status of processes.
• Virtual circuit
  – Connection setup & release
  – Detection of disconnection during communication
Buffered transfer
[Figure: on each host a process calls write() and read(); the OS kernel holds a send buffer and a receive buffer; the two kernels are joined by a TCP connection.]
TCP header again
[Figure: TCP header fields: sender port #, receiver port #, sequence #, ACK #, header length, reserved (rsv), flags (URG, ACK, PSH, RST, SYN, FIN), window size, checksum, pointer to OOB (out-of-band) data, TCP options. The fixed part is 20 octets.]
TCP connection setup: 3-way handshake
Client (active open)                        Server (passive open)
                                            LISTEN
SYN_SENT       ---- SYN J ------------->
               <--- SYN K, ACK J+1 -----    SYN_RECEIVED
ESTABLISHED    ---- ACK K+1 ----------->
                                            ESTABLISHED
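The sequence and acknowledgement numbers of the three segments follow mechanically from the two initial sequence numbers J and K. A minimal sketch is below. The function name and the dictionary layout are hypothetical, and the sample numbers are the ones that appear in the tcpdump trace later in this document.

```python
from typing import Dict, List

SEQ_SPACE = 2 ** 32    # sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}     # ACK J+1
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}         # ACK K+1
    return [syn, syn_ack, ack]


for segment in three_way_handshake(2196338486, 2951392133):
    print(segment)
```

The SYN consumes one sequence number even though it carries no data, which is why each side acknowledges the other's initial number plus one.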
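TCP connection release
[Figure: one side calls close and sends FIN, which the other side acknowledges (ACK of FIN); later the other side calls close and sends its own FIN, which is acknowledged in turn (ACK of FIN).]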
TCP connection reset
• RST
  – Abortive release
  – Nonexistent port
Options
• TCP options in the 3-way handshake
  – Options are negotiated during the 3-way handshake
  – MSS option: Maximum Segment Size negotiation
  – Window scale option
    • For huge buffers, larger than 64 KB, using a bit shift
    • High-speed networks
  – Timestamp option
    • More accurate RTT measurement
    • With MSS option
  – Many options available
TCP state transition
[Figure: TCP state transition diagram. States: CLOSED (start), LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED (data transmission), FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Marked paths: passive open (server), active open (client), simultaneous open, active close, passive close, simultaneous close. Transition labels include "appl: passive open, send: <nodata>", "recv: SYN, send: SYN, ACK", "appl: CLOSE, send: FIN", "recv: FIN, send: ACK", "recv: ACK, send: <nodata>", "appl: CLOSE or timeout", and the 2MSL timeout that leaves TIME_WAIT.]
Summary
• Functions in the transport layer (L4)
• Internet transport protocols
• TCP service model
• High performance: ACK, piggybacking, Nagle algorithm
• Connection management
Tcpdump – 3-way handshake
# tcpdump tcp and host iplab.naist.jp
15:26:50.965563 IP rm.naist.jp.64868 > iplab.naist.jp.http: S 2196338486:2196338486(0) win 32120 <mss 1460,nop,wscale 0,nop,nop,timestamp 234659186 0,sackOK,eol>
15:26:51.013517 IP iplab.naist.jp.http > rm.naist.jp.64868: S 2951392133:2951392133(0) ack 2196338487 win 57344 <mss 1414,nop,wscale 0,nop,nop,timestamp 10980172 234659186>
15:26:51.013634 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 1 win 32246 <nop,nop,timestamp 234659187 10980172>

Output format: time src.port > dst.port: flag [ from:to(nbytes) | ack # ] win # opt
  – flag: the TCP flags
  – from, to, ack #: 32-bit sequence numbers and acknowledgement number
Tcpdump – connection release
15:26:51.149121 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5857 win 30554 <nop,nop,timestamp 234659188 10980187>
15:27:06.103280 IP iplab.naist.jp.http > rm.naist.jp.64868: F 5857:5857(0) ack 430 win 58296 <nop,nop,timestamp 10981679 234659188>
15:27:06.103372 IP rm.naist.jp.64868 > iplab.naist.jp.http: . ack 5858 win 32246 <nop,nop,timestamp 234659337 10981679>
15:27:10.938811 IP rm.naist.jp.64868 > iplab.naist.jp.http: F 430:430(0) ack 5858 win 32246 <nop,nop,timestamp 234659385 10981679>
15:27:10.961089 IP iplab.naist.jp.http > rm.naist.jp.64868: . ack 431 win 58296 <nop,nop,timestamp 10982169 234659385>
Play with tcpdump
• Tcpdump: a microscope for TCP communication
  – Use of RST
  – Packet transmission order
  – TCP options
    • MSS option
    • Window scale option
TCP – flow control & congestion control
Suguru Yamaguchi
Contention and coordination of resources
• Flow control
  – Negotiation of processing performance
  – Recovery from message reordering
  – Recovery from message duplication, loss, and bit errors
  – Maximizing the performance of data transmission
• Congestion control
  – Sharing network bandwidth among connections and suppressing network congestion
  – Fair sharing
  – Maximizing network welfare
Flow control
• Stop-and-wait
• Go-back-N
• Selective repeat
• Many schemes
  – ARQ (Automatic Repeat reQuest)
Stop-and-wait ARQ
[Figure: timing diagram between sender and receiver; one cycle consists of t1, t2, t3, t4, t5, and t1 again.]
  t1: propagation delay
  t2: frame transmission time
  t3: frame processing time
  t4: ACK transmission time
  t5: ACK processing time
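With stop-and-wait the link carries useful data only during t2 out of each full cycle. The sketch below computes that fraction. It assumes, from the figure, that t1 occurs twice per cycle (once for the frame and once for the ACK) and that it is a one-way propagation delay; the function name is hypothetical.

```python
def stop_and_wait_utilization(t1: float, t2: float, t3: float,
                              t4: float, t5: float) -> float:
    """Fraction of one send/ACK cycle spent transmitting the data frame."""
    cycle = t2 + t1 + t3 + t4 + t1 + t5   # frame out, processing, ACK back
    return t2 / cycle


# Example: 1 ms frame time on a path with 10 ms one-way delay.
print(stop_and_wait_utilization(t1=10.0, t2=1.0, t3=0.0, t4=0.0, t5=0.0))
```

When the delay dominates the frame time, most of the cycle is idle waiting, which is the motivation for keeping several frames in flight as Go-back-N and sliding windows do.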
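Go-back-N ARQ
[Figure: the sender transmits frames 1, 2, 3, 4, 5; frame 3 does not reach the receiver, the timeout on frame 3 fires, and the sender goes back and retransmits 3, 4, 5 before continuing with 6.]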
TCP flow control
• End to end
  – No global coordination
  – Works from an estimate of the available bandwidth made at each host
  – No intervention by intermediate routers
  – Implicit signaling through packet drops
• Scalable
  – Works at each end host
    • Autonomous → less state management → scalable
End-to-end control in TCP
[Figure: data flows from sender to receiver and ACKs flow back; both data and ACK packets may be dropped at intermediate routers.]
• Receiver side
  – Timer & duplicate ACK
  – Delayed ACK
  – Window size notification
  – Buffering for reordering packets
• Sender side
  – Timer & retransmission
  – Packet interval handling
  – Control of on-the-fly packets
  – Buffering for retransmission
Many contributions for TCP
• Very simple algorithm
  – Macroscopic self-stabilization
• No assumption of greedy nodes
  – No global control system
  – No greedy node that eats as much bandwidth as possible
  – Rejects the idea of an intermediate policing system
• For many data links
  – General purpose
  – Modest performance on almost all data links
• Long-term tuning over the last 20 years
TCP flow control
• Bandwidth usage coordination
  – Sliding window
• Sequence-number-based control
  – Window size
• Packet gap control
  – ACK clocking
• Others
  – Error detection: TCP checksum
  – Loss detection: duplicate ACK, timeout
Window size
[Figure: sliding window along the sequence number axis. Packets in transit (on-the-fly packets, outstanding packets) travel from sender to receiver as user data arrives at the sender. Labels: sent but unacknowledged, sent and acknowledged. Numeric labels: 10, 16.]
Advertised window size from receivers
• Flow control of classic TCP
• rwnd: advertised window
  – Notification from the receiver of the maximum amount of data it can receive
  – Coordinated with the sender's sliding window size
  – Too sensitive to the bottleneck link
ACK clocking
[Figure: data flows through the bottleneck link, where packets are spaced by a gap T; ACKs are generated with the same gap and flow back to the sender.]
• The sender transmits at the rate at which ACKs arrive (the bottleneck speed)
• Self-clocking once the connection is in equilibrium
TCP congestion control – TCP Tahoe
• Fair-share model: end to end
• Increase/decrease of window size
  – Additive increase
  – Multiplicative decrease
    • For self-stabilization (Jain et al.)
• Strategy for changing the window size
• Detect congestion through packet drops
More control parameters – TCP Tahoe
• Parameters in the sender
  – cwnd
    • Congestion window
    • Initial value: 1
  – ssthresh
    • Slow start threshold
    • Initial value: large
  – tcprecvthresh
    • Number of duplicate ACKs that triggers fast retransmit / fast recovery
    • Initial value: 3 in many implementations
Increasing window size
• Increase the congestion window (cwnd) exponentially, up to the slow start threshold (ssthresh)
• Overview of the algorithm
  – On receiving an ACK:
      if (cwnd < ssthresh) {     /* slow start */
          cwnd += 1;             /* each ACK releases 2 packets: exponential growth */
      } else {                   /* congestion avoidance */
          cwnd += 1 / cwnd;      /* about 1 extra packet per RTT: linear growth */
      }
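The per-ACK rule can be replayed round by round to see the two growth phases. This is a simplified sketch that counts the window in whole packets and assumes one ACK per packet and no losses; the function names are hypothetical.

```python
from typing import List


def grow_cwnd(cwnd: float, ssthresh: float) -> float:
    """Apply the window increase for one received ACK."""
    if cwnd < ssthresh:
        return cwnd + 1            # slow start
    return cwnd + 1 / cwnd         # congestion avoidance


def cwnd_per_rtt(rtts: int, ssthresh: float, cwnd: float = 1.0) -> List[float]:
    """Return cwnd at the start and after each round-trip time."""
    history = [cwnd]
    for _ in range(rtts):
        for _ in range(int(cwnd)):          # one ACK per packet in the window
            cwnd = grow_cwnd(cwnd, ssthresh)
        history.append(cwnd)
    return history


print(cwnd_per_rtt(rtts=6, ssthresh=8))
```

Below ssthresh the window doubles every round trip; above it the window gains roughly one packet per round trip.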
Increasing window size
• Slow start
  – Exponential increase
• Congestion avoidance
  – Additive increase
  – Linear growth
[Figure: number of packets versus time T; a slow start phase followed by a congestion avoidance phase.]
Reducing window size (idea)
• When the transmission exceeds the maximum throughput...
  – Packet drops may occur because of buffer overrun.
• When a packet is dropped...
  – Duplicate ACKs are returned
    • Congested, but not seriously (because ACKs still get through)
    • Retransmission is probably OK
  – Timeout!!
    • Retransmission Time Out (RTO)
    • ACKs cannot travel back, so the congestion is serious...
    • It is better to wait a while.
Reducing window size (overview of algorithm)
• On detecting a packet drop:
      if (dup ACK # == tcprecvthresh) {   /* fast retransmit */
          retransmission;
          ssthresh = cwnd / 2;
          cwnd = 1;                       /* slow start again */
      }
      if (timeout) {
          retransmission;
          timeout *= 2;                   /* exponential backoff */
          ssthresh = cwnd / 2;
          cwnd = 1;
      }
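The two loss reactions can be sketched as methods on a small sender object. The class and method names are hypothetical, the window is counted in packets, and the floor of 2 packets on ssthresh is an assumption borrowed from common practice rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class TahoeSender:
    cwnd: float = 1.0
    ssthresh: float = 64.0
    rto: float = 1.0
    dup_acks: int = 0
    DUP_ACK_THRESHOLD = 3            # "tcprecvthresh" on the slide

    def on_dup_ack(self) -> bool:
        """Count a duplicate ACK; return True when fast retransmit fires."""
        self.dup_acks += 1
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            self._collapse()
            return True
        return False

    def on_timeout(self) -> None:
        """React to a retransmission timeout."""
        self._collapse()
        self.rto *= 2                # exponential backoff

    def _collapse(self) -> None:
        self.ssthresh = max(self.cwnd / 2, 2.0)   # floor of 2 is an assumption
        self.cwnd = 1.0              # restart from slow start
        self.dup_acks = 0


sender = TahoeSender(cwnd=16.0)
print([sender.on_dup_ack() for _ in range(3)], sender)
```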
Overall, TCP behaves like this...
[Figure: number of packets versus time T; the window grows from slow start toward the maximum throughput (which may change).]
RTO calculation
• Err = M – A
  A ← A + g·Err
  D ← D + h·(|Err| – D)
  RTO = A + 4D
  – M: measured RTT
  – A: smoothed RTT
  – D: smoothed mean deviation
  – g: gain for the average (1/8)
  – h: gain for the deviation (1/4)
• Simply...
  – RTO = {average RTT} + 4 × {smoothed mean deviation}
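The update rules translate line by line into code. In this sketch `update_rto` is a hypothetical name, the arguments follow the slide's symbols, and the sample values (in milliseconds) are made up for illustration.

```python
from typing import Tuple


def update_rto(a: float, d: float, m: float,
               g: float = 1 / 8, h: float = 1 / 4) -> Tuple[float, float, float]:
    """Fold one RTT measurement m into the estimators; return (A, D, RTO)."""
    err = m - a
    a = a + g * err                  # smoothed RTT
    d = d + h * (abs(err) - d)       # smoothed mean deviation
    return a, d, a + 4 * d


# Smoothed RTT 100 ms, deviation 10 ms, then one slow sample of 140 ms.
print(update_rto(a=100.0, d=10.0, m=140.0))   # (105.0, 17.5, 175.0)
```

A single slow sample moves the average only a little but raises the deviation term noticeably, so the timeout widens quickly when the RTT becomes unstable.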
More improvement – TCP Reno
• Issues with Tahoe
  – Too much penalty from doing slow start after fast retransmit
  – Better control of cwnd is needed
• Fast recovery
      if (dup ACK # == tcprecvthresh) {
          retransmission;                    /* fast retransmit */
          ssthresh = cwnd / 2;
          cwnd = cwnd / 2 + tcprecvthresh;
      }
      if (dup ACK # > cwnd / 2)
          send one new packet on every dup ACK;
      if (ACK on retransmission)
          cwnd = ssthresh;
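Fast recovery can be sketched as a small state machine. Note that this sketch replaces the slide's "dup ACK # > cwnd/2" test with the window-inflation form (add one packet to cwnd for every further duplicate ACK), which is how the mechanism is usually specified. The class name, return strings, and the ssthresh floor of 2 are assumptions.

```python
class RenoSender:
    def __init__(self, cwnd: float, ssthresh: float = 64.0,
                 dup_threshold: int = 3) -> None:
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_threshold = dup_threshold   # "tcprecvthresh" on the slide
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self) -> str:
        """Handle one duplicate ACK and report the action taken."""
        self.dup_acks += 1
        if self.dup_acks == self.dup_threshold:
            self.ssthresh = max(self.cwnd / 2, 2.0)
            self.cwnd = self.ssthresh + self.dup_threshold
            self.in_recovery = True
            return "retransmit"              # fast retransmit
        if self.in_recovery:
            self.cwnd += 1                   # inflate: one packet left the network
            return "inflate"
        return "wait"

    def on_new_ack(self) -> None:
        """Handle an ACK for new data, such as the ACK of the retransmission."""
        if self.in_recovery:
            self.cwnd = self.ssthresh        # deflate and continue, no slow start
            self.in_recovery = False
        self.dup_acks = 0


sender = RenoSender(cwnd=16.0)
print([sender.on_dup_ack() for _ in range(4)], sender.cwnd)
sender.on_new_ack()
print(sender.cwnd)
```

Unlike Tahoe, the window ends at half its previous size rather than at 1, so the connection resumes in congestion avoidance.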
Less penalty...
[Figure: number of packets versus time T; after the initial slow start the window approaches the maximum throughput (which may change) and falls back to cwnd/2.]
More improvement
• Selective Acknowledgement (SACK)
• Rate-based flow control – TCP Vegas
• TFRC – TCP Friendly Rate Control (RFC 4828)
• Explicit Congestion Notification (ECN)
• Interaction with RED
• TCP extensions for wireless links
• ...
Summary
• Flow control
  – Stop-and-wait
  – Go-back-N
  – Sliding window
• Congestion control
  – Slow start
  – Congestion avoidance
  – Fast retransmit
  – Fast recovery