Internet Engineering 2018
Transport layer: TCP
Youki Kadobayashi, NAIST
Copyright(C)2018 Youki Kadobayashi. All rights reserved.
[Figure: the seven-layer reference model drawn for two hosts: Application, Presentation, Session, Transport, Network, Data Link, Physical]
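Transport layer: a bird's-eye view
[Figure: two hosts (H) connected through routers (R); IP runs on every node, while the transport layer runs only between the hosts]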
Routers don’t maintain per-host state
Hosts maintain state for each transport-layer endpoint
Functions provided by the transport layer
- Communication between processes
  - Identify process
  - Identify inter-process channel
- Interface for upper layer
  - Connection-oriented (virtual circuit)
  - Connectionless (datagram)
- Competition and arbitration of network resources (next lecture)
  - Flow control
  - Congestion control
[Figure: processes P and Q communicating over the channel (P, Q)]
Transport protocols in the Internet protocol suite
- TCP (RFC 793)
  - Transmission Control Protocol
  - Connection-oriented
  - Multiple functions for reliability
- SCTP (RFC 4960)
- UDP (RFC 768)
  - User Datagram Protocol
  - Connectionless
  - IP and process identification
  - Packets can be lost
- DCCP (RFC 4340) (advanced topic)
Identifying process and connection in TCP
- Unique identification of process
  - (IP, port)
- Unique identification of TCP connection
  - (source IP, source port, destination IP, destination port)
[Figure: hosts 163.221.52.100 and 203.178.136.36 with port labels 1040, 80, 22 and 2137; a process is identified by an (IP, port) pair such as (163.221.52.100, 1040) or (203.178.136.36, 22), and each connection joins two such pairs]
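As a rough illustration of the two identifiers above, the sketch below keys a connection table by the 4-tuple. The names `Endpoint`, `ConnectionId` and `demultiplex` are hypothetical, and placing port 2137 on host 163.221.52.100 is an assumption made only for the example.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    """Identifies a process: (IP, port)."""
    ip: str
    port: int


class ConnectionId(NamedTuple):
    """Identifies a TCP connection: source endpoint plus destination endpoint."""
    src: Endpoint
    dst: Endpoint


def demultiplex(table: dict, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
    """Look up the connection that an incoming segment belongs to."""
    key = ConnectionId(Endpoint(src_ip, src_port), Endpoint(dst_ip, dst_port))
    return table.get(key)


server = Endpoint("203.178.136.36", 22)
table = {
    # Two connections to the same server process differ only in the source port.
    ConnectionId(Endpoint("163.221.52.100", 1040), server): "session A",
    ConnectionId(Endpoint("163.221.52.100", 2137), server): "session B",  # assumed port placement
}

print(demultiplex(table, "163.221.52.100", 1040, "203.178.136.36", 22))
```

The same (IP, port) pair on the server side appears in both keys, which is why the full 4-tuple, not the server endpoint alone, is needed to tell connections apart.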
Some well-known protocols/ports
Port    Protocol  Use
20, 21  FTP       File transfer
22      SSH       Remote login, replacement for telnet
23      TELNET    Telnet
25      SMTP      Email
80      HTTP      World Wide Web
110     POP-3     Remote email access
143     IMAP      Remote email access
443     HTTPS     Secure Web (HTTP over SSL/TLS)
554     RTSP      Media player control
631     IPP       Printer sharing
Source: https://www.iana.org
TCP service model (1)
- Connection-oriented
- Virtual circuit
  - Looks like a circuit, but virtual: no physical wires between specific endpoints
- Adapts to speed
  - Adapts to the speed of intermediate networks as well as the speed of endpoints
Establishing a TCP connection
- 3-way handshake
- SYN, SYN-ACK, ACK
- Ensure full-duplex communication
[Figure: TCP header layout (16-bit source port, 16-bit destination port, 32-bit sequence number, 32-bit acknowledgment number, 4-bit hlen, reserved, flags, 16-bit window size, 16-bit TCP checksum, 16-bit urgent pointer) with the flag bits URG, ACK, PSH, RST, SYN, FIN called out]
[Figure: message sequence between the two hosts: SYN, SYN-ACK, ACK]
Establishing TCP connection: a concrete example with Wireshark
# tshark -i en0 -n -f 'port 80'
tcpdump: listening on de0
Capturing on en0
0.164720 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=16 TSval=1135377929 TSecr=0 SACK_PERM=1
0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
0.195322 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=1 Ack=1 Win=131760 Len=0 TSval=1135377958 TSecr=190559467
Replying with Ack = (received sequence number + 1) acknowledges the SYN.
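The sequence and acknowledgment numbers in the trace can be reproduced with a small sketch. The function name `three_way_handshake` and the dictionary layout are hypothetical; the initial sequence numbers are passed in as relative values (0, as Wireshark displays them).

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as (flags, seq, ack) records."""
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    # The SYN consumes one sequence number, so the server acknowledges ISN + 1.
    syn_ack = {"flags": "SYN, ACK", "seq": server_isn,
               "ack": (client_isn + 1) % SEQ_SPACE}
    # The client does the same for the server's SYN.
    ack = {"flags": "ACK", "seq": (client_isn + 1) % SEQ_SPACE,
           "ack": (server_isn + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(0, 0):
    print(segment)
```

With relative initial sequence numbers of 0 on both sides, this yields Seq=0, then Seq=0 Ack=1, then Seq=1 Ack=1, matching the three captured packets.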
Meanings of Wireshark output
- time source IP -> dest IP TCP source port > dest port [ flags ] Seq=n Ack=n …
- 0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
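The field layout above can be checked mechanically. The following is a minimal sketch that assumes exactly the text layout shown on this slide (other tshark versions print different columns); `LINE_RE` and `parse_line` are hypothetical names.

```python
import re

# time  src_ip -> dst_ip  TCP  src_port > dst_port  [flags]  key=value ...
LINE_RE = re.compile(
    r"(?P<time>\d+\.\d+)\s+(?P<src_ip>\S+)\s+->\s+(?P<dst_ip>\S+)\s+TCP\s+"
    r"(?P<src_port>\d+)\s+>\s+(?P<dst_port>\d+)\s+\[(?P<flags>[^\]]+)\]\s*(?P<rest>.*)"
)


def parse_line(line: str):
    """Split one trace line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["flags"] = [flag.strip() for flag in fields["flags"].split(",")]
    # Remaining tokens look like Seq=0, Ack=1, Win=50137, ...
    fields["options"] = dict(
        token.split("=", 1) for token in fields.pop("rest").split() if "=" in token
    )
    return fields


sample = ("0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] "
          "Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 "
          "MSS=1460 WS=8 SACK_PERM=1")
print(parse_line(sample))
```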
Virtual Circuit (2): TCP connection release
[Figure: each side calls close and sends FIN; the peer replies with an Ack of FIN]
5.749366 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [FIN, ACK] Seq=659 Ack=2449 Win=401096 Len=0 TSval=190560021 TSecr=1135378482
5.749464 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=2449 Ack=660 Win=131104 Len=0 TSval=1135383482 TSecr=190560021
5.749650 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [FIN, ACK] Seq=2449 Ack=660 Win=131104 Len=0 TSval=1135383482 TSecr=190560021
5.765279 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [ACK] Seq=660 Ack=2450 Win=401096 Len=0 TSval=190560024 TSecr=1135383482
TCP connection reset
- RST
  - Abortive release
  - Nonexistent port (e.g., dead process)
telnet to sh.naist.jp, port 80 will result in connection reset:
0.000000 10.0.1.148 -> 163.221.10.10 TCP 63436 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=16 TSval=1137289324 TSecr=0 SACK_PERM=1
0.018841 163.221.10.10 -> 10.0.1.148 TCP 80 > 63436 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
If you kill “Dropbox”:
55.992214 10.0.1.148 -> 108.160.163.51 TCP 62991 > 80 [RST] Seq=310 Win=0 Len=0
Virtual Circuit Summary: TCP state transition diagram
[Figure: TCP state transition diagram; each transition is labeled trigger / response]
Troubleshooting TCP with state machine
- netstat
$ netstat
Active Internet connections
Proto Recv-Q Send-Q Local Address      Foreign Address         (state)
tcp4  0      0      45.1.20.101.57456  74.125.235.138.http     SYN_SENT
tcp4  0      0      45.1.20.101.57455  ey-in-f101.1e100.http   ESTABLISHED
tcp4  0      0      45.1.20.101.57454  74.125.235.148.http     ESTABLISHED
SYN_SENT: SYN sent, awaiting SYN+ACK response
Many other open-source tools are available: lsof, trpt, tcptraceroute, tcptrace, tcpflow, etc.
Try some of these tools in the provided VM image.
Hands-on: observe connection setup/release
In the provided virtual machine:
1. Observe connection setup/release;
   - e.g., by using a browser within the VM
2. Observe connection reset by terminating a program;
   - e.g., by skill firefox
3. Observe connection reset by connecting to a nonexistent port on a working machine.
   - e.g., by telnet sh.naist.jp 80
Buffered transfer: adapts to varying speed of endpoints and intermediate networks
[Figure: two hosts joined by a TCP connection; on each host the process calls Read()/Write() and is blocked or unblocked against the send buffer and receive buffer held in the OS kernel]
TCP service model (2)
- Byte-stream service
  - Upper layer can't see boundaries between packets
  - No boundaries: structuring (framing) is needed at the upper layer
- Full duplex
  - Two independent streams in a single connection
- Reliable
  - Masks packet reordering, duplication, loss and bit errors
[Figure: TCP viewed as a byte-stream service; the bytes H, E, L, L, O flow through the connection in order, and "OK" flows back]
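Because the byte stream carries no message boundaries, the upper layer has to add its own framing. One common choice, shown here only as a sketch, is a 4-byte length prefix; `frame` and `Deframer` are hypothetical names, and the prefix format is an assumption rather than anything TCP prescribes.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix the message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Rebuild messages from arbitrary chunks read off a byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add newly received bytes; return every message that is now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack("!I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages


stream = frame(b"HELLO") + frame(b"OK")
deframer = Deframer()
# The receiver may see the stream cut at arbitrary points.
for piece in (stream[:3], stream[3:10], stream[10:]):
    print(deframer.feed(piece))
```

The chunk boundaries chosen in the usage lines are deliberately unrelated to the message boundaries, which is what a reader of a TCP socket should expect.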
How does TCP implement reliable stream service?
- ACK: Acknowledgment
  - Active acknowledgment
    - Explicitly acknowledge the receipt of a packet
  - Duplicate ACK
    - Implicitly communicate the packet loss information
- Timeout and retransmission
  - Whenever the sender doesn't receive an ACK after a fixed time, a “Timeout” event is triggered
    - Retransmission: assuming that transmission has failed
  - Exponential back-off: 3, 6, 12, …
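The back-off sequence 3, 6, 12, … doubles the timeout after every failed attempt. A minimal sketch follows; the function name `backoff_schedule` and the 64-second ceiling are assumptions for illustration, not values taken from the slide.

```python
def backoff_schedule(initial: float = 3.0, retries: int = 5,
                     ceiling: float = 64.0) -> list:
    """Return the timeout (in seconds) used for each successive retransmission."""
    timeout = initial
    schedule = []
    for _ in range(retries):
        schedule.append(timeout)
        # Double the timeout after each failure, up to an assumed upper bound.
        timeout = min(timeout * 2, ceiling)
    return schedule


print(backoff_schedule())
```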
ACK: Acknowledgment
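[Figure: the sender's byte stream is "Nara Institute of Science and Technology"; the receiver has so far received "Nara Insti". Legend: user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged. Positions 10 and 16 are marked.]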
Piggybacking: Exploiting full-duplex channel
[Figure (rough sketch): each host is both sender and receiver. One direction carries "Nara Institute of Science and Technology" (received so far: "Nara Insti"); the other carries "Graduate School of Information Science" (received so far: "Graduate S"). Legend: user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged.]
Duplicate ACK: implicit communication of packet loss
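[Figure: the sender's byte stream is "Nara Institute of Science and Technology"; one packet is lost, and the receiver has received "Nara Institute o". ACK numbers shown: 10, 16, 16 (the repeated 16 is the duplicate ACK). Legend: user data arrives, packets in transit, sent but unacknowledged, sent and acknowledged.]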
From byte-stream to packets: TCP header
[Figure: a TCP segment (TCP header + TCP data) carried after the IP header]
TCP header layout (20 octets without options):
  16-bit source port | 16-bit destination port
  32-bit sequence number
  32-bit acknowledgment number
  4-bit hlen | reserved | flags | 16-bit window size
  16-bit TCP checksum | 16-bit urgent pointer
  (options)
  (TCP data)
Chop an appropriate length from the byte-stream buffer and add a TCP header.
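The fixed 20-octet header can be decoded with a single `struct.unpack` call. This is a sketch that ignores options and checksum verification; `parse_tcp_header` and `FLAG_NAMES` are hypothetical names, and only the six flag bits named in these slides are decoded.

```python
import struct

# Flag bits from least significant upward: FIN=0x01 ... URG=0x20.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header from the start of a segment."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 octets long")
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4  # hlen counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES)
             if offset_flags & (1 << bit)]
    return {"src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
            "header_len": header_len, "flags": flags, "window": window,
            "checksum": checksum, "urgent": urgent}


# Build a SYN like the first packet of the handshake trace (checksum left as 0).
syn = struct.pack("!HHIIHHHH", 63428, 80, 0, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```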
Nagle algorithm
- Q. If 20 bytes of IP header + 20 bytes of TCP header are added to 1 byte of data, isn't that overhead big?
- Nagle algorithm (RFC 896)
  - Allow only one small unacknowledged segment in the network at a time.
  - In case of short RTT:
    → acceptable overhead, as LAN bandwidth is abundant
    → send packets with little buffering
  - In case of long RTT:
    → reduce overhead, due to WAN bandwidth constraints
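To put numbers on the question above, the sketch below computes the header overhead and a simplified form of the Nagle send decision. It ignores the receiver's window and delayed ACKs; `header_overhead` and `nagle_may_send` are hypothetical names.

```python
IP_HEADER = 20   # octets, without options
TCP_HEADER = 20  # octets, without options


def header_overhead(payload_bytes: int) -> float:
    """Fraction of the packet that is header rather than user data."""
    headers = IP_HEADER + TCP_HEADER
    return headers / (headers + payload_bytes)


def nagle_may_send(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Simplified Nagle rule: full segments go out at once; a small segment
    goes out only when no earlier data is still unacknowledged."""
    if queued_bytes <= 0:
        return False
    if queued_bytes >= mss:
        return True
    return unacked_bytes == 0


print(f"{header_overhead(1):.1%} of a 1-byte packet is header")
print(nagle_may_send(queued_bytes=1, mss=1460, unacked_bytes=1))
```

With a short RTT the ACK returns quickly, so small segments are delayed only briefly; with a long RTT more data accumulates into each segment before the ACK arrives.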
Hands-on: observe packet loss & retransmission
Within the provided virtual machine:
1. Observe duplicate ACK;
   - Emulate a lossy network with:
     # tc qdisc add dev eth1 root netem loss 10%
2. Observe the effect of Nagle algorithm;
   - Emulate a satellite network with:
     # tc qdisc add dev eth1 root netem delay 500ms
Questions?
Summary thus far
- Transport layer
- The transport protocol on the internet – TCP
- TCP: service model, and features
- Efficiency: ACK, piggybacking, Nagle algorithm
- TCP connection establishment and release
- Diagnosis: tools + state machine knowledge
Transport protocol challenges
- With a number of unknown parameters:
  - Number of active communications
  - Bottleneck bandwidth
  - Error rate
- How can we accommodate as many communications as possible without collapsing the network itself?
[Figure: senders 1 … n and receivers 1 … k sharing the network]
Key ideas: probing, estimation, self-policing, macro-level stability
Flow control and congestion control: Competition and Arbitration of Network Resources
- Flow control
  - Absorb differences in transmission rate
  - Recovery from sequence errors
  - Recovery from duplication, packet drops and bit errors
- Congestion control
  - Sharing the bandwidth while suppressing congestion
  - Fair sharing of bandwidth
Characteristics of TCP Flow Control
- End to end
  - No global assignment of resources
  - Estimate available bandwidth at individual hosts
  - Routers do not explicitly allocate resources
    - Implicit signaling through packet drops
- Scalable
  - Host-based autonomous method has better scalability, because it doesn't need state management inside the network.
    - Autonomous → less state management → scalable
TCP: key characteristics
- Very simple algorithm
  - Macroscopic self-stabilization
- TCP doesn't assume the presence of greedy nodes.
  - Eliminate global control system
  - Reject the idea of intermediate policing system
- Modest performance across almost all data links
  - ... as opposed to optimal performance in a specific condition
- Stepwise improvements over 30 years
Flow Control on TCP
- Control available bandwidth
  - Sliding window
    - Using sequence number, not packet number
  - Window size
- Control transmission interval of packets
  - ACK clocking
- Miscellaneous
  - Error detection with TCP checksum
  - Packet drop detection with duplicate ACK, timeout
Sliding window
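[Figure: the sender's byte stream is "Nara Institute of Science and Technology"; the receiver has received "Nara Insti". The window size covers a range of sequence numbers, with positions 10 and 16 marked. Legend: user data arrives, packets in flight, sent but unacknowledged, sent and acknowledged.]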
Congestion Control on TCP
- Fair-share model: fair distribution of bandwidth
- End to end
- Increase and decrease of window size
  - Additive increase
  - Multiplicative decrease
    - AIMD is known to be self-stabilizing (R. Jain, et al., “Analysis of the increase and decrease algorithms for congestion avoidance in computer networks”, 1989)
- Switch increasing strategy of window size
- Detect congestion through packet drops
Increasing Window size
- TCP switches the increasing strategy of the congestion window (cwnd) at the slow start threshold (ssthresh)
(Outline of the algorithm)
- On receiving an ACK:
    if (cwnd < ssthresh) {
        /* slow start: exponential increase (cwnd doubles per RTT) */
        cwnd += 1;
    } else {
        /* congestion avoidance: additive increase (about +1 per RTT) */
        cwnd += 1.0 / cwnd;  /* cwnd is a fractional count of segments */
    }
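The per-ACK rule can be traced over a few round trips. The sketch below assumes that one ACK arrives for every segment in the window during each RTT and that no loss occurs; `on_ack` and `simulate_rtts` are hypothetical names, and cwnd is counted in segments.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Apply the window increase for one received ACK."""
    if cwnd < ssthresh:
        return cwnd + 1.0        # slow start
    return cwnd + 1.0 / cwnd     # congestion avoidance


def simulate_rtts(cwnd: float, ssthresh: float, rtts: int) -> list:
    """Return cwnd at the end of each RTT, starting with the initial value."""
    history = [cwnd]
    for _ in range(rtts):
        # Assumption: every segment sent in this RTT is acknowledged once.
        for _ in range(int(cwnd)):
            cwnd = on_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history


for rtt, window in enumerate(simulate_rtts(1.0, 8.0, 6)):
    print(rtt, round(window, 2))
```

Below ssthresh the window doubles each RTT (1, 2, 4, 8); above it, the sum of the 1/cwnd increments over one RTT is just under one segment.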
Increasing Window size
- Slow start
  - Exponential increase
- Congestion avoidance
  - Additive increase
- Effectiveness: see V. Jacobson, “Congestion Avoidance and Control”, SIGCOMM '88.
Source: TCP/IP Illustrated, Vol.1
Decreasing Window size
(Outline of the algorithm)
- On detecting packet drop:
    ssthresh = cwnd / 2;
    if (timeout) {
        cwnd = 1;
    }
Source: TCP/IP Illustrated, Vol.1
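The decrease rule in the outline can be written as a small function. This sketch follows the outline literally, so fast retransmit and fast recovery are not modeled; `on_packet_drop` is a hypothetical name and cwnd is counted in segments.

```python
def on_packet_drop(cwnd: float, ssthresh: float, timeout: bool) -> tuple:
    """Return the new (cwnd, ssthresh) after a detected packet drop."""
    # Multiplicative decrease: remember half of the current window.
    ssthresh = cwnd / 2.0
    if timeout:
        # A timeout restarts from slow start with one segment.
        cwnd = 1.0
    return cwnd, ssthresh


print(on_packet_drop(16.0, 64.0, timeout=True))
print(on_packet_drop(16.0, 64.0, timeout=False))
```

After a timeout the window climbs exponentially back to the halved threshold and then grows additively, which produces the familiar sawtooth.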
Hands-on: observe slow start & congestion avoidance
- Start Wireshark
- Access speed test website:
  - http://www.speedtest6.com
- Filter the TCP stream with:
  - ip.src==27.120.100.45
- Plot the Time-Sequence Graph
  - Select one of the filtered packets
  - Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (Stevens)
  - Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (tcptrace)
Questions?
Packet drop
- Detection with 2 methods:
  - Duplicate ACK
  - Retransmission Time Out (RTO)
- How do we decide the timeout? → RTO is computed from an RTT estimator
RTO Calculation
Err = M - A
A <- A + g * Err
D <- D + h * (|Err| - D)
RTO = A + 4D
  - M: measured RTT
  - A: smoothed RTT
  - D: smoothed mean deviation
  - g: gain for the average (1/8)
  - h: gain for the deviation (1/4)
Source: TCP/IP Illustrated, Vol.1
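The update rules translate directly into code. In this sketch the class name `RttEstimator` is hypothetical, and initializing D to half of the first measurement is an assumption made here (the slide does not say how A and D start).

```python
class RttEstimator:
    """Tracks smoothed RTT (A) and mean deviation (D) to derive the RTO."""

    def __init__(self, first_rtt: float, g: float = 1 / 8, h: float = 1 / 4):
        self.g = g                 # gain for the average
        self.h = h                 # gain for the deviation
        self.a = first_rtt         # smoothed RTT
        self.d = first_rtt / 2.0   # assumed initial deviation

    @property
    def rto(self) -> float:
        return self.a + 4.0 * self.d

    def update(self, m: float) -> float:
        """Fold in a new RTT measurement M and return the resulting RTO."""
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto


estimator = RttEstimator(first_rtt=1.0)
for sample in (1.0, 2.0, 1.0):
    print(sample, round(estimator.update(sample), 4))
```

A stable RTT shrinks D and pulls the RTO toward A, while a sudden jump in M raises D quickly, so the RTO widens before the average has caught up.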
RTO calculation
Source: A Quick Tour Around TCP, http://web.eecs.utk.edu/~dunigan/tcptour/
User Datagram Protocol (UDP) RFC 768
- “Unreliable Data Protocol” :-)
- Connectionless transport protocol
  - Avoid overhead and delays of ordered, reliable delivery
  - Send messages to and receive them from a socket
- Lightweight delivery service
  - IP and port numbers to support (de)multiplexing
  - Optional error checking on the packet contents
Incentives on using UDP
- Finer control over what data is sent and when
  - UDP packages the data and sends the packet as soon as an application process writes into the socket
- No delay for connection establishment
  - There are no preliminaries in UDP
- No connection state
  - No allocation of buffers, parameters, sequence numbers, etc.
- Small packet header overhead: only eight bytes long
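The lack of preliminaries is visible at the socket API: a datagram can be sent without any connect/accept step. The sketch below sends one datagram over the loopback interface; `udp_send_receive` is a hypothetical helper, and the 2-second timeout is an arbitrary choice so the call cannot block forever if the datagram is lost.

```python
import socket


def udp_send_receive(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what it receives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        # No handshake: the first packet on the wire already carries the data.
        sender.sendto(payload, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(2048)
        return data


print(udp_send_receive(b"hello"))
```

Each `sendto` call produces one datagram, so message boundaries are preserved, in contrast to the TCP byte stream.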
UDP applications
- Multimedia streaming
  - Real-time Transport Protocol (RTP): internet radio, telephony, music-on-demand, video-on-demand
  - Retransmission of lost/corrupted packets is not worthwhile
- Simple query protocols like Domain Name System
  - Overhead of connection establishment is overkill
  - Easier to have the application retransmit if needed
Summary
- Transport layer
- The transport protocol on the internet – TCP
- TCP: service model, and features
- Efficiency: ACK, piggybacking, Nagle algorithm
- TCP connection establishment and release
- Flow control and congestion control in TCP
Assignment 5: Observe TCP mechanisms in action
- Use iperf3 to establish TCP connections with four servers provided by our TA and capture the TCP traffic between the servers and the client.
  1) Explain the details of the TCP connections.
  2) (optional) Cause delay and packet loss on the client side by using the tc command, then conduct the same investigation as above.
- Submission deadline: July 24, 17:00 JST
Hands-on: observe the effect of loss/delay
- Start Wireshark
- Start iperf
  - Install iperf3 on the VM: $ sudo apt install iperf3
  - Run iperf3: $ iperf3 -c 163.221.52.133 -p 5201 -t 2
- Configure tc
  - Loss: $ sudo tc qdisc add dev eth1 root netem loss 10%
    - To confirm: $ sudo tc qdisc show
    - To remove: $ sudo tc qdisc del dev eth1 root netem loss 10%
  - Delay: $ sudo tc qdisc add dev eth1 root netem delay 500ms
- Observe the effect of loss/delay
  - Select one of the filtered packets
  - Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (Stevens)
  - Go to Statistics -> TCP StreamGraph -> Time Sequence Graph (tcptrace)