Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
1
Information Network 1
TCP 1/2
Youki Kadobayashi
NAIST
Copyright(C)2013 Youki Kadobayashi. All rights reserved.
Transport layer: a birds-eye view
2
H H R R R R
IP IP IP IP IP
Transport
Routers don’t maintain per-host
state Hosts maintain state for each
transport-layer endpoint
Application
Presentation
Session
Transport
Network
Data Link
Physical
Application
Presentation
Session
Transport
Network
Data Link
Physical
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 3
Functions provided by the transport layer
Communication between processes
Identify process
Identify inter-process channel
Interface for upper layer
Connection-oriented (virtual circuit)
Connectionless (datagram)
Competition and arbitration of network resource
Flow control
Congestion control
Next lecture
P Q
(P, Q)
Copyright(C)2013 Youki Kadobayashi. All rights reserved.
Transport protocols in the Internet protocol suite
TCP (RFC793)
Transmission Control Protocol
Connection-oriented
Multiple functions
for reliability
SCTP (RFC4960)
UDP (RFC768)
User Datagram Protocol
Connectionless
IP & Process identification
Packets can be lost
DCCP (RFC4340)
4
For self study
Advanced topic
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 5
Identifying process and connection in TCP
Unique identification of process
(IP, port)
Unique identification of TCP connection (source IP, source port, destination IP, destination port)
1040
80
22
2137
(203.178.136.36, 22)
163.221.52.100 203.178.136.36
(163.221.52.100, 1040)
connection
connection
process
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 6
TCP service model (1)
Connection-oriented
Virtual Circuit
Looks like a circuit, but virtual: no physical wires between
specific endpoints
Adapts to speed
Adapts to speed of intermediate networks as well as the
speed of endpoints
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 7
Establishing a TCP connection
3-way handshake
SYN, SYN-ACK, ACK
Ensure full-duplex communication
16bit source port 16bit destination port
32bit sequence number
32bit acknowledgment number
4bit hlen 16bit window size
16bit TCP checksum 16bit urgent pointer
reserved flags
URG ACK PSH RST SYN FIN
SYN
SYN-ACK
ACK
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 8
Establishing TCP connection:
a concrete example with Wireshark
# tshark -i en0 -n -f 'port 80'
tcpdump: listening on de0
Capturing on en0
0.164720 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=16 TSval=1135377929 TSecr=0 SACK_PERM=1
0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
0.195322 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=1 Ack=1 Win=131760 Len=0 TSval=1135377958 TSecr=190559467
Reply with Sequence number + 1 as an ACK implies acknowledgment
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 9
Meanings of Wireshark output
time source IP -> dest IP TCP source port > dest port [ flags ] Seq=n Ack=n …
0.195154 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [SYN, ACK] Seq=0 Ack=1 Win=50137 Len=0 TSval=190559467 TSecr=1135377929 MSS=1460 WS=8 SACK_PERM=1
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 10
Virtual Circuit(2):TCP connection release
FIN
Ack of FIN
FIN
Ack of FIN
close
close
5.749366 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [FIN, ACK] Seq=659 Ack=2449
Win=401096 Len=0 TSval=190560021 TSecr=1135378482
5.749464 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [ACK] Seq=2449 Ack=660
Win=131104 Len=0 TSval=1135383482 TSecr=190560021
5.749650 10.0.1.148 -> 163.221.8.221 TCP 63428 > 80 [FIN, ACK] Seq=2449 Ack=660
Win=131104 Len=0 TSval=1135383482 TSecr=190560021
5.765279 163.221.8.221 -> 10.0.1.148 TCP 80 > 63428 [ACK] Seq=660 Ack=2450
Win=401096 Len=0 TSval=190560024 TSecr=1135383482
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 11
TCP connection reset
RST
Abortive release
Nonexistent port (e.g., dead process)
telnet to sh.naist.jp, port 80 will result in connection reset: 0.000000 10.0.1.148 -> 163.221.10.10 TCP 63436 > 80 [SYN] Seq=0 Win=65535
Len=0 MSS=1460 WS=16 TSval=1137289324 TSecr=0 SACK_PERM=1
0.018841 163.221.10.10 -> 10.0.1.148 TCP 80 > 63436 [RST, ACK] Seq=1
Ack=1 Win=0 Len=0
If you kill “Dropbox”: 55.992214 10.0.1.148 -> 108.160.163.51 TCP 62991 > 80 [RST] Seq=310 Win=0
Len=0
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 12
Virtual Circuit Summary:
TCP state transition diagram
State transition:
trigger / response
Copyright(C)2013 Youki Kadobayashi. All rights reserved.
$ netstat Active Internet connections Proto Recv-Q Send-Q Local Address Foreign Address (state) tcp4 0 0 45.1.20.101.57456 74.125.235.138.http SYN_SENT tcp4 0 0 45.1.20.101.57455 ey-in-f101.1e100.http ESTABLISHED tcp4 0 0 45.1.20.101.57454 74.125.235.148.http ESTABLISHED
Troubleshooting TCP with state machine
netstat
13
SYN sent, awaiting SYN+ACK response
Many other open-source tools are available:
lsof, trpt, tcptraceroute, tcptrace, tcpflow, etc.
Try some of these tools in the provided VM image.
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 14
Hands on: observe connection setup/release
In the provided virtual machine,
1.Observe connection setup/release;
e.g., by using browser within VM
2.Observe connection reset by terminating program;
e.g., by skill firefox
3.Observe connection reset by connecting to nonexistent
port on the working machine.
e.g., by telnet sh.naist.jp 80
Try many tools: wireshark, tshark, tshark -V
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 15
Buffered transfer: adapts to varying speed of
endpoints, as well as intermediate networks
TCP connection
send
buffer
recv
buffer
send
buffer
recv
buffer
Process
OS kernel
block/unblock Read() Write()
Process
Read() Write()
OS kernel
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 16
TCP service model (2)
Byte-stream service
Upper layer can’t see boundaries between packets
no boundary: structuring (framing) is needed at upper layer
Full duplex
independent two streams in single connection
Reliable
masks packet reordering, duplication, discard and bit error
O L L E H O L L E H TCP being viewed as
byte-stream service OK OK
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 17
How does TCP implement reliable stream service?
ACK: Acknowledgment
Active acknowledgment Explicitly acknowledge the receipt of packet
Duplicate ACK Implicitly communicate the packet loss information
Timeout and Retransmission
Whenever sender doesn’t receive ACK after fixed time, a “Timeout” event is triggered
Retransmission: assuming that transmission has failed Exponential back-off: 3, 6, 12, …
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 18
ACK: Acknowledgment
Packets in transit
Sent but unacknowledged Sent and acknowledged
User data arrives
Sender
Receiver
Nara Institute of Science and Technology
Nara Insti
10
16
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 19
Piggybacking: Exploiting full-duplex channel
Packets in transit
Sent but unacknowledged Sent and acknowledged
User data arrives
Sender
Receiver
Receiver
Sender
Nara Institute of Science and Technology
Nara Insti
Graduate School of Information Science
Graduate S
User data arrives
Note: rough sketch
Sent but unacknowledged Sent and acknowledged
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 20
Duplicate ACK: implicit communication of
packet loss
Packets in transit
Sent but unacknowledged Sent and acknowledged
User data arrives
Sender
Receiver
Nara Institute of Science and Technology
Nara Institute o
Packet loss
10
16
16
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 21
From byte-stream to packets: TCP header
IP
Header
TCP
Header TCP data
TCP segment
16bit source port 16bit destination port
32bit sequence number
32bit acknowledgment number
4bit hlen 16bit window size
16bit TCP checksum 16bit urgent pointer
(options)
(TCP data)
reserved flags
20 octe
ts
Chop appropriate length from byte-stream buffer & add TCP header
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 22
Nagle algorithm
Q. If you added 20byte+20byte size header to 1byte data, isn’t
that overhead big?
Nagle algorithm (RFC896)
There is only one small segment which is unacknowledged
in the network.
In case of short RTT:
acceptable overhead, as LAN bandwidth is abundant
send packets with small buffering
In case of long RTT
reduce overhead, due to WAN bandwidth constraints
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 23
Hands on: observe packet loss & retransmission
Within provided virtual machine,
1.Observe duplicate ACK;
Emulate lossy network with:
# tc qdisc change dev eth0 root netem loss 10%
1.Observe the effect of Nagle algorithm;
Emulate satellite network with:
# tc qdisc add dev eth0 root netem delay 500ms
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 24
Questions?
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 25
Summary thus far
Transport layer
The transport protocol on the internet – TCP
TCP: service model, and features
Efficiency: ACK, piggybacking, Nagle algorithm
TCP connection establishment and release
Diagnosis: tools + state machine knowledge
Copyright(C)2013 Youki Kadobayashi. All rights reserved.
Transport protocol challenges
With a number of unknown parameters:
Number of active communications
Bottleneck bandwidth
Error rate
how can we accommodate as much communications as possible, without collapsing network itself?
26
1 1
n k
…
…
Key ideas: probing, estimation, self-policing, macro-level stability
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 27
Flow control and congestion control: Competition and Arbitration of Network Resource
Flow control
Absorb difference of transmission rate
Recovery from sequence error
Recovery from duplication, packets drop and bit
error
Congestion control
Sharing the bandwidth while suppressing the
congestion
Fair sharing of bandwidth
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 28
Characteristics of TCP Flow Control
End to end
No global assignment of resource
Estimate available bandwidth at individual hosts
Routers do not explicitly allocate resource Implicit signaling through packet drops
Scalable
Host-based autonomous method has better scalability, because it doesn’t need state management inside network.
Autonomous Less state management Scalable
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 29
TCP: key characteristics
Very simple algorithm
Macroscopic self-stabilization
TCP doesn't assume the presence of greedy nodes.
Elimination of global control system
Reject the idea of intermediate policing system
Modest performance across almost all data-links
As opposed to optimal performance in specific condition
Stepwise improvements over 30 years
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 30
Flow Control on TCP
Control available bandwidth
Sliding window
Using sequence number, not packet number
Window size
Control transmission interval of packets
ACK clocking
Miscellaneous
Error detection with TCP checksum
Packet drop detection with duplicate ACK, timeout
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 31
Sliding window
Packets in flight
Sent but unacknowledged Sent and acknowledged
User data arrives
Sender
Receiver
Nara Institute of Science and Technology
Nara Insti
10
16
Window size Sequence number
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 32
Congestion Control on TCP
Fair-share model: fair distribution of bandwidth
End to end
Increase and decrease of window size
additive increase
multiplicative decrease AIMD is known to be self-stabilizing (R. Jain, et al., “Analysis of the
increase and decrease algorithms for congestion avoidance in computer networks”, 1989)
Switch increasing strategy of window size
Detect congestion through packet drops
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 33
Increasing Window size
TCP switches increasing strategy of congestion window (cwnd) by slow start threshold (ssthresh)
(Outline of the algorithm)
On receiving an ACK: if (cwnd < ssthresh) { /* slow start: exponential increase */ cwnd += 1; } else { /* congestion avoidance: additive increase */ cwnd += 1 / cwnd; }
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 34
Increasing Window size
Slow start
exponential increase
Congestion avoidance
additive increase
Effectiveness: see V. Jacobson, “Congestion Avoidance and Control”, SIGCOMM’88.
Source: TCP/IP Illustrated, Vol.1
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 35
Decreasing Window size
(Outline of algorithm )
On detecting packet drop:
ssthresh = cwnd / 2;
if (timeout) {
cwnd = 1;
}
Source: TCP/IP Illustrated, Vol.1
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 36
Questions?
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 37
Packet drop
Detection with 2 methods:
Duplicate ACK
Retransmission Time Out (RTO)
How do I judge the time out?
→ RTO ~ RTT Estimator
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 38
RTO Calculation
Err = M – A A <- A + gErr D <- D + h(|Err| - D) RTO = A + 4D
A: smoothed RTT
D: smoothed mean deviation
g: gain for the average (1/8)
h: gain for the deviation (1/4)
Source: TCP/IP Illustrated, Vol.1
Copyright(C)2013 Youki Kadobayashi. All rights reserved.
RTO calculation
39
Source: A Quick Tour Around TCP,
http://web.eecs.utk.edu/~dunigan/tcptour/
Copyright(C)2013 Youki Kadobayashi. All rights reserved. 40
Summary
Transport layer
The transport protocol on the internet – TCP
TCP: service model, and features
Efficiency: ACK, piggybacking, Nagle algorithm
TCP connection establishment and release
Flow control and congestion control in TCP