Lecture 04: Transport Layer

Transport layer protocols in the Internet:
- UDP: connectionless transport
- TCP: connection-oriented transport
- TCP congestion control

Provides end-to-end connectivity, but not necessarily good performance

Internet transport-layer protocols:
- reliable, in-order delivery (TCP): congestion control, flow control, connection setup
- unreliable, unordered delivery (UDP): no-frills extension of "best-effort" IP
- services not available: delay guarantees, bandwidth guarantees

[Figure: protocol stacks. The end hosts run all five layers (application, transport, network, data link, physical); the routers along the path run only network, data link, and physical. The transport layer provides logical end-to-end transport between the two hosts.]

Two Basic Transport Features

- Demultiplexing: port numbers
- Error detection: checksums

[Figure: a client host sends a service request to 128.2.194.242:80. The server host's OS demultiplexes on the destination port, delivering the request to the Web server (port 80) rather than the echo server (port 7). The checksum over the IP payload detects corruption.]

User Datagram Protocol (UDP)

Datagram messaging service
- Demultiplexing: port numbers
- Detecting corruption: checksum

Lightweight communication between processes
- Send and receive messages
- Avoid overhead of ordered, reliable delivery

UDP header (8 bytes): [ SRC port | DST port | length | checksum ], followed by DATA
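The 8-byte header layout above can be sketched with Python's `struct` module; the port numbers and payload here are illustrative, and a checksum of 0 stands in for "not computed" as IPv4 allows:

```python
import struct

def udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the 8-byte UDP header: source port, destination port,
    length (header + payload), checksum; all 16-bit fields in
    network byte order ("!H")."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

hdr = udp_header(5000, 53, b"query")
print(len(hdr))                      # 8
print(struct.unpack("!HHHH", hdr))   # (5000, 53, 13, 0)
```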

Advantages of UDP
- Fine-grain control: UDP sends as soon as the application writes
- No connection set-up delay: UDP sends without establishing a connection
- No connection state: no buffers, parameters, sequence numbers, etc.
- Small header overhead: the UDP header is only eight bytes long

Popular Applications That Use UDP
- Multimedia streaming: retransmitting packets is not always worthwhile
  - E.g., phone calls, video conferencing, gaming, IPTV
- Simple query-response protocols: overhead of connection establishment is overkill
  - E.g., Domain Name System (DNS), DHCP, etc.
  - "Address for www.cnn.com?" / "12.3.4.15"
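A query-response exchange over UDP can be sketched with Python sockets. This uses a hypothetical local echo-style service standing in for a real resolver (real DNS encodes queries in a binary format); the point is one datagram out, one back, with no connection setup:

```python
import socket
import threading

def tiny_server(sock):
    """One request in, one reply out; no connection, no state kept."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data.upper(), addr)   # stand-in "answer"

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))         # OS picks a free port
threading.Thread(target=tiny_server, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)                # don't wait forever for the reply
client.sendto(b"address for www.cnn.com?", server.getsockname())
reply, _ = client.recvfrom(1024)
print(reply)  # b'ADDRESS FOR WWW.CNN.COM?'
```

Note there is no `connect`/`accept` anywhere: each `sendto` is an independent datagram, which is exactly why the overhead suits short query-response protocols.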

Transmission Control Protocol (TCP)
- Stream-of-bytes service: sends and receives a stream of bytes
- Reliable, in-order delivery
  - Corruption: checksums
  - Loss/reordering: sequence numbers
  - Reliable delivery: acknowledgments and retransmissions
- Connection oriented: explicit set-up and tear-down of the TCP connection
- Flow control: prevent overflow of the receiver's buffer space
- Congestion control: adapt to network congestion for the greater good

Breaking a Stream of Bytes into TCP Segments

TCP "Stream of Bytes" Service

[Figure: Host A writes bytes 0, 1, 2, 3, ..., 80 into the stream; Host B reads the same bytes out in order.]

...Emulated Using TCP "Segments"

[Figure: the byte stream between Host A and Host B is carried as a series of TCP segments, each holding a contiguous run of bytes.]

A segment is sent when:
1. the segment is full (Max Segment Size),
2. it is not full, but a timer expires, or
3. the data is "pushed" by the application.

TCP Segment
- IP packet: no bigger than the Maximum Transmission Unit (MTU)
  - E.g., up to 1500 bytes on an Ethernet link
- TCP packet: IP packet with a TCP header and data inside
  - TCP header is typically 20 bytes long
- TCP segment: no more than Maximum Segment Size (MSS) bytes
  - E.g., up to 1460 consecutive bytes from the stream

[ IP Hdr | IP Data ], where IP Data = [ TCP Hdr | TCP Data (segment) ]
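The 1460-byte figure falls directly out of the numbers above, assuming option-free 20-byte IP and TCP headers:

```python
# MSS on Ethernet: subtract the (typical, option-free) IP and TCP
# headers from the MTU to get the room left for stream data.
MTU = 1500          # Ethernet payload limit, bytes
IP_HDR = 20         # IPv4 header without options
TCP_HDR = 20        # TCP header without options
MSS = MTU - IP_HDR - TCP_HDR
print(MSS)  # 1460
```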

Sequence Number

[Figure: Host A sends TCP data to Host B. The ISN (initial sequence number) is chosen at connection setup; each segment's sequence number is that of its first byte in the stream, e.g., byte 81.]

Reliable Delivery on a Lossy Channel With Bit Errors

Challenges of Reliable Data Transfer
- Over a perfectly reliable channel: easy; sender sends, and receiver receives
- Over a channel with bit errors: receiver detects errors and requests retransmission
- Over a lossy channel with bit errors: some data are missing, others corrupted; receiver cannot always detect loss
- Over a channel that may reorder packets: receiver cannot distinguish loss from out-of-order delivery

An Analogy
- Alice and Bob are talking
  - What if Bob couldn't understand Alice? Bob asks Alice to repeat what she said
  - What if Bob hasn't heard Alice for a while? Is Alice just being quiet, or has she lost reception?
- How long should Bob just keep on talking?
  - Maybe Alice should periodically say "uh huh"
  - ... or Bob should ask "Can you hear me now?"

Take-Aways from the Example
- Acknowledgments from the receiver
  - Positive: "okay" or "uh huh" or "ACK"
  - Negative: "please repeat that" or "NACK"
- Retransmission by the sender
  - After not receiving an "ACK"
  - After receiving a "NACK"
- Timeout by the sender ("stop and wait"): don't wait forever without some acknowledgment

TCP Support for Reliable Delivery
- Detect bit errors: checksum
  - Used to detect corrupted data at the receiver, leading the receiver to drop the packet
- Detect missing data: sequence number
  - Used to detect a gap in the stream of bytes, and to put the data back in order
- Recover from lost data: retransmission
  - Sender retransmits lost or corrupted data
  - Two main ways to detect lost packets

TCP Acknowledgments

[Figure: Host A sends TCP data to Host B, starting from the ISN (initial sequence number). The sequence number identifies the first byte of the segment; the ACK carries the sequence number of the next expected byte.]

Automatic Repeat reQuest (ARQ)
- ACKs and timeouts
  - Receiver sends an ACK when it receives a packet
  - Sender waits for the ACK and times out if it does not arrive
- Simplest ARQ protocol: stop and wait
  - Send a packet, then stop and wait until the ACK arrives

[Figure: the sender transmits a packet, starts a timeout, and waits for the receiver's ACK before sending the next packet.]
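The stop-and-wait loop above can be sketched in a few lines of pure Python over a simulated lossy channel (the loss rate and packet contents are made up; a lost "ACK" here stands in for a timeout firing):

```python
import random

random.seed(1)  # deterministic losses for the example

def lossy_send(packet, loss_rate=0.3):
    """Deliver the packet (return an ACK) or lose it (return None)."""
    return None if random.random() < loss_rate else ("ACK", packet)

def stop_and_wait(packets):
    delivered, attempts = [], 0
    for p in packets:
        while True:                 # resend the same packet until ACKed
            attempts += 1
            ack = lossy_send(p)
            if ack is not None:     # ACK arrived before the "timeout"
                delivered.append(p)
                break
    return delivered, attempts

delivered, attempts = stop_and_wait([0, 1, 2, 3])
print(delivered)        # [0, 1, 2, 3]: everything arrives, in order
print(attempts >= 4)    # True: losses force extra transmissions
```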

Flow Control: TCP Sliding Window

Motivation for Sliding Window
- Stop-and-wait is inefficient: only one TCP segment is "in flight" at a time
- Especially bad for a high "delay-bandwidth product" (the amount of data that fits in the pipe: delay x bandwidth)

Numerical Example
- 1.5 Mbps link with 45 msec round-trip time (RTT)
- Delay-bandwidth product is 67.5 Kbits (about 8 KBytes)
- Sender can send at most one packet per RTT
  - Assuming a segment size of 1 KB (8 Kbits): 8 Kbits/segment at 45 msec/segment gives about 182 Kbps
  - That's just one-eighth of the 1.5 Mbps link capacity
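The numbers in this example can be reproduced directly (taking 1 KB = 1024 bytes, as the slide's 8 Kbits figure implies):

```python
# Stop-and-wait throughput on a 1.5 Mbps, 45 ms RTT link.
link_bps = 1.5e6                 # link capacity, bits/sec
rtt_s = 0.045                    # round-trip time, seconds

bdp_bits = link_bps * rtt_s      # delay-bandwidth product
print(bdp_bits)                  # ~67500 bits = 67.5 Kbits (~8 KBytes)

segment_bits = 1024 * 8          # one 1-KB segment sent per RTT
throughput_bps = segment_bits / rtt_s
print(round(throughput_bps / 1000))        # ~182 Kbps
print(round(link_bps / throughput_bps))    # ~8: one-eighth of capacity
```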

Pipelined Protocols
- Pipelining: sender allows multiple "in-flight", yet-to-be-acknowledged packets
  - Range of sequence numbers must be increased
  - Buffering at sender and/or receiver
- Pipelined protocols: concurrent logical channels, sliding window protocol

Sliding Window Protocol

Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.

[Figure: at sender P1, Source[0 .. a-1] is acknowledged and Source[a .. s-1] is unacknowledged; the send window covers [a, s-1]. At receiver P2, Sink[0 .. r-1] has been delivered; the receive window covers [r, r + RW - 1], with r the next expected sequence number.]

- RW = receive window size
- SW = send window size (invariant: s - a <= SW)
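The window invariants above can be sketched as a toy in-order transfer; loss and reordering are not modeled here, so every cumulative ACK simply slides the send window forward (a simplified illustration, not real TCP):

```python
from collections import deque

SW = 4  # send window size: at most SW unacknowledged items (s - a <= SW)

def transfer(data):
    a, s = 0, 0                  # a: oldest unACKed seq, s: next seq to send
    r = 0                        # receiver's next expected sequence number
    delivered = []
    in_flight = deque()
    while len(delivered) < len(data):
        # sender fills the window: send while s - a < SW
        while s < len(data) and s - a < SW:
            in_flight.append(s)
            s += 1
        # receiver consumes one segment, in order, and ACKs cumulatively
        seq = in_flight.popleft()
        if seq == r:
            delivered.append(data[seq])
            r += 1
        a = r                    # cumulative ACK slides the send window
    return delivered

print(transfer(list("abcdefgh")))  # same bytes, same order
```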

Sliding Windows in Action
- Data unit r has just been received by P2
- The receive window slides forward
- P2 sends a cumulative ack with the sequence number it expects to receive next (r+3)

[Figure: sender P1's window is unchanged; receiver P2 has now delivered up through r+2, so the next expected sequence number is r+3.]

Sliding Windows in Action
- P1 has just received a cumulative ack with r+3 as the next expected sequence number
- The send window slides forward

[Figure: sender P1's send window now begins at a = r+3; receiver P2's state is unchanged.]

Sliding Window Protocol: Functions Provided
- Error control (reliable delivery)
- In-order delivery
- Flow and congestion control (by varying the send window size)

TCP uses only cumulative acks. Other kinds of acks:
- Selective nack
- Selective ack (TCP SACK): a bit-vector representing the entire state of the receive window (in addition to the first sequence number of the window)

Sliding Window Protocol

At the sender, a will be pointed to by SendBase, and s by NextSeqNum.

[Figure: the same send/receive window picture as before, with RW = receive window size and SW = send window size (s - a <= SW).]

TCP Flow Control
- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment
- Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow value

Flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast.

[Figure: buffer at the receive side of a TCP connection.]

Optimizing Retransmissions

Reasons for Retransmission

[Figure: three timelines. (1) ACK lost: the sender times out and retransmits, producing a DUPLICATE PACKET at the receiver. (2) Packet lost: the sender times out and retransmits. (3) Early timeout: the ACK was merely slow, so the retransmission produces DUPLICATE PACKETS.]

How Long Should the Sender Wait?
- Sender sets a timeout to wait for an ACK
  - Too short: wasted retransmissions
  - Too long: excessive delays when a packet is lost
- TCP sets the timeout as a function of the RTT
  - Expect the ACK to arrive after a "round-trip time"
  - ... plus a fudge factor to account for queuing
- But how does the sender know the RTT?
  - Running average of the delay to receive an ACK

TCP Round-Trip Time and Timeout
- Q: how to estimate the RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary; want the estimated RTT "smoother"
  - average several recent measurements, not just the current SampleRTT

EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT

- Exponential weighted moving average
- Influence of a past sample decreases exponentially fast
- Typical value: α = 0.125
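The EWMA update is two lines of code; the initial estimate and sample values below are made up for illustration:

```python
# TCP's smoothed RTT estimator with the typical alpha = 0.125.
ALPHA = 0.125

def update(estimated_rtt, sample_rtt):
    return (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt

est = 100.0                         # initial estimate, msec (assumed)
for sample in [120.0, 120.0, 120.0, 120.0]:
    est = update(est, sample)
print(round(est, 1))  # 108.3: drifting toward the new 120 ms level
```

Because each update keeps 87.5% of the old estimate, a single outlier sample barely moves the estimate, which is exactly the "smoother" behavior the slide asks for.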

Example RTT estimation: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT (jagged) and EstimatedRTT (smoother) in milliseconds, plotted over roughly 100 seconds, varying between about 100 and 350 ms.]

TCP: Retransmission Scenarios

[Figure: two timelines. Lost ACK scenario: Host A sends Seq=92, 8 bytes of data; Host B's ACK=100 is lost; A's Seq=92 timer expires and A retransmits; B re-sends ACK=100; SendBase stays at 100 until the second ACK arrives. Premature timeout scenario: A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the Seq=92 timer expires before ACK=120 arrives, so A retransmits Seq=92; B replies again with cumulative ACK=120; SendBase advances from 100 to 120.]

TCP Retransmission Scenarios (more)

[Figure: cumulative ACK scenario. Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost, but cumulative ACK=120 arrives before the timeout, so SendBase advances to 120 and no retransmission is needed.]

Fast Retransmit
- The time-out period is often relatively long: long delay before resending a lost packet
- Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
- If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - Fast retransmit: resend the segment before the timer expires

[Figure 3.37: Host A resends the 2nd segment after a triple duplicate ACK, before the timeout expires.]

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
  if (y > SendBase) {
      SendBase = y
      if (there remains a not-yet-acknowledged segment)
          start timer
  }
  else {  /* a duplicate ACK for an already ACKed segment */
      increment count of dup ACKs received for y
      if (count of dup ACKs received for y == 3) {
          /* fast retransmit */
          resend segment with sequence number y
          reset timer for y
      }
  }

Effectiveness of Fast Retransmit
- When does fast retransmit work best?
  - High likelihood of many packets in flight
  - Long data transfers, large window size, ...
- Implications for Web traffic
  - Most Web transfers are short (e.g., 10 packets), so often there aren't many packets in flight
  - This makes fast retransmit less likely to "kick in", forcing users to click "reload" more often...

Starting and Ending a Connection: TCP Handshakes

Establishing a TCP Connection
- Three-way handshake to establish the connection
  - Host A sends a SYN (open) to host B
  - Host B returns a SYN acknowledgment (SYN ACK)
  - Host A sends an ACK to acknowledge the SYN ACK

[Figure: A -> B: SYN; B -> A: SYN ACK; A -> B: ACK + Data; then data flows in both directions.]

Each host tells its ISN to the other host.

What if the SYN Packet Gets Lost?
- Suppose the SYN packet gets lost
  - The packet is lost inside the network, or
  - The server rejects the packet (e.g., the listen queue is full)
- Eventually, no SYN-ACK arrives
  - The sender sets a timer and waits for the SYN-ACK
  - ... and retransmits the SYN if needed
- How should the TCP sender set the timer?
  - The sender has no idea how far away the receiver is
  - Some TCPs use a default of 3 or 6 seconds

SYN Loss and Web Downloads
- User clicks on a hypertext link
  - Browser creates a socket and does a "connect"
  - The "connect" triggers the OS to transmit a SYN
- If the SYN is lost...
  - The 3-6 seconds of delay is very long
  - The impatient user may click "reload"
- User triggers an "abort" of the "connect"
  - Browser "connects" on a new socket
  - Essentially forces a fast send of a new SYN!

Lecture 04: Transport Layer

Transport layer protocols in the Internet:
- UDP: connectionless transport
- TCP: connection-oriented transport
- TCP congestion control

Principles of Congestion Control
- Congestion, informally: "too many sources sending too much data too fast for the network to handle"
- Different from flow control!
- Manifestations:
  - Lost packets (buffer overflow at routers)
  - Long delays (queueing in router buffers)
- A top-10 problem!

Receiver Window vs. Congestion Window
- Flow control: keep a fast sender from overwhelming a slow receiver
- Congestion control: keep a set of senders from overloading the network
- Different concepts, but similar mechanisms
  - TCP flow control: receiver window
  - TCP congestion control: congestion window
  - Sender TCP window = min { congestion window, receiver window }
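The combining rule is a one-liner; the byte counts below are illustrative:

```python
# The sender's effective window is the smaller of the two constraints:
# what the network can absorb (cwnd) and what the receiver can buffer (rwnd).
def effective_window(cwnd, rwnd):
    return min(cwnd, rwnd)

print(effective_window(cwnd=32000, rwnd=16000))  # 16000: receiver-limited
print(effective_window(cwnd=8000, rwnd=16000))   # 8000: network-limited
```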

How It Looks to the End Host
- Delay: a packet experiences high delay
- Loss: a packet gets dropped along the path
- How does the TCP sender learn this?
  - Delay: round-trip time estimate
  - Loss: timeout and/or duplicate acknowledgments

Congestion Collapse
- Congestion easily leads to congestion collapse
  - Senders retransmit the lost packets
  - Leading to even greater load
  - ... and even more packet loss

[Figure: goodput vs. load; goodput rises with load, then falls off at "congestion collapse": an increase in load that results in a decrease in useful work done.]

Approaches Towards Congestion Control
- End-to-end congestion control:
  - No explicit feedback from the network
  - Congestion inferred from the end system's observed loss and/or delay
  - Approach taken by TCP
- Network-assisted congestion control:
  - Routers provide feedback to end systems
  - Single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - Explicit sending rate for the sender

TCP Congestion Control
- End-to-end control (no network assistance)
- Tradeoff
  - Pro: avoids needing explicit network feedback
  - Con: continually under- and over-shoots the "right" rate
- Each TCP sender maintains a congestion window
  - Max number of bytes to have in transit (not yet ACK'd)
- Adapting the congestion window
  - Decrease upon losing a packet: backing off
  - Increase upon success: optimistically exploring
  - Always struggling to find the right transfer rate

TCP Congestion Control
- How does the sender determine CongWin?
  - Loss event = timeout or 3 duplicate ACKs
  - TCP sender reduces CongWin after a loss event
- Three mechanisms:
  - Slow start
  - AIMD
  - Reduce to 1 segment after a timeout event

TCP Slow Start
- Probing for usable bandwidth
- When the connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec gives an initial rate of 20 kbps
- Available bandwidth may be >> MSS/RTT
  - Desirable to quickly ramp up to a higher rate

TCP Slow Start (more)
- When the connection begins, increase the rate exponentially until the first loss event or "threshold"
  - Double CongWin every RTT
  - Done by incrementing CongWin by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two, then four, doubling each RTT as Host B's ACKs arrive.]
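The per-ACK increment really does double the window each round: with n segments in flight, n ACKs arrive, and each adds 1 MSS. A minimal sketch (window counted in segments):

```python
MSS = 1  # count the window in whole segments for simplicity

def slow_start_rounds(rounds):
    congwin = 1 * MSS        # connection begins at 1 MSS
    history = [congwin]
    for _ in range(rounds):
        acks = congwin       # one ACK comes back per in-flight segment
        congwin += acks * MSS  # +1 MSS per ACK => window doubles
        history.append(congwin)
    return history

print(slow_start_rounds(4))  # [1, 2, 4, 8, 16]
```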

Congestion Avoidance State & Responses to Loss Events
- Q: If there is no loss, when should the exponential increase switch to linear?
- A: When CongWin reaches the current value of the threshold
- Implementation:
  - For the initial slow start, the threshold is set to a very large value (e.g., 65 Kbytes)
  - At a loss event, the threshold is set to 1/2 of CongWin just before the loss event

[Figure: congestion window size (segments) vs. transmission round for TCP Tahoe and TCP Reno, with the threshold marked.]

Rationale for Reno's Fast Recovery
- After 3 dup ACKs:
  - CongWin is cut in half
  - The window then grows linearly
- But after a timeout event:
  - CongWin is instead set to 1 MSS
  - The window then grows exponentially to a threshold, then grows linearly
- 3 dup ACKs indicate the network is capable of delivering some segments; a timeout occurring before 3 dup ACKs is "more alarming"

Summary: TCP Congestion Control
- When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
- When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
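The four summary rules can be sketched as a tiny round-by-round simulator. This is a simplification (it treats one "ack" event as a full window of ACKs and counts the window in segments), not a faithful TCP implementation:

```python
MSS = 1  # window counted in segments

def simulate(events, threshold=64):
    """events: per-round outcomes, each "ack", "3dup", or "timeout"."""
    congwin = 1 * MSS
    trace = []
    for ev in events:
        if ev == "ack":
            if congwin < threshold:
                congwin *= 2          # slow start: exponential growth
            else:
                congwin += MSS        # congestion avoidance: linear growth
        elif ev == "3dup":
            threshold = congwin // 2  # Threshold = CongWin/2
            congwin = threshold       # Reno: CongWin = Threshold
        elif ev == "timeout":
            threshold = congwin // 2
            congwin = 1 * MSS         # back to 1 MSS
        trace.append(congwin)
    return trace

print(simulate(["ack"] * 5 + ["3dup"] + ["ack"] * 2 + ["timeout"]))
# [2, 4, 8, 16, 32, 16, 17, 18, 1]
```

Note the shape: exponential until the triple-dup-ACK halves the window, then linear (+1), then a timeout resets all the way to 1 MSS.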

AIMD in Steady State
- Multiplicative decrease: cut CongWin in half after a loss event (3 dup ACKs)
- Additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event: probing

[Figure: congestion window of a long-lived TCP connection over time: a sawtooth of additive climbs and multiplicative drops, with y-axis marks at 8, 16, and 24 Kbytes.]

TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K. (AIMD only provides convergence to the same window size, not necessarily the same throughput rate.)

[Figure: TCP connections 1 and 2 share a bottleneck router of capacity R.]

Why is TCP fair? Consider two competing sessions:

[Figure: Connection 1's window size plotted against Connection 2's. In congestion avoidance both windows grow additively; at a loss both are cut by a factor of 2. The trajectory converges toward the "equal window size" line.]
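The convergence argument can be sketched numerically: additive increase leaves the gap between two flows unchanged, while each synchronized multiplicative decrease halves it, so the gap shrinks toward zero. This toy model (made-up capacity, both flows losing simultaneously) is an idealization of that argument:

```python
CAPACITY = 100  # shared bottleneck, in window units (assumed)

def aimd_two_flows(w1, w2, rounds):
    for _ in range(rounds):
        w1 += 1                    # additive increase: gap unchanged
        w2 += 1
        if w1 + w2 > CAPACITY:     # bottleneck overflows: loss for both
            w1 /= 2                # multiplicative decrease: gap halved
            w2 /= 2
    return w1, w2

w1, w2 = aimd_two_flows(10.0, 70.0, 200)
print(abs(w1 - w2) < 1.0)  # True: the initial 60-unit gap has shrunk away
```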

Fairness (more)

Fairness and UDP
- Multimedia apps often do not use TCP
  - They do not want the rate throttled by congestion control
- Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
- TCP-friendly congestion control exists for apps that prefer UDP, e.g., the Datagram Congestion Control Protocol (DCCP)

End of Lecture 04
