

Fall 2007 CSci232: Transport Layer & TCP 1

Transport Layer

• Transport Layer Services
  – connection-oriented vs. connectionless
  – multiplexing and demultiplexing
• UDP: Connectionless Unreliable Service
• TCP: Connection-Oriented Reliable Service
  – connection management: set-up and tear down
  – reliable data transfer protocols
  – flow and congestion control

Readings: Chapter 5

Fall 2007 CSci232: Transport Layer & TCP 2

Transport Protocols

• Lowest level end-to-end protocol.
  – Header generated by sender is interpreted only by the destination
  – Routers view transport header as part of the payload

[Figure: protocol stacks of two end hosts (layers 7 down to 1, including Transport, IP, Datalink, Physical) connected through a router that implements only IP and the layers below it.]

Fall 2007 CSci232: Transport Layer & TCP 3

Transport Services and Protocols

• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  – send side: breaks app messages into segments, passes to network layer
  – rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
  – Internet: TCP and UDP

[Figure: logical end-end transport. The two end systems run the full stack (application, transport, network, data link, physical); the routers in between run only network, data link, physical.]

Fall 2007 CSci232: Transport Layer & TCP 4

Transport Layer Services

• Underlying best-effort network
  – drops messages
  – re-orders messages
  – delivers duplicate copies of a given message
  – delivers messages after an arbitrarily long delay
• Common end-to-end services
  – guarantee message delivery
  – deliver messages in the same order they are sent
  – deliver at most one copy of each message
  – allow the receiver to flow control the sender
  – support multiple application processes on each host

Fall 2007 CSci232: Transport Layer & TCP 5

Transport vs. Application and Network Layer

• application layer: application processes and message exchange
• network layer: logical communication between hosts
• transport layer: logical communication support for app processes
  – relies on, enhances, network layer services

Household analogy: 12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service

Fall 2007 CSci232: Transport Layer & TCP 6

End to End Issues

• Transport services built on top of (potentially) unreliable network service
  – packets can be corrupted or lost
  – packets can be delayed or arrive “out of order”
• Do we detect and/or recover errors for apps?
  – Error Control & Reliable Data Transfer
• Do we provide “in-order” delivery of packets?
  – Connection Management & Reliable Data Transfer
• Potentially different capacity at destination, and potentially different network capacity
  – Flow and Congestion Control

Fall 2007 CSci232: Transport Layer & TCP 7

Internet Transport Protocols

TCP service:
• connection-oriented: setup required between client, server
• reliable transport between sender and receiver
• flow control: sender won’t overwhelm receiver
• congestion control: throttle sender when network overloaded

UDP service:
• unreliable data transfer between sender and receiver
• does not provide: connection setup, reliability, flow control, congestion control

Both provide logical communication between app processes running on different hosts!

Fall 2007 CSci232: Transport Layer & TCP 8

Multiplexing/Demultiplexing

Multiplexing at send host: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

Demultiplexing at rcv host: delivering received segments to correct application process

[Figure: host 1, host 2 and host 3, each with an application, transport, network, link, physical stack; processes P1 to P4 attach to the transport layer through sockets (the API).]

Fall 2007 CSci232: Transport Layer & TCP 9

How Demultiplexing Works

• host receives IP datagrams
  – each datagram has source IP address, destination IP address
  – each datagram carries 1 transport-layer segment
  – each segment has source, destination port number (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct segment to appropriate app process (identified by “socket”)

[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message).]
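The lookup a host performs when demultiplexing can be sketched as a table from addressing fields to sockets. This is only an illustration: the class name `Demultiplexer`, the string "socket" labels, and the choice of keying UDP on (destination IP, destination port) versus TCP on the full 4-tuple are assumptions made here, not something these slides specify in detail.

```python
from typing import Dict, Optional, Tuple

# Assumed key shapes (hypothetical names):
# UDP-style: (destination IP, destination port)
UdpKey = Tuple[str, int]
# TCP-style: (source IP, source port, destination IP, destination port)
TcpKey = Tuple[str, int, str, int]


class Demultiplexer:
    """Toy model of a host directing incoming segments to sockets."""

    def __init__(self) -> None:
        self.udp_sockets: Dict[UdpKey, str] = {}
        self.tcp_sockets: Dict[TcpKey, str] = {}

    def deliver_udp(self, dst_ip: str, dst_port: int) -> Optional[str]:
        # Only the destination fields select the socket.
        return self.udp_sockets.get((dst_ip, dst_port))

    def deliver_tcp(self, src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int) -> Optional[str]:
        # All four fields select the socket, so two clients talking
        # to the same server port reach different sockets.
        return self.tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


demux = Demultiplexer()
demux.tcp_sockets[("70.13.155.114", 1414, "128.101.35.150", 22)] = "conn-A"
demux.tcp_sockets[("70.13.155.114", 1567, "128.101.35.150", 22)] = "conn-B"
demux.udp_sockets[("128.101.35.150", 53)] = "dns-socket"
```

A segment whose fields match no entry returns `None` here; what a real stack does in that case is outside this sketch.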

Fall 2007 CSci232: Transport Layer & TCP 10

UDP: User Datagram Protocol [RFC 768]

• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired

Fall 2007 CSci232: Transport Layer & TCP 11

UDP (cont’d)

• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery!

[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (message). Length is in bytes of the UDP segment, including header.]

Fall 2007 CSci232: Transport Layer & TCP 12

UDP Checksum

Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value (1’s complement of 1’s complement sum of 16-bit words) into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO - error detected
  – YES - no error detected. But maybe errors nonetheless? More later ….

Fall 2007 CSci232: Transport Layer & TCP 13

Checksum: Example

Arrange data segment in sequences of 16-bit words:

                              0110011001100110
                              1101010101010101
                            + 0000111100001111
                              ----------------
sum:                          0100101011001011
checksum (1’s complement):    1011010100110100
verify by adding:             1111111111111111
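The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the input is already given as 16-bit words rather than as a real UDP segment with its pseudo header.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words):
    """1's complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


words = [0b0110011001100110, 0b1101010101010101, 0b0000111100001111]
total = ones_complement_sum(words)
checksum = internet_checksum(words)

print(f"{total:016b}")     # 0100101011001011
print(f"{checksum:016b}")  # 1011010100110100
# Receiver check: data words plus checksum must sum to all ones.
print(f"{ones_complement_sum(words + [checksum]):016b}")  # 1111111111111111
```

The end-around carry matters here: the first two words overflow 16 bits, and the carry is added back into the low end of the sum.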

Fall 2007 CSci232: Transport Layer & TCP 14

TCP Overview

• Connection-oriented
• Byte-stream
  – app writes bytes
  – TCP sends segments
  – app reads bytes
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network

[Figure: the sending application process writes bytes into the TCP send buffer; TCP transmits segments; the receiving TCP holds them in its receive buffer, from which the application process reads bytes.]

Fall 2007 CSci232: Transport Layer & TCP 15

Functionality Split

• Network provides best-effort delivery
• End-systems implement many functions
  – Reliability
  – In-order delivery
  – Demultiplexing
  – Message boundaries
  – Connection abstraction
  – Flow Control
  – Congestion control
  – …

Fall 2007 CSci232: Transport Layer & TCP 16

High-Level TCP Characteristics

• Protocol implemented entirely at the ends
  – Fate sharing
• Protocol has evolved over time and will continue to do so
  – Nearly impossible to change the header
  – Use options to add information to the header
  – Change processing at endpoints
  – Backward compatibility is what makes it TCP

Fall 2007 CSci232: Transport Layer & TCP 17

Evolution of TCP

[Timeline, 1975 to 1990]

• 1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
• 1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
• 1982: TCP & IP, RFC 793 & 791
• 1983: BSD Unix 4.2 supports TCP/IP
• 1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse observed
• 1987: Karn’s algorithm to better estimate round-trip time
• 1988: Van Jacobson’s algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
• 1990: 4.3BSD Reno: fast retransmit, delayed ACK’s

Fall 2007 CSci232: Transport Layer & TCP 18

TCP Through the 1990s

[Timeline, 1993 to 1996]

• 1993: TCP Vegas (Brakmo et al): real congestion avoidance
• 1994: ECN (Floyd): Explicit Congestion Notification
• 1994: T/TCP (Braden): Transaction TCP
• 1996: SACK TCP (Floyd et al): Selective Acknowledgement
• 1996: Hoe: Improving TCP startup
• 1996: FACK TCP (Mathis et al): extension to SACK

Fall 2007 CSci232: Transport Layer & TCP 19

TCP Segment Header Structure

[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F, rcvr window size
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
  – rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP); ptr urgent data
• Options (variable length)
• application data (variable length)

Fall 2007 CSci232: Transport Layer & TCP 20

TCP Segment Format (cont)

• Each connection identified with 4-tuple:
  – (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
  – Acknowledgment, SequenceNum, AdvertisedWindow
• Flags
  – SYN, FIN, ACK, RESET, PUSH, URG
• Checksum
  – pseudo header (src & dst IP addresses) + TCP header + data

[Figure: Sender sends Data (SequenceNum) to Receiver; Receiver returns Acknowledgment + AdvertisedWindow.]

Fall 2007 CSci232: Transport Layer & TCP 21

TCP Connection Set Up

TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #
  – buffers, flow control info
• client: end host that initiates connection
• server: end host contacted by client

Three way handshake:

Step 1: client sends TCP SYN control segment to server
  – specifies initial seq #

Step 2: server receives SYN, replies with SYN+ACK control segment
  – ACKs received SYN
  – specifies server’s initial seq. #

Step 3: client receives SYN+ACK, replies with ACK segment (which may contain 1st data segment)

Fall 2007 CSci232: Transport Layer & TCP 22

TCP 3-Way Hand-Shake

Question:
a. What kind of “state” do client and server need to maintain?
b. What initial sequence # should client (and server) use?

[Figure: message sequence between client and server]
1. client → server: SYN, seq=x (client initiates connection)
2. server → client: SYN+ACK, seq=y, ack=x+1 (server: SYN received)
3. client → server: ACK, seq=x+1, ack=y+1 (connection established at client, then at server; this segment may carry the 1st data segment)
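The sequence and acknowledgement numbers exchanged in the three-way handshake follow mechanically from the two initial sequence numbers x and y. The sketch below only models that arithmetic; segments are plain dictionaries and the function name is made up for illustration. No timers, retransmissions or state machines are modelled.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"flags": "SYN", "seq": x}
    syn_ack = {
        "flags": "SYN+ACK",
        "seq": y,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,  # the SYN consumes one seq #
    }
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


# Initial sequence numbers taken from the ssh capture in these slides.
for segment in three_way_handshake(758244755, 3778406755):
    print(segment)
```

With the initial sequence numbers from the port-22 trace, the computed ack values are 758244756 and 3778406756, matching packets 2 and 3 of that capture.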

Fall 2007 CSci232: Transport Layer & TCP 23

TCP Connection Setup Example

No.  Time       Source          Destination     Proto  SrcPort > DstPort  [Flags]
1    13.734375  70.13.155.114   128.101.35.150  TCP    1414 > 22  [SYN] Seq=758244755 Len=0 MSS=1260
2    13.968750  128.101.35.150  70.13.155.114   TCP    22 > 1414  [SYN, ACK] Seq=3778406755 Ack=758244756 Win=25200 Len=0 MSS=1460
3    13.968750  70.13.155.114   128.101.35.150  TCP    1414 > 22  [ACK] Seq=758244756 Ack=3778406756 Win=16384 Len=0

Fall 2007 CSci232: Transport Layer & TCP 24

TCP Connection Setup Example

No.  Time        Source          Destination     Proto  SrcPort > DstPort  [Flags]
1    13.6611233  70.13.155.114   128.101.35.204  TCP    1567 > 80  [SYN] Seq=3724852786 Len=0 MSS=1260
2    13.890625   128.101.35.204  70.13.155.114   TCP    80 > 1567  [SYN, ACK] Seq=484733971 Ack=3724852787 Win=25200 Len=0 MSS=1460
3    13.890625   70.13.155.114   128.101.35.204  TCP    1567 > 80  [ACK] Seq=3724852787 Ack=484733972 Win=17640 Len=0
4    13.890625   70.13.155.114   128.101.35.204  TCP    1567 > 80  [PSH,ACK] Seq=3724852787 Ack=484733972 Win=17640 Len=564
5    14.630860   128.101.35.204  70.13.155.114   TCP    80 > 1567  [ACK] Seq=484733972 Ack=3724853351 Win=25200 Len=0 MSS=1460

Fall 2007 CSci232: Transport Layer & TCP 25

3-Way Handshake: Finite State Machine

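• Client FSM? What info (“state”) is maintained at client?
• Server FSM?

[Figure: partially filled-in client FSM, with the remaining transitions marked “?”: closed → (upper layer: initiate connection / sent SYN w/ initial seq = x) → SYN sent → (SYN+ACK received / sent ACK) → conn estab’ed.]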

Fall 2007 CSci232: Transport Layer & TCP 26

Connection Setup Error Scenarios

• Lost (control) packets
  – What happens if SYN lost? client vs. server actions
  – What happens if SYN+ACK lost? client vs. server actions
  – What happens if ACK lost? client vs. server actions
• Duplicate (control) packets
  – What does server do if duplicate SYN received?
  – What does client do if duplicate SYN+ACK received?
  – What does server do if duplicate ACK received?

Fall 2007 CSci232: Transport Layer & TCP 27

Connection Setup Error Scenarios (cont’d)

• Importance of (unique) initial seq. no.?
  – When receiving SYN, how does server know it’s a new connection request?
  – When receiving SYN+ACK, how does client know it’s legitimate, i.e., a response to its SYN request?
• Dealing with old duplicate packets from old connections (or from malicious users)
  – If not careful: “TCP Hijacking”
• How to choose unique initial seq. no.?
  – randomly choose a number (and add to last syn# used)
• Other security concern:
  – “SYN Flood” -- denial-of-service attack

Fall 2007 CSci232: Transport Layer & TCP 29

TCP State Diagram: Connection Setup

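[Figure: TCP state diagram for connection setup. States: CLOSED, LISTEN, SYN SENT, SYN RCVD, ESTAB. Client path: CLOSED → SYN SENT (active OPEN: create TCB, snd SYN) → ESTAB (rcv SYN, ACK: snd ACK). Server path: CLOSED → LISTEN (passive OPEN: create TCB) → SYN RCVD (rcv SYN: snd SYN ACK) → ESTAB (rcv ACK of SYN). Other labelled transitions: SEND / snd SYN, rcv SYN / snd ACK, CLOSE / delete TCB, CLOSE / send FIN.]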

Fall 2007 CSci232: Transport Layer & TCP 30

TCP: Closing Connection

Remember: TCP connection is duplex!

Client wants to close connection:

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. half closed

Step 3: client receives ACK. half closed, wait for server to close

Server finishes sending data, also ready to close:

Step 4: server sends FIN.

[Figure: client sends FIN (client closing); server replies ACK (half closed); later server sends FIN (server closing).]

Fall 2007 CSci232: Transport Layer & TCP 31

TCP: Closing Connection (cont’d)

Step 5: client receives FIN, replies with ACK. connection fully closed

Step 6: server receives ACK. connection fully closed

Problem Solved?

[Figure: client FIN (client closing), server ACK (half closed), server FIN (server closing), client ACK (fully closed on both sides).]

Fall 2007 CSci232: Transport Layer & TCP 32

TCP: Closing Connection (revised)

Step 5: client receives FIN, replies with ACK.
  – Enters “timed wait”: will respond with ACK to received FINs

Step 6: server receives ACK. connection fully closed

Step 7: client, timer expires, connection fully closed

Two Army Problem!

[Figure: client FIN, server ACK, server FIN, client ACK as before; one segment is marked lost (X), a timeout fires, and FIN and ACK are exchanged again while the client is in timed wait.]

Fall 2007 CSci232: Transport Layer & TCP 33

TCP Connection Tear-Down Example

No.  Time       Source          Destination     Proto  SrcPort > DstPort  [Flags]
80   35.156250  70.13.155.114   128.101.35.150  TCP    1414 > 22  [PSH,ACK] Seq=758246388 Ack=3778411633 Win=15920 Len=32
81   35.156250  70.13.155.114   128.101.35.150  TCP    1414 > 22  [FIN, ACK] Seq=758246420 Ack=3778411633 Win=15920 Len=0
82   35.437500  128.101.35.150  70.13.155.114   TCP    22 > 1414  [ACK] Seq=3778411633 Ack=758246420 Win=25200 Len=0
83   35.453125  128.101.35.150  70.13.155.114   TCP    22 > 1414  [ACK] Seq=3778411633 Ack=758246421 Win=25200 Len=0
84   35.453125  128.101.35.150  70.13.155.114   TCP    22 > 1414  [FIN,ACK] Seq=3778411633 Ack=758246421 Win=25200 Len=0
85   35.453125  70.13.155.114   128.101.35.150  TCP    1414 > 22  [ACK] Seq=758246421 Ack=3778411634 Win=15920 Len=0

Fall 2007 CSci232: Transport Layer & TCP 34

State Diagram: Connection Tear-down

[Figure: TCP state diagram for connection tear-down. Active close: ESTAB, FIN WAIT-1, FIN WAIT-2, CLOSING, TIME WAIT. Passive close: ESTAB, CLOSE WAIT, LAST-ACK. Both end in CLOSED. Transition labels: CLOSE / snd FIN, rcv FIN / snd ACK, rcv ACK of FIN, rcv FIN+ACK / snd ACK; TIME WAIT returns to CLOSED on Timeout=2min / delete TCB.]

Fall 2007 CSci232: Transport Layer & TCP 35

TCP Connection Management FSM

[Figure: TCP client lifecycle]

Fall 2007 CSci232: Transport Layer & TCP 36

TCP Connection Management FSM

[Figure: TCP server lifecycle]

Fall 2007 CSci232: Transport Layer & TCP 37

Reliability and Error Recovery

• ARQ vs. FEC
  – automatic retransmission request
  – forward error correction
• General ARQ Algorithms
  – Stop & Wait
    • Performance issue: low utilization when delay-bw product large
  – Sliding Window Protocols
    • Go-Back-N
    • Selective Repeat
    • Key design issues: window size vs. size of seq. no. space

Fall 2007 CSci232: Transport Layer & TCP 38

Error Recovery: Stop and Wait

• ARQ
  – Receiver sends acknowledgement (ACK) when it receives packet
  – Sender waits for ACK and times out if it does not arrive within some time period
• Simplest ARQ protocol
• Send a packet, stop and wait until ACK arrives

[Figure: time-sequence diagram between Sender and Receiver: Packet, ACK, with the sender's Timeout interval marked along the time axis.]

Fall 2007 CSci232: Transport Layer & TCP 39

Recovering from Error

[Figure: stop-and-wait time-sequence diagrams (Packet, ACK, Timeout) for the failure cases: ACK lost, Packet lost, Early timeout. Retransmitting after the timeout can produce DUPLICATE PACKETS!!!]

Fall 2007 CSci232: Transport Layer & TCP 40

Problems with Stop and Wait

• How to recognize a duplicate
• Performance
  – Can only send one packet per round trip

Fall 2007 CSci232: Transport Layer & TCP 41

How to Recognize Resends?

• Use sequence numbers
  – both packets and acks
• Sequence # in packet is finite. How big should it be?
  – For stop and wait?
    • One bit: won’t send seq #1 until received ACK for seq #0

[Figure: exchange labelled Pkt 0, ACK 0, Pkt 0, ACK 1, Pkt 1, ACK 0.]

Fall 2007 CSci232: Transport Layer & TCP 42

Problem with Stop & Wait Protocol

• Can’t keep the pipe full
  – Utilization is low when bandwidth-delay product (R x RTT) is large!

[Figure: Sender/Receiver timeline for one data packet (L bytes): first packet bit transmitted, t = 0; first packet bit arrives; ACK returns after RTT; ACK arrives, send next packet, t = RTT + L / R.]

Fall 2007 CSci232: Transport Layer & TCP 43

Stop & Wait: Performance Analysis

Example: 1 Gbps connection, 15 ms end-end prop. delay, data segment size: 1 KB = 8 Kb

$$T_{\text{transmit}} = \frac{L\ (\text{packet length in bits})}{R\ (\text{transmission rate, bps})} = \frac{8\ \text{kb}}{10^{9}\ \text{b/s}} = 8 \times 10^{-6}\ \text{s} = 0.008\ \text{ms}$$

– U sender: utilization, i.e., fraction of time sender busy sending

$$U_{\text{sender}} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$

– 1 KB data segment every 30 msec (round trip time) --> 0.027% x 1 Gbps = 33 kB/sec throughput over 1 Gbps link

Moral of story: network protocol limits use of physical resources!
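The utilization figure can be checked numerically. The helper below is a small sketch (the function name and the optional `window` parameter are illustrative additions); it treats the sender as busy for `window` packet transmission times out of every RTT + L/R.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy transmitting.

    window = 1 is stop-and-wait; window > 1 models that many packets
    sent back to back before the first ACK returns.
    """
    t_transmit = packet_bits / rate_bps
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))


# 1 Gbps link, 8 kb packet, 30 ms round trip time
u = sender_utilization(8_000, 1e9, 0.030)
print(f"{u:.5f}")              # 0.00027
print(f"{u * 1e9 / 8:.0f} B/s")  # roughly 33 kB/sec
```

Calling the same function with `window=3` gives about 0.0008, three times the stop-and-wait value.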

Fall 2007 CSci232: Transport Layer & TCP 44

How to Keep the Pipe Full?

• Send multiple packets without waiting for first to be acked
  – Number of pkts in flight = window
• Reliable, unordered delivery
  – Several parallel stop & waits
  – Send new packet after each ack
  – Sender keeps list of unack’ed packets; resends after timeout
  – Receiver same as stop & wait
• How large a window is needed?
  – Suppose 10Mbps link, 4ms delay, 500byte pkts
    • 1? 10? 20?
  – Round trip delay * bandwidth = capacity of pipe
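The window question can be worked through with the pipe-capacity rule. The sketch below assumes the round trip delay is passed in directly in milliseconds; whether the slide's 4 ms is a one-way delay (round trip 8 ms) or already a round trip is not stated, so both readings are shown. The function name is illustrative.

```python
def window_in_packets(bandwidth_bps, rtt_ms, packet_bytes):
    """Packets needed in flight to fill a pipe of bandwidth * round trip delay."""
    pipe_bits = bandwidth_bps * rtt_ms // 1000   # integer arithmetic on purpose
    packet_bits = packet_bytes * 8
    return -(-pipe_bits // packet_bits)          # ceiling division


# Reading 1 (assumption): 4 ms is one-way, so the round trip is 8 ms.
print(window_in_packets(10_000_000, 8, 500))   # 20
# Reading 2 (assumption): 4 ms is already the round trip delay.
print(window_in_packets(10_000_000, 4, 500))   # 10
```

Under the first reading the pipe holds 80,000 bits, i.e. 20 packets of 500 bytes.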

Fall 2007 CSci232: Transport Layer & TCP 45

Pipelined (Sliding Window) Protocols

Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged data segments
  – range of sequence numbers must be increased
  – buffering at sender and/or receiver

• Two generic forms of pipelined protocols: Go-Back-N and Selective Repeat

Fall 2007 CSci232: Transport Layer & TCP 46

Pipelining: Increased Utilization

[Figure: sender/receiver timeline with three packets in flight: first packet bit transmitted, t = 0; last bit transmitted, t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; ACK arrives, send next packet, t = RTT + L / R.]

$$U_{\text{sender}} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$

(times in milliseconds)

Increase utilization by a factor of 3!

Fall 2007 CSci232: Transport Layer & TCP 47

Sliding Window

• Reliable, ordered delivery
• Receiver has to hold onto a packet until all prior packets have arrived
  – Why might this be difficult for just parallel stop & wait?
  – Sender must prevent buffer overflow at receiver
• Circular buffer at sender and receiver
  – Packets in transit ≤ buffer size
  – Advance when sender and receiver agree packets at beginning have been received

Fall 2007 CSci232: Transport Layer & TCP 48

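Sender/Receiver State

[Figure: sequence-number space at each side.
Sender: Sent & Acked | Sent Not Acked | OK to Send | Not Usable, with markers Max ACK received and Next seqnum; the sender window spans Sent Not Acked and OK to Send.
Receiver: Received & Acked | Acceptable Packet | Not Usable, with markers Next expected and Max acceptable; the receiver window spans Acceptable Packet.]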

Fall 2007 CSci232: Transport Layer & TCP 49

Window Sliding – Common Case

• On reception of new ACK (i.e. ACK for something that was not acked earlier)
  – Increase sequence of max ACK received
  – Send next packet
• On reception of new in-order data packet (next expected)
  – Hand packet to application
  – Send cumulative ACK: acknowledges reception of all packets up to sequence number
  – Increase sequence of max acceptable packet

Fall 2007 CSci232: Transport Layer & TCP 50

Loss Recovery

• On reception of out-of-order packet
  – Send nothing (wait for source to timeout)
  – Cumulative ACK (helps source identify loss)
• Timeout (Go-Back-N recovery)
  – Set timer upon transmission of packet
  – Retransmit all unacknowledged packets
• Performance during loss recovery
  – No longer have an entire window in transit
  – Can have much more clever loss recovery
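The sender side of Go-Back-N recovery can be sketched as a small class. This is a simplified model, not a full protocol: sequence numbers are unbounded integers (no wrap-around), the network and the timer are left to the caller, and the names `GoBackNSender`, `base` and `next_seq` are chosen here for illustration.

```python
class GoBackNSender:
    """Window bookkeeping for a Go-Back-N sender with cumulative ACKs."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # next new sequence number to send
        self.sent = []       # log of every transmission, in order

    def can_send(self):
        return self.next_seq < self.base + self.window_size

    def send(self):
        """Transmit one new packet if the window allows it."""
        if not self.can_send():
            return False
        self.sent.append(self.next_seq)
        self.next_seq += 1
        return True

    def on_ack(self, ack_seq):
        """Cumulative ACK: all packets up to and including ack_seq arrived."""
        if ack_seq >= self.base:
            self.base = ack_seq + 1   # slide the window forward

    def on_timeout(self):
        """Go back N: retransmit every unacknowledged packet."""
        for seq in range(self.base, self.next_seq):
            self.sent.append(seq)


sender = GoBackNSender(window_size=4)
while sender.send():
    pass                      # sends 0, 1, 2, 3, then the window is full
sender.on_ack(1)              # packets 0 and 1 acknowledged
sender.on_timeout()           # packets 2 and 3 are sent again
print(sender.sent)            # [0, 1, 2, 3, 2, 3]
```

The retransmission of both 2 and 3, even if only 2 was lost, is the cost the "more clever loss recovery" bullet refers to.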

Fall 2007 CSci232: Transport Layer & TCP 51

Go-Back-N in Action

Fall 2007 CSci232: Transport Layer & TCP 52

Selective Repeat

• Receiver individually acknowledges all correctly received pkts
  – Buffers packets, as needed, for eventual in-order delivery to upper layer
• Sender only resends packets for which ACK not received
  – Sender timer for each unACKed packet
• Sender window
  – N consecutive seq #’s
  – Again limits seq #s of sent, unACKed packets

Fall 2007 CSci232: Transport Layer & TCP 53

Selective Repeat: Sender, Receiver Windows

Fall 2007 CSci232: Transport Layer & TCP 54

Sequence Numbers

• How large does size of sequence number space need to be?
  – Must be able to detect wrap-around
  – Depends on sender/receiver window size
• E.g.
  – size of seq. no. space = 8, send win = recv win = 7
  – If pkts 0..6 are sent successfully and all acks lost
    • Receiver expects 7,0..5, sender retransmits old 0..6!!!
• size of sequence no. space must be ≥ send window + recv window
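The failure in the example can be checked mechanically: after a full window is delivered and every ACK is lost, do any of the retransmitted sequence numbers fall inside the receiver's new window? The sketch below assumes equal send and receive windows and a receiver that accepts anything inside its window; the function name is made up for illustration.

```python
def stale_retransmission_accepted(seq_space, window):
    """True if old packets can be mistaken for new ones after all ACKs are lost."""
    # Sender's first window: packets 0 .. window-1, all delivered.
    retransmitted = [i % seq_space for i in range(window)]
    # Receiver has moved on and now accepts the next `window` numbers.
    acceptable = {(window + i) % seq_space for i in range(window)}
    # Sender times out and resends the old window.
    return any(seq in acceptable for seq in retransmitted)


print(stale_retransmission_accepted(seq_space=8, window=7))   # True
print(stale_retransmission_accepted(seq_space=14, window=7))  # False
```

With a sequence space of 8 and windows of 7, the receiver expects 7, 0..5 and wrongly accepts the old 0..5; with 14 (send window + recv window) the two ranges no longer overlap.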

Fall 2007 CSci232: Transport Layer & TCP 55

Sequence Numbers in TCP

• TCP regards data as a “byte-stream”
  – each byte in byte stream is numbered.
    • 32 bit value, wraps around
    • initial values selected at start up time
• TCP breaks up byte stream in packets.
  – Packet size is limited to the Maximum Segment Size (MSS)
• Each packet has a sequence number
  – seq. no of 1st byte indicates where it fits in the byte stream
• TCP connection is duplex
  – data in each direction has its own sequence numbers

[Figure: byte stream divided into packet 8, packet 9, packet 10 at byte numbers 13450, 14950, 16050, 17550.]

Fall 2007 CSci232: Transport Layer & TCP 56

TCP Seq. #’s and ACKs

Seq. #’s:
  – byte stream “number” of first byte in segment’s data

ACKs:
  – seq # of next byte expected from other side

Simple telnet scenario (Host A, Host B):
1. User types ‘C’. A → B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. B → A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. A → B: Seq=43, ACK=80

Fall 2007 CSci232: Transport Layer & TCP 57

TCP Reliable Data Transfer

• TCP creates reliable data transfer service on top of IP’s unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate acks
• Initially consider simplified TCP sender:
  – ignore duplicate acks
  – ignore flow control, congestion control

Fall 2007 CSci232: Transport Layer & TCP 58

TCP = Go-Back-N Variant

• Sliding window with cumulative acks
  – Receiver can only return a single “ack” sequence number to the sender.
  – Acknowledges all bytes with a lower sequence number
  – Starting point for retransmission
  – Duplicate acks sent when out-of-order packet received
• But: sender only retransmits a single packet.
  – Reason???
    • Only one that it knows is lost
    • Network is congested, so the sender shouldn’t overload it
• Error control is based on byte sequences, not packets.
  – Retransmitted packet can be different from the original lost packet. Why?

Fall 2007 CSci232: Transport Layer & TCP 59

TCP Sender Events:

data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval

timeout:
• retransmit segment that caused timeout
• restart timer

ACK received:
• If acknowledges previously unACKed segments
  – update what is known to be ACKed
  – start timer if there are outstanding segments
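The three sender events can be sketched as methods of a small class. This is a toy model under several assumptions: there is no real timer (only a `timer_running` flag), no sequence-number wrap-around, no flow or congestion control, and duplicate ACKs are ignored, as in the simplified sender described here. All names are illustrative.

```python
class SimplifiedTcpSender:
    """Event handlers of a simplified TCP sender (no real network or clock)."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # seq # of oldest unACKed byte
        self.next_seq = initial_seq    # seq # for the next new byte
        self.unacked = {}              # seq # of first byte -> payload
        self.timer_running = False
        self.transmitted = []          # log of (seq #, payload) sent

    def on_app_data(self, data):
        """data rcvd from app: create segment, send it, start timer if idle."""
        seq = self.next_seq
        self.unacked[seq] = data
        self.transmitted.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        """timeout: retransmit the oldest unACKed segment, restart timer."""
        if self.unacked:
            seq = min(self.unacked)
            self.transmitted.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, ack_num):
        """ACK received: ack_num is the next byte the receiver expects."""
        if ack_num > self.send_base:          # acknowledges new data
            self.send_base = ack_num
            fully_acked = [s for s, d in self.unacked.items()
                           if s + len(d) <= ack_num]
            for seq in fully_acked:
                del self.unacked[seq]
            # keep the timer only if segments are still outstanding
            self.timer_running = bool(self.unacked)
```

A short usage example: after `on_app_data(b"12345678")` with initial sequence number 92, an `on_ack(100)` empties `unacked` and stops the timer.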

Fall 2007 CSci232: Transport Layer & TCP 60

TCP ACK generation [RFC 1122, RFC 2581]

Event at Receiver → TCP Receiver Action

• Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  → Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
• Arrival of in-order segment with expected seq #. One other segment has ACK pending
  → Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
  → Immediately send duplicate ACK, indicating seq. # of next expected byte
• Arrival of segment that partially or completely fills gap
  → Immediately send ACK, provided that segment starts at lower end of gap

Fall 2007 CSci232: Transport Layer & TCP 61

TCP Flow Control

• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app’s drain rate

flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast

Fall 2007 CSci232: Transport Layer & TCP 62

TCP Flow Control: How It Works

(Suppose TCP receiver discards out-of-order segments)

• spare room in buffer = RcvWindow (dynamically changes)
  RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn’t overflow
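The two sides of this rule can be written as a pair of one-line functions. The receiver formula is the one given above; the sender-side variable names (`last_byte_sent`, `last_byte_acked`) and the example numbers are assumptions added for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: spare room in the buffer, advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_allowance(last_byte_sent, last_byte_acked, advertised_window):
    """Sender: how many more bytes may be sent without exceeding RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)


# Hypothetical numbers: 4096-byte buffer, 3000 bytes received, 1000 read by the app.
window = rcv_window(4096, 3000, 1000)
print(window)                                  # 2096
print(sender_allowance(5000, 4000, window))    # 1096
```

When the application stops reading, `last_byte_read` stays fixed, the window shrinks toward zero, and the sender's allowance shrinks with it.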

Fall 2007 CSci232: Transport Layer & TCP 63

TCP Segment Structure

[Figure: TCP segment header, same layout as on slide 19: source port #, dest port #; sequence number; acknowledgement number; head len, not used, flag bits U A P R S F, rcvr window size; checksum, ptr urgent data; Options (variable length); application data (variable length). The rcvr window size field carries the # bytes rcvr willing to accept.]

Fall 2007 CSci232: Transport Layer & TCP 64

Triggering Transmission

• How does TCP decide to transmit a segment?
  – MSS (Maximum segment size)
    • Set to size of the largest segment TCP can send without local IP fragmentation (MTU of directly connected network)
  – Sending process explicitly asks it to do so (Push to flush)
  – Firing timer
• Silly Window Syndrome
  – Flow control needs to be maintained
  – Sender can transmit full segment (MSS) when Acked by receiver

Fall 2007 CSci232: Transport Layer & TCP 65

Silly Window Syndrome (cont’d)

– Window currently closed from receiver
– ACK opens MSS/2 bytes
– Should sender transmit MSS/2?
  • Original TCP specification silent
  • Early implementations of TCP decided to go ahead
  • Sender cannot know when the window will open for full MSS
– If sender is aggressive, sending available window size
  • results in silly window syndrome
  • small segment size remains indefinitely
– Hence a problem when either sender transmits a small segment or receiver opens window a small amount

Fall 2007 CSci232: Transport Layer & TCP 66

Triggering Transmission (cont’d)

– Receiver may delay ACKs, but how long?
– Ultimate solution lies with sender:
  • When does the TCP sender decide to transmit a segment?
• Nagle’s Algorithm:
  – Waiting too long hurts interactive applications (Telnet)
  – Without waiting, risk of sending a bunch of tiny packets (silly window syndrome)
  – Wait, but let arriving ACKs play the role of the timer:
    • Self clocking: as long as TCP has any data in flight, sender receives an ACK, which can be used to trigger transmission
    • If no data in flight, immediately send the segment
  – Applications that cannot tolerate the wait can turn the algorithm off by setting the TCP_NODELAY option
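The sender's choice under Nagle's algorithm can be sketched as a single decision function. This follows the commonly cited textbook form of the algorithm rather than any particular implementation; the function name, parameter names and returned strings are made up for illustration.

```python
def nagle_decision(available_bytes, window_bytes, mss, data_in_flight):
    """Decide what the sender does when the application hands it data.

    available_bytes: data waiting to be sent
    window_bytes:    space the receiver's advertised window currently allows
    data_in_flight:  True if some sent data has not been ACKed yet
    """
    if available_bytes >= mss and window_bytes >= mss:
        return "send full segment"
    if data_in_flight:
        # An ACK is on its way; it acts as the timer that triggers sending.
        return "buffer until ACK arrives"
    return "send now"


print(nagle_decision(1460, 4000, 1460, True))   # send full segment
print(nagle_decision(10, 4000, 1460, True))     # buffer until ACK arrives
print(nagle_decision(10, 4000, 1460, False))    # send now
```

An interactive application typing one character at a time gets its first keystroke sent immediately; later keystrokes are bundled until the ACK for the previous segment returns.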

Fall 2007 CSci232: Transport Layer & TCP 67

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss

Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions, why?
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT

Fall 2007 CSci232: Transport Layer & TCP 68

Round-trip Time Estimation

• Wait at least one RTT before retransmitting
• Importance of accurate RTT estimators:
  – Low RTT estimate
    • unneeded retransmissions
  – High RTT estimate
    • poor throughput
• RTT estimator must adapt to change in RTT
  – But not too fast, or too slow!
• Spurious timeouts
  – “Conservation of packets” principle: never more than a window worth of packets in flight

Fall 2007 CSci232: Transport Layer & TCP 69

Adaptive Retransmission (Original Algorithm)

• Measure SampleRTT for each segment/ACK pair
• Compute weighted running average of RTT

$$\text{EstRTT} = \alpha \times \text{EstRTT} + (1 - \alpha) \times \text{SampleRTT}$$

  – $\alpha$ between 0.8 and 0.9 (to smooth EstRTT)
  – A small $\alpha$ follows temporary fluctuations; a large $\alpha$ is more stable but may not be quick to adapt to real changes
• Set timeout based on EstRTT
  – TimeOut = 2 x EstRTT

Fall 2007 CSci232: Transport Layer & TCP 70

Retransmission Ambiguity

• ACK is assumed to be for the original transmission but was actually for the retransmission => SampleRTT is too large
• ACK is assumed to be for the retransmission but was actually for the original => SampleRTT is too small

[Figure: two Sender/Receiver timelines, each showing Original transmission, Retransmission and one ACK, with the measured SampleRTT interval marked for each of the two cases.]

Fall 2007 CSci232: Transport Layer & TCP 71

Karn/Partridge Algorithm

• Solution:
• Do not sample RTT when retransmitting
  – only measures sample RTT for segments sent once
• Double timeout for each retransmission
  – Next timeout to be twice the last timeout, rather than basing it on the last Estimated RTT
• Karn and Partridge proposal is exponential backoff
  – Congestion is most likely cause of lost segments
  – TCP sources should not react too aggressively to a timeout
  – The more timeouts, the more cautious the source should become (congestion problem)

Fall 2007 CSci232: Transport Layer & TCP 72

Jacobson/Karels Algorithm

• Original computation for RTT did not take the variance of sample RTTs into account
  – If variation among samples is small, Estimated RTT can be better trusted, with no need to multiply the estimate by 2
  – A large variance in the samples means timeout values should not be too tightly coupled to the Estimated RTT
• New calculations for average RTT

$$\text{Diff} = \text{SampleRTT} - \text{EstRTT}$$

$$\text{EstRTT} = \text{EstRTT} + (\delta \times \text{Diff})$$

$$\text{Dev} = \text{Dev} + \delta \, (|\text{Diff}| - \text{Dev})$$

  – where $\delta$ is a fraction between 0 and 1
• Consider variance when setting timeout value

$$\text{TimeOut} = \mu \times \text{EstRTT} + \phi \times \text{Dev}$$

  – where $\mu = 1$ and $\phi = 4$

Fall 2007 CSci232: Transport Layer & TCP 73

TCP Round Trip Time Estimation

$$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$$

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Setting the timeout interval
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• “safety margin”: accommodate variations in EstimatedRTT

$$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, $\beta = 0.25$)

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$$
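These update rules translate directly into a small estimator class. Two details below are assumptions, not taken from the slide: the deviation starts at half of the first sample, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate. The class and method names are illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimate with a deviation-based safety margin."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial value

    def update(self, sample_rtt):
        # Assumed order: deviation first, against the old estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
print(est.estimated_rtt)        # 102.5
print(est.dev_rtt)              # 42.5
print(est.timeout_interval())   # 272.5
```

Samples taken from retransmitted segments should not be fed into `update`, for the reason given on the retransmission ambiguity slide.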

Fall 2007 CSci232: Transport Layer & TCP 74

Example RTT Estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and Estimated RTT.]

Fall 2007 CSci232: Transport Layer & TCP 75

Timestamp Extension

• Used to improve timeout mechanism by more accurate measurement of RTT
• When sending a packet, insert current time into option
  – 4 bytes for time, 4 bytes for echoing a received timestamp
• Receiver echoes timestamp in ACK
  – Actually will echo whatever is in timestamp
• Removes retransmission ambiguity
  – Can get RTT sample on any packet

Fall 2007 CSci232: Transport Layer & TCP 76

Timer Granularity

• Many TCP implementations set RTO (Retransmission Timeout) in multiples of 200, 500, 1000 ms
• Why?
  – Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
  – Make timer interrupts efficient
• What happens for the first couple of packets?
  – Pick a very conservative value (seconds)

Fall 2007 CSci232: Transport Layer & TCP 77

Important Lessons

• TCP state diagram: setup/teardown
• TCP timeout calculation: how is RTT estimated
• Modern TCP loss recovery
  – Why are timeouts bad?
  – How to avoid them? e.g. fast retransmit

Fall 2007 CSci232: Transport Layer & TCP 78

Fast Retransmit

• What are duplicate acks (dupacks)?
  – Repeated acks for the same sequence
• When can duplicate acks occur?
  – Loss
  – Packet re-ordering
  – Window update: advertisement of new flow control window
• Assume re-ordering is infrequent and not of large magnitude
  – Use receipt of 3 or more duplicate acks as indication of loss
  – Don’t wait for timeout to retransmit packet
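The duplicate-ack rule can be sketched as a counter on the sender. This is a simplified illustration: it does not distinguish window updates from true duplicate acks, and the class name, threshold constant and return convention are chosen here rather than taken from any implementation.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs and signals when to retransmit early."""

    DUPACK_THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the sequence number to retransmit now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUPACK_THRESHOLD:
                return ack_num        # third duplicate: resend this segment
            return None
        # A new ACK resets the count.
        self.last_ack = ack_num
        self.dup_count = 0
        return None


detector = FastRetransmitDetector()
acks = [100, 200, 200, 200, 200, 200, 300]
print([detector.on_ack(a) for a in acks])
# [None, None, None, None, 200, None, None]
```

The retransmission fires on the third duplicate (the fourth ack carrying 200), without waiting for the retransmission timer.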

Fall 2007 CSci232: Transport Layer & TCP 79

Fast Retransmit

[Figure: sequence number vs. time plot of packets and acks. A lost packet (X) is followed by duplicate acks, which trigger the retransmission.]

Fall 2007 CSci232: Transport Layer & TCP 80

TCP (Reno variant)

[Figure: sequence number vs. time plot of packets and acks with several losses (X) in one window. Now what? Timeout.]

Fall 2007 CSci232: Transport Layer & TCP 81

SACK

• Basic problem is that cumulative acks provide little information
• Selective acknowledgement (SACK) essentially adds a bitmask of packets received
  – Implemented as a TCP option
  – Encoded as a set of received byte ranges (max of 4 ranges/often max of 3)
• When to retransmit?
  – Still need to deal with reordering: wait for out of order by 3 pkts

Fall 2007 CSci232: Transport Layer & TCP 82

SACK

[Figure: sequence number vs. time plot of packets and acks with several losses (X) in one window, using SACK. Now what? Send retransmissions as soon as detected.]