
History and Internals of TCP/IP Andrew Tucker February 15, 2000


What We’ll Cover

Big picture of network protocols

Where TCP/IP lives in the network layer model

Protocols that utilize TCP/IP

Under the hood of IP

• Addressing and Routing

Under the hood of TCP (and UDP)

• Ensuring reliable delivery

Weaknesses of TCP/IP

Resources for more info

What We’ll Cover

All topics should be considered overviews

References for more depth on each subject will be given at the end

Programming with sockets will be covered in next session

Feel free to interrupt with questions at any time

TCP/IP in the Big Picture

What is TCP/IP?

Set of protocols that are used for communication across a network

TCP/IP = Transmission Control Protocol / Internet Protocol

UDP = User Datagram Protocol

Standard method for transferring data and information on the Internet

What is a protocol?

Definition: A set of rules that regulate the way data is transmitted between computers.

There are countless ways to realize this abstract notion - so why did the Internet standardize on TCP/IP?

Why TCP/IP?

’cuz Uncle Sam said so!

Originally a set of conventions developed by the DOD and DARPA in 1969, formalized into TCP/IP in the 1980s

Original ideas attributed to Vinton Cerf and Robert Kahn

Gained popularity in the user community because of inclusion in v4.2 of BSD UNIX

Why TCP/IP?

The DARPA network (ARPANET) was the early precursor of the Internet

If you wanted to talk on the ARPANET you needed to speak TCP/IP

TCP/IP was designed well enough to scale to the Internet*

* - until recently...

Why TCP/IP?

Three Main Goals:

• Interoperability - communicate between heterogeneous hardware and OS

• Robustness - reliability and performance

• Ease of Reconfiguration - add and remove computers without disruption

ISO OSI 7-layer model

ISO developed the 7-layer Open Systems Interconnect (OSI) model independent of TCP/IP in the 1970s

Allows each layer of a protocol to be changed without affecting layers above or below

ISO OSI 7-layer model

Layer 7 (Application): interfaces with end user

Layer 6 (Presentation): data format conversion

Layer 5 (Session): establishes node connection

Layer 4 (Transport): ensures delivery and correctness

Layer 3 (Network): routing and addressing

Layer 2 (Data Link): interface for physical line (NIC)

Layer 1 (Physical): actual transmission line or “bit pipe”

Modified Conceptual 5 Layer Model

Top three layers of the ISO OSI model don’t relate well to Internet protocols using TCP/IP

Conceptually it helps to think about a 5 layer model for the Internet and TCP/IP

OSI layers: Application, Presentation, Session, Transport, Network, Data Link, Physical

Modified 5 Layer Conceptual Model

Application

Transport

Network

Data Link

Physical

TCP/IP In the 5 Layer Model

TCP handles the transport layer and guarantees data delivery and correctness

UDP is an alternative to TCP at the transport layer that doesn’t guarantee delivery

IP lives in the network layer and handles routing and addressing

TCP/IP In the 5 Layer Model

Application

Sockets API: stream connection (TCP), connectionless datagram (UDP)

Transport: TCP, UDP

Network: IP, ICMP, IGMP

Data Link: LLC, MAC

Physical: Ethernet, Token Ring, PPP

Protocols Built on TCP/IP

IP

TCP: FTP, HTTP, NNTP, Telnet, SMTP

UDP: TFTP, DNS

TCP/IP Internals

IP Internals

Current version in widespread use is IPv4

Each node in an internet has a 32-bit IP address such as 10.0.3.172

IP knows nothing of text names like www.bsquare.com - they are translated to the numeric form by DNS

IP Internals

IP addresses are split into two parts:

• network - same for all hosts on the same network

• host - identifies a specific host within a network

The number of bits that represent the network and the host varies by the address “class”

IP Internals

Class A: leading bit 0, 7-bit network, 24-bit host

Class B: leading bits 1 0, 14-bit network, 16-bit host

Class C: leading bits 1 1 0, 21-bit network, 8-bit host
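The class layout above can be checked mechanically by looking at the leading bits of the first octet. The following is a minimal sketch; the function name `classify` is made up for illustration, and only the three classes shown on the slide are handled.

```python
import ipaddress

def classify(address: str):
    """Return (class letter, network bits, host bits) for a classful IPv4 address."""
    # Take the top 8 bits of the 32-bit address.
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet & 0b10000000 == 0:            # leading bit 0
        return ("A", 7, 24)
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 1 0
        return ("B", 14, 16)
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 1 1 0
        return ("C", 21, 8)
    # Anything else (leading bits 1 1 1) is outside the three classes shown here.
    return ("other", None, None)

print(classify("10.0.3.172"))
```

The example address from the earlier slide, 10.0.3.172, starts with a 0 bit, so it falls in class A.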

IP Internals

Original idea was to have a small number of WANs (class A), modest number of campus size networks (class B) and a large number of LANs (class C)

Explosion of the Internet has changed this - many clever interpretations of IP addresses have been invented to stretch the limit

IP Internals

IP routes information across a network via “packet switching” (as opposed to circuit switching)

Each packet is transmitted as a separate entity

Different packets can take different routes and can arrive in a different order than they were sent

IP Internals

Packets are sent as datagrams, so delivery isn’t guaranteed

Each packet has an IP header that contains source and destination address, data and header length, etc.

Packets are routed based on the network specified in the destination address
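To make the header fields concrete, here is a minimal sketch that unpacks the fixed 20-byte IPv4 header with Python's `struct` module. The function name `parse_ipv4_header` and the sample header values are invented for illustration; IP options and checksum validation are ignored.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,                 # upper 4 bits
        "header_length": (version_ihl & 0x0F) * 4,   # lower 4 bits, in 32-bit words
        "total_length": total_length,                # header plus data, in bytes
        "ttl": ttl,
        "protocol": protocol,                        # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# A hand-built sample header (checksum left as 0 for simplicity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("10.0.3.172"),
                     socket.inet_aton("63.76.82.70"))
print(parse_ipv4_header(sample))
```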

IP Internals

If the source and destination address are on the same network life is simple (e.g. Ethernet uses ARP to get the MAC address)

If the source and destination address are on different networks it is more complicated...

IP Internals

Special nodes called “gateways” connect networks

Gateways have tables that map network numbers to gateway addresses

Datagrams are forwarded to the gateway corresponding to their destination network number

What if there is no gateway available?

IP Internals

Default gateways are used if no mapping is present

Once a mapping is found the sender is notified of the correct gateway mapping (via ICMP)

Over time, routers build up a mapping table based on ICMP notifications
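The lookup-with-fallback behavior described above can be sketched as a small table class. This is a simplified model, not how any particular stack implements it: the class name `GatewayTable`, the use of strings as network numbers, and the addresses are all hypothetical.

```python
class GatewayTable:
    """Maps destination network numbers to gateway addresses."""

    def __init__(self, default_gateway: str):
        self.default_gateway = default_gateway
        self.routes = {}  # network number -> gateway address

    def next_hop(self, network: str) -> str:
        # Fall back to the default gateway when no mapping is present.
        return self.routes.get(network, self.default_gateway)

    def learn(self, network: str, gateway: str) -> None:
        # Record a better gateway, as an ICMP notification would report it.
        self.routes[network] = gateway

table = GatewayTable(default_gateway="10.0.0.1")
print(table.next_hop("63.76.82"))   # no mapping yet, so the default is used
table.learn("63.76.82", "10.0.0.254")
print(table.next_hop("63.76.82"))   # the learned mapping is used from now on
```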

IP Internals

A simple routing example via TraceRoute:

1 www.worldaccessnet.com (206.190.139.3)
2 worldaccessnet-2t1-ltipdxbackbone.ltinet.net (206.190.136.117)
3 pdx2lc.worldaccessnet.com (206.190.136.6)
4 seattle-portland-ds3.sea.above.net (209.133.31.50)
5 POS1-0-0.GW2.SEA4.ALTER.NET (157.130.177.121)
6 112.ATM3-0.XR2.SEA4.ALTER.NET (146.188.200.174)
7 292.ATM3-0.XR2.SEA1.ALTER.NET (146.188.200.157)
8 194.ATM9-0-0.GW1.SEA1.ALTER.NET (146.188.200.45)
9 63.76.82.94 (63.76.82.94)
10 www.bsquare.com (63.76.82.70)

IP Internals

TTL (Time To Live) field in IP header eliminates endless routing loops by limiting hop count

127.0.0.1 is a special loopback address
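The effect of the TTL field can be seen in a toy model where each hop decrements the counter and the packet is discarded once it reaches zero. This is only a sketch: the function name `forward`, the node names, and the dictionary-based topology are assumptions made for illustration.

```python
def forward(next_hop_of: dict, start: str, destination: str, ttl: int):
    """Follow next hops until delivery or TTL expiry; return (delivered, hops)."""
    node, hops = start, 0
    while node != destination:
        if ttl == 0:
            return (False, hops)   # TTL exhausted: the packet is discarded
        ttl -= 1                   # each hop uses up one unit of TTL
        node = next_hop_of[node]
        hops += 1
    return (True, hops)

# Two misconfigured routers that point at each other form a routing loop.
loop = {"A": "B", "B": "A"}
print(forward(loop, "A", "C", ttl=8))

# A healthy path A -> B -> C delivers well within the limit.
line = {"A": "B", "B": "C"}
print(forward(line, "A", "C", ttl=8))
```

Without the hop limit, the packet in the first case would circulate between A and B forever.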

UDP Internals

Ensures data correctness, but not reliable delivery

Adds a “port” number to IP

Think of ports as channels on a single machine - more on this in the discussion of sockets

UDP Internals

Sends entire chunk of data in one packet

Sends datagrams in one direction
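The data-correctness check in UDP is a 16-bit ones'-complement checksum, the same "Internet checksum" used by IP and TCP. Below is a minimal sketch of that arithmetic over a plain byte string. It leaves out the pseudo-header that a real UDP checksum also covers, and the function name `internet_checksum` is chosen for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum of data (pseudo-header not included)."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(payload)
print(hex(checksum))

# A receiver recomputes the checksum over data plus checksum; 0 means no error detected.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))
```

A mismatch tells the receiver the datagram was corrupted, but nothing in UDP asks the sender to resend it, which is why correctness is checked while delivery is not guaranteed.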

TCP Internals

Lots of versions floating around:

• Tahoe - released with BSD NR 1.0

• Reno - released with BSD NR 2.0

• New TCP Reno

• TCP Vegas

Versions are guaranteed to interoperate but not with optimal performance

TCP Internals

Guarantees data correctness and delivery

Uses port numbers in the same way as UDP

Breaks data into individual packets

Full duplex two-way stream

Complete implementation is complicated with lots of intricate details - we’ll touch on interesting highlights

TCP Internals

Operates on two basic principles: flow control and congestion control

Flow control involves preventing senders from overrunning the capacity of receivers

Congestion control involves preventing too much data from being injected into the network, causing links and switches to become overloaded

TCP Internals

Follows a basic protocol design rule called “smart sender, dumb receiver”

Flow control done via “sliding window”

• For window size n, only n bytes can be sent without receiving an acknowledgement

• When data is acknowledged, the window slides forward

TCP Internals

TCP packet header advertises a window size indicating the number of bytes the receiver is willing to accept

Initial window size established in TCP connection setup

TCP Internals

Packet header includes the last byte acknowledged and the packet sequence number

Sequence numbers are used to reassemble packets in the order they were sent

TCP Internals

Bytes 1 2 3: sent and acknowledged

Bytes 4 5 6: sent, not ACKed

Bytes 7 8 9: can send ASAP (usable window)

Bytes 10 11 12: can’t send until window moves

Offered window (advertised by receiver): bytes 4 through 9

Left side of window advances when data is acknowledged

Right side controlled by size of window advertisement

TCP Internals

What if receiver’s buffer fills up and results in an advertised window size of 0?

TCP periodically sends a 1-byte “probe” packet, which is rejected while the window is closed, but the reply carries the current advertised window size

EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
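As a worked instance of the formula, take the numbers from the sliding-window figure: a 6-byte offered window, bytes 1 to 3 acknowledged, and bytes 4 to 6 sent but not yet acknowledged. The sketch below is illustrative only; the function name is made up, and clamping the result at zero is an added assumption to cover a window that shrinks while data is in flight.

```python
def effective_window(advertised_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without waiting for another ACK."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet acknowledged
    # Clamping at zero is an assumption, not part of the slide's formula.
    return max(0, advertised_window - in_flight)

# Figure values: window of 6 bytes, bytes up to 6 sent, bytes up to 3 acknowledged.
print(effective_window(advertised_window=6, last_byte_sent=6, last_byte_acked=3))

# A zero advertisement leaves nothing to send until a probe reveals a new window.
print(effective_window(advertised_window=0, last_byte_sent=6, last_byte_acked=3))
```

The first call gives 3, matching the three bytes marked "can send ASAP" in the figure.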

TCP Internals

ACKs indicate the last consecutive packet received

Packets are retransmitted if an ACK is not received after a certain time period

Timeout value varies depending on the average round trip time (RTT) of previous packets
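The slides do not give the timeout formula. One classic approach, described in the original TCP specification (RFC 793), keeps an exponentially weighted moving average of RTT samples and multiplies it by a safety factor. The sketch below follows that idea; the class name `RttEstimator`, the weight of 0.875, and the factor of 2 are assumptions picked for illustration.

```python
class RttEstimator:
    """Smoothed round-trip-time estimate with a derived retransmission timeout."""

    def __init__(self, first_sample: float, alpha: float = 0.875, factor: float = 2.0):
        self.estimated_rtt = first_sample
        self.alpha = alpha      # weight given to the existing estimate (assumed value)
        self.factor = factor    # timeout = factor * estimate (assumed value)

    def update(self, sample_rtt: float) -> float:
        """Blend a new RTT sample into the estimate and return the new timeout."""
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        return self.timeout()

    def timeout(self) -> float:
        return self.factor * self.estimated_rtt

estimator = RttEstimator(first_sample=100.0)   # milliseconds
print(estimator.timeout())
print(estimator.update(200.0))                 # one slow sample nudges the timeout up
```

Because old samples dominate the average, a single slow packet raises the timeout only slightly, while a sustained change in RTT moves it steadily.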

TCP Internals

Congestion control is built on top of sliding window flow control

Consists of three intertwined mechanisms:

• Additive Increase / Multiplicative Decrease

• Slow Start

• Fast Retransmit

TCP Internals

An additional window size, called the congestion window, is maintained by the sender for each connection (it is not carried in the packet header)

Similar to advertised window, but derived from observed network behavior rather than set directly by the sender or receiver

TCP Internals

Effective window size calculation changes:

MaxWindow = MIN(CongestionWindow, AdvertisedWindow)

EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)

How is congestion window size calculated?

TCP Internals

Initially it is set to the Maximum Segment Size (MSS)

Whenever a full congestion window’s worth of data is successfully transmitted, the size is incremented by MSS - hence the term “additive increase”

TCP Internals

If a packet is dropped (e.g. an ACK times out), it is assumed to be due to network congestion

When a packet is dropped, the congestion window size is cut in half - hence the term “multiplicative decrease”

TCP Internals

Result is that the window size is eased up until a packet is dropped and then it is throttled back

Works OK during the middle of a connection, but takes too long to ramp up when starting from scratch...
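The ease-up-then-throttle pattern is easy to see in a small trace. The sketch below applies additive increase for every fully acknowledged window and multiplicative decrease for every drop. The function name `aimd_trace`, the MSS of 1000 bytes, and the floor of one MSS after a decrease are assumptions made for illustration.

```python
def aimd_trace(events, mss: int = 1000):
    """Return the congestion window (in bytes) after each event.

    events: iterable of "ok" (a full window was acknowledged) or "drop".
    mss:    assumed Maximum Segment Size in bytes (hypothetical value).
    """
    cwnd = mss                           # the window starts at one MSS
    trace = [cwnd]
    for event in events:
        if event == "ok":
            cwnd += mss                  # additive increase
        else:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease (floor is an assumption)
        trace.append(cwnd)
    return trace

print(aimd_trace(["ok", "ok", "ok", "drop", "ok"]))
```

Plotted over time, a longer trace forms the familiar sawtooth. The slow linear climb from a single MSS is the ramp-up problem noted above.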

TCP Internals

Slow Start addresses initial connection issue and temporarily discards additive increase

Congestion window size starts at 1 packet and is doubled every time a full window is successfully transmitted

Eventually a packet is dropped and additive increase is resumed
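A short simulation shows how Slow Start hands over to additive increase. The sketch counts the window in packets, doubles it each round until the first drop, then halves it and grows by one packet per round. The function name and the round-based model are simplifications chosen for illustration, and halving on the drop follows the multiplicative decrease rule described earlier.

```python
def congestion_window_trace(rounds: int, loss_rounds: set):
    """Congestion window (in packets) at the start of each round trip."""
    cwnd = 1              # Slow Start begins with a single packet
    slow_start = True
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)   # a drop halves the window
            slow_start = False         # and ends the Slow Start phase
        elif slow_start:
            cwnd *= 2                  # exponential growth while in Slow Start
        else:
            cwnd += 1                  # additive increase afterwards
    return trace

print(congestion_window_trace(rounds=8, loss_rounds={4}))
```

With a drop in round 4, the window reaches 16 packets in four round trips, where additive increase alone would have reached only 5.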

TCP Internals

Why is it called Slow Start if it changes from linear to exponential growth of congestion window size?

Refers to difference when compared to original TCP strategy of always starting with full advertised window size

TCP Internals

Fast retransmit was not part of original TCP spec

Added in the BSD TCP releases (fast retransmit in Tahoe, circa 1988; fast recovery in Reno, circa 1990) to deal with performance problems

TCP Internals

Fast Retransmit means that if the sender sees a number of duplicate ACKs it retransmits the first packet after the one acknowledged

Assumes that a number of duplicate ACKs implies a dropped packet

TCP Internals

Fast Retransmit in action!

Sender transmits: Packet 1, Packet 2 (lost), Packet 3, Packet 4, Packet 5, then Packet 2 again (fast retransmit)

Receiver replies: ACK 1, ACK 1 (duplicate), ACK 1 (duplicate), then ACK 5 once Packet 2 arrives
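The sender-side rule can be sketched as a counter over the incoming ACK stream. The slides only say "a number of duplicate ACKs"; the default threshold of 3 below reflects common practice and is an assumption here, as is the function name `fast_retransmits`. The diagram's exchange retransmits after two duplicates.

```python
def fast_retransmits(acks, threshold: int = 3):
    """Return the packet numbers a sender would retransmit early.

    acks:      ACK numbers in arrival order (each names the last in-order packet).
    threshold: duplicate ACKs needed to trigger a retransmit (assumed default).
    """
    retransmitted = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmitted.append(ack + 1)   # first packet after the one ACKed
        else:
            last_ack, duplicates = ack, 0       # new ACK: progress was made
    return retransmitted

# ACK stream from the diagram: ACK 1, two duplicates of ACK 1, then ACK 5.
print(fast_retransmits([1, 1, 1, 5], threshold=2))
```

The retransmit fires as soon as the threshold is reached, without waiting for the retransmission timer to expire.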

TCP/IP Weaknesses

TCP/IP Weaknesses

IP

• address space is too small

• size of routing information transmitted and stored is too big

• lack of real-time support necessary for voice and multimedia

TCP/IP Weaknesses

Being addressed by IPv6

• Increases address space to 128 bits

• Over 1500 addresses per square foot of the earth’s surface!

Difficult to roll out and guarantee cooperation with IPv4

TCP/IP Weaknesses

TCP

• congestion control algorithm is a problem over wireless connections

• maximum window size of 64K (16-bit field) and 32-bit sequence number are too small for broadband pipes

• reliability guarantee causes degradation in multimedia streams
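A little arithmetic shows why these field sizes hurt on fast links. A 32-bit sequence number counts 2^32 bytes before wrapping, and a 16-bit window field caps unacknowledged data at roughly 64K bytes per round trip. The sketch below works through both limits; the function names, link speeds, and the 100 ms round trip time are arbitrary values picked for illustration.

```python
def sequence_wrap_seconds(bits_per_second: float, sequence_bits: int = 32) -> float:
    """Time for a sender running at full link speed to cycle through the sequence space."""
    bytes_per_second = bits_per_second / 8
    return (2 ** sequence_bits) / bytes_per_second

def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when only one window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds

# Sequence numbers wrap in under a minute on a 1 Gbit/s link,
# versus close to an hour at 10 Mbit/s.
print(round(sequence_wrap_seconds(1e9), 1), "seconds at 1 Gbit/s")
print(round(sequence_wrap_seconds(10e6) / 60, 1), "minutes at 10 Mbit/s")

# A 65535-byte window over a 100 ms round trip tops out near 5 Mbit/s,
# no matter how fast the underlying link is.
print(round(max_throughput_bps(65535, 0.1) / 1e6, 2), "Mbit/s")
```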

TCP/IP Weaknesses

TCP has unused header bits that could be used for a temporary hack

No structured initiative like IPv6 for solving TCP issues

Resource Material

Resources for the Curious and Diligent

RFCs at www.faqs.org/rfcs

Computer Networks: A Systems Approach by Peterson and Davie

Internetworking with TCP/IP 1, 2, and 3 by Doug Comer

TCP/IP Illustrated 1, 2, and 3 by Richard Stevens

Resources for the Curious and Diligent

Understanding IP Addressing at /www.3com.com/nsc/501302s.html

2 part article on embedding a TCP/IP stack in Dec 99 and Jan 99 issues of ESP

Thanks for staying awake!

Questions?