
Vern Paxson - University of California, Berkeley


1

Final Review

EE 122: Intro to Communication Networks

Fall 2007 (WF 4-5:30 in Cory 277)

Vern Paxson

TAs: Lisa Fowler, Daniel Killebrew & Jorge Ortiz

http://inst.eecs.berkeley.edu/~ee122/

Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

2

Announcements

• Project #2, phase 2 due Monday 11PM

• My office hours next week: by appointment

• Course evaluations today ~5:15PM
  – No 5-minute break during this lecture


3

Final Review

• Tuesday Dec. 18, 8AM-11AM, in 277 Cory
  – We will start on “Berkeley Time”, 8:10AM

• Closed book

• You can have two regular-sized (8.5”x11”) sheets of paper with notes on both sides

• No PDAs, calculators, electronic/Internet gadgets, smart cell phones, etc.

• No Blue Books - all answers on exam sheets

• Ensure legibility (pencil + eraser)

• Emphasis is on material since midterm

4

Fundamental Challenges for Networking

• Speed-of-light

• Desiring a pervasive global network

• Need for it to work efficiently/cheaply

• Failure of components

• Enormous dynamic range
  – “no such thing as typical”

• Disparate parties must work together

• Rapid growth/evolution

• Crooks & other bad guys

• Wow, have we done a lot!

– E.g., 80+ acronyms; 130+ concepts


5

Avoiding Manual Configuration

• Dynamic Host Configuration Protocol (DHCP)
  – End host learns how to send packets
  – Learn IP address, DNS servers, “gateway”, what’s local

• Address Resolution Protocol (ARP)
  – For local destinations, learn mapping between IP address and MAC address

[Figure: two LANs of hosts and DNS servers joined by routers: 1.2.3.0/23 (netmask 255.255.254.0) and 5.6.7.0/24. Labeled addresses: 1.2.3.7, 1.2.3.156, 1.2.3.48, 1.2.3.19; MAC address 1A-2F-BB-76-09-AD.]

6

Key Ideas in Both Protocols

• Broadcasting: when in doubt, shout!

• Caching: remember the past for a while

• Soft state: eventually forget the past
  – Key for robustness in the face of unpredictable change


7

Dynamic Host Configuration Protocol

[Figure: message exchange between an arriving client and the DHCP server (203.1.2.5), in order: DHCP discover (broadcast), DHCP offer, DHCP request (broadcast), DHCP ACK]

8

Figuring Out Where To Send Locally

• Two cases:
  – Destination is on the local network
    o So need to address it directly
  – Destination is not local (“remote”)
    o Need to figure out the first “hop” on the local network

• Determining if it’s local: use the netmask
  – E.g., mask destination IP address w/ 255.255.254.0
  – Is it the same value as when we mask our own address?
    o Yes = local
    o No = remote

[Figure: two LANs of hosts and DNS servers joined by routers: 1.2.3.0/23 (netmask 255.255.254.0) and 5.6.7.0/24. Labeled addresses: 1.2.3.7, 1.2.3.156, 1.2.3.48, 1.2.3.19; MAC address 1A-2F-BB-76-09-AD.]
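The netmask test above can be sketched in a few lines. This is only an illustration: the function name `same_network` is made up here, and the addresses are the ones from the slide's figure.

```python
import ipaddress

def same_network(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if both addresses share a prefix under the given netmask."""
    mask = int(ipaddress.IPv4Address(netmask))
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # Mask both addresses and compare: equal means "local", different means "remote".
    return (a & mask) == (b & mask)

# Host 1.2.3.7 deciding how to reach two destinations:
print(same_network("1.2.3.7", "1.2.3.156", "255.255.254.0"))  # local
print(same_network("1.2.3.7", "5.6.7.8", "255.255.254.0"))    # remote: send via the router
```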


9

Address Resolution Protocol

• Every node maintains an ARP table
  – <IP address, MAC address> pairs

• Consult the table when sending a packet

• But: what if IP address not in the table?
  – Sender broadcasts: “Who has IP address 1.2.3.156?”
  – Receiver responds: “MAC address 58-23-D7-FA-20-B0”
  – Sender caches result in its ARP table

• Link-layer protocol (RFC826)
  – Because necessary to bootstrap IP connectivity

⇒ For DHCP, ARP, etc. understand The Complete Life of a Web Page Fetch from Homework #3.
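One way to picture the caching and soft-state ideas is an ARP table whose entries expire. The sketch below is hypothetical: the class name `ArpCache`, the 60-second lifetime, and the injectable clock are assumptions for illustration, not how any particular OS implements it.

```python
import time

class ArpCache:
    """Toy ARP table: cache IP -> MAC mappings, but forget them after a while."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds      # assumed lifetime of an entry
        self.clock = clock          # injectable so the expiry can be tested
        self.entries = {}           # ip -> (mac, expiry_time)

    def learn(self, ip, mac):
        """Cache the answer to a 'Who has <ip>?' broadcast."""
        self.entries[ip] = (mac, self.clock() + self.ttl)

    def lookup(self, ip):
        """Return the cached MAC, or None if unknown or expired (so: broadcast again)."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expiry = entry
        if self.clock() >= expiry:
            del self.entries[ip]    # soft state: eventually forget the past
            return None
        return mac
```

A `None` result is the cue to broadcast a fresh request, which is what lets the table recover when a host's address mapping changes.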

10

Security Analysis of ARP

• Impersonation
  – Any node that hears request can answer …
  – … and can say whatever they want

• Actual legit receiver never sees a problem
  – Because even though later packets carry its IP address, its NIC doesn’t capture them since not its MAC address

• Or: Man-in-the-middle attack
  – Imposter forwards everything it receives for destination but gets to inspect (& maybe alter) it first

• Does the attacker have to “win” a race?
  – Maybe not, if sender blindly believes ARP responses


11

Internet Control Message Protocol

• ICMP runs on top of IP
  – Viewed as an integral part of IP
    o Not viewed as a transport protocol

• Diagnostics
  – Triggered when an IP packet encounters a problem
    o E.g., Time Exceeded or Destination Unreachable
  – ICMP packet sent back to the source IP address
    o Includes the error information (e.g., type and code)
    o … and IP header plus 8+ byte excerpt from original packet
  – Source host receives the ICMP packet
    o Inspects excerpt (e.g., protocol and ports)
    o … to identify which socket should receive the error
  – Exception: ICMP not sent if problem packet is ICMP
    o And just for fragment 0 of a group of fragments
    o (why?)

12

Path MTU Discovery

• MTU = Maximum Transmission Unit
  – Largest IP packet that a link supports

• Path MTU (PMTU) = minimum end-to-end MTU
  – Sender must keep datagrams no larger to avoid fragmentation

• How does the sender know what the PMTU is?

• Strategy (RFC 1191):
  – Try a desired value
  – Set DF to prevent fragmentation
  – Upon receiving Need Fragmentation ICMP …
    o … oops, that didn’t work, try a smaller value
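The try-then-back-off strategy can be simulated without touching a real network. In this sketch the list of per-link MTUs stands in for the path, and a link with an MTU below the packet size stands in for the router that would return the Need Fragmentation ICMP; the function name and the candidate sizes are assumptions.

```python
def discover_path_mtu(link_mtus, candidate_sizes):
    """Simulate PMTU discovery: try sizes from largest to smallest with DF set."""
    for size in sorted(candidate_sizes, reverse=True):
        # With DF set, the first link too small for the packet drops it
        # and (in a real network) reports back via ICMP.
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking is None:
            return size          # packet fit on every link: this is our PMTU estimate
        # otherwise: "oops, that didn't work", fall through to a smaller value
    return None                  # no candidate fit

print(discover_path_mtu([1500, 1400, 1500], [1500, 1492, 1400, 1280, 576]))
```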


13

traceroute to www.whitehouse.gov (204.102.114.49), 30 hops max, 40 byte packets
 1 cory115-1-gw.EECS.Berkeley.EDU (128.32.48.1) 0.829 ms 0.660 ms 0.565 ms
 2 cory-cr-1-1-soda-cr-1-2.EECS.Berkeley.EDU (169.229.59.233) 0.953 ms 0.857 ms 0.727 ms
 3 soda-cr-1-1-soda-br-6-2.EECS.Berkeley.EDU (169.229.59.225) 1.461 ms 1.260 ms 1.137 ms
 4 g3-8.inr-202-reccev.Berkeley.EDU (128.32.255.169) 1.402 ms 1.298 ms *
 5 ge-1-3-0.inr-002-reccev.Berkeley.EDU (128.32.0.38) 1.428 ms 1.889 ms 1.378 ms
 6 oak-dc2--ucb-ge.cenic.net (137.164.23.29) 1.731 ms 1.643 ms 1.680 ms
 7 dc-oak-dc1--oak-dc2-p2p-2.cenic.net (137.164.22.194) 3.045 ms 1.640 ms 1.630 ms
 8 * * *
 9 dc-lax-dc1--sac-dc1-pos.cenic.net (137.164.22.126) 13.104 ms 13.163 ms 12.988 ms
10 137.164.22.21 (137.164.22.21) 13.328 ms 42.981 ms 13.548 ms
11 dc-tus-dc1--lax-dc2-pos.cenic.net (137.164.22.43) 18.775 ms 17.469 ms 21.652 ms
12 a204-102-114-49.deploy.akamaitechnologies.com (204.102.114.49) 18.137 ms 14.905 ms 19.730 ms

Annotations on the output:

• Lost reply (the “*” in hop 4)

• Router doesn’t send ICMPs (hop 8)

• No PTR record for address (hop 10)

• Final hop (hop 12)

14

Link State Routing

• Each router has a complete picture of the network

• How does each router get the global state?
  – Each router reliably floods information about its neighbors to every other router

• Each router independently calculates the shortest path from itself to every other router
  – Dijkstra’s Shortest Path Algorithm

[Figure: hosts A through E connected via N1 through N7, alongside repeated copies of the resulting topology graph over nodes A through E]


15

Dijkstra’s Algorithm

Initialization:
  S = {A};
  for all nodes v
    if v adjacent to A
      then D(v) = c(A,v);
      else D(v) = ∞;

Loop
  find w not in S such that D(w) is a minimum;
  add w to S;
  update D(v) for all v adjacent to w and not in S:
    if D(w) + c(w,v) < D(v) then
      // w gives us a shorter path to v than we’ve found so far
      D(v) = D(w) + c(w,v); p(v) = w;
until all nodes in S;

• c(i,j): link cost from node i to j

• D(v): current cost source → v

• p(v): predecessor node along path from source to v, that is next to v

• S: set of nodes whose least cost path definitively known
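For experimenting with the pseudocode, here is a minimal runnable sketch. It uses a priority queue to pick the minimum-cost node instead of scanning all nodes, which is a common implementation choice rather than something the slide specifies; the graph format (a dict of neighbor-to-cost dicts) is also an assumption.

```python
import heapq

def dijkstra(graph, source):
    """Return (D, p): least costs from source and predecessor on each least-cost path.

    graph: dict mapping node -> {neighbor: link cost}, i.e. c(i, j).
    """
    dist = {source: 0}          # D(v)
    pred = {}                   # p(v)
    done = set()                # S: nodes whose least-cost path is known
    heap = [(0, source)]
    while heap:
        d_w, w = heapq.heappop(heap)      # w not in S with minimum D(w)
        if w in done:
            continue                      # stale heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v in done:
                continue
            if d_w + cost < dist.get(v, float("inf")):
                dist[v] = d_w + cost      # w gives a shorter path to v
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred
```

Following `pred` backwards from any destination recovers the least-cost path to it.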

16

Distance Vector Routing

• Each router knows the links to its immediate neighbors
  – Does not flood this information to the whole network

• Each router has some idea about the shortest path to each destination
  – E.g.: Router A: I can get to router B with cost 11 via next hop router D
  – Routers exchange this information with their neighboring routers
    o Again, no flooding the whole network
  – Routers update their idea of the best path using info from neighbors


17

Distance Vector Algorithm (cont’d)

Initialization:
  for all nodes V do
    if V adjacent to A
      D(A, V) = c(A,V);
    else
      D(A, V) = ∞;
  send D(A, Y) to all neighbors

loop:
  wait (until A sees a link cost change to neighbor V    /* case 1 */
        or until A receives update from neighbor V)      /* case 2 */
  if (c(A,V) changes by ±d)                              /* ⇐ case 1 */
    for all destinations Y that go through V do
      D^V(A,Y) = D^V(A,Y) ± d
  else if (update D(V, Y) received from V)               /* ⇐ case 2 */
    /* shortest path from V to some Y has changed */
    D^V(A,Y) = D^V(A,V) + D(V, Y);    /* may also change D(A,Y) */
  if (there is a new minimum for destination Y)
    send D(A, Y) to all neighbors
forever

• c(i,j): link cost from node i to j

• D^Z(A,V): cost from A to V via Z

• D(A,V): cost of A’s best path to V
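Case 2 of the loop (an update arriving from a neighbor) is the heart of the algorithm. The sketch below is deliberately simplified: it keeps only the best cost and next hop per destination rather than a full per-neighbor table, so it does not handle cost increases or count-to-infinity. The function name and table layout are assumptions.

```python
INF = float("inf")

def dv_update(table, link_cost, neighbor, neighbor_vector):
    """Apply a neighbor's distance vector to our table.

    table: dict destination -> (best cost, next hop), modified in place.
    link_cost: c(A, neighbor).
    neighbor_vector: dict destination -> D(neighbor, destination).
    Returns the destinations with a new minimum (these would be re-advertised).
    """
    changed = set()
    for dest, cost_from_neighbor in neighbor_vector.items():
        candidate = link_cost + cost_from_neighbor     # cost from A to dest via neighbor
        if candidate < table.get(dest, (INF, None))[0]:
            table[dest] = (candidate, neighbor)
            changed.add(dest)
    return changed

# Router A, directly connected to D with cost 2, hears that D reaches B at cost 9:
table = {"D": (2, "D")}
dv_update(table, 2, "D", {"B": 9})
print(table["B"])   # cost 11 via next hop D, as in the slide's example
```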

18

Routing: Link State vs. Distance Vector

Per-node message complexity

• LS: O(e) messages
  – e: number of edges

• DV: O(d) messages, many times
  – d is node’s degree

Complexity/Convergence

• LS: O(N log N) computation

• DV: convergence time varies
  – may be routing loops
  – count-to-infinity problem


19

Interdomain Routing

• Challenges of interdomain routing
  – Scale, privacy, and policy
  – Limitations of link-state and distance-vector routing

• Path-vector routing
  – Faster loop detection than distance-vector routing
  – More flexibility than shortest-path routing

• Border Gateway Protocol (BGP)
  – Incremental, prefix-based, path-vector protocol
  – Runs between Autonomous Systems (ASs)
  – Programmable import and export policies
  – Multi-step decision process for selecting “best” route
    o But often skewed by Hot Potato routing

20

TCP Service Model

• Reliable, in-order, byte-stream delivery
  – and with good performance

• Challenges: the network can
  – drop packets
    o Even perhaps a large number
  – delay packets
    o Even perhaps for many seconds
  – deliver packets out-of-order
    o Follows from possibility of arbitrary delay
  – replicate packets
    o Weird, but it does sometimes happen
  – corrupt packets
  – (What’s missing?) (security)


21

TCP Header

[Figure: TCP header layout, one row per 32-bit word]

Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Data

22

Timing Diagram: 3-Way Handshaking

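[Figure: timing diagram between the client (initiator; active open via connect()) and the server (passive open via listen(), then accept())]

1. Client → server: SYN, SeqNum = x
2. Server → client: SYN + ACK, SeqNum = y, Ack = x + 1
3. Client → server: ACK, Ack = y + 1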


23

Normal Termination, One Side At A Time

• Finish (FIN) to close and receive remaining bytes
  – FIN occupies one octet in the sequence space

• Other host ack’s the octet to confirm

• Closes A’s side of the connection, but not B’s
  – Until B likewise sends a FIN
  – Which A then acks

[Figure: timeline between A and B showing SYN, SYN ACK, ACK, Data, FIN, ACK, ACK, FIN, ACK. Annotations: “Connection now half-closed”; “Connection now closed”; “Can retransmit FIN ACK if lost”; “Timeout: avoid reincarnation”.]

24

Abrupt Termination

• A sends a RESET (RST) to B
  – E.g., because app. process on A crashed

• That’s it
  – B does not ack the RST
  – Thus, RST is not delivered reliably
  – And: any data in flight is lost
  – But: if B sends anything more, will elicit another RST

[Figure: timeline between A and B showing SYN, SYN ACK, ACK, Data, RST, ACK, then further Data answered by another RST]


25

Reasons for Retransmission

[Figure: three sender/receiver timelines, each showing packets, ACKs, and retransmission timeouts]

• ACK lost: the retransmission arrives as a DUPLICATE PACKET

• Packet lost

• Early timeout: the retransmissions arrive as DUPLICATE PACKETS

26

RTT Estimation

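• Use exponential averaging:

$$\text{SampleRTT} = \text{AckRcvdTime} - \text{SendPacketTime}$$

$$\text{EstimatedRTT} = \alpha \times \text{EstimatedRTT} + (1 - \alpha) \times \text{SampleRTT}$$

$$\alpha = 7/8 \quad \text{(for one measurement per flight)}$$

[Figure: EstimatedRTT and SampleRTT plotted against time]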


27

Jacobson/Karels Algorithm

• Compute “slop” in terms of observed variability

$$\text{Difference} = \text{SampleRTT} - \text{EstimatedRTT}$$

$$\text{Deviation} = \text{Deviation} + \delta \times (|\text{Difference}| - \text{Deviation})$$

$$\text{RTO} = \mu \times \text{EstimatedRTT} + \phi \times \text{Deviation}$$

$$\delta = 1/4 \ \text{(again, for one measurement per flight)}, \quad \mu = 1, \quad \phi = 4$$

  – Implementations often use a coarse-grained (500 msec) timer, so resulting value is large
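The two estimators combine naturally into one small class. This is a sketch under assumptions: the class name is invented, the Greek parameter names (`alpha`, `delta`, `mu`, `phi`) are one plausible reading of the slide's symbols, and seeding the estimate with the first sample and a zero deviation is an arbitrary choice. Difference is computed against the estimate as it stood before the new sample.

```python
class RttEstimator:
    """Exponentially averaged RTT plus Jacobson/Karels deviation-based RTO."""

    def __init__(self, first_sample, alpha=7/8, delta=1/4, mu=1.0, phi=4.0):
        self.estimated = first_sample   # EstimatedRTT (seeded with first sample: an assumption)
        self.deviation = 0.0            # Deviation (seeded with zero: an assumption)
        self.alpha, self.delta, self.mu, self.phi = alpha, delta, mu, phi

    def update(self, sample):
        """Fold in one SampleRTT and return the new retransmission timeout."""
        difference = sample - self.estimated
        self.estimated = self.alpha * self.estimated + (1 - self.alpha) * sample
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.rto()

    def rto(self):
        return self.mu * self.estimated + self.phi * self.deviation
```

Per Karn/Partridge, `update` should only be fed samples from segments that were not retransmitted.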

28

Problem: Ambiguous Measurement

• How to differentiate between the real ACK, and ACK of the retransmitted packet?

[Figure: two sender/receiver timelines, each with an original transmission, a retransmission, and an ACK; in each it is unclear which transmission the SampleRTT should be measured from]

• Karn/Partridge algorithm: Measure SampleRTT only for original transmissions
  – And use exponential backoff


29

TCP State Diagram

30

Flow Control vs. Congestion Control

• Flow control keeps one fast sender from overwhelming a slow receiver
  – Controlled by advertised window

• Congestion control keeps a set of senders from overloading the network
  – Controlled by CWND


31

View from a Single Flow

• Knee: point after which
  – Throughput increases very slowly
  – Delay increases quickly

• Cliff: point after which
  – Throughput starts to decrease very fast to zero (congestion collapse)
  – Delay approaches infinity

[Figure: two plots against load, throughput and delay, with the knee and cliff marked; packet loss and congestion collapse are labeled at the cliff]

32

Additive Increase, Multiplicative Decrease

• How much to increase and decrease?
  – Increase linearly, decrease multiplicatively (AIMD)
  – Necessary condition for stability of TCP

• Additive increase
  – On success for last window of data, increase linearly
    o One packet (MSS) per RTT
    o Or: increment per ACK: CWND += MSS * (MSS / CWND)

• Multiplicative decrease
  – On loss of packet, divide congestion window in half
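The per-ACK increment and the halving rule translate directly into two small functions. This is an illustrative sketch: the MSS of 1460 bytes and the floor of one MSS on the window after a loss are assumptions, not part of the slide.

```python
MSS = 1460  # bytes; a typical value, assumed here

def on_ack(cwnd, mss=MSS):
    """Additive increase: a full window of ACKs adds about one MSS per RTT."""
    return cwnd + mss * (mss / cwnd)

def on_loss(cwnd, mss=MSS):
    """Multiplicative decrease: halve the window (never below one MSS: an assumption)."""
    return max(cwnd / 2, mss)

cwnd = 10 * MSS
for _ in range(10):          # one round trip's worth of ACKs
    cwnd = on_ack(cwnd)
print(cwnd / MSS)            # a little under 11 MSS
print(on_loss(cwnd) / MSS)   # about half of that
```

The growth is slightly under one MSS per round trip because each increment is computed against a window that has already grown a little.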


33

Slow Start

Double CWND per round-trip time

Simple implementation: on each ack, CWND += MSS

[Figure: Src/Dest timeline of data (D) and ack (A) packets; the number of packets sent per round trip grows 1, 2, 4, 8]

34

Fast Retransmission

• Resend a segment after 3 duplicate ACKs
  – Duplicate ACK means that an out-of-sequence segment was received

• Notes:
  – ACKs are for next expected packet
  – Packet reordering can cause duplicate ACKs
  – Window may be too small to generate enough duplicate ACKs

[Figure: sender/receiver timeline. cwnd = 1: segment 1, answered by ACK 2. cwnd = 2: segments 2 and 3, answered by ACK 3 (cwnd = 3) and ACK 4. cwnd = 4: segments 4, 5, 6, 7, answered by ACK 4 three more times (3 duplicate ACKs). Segment 4 is then resent, with cwnd = 2.]


35

Summary of TCP Mechanisms

• Delayed Acknowledgment
  – Lessens overhead (40 bytes per ACK)
  – But can cause CWND to grow more slowly

• Fast Retransmit
  – NACK-based loss detection in 1 RTT
  – Avoids timeout delay
  – AIMD after subsequent Slow Start reaches SSTHRESH

• Fast Recovery
  – Avoids needing to Slow Start after Fast Retransmit
  – True AIMD

• SACK
  – Both speeds recovery and avoids unnecessary rexmit.

36

TCP Performance

• Time-Sequence plots provide a powerful tool for visualizing TCP behavior & performance

• Spectrum of TCP mechanisms influence performance
  – Advertised window, sender window
  – Timeout, slow start, exponential backoff
  – Acking policy (delayed; ack-splitting; SACK)
  – Fast Retransmit (avoid RTO stall)
  – Fast Recovery (full AIMD)
  – Window scaling (required for large bandwidth-delay product)


37

Example of Time-Sequence Plot

Hollow squares = Acks

Solid squares = Data

MSS

Window

RTT

(Circles = Advertised Window)

38

Example of Time-Sequence Plot

Slope gives overall throughput (bytes/sec)


39

Fast Retransmission

Window stays at 5 MSS ⇒ transition to Congestion Avoidance

After pending data ack’d, slow start. CWND = 2 MSS since ACK arrival incremented it by MSS

Third dup triggers retransmission

40

Same Fast Retransmission @ Recv.

What happened here?

Reordering.

Again, arrivals much more smooth due to bottleneck shaping


41

ACK-splitting

• Rule: grow window by one full-sized packet for each valid ACK received

• Send M (distinct) ACKs for one packet

• Growth factor proportional to M

[Figure: sender/receiver timeline over one round-trip time (RTT). The sender sends Data 1:1461; the receiver answers with ACK 486, ACK 973, ACK 1461; the sender then sends Data 1461:2921, Data 2921:4381, Data 4381:5841, Data 5841:7301.]

42

TCP Throughput Equation

• For packets of B bytes and packet loss rate p, throughput is:

$$T = \frac{\sqrt{1.5}\, B}{RTT \sqrt{p}}$$

• Implications:
  – Long-term throughput falls as 1/RTT
  – Long-term throughput falls as 1/sqrt(p)

• Non-TCP transport can use equation to provide TCP-friendly congestion control
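To get a feel for the two implications, the equation can be evaluated directly. The sketch assumes the constant is sqrt(1.5), the standard form of this result, and the packet size, RTT, and loss rate in the example are made-up inputs.

```python
import math

def tcp_throughput(packet_bytes, rtt_seconds, loss_rate):
    """Long-term TCP throughput estimate in bytes/sec: sqrt(1.5) * B / (RTT * sqrt(p))."""
    return math.sqrt(1.5) * packet_bytes / (rtt_seconds * math.sqrt(loss_rate))

base = tcp_throughput(1460, 0.100, 0.01)        # 1460-byte packets, 100 ms RTT, 1% loss
print(base)                                     # on the order of 1.8e5 bytes/sec
print(tcp_throughput(1460, 0.200, 0.01) / base) # doubling the RTT halves throughput
print(tcp_throughput(1460, 0.100, 0.04) / base) # quadrupling the loss rate halves it too
```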


43

Generic Router Architecture

• Input and output interfaces are connected through an interconnect

• Interconnect can be implemented by
  – Shared memory
    o Low capacity routers (e.g., PC-based routers)
  – Shared bus
    o Medium capacity routers
  – Point-to-point (switched) bus
    o High capacity routers
    o Packets fragmented into cells
    o Essentially a network inside the router!

[Figure: input interfaces connected to output interfaces through the interconnect]

44

Output Queued Routers

• Only output interfaces store packets

• Advantages
  – Easy to design algorithms: only one congestion point

• Disadvantages
  – Requires an output speedup Ro = C · N, where N is the number of interfaces ⇒ not feasible

[Figure: input interfaces (rate C) connected through a backplane to output interfaces (rate Ro)]


45

Input Queued Routers

• Input interfaces store packets

• Easier to build since only need R ≈ C
  – Though need to implement “backpressure” to know when to send

• But harder to build efficiently due to contention and head-of-line blocking

[Figure: input interfaces (rate C) connected through a backplane to output interfaces (rate R)]

46

Head-of-line Blocking

• Cell at head of an input queue cannot be transferred, thus blocking the following cells

[Figure: Inputs 1 through 3 queued toward Outputs 1 through 3. One cell cannot be transferred because of output buffer overflow; the cell behind it cannot be transferred because it is blocked by the orange cell.]

• Modern high-speed routers use combination of input & output queuing, with flow control & multiple “virtual queues”


47

Simple Queuing - FIFO and Drop Tail

• Most of today’s routers

• Transmission via FIFO scheduling
  – First-in first-out queue
  – Packets transmitted in the order they arrive

• Buffer management: drop-tail
  – If the queue is full, drop the incoming packet

48

Random Early Detection (RED)

• Basic idea of RED
  – Router notices that the queue is getting backlogged
  – … and randomly drops arriving packets to signal congestion

• Packet drop probability
  – Drop probability increases with average queue length
  – If buffer is below some level, don’t drop anything
  – … otherwise, set drop probability as function of length

• RED controls average queue size, avoiding lost bursts …
  – … and distributing congestion losses in a more fair fashion

[Figure: drop probability plotted against average queue length]
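The drop-probability curve can be sketched as a piecewise function. The slide only says "function of length"; the linear ramp between a minimum and maximum threshold, capped at `max_p`, is the classic RED shape and is assumed here, as are all the parameter names.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue length.

    Below min_th nothing is dropped; between the thresholds the probability
    ramps linearly up to max_p; at or beyond max_th every arrival is dropped.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

for q in (2, 10, 20):
    print(q, red_drop_probability(q, min_th=5, max_th=15, max_p=0.1))
```

A router would draw a random number per arriving packet and drop it when the draw falls below this probability.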


49

Explicit Congestion Notification

• Early dropping of packets
  – Good: gives early/refined feedback
  – Bad: costs a packet drop to give the feedback

• Explicit Congestion Notification (ECN)
  – Router instead marks the packet with an ECN bit
  – … which end system interprets as a sign of congestion

• Surmounting the challenges
  – Must be supported by both end hosts as well as routers
  – Requires two bits in the IP header and 2 TCP header bits

50

Little’s Theorem

• Assume a system where packets arrive at rate λ

• Let d be mean delay of packet, i.e., mean time a packet spends in the system

• Q: What is N, mean # of packets in the system? (= average occupancy)
  – E.g., for a router N would give the size of the queue

[Figure: a box labeled “system” with λ = mean arrival rate and d = mean delay]

• A: N = λ × d
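As a quick worked example with made-up numbers: suppose packets arrive at a router at a mean rate of 10,000 packets per second and each spends a mean of 2 ms in the system. Then

$$N = \lambda \times d = 10{,}000\ \tfrac{\text{packets}}{\text{s}} \times 0.002\ \text{s} = 20\ \text{packets}$$

so on average about 20 packets are queued or in service at any instant. The relation also works in reverse: measuring the average occupancy and arrival rate gives the mean delay as d = N / λ.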


51

Wireless Link Characteristics

51

(Figure Courtesy of Kurose and Ross)

52

Other Wireless Link Characteristics

• Path loss
  – Signal attenuation as a function of distance
  – Signal-to-noise ratio (SNR = Signal Power/Noise Power) decreases, making the signal unrecoverable

• Multipath Propagation
  – Signal reflects off surfaces, effectively causing self-interference

• Interference from other sources
  – Internal Interference
    o Hosts within range of each other collide with one another’s transmission (remember Aloha)
  – External Interference
    o Microwave is turned on and blocks your signal


53

802.11 Architecture

• Designed for limited geographical area

• AP’s are set to specific channel and broadcast beacon messages with SSID and MAC Address periodically

• Hosts scan all the channels to discover the AP’s
  – Host associates with AP (actively or passively)

[Figure (Kurose and Ross): 802.11 frames exchanged; 802.3 (Ethernet) frames exchanged]

54

Hidden Terminals

• A and C can both send to B but can’t hear each other
  – A is a hidden terminal for C and vice versa

• CSMA/CD will be ineffective; need to sense at receiver

[Figure: nodes A, B, C in a line, with transmit ranges shown]


55

Exposed Terminals

• Exposed node: B sends a packet to A; C hears this and decides not to send a packet to D (despite the fact that this will not cause interference)!

[Figure: nodes A, B, C, D in a line]

56

CSMA/CA: CSMA w/ Collision Avoidance

• Since we can’t detect collisions, we try to avoid them

• When medium busy, choose random interval (contention window)
  – Wait for that many idle timeslots to pass before sending

• When a collision is inferred, retransmit with binary exponential backoff (like Ethernet)
  – Use ACK from receiver to infer “no collision”
  – Use exponential backoff to adapt contention window


57

RTS / CTS Protocols (MACA)

MACA = Multiple Access with Collision Avoidance

Overcome exposed/hidden terminal problems with contention-free protocol
1. B stimulates C with Request To Send (RTS)
2. A hears RTS and defers (to allow C to answer)
3. C replies to B with Clear To Send (CTS)
4. D hears CTS and defers to allow the data
5. B sends to C

[Figure: nodes A, B, C, D in a line; RTS sent by B, CTS sent by C]

Courtesy Mikko Hypponen


59

Summary

• Wireless connectivity provides a very different set of tradeoffs from wired
  – Much greater ease of deployment
  – Mobility
  – But: unprotected physical signaling
  – Complications due to interference, attenuated range
  – Leading to much more frequent loss

• Hidden terminal and Exposed terminal problems motivate need for a different style of Media Access Control: CSMA/CA

• Multihop provides applications to sensornets, citynets
  – But additional complications of routing, contention

• Wireless devices bring new security risks

60

Summary of QoS

• Basic mechanism for achieving better-than-best-effort performance: scheduling
  – Multiple queues allow priority service
  – Fair queuing provides isolation between flows

• But: still need end-to-end mechanisms
  – Reservations & admission control
  – Descriptions of bursty traffic: token buckets


61

Summary of QoS, con’t

• IntServ provides per-flow performance guarantees
  – But lacks scalability

• DiffServ provides per-aggregate tiers of relative performance
  – Scalable, but not as powerful

• Neither is generally available end-to-end today

• ISPs manipulating what services receive what performance raises issues of network neutrality

62

Scheduling

• Decide when and what packet to send on output link

• Classifier partitions incoming traffic into flows, each with its own FIFO queue

[Figure: a classifier feeds per-flow queues (flow 1, flow 2, …, flow n) under buffer management; a scheduler picks which queue’s packet goes out on the link]


63

Max-Min Fairness

• Denote
  – C: link capacity
  – N: number of flows
  – r_i: arrival rate of flow i

• Max-min fair rate computation:
  1. compute C/N (= the remaining fair share)
  2. if there are flows i such that r_i ≤ C/N, then update C and N

     $$C = C - \sum_{i \,\text{s.t.}\, r_i \le C/N} r_i \; ; \qquad N = N - k \ \text{(for k such flows)}$$

     and go to 1
  3. if not, f = C/N; terminate

• Flows receive at most the fair rate, i.e., min(f, r_i)
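The three-step computation can be written as a short loop. This is a sketch with an invented function name; returning infinity when every flow's demand fits is a convention chosen here so that min(f, r_i) still gives each flow its full demand.

```python
import math

def max_min_fair_rate(capacity, demands):
    """Return the fair rate f; each flow i then receives min(f, r_i)."""
    remaining = float(capacity)          # C
    active = list(demands)               # flows not yet satisfied (N = len(active))
    while active:
        share = remaining / len(active)  # step 1: C/N
        satisfied = [r for r in active if r <= share]
        if not satisfied:                # step 3: nobody fits under the share
            return share
        remaining -= sum(satisfied)      # step 2: C = C - sum of those r_i
        active = [r for r in active if r > share]   # N = N - k
    return math.inf                      # every demand fit (convention assumed here)

demands = [8, 6, 2]
f = max_min_fair_rate(10, demands)
print(f, [min(f, r) for r in demands])
```

With capacity 10 and demands 8, 6, 2, the first pass satisfies the flow asking for 2, leaving 8 units for two flows, so f = 4 and the allocation is 4, 4, 2.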

64

Characterizing Burstiness: Token Bucket

• Parameters
  – r: average rate, i.e., rate at which tokens fill the bucket
  – b: bucket depth (limits size of burst)
  – R: maximum link capacity or peak rate

• A bit can be transmitted only when a token is available

[Figure: a regulator whose bucket of depth b bits fills at r bps and whose output is ≤ R bps. Alongside, a plot of the maximum # of bits sent against time: slope R up to time b/(R−r), where b·R/(R−r) bits have been sent, then slope r.]


65

Arrival Curve: Example

• Arrival curve: maximum amount of bits transmitted during an interval of time Δt

• Use token bucket to bound arrival curve

[Figure: a traffic trace in bps against time (0 to 5), and the corresponding arrival curve in bits against Δt (1 to 5, rising to 4 bits), bounded by a token bucket with (R=2, b=1, r=1)]
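The token-bucket bound on the arrival curve has a simple closed form: a flow can send at the peak rate R until the bucket drains, after which it is limited to the token rate r. The function below sketches that bound using the slide's (R=2, b=1, r=1) example; the function name is an assumption.

```python
def max_bits(interval, r, b, R):
    """Upper bound on bits sent in any interval of the given length.

    Peak-rate limit R * t applies early; the token limit b + r * t applies later.
    The two lines cross at t = b / (R - r), where b * R / (R - r) bits have been sent.
    """
    return min(R * interval, b + r * interval)

for dt in (0.5, 1, 2, 3):
    print(dt, max_bits(dt, r=1, b=1, R=2))
```

For these parameters the crossover is at Δt = 1 with 2 bits sent, matching b/(R−r) and b·R/(R−r).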

66

QoS Guarantees: Per-hop Reservation

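• Router: allocate bandwidth r_a, buffer space B_a such that
  – no packet is dropped
  – no packet experiences a delay larger than D

[Figure: arrival curve in bits (slope R up to b·R/(R−r), then slope r) plotted with a service line of slope r_a; the horizontal gap between them is the delay D and the vertical gap is the buffer B_a]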


67

Differentiated Service (DS) Field

• DS field encodes Per-Hop Behavior (PHB)
  – E.g., Expedited Forwarding (all packets receive minimal delay & loss)
  – E.g., Assured Forwarding (packets marked with low/high drop probabilities)

[Figure: IP header layout (bit positions 0, 4, 8, 16, 19, 31): Version, HLen, TOS, Length; Identification, Flags, Fragment offset; TTL, Protocol, Header checksum; Source address; Destination address; Data. The TOS byte is split into the DS field (bits 0 to 5) and ECN (bits 6 to 7).]

68

Comparison to Best-Effort & Intserv

              | Best-Effort | Diffserv | Intserv
Service       | Connectivity; no isolation; no guarantees | Per aggregate isolation; per aggregate guarantee | Per flow isolation; per flow guarantee
Service scope | End-to-end | Domain | End-to-end
Complexity    | No setup | Long term setup | Per flow setup
Scalability   | Highly scalable (nodes maintain only routing state) | Scalable (edge routers maintain per aggregate state; core routers per class state) | Not scalable (each router maintains per flow state)


69

Summary of Middleboxes

• Middleboxes address important problems
  – Using fewer IP addresses
  – Blocking unwanted traffic
  – Monitoring activity
  – Shaping use of network resources
  – Improving/controlling performance (vs. network neutrality)

• Middleboxes cause problems of their own
  – Connectivity erodes
    o Notion of addresses, ports weakened
    o Middlebox state management can lead to connection termination
  – Harder to deploy new apps

70

Network Address Translation Example

[Figure: LAN hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT router with LAN-side address 10.0.0.4 and WAN-side address 138.76.29.7]

NAT translation table
WAN side addr     | LAN side addr
138.76.29.7, 5001 | 10.0.0.1, 3345
……                | ……

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345   D: 128.119.40.186, 80

2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table
   S: 138.76.29.7, 5001   D: 128.119.40.186, 80

3: Reply arrives dest. address: 138.76.29.7, 5001
   S: 128.119.40.186, 80   D: 138.76.29.7, 5001

4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80   D: 10.0.0.1, 3345
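The four steps amount to maintaining a two-way mapping between (LAN address, port) and (WAN address, port). The class below is a toy sketch: the name `Nat`, the sequential port allocation starting at 5001, and dropping unmatched inbound packets are all assumptions for illustration.

```python
class Nat:
    """Toy NAT: rewrite source on the way out, destination on the way back in."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self.next_port = first_port
        self.lan_to_wan = {}    # (lan ip, lan port) -> (wan ip, wan port)
        self.wan_to_lan = {}    # reverse direction of the translation table

    def outbound(self, src, dst):
        """Steps 1-2: replace the LAN-side source with a WAN-side one, updating the table."""
        if src not in self.lan_to_wan:
            wan = (self.wan_ip, self.next_port)
            self.next_port += 1
            self.lan_to_wan[src] = wan
            self.wan_to_lan[wan] = src
        return self.lan_to_wan[src], dst

    def inbound(self, src, dst):
        """Steps 3-4: map the WAN-side destination back, or return None (drop) if unknown."""
        lan = self.wan_to_lan.get(dst)
        return None if lan is None else (src, lan)

nat = Nat("138.76.29.7")
print(nat.outbound(("10.0.0.1", 3345), ("128.119.40.186", 80)))
print(nat.inbound(("128.119.40.186", 80), ("138.76.29.7", 5001)))
```

The `None` case is why unsolicited inbound connections fail: there is no table entry telling the NAT which internal host to forward to.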


71

Objections Against NAT

• Difficult to support peer-to-peer applications
  – P2P needs a host to act as a server

• Layering violation (hence messiness)

• NAT violates the end-to-end principle
  – Network nodes should not modify the packets

• Connections become brittle

• Barrier to deployment of new apps

• IPv6 is a cleaner solution
  – Better to migrate than to limp along with a hack

72

Firewalls

[Figure: a firewall sitting between the administered network and the public Internet]

• Isolates organization’s internal net from Internet

• Allows some packets to pass, blocks others
  – (Refinement: shape some traffic, allow other unimpeded)

• Twin goals: security and policy enforcement


73

Example of Firewall Configuration

• Alice’s firewall rules
  – #1: Don’t let Trudy machines in
    o Deny <src = 111.55.66.0/24, dst = 222.33.0.0/16>
  – #2: Let rest of Bob’s network in to special dsts
    o Permit <src = 111.55.0.0/16, dst = 222.33.44.0/24>
  – #3: Block the rest of the world
    o Deny <src = 0.0.0.0/0, dst = 0.0.0.0/0>
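The rule order matters: Trudy's /24 sits inside Bob's /16, so rule #1 has to be checked before rule #2. A minimal sketch of that evaluation, assuming first-match-wins semantics (which the slide's ordering implies but does not state) and an invented `decide` function:

```python
import ipaddress

# Alice's rules in order: (action, source prefix, destination prefix)
RULES = [
    ("deny",   "111.55.66.0/24", "222.33.0.0/16"),   # 1: Trudy's machines
    ("permit", "111.55.0.0/16",  "222.33.44.0/24"),  # 2: rest of Bob's network, special dsts
    ("deny",   "0.0.0.0/0",      "0.0.0.0/0"),       # 3: everyone else
]

def decide(src, dst, rules=RULES):
    """Return the action of the first rule matching both source and destination."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for action, src_net, dst_net in rules:
        if s in ipaddress.ip_network(src_net) and d in ipaddress.ip_network(dst_net):
            return action
    return "deny"   # default if nothing matches (assumed)

print(decide("111.55.66.9", "222.33.44.1"))  # Trudy: caught by rule 1
print(decide("111.55.1.1", "222.33.44.1"))   # Bob's network to a special dst: rule 2
```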

74

Misleading Stateless Inspection

[Figure: TCP header layout, one row per 32-bit word]

Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | SYN | Advertised window
Checksum | Urgent pointer
Options (variable)
Data

Split into two fragments. First is just 8 bytes of IP payload, i.e., it ends after the sequence number.


75

Misleading Stateless Inspection, con’t

[Figure: TCP header layout, one row per 32-bit word]

Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | SYN | Advertised window
Checksum | Urgent pointer
Options (variable)
Data

Second fragment starts 8 bytes later, covering all of the header from the acknowledgment field onward.

Firewall looks 14 bytes into payload, i.e., at the flags (SYN) field, which is under the control of the attacker.

76

Example: Tunneling IP over Email

From: [email protected]
To: [email protected]
Subject: Here’s my IP datagram

IP-header-version: 4
IP-header-len: 5
IP-ID: 11234
IP-src: 1.2.3.4
IP-dst: 5.6.7.8
IP-payload: 0xa144bf2c0102…

Program receives this legal email, builds an IP packet corresponding to description in email body …
… and injects it into the network


77

Overlays

• Deploy processing in the network

• Have packets processed as they traverse the network

[Figure: AS-1 at the IP layer, with an overlay network (over IP) drawn above it]

78

Main Peer-to-Peer Challenges

• Find where a particular file is stored

• Scale: up to hundreds of thousands or millions of machines

• Churn: machines can come and go at any time

[Figure: peers A through F, with one peer asking where item E is stored]

• Example of a Distributed Hash Table: Chord


79

Chord Mapping Identifier to Node

• Node 8 maps [5,8]

• Node 15 maps [9,15]

• Node 20 maps [16,20]

• …

• Node 4 maps [59, 4]

• Each node maintains a pointer to its successor

[Figure: identifier ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]

80

Achieving Efficiency: finger tables

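Say m = 7

ith entry at peer with id n is first peer with id >= $(n + 2^i) \bmod 2^m$

Finger Table at 80

i | ft[i]
0 | 96
1 | 96
2 | 96
3 | 96
4 | 96
5 | 112
6 | 20

[Figure: identifier ring starting at 0 with peers 20, 32, 45, 80, 96, 112; fingers from 80 point at 80 + 2^0, 80 + 2^1, 80 + 2^2, 80 + 2^3, 80 + 2^4, 80 + 2^5, and (80 + 2^6) mod 2^7 = 16]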
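The finger-table rule can be checked against the slide's example ring. This sketch assumes every peer's id is known up front (a real Chord node discovers them through lookups), and the function names are invented.

```python
def successor(ident, nodes):
    """First peer with id >= ident, wrapping around the ring."""
    ring = sorted(nodes)
    for n in ring:
        if n >= ident:
            return n
    return ring[0]              # wrapped past the largest id

def finger_table(n, nodes, m):
    """ft[i] = successor((n + 2^i) mod 2^m) for i = 0 .. m-1."""
    return [successor((n + 2**i) % 2**m, nodes) for i in range(m)]

peers = [20, 32, 45, 80, 96, 112]
print(finger_table(80, peers, m=7))   # [96, 96, 96, 96, 96, 112, 20]
```

Because the fingers double in distance, each hop of a lookup can cut the remaining distance around the ring roughly in half.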


81

Summary of Cryptographic Mechanisms

• Requirements for secure communication:
  – Authentication, authorization, integrity, confidentiality, non-repudiation, availability

• Workhorse for many of these: cryptography
  – Symmetric encryption: fast, but requires shared secret
  – Public key encryption: no need for shared secret

• Hash functions provide integrity and signatures

• There are a range of attacks on cryptosystems
  – However, crypto is in fact our most mature security technology

• Managing public keys: PKI
  – Digital certificates

82

Symmetric Key Ciphers - DES & AES

• Idea: one-time pad (XOR w/ random bits)
  – But requires as much key material as plaintext

• Data Encryption Standard (DES)
  – 56-bit key (decreased from 64 bits at NSA’s request)
  – Still fairly strong other than brute-forcing the key space
    o But custom hardware can crack a key in < 24 hours
  – Today many financial institutions use Triple DES

• Advanced Encryption Standard (AES)
  – Replacement for DES standardized in 2002
  – Key size: 128, 192 or 256 bits

• How fundamentally strong are they?
  – No one knows (no proofs exist)


83

Cryptographically Strong Hashes

• Desired properties when faced with an adversary:
– Hard to invert
o Given hash, adversary can’t find input that produces it
– Hard to find collisions
o Adversary can’t find two inputs that produce the same hash

⇒ Someone cannot alter the message without modifying the digest

• Hashes let us
– Succinctly refer to large objects
– Obliquely refer to private objects (e.g., passwords)
o Send hash of object rather than object itself (since hard to invert)
o Can prepend a (secret) key so that hashes of known items are unpredictable
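A brief sketch of the two uses listed above: a digest as a short reference to an object, and a keyed digest whose value an outsider cannot predict. SHA-256 from Python's `hashlib` is used here as a stand-in (the slides discuss MD5 and SHA-1), and the function names are illustrative. Plain key-prepending follows the slide; deployed designs typically use HMAC instead.

```python
import hashlib

def digest(data: bytes) -> str:
    """Short, fixed-size reference to an arbitrarily large object."""
    return hashlib.sha256(data).hexdigest()

def keyed_digest(secret_key: bytes, data: bytes) -> str:
    """Prepend a secret key so digests of known items are unpredictable."""
    return hashlib.sha256(secret_key + data).hexdigest()

print(digest(b"hello"))
print(keyed_digest(b"site-secret", b"hello"))
```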

84

Standard Cryptographic Hash Functions

• MD5 (Message Digest version 5)
– Produces 128 bit hashes
– Widely used (RFC 1321)
– Broken:
o Recent work quickly finds collisions

• SHA-1 (Secure Hash Algorithm)
– Produces 160 bit hashes
– Widely used (SSL/TLS, SSH, PGP, IPSEC)
– Broken:
o Recent work finds collisions, though not really quickly … yet


85

Public Key / Asymmetric Encryption

• Sender uses receiver’s public key
– Advertised to everyone

• Receiver uses complementary private key
– Must be kept secret

[Figure: plaintext is encrypted with the public key, the ciphertext crosses the Internet, and it is decrypted with the private key back into plaintext]

86

RSA Encryption and Decryption

• Encryption of message block m: $c = E(m, e) = m^e \bmod n$

• Decryption of ciphertext c: $m = D(c, d) = c^d \bmod n$

– Works due to number-theoretic properties
– Note: D(E(x, e), d) = E(D(x, d), e) = x
o I.e., D & E are inverses
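The two formulas can be tried out with a toy key. The primes 61 and 53 and the exponent 17 are a common textbook choice, not values from the lecture, and are far too small for real use; `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Toy RSA key (illustration only; real moduli are thousands of bits long).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: e * d = 1 (mod phi)

def E(m, e):
    return pow(m, e, n)    # c = m^e mod n

def D(c, d):
    return pow(c, d, n)    # m = c^d mod n

m = 65
c = E(m, e)
print(m, "->", c, "->", D(c, d))
```

Applying the operations in either order returns the original block, which is the inverse property the signature scheme on the next slide relies on.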


87

RSA Crypto & Signatures

• Suppose Alice has published public key KE

• If she wishes to prove who she is, she can send a message x encrypted with her private key KD (i.e., she sends D(x,KD))
– Recall: E(x,KE) and D(x,KD) are inverses
– Therefore: anyone w/ public key KE can recover x, verify that Alice must have sent the message
o It provides a signature
– Alice can’t deny it ⇒ non-repudiation
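A minimal sketch of signing with the private key and verifying with the public key. The key values are a toy textbook example rather than anything from the lecture, and the names `sign` and `verify` are illustrative. Practical schemes sign a padded hash of the message instead of the raw message.

```python
# Toy RSA key pair for Alice (illustration only).
n, KE, KD = 3233, 17, 2753

def sign(x: int) -> int:
    """Alice applies her private key: D(x, KD)."""
    return pow(x, KD, n)

def verify(x: int, sig: int) -> bool:
    """Anyone applies Alice's public key and checks that x comes back."""
    return pow(sig, KE, n) == x

x = 1234
sig = sign(x)
print(verify(x, sig))        # the genuine message checks out
print(verify(x + 1, sig))    # an altered message does not
```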

88

Summary of Our Crypto Toolkit

• If we can securely distribute a key, then
– Symmetric ciphers (e.g., AES) offer fast, presumably strong confidentiality

• Public key cryptography does away with (potentially major) problem of secure key distribution
– But: not as computationally efficient
o Use public key crypto to exchange a session key
– And: also not guaranteed secure (but major result if not)
– Strength of popular RSA algorithm rests on the difficulty of factoring large numbers


89

Summary of Our Crypto Toolkit, con’t

• Cryptographically strong hash functions provide major building block for integrity (e.g., SHA-1)
– As well as providing concise digests
– And providing a way to prove you know something (e.g., passwords) without revealing it (non-invertibility)
– But: worrisome recent results regarding their strength

• Public key also gives us signatures
– Including sender non-repudiation

90

Digital Certificate

• Signed data structure that binds an entity with its corresponding public key
– Signed by a recognized and trusted authority, i.e., Certification Authority (CA)
– Provides assurance that a particular public key belongs to a specific entity

• Example: certificate of entity Y
Cert = E({nameY, KYpublic}, KCAprivate)
– KCAprivate: private key of Certificate Authority
– KYpublic: public key of entity Y
– nameY: name of entity Y

• Your browser has a bunch of CAs wired into it


91

Putting It All Together - HTTPS

• https = “Use HTTP over SSL/TLS”
– SSL = Secure Sockets Layer
– TLS = Transport Layer Security
o Successor to SSL, and compatible with it
o RFC 4346

• Provides security layer (authentication, encryption) on top of TCP
– Fairly transparent to the app

92

HTTPS Connection (SSL/TLS), con’t

• Browser (client) connects via TCP to Amazon’s HTTPS server

• Client sends over list of crypto protocols it supports

• Server picks protocols to use for this session

• Server sends over its certificate

• (all of this is in the clear)

[Figure: message exchange between Browser and Amazon]
Browser → Amazon: SYN
Amazon → Browser: SYN ACK
Browser → Amazon: ACK
Browser → Amazon: Hello. I support (TLS+RSA+AES128+SHA1) or (SSL+RSA+3DES+MD5) or …
Amazon → Browser: Let’s use TLS+RSA+AES128+SHA1
Amazon → Browser: Here’s my cert (~1 KB of data)
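The negotiation step can be sketched as a simple selection over the client's offers. The policy used here (take the first client offer the server also supports) is an assumption made for illustration; real servers may apply their own preference order. The suite strings are the ones shown on the slide, and `pick_suite` is a hypothetical helper.

```python
def pick_suite(client_offers, server_supported):
    """Return the first client-offered suite the server also supports, else None."""
    for suite in client_offers:
        if suite in server_supported:
            return suite
    return None

client_offers = ["TLS+RSA+AES128+SHA1", "SSL+RSA+3DES+MD5"]
server_supported = {"TLS+RSA+AES128+SHA1", "SSL+RSA+3DES+MD5"}
chosen = pick_suite(client_offers, server_supported)
print("Let's use", chosen)
```

Because this exchange happens in the clear, nothing in it is secret; it only fixes which algorithms the rest of the session will use.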


93

HTTPS Connection (SSL/TLS), con’t

• Browser constructs a random session key K

• Browser encrypts K using Amazon’s public key

• Browser sends E(K, {n, e}) to server

• Browser displays

• All subsequent communication encrypted w/ symmetric cipher (e.g., AES128) using key K
– E.g., client can authenticate using a password

• (missing? Checking for revocation)

[Figure: message exchange between Browser and Amazon, continued. Messages shown: “Here’s my cert” (~1 KB of data); E(K, {n,e}), after which both sides hold K; “Agreed”; E(password …, K); E(response …, K)]
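The session-key step can be sketched end to end with toy parts. Everything here is an assumption made for illustration: the RSA key is a tiny textbook example, and `keystream_xor` is a stand-in symmetric cipher built from SHA-256 (it is not AES128, and not something to deploy).

```python
import hashlib
import secrets

# Server's toy RSA key pair (tiny numbers for illustration only).
n, e, d = 3233, 17, 2753

def keystream_xor(key: int, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR data with a SHA-256 counter keystream."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(f"{key}:{block}".encode()).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)

# Browser: construct a random session key K and send E(K, {n, e}).
K = secrets.randbelow(n - 2) + 2
wrapped = pow(K, e, n)

# Server: recover K with its private key.
K_server = pow(wrapped, d, n)

# Both sides now hold K; later traffic uses the symmetric cipher.
request = keystream_xor(K, b"password=hunter2")
decoded = keystream_xor(K_server, request)
```

The slow public-key operation is used once, to move K; the bulk of the traffic then goes through the fast symmetric cipher.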

94

Summary of Attacks

• Attacks that compromise a system can occur at different semantic levels
– E.g., buffer overflow vs. cross-site-scripting vs. social engineering
– Automated attacks lead to worms and bots

• Denial-of-service via flooding likewise can occur at different semantic levels
– Network layer vs. transport layer vs. application layer
– Very hard to address if attacker has a lot of zombies


95

Host Compromise

• Tricking a host into executing on your behalf

• Can consider what is attacked (server or client) and the semantic level at which it is attacked

• Violation of program semantics:
– E.g., buffer overflow

• Exploiting logic errors
– E.g., code injection attacks
– No violation of program semantics

• Social engineering
– E.g., phishing

96

Example: Buffer Overflow

void get_cookie(char *packet) {
    /* . . . (200 bytes of local vars) . . . */
    munch(packet);
    /* . . . */
}

void munch(char *packet) {
    int n;
    char cookie[512];
    /* . . . */
    /* code here computes offset of cookie in packet, stores it in n */
    strcpy(cookie, &packet[n]);   /* no length check: can write past cookie[512] */
    /* . . . */
}

Stack

Address range         Contents
X .. X + 200          get_cookie()’s stack frame
X - 4 .. X            return address back to get_cookie()
X - 8 .. X - 4        n
X - 520 .. X - 8      cookie (value read from packet)
X - 524 .. X - 520    return address back to munch()


97

Example: Buffer Overflow

void get_cookie(char *packet) {
    /* . . . (200 bytes of local vars) . . . */
    munch(packet);
    /* . . . */
}

void munch(char *packet) {
    int n;
    char cookie[512];
    /* . . . */
    /* code here computes offset of cookie in packet, stores it in n */
    strcpy(cookie, &packet[n]);   /* no length check: can write past cookie[512] */
    /* . . . */
}

Stack (after the overflow)

Address range         Contents
X .. X + 200          Executable code
X - 4 .. X            X (overwritten return address)

Now branches to code read in from the network

From here on, machine falls under the attacker’s control
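The stack picture can be modeled with a byte array to see why an overlong cookie reaches the return address. This is a simplified model rather than real machine behavior: the address value for X and the "legitimate" return address are made up, addresses are taken to be 4 bytes, little-endian, and the model ignores the fact that a real strcpy() stops at the first zero byte.

```python
import struct

X = 0x1000                     # hypothetical address where get_cookie()'s frame starts
BASE = X - 524                 # lowest address in the slide's diagram
stack = bytearray(524 + 200)   # models addresses X - 524 .. X + 200

def write(addr: int, data: bytes) -> None:
    """Copy data to addr with no check against the destination's size."""
    off = addr - BASE
    stack[off:off + len(data)] = data

def read_u32(addr: int) -> int:
    return struct.unpack_from("<I", stack, addr - BASE)[0]

COOKIE, RET = X - 520, X - 4   # start of cookie[512]; return address to get_cookie()

write(RET, struct.pack("<I", 0x0804ABCD))   # made-up legitimate return address

# Oversized "cookie": 512 filler bytes, 4 bytes that land on n,
# 4 bytes that land on the return address, then bytes that land at X.
code = b"CODE" * 4
payload = b"A" * 512 + b"BBBB" + struct.pack("<I", X) + code
write(COOKIE, payload)         # the effect of the unchecked copy

print(hex(read_u32(RET)))      # return address now holds X
```

After the copy, the saved return address holds X and the bytes at X are the ones supplied in the packet, which is the state the second stack diagram shows.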

98

Semantic Level of Compromise, con’t

• Logic errors

• E.g., suppose your Web server passes any argument named “rev” in a URL request to a backend script called munch via the equivalent of

sh munch $rev

and returns its output

• Now suppose you receive the following request:

GET /bin/TWikiUsers?rev=2%20|more%20/etc/passwd

It decodes to:

$rev = “2 |more /etc/passwd”


99

Logic Errors, con’t

• Your script is invoked as

sh munch 2 |more /etc/passwd

which returns as output the password file.

• This is a command injection attack (one kind of code injection)
• Similar in spirit: “cross-site scripting” attacks, and “SQL injection” attacks on backend databases

• Note: no violation of programming semantics!

⇒ Very hard to detect. Need to understand intended semantics.
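The decoding step and the resulting command line can be reproduced with Python's standard library. This is a sketch of the mechanism and of one common mitigation (quoting the argument so the shell treats it as a single word), not a description of how any particular Web server behaves; only the command strings are built, nothing is executed.

```python
import shlex
from urllib.parse import urlsplit, parse_qs

request_target = "/bin/TWikiUsers?rev=2%20|more%20/etc/passwd"
rev = parse_qs(urlsplit(request_target).query)["rev"][0]   # percent-decoding happens here

unsafe_cmd = "sh munch " + rev                # the shell would see the pipe
safer_cmd = "sh munch " + shlex.quote(rev)    # rev stays one argument

print(repr(rev))
print(unsafe_cmd)
print(safer_cmd)
```

Avoiding the shell altogether, by passing the arguments as a list to the process-launching call, removes the problem at its source; either way the fix depends on knowing what the argument was meant to be.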

100

Automated Compromise: Worms & Bots

• When attacker compromises a host, they can instruct it to do whatever they want

• Instructing it to find more vulnerable hosts creates a worm: a program that self-replicates across a network
– Can spread via picking random 32-bit #s (IP addresses)
– … but this isn’t fundamental

• As the worm repeatedly replicates, it grows exponentially fast because each copy of the worm works in parallel to find more victims

• Attacker can instead install a bot to facilitate future access to the system
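The exponential growth claim can be illustrated with a simple expected-value model of random scanning. The model and its parameters (number of vulnerable hosts, probes per time step) are assumptions chosen for illustration, not measurements from the lecture.

```python
def worm_growth(vulnerable, scans_per_step, steps, infected=1.0):
    """Expected number of infected hosts per step under uniform random 32-bit scanning."""
    history = [infected]
    for _ in range(steps):
        # Chance that one random probe hits a vulnerable, not-yet-infected host.
        hit_rate = (vulnerable - infected) / 2**32
        infected = min(vulnerable, infected + infected * scans_per_step * hit_rate)
        history.append(infected)
    return history

history = worm_growth(vulnerable=360_000, scans_per_step=10_000, steps=40)
for step in (0, 5, 10, 20, 40):
    print(step, round(history[step]))
```

Early on each step multiplies the infected population by a roughly constant factor, since every copy scans in parallel; growth flattens only once most vulnerable hosts are already infected.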


101

Summary of Denial-of-Service

• Can occur at different semantic levels
– Network layer vs. transport layer vs. application layer
– Very hard to address if attacker has a lot of zombies

• Principle: attacker finds bottleneck element …
– … and sends it more work than it can cope with

• E.g.:
– Router’s packets-per-second processing capability
– Link’s bits-per-second transmission capability
– End host’s memory available for new connections …
– … or cycles available to validate connections (cookies)
– Server’s cycles for processing requests

102

Distributed Denial-of-Service (DDoS)

[Figure: a Master sends control traffic that directs Slaves 1 through 4 at the Victim; the slaves send streams of traffic (perhaps spoofed: src = random, dst = victim) to the victim]


103

Defending Against Network Flooding

• How do we defend against such floods?

• Answer: basically, we don’t! Big problem today!

• Techniques exist to trace spoofed traffic back to origins, but this isn’t useful in face of a large attack

• Techniques exist to filter traffic, but a well-designed flooding stream defies stateless filtering

• Best solutions to date:
– Overprovision - have enough raw capacity that it’s hard to flood your links
o Largest confirmed botnet to date: 1.5 million hosts
o Floods seen to date: 40+ Gbps
– Distribute your services - force attacker to flood many points
o E.g., the root name servers
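A quick back-of-the-envelope calculation shows why overprovisioning is hard. The two slide figures are combined here only for scale (they need not come from the same attack), and the 1 Gbps link and 128 kbps per-bot uplink in the second part are hypothetical values.

```python
import math

BOTNET_HOSTS = 1_500_000   # largest confirmed botnet cited on the slide
FLOOD_BPS = 40e9           # 40 Gbps flood cited on the slide

# If a botnet that size produced such a flood, each bot would need only this much:
per_host_bps = FLOOD_BPS / BOTNET_HOSTS
print(f"{per_host_bps / 1e3:.1f} kbps per bot")

def bots_needed(link_bps: float, per_bot_bps: float) -> int:
    """Bots required to saturate a link if each sends per_bot_bps."""
    return math.ceil(link_bps / per_bot_bps)

print(bots_needed(1e9, 128e3))   # hypothetical 1 Gbps link, 128 kbps per bot
```

Even modest per-host rates add up to more than most sites can provision for, which is why distributing the service across many points is the other defense listed.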