IP MPLS Virtual Private Networks Presented by: Chris Chase

Usda Training Mpls


Page 1: Usda Training Mpls

IP MPLS Virtual Private Networks

Presented by: Chris Chase

Page 2: Usda Training Mpls

2

MPLS Concept

Outgrowth of IP Switching (e.g., MPOA, Ipsilon’s IP Switching, Cisco’s tag switching)

Key concept:

Separate routing (the selection of paths through the network) from forwarding/switching plus an abstraction of aggregation

Page 3: Usda Training Mpls

3

Non-MPLS Routing

• Hierarchical topology - edge and backbone routers

• Forward packet - lookup route at each hop

BFR - Big Fast Router
PER - Provider Edge Router
CR - Customer Router

[Figure: customer routers (CR) attach to provider edge routers (PER), which connect through a backbone of BFRs in the provider router network. Labels A and A.1 mark the example destination.]

Packet forwarded by hop-by-hop route lookup

Routes chosen using OSPF interior routing protocols

Page 4: Usda Training Mpls

4

Routing with MPLS

• Interior routes are assigned Labels that identify a connection/path. Called a Label Switched Path (LSP) instead of a PVC.

LSR - Label Switch Router
PER - Provider Edge Router
CR - Customer Router

[Figure: CRs attach to PERs around a core of LSRs; traffic toward destination A follows an LSP across the core.]

Routes chosen using OSPF interior routing protocols

LSP: Route lookup once and associated label assigned to packet


Page 6: Usda Training Mpls

6

Routing with MPLS

• Traffic Engineering – can use alternative to the IGP shortest path

LSR - Label Switch Router
PER - Provider Edge Router
CR - Customer Router

[Figure: CRs attach to PERs around a core of LSRs; an LSP takes a path other than the IGP shortest path.]

Routes chosen using OSPF interior routing protocols

LSP: Route lookup once and associated label assigned to packet


Page 8: Usda Training Mpls

8

MPLS: Decouples routing and forwarding

IP packet header only examined at ingress PER

Hierarchy of routing/Label Stacking (callout: VPN & Scale)
– Interior knows nothing about external addresses or routes
– Only needs to know how to get between edges (PERs)

Enables very efficient explicit routing (callout: Traffic Engineering)
– Explicit routing in IPv4/v6 is expensive
– Use explicit route for LSP instead of OSPF route

Page 9: Usda Training Mpls

9

History

IP cut through switching
– Improve performance and provide QoS to IP
• Multiprotocol over ATM (MPOA)
• Ipsilon’s IP Switching
• Ascend’s IP Navigator
• Cisco’s tag switching
• IBM’s Aris

1997
– Needed alternative to SVC service for FR and ATM
• Provider based IP VPN concept conceived
– MPLS work initiated at IETF
• A technology solution looking for a problem

Page 10: Usda Training Mpls

10

Killer Applications of MPLS

– IP VPNs
• Provider based, simple, scalable, “layer 2” security
• Overlapping, private addressing plans
– Layer 2 VPNs – FR, Ethernet, Circuit services
– Traffic Engineering
• Deliver service guarantees similar to FR/ATM
• Fast reroute
– Hierarchical Networks
• Carrier’s carrier
– Universal control plane
• Label = Optical (Lambda), Sonet/TDM, Spatial (ports/conduits)
• GMPLS and “Optical UNI”

* Not really an advantage: Performance

Page 11: Usda Training Mpls

11

The Basics

Page 12: Usda Training Mpls

12

Generic MPLS Encapsulation

• MPLS does not define a link layer protocol – no framing provided
• A “shim” header between link and network protocol
• New LLC and PID defined for Ethernet, PPP, ATM, FR to carry label
• Can stack tags/labels. Stack bit indicates end of stack.
• There is no protocol ID field to indicate type of encapsulated packet.
  • Protocol of encapsulated packet is implied by the label
  • Indicated when the label is signaled (next slide)

Layer 2 Header | PID | MPLS Label 1 | MPLS Label 2 | … | MPLS Label n | Layer 3 Packet

Label (20 bits) | CoS (3 bits) | Stack (1 bit) | TTL (8 bits)
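As a hedged illustration of the 32-bit entry layout above, the following Python sketch packs and unpacks label stack entries. The function names and the sample label values (17, 1004) are assumptions made for this example and do not come from the slides.

```python
import struct

def encode_entry(label, cos, bottom, ttl):
    """Pack one entry: Label (20 bits) | CoS (3 bits) | Stack (1 bit) | TTL (8 bits)."""
    if not (0 <= label < 2**20 and 0 <= cos < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (cos << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)  # network byte order, 4 bytes

def decode_stack(data):
    """Read entries until the stack bit marks the end of the stack."""
    entries, offset = [], 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entry = {
            "label": word >> 12,
            "cos": (word >> 9) & 0x7,
            "bottom": bool((word >> 8) & 0x1),
            "ttl": word & 0xFF,
        }
        entries.append(entry)
        if entry["bottom"]:
            # Nothing in the shim says what follows; the label implies it.
            return entries, data[offset:]

# Two stacked labels in front of an opaque payload (hypothetical values).
frame = encode_entry(17, 0, False, 64) + encode_entry(1004, 5, True, 64) + b"payload"
entries, payload = decode_stack(frame)
print([e["label"] for e in entries], payload)
```

The decoder stops only on the stack bit, which mirrors the point made on the slide: there is no protocol ID field, so the receiver must already know what the bottom label carries.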

Page 13: Usda Training Mpls

13

Forwarding Equivalence Class (FEC) and Hierarchy

FEC = All packets with the same forwarding requirements
– i.e., same path, same QoS (policing, scheduling, discard)
• COS bits can modify packet handling
– Many different FEC types:
• IPv4, IPv6, FR, ATM, Ethernet VLAN

FEC label – all packets in this class get the same label

Can stack labels (end of stack bit)

Hierarchy of equivalence classes
• Hierarchy of routing
• VPNs – L3 and L2
• Traffic engineering

Page 14: Usda Training Mpls

14

Multi-protocol

Forwarding/Switching is content agnostic
– Can carry IP, FR, ATM, Ethernet, anything
– Label represents base common treatment shared by all packets with that label (FEC)

Control Plane (Routing and signaling) is content agnostic
– IP control plane
• Routing – OSPF, IS-IS, BGP, PIM
• Signaling – LDP, CR-LDP, RSVP-TE, BGP+ext, PIM+ext
• CoS – Diff-serv

Many Layer 2 technologies, e.g., FR and ATM, have been fitted to MPLS

MPLS is not ATM
– But ATM switches can be MPLS switches

Page 15: Usda Training Mpls

15

Standards

IETF
– First RFCs
• 2702 (TE reqs), 3031 (arch), 3032 (stack encoding), 3034 (FR), 3035 (ATM VC), 3036 (LDP), 2547 (VPN), 3107 (BGP), etc.
– Drafts: GMPLS, BGP, Multicast, Fast Recovery, L2 VPNs, …
– http://www.ietf.org/html.charters/mpls-charter.html
– L3 VPN:
• http://www.ietf.org/html.charters/ppvpn-charter.html
• draft-ietf-ppvpn-rfc2547bis-04.txt
– L2 VPN
• http://www.ietf.org/html.charters/pwe3-charter.html

Additional ITU work, MPLS and ATM Forum

Page 16: Usda Training Mpls

16

Layer 3 MPLS VPN
The Next Generation IP WAN

Based on 2547 draft

Another tool in the WAN toolbox for the network architect

Page 17: Usda Training Mpls

17

Traditional Point-to-Point WANs

Rely on a hub architecture

[Figure: five sites (C), each connected point-to-point to a single hub (H).]

Page 18: Usda Training Mpls

18

Dual Star - Redundancy

[Figure: four sites (C), each connected to two hubs (H).]

Page 19: Usda Training Mpls

19

Aggregation/Distribution Layer
Scaling through hierarchy

[Figure: sites (C) connect to hubs (H), which in turn connect to two aggregation nodes (A).]

Page 20: Usda Training Mpls

20

Domains of Enterprise WANs

Private lines

FR/ATM VC
– Private line replacement
– Hub-and-spoke
– Very reliable, trusted, common

Site-to-site Internet VPN (i.e., IPSEC tunnels)
– Point-to-point topologies (typically hub-and-spoke)
– Extranets (also SSL)
– Remote access
– Footprint
– Outsourced versus do-it-yourself

L3 MPLS VPN
• Layer 3 IP routing “outsourced to carrier”
• Following slides

They complement each other

Page 21: Usda Training Mpls

L3 MPLS VPN – 2547 style

Provider-based VPN
• Vis-à-vis CPE-based tunneling VPN, e.g., L2TP with IPSEC
• Others: Virtual router VPNs, Layer 2 MPLS VPNs

IP MPLS VPN defined as a set of interfaces
– Interface: PPP, FR/ATM VC, Ethernet VLAN, L2TP
– VPN membership assigned when provisioned

Customer interface: standard IP, no MPLS

VPN appears as an Autonomous System (AS)
– Customer router peers with this AS - a transit only AS in between customer’s sites
– “Private” - separated from other VPNs

Like having your own little “Internet”

Page 22: Usda Training Mpls

MPLS VPN Layer 3 IP Architecture

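[Figure: CER – PER – LSR core (MPLS network) – PER – CER. The access is an IP serial link encapsulated in PPP, FR/ATM PVC, or Ethernet. CER to PER routing uses BGP, another protocol, or static routes. The PERs exchange routes over IBGP, and OSPF runs inside the MPLS network.]

PER = Provider Edge Router
CER = Customer Edge Router
LSR = Label Switch Router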

Page 23: Usda Training Mpls

23

MPLS VPN Value Adds vs. Other VPNs

Any to Any IP Connectivity – Optimal Routing without SVCs
• Improved delay by avoiding tandem routing through a hub
• Offload hub router

Any IP address scheme - Intranets and extranets

Circuit Consolidation – eliminate aggregation layer

Diversity via IP routing - simplified DRO

Ease of network expansion

Access technology agnostic
– FR, ATM, PPP over DS0-OC48; Ethernet

IP Class of Service

Provider-based IP VPN
• No CPE-based tunneling and encryption equipment/software nor PKI management.

Page 24: Usda Training Mpls

24

Combined Services

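[Figure: combined services. CERs reach the IP MPLS VPN over FR/ATM/DSL, Ethernet, or P-L PPP access. Also shown: a service gateway and firewall (FW) toward the Internet; a generic gateway toward the remote access network (dial, cable, DSL), reached by IPSEC tunnel or L2TP (optionally IPSEC); and an international MPLS VPN. Caption: MPLS VPN and Edge VPN (IP-VPN).]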

Page 25: Usda Training Mpls

25

Load Balancing: From VPN toward customer

[Figure: CE1 through CE6 attach to four PEs of the MPLS VPN. The customer site holding Network A (route A) is attached over Link 1 and Link 2, each running BGP; a point-to-point link is also shown.]

All flows from remote CEs (3-6) matching route A will load balance across Links 1 and 2 (even from CE4). Note: the load balancing decision is made (using MPLS) at the ingress to the network.

Page 26: Usda Training Mpls

26

Outbound Route Filtering (ORF)

Allows a CER to communicate a route filter to the PER
– Dynamically transferred through BGP

[Figure: CER – AC – PER – MPLS, with BGP between CER and PER. The CER’s inbound filter (permit 0/0, deny all) is installed as the PER’s outbound filter (permit 0/0, deny all).]

BGP Route Refresh message carries any inbound prefix-based filter

Any inbound prefix-based filter is applied as an outbound filter on the PER

PER = Provider Edge Router
CER = Customer Edge Router
AC = Access Connection
BGP = Border Gateway Protocol
MPLS = Multi Protocol Label Switching

Page 27: Usda Training Mpls

27

Class of Service Concepts

The ability for user to differentiate traffic
– A provider could differentiate in many ways:
• Isolation - keep traffic in different classes from unfairly impacting each other
• Performance
  • Bandwidth
  • Delay
  • Discard
• Service
  • Availability
  • Support
– Network engineer view as a toolset to manage traffic
• As opposed to the marketing/management view around perception

Page 28: Usda Training Mpls

28

IP Header Class of Service Marking

Bit offsets: 0, 8, 16, 24, 32

Version | IHL | Type of Service/Diffserv codepoint | Total length
Identification | Flags | Fragment Offset
TTL | Protocol | Header Checksum
Source Address
Destination Address

Type of Service byte:
old: Prec (3 bits) | D | T | R | x | x
new: DSCP (6 bits) | x | x
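The old and new readings of the same byte can be compared with a small Python sketch. The helper names are assumptions for illustration; the codepoint values used (AF11 = 10, CS5 = 40) are the standard Diffserv ones.

```python
def precedence(tos_byte):
    """Old interpretation: the top 3 bits are IP Precedence."""
    return tos_byte >> 5

def dscp(tos_byte):
    """New interpretation: the top 6 bits are the Diffserv codepoint."""
    return tos_byte >> 2

AF11 = 10   # binary 001010
CS5 = 40    # binary 101000, a class selector that mirrors precedence 5

for name, codepoint in (("AF11", AF11), ("CS5", CS5)):
    tos = codepoint << 2          # place the codepoint in the top 6 bits
    print(name, hex(tos), "dscp", dscp(tos), "precedence", precedence(tos))
```

A router that only understands precedence still sees a sensible class (1 for AF11, 5 for CS5), which is why the two markings can coexist on the same byte.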

Page 29: Usda Training Mpls

29

CoS via IP Packet Marking

CE classifies traffic per packet via marking
– IP Diffserv Codepoints (Precedence bits)

Marking interpreted
– Separate queuing per class
– Per class resource scheduling, e.g.,
• Priority queuing
• Bandwidth scheduling (WFQ)
– Drop differentiation
• Packets marked “discard eligible” above class bandwidth
• Transmitted when not congested

Page 30: Usda Training Mpls

30

VPN CoS

[Figure: CER – PER – LSR core – PER – CER along an MPLS LSP. Callouts: classification (application/policy level) and session control (H.323, SIP, gatekeeper) at the customer edge; a per-class policer on the PER port; class servicing with priority (PQ) and bursty queues on the access interface and on the trunk.]

Page 31: Usda Training Mpls

31

CoS Marking Transparency Using MPLS

Users don’t want packet markings to change
– And some older systems’ TCP breaks

Provider can indicate reclassification by marking label instead of remarking IP packet

[Figure: a packet with DSCP=AF11 enters the policer. It leaves either as Label | CoS=AF11 | IP, DSCP=AF11 or, when reclassified, as Label | CoS=AF12 | IP, DSCP=AF11. The IP DSCP is unchanged in both cases.]

Page 32: Usda Training Mpls

32

RFC2547 Constrained Route Distribution

Route Targets (RT) are used to constrain connectivity
– Keeps VPNs separate
– Creates topology within VPN

Concept of Hub and Spoke route policies used as building blocks
– Hubs and spokes have a certain RT import and export list
– Hub sites can see other hub and spoke routes
– Spokes see only hub routes

Combine/compose to create VPN topology
– Union of multiple hubs and spokes creates arbitrary topologies

These slides don’t show the explicit RT import/export lists
– Hub types are shown as Hi and spoke types as Si

Page 33: Usda Training Mpls

Any-to-Any Topology

[Figure: three H0 interfaces attached to the VPN, each exchanging routes with the others.]

Page 34: Usda Training Mpls

34

Hub and Spoke Topology

Hi = “hub” interfaces, Si = “spoke” interfaces. The terms “hub” and “spoke” just refer to how routes are constrained. Si can only exchange routes with Hi. Hi exchanges routes with all Hi and with Si. Specifically in terms of route targets (RTs), Hi exports RT_Hi and imports {RT_Hi, RT_Si}, while Si exports RT_Si and imports RT_Hi.

[Figure: a VPN with interfaces H1, H1, H2, S1, S1, S2, S2.]
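The import/export rule stated above can be checked with a short Python sketch. It is only an illustration: the site names and the `visible_routes` helper are hypothetical, and a real PER applies these lists to BGP routes rather than to whole sites.

```python
def hub(i):
    """Hi exports RT_Hi and imports {RT_Hi, RT_Si}."""
    return {"export": {f"RT_H{i}"}, "import": {f"RT_H{i}", f"RT_S{i}"}}

def spoke(i):
    """Si exports RT_Si and imports RT_Hi."""
    return {"export": {f"RT_S{i}"}, "import": {f"RT_H{i}"}}

def visible_routes(sites):
    """For each site, list the other sites whose routes it imports.

    A route is imported when any RT it was exported with appears
    in the importer's import list.
    """
    return {
        name: {
            other
            for other, other_cfg in sites.items()
            if other != name and other_cfg["export"] & cfg["import"]
        }
        for name, cfg in sites.items()
    }

# Hypothetical VPN with two H1 hubs and two S1 spokes.
sites = {"hub_a": hub(1), "hub_b": hub(1), "spoke_a": spoke(1), "spoke_b": spoke(1)}
seen = visible_routes(sites)
print(sorted(seen["spoke_a"]))  # hubs only
print(sorted(seen["hub_a"]))    # the other hub and both spokes
```

Because spokes never import RT_S1, spoke-to-spoke routes are simply absent; no packet filter is involved.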

Page 35: Usda Training Mpls

35

Hub and Spoke Topology

Here we combine connectivity policies. Using H0, all hubs talk to each other. By taking such unions of policies, completely arbitrary bi-directional connectivity graphs can be realized (in fact completely arbitrary uni-directional graphs could be achieved, which might be applicable for something like a firewall).

[Figure: a VPN with interfaces H1, H1, H2, S1, S1, S2, S2, with the hubs additionally joined through H0.]

Page 36: Usda Training Mpls

Note – RT’s only Constrain Routes

This does not use access lists that filter packets!
– BGP MPLS VPN technology constrains route distribution (i.e., connectivity); it does not require per-packet manipulation (which neither scales nor manages well).
– Packets always follow routes (in the reverse direction of route flow)

But constraining a specific route is not sufficient to constrain the reachability of a destination matching the route!
– Overlapping routes (e.g., aggregates or defaults) can cause problems.

36

Page 37: Usda Training Mpls

Tradeoffs of L3 MPLS VPN

37

For all the IP value-adds, the drawbacks are:
– Have to route with provider!
• Troubleshooting is more difficult than L2 WANs
  • L2 has a clear demarc: connection up or down.
• Convergence is slower
  • Route changes have to propagate through provider routers
• Certain routing problems are more difficult to solve
  • Some problems are more easily solved with direct topology manipulation
  • E.g., hub connectivity based on source
• Peering model rather than flat model – a different paradigm
– Only IP: no IPX, SNA, DECNET, Appletalk
• Have to tunnel
– Technology not as mature

Page 38: Usda Training Mpls

Customer Support

38

Layer 2 services are easier to support
– A customer doesn’t call if his FR-connected routers aren’t seeing the same set of routes

Layer 3 VPN
– Customer calls – “I can’t see my route. Help me troubleshoot my network.”
– Customer visible provider-based tools can help sectionalize and show customer whether there is a problem with provider network – without getting a technician on the line.

Page 39: Usda Training Mpls

39

Comments about CER-PER Protocol

2547 VPNs are a peering architecture!
– Static
• Stable, but fill out order form
• Can only detect local link failure
– BGP
• Many policies; geared towards multihoming; peering
• Load balancing and ORF
– OSPF
• Changes intra-area to inter-area; backdoor always preferred
– EIGRP
• Proprietary
• Without ability to summarize, need ability to avoid going active
• CER acts as stub; can loop (count to infinity) without new feature

Page 40: Usda Training Mpls

How MPLS and 2547 VPNs Work

40

Quick BGP intro

LDP operation

2547 VPNs
– Follow the VPN route and label
– Follow the packet

Page 41: Usda Training Mpls

41

BGP Basics

BGP - fairly simple protocol
– Uses TCP for reliable delivery
– Distributes appearance/change and withdraw/disappearance of routes in route table
• Routing Information Base (RIB) = BGP route table
– eBGP = between AS
– iBGP = within AS

AS_PATH – list of where route has been

Next hop

Other attributes are about policy, i.e., which route is “best”

[Figure: R1 and R2, each with its own RIB, joined by a TCP connection carrying the messages “A| z” and “W| s,r”.]

Page 42: Usda Training Mpls

Route Reflector – scaling IBGP among AS edges

42

[Figure: two route reflectors (RR) peer with each other over IBGP; the PEs are IBGP clients of the RRs; the CEs peer with the PEs over EBGP.]

Page 43: Usda Training Mpls

Route Reflector

43

Updates in RIB from inbound peer type are sent to outbound peer type in table below

Client is a special kind of IBGP neighbor
– Any BGP router with a neighbor designated as a client is a “route reflector”

Inbound \ Outbound | EBGP | IBGP | Client
EBGP               |  X   |  X   |  X
IBGP               |  X   |      |  X
Client             |  X   |  X   |  X
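The table can be expressed as a lookup, sketched here in Python. The placement of the two marks in the IBGP row (toward EBGP and Client, not toward other non-client IBGP peers) follows the standard route reflection rule; the names `ADVERTISE` and `should_advertise` are assumptions for this example.

```python
# Key: peer type the update was learned from.
# Value: peer types the update is sent on to.
ADVERTISE = {
    "ebgp": {"ebgp", "ibgp", "client"},
    "ibgp": {"ebgp", "client"},            # never back to non-client IBGP peers
    "client": {"ebgp", "ibgp", "client"},  # reflected to everyone
}

def should_advertise(learned_from, send_to):
    """Return True when an update from one peer type goes to another."""
    return send_to in ADVERTISE[learned_from]

for inbound in ("ebgp", "ibgp", "client"):
    row = ["X" if should_advertise(inbound, out) else "-" for out in ("ebgp", "ibgp", "client")]
    print(f"{inbound:7}", " ".join(row))
```

The single empty cell is what removes the need for a full IBGP mesh among the PEs: only the reflector relays client routes.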

Page 44: Usda Training Mpls

A Label Switched Path – LSP

44

The downstream node assigns label

Often called an MPLS tunnel: payload headers are not inspected inside of an LSP.

[Figure: a label switched path from “head end” to “tail end”: data → PUSH 233 → SWAP to 666 → SWAP to 417 → POP → data.]
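A minimal Python sketch of the push, swap, swap, pop sequence follows. The label values are the ones in the figure; the node names and the table layout are hypothetical, chosen only to show that each hop looks at the label and never at the payload.

```python
# Label pushed by the head end; it was assigned by the downstream neighbor.
INGRESS_PUSH = 233

# Per-node label tables along the path (node names are assumptions).
LABEL_TABLES = [
    ("lsr_a", {233: ("swap", 666)}),
    ("lsr_b", {666: ("swap", 417)}),
    ("tail_end", {417: ("pop", None)}),
]

def walk_lsp(payload):
    """Return the label carried after each node, plus the untouched payload."""
    label = INGRESS_PUSH
    seen = [label]
    for _node, table in LABEL_TABLES:
        operation, out_label = table[label]  # lookup keyed on the label only
        label = out_label if operation == "swap" else None
        seen.append(label)
    return seen, payload

labels_on_links, delivered = walk_lsp("data")
print(labels_on_links, delivered)
```

Each table is keyed by the incoming label that the node itself handed out, which is the sense in which the downstream node assigns the label.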

Page 45: Usda Training Mpls

LSP’s are Unidirectional

45

Destination FEC based
– Can’t distinguish upstream sender
– LSPs merge
– Results in multipoint-to-point LSPs

[Figure: LSP merge. Packet IP1 (labels 233, 823) and packet IP2 (labels 565, 912) enter from different upstream nodes and both carry label 417 after the merge point.]

Page 46: Usda Training Mpls

Penultimate Hop Popping

46

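[Figure, without penultimate hop popping: IP → PUSH 233 → SWAP to 666 → SWAP to 417 → POP + IP lookup at the last hop.]

[Figure, with penultimate hop popping: IP → PUSH 233 → SWAP to 666 → POP at the penultimate hop → IP lookup at the last hop.]

Look up the label, pop, then look up the header underneath
– Why even bother sending a label?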

Page 47: Usda Training Mpls

47

Follow the Route and Follow the Packet

CR1 at Site 1 has a packet addressed to a host in network Z at Site 2. How does it get there?

[Figure: Site 1 (CR1) – PER1 – LSR1, LSR2, LSR3 in the MPLS VPN cloud – PER2 – Site 2 (CR2, Network Z).]

Page 48: Usda Training Mpls

48

Label Distribution in Interior

For links configured for label switching
– Node sends out periodic UDP Hellos to the “all routers on this subnet” multicast group on the well-known LDP UDP port
• Contains IP address for desired LDP session and desired label space
– Creates a TCP-based LDP session to any node answering Hello
• Session is initiated by the node with the higher advertised IP address (the active role in the LDP specification)

Each node advertises routes (FEC) and labels to all LDP peers
– Node picks a local label for each route in its routing table and advertises this to everyone
• Called downstream unsolicited, independent control mode
• The upstream node only installs a label into its label forwarding table when the advertising downstream node is the next hop for the route (FEC)
• Note in this mode if there is a reroute that changes the next hop the label is just rebound locally – no signaling upstream

Much faster reroute compared to ATM, but local label assignment does not guarantee LSP is in place
• ATM-LSRs work differently
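The “install only the next hop’s label, rebind locally on reroute” behaviour can be sketched as follows. This is an illustration under stated assumptions: the node keeps every binding it hears from every peer, and the FEC name, peer names, label numbers and the `build_lfib` helper are all hypothetical.

```python
def build_lfib(local_labels, bindings, next_hops):
    """Build the label forwarding table from everything LDP has advertised.

    local_labels: FEC -> label this node advertised to its peers
    bindings:     peer -> {FEC -> label that peer advertised}
    next_hops:    FEC -> peer chosen by the IGP
    Returns incoming label -> (outgoing label, next hop).
    """
    lfib = {}
    for fec, in_label in local_labels.items():
        peer = next_hops.get(fec)
        out_label = bindings.get(peer, {}).get(fec)
        if out_label is not None:  # only the next hop's binding is installed
            lfib[in_label] = (out_label, peer)
    return lfib

local_labels = {"PER2": 18}
bindings = {"LSR2": {"PER2": 51}, "LSR3": {"PER2": 77}}  # heard from all peers

before = build_lfib(local_labels, bindings, {"PER2": "LSR2"})
# The IGP reroutes: the same stored bindings are reused, nothing is re-signaled.
after = build_lfib(local_labels, bindings, {"PER2": "LSR3"})
print(before, after)
```

The incoming label (18) does not change across the reroute, so upstream neighbors need no update; only the local outgoing binding moves.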

Page 49: Usda Training Mpls

49

LSP Setup for OSPF Route to PER2

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 in the IPFR cloud, showing the LSP for the OSPF route to reach PER2. Label table entries shown: PER2 → L1; L1 → L2; L2 → pop; L4 → L2.]

Li - labels requested via LDP from next hop neighbor for each routing table entry

Page 50: Usda Training Mpls

50

How MPLS VPNs Work

1) Follow the routes
– Each VPN on a PER has a private routing table
• Called a Virtual Routing and Forwarding (vrf) table
• vrf is assigned attributes that are unique to the VPN
  • Route Targets (RT) - attached to VPN routes.
    • only vrfs with common RTs share routes with each other
  • Route Distinguishers (RD) - prepended to routes to ensure uniqueness even if VPNs have overlapping address spaces
    • Creates a new address family called vpnv4 = RD+ipv4
• Note: RTs and RDs are not applied to packets

2) Follow the packet
– A stack of two labels is used to forward the packet on the interior LSP and then external interface

Page 51: Usda Training Mpls

51

VPN extensions

Route Target (RT)
– BGP 64 bit extended community value
– First 16 bits identify it as RT type. The other 48 bits are variable
• Conventional format – ASN:X, i.e., 16b:32b

Route Distinguisher (RD)
– 64 bit, first 16 bits identify RD type
• 48 bits selectable with format convention ASN:X, i.e., 16b:32b
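A hedged Python sketch of the ASN:X encodings follows. It assumes the two-octet-AS forms (extended community type 0x00 with subtype 0x02 for the RT, and RD type 0); the ASN 65000, the numbers 1 and 2, and the helper names are made up for the example.

```python
import ipaddress
import struct

def route_target(asn, value):
    """8-byte extended community: 16 bits of type, then ASN (16b) : X (32b)."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, value)

def route_distinguisher(asn, value):
    """8-byte RD: 16-bit type 0, then ASN (16b) : X (32b)."""
    return struct.pack("!HHI", 0, asn, value)

def vpnv4(rd, prefix):
    """vpnv4 = RD + ipv4: the RD is placed in front of the IPv4 prefix."""
    network = ipaddress.ip_network(prefix)
    return rd + network.network_address.packed, network.prefixlen

rt = route_target(65000, 1)
rd_a = route_distinguisher(65000, 1)
rd_b = route_distinguisher(65000, 2)

# Two VPNs using the same private prefix stay distinct as vpnv4 routes.
route_a = vpnv4(rd_a, "10.0.0.0/8")
route_b = vpnv4(rd_b, "10.0.0.0/8")
print(rt.hex(), route_a[0].hex(), route_b[0].hex())
```

Only routes carry these values; as the previous slide notes, RTs and RDs are never applied to packets.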

Page 52: Usda Training Mpls

52

Distributing Customer Routes

[Figure: CR1 – (LNK1) – PER1 – LSR1, LSR2, LSR3 – PER2 – (LNK2) – CR2 – Network Z, inside the IPFR cloud. LSP shown; Li - labels.]

PER2 learns Rt Z via BGP or is statically configured with Rt Z.

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → LNK2

Page 53: Usda Training Mpls

53

Customer Routes Distributed via IBGP with Label

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – (LNK2) – CR2 – Network Z, inside the IPFR cloud. LSP shown; Li - labels.]

IBGP msg: RD1+Z, L4, RT1, PER2

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2

Page 54: Usda Training Mpls

54

Only vrfs with Matching RTs Import Route

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – (LNK2) – CR2 – Network Z, inside the IPFR cloud. LSP shown; Li - labels.]

PER1 state:
LNK1 data: vrf1
vrf1: RT1, RD2
table: Rt Z → L4, PER2
       PER2 → L1, LSR1

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2

Page 55: Usda Training Mpls

55

Purpose of BGP Label

Indicates which vrf and optionally which interface on the egress PER

Locally, the egress PER will treat labels in two possible ways:

– Non-aggregate label is associated with an external route

• Will be switched directly to an outgoing interface• IP header is not examined

– Aggregate label is associated with a locally originated or directly connected route

• Packet will be looked up in the vrf context

Page 56: Usda Training Mpls

56

CR1 learns Rt Z via BGP (or statically configured)

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 – Network Z, inside the IPFR cloud. LSP shown; Li - labels.]

CR1 state:
table: Rt Z → PER1

PER1 state:
LNK1 data: vrf1
vrf1: RT1, RD2
table: Rt Z → L4, PER2
       PER2 → L1, LSR1

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2

Page 57: Usda Training Mpls

57

Packet for Rt Z forwarded by CR1

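[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 – Route Z, inside the IPFR cloud. LSP shown; Li - labels. CR1 sends “Z| packet” toward PER1.]

CR1 state:
table: Rt Z → PER1, LNK1

PER1 state:
LNK1 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, PER2
       PER2 → L1, LSR1

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2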

Page 58: Usda Training Mpls

58

Top label is label-switched through interior

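[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 – Route Z, inside the IPFR cloud. LSP shown; Li - labels. PER1 sends “L1|L4|Z| packet”; the LSRs on the LSP hold the entries L1 → L2 and L2 → pop.]

PER1 state:
LNK1 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, PER2
       PER2 → L1, LSR1

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2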

Page 59: Usda Training Mpls

59

Top label popped at end of LSP

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 – Route Z, inside the IPFR cloud. LSP shown; Li - labels. “L4|Z| packet” arrives at PER2.]

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2

Page 60: Usda Training Mpls

60

Inner label determines egress interface and then is popped.

[Figure: CR1 – PER1 – LSR1, LSR2, LSR3 – PER2 – CR2 – Route Z, inside the IPFR cloud. LSP shown; Li - labels. PER2 sends “Z|packet” to CR2.]

PER2 state:
LNK2 data: vrf1
vrf1: RT1, RD1
table: Rt Z → L4, CR2, LNK2
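The whole “follow the packet” walk can be replayed with a small Python sketch built from the tables on these slides (Rt Z → L4, PER2; PER2 → L1, LSR1; L1 → L2; L2 → pop; L4 → CR2, LNK2). Which LSR holds the “L2 → pop” entry is not recoverable from the figures, so placing it on LSR2 is an assumption, as are the dictionary and function names.

```python
# PER1: vrf route table and the LSP toward the BGP next hop.
PER1_VRF = {"vrf1": {"Z": {"vpn_label": "L4", "bgp_next_hop": "PER2"}}}
PER1_LSP = {"PER2": {"push": "L1", "next": "LSR1"}}

# Core label tables: incoming top label -> (operation, new label, next node).
CORE = {
    "LSR1": {"L1": ("swap", "L2", "LSR2")},
    "LSR2": {"L2": ("pop", None, "PER2")},   # assumed penultimate hop
}

# PER2: the inner (BGP) label selects the customer router and link.
PER2_VPN_LABELS = {"L4": ("CR2", "LNK2")}

def send(vrf, destination):
    route = PER1_VRF[vrf][destination]          # the only IP lookup in the network
    lsp = PER1_LSP[route["bgp_next_hop"]]
    stack = [lsp["push"], route["vpn_label"]]   # top label first
    trace = [("PER1", tuple(stack))]
    node = lsp["next"]
    while node in CORE:                         # the core sees only the top label
        operation, new_label, next_node = CORE[node][stack[0]]
        if operation == "swap":
            stack[0] = new_label
        else:
            stack.pop(0)
        trace.append((node, tuple(stack)))
        node = next_node
    customer_router, link = PER2_VPN_LABELS[stack.pop(0)]
    trace.append((node, tuple(stack)))
    return trace, customer_router, link

trace, customer_router, link = send("vrf1", "Z")
for hop in trace:
    print(hop)
print(customer_router, link)
```

The core tables contain no entry for Z or for L4, which is the point of the two-label stack: the LSRs hold no VPN state.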

Page 61: Usda Training Mpls

61

MPLS in Core Not Needed

MPLS for IGP domain serves as a tunneling method among PERs

Could use other tunneling methods

Advantages to MPLS:
– Full mesh of LSP tunnels automatically created
– Can use MPLS TE

Internet draft to use IP or GRE tunneling
– Automatically (treat vpnv4 BGP next hop as a recursive encapsulation)

Page 62: Usda Training Mpls

62

MPLS VPN Security

There is a private routing table for each VPN (vrf)

VPN membership identity associated with each access connection
– VPN membership is not determined by IP header, only by interface (e.g., DLCI, VPI/VCI, PPP, VLAN tag).
– Label and RT for VPN attached to routes advertised for interface.
– Route and its matching label are only imported by routing tables that match the VPN RT.
– Impossible for a packet on a PVC in one vrf to spoof its way or jump into another vrf

Page 63: Usda Training Mpls

63

MPLS VPN Security

Requires correct provisioning of connections and RT’s

Same as FR/ATM security
• Given correct provisioning it is impossible for packet to “jump” from one PVC to another PVC
– If you don’t need encryption on FR/ATM/P-L then don’t need it here
• If you would encrypt over FR/ATM/P-L then you would also encrypt here

Page 64: Usda Training Mpls

64

MPLS VPN Scale for the Carrier

Can the MPLS VPN technology scale to meet the size of the market?

Can it be managed at scale?

Does this have anything to do with the Internet?

Page 65: Usda Training Mpls

65

All Those Routes

Can be a lot of routes in the service
– The aggregate over all VPNs
– There is no summarization of routes!

BUT …
– No VPN state in backbone LSRs, only in PERs.
– PER only holds routes for VPNs touching it.
– Route Reflectors (RR) only handle VPNs they touch

Page 66: Usda Training Mpls

66

Large Corporate Intranet vs Internet

Intranet = Private Corporate Network
– aka VPN
– P/L or FR/ATM VCs
• Recent survey - total number of U.S. enterprises using FR at ~35,000
– Tens to thousands of sites
• 95% T1

Internet – corporate access from Intranet
– 10 Corporate gateways
– T1

Page 67: Usda Training Mpls

67

Modeling Large Enterprise VPNs

Based on customer observations and business nature of large corps

Large # sites, N, for N = 100 – 10,000

#routes = K*N + C
– K = 2, 3, … 10
• one route always for CER-PER link

Largest FR/ATM customer N ~20,000 sites

95% < T1

BW utilization < 25%

Page 68: Usda Training Mpls

68

Constraints

User Plane – forwarding (pps)
– vrf/MPLS cost is small

Control Plane
• Interior (IGP) component is independent of MPLS/VPN
– Space (memory)
• vrf and BGP session overhead small compared to route space
– Signaling/Routing (CPU)
• most important in transient situations (e.g., link failures)

Public resources
– e.g., registered IPv4 addresses, RT’s among partners

OSS

Page 69: Usda Training Mpls

69

Divide and Conquer

How to keep dimensions within constraints?
– (i) State reduction
– (ii) Partition
– (iii) Distribute
• Forwarding
• Control plane

Page 70: Usda Training Mpls

70

State Reduction

Route Summarization
– In “middle” of customer network
– not really an option, maybe greenfield intranets

Limit the routes in a VPN
– Keep # allowed routes commensurate with number of interfaces purchased
• vrf route limit
– Use carrier’s carrier
• “Providers”, non-enterprise, that have lots of routes, few ports

Page 71: Usda Training Mpls

71

Partitioning

Limit VPNs touching a PER
– To avoid poor PER utilization – aggregate and fan-out
• Groom up interfaces rather than push PER toward CE
• Fan out across PERs in a POP

Limit VPNs touching RR
– (i) Via RTs and ORF – requires RT assignment strategy
– (ii) Via communities
– (iii) Via PER-RR mutually exclusive VPN subsets

PER’s and RR’s only need to handle the largest enterprise customer VPN

Page 72: Usda Training Mpls

72

Distributing Forwarding and Control

Distributed forwarding
– All modern routers have distributed user (forwarding) plane
– BUT most have centralized control
• Low CER speeds mean lots of CERs per PER, so a PER is more likely to be constrained by control plane limits than by packets per second

Distributed control
– CER-PER routing and vrf tables limited to necessary interfaces
– Central controller has no vrf tables, vpnv4 route tables, or CER peering protocols

Page 73: Usda Training Mpls

73

IP Multicast VPN Solution

It is based on Multicast Domain (draft-rosen-vpn-mcast-05.txt)

P and PE routers multicast enabled

Provider internal multicast routing tables

Globally PEs configured to run PIM (global instance) with adjacent P routers

PEs maintain PIM adjacencies with CE devices

Normal PIM configuration in customer network
– PIM modes, RPs, multicast addressing

[Figure: CEs form PIM adjacencies with the VRF/mVRF on their PEs; PEs form PIM adjacencies with the P routers of the multicast-enabled backbone.]

Page 74: Usda Training Mpls

74

Multicast Tunnels and the default MDTs

Per mvrf default multicast distribution tree (default MDT) using traditional PIM within backbone

MDT used to distribute end customer multicast packets and PIM control messages

Access to the MDT is via a multicast GRE tunnel interface on PE

Each PE in VPN is a leaf _and_ root on the MDT

For efficiency (but more state) can launch per session (S,G) MDTs for a VPN

[Figure: three PEs, each with attached CEs, joined across the provider network by a per-VRF MDT.]

Page 75: Usda Training Mpls

75

Using an MDT

Forwarding onto the MDT is done in encapsulated packets off PE
– GRE or IP-in-IP

C-Packets - customer control and data packets

P-Packets – provider control and data packets
– Destination Address = MDT group address for VRF
– Source Address = IP address of PE M-BGP peering address

C-packet becomes a P-packet when encapsulated

MPLS is NOT used!

[Figure: SRC – CE – PE – MDT (group address 234.10.10.1) – PE – CE – Receiver. The C-packet (SRC=PC1, DST=225.1.1.1) is carried between the PEs inside a P-packet (SRC=Lo1, DST=234.10.10.1) and leaves the far PE as the original C-packet.]

Page 76: Usda Training Mpls

76

IP Traffic Engineering and MPLS

Improving utilization of backbone resources

Page 77: Usda Training Mpls

77

The Multi-Commodity Flow Problem

[Figure: nodes ni and nj joined by a link labeled cij.]

Demands d(i,j) from node i to j

Constraints - link capacity b(i,j)

Costs, e.g., link costs C(i,j)

Path (route) p(k) variables for each demand

The traffic engineering problem:
1) Find a feasible solution
2) Find a min cost solution
3) Find feasible and min cost solution with single node or link deletion
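One way to write the first two problems down, as a sketch rather than the presenter’s own formulation: index the demands by $k$, write $d_k$ for the size of demand $k$ (one of the $d(i,j)$ above), and assume each demand is carried on a single path $p(k)$. The indexing by $k$ and the single-path assumption are choices made for this illustration.

```latex
% Feasibility: the demands routed over a link must fit within its capacity.
\sum_{k \,:\, (i,j) \in p(k)} d_k \;\le\; b(i,j)
\qquad \text{for every link } (i,j)

% Min cost: among feasible path choices, minimize the cost-weighted load.
\min_{p(1),\dots,p(K)} \;\; \sum_{k=1}^{K} d_k \sum_{(i,j) \in p(k)} C(i,j)
```

Problem 3 asks for the same two conditions to keep holding after any single node or link is removed from the network and the affected demands are rerouted.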

Page 78: Usda Training Mpls

78

Explicit Routing

Solutions to the arbitrary TE problem require specifying the explicit route (path) for each demand

Could calculate explicit routes satisfying constraints offline

– Then specify explicit routes in network without constraints

Page 79: Usda Training Mpls

79

Constraint-based Shortest Path First (CSPF)

Can let the network enforce constraints

CSPF distributed algorithm
– Given full knowledge of network resource allocation
– Route a demand by
1) Pruning network to only feasible paths
2) Pick shortest path
• Compromise to solving the full TE problem
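The two steps (prune, then shortest path) can be sketched in Python as below. The topology, costs and available bandwidths are invented for the example, and the `cspf` function is an illustrative helper, not any router’s implementation.

```python
import heapq

def cspf(links, source, destination, bandwidth):
    """Constrained shortest path.

    links: {(u, v): {"cost": c, "available": bw}} for each directed link.
    Returns (total cost, path) or None when no feasible path exists.
    """
    # Step 1: prune links that cannot carry the demand.
    adjacency = {}
    for (u, v), attrs in links.items():
        if attrs["available"] >= bandwidth:
            adjacency.setdefault(u, []).append((v, attrs["cost"]))

    # Step 2: ordinary Dijkstra on what is left.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical topology: the short path A-B-C is nearly full.
links = {
    ("A", "B"): {"cost": 1, "available": 0.25},
    ("B", "C"): {"cost": 1, "available": 1.0},
    ("A", "D"): {"cost": 2, "available": 1.0},
    ("D", "C"): {"cost": 2, "available": 1.0},
}
print(cspf(links, "A", "C", 0.2))   # fits on the short path
print(cspf(links, "A", "C", 0.5))   # forced onto the longer path
print(cspf(links, "A", "C", 2.0))   # no feasible path
```

Each demand is placed greedily and independently, which is why the slide calls CSPF a compromise: the order in which demands arrive can leave a feasible overall placement unfound.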

Page 80: Usda Training Mpls

80

IP TE

Metric Manipulation
– i.e., pick OSPF weights to create feasible solution
• Limited in problems that it can solve

Simple topology and capacity augmentation
– Tends to over-engineer or restrict topology

Source Route
– IPv4 option that allows explicit route
– Very costly, not practical

No efficient explicit routing nor knowledge of network resource allocation

Page 81: Usda Training Mpls

81

Making it Fit with Plain IP Routing

[Figure: nodes 1, 2, 3, 4 attached to a network of routers A, B, C, D.]

Link size = 1, d(1,2) = 0.75, d(1,3) = 0.5, d(1,4) = 0.5

Can’t pick OSPF weights that work

Page 82: Usda Training Mpls

82

ATM TE

ATM routing (PNNI) has knowledge of resource usage
– Bandwidth booked per trunk

Performs CSPF to find feasible path for
– New demand
– Rerouted demand
– Feasibility referred to as Call Admission Control

Page 83: Usda Training Mpls

83

Making it Fit with ATM switching

Link size = 1, d(1,2) = 0.75, d(1,3) = 0.5, d(1,4) = 0.5

[Figure: nodes 1, 2, 3, 4 attached to a network of ATM switches A, B, C, D.]

Page 84: Usda Training Mpls

84

IP over ATM

The way to build ISP backbones not too long ago

Allowed efficiently utilizing a limited number of costly facilities shared among routers

Typically a full mesh of ATM PVCs is created among the backbone routers
– PVCs sized to router endpoint demand

But …
– Led to N^2 IP peering
– IP router investment outstripped speed of ATM

Page 85: Usda Training Mpls

85

MPLS TE for IP

Provides efficient explicit routing for IP

Can communicate resource constraints

But not an overlay routing design
– Routers not in a full peering mesh

Uses IP-based control plane protocols rather than a different protocol
– RSVP-TE uses extensions of RSVP to carry labels and additional constraints

Page 86: Usda Training Mpls

86

How RSVP-TE Works

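PATH downstream contains explicit hops and bandwidth

RESV upstream contains labels

[Figure: nodes 1, 2, 3 attached to routers A, B, C, D. The message PATH <A, B, C, 3> 0.5Mbps travels downstream and RESV messages with labels (18, 51, pop) return upstream. A packet for 3.1 is shown as “3.1|9|18”, then “3.1|9|51”, then “3.1|9”, then “3.1”, where label 9 was learned via LDP from node 3 (L9).]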

Page 87: Usda Training Mpls

87

Online CSPF with OSPF-TE

Can use RSVP-TE without resource reservation
– Calculate constrained paths offline

For online CSPF need:
– Knowledge of resource assignment in network
• Add resources to OSPF link states
  • i.e., bandwidth available per class (diff-serv)
• Flood changes in resource allocation
  • Unlike normal OSPF which just floods when link up/down changes
– Now use RSVP-TE with non-zero reservations per class (diff-serv)
– Similar to ATM PNNI

Page 88: Usda Training Mpls

88

MPLS Fast Reroute

Using MPLS TE to improve availability
– RSVP-TE creates backup tunnels
– On failure of protected LSP, packets are shoved down backup LSP tunnel
– Switchover is faster than waiting for CSPF to calculate and signal a new LSP

For local repair (link or node) can recover in ~100ms or better
– Backup LSP is already in place, so as soon as the failure is detected locally the headend just needs to reprogram the label FIB

Page 89: Usda Training Mpls

89

Link Protection

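Create backup LSP around link to Next Hop

With or without reservation
– Can also backup normal LDP LSP

[Figure: nodes 1, 2, 3 attached to routers A, B, C, D, drawn before and after adding the backup. The protected LSP uses labels 18, 51, 45, pop. The backup tunnel goes around the protected link and pushes label 51 onto the tunnel.]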

Page 90: Usda Training Mpls

90

Node Protection

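Create backup tunnel LSP for two hops away (next-next hop)
Backs up RSVP-TE tunnel
– Learns labels from RESV recorded route of protected tunnel
[Figure: protected LSP across routers A, B and C toward 3, with a backup tunnel around the protected node. The backup tunnel pushes label 45 onto the tunnel. Labels shown: 18, 51, 45, pop.]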

Page 91: Usda Training Mpls

91

Path Protection

Create an end-to-end diverse backup tunnel
Slower than local protection – have to wait for headend to detect failure
[Figure: protected LSP (labels 18, 51, 45, pop) across routers A, B and C toward 3, with a diverse end-to-end backup LSP.]

Page 92: Usda Training Mpls

92

What are Layer 2 VPNs?

Defined at the IETF PPVPN and PWE3 groups
– http://www.ietf.org/internet-drafts/draft-ietf-ppvpn-l2vpn-requirements-00.txt
– http://www.ietf.org/html.charters/pwe3-charter.html
Point-to-point
– Virtual Private Wire Service (VPWS)
– Offers FR, ATM and Ethernet “pvc”-like services
  • Nothing new here – have been available for many years
Multi-point Ethernet Bridging
– Virtual Private LAN Service (VPLS)
  • Similar to the Transparent LAN Services
    • Around for a while using standard Ethernet switching
  • But VPLS is more scalable over the WAN

Page 93: Usda Training Mpls

93

So what?

IP or MPLS as the multi-service carrier core
– Was ATM, but ATM didn’t keep up with IP investments
– On one core network carrier can put
  • Internet, Voice (trunking and service), FR, Ethernet, ATM, L3 IP VPN, IPSEC VPN
  • Finally, network convergence for the carrier??
New market for struggling carriers
– Some newer providers only built fiber transport and IP backbone for Internet service and no ATM backbone
  • They are eager to go after the Enterprise WAN business
  • L2 VPNs can be built on their existing IP infrastructure
For Customers – nothing really new, just more competitors for their WAN

Page 94: Usda Training Mpls

94

Business Communications Review, Jan 2002 chart from Vertical Systems

Page 95: Usda Training Mpls

95

Tunneling

PE-to-PE tunnel
– L2TP
– MPLS
Multiplexer field
– One tunnel, many connections called Pseudo Wires
Control field
– Optional sequence number (detect out of order packets)
– Protocol specific control bits (e.g., DE, FECN, CLP, PTI)

Page 96: Usda Training Mpls

96

Encapsulations

L2TPv3: IP header (20B) | Session ID (4B) | Cookie (8B) | Control word (opt) | Payload
MPLS: Tunnel label (4B) | VC Label (4B) | Control word (opt) | Payload

L2TPv3 – purely connectionless with IP header
– No new technology in carrier IP backbone … but
– Spoofable
  • Cookie provides no strong verification
– No QoS other than diff-serv
MPLS
– Less overhead
– Can use MPLS TE
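The "less overhead" point can be checked with simple arithmetic on the field sizes listed for the two encapsulations. The sketch below is illustrative; the 4-byte size for the optional control word is an assumption, and link-layer headers are ignored.

```python
# Field sizes in bytes, as listed for the two encapsulations.
L2TPV3_FIELDS = {"ip_header": 20, "session_id": 4, "cookie": 8}
MPLS_FIELDS = {"tunnel_label": 4, "vc_label": 4}
CONTROL_WORD = 4  # assumed size of the optional control word


def overhead_bytes(fields, control_word=False):
    """Bytes added in front of the payload by one encapsulation."""
    return sum(fields.values()) + (CONTROL_WORD if control_word else 0)


def overhead_percent(fields, payload_bytes, control_word=False):
    """Encapsulation overhead as a percentage of the bytes on the wire."""
    extra = overhead_bytes(fields, control_word)
    return 100.0 * extra / (payload_bytes + extra)


if __name__ == "__main__":
    for name, fields in (("L2TPv3", L2TPV3_FIELDS), ("MPLS", MPLS_FIELDS)):
        print(name, overhead_bytes(fields), "bytes,",
              round(overhead_percent(fields, 64), 1), "% on a 64-byte payload")
```

The difference matters most for small payloads such as voice or interactive frames, where 32 bytes of header is a large share of the packet.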

Page 97: Usda Training Mpls

97

L2 MPLS VPN: Example FR

“Directed” LDP between PE pair exchanges FEC and label for a particular pseudo wire

PVCs within tunnels
[Figure: three CERs attach through PERs to the MPLS network using DLCI 100, DLCI 200 and DLCI 300. An FR PVC from DLCI 100 to 300 is carried between the PERs as L1 | L2 | Cw | FR PDU.]

Page 98: Usda Training Mpls

98

VPLS – Virtual Private LAN Service

Multipoint to multipoint service
– Any-to-any
Does LAN bridging, MAC address learning
While Ethernet frame based
– IT IS ACCESS TECHNOLOGY AGNOSTIC
  • Don’t have to use GigE
  • Can use Ethernet bridging over other access types
    • i.e., bridged over FR/ATM/PPP for NxDS0, T1, NxT1, T3 or bridged over SONET
– Any protocol – not just IP [e.g., IPX, DECNET]
No routing with carrier! … but
– More than a few dozen sites on a VPN (single LAN)?
– No Spanning Tree, so just connect routers

Page 99: Usda Training Mpls

99

State of the Art

OAM still needs a lot of work
– Fault detection/isolation, performance measurement, probing
Little “Call Admission Control”
– How to map bandwidth resources and classes onto tunnels
No Multi-AS implementations
Minimal legacy interworking
– Just glue connections at “dumb” interconnect to ATM

Page 100: Usda Training Mpls

100

MPLS to Prem or in the Enterprise?
Can run MPLS to CER
– By running BGP CER-CER
  • Can create own VPNs (vpnv4) on top of provider’s
    • Tenant service
  • Hierarchy of ipv4 routing
    • 3rd tier ISP backbone outsourcing (carrier’s carrier)
– But don’t need MPLS for tunneling CER-CER
  • Use IP tunneling with transparent interoperability with carrier
MPLS in private network
– Create own VPNs (essentially an internal carrier)
– For traffic engineering IP

Page 101: Usda Training Mpls

101

Completed PHASE I - MPLS
Please Continue to the Next Phase

Page 102: Usda Training Mpls

Performance Engineering in MPLS-based VPNs

Susan Hilton
Enterprise Network Consultant

Page 103: Usda Training Mpls

103

Performance Engineering

Rationale
CoS Foundation Technologies
Service Implementation
Applied Performance Engineering

Page 104: Usda Training Mpls

104

Not All Traffic is Equal

SERVICE METRICS (rows) by APPLICATIONS (columns):

SERVICE METRIC | Interactive Data | 3-Tier ERP | Bulk Transfer | Interactive Voice
BANDWIDTH | LOW | MEDIUM | HIGH | MEDIUM
DELAY | LOW | MEDIUM | HIGH | LOW
JITTER | MEDIUM | MEDIUM | HIGH | LOW

Page 105: Usda Training Mpls

105

Multi-Application Networks
Mixing applications with ‘similar’ traffic characteristics and similar performance requirements is simply a sizing exercise
– Statistical multiplexing
Mixing applications with conflicting traffic characteristics often causes some to not meet Response Time requirements
– Even with sufficient bandwidth deployed!

Page 106: Usda Training Mpls

106

Traffic Profiles and basic QoS requirements: Voice, Video and Data
Voice
– Smooth, Drop Sensitive, Delay Sensitive, UDP Priority
– One-way requirements: Latency ≤ 150 ms, Jitter ≤ 30 ms, Loss ≤ 1%
– Bandwidth per call depends on codec and sampling-rate
Video-Conf
– Bursty, Drop Sensitive, Delay Sensitive, UDP Priority
– One-way requirements: Latency ≤ 150 ms, Jitter ≤ 30 ms, Loss ≤ 1%
– Similar performance requirements as VoIP, but radically different traffic patterns
Data
– Smooth/Bursty, Drop Insensitive, Delay Insensitive, TCP Retransmits
– Data Classes: Mission-Critical Apps, Transactional/Interactive Apps, Bulk Data Apps, Best Effort Apps (Default)
– Traffic patterns for Data vary among applications (and even among different versions of the same application)

Page 107: Usda Training Mpls

107

Data Classifications: Application Examples

Application Class: Interactive
– Example Applications: Telnet, Citrix, Oracle Thin-Clients, AOL Instant Messenger, Yahoo Instant Messenger, PlaceWare (Conference), Netmeeting Whiteboard
– Application / Traffic Properties: Highly interactive applications with tight user feedback requirements.
– Packet / Message Sizes: Average message size < 100 B; max message size < 1 KB

Application Class: Transactional
– Example Applications: SAP, PeopleSoft - Vantive, Oracle – Financials + Internet Procurement + B2B + Supply Chain Mgmt + Application Server, Oracle 8i Database, Ariba Buyer, I2, Siebel, E.piphany, Broadvision, IBM Bus 2 Bus, Microsoft SQL, Lotus Notes, Microsoft Outlook, BEA Systems, Email Download (SMTP), DLSw+
– Application / Traffic Properties: Transactional applications typically use a client-server protocol model. User initiated client based queries followed by server response. Query response may consist of many messages between client and server. Query response may consist of many TCP and FTP sessions running simultaneously (e.g. HTTP based applications)
– Packet / Message Sizes: Depends on application. Could be anywhere from 1 KB to 50 MB

Application Class: Bulk
– Example Applications: Database Syncs, Network based Backups, Video Content Distribution, Large ftp file transfers
– Application / Traffic Properties: Long file transfers. Always invokes TCP congestion management
– Packet / Message Sizes: Average message size 64 KB or greater

Application Class: Best-Effort
– Example Applications: All non-critical traffic, HTTP Web Browsing + Other Miscellaneous traffic

Page 108: Usda Training Mpls

108

Performance Engineering
What is performance engineering?
– The process of engineering a network to assure that applications attain their required performance.
– Provide defined service metrics for different applications
Includes:
– Network Engineering
– Capacity Planning
– Traffic Engineering
– Bandwidth management
  • congestion management/avoidance to ensure the availability of high priority traffic and at the same time increase the network efficiency

Page 109: Usda Training Mpls

109

Service Metrics

Bandwidth:
Delay/Latency:
– Time it takes a packet to travel from origination to destination
– Distance, switching, insertion, queuing
Jitter (Variability in Delay):
– Latency that is unpredictable; 1st packet 10ms delay, 2nd packet 30ms delay. (Early/late packets)
Packet Loss:
– Buffer Overflows, Selective discards, Line Errors

Page 110: Usda Training Mpls

110

Technology Evolution - General Attributes

[Figure: PL, FR, ATM, L2 VPN and I VPNS arranged along a spectrum.]
Increase in:
– Connectivity
– Shared Resources (<$?)
– Path Variance
– Delay
– Delay Variance
Decrease in:
– Per connection engineering
One end of the spectrum: Least connectivity, Least delay, Most cost
Other end of the spectrum: Most connectivity, Most delay, Least cost

Page 111: Usda Training Mpls

111

Technology Evolution - Future Direction

Connectivity, Shared Resources (<$?)
Class-of-Service features added to improve:
• Path Variance
• Delay
• Delay Variance
[Figure: PL, FR, ATM, L2 VPN and I VPNS on the same spectrum, with COS labels attached to several of them.]

Page 112: Usda Training Mpls

112

QoS / CoS…What’s the Difference?

QoS – Quality of Service
– Absolute Metrics, ‘Contracted’ parameters
– Each flow must be engineered independently
– Addresses the service requirements of different applications in order to provide more than “best effort” service for specific applications
CoS – Class of Service
– Relative treatment of contending flows
– Implies that flows can be categorized or differentiated!
– The implementation that provides QoS
Used interchangeably in this session

Page 113: Usda Training Mpls

113

Is Bandwidth the Answer?

Just deploy more bandwidth
– Queuing Delay is f(link speed)
– Bandwidth is cheap
– QoS is complicated
• MAYBE for LAN, MAN
• Maybe even for WAN backbone
• WAN edge will remain a bottleneck for foreseeable future

Page 114: Usda Training Mpls

114

Why IP QoS is Needed

Enterprise networks are migrating toward IP transport

Best effort is not good enough
– ‘Engineered’ performance is required for enterprise applications.

Emerging applications (VOIP, Streaming) are highly sensitive to delay, jitter (delay variation), and packet loss.

Need performance/reliability of private networks with ubiquity/cost advantage of Internet.

• One approach
– MPLS

Page 115: Usda Training Mpls

115

CoS for IP VPNs

Traditional techniques do not work for mesh topologies as we will see.

Egress port speed is still a bottleneck
– The Service needs to participate in CoS solution.

Page 116: Usda Training Mpls

116

CoS Foundation Technologies

Advanced Queuing
Queue Management
Traffic Shaping
Classification and Marking
Fragmentation

Page 117: Usda Training Mpls

117

Advanced Queuing

Advanced Queuing is any technique that transmits packets in a different order than they were received.
– I.e. not FIFO
These techniques only kick in when there is congestion (i.e. if there is no queuing, then there is no ‘advanced’ queuing).
– Advanced queuing only makes sense where there is a speed mismatch.
  • I.e. arrival rate is greater than departure rate.

Page 118: Usda Training Mpls

118

Priority Queuing

Prioritization allows specified traffic to preempt competing traffic.

Can be multiple levels of priority.
– (4 in Cisco)
Higher priority traffic can ‘starve’ lower priority traffic.
[Figure: packets entering (In) and leaving (Out) the priority queues.]

Page 119: Usda Training Mpls

119

Bandwidth Allocation
Also called:
– Custom Queuing
– Weighted Round Robin
• Each traffic type gets a relative allocation of the bandwidth.
  • bits, bytes, packets…
[Figure: packets entering (In) and leaving (Out) per-type queues with relative allocations.]

Page 120: Usda Training Mpls

120

Bandwidth Allocation Considerations

Cycle Time
– All buckets are served in each cycle. (No ‘priority’)
– More buckets means longer cycle time
Unused allocation shared proportionately across remaining traffic types.
Cisco de-queuing quirk
– Always serve at least 1 packet, even if it is bigger than allocation
– No concept of ‘credit’ or ‘deficit’.
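The de-queuing quirk is easiest to see in a small simulation. The sketch below is a simplified model of byte-count round robin, not Cisco's implementation: the function name `custom_queue_cycle` and the byte counts are made up for the example.

```python
from collections import deque


def custom_queue_cycle(queues, byte_counts):
    """Serve every queue once and return the packet sizes sent, in order.

    queues: list of deques holding packet sizes in bytes.
    byte_counts: per-queue byte allocation for one cycle.
    """
    sent = []
    for queue, allocation in zip(queues, byte_counts):
        served = 0
        # Keep sending while the allocation is not yet used up. A packet is
        # always sent whole, even if it overshoots the allocation, and the
        # overshoot is forgotten afterwards (no credit or deficit).
        while queue and served < allocation:
            size = queue.popleft()
            served += size
            sent.append(size)
    return sent


if __name__ == "__main__":
    bulk = deque([1500, 1500])
    interactive = deque([60, 60, 60])
    # Bulk is allocated 1000 bytes but still sends a full 1500-byte packet.
    print(custom_queue_cycle([bulk, interactive], [1000, 100]))
```

Because the overshoot is never paid back, a class sending large packets can take more than its configured share when allocations are small relative to the packet size.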

Page 121: Usda Training Mpls

121

Fair Queue

Not all packets are equal
– Large and small packets
– Part of sparse or heavy peer to peer
– More or less time sensitive to application
Scheduling algorithms – basic
– When a packet within a flow arrives, calculate when it would get served as part of each flow
– Process packets in this order (not necessarily at this time)
[Figure: packets numbered 1 to 5 in the order they are served.]

Page 122: Usda Training Mpls

122

Weighted and Flow Based

Simple algorithm works well to give everyone a fair share.
Some problems -
– Sparse flows must wait even though they require little bandwidth (high jitter)
– High priority packets must wait
Assign weights to scheduled times based on
– Precedence
– Sparse or heavy flow (Cisco Flow Based WFQ)
– Other

Page 123: Usda Training Mpls

123

Weighted Fair Queuing
No ‘defined’ traffic classes
– Packets ‘hashed’ to 1 of 64 queues
– Hash f (SourceIP, Source Port, DestIP, DestPort)
– Possibility of bulk and interactive with same hash
‘Weighted Fair’ means each queue has equal weight (1)
Each Packet is ‘scheduled’ at arrival time
– Schedule Time = Queue Tail + (Weight * Length)
– De-Queue based on Schedule Time
  • ‘Calendar Queuing’
[Figure: arriving packets are hashed into queues 1 to 64. Values shown include packet sizes (60, 125, 150, 200, 250, 300, 310, 1000) and schedule times (0, 120, 180, 240, 300).]
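The schedule-time rule can be tried out with a short Python sketch. It is a simplification: all packets are assumed to arrive in one burst at time 0, the queue is given directly instead of being computed by a hash, and the function name `wfq_order` is hypothetical.

```python
def wfq_order(arrivals, weight=1):
    """Order packets by Schedule Time = Queue Tail + (Weight * Length).

    arrivals: (queue_id, length_bytes) pairs in arrival order.
    Returns (schedule_time, queue_id, length_bytes) in de-queue order.
    """
    queue_tail = {}   # schedule time of the last packet placed in each queue
    scheduled = []
    for seq, (queue_id, length) in enumerate(arrivals):
        finish = queue_tail.get(queue_id, 0) + weight * length
        queue_tail[queue_id] = finish
        # seq keeps arrival order as the tie-breaker for equal schedule times.
        scheduled.append((finish, seq, queue_id, length))
    scheduled.sort()
    return [(time, queue_id, length) for time, _seq, queue_id, length in scheduled]


if __name__ == "__main__":
    burst = [(1, 1000), (2, 60), (2, 60), (3, 250)]
    for time, queue_id, length in wfq_order(burst):
        print("t =", time, "queue", queue_id, length, "bytes")
```

The 1000-byte packet arrives first but is de-queued last, which is how small interactive packets get ahead of bulk traffic without any explicit priority.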

Page 124: Usda Training Mpls

124

Flow Based WFQ – more detail

Detects bandwidth of layer 4 flows (also known as conversations)
Classifies traffic into as many as 64 bins
Allocates bandwidth equally across all flows
Light flows get the bandwidth they need, heavy flows share the remaining bandwidth
On by default in Cisco low speed interfaces

Page 125: Usda Training Mpls

125

Class-Based Weighted Fair Queuing

A Hybrid
– ‘Class-Based’
  • Defined Classes instead of hashing
– ‘Weighted Fair Queuing’
  • Weighted Fair Queuing with defined ‘weight’
  • Schedule Time = Queue Tail + (Weight * Length)
[Figure: class-mapping places packets (60, 100 and 150 bytes) into classes A, B and C with weights 4, 2 and 10. Schedule times shown: 0, 300, 400, 600, 600, 1200.]

Page 126: Usda Training Mpls

126

Class-Based Weighted Fair Queuing

Weight of a traffic class is implied by Bandwidth
– Weight ~ 1/(BW Percent)
Behavior is opposite of WFQ
– Higher BW = Higher Priority
Example:
– Class A – FTP, BW 10%, Weight 10
– Class B – Telnet, BW 90%, Weight 1.111
[Figure: class-mapping with a 1500-byte packet in class A (weight 10, schedule time 15,000) and 60-byte packets in class B (weight 1.1, schedule times 66, 132, 198, 264).]
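The example weights follow from Weight = 100 / (BW percent). The sketch below reproduces them and the resulting schedule times; the scaling constant of 100 and the function names are assumptions chosen to match the numbers in the example, not a statement of how any router computes its weights.

```python
def cbwfq_weight(bw_percent):
    """Weight implied by a class bandwidth percentage (assumed scale of 100)."""
    return 100.0 / bw_percent


def schedule_times(bw_percent, packet_lengths):
    """Schedule Time = Queue Tail + (Weight * Length) for one class,
    with all packets queued at time 0."""
    weight = cbwfq_weight(bw_percent)
    tail = 0.0
    times = []
    for length in packet_lengths:
        tail += weight * length
        times.append(tail)
    return times


if __name__ == "__main__":
    print("Class A (FTP, 10%):", schedule_times(10, [1500]))
    print("Class B (Telnet, 90%):",
          [round(t, 1) for t in schedule_times(90, [60] * 4)])
```

All four Telnet packets are scheduled long before the single FTP packet, even though the FTP packet may have arrived first.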

Page 127: Usda Training Mpls

127

CBWFQ - Cisco
Unlike WFQ that applies relative priority, CBWFQ enables absolute guarantees
Assigns flows to classes
– Allocation to a class can be based on almost anything
Made up of many parts
– FQ ~ Fair Queue, nobody gets it all
– W ~ Weights applied to queues
– CB ~ Class Based, uses class to define queues
None of this applies unless there is a queue to manage. When no congestion, no priority is assigned.
Only available Cisco solution for high speed router ports (above E1, with possible exception of DS3 Frame Relay)
Weights are applied per class
Generally uses Flow Based WFQ within a class
Includes a special class – Low Latency Queue (LLQ)

Page 128: Usda Training Mpls

128

Configuring CBWFQ (Cisco)
Class-map assigns packets to a class
Policy-map defines how each class is treated
Service-policy invokes the policy on an interface

class-map class1
 match access-group 101
class-map class2
 match input-interface s0
!
policy-map policy1
 class class1
  bandwidth 50
  queue-limit 100
 class class2
  bandwidth 20
  queue-limit 35
 class class-default
  fair-queue

interface atm0.1 point-to-point
 ip address 10.10.10.1 255.255.255.252
 pvc atlanta 1/105
  vbr-nrt 40000 72000 32
  service-policy out policy1

Page 129: Usda Training Mpls

129

Low Latency Queue - LLQ

A special queue defined by the policy-map
Applies strict priority up to the bandwidth specified
– will not serve any other queue until the LLQ is empty
Drops packets above the specified bandwidth
– WRED and queue depth do not apply

Invoked by using the “priority” command in place of the “bandwidth” command

Page 130: Usda Training Mpls

130

Queue Management

Queue Depth is specified in policy-map
CBWFQ defines how queues are served but what happens when a particular queue gets too big?
Packets are discarded
– Tail Drop drops all packets arriving after the queue is full
– Weighted Random Early Detection (WRED)

Page 131: Usda Training Mpls

131

Queue Management – Tail Drop
Tail Drop, Global Synchronization, WRED
• Tail drop tends to affect all flows in a queue. This effect is called global synchronization. All flows crank up their windows, congestion occurs, tail drop drops all arriving packets, all windows reset.
[Figure: individual TCP sessions and total BW utilization over time. The Queue Full level is marked, along with the average utilization after tail drop.]

Page 132: Usda Training Mpls

132

Queue Management - WRED
Weighted Random Early Detection
– WRED drops a few random packets before congestion reaches the queue depth threshold. This causes a small number of flows to reset their TCP/IP window, while the remaining flows continue to use available bandwidth.
– Effective for ‘large’ number of flows, questionable for enterprise.
– Can specify differing WRED threshold based on IP Prec.
[Figure: individual TCP sessions and total BW utilization over time. The Queue Full and WRED Thresh levels are marked, along with the average utilization after WRED drop.]
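A rough sketch of the drop decision, assuming the classic RED shape: no drops below a minimum threshold, a linear ramp up to a maximum probability, and tail drop above the maximum threshold. The slides do not give the formula, and the per-precedence thresholds in `WRED_PROFILES` are invented for illustration.

```python
import random


def red_drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability for a given average queue depth (classic RED ramp)."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0  # behaves like tail drop once the queue is this deep
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical profiles per IP Precedence: (min_th, max_th, max_p).
# Higher precedence starts dropping later.
WRED_PROFILES = {0: (20, 40, 0.1), 5: (35, 40, 0.1)}


def wred_should_drop(avg_depth, ip_precedence, rng=random.random):
    """Decide whether to drop an arriving packet."""
    min_th, max_th, max_p = WRED_PROFILES[ip_precedence]
    return rng() < red_drop_probability(avg_depth, min_th, max_th, max_p)


if __name__ == "__main__":
    for depth in (10, 30, 38, 40):
        print(depth,
              red_drop_probability(depth, *WRED_PROFILES[0]),
              red_drop_probability(depth, *WRED_PROFILES[5]))
```

With these example thresholds, precedence 0 traffic starts seeing random drops at a depth of 20 packets while precedence 5 traffic is untouched until 35.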

Page 133: Usda Training Mpls

133

Tail Drop vs. WRED

Tail drop tends to affect all flows in a queue. This effect is called global synchronization. All flows crank up their windows, congestion occurs, tail drop drops all arriving packets, all windows reset.

WRED drops a few random packets before congestion reaches the queue depth threshold. This causes a small number of flows to reset their TCP/IP window.

Page 134: Usda Training Mpls

134

Traffic Shaping

• Traffic shaping is a tool to ‘move’ a queue from one place in a network to another.

• Queues only occur where there is a speed mismatch.
  • If arrival rate > departure rate -> queue

• Traffic Shaping forces a speed mismatch in a router to prevent a speed mismatch in a network.

Page 135: Usda Training Mpls

135

Traffic Shaping

[Figure: a router on a T1 port and a router on a 64K port connected through the FR network.]
T1 In / 64K Out: Queue in the Network
Traffic Shape to ~ 64K: No Queue in Network. Queue is in Router, where advanced queuing can be used

Page 136: Usda Training Mpls

136

Mesh Based Shaping

In mesh topologies, where is the choke point? That is where a queue will build!
[Figure: two routers connected across the network, with the queue (Q) building at the choke point.]
Can we control this queue at the high speed end? NO, because of other remote sites.

Page 137: Usda Training Mpls

137

Value of MPLS Traffic Shaping
Unlike mesh PVCs, MPLS services can apply shaping in the cloud to manage the queue
[Figure: two routers connected across the MPLS cloud, with the queue (Q) managed in the cloud.]

Page 138: Usda Training Mpls

138

Policing / Shaping Two Sides of the Same Coin

A service subscription has an implied traffic contract
– Service provider Polices arriving packets against the contract
– Customer Shapes traffic to assure conformance to the contract

Page 139: Usda Training Mpls

139

Policing

Policing
– Enforce a traffic ‘contract’
– Pass all traffic within contract
– Out of contract
  • Drop
  • Or ‘mark’ as out of contract
    • DE Bit in frame relay networks
    • CLP in ATM networks
    • IP Precedence or DSCP in IP networks
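One common way to enforce such a contract is a token bucket. The slides do not say which mechanism is used, so the class below is only an assumed model; the name `Policer` and the rate and burst values are illustrative.

```python
class Policer:
    """Single-rate token bucket: tokens are bytes, refilled at the contract rate."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_time = 0.0

    def check(self, now, packet_bytes):
        """Return 'conform' or 'exceed' for a packet arriving at time now (seconds)."""
        elapsed = now - self.last_time
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "conform"
        # Out of contract: the caller either drops the packet or marks it
        # (DE bit, CLP, or IP Precedence/DSCP, depending on the network).
        return "exceed"


if __name__ == "__main__":
    policer = Policer(rate_bps=64000, burst_bytes=1500)
    for t, size in ((0.0, 1500), (0.0, 100), (0.1, 700)):
        print(t, size, policer.check(t, size))
```

A shaper uses the same accounting but delays the non-conforming packet in a queue instead of dropping or marking it.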

Page 140: Usda Training Mpls

140

What happens to Non-Conforming Traffic

Mark but do not discard, unless congestion
OR
Drop

Page 141: Usda Training Mpls

141

Classification and Marking
Need to identify packets in order to determine what service level is required (classification)
– Supported by marking or coloring
– Marking is done in the IP header

IP header (20 bytes before Options):
Ver | Hdr | Type of Service | Length
ID | Flag | Fragment Offset
TTL | Protocol | Header Checksum
Source IP Address
Destination IP Address
Options
Data

Page 142: Usda Training Mpls

142

Marking IP Precedence
Type of Service provides 8 bits
– Bits 0-2 IP Precedence

Value | Bits | Name
0 | 000 | Routine
1 | 001 | Priority
2 | 010 | Immediate
3 | 011 | Flash
4 | 100 | Flash Override
5 | 101 | Critical
6 | 110 | Internet Control
7 | 111 | Network Control

Page 143: Usda Training Mpls

143

Marking DSCP
Start with IP Precedence 3 bits (Class Selectors)
Add Drop Precedence Levels of 3 bits

Precedence
7 | Same (network control)
6 | Same (internet control)
5 | Expedited Forwarding (EF)
4 | Class 4
3 | Class 3
2 | Class 2
1 | Class 1
0 | Best Effort

Page 144: Usda Training Mpls

144

Marking DSCP
The second three bits are used for Per Hop Behavior or Drop Probability
Applies to Class 1-4 or Assured Forwarding (AF)
Provides more flexibility

Drop level | Class 1 | Class 2 | Class 3 | Class 4
Low drop | 001010 AF11 DSCP 10 | 010010 AF21 DSCP 18 | 011010 AF31 DSCP 26 | 100010 AF41 DSCP 34
Medium drop | 001100 AF12 DSCP 12 | 010100 AF22 DSCP 20 | 011100 AF32 DSCP 28 | 100100 AF42 DSCP 36
High drop | 001110 AF13 DSCP 14 | 010110 AF23 DSCP 22 | 011110 AF33 DSCP 30 | 100110 AF43 DSCP 38
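The AF values in the table follow a simple bit layout: the class occupies the top three bits and the drop level the next two. A short Python sketch (the function names are made up for the example) reproduces the table entries:

```python
def af_dscp(af_class, drop_precedence):
    """DSCP value for AFxy: class in bits 5-3, drop precedence in bits 2-1."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF classes are 1-4, drop precedences are 1-3")
    return (af_class << 3) | (drop_precedence << 1)


def dscp_bits(dscp):
    """Six-bit binary string for a DSCP value."""
    return format(dscp, "06b")


def ip_precedence(dscp):
    """The top three bits, which older equipment reads as IP Precedence."""
    return dscp >> 3


if __name__ == "__main__":
    for drop in (1, 2, 3):
        row = []
        for cls in (1, 2, 3, 4):
            value = af_dscp(cls, drop)
            row.append(f"AF{cls}{drop} {dscp_bits(value)} DSCP {value}")
        print(" | ".join(row))
```

Because the class sits in the top three bits, a router that only understands IP Precedence still places AF3x traffic in precedence 3.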

Page 145: Usda Training Mpls

145

Fragmentation

On low speed (<768K) ports, queueing is not enough
Insertion (serialization) delay is an issue
Insertion = packet size (bits)/line speed
– Example: 1500 bytes at 56 Kbps, (1500*8)/56 = 214 msec
Objective for voice is 10 msec insertion delay per packet

Page 146: Usda Training Mpls

146

Insertion Delays

Fn(line speed), Fn(packet length)
100 bytes over one 56Kbps link (56Kbps = 7 bytes/ms):
– 800 bits / 56K = 14ms (100 bytes / 7 = 14ms)
100 bytes over a 128Kbps link followed by a 64Kbps link:
– 6.25ms + 12.5ms
– (8*100)*(1/128K + 1/64K) = 18.75ms
500 bytes over a 128Kbps link followed by a 64Kbps link:
– 31.25ms + 62.5ms
– (8*500)*(1/128K + 1/64K) = 93.75ms
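These figures can be reproduced with a two-line calculation. In the sketch below the function names are hypothetical and line speeds are in Kbps, so bits divided by Kbps gives milliseconds directly.

```python
def insertion_delay_ms(packet_bytes, line_speed_kbps):
    """Serialization delay of one packet on one link, in milliseconds."""
    return packet_bytes * 8 / line_speed_kbps


def path_insertion_delay_ms(packet_bytes, link_speeds_kbps):
    """Total serialization delay when the packet is re-sent on each link in turn."""
    return sum(insertion_delay_ms(packet_bytes, speed)
               for speed in link_speeds_kbps)


if __name__ == "__main__":
    print(round(insertion_delay_ms(100, 56), 1))      # one 56 Kbps link
    print(path_insertion_delay_ms(100, [128, 64]))    # 128 Kbps then 64 Kbps
    print(path_insertion_delay_ms(500, [128, 64]))
    print(round(insertion_delay_ms(1500, 56)))
```

The delay grows with packet length and shrinks with line speed, which is why only the low speed links on a path contribute noticeably.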

Page 147: Usda Training Mpls

147

Fragmentation & Compression

Even with optimal queuing treatment, performance may not be acceptable.
The best treatment that can be obtained with strict prioritization is queuing delay of O(1/2 of a packet).
• I.e. best case, prioritized packet gets transmitted immediately.
• Worst case, prioritized packet has to wait for the currently transmitting packet to finish.
• On average, wait time for prioritized packet is ½ of the transmit time of a non-prioritized packet.
This is still substantial delay (and jitter) for low speed ports.
Fragmentation & Compression is a means to make the low priority packets smaller, so the O(1/2 packet) delay is smaller

Page 148: Usda Training Mpls

148

Head of Line Blocking
Real time traffic arrives but a 1500 byte packet is just starting transmission
LLQ/CBWFQ only gives priority if the line has not started to send the packet… There is no preemptive capability.

The 1500-byte frame takes 187.5 ms to serialize on a 64-kbps access. Real time traffic has to wait. This is HOL Blocking…

Link Fragmentation and Interleaving (LFI) is the mechanism that fragments large data frames into regularly sized pieces and interleaves small real time packets into the flow.
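As a rough illustration of how a fragment size might be chosen, the sketch below picks the largest fragment that serializes within a target delay and splits a frame accordingly. The 10 ms default, the function names and the decision to ignore per-fragment header overhead are all simplifying assumptions.

```python
def max_fragment_bytes(line_speed_kbps, target_delay_ms=10):
    """Largest fragment whose serialization delay stays within the target."""
    return int(line_speed_kbps * target_delay_ms / 8)


def fragment(packet_bytes, fragment_bytes):
    """Split a packet into fragment sizes (per-fragment headers ignored)."""
    full, rest = divmod(packet_bytes, fragment_bytes)
    return [fragment_bytes] * full + ([rest] if rest else [])


if __name__ == "__main__":
    size = max_fragment_bytes(64)       # 64 kbps access link
    pieces = fragment(1500, size)
    print(size, "byte fragments:", len(pieces), "pieces, last is", pieces[-1])
```

With fragments this small, a voice packet waits for at most one fragment instead of the whole 1500-byte frame.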

Page 149: Usda Training Mpls

149

Head of Line Blocking

Page 150: Usda Training Mpls

150

Fragmentation
MTU
– Dangerous – many apps set Do Not Fragment
– Can be done at source, but not very practical
FRF.12
– Fragment data packets
– Prioritize voice packets
– Voice only, not suitable for priority data
  • No LFI
  • ATM Interworking is complicated
ML-PPP
– Best ‘generic’ solution
– More overhead than FRF.12, but OK with compression

Page 151: Usda Training Mpls

151

Fragmentation

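T (Transmit 1 Packet) in mS, by Link Speed (Kbps) and Packet Size (bytes)

Link Speed | 50 | 100 | 250 | 500 | 1000 | 1500
1536 | 0 | 1 | 1 | 3 | 5 | 8
768 | 1 | 1 | 3 | 5 | 10 | 16
512 | 1 | 2 | 4 | 8 | 16 | 23
256 | 2 | 3 | 8 | 16 | 31 | 47
192 | 2 | 4 | 10 | 21 | 42 | 63
128 | 3 | 6 | 16 | 31 | 63 | 94
64 | 6 | 13 | 31 | 63 | 125 | 188
56 | 7 | 14 | 36 | 71 | 143 | 214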

Page 152: Usda Training Mpls

152

Why Compression?
IP header, UDP header, RTP header
– Real time protocol, synchronizes packets

Packet layout: IP Header (20 bytes) | UDP Header (8 bytes) | RTP Header (12 bytes) | 2 Voice Samples (20 bytes)

Total = 60 bytes, 67% is overhead!

60 bytes * 8 bits/byte * 50 PPS = 24 Kbps!
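The arithmetic above, and the effect of shrinking the 40 bytes of headers, can be sketched as follows. The 2-byte compressed header is an assumed figure for illustration; the actual size depends on the compression scheme and options in use.

```python
IP_HEADER, UDP_HEADER, RTP_HEADER = 20, 8, 12   # bytes
FULL_HEADERS = IP_HEADER + UDP_HEADER + RTP_HEADER
COMPRESSED_HEADERS = 2  # assumed size after header compression


def voice_bandwidth_kbps(payload_bytes, packets_per_second, header_bytes):
    """Bandwidth of one voice stream, excluding link-layer overhead."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000.0


def overhead_fraction(payload_bytes, header_bytes):
    """Share of each packet taken up by headers."""
    return header_bytes / (payload_bytes + header_bytes)


if __name__ == "__main__":
    print(voice_bandwidth_kbps(20, 50, FULL_HEADERS), "Kbps uncompressed")
    print(voice_bandwidth_kbps(20, 50, COMPRESSED_HEADERS), "Kbps compressed")
    print(round(100 * overhead_fraction(20, FULL_HEADERS)), "% header overhead")
```

On a low speed port the saving per call is large enough to change how many calls fit, which is the trade against the extra CPU spent compressing.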

Page 153: Usda Training Mpls

153

Why Compression?

Page 154: Usda Training Mpls

154

How does it work?

• Compressor and decompressor share consistent state that includes fixed fields, first order differences and second order difference fields (delta encoding)
• There is a Context ID (CID) that identifies flows and is used as a database index.
• Flows are hashed against IP source & destination address and UDP source & destination ports and assigned a CID at the compressor
• Decompressor uses CID as database index
• There is a sequence number to detect packet loss
• Bandwidth vs CPU

Page 155: Usda Training Mpls

155

How it all works together!

Page 156: Usda Training Mpls

156

Service Implementation Example
AT&T’s IP Enabled Frame Relay/ATM
– Marking
– Shaping
– Queuing
– Profiles
– Policing
– Futures

Page 157: Usda Training Mpls

157

Service Architecture

[Figure: each CE connects over a Frame Relay or ATM access PVC (CDR) to a port on a PER; the PERs are connected through the MPLS core.]

Page 158: Usda Training Mpls

158

Marking
CER marks TOS or DSCP bits
– Real Time
  • 101 110 or 101 000
  • Dropped if rate exceeded
– Bursty High
  • 011 010 or 011 000 in contract
  • Remarked to 011 100 if out of contract
– Bursty Low
  • 010 010 or 010 000 in contract
  • 010 100 out of contract
– Best Effort
  • 000 000

Page 159: Usda Training Mpls

159

Shaping
Based on egress port speed
– Contracted rate not a factor
– Shaped to egress port speed
– Moves queue into the network router

Page 160: Usda Training Mpls

160

IPFR/ATM CoS Implementation
AT&T uses Low Latency Queueing (LLQ) and Class Based Weighted Fair Queueing (CBWFQ) in the IP FR/ATM network to implement CoS
LLQ has strict priority, which is best for delay sensitive applications like voice
CBWFQ allows critical data classes to get more bandwidth allocation than other classes
[Figure: voice packets (V) enter the LLQ (FIFO) and data packets enter the Class 1, Class 2 and Class 3 (FIFO) queues; all feed the Transmit Queue (FIFO) on the interface. The LLQ is drained completely whenever packets are queued (up to Max BW during congestion); the other classes are served by typical CB-WFQ.]

Page 161: Usda Training Mpls

161

Queuing
4 Queues (‘Classes of Service’)
– Real Time: LLQ
– Bursty-Hi: CBWFQ with WRED
– Bursty-Lo: CBWFQ with WRED
– Best Effort: WFQ with WRED
– Mapping based on IP Precedence or DiffServ marking

Page 162: Usda Training Mpls

162

CoS Profiles
AT&T’s IPFR/ATM Service provides for 4 separate classes
– Real Time
– Bursty High
– Bursty Low
– Best Effort
The 4 classes can be thought of as a priority hierarchy
The hierarchy is controlled by bandwidth allocation
– Simply, the more bandwidth that is allocated, the higher the priority
– Real time class is given a strict bandwidth allocation
– Other classes are assigned a percentage of CDR bandwidth

Page 163: Usda Training Mpls

163

Real Time Class

Best for voice, maybe video—not both if port speed is <768K

Strictly policed—excess over the allocated bandwidth is dropped

Must be purchased in increments of 20% of CDR

Determine real time allocation based on call requirements

Page 164: Usda Training Mpls

164

Policing – Real Time class

RT
– Ingress – Police to contracted value
  • Drop excess
  • No burst
– Egress – Police only when port is congested
  • Drop excess
  • No burst

Page 165: Usda Training Mpls

165

Policing – Bursty Data
Ingress
– Remark Excess (out of contract)
  • Contract based on allocation profile
– Burst Size?
Egress
– No policing function
– Queuing treatment of ‘out of contract’ identical to ‘in contract’
– ‘Lower’ drop thresholds

Page 166: Usda Training Mpls

166

Futures
MPLS EXP bits used for CoS in the backbone
– Will no longer remark customer packets
MPLS Traffic Engineering (TE)

Page 167: Usda Training Mpls

167

Applied Performance Engineering

Optimizing response times
– Identify traffic classes & requirements
– Implement policies at network bottlenecks.
– Calculate ‘expected’ behavior.

Page 168: Usda Training Mpls

168

Applied Performance Engineering

Remember the relevant parameters:
– Bandwidth
– Delay
– Jitter
– Loss

Page 169: Usda Training Mpls

169

Performance Engineering Guidelines

Potential queuing (Bottlenecks) exists at any point where the arriving rate can exceed the departing rate.

– i.e. find the speed mismatch points

The dominant bottleneck is at the ‘slowest’ link in the end to end connection.

– TCP protocols tend to adapt to the available bandwidth. This means that the only place where there can be a sustained congestion is at the slowest link. Any ‘faster’ links will only experience ‘transient’ congestion.

For ‘Low Speed’ ports, priority is more important
For ‘High Speed’ resources, BW allocation is sufficient

Page 170: Usda Training Mpls

170

Frame Relay Performance Engineering

FR, ATM
– Speed Mismatch Network Buffering
[Figure: App1, App2 and App3 send through a router across the network to a remote router; the queue (Q) builds in the network.]

Page 171: Usda Training Mpls

171

Application Inventory

List all applications
Include ‘administrative’ apps
– Routing
– DNS
– OS
Establish Requirements
– Response Time, Bandwidth

Page 172: Usda Training Mpls

172

What other applications are out there?
Applications that have low delay tolerance (sub second)
– Telnet
– Citrix
– DLSW
– TN3270
Applications that have multi-second response times
– ERPs (SAP, Peoplesoft, Siebel)
– Credit card authorizations
– Reservations
– HTTP applications
Background applications
– Email
– FTP
– Database synchronization

Page 173: Usda Training Mpls

173

Response Time Requirements

Group applications according to (response time requirements) delay sensitivity.

[Figure: applications placed on a scale from Very Delay Sensitive to Not Delay Sensitive. Applications shown: Voice, P.O.S, Telnet, My-SAP, PeopleSoft 7, Web, E-Mail, FTP.]

Page 174: Usda Training Mpls

174

Grouping
If voice is present
– Put it in RT
Group adjacent application classes into available bins.
• Use caution if RT is to be used for data apps
• You don’t have to use all available classes

Real Time | Bursty-Hi | Bursty-Lo | Best Effort
VOICE | P.O.S, Telnet, DNS | ERP, Web | Mail, FTP

Page 175: Usda Training Mpls

175

Capacity Planning

Establish BW / Application
– To meet stated requirements
Determine required port speed

Page 176: Usda Training Mpls

176

What is the “best” profile?
Only one type of data applications -> Bursty High
Mix of sub-second and background
– Sub-second -> Bursty High
– Background -> Best Effort
Mix of multi-second and background
– Multi-second -> Bursty High
– Background -> Best Effort
Mix of all types
– Sub-second -> Bursty High
– Multi-second -> Bursty Low
– Background -> Best Effort
This process will help to identify the “best” profile!

Page 177: Usda Training Mpls

177

Profile Selection

COS Package | RT % of CDR | BH % of CDR | BL % of CDR | BE
Multimedia High | 80 | 20 | 0 | --
 | 60 | 40 | 0 | --
 | 60 | 20 | 20 | --
Multimedia Standard | 40 | 60 | 0 | --
 | 40 | 40 | 20 | --
 | 20 | 80 | 0 | --
 | 20 | 60 | 20 | --
 | 20 | 40 | 40 | --
 | 10 | 80 | 10 | --
 | 10 | 60 | 30 | --
 | 10 | 40 | 50 | --
Critical Data | 0 | 100 | 0 | --
 | 0 | 80 | 20 | --
 | 0 | 40 | 60 | --
Business Data | 0 | 0 | 100 | --
Economy | 0 | 0 | 0 | 100

Page 178: Usda Training Mpls

178

Insertion Delay Exercise
Given a 1500 byte packet, what is the insertion delay
– at 56K?
– at 128K?
If you wanted to run voice on the same 128K port as the 1500 byte packet, what size would you recommend fragmenting the packet to minimize jitter? (hint: 10 msec should be the insertion delay of the fragment)

Page 179: Usda Training Mpls

179

CoS Exercise 1
Need to support 3 simultaneous calls with a G.729 codec
Rest of data is web surfing
What port size do I need?
What profile should I pick?

Page 180: Usda Training Mpls

180

CoS Exercise 2

No voice requirements
Lots of telnet traffic
Some http-based ERP applications
Rest is web surfing and email
What profile should I consider? Why?

Page 181: Usda Training Mpls

181

Completed PHASE II
Congratulations
Completed All Phases