Page 1: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

IXP Training Workshops

Contact: training@apnic.net

WROU03_v1.0

Page 2: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Introduction to The Internet

IXP Training Workshops

2

Page 3: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Introduction to the Internet
  Topologies and Definitions
  IP Addressing
  Internet Hierarchy
  Gluing it all together

3

Page 4: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Topologies and Definitions

What does all the jargon mean?

4

Page 5: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Some Icons…

5

Router (layer 3, IP datagram forwarding)

Network Cloud

Ethernet switch (layer 2, packet forwarding)

Page 6: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Routed Backbone
  ISPs build networks covering regions
    Regions can cover a country, sub-continent, or even the globe
    Each region has points of presence built by the ISP
  Routers are the infrastructure
  Physical circuits run between routers
  Easy routing configuration, operation and troubleshooting
  The dominant topology used in the Internet today

6

Page 7: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

MPLS Backbones
  Some ISPs & Telcos use Multi Protocol Label Switching (MPLS)
  MPLS is built on top of router infrastructure
    Used to replace old ATM technology
    Tunnelling technology
  Main purpose is to provide VPN services
    Although these can be done just as easily with other tunnelling technologies such as GRE

7

Page 8: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Points of Presence
  PoP – Point of Presence
    Physical location of ISP’s equipment
    Sometimes called a “node”
  vPoP – virtual PoP
    To the end user, it looks like an ISP location
    In reality a back hauled access point
    Used mainly for consumer access networks
  Hub/SuperPoP – large central PoP
    Links to many PoPs

8

Page 9: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

PoP Topologies
  Core routers
    high speed trunk connections
  Distribution routers
    higher port density, aggregating network edge to the network core
  Access routers
    high port density, connecting the end users to the network
  Border routers
    connections to other providers
  Service routers
    hosting and servers
  Some functions might be handled by a single router

9

Page 10: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Typical PoP Design

10

[Diagram: network core with backbone links to other PoPs; border routers to other ISPs; service routers for ISP services (DNS, Mail, News, FTP, WWW) and hosted services; access routers for business customer aggregation and consumer aggregation; Network Operation Centre]

Page 11: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

More Definitions
  Transit
    Carrying traffic across a network
    Usually for a fee
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
  Default
    Where to send traffic when there is no explicit match in the routing table
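The "Default" definition can be illustrated with a small lookup sketch. This is only an illustration: the routing table, prefixes and next-hop names below are hypothetical, and a real router uses far more efficient data structures than a linear scan.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop (all names are illustrative).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",        # the default route
    ipaddress.ip_network("158.43.0.0/16"): "peer-a",
    ipaddress.ip_network("158.43.8.0/24"): "customer-1",
}

def lookup(destination: str) -> str:
    """Return the next hop of the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(lookup("158.43.8.20"))    # customer-1: the /24 is the most specific match
print(lookup("158.43.200.1"))   # peer-a: only the /16 and the default match
print(lookup("192.0.2.1"))      # upstream: no explicit match, so the default is used
```

Removing the 0.0.0.0/0 entry would leave the last lookup with no match at all, which is the situation a default route exists to avoid.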

11

Page 12: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Peering and Transit example

12

[Diagram: providers A, B and C, Backbone Provider D, IXP-West and IXP-East]

A and B peer for free, but need transit arrangements with D to get packets to/from C

Page 13: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Private Interconnect

13

[Diagram: ISP A and ISP B, shown as Autonomous System 99 and Autonomous System 334, linked border router to border router]

Page 14: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Public Interconnect
  A location or facility where several ISPs are present and connect to each other over a common shared media
  Why?
    To save money, reduce latency, improve performance
  IXP – Internet eXchange Point
  NAP – Network Access Point

14

Page 15: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Public Interconnect
  Centralised (in one facility)
  Distributed (connected via WAN links)
  Switched interconnect
    Ethernet (Layer 2)
    Technologies such as SRP, FDDI, ATM, Frame Relay, SMDS and even routers have been used in the past
  Each provider establishes peering relationship with other providers at IXP
    ISP border router peers with all other provider border routers

15

Page 16: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Public Interconnect

16

[Diagram: ISP 1 to ISP 6 connected to a central IXP. Each ISP icon represents a border router in a different autonomous system]

Page 17: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

ISPs participating in Internet
  Bringing all pieces together, ISPs:
    Build multiple PoPs in a distributed network
    Build redundant backbones
    Have redundant external connectivity
    Obtain transit from upstream providers
    Get free peering from local providers at IXPs

17

Page 18: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Example ISP Backbone Design

18

[Diagram: network core linking PoP 1 to PoP 4 over backbone links, with connections to an IXP, ISP peers, Upstream 1 and Upstream 2]

Page 19: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

IP Addressing

Where to get address space and who from

19

Page 20: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

IP Addressing
  Internet uses classless routing
  Concept of IPv4 class A, class B or class C is no more
    Engineers talk in terms of prefix length, for example the class B 158.43 is now called 158.43/16.
  All routers must be CIDR capable
    Classless InterDomain Routing
    RFC1812 – Router Requirements
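As a quick illustration of prefix-length notation, the sketch below uses Python's standard `ipaddress` module on the 158.43/16 example from the slide. The split into /18 blocks is only an assumed example of a classless subdivision, not something the slide prescribes.

```python
import ipaddress

# The former "class B" from the slide, written in CIDR notation.
net = ipaddress.ip_network("158.43.0.0/16")

print(net.prefixlen)       # 16
print(net.netmask)         # 255.255.0.0
print(net.num_addresses)   # 65536

# Classless addressing allows any prefix length, e.g. four /18 blocks.
for subnet in net.subnets(new_prefix=18):
    print(subnet)
```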

20

Page 21: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

IP Addressing
  Pre-CIDR (before 1994)
    Big networks got a class A
    Medium networks got a class B
    Small networks got a class C
  The CIDR IPv4 years (1994 to 2010)
    Sizes of IPv4 allocations/assignments made according to demonstrated need – CLASSLESS
  IPv6 adoption (from 2011)
    The size of IPv4 address allocations and assignments is now very limited as IANA’s free pool has run out

21

Page 22: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

IP Addressing
  IP Address space is a resource shared amongst all Internet users
    Regional Internet Registries delegated allocation responsibility by the IANA
    AfriNIC, APNIC, ARIN, LACNIC & RIPE NCC are the five RIRs
    RIRs allocate address space to ISPs and Local Internet Registries
    ISPs/LIRs assign address space to end customers or other ISPs
  All usable IPv4 address space has been allocated to the RIRs by the IANA (February 2011)
    The time for IPv6 is now

22

Page 23: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Non-portable Address Space
  “Provider Aggregatable” or “PA Space”
    Customer uses RIR member’s address space while connected to Internet
    Customer has to renumber to change ISP
    Aids control of size of Internet routing table
    Need to fragment provider block when multihoming
  PA space is allocated to the RIR member
    All assignments made by the RIR member to end sites are announced as an aggregate to the rest of the Internet
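The aggregation idea can be sketched with the standard `ipaddress` module. The provider block below is a documentation range standing in for a hypothetical PA allocation, and the four /26 customer assignments are assumptions made for illustration.

```python
import ipaddress

# Hypothetical PA allocation held by the RIR member (documentation range).
provider_block = ipaddress.ip_network("203.0.113.0/24")

# Assumed customer assignments carved out of the provider block.
assignments = list(provider_block.subnets(new_prefix=26))

# All assignments collapse into one announcement to the rest of the Internet.
aggregate = list(ipaddress.collapse_addresses(assignments))
print(aggregate)   # [IPv4Network('203.0.113.0/24')]

# A multihomed customer's assignment announced separately fragments the block:
# the routing table now carries the aggregate plus the more specific prefix.
announced = aggregate + [assignments[1]]
print(len(announced))   # 2
```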

23

Page 24: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Portable Address Space
  “Provider Independent” or “PI Space”
    Customer gets or has address space independent of ISP
    Customer keeps addresses when changing ISP
    Is very bad for size of Internet routing table
    Is very bad for scalability of the routing system
    PI space is rarely distributed by the RIRs

24

Page 25: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Internet Hierarchy

The pecking order

25

Page 26: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

High Level View of the Global Internet

26

[Diagram: Global Providers, Regional Provider 1 and 2, Access Provider 1 and 2, Content Provider 1 and 2, Customer Networks, and an Internet Exchange Point]

Page 27: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Detailed View of the Global Internet
  Global Transit Providers
    Connect to each other
    Provide connectivity to Regional Transit Providers
  Regional Transit Providers
    Connect to each other
    Provide connectivity to Content Providers
    Provide connectivity to Access Providers
  Access Providers
    Connect to each other across IXPs (free peering)
    Provide access to the end user

27

Page 28: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Categorising ISPs

28

[Diagram: hierarchy of Tier 1, Tier 2 and Tier 3 ISPs, with IXPs and a “$$$” label]

Page 29: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Inter-provider relationships
  Peering between equivalent sizes of service providers (e.g. Tier 2 to Tier 2)
    Shared cost private interconnection, equal traffic flows
    No cost peering
  Peering across exchange points
    If convenient, of mutual benefit, technically feasible
  Fee based peering
    Unequal traffic flows, “market position”

29

Page 30: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Default Free Zone

30

The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route

NB: is not related to where an ISP is in the hierarchy

Page 31: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Gluing it together

31

Page 32: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Gluing it together
  Who runs the Internet?
    No one
    (Definitely not ICANN, nor the RIRs, nor the US,…)
  How does it keep working?
    Inter-provider business relationships and the need for customer reachability ensure that the Internet by and large functions for the common good
  Any facilities to help keep it working?
    Not really. But…
    Engineers keep working together!

32

Page 33: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Engineers keep talking to each other...
  North America
    NANOG (North American Network Operators Group)
    NANOG meetings and mailing list
    www.nanog.org
  Latin America
    Foro de Redes
    NAPLA
    LACNOG – supported by LACNIC
  Middle East
    MENOG (Middle East Network Operators Group)
    www.menog.net

33

Page 34: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Engineers keep talking to each other...
  Asia & Pacific
    APRICOT annual conference
      www.apricot.net
    APOPS & APNIC-TALK mailing lists
      mailman.apnic.net/mailman/listinfo/apops
      mailman.apnic.net/mailman/listinfo/apnic-talk
    PacNOG (Pacific NOG)
      mailman.apnic.net/mailman/listinfo/pacnog
    SANOG (South Asia NOG)
      E-mail to [email protected]

34

Page 35: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Engineers keep talking to each other...
  Europe
    RIPE meetings, working groups and mailing lists
    e.g. Routing WG: www.ripe.net/mailman/listinfo/routing-wg
  Africa
    AfNOG meetings and mailing list
  And many in-country ISP associations and NOGs
  IETF meetings and mailing lists
    www.ietf.org

35

Page 36: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Summary
  Topologies and Definitions
  IP Addressing
    PA versus PI address space
  Internet Hierarchy
    Local, Regional, Global Transit Providers
    IXPs
  Gluing it all together
    Engineers cooperate, common business interests

36

Page 37: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Introduction to The Internet

ISP Training Workshops

37

Page 38: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The Value of Peering

ISP Training Workshops

38

Page 39: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The Internet
  Internet is made up of ISPs of all shapes and sizes
    Some have local coverage (access providers)
    Others can provide regional or per country coverage
    And others are global in scale
  These ISPs interconnect their businesses
    They don’t interconnect with every other ISP (over 41000 distinct autonomous networks) – won’t scale
    They interconnect according to practical and business needs
  Some ISPs provide transit to others
    They interconnect other ISP networks

39

Page 40: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Categorising ISPs

40

[Diagram: hierarchy of Global, Regional and Access ISPs, with IXPs and a “$$$” label]

Page 41: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Peering and Transit
  Transit
    Carrying traffic across a network
    Usually for a fee
    Example: Access provider connects to a regional provider
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
    Example: Regional provider connects to another regional provider

41

Page 42: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Private Interconnect
  Two ISPs connect their networks over a private link
  Can be peering arrangement
    No charge for traffic
    Share cost of the link
  Can be transit arrangement
    One ISP charges the other for traffic
    One ISP (the customer) pays for the link

42

ISP 1 ISP 2

Page 43: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Public Interconnect
  Several ISPs meet in a common neutral location and interconnect their networks
    Usually is a peering arrangement between their networks

43

[Diagram: ISP 1 to ISP 6 connected to a central IXP]

Page 44: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

ISP Goals
  Minimise the cost of operating the business
  Transit
    ISP has to pay for circuit (international or domestic)
    ISP has to pay for data (usually per Mbps)
    Repeat for each transit provider
    Significant cost of being a service provider
  Peering
    ISP shares circuit cost with peer (private) or runs circuit to public peering point (one off cost)
    No need to pay for data
    Reduces transit data volume, therefore reducing cost

44

Page 45: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Transit – How it works
  Small access provider provides Internet access for a city’s population
    Mixture of dial up, wireless and fixed broadband
    Possibly some business customers
    Possibly also some Internet cafes
  How do their customers get access to the rest of the Internet?
  ISP buys access from one, two or more larger ISPs who already have visibility of the rest of the Internet
    This is transit – they pay for the physical connection to the upstream and for the traffic volume on the link

45

Page 46: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Peering – How it works
  If two ISPs are of equivalent sizes, they have:
    Equivalent network infrastructure coverage
    Equivalent customer size
    Similar content volumes to be shared with the Internet
    Potentially similar traffic flows to each other’s networks
  This makes them good peering partners
  If they don’t peer
    They both have to pay an upstream provider for access to each other’s network/customers/content
    Upstream benefits from this arrangement, the two ISPs both have to fund the transit costs

46

Page 47: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The IXP’s role
  Private peering makes sense when there are very few equivalent players
    Connecting to one other ISP costs X
    Connecting to two other ISPs costs 2 times X
    Connecting to three other ISPs costs 3 times X
    Etc… (where X is half the circuit cost plus a port cost)
  The more private peers, the greater the cost
  IXP is a more scalable solution to this problem

47

Page 48: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The IXP’s role
  Connecting to an IXP
    ISP costs: one router port, one circuit, and one router to locate at the IXP
  Some IXPs charge annual “maintenance fees”
    The maintenance fee has potential to significantly influence the cost balance for an ISP
  Generally connecting to an IXP and peering there becomes cost effective when there are at least three other peers
    The real $ amount varies from region to region, IXP to IXP
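The scaling argument from the two "IXP’s role" slides can be sketched as a small cost model. The formulas follow the slides (private peering costs n times X, where X is half the circuit cost plus a port; the IXP needs one port, one circuit, one router and possibly a maintenance fee). All dollar figures are hypothetical yearly values chosen for illustration, not numbers from the slides.

```python
def private_peering_cost(peers: int, circuit_cost: float, port_cost: float) -> float:
    """Yearly cost of n private peers: n times X, X = half a circuit plus a port."""
    x = circuit_cost / 2 + port_cost
    return peers * x

def ixp_cost(circuit_cost: float, port_cost: float, router_cost: float,
             maintenance_fee: float = 0.0) -> float:
    """Yearly cost of an IXP connection; it does not grow with the number of peers."""
    return circuit_cost + port_cost + router_cost + maintenance_fee

# Hypothetical yearly costs in dollars (assumptions, not from the slides).
CIRCUIT, PORT, ROUTER, FEE = 5000, 500, 1000, 1000

ixp = ixp_cost(CIRCUIT, PORT, ROUTER, FEE)
for peers in range(1, 6):
    private = private_peering_cost(peers, CIRCUIT, PORT)
    verdict = "IXP cheaper" if ixp < private else "private cheaper"
    print(peers, private, ixp, verdict)
```

With these assumed figures the IXP becomes the cheaper option at three peers. A larger maintenance fee moves the break-even point upwards, which is why the slide singles it out.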

48

Page 49: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Who peers at an IXP?
  Access Providers
    Don’t have to pay their regional provider transit fees for local traffic
    Keeps latency for local traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through transit provider)
  Regional Providers
    Don’t have to pay their global provider transit for local and regional traffic
    Keeps latency for local and regional traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through global provider)

49

Page 50: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The IXP’s role
  Global Providers can be located close to IXPs
    Attracted by the potential transit business available
  Advantageous for access & regional providers
    They can peer with other similar providers at the IXP
    And in the same facility pay for transit to their regional or global provider
    (Not across the IXP fabric, but a separate connection)

50

Transit

IXP

Access

Page 51: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Connectivity Decisions
  Transit
    Almost every ISP needs transit to reach rest of Internet
    One provider = no redundancy
    Two providers: ideal for traffic engineering as well as redundancy
    Three providers = better redundancy, traffic engineering gets harder
    More than three = diminishing returns, rapidly escalating costs and complexity
  Peering
    Means low (or zero) cost access to another network
    Private or Public Peering (or both)

51

Page 52: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Transit Goals
  1. Minimise number of transit providers
    But maintain redundancy
    2 is ideal, 4 or more is bad
  2. Aggregate capacity to transit providers
    More aggregated capacity means better value
      Lower cost per Mbps
    4x 45Mbps circuits to 4 different ISPs will almost always cost more than 2x 155Mbps circuits to 2 different ISPs
      Yet bandwidth of latter (310Mbps) is greater than that of former (180Mbps) and is much easier to operate

52

Page 53: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Peering or Transit?
  How to choose? Or do both?
  It comes down to cost of going to an IXP
    Free peering
    Paying for transit from an ISP co-located in same facility, or perhaps close by
  Or not going to an IXP and paying for the cost of transit directly to an upstream provider
  There is no right or wrong answer, someone has to do the arithmetic

53

Page 54: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Private or Public Peering
  Private peering
    Scaling issue, with costs, number of providers, and infrastructure provisioning
  Public peering
    Makes sense the more potential peers there are (more is usually greater than “two”)
  Which public peering point?
    Local Internet Exchange Point: great for local traffic and local peers
    Regional Internet Exchange Point: great for meeting peers outside the locality, might be cheaper than paying transit to reach the same consumer base

54

Page 55: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Local Internet Exchange Point
  Defined as a public peering point serving the local Internet industry
  Local means where it becomes cheaper to interconnect with other ISPs at a common location than it is to pay transit to another ISP to reach the same consumer base
    Local can mean different things in different regions!

55

Page 56: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Regional Internet Exchange Point
  These are also “local” Internet Exchange Points
  But also attract regional ISPs and ISPs from outside the locality
    Regional ISPs peer with each other
    And show up at several of these Regional IXPs
  Local ISPs peer with ISPs from outside the locality
    They don’t compete in each other’s markets
    Local ISPs don’t have to pay transit costs
    ISPs from outside the locality don’t have to pay transit costs
    Quite often ISPs of disparate sizes and influences will happily peer – to defray transit costs

56

Page 57: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Which IXP?
  How many routes are available?
    What is traffic to & from these destinations, and by how much will it reduce cost of transit?
  What is the cost of co-lo space?
    If prohibitive or space not available, pointless choosing this IXP
  What is the cost of running a circuit to the location?
    If prohibitive or competitive with transit costs, pointless choosing this IXP
  What is the cost of remote hands/assistance?
    If no remote hands, doing maintenance is challenging and potentially costly with a serious outage

57

Page 58: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Example: South Asian ISP @ LINX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 81k prefixes
    IXP traffic averages 55Mbps/15Mbps
    Transit traffic averages 35Mbps/3Mbps
  Analysis:
    61% of inbound traffic comes from 81k prefixes available by peering
    39% of inbound traffic comes from remaining 287k prefixes from transit provider

58

Page 59: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Example: South Asian ISP @ HKIX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 34k prefixes
    IXP traffic is 130Mbps/30Mbps
    Transit traffic is 125Mbps/40Mbps
  Analysis:
    51% of inbound traffic comes from 42k prefixes available by peering
    49% of inbound traffic comes from remaining 326k prefixes from transit provider

59

Page 60: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Example: South Asian ISP
  Summary:
    Traffic by Peering: 185Mbps/45Mbps
    Traffic by Transit: 160Mbps/43Mbps
    54% of incoming traffic is by peering
    52% of outbound traffic is by peering
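The inbound percentages on the LINX, HKIX and summary slides can be reproduced with a few lines of arithmetic. This sketch assumes the first figure of each "a/b" pair is the inbound rate, which is how the analysis lines read; the variable names are illustrative.

```python
# Inbound Mbps quoted on the LINX and HKIX slides (assumed: first figure = inbound).
peering_in = {"LINX": 55, "HKIX": 130}
transit_in = {"LINX": 35, "HKIX": 125}

def peering_share(peering: float, transit: float) -> float:
    """Fraction of traffic carried by peering rather than transit."""
    return peering / (peering + transit)

for site in peering_in:
    share = peering_share(peering_in[site], transit_in[site])
    print(site, f"{share:.0%}")          # LINX 61%, HKIX 51%

combined = peering_share(sum(peering_in.values()), sum(transit_in.values()))
print("combined", f"{combined:.0%}")     # combined 54%
```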

60

Page 61: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Example: South Asian ISP
  Router at remote co-lo
    Benefits: can select peers, easy to swap transit providers
    Costs: co-lo space and remote hands
  Servers at remote co-lo
    Benefits: mail filtering, content caching, etc
    Costs: co-lo space and remote hands
  Overall advantage:
    Can control what goes on the expensive connectivity “back to home”

61

Page 62: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Value propositions
  Peering at a local IXP
    Reduces latency & transit costs for local traffic
    Improves Internet quality perception
  Participating at a Regional IXP
    A means of offsetting transit costs
    Managing connection back to home network
    Improving Internet Quality perception for customers

62

Page 63: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Summary
  Benefits of peering
    Private
    Internet Exchange Points
  Local versus Regional IXPs
    Local services local traffic
    Regional helps defray transit costs

63

Page 64: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example

Single International Transit
Versus
Local IXP + Regional IXP + Transit

64

Page 65: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example
  ISP A is local access provider
    Some business customers (around 200 fixed links)
    Some co-located content provision (datacentre with 100 servers)
    Some consumers on broadband (5000 DSL/Cable/Wireless)
    Some consumers on dial (1000 on V.34 type speeds)
  They have a single transit provider
    Connect with a 16Mbps international leased link to their transit’s PoP
    Transit link is highly congested

65

Page 66: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (2)
  There are two other ISPs serving the same locality
    There is no interconnection between any of the three ISPs
    Local traffic (between all 3 ISPs) is traversing International connections
  Course of action for our ISP:
    Work to establish local IXP
    Establish presence at overseas co-location
  First Step
    Assess local versus international traffic ratio
    Use NetFlow on border router connecting to transit provider

66

Page 67: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (3)
  Local/Non-local traffic ratio
    Local = traffic going to other two ISPs
    Non-local = traffic going elsewhere
  Example: balance is 30:70
    Of 16Mbps, that means 5Mbps could stay in country and not congest International circuit
    16Mbps transit costs $50 per Mbps per month; traffic charges for the local 5Mbps = $250 per month, or $3000 per year
    Circuit costs $100k per year: $30k is spent on local traffic
  Total is $33k per year for local traffic
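The arithmetic on this slide can be checked with a short script. It follows the slide in rounding the local share of the link (4.8Mbps) up to 5Mbps before pricing it; the variable names are illustrative.

```python
link_mbps = 16
local_percent = 30                # local:non-local balance of 30:70
transit_price = 50                # $ per Mbps per month
circuit_cost_year = 100_000       # $ per year for the international circuit

local_mbps = round(link_mbps * local_percent / 100)            # 4.8, rounded to 5
traffic_month = local_mbps * transit_price                     # $250 per month
traffic_year = traffic_month * 12                              # $3000 per year
circuit_share_year = circuit_cost_year * local_percent // 100  # $30000 per year

total_local_year = traffic_year + circuit_share_year
print(local_mbps, traffic_month, traffic_year, circuit_share_year, total_local_year)
# 5 250 3000 30000 33000
```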

67

Page 68: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (4)
  IXP cost:
    Simple 8 port 10/100 managed switch plus co-lo space over 3 years could be around US$30k total; or $3k per year per ISP
    One router to handle 5Mbps (e.g. 2801) would be around $3k (good for 3 years)
    One local 10Mbps circuit from ISP location to IXP location would be around $5k per year, no traffic charges
    Per ISP total: $9k
    Somewhat cheaper than $33k
    Business case for local peering is straightforward - $24k saving per annum
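A rough check of the per-ISP figure, assuming the shared switch and co-lo cost is split evenly between the three local ISPs and that router cost is spread over its three-year life. The slide rounds each item to the nearest $1k, so the script lands close to, rather than exactly on, its totals.

```python
years = 3
isps = 3                          # our ISP plus the two other local ISPs
switch_and_colo_total = 30_000    # $ over 3 years, shared by all ISPs
router_cost = 3_000               # $ per ISP, good for 3 years
circuit_year = 5_000              # $ per ISP per year, no traffic charges

shared_year = switch_and_colo_total / years / isps   # about $3.3k per ISP per year
router_year = router_cost / years                    # $1k per year
per_isp_year = shared_year + router_year + circuit_year

saving_year = 33_000 - per_isp_year   # compared with paying for local traffic via transit
print(round(per_isp_year, -3), round(saving_year, -3))
```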

68

Page 69: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (5)
  After IXP establishment
    5Mbps removed from International link
    Leaving 5Mbps for more International traffic – and that fills the link within weeks of the local traffic being removed
  Next step is to assess transit charges and optimise costs
    ISP visits several major regional IXPs
    Assesses routes available
    Compares routes available with traffic generated by those routes from its NetFlow data
    Discovers that 30% of traffic would transfer to one IXP via peering

69

Page 70: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (6)
  Costs:
    Router for Regional IXP (e.g. 2801) at $3k over three years
    Co-lo space at Regional IXP venue at $3k per year
    Best price for transit at the Regional IXP venue by competitive tender is $30 per Mbps per month, plus $1k port charge
    30% of traffic offloads to IXP, leaving 70% of 16Mbps to transit provider = $330 per month, or $5k per annum
    Total with this model is $9k per year, plus the cost of the circuit (still $100k)
    Compare this with paying $50 per Mbps per month to the transit provider = $10k per annum (plus cost of the circuit)
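The comparison on this slide can be reproduced as follows. The sketch assumes the $1k port charge is a yearly charge, which is what makes the slide's "$5k per annum" figure add up; the slide rounds its results, so the script's exact values differ slightly.

```python
link_mbps = 16
offload_percent = 30              # share of traffic moved to the regional IXP
old_price, new_price = 50, 30     # $ per Mbps per month, before and after tender
port_charge_year = 1_000          # assumed to be a yearly charge
router_year = 3_000 / 3           # $3k router spread over three years
colo_year = 3_000

# 70% of 16Mbps still goes to the transit provider.
transit_month = link_mbps * (100 - offload_percent) * new_price / 100
transit_year = transit_month * 12 + port_charge_year

new_total_year = transit_year + router_year + colo_year
old_total_year = link_mbps * old_price * 12

print(transit_month, transit_year, new_total_year, old_total_year)
# 336.0 5032.0 9032.0 9600
```

Both totals exclude the $100k circuit, which is unchanged between the two models.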

70

Page 71: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Worked Example (7)
  Result:
    ISP co-locates at Regional IXP
    Pays reduced transit charges to transit provider (competitive tender)
    Pays no charges for traffic across Regional IXP
  Bonuses:
    Rate limits on router at Regional IXP Co-lo
      Can prioritise congestion dependent on customer demands
    Install servers at Regional IXP co-lo facility
      Filters e-mail (spam and viruses) – relieves some capacity on link
      Caches content – relieves a little more capacity on link

71

Page 72: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Conclusion
  Within the original costs of having one international transit provider:
    ISP has turned up at the local IXP and offloaded local traffic for free
    ISP has turned up at a major regional IXP and offloaded traffic, avoiding paying transit charges to transit provider
    ISP has reduced remaining transit charges by competitive tender at the regional IXP co-location facility
  Caveat
    These numbers are typical of the Internet today
    As ever, your mileage may vary – but do the financial calculations first and in the context of potential technical advantages too

72

Page 73: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

The Value of Peering

ISP Training Workshops

73

Page 74: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Introduction to OSPF

ISP Training Workshops

74

Page 75: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

OSPF
  Open Shortest Path First
  Link state or SPF technology
  Developed by OSPF working group of IETF (RFC 1247)
  OSPFv2 standard described in RFC2328
  Designed for:
    TCP/IP environment
    Fast convergence
    Variable-length subnet masks
    Discontiguous subnets
    Incremental updates
    Route authentication
  Runs on IP, Protocol 89

75

Page 76: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Link State

76

Topology Information is kept in a Database separate from the Routing Table

[Diagram: topology of routers Q, X, Y and Z, alongside a database listing Z’s, Q’s and X’s link states]

Page 77: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Link State Routing
  Neighbour discovery
  Constructing a Link State Packet (LSP)
  Distribute the LSP
    (Link State Advertisement – LSA)
  Compute routes
  On network failure
    New LSPs flooded
    All routers recompute routing table
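The "compute routes" and "recompute on failure" steps can be sketched with Dijkstra's algorithm over a link state database. The topology, node names and costs below are hypothetical, and the code is a minimal illustration rather than how any particular OSPF implementation works.

```python
import heapq

# Hypothetical link state database: node -> {neighbour: link cost}.
LSDB = {
    "R1": {"R2": 10, "R3": 8},
    "R2": {"R1": 10, "N1": 1},
    "R3": {"R1": 8, "N1": 5},
    "N1": {"R2": 1, "R3": 5},
}

def spf(lsdb, root):
    """Dijkstra: lowest total cost and first hop from root to every reachable node."""
    dist = {root: 0}
    first_hop = {}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue                      # stale heap entry, a better path was found
        for neigh, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neigh, float("inf")):
                dist[neigh] = new_cost
                first_hop[neigh] = neigh if node == root else first_hop[node]
                heapq.heappush(heap, (new_cost, neigh))
    return dist, first_hop

def without_link(lsdb, a, b):
    """Copy of the database with the a-b link removed, as after flooding new LSPs."""
    return {n: {k: v for k, v in nbrs.items() if {n, k} != {a, b}}
            for n, nbrs in lsdb.items()}

dist, hop = spf(LSDB, "R1")
print(dist["N1"], hop["N1"])      # 11 R2

dist, hop = spf(without_link(LSDB, "R2", "N1"), "R1")
print(dist["N1"], hop["N1"])      # 13 R3
```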

77

Page 78: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Low Bandwidth Utilisation

  Only changes propagated
  Uses multicast on multi-access broadcast networks

78

[Diagram: LSAs flooded from R1 following a change marked X]

Page 79: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Fast Convergence
  Detection Plus LSA/SPF
    Known as the Dijkstra Algorithm

79

[Diagram: routers R1, R2 and R3 between networks N1 and N2, showing a primary path, an alternate path and a failure marked X]

Page 80: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Fast Convergence
  Finding a new route
    LSA flooded throughout area
    Acknowledgement based
  Topology database synchronised
  Each router derives routing table to destination network

80

[Diagram: R1, network N1, an LSA and a failure marked X]

Page 81: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

OSPF Areas
  Area is a group of contiguous hosts and networks
    Reduces routing traffic
  Per area topology database
    Invisible outside the area
  Backbone area
    MUST be contiguous
    All other areas must be connected to the backbone

81

[Diagram: Area 0 (backbone area) with routers Ra to Rd, surrounded by Areas 1 to 4 with routers R1 to R8]

Page 82: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Virtual Links between OSPF Areas

  Virtual Link is used when it is not possible to physically connect the area to the backbone
  ISPs avoid designs which require virtual links
    Increases complexity
    Decreases reliability and scalability

82

[Diagram: Area 0 (backbone area) with routers Ra to Rd, plus Area 1 and Area 4 with routers R3 to R8]

Page 83: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Classification of Routers

  Internal Router (IR)
  Area Border Router (ABR)
  Backbone Router (BR)
  Autonomous System Border Router (ASBR)

83

[Diagram: Area 0 with routers Ra to Rd, Areas 1, 2 and 3 with routers R1 to R5; routers are labelled IR, ABR/BR, IR/BR and ASBR, with the ASBR linking to another AS]

Page 84: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

OSPF Route Types

  Intra-area Route
    all routes inside an area
  Inter-area Route
    routes advertised from one area to another by an Area Border Router
  External Route
    routes imported into OSPF from other protocol or static routes

84

[Diagram: Area 0 with routers Ra to Rd, Areas 1, 2 and 3 with routers R1 to R5; routers are labelled IR, ABR/BR and ASBR, with the ASBR linking to another AS]

Page 85: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

External Routes
  Prefixes which are redistributed into OSPF from other protocols
    Flooded unaltered throughout the AS
    Recommendation: Avoid redistribution!!
  OSPF supports two types of external metrics
    Type 1 external metrics
    Type 2 external metrics (Cisco IOS default)

85

[Diagram: R2 redistributes routes from RIP, EIGRP, BGP, static, connected, etc. into OSPF]

Page 86: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

External Routes
  Type 1 external metric: metrics are added to the summarised internal link cost

86

[Diagram: R1 reaches N1 via R2 (cost 10, external cost 1) or via R3 (cost 8, external cost 2)]

Network   Type 1   Next Hop
N1        11       R2
N1        10       R3         Selected Route

Page 87: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

External Routes
  Type 2 external metric: metrics are compared without adding to the internal link cost

87

[Diagram: R1 reaches N1 via R2 (cost 10, external cost 1) or via R3 (cost 8, external cost 2)]

Network   Type 2   Next Hop
N1        1        R2         Selected Route
N1        2        R3

Page 88: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0
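The difference between the two external metric types can be sketched in a few lines of Python. This is an illustration only, using the costs from the two slides above; the class and function names are made up for the example, and the tie-break on internal cost for Type 2 is an assumption based on RFC 2328 behaviour rather than something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalPath:
    next_hop: str
    internal_cost: int  # cost from this router towards the advertising router
    external_cost: int  # metric carried in the external LSA


def best_type1(paths):
    # Type 1: internal and external costs are added together.
    return min(paths, key=lambda p: p.internal_cost + p.external_cost)


def best_type2(paths):
    # Type 2: only the external cost is compared.
    # Assumption: internal cost is used only to break ties.
    return min(paths, key=lambda p: (p.external_cost, p.internal_cost))


paths = [ExternalPath("R2", 10, 1), ExternalPath("R3", 8, 2)]
print(best_type1(paths).next_hop)  # R3 (8 + 2 = 10 beats 10 + 1 = 11)
print(best_type2(paths).next_hop)  # R2 (external cost 1 beats 2)
```

The same pair of paths gives a different winner depending on the metric type, which is why the choice matters when more than one router injects the same external prefix.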

Topology/Link State Database

A router has a separate LS database for each area to which it belongs
All routers belonging to the same area have identical databases
SPF calculation is performed separately for each area
LSA flooding is bounded by area
Recommendation:
  Limit the number of areas a router participates in!
  1 to 3 is fine (typical ISP design)
  >3 can overload the CPU, depending on the area topology complexity
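As a rough illustration of what "SPF calculation per area" means, the sketch below runs Dijkstra's shortest-path algorithm over one area's link-state database. The database layout (a dict of router to neighbour costs) and the router names are assumptions for the example; a real LSDB holds LSAs, not a ready-made graph.

```python
import heapq


def spf(lsdb, root):
    """Return the lowest total cost from root to every reachable router."""
    dist = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return dist


# Hypothetical database for a single area; each area would get its own run.
area1 = {
    "R1": {"R2": 10, "R3": 8},
    "R2": {"R1": 10, "R4": 5},
    "R3": {"R1": 8, "R4": 20},
    "R4": {"R2": 5, "R3": 20},
}
print(spf(area1, "R1"))  # {'R1': 0, 'R2': 10, 'R3': 8, 'R4': 15}
```

A router in three areas repeats this calculation three times, once per database, which is one reason the slide recommends limiting the number of areas per router.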

The Hello Protocol

Responsible for establishing and maintaining neighbour relationships
Elects the designated router on multi-access networks

[Figure: routers exchanging Hello packets]

The Hello Packet

Contains:
  Router priority
  Hello interval
  Router dead interval
  Network mask
  List of neighbours
  DR and BDR
  Options: E-bit, MC-bit, … (see A.2 of RFC2328)

[Figure: routers exchanging Hello packets]

Designated Router

There is ONE designated router per multi-access network
  Generates network link advertisements
  Assists in database synchronization

[Figure: two multi-access networks, each with a Designated Router and a Backup Designated Router]

Designated Router by Priority

Configured priority (per interface)
  ISPs configure high priority on the routers they want as DR/BDR
Else determined by highest router ID
  Router ID is a 32-bit integer
  Derived from the loopback interface address, if configured, otherwise the highest IP address

[Figure: R1 (router ID 144.254.3.5) and R2 (router ID 131.108.3.3) on a shared segment with interface addresses 131.108.3.2 and 131.108.3.3; one of them is marked as DR]
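The selection rule on this slide (priority first, then highest router ID) can be sketched as below. This is a simplification under stated assumptions: the real election also chooses a BDR and does not pre-empt an existing DR, and the rule that priority 0 makes a router ineligible comes from RFC 2328, not from the slide. The function name and tuple layout are invented for the example.

```python
import ipaddress


def elect_dr(routers):
    """routers: list of (name, priority, router_id) tuples.

    Returns the name of the router that would win a fresh election,
    or None if nobody is eligible.
    """
    eligible = [r for r in routers if r[1] > 0]  # priority 0 never becomes DR
    if not eligible:
        return None
    # Compare priority first, then the router ID as a 32-bit integer.
    winner = max(eligible, key=lambda r: (r[1], int(ipaddress.IPv4Address(r[2]))))
    return winner[0]


segment = [("R1", 1, "144.254.3.5"), ("R2", 1, "131.108.3.3")]
print(elect_dr(segment))  # R1 (equal priority, higher router ID)
```

Raising the priority of R2 above that of R1 makes R2 the winner regardless of router ID, which is the behaviour ISPs rely on when they pin the DR to a chosen router.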

Neighbouring States

Full
  Routers are fully adjacent
  Databases synchronised
  Relationship to DR and BDR

[Figure: routers in Full state with the DR and BDR]

Neighbouring States

2-way
  Router sees itself in other Hello packets
  DR selected from neighbours in state 2-way or greater

[Figure: routers in 2-way state, alongside the DR and BDR]

When to Become Adjacent

Underlying network is point to point
Underlying network type is virtual link
The router itself is the designated router or the backup designated router
The neighbouring router is the designated router or the backup designated router
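The four conditions above reduce to a small decision function. The sketch below is only a restatement of the slide in Python; the function name and the network type strings are assumptions chosen for the example.

```python
def should_become_adjacent(network_type, self_is_dr_or_bdr, neighbour_is_dr_or_bdr):
    """Return True if a full adjacency should be formed with the neighbour."""
    # Point-to-point and virtual links always form an adjacency.
    if network_type in ("point-to-point", "virtual-link"):
        return True
    # On multi-access networks, only pairs involving the DR or BDR do.
    return self_is_dr_or_bdr or neighbour_is_dr_or_bdr


# Two ordinary routers on a broadcast LAN stay in 2-way state.
print(should_become_adjacent("broadcast", False, False))  # False
# Any router becomes adjacent with the DR or BDR.
print(should_become_adjacent("broadcast", False, True))   # True
```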

LSAs Propagate Along Adjacencies

LSAs acknowledged along adjacencies


Broadcast Networks

IP multicast is used for sending and receiving updates
  All routers must accept packets sent to AllSPFRouters (224.0.0.5)
  All DR and BDR routers must accept packets sent to AllDRouters (224.0.0.6)
Hello packets are sent to AllSPFRouters (unicast on point-to-point and virtual links)

Routing Protocol Packets

Share a common protocol header
Routing protocol packets are sent with type of service (TOS) of 0
Five types of OSPF routing protocol packets
  Hello – packet type 1
  Database description – packet type 2
  Link-state request – packet type 3
  Link-state update – packet type 4
  Link-state acknowledgement – packet type 5
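To make the "common protocol header" concrete, here is a minimal Python sketch that unpacks it and maps the type field to the five packet types listed above. The 24-byte field layout is taken from RFC 2328 (appendix A.3.1) as an assumption, since the slide does not spell it out; the names used are invented for the example, and the checksum and authentication fields are read but not verified.

```python
import ipaddress
import struct
from enum import IntEnum


class PacketType(IntEnum):
    HELLO = 1
    DATABASE_DESCRIPTION = 2
    LINK_STATE_REQUEST = 3
    LINK_STATE_UPDATE = 4
    LINK_STATE_ACK = 5


# version, type, packet length, router ID, area ID, checksum, AuType, authentication
HEADER = struct.Struct("!BBH4s4sHH8s")


def parse_header(data):
    version, ptype, length, router_id, area_id, _cksum, _autype, _auth = (
        HEADER.unpack_from(data)
    )
    return {
        "version": version,
        "type": PacketType(ptype),
        "length": length,
        "router_id": str(ipaddress.IPv4Address(router_id)),
        "area_id": str(ipaddress.IPv4Address(area_id)),
    }


# Hand-built sample header: an OSPFv2 Hello from 192.168.255.1 in area 0.
sample = HEADER.pack(2, 1, 44, bytes([192, 168, 255, 1]), bytes(4), 0, 0, bytes(8))
print(parse_header(sample)["type"].name)  # HELLO
```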

Different Types of LSAs

Six distinct types of LSAs
  Type 1: Router LSA
  Type 2: Network LSA
  Type 3 & 4: Summary LSA
  Type 5 & 7: External LSA (Type 7 is for NSSA)
  Type 6: Group membership LSA
  Type 9, 10 & 11: Opaque LSA (9: Link-Local, 10: Area, 11: AS)

Router LSA (Type 1)

Describes the state and cost of the router’s links to the area
All of the router’s links in an area must be described in a single LSA
Flooded throughout the particular area and no more
Router indicates whether it is an ASBR, ABR, or end point of a virtual link

Network LSA (Type 2)

Generated for every transit broadcast and NBMA network
Describes all the routers attached to the network
Only the designated router originates this LSA
Flooded throughout the area and no more

Summary LSA (Type 3 and 4)

Describes the destination outside the area but still in the AS
Flooded throughout a single area
Originated by an ABR
Only intra-area routes are advertised into the backbone
Type 4 is the information about the ASBR

External LSA (Type 5 and 7)

Defines routes to destinations external to the AS
Default route is also sent as external
Two types of external LSA:
  E1: Considers the total cost up to the external destination
  E2: Considers only the cost of the outgoing interface to the external destination
(Type 7 LSAs are used to describe external LSAs for one specific OSPF area type)

Inter-Area Route Summarisation

Prefix or all subnets
Prefix or all networks
‘Area range’ command

[Figure: Area 1, containing networks 1.A, 1.B and 1.C, attaches to backbone Area 0 through an ABR; routers R1 and R2 are shown]

With summarisation:
Network  Next Hop
1        R1

Without summarisation:
Network  Next Hop
1.A      R1
1.B      R1
1.C      R1
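The effect of summarisation at an ABR can be illustrated with Python's standard ipaddress module. The sketch assumes an area whose point-to-point links are numbered as consecutive /30s out of 192.168.1.64/26; these addresses are example values chosen for the illustration, not a statement about any particular network.

```python
import ipaddress

# Sixteen point-to-point /30 links inside one area (hypothetical numbering).
links = [ipaddress.ip_network(f"192.168.1.{n}/30") for n in range(64, 128, 4)]

# Without summarisation the ABR would advertise all sixteen prefixes;
# with summarisation a contiguous block collapses into one.
summary = list(ipaddress.collapse_addresses(links))

print(len(links))               # 16
print(summary[0])               # 192.168.1.64/26
print(summary[0].with_netmask)  # 192.168.1.64/255.255.255.192
```

The collapse only works because the links are contiguous and aligned on a /26 boundary, which is the reason the deck keeps stressing contiguous addressing per area.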

No Summarisation

Specific link LSAs advertised out of each area
Link state changes propagated out of each area

[Figure: Areas 1, 2 and 3 each advertise their specific links (1.A to 1.D, 2.A to 2.D, 3.A to 3.D) into Area 0]

With Summarisation

Only summary LSA advertised out of each area
Link state changes do not propagate out of the area

[Figure: Areas 1, 2 and 3 each advertise a single summary (1, 2 and 3) into Area 0]

No Summarisation

Specific link LSAs advertised in to each area
Link state changes propagated in to each area

[Figure: each area receives the specific links of the other two areas from Area 0, for example Area 1 receives 2.A to 2.D and 3.A to 3.D]

With Summarisation

Only summary link LSA advertised in to each area
Link state changes do not propagate in to each area

[Figure: each area receives only the summaries of the other two areas from Area 0, for example Area 1 receives 2 and 3]

Types of Areas

Regular
Stub
Totally Stubby
Not-So-Stubby
Only “regular” areas are useful for ISPs
  Other area types handle redistribution of other routing protocols into OSPF – ISPs don’t redistribute anything into OSPF
The next slides describing the different area types are provided for information only

Regular Area (Not a Stub)

From Area 1’s point of view, summary networks from other areas are injected, as are external networks such as X.1

[Figure: Areas 1, 2 and 3 around Area 0, with an ASBR attached to external networks; summaries and external network X.1 are flooded into every area]

Normal Stub Area

Summary networks, default route injected
Command is area x stub

[Figure: Areas 1, 2 and 3 around Area 0, with an ASBR and external network X.1; a default route is injected in place of X.1]

Totally Stubby Area

Only a default route injected
  Default path to closest area border router
Command is area x stub no-summary

[Figure: Areas 1, 2 and 3 around Area 0, with an ASBR and external network X.1; the totally stubby area receives only a default route]

Not-So-Stubby Area

Capable of importing routes in a limited fashion
Type-7 LSAs carry external information within an NSSA
NSSA border routers translate selected type-7 LSAs into type-5 external network LSAs

[Figure: Areas 1, 2 and 3 around Area 0, with an ASBR and external network X.1; the not-so-stubby area has its own external networks (X.2), which appear in the other areas alongside X.1]

ISP Use of Areas

ISP networks use:
  Backbone area
  Regular area
Backbone area
  No partitioning
Regular area
  Summarisation of point to point link addresses used within areas
  Loopback addresses allowed out of regular areas without summarisation (otherwise iBGP won’t work)

Addressing for Areas

Assign contiguous ranges of subnets per area to facilitate summarisation

Area  Network        Range
0     192.168.1.0    255.255.255.192
1     192.168.1.64   255.255.255.192
2     192.168.1.128  255.255.255.192
3     192.168.1.192  255.255.255.192

Summary

Fundamentals of Scalable OSPF Network Design
  Area hierarchy
  DR/BDR selection
  Contiguous intra-area addressing
  Route summarisation
  Infrastructure prefixes only

Introduction to OSPF

ISP Training Workshops


Deploying OSPF for ISPs

ISP Training Workshops

Agenda

OSPF Design in SP Networks
Adding Networks in OSPF
OSPF in Cisco’s IOS

OSPF Design

As applicable to Service Provider Networks


Service Providers

SP networks are divided into PoPs
PoPs are linked by the backbone
Transit routing information is carried via iBGP
IGP is only used to carry the next hop for BGP
Optimal path to the next hop is critical

SP Architecture

Major routing information is ~430K prefixes via BGP
Largest known IGP routing table is ~9–10K
Total of 440K
  10K/440K means IGP routes are about 2½% of the routes in an ISP network
A very small factor, but it has a huge impact on network convergence!

[Figure: IP backbone (Area 0/L2, BGP 1) connecting six PoPs, each in its own area (Area 1/L1 to Area 6/L1, BGP 1)]

SP Architecture

You can reduce the IGP size from 10K to approximately the number of routers in your network
This will bring really fast convergence
Optimise where you must and summarise where you can
Stops unnecessary flapping

[Figure: network hierarchy from RR and regional core through access to customers, with the extent of the IGP marked]

OSPF Design: Addressing

OSPF design and addressing go together
  Objective is to keep the Link State Database lean
  Create an address hierarchy to match the topology
  Use separate address blocks for loopbacks, network infrastructure, customer interfaces & customers

[Figure: address plan with blocks labelled Infrastructure, Customer Address Space, Loopbacks and PtP Links]

OSPF Design: Addressing

Minimising the number of prefixes in OSPF:
  Number loopbacks out of a contiguous address block
    But do not summarise these across area boundaries: iBGP peer addresses need to be in the IGP
  Use contiguous address blocks per area for infrastructure point-to-point links
    Use area range command on ABR to summarise
With these guidelines:
  Number of prefixes in area 0 will then be very close to the number of routers in the network
  It is critically important that the number of prefixes and LSAs in area 0 is kept to the absolute minimum

OSPF Design: Areas

Examine physical topology
  Is it meshed or hub-and-spoke?
Use areas and summarisation
  This reduces overhead and LSA counts (but watch next-hop for iBGP when summarising)
Don’t bother with the various stub areas
  No benefits for ISPs, causes problems for iBGP
Push the creation of a backbone
  Reduces mesh and promotes hierarchy

OSPF Design: Areas

One SPF per area, flooding done per area
  Watch out for overloading ABRs
Avoid externals in OSPF
  DO NOT REDISTRIBUTE into OSPF
  External LSAs flood through entire network
Different types of areas do different flooding
  Normal areas
  Stub areas
  Totally stubby (stub no-summary)
  Not so stubby areas (NSSA)

OSPF Design: Areas

Area 0 must be contiguous
  Do NOT use virtual links to join two Area 0 islands
Traffic between two non-zero areas always goes via Area 0
  There is no benefit in joining two non-zero areas together
  Avoid designs which have two non-zero areas touching each other
  (Typical design is an area per PoP, with core routers being ABR to the backbone area 0)

OSPF Design: Summary

Think redundancy
  Dual links out of each area – using metrics (cost) for traffic engineering
Too much redundancy…
  Dual links to backbone in stub areas must be the same cost – otherwise sub-optimal routing will result
  Too much redundancy in the backbone area without good summarisation will affect convergence in Area 0

OSPF Areas: Migration

Where to place OSPF areas?
  Follow the physical topology!
  Remember the earlier design advice
Configure one area at a time!
  Start at the outermost edge of the network
  Log into routers at either end of a link and change the link from Area 0 to the chosen area
  Wait for OSPF to re-establish adjacencies
  And then move on to the next link, etc.
  Important to ensure that there is never an Area 0 island anywhere in the migrating network

OSPF Areas: Migration

Migrate small parts of the network, one area at a time
  Remember to introduce summarisation where feasible
With careful planning, the migration can be done with minimal network downtime

[Figure: routers A to G split between Area 0 and Area 10]

OSPF for Service Providers

Configuring OSPF & Adding Networks


OSPF: Configuration

Starting OSPF in Cisco’s IOS
  router ospf 100
  Where “100” is the process ID
OSPF process ID is unique to the router
  Gives possibility of running multiple instances of OSPF on one router
  Process ID is not passed between routers in an AS
  Many ISPs configure the process ID to be the same as their BGP Autonomous System Number

OSPF: Establishing Adjacencies

Cisco IOS OSPFv2 automatically tries to establish adjacencies on all defined interfaces (or subnets)
Best practice is to disable this
  Potential security risk: sending OSPF Hellos outside of the autonomous system, and risking forming adjacencies with external networks
  Example: only the POS4/0 interface will attempt to form an OSPF adjacency

router ospf 100
 passive-interface default
 no passive-interface POS4/0

OSPF: Adding Networks
Option One

Redistribution:
  Applies to all connected interfaces on the router but sends networks as external type-2s – which are not summarised

router ospf 100
 redistribute connected subnets

Do NOT do this! Because:
  External type-2 LSAs flood through the entire network
  These LSAs are not all useful for determining paths through the backbone; they simply take up valuable space

OSPF: Adding Networks
Option Two

Per link configuration – from IOS 12.4 onwards
  OSPF is configured on each interface (same as ISIS)
  Useful for multiple subnets per interface

interface POS 4/0
 ip address 192.168.1.1 255.255.255.0
 ip address 172.16.1.1 255.255.255.224 secondary
 ip ospf 100 area 0
!
router ospf 100
 passive-interface default
 no passive-interface POS 4/0

OSPF: Adding Networks
Option Three

Specific network statements
  Every active interface with a configured IP address needs an OSPF network statement
  Interfaces that will have no OSPF neighbours need passive-interface to disable OSPF Hellos
  That is: all interfaces connecting to devices outside the ISP backbone (i.e. customers, peers, etc)

router ospf 100
 network 192.168.1.0 0.0.0.3 area 51
 network 192.168.1.4 0.0.0.3 area 51
 passive-interface Serial 1/0

OSPF: Adding Networks
Option Four

Network statements – wildcard mask
  Every active interface with a configured IP address covered by the wildcard mask used in the OSPF network statement
  Interfaces covered by the wildcard mask but having no OSPF neighbours need passive-interface (or use passive-interface default and then activate the interfaces which will have OSPF neighbours)

router ospf 100
 network 192.168.1.0 0.0.0.255 area 51
 passive-interface default
 no passive-interface POS 4/0
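The difference between the specific network statements of Option Three and the wide wildcard of Option Four comes down to which interface addresses a statement covers. The Python sketch below checks that, assuming contiguous wildcard masks only (the ipaddress module raises ValueError for a non-contiguous mask). The helper names are invented for the example.

```python
import ipaddress


def wildcard_network(address, wildcard):
    """Turn an address plus wildcard mask into the network it covers."""
    wild = int(ipaddress.IPv4Address(wildcard))
    # A wildcard mask is the bitwise inverse of a netmask.
    netmask = ipaddress.IPv4Address(0xFFFFFFFF ^ wild)
    return ipaddress.ip_network(f"{address}/{netmask}", strict=False)


def matches(interface_ip, address, wildcard):
    """True if an interface address is covered by the network statement."""
    return ipaddress.ip_address(interface_ip) in wildcard_network(address, wildcard)


# Option Three style: one statement per /30 link.
print(matches("192.168.1.1", "192.168.1.0", "0.0.0.3"))    # True
print(matches("192.168.1.5", "192.168.1.0", "0.0.0.3"))    # False
# Option Four style: one statement covering the whole /24.
print(matches("192.168.1.5", "192.168.1.0", "0.0.0.255"))  # True
```

With the wider mask every interface in the /24 joins OSPF, which is why Option Four pairs it with passive-interface default.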

OSPF: Adding Networks
Recommendations

Don’t ever use Option 1
Use Option 2 if supported; otherwise:
Option 3 is fine for core/infrastructure routers
  Doesn’t scale too well when router has a large number of interfaces but only a few with OSPF neighbours
  Solution is to use Option 3 with “no passive” on interfaces with OSPF neighbours
Option 4 is preferred for aggregation routers
  Or use iBGP next-hop-self
  Or even ip unnumbered on external point-to-point links

OSPF: Adding Networks
Example One (Cisco IOS ≥ 12.4)

Aggregation router with large number of leased line customers and just two links to the core network:

interface loopback 0
 ip address 192.168.255.1 255.255.255.255
 ip ospf 100 area 0
interface POS 0/0
 ip address 192.168.10.1 255.255.255.252
 ip ospf 100 area 0
interface POS 1/0
 ip address 192.168.10.5 255.255.255.252
 ip ospf 100 area 0
interface serial 2/0:0 ...
 ip unnumbered loopback 0
! Customers connect here ^^^^^^^
router ospf 100
 passive-interface default
 no passive-interface POS 0/0
 no passive-interface POS 1/0

OSPF: Adding Networks
Example One (Cisco IOS < 12.4)

Aggregation router with large number of leased line customers and just two links to the core network:

interface loopback 0
 ip address 192.168.255.1 255.255.255.255
interface POS 0/0
 ip address 192.168.10.1 255.255.255.252
interface POS 1/0
 ip address 192.168.10.5 255.255.255.252
interface serial 2/0:0 ...
 ip unnumbered loopback 0
! Customers connect here ^^^^^^^
router ospf 100
 network 192.168.255.1 0.0.0.0 area 51
 network 192.168.10.0 0.0.0.3 area 51
 network 192.168.10.4 0.0.0.3 area 51
 passive-interface default
 no passive-interface POS 0/0
 no passive-interface POS 1/0

OSPF: Adding Networks
Example Two (Cisco IOS ≥ 12.4)

Core router with only links to other core routers:

interface loopback 0
 ip address 192.168.255.1 255.255.255.255
 ip ospf 100 area 0
interface POS 0/0
 ip address 192.168.10.129 255.255.255.252
 ip ospf 100 area 0
interface POS 1/0
 ip address 192.168.10.133 255.255.255.252
 ip ospf 100 area 0
interface POS 2/0
 ip address 192.168.10.137 255.255.255.252
 ip ospf 100 area 0
interface POS 2/1
 ip address 192.168.10.141 255.255.255.252
 ip ospf 100 area 0
router ospf 100
 passive-interface loopback 0

OSPF: Adding Networks
Example Two (Cisco IOS < 12.4)

Core router with only links to other core routers:

interface loopback 0
 ip address 192.168.255.1 255.255.255.255
interface POS 0/0
 ip address 192.168.10.129 255.255.255.252
interface POS 1/0
 ip address 192.168.10.133 255.255.255.252
interface POS 2/0
 ip address 192.168.10.137 255.255.255.252
interface POS 2/1
 ip address 192.168.10.141 255.255.255.252
router ospf 100
 network 192.168.255.1 0.0.0.0 area 0
 network 192.168.10.128 0.0.0.3 area 0
 network 192.168.10.132 0.0.0.3 area 0
 network 192.168.10.136 0.0.0.3 area 0
 network 192.168.10.140 0.0.0.3 area 0
 passive-interface loopback 0

OSPF: Adding Networks
Summary

Key theme when selecting a technique: keep the Link State Database lean
  Increases stability
  Reduces the amount of information in the Link State Advertisements (LSAs)
  Speeds convergence time

OSPF in Cisco IOS

Useful features for ISPs


Areas

An area is stored as a 32-bit field:
  Defined in IPv4 address format (e.g. Area 0.0.0.0)
  Can also be defined using single decimal value (e.g. Area 0)
0.0.0.0 reserved for the backbone area

[Figure: Areas 1, 2 and 3 attached to Area 0]
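Because the area ID is just a 32-bit number, the two notations are interchangeable. A small Python sketch of the conversion, using only the standard ipaddress module (the function names are made up for the example):

```python
import ipaddress


def area_to_dotted(area):
    """Decimal area number to dotted notation, e.g. 51 -> 0.0.0.51."""
    return str(ipaddress.IPv4Address(int(area)))


def area_to_decimal(dotted):
    """Dotted area ID to its decimal value, e.g. 0.0.1.0 -> 256."""
    return int(ipaddress.IPv4Address(dotted))


print(area_to_dotted(0))           # 0.0.0.0 (the backbone area)
print(area_to_dotted(51))          # 0.0.0.51
print(area_to_decimal("0.0.1.0"))  # 256
```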

Logging Adjacency Changes

The router will generate a log message whenever an OSPF neighbour changes state
Syntax:
  [no] [ospf] log-adjacency-changes
  (OSPF keyword is optional, depending on IOS version)
Example of a typical log message:
  %OSPF-5-ADJCHG: Process 1, Nbr 223.127.255.223 on Ethernet0 from LOADING to FULL, Loading Done

Number of State Changes

The number of state transitions is available via SNMP (ospfNbrEvents) and the CLI:
  show ip ospf neighbor [type number] [neighbor-id] [detail]
  Detail (optional): displays all neighbours in detail. When specified, neighbour state transition counters are displayed per interface or neighbour ID

State Changes (Continued)

To reset OSPF-related statistics, use the clear ip ospf counters command
  This will reset neighbour state transition counters per interface or neighbour ID
  clear ip ospf counters [neighbor [<type number>] [neighbor-id]]

Router ID

If the loopback interface exists and has an IP address, that is used as the router ID in routing protocols – stability!
If the loopback interface does not exist, or has no IP address, the router ID is the highest IP address configured – danger!
OSPF sub command to manually set the Router ID:
  router-id <ip address>

Cost & Reference Bandwidth

Bandwidth used in metric calculation
  Cost = 10^8 / bandwidth
  Not useful for interface bandwidths > 100 Mbps
Syntax:
  ospf auto-cost reference-bandwidth <reference-bw>
Default reference bandwidth still 100 Mbps for backward compatibility
Most ISPs simply choose to develop their own cost strategy and apply to each interface type
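A short Python sketch shows why the default reference bandwidth stops being useful above 100 Mbps. It assumes the cost is the integer part of reference/bandwidth with a minimum of 1, which is how Cisco IOS is generally understood to behave; the function name and parameters are invented for the example.

```python
def ospf_cost(bandwidth_bps, reference_bps=100_000_000):
    """Default-style OSPF cost: reference bandwidth divided by link bandwidth.

    Assumption: the result is truncated to an integer and never drops below 1.
    """
    return max(1, reference_bps // bandwidth_bps)


print(ospf_cost(10_000_000))      # 10  (Ethernet)
print(ospf_cost(100_000_000))     # 1   (FastEthernet)
print(ospf_cost(1_000_000_000))   # 1   (GigEthernet, indistinguishable)
print(ospf_cost(10_000_000_000))  # 1   (10GE, indistinguishable)

# Raising the reference bandwidth to 100 Gbps separates the faster links again.
print(ospf_cost(1_000_000_000, reference_bps=100_000_000_000))  # 100
```

Every link at or above 100 Mbps collapses to cost 1 with the default, which is the motivation for either raising the reference bandwidth or assigning costs by hand.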

Cost: Example Strategy

Interface      Bandwidth  Cost
100GE          100Gbps    1
40GE/OC768     40Gbps     2
10GE/OC192     10Gbps     5
OC48           2.5Gbps    10
GigEthernet    1Gbps      20
OC12           622Mbps    50
OC3            155Mbps    100
FastEthernet   100Mbps    200
Ethernet       10Mbps     500
E1             2Mbps      1000

Default routes

Originating a default route into OSPF
  default-information originate metric <n>
  Will originate a default route into OSPF if there is a matching default route in the Routing Table (RIB)
  The optional always keyword will always originate a default route, even if there is no existing entry in the RIB

Clear/Restart

OSPF clear commands
  If no process ID is given, all OSPF processes on the router are assumed
clear ip ospf [pid] redistribution
  This command clears redistribution based on OSPF routing process ID
clear ip ospf [pid] counters
  This command clears counters based on OSPF routing process ID
clear ip ospf [pid] process
  This command will restart the specified OSPF process. It attempts to keep the old router-id, except in cases where a new router-id was configured or an old user-configured router-id was removed. Since this command can potentially cause network churn, a user confirmation is required before performing any action

Use OSPF Authentication

Use authentication
  Too many operators overlook this basic requirement
When using authentication, use the MD5 feature
  Under the global OSPF configuration, specify:
    area <area-id> authentication message-digest
  Under the interface configuration, specify:
    ip ospf message-digest-key 1 md5 <key>
Authentication can be selectively disabled per interface with:
  ip ospf authentication null

Point to Point Ethernet Links

For any broadcast media (like Ethernet), OSPF will attempt to elect a designated and backup designated router when it forms an adjacency
  If the interface is running as a point-to-point WAN link, with only 2 routers on the wire, configuring OSPF to operate in "point-to-point mode" scales the protocol by reducing the link failure detection times
  Point-to-point mode improves convergence times on Ethernet networks because it:
    Prevents the election of a DR/BDR on the link
    Simplifies the SPF computations and reduces the router's memory footprint due to a smaller topology database

interface fastethernet0/2
 ip ospf network point-to-point

Tuning OSPF (1)

DR/BDR Selection
  ip ospf priority 100 (default 1)
  This feature should be in use in your OSPF network
  Forcibly set your DR and BDR per segment so that they are known
  Choose your most powerful, or most idle routers, so that OSPF converges as fast as possible under maximum network load conditions
  Try to keep the DR/BDR limited to one segment each

Tuning OSPF (2)

OSPF startup
  max-metric router-lsa on-startup wait-for-bgp
  Avoids blackholing traffic on router restart
  Causes OSPF to announce its prefixes with highest possible metric until iBGP is up and running
  When iBGP is running, OSPF metrics return to normal, making the path valid
ISIS equivalent:
  set-overload-bit on-startup wait-for-bgp

Tuning OSPF (3)

Hello/Dead Timers
  ip ospf hello-interval 3 (default 10)
  ip ospf dead-interval 15 (default is 4x hello)
  This allows for faster network awareness of a failure, and can result in faster reconvergence, but requires more router CPU and generates more overhead
LSA Pacing
  timers lsa-group-pacing 300 (default 240)
  Allows grouping and pacing of LSA updates at configured interval
  Reduces overall network and router impact

Tuning OSPF (4)

OSPF Internal Timers
  timers spf 2 8 (default is 5 and 10)
  Allows you to adjust SPF characteristics
  The first number sets wait time from topology change to SPF run
  The second is hold-down between SPF runs
  BE CAREFUL WITH THIS COMMAND; if you’re not sure when to use it, it means you don’t need it; default is sufficient 95% of the time

Tuning OSPF (5)

LSA filtering/interface blocking
  Per interface:
    ip ospf database-filter all out (no options)
  Per neighbor:
    neighbor 1.1.1.1 database-filter all out (no options)
  An OSPF router will flood an LSA out all interfaces except the receiving one; LSA filtering can be useful in cases where such flooding is unnecessary (e.g. NBMA networks), where the DR/BDR can handle flooding chores
  area <area-id> filter-list <acl>
    Filters out specific Type 3 LSAs at ABRs
Improper use can result in routing loops and black-holes that can be very difficult to troubleshoot

Summary

OSPF has a bewildering number of features and options
Observe ISP best practices
Keep design and configuration simple
Investigate tuning options and suitability for your own network
  Don’t just turn them on!

Deploying OSPF for ISPs

ISP Training Workshops

Introduction to BGP

ISP Training Workshops

Border Gateway Protocol

A routing protocol used to exchange routing information between different networks
  Exterior gateway protocol
Described in RFC4271
  RFC4276 gives an implementation report on BGP
  RFC4277 describes operational experiences using BGP
The Autonomous System is the cornerstone of BGP
  It is used to uniquely identify networks with a common routing policy

BGP

Path Vector Protocol
Incremental Updates
Many options for policy enforcement
Classless Inter Domain Routing (CIDR)
Widely used for Internet backbone
Autonomous systems

Path Vector Protocol

BGP is classified as a path vector routing protocol (see RFC 1322)
  A path vector protocol defines a route as a pairing between a destination and the attributes of the path to that destination.

12.6.126.0/24 207.126.96.43 1021 0 6461 7018 6337 11268 i

AS Path: 6461 7018 6337 11268
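The pairing of a destination with its path attributes can be made concrete by splitting the example line above into its parts. The Python sketch below is illustrative only: it assumes the line has exactly the layout shown (prefix, next hop, two numeric columns, the AS path, then a one-letter origin code), and the function and key names are invented for the example.

```python
def parse_route(line):
    """Split one table line into a destination and its path attributes."""
    fields = line.split()
    # Assumption: fields 2 and 3 are numeric attributes (1021 and 0 here),
    # and everything between them and the trailing origin code is the AS path.
    as_path = [int(asn) for asn in fields[4:-1]]
    return {
        "prefix": fields[0],
        "next_hop": fields[1],
        "as_path": as_path,
        "origin_as": as_path[-1],  # the last AS in the path originated the prefix
        "origin_code": fields[-1],
    }


route = parse_route("12.6.126.0/24 207.126.96.43 1021 0 6461 7018 6337 11268 i")
print(route["as_path"])    # [6461, 7018, 6337, 11268]
print(route["origin_as"])  # 11268
```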

Path Vector Protocol

[Figure: AS path through AS6461, AS7018, AS6337 and AS11268; AS500 and AS600 are also shown]

Definitions

Transit – carrying traffic across a network, usually for a fee
Peering – exchanging routing information and traffic
Default – where to send traffic when there is no explicit match in the routing table

Default Free Zone

The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route
NB: is not related to where an ISP is in the hierarchy

Peering and Transit example

A and B can peer, but need transit arrangements with D to get packets to/from C

[Figure: providers A, B and C, backbone provider D, and the exchange points IXP-West and IXP-East]

AS 100

Autonomous System (AS)

Collection of networks with same routing policy Single routing protocol Usually under single ownership, trust and

administrative control Identified by a unique 32-bit integer (ASN)

Autonomous System Number (ASN)
Two ranges
  0-65535 (original 16-bit range)
  65536-4294967295 (32-bit range – RFC4893)
Usage:
  0 and 65535 (reserved)
  1-64495 (public Internet)
  64496-64511 (documentation – RFC5398)
  64512-65534 (private use only)
  23456 (represent 32-bit range in 16-bit world)
  65536-65551 (documentation – RFC5398)
  65552-4294967295 (public Internet)
32-bit range representation specified in RFC5396
  Defines “asplain” (traditional format) as standard notation
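The usage table above can be read as a simple lookup. The following is a minimal sketch that follows the ranges exactly as listed on the slide; the function name `classify_asn` and the label strings are illustrative assumptions, and allocations made after this material was written are not reflected.

```python
def classify_asn(asn: int) -> str:
    """Classify an AS number using the ranges listed on the slide."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("ASN must fit in 32 bits")
    if asn in (0, 65535):
        return "reserved"
    if asn == 23456:
        # Stands in for a 32-bit ASN when talking to 16-bit-only speakers
        return "32-bit placeholder"
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534:
        return "private"
    return "public"


for value in (100, 23456, 64500, 64512, 131076):
    print(value, classify_asn(value))
```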

Autonomous System Number (ASN)
ASNs are distributed by the Regional Internet Registries
  They are also available from upstream ISPs who are members of one of the RIRs
Current 16-bit ASN allocations up to 61439 have been made to the RIRs
  Around 42000 are visible on the Internet
Each RIR has also received a block of 32-bit ASNs
  Out of 3100 assignments, around 2800 are visible on the Internet
See www.iana.org/assignments/as-numbers

Configuring BGP in Cisco IOS
This command enables BGP in Cisco IOS:
  router bgp 100
For ASNs > 65535, the AS number can be entered in either plain or dot notation:
  router bgp 131076
or
  router bgp 2.4
IOS will display ASNs in plain notation by default
  Dot notation is optional:
    router bgp 2.4
     bgp asnotation dot
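The relationship between the two notations (131076 and 2.4) is plain arithmetic: the dotted form is the high and low 16 bits of the 32-bit number. A small sketch, with hypothetical helper names, assuming the convention that ASNs below 65536 are written as plain integers:

```python
def asplain_to_asdot(asn: int) -> str:
    """Render an ASN in dot notation (high 16 bits '.' low 16 bits)."""
    if asn < 65536:
        return str(asn)          # 16-bit ASNs stay as plain integers
    high, low = divmod(asn, 65536)
    return f"{high}.{low}"


def asdot_to_asplain(text: str) -> int:
    """Convert 'high.low' (or a plain integer string) back to asplain."""
    if "." not in text:
        return int(text)
    high, low = text.split(".")
    return int(high) * 65536 + int(low)


print(asplain_to_asdot(131076))   # 2.4
print(asdot_to_asplain("2.4"))    # 131076
```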

BGP Basics
[Figure: routers A, B, C, D and E in AS 100, AS 101 and AS 102, with peering between the ASes]
Runs over TCP – port 179
Path vector protocol
Incremental updates
“Internal” & “External” BGP

Demarcation Zone (DMZ)
[Figure: routers A, B, C, D and E in AS 100, AS 101 and AS 102, with the DMZ Network between them]
DMZ is the link or network shared between ASes

BGP General Operation
Learns multiple paths via internal and external BGP speakers
Picks the best path and installs it in the routing table (RIB)
Best path is sent to external BGP neighbours
Policies are applied by influencing the best path selection

Constructing the Forwarding Table
BGP “in” process
  receives path information from peers
  results of BGP path selection placed in the BGP table
  “best path” flagged
BGP “out” process
  announces “best path” information to peers
Best path stored in Routing Table (RIB)
Best paths in the RIB are installed in forwarding table (FIB) if:
  prefix and prefix length are unique
  lowest “protocol distance”
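The final step, choosing which RIB entry reaches the FIB, can be illustrated as picking the candidate with the lowest protocol distance for each distinct prefix. This is a simplified sketch, not how any router implements it; the `Route` type, `build_fib` name and the sample distances (200 for iBGP, 250 for a floating static) are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Route(NamedTuple):
    prefix: str      # prefix and length, e.g. "102.10.0.0/16"
    protocol: str
    distance: int    # protocol distance; lower is preferred
    next_hop: str


def build_fib(rib: Iterable[Route]) -> Dict[str, Route]:
    """Keep, per prefix, the candidate with the lowest protocol distance."""
    fib: Dict[str, Route] = {}
    for route in rib:
        current = fib.get(route.prefix)
        if current is None or route.distance < current.distance:
            fib[route.prefix] = route
    return fib


rib = [
    Route("102.10.0.0/16", "static", 250, "null0"),
    Route("102.10.0.0/16", "ibgp", 200, "105.3.7.2"),
    Route("102.10.32.0/23", "static", 1, "serial0"),
]
for prefix, route in build_fib(rib).items():
    print(prefix, "via", route.next_hop, f"({route.protocol})")
```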

Constructing the Forwarding Table
[Figure: flow between bgp peer, BGP in process (accepted/discarded), BGP table, BGP out process, routing table and forwarding table; arrows labelled in, out, everything and best paths]

eBGP & iBGP
BGP used internally (iBGP) and externally (eBGP)
iBGP used to carry
  Some/all Internet prefixes across ISP backbone
  ISP’s customer prefixes
eBGP used to
  Exchange prefixes with other ASes
  Implement routing policy

BGP/IGP model used in ISP networks
Model representation
[Figure: AS1, AS2, AS3 and AS4, each running IGP and iBGP internally, with eBGP between adjacent ASes]

External BGP Peering (eBGP)
[Figure: AS 100 and AS 101 with routers A, B and C]
Between BGP speakers in different AS
Should be directly connected
Never run an IGP between eBGP peers

Configuring External BGP
Router A in AS100
  interface ethernet 5/0
   ip address 102.102.10.2 255.255.255.240
  !
  router bgp 100
   network 100.100.8.0 mask 255.255.252.0
   neighbor 102.102.10.1 remote-as 101
   neighbor 102.102.10.1 prefix-list RouterC in
   neighbor 102.102.10.1 prefix-list RouterC out
  !
Callouts:
  102.102.10.2: ip address on ethernet interface
  102.102.10.1: ip address of Router C ethernet interface
  router bgp 100: Local ASN
  remote-as 101: Remote ASN
  prefix-list RouterC in/out: Inbound and outbound filters

Configuring External BGP
Router C in AS101
  interface ethernet 1/0/0
   ip address 102.102.10.1 255.255.255.240
  !
  router bgp 101
   network 100.100.64.0 mask 255.255.248.0
   neighbor 102.102.10.2 remote-as 100
   neighbor 102.102.10.2 prefix-list RouterA in
   neighbor 102.102.10.2 prefix-list RouterA out
  !
Callouts:
  102.102.10.1: ip address on ethernet interface
  102.102.10.2: ip address of Router A ethernet interface
  router bgp 101: Local ASN
  remote-as 100: Remote ASN
  prefix-list RouterA in/out: Inbound and outbound filters

Internal BGP (iBGP)
BGP peer within the same AS
Not required to be directly connected
  IGP takes care of inter-BGP speaker connectivity
iBGP speakers must be fully meshed:
  They originate connected networks
  They pass on prefixes learned from outside the ASN
  They do not pass on prefixes learned from other iBGP speakers

Internal BGP Peering (iBGP)
[Figure: AS 100 containing routers A, B, C and D]
Topology independent
Each iBGP speaker must peer with every other iBGP speaker in the AS
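A full mesh means one session per pair of routers, so the session count grows as n(n-1)/2. A short sketch that enumerates the pairs for the four routers in the figure; the helper name `ibgp_sessions` is hypothetical:

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def ibgp_sessions(routers: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of routers needs one iBGP session."""
    return list(combinations(routers, 2))


sessions = ibgp_sessions(["A", "B", "C", "D"])
print(len(sessions), "sessions:", sessions)
```

With 4 routers this gives 6 sessions; with 20 routers it would already be 190, which is why the full-mesh requirement matters for larger backbones.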

Peering between Loopback Interfaces
[Figure: AS 100 containing routers A, B and C]
Peer with loop-back interface
  Loop-back interface does not go down – ever!
Do not want iBGP session to depend on state of a single interface or the physical topology

Configuring Internal BGP
Router A in AS100
  interface loopback 0
   ip address 105.3.7.1 255.255.255.255
  !
  router bgp 100
   network 100.100.1.0
   neighbor 105.3.7.2 remote-as 100
   neighbor 105.3.7.2 update-source loopback0
   neighbor 105.3.7.3 remote-as 100
   neighbor 105.3.7.3 update-source loopback0
  !
Callouts:
  105.3.7.1: ip address on loopback interface
  105.3.7.2: ip address of Router B loopback interface
  router bgp 100 and remote-as 100: Local ASN

Configuring Internal BGP
Router B in AS100
  interface loopback 0
   ip address 105.3.7.2 255.255.255.255
  !
  router bgp 100
   network 100.100.1.0
   neighbor 105.3.7.1 remote-as 100
   neighbor 105.3.7.1 update-source loopback0
   neighbor 105.3.7.3 remote-as 100
   neighbor 105.3.7.3 update-source loopback0
  !
Callouts:
  105.3.7.2: ip address on loopback interface
  105.3.7.1: ip address of Router A loopback interface
  router bgp 100 and remote-as 100: Local ASN

Inserting prefixes into BGP
Two ways to insert prefixes into BGP
  redistribute static
  network command

Inserting prefixes into BGP – redistribute static
Configuration Example:
  router bgp 100
   redistribute static
  ip route 102.10.32.0 255.255.254.0 serial0
Static route must exist before redistribute command will work
Forces origin to be “incomplete”
Care required!

Inserting prefixes into BGP – redistribute static
Care required with redistribute!
  redistribute <routing-protocol> means everything in the <routing-protocol> will be transferred into the current routing protocol
  Will not scale if uncontrolled
  Best avoided if at all possible
  redistribute normally used with “route-maps” and under tight administrative control

Inserting prefixes into BGP – network command
Configuration Example
  router bgp 100
   network 102.10.32.0 mask 255.255.254.0
  ip route 102.10.32.0 255.255.254.0 serial0
A matching route must exist in the routing table before the network is announced
Forces origin to be “IGP”

Configuring Aggregation
Three ways to configure route aggregation
  redistribute static
  aggregate-address
  network command

Configuring Aggregation
Configuration Example:
  router bgp 100
   redistribute static
  ip route 102.10.0.0 255.255.0.0 null0 250
static route to “null0” is called a pull up route
  packets only sent here if there is no more specific match in the routing table
  distance of 250 ensures this is last resort static
  care required – see previously!

Configuring Aggregation – Network Command
Configuration Example
  router bgp 100
   network 102.10.0.0 mask 255.255.0.0
  ip route 102.10.0.0 255.255.0.0 null0 250
A matching route must exist in the routing table before the network is announced
Easiest and best way of generating an aggregate

Configuring Aggregation – aggregate-address command
Configuration Example:
  router bgp 100
   network 102.10.32.0 mask 255.255.252.0
   aggregate-address 102.10.0.0 255.255.0.0 [summary-only]
Requires more specific prefix in BGP table before aggregate is announced
summary-only keyword
  Optional keyword which ensures that only the summary is announced if a more specific prefix exists in the routing table
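The precondition for aggregate-address (a more specific prefix must already be in the BGP table) can be checked with Python's standard `ipaddress` module. This is an illustrative sketch only; `should_announce_aggregate` is a hypothetical name and the BGP table is modelled as a plain list of prefix strings.

```python
import ipaddress
from typing import Iterable


def should_announce_aggregate(aggregate: str, bgp_table: Iterable[str]) -> bool:
    """True if at least one more specific prefix of the aggregate is present."""
    agg = ipaddress.ip_network(aggregate)
    for prefix in bgp_table:
        net = ipaddress.ip_network(prefix)
        # A contributing route lies inside the aggregate but is not the aggregate itself
        if net != agg and net.subnet_of(agg):
            return True
    return False


# 102.10.32.0 mask 255.255.252.0 falls inside 102.10.0.0 255.255.0.0
print(should_announce_aggregate("102.10.0.0/255.255.0.0",
                                ["102.10.32.0/255.255.252.0"]))
```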

Summary
BGP neighbour status
  Router6>sh ip bgp sum
  BGP router identifier 10.0.15.246, local AS number 10
  BGP table version is 16, main routing table version 16
  7 network entries using 819 bytes of memory
  14 path entries using 728 bytes of memory
  2/1 BGP path/bestpath attribute entries using 248 bytes of memory
  0 BGP route-map cache entries using 0 bytes of memory
  0 BGP filter-list cache entries using 0 bytes of memory
  BGP using 1795 total bytes of memory
  BGP activity 7/0 prefixes, 14/0 paths, scan interval 60 secs

  Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
  10.0.15.241     4    10       9       8       16    0    0 00:04:47        2
  10.0.15.242     4    10       6       5       16    0    0 00:01:43        2
  10.0.15.243     4    10       9       8       16    0    0 00:04:49        2
  ...
Callouts:
  V: BGP Version
  MsgRcvd and MsgSent: Updates sent and received
  InQ and OutQ: Updates waiting

Summary
BGP Table
  Router6>sh ip bgp
  BGP table version is 16, local router ID is 10.0.15.246
  Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
                r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
                x best-external, a additional-path, c RIB-compressed,
  Origin codes: i - IGP, e - EGP, ? - incomplete
  RPKI validation codes: V valid, I invalid, N Not found

      Network          Next Hop         Metric LocPrf Weight Path
  *>i 10.0.0.0/26      10.0.15.241           0    100      0 i
  *>i 10.0.0.64/26     10.0.15.242           0    100      0 i
  *>i 10.0.0.128/26    10.0.15.243           0    100      0 i
  *>i 10.0.0.192/26    10.0.15.244           0    100      0 i
  *>i 10.0.1.0/26      10.0.15.245           0    100      0 i
  *>  10.0.1.64/26     0.0.0.0               0         32768 i
  *>i 10.0.1.128/26    10.0.15.247           0    100      0 i
  *>i 10.0.1.192/26    10.0.15.248           0    100      0 i
  *>i 10.0.2.0/26      10.0.15.249           0    100      0 i
  *>i 10.0.2.64/26     10.0.15.250           0    100      0 i
  ...

Summary
BGP4 – path vector protocol
iBGP versus eBGP
stable iBGP – peer with loopbacks
announcing prefixes & aggregates

Introduction to BGP
ISP Training Workshops

BGP Policy Control
ISP Training Workshops

Applying Policy with BGP
Policy-based on AS path, community or the prefix
Rejecting/accepting selected routes
Set attributes to influence path selection
Tools:
  Prefix-list (filters prefixes)
  Filter-list (filters ASes)
  Route-maps and communities

Policy Control – Prefix List
Per neighbour prefix filter
  incremental configuration
Inbound or Outbound
Based upon network numbers (using familiar IPv4 address/mask format)
Using access-lists in Cisco IOS for filtering prefixes was deprecated long ago
  Strongly discouraged!

Prefix-list Command Syntax
Syntax:
  [no] ip prefix-list list-name [seq seq-value] permit|deny network/len [ge ge-value] [le le-value]
  network/len: The prefix and its length
  ge ge-value: “greater than or equal to”
  le le-value: “less than or equal to”
Both “ge” and “le” are optional
  Used to specify the range of the prefix length to be matched for prefixes that are more specific than network/len
Sequence number is also optional
  no ip prefix-list sequence-number to disable display of sequence numbers

Prefix Lists – Examples
Deny default route
  ip prefix-list EG deny 0.0.0.0/0
Permit the prefix 35.0.0.0/8
  ip prefix-list EG permit 35.0.0.0/8
Deny the prefix 172.16.0.0/12
  ip prefix-list EG deny 172.16.0.0/12
In 192/8 allow up to /24
  ip prefix-list EG permit 192.0.0.0/8 le 24
  This allows all prefix sizes in the 192.0.0.0/8 address block, apart from /25, /26, /27, /28, /29, /30, /31 and /32.

Prefix Lists – Examples
In 192/8 deny /25 and above
  ip prefix-list EG deny 192.0.0.0/8 ge 25
  This denies all prefix sizes /25, /26, /27, /28, /29, /30, /31 and /32 in the address block 192.0.0.0/8.
  It has the same effect as the previous example
In 193/8 permit prefixes between /12 and /20
  ip prefix-list EG permit 193.0.0.0/8 ge 12 le 20
  This denies all prefix sizes /8, /9, /10, /11, /21, /22, … and higher in the address block 193.0.0.0/8.
Permit all prefixes
  ip prefix-list EG permit 0.0.0.0/0 le 32
  0.0.0.0 matches all possible addresses, “0 le 32” matches all possible prefix lengths
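The ge/le semantics in these examples can be modelled in a few lines of Python. This is a minimal sketch, assuming IPv4 only, first-match-wins evaluation and an implicit deny when nothing matches; `Entry`, `entry_matches` and `evaluate` are hypothetical names, not part of any router software.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    action: str                 # "permit" or "deny"
    network: str                # network/len, e.g. "192.0.0.0/8"
    ge: Optional[int] = None
    le: Optional[int] = None


def entry_matches(entry: Entry, prefix: str) -> bool:
    base = ipaddress.ip_network(entry.network)
    cand = ipaddress.ip_network(prefix)
    if not cand.subnet_of(base):
        return False
    if entry.ge is None and entry.le is None:
        return cand.prefixlen == base.prefixlen      # exact match only
    low = entry.ge if entry.ge is not None else base.prefixlen
    high = entry.le if entry.le is not None else 32
    return low <= cand.prefixlen <= high


def evaluate(entries: Sequence[Entry], prefix: str) -> str:
    for entry in entries:
        if entry_matches(entry, prefix):
            return entry.action
    return "deny"                                    # implicit deny at the end


eg = [Entry("permit", "193.0.0.0/8", ge=12, le=20)]
print(evaluate(eg, "193.16.0.0/12"))   # inside the /12 to /20 window
print(evaluate(eg, "193.0.0.0/8"))     # too short
```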

Policy Control – Prefix List
Example Configuration
  router bgp 100
   network 105.7.0.0 mask 255.255.0.0
   neighbor 102.10.1.1 remote-as 110
   neighbor 102.10.1.1 prefix-list AS110-IN in
   neighbor 102.10.1.1 prefix-list AS110-OUT out
  !
  ip prefix-list AS110-IN deny 218.10.0.0/16
  ip prefix-list AS110-IN permit 0.0.0.0/0 le 32
  ip prefix-list AS110-OUT permit 105.7.0.0/16
  ip prefix-list AS110-OUT deny 0.0.0.0/0 le 32

Policy Control – Filter List
Filter routes based on AS path
  Inbound or Outbound
Example Configuration:
  router bgp 100
   network 105.7.0.0 mask 255.255.0.0
   neighbor 102.10.1.1 filter-list 5 out
   neighbor 102.10.1.1 filter-list 6 in
  !
  ip as-path access-list 5 permit ^200$
  ip as-path access-list 6 permit ^150$

Policy Control – Regular Expressions
Like Unix regular expressions
  .    Match one character
  *    Match any number of preceding expression
  +    Match at least one of preceding expression
  ^    Beginning of line
  $    End of line
  \    Escape a regular expression character
  _    Beginning, end, white-space, brace
  |    Or
  ()   brackets to contain expression
  []   brackets to contain number ranges

Policy Control – Regular Expressions
Simple Examples
  .*            match anything
  .+            match at least one character
  ^$            match routes local to this AS
  _1800$        originated by AS1800
  ^1800_        received from AS1800
  _1800_        via AS1800
  _790_1800_    via AS1800 and AS790
  _(1800_)+     multiple AS1800 in sequence (used to match AS-PATH prepends)
  _\(65530\)_   via AS65530 (confederations)

Policy Control – Regular Expressions
Not so simple Examples
  ^[0-9]+$                 Match AS_PATH length of one
  ^[0-9]+_[0-9]+$          Match AS_PATH length of two
  ^[0-9]*_[0-9]+$          Match AS_PATH length of one or two
  ^[0-9]*_[0-9]*$          Match AS_PATH length of one or two (will also match zero)
  ^[0-9]+_[0-9]+_[0-9]+$   Match AS_PATH length of three
  _(701|1800)_             Match anything which has gone through AS701 or AS1800
  _1849(_.+_)12163$        Match anything of origin AS12163 and passed through AS1849
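These patterns can be tried offline by translating the IOS-specific `_` token into an equivalent Python alternation. The sketch below is an approximation for experimentation only: it assumes the AS path is a space-separated string, treats `_` as start, end, space, comma, brace or parenthesis, and does not handle an underscore inside a character class. The helper names are hypothetical.

```python
import re

# Stand-in for the IOS "_" token: start, end, or a delimiter character
_DELIM = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern: str) -> str:
    """Rewrite an IOS-style AS-path regex so Python's re module can run it."""
    return pattern.replace("_", _DELIM)


def as_path_matches(pattern: str, as_path: str) -> bool:
    return re.search(cisco_to_python(pattern), as_path) is not None


print(as_path_matches("_1800$", "701 1800"))                   # originated by AS1800
print(as_path_matches("^[0-9]+$", "701 1800"))                 # path length is two, not one
print(as_path_matches("_1849(_.+_)12163$", "1849 3356 12163"))
```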

Policy Control – Route Maps
A route-map is like a “programme” for IOS
Has “line” numbers, like programmes
Each line is a separate condition/action
Concept is basically:
  if match then do expression and exit
  else
  if match then do expression and exit
  else etc
Route-map “continue” lets ISPs apply multiple conditions and actions in one route-map

Route Maps – Caveats
Lines can have multiple set statements
Lines can have multiple match statements
Line with only a match statement
  Only prefixes matching go through, the rest are dropped
Line with only a set statement
  All prefixes are matched and set
  Any following lines are ignored
Line with a match/set statement and no following lines
  Only prefixes matching are set, the rest are dropped

Route Maps – Caveats
Example
  Omitting the third line below means that prefixes not matching list-one or list-two are dropped
  route-map sample permit 10
   match ip address prefix-list list-one
   set local-preference 120
  !
  route-map sample permit 20
   match ip address prefix-list list-two
   set local-preference 80
  !
  route-map sample permit 30
  ! Don’t forget this
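The caveat about the empty third line can be seen in a toy model of route-map evaluation. This is a sketch under simplifying assumptions: every clause is a permit, a prefix-list is modelled as a Python set of prefix strings, a clause with no match statement matches everything, and a prefix that matches no clause is dropped. The names `apply_route_map`, `list_one` and `list_two` are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# (sequence number, prefixes to match or None for "match all", attributes to set)
Clause = Tuple[int, Optional[Set[str]], Dict[str, int]]


def apply_route_map(clauses: List[Clause], prefix: str,
                    attrs: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Return the updated attributes, or None if the prefix is dropped."""
    for _seq, match, sets in sorted(clauses, key=lambda c: c[0]):
        if match is None or prefix in match:
            return {**attrs, **sets}     # first matching line wins, then exit
    return None                          # nothing matched: dropped


list_one = {"10.1.0.0/16"}
list_two = {"10.2.0.0/16"}
sample = [(10, list_one, {"local_pref": 120}),
          (20, list_two, {"local_pref": 80}),
          (30, None, {})]                # the "don't forget this" line

print(apply_route_map(sample, "10.3.0.0/16", {"local_pref": 100}))
print(apply_route_map(sample[:2], "10.3.0.0/16", {"local_pref": 100}))
```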

Route Maps – Matching prefixes
Example Configuration
  router bgp 100
   neighbor 1.1.1.1 route-map infilter in
  !
  route-map infilter permit 10
   match ip address prefix-list HIGH-PREF
   set local-preference 120
  !
  route-map infilter permit 20
   match ip address prefix-list LOW-PREF
   set local-preference 80
  !
  ip prefix-list HIGH-PREF permit 10.0.0.0/8
  ip prefix-list LOW-PREF permit 20.0.0.0/8

Route Maps – AS-PATH filtering
Example Configuration
  router bgp 100
   neighbor 102.10.1.2 remote-as 200
   neighbor 102.10.1.2 route-map filter-on-as-path in
  !
  route-map filter-on-as-path permit 10
   match as-path 1
   set local-preference 80
  !
  route-map filter-on-as-path permit 20
   match as-path 2
   set local-preference 200
  !
  ip as-path access-list 1 permit _150$
  ip as-path access-list 2 permit _210_

Route Maps – AS-PATH prepends
Example configuration of AS-PATH prepend
  router bgp 300
   network 105.7.0.0 mask 255.255.0.0
   neighbor 2.2.2.2 remote-as 100
   neighbor 2.2.2.2 route-map SETPATH out
  !
  route-map SETPATH permit 10
   set as-path prepend 300 300
Use your own AS number when prepending
  Otherwise BGP loop detection may cause disconnects

Route Maps – Matching Communities
Example Configuration
  router bgp 100
   neighbor 102.10.1.2 remote-as 200
   neighbor 102.10.1.2 route-map filter-on-community in
  !
  route-map filter-on-community permit 10
   match community 1
   set local-preference 50
  !
  route-map filter-on-community permit 20
   match community 2 exact-match
   set local-preference 200
  !
  ip community-list 1 permit 150:3 200:5
  ip community-list 2 permit 88:6

Community-List Processing
Note:
  When multiple values are configured in the same community list statement, a logical AND condition is created. All community values must match to satisfy an AND condition
    ip community-list 1 permit 150:3 200:5
  When multiple values are configured in separate community list statements, a logical OR condition is created. The first list that matches a condition is processed
    ip community-list 1 permit 150:3
    ip community-list 1 permit 200:5
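The AND/OR rule can be expressed with set operations: values within one statement must all be present (AND), while separate statements are alternatives (OR). A minimal sketch, assuming every statement is a permit and ignoring exact-match; `community_list_permits` is a hypothetical helper.

```python
from typing import Iterable, Sequence


def community_list_permits(statements: Sequence[Sequence[str]],
                           route_communities: Iterable[str]) -> bool:
    """Each inner sequence is one statement; any fully satisfied statement permits."""
    route = set(route_communities)
    return any(set(values) <= route for values in statements)


same_statement = [["150:3", "200:5"]]            # AND
separate_statements = [["150:3"], ["200:5"]]     # OR

print(community_list_permits(same_statement, ["150:3"]))        # False
print(community_list_permits(separate_statements, ["150:3"]))   # True
```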

Route Maps – Setting Communities
Example Configuration
  router bgp 100
   network 105.7.0.0 mask 255.255.0.0
   neighbor 102.10.1.1 remote-as 200
   neighbor 102.10.1.1 send-community
   neighbor 102.10.1.1 route-map set-community out
  !
  route-map set-community permit 10
   match ip address prefix-list NO-ANNOUNCE
   set community no-export
  !
  route-map set-community permit 20
   match ip address prefix-list AGGREGATE
  !
  ip prefix-list NO-ANNOUNCE permit 105.7.0.0/16 ge 17
  ip prefix-list AGGREGATE permit 105.7.0.0/16

Route Map Continue
Handling multiple conditions and actions in one route-map (for BGP neighbour relationships only)
  route-map peer-filter permit 10
   match ip address prefix-list group-one
   continue 30
   set metric 2000
  !
  route-map peer-filter permit 20
   match ip address prefix-list group-two
   set community no-export
  !
  route-map peer-filter permit 30
   match ip address prefix-list group-three
   set as-path prepend 100 100
  !

Order of processing BGP policy
For policies applied to a specific BGP neighbour, the following sequence is applied:
  For inbound updates, the order is:
    Route-map
    Filter-list
    Prefix-list
  For outbound updates, the order is:
    Prefix-list
    Filter-list
    Route-map

Managing Policy Changes
New policies only apply to the updates going through the router AFTER the policy has been introduced or changed
To apply policy changes to the entire BGP table the router handles, the BGP peerings need to be “refreshed”
  This is done by clearing the BGP session either in or out, for example:
    clear ip bgp <neighbour-addr> in|out
Do NOT forget in or out: omitting it results in a hard reset of the BGP session

Managing Policy Changes
Ability to clear the BGP sessions of groups of neighbours configured according to several criteria
  clear ip bgp <addr> [in|out]
  <addr> may be any of the following
    x.x.x.x             IP address of a peer
    *                   all peers
    ASN                 all peers in an AS
    external            all external peers
    peer-group <name>   all peers in a peer-group

BGP Policy Control
ISP Training Workshops

Internet Exchange Point Design
ISP Training Workshops

IXP Design
Background
Why set up an IXP?
Layer 2 Exchange Point
Layer 3 “Exchange Point”
Design Considerations
Route Collectors & Servers
What can go wrong?

A bit of history
In a time long gone…

A Bit of History…
End of NSFnet – one major backbone
Move towards commercial Internet
  Private companies selling their bandwidth
Need for coordination of routing exchange between providers
  Traffic from ISP A needs to get to ISP B
Routing Arbiter project created to facilitate this

What is an Exchange Point
Network Access Points (NAPs) established at end of NSFnet
  The original “exchange points”
Major providers connect their networks and exchange traffic
High-speed network or ethernet switch
Simple concept – any place where providers come together to exchange traffic

Internet Exchange Points
Layer 2 exchange point
  Ethernet (100Gbps/10Gbps/1Gbps/100Mbps)
  Older technologies include ATM, Frame Relay, SRP, FDDI and SMDS
Layer 3 exchange point
  Router based
  Has historical status now

Why an Internet Exchange Point?
Saving money, improving QoS, generating a local Internet economy

Internet Exchange Point
Why peer?
Consider a region with one ISP
  They provide internet connectivity to their customers
  They have one or two international connections
Internet grows, another ISP sets up in competition
  They provide internet connectivity to their customers
  They have one or two international connections
How does traffic from customer of one ISP get to customer of the other ISP?
  Via the international connections

Internet Exchange Point
Why peer?
Yes, International Connections…
  If satellite, RTT is around 550ms per hop
  So local traffic takes over 1s round trip
International bandwidth
  Costs significantly more than domestic bandwidth
  Congested with local traffic
  Wastes money, harms performance

Internet Exchange Point
Why peer?
Solution:
  Two competing ISPs peer with each other
Result:
  Both save money
  Local traffic stays local
  Better network performance, better QoS,…
  More international bandwidth for expensive international traffic
  Everyone is happy

Internet Exchange Point
Why peer?
A third ISP enters the equation
  Becomes a significant player in the region
  Local and international traffic goes over their international connections
They agree to peer with the two other ISPs
  To save money
  To keep local traffic local
  To improve network performance, QoS,…

Internet Exchange Point
Why peer?
Private peering means that the three ISPs have to buy circuits between each other
  Works for three ISPs, but adding a fourth or a fifth means this does not scale
Solution:
  Internet Exchange Point

Internet Exchange Point
Every participant has to buy just one whole circuit
  From their premises to the IXP
Rather than N-1 half circuits to connect to the N-1 other ISPs
  5 ISPs have to buy 4 half circuits = 2 whole circuits, already twice the cost of the IXP connection
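The circuit arithmetic on this slide generalises to any number of ISPs. A small sketch, assuming each private peering link is paid for half-and-half by its two ends and that all circuits cost the same; the function names are hypothetical.

```python
def private_peering_cost_per_isp(n: int) -> float:
    """Whole-circuit equivalents one ISP pays for: N-1 half circuits."""
    return (n - 1) / 2


def total_private_circuits(n: int) -> int:
    """Circuits needed for every ISP to peer privately with every other."""
    return n * (n - 1) // 2


def total_ixp_circuits(n: int) -> int:
    """One circuit per participant, from its premises to the IXP."""
    return n


for n in (3, 5, 10):
    print(n, private_peering_cost_per_isp(n),
          total_private_circuits(n), total_ixp_circuits(n))
```

For 5 ISPs each pays for 2 whole circuits under private peering against 1 at the IXP, matching the figure on the slide; at 10 ISPs the gap is 4.5 against 1.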

Internet Exchange Point
Solution
  Every ISP participates in the IXP
  Cost is minimal – one local circuit covers all domestic traffic
  International circuits are used for just international traffic – and backing up domestic links in case the IXP fails
Result:
  Local traffic stays local
  QoS considerations for local traffic are not an issue
  RTTs are typically sub 10ms
  Customers enjoy the Internet experience
  Local Internet economy grows rapidly

Layer 2 Exchange
The traditional IXP

IXP Design
Very simple concept:
  Ethernet switch is the interconnection media
    IXP is one LAN
  Each ISP brings a router, connects it to the ethernet switch provided at the IXP
  Each ISP peers with other participants at the IXP using BGP
Scaling this simple concept is the challenge for the larger IXPs

Layer 2 Exchange
[Figure: ISP 1 to ISP 6 and the IXP Management Network connected to an Ethernet Switch; IXP Services: Root & TLD DNS, Routing Registry, Looking Glass, etc]

Layer 2 Exchange
[Figure: ISP 1 to ISP 6 and the IXP Management Network connected to Ethernet Switches; IXP Services: Root & TLD DNS, Routing Registry, Looking Glass, etc]

Layer 2 Exchange
Two switches for redundancy
ISPs use dual routers for redundancy or loadsharing
Offer services for the “common good”
  Internet portals and search engines
  DNS Root & TLDs, NTP servers
  Routing Registry and Looking Glass

Layer 2 Exchange
Requires neutral IXP management
  Usually funded equally by IXP participants
  24x7 cover, support, value add services
Secure and neutral location
Configuration
  Private address space if non-transit and no value add services
  Otherwise public IPv4 (/24) and IPv6 (/64)
  ISPs require AS, basic IXP does not

Layer 2 Exchange
Network Security Considerations
  LAN switch needs to be securely configured
  Management routers require TACACS+ authentication, vty security
  IXP services must be behind router(s) with strong filters

“Layer 3 IXP”
Layer 3 IXP is a marketing concept used by Transit ISPs
Real Internet Exchange Points are only Layer 2

IXP Design Considerations

Exchange Point Design
The IXP Core is an Ethernet switch
  It must be a managed switch
Has superseded all other types of network devices for an IXP
  From the cheapest and smallest managed 12 or 24 port 10/100 switch
  To the largest switches now handling high densities of 10GE and 100GE interfaces

Exchange Point Design
Each ISP participating in the IXP brings a router to the IXP location
Router needs:
  One Ethernet port to connect to IXP switch
  One WAN port to connect to the WAN media leading back to the ISP backbone
  To be able to run BGP

Exchange Point Design
IXP switch located in one equipment rack dedicated to IXP
  Also includes other IXP operational equipment
Routers from participant ISPs located in neighbouring/adjacent rack(s)
Copper (UTP) connections made for 10Mbps, 100Mbps or 1Gbps connections
Fibre used for 1Gbps, 10Gbps, 40Gbps or 100Gbps connections

Peering
Each participant needs to run BGP
  They need their own AS number
  Public ASN, NOT private ASN
Each participant configures external BGP directly with the other participants in the IXP
  Peering with all participants
  or
  Peering with a subset of participants

Peering (more)
Mandatory Multi-Lateral Peering (MMLP)
  Each participant is forced to peer with every other participant as part of their IXP membership
  Has no history of success; the practice is strongly discouraged
Multi-Lateral Peering (MLP)
  Each participant peers with every other participant (usually via a Route Server)
Bi-Lateral Peering
  Participants set up peering with each other according to their own requirements and business relationships
  This is the most common situation at IXPs today

Page 256: IXP Training Workshops Contact: training@apnic.net WROU03_v1.0

Routing
- ISP border routers at the IXP must NOT be configured with a default route or carry the full Internet routing table
  - Carrying default or the full table means that this router and the ISP network are open to abuse by non-peering IXP members
  - Correct configuration is to carry only the routes offered to IXP peers on the IXP peering router
- Note: Some ISPs offer transit across IX fabrics
  - They do so at their own risk (see above)
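Why a default route on the peering router invites abuse can be illustrated with a small longest-prefix-match model. This is a sketch only: the two routing tables, the next-hop labels and the test address 198.51.100.7 are assumptions made for illustration and are not part of the workshop material.

```python
import ipaddress

def lookup(table, destination):
    """Longest-prefix match: return the next hop, or None when no route covers the address."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda item: item[0].prefixlen)[1]

# Peering router carrying only routes learned from IXP peers (illustrative).
peer_only = {ipaddress.ip_network("122.0.0.0/19"): "IXP peer AS110"}

# Same router, misconfigured with a default route towards the ISP backbone.
with_default = dict(peer_only)
with_default[ipaddress.ip_network("0.0.0.0/0")] = "ISP backbone (default)"

stray = "198.51.100.7"  # destination that no IXP peer announced
print(lookup(peer_only, stray))     # dropped
print(lookup(with_default, stray))  # carried for free by the ISP
```

Any IXP member that points a static route at the misconfigured router gets its traffic forwarded; with only peer routes present, the same traffic has nowhere to go.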


Routing (more)
- ISP border routers at the IXP should not be configured to carry the IXP LAN network within the IGP or iBGP
  - Use next-hop-self BGP concept
- Don’t generate ISP prefix aggregates on IXP peering router
  - If connection from backbone to IXP router goes down, normal BGP failover will then be successful


Address Space
- Some IXPs use private addresses for the IX LAN
  - Public address space means IXP network could be leaked to Internet which may be undesirable
  - Because most ISPs filter RFC1918 address space, this avoids the problem
- Some IXPs use public addresses for the IX LAN
  - Address space available from the RIRs
  - IXP terms of participation often forbid the IX LAN to be carried in the ISP member backbone


Hardware
- Try not to mix port speeds
  - If 10Mbps and 100Mbps connections available, terminate on different switches (L2 IXP)
- Don’t mix transports
  - If terminating ATM PVCs and G/F/Ethernet, terminate on different devices
- Insist that IXP participants bring their own router
  - Moves buffering problem off the IXP
  - Security is responsibility of the ISP, not the IXP


Charging
- IXPs should be run at minimal cost to participants
- Examples:
  - Datacentre hosts IX for free
    - Because ISP participants then use the datacentre for co-lo services, and the datacentre benefits long term
  - IX operates cost recovery
    - Each member pays a flat fee towards the cost of the switch, hosting, power & management
  - Different pricing for different ports
    - One slot may handle 24 10GE ports
    - Or one slot may handle 96 1GE ports
    - 96 port 1GE card is a tenth of the price of 24 port 10GE card
    - Relative port cost is passed on to participants
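The relative port cost mentioned above can be worked through with simple arithmetic. The card price below is a hypothetical figure chosen only for illustration; the slide gives just the port counts and the 10:1 price ratio.

```python
from fractions import Fraction

def per_port_cost(card_price, ports):
    """Cost of one port when the card price is spread evenly across its ports."""
    return Fraction(card_price) / ports

card_10ge_price = 100_000                # assumed price of a 24-port 10GE card
card_1ge_price = card_10ge_price // 10   # 96-port 1GE card at a tenth of that price

cost_10ge = per_port_cost(card_10ge_price, 24)
cost_1ge = per_port_cost(card_1ge_price, 96)
ratio = cost_10ge / cost_1ge

print(f"10GE port: {float(cost_10ge):.2f}")
print(f"1GE port:  {float(cost_1ge):.2f}")
print(f"A 10GE port costs {ratio} times as much as a 1GE port")
```

Under these assumptions the ratio does not depend on the absolute card price, which is why a flat fee per port type is a reasonable way to pass the cost on.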


Services Offered
- Services offered should not compete with member ISPs (basic IXP)
  - e.g. web hosting at an IXP is a bad idea unless all members agree to it
- IXP operations should make performance and throughput statistics available to members
  - Use tools such as MRTG/Cacti to produce IX throughput graphs for member (or public) information


Services to Offer
- ccTLD DNS
  - The country IXP could host the country’s top level DNS
  - e.g. “SE.” TLD is hosted at Netnod IXes in Sweden
  - Offer back up of other country ccTLD DNS
- Root server
  - Anycast instances of I.root-servers.net, F.root-servers.net etc are present at many IXes
- Usenet News
  - Usenet News is high volume
  - Could save bandwidth to all IXP members


Services to Offer
- Route Collector
  - Route collector shows the reachability information available at the exchange
  - Technical detail covered later on
- Looking Glass
  - One way of making the Route Collector routes available for global view (e.g. www.traceroute.org)
  - Public or members only access


Services to Offer
- Content Redistribution/Caching
  - For example, Akamaized update distribution service
- Network Time Protocol
  - Locate a stratum 1 time source (GPS receiver, atomic clock, etc) at IXP
- Routing Registry
  - Used to register the routing policy of the IXP membership (more later)


Introduction to Route Collectors

What routes are available at the IXP?


What is a Route Collector?
- Usually a router or Unix system running BGP
- Gathers routing information from service provider routers at an IXP
  - Peers with each ISP using BGP
- Does not forward packets
- Does not announce any prefixes to ISPs


Purpose of a Route Collector
- To provide a public view of the Routing Information available at the IXP
  - Useful for existing members to check functionality of BGP filters
  - Useful for prospective members to check value of joining the IXP
  - Useful for the Internet Operations community for troubleshooting purposes
    - E.g. www.traceroute.org


Route Collector at an IXP

[Diagram: Route Collector and routers R1, R2, R3, R4 and R5 around the IXP SWITCH]


Route Collector Requirements
- Router or Unix system running BGP
  - Minimal memory requirements: only holds IXP routes
  - Minimal packet forwarding requirements: doesn’t forward any packets
- Peers eBGP with every IXP member
  - Accepts everything; gives nothing
  - Uses a private ASN
  - Connects to IXP Transit LAN
- “Back end” connection
  - Second Ethernet globally routed
  - Connection to IXP Website for public access


Route Collector Implementation
- Most IXPs now implement some form of Route Collector
- Benefits already mentioned
- Great public relations tool
- Unsophisticated requirements
  - Just runs BGP


Introduction to Route Servers

How to scale very large IXPs


What is a Route Server?
- Has all the features of a Route Collector
- But also:
  - Announces routes to participating IXP members according to their routing policy definitions
- Implemented using the same specification as for a Route Collector


Features of a Route Server
- Helps scale routing for large IXPs
- Simplifies Routing Processes on ISP Routers
- Optional participation
  - Provided as service, is NOT mandatory
- Can result in insertion of the RS Autonomous System Number in the Routing Path (when a router is used rather than a dedicated route-server BGP implementation)
- Optionally uses Policy registered in IRR


Diagram of N-squared Peering Mesh

For large IXPs (dozens of participants), maintaining a large peering mesh becomes cumbersome and often too hard


Peering Mesh with Route Servers

- ISP routers peer with the Route Servers
  - Only need to have two eBGP sessions rather than N
[Diagram: ISP routers peering with two Route Servers (RS)]
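The scaling argument can be checked with a short calculation. This is a sketch under simple assumptions: every participant peers with every other participant in the full-mesh case (so each router holds N - 1 sessions, which the slide rounds to N), and the IXP runs two Route Servers. The function names are illustrative.

```python
def full_mesh_sessions(n):
    """Total eBGP sessions when each of n participants peers with every other."""
    return n * (n - 1) // 2

def sessions_per_router(n, route_servers=0):
    """Sessions one router maintains: n - 1 in a full mesh, or one per Route Server."""
    return route_servers if route_servers else n - 1

for n in (6, 50, 200):
    print(
        f"{n} participants: {full_mesh_sessions(n)} sessions in total, "
        f"{sessions_per_router(n)} per router in a full mesh, "
        f"{sessions_per_router(n, route_servers=2)} per router with two Route Servers"
    )
```

The total grows with the square of the membership, while the per-router count with Route Servers stays constant.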


RS based Exchange Point Routing Flow

[Diagram: TRAFFIC FLOW and ROUTING INFORMATION FLOW at an exchange point with a Route Server (RS)]


Advantages of Using a Route Server
- Advantageous for large IXPs
  - Helps scale eBGP mesh
  - Helps scale prefix distribution
- Separation of Routing and Forwarding
- Simplifies BGP Configuration Management on ISP routers


Disadvantages of using a Route Server
- ISPs can lose direct policy control
  - If RS is only peer, ISPs have no control over who their prefixes are distributed to
- Completely dependent on 3rd party
  - Configuration, troubleshooting, etc…
- Insertion of RS ASN into routing path
  - (If using a router rather than a dedicated route-server BGP implementation)
  - Traffic engineering/multihoming needs more care


Typical usage of a Route Server
- Route Servers may be provided as an OPTIONAL service
  - Most common at large IXPs (>50 participants)
  - Examples: LINX, TorIX, AMS-IX, etc
- ISPs peer:
  - Directly with significant peers
  - With Route Server for the rest


Things to think about...
- Would using a route server benefit you?
  - Helpful when BGP knowledge is limited (but is NOT an excuse not to learn BGP)
  - Avoids having to maintain a large number of eBGP peers
  - But can you afford to lose policy control? (An ISP not in control of their routing policy is what?)


What can go wrong…

The different ways IXP operators harm their IXP…


What can go wrong? Concept
- Some Service Providers attempt to cash in on the reputation of IXPs
- Market Internet transit services as “Internet Exchange Point”
  - “We are exchanging packets with other ISPs, so we are an Internet Exchange Point!”
  - So-called Layer-3 Exchanges: really Internet Transit Providers
  - Router used rather than a Switch
  - Most famous example: SingTelIX


What can go wrong? Financial
- Some IXPs price the IX out of the means of most providers
  - IXP is intended to encourage local peering
  - Acceptable charging model is minimally cost-recovery only
- Some IXPs charge for port traffic
  - IXPs are not a transit service; charging for traffic puts the IX in competition with members
  - (There is nothing wrong with charging different flat fees for 100Mbps, 1Gbps, 10Gbps etc ports as they all have different hardware costs on the switch.)


What can go wrong? Competition
- Too many exchange points in one locale
  - Competing exchanges defeat the purpose
- Becomes expensive for ISPs to connect to all of them
- An IXP:
  - is NOT a competition
  - is NOT a profit making business


What can go wrong? Rules and Restrictions
- IXPs try to compete with their membership
  - Offering services that ISPs would/do offer their customers
- IXPs run as a closed privileged club, e.g.:
  - Restrictive membership criteria
- IXPs providing access to end users rather than just Service Providers
- IXPs interfering with ISP business decisions, e.g. Mandatory Multi-Lateral Peering


What can go wrong? Technical Design Errors
- Interconnected IXPs
  - IXP in one location believes it should connect directly to the IXP in another location
  - Who pays for the interconnect?
  - How is traffic metered?
  - Competes with the ISPs who already provide transit between the two locations (who then refuse to join IX, harming the viability of the IX)
  - Metro interconnections work ok (e.g. LINX, AMS-IX, DE-CIX etc)


What can go wrong? Technical Design Errors
- ISPs bridge the IXP LAN back to their offices
  - “We are poor, we can’t afford a router”
  - Financial benefits of connecting to an IXP far outweigh the cost of a router
  - In reality it allows the ISP to connect any devices to the IXP LAN, with disastrous consequences for the security, integrity and reliability of the IXP


What can go wrong? Routing Design Errors
- Route Server implemented from Day One
  - ISPs have no incentive to learn BGP
  - Therefore have no incentive to understand peering relationships, peering policies, etc
  - Entirely dependent on operator of RS for troubleshooting, configuration, reliability
    - RS can’t be run by committee!
- Route Server is to help scale peering at LARGE IXPs


What can go wrong? Routing Design Errors
- iBGP Route Reflector used to distribute prefixes between IXP participants
- Claimed Advantage (1):
  - Participants don’t need to know about or run BGP
- Actually a Disadvantage
  - IXP Operator has to know BGP
  - ISP not knowing BGP is big commercial disadvantage
  - ISPs who would like to have a growing successful business need to be able to multi-home, peer with other ISPs, etc: these activities require BGP


What can go wrong? Routing Design Errors (cont)
- Route Reflector Claimed Advantage (2):
  - Allows an IXP to be started very quickly
- Fact:
  - IXP is only an Ethernet switch: setting up an iBGP mesh with participants is no quicker than setting up an eBGP mesh


What can go wrong? Routing Design Errors (cont)
- Route Reflector Claimed Advantage (3):
  - IXP operator has full control over IXP activities
- Actually a Disadvantage
  - ISP participants surrender control of:
    - Their border router; it is located in IXP’s AS
    - Their routing and peering policy
  - IXP operator is single point of failure
    - If they aren’t available 24x7, then neither is the IXP
    - BGP configuration errors by IXP operator have real impacts on ISP operations


What can go wrong? Routing Design Errors (cont)
- Route Reflector Disadvantage (4):
  - Migration from Route Reflector to “correct” routing configuration is highly non-trivial
  - ISP router is in IXP’s ASN
    - Need to move ISP router from IXP’s ASN to the ISP’s ASN
    - Need to reconfigure BGP on ISP router, add to ISP’s IGP and iBGP mesh, and set up eBGP with IXP participants and/or the IXP Route Server


More Information


Exchange Point Policies & Politics
- AUPs
  - Acceptable Use Policy
  - Minimal rules for connection
- Fees?
  - Some IXPs charge no fee
  - Other IXPs charge cost recovery
  - A few IXPs are commercial
- Nobody is obliged to peer
  - Agreements left to ISPs, not mandated by IXP


Exchange Point etiquette
- Don’t point default route at another IXP participant
- Be aware of third-party next-hop
- Only announce your aggregate routes
  - Read RIPE-399 first: www.ripe.net/docs/ripe-399.html
- Filter! Filter! Filter!


Exchange Point Examples
- LINX in London, UK
- TorIX in Toronto, Canada
- AMS-IX in Amsterdam, Netherlands
- SIX in Seattle, Washington, US
- PA-IX in Palo Alto, California, US
- JPNAP in Tokyo, Japan
- DE-CIX in Frankfurt, Germany
- HK-IX in Hong Kong
- …
- All use Ethernet Switches


Features of IXPs (1)
- Redundancy & Reliability
  - Multiple switches, UPS
- Support
  - NOC to provide 24x7 support for problems at the exchange
- DNS, Route Collector, Content & NTP servers
  - ccTLD & root servers
  - Content redistribution systems such as Akamai
  - Route Collector: Routing Table view


Features of IXPs (2)
- Location
  - Neutral co-location facilities
- Address space
  - Peering LAN
- AS Number
  - If using Route Collector/Server
- Route servers (optional, for larger IXPs)
- Statistics
  - Traffic data, for membership


More info about IXPs
- http://www.pch.net/documents
  - Another excellent resource of IXP locations, papers, IXP statistics, etc
- http://www.telegeography.com/ee/ix/index.php
  - A collection of IXPs and interconnect points for ISPs


Summary
- L2 IXP: most commonly deployed
  - The core is an ethernet switch
  - ATM and other old technologies are obsolete
- L3 IXP: nowadays is a marketing concept used by wholesale ISPs
  - Does not offer the same flexibility as L2
  - Not recommended unless there are overriding regulatory or political reasons to do so
  - Avoid!


Internet Exchange Point Design

ISP Training Workshops


BGP Configuration for IXPs

ISP Training Workshops


Background
- This presentation covers the BGP configurations required for a participant at an Internet Exchange Point
  - It does not cover the technical design of an IXP
  - Nor does it cover the financial and operational benefits of participating in an IXP
  - See the IXP Design Presentation that is part of this Workshop Material set for financial, technical and operational details


Recap: Definitions
- Transit: carrying traffic across a network, usually for a fee
  - Traffic and prefixes originating from one AS are carried across an intermediate AS to reach their destination AS
- Peering: private interconnect between two ASNs, usually for no fee
- Internet Exchange Point: common interconnect location where several ASNs exchange routing information and traffic


IXP Peering Issues
- Only announce your aggregates and your customer aggregates at IXPs
- Only accept the aggregates which your peer is entitled to originate
- Never carry a default route on an IXP (or private) peering router


ISP Transit Issues

Many mistakes are made on the Internet today due to incomplete understanding of how to configure BGP for peering at Internet Exchange Points


Simple BGP Configuration example

Exchange Point Configuration


Exchange Point Example
- Exchange point with 6 ASes present
  - Layer 2: ethernet switch
- Each ISP peers with the other
  - NO transit across the IXP is allowed


Exchange Point

Each of these represents a border router in a different autonomous system
[Diagram: border routers A, B, C, D, E and F attached to the exchange point; ASes shown are AS100, AS110, AS120, AS130, AS140 and AS150]


Router configuration
- IXP router is usually located at the Exchange Point premises
  - Configuration needs to be such that disconnecting it from the backbone does not cause routing loops or traffic blackholes
- Create a peer-group for IXP peers
  - All outbound policy to each peer will be the same
- Ensure the router is not carrying the default route
  - Or the full routing table (for that matter)


Creating a peer-group & route-map
router bgp 100
 neighbor ixp-peer peer-group
 neighbor ixp-peer send-community
 neighbor ixp-peer prefix-list my-prefixes out
 neighbor ixp-peer route-map set-local-pref in
!
ip prefix-list my-prefixes permit 121.10.0.0/19
!
route-map set-local-pref permit 10
 set local-preference 150
!
Notes:
- prefix-list my-prefixes: only allow AS100 address block to IXP peers
- route-map set-local-pref: prefixes heard from IXP peers have highest preference


Interface and BGP configuration (1)
interface fastethernet 0/0
 description Exchange Point LAN
 ip address 120.5.10.1 255.255.255.224
 no ip directed-broadcast
 no ip proxy-arp
 no ip redirects
!
router bgp 100
 neighbor 120.5.10.2 remote-as 110
 neighbor 120.5.10.2 peer-group ixp-peer
 neighbor 120.5.10.2 prefix-list peer110 in
 neighbor 120.5.10.3 remote-as 120
 neighbor 120.5.10.3 peer-group ixp-peer
 neighbor 120.5.10.3 prefix-list peer120 in
Note: the interface commands are the IXP LAN BCP configuration


Interface and BGP Configuration (2)
 neighbor 120.5.10.4 remote-as 130
 neighbor 120.5.10.4 peer-group ixp-peer
 neighbor 120.5.10.4 prefix-list peer130 in
 neighbor 120.5.10.5 remote-as 140
 neighbor 120.5.10.5 peer-group ixp-peer
 neighbor 120.5.10.5 prefix-list peer140 in
 neighbor 120.5.10.6 remote-as 150
 neighbor 120.5.10.6 peer-group ixp-peer
 neighbor 120.5.10.6 prefix-list peer150 in
!
ip route 121.10.0.0 255.255.224.0 null0
!
ip prefix-list peer110 permit 122.0.0.0/19
ip prefix-list peer120 permit 122.30.0.0/19
ip prefix-list peer130 permit 122.12.0.0/19
ip prefix-list peer140 permit 122.18.128.0/19
ip prefix-list peer150 permit 122.1.32.0/19
Notes:
- Peer-group applied to each peer
- Each peer has own inbound filter


Exchange Point
- Configuration of the other routers in the AS is similar in concept
- Notice inbound and outbound prefix filters
  - Outbound announces my-prefixes only
  - Inbound accepts peer prefixes only
- Notice inbound route-map
  - Set local preference higher than default
  - Ensures that if the same prefix is heard via AS100 upstream, the best path for traffic is via the IXP
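The effect of the inbound route-map can be sketched as a one-step path selection. This is a simplified model, not the full BGP decision process: only the local-preference comparison is shown, and the path records are illustrative. The value 100 is the Cisco IOS default local preference; 150 is the value set by the set-local-pref route-map.

```python
DEFAULT_LOCAL_PREF = 100  # Cisco IOS default local preference

def best_path(paths):
    """Pick the path with the highest local preference (later tie-breakers are not modelled)."""
    return max(paths, key=lambda p: p["local_pref"])

# The same prefix heard twice: once via the upstream, once via an IXP peer.
paths = [
    {"prefix": "122.0.0.0/19", "via": "upstream transit", "local_pref": DEFAULT_LOCAL_PREF},
    {"prefix": "122.0.0.0/19", "via": "IXP peer AS110", "local_pref": 150},
]

print(best_path(paths)["via"])
```

Because local preference is compared before AS-path length, the IXP path wins even if the upstream path happens to be shorter.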


Exchange Point
- Ethernet port configuration
  - Be aware of LAN configuration best practices
  - Switch off proxy arp, redirects and broadcasts (if not already default)
- IXP border router must NOT carry prefixes with origin outside local AS and IXP participant ASes
  - Helps prevent “stealing of bandwidth”


Exchange Point
- Issues:
  - AS100 needs to know all the prefixes its peers are announcing
  - New prefixes require the prefix-lists to be updated
- Alternative solutions
  - Use the Internet Routing Registry to build prefix list
  - Use AS Path filters (could be risky)
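Building prefix-lists from registered data lends itself to automation. The sketch below generates the prefix-list lines used in the earlier configuration from a dictionary of prefixes. The dictionary is a hypothetical stand-in for the result of an IRR query; no real registry is contacted, and the function name is illustrative.

```python
import ipaddress

def build_prefix_list(name, prefixes):
    """Return IOS-style prefix-list lines; ValueError is raised for a malformed prefix."""
    lines = []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)  # strict: rejects host bits set in the prefix
        lines.append(f"ip prefix-list {name} permit {net}")
    return lines

# Hypothetical registered prefixes per peer, as if fetched from a routing registry.
registered = {
    "peer110": ["122.0.0.0/19"],
    "peer120": ["122.30.0.0/19"],
}

for name, prefixes in registered.items():
    for line in build_prefix_list(name, prefixes):
        print(line)
```

Validating each prefix before emitting configuration catches typing errors in the registered data before they reach the router.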


More Complex BGP example

Exchange Point Configuration


Exchange Point Example
- Exchange point with 6 ASes present
  - Layer 2: ethernet switch
- Each ISP peers with the other
  - NO transit across the IXP allowed
  - ISPs at exchange points provide transit to their BGP customers


Exchange Point

Each of these represents a border router in a different autonomous system

[Diagram: border routers A, B, C, D, E and F attached to the exchange point; ASes shown are AS100, AS110, AS120, AS130, AS140 and AS150, plus AS200 and AS201]


Exchange Point
Router A configuration
interface fastethernet 0/0
 description Exchange Point LAN
 ip address 120.5.10.1 255.255.255.224
 no ip directed-broadcast
 no ip proxy-arp
 no ip redirects
!
router bgp 100
 neighbor ixp-peers peer-group
 neighbor ixp-peers send-community
 neighbor ixp-peers prefix-list bogons out
 neighbor ixp-peers filter-list 10 out
 neighbor ixp-peers route-map set-local-pref in
...next slide
Note: filter by ASN rather than by prefix, and block bogons too


Exchange Point
 neighbor 120.5.10.2 remote-as 110
 neighbor 120.5.10.2 peer-group ixp-peers
 neighbor 120.5.10.2 prefix-list peer110 in
 neighbor 120.5.10.3 remote-as 120
 neighbor 120.5.10.3 peer-group ixp-peers
 neighbor 120.5.10.3 prefix-list peer120 in
 neighbor 120.5.10.4 remote-as 130
 neighbor 120.5.10.4 peer-group ixp-peers
 neighbor 120.5.10.4 prefix-list peer130 in
 neighbor 120.5.10.5 remote-as 140
 neighbor 120.5.10.5 peer-group ixp-peers
 neighbor 120.5.10.5 prefix-list peer140 in
 neighbor 120.5.10.6 remote-as 150
 neighbor 120.5.10.6 peer-group ixp-peers
 neighbor 120.5.10.6 prefix-list peer150 in


Exchange Point
ip route 121.10.0.0 255.255.224.0 null0
!
ip as-path access-list 10 permit ^$
ip as-path access-list 10 permit ^200$
ip as-path access-list 10 permit ^201$
!
ip prefix-list peer110 permit 122.0.0.0/19
ip prefix-list peer120 permit 122.30.0.0/19
ip prefix-list peer130 permit 122.12.0.0/19
ip prefix-list peer140 permit 122.18.128.0/19
ip prefix-list peer150 permit 122.1.32.0/19
!
route-map set-local-pref permit 10
 set local-preference 150


Exchange Point
- Notice the change in router A’s configuration
  - Filter-list instead of prefix-list permits local and customer ASes out to exchange
  - Prefix-list blocks Special Use Address prefixes: rest get out, could be risky
- Other issues as previously
- This configuration will not scale as more and more BGP customers are added to AS100
  - As-path filter has to be updated each time
  - Solution: BGP communities
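How the three as-path access-list entries behave can be tried out with ordinary regular expressions. This is an approximation: IOS has its own regex dialect, but the simple anchored patterns used here behave the same way in Python. AS paths are modelled as space-separated strings, and the sample paths are illustrative.

```python
import re

# Permit entries from as-path access-list 10: locally originated, AS200, AS201.
AS_PATH_ACL_10 = [r"^$", r"^200$", r"^201$"]

def permitted(as_path):
    """True if the AS path string matches any permit entry; otherwise implicit deny."""
    return any(re.search(pattern, as_path) for pattern in AS_PATH_ACL_10)

for path in ["", "200", "201", "110", "200 65010"]:
    print(repr(path), "permit" if permitted(path) else "deny")
```

Every new BGP customer needs another entry in the list, which is the scaling problem that communities address.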


More scalable BGP example

Exchange Point Configuration


Exchange Point Example (Scalable)
- Exchange point with 6 ASes present
  - Layer 2: ethernet switch
- Each ISP peers with the other
  - NO transit across the IXP allowed
  - ISPs at exchange points provide transit to their BGP customers
  - (Scalable solution is presented here)


Exchange Point

Each of these represents a border router in a different autonomous system - each ASN has BGP customers of their own

[Diagram: border routers A, B, C, D, E and F attached to the exchange point; ASes shown are AS100, AS110, AS120, AS130, AS140 and AS150]


Router configuration
- Take AS100 as an example
  - Has 15 BGP customers, in AS501 to AS515
- Create a peer-group for IXP peers
  - All outbound policy to each peer will be the same
- Communities will be used
  - AS-path filters will not scale well
- Community Policy
  - AS100 aggregate put into 100:1000
  - All BGP customer aggregates go into 100:1100


Creating a peer-group & route-map
router bgp 100
 neighbor ixp-peer peer-group
 neighbor ixp-peer send-community
 neighbor ixp-peer route-map ixp-peers-out out
 neighbor ixp-peer route-map set-local-pref in
!
ip community-list 10 permit 100:1000
ip community-list 11 permit 100:1100
!
route-map ixp-peers-out permit 10
 match community 10 11
!
route-map set-local-pref permit 10
 set local-preference 150
!
Notes:
- community-list 10: AS100 aggregate
- community-list 11: AS100 BGP customers
- route-map set-local-pref: prefixes heard from IXP peers have highest preference


BGP configuration for IXP router
router bgp 100
 neighbor 120.5.10.2 remote-as 110
 neighbor 120.5.10.2 peer-group ixp-peer
 neighbor 120.5.10.2 prefix-list peer110 in
 neighbor 120.5.10.3 remote-as 120
 neighbor 120.5.10.3 peer-group ixp-peer
 neighbor 120.5.10.3 prefix-list peer120 in
...etc
- Remaining configuration is the same as earlier
- Note the reliance again on inbound prefix-lists for peers
  - Peers need to update the ISP if filters need to be changed
  - And that’s what the IRR is for (otherwise use email)


BGP configuration for AS100’s customer aggregation router
router bgp 100
 network 121.10.0.0 mask 255.255.192.0 route-map set-comm
 neighbor 121.10.4.2 remote-as 501
 neighbor 121.10.4.2 prefix-list as501-in in
 neighbor 121.10.4.2 prefix-list default out
 neighbor 121.10.4.2 route-map set-cust-policy in
...etc
!
route-map set-comm permit 10
 set community 100:1000
!
route-map set-cust-policy permit 10
 set community 100:1100
!
Notes:
- route-map set-comm: set community on AS100 aggregate
- route-map set-cust-policy: set community on BGP customer routes


Scalable IXP policy
- ISP Community policy is set on ingress
- ISP now relies on communities to determine what is announced at the IXP
  - No need to update any as-path filters, prefix-lists, etc
- If BGP customer announces more prefixes, only the filters at the aggregation edge need to be updated
  - And those new prefixes will automatically be tagged with the community to allow them through to AS100’s IXP peers
- Consult the BGP community presentation for more extensive examples


Route Servers
- IXP operators quite often provide a Route Server to assist with scaling the BGP mesh
- All prefixes sent to a Route Server are usually distributed to all ASNs that peer with the Route Server
  - (although some IXPs offer ISPs the facility to configure specific policies on their Route Server)
- BGP configuration to peer with a Route Server is the same as for any other ordinary peer
  - But note that the route server will offer prefixes from several ASNs (the IXP membership who choose to participate)
  - Inbound filter should be constructed appropriately


Route Servers
- Route Server software suppresses the ASN of the RS so that it doesn’t appear in the AS-path
- IOS by default will not accept prefixes from a neighbouring AS unless that AS is first in the AS-path
router bgp 100
 no bgp enforce-first-as
 neighbor x.x.x.a remote-as 65534
 neighbor x.x.x.a route-map IXP-RS-in in
 neighbor x.x.x.a route-map ixp-peers-out out
Note: no bgp enforce-first-as is needed so that IOS can receive prefixes without AS65534 being first in path
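The first-AS check can be modelled as follows. This is a simplified sketch of the behaviour described above, not router code: AS paths are lists of integers, the function name is illustrative, and AS65534 is the Route Server ASN from the configuration.

```python
def accept_update(as_path, neighbor_as, enforce_first_as=True):
    """Reject an update whose leftmost AS is not the neighbour's AS, unless the check is off."""
    if enforce_first_as and (not as_path or as_path[0] != neighbor_as):
        return False
    return True

ROUTE_SERVER_AS = 65534
path_via_rs = [110]  # the Route Server suppresses its own ASN, so AS110 is first

print(accept_update(path_via_rs, ROUTE_SERVER_AS))                          # default behaviour
print(accept_update(path_via_rs, ROUTE_SERVER_AS, enforce_first_as=False))  # with the check disabled
print(accept_update([110], 110))                                            # ordinary direct peer
```

Direct peers are unaffected by the check, since their own ASN is always leftmost in the paths they send.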


Summary

Exchange Point Configuration


Summary
- Ensure that BGP is scalable on your IXP peering router
  - Manually updating filters every time a new customer connects is tiresome and has potential to cause errors
- Only carry local ASN prefixes and customer routes on the IXP peering router
  - Anything else (e.g. default or full BGP table) has the potential to result in bandwidth theft
- Filter IXP peer announcements
  - Inbound: use the IRR if maintaining prefix-lists is difficult
  - Outbound: use communities for scalability


BGP Configuration for IXPs

ISP Training Workshops
