Introduction to The Internet
IXP Training Workshops
Introduction to the Internet
  Topologies and Definitions
  IP Addressing
  Internet Hierarchy
  Gluing it all together
Topologies and Definitions
What does all the jargon mean?
Some Icons…
  Router (layer 3, IP datagram forwarding)
  Network Cloud
  Ethernet switch (layer 2, frame forwarding)
Routed Backbone
  ISPs build networks covering regions
    Regions can cover a country, a sub-continent, or even be global
  Each region has points of presence built by the ISP
  Routers are the infrastructure
  Physical circuits run between routers
  Easy routing configuration, operation and troubleshooting
  The dominant topology used in the Internet today
MPLS Backbones
  Some ISPs & Telcos use Multi Protocol Label Switching (MPLS)
  MPLS is built on top of router infrastructure
    Used to replace old ATM technology
    Tunnelling technology
  Main purpose is to provide VPN services
    Although these can be done just as easily with other tunnelling technologies such as GRE
Points of Presence
  PoP – Point of Presence
    Physical location of ISP’s equipment
    Sometimes called a “node”
  vPoP – virtual PoP
    To the end user, it looks like an ISP location
    In reality a back hauled access point
    Used mainly for consumer access networks
  Hub/SuperPoP – large central PoP
    Links to many PoPs
PoP Topologies
  Core routers
    high speed trunk connections
  Distribution routers
    higher port density, aggregating network edge to the network core
  Access routers
    high port density, connecting the end users to the network
  Border routers
    connections to other providers
  Service routers
    hosting and servers
  Some functions might be handled by a single router
Typical PoP Design
[Diagram: a network core with backbone links to other PoPs, connected to border routers (other ISPs), service routers (ISP services: DNS, Mail, News, FTP, WWW; hosted services), access routers (business customer aggregation, consumer aggregation) and the Network Operation Centre]
More Definitions
  Transit
    Carrying traffic across a network
    Usually for a fee
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
  Default
    Where to send traffic when there is no explicit match in the routing table
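The role of the default route can be illustrated with a small longest-prefix-match lookup. This is only a sketch using Python's standard ipaddress module; the prefixes and next-hop names below are invented for illustration and do not come from the slides.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop (illustrative names only)
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",        # the default route
    ipaddress.ip_network("192.0.2.0/24"): "peer-at-ixp",
    ipaddress.ip_network("198.51.100.0/24"): "customer",
}

def lookup(destination: str) -> str:
    """Return the next hop of the longest prefix that contains the address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(lookup("192.0.2.10"))    # explicit match in the table
print(lookup("203.0.113.5"))   # no explicit match, so the default is used
```

A router in the default free zone would simply have no 0.0.0.0/0 entry and would rely on explicit matches for every destination.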
Peering and Transit example
[Diagram: providers A, B and C and backbone provider D, with exchange points IXP-West and IXP-East]
  A and B peer for free, but need transit arrangements with D to get packets to/from C
Private Interconnect
[Diagram: the border routers of ISP A (Autonomous System 99) and ISP B (Autonomous System 334) connected directly to each other]
Public Interconnect
  A location or facility where several ISPs are present and connect to each other over a common shared media
  Why?
    To save money, reduce latency, improve performance
  IXP – Internet eXchange Point
  NAP – Network Access Point
Public Interconnect
  Centralised (in one facility)
  Distributed (connected via WAN links)
  Switched interconnect
    Ethernet (Layer 2)
    Technologies such as SRP, FDDI, ATM, Frame Relay, SMDS and even routers have been used in the past
  Each provider establishes peering relationship with other providers at IXP
    ISP border router peers with all other provider border routers
Public Interconnect
[Diagram: ISP 1 to ISP 6 connected to a central IXP; each of these represents a border router in a different autonomous system]
ISPs participating in Internet
  Bringing all pieces together, ISPs:
    Build multiple PoPs in a distributed network
    Build redundant backbones
    Have redundant external connectivity
    Obtain transit from upstream providers
    Get free peering from local providers at IXPs
Example ISP Backbone Design
[Diagram: PoP 1 to PoP 4 joined by backbone links around the network core, with connections to Upstream 1, Upstream 2, ISP peers and an IXP]
IP Addressing
Where to get address space and who from
IP Addressing
  Internet uses classless routing
  Concept of IPv4 class A, class B or class C is no more
    Engineers talk in terms of prefix length, for example the class B 158.43 is now called 158.43/16.
  All routers must be CIDR capable
    Classless InterDomain Routing
    RFC1812 – Router Requirements
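To make the prefix-length notation concrete, the short sketch below inspects the 158.43/16 example with Python's standard ipaddress module. The split into /18 blocks is purely illustrative and not something the slides prescribe.

```python
import ipaddress

# The former "class B" from the slide, written in classless notation
net = ipaddress.ip_network("158.43.0.0/16")

print(net.prefixlen)      # number of network bits
print(net.netmask)        # the equivalent dotted-quad mask
print(net.num_addresses)  # size of the block

# Classless routing allows the block to be divided at any bit boundary,
# for example into four /18 prefixes (illustrative choice)
subnets = list(net.subnets(new_prefix=18))
for s in subnets:
    print(s)
```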
IP Addressing
  Pre-CIDR (before 1994)
    Big networks got a class A
    Medium networks got a class B
    Small networks got a class C
  The CIDR IPv4 years (1994 to 2010)
    Sizes of IPv4 allocations/assignments made according to demonstrated need – CLASSLESS
  IPv6 adoption (from 2011)
    The size of IPv4 address allocations and assignments is now very limited as IANA’s free pool has run out
IP Addressing
  IP Address space is a resource shared amongst all Internet users
    Regional Internet Registries delegated allocation responsibility by the IANA
    AfriNIC, APNIC, ARIN, LACNIC & RIPE NCC are the five RIRs
    RIRs allocate address space to ISPs and Local Internet Registries
    ISPs/LIRs assign address space to end customers or other ISPs
  All usable IPv4 address space has been allocated to the RIRs by the IANA (February 2011)
    The time for IPv6 is now
Non-portable Address Space
  “Provider Aggregatable” or “PA Space”
    Customer uses RIR member’s address space while connected to Internet
    Customer has to renumber to change ISP
    Aids control of size of Internet routing table
    Need to fragment provider block when multihoming
  PA space is allocated to the RIR member
    All assignments made by the RIR member to end sites are announced as an aggregate to the rest of the Internet
Portable Address Space
  “Provider Independent” or “PI Space”
    Customer gets or has address space independent of ISP
    Customer keeps addresses when changing ISP
    Is very bad for size of Internet routing table
    Is very bad for scalability of the routing system
    PI space is rarely distributed by the RIRs
Internet Hierarchy
The pecking order
High Level View of the Global Internet
[Diagram: global providers, regional providers 1 and 2, access providers 1 and 2, content providers 1 and 2, customer networks and an Internet Exchange Point]
Detailed View of the Global Internet
  Global Transit Providers
    Connect to each other
    Provide connectivity to Regional Transit Providers
  Regional Transit Providers
    Connect to each other
    Provide connectivity to Content Providers
    Provide connectivity to Access Providers
  Access Providers
    Connect to each other across IXPs (free peering)
    Provide access to the end user
Categorising ISPs
[Diagram: Tier 1, Tier 2 and Tier 3 ISPs arranged in a hierarchy, with two IXPs and a “$” scale]
Inter-provider relationships
  Peering between equivalent sizes of service providers (e.g. Tier 2 to Tier 2)
    Shared cost private interconnection, equal traffic flows
    No cost peering
  Peering across exchange points
    If convenient, of mutual benefit, technically feasible
  Fee based peering
    Unequal traffic flows, “market position”
Default Free Zone
  The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route
  NB: is not related to where an ISP is in the hierarchy
Gluing it together
Gluing it together
  Who runs the Internet?
    No one
    (Definitely not ICANN, nor the RIRs, nor the US,…)
  How does it keep working?
    Inter-provider business relationships and the need for customer reachability ensure that the Internet by and large functions for the common good
  Any facilities to help keep it working?
    Not really. But…
    Engineers keep working together!
Engineers keep talking to each other...
  North America
    NANOG (North American Network Operators Group)
    NANOG meetings and mailing list
    www.nanog.org
  Latin America
    Foro de Redes
    NAPLA
    LACNOG – supported by LACNIC
  Middle East
    MENOG (Middle East Network Operators Group)
    www.menog.net
Engineers keep talking to each other...
  Asia & Pacific
    APRICOT annual conference
      www.apricot.net
    APOPS & APNIC-TALK mailing lists
      mailman.apnic.net/mailman/listinfo/apops
      mailman.apnic.net/mailman/listinfo/apnic-talk
    PacNOG (Pacific NOG)
      mailman.apnic.net/mailman/listinfo/pacnog
    SANOG (South Asia NOG)
      E-mail to [email protected]
Engineers keep talking to each other...
  Europe
    RIPE meetings, working groups and mailing lists
    e.g. Routing WG: www.ripe.net/mailman/listinfo/routing-wg
  Africa
    AfNOG meetings and mailing list
  And many in-country ISP associations and NOGs
  IETF meetings and mailing lists
    www.ietf.org
Summary
  Topologies and Definitions
  IP Addressing
    PA versus PI address space
  Internet Hierarchy
    Local, Regional, Global Transit Providers
    IXPs
  Gluing it all together
    Engineers cooperate, common business interests
Introduction to The Internet
ISP Training Workshops
37
The Value of Peering
ISP Training Workshops
The Internet
  Internet is made up of ISPs of all shapes and sizes
    Some have local coverage (access providers)
    Others can provide regional or per country coverage
    And others are global in scale
  These ISPs interconnect their businesses
    They don’t interconnect with every other ISP (over 41000 distinct autonomous networks) – won’t scale
    They interconnect according to practical and business needs
  Some ISPs provide transit to others
    They interconnect other ISP networks
Categorising ISPs
[Diagram: Global, Regional and Access ISPs arranged in a hierarchy, with two IXPs and a “$” scale]
Peering and Transit
  Transit
    Carrying traffic across a network
    Usually for a fee
    Example: Access provider connects to a regional provider
  Peering
    Exchanging routing information and traffic
    Usually for no fee
    Sometimes called settlement free peering
    Example: Regional provider connects to another regional provider
Private Interconnect
  Two ISPs connect their networks over a private link
    Can be peering arrangement
      No charge for traffic
      Share cost of the link
    Can be transit arrangement
      One ISP charges the other for traffic
      One ISP (the customer) pays for the link
[Diagram: ISP 1 and ISP 2 joined by a private link]
Public Interconnect
  Several ISPs meet in a common neutral location and interconnect their networks
    Usually is a peering arrangement between their networks
[Diagram: ISP 1 to ISP 6 connected to a central IXP]
ISP Goals
  Minimise the cost of operating the business
  Transit
    ISP has to pay for circuit (international or domestic)
    ISP has to pay for data (usually per Mbps)
    Repeat for each transit provider
    Significant cost of being a service provider
  Peering
    ISP shares circuit cost with peer (private) or runs circuit to public peering point (one off cost)
    No need to pay for data
    Reduces transit data volume, therefore reducing cost
Transit – How it works
  Small access provider provides Internet access for a city’s population
    Mixture of dial up, wireless and fixed broadband
    Possibly some business customers
    Possibly also some Internet cafes
  How do their customers get access to the rest of the Internet?
  ISP buys access from one, two or more larger ISPs who already have visibility of the rest of the Internet
    This is transit – they pay for the physical connection to the upstream and for the traffic volume on the link
Peering – How it works
  If two ISPs are of equivalent sizes, they have:
    Equivalent network infrastructure coverage
    Equivalent customer size
    Similar content volumes to be shared with the Internet
    Potentially similar traffic flows to each other’s networks
  This makes them good peering partners
  If they don’t peer
    They both have to pay an upstream provider for access to each other’s network/customers/content
    Upstream benefits from this arrangement, as the two ISPs both have to fund the transit costs
The IXP’s role
  Private peering makes sense when there are very few equivalent players
    Connecting to one other ISP costs X
    Connecting to two other ISPs costs 2 times X
    Connecting to three other ISPs costs 3 times X
    Etc… (where X is half the circuit cost plus a port cost)
  The more private peers, the greater the cost
  IXP is a more scalable solution to this problem
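The linear growth in private peering cost can be written down directly from the definition of X given on the slide. The following is a minimal sketch; the circuit and port prices are placeholder figures chosen for illustration, not numbers from the presentation.

```python
def private_peering_cost(peers: int, circuit_cost: float, port_cost: float) -> float:
    """Cost of peering privately with `peers` ISPs.

    Each private peering costs X, where X is half the circuit cost
    (the other half is paid by the peer) plus one router port.
    """
    x = circuit_cost / 2 + port_cost
    return peers * x

# Placeholder prices: a 10,000 circuit and a 1,000 port (assumed values)
for n in (1, 2, 3):
    print(n, private_peering_cost(n, circuit_cost=10_000, port_cost=1_000))
```

An IXP connection, by contrast, needs one circuit and one port however many peers are present, which is why it scales better as the number of potential peers grows.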
The IXP’s role
  Connecting to an IXP
    ISP costs: one router port, one circuit, and one router to locate at the IXP
  Some IXPs charge annual “maintenance fees”
    The maintenance fee has potential to significantly influence the cost balance for an ISP
  Generally connecting to an IXP and peering there becomes cost effective when there are at least three other peers
    The real $ amount varies from region to region, IXP to IXP
Who peers at an IXP?
  Access Providers
    Don’t have to pay their regional provider transit fees for local traffic
    Keeps latency for local traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through transit provider)
  Regional Providers
    Don’t have to pay their global provider transit for local and regional traffic
    Keeps latency for local and regional traffic low
    ‘Unlimited’ bandwidth through the IXP (compared with costly and limited bandwidth through global provider)
The IXP’s role
  Global Providers can be located close to IXPs
    Attracted by the potential transit business available
  Advantageous for access & regional providers
    They can peer with other similar providers at the IXP
    And in the same facility pay for transit to their regional or global provider
    (Not across the IXP fabric, but a separate connection)
[Diagram: an access provider connected both to the IXP and, separately, to a transit provider]
Connectivity Decisions
  Transit
    Almost every ISP needs transit to reach rest of Internet
    One provider = no redundancy
    Two providers: ideal for traffic engineering as well as redundancy
    Three providers = better redundancy, traffic engineering gets harder
    More than three = diminishing returns, rapidly escalating costs and complexity
  Peering
    Means low (or zero) cost access to another network
    Private or Public Peering (or both)
Transit Goals
  1. Minimise number of transit providers
    But maintain redundancy
    2 is ideal, 4 or more is bad
  2. Aggregate capacity to transit providers
    More aggregated capacity means better value
      Lower cost per Mbps
    4x 45Mbps circuits to 4 different ISPs will almost always cost more than 2x 155Mbps circuits to 2 different ISPs
      Yet bandwidth of latter (310Mbps) is greater than that of former (180Mbps) and is much easier to operate
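The bandwidth comparison is simple arithmetic, shown here as a small sketch. Only the circuit counts and speeds come from the slide; no prices are assumed, since the slide gives none.

```python
def total_bandwidth(circuits: int, mbps_each: int) -> int:
    """Aggregate capacity of several equal-speed circuits, in Mbps."""
    return circuits * mbps_each

former = total_bandwidth(4, 45)    # four 45Mbps circuits to four ISPs
latter = total_bandwidth(2, 155)   # two 155Mbps circuits to two ISPs

print(former, latter, latter - former)
```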
Peering or Transit?
  How to choose?
  Or do both?
  It comes down to cost of going to an IXP
    Free peering
    Paying for transit from an ISP co-located in same facility, or perhaps close by
  Or not going to an IXP and paying for the cost of transit directly to an upstream provider
    There is no right or wrong answer, someone has to do the arithmetic
Private or Public Peering
  Private peering
    Scaling issue, with costs, number of providers, and infrastructure provisioning
  Public peering
    Makes sense the more potential peers there are (more is usually greater than “two”)
  Which public peering point?
    Local Internet Exchange Point: great for local traffic and local peers
    Regional Internet Exchange Point: great for meeting peers outside the locality, might be cheaper than paying transit to reach the same consumer base
Local Internet Exchange Point
  Defined as a public peering point serving the local Internet industry
  Local means where it becomes cheaper to interconnect with other ISPs at a common location than it is to pay transit to another ISP to reach the same consumer base
    Local can mean different things in different regions!
Regional Internet Exchange Point
  These are also “local” Internet Exchange Points
  But also attract regional ISPs and ISPs from outside the locality
    Regional ISPs peer with each other
    And show up at several of these Regional IXPs
  Local ISPs peer with ISPs from outside the locality
    They don’t compete in each other’s markets
    Local ISPs don’t have to pay transit costs
    ISPs from outside the locality don’t have to pay transit costs
    Quite often ISPs of disparate sizes and influences will happily peer – to defray transit costs
Which IXP?
  How many routes are available?
    What is traffic to & from these destinations, and by how much will it reduce cost of transit?
  What is the cost of co-lo space?
    If prohibitive or space not available, pointless choosing this IXP
  What is the cost of running a circuit to the location?
    If prohibitive or competitive with transit costs, pointless choosing this IXP
  What is the cost of remote hands/assistance?
    If no remote hands, doing maintenance is challenging and potentially costly with a serious outage
Example: South Asian ISP @ LINX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 81k prefixes
    IXP traffic averages 55Mbps/15Mbps
    Transit traffic averages 35Mbps/3Mbps
  Analysis:
    61% of inbound traffic comes from 81k prefixes available by peering
    39% of inbound traffic comes from remaining 287k prefixes from transit provider
Example: South Asian ISP @ HKIX
  Date: October 2011
  Facts:
    Route Server plus bilateral peering offers 34k prefixes
    IXP traffic is 130Mbps/30Mbps
    Transit traffic is 125Mbps/40Mbps
  Analysis:
    51% of inbound traffic comes from 42k prefixes available by peering
    49% of inbound traffic comes from remaining 326k prefixes from transit provider
Example: South Asian ISP
  Summary:
    Traffic by Peering: 185Mbps/45Mbps
    Traffic by Transit: 160Mbps/43Mbps
    54% of incoming traffic is by peering
    52% of outbound traffic is by peering
Example: South Asian ISP
  Router at remote co-lo
    Benefits: can select peers, easy to swap transit providers
    Costs: co-lo space and remote hands
  Servers at remote co-lo
    Benefits: mail filtering, content caching, etc
    Costs: co-lo space and remote hands
  Overall advantage:
    Can control what goes on the expensive connectivity “back to home”
Value propositions
  Peering at a local IXP
    Reduces latency & transit costs for local traffic
    Improves Internet quality perception
  Participating at a Regional IXP
    A means of offsetting transit costs
    Managing connection back to home network
    Improving Internet Quality perception for customers
Summary
  Benefits of peering
    Private
    Internet Exchange Points
  Local versus Regional IXPs
    Local services local traffic
    Regional helps defray transit costs
Worked Example
  Single International Transit
  Versus
  Local IXP + Regional IXP + Transit
Worked Example
  ISP A is local access provider
    Some business customers (around 200 fixed links)
    Some co-located content provision (datacentre with 100 servers)
    Some consumers on broadband (5000 DSL/Cable/Wireless)
    Some consumers on dial (1000 on V.34 type speeds)
  They have a single transit provider
    Connect with a 16Mbps international leased link to their transit’s PoP
    Transit link is highly congested
Worked Example (2)
  There are two other ISPs serving the same locality
    There is no interconnection between any of the three ISPs
    Local traffic (between all 3 ISPs) is traversing International connections
  Course of action for our ISP:
    Work to establish local IXP
    Establish presence at overseas co-location
  First Step
    Assess local versus international traffic ratio
    Use NetFlow on border router connecting to transit provider
Worked Example (3)
  Local/Non-local traffic ratio
    Local = traffic going to other two ISPs
    Non-local = traffic going elsewhere
  Example: balance is 30:70
    Of 16Mbps, that means 5Mbps could stay in country and not congest International circuit
    16Mbps transit costs $50 per Mbps per month: traffic charges for the 5Mbps of local traffic = $250 per month, or $3000 per year
    Circuit costs $100k per year: $30k is spent on local traffic
    Total is $33k per year for local traffic
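The arithmetic on this slide can be reproduced in a few lines. The sketch below uses only the figures quoted above; rounding 30% of 16Mbps (4.8Mbps) up to 5Mbps follows the slide, and the variable names are mine.

```python
TRANSIT_MBPS = 16             # size of the international transit link
LOCAL_SHARE = 0.30            # local traffic share from the 30:70 balance
PRICE_PER_MBPS_MONTH = 50     # transit traffic charge, $ per Mbps per month
CIRCUIT_COST_YEAR = 100_000   # international circuit, $ per year

local_mbps = round(TRANSIT_MBPS * LOCAL_SHARE)        # 4.8 rounds to 5Mbps
traffic_month = local_mbps * PRICE_PER_MBPS_MONTH     # monthly charge for local traffic
traffic_year = traffic_month * 12
circuit_local_year = round(CIRCUIT_COST_YEAR * LOCAL_SHARE)
total_local_year = traffic_year + circuit_local_year

print(local_mbps, traffic_month, traffic_year, circuit_local_year, total_local_year)
```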
Worked Example (4)
  IXP cost:
    Simple 8 port 10/100 managed switch plus co-lo space over 3 years could be around US$30k total; or $3k per year per ISP
    One router to handle 5Mbps (e.g. 2801) would be around $3k (good for 3 years)
    One local 10Mbps circuit from ISP location to IXP location would be around $5k per year, no traffic charges
    Per ISP total: $9k
    Somewhat cheaper than $33k
    Business case for local peering is straightforward - $24k saving per annum
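The per-ISP total can be checked as follows. This sketch assumes the IXP switch and co-lo cost is split evenly between the three ISPs in the example and spread over the three years, which is how the slide's "$3k per year per ISP" appears to be derived; the slide rounds to the nearest thousand.

```python
ISPS = 3                        # the three ISPs in the worked example
YEARS = 3                       # equipment lifetime used on the slide

switch_and_colo_total = 30_000  # shared IXP switch plus co-lo space, 3 years
router_total = 3_000            # one router per ISP, good for 3 years
circuit_per_year = 5_000        # local circuit to the IXP, per ISP
transit_cost_of_local = 33_000  # yearly cost of local traffic over transit

ixp_share_per_year = switch_and_colo_total / YEARS / ISPS
router_per_year = router_total / YEARS
per_isp_total = ixp_share_per_year + router_per_year + circuit_per_year
saving = transit_cost_of_local - per_isp_total

# Rounded to thousands of dollars, as on the slide
print(round(per_isp_total / 1000), round(saving / 1000))
```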
Worked Example (5)
  After IXP establishment
    5Mbps removed from International link
    Leaving 5Mbps for more International traffic – and that fills the link within weeks of the local traffic being removed
  Next step is to assess transit charges and optimise costs
    ISP visits several major regional IXPs
    Assesses routes available
    Compares routes available with traffic generated by those routes from its NetFlow data
    Discovers that 30% of traffic would transfer to one IXP via peering
Worked Example (6)
  Costs:
    Router for Regional IXP (e.g. 2801) at $3k over three years
    Co-lo space at Regional IXP venue at $3k per year
    Best price for transit at the Regional IXP venue by competitive tender is $30 per Mbps per month, plus $1k port charge
    30% of traffic offloads to IXP, leaving 70% of 16Mbps to transit provider = $330 per month, or $5k per annum
    Total with this model is $9k per year, plus the cost of the circuit (still $100k)
    Compare this with paying $50 per Mbps per month to the transit provider = $10k per annum (plus cost of the circuit)
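These figures can be reconstructed approximately. The sketch below makes two assumptions that the slide does not state explicitly: the 70% of 16Mbps (11.2Mbps) is rounded to 11Mbps, and the $1k port charge is an annual charge added to the yearly transit bill. With those assumptions the results round to the slide's $5k, $9k and $10k.

```python
TRANSIT_MBPS = 16
OFFLOAD_SHARE = 0.30       # share of traffic moving to the regional IXP
NEW_PRICE = 30             # $ per Mbps per month after competitive tender
OLD_PRICE = 50             # $ per Mbps per month with the original provider
PORT_CHARGE_YEAR = 1_000   # assumed to be an annual charge
ROUTER_TOTAL = 3_000       # router cost spread over three years
COLO_YEAR = 3_000          # co-lo space per year

remaining_mbps = round(TRANSIT_MBPS * (1 - OFFLOAD_SHARE))  # 11.2 rounds to 11
transit_month = remaining_mbps * NEW_PRICE
transit_year = transit_month * 12 + PORT_CHARGE_YEAR
new_total_year = transit_year + ROUTER_TOTAL / 3 + COLO_YEAR
old_total_year = TRANSIT_MBPS * OLD_PRICE * 12

# Rounded to thousands of dollars, as on the slide
print(transit_month, round(transit_year / 1000),
      round(new_total_year / 1000), round(old_total_year / 1000))
```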
Worked Example (7)
  Result:
    ISP co-locates at Regional IXP
    Pays reduced transit charges to transit provider (competitive tender)
    Pays no charges for traffic across Regional IXP
  Bonuses:
    Rate limits on router at Regional IXP Co-lo
      Can prioritise congestion dependent on customer demands
    Install servers at Regional IXP co-lo facility
      Filters e-mail (spam and viruses) – relieves some capacity on link
      Caches content – relieves a little more capacity on link
Conclusion
  Within the original costs of having one international transit provider:
    ISP has turned up at the local IXP and offloaded local traffic for free
    ISP has turned up at a major regional IXP and offloaded traffic, avoiding paying transit charges to transit provider
    ISP has reduced remaining transit charges by competitive tender at the regional IXP co-location facility
  Caveat
    These numbers are typical of the Internet today
    As ever, your mileage may vary – but do the financial calculations first and in the context of potential technical advantages too
The Value of Peering
ISP Training Workshops
73
Introduction to OSPF
ISP Training Workshops
OSPF
  Open Shortest Path First
  Link state or SPF technology
  Developed by OSPF working group of IETF (RFC 1247)
  OSPFv2 standard described in RFC2328
  Designed for:
    TCP/IP environment
    Fast convergence
    Variable-length subnet masks
    Discontiguous subnets
    Incremental updates
    Route authentication
  Runs on IP, Protocol 89
Link State
  Topology Information is kept in a Database separate from the Routing Table
[Diagram: a router's topology database holding Q’s, Z’s and X’s link state, shown separately from its routing table of destinations A, B and C]
Link State Routing
  Neighbour discovery
  Constructing a Link State Packet (LSP)
  Distribute the LSP
    (Link State Announcement – LSA)
  Compute routes
  On network failure
    New LSPs flooded
    All routers recompute routing table
Low Bandwidth Utilisation
  Only changes propagated
  Uses multicast on multi-access broadcast networks
[Diagram: a link failure (X) causes R1 to flood an LSA]
Fast Convergence
  Detection Plus LSA/SPF
    Known as the Dijkstra Algorithm
[Diagram: R1, R2 and R3 between networks N1 and N2; a failure (X) on the primary path moves traffic to the alternate path]
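The SPF computation each router runs over its link state database is Dijkstra's shortest-path algorithm. The sketch below is a generic textbook version, not router code; the topology, link costs and the failed link are invented to mirror the idea of a primary and an alternate path.

```python
import heapq

def spf(graph, source):
    """Dijkstra's algorithm: lowest total cost from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return dist

# Hypothetical link state database: node -> {neighbour: link cost}
graph = {
    "R1": {"R2": 10, "R3": 1},
    "R2": {"R1": 10, "N1": 1},
    "R3": {"R1": 1, "N1": 1},
    "N1": {},
}

before = spf(graph, "R1")   # primary path to N1 runs via R3

# The R3-N1 link fails: a new LSA is flooded and every router reruns SPF
del graph["R3"]["N1"]
after = spf(graph, "R1")    # the alternate path via R2 is now used

print(before["N1"], after["N1"])
```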
Fast Convergence
  Finding a new route
    LSA flooded throughout area
    Acknowledgement based
    Topology database synchronised
    Each router derives routing table to destination network
[Diagram: R1 floods an LSA after the failure (X) of the link towards N1]
OSPF Areas
  Area is a group of contiguous hosts and networks
    Reduces routing traffic
  Per area topology database
    Invisible outside the area
  Backbone area
    MUST be contiguous
    All other areas must be connected to the backbone
[Diagram: Area 0 (backbone area, routers Ra to Rd) with Areas 1 to 4 (routers R1 to R8) attached]
Virtual Links between OSPF Areas
  Virtual Link is used when it is not possible to physically connect the area to the backbone
  ISPs avoid designs which require virtual links
    Increases complexity
    Decreases reliability and scalability
[Diagram: Area 0 (backbone area), Area 1 and Area 4, with routers R3 to R8 and Ra to Rd]
Classification of Routers
  Internal Router (IR)
  Area Border Router (ABR)
  Backbone Router (BR)
  Autonomous System Border Router (ASBR)
[Diagram: routers in Areas 0 to 3 labelled by role: IR, ABR/BR, IR/BR and ASBR (connecting to another AS)]
OSPF Route Types
  Intra-area Route
    all routes inside an area
  Inter-area Route
    routes advertised from one area to another by an Area Border Router
  External Route
    routes imported into OSPF from other protocol or static routes
[Diagram: routers in Areas 0 to 3 labelled IR, ABR/BR and ASBR (connecting to another AS)]
External Routes
  Prefixes which are redistributed into OSPF from other protocols
    Flooded unaltered throughout the AS
    Recommendation: Avoid redistribution!!
  OSPF supports two types of external metrics
    Type 1 external metrics
    Type 2 external metrics (Cisco IOS default)
[Diagram: R2 redistributes RIP, EIGRP, BGP, static, connected, etc. into OSPF]
External Routes
  Type 1 external metric: metrics are added to the summarised internal link cost
[Diagram: R1 reaches N1 via R2 (cost 10, external cost 1) or via R3 (cost 8, external cost 2)]
  Network  Type 1  Next Hop
  N1       11      R2
  N1       10      R3
  Selected route: via R3
External Routes
  Type 2 external metric: metrics are compared without adding to the internal link cost
[Diagram: R1 reaches N1 via R2 (cost 10, external cost 1) or via R3 (cost 8, external cost 2)]
  Network  Type 2  Next Hop
  N1       1       R2
  N1       2       R3
  Selected route: via R2
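The difference between the two external metric types can be shown with the costs from the diagrams. This is an illustrative sketch of the comparison only; the class and function names are mine, and using the internal cost as a tie-breaker for Type 2 reflects my reading of RFC 2328 rather than anything on the slide.

```python
from typing import NamedTuple

class ExternalPath(NamedTuple):
    next_hop: str
    internal_cost: int   # cost inside the OSPF domain to reach the ASBR
    external_cost: int   # metric carried in the external LSA

# The two paths to N1 from the slides
PATHS = [
    ExternalPath("R2", internal_cost=10, external_cost=1),
    ExternalPath("R3", internal_cost=8, external_cost=2),
]

def select_type1(paths):
    """Type 1: internal and external costs are added together."""
    return min(paths, key=lambda p: p.internal_cost + p.external_cost)

def select_type2(paths):
    """Type 2: only the external cost is compared; internal cost breaks ties."""
    return min(paths, key=lambda p: (p.external_cost, p.internal_cost))

print(select_type1(PATHS).next_hop)  # total 10 via R3 beats total 11 via R2
print(select_type2(PATHS).next_hop)  # external 1 via R2 beats external 2 via R3
```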
Topology/Link State Database
  A router has a separate LS database for each area to which it belongs
  All routers belonging to the same area have identical database
  SPF calculation is performed separately for each area
  LSA flooding is bounded by area
  Recommendation:
    Limit the number of areas a router participates in!!
    1 to 3 is fine (typical ISP design)
    >3 can overload the CPU depending on the area topology complexity
The Hello Protocol
  Responsible for establishing and maintaining neighbour relationships
  Elects designated router on multi-access networks
[Diagram: routers on a shared network exchanging Hello packets]
The Hello Packet
  Contains:
    Router priority
    Hello interval
    Router dead interval
    Network mask
    List of neighbours
    DR and BDR
    Options: E-bit, MC-bit,… (see A.2 of RFC2328)
Designated Router
  There is ONE designated router per multi-access network
    Generates network link advertisements
    Assists in database synchronization
[Diagram: two multi-access networks, each with its own Designated Router and Backup Designated Router]
Designated Router by Priority
  Configured priority (per interface)
    ISPs configure high priority on the routers they want as DR/BDR
  Else determined by highest router ID
    Router ID is 32 bit integer
    Derived from the loopback interface address, if configured, otherwise the highest IP address
[Diagram: R1 (Router ID = 144.254.3.5) and R2 (Router ID = 131.108.3.3) share a network using addresses 131.108.3.2 and 131.108.3.3; a DR is elected between them]
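The selection rule can be sketched as a comparison on priority first and router ID second. This is a simplification for illustration: it ignores the BDR and the fact that a real election does not pre-empt an existing DR. The function name and the priority values are assumptions; only the two router IDs come from the slide.

```python
import ipaddress

def elect_dr(routers):
    """Pick the DR from (router_id, priority) pairs.

    Highest priority wins; a tie is broken by the numerically highest
    router ID. Routers with priority 0 are not eligible.
    """
    eligible = [(rid, prio) for rid, prio in routers if prio > 0]
    best = max(eligible, key=lambda r: (r[1], int(ipaddress.ip_address(r[0]))))
    return best[0]

# Equal (assumed) priorities: the higher router ID decides
same_priority = [("144.254.3.5", 1), ("131.108.3.3", 1)]
print(elect_dr(same_priority))

# A higher configured priority overrides the router ID comparison
raised_priority = [("144.254.3.5", 1), ("131.108.3.3", 100)]
print(elect_dr(raised_priority))
```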
Neighbouring States
  Full
    Routers are fully adjacent
    Databases synchronised
    Relationship to DR and BDR
[Diagram: routers in Full state with the DR and BDR]
Neighbouring States
  2-way
    Router sees itself in other Hello packets
    DR selected from neighbours in state 2-way or greater
[Diagram: routers in 2-way state with each other, alongside the DR and BDR]
When to Become Adjacent
  Underlying network is point to point
  Underlying network type is virtual link
  The router itself is the designated router or the backup designated router
  The neighbouring router is the designated router or the backup designated router
LSAs Propagate Along Adjacencies
  LSAs acknowledged along adjacencies
[Diagram: LSAs flowing between routers and the DR and BDR]
Broadcast Networks
  IP Multicast used for Sending and Receiving Updates
    All routers must accept packets sent to AllSPFRouters (224.0.0.5)
    All DR and BDR routers must accept packets sent to AllDRouters (224.0.0.6)
  Hello packets sent to AllSPFRouters (Unicast on point-to-point and virtual links)
Routing Protocol Packets
  Share a common protocol header
  Routing protocol packets are sent with type of service (TOS) of 0
  Five types of OSPF routing protocol packets
    Hello – packet type 1
    Database description – packet type 2
    Link-state request – packet type 3
    Link-state update – packet type 4
    Link-state acknowledgement – packet type 5
Different Types of LSAs
  Six distinct types of LSAs
    Type 1: Router LSA
    Type 2: Network LSA
    Type 3 & 4: Summary LSA
    Type 5 & 7: External LSA (Type 7 is for NSSA)
    Type 6: Group membership LSA
    Type 9, 10 & 11: Opaque LSA (9: Link-Local, 10: Area)
Router LSA (Type 1)
  Describes the state and cost of the router’s links to the area
  All of the router’s links in an area must be described in a single LSA
  Flooded throughout the particular area and no more
  Router indicates whether it is an ASBR, ABR, or end point of virtual link
Network LSA (Type 2)
  Generated for every transit broadcast and NBMA network
  Describes all the routers attached to the network
  Only the designated router originates this LSA
  Flooded throughout the area and no more
Summary LSA (Type 3 and 4)
  Describes the destination outside the area but still in the AS
  Flooded throughout a single area
  Originated by an ABR
  Only inter-area routes are advertised into the backbone
  Type 4 is the information about the ASBR
External LSA (Type 5 and 7)
  Defines routes to destination external to the AS
  Default route is also sent as external
  Two types of external LSA:
    E1: Considers the total cost up to the external destination
    E2: Considers only the cost of the outgoing interface to the external destination
  (Type 7 LSAs used to describe external LSA for one specific OSPF area type)
Inter-Area Route Summarisation
  Prefix or all subnets
  Prefix or all networks
  ‘Area range’ command
[Diagram: networks 1.A, 1.B and 1.C in Area 1, joined to Backbone Area 0 by an ABR; routers R1 and R2]
  With summarisation
    Network  Next Hop
    1        R1
  Without summarisation
    Network  Next Hop
    1.A      R1
    1.B      R1
    1.C      R1
No Summarisation
  Specific Link LSA advertised out of each area
  Link state changes propagated out of each area
[Diagram: areas 1, 2 and 3, each with four networks (1.A to 1.D, 2.A to 2.D, 3.A to 3.D); every specific prefix is advertised into Area 0]
With Summarisation
  Only summary LSA advertised out of each area
  Link state changes do not propagate out of the area
[Diagram: the same three areas; only the summaries 1, 2 and 3 are advertised into Area 0]
No Summarisation
  Specific Link LSA advertised in to each area
  Link state changes propagated in to each area
[Diagram: each area receives every specific prefix of the other two areas from Area 0]
With Summarisation
  Only summary link LSA advertised in to each area
  Link state changes do not propagate in to each area
[Diagram: each area receives only the summaries of the other two areas from Area 0]
Types of Areas
  Regular
  Stub
  Totally Stubby
  Not-So-Stubby
  Only “regular” areas are useful for ISPs
    Other area types handle redistribution of other routing protocols into OSPF – ISPs don’t redistribute anything into OSPF
  The next slides describing the different area types are provided for information only
Regular Area (Not a Stub)
  From Area 1’s point of view, summary networks from other areas are injected, as are external networks such as X.1
[Diagram: areas 1, 2 and 3 around Area 0; each area receives the summaries of the other areas and the external network X.1 from the ASBR]
Normal Stub Area
  Summary networks, default route injected
  Command is area x stub
[Diagram: areas 1, 2 and 3 around Area 0; stub areas receive the summaries of the other areas plus a default route in place of the external network X.1]
Totally Stubby Area
  Only a default route injected
    Default path to closest area border router
  Command is area x stub no-summary
[Diagram: areas 1, 2 and 3 around Area 0; the totally stubby area receives only a default route, with neither summaries nor the external network X.1]
Not-So-Stubby Area
  Capable of importing routes in a limited fashion
  Type-7 LSAs carry external information within an NSSA
  NSSA Border routers translate selected type-7 LSAs into type-5 external network LSAs
[Diagram: areas 1, 2 and 3 around Area 0; the not-so-stubby area imports the external network X.2, which is passed on to the rest of the network alongside X.1 from the ASBR]
ISP Use of Areas
  ISP networks use:
    Backbone area
    Regular area
  Backbone area
    No partitioning
  Regular area
    Summarisation of point to point link addresses used within areas
    Loopback addresses allowed out of regular areas without summarisation (otherwise iBGP won’t work)
Addressing for Areas
  Assign contiguous ranges of subnets per area to facilitate summarisation
    Area 0: network 192.168.1.0, range 255.255.255.192
    Area 1: network 192.168.1.64, range 255.255.255.192
    Area 2: network 192.168.1.128, range 255.255.255.192
    Area 3: network 192.168.1.192, range 255.255.255.192
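The per-area ranges can be checked with Python's standard ipaddress module. This sketch confirms that each network and range pair is a /26, that the four blocks do not overlap, and that a link subnet falls under exactly one area summary. The /30 link address is a made-up example.

```python
import ipaddress
from itertools import combinations

MASK = "255.255.255.192"
AREA_NETWORKS = {
    0: "192.168.1.0",
    1: "192.168.1.64",
    2: "192.168.1.128",
    3: "192.168.1.192",
}

# Build one summary prefix per area from the network and range values
ranges = {
    area: ipaddress.ip_network(f"{net}/{MASK}")
    for area, net in AREA_NETWORKS.items()
}
for area, prefix in ranges.items():
    print(f"Area {area}: {prefix}")

# Contiguous, non-overlapping blocks are what make summarisation possible
no_overlap = all(not a.overlaps(b) for a, b in combinations(ranges.values(), 2))

# Hypothetical point-to-point link subnet: which area summary covers it?
link = ipaddress.ip_network("192.168.1.132/30")
owners = [area for area, prefix in ranges.items() if link.subnet_of(prefix)]
print(no_overlap, owners)
```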
Summary
  Fundamentals of Scalable OSPF Network Design
    Area hierarchy
    DR/BDR selection
    Contiguous intra-area addressing
    Route summarisation
    Infrastructure prefixes only
Deploying OSPF for ISPs
ISP Training Workshops
Agenda
OSPF Design in SP Networks
Adding Networks in OSPF
OSPF in Cisco’s IOS
OSPF Design
As applicable to Service Provider Networks
120
Service Providers
SP networks are divided into PoPs
PoPs are linked by the backbone
Transit routing information is carried via iBGP
IGP is only used to carry the next hop for BGP
  Optimal path to the next hop is critical
SP Architecture
Major routing information is ~430K prefixes via BGP
Largest known IGP routing table is ~9–10K
Total of 440K
  IGP routes are 10K out of 440K, about 2.3% of the routes in an ISP network
  A very small factor but has a huge impact on network convergence!
[Figure: IP backbone (Area 0/L2, BGP 1) connecting six PoPs, each in its own area (Areas 1 to 6/L1, BGP 1)]
SP Architecture
You can reduce the IGP size from 10K to approximately the number of routers in your network
This will bring really fast convergence
Optimise where you must and summarise where you can
Stops unnecessary flapping
[Figure: hierarchy of RR, regional core and access routers, with customers attached at the access layer; the IGP spans the infrastructure]
OSPF Design: Addressing
OSPF design and addressing go together
  Objective is to keep the Link State Database lean
  Create an address hierarchy to match the topology
  Use separate address blocks for loopbacks, network infrastructure, customer interfaces & customers
[Figure: address space split into customer address space and infrastructure (point-to-point links and loopbacks)]
OSPF Design: Addressing
Minimising the number of prefixes in OSPF:
  Number loopbacks out of a contiguous address block
    But do not summarise these across area boundaries: iBGP peer addresses need to be in the IGP
  Use contiguous address blocks per area for infrastructure point-to-point links
    Use area range command on ABR to summarise
With these guidelines:
  Number of prefixes in area 0 will then be very close to the number of routers in the network
  It is critically important that the number of prefixes and LSAs in area 0 is kept to the absolute minimum
OSPF Design: Areas
Examine physical topology
  Is it meshed or hub-and-spoke?
Use areas and summarisation
  This reduces overhead and LSA counts (but watch next-hop for iBGP when summarising)
Don’t bother with the various stub areas
  No benefits for ISPs, causes problems for iBGP
Push the creation of a backbone
  Reduces mesh and promotes hierarchy
OSPF Design: Areas
One SPF per area, flooding done per area
  Watch out for overloading ABRs
Avoid externals in OSPF
  DO NOT REDISTRIBUTE into OSPF
  External LSAs flood through entire network
Different types of areas do different flooding
  Normal areas
  Stub areas
  Totally stubby (stub no-summary)
  Not so stubby areas (NSSA)
OSPF Design: Areas
Area 0 must be contiguous
  Do NOT use virtual links to join two Area 0 islands
Traffic between two non-zero areas always goes via Area 0
  There is no benefit in joining two non-zero areas together
  Avoid designs which have two non-zero areas touching each other
  (Typical design is an area per PoP, with core routers being ABR to the backbone area 0)
OSPF Design: Summary
Think redundancy
  Dual links out of each area, using metrics (cost) for traffic engineering
Too much redundancy…
  Dual links to backbone in stub areas must be the same cost, otherwise sub-optimal routing will result
  Too much redundancy in the backbone area without good summarisation will affect convergence in Area 0
OSPF Areas: Migration
Where to place OSPF areas?
  Follow the physical topology!
  Remember the earlier design advice
Configure one area at a time!
  Start at the outermost edge of the network
  Log into routers at either end of a link and change the link from Area 0 to the chosen area
  Wait for OSPF to re-establish adjacencies
  And then move on to the next link, etc.
  Important to ensure that there is never an Area 0 island anywhere in the migrating network
OSPF Areas: Migration
Migrate small parts of the network, one area at a time
  Remember to introduce summarisation where feasible
With careful planning, the migration can be done with minimal network downtime
[Figure: routers A to G, split between Area 0 and Area 10]
OSPF for Service Providers
Configuring OSPF & Adding Networks
132
OSPF: Configuration
Starting OSPF in Cisco’s IOS
  router ospf 100
  Where “100” is the process ID
OSPF process ID is unique to the router
  Gives possibility of running multiple instances of OSPF on one router
  Process ID is not passed between routers in an AS
  Many ISPs configure the process ID to be the same as their BGP Autonomous System Number
OSPF: Establishing Adjacencies
Cisco IOS OSPFv2 automatically tries to establish adjacencies on all defined interfaces (or subnets)
Best practice is to disable this
  Potential security risk: sending OSPF Hellos outside of the autonomous system, and risking forming adjacencies with external networks
  Example: only the POS4/0 interface will attempt to form an OSPF adjacency
router ospf 100
passive-interface default
no passive-interface POS4/0
134
OSPF: Adding Networks
Option One
Redistribution:
  Applies to all connected interfaces on the router but sends networks as external type-2s, which are not summarised
router ospf 100
 redistribute connected subnets
Do NOT do this! Because:
  The resulting external (type-5) LSAs flood through the entire network
  These LSAs are not all useful for determining paths through the backbone; they simply take up valuable space
OSPF: Adding Networks
Option Two
Per link configuration (from IOS 12.4 onwards)
  OSPF is configured on each interface (same as ISIS)
  Useful for multiple subnets per interface
interface POS 4/0
ip address 192.168.1.1 255.255.255.0
ip address 172.16.1.1 255.255.255.224 secondary
ip ospf 100 area 0
!
router ospf 100
passive-interface default
no passive-interface POS 4/0
136
OSPF: Adding Networks
Option Three
Specific network statements
  Every active interface with a configured IP address needs an OSPF network statement
  Interfaces that will have no OSPF neighbours need passive-interface to disable OSPF Hellos
  That is: all interfaces connecting to devices outside the ISP backbone (e.g. customers, peers, etc.)
router ospf 100
network 192.168.1.0 0.0.0.3 area 51
network 192.168.1.4 0.0.0.3 area 51
passive-interface Serial 1/0
137
OSPF: Adding Networks
Option Four
Network statements with a wildcard mask
  Every active interface with a configured IP address covered by the wildcard mask used in the OSPF network statement
  Interfaces covered by the wildcard mask but having no OSPF neighbours need passive-interface (or use passive-interface default and then activate the interfaces which will have OSPF neighbours)
router ospf 100
 network 192.168.1.0 0.0.0.255 area 51
 passive-interface default
 no passive-interface POS 4/0
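To see which interface addresses a given network statement picks up, the wildcard comparison can be sketched as follows. This is an illustration only: the function name covered is hypothetical, and the logic simply ignores every bit that is set in the wildcard mask and compares the rest.

```python
import ipaddress


def covered(interface_ip: str, network: str, wildcard: str) -> bool:
    """True if interface_ip is selected by 'network <network> <wildcard>'.

    Bits set to 1 in the wildcard mask are ignored; all other bits of the
    interface address must equal the corresponding bits of the network.
    """
    ip = int(ipaddress.IPv4Address(interface_ip))
    net = int(ipaddress.IPv4Address(network))
    wc = int(ipaddress.IPv4Address(wildcard))
    care = ~wc & 0xFFFFFFFF  # bits that must match
    return (ip & care) == (net & care)


# Option Three style: one statement per /30 link.
print(covered("192.168.1.1", "192.168.1.0", "0.0.0.3"))
print(covered("192.168.1.5", "192.168.1.0", "0.0.0.3"))
# Option Four style: one statement covering the whole /24.
print(covered("192.168.1.5", "192.168.1.0", "0.0.0.255"))
```

The wider the wildcard, the more interfaces are enabled for OSPF by a single statement, which is why Option Four is paired with passive-interface default.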
138
OSPF: Adding Networks
Recommendations
Don’t ever use Option 1
Use Option 2 if supported; otherwise:
Option 3 is fine for core/infrastructure routers
  Doesn’t scale too well when router has a large number of interfaces but only a few with OSPF neighbours
  Solution is to use Option 3 with “no passive” on interfaces with OSPF neighbours
Option 4 is preferred for aggregation routers
  Or use iBGP next-hop-self
  Or even ip unnumbered on external point-to-point links
OSPF: Adding Networks
Example One (Cisco IOS ≥ 12.4)
Aggregation router with large number of leased line customers and just two links to the core network:
interface loopback 0
 ip address 192.168.255.1 255.255.255.255
 ip ospf 100 area 0
interface POS 0/0
 ip address 192.168.10.1 255.255.255.252
 ip ospf 100 area 0
interface POS 1/0
 ip address 192.168.10.5 255.255.255.252
 ip ospf 100 area 0
interface serial 2/0:0 ...
 ip unnumbered loopback 0
! Customers connect here ^^^^^^^
router ospf 100
 passive-interface default
 no passive-interface POS 0/0
 no passive-interface POS 1/0
OSPF: Adding Networks
Example One (Cisco IOS < 12.4)
Aggregation router with large number of leased line customers and just two links to the core network:
interface loopback 0
 ip address 192.168.255.1 255.255.255.255
interface POS 0/0
 ip address 192.168.10.1 255.255.255.252
interface POS 1/0
 ip address 192.168.10.5 255.255.255.252
interface serial 2/0:0 ...
 ip unnumbered loopback 0
! Customers connect here ^^^^^^^
router ospf 100
 network 192.168.255.1 0.0.0.0 area 51
 network 192.168.10.0 0.0.0.3 area 51
 network 192.168.10.4 0.0.0.3 area 51
 passive-interface default
 no passive-interface POS 0/0
 no passive-interface POS 1/0
OSPF: Adding Networks
Example Two (Cisco IOS ≥ 12.4)
Core router with only links to other core routers:
interface loopback 0
 ip address 192.168.255.1 255.255.255.255
 ip ospf 100 area 0
interface POS 0/0
 ip address 192.168.10.129 255.255.255.252
 ip ospf 100 area 0
interface POS 1/0
 ip address 192.168.10.133 255.255.255.252
 ip ospf 100 area 0
interface POS 2/0
 ip address 192.168.10.137 255.255.255.252
 ip ospf 100 area 0
interface POS 2/1
 ip address 192.168.10.141 255.255.255.252
 ip ospf 100 area 0
router ospf 100
 passive-interface loopback 0
OSPF: Adding Networks
Example Two (Cisco IOS < 12.4)
Core router with only links to other core routers:
interface loopback 0
 ip address 192.168.255.1 255.255.255.255
interface POS 0/0
 ip address 192.168.10.129 255.255.255.252
interface POS 1/0
 ip address 192.168.10.133 255.255.255.252
interface POS 2/0
 ip address 192.168.10.137 255.255.255.252
interface POS 2/1
 ip address 192.168.10.141 255.255.255.252
router ospf 100
 network 192.168.255.1 0.0.0.0 area 0
 network 192.168.10.128 0.0.0.3 area 0
 network 192.168.10.132 0.0.0.3 area 0
 network 192.168.10.136 0.0.0.3 area 0
 network 192.168.10.140 0.0.0.3 area 0
 passive-interface loopback 0
OSPF: Adding Networks
Summary
Key theme when selecting a technique: keep the Link State Database lean
  Increases stability
  Reduces the amount of information in the Link State Advertisements (LSAs)
  Speeds convergence time
OSPF in Cisco IOS
Useful features for ISPs
145
Areas
An area is stored as a 32-bit field:
  Defined in IPv4 address format (e.g. Area 0.0.0.0)
  Can also be defined using single decimal value (e.g. Area 0)
0.0.0.0 reserved for the backbone area
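Since the area ID is just a 32-bit number, the two notations are interchangeable. A small sketch of the conversion, using hypothetical helper names and the standard ipaddress module:

```python
import ipaddress


def area_dotted(area_id: int) -> str:
    """Render a decimal area ID in IPv4 address format."""
    return str(ipaddress.IPv4Address(area_id))


def area_decimal(dotted: str) -> int:
    """Render a dotted area ID as a single decimal value."""
    return int(ipaddress.IPv4Address(dotted))


for area in (0, 1, 51):
    print(area, "<->", area_dotted(area))
```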
Logging Adjacency Changes
The router will generate a log message whenever an OSPF neighbour changes state
Syntax:
  [no] [ospf] log-adjacency-changes
  (OSPF keyword is optional, depending on IOS version)
Example of a typical log message:
  %OSPF-5-ADJCHG: Process 1, Nbr 223.127.255.223 on Ethernet0 from LOADING to FULL, Loading Done
Number of State Changes
The number of state transitions is available via SNMP (ospfNbrEvents) and the CLI:
  show ip ospf neighbor [type number] [neighbor-id] [detail]
  detail (optional): displays all neighbours in detail. When specified, neighbour state transition counters are displayed per interface or neighbour ID
State Changes (Continued)
To reset OSPF-related statistics, use the clear ip ospf counters command
  This will reset neighbour state transition counters per interface or neighbour ID
  clear ip ospf counters [neighbor [<type number>] [neighbor-id]]
Router ID
If the loopback interface exists and has an IP address, that is used as the router ID in routing protocols (stability!)
If the loopback interface does not exist, or has no IP address, the router ID is the highest IP address configured (danger!)
OSPF sub-command to manually set the router ID:
  router-id <ip address>
Cost & Reference Bandwidth
Bandwidth used in metric calculation
  Cost = 10^8 / bandwidth
  Not useful for interface bandwidths > 100 Mbps
Syntax:
  ospf auto-cost reference-bandwidth <reference-bw>
Default reference bandwidth still 100 Mbps for backward compatibility
Most ISPs simply choose to develop their own cost strategy and apply to each interface type
Cost: Example Strategy
  100GE          100 Gbps    cost = 1
  40GE/OC768     40 Gbps     cost = 2
  10GE/OC192     10 Gbps     cost = 5
  OC48           2.5 Gbps    cost = 10
  GigEthernet    1 Gbps      cost = 20
  OC12           622 Mbps    cost = 50
  OC3            155 Mbps    cost = 100
  FastEthernet   100 Mbps    cost = 200
  Ethernet       10 Mbps     cost = 500
  E1             2 Mbps      cost = 1000
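The limitation of the default reference bandwidth is easy to demonstrate. The sketch below assumes the cost is the reference bandwidth divided by the interface bandwidth, truncated to an integer with a floor of 1; the function name ospf_cost and the nominal link speeds are illustrative, not taken from any router.

```python
def ospf_cost(bandwidth_bps: int, reference_bps: int = 100_000_000) -> int:
    """Interface cost = reference bandwidth / interface bandwidth.

    Assumed behaviour: the result is truncated to an integer and never
    drops below 1.
    """
    return max(1, reference_bps // bandwidth_bps)


# Nominal link speeds in bits per second (illustrative values).
LINKS = {
    "E1": 2_000_000,
    "Ethernet": 10_000_000,
    "FastEthernet": 100_000_000,
    "GigEthernet": 1_000_000_000,
    "10GE": 10_000_000_000,
}

for name, bps in LINKS.items():
    default = ospf_cost(bps)
    raised = ospf_cost(bps, reference_bps=100_000_000_000)
    print(f"{name:13} default={default:<4} reference 100G={raised}")
```

With the default reference, every link at or above 100 Mbps collapses to the same cost, which is why a raised reference bandwidth or a hand-assigned cost table is needed on modern backbones.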
152
Default routes
Originating a default route into OSPF
  default-information originate metric <n>
  Will originate a default route into OSPF if there is a matching default route in the Routing Table (RIB)
  The optional always keyword will always originate a default route, even if there is no existing entry in the RIB
Clear/Restart
OSPF clear commands
  If no process ID is given, all OSPF processes on the router are assumed
clear ip ospf [pid] redistribution
  This command clears redistribution based on OSPF routing process ID
clear ip ospf [pid] counters
  This command clears counters based on OSPF routing process ID
clear ip ospf [pid] process
  This command will restart the specified OSPF process. It attempts to keep the old router-id, except in cases where a new router-id was configured or an old user-configured router-id was removed. Since this command can potentially cause network churn, user confirmation is required before performing any action
Use OSPF Authentication
Use authentication
  Too many operators overlook this basic requirement
When using authentication, use the MD5 feature
  Under the global OSPF configuration, specify:
    area <area-id> authentication message-digest
  Under the interface configuration, specify:
    ip ospf message-digest-key 1 md5 <key>
Authentication can be selectively disabled per interface with:
  ip ospf authentication null
Point to Point Ethernet Links
For any broadcast media (like Ethernet), OSPF will attempt to elect a designated and backup designated router when it forms an adjacency
  If the interface is running as a point-to-point WAN link, with only 2 routers on the wire, configuring OSPF to operate in "point-to-point mode" scales the protocol by reducing the link failure detection times
  Point-to-point mode improves convergence times on Ethernet networks because it:
    Prevents the election of a DR/BDR on the link
    Simplifies the SPF computations and reduces the router's memory footprint due to a smaller topology database
interface fastethernet0/2
 ip ospf network point-to-point
Tuning OSPF (1)
DR/BDR Selection
  ip ospf priority 100 (default 1)
  This feature should be in use in your OSPF network
  Forcibly set your DR and BDR per segment so that they are known
  Choose your most powerful, or most idle routers, so that OSPF converges as fast as possible under maximum network load conditions
  Try to keep the DR/BDR limited to one segment each
Tuning OSPF (2)
OSPF startup
  max-metric router-lsa on-startup wait-for-bgp
  Avoids blackholing traffic on router restart
  Causes OSPF to announce its prefixes with highest possible metric until iBGP is up and running
  When iBGP is running, OSPF metrics return to normal, making the path valid
ISIS equivalent:
  set-overload-bit on-startup wait-for-bgp
Tuning OSPF (3)
Hello/Dead Timers
  ip ospf hello-interval 3 (default 10)
  ip ospf dead-interval 15 (default is 4x hello)
  This allows for faster network awareness of a failure, and can result in faster reconvergence, but requires more router CPU and generates more overhead
LSA Pacing
  timers lsa-group-pacing 300 (default 240)
  Allows grouping and pacing of LSA updates at configured interval
  Reduces overall network and router impact
Tuning OSPF (4)
OSPF Internal Timers
  timers spf 2 8 (default is 5 and 10)
  Allows you to adjust SPF characteristics
  The first number sets wait time from topology change to SPF run
  The second is hold-down between SPF runs
  BE CAREFUL WITH THIS COMMAND; if you’re not sure when to use it, it means you don’t need it; default is sufficient 95% of the time
Tuning OSPF (5)
LSA filtering/interface blocking
  Per interface:
    ip ospf database-filter all out (no options)
  Per neighbor:
    neighbor 1.1.1.1 database-filter all out (no options)
  An OSPF router will flood an LSA out of all interfaces except the receiving one; LSA filtering can be useful in cases where such flooding is unnecessary (e.g. NBMA networks), where the DR/BDR can handle flooding chores
  area <area-id> filter-list <acl>
    Filters out specific Type 3 LSAs at ABRs
Improper use can result in routing loops and black-holes that can be very difficult to troubleshoot
Summary
OSPF has a bewildering number of features and options
Observe ISP best practices
Keep design and configuration simple
Investigate tuning options and suitability for your own network
  Don’t just turn them on!
Introduction to BGP
ISP Training Workshops
Border Gateway Protocol
A routing protocol used to exchange routing information between different networks
  Exterior gateway protocol
Described in RFC4271
  RFC4276 gives an implementation report on BGP
  RFC4277 describes operational experiences using BGP
The Autonomous System is the cornerstone of BGP
  It is used to uniquely identify networks with a common routing policy
BGP
Path Vector Protocol
Incremental Updates
Many options for policy enforcement
Classless Inter Domain Routing (CIDR)
Widely used for Internet backbone
Autonomous systems
Path Vector Protocol
BGP is classified as a path vector routing protocol (see RFC 1322)
  A path vector protocol defines a route as a pairing between a destination and the attributes of the path to that destination.
  12.6.126.0/24 207.126.96.43 1021 0 6461 7018 6337 11268 i
  AS Path: 6461 7018 6337 11268
Path Vector Protocol
[Figure: AS-level topology showing AS6461, AS7018, AS6337, AS11268, AS500 and AS600]
Definitions
Transit: carrying traffic across a network, usually for a fee
Peering: exchanging routing information and traffic
Default: where to send traffic when there is no explicit match in the routing table
Default Free Zone
The default free zone is made up of Internet routers which have explicit routing information about the rest of the Internet, and therefore do not need to use a default route
NB: it is not related to where an ISP is in the hierarchy
Peering and Transit example
[Figure: providers A, B and C and backbone provider D, with exchange points IXP-West and IXP-East]
A and B can peer, but need transit arrangements with D to get packets to/from C
Autonomous System (AS)
[Figure: AS 100]
Collection of networks with same routing policy
Single routing protocol
Usually under single ownership, trust and administrative control
Identified by a unique 32-bit integer (ASN)
Autonomous System Number (ASN)
Two ranges
  0-65535 (original 16-bit range)
  65536-4294967295 (32-bit range – RFC4893)
Usage:
  0 and 65535 (reserved)
  1-64495 (public Internet)
  64496-64511 (documentation – RFC5398)
  64512-65534 (private use only)
  23456 (represent 32-bit range in 16-bit world)
  65536-65551 (documentation – RFC5398)
  65552-4294967295 (public Internet)
32-bit range representation specified in RFC5396
  Defines “asplain” (traditional format) as standard notation
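The usage list above maps directly onto a lookup. The sketch below follows the ranges exactly as listed on the slide; the function name classify_asn and the category labels are hypothetical.

```python
def classify_asn(asn: int) -> str:
    """Classify an ASN using the ranges listed on the slide."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("ASN must fit in 32 bits")
    if asn in (0, 65535):
        return "reserved"
    if asn == 23456:
        # Stands in for a 32-bit ASN when talking to 16-bit-only BGP.
        return "as-trans"
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534:
        return "private"
    return "public"


for asn in (100, 23456, 64500, 65000, 65540, 131076):
    print(asn, classify_asn(asn))
```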
Autonomous System Number (ASN)
ASNs are distributed by the Regional Internet Registries
  They are also available from upstream ISPs who are members of one of the RIRs
Current 16-bit ASN allocations up to 61439 have been made to the RIRs
  Around 42000 are visible on the Internet
Each RIR has also received a block of 32-bit ASNs
  Out of 3100 assignments, around 2800 are visible on the Internet
See www.iana.org/assignments/as-numbers
Configuring BGP in Cisco IOS
This command enables BGP in Cisco IOS:
  router bgp 100
For ASNs > 65535, the AS number can be entered in either plain or dot notation:
  router bgp 131076
or
  router bgp 2.4
IOS will display ASNs in plain notation by default
  Dot notation is optional:
  router bgp 2.4
   bgp asnotation dot
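The two notations describe the same 32-bit number: dot notation writes the high and low 16 bits separately. A minimal sketch of the conversion, with hypothetical function names, showing why 131076 and 2.4 are the same AS:

```python
def asplain_to_asdot(asn: int) -> str:
    """Split a 32-bit ASN into <high 16 bits>.<low 16 bits>.

    ASNs that fit in 16 bits are left as a plain number.
    """
    high, low = divmod(asn, 65536)
    return f"{high}.{low}" if high else str(low)


def asdot_to_asplain(text: str) -> int:
    """Convert dot notation (or a plain number) to a single integer."""
    if "." not in text:
        return int(text)
    high, low = (int(part) for part in text.split("."))
    return high * 65536 + low


print(asplain_to_asdot(131076))
print(asdot_to_asplain("2.4"))
```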
BGP Basics
[Figure: routers A to E in AS 100, AS 101 and AS 102, with peering between the ASes]
Runs over TCP – port 179
Path vector protocol
Incremental updates
“Internal” & “External” BGP
Demarcation Zone (DMZ)
[Figure: routers A to E in AS 100, AS 101 and AS 102, with the DMZ network between them]
DMZ is the link or network shared between ASes
BGP General Operation
Learns multiple paths via internal and external BGP speakers
Picks the best path and installs it in the routing table (RIB)
Best path is sent to external BGP neighbours
Policies are applied by influencing the best path selection
Constructing the Forwarding Table
BGP “in” process
  receives path information from peers
  results of BGP path selection placed in the BGP table
  “best path” flagged
BGP “out” process
  announces “best path” information to peers
Best path stored in Routing Table (RIB)
Best paths in the RIB are installed in forwarding table (FIB) if:
  prefix and prefix length are unique
  lowest “protocol distance”
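Constructing the Forwarding Table
[Figure: paths from BGP peers enter the BGP in process, where they are accepted or discarded; accepted paths go into the BGP table; best paths pass to the routing table and, through the BGP out process, back to peers; the routing table feeds the forwarding table]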
eBGP & iBGP
BGP used internally (iBGP) and externally (eBGP)
iBGP used to carry
  Some/all Internet prefixes across ISP backbone
  ISP’s customer prefixes
eBGP used to
  Exchange prefixes with other ASes
  Implement routing policy
BGP/IGP model used in ISP networks
Model representation
[Figure: four ASes (AS1 to AS4), each running an IGP and iBGP internally, connected to their neighbours by eBGP]
External BGP Peering (eBGP)
[Figure: AS 100 and AS 101, with routers A, B and C]
Between BGP speakers in different AS
Should be directly connected
Never run an IGP between eBGP peers
Configuring External BGP
Router A in AS100
interface ethernet 5/0
 ip address 102.102.10.2 255.255.255.240
!
router bgp 100
 network 100.100.8.0 mask 255.255.252.0
 neighbor 102.102.10.1 remote-as 101
 neighbor 102.102.10.1 prefix-list RouterC in
 neighbor 102.102.10.1 prefix-list RouterC out
!
Callouts:
  102.102.10.2: ip address on ethernet interface
  102.102.10.1: ip address of Router C ethernet interface
  router bgp 100: local ASN
  remote-as 101: remote ASN
  prefix-list RouterC in/out: inbound and outbound filters
Configuring External BGP
Router C in AS101
interface ethernet 1/0/0
 ip address 102.102.10.1 255.255.255.240
!
router bgp 101
 network 100.100.64.0 mask 255.255.248.0
 neighbor 102.102.10.2 remote-as 100
 neighbor 102.102.10.2 prefix-list RouterA in
 neighbor 102.102.10.2 prefix-list RouterA out
!
Callouts:
  102.102.10.1: ip address on ethernet interface
  102.102.10.2: ip address of Router A ethernet interface
  router bgp 101: local ASN
  remote-as 100: remote ASN
  prefix-list RouterA in/out: inbound and outbound filters
Internal BGP (iBGP)
BGP peer within the same AS
Not required to be directly connected
  IGP takes care of inter-BGP speaker connectivity
iBGP speakers must be fully meshed:
  They originate connected networks
  They pass on prefixes learned from outside the ASN
  They do not pass on prefixes learned from other iBGP speakers
Internal BGP Peering (iBGP)
[Figure: routers A, B, C and D in AS 100]
Topology independent
Each iBGP speaker must peer with every other iBGP speaker in the AS
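The full-mesh requirement means the number of iBGP sessions grows with the square of the number of routers. A short sketch, with hypothetical function names, that enumerates the sessions for a four-router AS and counts them for larger ones:

```python
from itertools import combinations


def full_mesh(routers):
    """Every unordered pair of routers needs one iBGP session."""
    return list(combinations(routers, 2))


def session_count(n: int) -> int:
    """Number of sessions in a full mesh of n routers: n(n-1)/2."""
    return n * (n - 1) // 2


for a, b in full_mesh(["A", "B", "C", "D"]):
    print(f"{a} <-> {b}")

for n in (4, 10, 100):
    print(n, "routers need", session_count(n), "sessions")
```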
Peering between Loopback Interfaces
[Figure: routers A, B and C in AS 100]
Peer with loop-back interface
  Loop-back interface does not go down – ever!
Do not want iBGP session to depend on state of a single interface or the physical topology
Configuring Internal BGP
Router A in AS100
interface loopback 0
 ip address 105.3.7.1 255.255.255.255
!
router bgp 100
 network 100.100.1.0
 neighbor 105.3.7.2 remote-as 100
 neighbor 105.3.7.2 update-source loopback0
 neighbor 105.3.7.3 remote-as 100
 neighbor 105.3.7.3 update-source loopback0
!
Callouts:
  105.3.7.1: ip address on loopback interface
  105.3.7.2: ip address of Router B loopback interface
  router bgp 100 and remote-as 100: local ASN
Configuring Internal BGP
Router B in AS100
interface loopback 0
 ip address 105.3.7.2 255.255.255.255
!
router bgp 100
 network 100.100.1.0
 neighbor 105.3.7.1 remote-as 100
 neighbor 105.3.7.1 update-source loopback0
 neighbor 105.3.7.3 remote-as 100
 neighbor 105.3.7.3 update-source loopback0
!
Callouts:
  105.3.7.2: ip address on loopback interface
  105.3.7.1: ip address of Router A loopback interface
  router bgp 100 and remote-as 100: local ASN
Inserting prefixes into BGP
Two ways to insert prefixes into BGP
  redistribute static
  network command
Inserting prefixes into BGP – redistribute static
Configuration Example:
router bgp 100
 redistribute static
ip route 102.10.32.0 255.255.254.0 serial0
Static route must exist before redistribute command will work
Forces origin to be “incomplete”
Care required!
Inserting prefixes into BGP – redistribute static
Care required with redistribute!
  redistribute <routing-protocol> means everything in the <routing-protocol> will be transferred into the current routing protocol
  Will not scale if uncontrolled
  Best avoided if at all possible
  redistribute normally used with “route-maps” and under tight administrative control
Inserting prefixes into BGP – network command
Configuration Example
router bgp 100
 network 102.10.32.0 mask 255.255.254.0
ip route 102.10.32.0 255.255.254.0 serial0
A matching route must exist in the routing table before the network is announced
Forces origin to be “IGP”
Configuring Aggregation
Three ways to configure route aggregation
  redistribute static
  aggregate-address
  network command
Configuring Aggregation
Configuration Example:
router bgp 100
 redistribute static
ip route 102.10.0.0 255.255.0.0 null0 250
Static route to “null0” is called a pull up route
  Packets only sent here if there is no more specific match in the routing table
  Distance of 250 ensures this is last resort static
  Care required – see previously!
Configuring Aggregation – Network Command
Configuration Example
router bgp 100
 network 102.10.0.0 mask 255.255.0.0
ip route 102.10.0.0 255.255.0.0 null0 250
A matching route must exist in the routing table before the network is announced
Easiest and best way of generating an aggregate
Configuring Aggregation – aggregate-address command
Configuration Example:
router bgp 100
 network 102.10.32.0 mask 255.255.252.0
 aggregate-address 102.10.0.0 255.255.0.0 [summary-only]
Requires more specific prefix in BGP table before aggregate is announced
summary-only keyword
  Optional keyword which ensures that only the summary is announced if a more specific prefix exists in the routing table
Summary
BGP neighbour status
Router6>sh ip bgp sum
BGP router identifier 10.0.15.246, local AS number 10
BGP table version is 16, main routing table version 16
7 network entries using 819 bytes of memory
14 path entries using 728 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1795 total bytes of memory
BGP activity 7/0 prefixes, 14/0 paths, scan interval 60 secs
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.15.241     4    10       9       8       16    0    0 00:04:47        2
10.0.15.242     4    10       6       5       16    0    0 00:01:43        2
10.0.15.243     4    10       9       8       16    0    0 00:04:49        2
...
Callouts:
  V: BGP version
  MsgRcvd/MsgSent: updates sent and received
  InQ/OutQ: updates waiting
Summary
BGP Table
Router6>sh ip bgp
BGP table version is 16, local router ID is 10.0.15.246
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop       Metric LocPrf Weight Path
 *>i 10.0.0.0/26      10.0.15.241         0    100      0 i
 *>i 10.0.0.64/26     10.0.15.242         0    100      0 i
 *>i 10.0.0.128/26    10.0.15.243         0    100      0 i
 *>i 10.0.0.192/26    10.0.15.244         0    100      0 i
 *>i 10.0.1.0/26      10.0.15.245         0    100      0 i
 *>  10.0.1.64/26     0.0.0.0             0         32768 i
 *>i 10.0.1.128/26    10.0.15.247         0    100      0 i
 *>i 10.0.1.192/26    10.0.15.248         0    100      0 i
 *>i 10.0.2.0/26      10.0.15.249         0    100      0 i
 *>i 10.0.2.64/26     10.0.15.250         0    100      0 i
...
Summary
BGP4 – path vector protocol
iBGP versus eBGP
stable iBGP – peer with loopbacks
announcing prefixes & aggregates
BGP Policy Control
ISP Training Workshops
203
Applying Policy with BGP
Policy based on AS path, community or the prefix
Rejecting/accepting selected routes
Set attributes to influence path selection
Tools:
  Prefix-list (filters prefixes)
  Filter-list (filters ASes)
  Route-maps and communities
Policy Control – Prefix List
Per neighbour prefix filter
  incremental configuration
Inbound or Outbound
Based upon network numbers (using familiar IPv4 address/mask format)
Using access-lists in Cisco IOS for filtering prefixes was deprecated long ago
  Strongly discouraged!
Prefix-list Command Syntax
Syntax:
  [no] ip prefix-list list-name [seq seq-value] permit|deny network/len [ge ge-value] [le le-value]
  network/len: The prefix and its length
  ge ge-value: “greater than or equal to”
  le le-value: “less than or equal to”
Both “ge” and “le” are optional
  Used to specify the range of the prefix length to be matched for prefixes that are more specific than network/len
Sequence number is also optional
  no ip prefix-list sequence-number to disable display of sequence numbers
Prefix Lists – Examples
Deny default route
  ip prefix-list EG deny 0.0.0.0/0
Permit the prefix 35.0.0.0/8
  ip prefix-list EG permit 35.0.0.0/8
Deny the prefix 172.16.0.0/12
  ip prefix-list EG deny 172.16.0.0/12
In 192/8 allow up to /24
  ip prefix-list EG permit 192.0.0.0/8 le 24
  This allows all prefix sizes in the 192.0.0.0/8 address block, apart from /25, /26, /27, /28, /29, /30, /31 and /32.
Prefix Lists – Examples
In 192/8 deny /25 and above
  ip prefix-list EG deny 192.0.0.0/8 ge 25
  This denies all prefix sizes /25, /26, /27, /28, /29, /30, /31 and /32 in the address block 192.0.0.0/8.
  It has the same effect as the previous example
In 193/8 permit prefixes between /12 and /20
  ip prefix-list EG permit 193.0.0.0/8 ge 12 le 20
  This denies all prefix sizes /8, /9, /10, /11, /21, /22, … and higher in the address block 193.0.0.0/8.
Permit all prefixes
  ip prefix-list EG permit 0.0.0.0/0 le 32
  0.0.0.0 matches all possible addresses, “0 le 32” matches all possible prefix lengths
Policy Control – Prefix List
Example Configuration
router bgp 100
network 105.7.0.0 mask 255.255.0.0
neighbor 102.10.1.1 remote-as 110
neighbor 102.10.1.1 prefix-list AS110-IN in
neighbor 102.10.1.1 prefix-list AS110-OUT out
!
ip prefix-list AS110-IN deny 218.10.0.0/16
ip prefix-list AS110-IN permit 0.0.0.0/0 le 32
ip prefix-list AS110-OUT permit 105.7.0.0/16
ip prefix-list AS110-OUT deny 0.0.0.0/0 le 32
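The ge/le rules and the first-match behaviour can be modelled in a few lines. This is a sketch under stated assumptions, not a description of the IOS implementation: entries are evaluated in order, an entry without ge or le matches only the exact prefix length, and anything that matches no entry is denied. Entry, entry_matches and evaluate are hypothetical names; the two lists mirror AS110-IN and AS110-OUT from the example configuration.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    action: str                # "permit" or "deny"
    prefix: str                # network/len, e.g. "192.0.0.0/8"
    ge: Optional[int] = None   # minimum prefix length, if given
    le: Optional[int] = None   # maximum prefix length, if given


def entry_matches(entry: Entry, route: str) -> bool:
    """True if route falls inside entry.prefix with an allowed length."""
    base = ipaddress.ip_network(entry.prefix)
    cand = ipaddress.ip_network(route)
    if not cand.subnet_of(base):
        return False
    if entry.ge is None and entry.le is None:
        return cand.prefixlen == base.prefixlen  # exact match only
    low = entry.ge if entry.ge is not None else base.prefixlen
    high = entry.le if entry.le is not None else 32
    return low <= cand.prefixlen <= high


def evaluate(entries: List[Entry], route: str) -> str:
    """Return the action of the first matching entry, else 'deny'."""
    for entry in entries:
        if entry_matches(entry, route):
            return entry.action
    return "deny"  # assumed implicit deny at the end of the list


AS110_IN = [
    Entry("deny", "218.10.0.0/16"),
    Entry("permit", "0.0.0.0/0", le=32),
]
AS110_OUT = [
    Entry("permit", "105.7.0.0/16"),
    Entry("deny", "0.0.0.0/0", le=32),
]

for route in ("218.10.0.0/16", "218.10.1.0/24", "10.0.0.0/8"):
    print("in ", route, evaluate(AS110_IN, route))
for route in ("105.7.0.0/16", "105.7.1.0/24"):
    print("out", route, evaluate(AS110_OUT, route))
```

Note that the inbound deny entry has no ge or le, so in this model it blocks only the exact /16 and more specific prefixes inside it still reach the permit entry.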
209
Policy Control – Filter List
Filter routes based on AS path
  Inbound or Outbound
Example Configuration:
router bgp 100
network 105.7.0.0 mask 255.255.0.0
neighbor 102.10.1.1 filter-list 5 out
neighbor 102.10.1.1 filter-list 6 in
!
ip as-path access-list 5 permit ^200$
ip as-path access-list 6 permit ^150$
210
Policy Control – Regular Expressions
Like Unix regular expressions
  .    Match one character
  *    Match any number of preceding expression
  +    Match at least one of preceding expression
  ^    Beginning of line
  $    End of line
  \    Escape a regular expression character
  _    Beginning, end, white-space, brace
  |    Or
  ()   brackets to contain expression
  []   brackets to contain number ranges
Policy Control – Regular Expressions
Simple Examples
  .*            match anything
  .+            match at least one character
  ^$            match routes local to this AS
  _1800$        originated by AS1800
  ^1800_        received from AS1800
  _1800_        via AS1800
  _790_1800_    via AS1800 and AS790
  _(1800_)+     multiple AS1800 in sequence (used to match AS-PATH prepends)
  _\(65530\)_   via AS65530 (confederations)
Policy Control – Regular Expressions
Not so simple Examples
  ^[0-9]+$                  Match AS_PATH length of one
  ^[0-9]+_[0-9]+$           Match AS_PATH length of two
  ^[0-9]*_[0-9]+$           Match AS_PATH length of one or two
  ^[0-9]*_[0-9]*$           Match AS_PATH length of one or two (will also match zero)
  ^[0-9]+_[0-9]+_[0-9]+$    Match AS_PATH length of three
  _(701|1800)_              Match anything which has gone through AS701 or AS1800
  _1849(_.+_)12163$         Match anything of origin AS12163 and passed through AS1849
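These patterns can be tried out offline before they go near a router. The sketch below is an approximation: Python's re module has no "_" token, so the hypothetical helper cisco_to_python expands it into a group matching the start, the end, or a delimiter (space, comma, brace, parenthesis). AS paths are written as space-separated strings, and the sample paths are made up for illustration.

```python
import re

# Stand-in for the "_" token: beginning, end, or a delimiter character.
_DELIM = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern: str) -> str:
    """Expand every "_" in an AS-path pattern into the delimiter group."""
    return pattern.replace("_", _DELIM)


def path_matches(pattern: str, as_path: str) -> bool:
    """True if the AS path (space-separated ASNs) matches the pattern."""
    return re.search(cisco_to_python(pattern), as_path) is not None


CHECKS = [
    ("^$", ""),                                   # locally originated
    ("_1800$", "6461 1800"),                      # originated by AS1800
    ("_1800$", "6461 11800"),                     # different AS, no match
    ("^1800_", "1800 7018"),                      # received from AS1800
    ("_(701|1800)_", "6461 701 11268"),           # via AS701 or AS1800
    ("^[0-9]+_[0-9]+$", "150 200"),               # path length of two
    ("_1849(_.+_)12163$", "701 1849 3356 12163"), # through AS1849, origin AS12163
]

for pattern, path in CHECKS:
    print(f"{pattern!r:24} {path!r:24} {path_matches(pattern, path)}")
```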
213
Policy Control – Route Maps
A route-map is like a “programme” for IOS
Has “line” numbers, like programmes
Each line is a separate condition/action
Concept is basically:
  if match then do expression and exit
  else if match then do expression and exit
  else etc
Route-map “continue” lets ISPs apply multiple conditions and actions in one route-map
Route Maps – Caveats
Lines can have multiple set statements
Lines can have multiple match statements
Line with only a match statement
  Only prefixes matching go through, the rest are dropped
Line with only a set statement
  All prefixes are matched and set
  Any following lines are ignored
Line with a match/set statement and no following lines
  Only prefixes matching are set, the rest are dropped
Route Maps – Caveats
Example
Omitting the third line below means that prefixes not matching list-one or list-two are dropped
route-map sample permit 10
 match ip address prefix-list list-one
 set local-preference 120
!
route-map sample permit 20
 match ip address prefix-list list-two
 set local-preference 80
!
route-map sample permit 30
! Don’t forget this
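The effect of the empty third line can be illustrated with a small model. This is a sketch only, assuming a route-map made purely of permit entries and prefix-lists that match exact prefixes; the function `apply_route_map` and the contents of list-one and list-two are hypothetical.

```python
from ipaddress import ip_network


def apply_route_map(entries, prefix, attrs):
    """Apply permit entries in sequence order; the first match wins.

    entries: list of (sequence, match_prefixes or None, set_actions).
    Returns the updated attributes, or None when the prefix is dropped.
    """
    for _seq, match_prefixes, set_actions in sorted(entries, key=lambda e: e[0]):
        if match_prefixes is None or prefix in match_prefixes:
            return {**attrs, **set_actions}
    return None  # implicit deny at the end of every route-map


# Hypothetical contents for list-one and list-two.
LIST_ONE = {ip_network("10.0.0.0/8")}
LIST_TWO = {ip_network("20.0.0.0/8")}

WITHOUT_LINE_30 = [
    (10, LIST_ONE, {"local_pref": 120}),
    (20, LIST_TWO, {"local_pref": 80}),
]
# Line 30 has no match statement, so it permits everything unchanged.
WITH_LINE_30 = WITHOUT_LINE_30 + [(30, None, {})]

if __name__ == "__main__":
    other = ip_network("30.0.0.0/8")
    print(apply_route_map(WITHOUT_LINE_30, other, {"local_pref": 100}))
    print(apply_route_map(WITH_LINE_30, other, {"local_pref": 100}))
```

Without line 30 the unmatched prefix falls through to the implicit deny; with it, the prefix is accepted with its attributes untouched.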
216
Route Maps – Matching prefixes
Example Configuration
router bgp 100
 neighbor 1.1.1.1 route-map infilter in
!
route-map infilter permit 10
 match ip address prefix-list HIGH-PREF
 set local-preference 120
!
route-map infilter permit 20
 match ip address prefix-list LOW-PREF
 set local-preference 80
!
ip prefix-list HIGH-PREF permit 10.0.0.0/8
ip prefix-list LOW-PREF permit 20.0.0.0/8
217
Route Maps – AS-PATH filtering
Example Configuration
router bgp 100
 neighbor 102.10.1.2 remote-as 200
 neighbor 102.10.1.2 route-map filter-on-as-path in
!
route-map filter-on-as-path permit 10
 match as-path 1
 set local-preference 80
!
route-map filter-on-as-path permit 20
 match as-path 2
 set local-preference 200
!
ip as-path access-list 1 permit _150$
ip as-path access-list 2 permit _210_
218
Route Maps – AS-PATH prepends Example configuration of AS-PATH prepend
router bgp 300
network 105.7.0.0 mask 255.255.0.0
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 route-map SETPATH out
!
route-map SETPATH permit 10
set as-path prepend 300 300
Use your own AS number when prepending
  Otherwise BGP loop detection may cause disconnects
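Why prepending someone else's AS number is risky can be shown with a toy model of eBGP loop detection. This is a simplified sketch, assuming an AS path is a plain list of AS numbers; the functions `announce` and `accepts` are hypothetical names, not router commands.

```python
def announce(local_asn, as_path, prepend=()):
    """AS path as seen by an eBGP neighbour: own ASN, prepends, then the path."""
    return [local_asn] + list(prepend) + list(as_path)


def accepts(receiver_asn, as_path):
    """eBGP loop detection: reject a path that already contains our own ASN."""
    return receiver_asn not in as_path


if __name__ == "__main__":
    # AS300 prepends its own ASN twice towards AS100, as on the slide.
    good = announce(300, [], prepend=[300, 300])
    # AS300 mistakenly prepends its neighbour's ASN instead.
    bad = announce(300, [], prepend=[100, 100])
    print(good, accepts(100, good))
    print(bad, accepts(100, bad))
```

In the second case AS100 sees its own AS number in the path and discards the announcement, so the prefix disappears from AS100 instead of merely looking longer.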
219
Route Maps – Matching Communities
Example Configuration
router bgp 100
 neighbor 102.10.1.2 remote-as 200
 neighbor 102.10.1.2 route-map filter-on-community in
!
route-map filter-on-community permit 10
 match community 1
 set local-preference 50
!
route-map filter-on-community permit 20
 match community 2 exact-match
 set local-preference 200
!
ip community-list 1 permit 150:3 200:5
ip community-list 2 permit 88:6
220
Community-List Processing
Note:
When multiple values are configured in the same community list statement, a logical AND condition is created. All community values must match to satisfy an AND condition
 ip community-list 1 permit 150:3 200:5
When multiple values are configured in separate community list statements, a logical OR condition is created. The first list that matches a condition is processed
 ip community-list 1 permit 150:3
 ip community-list 1 permit 200:5
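The AND/OR distinction can be made concrete with a short sketch. It assumes a community list made only of permit statements and models each statement as a tuple of values; `community_list_permits` is a hypothetical helper, not an IOS feature.

```python
def community_list_permits(statements, route_communities):
    """Return True if any single statement is fully satisfied by the route.

    statements: list of tuples, one tuple per 'permit' line.
    Values inside one tuple are ANDed; separate tuples are ORed.
    """
    route = set(route_communities)
    return any(set(values) <= route for values in statements)


# ip community-list 1 permit 150:3 200:5
SAME_STATEMENT = [("150:3", "200:5")]
# ip community-list 1 permit 150:3
# ip community-list 1 permit 200:5
SEPARATE_STATEMENTS = [("150:3",), ("200:5",)]

if __name__ == "__main__":
    route = {"150:3"}
    print(community_list_permits(SAME_STATEMENT, route))
    print(community_list_permits(SEPARATE_STATEMENTS, route))
```

A route carrying only 150:3 fails the single-statement list, which needs both values, but passes the two-statement list.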
Route Maps – Setting Communities
Example Configuration
router bgp 100
 network 105.7.0.0 mask 255.255.0.0
 neighbor 102.10.1.1 remote-as 200
 neighbor 102.10.1.1 send-community
 neighbor 102.10.1.1 route-map set-community out
!
route-map set-community permit 10
 match ip address prefix-list NO-ANNOUNCE
 set community no-export
!
route-map set-community permit 20
 match ip address prefix-list AGGREGATE
!
ip prefix-list NO-ANNOUNCE permit 105.7.0.0/16 ge 17
ip prefix-list AGGREGATE permit 105.7.0.0/16
Route Map Continue
Handling multiple conditions and actions in one route-map (for BGP neighbour relationships only)
route-map peer-filter permit 10
 match ip address prefix-list group-one
 continue 30
 set metric 2000
!
route-map peer-filter permit 20
 match ip address prefix-list group-two
 set community no-export
!
route-map peer-filter permit 30
 match ip address prefix-list group-three
 set as-path prepend 100 100
!
223
Order of processing BGP policy
For policies applied to a specific BGP neighbour, the following sequence is applied:
For inbound updates, the order is:
  Route-map
  Filter-list
  Prefix-list
For outbound updates, the order is:
  Prefix-list
  Filter-list
  Route-map
224
Managing Policy Changes
New policies only apply to the updates going through the router AFTER the policy has been introduced or changed
To apply policy changes to the entire BGP table the router handles, the BGP peerings need to be “refreshed”
  This is done by clearing the BGP session either in or out, for example:
  clear ip bgp <neighbour-addr> in|out
Do NOT forget in or out: omitting it results in a hard reset of the BGP session
225
Managing Policy Changes
Ability to clear the BGP sessions of groups of neighbours configured according to several criteria
clear ip bgp <addr> [in|out]
<addr> may be any of the following
  x.x.x.x              IP address of a peer
  *                    all peers
  ASN                  all peers in an AS
  external             all external peers
  peer-group <name>    all peers in a peer-group
226
BGP Policy Control
ISP Training Workshops
227
Internet Exchange Point Design
ISP Training Workshops
228
IXP Design Background Why set up an IXP? Layer 2 Exchange Point Layer 3 “Exchange Point” Design Considerations Route Collectors & Servers What can go wrong?
229
A bit of history
In a time long gone…
230
A Bit of History…
End of NSFnet – one major backbone
  move towards commercial Internet
  Private companies selling their bandwidth
Need for coordination of routing exchange between providers
  Traffic from ISP A needs to get to ISP B
Routing Arbiter project created to facilitate this
231
What is an Exchange Point
Network Access Points (NAPs) established at end of NSFnet
  The original “exchange points”
Major providers connect their networks and exchange traffic
High-speed network or ethernet switch
Simple concept – any place where providers come together to exchange traffic
232
Internet Exchange Points
Layer 2 exchange point
  Ethernet (100Gbps/10Gbps/1Gbps/100Mbps)
  Older technologies include ATM, Frame Relay, SRP, FDDI and SMDS
Layer 3 exchange point
  Router based
  Has historical status now
233
Why an Internet Exchange Point?
Saving money, improving QoS, generating a local Internet economy
234
Internet Exchange Point
Why peer?
Consider a region with one ISP
  They provide internet connectivity to their customers
  They have one or two international connections
Internet grows, another ISP sets up in competition
  They provide internet connectivity to their customers
  They have one or two international connections
How does traffic from customer of one ISP get to customer of the other ISP?
  Via the international connections
235
Internet Exchange Point
Why peer?
Yes, International Connections…
  If satellite, RTT is around 550ms per hop
  So local traffic takes over 1s round trip
International bandwidth
  Costs significantly more than domestic bandwidth
  Congested with local traffic
  Wastes money, harms performance
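The "over 1s" figure follows from simple arithmetic. The sketch below assumes that traffic between two local ISPs crosses two satellite hops (out over one ISP's international link and back in over the other's); that hop count and the function name `local_rtt_ms` are assumptions for illustration.

```python
def local_rtt_ms(satellite_hops, rtt_per_hop_ms=550):
    """Round-trip time for local traffic hauled over satellite links."""
    return satellite_hops * rtt_per_hop_ms


if __name__ == "__main__":
    # Assumed path: ISP A -> satellite -> overseas -> satellite -> ISP B.
    print(local_rtt_ms(2), "ms")
```

Two hops at roughly 550ms each give about 1100ms, compared with the sub-10ms figure quoted later for traffic exchanged locally at an IXP.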
236
Internet Exchange Point
Why peer?
Solution:
  Two competing ISPs peer with each other
Result:
  Both save money
  Local traffic stays local
  Better network performance, better QoS,…
  More international bandwidth for expensive international traffic
  Everyone is happy
237
Internet Exchange Point
Why peer?
A third ISP enters the equation
  Becomes a significant player in the region
  Local and international traffic goes over their international connections
They agree to peer with the two other ISPs
  To save money
  To keep local traffic local
  To improve network performance, QoS,…
238
Internet Exchange Point
Why peer?
Private peering means that the three ISPs have to buy circuits between each other
  Works for three ISPs, but adding a fourth or a fifth means this does not scale
Solution:
  Internet Exchange Point
239
Internet Exchange Point
Every participant has to buy just one whole circuit
  From their premises to the IXP
Rather than N-1 half circuits to connect to the N-1 other ISPs
  5 ISPs have to buy 4 half circuits = 2 whole circuits, already twice the cost of the IXP connection
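The circuit arithmetic generalises to any number of ISPs. This is a small sketch under the slide's own assumption that a private peering circuit is paid for half by each end; the function names are hypothetical.

```python
def private_mesh_circuits(n):
    """Whole circuits needed to connect n ISPs to each other directly."""
    return n * (n - 1) // 2


def private_cost_per_isp(n):
    """Whole-circuit equivalents each ISP pays for: n-1 half circuits."""
    return (n - 1) / 2


def ixp_cost_per_isp():
    """With an IXP, each ISP buys one whole circuit to the exchange."""
    return 1


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, private_mesh_circuits(n), private_cost_per_isp(n), ixp_cost_per_isp())
```

The per-ISP cost of private peering grows linearly with the number of participants, while the IXP cost stays at one circuit.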
240
Internet Exchange Point
Solution
  Every ISP participates in the IXP
  Cost is minimal – one local circuit covers all domestic traffic
  International circuits are used for just international traffic – and backing up domestic links in case the IXP fails
Result:
  Local traffic stays local
  QoS considerations for local traffic are not an issue
  RTTs are typically sub 10ms
  Customers enjoy the Internet experience
  Local Internet economy grows rapidly
241
Layer 2 Exchange
The traditional IXP
242
IXP Design
Very simple concept:
  Ethernet switch is the interconnection media
    IXP is one LAN
  Each ISP brings a router, connects it to the ethernet switch provided at the IXP
  Each ISP peers with other participants at the IXP using BGP
Scaling this simple concept is the challenge for the larger IXPs
243
Layer 2 Exchange
244
[Diagram: ISP 1 to ISP 6 and the IXP Management Network around an Ethernet Switch; IXP Services: Root & TLD DNS, Routing Registry, Looking Glass, etc]
Layer 2 Exchange
245
[Diagram: ISP 1 to ISP 6 and the IXP Management Network around Ethernet Switches; IXP Services: Root & TLD DNS, Routing Registry, Looking Glass, etc]
Layer 2 Exchange
Two switches for redundancy
ISPs use dual routers for redundancy or loadsharing
Offer services for the “common good”
  Internet portals and search engines
  DNS Root & TLDs, NTP servers
  Routing Registry and Looking Glass
246
Layer 2 Exchange
Requires neutral IXP management
  Usually funded equally by IXP participants
  24x7 cover, support, value add services
Secure and neutral location
Configuration
  Private address space if non-transit and no value add services
  Otherwise public IPv4 (/24) and IPv6 (/64)
  ISPs require AS, basic IXP does not
247
Layer 2 Exchange
Network Security Considerations
  LAN switch needs to be securely configured
  Management routers require TACACS+ authentication, vty security
  IXP services must be behind router(s) with strong filters
248
“Layer 3 IXP”
Layer 3 IXP is a marketing concept used by Transit ISPs
Real Internet Exchange Points are only Layer 2
249
IXP Design Considerations
250
Exchange Point Design
The IXP Core is an Ethernet switch
  It must be a managed switch
Has superseded all other types of network devices for an IXP
  From the cheapest and smallest managed 12 or 24 port 10/100 switch
  To the largest switches now handling high densities of 10GE and 100GE interfaces
251
Exchange Point Design
Each ISP participating in the IXP brings a router to the IXP location
Router needs:
  One Ethernet port to connect to IXP switch
  One WAN port to connect to the WAN media leading back to the ISP backbone
  To be able to run BGP
252
Exchange Point Design
IXP switch located in one equipment rack dedicated to IXP
  Also includes other IXP operational equipment
Routers from participant ISPs located in neighbouring/adjacent rack(s)
Copper (UTP) connections made for 10Mbps, 100Mbps or 1Gbps connections
Fibre used for 1Gbps, 10Gbps, 40Gbps or 100Gbps connections
253
Peering
Each participant needs to run BGP
  They need their own AS number
  Public ASN, NOT private ASN
Each participant configures external BGP directly with the other participants in the IXP
  Peering with all participants
  or
  Peering with a subset of participants
254
Peering (more)
Mandatory Multi-Lateral Peering (MMLP)
  Each participant is forced to peer with every other participant as part of their IXP membership
  Has no history of success; the practice is strongly discouraged
Multi-Lateral Peering (MLP)
  Each participant peers with every other participant (usually via a Route Server)
Bi-Lateral Peering
  Participants set up peering with each other according to their own requirements and business relationships
  This is the most common situation at IXPs today
255
Routing
ISP border routers at the IXP must NOT be configured with a default route or carry the full Internet routing table
  Carrying default or full table means that this router and the ISP network are open to abuse by non-peering IXP members
  Correct configuration is only to carry routes offered to IXP peers on the IXP peering router
Note: Some ISPs offer transit across IX fabrics
  They do so at their own risk – see above
256
Routing (more)
ISP border routers at the IXP should not be configured to carry the IXP LAN network within the IGP or iBGP
  Use next-hop-self BGP concept
Don’t generate ISP prefix aggregates on IXP peering router
  If connection from backbone to IXP router goes down, normal BGP failover will then be successful
257
Address Space
Some IXPs use private addresses for the IX LAN
  Public address space means IXP network could be leaked to Internet which may be undesirable
  Because most ISPs filter RFC1918 address space, this avoids the problem
Some IXPs use public addresses for the IX LAN
  Address space available from the RIRs
  IXP terms of participation often forbid the IX LAN to be carried in the ISP member backbone
258
Hardware
Try not to mix port speeds
  if 10Mbps and 100Mbps connections available, terminate on different switches (L2 IXP)
Don’t mix transports
  if terminating ATM PVCs and G/F/Ethernet, terminate on different devices
Insist that IXP participants bring their own router
  moves buffering problem off the IXP
  security is responsibility of the ISP, not the IXP
259
Charging
IXPs should be run at minimal cost to participants
Examples:
  Datacentre hosts IX for free
    Because ISP participants then use data centre for co-lo services, and the datacentre benefits long term
  IX operates cost recovery
    Each member pays a flat fee towards the cost of the switch, hosting, power & management
  Different pricing for different ports
    One slot may handle 24 10GE ports
    Or one slot may handle 96 1GE ports
    96 port 1GE card is a tenth of the price of 24 port 10GE card
    Relative port cost is passed on to participants
260
Services Offered
Services offered should not compete with member ISPs (basic IXP)
  e.g. web hosting at an IXP is a bad idea unless all members agree to it
IXP operations should make performance and throughput statistics available to members
  Use tools such as MRTG/Cacti to produce IX throughput graphs for member (or public) information
261
Services to Offer
ccTLD DNS
  the country IXP could host the country’s top level DNS
  e.g. “SE.” TLD is hosted at Netnod IXes in Sweden
  Offer back up of other country ccTLD DNS
Root server
  Anycast instances of I.root-servers.net, F.root-servers.net etc are present at many IXes
Usenet News
  Usenet News is high volume
  could save bandwidth to all IXP members
262
Services to Offer
Route Collector
  Route collector shows the reachability information available at the exchange
  Technical detail covered later on
Looking Glass
  One way of making the Route Collector routes available for global view (e.g. www.traceroute.org)
  Public or members only access
263
Services to Offer
Content Redistribution/Caching
  For example, Akamised update distribution service
Network Time Protocol
  Locate a stratum 1 time source (GPS receiver, atomic clock, etc) at IXP
Routing Registry
  Used to register the routing policy of the IXP membership (more later)
264
Introduction to Route Collectors
What routes are available at the IXP?
265
What is a Route Collector?
Usually a router or Unix system running BGP
Gathers routing information from service provider routers at an IXP
  Peers with each ISP using BGP
Does not forward packets
Does not announce any prefixes to ISPs
266
Purpose of a Route Collector
To provide a public view of the Routing Information available at the IXP
  Useful for existing members to check functionality of BGP filters
  Useful for prospective members to check value of joining the IXP
  Useful for the Internet Operations community for troubleshooting purposes
    E.g. www.traceroute.org
267
Route Collector at an IXP
[Diagram: Route Collector and routers R1 to R5 attached to the IXP switch]
Route Collector Requirements
Router or Unix system running BGP
  Minimal memory requirements – only holds IXP routes
  Minimal packet forwarding requirements – doesn’t forward any packets
Peers eBGP with every IXP member
  Accepts everything; Gives nothing
  Uses a private ASN
  Connects to IXP Transit LAN
“Back end” connection
  Second Ethernet globally routed
  Connection to IXP Website for public access
269
Route Collector Implementation
Most IXPs now implement some form of Route Collector
Benefits already mentioned
Great public relations tool
Unsophisticated requirements
  Just runs BGP
270
Introduction to Route Servers
How to scale very large IXPs
271
What is a Route Server?
Has all the features of a Route Collector
But also:
  Announces routes to participating IXP members according to their routing policy definitions
Implemented using the same specification as for a Route Collector
272
Features of a Route Server
Helps scale routing for large IXPs
Simplifies Routing Processes on ISP Routers
Optional participation
  Provided as service, is NOT mandatory
Does result in insertion of RS Autonomous System Number in the Routing Path
Optionally uses Policy registered in IRR
273
Diagram of N-squared Peering Mesh
For large IXPs (dozens of participants) maintaining a larger peering mesh becomes cumbersome and often too hard
274
Peering Mesh with Route Servers
ISP routers peer with the Route Servers
  Only need to have two eBGP sessions rather than N
[Diagram: ISP routers each peering with two Route Servers (RS)]
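The scaling argument can be put into numbers. The sketch below assumes a full bilateral mesh in which every participant peers with every other, and an exchange running two route servers as on the slide; the function names are hypothetical.

```python
def mesh_sessions_per_router(n):
    """eBGP sessions each participant maintains in a full mesh of n."""
    return n - 1


def mesh_sessions_total(n):
    """Total eBGP sessions across the exchange in a full mesh of n."""
    return n * (n - 1) // 2


def route_server_sessions_per_router(route_servers=2):
    """eBGP sessions each participant maintains when using route servers only."""
    return route_servers


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, mesh_sessions_per_router(n), mesh_sessions_total(n),
              route_server_sessions_per_router())
```

At 50 participants the mesh means 49 sessions per router and 1225 across the exchange, against two sessions per router with the route servers.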
RS based Exchange Point Routing Flow
276
[Diagram: traffic flow and routing information flow at an exchange point with a Route Server (RS)]
Advantages of Using a Route Server
Advantageous for large IXPs
  Helps scale eBGP mesh
  Helps scale prefix distribution
Separation of Routing and Forwarding
Simplifies BGP Configuration Management on ISP routers
277
Disadvantages of using a Route Server
ISPs can lose direct policy control
  If RS is only peer, ISPs have no control over who their prefixes are distributed to
Completely dependent on 3rd party
  Configuration, troubleshooting, etc…
Insertion of RS ASN into routing path
  (If using a router rather than a dedicated route-server BGP implementation)
  Traffic engineering/multihoming needs more care
278
Typical usage of a Route Server
Route Servers may be provided as an OPTIONAL service
  Most common at large IXPs (>50 participants)
  Examples: LINX, TorIX, AMS-IX, etc
ISPs peer:
  Directly with significant peers
  With Route Server for the rest
279
Things to think about...
Would using a route server benefit you?
  Helpful when BGP knowledge is limited (but is NOT an excuse not to learn BGP)
  Avoids having to maintain a large number of eBGP peers
  But can you afford to lose policy control? (An ISP not in control of their routing policy is what?)
280
What can go wrong…
The different ways IXP operators harm their IXP…
281
What can go wrong?
Concept
Some Service Providers attempt to cash in on the reputation of IXPs
Market Internet transit services as “Internet Exchange Point”
  “We are exchanging packets with other ISPs, so we are an Internet Exchange Point!”
  So-called Layer-3 Exchanges are really Internet Transit Providers
  Router used rather than a Switch
  Most famous example: SingTelIX
282
What can go wrong?
Financial
Some IXPs price the IX out of the means of most providers
  IXP is intended to encourage local peering
  Acceptable charging model is minimally cost-recovery only
Some IXPs charge for port traffic
  IXPs are not a transit service, charging for traffic puts the IX in competition with members
  (There is nothing wrong with charging different flat fees for 100Mbps, 1Gbps, 10Gbps etc ports as they all have different hardware costs on the switch.)
283
What can go wrong?
Competition
Too many exchange points in one locale
  Competing exchanges defeats the purpose
Becomes expensive for ISPs to connect to all of them
An IXP:
  is NOT a competition
  is NOT a profit making business
284
What can go wrong?
Rules and Restrictions
IXPs try to compete with their membership
  Offering services that ISPs would/do offer their customers
IXPs run as a closed privileged club e.g.:
  Restrictive membership criteria
IXPs providing access to end users rather than just Service Providers
IXPs interfering with ISP business decisions e.g. Mandatory Multi-Lateral Peering
285
What can go wrong?
Technical Design Errors
Interconnected IXPs
  IXP in one location believes it should connect directly to the IXP in another location
  Who pays for the interconnect?
  How is traffic metered?
  Competes with the ISPs who already provide transit between the two locations (who then refuse to join IX, harming the viability of the IX)
  Metro interconnections work ok (e.g. LINX, AMS-IX, DE-CIX etc)
286
What can go wrong?
Technical Design Errors
ISPs bridge the IXP LAN back to their offices
  “We are poor, we can’t afford a router”
  Financial benefits of connecting to an IXP far outweigh the cost of a router
  In reality it allows the ISP to connect any devices to the IXP LAN, with disastrous consequences for the security, integrity and reliability of the IXP
287
What can go wrong?
Routing Design Errors
Route Server implemented from Day One
  ISPs have no incentive to learn BGP
  Therefore have no incentive to understand peering relationships, peering policies, &c
  Entirely dependent on operator of RS for troubleshooting, configuration, reliability
    RS can’t be run by committee!
Route Server is to help scale peering at LARGE IXPs
288
What can go wrong?
Routing Design Errors
iBGP Route Reflector used to distribute prefixes between IXP participants
Claimed Advantage (1):
  Participants don’t need to know about or run BGP
Actually a Disadvantage
  IXP Operator has to know BGP
  ISP not knowing BGP is big commercial disadvantage
  ISPs who would like to have a growing successful business need to be able to multi-home, peer with other ISPs, etc; these activities require BGP
289
What can go wrong?
Routing Design Errors (cont)
Route Reflector Claimed Advantage (2):
  Allows an IXP to be started very quickly
Fact:
  IXP is only an Ethernet switch; setting up an iBGP mesh with participants is no quicker than setting up an eBGP mesh
290
What can go wrong?
Routing Design Errors (cont)
Route Reflector Claimed Advantage (3):
  IXP operator has full control over IXP activities
Actually a Disadvantage
  ISP participants surrender control of:
    Their border router; it is located in IXP’s AS
    Their routing and peering policy
  IXP operator is single point of failure
    If they aren’t available 24x7, then neither is the IXP
    BGP configuration errors by IXP operator have real impacts on ISP operations
291
What can go wrong?
Routing Design Errors (cont)
Route Reflector Disadvantage (4):
  Migration from Route Reflector to “correct” routing configuration is highly non-trivial
  ISP router is in IXP’s ASN
    Need to move ISP router from IXP’s ASN to the ISP’s ASN
    Need to reconfigure BGP on ISP router, add to ISP’s IGP and iBGP mesh, and set up eBGP with IXP participants and/or the IXP Route Server
292
More Information
293
Exchange Point
Policies & Politics
AUPs
  Acceptable Use Policy
  Minimal rules for connection
Fees?
  Some IXPs charge no fee
  Other IXPs charge cost recovery
  A few IXPs are commercial
Nobody is obliged to peer
  Agreements left to ISPs, not mandated by IXP
294
Exchange Point etiquette
Don’t point default route at another IXP participant
Be aware of third-party next-hop
Only announce your aggregate routes
  Read RIPE-399 first
  www.ripe.net/docs/ripe-399.html
Filter! Filter! Filter!
295
Exchange Point Examples
LINX in London, UK
TorIX in Toronto, Canada
AMS-IX in Amsterdam, Netherlands
SIX in Seattle, Washington, US
PA-IX in Palo Alto, California, US
JPNAP in Tokyo, Japan
DE-CIX in Frankfurt, Germany
HK-IX in Hong Kong
…
All use Ethernet Switches
296
Features of IXPs (1)
Redundancy & Reliability
  Multiple switches, UPS
Support
  NOC to provide 24x7 support for problems at the exchange
DNS, Route Collector, Content & NTP servers
  ccTLD & root servers
  Content redistribution systems such as Akamai
  Route Collector – Routing Table view
297
Features of IXPs (2)
Location
  neutral co-location facilities
Address space
  Peering LAN
AS Number
  If using Route Collector/Server
Route servers (optional, for larger IXPs)
Statistics
  Traffic data – for membership
298
More info about IXPs
http://www.pch.net/documents
  Another excellent resource of IXP locations, papers, IXP statistics, etc
http://www.telegeography.com/ee/ix/index.php
  A collection of IXPs and interconnect points for ISPs
299
Summary
L2 IXP – most commonly deployed
  The core is an ethernet switch
  ATM and other old technologies are obsolete
L3 IXP – nowadays is a marketing concept used by wholesale ISPs
  Does not offer the same flexibility as L2
  Not recommended unless there are overriding regulatory or political reasons to do so
  Avoid!
300
Internet Exchange Point Design
ISP Training Workshops
301
BGP Configuration for IXPs
ISP Training Workshops
302
Background
This presentation covers the BGP configurations required for a participant at an Internet Exchange Point
  It does not cover the technical design of an IXP
  Nor does it cover the financial and operational benefits of participating in an IXP
  See the IXP Design Presentation that is part of this Workshop Material set for financial, technical and operational details
303
Recap: Definitions
Transit – carrying traffic across a network, usually for a fee
  Traffic and prefixes originating from one AS are carried across an intermediate AS to reach their destination AS
Peering – private interconnect between two ASNs, usually for no fee
Internet Exchange Point – common interconnect location where several ASNs exchange routing information and traffic
304
IXP Peering Issues
Only announce your aggregates and your customer aggregates at IXPs
Only accept the aggregates which your peer is entitled to originate
Never carry a default route on an IXP (or private) peering router
305
ISP Transit Issues
Many mistakes are made on the Internet today due to
incomplete understanding of how to configure BGP for
peering at Internet Exchange Points
306
Simple BGP Configuration
example
Exchange Point Configuration
307
Exchange Point Example
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP is allowed
308
Exchange Point
Each of these represents a border router in a different autonomous system
[Diagram: border routers A to F at the exchange point, one in each of AS100, AS110, AS120, AS130, AS140 and AS150]
Router configuration
IXP router is usually located at the Exchange Point premises
  Configuration needs to be such that disconnecting it from the backbone does not cause routing loops or traffic blackholes
Create a peer-group for IXP peers
  All outbound policy to each peer will be the same
Ensure the router is not carrying the default route
  Or the full routing table (for that matter)
310
Creating a peer-group & route-map
router bgp 100
 neighbor ixp-peers peer-group
 neighbor ixp-peers send-community
 neighbor ixp-peers prefix-list my-prefixes out
 neighbor ixp-peers route-map set-local-pref in
!
ip prefix-list my-prefixes permit 121.10.0.0/19
!
route-map set-local-pref permit 10
 set local-preference 150
!
Only allow AS100 address block to IXP peers (prefix-list my-prefixes)
Prefixes heard from IXP peers have highest preference (route-map set-local-pref)
Interface and BGP configuration (1)
interface fastethernet 0/0
 description Exchange Point LAN
 ip address 120.5.10.1 255.255.255.224
 no ip directed-broadcast
 no ip proxy-arp
 no ip redirects
!
router bgp 100
 neighbor 120.5.10.2 remote-as 110
 neighbor 120.5.10.2 peer-group ixp-peers
 neighbor 120.5.10.2 prefix-list peer110 in
 neighbor 120.5.10.3 remote-as 120
 neighbor 120.5.10.3 peer-group ixp-peers
 neighbor 120.5.10.3 prefix-list peer120 in
IXP LAN BCP configuration (the three “no ip” interface commands)
Interface and BGP Configuration (2)
 neighbor 120.5.10.4 remote-as 130
 neighbor 120.5.10.4 peer-group ixp-peers
 neighbor 120.5.10.4 prefix-list peer130 in
 neighbor 120.5.10.5 remote-as 140
 neighbor 120.5.10.5 peer-group ixp-peers
 neighbor 120.5.10.5 prefix-list peer140 in
 neighbor 120.5.10.6 remote-as 150
 neighbor 120.5.10.6 peer-group ixp-peers
 neighbor 120.5.10.6 prefix-list peer150 in
!
ip route 121.10.0.0 255.255.224.0 null0
!
ip prefix-list peer110 permit 122.0.0.0/19
ip prefix-list peer120 permit 122.30.0.0/19
ip prefix-list peer130 permit 122.12.0.0/19
ip prefix-list peer140 permit 122.18.128.0/19
ip prefix-list peer150 permit 122.1.32.0/19
Peer-group applied to each peer
Each peer has own inbound filter
Exchange Point
Configuration of the other routers in the AS is similar in concept
Notice inbound and outbound prefix filters
  outbound announces my-prefixes only
  inbound accepts peer prefixes only
Notice inbound route-map
  Set local preference higher than default
  ensures that if the same prefix is heard via AS100 upstream, the best path for traffic is via the IXP
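The effect of the inbound route-map can be sketched as a one-step path selection. This is a deliberately reduced model that compares local preference only (real BGP best-path selection has many more steps); it assumes the IOS default local preference of 100, and `best_path` is a hypothetical helper.

```python
DEFAULT_LOCAL_PREF = 100  # IOS default when no local preference is set


def best_path(paths):
    """Pick the path with the highest local preference.

    Only the local-preference step is modelled here; the rest of the
    BGP decision process is ignored.
    """
    return max(paths, key=lambda p: p.get("local_pref", DEFAULT_LOCAL_PREF))


if __name__ == "__main__":
    paths = [
        {"via": "upstream"},                  # learned from transit, default 100
        {"via": "ixp", "local_pref": 150},    # tagged by route-map set-local-pref
    ]
    print(best_path(paths)["via"])
```

Because 150 is higher than the default, the path learned at the exchange wins whenever the same prefix is also heard from the upstream.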
314
Exchange Point
Ethernet port configuration
  Be aware of LAN configuration best practices
  Switch off proxy arp, redirects and broadcasts (if not already default)
IXP border router must NOT carry prefixes with origin outside local AS and IXP participant ASes
  Helps prevent “stealing of bandwidth”
315
Exchange Point
Issues:
  AS100 needs to know all the prefixes its peers are announcing
  New prefixes require the prefix-lists to be updated
Alternative solutions
  Use the Internet Routing Registry to build prefix list
  Use AS Path filters (could be risky)
316
More Complex BGP example
Exchange Point Configuration
317
Exchange Point Example
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP allowed
  ISPs at exchange points provide transit to their BGP customers
318
Exchange Point
Each of these represents a border router in a different autonomous system
[Diagram: border routers A to F at the exchange point, one in each of AS100, AS110, AS120, AS130, AS140 and AS150; AS200 and AS201 also shown]
Exchange Point
Router A configuration
interface fastethernet 0/0
 description Exchange Point LAN
 ip address 120.5.10.1 255.255.255.224
 no ip directed-broadcast
 no ip proxy-arp
 no ip redirects
!
router bgp 100
 neighbor ixp-peers peer-group
 neighbor ixp-peers send-community
 neighbor ixp-peers prefix-list bogons out
 neighbor ixp-peers filter-list 10 out
 neighbor ixp-peers route-map set-local-pref in
...next slide
Filter by ASN rather than by prefix – and block bogons too
Exchange Point
 neighbor 120.5.10.2 remote-as 110
 neighbor 120.5.10.2 peer-group ixp-peers
 neighbor 120.5.10.2 prefix-list peer110 in
 neighbor 120.5.10.3 remote-as 120
 neighbor 120.5.10.3 peer-group ixp-peers
 neighbor 120.5.10.3 prefix-list peer120 in
 neighbor 120.5.10.4 remote-as 130
 neighbor 120.5.10.4 peer-group ixp-peers
 neighbor 120.5.10.4 prefix-list peer130 in
 neighbor 120.5.10.5 remote-as 140
 neighbor 120.5.10.5 peer-group ixp-peers
 neighbor 120.5.10.5 prefix-list peer140 in
 neighbor 120.5.10.6 remote-as 150
 neighbor 120.5.10.6 peer-group ixp-peers
 neighbor 120.5.10.6 prefix-list peer150 in
321
Exchange Point
ip route 121.10.0.0 255.255.224.0 null0
!
ip as-path access-list 10 permit ^$
ip as-path access-list 10 permit ^200$
ip as-path access-list 10 permit ^201$
!
ip prefix-list peer110 permit 122.0.0.0/19
ip prefix-list peer120 permit 122.30.0.0/19
ip prefix-list peer130 permit 122.12.0.0/19
ip prefix-list peer140 permit 122.18.128.0/19
ip prefix-list peer150 permit 122.1.32.0/19
!
route-map set-local-pref permit 10
 set local-preference 150
322
Exchange Point
Notice the change in router A’s configuration
  Filter-list instead of prefix-list permits local and customer ASes out to exchange
  Prefix-list blocks Special Use Address prefixes – rest get out, could be risky
Other issues as previously
This configuration will not scale as more and more BGP customers are added to AS100
  As-path filter has to be updated each time
Solution: BGP communities
323
More scalable BGP example
Exchange Point Configuration
324
Exchange Point Example (Scalable)
Exchange point with 6 ASes present
  Layer 2 – ethernet switch
Each ISP peers with the other
  NO transit across the IXP allowed
  ISPs at exchange points provide transit to their BGP customers
  (Scalable solution is presented here)
325
Exchange Point
Each of these represents a border router in a different autonomous system - each ASN has BGP customers of their own
[Diagram: border routers A to F at the exchange point, one in each of AS100, AS110, AS120, AS130, AS140 and AS150]
Router configuration
Take AS100 as an example
  Has 15 BGP customers, in AS501 to AS515
Create a peer-group for IXP peers
  All outbound policy to each peer will be the same
Communities will be used
  AS-path filters will not scale well
Community Policy
  AS100 aggregate put into 100:1000
  All BGP customer aggregates go into 100:1100
327
Creating a peer-group & route-map
router bgp 100
 neighbor ixp-peers peer-group
 neighbor ixp-peers send-community
 neighbor ixp-peers route-map ixp-peers-out out
 neighbor ixp-peers route-map set-local-pref in
!
ip community-list 10 permit 100:1000
ip community-list 11 permit 100:1100
!
route-map ixp-peers-out permit 10
 match community 10 11
!
route-map set-local-pref permit 10
 set local-preference 150
!
AS100 aggregate (community-list 10)
AS100 BGP customers (community-list 11)
Prefixes heard from IXP peers have highest preference (route-map set-local-pref)
BGP configuration for IXP router
router bgp 100 neighbor 120.5.10.2 remote-as 110 neighbor 120.5.10.2 peer-group ixp-peer neighbor 120.5.10.2 prefix-list peer110 in neighbor 120.5.10.3 remote-as 120 neighbor 120.5.10.3 peer-group ixp-peers neighbor 120.5.10.3 prefix-list peer120 in...etc
Remaining configuration is the same as earlier
Note the reliance again on inbound prefix-lists for peers
  Peers need to update the ISP if filters need to be changed
  And that’s what the IRR is for (otherwise use email)
BGP configuration for AS100’s customer aggregation router
router bgp 100
 ! Set community on AS100 aggregate
 network 121.10.0.0 mask 255.255.192.0 route-map set-comm
 neighbor 121.10.4.2 remote-as 501
 neighbor 121.10.4.2 prefix-list as501-in in
 neighbor 121.10.4.2 prefix-list default out
 ! Set community on BGP customer routes
 neighbor 121.10.4.2 route-map set-cust-policy in
...etc
!
route-map set-comm permit 10
 set community 100:1000
!
route-map set-cust-policy permit 10
 set community 100:1100
!
Scalable IXP policy
ISP Community policy is set on ingress
ISP now relies on communities to determine what is announced at the IXP
  No need to update any as-path filters, prefix-lists, etc.
If a BGP customer announces more prefixes, only the filters at the aggregation edge need to be updated
  And those new prefixes will automatically be tagged with the community to allow them through to AS100’s IXP peers
Consult the BGP community presentation for more extensive examples
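The ingress-tagging idea can be sketched in a few lines of Python. This is a simplified model rather than router behaviour: the function names, the `rib` list and the two customer prefixes (taken from documentation address ranges) are assumptions for illustration. Only the community values 100:1000 and 100:1100 and the aggregate 121.10.0.0/18 (mask 255.255.192.0) come from the configuration shown earlier.

```python
AS100_AGGREGATE = "100:1000"  # set by route-map set-comm
AS100_CUSTOMERS = "100:1100"  # set by route-map set-cust-policy
IXP_EXPORT = {AS100_AGGREGATE, AS100_CUSTOMERS}  # community-lists 10 and 11


def receive_from_customer(rib, prefix, accepted_prefixes):
    """Aggregation router ingress: inbound prefix-list, then tag with community."""
    if prefix in accepted_prefixes:
        rib.append({"prefix": prefix, "communities": {AS100_CUSTOMERS}})


def ixp_peers_out(rib):
    """IXP router egress: announce only routes carrying an export community."""
    return [r["prefix"] for r in rib if r["communities"] & IXP_EXPORT]


# Hypothetical inbound filter for the customer in AS501.
as501_in = {"198.51.100.0/24"}
rib = [
    {"prefix": "121.10.0.0/18", "communities": {AS100_AGGREGATE}},
    {"prefix": "192.0.2.0/24", "communities": set()},  # untagged, stays inside
]
receive_from_customer(rib, "198.51.100.0/24", as501_in)

# The customer announces a new prefix: only the edge filter is updated.
as501_in.add("203.0.113.0/24")
receive_from_customer(rib, "203.0.113.0/24", as501_in)

announced = ixp_peers_out(rib)
print(announced)
```

Note that `ixp_peers_out` is never edited when the customer adds a prefix; the only change is to the inbound filter at the aggregation edge.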
Route Servers
IXP operators quite often provide a Route Server to assist with scaling the BGP mesh
  All prefixes sent to a Route Server are usually distributed to all ASNs that peer with the Route Server
  (although some IXPs offer ISPs the facility to configure specific policies on their Route Server)
BGP configuration to peer with a Route Server is the same as for any other ordinary peer
  But note that the route server will offer prefixes from several ASNs (the IXP membership who choose to participate)
  Inbound filter should be constructed appropriately
Route Servers
Route Server software suppresses the ASN of the RS so that it doesn’t appear in the AS-path
IOS by default will not accept prefixes from a neighbouring AS unless that AS is first in the AS-path
router bgp 100
no bgp enforce-first-as
neighbor x.x.x.a remote-as 65534
neighbor x.x.x.a route-map IXP-RS-in in
neighbor x.x.x.a route-map ixp-peers-out out
Needed so that IOS can receive prefixes without AS65534 being first in path
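A minimal sketch of the first-AS check may help explain why it has to be disabled for a route server session. The function below is a hypothetical model, not IOS code, and the AS numbers 110 and 601 in the example path are made up for illustration; AS65534 is the route server ASN from the configuration above.

```python
def accept_update(as_path, neighbor_as, enforce_first_as=True):
    """Return True if an eBGP update passes the first-AS check.

    as_path is a list of AS numbers, leftmost (most recent) first.
    """
    if enforce_first_as and (not as_path or as_path[0] != neighbor_as):
        return False
    return True


ROUTE_SERVER_AS = 65534

# The route server does not insert its own ASN, so the path starts
# with the originating IXP member (hypothetical AS110 here).
path_via_rs = [110, 601]

print(accept_update(path_via_rs, ROUTE_SERVER_AS))                          # default check
print(accept_update(path_via_rs, ROUTE_SERVER_AS, enforce_first_as=False))  # check disabled
print(accept_update(path_via_rs, 110))                                      # direct session with AS110
```

With the check enabled, the update from the route server is rejected because AS65534 is not first in the path; with it disabled, the same update is accepted. A direct session with AS110 passes either way.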
Summary
Exchange Point Configuration
Summary
Ensure that BGP is scalable on your IXP peering router
  Manually updating filters every time a new customer connects is tiresome and has potential to cause errors
Only carry local ASN prefixes and customer routes on the IXP peering router
  Anything else (e.g. default or full BGP table) has the potential to result in bandwidth theft
Filter IXP peer announcements
  Inbound – use the IRR if maintaining prefix-lists is difficult
  Outbound – use communities for scalability
BGP Configuration for IXPs
ISP Training Workshops