8/13/2019 Automatic Protection Switching
1/79
Automatic Protection Switching
Yaakov (J) Stein
CTO
RAD Data Communications
Mar 2012
Y(J)S APS Slide 2
Course Outline
General protection switching principles
Examples of protection mechanisms
SONET/SDH
Ethernet linear protection
Ethernet ring protection
MPLS fast reroute
MPLS-TP APS
General principles
Definition
References
Traffic types
Network topologies
Triggers
Protection classes
Entities
Protection types
Signaling
Definition
Automatic Protection Switching (APS)
is a functionality of carrier-grade transport networks
is often called resilience
since it enables service to quickly recover from failures
is required to ensure high reliability and availability
APS includes :
detection of failures (signal fail or signal degrade) on a working channel
switching traffic transmission to a protection channel
selecting traffic reception from the protection channel
(optionally) reverting back to the working channel once failure is repaired
Automatic means
uses (at most) control plane protocols
no management layer or manual operations needed
Some useful references
G.808.1: generic linear protection
G.808.2: generic ring protection (not yet written)
G.841 and G.842: SDH
G.774.3/4/9/10: SDH protection management
G.870 and G.873.1: OTN
G.8031: Ethernet linear protection
G.8032: Ethernet ring protection
G.8131: T-MPLS APS
Y.1720: MPLS
I.630: ATM
M.495: analog signal protection
G.781: clock selection (can be used to protect synchronization)
RFC 4090: MPLS Fast ReRoute
RFC 6372: MPLS-TP Survivability Framework
RFC 6378: MPLS-TP Linear Protection
Traffic types
In a network with APS capabilities, there are three types of traffic :
protected traffic
traffic that may be rapidly switched to protection channel
at any time it may be on the working channel or protection channel
Nonpreemptible Unprotected Traffic (NUT)
noncritical traffic that does not require protection mechanism
not affected by protection mechanism
somewhat less expensive to customer
extra (preemptible) traffic
best effort background traffic that runs on protection channel
preempted (blocked) when protection channel is needed
very inexpensive to customer
Network topologies
APS can be defined for any topology with redundant links
e.g., for tree topologies no protection is possible
We will often discuss protection of individual links
However, there are two topologies that are of particular interest :
rings
protection is natural for rings
although there are other reasons for using rings as well
rings are so important that protection for other topologies
is often called linear protection
dense meshes
for this topology multiple local bypasses can be preconfigured
protection switching is similar to routing change, but faster
often called Fast ReRoute (FRR)
Triggers
Protection switching is usually triggered by a failure
although the operator may manually force a protection switch
A failure is declared when a fault condition
persists long enough for the ability to perform the required function
to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)
and may be :
detected by physical layer
indicated by signaling (e.g. AIS)
detected by OAM mechanisms
When there is no SF or SD, the state is called No Request (NR)
Switching time (1)
SONET/SDH protection switching takes place in under 50 ms
Regarding multiplex section shared protection rings, G.841 states :
The following network objectives apply:
1) Switch time In a ring with no extra traffic, all nodes in the idle state (no detected failures,
no active automatic or external commands, and receiving only Idle K-bytes), and with less
than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single
span shall be less than 50 ms. On rings under all other conditions, the switch completion
time can exceed 50 ms (the specific interval is under study) to allow time to remove extra
traffic, or to negotiate and accommodate coexisting APS requests.
while for linear VC trail protection, it says :
The following network objectives apply:
1) Switch time
The APS algorithm for LO/HO VC trail protection shall operate as fast as
possible. A value of 50 ms has been proposed as a target time. Concerns have been
expressed over this proposed target time when many VCs are involved. This is for further
study. Protection switch completion time excludes the detection time necessary to initiate the
protection switch, and the hold-off time.
There are similar statements in other clauses as well
Switching time (2)
This 50 ms time has become the gold standard
and new protection schemes are expected to meet this objective
However, studying the literature that led up to the SONET/SDH standards
shows that the objective was to attain the minimum possible time
for the sum of
persistent (i.e. non-transient) failure detection
speed of light propagation
signaling protocol time
regaining sync alignment
and 50 ms was the minimum that was considered practical !
Many modern standards have built in 50 ms
and much marketing literature boasts faster than 50 ms
But there is really nothing special about 50 ms
50 ms gaps in voiced speech are noticeable,
but not fatal if infrequent
50 ms of data at high rates cannot be stored and later forwarded
timing circuits can withstand much more than 50 ms without clock
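To put the store-and-forward remark in numbers, here is a minimal sketch that computes how much data arrives during a 50 ms gap. The line rates are illustrative assumptions, not values taken from the slides.

```python
def bytes_in_gap(rate_bps: float, gap_s: float = 0.050) -> float:
    """Bytes that arrive at a given line rate during a gap of gap_s seconds."""
    return rate_bps * gap_s / 8


# Illustrative line rates (assumed for the example)
for label, rate_bps in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9), ("100 Gb/s", 100e9)]:
    megabytes = bytes_in_gap(rate_bps) / 1e6
    print(f"{label}: {megabytes:.2f} MB would need buffering")
```

The amount grows linearly with the line rate, which is why buffering through the gap becomes impractical at high rates.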
Protection classes
It is useful to distinguish two different protection classes
path protection (AKA trail protection, end-to-end protection)
when a failure is detected on the end-to-end path
we switch to an alternative end-to-end path
the failure is usually detected by end-to-end OAM
local protection (AKA local restoration, SNC protection, bypass, detour)
we protect individual network elements, links, or groups of same
when such an entity fails
only that local entity is bypassed
the failure may be detected by link OAM or physical layer means
APS entities (1)
The following entities are important in APS
working channel: channel used when no failure exists
protection channel: channel used when a failure exists
head-end: entity transmitting data to the working/protection channel
tail-end: entity receiving data from the working/protection channel
Note: we will usually consider traffic to be bidirectional
so that the head-end for one direction
is the tail-end for the opposite direction
[Figure: head-end and tail-end connected by a working channel and a protection channel]
APS entities (2)
Bridge: function at head-end that connects traffic (including extra traffic) to the
working and protection channels
Selector: function at tail-end that extracts traffic (perhaps extra traffic) from
the working or protection channel
APS signaling channel: channel used to communicate between head-end
and tail-end for APS purposes
Trail termination: function responsible for failure detection
including injection and extraction of OAM
[Figure: head-end (bridge) and tail-end (selector) connected by a working channel, a protection channel, and a signaling channel]
Revertive operation
Reversion means returning to use the working channel
after the failure has been rectified
Protection mechanisms can be revertive or nonrevertive
Revertive mechanisms may be preferable
when the working channel has better performance (free BW, BER, delay)
when there are frequent switches (easier to manage)
when there is extra traffic
but nonrevertive also has advantages
only one service disruption due to protection switching
may be simpler to implement
Uni/bi-directional
We will usually consider bidirectional traffic
but even then the failures can be uni- or bi- directional
and for unidirectional failures there can be uni- or bi- directional switching
[Figure: four panels showing a unidirectional failure, a bidirectional failure, unidirectional protection (protection channel in use in one direction only), and bidirectional protection (protection channel in use in both directions)]
Uni- / bi- directional switching
Unidirectional switching may be advantageous
for 1+1 - faster and no signaling channel is needed
no unnecessary service disruption for direction without failure
higher chance of protection under multiple failures
easier to implement for local protection
maintains extra traffic in direction without failure
But bidirectional may be preferable
easier management since directions traverse same network elements
does not disrupt delay balance between directions
may simplify repair since failed spans are unused
Protection types
We distinguish several different protection types
1+1
1:1
1:n
m:n
(1:1)^n
Each type has its applicability, advantages, and disadvantages
and there are trade-offs between
simplicity
BW consumption
protection switch time
signaling requirements
1+1 protection
Simplest and fastest form of protection
but wasteful - only 50% of actual physical capacity is used
Head-end bridge always sends data on both channels
Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need for APS signaling
If non-revertive
there is no distinction between working and protection channels
[Figure: head-end and tail-end connected by channel A and channel B]
1:1 protection
Head-end bridge usually sends data on working channel
When failure detected it starts sending data over protection channel
and tail-end needs to select the protection channel
When not in use, protection channel can be used for extra traffic
However, since failure is detected by tail-end, APS signaling is needed
Protection channel should have OAM running to ensure its functionality
[Figure: working channel, protection channel carrying extra traffic, and APS signaling]
1:n protection
One protection channel is allocated for n working channels
Can protect only one working channel at a time
but it is improbable that more than 1 working channel will fail simultaneously
Only 1/(n+1) of total capacity is reserved for protection
[Figure: n working channels sharing one protection channel]
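The 1/(n+1) figure can be checked with a short sketch. It assumes all channels have equal capacity, which the slide implies but does not state; the function name is made up for this example.

```python
from fractions import Fraction


def reserved_fraction(n: int) -> Fraction:
    """Fraction of total capacity reserved for protection in 1:n.

    Assumes n working channels and 1 protection channel of equal size.
    """
    if n < 1:
        raise ValueError("need at least one working channel")
    return Fraction(1, n + 1)


for n in (1, 4, 14):
    print(f"1:{n} reserves {reserved_fraction(n)} of total capacity")
```

With n = 1 this reproduces the 50% overhead of 1:1, and the overhead shrinks as more working channels share the single protection channel.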
m:n protection
To enable protection of more than 1 channel
m protection channels are allocated for n working channels (m < n)
m simultaneous failures can be protected
Less protection capacity dedicated than for n times 1:1
When failure detected,
1 of the m protection channels needs to be assigned and signaled
High complexity but conserves resources
[Figure: n working channels sharing m protection channels]
(1:1)^n protection
This is like n times 1:1 but the n protection channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n since
n protection channels are preconfigured
n working channels need not be of the same type
Protection bandwidth must be at least that of the largest working channel
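As a rough illustration of the bandwidth rule, the sketch below compares shared protection bandwidth with n independent 1:1 instances. The channel rates are hypothetical, and the function names are made up for this example.

```python
def shared_protection_bw(working_bw: list[float]) -> float:
    """Minimum shared protection bandwidth: that of the largest working channel."""
    return max(working_bw)


def dedicated_protection_bw(working_bw: list[float]) -> float:
    """Protection bandwidth if every working channel had its own 1:1 channel."""
    return sum(working_bw)


working = [100.0, 155.0, 1000.0]  # Mb/s, hypothetical mix of channel types
print("shared   :", shared_protection_bw(working), "Mb/s")
print("dedicated:", dedicated_protection_bw(working), "Mb/s")
```

The saving comes at the cost stated on the slide: only one failed working channel can be protected at a time.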
APS algorithm
We have seen that protection switching is a tricky business
So it is not surprising that network elements that support APS
run an APS algorithm
This algorithm inputs :
configuration (protection type, revertive?, available channels, ...)
failure indications (NR, SF, SD)
operator commands
APS signaling (more on that soon)
and makes switching decisions
The algorithm maintains state information for head-end and tail-end
APS algorithms are detailed in standards documents
Priority
Not every failure event / operator command results in a protection switch
For example
in 1:n protection the protection channel may already be in use !
Conflicts are resolved by assigning priorities to events/commands
When an event is detected or a command received
the APS algorithm will not act
if an event/command of equal or higher priority is already in effect
True failure conditions usually have higher priority than manual commands
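The priority rule can be sketched in a few lines. The ordering below is an illustrative assumption that only reflects the statement that failures outrank manual commands; each standard defines its own priority table.

```python
from enum import IntEnum


class Request(IntEnum):
    # Illustrative ordering only (assumed); higher value = higher priority
    NO_REQUEST = 0
    MANUAL_SWITCH = 1
    SIGNAL_DEGRADE = 2
    SIGNAL_FAIL = 3


def arbitrate(in_effect: Request, incoming: Request) -> Request:
    """Return the request in effect after 'incoming' is evaluated.

    The incoming request is acted upon only if it has strictly higher
    priority than the one already in effect.
    """
    return incoming if incoming > in_effect else in_effect


print(arbitrate(Request.SIGNAL_FAIL, Request.MANUAL_SWITCH).name)
print(arbitrate(Request.NO_REQUEST, Request.SIGNAL_DEGRADE).name)
```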
Timers
Even failure events with priority are not acted upon immediately
to do so would cause unnecessary switches after transient defects
The APS algorithm may maintain several timers, such as
Holdoff timers
the time between detection of a SF or SD event
and the APS algorithm acting upon this event
the algorithm usually used is called peek twice
i.e., the condition is checked again after the timer expires
Wait To Restore timer
for revertive switching, the time between detection of the failure being
cleared and the APS algorithm acting upon this event
also used in SDH optimized bidirectional 1+1 (nonrevertive)
Guard timer
for rings
blockout time during which APS messages are ignored (since they may be old and outdated)
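A minimal sketch of the peek-twice hold-off check follows. The defect is modeled as a function of time, and the 20 ms transient and 100 ms hold-off are assumed values chosen only for illustration.

```python
from typing import Callable


def should_switch(defect_present: Callable[[float], bool],
                  t_detect: float, holdoff_s: float) -> bool:
    """Peek twice: act only if the defect seen at t_detect
    is still present when the hold-off timer expires."""
    if not defect_present(t_detect):
        return False
    return defect_present(t_detect + holdoff_s)


def transient(t: float) -> bool:
    """Hypothetical defect that lasts only 20 ms."""
    return 0.0 <= t < 0.020


def persistent(t: float) -> bool:
    """Hypothetical defect that never clears."""
    return t >= 0.0


print(should_switch(transient, 0.0, 0.100))   # transient defect is ignored
print(should_switch(persistent, 0.0, 0.100))  # persistent defect triggers a switch
```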
APS signaling
In all types except unidirectional 1+1, some APS signaling is needed
APS signaling is used to synchronize between head-end and tail-end
It is critical that head-end and tail-end always be in the same state
Example messages include :
No Request (NR)
by tail-end to inform head-end of Signal Failure (SF)
by head-end to confirm the event's priority
by head-end to report the particular protection channel
by head-end to inform tail-end of Reverse (bidirectional) Request (RR)
by tail-end after failure cleared to Wait To Restore (WTR)
by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive
APS signaling phases
When APS signaling is used, it needs to be as rapid as possible
Depending on the scenario it may be
1-phase: tail → head (fastest)
tail-end informs head-end of failure
both ends uniquely know the protection channel to be used
only for 1+1 and unidirectional-(1:1)^n (including 1:1)
2-phase: 1) tail → head 2) head → tail
tail-end informs head-end of failure
head-end signals that it has switched to protection channel
not for bidirectional-1:n or m:n
3-phase: 1) tail → head 2) head → tail 3) tail → head (slowest)
works for all protection types (including m:n)
Examples of 1-phase
An example of when 1-phase signaling is possible is 1:1 or (1:1)^n
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes its selector (blind switch)
2. upon receipt the head-end changes the bridge setting
(no priority is checked)
1-phase can also be used for bidirectional 1:1
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes both its selector and bridge
2. upon receipt the head-end changes its bridge and selector
Example of 2-phase
2-phase is useful for unidirectional 1:n with priority checking
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority
sends confirmation to tail-end (with identity of working channel)
the bridge setting is changed
3. the tail-end changes its selector
Example of 3-phase
3-phase signaling is imperative for bidirectional 1:n
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority, and sends confirmation to tail-end
head-end changes its bridge setting
and also sends a reverse request
3. the tail-end changes selector
checks priority and sends confirmation to head-end
tail-end changes its bridge setting (as head-end of opposite direction)
head-end receives confirmation and changes its selector
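The 3-phase exchange can be traced with a small simulation. This is a sketch of the message order described in the steps above, not an implementation of any standard; the message labels and dictionary keys are made up for the example, and the priority checks are assumed to pass.

```python
def three_phase_switch():
    """Trace the bridge/selector changes of a bidirectional 3-phase switch."""
    head = {"bridge": "working", "selector": "working"}
    tail = {"bridge": "working", "selector": "working"}
    log = []

    # Phase 1 (tail -> head): tail-end reports the failure, selector unchanged
    log.append(("tail->head", "SF"))

    # Phase 2 (head -> tail): head-end checks priority (assumed to pass),
    # changes its bridge, and sends confirmation plus a reverse request
    head["bridge"] = "protection"
    log.append(("head->tail", "confirm + reverse request"))

    # Phase 3 (tail -> head): tail-end changes its selector, then (as head-end
    # of the opposite direction) its bridge, and confirms
    tail["selector"] = "protection"
    tail["bridge"] = "protection"
    log.append(("tail->head", "confirm"))

    # Head-end receives the confirmation and changes its selector
    head["selector"] = "protection"
    return head, tail, log


head, tail, log = three_phase_switch()
for direction, message in log:
    print(direction, message)
```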
For G.805 buffs
to add 1+1 trail protection to a trail - expand a trail termination function
we use a special transport processing function - the protection switch
the unprotected TTs report status
to the protection switch
[Figure: an unprotected trail expanded into a protected trail]
SONET/SDH APS
SONET protection ?
SONET/SDH networks need to be highly reliable (five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for manual intervention)
Upon detection of a failure (dLOS, dLOF, high BER)
the network must reroute traffic (protection switching)
from working channel to protection channel
SDH APS is unidirectional
SDH APS may be revertive
[Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]
SONET/SDH layers
Between regenerators there are sections (regenerator sections)
Between ADMs there are lines (multiplex sections)
Between path terminations there are paths
Protection can be at OC-n level (different physical fibers)
or at STM/VC level
or end-to-end path (trail protection)
[Figure: a path between two path terminations, made up of lines (multiplex sections) between line terminations (ADMs), each made up of sections between section terminations (regenerators)]
Line APS
[Figure: frame of 9 rows by 90 columns, showing the TOH (3 rows plus 6 rows) and the Synchronous Payload Envelope]
TOH consists of
3 rows of section overhead - frame sync, trace, EOC,
6 rows of line overhead - pointers, SSM, FEBE, and
Line APS signaling uses bytes K1 and K2
A1 A2 J0
B1 E1 F1
D1 D2 D3
H1 H2 H3
B2 K1 K2
D4 D5 D6
D7 D8 D9
DA DB DC
S1 M0 E2
HO Path APS
POH is responsible for type, status, path performance monitoring, VCAT, trace
HO Path APS signaling uses 4 MSBs of byte K3
POH bytes (in order): J1, B3, C2, G1, F2, H4, F3, K3, N1
LO Path APS
VC OH is responsible for
Timing, PM, REI,
LO Path APS signaling is
4 MSBs of byte K4
VC OH bytes: V5, J2, N2, K4
[Figure: VC OH bytes shown alongside bytes V1, V2, V3, V4]
How does it work?
Head-end and tail-end NEs have bridges (muxes)
Head-end and tail-end NEs maintain bidirectional signaling channel
Signaling is contained in K bytes of the protection channel
For line APS
K1: tail-end status and requests
K2: head-end status
[Figure: head-end bridge and tail-end bridge connected by a working channel and a protection channel that carries the signaling channel]
Linear 1+1 protection
Can be at OC-n level (different physical fibers)
or at STM/VC level (SubNetwork Connection Protection)
or end-to-end path (called trail protection)
Head-end bridge always sends data on both channels
Tail-end chooses channel to use based on BER, dLOS, etc.
No need for signaling
If non-revertive
there is no distinction between working and protection channels
[Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]
Linear 1:1 protection
Head-end bridge usually sends data on working channel
When tail-end detects failure it signals (using K1) to head-end
Head-end then starts sending data over protection channel
When not in use
protection channel can be used for (discounted) extra traffic
(pre-emptible unprotected traffic)
May be at any layer (but only OC-n level protects against fiber cuts)
[Figure: working channel and protection channel, with extra traffic on the protection channel]
Linear 1:N protection
In order to save BW
we allocate 1 protection channel for every N working channels
N limited to 14
4 bits in K1 byte from tail-end to head-end
0: protection channel
1-14: working channels
15: extra traffic channel
[Figure: several working channels sharing one protection channel]
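The 4-bit channel number explains the limit of 14 working channels. The sketch below decodes it, assuming the channel number occupies the low-order four bits of K1; that bit position is an assumption of this example and is not stated on the slide.

```python
def k1_channel(k1: int) -> str:
    """Interpret the 4-bit channel number carried in a K1 byte.

    Assumes the channel number is in the low-order nibble.
    """
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 is a single byte")
    channel = k1 & 0x0F
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


for value in (0x00, 0x03, 0x0E, 0x0F):
    print(f"K1=0x{value:02X}: {k1_channel(value)}")
```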
Two fiber vs. Four-fiber rings
Ring based protection is popular in North America (100K+ rings)
Full protection against physical fiber cuts
Simpler and less expensive than mesh topologies
Protection at line (multiplexed section) or path layer
Four-fiber rings
fully redundant at OC level
can support bidirectional routing at line layer
Two-fiber rings
support unidirectional routing at line layer
2 fibers in opposite directions
[Figure: unidirectional and bidirectional rings]
Unidirectional vs. bidirectional
Unidirectional routing
working channel B-A is in the same direction (e.g. clockwise) as A-B
management simplicity: A-B and B-A can occupy same timeslots
inefficient: waste in ring BW and excessive delay in one direction
Bidirectional routing
A-B and B-A are opposite in direction
both using shortest route
spatial reuse: timeslots can be reused in other sections
[Figure: two rings; with unidirectional routing A-B and B-A travel in the same direction, with bidirectional routing A-B, B-A, B-C, and C-B each use the shortest route]
UPSR vs. BLSR (MS-SPRing)
Of all the possible combinations, only a few are in use
Unidirectional (routing) Path Switched Rings
protects tributaries
extension of 1+1 to ring topology
Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)
called Multiplex Section Shared Protection Ring in SDH
simultaneously protects all tributaries in STM
extension of 1:1 to ring topology
[Figure: combinations of path vs. line switching, two-fiber vs. four-fiber, and unidirectional vs. bidirectional routing, marking UPSR and BLSR]
UPSR
Working channel is in one direction
protection channel in the opposite direction
All path traffic is added in both directions (1+1)
decision as to which to use is made at drop point (no signaling)
Normally non-revertive, so effectively two diversity paths
Good match for access networks
1 access resilient ring
less expensive than fiber pair per customer
Inefficient for core networks
no spatial reuse
every signal in every span in both directions
node needs to continuously monitor every tributary to be dropped
[Figure: SONET ADM on 2 rings]
BLSR
Switch at line level
less monitoring
When failure detected tail-end NE signals head-end NE
Works for unidirectional/bidirectional fiber cuts, and NE failures
Two-fiber version
half of OC-N capacity devoted to protection
only half capacity available for traffic
Four-fiber version
full redundant OC-N devoted to protection
twice as many NEs as compared to two-fiber
Example
recovery from unidirectional fiber cut
[Figure: wrap-around on 2 rings]
Ethernet linear APS
STP
LAG
G.8031
STP
The original Spanning Tree Protocol automatically removed loops
from arbitrary networks (with loops)
However, its convergence was very slow (about a minute)
STP cannot be used as a protection mechanism
since its reconvergence time is very long
due to a cumbersome protocol and long holdoff timer settings
An evolutionary update called Rapid STP (802.1w)
was incorporated into 802.1D-2004 clause 17
that converges in about the same time as STP
but can reconverge after a topology change in less than 1 second
RSTP can be used to detect failures and reconverge
and thus can be used as a primitive protection mechanism
However, the switching time will be many tens of ms to 100s of ms
Use of LAG
Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)
enables bonding several ports together as single uplink
Defined by 802.3ad task force and folded into 802.3-2000 as clause 43
Binding of ports to Link Aggregation Groups (LAGs) distributed via
Link Aggregation Control Protocol (LACP)
LACP uses slow protocol frames (up to 5 per second)
Links may be dynamically added/removed from LAG
and LACP continuously monitors to detect if changes needed
Upon link failure LAG delivers traffic at a reduced rate
Thus LAG can be used as a primitive protection mechanism
When used this way it is called worker/standby or N+N mode
The restoration time will be on the order of 1 second
G.8031
Q9 of SG15 in the ITU-T is responsible for protection switching
In 2006 it produced G.8031 Linear Ethernet Protection Switching
G.8031 uses standard Ethernet formats, but is incompatible with STP
The standard addresses
point-to-point VLAN connections
SNC (local) protection class
1+1 and 1:1 protection types
unidirectional and bidirectional switching for 1+1
bidirectional switching for 1:1
revertive and nonrevertive modes
1-phase signaling protocol
G.8031 uses Y.1731 OAM CCM messages in order to detect failures
G.8031 defines a new OAM opcode (39) for APS signaling messages
Switching times should be under 50 ms (only holdoff timers when groups)
G.8031 signaling
The APS signaling message looks like this :
regular APS messages are sent 1 per 5 seconds
after change 3 messages are sent at max rate (300 per sec)
where
req/state identifies the message (NR, SF, WTR, SD, forced switch, etc)
prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
requested and bridged signal identify incoming / outgoing traffic
since only 1+1 and 1:1 they are either null or traffic (all other values reserved)
MEL (3b) | VER=0 (5b) | OPCODE=39 (1B) | FLAGS=0 (1B) | OFFSET=4 (1B) | req/state (4b) | prot. type (4b) | requested sig (1B) | bridged sig (1B) | reserved (1B) | END=0 (1B)
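The field layout above can be packed into bytes as follows. This is a minimal sketch: it assumes fields are packed in the order listed with the first-listed field in the high-order bits, and the numeric values passed for req/state, prot. type, and the signal fields are placeholders, not codes taken from the standard.

```python
import struct

OPCODE_APS = 39  # APS opcode given on the slide


def build_aps_pdu(mel: int, req_state: int, prot_type: int,
                  requested: int, bridged: int) -> bytes:
    """Pack the 9-byte APS message body in the order the fields are listed."""
    if not 0 <= mel <= 7:
        raise ValueError("MEL is a 3-bit field")
    if not (0 <= req_state <= 15 and 0 <= prot_type <= 15):
        raise ValueError("req/state and prot. type are 4-bit fields")
    first = (mel << 5) | 0                   # MEL (3 bits), VER=0 (5 bits)
    req_prot = (req_state << 4) | prot_type  # two 4-bit fields in one byte
    return struct.pack("!9B",
                       first, OPCODE_APS,
                       0,          # FLAGS=0
                       4,          # OFFSET=4
                       req_prot, requested, bridged,
                       0,          # reserved
                       0)          # END=0


# Placeholder field values, chosen only to show the packing
pdu = build_aps_pdu(mel=7, req_state=0b1011, prot_type=0b1111,
                    requested=1, bridged=0)
print(pdu.hex())
```

Note that OFFSET=4 is consistent with this layout: four bytes (req/state with prot. type, requested sig, bridged sig, reserved) sit between the offset field and END.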
G.8031 1:1 revertive operation
In the normal (NR) state :
head-end and tail-end exchange CCM (at 300 per second rate)
on both working and protection channels
head-end and tail-end exchange NR APS messages
on the protection channel (every 5 seconds)
When a failure appears in the working channel
tail-end stops receiving 3 CCM messages on working channel
tail-end enters SF state
tail-end sends 3 SF messages at 300 per second on the APS channel
tail-end switches selector (bi-d and bridge) to the protection channel
head-end (receiving SF) switches bridge (bi-d and selector) to protection channel
tail-end continues sending SF messages every 5 seconds
head-end sends NR messages but with bridged=normal
When the failure is cleared
tail-end leaves SF state and enters WTR state (typically 5 minutes, 5..12 min)
tail-end sends WTR message to head-end (in nonrevertive - DNR message)
tail-end sends WTR every 5 seconds
when WTR expires both sides enter NR state
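The tail-end behavior in revertive mode can be summarized as a three-state machine. The sketch below is a simplification under stated assumptions: the event names are made up for this example, and cases the slide does not describe (such as a new failure during WTR) are left as no-ops.

```python
# Tail-end states in revertive 1:1 operation, as described above
TRANSITIONS = {
    ("NR", "failure_detected"): "SF",   # CCMs lost on the working channel
    ("SF", "failure_cleared"): "WTR",   # start the Wait To Restore timer
    ("WTR", "wtr_expired"): "NR",       # revert to the working channel
}


def step(state: str, event: str) -> str:
    """Return the next state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], state: str = "NR") -> list[str]:
    """Return the sequence of states visited, starting from 'state'."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


print(run(["failure_detected", "failure_cleared", "wtr_expired"]))
```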
Ethernet ring APS
G.8032
RPR
CLEER
Ethernet rings ?
Ethernet has become carrier grade :
deterministic connection-oriented forwarding
OAM
synchronization
The only thing missing to completely replace SDH is ring protection
However, Ethernet and ring architectures don't go together
Ethernet has no TTL, so looped traffic will loop forever
STP builds trees out of any architecture - no loops allowed
There are two ways to make an Ethernet ring
open loop
cut the ring by blocking some link
when protection is required - block the failed link
closed loop
disable STP (but avoid infinite loops in some way !)
when protection is required - steer and/or wrap traffic
Ethernet ring protocols
Open loop methods
G.8032 (ERPS)
rSTP (ex 802.1w)
RFER (RAD)
ERP (NSN)
RRST (based on RSTP)
REP (Cisco)
RRSTP (Alcatel)
RRPP (Huawei)
EAPS (Extreme, RFC 3619)
EPSR (Allied Telesis)
PSR (Overture)
Closed loop methods
RPR (IEEE 802.17)
CLEER and NERT (RAD)
G.8032
Q9 of SG15 produced G.8032 between 2006 and 2008
G.8032 is similar to G.8031
strives for 50 ms protection (< 1200 km, < 16 nodes)
but here this number is deceiving as MAC table is flushed
standard Ethernet format but incompatible with STP
uses Y.1731 CCM for failure detection
employs Y.1731 extension for R-APS signaling (opcode=40)
R-APS message format similar to APS of G.8031
(but between every 2 nodes and to MAC address 01-19-A7-00-00-01)
revertive and nonrevertive operation defined
However, G.8032 is more complex due to
requirement to avoid loop creation under any circumstances
need to localize failures
need to maintain consistency between all nodes on ring
existence of a special node (RPL owner)
RPL
G.8032v1 defines the Ring Protection Link (RPL)
as the link to be blocked (to avoid closing the loop) in NR state
One of the 2 nodes connected to the RPL
is designated the RPL owner
Unlike RFER
there is only one RPL owner
the RPL and owner are designated before setup
operation is usually revertive
All ring nodes are simultaneously in 1 of 2 modes - idle or protecting
in idle mode the RPL is blocked
in protecting mode the failed link is blocked and RPL is unblocked
in revertive operation
once the failure is cleared the blocked link is unblocked
and the RPL is blocked again
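The idle/protecting rule can be checked on a toy ring model. The sketch below is not the G.8032 state machine; the node numbering, the choice of RPL, and the function names are assumptions made for the example. It shows that blocking exactly one link, whether the RPL or the failed link, keeps the ring connected and loop-free.

```python
from collections import deque


def ring_links(n: int) -> list[tuple[int, int]]:
    """Links of an n-node ring: node i connects to node (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def blocked_link(rpl, failed=None):
    """Idle mode blocks the RPL; protecting mode blocks the failed link."""
    return rpl if failed is None else failed


def is_connected_and_loop_free(n: int, links, blocked) -> bool:
    """True if the unblocked links form a spanning tree of the n nodes."""
    active = [link for link in links if link != blocked]
    neighbors = {i: [] for i in range(n)}
    for a, b in active:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:  # breadth-first search from node 0
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # connected with exactly n - 1 links means no loop exists
    return len(seen) == n and len(active) == n - 1


links = ring_links(5)
rpl = (4, 0)  # hypothetical choice of RPL
print(is_connected_and_loop_free(5, links, blocked_link(rpl)))
print(is_connected_and_loop_free(5, links, blocked_link(rpl, failed=(1, 2))))
```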
G.8032 revertive operation
In the idle state :
adjacent nodes exchange CCM at 300 per second rate (including over RPL)
exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)
R-APS messages are never forwarded
When a failure appears between 2 nodes
node(s) missing CCM messages peek twice with holdoff time
node(s) block failed link and flush MAC table
node(s) send SF message (3 times @ max rate, then every 5 sec)
node receiving SF message will check priority and unblock any blocked link
node receiving SF message will send SF message to its other neighbor
in stable protecting state SF messages over every unblocked link
When the failure is cleared
node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)
RPL owner receiving NR starts WTR timer
when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB
G.8032-2010
After coming out with G.8032 in 2008 (G.8032v1)
the ITU came out with G.8032-2010 (G.8032v2) in 2010
This new version is not backwards-compatible with v1
but a v2 node must support v1 as well (but then operation is according to v1)
Major differences :
2 designated nodes - RPL owner node and RPL neighbor node
and for optional flush-optimization - next neighbor node
significant changes to
state machine
priority logic
commands (forced/manual/clear) and protocol
new Wait To Block timer
supports more general topologies (sub-rings)
ladders (For Further Study in v1)
multi-ring
ring topology discovery
virtual channel based on VLAN or MAC address
[Figure: a ring with two subrings, and a ladder topology; the RPL with the RPL owner, RPL neighbor, and RPL next neighbor nodes]
RPR (802.17)
Resilient Packet Rings
are compatible with standard Ethernet, but different frame format
are robust (lossless,
Basic RPR queuing
traffic from local source
first sent to ringlet selection
sent according to fairness
traffic going around ring
placed into internal buffer
in dual-transit queue mode placed into 1 of 2 buffers
according to service class
sent according to fairness
traffic for local sink
placed in output buffer
according to service class
PTQ / STQ = Primary / Secondary Transit Queue
Figure labels: PTQ, STQ, A, B, C, fairness
RPR service classes
class   use      info rate                D/FDV      FE
A0      RT       reserved                 low        No
A1      RT       allocated, reclaimable   low        No
B-CIR   near RT  allocated, reclaimable   bounded    No
B-EIR   near RT  opportunistic            unbounded  Yes
C       BE       opportunistic            unbounded  Yes
RPR defines 3 main classes
class A : real time (low latency/FDV)
class B : near real time (bounded predictable latency/FDV)
class C : best effort
RPR Class use
A0 ring BW is reserved, not reclaimed even if no traffic
in dual-transit queue mode:
class A frames from the ring are queued in PTQ
class B, C in STQ
priority for egress
frames in PTQ
local class A frames
local class B (when no frames in PTQ)
frames in STQ
local class C (when no PTQ, STQ, local A or B)
Notes:
class A frames have minimal delay
class B frames have higher priority than STQ transit frames, so bounded delay/FDV
classes B and C share STQ, so once in ring have similar delay
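The egress priority order for dual-transit queue mode can be sketched as a strict-priority selector. This is only an illustration: the function name `next_frame` and the queue labels are hypothetical, and the sketch ignores the fairness algorithm and any queue-fill thresholds that a real 802.17 implementation applies.

```python
from collections import deque


def next_frame(ptq, local_a, local_b, stq, local_c):
    """Return (source, frame) for the next frame to transmit, or None.

    Queues are checked in the slide's priority order:
    PTQ, local class A, local class B, STQ, local class C.
    """
    ordered = (
        ("PTQ", ptq),
        ("local A", local_a),
        ("local B", local_b),
        ("STQ", stq),
        ("local C", local_c),
    )
    for source, queue in ordered:
        if queue:                      # first non-empty queue wins
            return source, queue.popleft()
    return None


# Example: transit best-effort traffic waits behind local class B traffic.
ptq, local_a = deque(), deque()
local_b, stq, local_c = deque(["b1"]), deque(["t1"]), deque(["c1"])
order = []
while (picked := next_frame(ptq, local_a, local_b, stq, local_c)) is not None:
    order.append(picked)
```

Because local class B is served before the STQ, it sees bounded delay, while local class C is only served when everything else is empty.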
RPR - protection
rings give inherent protection against single point of failure
RPR specifies 2 mechanisms
steering
wrapping (optional)
(implementations may also do wrapping then steering)
Figure labels: wrap, steering info
NERT and CLEER
New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring
Similar to RPR but uses real Ethernet format
NERT and CLEER distinguish between
ring nodes
switches connected to ring nodes
Traffic in ring is MAC-in-MAC encapsulated
External MACs are of ring node
Internal MACs are original
Unexpected external MACs discarded
External MACs learned as in 802.1ah
Ring nodes forward according to table
NERT floods, CLEER never floods
Protection switch only involves changing table
so service restoration is fast
Figure labels: ring nodes, switches
MPLS fast reroute
IP FRR
RFC 4090
IP FRR
True protection mechanisms do not exist for connectionless IP
In practice, routing protocols discover breaks and recalculate routes
but this usually takes a long time
Link-state IGPs detect link-down state using hellos
for OSPF - typically every 10 sec, and detection after 40 sec
and then Dijkstra algorithm avoids the failed link
BFD can be used to speed up the detection
However,
the information still has to be propagated further (seconds?)
and FIBs updated (100s of ms)
Various IP Fast ReRoute (IP FRR) mechanisms have been proposed
but true protection is best done at the MPLS level
MPLS fast reroute
RSVP-TE enables MPLS traffic engineering by fine control over placement
specifies explicit path using information gathered from IGP
resources may be reserved at LSRs along the way
RFC 4090 defines extensions to RSVP-TE for Fast ReRoute (FRR)
LSRs along the path preconfigure local bypasses (detours)
Upon detection of failure by
BFD (specified in microseconds, typically 10s of ms) or
RSVP hellos (RFC default is 5 ms) or
RESV / PATH messages (driven by IGP)
upstream LSR simply enables the detour
Since this is a local action, it should be fast
RFC 4090 only discusses adding FRR to RSVP-TE network
but its use with LDP is possible if there is a single label generator
(not discussed in RFC 4090)
PLRs and MPs
The fundamental entities in MPLS FRR are
Point of Local Repair (PLR)
Merge Point (MP)
A PLR is the LSR before the failed element (link or node)
All LSRs except the egress LER can be PLRs
The PLR is solely responsible for the FRR (no explicit APS signaling)
During path setup, potential PLRs create detours towards the egress LER
An MP is the LSR where the detour rejoins the LSP
All LSRs except the ingress LER can be MPs
Figure labels: ingress LER, PLR, MP, egress LER
Methods
RFC 4090 defines two different protection methods
Usually one or the other is employed in a given network
One-to-one backup
each LSP protected separately
detour LSP created for each LSP at each potential PLR
no labels pushed
Facility backup
backup tunnel for multiple LSPs
bypass tunnel created at each potential PLR
uses label stacking
Figure labels: PLR, MP
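The difference between the two methods shows up in what the PLR does to the label stack when it switches to the backup. The following is a minimal sketch: the function names and all label values are hypothetical, and the label stack is modeled as a Python list with the top of stack first.

```python
def one_to_one_forward(stack, detour_label):
    """One-to-one backup: swap the top label for the detour LSP's label.

    The stack depth is unchanged (no labels pushed).
    """
    return [detour_label] + stack[1:]


def facility_forward(stack, mp_label, bypass_label):
    """Facility backup: swap to the label the MP expects for this LSP,
    then push the shared bypass tunnel label on top (label stacking)."""
    return [bypass_label, mp_label] + stack[1:]


incoming = [100]                                   # label on the protected LSP
detour = one_to_one_forward(incoming, detour_label=900)
bypass = facility_forward(incoming, mp_label=200, bypass_label=500)
```

With facility backup the same `bypass_label` can carry many protected LSPs, since each keeps its own inner label for the MP to act on.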
NHOP and NNHOP
MPLS FRR can bypass a failed link or a failed node
In order to bypass a single failed link
we need an alternative path to the next hop (NHOP)
In order to bypass a single failed node
we need an alternative path to the next next hop (NNHOP)
Figure labels: PLR, MP
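As an illustration of the two cases, a bypass can be found by searching the topology while excluding the protected element. This is a sketch only: it uses a plain breadth-first search on a made-up topology, whereas a real network would typically run a constrained path computation via RSVP-TE; the function names and node names are hypothetical.

```python
from collections import deque


def shortest_path(graph, src, dst, banned_nodes=(), banned_links=()):
    """Breadth-first search for a path that avoids the given nodes and links."""
    banned_nodes = set(banned_nodes)
    banned_links = {frozenset(link) for link in banned_links}
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:       # walk back to the source
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v in prev or v in banned_nodes:
                continue
            if frozenset((u, v)) in banned_links:
                continue
            prev[v] = u
            queue.append(v)
    return None


def nhop_bypass(graph, lsp, i):
    """Link protection: PLR lsp[i] reaches the next hop without the failed link."""
    return shortest_path(graph, lsp[i], lsp[i + 1],
                         banned_links=[(lsp[i], lsp[i + 1])])


def nnhop_bypass(graph, lsp, i):
    """Node protection: PLR lsp[i] reaches the next next hop without lsp[i+1]."""
    return shortest_path(graph, lsp[i], lsp[i + 2], banned_nodes=[lsp[i + 1]])


graph = {
    "A": ["B", "X", "Y"], "B": ["A", "C", "X"], "C": ["B", "D", "Y"],
    "D": ["C"], "X": ["A", "B"], "Y": ["A", "C"],
}
lsp = ["A", "B", "C", "D"]
link_bypass = nhop_bypass(graph, lsp, 0)
node_bypass = nnhop_bypass(graph, lsp, 0)
```

In both cases the last node of the returned path is the MP: the next hop for link protection, the next next hop for node protection.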
MPLS-TP APS
RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection
MPLS-TP resilience
Since it strives to be a carrier-grade transport network
TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
and indeed the arguments are inter-related
RFC 6372 gives a general framework
and differentiates between
linear
shared-mesh
and ring protection
Linear protection
from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
1+1, 1:1, 1:n and uni/bidi are supported
APS signaling protocol (for all modes except 1+1 uni)
is single-phase
and called the Protection State Coordination protocol
PSC messages are sent over the protection channel
APS messages are sent over the GACh with a single channel type
message functions identified by a request field
6 states: normal, protecting due to failure, admin protecting,
WTR, protection path unavailable, DNR
when revertive, a WTR timer is used
PSC message format
Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n
R = Revertive
FPath = which path has fault
Path = which data path is on protection channel
Message layout:
GAL  : GAL Label (13) | TC | S=1 | TTL
GACh : 0001 | VER | 00000000 | PSC channel type
PSC  : Ver | Request | PT | R | Res | FPath | Path
       TLV Length | Res
       Optional TLVs
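To make the PSC payload layout concrete, here is a hedged packing sketch. The field widths used (Ver 2 bits, Request 4, PT 2, R 1, reserved 7, FPath 8, Path 8, then a 16-bit TLV length and 16 reserved bits) are an assumption based on a reading of RFC 6378 and should be checked against the RFC, as should the numeric codes in the example. The function names `pack_psc` and `unpack_psc` are hypothetical, and the GAL and GACh headers are not included.

```python
import struct


def pack_psc(request, pt, r_bit, fpath, path, ver=1, tlvs=b""):
    """Pack the PSC payload; all fields are raw integers.

    Assumed layout (verify against RFC 6378):
    Ver(2) Request(4) PT(2) R(1) Res(7) FPath(8) Path(8), TLV Length(16) Res(16).
    """
    fields = (("ver", ver, 2), ("request", request, 4), ("pt", pt, 2),
              ("r_bit", r_bit, 1), ("fpath", fpath, 8), ("path", path, 8))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    word = ((ver << 30) | (request << 26) | (pt << 24)
            | (r_bit << 23) | (fpath << 8) | path)
    return struct.pack("!IHH", word, len(tlvs), 0) + tlvs


def unpack_psc(data):
    """Inverse of pack_psc; returns the fields as a dict."""
    word, tlv_len, _reserved = struct.unpack("!IHH", data[:8])
    return {
        "ver": word >> 30,
        "request": (word >> 26) & 0xF,
        "pt": (word >> 24) & 0x3,
        "r_bit": (word >> 23) & 0x1,
        "fpath": (word >> 8) & 0xFF,
        "path": word & 0xFF,
        "tlvs": data[8:8 + tlv_len],
    }


# Illustrative values only: request=10 and pt=3 are assumed codes.
msg = pack_psc(request=10, pt=3, r_bit=0, fpath=1, path=1)
decoded = unpack_psc(msg)
```

Keeping every field as a raw integer avoids baking in assumptions about what each code means; a caller would map names such as SF or WTR to codes taken from the RFC.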
PSC control logic states
Normal state - no trigger events reported
Unavailable state - protection path is unavailable
Protecting failure state
traffic is being transported on the protection path
Protecting administrative state
operator issued command switching traffic to protection path
Wait-to-Restore state - recovering from working path SF/SD
WTR timer has not yet expired
Do-not-Revert state - recovered from a protecting state
but operator has configured DNR
PSC local requests
In order from highest to lowest priority :
1. Clear (operator command)
2. Lockout of protection (operator command)
3. Forced Switch (operator command)
4. Signal Fail on protection (OAM / control-plane / server indication)
5. Signal Fail on working (OAM / control-plane / server indication)
6. Signal Degrade on working (OAM / control-plane / server indication)
7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)
8. Manual Switch (operator command)
9. WTR Expires (WTR timer)
10. No Request (default)
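The priority list above lends itself to a simple selection rule: when several local requests are active, the PSC logic acts on the highest-priority one. A minimal sketch, where the names `PRIORITY`, `RANK` and `top_request` are hypothetical and the strings simply mirror the list:

```python
# Local requests, highest priority first, copied from the list above.
PRIORITY = [
    "Clear",
    "Lockout of protection",
    "Forced Switch",
    "Signal Fail on protection",
    "Signal Fail on working",
    "Signal Degrade on working",
    "Clear Signal Fail/Degrade",
    "Manual Switch",
    "WTR Expires",
    "No Request",
]
RANK = {name: position for position, name in enumerate(PRIORITY)}


def top_request(active):
    """Return the highest-priority request among those currently active.

    With nothing active, the default is "No Request".
    """
    return min(active, key=RANK.__getitem__, default="No Request")


# An operator's Manual Switch loses to a Signal Fail on the working path.
winner = top_request(["Manual Switch", "Signal Fail on working"])
```

This explains, for example, why a Forced Switch overrides a Signal Fail on the working path but not a Signal Fail reported after a Lockout of protection.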
Linear protection ITU style
from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031 format (no surprise!)
Message layout:
GAL    : GAL Label (13) | TC | S=1 | TTL
GACh   : 0001 | VER | 00000000 | allocated channel type
G.8031 : MEL | VER | OPCODE=39 | FLAGS=0 | OFFSET=4
         req/state | prot type | requested sig | bridged sig | reserved
         END=0
Ring protection
once again there were two drafts, both supporting
p2p and p2mp, wrapping and steering, link/node failures
draft-ietf-mpls-tp-ring-protection (not yet RFC)
Between any 2 LSRs can define a Sub-Path Maintenance Entity
So between 2 LSRs on a ring there are 2 SPMEs
we define 1 as the working channel and 1 as the protection channel
Now we re-use the linear protection mechanisms, including the PSC protocol
draft-helvoort-mpls-tp-ring-protection-switching
Both counter-rotating rings carry working and protection traffic
The bandwidth on each ring is divided
X BW is dedicated to working traffic and Y dedicated to protection traffic
The protection bandwidth of one ring is used to protect the other ring
Each node should have information about the sequence of ring nodes
MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs