Automatic Protection Switching

8/13/2019 Automatic Protection Switching


Automatic Protection Switching

Yaakov (J) Stein, CTO, RAD Data Communications

Mar 2012



Y(J)S APS Slide 2

Course Outline

General protection switching principles

Examples of protection mechanisms

SONET/SDH

Ethernet linear protection

Ethernet ring protection

MPLS fast reroute

MPLS-TP APS



General principles

Definition

References

Traffic types

Network topologies

Triggers

Protection classes

Entities

Protection types

Signaling



Definition

Automatic Protection Switching (APS) is a functionality of carrier-grade transport networks

is often called resilience

since it enables service to quickly recover from failures

is required to ensure high reliability and availability

APS includes :

detection of failures (signal fail or signal degrade) on a working channel

switching traffic transmission to a protection channel

selecting traffic reception from the protection channel

(optionally) reverting back to the working channel once the failure is repaired

Automatic means using (at most) control plane protocols

no management layer or manual operations needed



Some useful references

G.808.1 - generic linear protection

G.808.2 - generic ring protection (not yet written)

G.841 and G.842 - SDH

G.774.3/4/9/10 - SDH protection management

G.870 and G.873.1 - OTN

G.8031 - Ethernet linear protection

G.8032 - Ethernet ring protection

G.8131 - T-MPLS APS

Y.1720 - MPLS

I.630 - ATM

M.495 - analog signal protection

G.781 - clock selection (can be used to protect synchronization)

RFC 4090 - MPLS Fast ReRoute

RFC 6372 - MPLS-TP Survivability Framework

RFC 6378 - MPLS-TP Linear Protection



Traffic types

In a network with APS capabilities, there are three types of traffic :

protected traffic

traffic that may be rapidly switched to the protection channel at any time

it may be on the working channel or on the protection channel

Nonpreemptible Unprotected Traffic (NUT)

noncritical traffic that does not require a protection mechanism

not affected by the protection mechanism

somewhat less expensive to the customer

extra (preemptible) traffic

best-effort background traffic that runs on the protection channel

preempted (blocked) when the protection channel is needed

very inexpensive to the customer



Network topologies

APS can be defined for any topology with redundant links

e.g., for tree topologies (which have no redundant links) no protection is possible

We will often discuss protection of individual links

However, there are two topologies that are of particular interest :

rings

protection is natural for rings

although there are other reasons for using rings as well

rings are so important that protection for other topologies

is often called linear protection

dense meshes

for this topology multiple local bypasses can be preconfigured

protection switching is similar to routing change, but faster

often called Fast ReRoute (FRR)



Triggers

Protection switching is usually triggered by a failure

although the operator may manually force a protection switch

A failure is declared when a fault condition

persists long enough for the ability to perform the required function

to be considered terminated

Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)

and may be :

detected by the physical layer

indicated by signaling (e.g., AIS)

detected by OAM mechanisms

When there is no SF or SD, the state is called No Request (NR)



Switching time (1)

SONET/SDH protection switching takes place in under 50 ms

Regarding multiplex section shared protection rings, G.841 states :

The following network objectives apply:

1) Switch time - In a ring with no extra traffic, all nodes in the idle state (no detected failures, no active automatic or external commands, and receiving only Idle K-bytes), and with less than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single span shall be less than 50 ms. On rings under all other conditions, the switch completion time can exceed 50 ms (the specific interval is under study) to allow time to remove extra traffic, or to negotiate and accommodate coexisting APS requests.

while for linear VC trail protection, it says :

The following network objectives apply:

1) Switch time

The APS algorithm for LO/HO VC trail protection shall operate as fast as possible. A value of 50 ms has been proposed as a target time. Concerns have been expressed over this proposed target time when many VCs are involved. This is for further study. Protection switch completion time excludes the detection time necessary to initiate the protection switch, and the hold-off time.

There are similar statements in other clauses as well



Switching time (2)

This 50 ms time has become the golden standard

and new protection schemes are expected to meet this objective

However, studying the literature that led up to the SONET/SDH standards

shows that the objective was to attain the minimum possible time

for the sum of

persistent (i.e., non-transient) failure detection

speed of light propagation

signaling protocol time

regaining sync alignment

and 50 ms was the minimum that was considered practical !
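
To make the budget concrete, here is a minimal sketch that adds up the four components listed above. Every input value is an illustrative assumption and does not come from the standards. The only physical figure used is the approximate one-way delay of light in fiber, about 5 µs per km.

```python
# Illustrative switching-time budget; all inputs are assumed example values.
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def switching_budget_ms(detection_ms: float, fiber_km: float,
                        signaling_ms: float, resync_ms: float) -> float:
    """Sum of failure detection, propagation, signaling and resync times (ms)."""
    propagation_ms = fiber_km * FIBER_DELAY_US_PER_KM / 1000.0
    return detection_ms + propagation_ms + signaling_ms + resync_ms


# e.g. 10 ms detection, 1200 km of fiber, 20 ms of protocol, 5 ms to regain sync
print(f"{switching_budget_ms(10, 1200, 20, 5):.1f} ms")  # 41.0 ms
```

In this example, 1200 km of fiber alone costs about 6 ms. The remaining share of a 50 ms target is what is left for detection, the protocol, and resynchronization.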

Many modern standards have built in 50 ms

and much marketing literature boasts faster than 50 ms

But there is really nothing special about 50 ms

50 ms gaps in voiced speech are noticeable,

but not fatal if infrequent

50 ms of data at high rates cannot be stored and later forwarded

timing circuits can withstand much more than 50 ms without clock



Protection classes

It is useful to distinguish two different protection classes

path protection (AKA trail protection, end-to-end protection)

when a failure is detected on the end-to-end path

we switch to an alternative end-to-end path

the failure is usually detected by end-to-end OAM

local protection (AKA local restoration, SNC protection, bypass, detour)

we protect individual network elements, links, or groups of these

when such an entity fails

only that local entity is bypassed

the failure may be detected by link OAM or physical layer means



APS entities (1)

The following entities are important in APS

working channel - channel used when no failure exists

protection channel - channel used when a failure exists

head-end - entity transmitting data to the working/protection channel

tail-end - entity receiving data from the working/protection channel

Note: we will usually consider traffic to be bidirectional

so that the head-end for one direction

is the tail-end for the opposite direction

[Figure: head-end and tail-end connected by a working channel and a protection channel]



APS entities (2)

Bridge - function at the head-end that connects traffic (including extra traffic) to the working and protection channels

Selector - function at the tail-end that extracts traffic (perhaps extra traffic) from the working or protection channel

APS signaling channel - channel used to communicate between head-end and tail-end for APS purposes

Trail termination - function responsible for failure detection

including injection and extraction of OAM

[Figure: head-end (bridge) and tail-end (selector) connected by a working channel, a protection channel, and a signaling channel]
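
The bridge and selector roles can be modeled in a few lines. This is a minimal sketch with hypothetical class and field names. It assumes a 1+1 bridge always feeds both channels, and that a 1:1-style bridge, once switched, sends normal traffic only on the protection channel and preempts any extra traffic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Channel(Enum):
    WORKING = "working"
    PROTECTION = "protection"


@dataclass
class Bridge:
    """Head-end function: connects normal (and extra) traffic to the channels."""
    permanent: bool = False   # True models a 1+1 bridge (always on both channels)
    switched: bool = False    # True after a protection switch (1:1 style)

    def transmit(self, traffic: str, extra: Optional[str] = None) -> Dict[Channel, Optional[str]]:
        if self.permanent:
            return {Channel.WORKING: traffic, Channel.PROTECTION: traffic}
        if self.switched:
            # extra traffic is preempted when the protection channel is needed
            return {Channel.WORKING: None, Channel.PROTECTION: traffic}
        return {Channel.WORKING: traffic, Channel.PROTECTION: extra}


@dataclass
class Selector:
    """Tail-end function: extracts traffic from the working or protection channel."""
    selected: Channel = Channel.WORKING

    def receive(self, channels: Dict[Channel, Optional[str]]) -> Optional[str]:
        return channels.get(self.selected)


bridge, selector = Bridge(), Selector()
print(selector.receive(bridge.transmit("T", extra="X")))  # T
bridge.switched, selector.selected = True, Channel.PROTECTION
print(selector.receive(bridge.transmit("T", extra="X")))  # T
```

Traffic survives the switch only because the bridge and the selector moved together. Keeping the two ends in step is the job of the APS signaling channel.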



Revertive operation

Reversion means returning to use the working channel

after the failure has been rectified

Protection mechanisms can be revertive or nonrevertive

Revertive mechanisms may be preferable

when the working channel has better performance (free BW, BER, delay)

when there are frequent switches (easier to manage)

when there is extra traffic

but nonrevertive also has advantages

only one service disruption due to protection switching

may be simpler to implement



Uni/bi-directional

We will usually consider bidirectional traffic

but even then the failures can be uni- or bidirectional

and for unidirectional failures there can be uni- or bidirectional switching

[Figure: unidirectional and bidirectional failures; unidirectional and bidirectional protection (working channel, protection channel in use)]



Uni- / bidirectional switching

Unidirectional switching may be advantageous

for 1+1 - faster and no signaling channel is needed

no unnecessary service disruption for direction without failure

higher chance of protection under multiple failures

easier to implement for local protection

maintains extra traffic in direction without failure

But bidirectional may be preferable

easier management since directions traverse same network elements

does not disrupt delay balance between directions

may simplify repair since failed spans are unused



Protection types

We distinguish several different protection types

1+1

1:1

1:n

m:n

(1:1)n

Each type has its applicability, advantages, and disadvantages

and there are trade-offs between

simplicity

BW consumption

protection switch time

signaling requirements



1+1 protection

Simplest and fastest form of protection

but wasteful - only 50% of actual physical capacity is used

Head-end bridge always sends data on both channels

Tail-end selector chooses channel to use (based on BER, dLOS, etc.)

For unidirectional 1+1 switching there is no need for APS signaling

If non-revertive

there is no distinction between working and protection channels

[Figure: head-end bridge sending on both channel A and channel B]
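
A 1+1 tail-end selector needs only local information, which is why unidirectional 1+1 needs no signaling. The sketch below is a minimal illustration. It assumes a simple three-level channel condition (OK, SD, SF) and omits the WTR timer from the revertive case.

```python
# Assumed ranking of channel conditions: higher is worse.
SEVERITY = {"OK": 0, "SD": 1, "SF": 2}


def select_1plus1(current: str, status: dict, revertive: bool = False,
                  working: str = "A") -> str:
    """Tail-end selector for 1+1: both channels carry the same data.

    current -- channel currently selected ("A" or "B")
    status  -- condition of each channel, e.g. {"A": "SF", "B": "OK"}
    """
    other = "B" if current == "A" else "A"
    if revertive and status[working] == "OK":
        return working                       # revert (WTR timer omitted here)
    if SEVERITY[status[other]] < SEVERITY[status[current]]:
        return other                         # switch only to a strictly better channel
    return current                           # otherwise stay put


print(select_1plus1("A", {"A": "SF", "B": "OK"}))  # B
```

In nonrevertive mode the selector stays on channel B even after A recovers. This is the sense in which the working and protection channels become indistinguishable.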



1:1 protection

Head-end bridge usually sends data on the working channel

When a failure is detected it starts sending data over the protection channel

and the tail-end needs to select the protection channel

When not in use, the protection channel can be used for extra traffic

However, since the failure is detected by the tail-end, APS signaling is needed

Protection channel should have OAM running to ensure its functionality

[Figure: working channel, protection channel carrying extra traffic, and APS signaling]



1:n protection

One protection channel is allocated for n working channels

Can only protect one working channel at a time

but it is improbable that more than one working channel will fail simultaneously

Only 1/(n+1) of the total capacity is reserved for protection

[Figure: n working channels sharing one protection channel]



m:n protection

To enable protection of more than one channel

m protection channels are allocated for n working channels (m < n)

m simultaneous failures can be protected

Less protection capacity is dedicated than for n times 1:1

When a failure is detected,

one of the m protection channels needs to be assigned and signaled

High complexity but conserves resources

[Figure: n working channels sharing m protection channels]



(1:1)n protection

This is like n times 1:1 but the n protection channels share bandwidth

Only 1 failed working channel can be protected

This is different from 1:n since

n protection channels are preconfigured

n working channels need not be of the same type

Protection bandwidth must be at least that of the largest working channel
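
The bandwidth trade-off between the protection types reduces to a simple ratio. This sketch assumes all channels have the same rate. Under that assumption, the (1:1)n protection bandwidth, which must equal the largest working channel, is one channel out of n+1. The function name is hypothetical.

```python
from fractions import Fraction


def protection_share(scheme: str, n: int = 1, m: int = 1) -> Fraction:
    """Fraction of total capacity reserved for protection, assuming
    all channels have the same rate."""
    if scheme in ("1+1", "1:1"):
        return Fraction(1, 2)
    if scheme in ("1:n", "(1:1)n"):
        return Fraction(1, n + 1)
    if scheme == "m:n":
        return Fraction(m, m + n)
    raise ValueError(f"unknown protection type: {scheme}")


for scheme, n, m in [("1+1", 1, 1), ("1:n", 14, 1), ("m:n", 6, 2)]:
    print(scheme, protection_share(scheme, n, m))
# 1+1 1/2
# 1:n 1/15
# m:n 1/4
```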



APS algorithm

We have seen that protection switching is a tricky business

So it is not surprising that network elements that support APS

run an APS algorithm

This algorithm's inputs are : configuration (protection type, revertive?, available channels, …)

failure indications (NR, SF, SD)

operator commands

APS signaling (more on that soon)

and makes switching decisions

The algorithm maintains state information for head-end and tail-end

APS algorithms are detailed in standards documents



Priority

Not every failure event / operator command results in a protection switch

For example

in 1:n protection the protection channel may already be in use !

Conflicts are resolved by assigning priorities to events/commands

When an event is detected or a command received

the APS algorithm will not act

if an event/command of equal or higher priority is already in effect

True failure conditions usually have higher priority than manual commands
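
The priority rule can be sketched as a comparison of ordered request levels. The ordering below (SF > SD > manual switch > WTR > NR) is only illustrative and is consistent with the slide's remark that failures usually outrank manual commands. Each standard defines its own full list. The 1:n tie-break toward the lower channel number is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Dict, Optional


class Request(IntEnum):
    """Illustrative priority order (higher wins); each standard defines its own list."""
    NR = 0    # No Request
    WTR = 1   # Wait To Restore
    MS = 2    # Manual Switch (operator command)
    SD = 3    # Signal Degrade
    SF = 4    # Signal Fail


def should_act(new: Request, in_effect: Request) -> bool:
    """Do not act if an event/command of equal or higher priority is in effect."""
    return new > in_effect


def grant_protection(requests: Dict[int, Request]) -> Optional[int]:
    """1:n arbitration: which working channel gets the single protection channel.
    Ties go to the lower channel number (an assumption of this sketch)."""
    channel, request = max(requests.items(), key=lambda kv: (kv[1], -kv[0]))
    return channel if request > Request.NR else None


print(should_act(Request.SF, Request.MS))                # True
print(grant_protection({3: Request.SD, 5: Request.SF}))  # 5
```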



Timers

Even failure events with priority are not acted upon immediately

to do so would cause unnecessary switches after transient defects

The APS algorithm may maintain several timers, such as

Holdoff timers

the time between detection of an SF or SD event

and the APS algorithm acting upon this event

the algorithm usually used is called peek twice

i.e., the condition is checked again after the timer expires

Wait To Restore timer

for revertive switching, the time between detection of the failure being cleared and the APS algorithm acting upon this event

also used in SDH optimized bidirectional 1+1 (nonrevertive)

Guard timer

for rings: blockout time during which APS messages are ignored (since they may be old and outdated)
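
Peek-twice hold-off and WTR both come down to simple time comparisons. This is a minimal sketch with hypothetical function names. A condition is modeled as a function of time in milliseconds, and the 5-minute WTR is only an example value.

```python
from typing import Callable

WTR_MS = 5 * 60 * 1000  # example WTR period: 5 minutes, in milliseconds


def holdoff_peek_twice(condition_at: Callable[[int], bool],
                       t_detect_ms: int, holdoff_ms: int) -> bool:
    """'Peek twice': act on an SF/SD only if it is present when detected
    and still present when the hold-off timer expires."""
    return condition_at(t_detect_ms) and condition_at(t_detect_ms + holdoff_ms)


def wtr_expired(cleared_since_ms: int, now_ms: int, wtr_ms: int = WTR_MS) -> bool:
    """Revert only once the working channel has stayed clear for the whole WTR.
    The caller must reset cleared_since_ms if the failure reappears."""
    return now_ms - cleared_since_ms >= wtr_ms


def transient(t_ms: int) -> bool:
    return 100 <= t_ms < 120   # a 20 ms glitch starting at t = 100 ms


def persistent(t_ms: int) -> bool:
    return t_ms >= 100         # a real failure starting at t = 100 ms


print(holdoff_peek_twice(transient, 100, 50))   # False: no switch
print(holdoff_peek_twice(persistent, 100, 50))  # True: switch
```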



APS signaling

In all types except unidirectional 1+1, some APS signaling is needed

APS signaling is used to synchronize between head-end and tail-end

It is critical that head-end and tail-end always be in the same state

Example messages include :

No Request (NR)

by tail-end to inform head-end of Signal Failure (SF)

by head-end to confirm the event's priority

by head-end to report the particular protection channel

by head-end to inform tail-end of Reverse (bidirectional) Request (RR)

by tail-end after failure cleared to Wait To Restore (WTR)

by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive



APS signaling phases

When APS signaling is used, it needs to be as rapid as possible

Depending on the scenario it may be

1-phase: tail → head (fastest)

tail-end informs head-end of failure

both ends uniquely know the protection channel to be used

only for 1+1 and unidirectional-(1:1)n (including 1:1)

2-phase: 1) tail → head 2) head → tail

tail-end informs head-end of failure

head-end signals that it has switched to protection channel

not for bidirectional-1:n or m:n

3-phase: 1) tail → head 2) head → tail 3) tail → head (slowest)

works for all protection types (including m:n)



Examples of 1-phase

An example of when 1-phase signaling is possible is 1:1 or (1:1)n

1. upon detection of failure the tail-end sends SF to the head-end

and immediately changes its selector (blind switch)

2. upon receipt the head-end changes the bridge setting

(no priority is checked)

1-phase can also be used for bidirectional 1:1

1. upon detection of failure the tail-end sends SF to the head-end

and immediately changes both its selector and bridge

2. upon receipt the head-end changes its bridge and selector



Example of 2-phase

2-phase is useful for unidirectional 1:n with priority checking


1. upon detection of failure the tail-end sends SF to the head-end

but does not change its selector

2. the head-end checks priority

sends confirmation to tail-end (with identity of working channel)

the bridge setting is changed

3. the tail-end changes its selector



Example of 3-phase

3-phase signaling is imperative for bidirectional 1:n


1. upon detection of failure the tail-end sends SF to the head-end

but does not change its selector

2. the head-end checks priority, and sends confirmation to tail-end

head-end changes its bridge setting

and also sends a reverse request

3. the tail-end changes selector

checks priority and sends confirmation to head-end

tail-end changes its bridge setting (as head-end of opposite direction)

head-end receives confirmation and changes its selector



For G.805 buffs

to add 1+1 trail protection to a trail - expand a trail termination function

we use a special transport processing function - the protection switch

[Figure: an unprotected trail expanded into a protected trail; the unprotected TTs report status to the protection switch]



SONET/SDH APS



SONET protection ?

SONET/SDH networks need to be highly reliable (five nines)

Down-time should be minimal (less than 50 msec)

So systems must repair themselves (no time for manual intervention)

Upon detection of a failure (dLOS, dLOF, high BER)

the network must reroute traffic (protection switching)

from the working channel to the protection channel

SDH APS is unidirectional

SDH APS may be revertive

[Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]



SONET/SDH layers

Between regenerators there are sections (regenerator sections)

Between ADMs there are lines (multiplex sections)

Between path terminations there are paths

Protection can be at OC-n level (different physical fibers)

or at STM/VC level

or end-to-end path (trail protection)

[Figure: path terminations at the two ends of the path; line terminations at the ADMs delimit lines (MS sections); section terminations at regenerators delimit sections]



Line APS

[Figure: 9-row by 90-column frame; the TOH (3 rows of section overhead above 6 rows of line overhead) followed by the Synchronous Payload Envelope]

TOH consists of

3 rows of section overhead - frame sync, trace, EOC, …

6 rows of line overhead - pointers, SSM, FEBE, and …

Line APS signaling uses bytes K1 and K2

TOH byte layout (row by row):

A1 A2 J0
B1 E1 F1
D1 D2 D3
H1 H2 H3
B2 K1 K2
D4 D5 D6
D7 D8 D9
DA DB DC
S1 M0 E2



HO Path APS

POH is responsible for type, status, path performance monitoring, VCAT, trace

HO Path APS signaling uses the 4 MSBs of byte K3

POH bytes (top to bottom): J1, B3, C2, G1, F2, H4, F3, K3, N1



LO Path APS

VC OH is responsible for

timing, PM, REI, …

LO Path APS signaling uses the

4 MSBs of byte K4

[Figure: VC OH bytes V5, J2, N2, K4 and bytes V1, V2, V3, V4]



How does it work?

Head-end and tail-end NEs have bridges (muxes)

Head-end and tail-end NEs maintain bidirectional signaling channel

Signaling is contained in the K bytes of the protection channel

For line APS

K1 - tail-end status and requests

K2 - head-end status

[Figure: head-end bridge and tail-end bridge connected by a working channel and a protection channel that also carries the signaling channel]



Linear 1+1 protection

Can be at OC-n level (different physical fibers)

or at STM/VC level (SubNetwork Connection Protection)

or end-to-end path (called trail protection)

Head-end bridge always sends data on both channels

Tail-end chooses the channel to use based on BER, dLOS, etc.

No need for signaling

If non-revertive

there is no distinction between working and protection channels

[Figure: head-end NE and tail-end NE connected by a working channel and a protection channel]



Linear 1:1 protection

Head-end bridge usually sends data on the working channel

When the tail-end detects a failure it signals (using K1) to the head-end

Head-end then starts sending data over the protection channel

When not in use

the protection channel can be used for (discounted) extra traffic (preemptible unprotected traffic)

May be at any layer (but only OC-n level protects against fiber cuts)

[Figure: working channel, and protection channel carrying extra traffic]



Linear 1:N protection

In order to save BW

we allocate 1 protection channel for every N working channels

N limited to 14

4 bits in the K1 byte (from tail-end to head-end) identify the channel

0 - protection channel

1-14 - working channels

15 - extra traffic channel

[Figure: N working channels sharing one protection channel]
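
Packing and unpacking the K1 byte is straightforward bit manipulation. This sketch assumes, as in the SONET/SDH linear APS convention, that the request occupies bits 1-4 (the MSBs) and the channel number occupies bits 5-8. Request codes are treated as opaque 4-bit numbers, and the value 0b1100 in the usage line is an arbitrary example. The channel meanings are the ones listed on the slide.

```python
def encode_k1(request: int, channel: int) -> int:
    """Request code in bits 1-4 (the MSBs), channel number in bits 5-8."""
    if not (0 <= request <= 15 and 0 <= channel <= 15):
        raise ValueError("both K1 fields are 4-bit values")
    return (request << 4) | channel


def decode_k1(k1: int) -> tuple:
    """Split a K1 byte into (request, channel, meaning of the channel number)."""
    request, channel = (k1 >> 4) & 0x0F, k1 & 0x0F
    if channel == 0:
        meaning = "protection channel"
    elif channel == 15:
        meaning = "extra traffic channel"
    else:
        meaning = f"working channel {channel}"
    return request, channel, meaning


print(hex(encode_k1(0b1100, 3)))  # 0xc3
print(decode_k1(0xC3))            # (12, 3, 'working channel 3')
```

The 4-bit channel field is what limits N to 14. Values 0 and 15 are taken by the protection and extra traffic channels.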



Two-fiber vs. four-fiber rings

Ring-based protection is popular in North America (100K+ rings)

Full protection against physical fiber cuts

Simpler and less expensive than mesh topologies

Protection at line (multiplex section) or path layer

Four-fiber rings

fully redundant at OC level

can support bidirectional routing at line layer

Two-fiber rings

support unidirectional routing at line layer

2 fibers in opposite directions


Unidirectional vs. bidirectional

Unidirectional routing

working channel B-A goes in the same direction (e.g., clockwise) as A-B

management simplicity: A-B and B-A can occupy the same timeslots

inefficient: wastes ring BW and causes excessive delay in one direction

Bidirectional routing

A-B and B-A are opposite in direction

both using the shortest route

spatial reuse: timeslots can be reused in other sections

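[Figure: unidirectional routing (A-B and B-A in the same direction) vs. bidirectional routing (A-B and B-A, and B-C and C-B, each on the shortest route)]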



UPSR vs. BLSR (MS-SPRing)

Of all the possible combinations, only a few are in use

Unidirectional (routing) Path Switched Rings (UPSR)

protects tributaries

extension of 1+1 to ring topology

Bidirectional (routing) Line Switched Rings (BLSR) (two-fiber and four-fiber versions)

called Multiplex Section Shared Protection Ring in SDH

simultaneously protects all tributaries in STM

extension of 1:1 to ring topology

[Table: ring types by switching (path vs. line), fibers (two-fiber vs. four-fiber) and routing (unidirectional vs. bidirectional); UPSR and BLSR are the combinations in use]



UPSR

Working channel is in one direction

protection channel in the opposite direction

All path traffic is added in both directions (1+1)

decision as to which to use is made at the drop point (no signaling)

Normally non-revertive, so effectively two diversity paths

Good match for access networks

one resilient access ring is less expensive than a fiber pair per customer

Inefficient for core networks

no spatial reuse

every signal in every span in both directions

node needs to continuously monitor every tributary to be dropped

[Figure: SONET ADMs on 2 rings]



BLSR

Switch at line level, so less monitoring

When a failure is detected the tail-end NE signals the head-end NE

Works for unidirectional/bidirectional fiber cuts, and NE failures

Two-fiber version

half of OC-N capacity devoted to protection

only half capacity available for traffic

Four-fiber version

a fully redundant OC-N devoted to protection

twice as many NEs as compared to two-fiber

[Figure: example of wrap-around recovery from a unidirectional fiber cut (2 rings)]



Ethernet linear APS

STP

LAG

G.8031



STP

The original Spanning Tree Protocol automatically removed loops

from arbitrary networks (with loops)

However, its convergence was very slow (about a minute)

STP cannot be used as a protection mechanism

since its reconvergence time is very long

due to a cumbersome protocol and long holdoff timer settings

An evolutionary update called Rapid STP (802.1w)

was incorporated into 802.1D-2004 clause 17

that converges in about the same time as STP

but can reconverge after a topology change in less than 1 second

RSTP can be used to detect failures and reconverge

and thus can be used as a primitive protection mechanism

However, the switching time will be many tens of ms to 100s of ms



Use of LAG

Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)

enables bonding several ports together as single uplink

Defined by 802.3ad task force and folded into 802.3-2000 as clause 43

Binding of ports to Link Aggregation Groups (LAGs) distributed via

Link Aggregation Control Protocol (LACP)

LACP uses slow protocol frames (up to 5 per second)

Links may be dynamically added/removed from LAG

and LACP continuously monitors to detect if changes needed

Upon link failure LAG delivers traffic at a reduced rate

Thus LAG can be used as a primitive protection mechanism

When used this way it is called worker/standby or N+N mode

The restoration time will be on the order of 1 second



G.8031

Q9 of SG15 in the ITU-T is responsible for protection switching

In 2006 it produced G.8031 Linear Ethernet Protection Switching

G.8031 uses standard Ethernet formats, but is incompatible with STP

The standard addresses

point-to-point VLAN connections

SNC (local) protection class

1+1 and 1:1 protection types

unidirectional and bidirectional switching for 1+1

bidirectional switching for 1:1

revertive and nonrevertive modes

1-phase signaling protocol

G.8031 uses Y.1731 OAM CCM messages in order to detect failures

G.8031 defines a new OAM opcode (39) for APS signaling messages

Switching times should be under 50 ms (only holdoff timers when groups)



G.8031 signaling

The APS signaling message looks like this :

regular APS messages are sent 1 per 5 seconds

after change 3 messages are sent at max rate (300 per sec)

where

req/state identifies the message (NR, SF, WTR, SD, forced switch, etc.)

prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)

requested and bridged signal identify incoming / outgoing traffic

since only 1+1 and 1:1 they are either null or traffic (all other values reserved)

field          size
MEL            3 bits
VER=0          5 bits
OPCODE=39      1 byte
FLAGS=0        1 byte
OFFSET=4       1 byte
req/state      4 bits
prot. type     4 bits
requested sig  1 byte
bridged sig    1 byte
reserved       1 byte
END=0          1 byte
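
The message layout above maps directly onto a 9-byte packing. This is a minimal sketch with a hypothetical function name. Numeric codes for the req/state and prot. type fields (which value means NR, SF, and so on) are defined in G.8031 and are not reproduced here, so the caller supplies them as plain integers.

```python
import struct

APS_OPCODE = 39   # APS opcode from the slide
TLV_OFFSET = 4    # four bytes of APS-specific information follow the header


def build_aps_pdu(mel: int, req_state: int, prot_type: int,
                  requested_sig: int, bridged_sig: int) -> bytes:
    """Pack the APS message in the field order shown above."""
    if not (0 <= mel <= 7 and 0 <= req_state <= 15 and 0 <= prot_type <= 15):
        raise ValueError("field out of range")
    version, flags, reserved, end_tlv = 0, 0, 0, 0
    return struct.pack(
        "!9B",
        (mel << 5) | version,          # MEL in the 3 MSBs, VER in the 5 LSBs
        APS_OPCODE,
        flags,
        TLV_OFFSET,
        (req_state << 4) | prot_type,  # req/state nibble, then prot. type nibble
        requested_sig,
        bridged_sig,
        reserved,
        end_tlv,
    )


print(build_aps_pdu(7, 0, 0, 0, 0).hex())  # e02700040000000000
```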



G.8031 1:1 revertive operation

In the normal (NR) state :

head-end and tail-end exchange CCM (at 300 per second rate) on both working and protection channels

head-end and tail-end exchange NR APS messages

on the protection channel (every 5 seconds)

When a failure appears in the working channel

tail-end misses 3 CCM messages on the working channel

tail-end enters SF state

tail-end sends 3 SF messages at 300 per second on the APS channel

tail-end switches selector (and, if bidirectional, bridge) to the protection channel

head-end (receiving SF) switches bridge (and, if bidirectional, selector) to the protection channel

tail-end continues sending SF messages every 5 seconds

head-end sends NR messages but with bridged=normal

When the failure is cleared

tail-end leaves SF state and enters WTR state (typically 5 minutes, configurable 5-12 min)

tail-end sends WTR message to head-end (in nonrevertive - DNR message)

tail-end sends WTR every 5 seconds

when WTR expires both sides enter NR state
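
The tail-end side of this sequence can be written as a small state machine. This is a simplified sketch of the steps listed above, not the G.8031 state tables. The class and method names are hypothetical, and message repetition (3 fast messages, then one every 5 seconds) is not modeled. The detection time follows from the slide's numbers: 3 missed CCMs at 300 per second.

```python
from enum import Enum

CCM_INTERVAL_MS = 1000 / 300   # 300 CCMs per second
LOSS_THRESHOLD = 3             # missed CCMs before SF is declared
WTR_MS = 5 * 60 * 1000         # typical 5-minute WTR


class State(Enum):
    NR = "NR"
    SF = "SF"
    WTR = "WTR"
    DNR = "DNR"


class TailEnd:
    """Simplified tail-end of 1:1 protection following the steps above (a sketch)."""

    def __init__(self, revertive: bool = True):
        self.revertive = revertive
        self.state = State.NR
        self.selected = "working"
        self.sent = []            # APS requests sent to the head-end, in order
        self.wtr_start = None

    def ccms_missed(self, count: int) -> None:
        if count >= LOSS_THRESHOLD and self.state is not State.SF:
            self.state = State.SF
            self.selected = "protection"   # switch immediately (1-phase)
            self.sent.append("SF")

    def failure_cleared(self, now_ms: float) -> None:
        if self.state is State.SF:
            if self.revertive:
                self.state, self.wtr_start = State.WTR, now_ms
                self.sent.append("WTR")
            else:
                self.state = State.DNR     # stay on the protection channel
                self.sent.append("DNR")

    def tick(self, now_ms: float) -> None:
        if self.state is State.WTR and now_ms - self.wtr_start >= WTR_MS:
            self.state, self.selected = State.NR, "working"
            self.sent.append("NR")


print(f"detection time ~ {LOSS_THRESHOLD * CCM_INTERVAL_MS:.0f} ms")  # ~ 10 ms
```

A new SF during WTR sends the tail-end straight back to SF. This matches the rule that a true failure outranks a pending restoration.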



Ethernet ring APS

G.8032

RPR

CLEER



Ethernet rings ?

Ethernet has become carrier grade :

deterministic connection-oriented forwarding

OAM

synchronization

The only thing missing to completely replace SDH is ring protection

However, Ethernet and ring architectures don't go together

Ethernet has no TTL, so looped traffic will loop forever

STP builds trees out of any architecture: no loops allowed

There are two ways to make an Ethernet ring

open loop

cut the ring by blocking some link

when protection is required - block the failed link

closed loop

disable STP (but avoid infinite loops in some way !)

when protection is required - steer and/or wrap traffic



Ethernet ring protocols

Open loop methods

G.8032 (ERPS)

rSTP (ex 802.1w)

RFER (RAD)

ERP (NSN)

RRST (based on RSTP)

REP (Cisco)

RRSTP (Alcatel)

RRPP (Huawei)

EAPS (Extreme, RFC 3619)

EPSR (Allied Telesis)

PSR (Overture)

Closed loop methods

RPR (IEEE 802.17)

CLEER and NERT (RAD)



G.8032

Q9 of SG15 produced G.8032 between 2006 and 2008

G.8032 is similar to G.8031

strives for 50 ms protection (< 1200 km, < 16 nodes)

but here this number is deceiving as the MAC table is flushed

standard Ethernet format but incompatible with STP

uses Y.1731 CCM for failure detection

employs Y.1731 extension for R-APS signaling (opcode=40)

R-APS message format similar to APS of G.8031

(but between every 2 nodes and to MAC address 01-19-A7-00-00-01)

revertive and nonrevertive operation defined

However, G.8032 is more complex due to

requirement to avoid loop creation under any circumstances

need to localize failures

need to maintain consistency between all nodes on ring

existence of a special node (RPL owner)



RPL

G.8032v1 defines the Ring Protection Link (RPL) as the link to be blocked (to avoid closing the loop) in NR state

One of the 2 nodes connected to the RPL

is designated the RPL owner

Unlike RFER

there is only one RPL owner

the RPL and owner are designated before setup

operation is usually revertive

All ring nodes are simultaneously in one of 2 modes: idle or protecting

in idle mode the RPL is blocked

in protecting mode the failed link is blocked and the RPL is unblocked

in revertive operation

once the failure is cleared the blocked link is unblocked

and the RPL is blocked again
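
The idle and protecting modes can be checked mechanically. On a ring, exactly one blocked link leaves a loop-free, connected chain. This is a minimal sketch with hypothetical names. It assumes ring links are numbered 0..N-1, with link i joining node i to node (i+1) mod N, and it does not model MAC flushing or R-APS messaging.

```python
from typing import Optional, Set


def blocked_links(num_links: int, rpl: int, failed: Optional[int] = None) -> Set[int]:
    """Idle mode: only the RPL is blocked.
    Protecting mode: the failed link is blocked and the RPL is unblocked."""
    if not 0 <= rpl < num_links:
        raise ValueError("RPL index out of range")
    if failed is None:
        return {rpl}
    if not 0 <= failed < num_links:
        raise ValueError("failed link index out of range")
    return {failed}


def loop_free_and_connected(num_links: int, blocked: Set[int]) -> bool:
    """True if the active links form a tree spanning all ring nodes."""
    active = [i for i in range(num_links) if i not in blocked]
    if len(active) != num_links - 1:      # a tree on N nodes has N - 1 edges
        return False
    adj = {node: [] for node in range(num_links)}
    for i in active:
        a, b = i, (i + 1) % num_links
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:                          # depth-first reachability from node 0
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == num_links


print(blocked_links(6, rpl=5))            # {5} (idle)
print(blocked_links(6, rpl=5, failed=2))  # {2} (protecting)
```

Blocking nothing closes the loop, and blocking two links partitions the ring. This is why all nodes must agree on which single link is blocked.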



G.8032 revertive operation

In the idle state :

adjacent nodes exchange CCM at 300 per second rate (including over RPL)

exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)

R-APS messages are never forwarded

When a failure appears between 2 nodes

node(s) missing CCM messages peek twice with holdoff time

node(s) block failed link and flush MAC table

node(s) send SF message (3 times @ max rate, then every 5 sec)

node receiving SF message will check priority and unblock any blocked link

node receiving SF message will send SF message to its other neighbor

in stable protecting state SF messages are sent over every unblocked link

When the failure is cleared

node(s) detect CCM and start guard timer (blocks acting on R-APS messages)

node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)

RPL owner receiving NR starts WTR timer

when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB

node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB


G.8032-2010

After coming out with G.8032 in 2008 (G.8032v1)

the ITU came out with G.8032-2010 (G.8032v2) in 2010

This new version is not backwards-compatible with v1

but a v2 node must support v1 as well (but then operation is according to v1)

Major differences :

2 designated nodes: RPL owner node and RPL neighbor node

and, for optional flush optimization, the next neighbor node

significant changes to

state machine

priority logic

commands (forced/manual/clear) and protocol

new Wait To Block timer

supports more general topologies (sub-rings)

ladders (For Further Study in v1)

multi-ring

ring topology discovery

virtual channel based on VLAN or MAC address


[Figure: ring with subrings and a ladder, showing the RPL, RPL owner, RPL neighbor, and RPL next neighbor]



RPR (802.17)

Resilient Packet Rings

are compatible with standard Ethernet, but use a different frame format

are robust (lossless, …)



Basic RPR queuing

traffic from local source

first sent to ringlet selection

then sent according to fairness

traffic going around ring

placed into internal buffer

in dual-transit queue mode placed into 1 of 2 buffers (PTQ or STQ) according to service class

sent according to fairness

traffic for local sink

placed in output buffer according to service class

PTQ / STQ - Primary / Secondary Transit Queue

[Figure: node queuing with local class A, B, C queues, PTQ, STQ, and fairness]



RPR service classes

class   use      info rate               delay/FDV  FE
A0      RT       reserved                low        No
A1      RT       allocated, reclaimable  low        No
B-CIR   near RT  allocated, reclaimable  bounded    No
B-EIR   near RT  opportunistic           unbounded  Yes
C       BE       opportunistic           unbounded  Yes

(RT = real time, BE = best effort, FE = fairness eligible)

RPR defines 3 main classes

class A : real time (low latency/FDV)

class B : near real time (bounded, predictable latency/FDV)

class C : best effort



RPR Class use

A0 ring BW is reserved: not reclaimed even if there is no traffic

in dual-transit queue mode:

class A frames from the ring are queued in PTQ

class B, C in STQ

priority for egress

frames in PTQ

local class A frames

local class B (when no frames in PTQ)

frames in STQ

local class C (when no PTQ, STQ, local A or B)

Notes:

class A frames have minimal delay

class B frames have higher priority than STQ transit frames, so bounded delay/FDV

classes B and C share STQ, so once in the ring they have similar delay
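
The egress priority list is a strict-priority scheduler over five queues. This minimal sketch uses hypothetical names and implements only the order given above. Fairness, rate shaping, and any other queue-occupancy rules that a real RPR node applies are ignored.

```python
from collections import deque
from typing import Deque, Optional


def next_frame(ptq: Deque[str], local_a: Deque[str], local_b: Deque[str],
               stq: Deque[str], local_c: Deque[str]) -> Optional[str]:
    """Return the next frame to transmit in dual-transit-queue mode:
    PTQ, then local class A, local class B, STQ, and finally local class C."""
    for queue in (ptq, local_a, local_b, stq, local_c):
        if queue:
            return queue.popleft()
    return None  # nothing to send


ptq, a, b, stq, c = deque(["p1"]), deque(["a1"]), deque(["b1"]), deque(["s1"]), deque(["c1"])
print([next_frame(ptq, a, b, stq, c) for _ in range(6)])
# ['p1', 'a1', 'b1', 's1', 'c1', None]
```

Local class B is served ahead of STQ transit frames in this order, which is the source of its bounded delay noted above.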



RPR - protection

rings give inherent protection against single point of failure

RPR specifies 2 mechanisms

steering

wrapping (optional)

(implementations may also do wrapping then steering)

[Figure: wrapping and steering (steering info) around a failure]



NERT and CLEER

New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring

Similar to RPR but uses real Ethernet format

NERT and CLEER distinguish between

ring nodes

switches connected to ring nodes

Traffic in ring is MAC-in-MAC encapsulated

External MACs are those of the ring nodes

Internal MACs are original

Unexpected external MACs discarded

External MACs learned as in 802.1ah

Ring nodes forward according to table

NERT floods, CLEER never floods

Protection switch only involves changing table

so service restoration is fast

[Figure: ring nodes and the switches connected to them]



MPLS fast reroute

IP FRR

RFC 4090



IP FRR

True protection mechanisms do not exist for connectionless IP

In practice, routing protocols discover breaks and recalculate routes

but this usually takes a long time

Link-state IGPs detect link-down state using hellos

for OSPF, hellos are typically sent every 10 sec, and the neighbor is declared down after 40 sec

the Dijkstra algorithm is then rerun to compute routes that avoid the failed link

BFD can be used to speed up the detection

However,

the information still has to be propagated further (seconds?)

and FIBs updated (100s of ms)

Various IP Fast ReRoute (IP FRR) mechanisms have been proposed

but true protection is best done at the MPLS level
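The detection and recomputation steps can be sketched as follows. The OSPF numbers are the typical values quoted above. The BFD figure assumes the asynchronous-mode rule of RFC 5880 (detection time = detect multiplier × negotiated interval) with a hypothetical 50 ms interval and multiplier of 3. The `spf` function is a plain Dijkstra that returns first hops, and the four-router topology is invented.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link cost}

OSPF_HELLO_S = 10               # typical hello interval quoted above
OSPF_DEAD_S = 4 * OSPF_HELLO_S  # neighbor declared down after 40 s


def bfd_detection_ms(interval_ms: float, detect_mult: int) -> float:
    """BFD asynchronous-mode detection time: detect multiplier x negotiated interval."""
    return detect_mult * interval_ms


def spf(graph: Graph, root: str) -> Dict[str, str]:
    """Dijkstra SPF; returns the first hop from root toward every reachable node."""
    dist: Dict[str, float] = {root: 0}
    first_hop: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # leaving the root, the first hop is the neighbor itself
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (nd, nbr))
    return first_hop


def without_link(graph: Graph, a: str, b: str) -> Graph:
    """Copy of the topology with the failed link a-b removed."""
    g = {n: dict(adj) for n, adj in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    return g


topo: Graph = {
    "R1": {"R2": 1, "R3": 2},
    "R2": {"R1": 1, "R4": 1},
    "R3": {"R1": 2, "R4": 2},
    "R4": {"R2": 1, "R3": 2},
}
print(OSPF_DEAD_S, bfd_detection_ms(50, 3))             # 40 150
print(spf(topo, "R1")["R4"])                            # R2
print(spf(without_link(topo, "R2", "R4"), "R1")["R4"])  # R3
```

The sketch covers only detection and local recomputation; the flooding and FIB-update delays listed above come on top of it.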


MPLS fast reroute

RSVP-TE enables MPLS traffic engineering through fine control over LSP placement

it specifies an explicit path using information gathered from the IGP

resources may be reserved at LSRs along the way

RFC 4090 defines extensions to RSVP-TE for Fast ReRoute (FRR)

LSRs along the path preconfigure local bypasses (detours)

Upon detection of failure by

BFD (timers specified in microseconds, detection typically in 10s of ms) or

RSVP hellos (RFC default is 5 ms) or

RESV / PATH messages (driven by IGP)

upstream LSR simply enables the detour

Since this is a local action, it should be fast

RFC 4090 only discusses adding FRR to RSVP-TE networks

but its use with LDP is possible if there is a single label generator (not discussed in RFC 4090)


PLRs and MPs

The fundamental entities in MPLS FRR are the Point of Local Repair (PLR) and the Merge Point (MP)

A PLR is the LSR before the failed element (link or node)

All LSRs except the egress LER can be PLRs

The PLR is solely responsible for the FRR (no explicit APS signaling)

During path setup, potential PLRs create detours towards the egress LER

An MP is the LSR where the detour rejoins the LSP

All LSRs except the ingress LER can be MPs

[Figure: ingress LER, PLR, MP, and egress LER along an LSP]


Methods

RFC 4090 defines two different protection methods

Usually one or the other is employed in a given network

One-to-one backup

each LSP protected separately

detour LSP created for each LSP at each potential PLR

no labels pushed

Facility backup

backup tunnel for multiple LSPs

bypass tunnel created at each potential PLR

uses label stacking

[Figure: PLR and MP for one-to-one backup and for facility backup]
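A minimal sketch of what the PLR does to the label stack once it has detected a failure, contrasting the two RFC 4090 methods. `PlrEntry` and all label values are hypothetical; the model only covers the PLR's own swap and push and ignores how the detour or bypass is signaled.

```python
from dataclasses import dataclass
from typing import List

LabelStack = List[int]  # top of stack first


@dataclass
class PlrEntry:
    """Hypothetical PLR forwarding entry for one protected LSP (all labels illustrative)."""
    out_label: int         # label expected by the next hop on the primary path
    method: str            # "one-to-one" or "facility"
    detour_label: int = 0  # one-to-one: label of this LSP's own detour
    mp_label: int = 0      # facility: label the merge point expects for this LSP
    bypass_label: int = 0  # facility: label of the shared bypass tunnel
    failed: bool = False   # set locally by the PLR when it detects the failure

    def forward(self, stack: LabelStack) -> LabelStack:
        rest = stack[1:]  # remove the incoming top label
        if not self.failed:
            return [self.out_label] + rest                # normal swap
        if self.method == "one-to-one":
            return [self.detour_label] + rest             # swap onto the detour, no extra label
        return [self.bypass_label, self.mp_label] + rest  # swap, then push the bypass label


detour = PlrEntry(out_label=101, method="one-to-one", detour_label=201)
bypass = PlrEntry(out_label=101, method="facility", mp_label=101, bypass_label=500)
for entry in (detour, bypass):
    entry.failed = True
    print(entry.method, entry.forward([17]))
# one-to-one [201]
# facility [500, 101]
```

The facility entry here uses `mp_label = out_label`, which corresponds to an NHOP bypass, where the merge point expects the same label it would have received on the primary path. For an NNHOP bypass the PLR must instead use the label the next-next hop expects, which it has to learn from signaling.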


NHOP and NNHOP

MPLS FRR can bypass a failed link or a failed node

In order to bypass a single failed link, we need an alternative path to the next hop (NHOP)

In order to bypass a single failed node, we need an alternative path to the next next hop (NNHOP)

[Figure: PLR and MP for link (NHOP) and node (NNHOP) bypass]
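A hypothetical sketch of how each potential PLR could look for these alternative paths: a breadth-first search to the NHOP that avoids the protected link, and to the NNHOP that avoids the protected node. The topology, node names, and helper functions are invented, and path computation in a real network would also take TE constraints into account, which this sketch ignores.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Topology = Dict[str, Set[str]]


def bfs_path(topo: Topology, src: str, dst: str,
             avoid_nodes: FrozenSet[str] = frozenset(),
             avoid_link: Optional[Tuple[str, str]] = None) -> Optional[List[str]]:
    """Fewest-hop path src -> dst that avoids the given nodes and/or link."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nbr in sorted(topo[node]):  # sorted for deterministic results
            if nbr in prev or nbr in avoid_nodes:
                continue
            if avoid_link is not None and {node, nbr} == set(avoid_link):
                continue
            prev[nbr] = node
            queue.append(nbr)
    return None


def plan_bypasses(topo: Topology, lsp: List[str]) -> Dict[str, Dict[str, Optional[List[str]]]]:
    """For every potential PLR (all LSRs except the egress) find NHOP and NNHOP bypasses."""
    plans: Dict[str, Dict[str, Optional[List[str]]]] = {}
    for i, plr in enumerate(lsp[:-1]):
        nhop = lsp[i + 1]
        link_bypass = bfs_path(topo, plr, nhop, avoid_link=(plr, nhop))
        node_bypass = None
        if i + 2 < len(lsp):  # no NNHOP when the NHOP is the egress LER
            node_bypass = bfs_path(topo, plr, lsp[i + 2], avoid_nodes=frozenset({nhop}))
        plans[plr] = {"NHOP": link_bypass, "NNHOP": node_bypass}
    return plans


topo: Topology = {
    "A": {"B", "E"}, "B": {"A", "C", "F"}, "C": {"B", "D", "E"},
    "D": {"C", "F"}, "E": {"A", "C"}, "F": {"B", "D"},
}
for plr, plan in plan_bypasses(topo, ["A", "B", "C", "D"]).items():
    print(plr, plan)
```

For the LSP A-B-C-D, PLR A gets the link bypass A-E-C-B and the node bypass A-E-C, while C, whose next hop is the egress LER, only gets a link bypass.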



MPLS-TP APS

RFC 6372 (MPLS-TP Survivability Framework)

RFC 6378 (MPLS-TP Linear Protection)

draft-ietf-mpls-tp-ring-protection


MPLS-TP resilience

Since MPLS-TP strives to be a carrier-grade transport network, it has strong protection switching requirements

APS has been almost as contentious an issue as OAM, and indeed the arguments are inter-related

RFC 6372 gives a general framework and differentiates between linear, shared-mesh, and ring protection


Linear protection

from RFC 6378 (formerly draft-ietf-mpls-tp-linear-protection)

1+1, 1:1, 1:n and uni/bidi are supported

The APS signaling protocol (used in all modes except 1+1 unidirectional) is single-phase and is called the Protection State Coordination (PSC) protocol

PSC messages are sent over the protection channel

APS messages are sent over the GACh with a single channel type

message functions identified by a request field

6 states: normal, protecting due to failure, admin protecting, WTR, protection path unavailable, DNR

when revertive, a WTR timer is used


PSC message format

Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR

PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n

R = Revertive

FPath = which path has the fault

Path = which data path is on the protection channel

GAL:   Label (13) | TC | S=1 | TTL
GACh:  0001 | VER | 00000000 | PSC channel type
PSC:   Ver | Request | PT | R | Res | FPath | Path
       TLV Length | Res
       Optional TLVs
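A minimal sketch of building the layers shown above with Python's `struct` module. The label stack entry (label 20 bits, TC 3, S 1, TTL 8) and ACH layouts are the standard ones; the PSC field widths (Ver 2, Request 4, PT 2, R 1, Res 7, FPath 8, Path 8 bits, then 16-bit TLV Length and Reserved) are assumed from RFC 6378 and should be checked against it. Because the slide gives neither the PSC channel type value nor the numeric Request/PT codes, these are plain parameters, the default TTL of 1 is an assumption, and the values in the usage lines are placeholders.

```python
import struct

GAL_LABEL = 13  # G-ACh Label, as shown above


def gal_entry(tc: int = 0, ttl: int = 1) -> bytes:
    """Label stack entry: label(20) | TC(3) | S(1) | TTL(8); the GAL is bottom of stack, S=1."""
    return struct.pack("!I", (GAL_LABEL << 12) | (tc << 9) | (1 << 8) | ttl)


def ach_header(channel_type: int, version: int = 0) -> bytes:
    """Associated Channel Header: 0001 | version(4) | reserved(8) | channel type(16)."""
    return struct.pack("!BBH", (0b0001 << 4) | version, 0, channel_type)


def psc_header(request: int, pt: int, revertive: bool, fpath: int, path: int,
               ver: int = 0, tlv_length: int = 0) -> bytes:
    """PSC fixed part (assumed widths): Ver(2) | Request(4) | PT(2) | R(1) | Res(7) |
    FPath(8) | Path(8), followed by TLV Length(16) | Res(16)."""
    word = ((ver << 30) | (request << 26) | (pt << 24) | (int(revertive) << 23)
            | (fpath << 8) | path)
    return struct.pack("!IHH", word, tlv_length, 0)


PSC_CHANNEL_TYPE = 0x0000  # placeholder: the real value is assigned by IANA
msg = (gal_entry() + ach_header(PSC_CHANNEL_TYPE)
       + psc_header(request=0, pt=0, revertive=True, fpath=0, path=0))
print(len(msg), msg.hex())
```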


PSC control logic states

Normal state - no trigger events reported

Unavailable state - protection path is unavailable

Protecting failure state - traffic is being transported on the protection path because of a failure

Protecting administrative state - the operator issued a command switching traffic to the protection path

Wait-to-Restore state - recovering from working path SF/SD; the WTR timer has not yet expired

Do-not-Revert state - recovered from a protecting state, but the operator has configured DNR
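The revertive and non-revertive recovery paths described above can be sketched as a small state machine. `PscEndpoint`, its method names, and the 300 s WTR value are hypothetical. Only the transitions between Normal, Protecting failure, Wait-to-Restore, and Do-not-Revert are modeled; the unavailable and administrative states are listed but never entered, and returning to Protecting failure when SF recurs during WTR is an assumption of this sketch.

```python
from enum import Enum, auto


class PscState(Enum):
    NORMAL = auto()
    UNAVAILABLE = auto()
    PROTECTING_FAILURE = auto()
    PROTECTING_ADMIN = auto()
    WAIT_TO_RESTORE = auto()
    DO_NOT_REVERT = auto()


class PscEndpoint:
    """Sketch of the revertive / non-revertive recovery path only; other transitions omitted."""

    def __init__(self, revertive: bool, wtr_s: float) -> None:
        self.revertive = revertive
        self.wtr_s = wtr_s
        self.wtr_left = 0.0
        self.state = PscState.NORMAL

    def working_sf(self) -> None:
        # SF on working moves traffic to protection (also if it recurs while WTR is running)
        if self.state in (PscState.NORMAL, PscState.WAIT_TO_RESTORE):
            self.state = PscState.PROTECTING_FAILURE

    def working_recovered(self) -> None:
        # revertive: start the WTR timer; non-revertive: stay on protection in DNR
        if self.state is PscState.PROTECTING_FAILURE:
            if self.revertive:
                self.state = PscState.WAIT_TO_RESTORE
                self.wtr_left = self.wtr_s
            else:
                self.state = PscState.DO_NOT_REVERT

    def tick(self, elapsed_s: float) -> None:
        # when the WTR timer expires, traffic reverts to the working path
        if self.state is PscState.WAIT_TO_RESTORE:
            self.wtr_left -= elapsed_s
            if self.wtr_left <= 0:
                self.state = PscState.NORMAL


ep = PscEndpoint(revertive=True, wtr_s=300)  # hypothetical 5-minute WTR
ep.working_sf()
ep.working_recovered()
ep.tick(120)
print(ep.state)  # PscState.WAIT_TO_RESTORE
ep.tick(180)
print(ep.state)  # PscState.NORMAL
```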


PSC local requests

In order from highest to lowest priority :

1. Clear (operator command)

2. Lockout of protection (operator command)

3. Forced Switch (operator command)

4. Signal Fail on protection (OAM / control-plane / server indication)

5. Signal Fail on working (OAM / control-plane / server indication)

6. Signal Degrade on working (OAM / control-plane / server indication)

7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)

8. Manual Switch (operator command)

9. WTR Expires (WTR timer)

10. No Request (default)
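Resolving the pending local requests by the priority list above amounts to taking a minimum. The enum values below are only the priority ranks from this list, not the Request field codes carried in PSC messages, and the sketch does not cover how the winning local request is then compared with the request received from the far end. The names `LocalRequest` and `highest_local_request` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class LocalRequest(IntEnum):
    # numbered as in the list above: a lower value means a higher priority
    CLEAR = 1
    LOCKOUT_OF_PROTECTION = 2
    FORCED_SWITCH = 3
    SF_PROTECTION = 4
    SF_WORKING = 5
    SD_WORKING = 6
    CLEAR_SF_SD = 7
    MANUAL_SWITCH = 8
    WTR_EXPIRES = 9
    NO_REQUEST = 10


def highest_local_request(pending: Iterable[LocalRequest]) -> LocalRequest:
    """Pick the pending local request that wins; No Request is the default."""
    return min(pending, default=LocalRequest.NO_REQUEST)


print(highest_local_request([LocalRequest.MANUAL_SWITCH, LocalRequest.SF_WORKING]).name)
# SF_WORKING
print(highest_local_request([LocalRequest.SF_WORKING, LocalRequest.LOCKOUT_OF_PROTECTION]).name)
# LOCKOUT_OF_PROTECTION
```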


Linear protection ITU style

from draft-zulr-mpls-tp-linear-protection-switching

Similar to previous, but uses Y.1731/G.8031 format (no surprise!)

GAL:     Label (13) | TC | S=1 | TTL
GACh:    0001 | VER | 00000000 | allocated channel type
G.8031:  MEL | VER | OPCODE=39 | FLAGS=0 | TLV OFFSET=4
         req/state | prot. type | requested signal | bridged signal | reserved
         END TLV=0
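A sketch of the G.8031-style payload shown above, packed with `struct`. The Y.1731 common header layout (MEL 3 bits, Version 5 bits, OpCode, Flags, TLV Offset) and the 4 bytes of APS-specific information follow that format; the function name is hypothetical and all field values are supplied by the caller, so no particular request/state or protection-type encoding is implied. The GAL and ACH are omitted here.

```python
import struct

APS_OPCODE = 39  # Y.1731 OpCode for APS, as shown above
TLV_OFFSET = 4   # four bytes of APS-specific information follow the common header


def g8031_aps_pdu(mel: int, req_state: int, prot_type: int,
                  requested_signal: int, bridged_signal: int, version: int = 0) -> bytes:
    """Y.1731 common header + G.8031 APS fields + End TLV (field values supplied by caller)."""
    # MEL(3) | Version(5) | OpCode(8) | Flags(8)=0 | TLV Offset(8)
    header = struct.pack("!BBBB", (mel << 5) | version, APS_OPCODE, 0, TLV_OFFSET)
    # Request/State(4) | Prot. type(4) | Requested signal(8) | Bridged signal(8) | Reserved(8)
    body = struct.pack("!BBBB", (req_state << 4) | prot_type,
                       requested_signal, bridged_signal, 0)
    return header + body + b"\x00"  # End TLV = 0


pdu = g8031_aps_pdu(mel=7, req_state=0, prot_type=0, requested_signal=0, bridged_signal=0)
print(pdu.hex())  # e02700040000000000
```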


Ring protection

once again there were two drafts, both supporting p2p and p2mp, wrapping and steering, and link/node failures

draft-ietf-mpls-tp-ring-protection (not yet an RFC)

Between any 2 LSRs we can define a Sub-Path Maintenance Entity (SPME)

So between 2 LSRs on a ring there are 2 SPMEs, one in each direction around the ring

we define one as the working channel and the other as the protection channel

Now we re-use the linear protection mechanisms, including the PSC protocol

draft-helvoort-mpls-tp-ring-protection-switching

Both counter-rotating rings carry working and protection traffic

The bandwidth on each ring is divided

X BW is dedicated to working traffic and Y to protection traffic

The protection bandwidth of one ring is used to protect the other ring

Each node should have information about the sequence of ring nodes

MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs
