www.huawei.com
Copyright © 2006 Huawei Technologies Co., Ltd. All rights reserved.
OptiX SDH System Troubleshooting Methods
Objectives
Upon completion of this course, you will be able to:
List the common methods of fault localization.
Outline the Fault Handling Flow.
Analyze typical faults such as traffic interruption and bit errors.
Contents
1. Troubleshooting Preparation
2. Troubleshooting Idea and Methods
3. Classified Troubleshooting Examples
1. Troubleshooting Preparation
Requirements for Maintenance Staff-I: Professional Skills
Be familiar with the hardware system and SDH fundamentals.
Be familiar with the alarm generation mechanism and the signal flow in the transmission system.
Be familiar with the basic maintenance instruments and tools.
Be familiar with the network under maintenance: network topology, network protection and traffic configuration.
Requirements for Maintenance Staff-II: Professional Skills
Be familiar with the common alarms:
SDH line alarms (R_LOS, R_LOF, R_OOF, AU_AIS, AU_LOP, MS_AIS, MS_RDI, B1_EXC, B2_EXC, HP_LOM, HP_SLM, HP_TIM, HP_UNEQ);
PDH tributary alarms (TU_AIS, TU_LOP, T_ALOS, T_DLOS, P_LOS, EXT_LOS, UP_E1_AIS, LP_RDI, LP_SLM, LP_TIM, LP_UNEQ, B3_EXC);
Protection switching alarms (PS);
Clock alarms (LTI, SYNC_C_LOS, SYN_BAD);
Equipment alarms (POWER_FAIL, FAN_FAIL, BD_STATUS).
Collect and save on-site data:
System alarms, performance event data, configurations and NMS operation records.
Fault Handling Flow (Flow Chart)
Start: record the fault trace.
External cause?
Yes: follow the other handling flows.
No: analyze the fault to locate it.
Fault removed?
Yes: go to Continue 2.
No: report the fault to Huawei, then go to Continue 1.
Fault Handling Flow (cont.) (Flow Chart)
Continue 1: make a solution together, then try the solution.
Fault removed?
No: make a new solution together and try again.
Yes: go to Continue 2.
Continue 2: observe the service running.
Service recovered?
No: return to Continue 1.
Yes: archive the fault handling report. End.
2. Troubleshooting Idea and Methods
Question
What is the key to troubleshooting?
To locate the failure ACCURATELY to one station.
Basic Principles of Fault Localization
How to Locate a Fault?
External first, then transmission: first exclude external causes such as a broken fiber, a switch failure, a power failure or a grounding problem.
Network first, then network elements: try your best to locate the trouble to one node.
LU first, then TU: LU (line unit) alarms can lead to TU (tributary unit) alarms.
Higher-severity alarms first, then lower-severity alarms: first analyze critical/major alarms, then come to minor/warning alarms.
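The two alarm-ordering principles above can be expressed as a sort key. This is a minimal sketch, assuming each alarm record carries a severity and the unit type (LU or TU) that raised it; the `Alarm` class, the rank tables and the choice of severity as the primary key are assumptions for illustration, not NMS behavior, and the sample severities are illustrative only.

```python
from dataclasses import dataclass

# Lower rank = analyze earlier. The ranking follows the principle
# "higher-severity alarms first"; the table itself is an assumption.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}

# LU (line unit) alarms can lead to TU (tributary unit) alarms, so LU first.
UNIT_RANK = {"LU": 0, "TU": 1}


@dataclass
class Alarm:
    name: str
    severity: str  # "critical", "major", "minor" or "warning"
    unit: str      # "LU" or "TU"


def triage_order(alarms):
    """Sort alarms by severity first, then LU before TU (stable for ties)."""
    return sorted(
        alarms,
        key=lambda a: (SEVERITY_RANK[a.severity], UNIT_RANK[a.unit]),
    )


if __name__ == "__main__":
    # Hypothetical alarm list; the severities are illustrative only.
    sample = [
        Alarm("LP_RDI", "minor", "TU"),
        Alarm("TU_AIS", "major", "TU"),
        Alarm("R_LOS", "critical", "LU"),
        Alarm("B2_EXC", "major", "LU"),
    ]
    print([a.name for a in triage_order(sample)])
```

Making severity the primary key means a major TU alarm is still read before a minor LU alarm; swapping the tuple order would apply "LU first" strictly instead.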
Keys of Fault Localization: Common Methods of Fault Localization
1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Experience
Alarm and Performance Analysis-I: Evaluate the Whole Network
How to obtain alarms and performance events?
Observe the indicators on boards and cabinets:
•Not detailed
•No history alarms
Use the NMS:
•Comprehensive: all alarms/performance events from the whole network
•Accurate: current alarms, history alarms, occurrence time and performance event data can be queried
Alarm and Performance Analysis-II: Main Steps
1. Obtain the alarms and performance events.
2. Select the key alarms or performance events.
3. Analyze the reasons and limit the trouble to a certain range or to one node.
Alarm and Performance Analysis-III: Case
[Figure: chain network of nodes 1, 2, 3, 4 (west/east line ports) with the alarms TU-AIS, LP-RDI, R-LOS and MS-RDI]
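Loopback: What is Loopback?
[Figure: SDH equipment between two lines and a tributary side, with inloop and outloop points on each line port and on the tributary port]
An inloop returns the signal that the equipment is about to transmit back into the equipment; an outloop returns the signal received from the line or tributary back toward its source.
Loopback is the most common and most efficient method in troubleshooting.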
Loopback: Where Do We Loop?
Tributary board
Loopback options: inloop/outloop
Loopback tools: loopback cable, NMS
Loopback level: path level
Application: separate switch faults from transmission faults; roughly determine a tributary board failure; no need to modify the service configuration.
Line board
Loopback options: inloop/outloop
Loopback tools: patch fiber, NMS
Loopback level: optical interface
Application: locate single-station faults; roughly determine a line board failure; no need to modify the service configuration.
Notes: Software loopback is NOT an absolute method. Why? It may interrupt the traffic and the ECC, and it is removed automatically after 5 minutes (provisional).
Loopback: Procedure
1. Select one NE from the several faulty NEs.
2. Choose one affected traffic path of the selected faulty NE.
3. Draw the traffic flow diagram (source, sink, pass-through).
4. Connect the testing devices.
5. Check the alarms.
[Figure: traffic flow diagram across nodes 3, 2, 1 with line timeslots w2:17/e2:17 and tributary t2:1]
Replacement: When to Use?
Objective: isolate external faults. Application: fibers and cables.
Objective: locate board faults. Application: boards and modules.
Switching can also serve as a replacement: MSP switch, SNCP switch, active/standby XC switch, TPS switch.
Configuration Data Analysis: Query & Analyze the Configuration
Timeslot configuration
J1 or C2 bytes
LU and TU path loopbacks
SNCP or MSP switching conditions
External commands (e.g. locked switch)
Consistency of the configuration between the NMS and the NEs
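Two of the checks above are mechanical comparisons: the NMS copy against the NE copy, and the transmitted J1/C2 against the values the sink expects. This is a sketch assuming both configurations were exported as flat dictionaries with hypothetical item names. The alarm mapping (J1 mismatch gives HP_TIM, C2 mismatch gives HP_SLM, C2 = 0x00 gives HP_UNEQ) is simplified: real equipment also applies persistence checks before raising these alarms.

```python
def config_mismatches(nms_cfg, ne_cfg):
    """Return {item: (nms_value, ne_value)} for every item that differs.

    An item present on only one side is reported with None on the other.
    Item names are assumed to be strings (hypothetical export format).
    """
    items = sorted(set(nms_cfg) | set(ne_cfg))
    return {
        item: (nms_cfg.get(item), ne_cfg.get(item))
        for item in items
        if nms_cfg.get(item) != ne_cfg.get(item)
    }


def path_overhead_alarms(j1_sent, j1_expected, c2_sent, c2_expected):
    """Alarms a sink would raise for mismatched J1/C2 settings (simplified)."""
    alarms = []
    if j1_sent != j1_expected:
        alarms.append("HP_TIM")   # trace identifier mismatch
    if c2_sent == 0x00:
        alarms.append("HP_UNEQ")  # C2 = 0x00 means the VC4 is unequipped
    elif c2_sent != c2_expected:
        alarms.append("HP_SLM")   # signal label mismatch
    return alarms


if __name__ == "__main__":
    # Hypothetical exports from the NMS and from the NE.
    nms = {"E1-1 timeslot": "VC4-1:17", "VC4-1 loopback": "none", "switch command": "none"}
    ne = {"E1-1 timeslot": "VC4-1:18", "VC4-1 loopback": "none", "switch command": "locked switch"}
    print(config_mismatches(nms, ne))
    print(path_overhead_alarms("NODE1", "NODE1", 0x02, 0x02))
```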
Configuration Modification: Fast Solution
Objective: restore the traffic temporarily when there are no spare boards.
Application examples: move the traffic to another port, timeslot, or sub-rack slot.
Testing Instrument: Accurate Judgments
Instrument: test item
Bit error testing device: bit errors/traffic
Optical power meter: optical power
SDH analyzer: bit errors/traffic/overhead bytes
Multimeter: voltage/current/resistance
……
This method is the most reliable one, but the instruments must be at hand.
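A bit error tester reports errored bits over a test period, and the bit error ratio is simply errored bits divided by bits transmitted. A minimal sketch, assuming a test on one E1 (2.048 Mbit/s); the 15-minute duration and the error count in the usage example are hypothetical.

```python
E1_RATE_BPS = 2_048_000  # E1 line rate: 2.048 Mbit/s


def bit_error_ratio(errored_bits, rate_bps, seconds):
    """BER = errored bits / total bits transmitted during the test."""
    total_bits = rate_bps * seconds
    if total_bits <= 0:
        raise ValueError("test duration and rate must be positive")
    return errored_bits / total_bits


if __name__ == "__main__":
    # Hypothetical 15-minute test on one E1 with 18 errored bits.
    ber = bit_error_ratio(18, E1_RATE_BPS, 15 * 60)
    print(f"BER = {ber:.2e}")
```

Here 15 minutes at 2.048 Mbit/s is 1.8432 × 10^9 bits, so 18 errored bits give a BER of about 9.8 × 10^-9.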
Experience: Rule of Thumb
Reset the board
Power off and on
Resend the configuration
Do not consider them a cure-all: they do not help us find the real cause of the failure.
Summary: Methods, Application, Features
Alarm and performance analysis
  Application: universal.
  Features: 1. Evaluates the whole network situation. 2. Locates the faulty point preliminarily based on the collected data. 3. Causes no negative effect on normal services. 4. Depends on the NMS.
Loopback
  Application: locate the fault to a single station or board.
  Features: 1. Independent of alarm and performance event analysis. 2. Rapid and effective.
Replacement
  Application: locate the fault to a board or isolate external faults.
  Features: 1. Convenient. 2. Requires spare parts/equipment. 3. Applied together with other methods.
Configuration data analysis
  Application: locate the fault to a single station or board.
  Features: 1. Can find the fault cause. 2. Fault locating takes longer. 3. Depends on the NMS.
Configuration modification
  Application: locate the fault to a board.
  Features: 1. High risk. 2. Depends on the NMS.
Test with instruments
  Application: isolate external faults and resolve interconnection problems.
  Features: 1. A general method with high accuracy. 2. Has certain requirements for the instruments. 3. Applied together with other methods.
Experience
  Application: special cases.
  Features: 1. Fast fault handling. 2. High probability of mistakes. 3. Needs accumulated experience.
3. Classified Troubleshooting Examples
Troubleshooting Sequence
1. Exclude external troubles: switch problem? Fiber problem? Trunk cable? Power supply system? Grounding problem?
   Methods: replacement, instrument testing, loopback, alarm/performance analysis.
2. Locate the trouble to one NE.
   Methods: loopback, alarm/performance analysis.
3. Locate the trouble to one board.
   Methods: replacement, loopback, alarm/performance analysis, configuration analysis, configuration modification, rule of thumb.
Classified Troubleshooting Examples
Traffic Interruption
Bit Errors
Traffic Interruption: Possible Causes
External causes: power supply system (equipment power-off, undervoltage, etc.); switch problems; fiber or trunk cables (excessive attenuation, fiber cut, cable disconnection).
Operation causes: loopback; data modification.
Equipment failure: faulty board; performance degradation.
Traffic Interruption: Operations
Equipment operator: check the indicator status on each board; analyze the alarms; hardware loopback; replacement.
NMS operator: check the login of each station; query and analyze alarms; loopback section by section; configuration modification; implement switching.
Traffic Interruption: No-protection Line Case
Network configuration: Node 1 is the centralized service node. Each station has E1 services with node 1.
Failure description: the E1 service between nodes 1 and 4 is interrupted. Node 4: TU-AIS. Node 1: LP-RDI. Other services are normal.
[Figure: chain of nodes 1, 2, 3, 4 (west/east ports) with TU-AIS at node 4, LP-RDI at node 1, tributary t2:1 and line timeslots 2:1]
Traffic Interruption: Where is the Problem?
[Figure: chain of nodes 1 to 4 with TU-AIS at node 4 and LP-RDI at node 1]
Alarm analysis:
Query alarms: TU-AIS occurs in node 4 only, so node 4 cannot receive the traffic from node 1.
Other traffic between nodes 1, 2 and 3 is normal.
Failure location: between nodes 3 and 4.
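The reasoning on this slide amounts to a set operation: the fault sits on a section that every interrupted service uses and that no healthy service uses. A sketch, assuming each service is described by the list of nodes it traverses; the service names are hypothetical, and a "section" here includes the boards at both of its ends, so loopbacks are still needed to decide between the two nodes.

```python
def sections(route):
    """Undirected sections (node pairs) traversed by a route."""
    return {tuple(sorted(pair)) for pair in zip(route, route[1:])}


def suspect_sections(services, faulty):
    """Sections used by every faulty service and by no healthy service.

    services: {service name: [node, node, ...]} in traffic order
    faulty:   set of names of interrupted services
    """
    if not faulty:
        return []
    suspects = set.intersection(*(sections(services[name]) for name in faulty))
    for name, route in services.items():
        if name not in faulty:
            suspects -= sections(route)
    return sorted(suspects)


if __name__ == "__main__":
    # Chain 1-2-3-4 with node 1 as the centralized service node.
    services = {
        "E1 1-2": [1, 2],
        "E1 1-3": [1, 2, 3],
        "E1 1-4": [1, 2, 3, 4],
    }
    print(suspect_sections(services, {"E1 1-4"}))
```

With only the 1-4 service interrupted, the only suspect is section (3, 4), which matches the slide's conclusion.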
Traffic Interruption: Analysis (Loopback)
[Figure: chain of nodes 1 to 4 with a BER tester connected to tributary t2:1]
Connect the tester, then:
1. Outloop on VC4 #2 at node 4. Normal: the failure is in node 4. Not normal: the failure is between nodes 3 and 4.
2. Software inloop on VC4 #2 at the east LU of node 3. Normal: the failure is between nodes 3 and 4. Not normal: the failure is in node 3.
3. Hardware optical-port inloop at the east LU of node 3. Normal: the failure is in node 4. Not normal: the failure is in node 3.
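The loopback steps above form a small decision tree. A sketch of it, assuming a hypothetical callback `loop_ok(step)` that sets up the named loopback, reads the BER tester and returns True for an error-free signal; the step strings are descriptive labels, not NMS commands.

```python
from typing import Callable

# Descriptive labels for the loopback steps (not NMS commands).
OUTLOOP_NODE4 = "outloop on VC4 #2 at node 4"
SOFT_INLOOP_NODE3 = "software inloop on VC4 #2 at node 3 east LU"
HARD_INLOOP_NODE3 = "hardware optical-port inloop at node 3 east LU"


def locate_by_loopback(loop_ok: Callable[[str], bool]) -> str:
    """Follow the loopback decision tree for the no-protection chain case.

    loop_ok(step) is assumed to set up the named loopback, read the BER
    tester, and return True if the looped signal is error free.
    """
    if loop_ok(OUTLOOP_NODE4):
        return "failure in node 4"
    # The failure is between nodes 3 and 4: narrow it down from node 3.
    if not loop_ok(SOFT_INLOOP_NODE3):
        return "failure in node 3"
    if not loop_ok(HARD_INLOOP_NODE3):
        return "failure in node 3"
    return "failure in node 4"


if __name__ == "__main__":
    # Simulated results: only the node 4 outloop fails.
    results = {OUTLOOP_NODE4: False, SOFT_INLOOP_NODE3: True, HARD_INLOOP_NODE3: True}
    print(locate_by_loopback(results.__getitem__))
```

Checking the software loopback before the hardware one follows the slide: it needs no site visit, and the hardware optical-port loop then separates the optical module of node 3 from node 4.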
Traffic Interruption: Final Solution (Replacement)
[Figure: chain of nodes 1 to 4 with TU-AIS at node 4 and LP-RDI at node 1]
The failure has been located in one node; the LU, TU or XC may be faulty.
1. TPS switch. If the traffic becomes normal, replace the faulty TU.
2. Otherwise, perform an active/standby XC switch. If the traffic becomes normal, replace the faulty XC.
3. Otherwise, replace the faulty LU.
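The replacement order above can be sketched as a function that tries one switch at a time. This assumes a hypothetical callback `traffic_ok_after(action)` that performs the action and reports whether the interrupted traffic has recovered; the action strings are labels, not NMS commands.

```python
from typing import Callable


def isolate_faulty_board(traffic_ok_after: Callable[[str], bool]) -> str:
    """Switch one suspect at a time inside the faulty node.

    traffic_ok_after(action) is assumed to perform the action and report
    whether the interrupted traffic has recovered.
    """
    if traffic_ok_after("TPS switch"):
        return "replace faulty TU"
    if traffic_ok_after("active/standby XC switch"):
        return "replace faulty XC"
    # Neither the TU nor the XC is to blame: the line board remains.
    return "replace faulty LU"


if __name__ == "__main__":
    tried = []

    def simulated(action):
        tried.append(action)
        return action == "active/standby XC switch"  # pretend the XC is faulty

    print(isolate_faulty_board(simulated), tried)
```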
Traffic Interruption: SNCP Ring Case
[Figure: SNCP ring of nodes 1, 2, 3, 4 with west (w) and east (e) ports; TU-AIS at nodes 2, 3 and 4, LP-RDI at node 1]
Network configuration: Node 1 is the centralized service node. Each station has E1 services with node 1.
Failure description: all E1 services are interrupted. Nodes 2, 3, 4: TU-AIS. Node 1: LP-RDI.
Traffic Interruption: Where is the Problem?
Thoughts and methods:
Alarm/performance analysis
Disconnect the ring and convert it to a line
Replacement
Loopback
Analyze the configuration correctness
[Figure: the same SNCP ring with TU-AIS at nodes 2, 3 and 4 and LP-RDI at node 1]
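On an SNCP ring a service is lost only when both its working and protection routes fail, so a single failure interrupting every service must lie on a node shared by both routes of each one. A sketch, assuming the ring order is 1-2-3-4-1 (the figure does not state it) and hypothetical service names; this only narrows the search, and the other methods on the slide, such as checking the configuration, still apply.

```python
def common_nodes(services, faulty):
    """Nodes lying on both routes of every interrupted SNCP service.

    services: {name: (working_route, protection_route)}, routes as node lists.
    With SNCP a service is lost only if both routes fail, so a single
    failure must sit on a node shared by both routes.
    """
    suspects = None
    for name in faulty:
        working, protection = services[name]
        shared = set(working) & set(protection)
        suspects = shared if suspects is None else suspects & shared
    return sorted(suspects or [])


if __name__ == "__main__":
    # Assumed ring order 1-2-3-4-1, node 1 as the centralized service node.
    services = {
        "E1 1-2": ([1, 2], [1, 4, 3, 2]),
        "E1 1-3": ([1, 2, 3], [1, 4, 3]),
        "E1 1-4": ([1, 4], [1, 2, 3, 4]),
    }
    print(common_nodes(services, list(services)))
```

With all three services interrupted, node 1 is the only common suspect.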
Traffic Interruption: MSP Ring Case
Network configuration: Node 1 is the centralized service node. Each station has E1 services with node 1, configured on the shortest route.
Failure description: the fibers between NE2 and NE3 are broken (R-LOS). The E1 services between nodes 1 and 3 are interrupted. Nodes 1, 3: TU-AIS. Other services are normal.
[Figure: STM-4 MSP ring of nodes 1 to 5 with R-LOS on both sides of the broken span between nodes 2 and 3, and TU-AIS at nodes 1 and 3]
Traffic Interruption: Where is the Problem? (MSP Switch Process)
An MSP switch succeeds only if every stage works:
LU: detects SF or SD and transmits the K1 and K2 bytes.
SCC: processes the APS protocol normally (the APS controller has started and the switch state is right).
XC: implements the switching.
Protection channels: available.
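Since the K1 and K2 bytes carry the APS requests between ring nodes, reading them at each node shows where the switching process stalls. A sketch, assuming the MS shared protection ring byte layout of ITU-T G.841 (K1 bits 1-4 request, bits 5-8 destination node; K2 bits 1-4 source node, bit 5 short/long path, bits 6-8 status, bit 1 being the MSB). Only a subset of request codes is named; this course does not say how OptiX displays these bytes, and the sample values are hypothetical.

```python
# Request codes (K1 bits 1-4) for MS shared protection rings, a subset of
# ITU-T G.841; other values are reported as raw bit patterns.
REQUEST_CODES = {
    0b0000: "NR (no request)",
    0b0001: "RR-R (reverse request, ring)",
    0b0101: "WTR (wait to restore)",
    0b1000: "SD-R (signal degrade, ring)",
    0b1011: "SF-R (signal fail, ring)",
}

# Status codes (K2 bits 6-8).
STATUS_CODES = {
    0b000: "idle",
    0b001: "bridged",
    0b010: "bridged and switched",
    0b110: "MS-RDI",
    0b111: "MS-AIS",
}


def decode_k_bytes(k1: int, k2: int) -> dict:
    """Split K1/K2 into their ring APS fields (bit 1 is the MSB)."""
    request = k1 >> 4
    status = k2 & 0b111
    return {
        "request": REQUEST_CODES.get(request, f"code {request:04b}"),
        "destination_node": k1 & 0x0F,
        "source_node": k2 >> 4,
        "path": "long" if (k2 >> 3) & 1 else "short",
        "status": STATUS_CODES.get(status, f"code {status:03b}"),
    }


if __name__ == "__main__":
    # Hypothetical capture: node 3 signals a ring signal fail to node 2.
    print(decode_k_bytes(0xB2, 0x30))
```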
Traffic Interruption: Analysis
[Figure: STM-4 MSP ring of nodes 1 to 5 with R-LOS at the broken span, APS-INDI at every node, and switch state S (switching) at two nodes and P (pass-through) at three nodes]
1. Query and check the alarms.
2. Check the switch status. If it is not normal, the APS protocol may have stopped: restart it.
3. If the switch status is still not normal, restart the APS protocol node by node to locate the faulty LU/XC.
4. If the switch status is normal but the traffic is still interrupted, draw the switched traffic flow diagram and loop back section after section to locate the faulty LU/XC.
5. Replace the faulty LU/XC or resend the configuration.
Traffic Interruption: Switched Route
[Figure: STM-4 MSP ring of nodes 1 to 5 with R-LOS at the broken span, APS-INDI at every node and switch states S/P. Normal route 1-2-3 on timeslot 1:17 (e1:17/w1:17); switched route 1-2-1-5-4-3 on timeslot 3:17 (e3:17/w3:17); tributary t2:1 at both ends]
Notes: The switched route forms one complex line, so dichotomy (binary search) can be used to locate the fault.
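Dichotomy halves the suspect part of the route with each loopback instead of looping back at every hop. A sketch, assuming a single fault, a hypothetical callback `loop_ok(i)` that loops back at hop `i` of the route and reports an error-free signal, a good result at the tester's own node and a bad one at the far end. Hops are indexed by position because a switched MSP route can pass through the same node twice.

```python
from typing import Callable, Sequence, Tuple


def bisect_fault(route: Sequence, loop_ok: Callable[[int], bool]) -> Tuple:
    """Binary-search the faulty section on a long route (dichotomy).

    route:   nodes in traffic order from the tester (a node may repeat,
             as on a switched MSP route).
    loop_ok: loop_ok(i) sets up a loopback at route[i] and returns True if
             the tester sees an error-free signal.
    Assumes a single fault, loop_ok(0) is True and loop_ok(len(route) - 1)
    is False. Returns (last good hop, first bad hop) as (index, node) pairs.
    """
    lo, hi = 0, len(route) - 1  # invariant: hop lo is good, hop hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if loop_ok(mid):
            lo = mid
        else:
            hi = mid
    return (lo, route[lo]), (hi, route[hi])


if __name__ == "__main__":
    # Switched route from the slide; pretend the fault is between 5 and 4.
    switched = [1, 2, 1, 5, 4, 3]
    print(bisect_fault(switched, lambda i: i <= 3))
```

On a route of n hops this needs about log2(n) loopbacks, which matters most on long switched routes.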
Bit Errors: Possible Causes
External causes: performance degradation of fibers, excessive attenuation; dirty fiber connector or incorrect connector; poor equipment grounding; strong interference source near the equipment; poor ventilation, high operating temperature.
Equipment failure: transmitter or receiver failure in the LU; poor synchronization; poor coordination between the XC and the LU/TU; fan failure; faulty boards or poor board performance.
Bit Errors: Operations
Equipment operator: measure the optical power; check the cable or fiber connections and the grounding; clean the fiber connectors; check ventilation and temperature; hardware loopback; replace boards; exclude interference sources.
NMS operator: query and analyze alarms/performance events; loopback section by section; configuration modification; implement switching.
Bit Errors: No-protection Line Case
Network configuration: Node 1 is the centralized service node. Each station has E1 services with node 1.
Failure description: too many bit errors.
[Figure: chain of nodes 1 to 4 with the performance events LPBBE, LPFEBBE, RSBBE, MSBBE, HPBBE, MSFEBBE and HPFEBBE]
Bit Errors: Where is the Problem?
[Figure: chain of nodes 1 to 4 with the same performance events]
1. Check and exclude external causes.
2. Performance event analysis: LPBBE on the path from node 4 to node 1; RSBBE/MSBBE/HPBBE on the section from node 4 to node 3.
3. LU first, then TU: the failure is located between nodes 3 and 4.
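"LU first, then TU" applied to performance events means: look for section-level errors (RSBBE from B1, MSBBE from B2) before blaming the path ends. A sketch, assuming near-end counts have been read per receiving line board and LPBBE per path sink; the dictionary layout, path names and counts are hypothetical.

```python
# Section-level events: RSBBE (from B1) and MSBBE (from B2).
LINE_EVENTS = ("RSBBE", "MSBBE")


def locate_bit_errors(section_counts, path_counts):
    """Check sections first (LU first, then TU).

    section_counts: {(from_node, to_node): {"RSBBE": n, "MSBBE": n, ...}}
                    near-end counts read at to_node's line board.
    path_counts:    {path name: LPBBE count} read at the path sink.
    """
    bad_sections = sorted(
        sec for sec, events in section_counts.items()
        if any(events.get(name, 0) > 0 for name in LINE_EVENTS)
    )
    if bad_sections:
        return "line", bad_sections
    bad_paths = sorted(p for p, n in path_counts.items() if n > 0)
    if bad_paths:
        # Clean sections but errored paths: suspect XC/TU at the path ends.
        return "path", bad_paths
    return "clean", []


if __name__ == "__main__":
    # Hypothetical counts shaped like the slide: errors from node 4 to node 3.
    counts = {
        (1, 2): {"RSBBE": 0, "MSBBE": 0},
        (2, 3): {"RSBBE": 0, "MSBBE": 0},
        (4, 3): {"RSBBE": 120, "MSBBE": 118, "HPBBE": 110},
    }
    print(locate_bit_errors(counts, {"4->1": 95}))
```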
Bit Errors: Analysis
[Figure: chain of nodes 1 to 4 with the same performance events]
After the performance event analysis:
1. Check the fans and the temperature. If they are not normal, solve these problems first.
2. Measure or query the optical power. If it is not normal, check and replace the transmitter, fiber, connector or receiver.
3. If both are normal, continue with loopback and replacement.
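The optical power check compares the received power with the receiver's working window, from sensitivity up to overload. A sketch of this analysis flow, assuming the fan/temperature result is already known; the -28 dBm / -8 dBm window in the usage example is hypothetical, and the real values come from the board's specifications.

```python
def rx_power_status(rx_dbm, sensitivity_dbm, overload_dbm):
    """Classify received optical power against the receiver's window."""
    if rx_dbm < sensitivity_dbm:
        return "too low"
    if rx_dbm > overload_dbm:
        return "too high"
    return "ok"


def next_bit_error_step(fans_and_temp_ok, rx_dbm, sensitivity_dbm, overload_dbm):
    """Next step of the bit error analysis flow."""
    if not fans_and_temp_ok:
        return "solve fan/temperature problems"
    if rx_power_status(rx_dbm, sensitivity_dbm, overload_dbm) != "ok":
        return "check and replace transmitter/fiber/connector/receiver"
    return "continue with loopback and replacement"


if __name__ == "__main__":
    # Hypothetical receiver window; use the values for the actual board.
    SENSITIVITY_DBM, OVERLOAD_DBM = -28.0, -8.0
    print(next_bit_error_step(True, -30.5, SENSITIVITY_DBM, OVERLOAD_DBM))
```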
Bit Errors: Final Solution (Loopback & Replacement)
[Figure: chain of nodes 1 to 4 with the same performance events]
Connect the BER tester. Use loopback, active/standby XC switch and configuration modification to locate the faulty LU/XC, then replace it.
Bit Errors: Think About It!
[Figure: chain of nodes 1 to 4 with the same performance events]
Question: How to solve occasional bit errors?
Hint: You cannot keep a loopback in place for a long time. Instead, interchange the suspect fiber or LU and observe whether the bit errors move with it.
Questions
What is the principle of troubleshooting?
External first, then internal
Station first, then boards
LU first, then TU
Higher-severity alarms first, then lower-severity alarms
What is the key to troubleshooting?
To locate the failure ACCURATELY to a certain station.
Summary
Which methods are used for troubleshooting?
1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Rule of thumb