
Lucent Technologies - Proprietary

This document contains proprietary information of Lucent Technologies and is not to be disclosed or used except in accordance with applicable agreements.

Issue 16.0
December 2000

401-661-045

Flexent™/AUTOPLEX®

Wireless Networks
Executive Cellular Processor (ECP)
Release 16.0

Common Network Interface (CNI)
Ring Maintenance


This material is protected by the copyright laws of the United States and other countries. It may not be reproduced, distributed, or altered in any fashion by any entity, including other Lucent Technologies business units or divisions, without the expressed written consent of the Customer Training and Information Products Department.

Notice

Every effort was made to ensure that the information in this document was complete and accurate at the time of printing. However, information is subject to change.

Federal Communications Commission Statement (FCC) Notification and Repair Information

NOTE: This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy, and if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications.

Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at his/her own expense.

Security Statement

In rare instances, unauthorized individuals make connections to the telecommunications network through the use of remote access features.

In such event, applicable tariffs require that the customer pay all network charges for traffic. Lucent Technologies cannot be responsible for such charges and will not make any allowance or give any credit for charges that result from unauthorized access.

Trademarks

5ESS is a registered trademark of Lucent Technologies.
AUTOPLEX is a registered trademark of Lucent Technologies.
AutoPACE is a registered trademark of Lucent Technologies.
BILLDATS is a registered trademark of Lucent Technologies.
DEFINITY is a registered trademark of Lucent Technologies.
DOS Windows is a trademark of Sun Microsystems, Inc.
Informix is a registered trademark of Informix Software, Inc.
Intel is a registered trademark of the Intel Corporation.
Motorola is a registered trademark of the Motorola Corporation.
Paradyne is a trademark of Paradyne Corporation.
Sun is a trademark of Sun Microsystems, Inc.
Solaris is a trademark of Sun Microsystems, Inc.
SPARC is a trademark of Sun Microsystems, Inc.
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd.

Other trademarks may appear in this document as well. They are marked on first usage.

Contents

About This Document
    Purpose
    Reasons for Reissue
    Intended Audience
    How to Use This Document
    Conventions Used
    Product Safety Labels
    How to Order Documentation
    How to Comment on This Document

1 Overview of the CNI Ring
    DSN/CSN/ICN Hardware Descriptions
    CDN Hardware Description
        CDN
        CDN-I
        CDN-II
        CDN-IIx
        CDN-III
    RPCN Hardware Description
    Direct Link Node Hardware Description
    SS7 Node Hardware Description
    EIN Ethernet Interface Node
    CNI Integrity Process Descriptions
    Error Analysis and Recovery Process
    Automatic Ring Recovery Process
    Node Audit Capability
    Ring Audit Capability
    RPCN Token Audit
    CNI Safety Net Capability
        Inhibiting CNI Safety Net
        Allowing CNI Safety Net Feature
    General Maintenance
        Daily Activity Recommendation
        Faulty Node Recovery Strategy
        Routine Diagnostics
    Fault Descriptions
        RAC Parity/Format Error
        Unexplained Loss of Token
        SRC Match
        RAC Output Parity Error
        General RAC Error Detected
        Node Audit Failure
        Interframe Buffer Parity Error
        Read Format Error
        Write Format Error
    Emergency Maintenance
        Ring Down Recovery
        Rolling CNI Initializations
        Global CDN Recovery
        Single CDN Recovery

2 Description of the Ring Subsystem
    General
    Operation of the Ring
    Ring Nodes
        Ring Peripheral Controller Nodes
        Basic IMS User Nodes
        Direct Link Nodes (DLN)
        Call Processor/Data Base Nodes (CDN)
        Interframe Buffers
    Node Names and Addresses
    Ring Message Format
    Reconfigurations
        Node Quarantine
        Node Isolation
        The Ring Config Module
    Initializations
        Level-3 IMS Initializations (FPI and Boot)
        Level-4 IMS Initializations (FPI and Boot)
    Audits
        Central Node Control Audit (AUD CNC)
        Node State Audit (AUD NODEST)
        Node Audit

3 Ring Maintenance
    Overview
    Automatic Ring Maintenance
        EAR or Ring Recovery
        ARR or Deferrable Node Recovery
    Manual Ring Maintenance
        Ring Maintenance Interfaces
        Ring Diagnostics
        Guide to Critical Ring Maintenance
    Examples of Ring Maintenance
        Responses to Single, Ring-Related Faults
        Responses to Multiple, Ring-Related Faults

4 Ring and Ring Node Maintenance Procedures
    Introduction
    Ring Fault Conditions and Maintenance Approach
        Ring Node Out-of-Service
        Single-Ring Node Isolation
        Multiple-Ring Node Isolation
        Ring Down
    Ring Generic Access Package (RGRASP)
        Feature Definition
        Feature Description
        Software Impact
        Software Description
        User Profile
        Description of Feature Operation
        Equipment Configuration Data (ECD)
        Recent Change Procedures
        Measurement
        Network Management Impact
        Maintenance/Troubleshooting Impact
        Recording
        Output Messages
        Audits
        Critical Events
        Support Tools
        Related Documentation Cross-References

5 Ring Critical Events
    Introduction
    Critical Event Message Output
        Logging Critical Events
        Short Form CNCE Message
        Long Form CNCE Message
        Using the CHG:CEPARM Command
        CNCE Descriptions

6 Diagnostic User’s Guide
    Introduction
    Overview
        Diagnostics
        Hardware and Interfaces
        System Maintenance Interfaces
    Performing Diagnostics
        Diagnostic Message Structure
        System Diagnostics
        Denied Diagnostic Requests
        Inhibiting Diagnostic Requests
        Diagnostic Aborts and Audits
    Operating System Diagnostics

7 Equipment Handling Procedures
    Introduction
    Equipment Description and Handling Precautions
        Power Packs and Fusing Descriptions
        Fan and Filter Maintenance
    Ring Node Circuit Pack Handling Precautions
        Ring Node Equipment Visual Indicators
        Removing Affected Equipment From Service
        UN122C and UN123B Combination Circuit Pack Installation
        Voice Frequency Link Hardware Equipment Replacement Procedures

A Ring Error Analysis and Recovery
    Introduction
    Data Structures
    General Information
    Blockage Error
    Hard Ring Parity Errors
    Orphan Byte Error
    Soft Ring Parity Error
    Interframe Buffer Parity Error
    RAC Output Parity Error
    Write Format Error
    Read Format Error
    Received Too Short Error
    Read Inhibit Error
    Excessive Ring Command Interrupts
    Token Removed from Ring
    Source Match Error
    Miscellaneous RAC Problem
    Unexpected Loss of Token
    Checksum Audit Failure
    Node Processor Parity Failure

B Ring Maintenance Reference Material
    Ring Transport Errors
        Ring-Related Errors
        Node-Related Errors
        Errors Without Consequences
        Unexplained Loss of Token
    Some IMS Input Messages
    Setting the ECD Flag for Manual Ring Mode
    ECD Values for Interframe Buffers

Figures

1-1. RAC Parity/Format Error
1-2. Unexplained Loss of Token
1-3. SRC Match
1-4. RAC Output Parity Error
1-5. General RAC Error
1-6. NAUD Failure
1-7. Interframe Buffer Error
1-8. Ring Down
2-1. Conceptual Illustration of an IMS Ring
2-2. A Ring Access Circuit on the IMS Ring
2-3. Interframe Buffers
2-4. IMS Message Format
2-5. Illustration of an Isolated Ring
2-6. Before (top) and After (bottom) Becoming a BISO or EISO Node
3-1. A 1105 Display Page
3-2. An 1106 Display Page
3-3. Isolated RACs of BISO and EISO Nodes
3-4. Manual Recovery - Method One
3-5. Manual Recovery - Method Two
4-1. Ring OOS Normal
4-2. Single Node Isolation
4-3. New BISO Established
4-4. Diagnosing EISO Node
4-5. Two or More Faulty Nodes
4-6. New BISO Node
4-7. More Than One Faulty Node
5-1. CNCE Messages
6-1. General Format for Input/Output Messages

Tables

3-1. Node Problems Mapped to Maintenance States and EAR Actions
3-2. ARR Responses to Maintenance-States
3-3. Output Messages that Report ARR Actions
3-4. Alarms Associated with IMS Output Messages
3-5. 1105-Page Symbols of Node Major States
3-6. Circuit Pack LED States
5-1. CNCE Descriptions
6-1. Discontinued Availability CP Listings
6-2. DGN Message Input Variations
6-3. OP:RING Input Message Variations
6-4. IRN and IRN2 RPCN Node Diagnostic Phases
6-5. IRN LN (LIN-E/SS7) Node Diagnostic Phases
6-6. IRN LN (LI4S/SS7) Node Diagnostic Phases
6-7. IRN DLNE Node Diagnostic Phases
6-8. IRN2 DLN30 Node Diagnostic Phases
6-9. IRN2 DLN60 Node Diagnostic Phases
6-10. IRN CDN-I Diagnostic Phases
6-11. IRN2 CDN-II/CDN-IIx Diagnostic Phases
6-12. IRN2 CDN-III Diagnostic Phases
6-13. IRN2 EIN Node Diagnostic Phases
6-14. IRN MDL (CSN, DSN, ICN) Diagnostic Phases
6-15. Discontinued Availability CP Listings
6-16. IRN and IRN2 RPC Trouble Location CP List
6-17. IRN LN (LIN-E/SS7) Trouble Location CP List
6-18. IRN LN (LI4S/SS7) Trouble Location CP List
6-19. IRN DLNE Trouble Location CP List
6-20. IRN2 DLN30 Trouble Location CP List
6-21. IRN2 DLN60 Trouble Location CP List
6-22. IRN CDN-I Manual Trouble Location CP List
6-23. IRN2 CDN-II/CDN-IIx Manual Trouble Location CP List
6-24. IRN2 CDN-III Trouble Location CP List
6-25. IRN2 EIN Node Trouble Location CP List
6-26. IRN MDL (CSN, DSN, ICN) Trouble Location CP List
6-27. Physical Node ID (Decimal Representation)
6-28. Physical Node ID (Hexadecimal Representation)
6-29. Physical Node Addresses (Decimal Representation)
6-30. Physical Node Addresses (Hexadecimal Representation)
7-1. Power Unit Index
7-2. Ring Node Power Supply Index
7-3. Hardware Version Values (with IFB)
7-4. Hardware Version Values (No IFB)
B-1. Some Versions of the RST Input Message

About This Document

This chapter gives an overview of the contents, intended audience, and use of the Flexent™/AUTOPLEX® Wireless Network Systems Common Network Interface (CNI) Ring Maintenance manual.

Purpose

This guide gives you the instructions to maintain and troubleshoot the CNI ring as used in a Flexent™/AUTOPLEX® wireless network.

NOTE: This document is not intended for use with the 5ESS® Digital Cellular Switch (DCS) component of a Flexent™/AUTOPLEX® wireless network. For ring maintenance on the DCS, use the 5ESS® DCS documentation.

Reasons for Reissue

Issue 16 is reissued for the following reasons:

• To correct erroneous information
• To revise any technical errors
• To make quality improvements


Intended Audience

The audience for this guide includes users who maintain the CNI ring. These may be Lucent Technologies support personnel (CTSO) or the cellular provider’s technicians.

How to Use This Document

This guide is organized as follows:

• Chapter 1: Overview of the CNI Ring
  Describes the components of a CNI ring.

• Chapter 2: Description of the Ring Subsystem
  Describes the ring subsystem.

• Chapter 3: Ring Maintenance
  Explains the maintenance philosophy behind the CNI ring.

• Chapter 4: Ring and Ring Node Maintenance Procedures
  Explains how to run the maintenance procedures for both the ring and the ring nodes.

• Chapter 5: Ring Critical Events
  Explains events that indicate abnormal behavior in the ring.

• Chapter 6: Diagnostic User’s Guide
  Explains how to perform diagnostics on ring nodes for a CNI ring-based office.

• Chapter 7: Equipment Handling Procedures
  Describes how to handle equipment when replacing hardware on the CNI ring.

• Appendix A: Ring Error Analysis and Recovery
  Describes the ring error analysis and recovery procedures and mechanisms.

• Appendix B: Ring Maintenance Reference Material
  Contains reference material for maintaining the CNI ring.

• Glossary and Acronyms

• Index


Conventions Used

Specific typography is used in this guide to show actions or results.

Commands you enter on the keyboard are shown in

Data screens or responses from the system are shown in constant width

Options for commands are shown in italics

Keys that must be pressed on your keyboard are shown in

Product Safety Labels

Admonishments are strategically placed reminders that help assure the safety of personnel, minimize service interruptions or loss of data, and minimize damage to equipment, products, or software. The types of admonishments used in this guide are listed below.

! DANGER: Indicates the presence of a hazard that will cause death or severe personal injury if the hazard is not avoided.

! WARNING: Indicates the presence of a hazard that can cause death or severe personal injury if the hazard is not avoided.

! CAUTION: Indicates the presence of a hazard that will or can cause minor personal injury or property damage if the hazard is not avoided.

NOTE: Notifies you that something needs special attention or consideration.


How to Order Documentation

The FLEXENT™/AUTOPLEX® Wireless Network Systems Customer Documentation Catalog (401-610-000) is a guide to all FLEXENT™/AUTOPLEX® Wireless Network Systems customer documents and includes document descriptions and ordering information.

To order FLEXENT™/AUTOPLEX® Wireless Network Systems documents, including documents on CD-ROM, and all other Lucent Technologies product documentation by phone, please use the following numbers:

Within the United States:

Voice: 1-888-LUCENT8 or 1-888-582-3688, prompt 1
FAX: 1-800-566-9568


Locations outside of the United States:

Australia and all European countries: (317) 322-6416
Asia Pacific and China: (317) 322-6411
North America (excluding U.S.) and all other countries: (317) 322-6646
FAX for all international customers: (317) 322-6699

Product documentation can be ordered by mail using this address:

Lucent Technologies Customer Information Center
Attention: Order Entry Section
2855 N. Franklin Road
P.O. Box 19901
Indianapolis, Indiana 46219
U.S.A.

To order documentation electronically, visit the Lucent Technologies Customer Information Center web site at:

http://www.cic.lucent.com

How to Comment on This Document

Lucent Technologies has endeavored to ensure that this document meets your needs. We are interested in your suggestions for improving the document. At the back of this document is a postage-paid comment card. Please complete the comment card and mail it to us at the preprinted address. If your copy of the document has no comment card, please specify the title of the document and mail your comments to

Lucent Technologies
1000 E. Warrenville Road
P.O. Box 3013
Naperville, Illinois 60566-7013
U.S.A.
Attn: Customer Training and Information Products Manager, Room 2V-120

or e-mail your comments to

wirelessdocs@lucent.com

1 Overview of the CNI Ring

Contents

DSN/CSN/ICN Hardware Descriptions
CDN Hardware Description
    CDN
    CDN-I
        Double Plate CDN-I
        Single Plate CDN-I
    CDN-II
    CDN-IIx
    CDN-III
RPCN Hardware Description
Direct Link Node Hardware Description
SS7 Node Hardware Description
CNI Integrity Process Descriptions
Error Analysis and Recovery Process
Automatic Ring Recovery Process
Node Audit Capability
Ring Audit Capability
RPCN Token Audit
CNI Safety Net Capability
    Inhibiting CNI Safety Net
    Allowing CNI Safety Net Feature
General Maintenance
    Daily Activity Recommendation
    Faulty Node Recovery Strategy
    Routine Diagnostics
Fault Descriptions
    RAC Parity/Format Error
        Cause
        Effect
        Craft Recovery Action
    Unexplained Loss of Token
        Effect
    SRC Match
        Cause
        Effect
    RAC Output Parity Error
        Cause
        Effect
    General RAC Error Detected
        Cause
        Effect
    Node Audit Failure
        Cause
        Effect
        Craft Recovery Action
    Interframe Buffer Parity Error
        Cause
        Effect
    Read Format Error
        Cause
        Effect
    Write Format Error
        Cause
        Effect
        Craft Recovery Action
Emergency Maintenance
    Ring Down Recovery
    Rolling CNI Initializations
    Global CDN Recovery
    Single CDN Recovery

The Common Network Interface (CNI) ring serves as the medium that connects the various cellular processors together. The following sections describe the basic hardware configuration of each type of processor.

DSN/CSN/ICN Hardware Descriptions

A Digital Switch Node (DSN) is the CNI node that connects the Digital Cellular Switch (DCS) to the rest of the system via data links to the DSN.

A Cell Site Node (CSN) is the CNI node that connects the cell sites to the rest of the system via data links to the CSN.

An Inter-Cellular Node (ICN) is the CNI node that connects cellular systems together via data links to the ICN.

The basic difference among these three node types is the software that resides in each node. The hardware configuration for these nodes is identical.

In the Flexent/AUTOPLEX environment, each of these nodes is equipped with an Integrated Ring Node (IRN) circuit pack. This IRN board comes in several different microcode versions:

Microcode version    Circuit pack
MC3F014A1            UN303
MC3F018A1            UN303B
MC3F026A1            UN303B
MC3F026A1B           UN303C
MC3F026A1C           UN304

All of these versions can be used in a CSN, DSN, or ICN. The IRN board is located in the Node Processor (NP) slot of each node.

A new circuit pack, the UN304/UN304B, has replaced the UN303 in many applications. When the UN304 is used, the node is called an IRN2. When the UN304B is used, the node is called an IRN2B. Unless specifically stated, the term IRN can apply to any of these circuit packs. When an IRN2B is used in a CSN, the node is known as a CSN Enhanced (CSNE). Unless specified otherwise, all references to CSN can include the CSNE.

The memory data link (MDL) circuit pack handles the transfer of information between the data links and the node processor. A CSN can be equipped with two MDL boards (MDL0 and MDL1), with each MDL capable of handling four data links. DSNs and ICNs should be equipped with only one MDL board.

There are two types of MDL circuit packs: a TN1317 version and a TN1640 version. Either type can be used in a CSN, DSN, or ICN. The TN1640 version provides additional message throughput and should be used in CSNs containing heavily loaded cell sites. See the System Capacity Monitoring and Engineering Guidelines, 401-610-009, for recommendations on how to assign CSN, DSN, or ICN data links.

The data links coming into each of these node types connect to an 11A, 12A, 13A, or 13B adaptor board. The 11A adaptor board is used for RS232 connections, the 12A adaptor board is used for RS449 connections, and the 13A and 13B adaptor boards are used for V.35 connections. These adaptor boards are attached to the backplane of the CSN/DSN/ICN at the vertical slot location occupied by the MDL boards. Each adaptor board holds up to four data links, and there is one adaptor board for each equipped MDL board.
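The MDL and adaptor board rules above can be captured in a small lookup, which may help when checking a planned node configuration. This is only an illustrative sketch: the names `LINKS_PER_MDL`, `MAX_MDL_BOARDS`, `ADAPTOR_INTERFACE`, `max_data_links`, and `adaptors_for` are hypothetical and are not part of any CNI tool; the values come from the text of this section.

```python
# Illustrative lookup tables; all names here are hypothetical.
LINKS_PER_MDL = 4  # each MDL board (and each adaptor board) handles four data links

# Maximum number of MDL boards per node type, as stated in this section.
MAX_MDL_BOARDS = {"CSN": 2, "DSN": 1, "ICN": 1}

# Adaptor board type and the interface it terminates.
ADAPTOR_INTERFACE = {"11A": "RS232", "12A": "RS449", "13A": "V.35", "13B": "V.35"}


def max_data_links(node_type: str) -> int:
    """Return the largest number of data links a node type can terminate."""
    return MAX_MDL_BOARDS[node_type] * LINKS_PER_MDL


def adaptors_for(interface: str) -> list:
    """Return the adaptor board types that support the given interface."""
    return sorted(board for board, iface in ADAPTOR_INTERFACE.items() if iface == interface)


if __name__ == "__main__":
    for node in ("CSN", "DSN", "ICN"):
        print(node, max_data_links(node))
    print("V.35 adaptors:", adaptors_for("V.35"))
```

Under these rules a fully equipped CSN terminates at most eight data links, while a DSN or ICN terminates at most four.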

CDN Hardware Description

A Call Processor/Data Base Node (CDN) is the CNI node that handles the call processing functions of the FLEXENT™/AUTOPLEX® Wireless Network Systems. A CDN is basically a two-part unit consisting of a node and a Ring Application Processor (RAP) unit. The following versions of CDNs may be found in existing systems:

• CDN-I [sometimes referred to as a Standard Multi-Application Real Time (SMART) Node (SN)]
• CDN-II [sometimes referred to as a Turbo CDN (TCDN)]
• CDN-IIx
• CDN-III

Unless specified otherwise, references to CDN in this document apply to any of these versions.

CDN

The original CDN used a double-plate RAP with 2-Mbyte memory boards. A double plate CDN occupies two horizontal mounting plate locations in a CNI frame.

The central controller cache (CCC) and central controller support (CCS) pair can be either a UN237 and UN236 pair or a UN625 and UN626 pair. They must be a matched pair; that is, a UN2XX series CCC/CCS board is not compatible with a UN6XX series CCC/CCS board.

The main store controller (MASC) board can be either a UN95 board or a UN295 board. There can be up to four MASC boards in the FLEXENT/AUTOPLEX environment (MASC0 - MASC3).

The main store array (MASA) boards are always TN56 boards. Each TN56 board provides 2 Mbytes of memory, and there can be up to eight MASA boards per MASC memory group.

The node processor interface (NPI) board is always a TN1349 board.

CDN-I

In the FLEXENT/AUTOPLEX environment, the node is always equipped with an IRN circuit pack. Only two of the three possible microcode versions are approved for use in a CDN-I. The approved versions are:

MC3F018A1 UN303B

MC3F026A1 UN303B

The RAP portion of a CDN-I is a 3B15-based computer. The basic functional components that make up this unit are a central controller cache (CCC) board, a central controller support (CCS) board, a main store controller (MASC) board, the main store array (MASA) memory boards, and a node processor interface (NPI) board.

A CDN-I comes in two different versions, commonly referred to as the double plate and single plate CDN-I.

Double Plate CDN-I

A double plate CDN-I occupies two horizontal mounting plate locations in a CNI frame.

The CCC and CCS pair can be either a UN237 and UN236 pair or a UN625 and UN626 pair. They must be a matched pair; that is, a UN2XX series CCC/CCS board is not compatible with a UN6XX series CCC/CCS board.

The MASC board can be either a UN95 board or a UN295 board. There can be up to four MASC boards in the FLEXENT/AUTOPLEX environment (MASC0 - MASC3).

The MASA boards are always TN56 boards. Each TN56 board provides 2 Mbytes of memory, and there can be up to eight MASA boards per MASC memory group.

The NPI board is always a TN1349 board.

Single Plate CDN-I

A single plate CDN-I occupies only one horizontal mounting plate location in a CNI frame. This space reduction is due to the replacement of the 2-Mbyte TN56 MASA boards with TN1398 MASA boards. The TN1398 boards provide 16 Mbytes of memory per board, and there can be up to eight MASA boards in the

The CCC and CCS pair must be a UN625 and UN626 pair.

The MASC board must be a UN507 board.

The same NPI board (TN1349) is used in the single plate CDN-I as in the double plate CDN-I.

CDN-II

The CDN-II is a Turbo CDN node type. The CDN-II is composed of an IRN2, an 80386-based NP, and an AP30’ (prime) attached processor (AP). The AP30’ is a 68030-based processor board with 80 Mbytes of local memory (16 Mbytes on the base board and an additional 64 Mbytes of zig-zag in-line package (ZIP) memory on a mezzanine board).

CDN-IIx

The CDN-IIx is a modified Turbo CDN node type. The CDN-IIx is composed of an IRN2, an 80386-based NP, and a modified AP30’ attached processor. The modified AP30’ is a 68030-based processor board with 16 Mbytes of local memory on the base board and from 64 to 256 Mbytes on a mezzanine board. The additional memory comes from two to eight 32-Mbyte single in-line memory modules (SIMMs).
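The memory figures quoted for the modified AP30’ follow from simple arithmetic: 16 Mbytes on the base board plus 32 Mbytes for each mezzanine SIMM. The short sketch below works this out; the function name `cdn_iix_local_memory` and the constants are hypothetical and exist only to illustrate the calculation.

```python
# Hypothetical helper that reproduces the memory arithmetic stated in the text.
BASE_MBYTES = 16   # local memory on the base board
SIMM_MBYTES = 32   # capacity of each mezzanine SIMM


def cdn_iix_local_memory(simm_count: int) -> int:
    """Return total AP local memory in Mbytes for a given number of SIMMs."""
    if not 2 <= simm_count <= 8:
        raise ValueError("the text allows two to eight SIMMs")
    return BASE_MBYTES + simm_count * SIMM_MBYTES


if __name__ == "__main__":
    for count in (2, 8):
        print(count, "SIMMs:", cdn_iix_local_memory(count), "Mbytes")
```

With two SIMMs the total equals the 80 Mbytes of the unmodified AP30’ used in the CDN-II; with eight SIMMs the total is 272 Mbytes.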

Unless otherwise specified, any reference to CDN-II applies to both the CDN-II and CDN-IIx.

CDN-III

The CDN-III is an improved CDN that may be used to upgrade CDN-II or CDN-IIx type nodes. The CDN-III consists of an IRN2 node core and an AP60 attached processor (TN2523), providing greater processing and memory capacity than previous CDNs. The AP60 uses an MC68LC060 processor.

RPCN Hardware Description

The Ring Peripheral Controller Node (RPCN) is the unit that provides the interface between the ring and the ECP. In the FLEXENT/AUTOPLEX environment, the ring is always equipped with two RPCNs. The IRN board is located in the NP slot of the RPCN. The microcode versions approved for use in an RPCN are:

MC3F026A1 UN303B

MC3F026A1 UN304

! CAUTION: Never use the MC3F014A1 or MC3F018A1 microcode versions in an RPCN. Doing so could seriously hinder the ring’s ability to perform automatic fault recovery tasks.

The RPCN can also be equipped with an IRN2 or IRN2B board (the UN304 or UN304B). This board is also located in the NP slot of the RPCN.

The RPCN has a duplex dual serial bus selector (DDSBS), which terminates the ECP’s connection to the ring. This board is a TN69B and has a connection from the RPCN to each Control Unit (CU) of the ECP (CU0, CU1).

The RPCN also contains a 3B Interface (3BI) board, which serves as the interface between the DDSBS and the NP of the RPCN. This board is a TN914.

Direct Link Node Hardware Description

A Direct Link Node (DLN) is, with respect to its hardware configuration, basically an RPCN equipped with an attached processor (AP), but it has a different task to perform in the FLEXENT/AUTOPLEX environment. The function of a DLN is to route the data link message traffic between cellular systems.

The DLN is used to route messages into and out of the FLEXENT/AUTOPLEX systems for both X.25 and SS7 types of intersystem networking. FLEXENT/AUTOPLEX currently supports three types of DLNs: the DLNE, the DLN30, and the DLN60.

• The DLNE has IRNB, AP30, 3BI, and DDSBS boards.
• The DLN30 replaces the IRNB board with an IRN2B to provide increased performance and higher reliability.
• The DLN60 provides more processing power and memory than previous types of DLNs. The DLN60 uses an IRN2 node core with an AP60 attached processor. The DLN60 does not have a 3B21D computer interface.

SS7 Node Hardware Description

The SS7 nodes are used to interface with the Signal Transfer Points (STP). In the FLEXENT/AUTOPLEX environment, SS7 nodes are always equipped with an IRN circuit pack. All three IRN microcode versions are approved for use in an SS7 node.

An SS7 node is also equipped with a Link Interface (LI) board. This board handles one data link from the FLEXENT/AUTOPLEX system to the STP. The LI board can be either a TN916 (MC3F003A1) or a TN1316.

EIN Ethernet Interface Node

The Ethernet Interface Node (EIN) is an Interprocess Message Switch (IMS) user node on the Common Network Interface (CNI) ring. The EIN provides access through the Ethernet from the ring to the Application Processor (AP). CNI provides the capability to transport data from the EIN to the AP and vice versa over the Ethernet.

The EIN hardware consists of the following:

• Integrated Ring Node 2 (IRN2) circuit pack (CP), UN304B (MC3F024AIB)
• EIN Link Interface (ELI) CP, TN4016
• Paddleboard, 9822EB
• Cable ED3F064-37 G80

CNI Integrity Process Descriptions

This section describes the various software processes responsible for monitoring the CNI ring to verify that it is functioning properly.

Error Analysis and Recovery Process

CNI provides an Error Analysis and Recovery (EAR) process, which is responsible for analyzing error reports from the ring and determining the probable cause of the fault. Once the cause of the fault is determined, automatic corrective action is taken. This corrective action could be as simple as restoring the ring to its original configuration (no recovery action was necessary) or could result in nodes being removed from service and left in the isolated state.

Automatic Ring Recovery Process

CNI provides an Automatic Ring Recovery (ARR) process, which is responsible for automatically restoring nodes that have been removed from service by the EAR process. CNI also provides an Application Specified Unconditional Restore (ASUR) process that allows the application to specify the manner in which ARR is to restore an out-of-service node (conditional or unconditional restore).

In the FLEXENT/AUTOPLEX environment, a node that is removed from service will be unconditionally restored (no diagnostics performed) if this is the first time the node has been removed in the last hour. The only exception to this rule occurs when EAR suspects that the ring interface circuitry of the IRN board may be faulty. In this case, the node will be left in the isolated state until diagnostics are performed and the node passes phase 1 and phase 2. This is necessary to ensure the stability of the ring. Unconditionally restoring a node that is in the ring interface faulty state could generate faults that seriously threaten the performance of the CNI ring.

If this is the second time a node has been removed from service by EAR in the past hour, ARR will diagnose the node and restore the unit only if it passes all diagnostic phases.

If this is the third time a node has been removed from service by EAR in the past hour, the node will be left in the out-of-service state. The node will remain in this state until craft takes the appropriate recovery action to restore the node to service.
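The restore rules described above amount to a small decision table keyed on how many times EAR has removed the node in the past hour. The following is a minimal sketch of that table, not the CNI implementation: the names `ArrAction` and `arr_action` are hypothetical, the handling of more than three removals is an assumption, and the sketch does not distinguish between passing phases 1 and 2 (ring interface case) and passing all diagnostic phases (second removal).

```python
from enum import Enum


class ArrAction(Enum):
    """Hypothetical labels for the ARR outcomes described in the text."""
    RESTORE_UNCONDITIONALLY = "restore without diagnostics"
    DIAGNOSE_THEN_RESTORE = "diagnose; restore only if diagnostics pass"
    LEAVE_OUT_OF_SERVICE = "leave out of service until craft acts"


def arr_action(removals_in_last_hour: int, ring_interface_suspect: bool = False) -> ArrAction:
    """Return the ARR response for a node that EAR has just removed.

    removals_in_last_hour counts EAR removals of this node in the past hour,
    including the current one (1 means this is the first removal).
    """
    if removals_in_last_hour < 1:
        raise ValueError("the current removal must be counted")
    # Assumption: counts above three are treated like the third removal.
    if removals_in_last_hour >= 3:
        return ArrAction.LEAVE_OUT_OF_SERVICE
    # Second removal, or a first removal with suspect ring interface circuitry.
    if removals_in_last_hour == 2 or ring_interface_suspect:
        return ArrAction.DIAGNOSE_THEN_RESTORE
    return ArrAction.RESTORE_UNCONDITIONALLY


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, arr_action(count).value)
    print("1 (ring interface suspect)", arr_action(1, ring_interface_suspect=True).value)
```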

Node Audit Capability

The Node Audit feature is a CNI process responsible for ensuring that nodes which are in the active state are functioning properly and are capable of communicating with the ring. The Node Audit does this by periodically sending a message from the ECP destined for a node, followed by a chaser message. This chaser message is not destined for any particular node. Its purpose is to circulate around the ring undisturbed and return to the node audit process.

When the link node receives this audit request, it should respond by sending a reply message back to the ECP. If the ECP receives the reply message, all is well. If the reply is lost, but the chaser message arrives at the ECP as expected, then another audit message is sent to the node. If this reply is also lost, the node is assumed to be in an insane state and will be removed from service. If the first reply message was lost and the chaser message did not arrive at the ECP as expected, this implies a possible RPCN or ring problem. This is discussed in the “Ring Audit Capability” section of this chapter.
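The audit outcomes above form a short decision tree. The Python sketch below is one way to write it down; the function name and the returned strings are hypothetical, and treating a successful second reply as "node ok" is an assumption, since the text only describes the case where the second reply is lost.

```python
def node_audit_verdict(first_reply, chaser_returned, second_reply=None):
    """Classify one Node Audit cycle.

    first_reply     -- True if the node's reply reached the ECP
    chaser_returned -- True if the chaser message came back to the ECP
    second_reply    -- None until a second audit has been attempted,
                       then True/False for its reply
    """
    if first_reply:
        return "node ok"
    if not chaser_returned:
        # Reply and chaser both lost: hand over to the Ring Audit.
        return "possible RPCN or ring problem"
    if second_reply is None:
        return "send second audit message"
    if second_reply:
        return "node ok"  # assumption: the text does not cover this case
    return "remove node from service"
```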

Ring Audit Capability

The Ring Audit feature is a CNI process based on the Node Audit process. The Ring Audit verifies the message communication path from the ECP to the ring. This task is performed by monitoring the results of the chaser message sent out by the Node Audit Capability.

If a chaser message is lost, another chaser will be sent through the other RPCN. If this test is successful, then the RPCN which was first tested is assumed to be faulty and is removed from service.

If the second chaser message is also lost, or the other RPCN is already out of service, a Level 3 EAR is invoked in an attempt to isolate and correct the possible ring/RPCN trouble.
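The Ring Audit response to a lost chaser can be sketched in the same way. This is a minimal Python illustration with hypothetical names and return strings, not the actual CNI implementation.

```python
def ring_audit_action(other_rpcn_in_service, second_chaser_returned=None):
    """Decide what to do after the first chaser message has been lost.

    other_rpcn_in_service  -- True if the second RPCN is available
    second_chaser_returned -- None until the retry has been made,
                              then True/False for its result
    """
    if not other_rpcn_in_service:
        return "invoke Level 3 EAR"
    if second_chaser_returned is None:
        return "send chaser through other RPCN"
    if second_chaser_returned:
        # The retry worked, so the first RPCN is assumed faulty.
        return "remove first RPCN from service"
    return "invoke Level 3 EAR"
```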


RPCN Token Audit

The RPCN Token Audit Capability is a CNI process that ensures a token message is circulating around the ring at all times. Since a node must possess the token message in order to write to the ring, it is critical that this message be present. The audit is performed by periodically forcing the RPCN to exercise its ring write circuitry, thus forcing it to read the token message. If a special timer fires within the RPCN before the token is detected, the token is assumed to be lost and the RPCN sends a lost token report to the EAR process in the ECP.

The EAR process then reports an unexplained loss of token. A token tracking audit is then run in an attempt to discover where the token was lost. The EAR process then initiates a Level 0 restart in an attempt to return the ring to service. If this restart is unsuccessful, EAR escalates to a Level 3 ring recovery.

CNI Safety Net Capability

The CNI Safety Net Capability is a FLEXENT/AUTOPLEX process whose sole purpose is to verify that the CNI ring is up and functional. When Safety Net detects a problem with the ring, it will respond by requesting a CNI Level 3 initialization or CNI Level 4 initialization depending on the severity of the problem.

Safety Net checks the integrity of the ring every 60 seconds. It does so by sending a message from the ECP to a different node every 60 seconds. If the message is returned to the ECP by the node, then all is well. If the message is not returned to the ECP, Safety Net increments a counter and begins repeating this process, cutting the interval from 60 seconds to 10. If the failed message counter reaches its maximum error threshold (currently eight), a Level 3 CNI initialization will be requested to restore the communication path to the CNI ring.
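The polling behavior can be modeled with a short simulation. The Python sketch below is illustrative: `simulate_safety_net` is a hypothetical name, and resetting the failure counter and returning to the 60-second interval after a successful poll are assumptions, since the text does not say how the counter is cleared.

```python
def simulate_safety_net(poll_results, threshold=8):
    """Replay a sequence of poll results (True = message returned).

    Returns (schedule, level3_requested), where schedule lists the wait
    in seconds chosen after each poll that did not trigger a Level 3
    CNI initialization request.
    """
    failures = 0
    schedule = []
    for returned in poll_results:
        if returned:
            failures = 0       # assumption: a success clears the counter
            interval = 60
        else:
            failures += 1
            interval = 10      # fast polling while failures accumulate
            if failures >= threshold:
                return schedule, True
        schedule.append(interval)
    return schedule, False
```

Under these assumptions, eight consecutive failures end the simulation with a Level 3 request, while a single failure followed by a success returns the poll interval to 60 seconds.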

The CNI Safety Net also ensures that the system has at least one active CDN. If Safety Net detects that all CDNs are out of service, an SI24 Defensive Check Failure Assert message is printed on the ROP. This will repeat every minute for four additional minutes (five total messages). On the sixth SI24, a CNI Level 4 Initialization will be initiated. The Safety Net will then turn itself off for 90 minutes. It should be noted that if Safety Net detects all CDNs are out of service, it will first check to see if a CDN is in the process of being restored. If so, it will allow that CDN to come up rather than begin a CNI initialization.
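The CDN check can be sketched as a per-minute step function. This Python fragment is a hedged illustration: the function name, the returned strings, clearing the count once a CDN is active again, and holding the count while a restoral is in progress are all assumptions layered on the description above.

```python
def cdn_check_step(all_cdns_out, cdn_restoring, si24_count):
    """Run one per-minute CDN check.

    Returns (action, new_si24_count).
    """
    if not all_cdns_out:
        return "ok", 0                      # assumption: count is cleared
    if cdn_restoring:
        # Let the CDN that is being restored come up first.
        return "wait for CDN restoral", si24_count
    si24_count += 1
    if si24_count >= 6:
        # Sixth SI24: request Level 4, then Safety Net sleeps 90 minutes.
        return "request CNI Level 4 initialization", 0
    return "print SI24 on ROP", si24_count
```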


Inhibiting CNI Safety Net

At times, it may be necessary to inhibit (turn off) the CNI Safety Net feature. This need may arise due to a fault existing in the ring that prevents the system from being recovered via a CNI Level 4 initialization. Safety Net would continue to request CNI Level 4 initializations, getting in the way of craft attempts to clear the fault from the ring.

The Safety Net feature can be easily inhibited from the Emergency Action Interface (EAI) page on the MCRT. Once on this page:

- Enter a 42 poke command.

- Enter i (inhibit) for the parameter value.

- Next, a 50 initialization is required to set the flag in ECP memory.

Once Safety Net has been inhibited, it will remain in this state until a 54 initialization occurs or the inhibit flag is cleared from the EAI page (see following section). Whenever Safety Net is inhibited, it is critical that craft personnel remember to turn the feature back on once the source of the fault has been cleared. Failure to do so could result in an extended outage which Safety Net may have avoided.

Allowing CNI Safety Net Feature

The CNI Safety Net feature is always turned on at boot (54) time and remains this way unless inhibited from the EAI page. Once the feature is inhibited, it will remain in this state until craft resets the inhibit flag.

To turn the Safety Net feature back on, once again go to the EAI page and:

- Enter a 42 poke command.

- Enter a (allow) for the parameter value to allow the feature to function.

- Next, a 50 initialization is required to clear the inhibit flag in ECP memory.

General Maintenance

This section provides craft with information which could assist in identifying potentially faulty hardware before the problem is serious enough to cause a ring outage.

Also included in this section are descriptions of common CNI ring faults and the steps necessary to correct the situation.


Daily Activity Recommendation

The most important tool available to craft to prevent a serious ring event is the daily history of ring maintenance activity. This information is critical given the FLEXENT/AUTOPLEX strategy for recovering faulty nodes. Quite often, a faulty node will be removed from service and restored so quickly that craft is unaware the fault ever occurred. This recovery strategy will be briefly discussed in the next section.

The history of recent ring maintenance activity is kept in the RPTERR1 log file located in the /etc/log directory. This file should be inspected daily for the occurrence of ring faults. The UNIX command ls -l RPTERR1 will provide the date and time of the last entry to this log file. If this time stamp indicates recent ring activity, the log file should be examined to determine the nature of the activity. When this log file reaches its maximum allowable size, it is moved to RPTERR0 and a fresh RPTERR1 log file is started.

This activity could be the result of routine RPCN midnight diagnostics or the result of a ring fault. If the activity is determined to be a ring fault, locate the ring fault in the “Fault Descriptions” section of this chapter for assistance in correcting the situation.
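The daily time-stamp check can also be scripted. The Python sketch below reads the same modification time that ls -l reports; the function name and the 24-hour window are assumptions chosen to match the "daily" recommendation, and the script is not part of the CNI software.

```python
import os
import time


def log_recently_modified(path="/etc/log/RPTERR1", window_hours=24.0, now=None):
    """Return True if the log file was written within the window.

    The check relies on the file's modification time, which is the
    date and time shown by `ls -l`.
    """
    mtime = os.path.getmtime(path)
    if now is None:
        now = time.time()
    return (now - mtime) <= window_hours * 3600
```

A True result only indicates recent activity; the log file still has to be read to tell routine RPCN diagnostics from a ring fault.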

Faulty Node Recovery Strategy

Usually when a node is automatically removed from service, it is due to a transient fault. This fault could be either a hardware glitch or a software fault which causes the node to effectively shut down operation. Many of these transient faults can be corrected by reinitializing the node. The only way for the node to request this is to refuse to accept messages from the ring. Once this happens, messages destined for the node will be returned to the sender. When the sending node receives this message, it reports this to the ECP and the ECP removes the node from service.

Once the node is removed, it is up to ARR to restore the node to service. As mentioned in the “Automatic Ring Recovery Process” section, the first time a node is removed from service within a 60-minute interval, it will be restored unconditionally (no diagnostics performed). This is due to the transient nature of most faults. If it was a one-time event, the node will probably be ATP if diagnostics are performed. Given this, it is more important to get the node back into service as quickly as possible than to take the additional time to diagnose the node on the first fault. If a second fault occurs within an hour, the node will be diagnosed. However, at times a node may contain questionable hardware which may only result in the node being faulted a couple of times a day or even less frequently. It is this borderline hardware that makes it imperative for craft to understand the importance of monitoring the daily activity in the RPTERR1 log file mentioned earlier. If a persistent fault is detected, craft intervention may be necessary to isolate the source of the problem.


Routine Diagnostics

Given the ring’s ability to detect and report suspected faulty hardware, it is not recommended that diagnostics be performed on every node around the ring. However, it is recommended that RPCNs, CDNs and DLNs be taken down at least once a month (weekly if possible) and diagnosed. These nodes have been selected for preventive maintenance due to both their importance to system performance and the extended amount of time it takes to diagnose and restore these nodes should a fault occur.

While CSNs, DSNs, ICNs and SS7 nodes are certainly important to the system, their loss does not seriously threaten system performance. Also, in the event one of these nodes is lost, the recovery time is minimal if this is the first fault.

NOTE: On the subject of performing routine diagnostics, it should be noted that there is a critical difference between a single plate and double plate (TN1398 or TN56 memory boards) CDN-I unit. Requesting diagnostics on a double plate CDN-I will result in the entire CDN-I being diagnosed. The same cannot be said of a single plate CDN-I. For a single plate CDN-I, craft MUST specify that demand phases 54 through 61 be executed. These phases are responsible for diagnosing the 16-Mbyte memory boards (one phase for each MASA board equipped). These memory diagnostics are done on a demand basis only due to the time required to complete memory diagnostics on the TN1398 circuit packs.
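Since there is one demand phase per equipped MASA board, starting at phase 54, the phases to request follow directly from the board count. The helper below is a hypothetical Python illustration; the limit of eight boards is inferred from the phase range 54 through 61.

```python
def demand_phases(masa_boards_equipped):
    """List the demand phases to run for a single plate CDN-I.

    One phase per equipped MASA board, starting at phase 54.
    """
    if not 1 <= masa_boards_equipped <= 8:
        raise ValueError("phases 54-61 cover between 1 and 8 MASA boards")
    return list(range(54, 54 + masa_boards_equipped))
```

For instance, a unit with three MASA boards would need phases 54, 55 and 56.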

Fault Descriptions

This section describes various CNI ring faults. The output message associated with the fault is presented, followed by the cause of the fault, the effect the fault has on the ring, and the recovery action to clear the fault. For a more detailed description of possible faults, see Appendix A, Ring Error Analysis and Recovery.

In the following descriptions, the terms upstream node and downstream node will be used. These terms describe the relative position of nodes and are based on the direction of data flow on the rings. Basically, any particular node will RECEIVE data from its upstream neighbor and will SEND data to its downstream neighbor. Since the data flows in opposite directions on the two rings, a node’s upstream neighbor on ring 1 is the downstream neighbor on ring 0 and its upstream neighbor on ring 0 is the downstream neighbor on ring 1. For example, with respect to ring 0, LN00-7’s upstream neighbor is LN00-6 and its downstream neighbor is LN00-8.
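The neighbor relationship can be expressed as a small function of the member number and the ring. The Python sketch below is a simplification with a hypothetical name: it only handles adjacent members inside one group and ignores group boundaries, RPCNs, and unequipped positions.

```python
from typing import Tuple


def neighbors(member: int, ring: int) -> Tuple[int, int]:
    """Return (upstream_member, downstream_member) for a node.

    On ring 0 data flows from lower to higher member numbers;
    on ring 1 the direction is reversed.
    """
    if ring == 0:
        return member - 1, member + 1
    if ring == 1:
        return member + 1, member - 1
    raise ValueError("ring must be 0 or 1")
```

With `member=7` and `ring=0`, the function returns `(6, 8)`, matching the LN00-7 example above.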


RAC Parity/Format Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

REPT RING TRANSPORT ERR

RAC PARITY/FORMAT ERROR DETECTED, LN00 7 RAC 0.

X’00000000 X’FFFFFFFF X’03000008 X’00000380

X’00004000 X’00000300 (3121083924)

Cause

The reporting node, LN00-7 in this example, is reporting that its upstream neighbor on RAC 0 (LN00-6) tried to pass a bad message to it. This message is used to report both bad parity and an orphan byte failure. The effect and recovery action are the same regardless of the error type, so craft does not need to determine which fault type it is.

Effect

The node which had the bad message presented to it will refuse to accept the message. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a Level 1 ring recovery which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault and the upstream neighbor node should be taken down and diagnosed.

If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.

NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.

If the fault persists, replace packs in the following order:


1. If there is a pair of interframe buffer boards (IFB) between the node reporting the fault and the upstream neighbor, replace the IFB associated with the node reporting the problem.

2. If the fault persists, and IFBs are involved, replace the IFB in the node upstream of the node reporting the fault.

3. If the fault persists, replace the IRN board in the node upstream of the node reporting the problem.

4. If the fault persists, replace the IRN board in the node reporting the problem.

5. If the fault persists, and there are IFBs involved, there could be a cable problem. Call for assistance to isolate the source of the fault.

See Figure 1-1 on page 1-15.


Figure 1-1. RAC Parity/Format Error

Chart 1 (flowchart): RAC parity format error. Examine UNIX file /etc/log/RPTERR1. 1st occurrence? Transient fault: monitor the /etc/log/RPTERR1 log file for several weeks; if the fault returns, go to the "no" leg of "1st occurrence?". Run diagnostics on the faulted node and both neighbors. Replace packs and diagnose as per TLP list. Go to Chart 1A.

Chart 1A (flowchart): IFB boards between reporting node and upstream neighbor? Replace IRN board in upstream neighbor. Replace IRN board in node reporting problem. Cleared? Call for assistance. Go to Chart 1B.

Note 1: If RAC 0 is implicated in the output message, the upstream neighbor is the lower node number (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node number (LN32-6 is upstream of LN32-5).

Chart 1B (flowchart): Replace IFB in node reporting the fault. Replace IFB in node upstream of reporting node. Replace IRN in node reporting the fault. Cleared? Possible cable problem: call for assistance in swapping cables between rings. Bad cable: configure cables so that the faulty cable is in RAC 1 and obtain a new cable ASAP. Call for assistance.

Note 2: RPCN32 is upstream of the last node in group 00 (or group 31 if equipped) on RAC 1 and downstream on RAC 0. RPCN00 is upstream of the last node in group 32 (or group 63 if equipped) on RAC 1 and downstream on RAC 0.

Note 3: If RPCN and it has no IRN, then replace the R0 board if RAC 0 is implicated or R1 if RAC 1 is implicated.


Unexplained Loss of Token

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

UNEXPLAINED LOSS OF TOKEN REPORTED ON RING 0.

Cause

This message occurs when an RPCN detects that the token is no longer circulating around the ring.

Effect

EAR will initiate a token tracking procedure in an attempt to determine where the token was last seen. If the procedure is successful, the following message will result:

REPT TOKEN TRACK

TOKEN WAS LOST BETWEEN LN63 1 AND LN63 6 ON RING: 0

X’00000000 X’3F63F104 X’00300001 X’40040001

There are several other versions of the message that could result depending on the outcome of the token tracking procedure. Refer to the FLEXENT/AUTOPLEX Output Message Manual for the other versions of this message.

EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate the ring recovery to a Level 1 which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, and the token tracking report was successful, remove and diagnose the two nodes mentioned in the report. If the token tracking report was not successful, call for assistance.

If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.

NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.

If the fault persists, start replacing circuit packs in the following order:


1. If there is a pair of interframe buffer boards (IFB) between the two nodes identified in the token tracking report, replace the IFB in one of the nodes.

2. If the fault persists, and IFBs are involved, replace the IFB in the other node identified in the token tracking report.

3. If the fault persists, replace the IRN board in one of the two nodes identified in the token tracking report.

4. If the fault persists, replace the IRN board in the other node identified in the token tracking report.

5. If the fault persists, call for assistance.


Figure 1-2. Unexplained Loss of Token

Chart 2 (flowchart): Unexplained loss of token. Examine ROP & UNIX file /etc/log/RPTERR1 for other occurrences. 1st occurrence? Transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if fault returns. Examine ROP & UNIX file /etc/log/RPTERR1 for token tracking report. Report successful? Call for assistance. Diagnose both. Replace packs & diagnose as per TLP list. Replace IRN board in one of the nodes; if RPCN and it is not an IRN, then replace the R0 board if ring 0 is implicated or R1 if ring 1 is implicated. Cleared? Go to Chart 2A.

Chart 2A (flowchart): Replace IRN board in other node. Cleared? IFB boards between suspect nodes? Replace one node's IFB. Replace other node's IFB. Possible cable problem: call for assistance in swapping cables between rings. Fault move? Bad cable: configure cables so that the faulty cable is in RAC 1 and obtain a new cable ASAP. Call for assistance.


SRC Match

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

RMV LN33 7 RQSTD; SRC MATCH RPTD BY LN31 6

X’6FB015F4 X’352070B8 (2834204595)

Cause

An SRC match failure results when a node does not take a message from the CNI ring that was addressed to it. This message will eventually return to the source node, which will remove the message from the ring and report an SRC match to the ECP against the destination node.
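The mechanism can be illustrated with a toy model of one message going around the ring. The Python sketch below is a teaching aid only: the function name, the `accepting` map, and the returned strings are hypothetical and do not reflect actual CNI message handling.

```python
def circulate(ring, source, destination, accepting):
    """Send one message around the ring and report what happens.

    ring        -- node names listed in downstream order
    accepting   -- maps a node name to False if it refuses messages;
                   nodes not listed are assumed to accept
    """
    start = ring.index(source)
    size = len(ring)
    # Visit every node downstream of the source exactly once.
    for step in range(1, size):
        node = ring[(start + step) % size]
        if node == destination and accepting.get(node, True):
            return "delivered"
    # The message came back to its source without being taken.
    return "SRC match reported against " + destination
```

In this model, a destination that refuses its messages always produces an SRC match report from the source node, which mirrors the RMV request shown in the output message.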

Effect

As stated above, the message will eventually return to the source node. The source node will remove the message from the ring and report the SRC match to the EAR. This will always result in the destination node being removed from service. ARR will then restore the node to service either conditionally or unconditionally, depending on the frequency of the faults against this node.

An occasional SRC match, in itself, is normally not cause for concern. CNI integrity software running in the nodes at times detects situations that require the node to be reinitialized to clear the fault. The only means available for a node to request that it be reinitialized is to stop taking its messages from the ring, commonly referred to as panicking the node. By refusing to read its messages from the ring, the node is assured of being removed from service via the SRC match mechanism and restored via ARR.

Craft Recovery Action

When SRC matches are detected, the RPTERR1 log file should be examined to determine the frequency of the fault. If the fault is persistent, then there could be a hardware problem and the node should be diagnosed. If the node is a single plate CDN-I, demand phases 54 through 61 must be performed to completely test the main store memory.

If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.


NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.

If the fault persists, replace circuit packs in the following order:

1. If the faults are occurring immediately after the node is restored to service, check the ECD (rcvecd) and the application database (apxrcv, iun form) to verify they are in sync with respect to the node type.

2. If the fault persists, replace the IRN circuit pack.

3. If the fault persists, replace the MDL boards one at a time, or replace the LLI board if the node is an SS7 node.

4. If the node is a CDN, check the RPTERR1 log file for the existence of a CDN panic message in the form of:

REPT COM100 TBLLN00 07 NADR: X’C07

Panic : Hardware

Local Bus Parity Error:

CCS0(lba=0x0):

CSRs=0x61100028,0x0

MASC0(lba=0x100000):

CSRs=0x422054,0x4c00b500

CCS 61100028

MASC 00422054

NPI 00000000

5. If a message similar to this appears, it is not necessarily a local bus parity error. Go directly to page 3 of Figure 1-3 (Chart 3B) for CDN assistance.

6. If the fault persists, or the panic message is not present for a CDN, call for assistance in clearing the fault.


Figure 1-3. SRC Match

Chart 3 (flowchart): SRC match. Determine fault frequency by examining ROP or RPTERR1 log file. Examine UNIX file /etc/log/RPTERR1. 1st occurrence? Transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if the fault returns. Fault occurs immediately after restoral? Check ECD to verify node type. Check APXRCV DB to verify it agrees with ECD entry. Agree? Correct any discrepancies and restore node. Run diagnostics on the faulted node. Replace packs & diagnose as per TLP list. Cleared? Go to Chart 3A.

Chart 3A (flowchart): Replace IRN board in faulty node. Cleared? Replace MDL 0. Replace MDL 1 board if equipped. Cleared? Replace adaptor boards on node backplane. Is node a CDN? Go to Chart 3B. Call for assistance.

Chart 3B (flowchart): Check RPTERR1 error log for a PANIC: HARDWARE message for this CDN. Present? Branches by panic type: local bus parity error, double bit error, NPI USEC timer change, unidentified SYS error. Actions shown: replace the NPI board, replace the CCS board, replace the CCC board. Cleared? Go to Chart 3C. Call for assistance.

Chart 3C (flowchart): Go to next two pages for instructions on converting address in the panic message to a MASA board location. Valid board number: replace suspected MASA board. TN56 memory boards? Insert a new TN1398 board in the first MASA slot; if fault still exists, return original board and slide new board to the next slot; continue until new board has been tried in each MASA slot. Insert two new TN56 boards in the first two MASA slots; if fault still exists, return original boards and slide new boards to the next slot; continue until the two new boards have been tried in each MASA position. Insert a new MASC board; if fault still exists, return the original board and slide the new board to the next MASC until new board has been tried in each MASC. Starting at demand Phase 54, run one phase for each MASA board equipped (54-61). Replace boards & diagnose as per TLP list. Cleared? Call for assistance.


RAC Output Parity Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

RAC OUTPUT PARITY ERROR DETECTED, LN31 2 RAC 1.

X’00000000 X’00000000 X’03020002 X’00002280

X’00014000 X’00000300 (2923885816)

Cause

The node reporting the fault detected that it had attempted to write a message with bad parity to the ring.

Effect

The node which had the bad message presented to it will refuse to accept the message. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. As part of this recovery process, each node will reread the message that it had presented to the downstream neighbor. When doing this, the node reporting the fault detects that it had presented a message containing bad parity to its downstream neighbor.

If this fails to correct the error condition, EAR will escalate the ring recovery to a Level 1 which could result in nodes being removed and isolated.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault should be removed and diagnosed.

If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.

NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI ring backplanes and circuit packs. This product is marketed as MS-181.

If the fault persists, replace circuit packs in the following order:

1. Replace the IRN board in the node reporting the fault.


Figure 1-4. RAC Output Parity Error

Chart 4 (flowchart): RAC output parity error. Examine ROP & UNIX file /etc/log/RPTERR1 for other occurrences. 1st occurrence? Transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if fault returns. Run diagnostics on the node reporting the RAC output parity error. ATP? Replace packs & diagnose as per TLP list. Replace the IRN board in the node reporting the problem. Cleared? Call for assistance.

Note: If RPCN and it has no IRN, then replace the R0 board if ring 0 is implicated or R1 board if ring 1 is implicated.


General RAC Error Detected

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

GENERAL RAC ERROR DETECTED, LN63 1 RAC 0.

X’00000000 X’00000000 X’03018010 X’00000380

X’00000000 X’00000300 (2834204091)

Cause

This is a catch-all error type used to report unexpected node hardware or software conditions.

Effect

The node reporting the problem will not accept any data from the upstream neighbor node, thus forcing that node to report blockage.

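Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the node reporting the fault and its upstream neighbor should be removed from service and diagnosed.

If diagnostics do not find a problem with either node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect nodes using the recommended contact cleaner.

If the fault persists, replace circuit packs in the following order:

1. Replace the IRN board in the node reporting the fault.

2. If the fault persists, replace the IRN board in the upstream neighbor.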


Figure 1-5. General RAC Error

Chart 5 (flowchart): General RAC error. Examine ROP & UNIX file /etc/log/RPTERR1 for other occurrences. 1st occurrence? Transient fault: monitor the /etc/log/RPTERR1 log file for several weeks to see if fault returns. Run diagnostics on the node reporting the fault. Replace packs & diagnose as per TLP list. Replace the IRN board in the node reporting the problem. Cleared? Replace the IRN in the upstream neighbor. Call for assistance.

Note: If RAC 0 is implicated, the upstream neighbor is the lower node # (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node # (LN32-6 is upstream of LN32-5).

Note: If RPCN and it has no IRN, then replace the R0 board if ring 0 is implicated or R1 board if ring 1 is implicated.


Node Audit Failure

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

RMV LN32 4 RQSTD; NAUD FAILURE RPTD

X’6FB015F4 X’352070B8 (2834204595)

Cause

The Node Audit process has detected a node that is not responding to the node audit requests, but the rest of the ring seems to be functioning normally.

Effect

The node at fault will be removed from service.

Craft Recovery Action

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, the faulted node should be removed and diagnosed.

If diagnostics do not find a problem with the node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspect node using the recommended contact cleaner.

NAUD failures can be caused by noisy data links on the node being removed from service. Before proceeding to replace circuit packs, first use the CMpfcnts tool to determine if there are questionable data links on the node being removed from service.

If the fault persists, replace circuit packs in the following order:

1. Replace the IRN board in the faulted node.

2. Replace one of the two MDL boards.

3. Replace the other MDL board, if equipped.


Figure 1-6. NAUD Failure

Chart 6 (flowchart): NAUD failure. Examine UNIX file /etc/log/RPTERR1. 1st occurrence? Transient fault: monitor RPTERR1 for several weeks to see if fault returns. This fault could be the result of noisy data links; run CMpfcnts to identify possible problem. Familiar with CMpfcnts tool? Call for assistance. Noisy links? Correct link problem and monitor node for several weeks. Diagnose faulty node. ATP? Replace & diagnose packs as per TLP list. Replace IRN board. Cleared? Is node a CDN? Go to Chart 6A.

Chart 6A (flowchart): Replace IRN board in faulty node. Replace MDL 0. Replace MDL 1. Replace adaptor boards on node backplane. Cleared? Call for assistance.


Figure 1-7. Interframe Buffer Error

Chart 7 (flowchart): Interframe buffer parity error. Examine ROP & UNIX file /etc/log/RPTERR1 for other occurrences. 1st occurrence? Transient fault: monitor the RPTERR1 log file for several weeks to see if fault returns. Run diagnostics on the node reporting the problem. Replace packs & diagnose as per TLP list. Replace the IFB in the node reporting problem. Replace the IFB in the upstream node. Replace the IRN board in node reporting the error (see Note 2). Replace the IRN in the upstream neighbor (see Note 1; if RPCN, see Note 2). Cleared? Call for assistance.

Note 1: If RAC 0 is implicated, the upstream neighbor is the lower node # (LN32-4 is upstream of LN32-5). If RAC 1 is implicated, the upstream neighbor is the higher node # (LN32-6 is upstream of LN32-5).

Note 2: If RPCN and it has no IRN, replace R0 board if ring 0 is implicated or R1 if ring 1 is implicated.


Read Format Error

The output message present on the ROP and the RPTERR1 log file for this fault is as follows:

READ FORMAT ERROR DETECTED, LN00 7 RAC 0.

MSG SRC: LN00 3, msg type: zzzzz

X’00000000 X’FFFFFFFF X’03000008 X’00000380

X’00004000 X’00000300 (3121083924)

The reporting node, LN00-7 in this example, is reporting that the upstream neighbor on RAC 0 (LN00-6) tried to pass a message which had a bad message length. This error usually indicates there is a node on the ring which is clipping/mutilating messages as they pass through this node. This fault type requires immediate attention. A clipped message, if undetected, could take the appearance of a valid maintenance message. In the worst case, it could resemble a message that forces all nodes into a set quarantine state, thus removing them from service and resulting in a system outage.

Effect

The node which had the bad message presented to it will refuse to accept the message and will send an error report to the home RPCN. This will force the node offering the bad message to report ring blockage to EAR. EAR will attempt to reestablish normal ring communication by performing a Level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a Level 1 ring recovery which could result in nodes being removed and isolated.

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, all reports must be examined in an effort to determine a ring segment which most likely contains the faulty node.

If MSG SRC data is present in the output message, the suspected faulty node should be one of the nodes between the SRC node and the node reporting the fault (LN00 4 -> LN00 6 in the example above). If the MSG SRC data is not present, several reports must be examined to determine which area of the ring most likely contains the faulty node. For example, if reports are present from both LN00 7 and LN32 7, all nodes between LN00 7 and LN32 7 (LN00 8 -> LN32 6) are probably not the source of the problem for RAC 0 reports.
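The MSG SRC rule above can be sketched in Python. This is only an illustration, not part of the CNI software: the function name and the `ring0` list are hypothetical, and the list is assumed to name the equipped nodes in the direction of message flow on the reporting ring.

```python
def suspect_segment(ring_order, src, reporter):
    """Return the nodes strictly between src and reporter, following ring_order.

    ring_order is a hypothetical input: node names listed in the direction
    of message flow on the reporting ring. The walk wraps around the end of
    the list because the ring is closed.
    """
    if reporter not in ring_order:
        raise ValueError("unknown reporting node: %r" % reporter)
    i = (ring_order.index(src) + 1) % len(ring_order)
    suspects = []
    while ring_order[i] != reporter:
        suspects.append(ring_order[i])
        i = (i + 1) % len(ring_order)
    return suspects


# Illustrative ring 0 order only; a real ring is built from the office equipage.
ring0 = (["RPCN00 0"] + ["LN00 %d" % m for m in range(1, 9)]
         + ["RPCN32 0"] + ["LN32 %d" % m for m in range(1, 9)])

# MSG SRC is LN00 3 and the reporting node is LN00 7, as in the example message.
print(suspect_segment(ring0, "LN00 3", "LN00 7"))
```

With the example message above, the sketch yields LN00 4 through LN00 6, which matches the segment named in the text.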


NOTE: WRITE FORMAT ERROR messages may also be present and can be used to assist in locating the faulty segment.

All nodes in the suspected ring segment should be diagnosed. If diagnostics do not find a problem with any node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspected segment using the recommended contact cleaner.

NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant that is recommended (1.0 percent OS-124 in Freon TA) for use on CNI Ring backplanes and circuit packs. This product is marketed as MS-181.

1. Select the first node in the suspected segment and replace the UN303 board. Monitor the RPTERR data daily to determine if the fault has been cleared.

2. If the fault persists, examine the additional faults reported. If the node reporting the fault is in the suspected segment, all nodes from the node reporting this new fault to the previous nodes reporting the fault can be removed from the suspected faulty list.

3. Repeat Step 1 for the next logical link node in the suspected faulty ring segment. If any node contains IFBs, replace these as well once the UN303 has been eliminated as a suspected pack.

4. If the fault persists, and all packs in the suspected segment have been replaced, call for assistance.

Write Format Error

The output message present on the ROP and in the RPTERR1 log file for this fault is as follows:

WRITE FORMAT ERROR DETECTED, LN00 7 RAC 0.

X’00000000 X’FFFFFFFF X’03000008 X’00000380

X’00004000 X’00000300 (3121083924)

The reporting node, LN00-7 in this example, is reporting that a message it was attempting to write to the ring failed a validation check. This message is similar to the READ FORMAT ERROR type in that it usually indicates there is a node on the ring which is clipping/mutilating messages as they pass through it. This fault type requires immediate attention. A clipped message, if undetected, could take the appearance of a valid maintenance message. This maintenance message could take the appearance of one which would force all nodes into a set quarantine state, thus removing them from service and resulting in a system outage.

Effect

The node which was trying to write the message will not do so, nor accept the message being offered to it, and an error report is sent to the home RPCN. The nodes previous to the reporting node will report ring blockage to EAR. EAR will attempt to re-establish normal ring communication by performing a level 0 ring recovery. If this fails to correct the error condition, EAR will escalate to a level 1 ring recovery, which could result in nodes being removed and isolated.

The RPTERR1 log file should be examined to determine if this is the first instance of the fault. If this is a recurring fault, all reports must be examined in an effort to determine a ring segment which most likely contains the faulty node. For example, if reports are present from both LN00 7 and LN32 7, all nodes between LN00 7 and LN32 7 (LN00 8 -> LN32 6) are probably not the source of the problem for RAC 0 reports.
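The exclusion described above, where nodes lying between two reporting nodes are dropped from the suspect list, can be sketched as follows. The function and the `ring0` list are hypothetical illustrations; the list is assumed to follow the direction of message flow on ring 0.

```python
def exclude_between(ring_order, suspects, first_reporter, second_reporter):
    """Drop from suspects every node strictly between the two reporters.

    The walk starts just after first_reporter and follows ring_order
    (wrapping at the end of the list) until second_reporter is reached.
    """
    i = ring_order.index(first_reporter)
    j = ring_order.index(second_reporter)
    cleared = set()
    k = (i + 1) % len(ring_order)
    while k != j:
        cleared.add(ring_order[k])
        k = (k + 1) % len(ring_order)
    return [node for node in suspects if node not in cleared]


# Illustrative ring 0 order only.
ring0 = (["RPCN00 0"] + ["LN00 %d" % m for m in range(1, 9)]
         + ["RPCN32 0"] + ["LN32 %d" % m for m in range(1, 9)])

# Reports from both LN00 7 and LN32 7: start with every node as a suspect.
remaining = exclude_between(ring0, ring0, "LN00 7", "LN32 7")
print(remaining)
```

Starting from the full ring keeps the sketch simple; in practice the starting suspect list would come from earlier reports.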

NOTE: READ FORMAT ERROR messages may also be present and can be used to assist in locating the faulty segment.

All nodes in the suspected ring segment should be diagnosed. If diagnostics do not find a problem with any node, attempt to clear the fault by cleaning and reseating the circuit packs in the suspected segment using the recommended contact cleaner.

NOTE: Miller Stevenson Company markets an aerosol form of the solvent-lubricant which is recommended (1.0 percent OS-124 in Freon TA) for use on CNI Ring backplanes and circuit packs. This product is marketed as MS-181.

1. Select the first node in the suspected segment and replace the UN303 board. Monitor the RPTERR data daily to determine if the fault has been cleared.

2. If the fault persists, examine the additional faults reported. If the node reporting the fault is in the suspected segment, all nodes from the node reporting this new fault to the previous nodes reporting the fault can be removed from the suspected faulty list.

3. Repeat Step 1 for the next logical link node in the suspected faulty ring segment. If any node contains IFBs, replace these as well once the UN303 has been eliminated as a suspect pack.

4. If the fault persists, and all packs in the suspected segment have been replaced, call for assistance.

Emergency Maintenance

This section is intended to assist craft in those instances where the CNI ring

appears to be flat on its back and requires craft intervention to get the system

operational.

While this data provides useful information, it should not be used as a replacement for calling for immediate assistance when such a situation occurs. Lucent Technologies personnel should be contacted whenever system recovery is involved rather than waiting until the “Ring Down Recovery” section of this chapter has exhausted its helpful hints.

Ring Down Recovery

A ring down situation can take several forms. One of these is the case where the CNI ring is repeatedly rolling into either a CNI Level 3 or CNI Level 4 initialization.

The second form a ring down situation can take is where EAR is repeatedly performing various levels of ring recovery in an attempt to isolate the cause of the problem.

The third scenario is one that should never happen, but given this document has just mentioned that it should never happen, it will be discussed. This is a case where all communication to the ring has been lost, but no integrity process appears to be doing anything about it. No section will be dedicated to discuss this scenario, but in the event it does occur, start the recovery process by requesting a CNI Level 3 initialization, and call for assistance immediately.

Rolling CNI Initializations

If the ring is in a state of repeated CNI initializations, perform the following steps:


1. Determine if CNI Safety Net is requesting the CNI initializations. Do this by checking the ROP for the existence of SI15, SI22, or SI24 Defensive Check failures. If present, go to Step 2, else go to Step 5.

2. Disable CNI Safety Net by going to the Emergency Action Interface page and entering a 42 poke command. When the parameter field appears, enter i to inhibit Safety Net. Next, perform a 50 initialization to set the inhibit flag in memory. This should stop the rolling initializations so that the problem can be investigated. If so, go to Step 3, else go to Step 5.

3. If Safety Net was requesting the initializations due to no CDNs being active (SI24 asserts), determine if the rest of the ring appears to be up. If so, go to Step 4; for anything else, go to Step 5.

4. No CDNs are active, but the rest of the ring seems to be up. Go to the “Global CDN Recovery” and “Single CDN Recovery” sections in this chapter for assistance in recovering from this fault.

5. Either the ring is in a rolling initialization due to CNI not being able to get an RPCN up, or SI15/SI22 asserts were present due to CNI Safety Net firing.

6. Verify that there are no power interruptions to the ring.

7. If the problem persists, examine the ROP closely to determine if CNI software is flagging any node, or group of nodes, as being a possible source of the problem. If so, pull the IRN board out of those nodes to force isolation around that segment.

8. If the RPCNs are equipped with IRN boards, verify that they have the proper microcode versions. Again, only MC3F026A1 is approved for use in an RPCN.

9. If the problem persists, power down RPCN32 to force the ring to come up on RPCN00.

10. If the problem persists, restore power to RPCN32. The problem may be related to a bad CU in the ECP. Force the ECP to do a CU switch and attempt a CNI Level 4 initialization.

11. If the problem persists, force isolated segments by removing power from one mounting plate at a time (group of three nodes). After power is removed from a group of nodes, request a CNI Level 4. If the problem persists, restore power to the previous group and remove power from the next group. Repeat this step until every node has been tried in an isolated segment.

12. Again, it is assumed that you have already called for assistance, but if not, do so immediately.
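The branching in Steps 1 through 4 amounts to a small decision tree. The sketch below restates it in Python as a reading aid only; the function and parameter names are hypothetical, and the inputs are judgments made by the technician from the ROP, not values any tool is known to supply.

```python
SAFETY_NET_ASSERTS = {"SI15", "SI22", "SI24"}


def next_step(asserts_seen, inits_stopped_after_inhibit=False, rest_of_ring_up=False):
    """Return the step number (4 or 5) that Steps 1 through 3 lead to."""
    seen = set(asserts_seen) & SAFETY_NET_ASSERTS
    if not seen:
        return 5  # Step 1: Safety Net is not requesting the initializations
    if not inits_stopped_after_inhibit:
        return 5  # Step 2: inhibiting Safety Net did not stop the rolling
    if "SI24" in seen and rest_of_ring_up:
        return 4  # Step 3: no CDNs active, but the rest of the ring is up
    return 5      # Step 3: anything else


print(next_step(["SI24"], inits_stopped_after_inhibit=True, rest_of_ring_up=True))
```

Step 4 hands off to the CDN recovery sections; Step 5 continues with the checks in Steps 6 through 12.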


Figure 1-8. Ring Down

(Flowchart; box text follows. The connecting arrows are not recoverable.)

Chart 8

Ring down

Rolling CNI INITS?

Ring is down but taking no recovery action

Request a CNI Level 3 INIT to restart the driver

Request a CNI Level 4 INIT to repump the ring

Check the ROP for the presence of SI15, SI22, or SI24 asserts

Present?

Inhibit Safety Net from the EAI page. Use pokes 42, i for inhibit, and 50 boot to set new value.

Rolling INITS stop?

Verify that the RPCNs have the correct IRN microcode. Only MC3F026A1 can be used in an RPCN.

Correct?

Correct and request a CNI INIT 4

Power down RPCN32 and request a CNI INIT 4

Power RPCN32 back up and power down RPCN00. Request a CNI INIT 4

Ring up?

Go to Chart 8A

Go to Chart 8B

Figure 1-8. Ring Down (contd)

(Flowchart; box text follows. The connecting arrows are not recoverable.)

Chart 8A

All CDNs OOS?

Are all link nodes OOS?

Mention missing files?

Call for assistance

Chart 8B

Rolling ring reconfigurations?

Repeated RAC parity errors on both rings?

Lost token report?

Token tracking information?

Pull the IRN from the two nodes mentioned in the token tracking report and request CNI INIT 4

Follow normal maintenance procedures to correct faulty

Check ROP/RPTERR1 for clues

Ring up?

Call for assistance

Chart 8C

If each RPCN is reporting a fault, or one RPCN and the upstream neighbor of the other (last node in groups 31 or 63), then there could be two IFB problems. Power down one RPCN to force that segment out of ring. Place a new IFB in the other RPCN. If problem still present, place a new IFB in the neighbor node. If problem still exists, try new IRN. If RPCN is not IRN type, replace the R0 or R1 board based on which ring the fault is reported on if the fault does not involve both pairs.

If 1506, 1509, or 1803 IFBs, then pull the IRNs from the two nodes reporting the fault to force an isolated segment.


Global CDN Recovery

This section is intended to provide assistance when all CDNs are out of service and fail to recover after a CNI Level 4 initialization. When this event does occur, execute the following steps in an attempt to clear the fault AND immediately call for assistance.

1. If you have not already done so, inhibit CNI Safety Net by going to the EAI page and entering a 42 command. When asked for the parameter value, enter i. Next, do a 50 boot to set the flag in memory.

2. Was a BWM just applied that required the CDNs to be repumped? If so, back the BWM out.

3. Check the ROP closely to see if there are any error messages present that indicate files may be missing.

4. CDN memory could be scrambled. Inhibit ARR via the inh:dmq:src arr input command. Next, power cycle each CDN, and allow the CDN to initialize its memory (approximately 5 minutes). Once the initialization is completed (red light on the MASA boards should be extinguished), request an unconditional restoral of each CDN.

5. Perform an ECP stable clear to reinitialize the CNI integrity processes using init:ecp:sc. Attempt to restore the CDNs unconditionally.

6. If the nodes are being removed during the database download portion of recovery (page 2160 shows them in the init state), use UXprint to determine if the nodes are always removed while downloading a specific database.

7. Examine the ROP closely for the existence of either of these messages:

REPT:CDN x, y (CDN-I)

REPT:CDN x, FAULT (CDN-II and CDN-III)

where y is either STACK, MEMORY or UNKNOWN. If present, contact CTS personnel.

8. Check the application database (apxrcv) iun form to verify that the CDNs are defined properly.

9. Check the ECD (apxrcv) ucb form to verify that the CDNs are defined properly.

10. It is assumed you have called for assistance already, but if not, do so immediately.
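When ROP output has been captured to a text file, the search in Step 7 can be done mechanically. The sketch below is an assumption-laden illustration: it takes the message layout literally from Step 7 (`REPT:CDN x, y`), treats `x` as any text up to the comma, and uses made-up sample lines. The exact spacing of real ROP output may differ, so the pattern may need adjusting.

```python
import re

# Pattern built from the two message forms quoted in Step 7.
# The form of "x" is not specified there, so it is matched loosely.
CDN_REPORT = re.compile(
    r"REPT:CDN\s+(?P<cdn>[^,\n]+?)\s*,\s*(?P<kind>STACK|MEMORY|UNKNOWN|FAULT)\b"
)


def find_cdn_reports(rop_text):
    """Return (cdn, kind) pairs for every matching report in rop_text."""
    return [(m.group("cdn"), m.group("kind")) for m in CDN_REPORT.finditer(rop_text)]


# Made-up sample text for illustration only.
sample = "REPT:CDN 3, MEMORY\nsome other line\nREPT:CDN 5, FAULT\n"
print(find_cdn_reports(sample))
```

Any match is a reason to contact CTS personnel, as Step 7 directs.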


Single CDN Recovery

This section is intended to provide assistance when a single CDN will not restore

to service. When this occurs, execute the following steps in an attempt to clear the

fault:

1. Perform manual diagnostics on the suspect link node. If the CDN-I is a single plate RAP, demand phases 54-61 must be requested to test the MASA boards. One phase is required for each MASA board equipped.

2. If the restore fails during the pumping phase (that is, ABORTED PUMP OF IUN LN00 7), check the file /1apx10/ims/cdn/OFC.cdn.lv.x to verify that it is a contiguous file. If it is not, use the fmove command to make it contiguous.

If the node is a CDN-II, check the file /1apx10/ims/cdn2/OFCcdn2 to verify that it is a contiguous file. If it is not, use the fmove command to make it contiguous.

3. If the node is a CDN-I and the fault persists, refer to “CDN-I Fault Isolation” in Chapter 6, Diagnostic User’s Guide, for assistance in running the on-board firmware diagnostics.

If the node is a CDN-II node, try replacing the AP board (TN1630B). If the fault persists, contact the CTS for assistance.

4. If the node is a CDN-I and the fault persists, inspect the RPTERR1 log file for the presence of the Hardware Panic message.

REPT COM100 TBL

LN00 07 NADR: X’C07

Panic : Hardware

Local Bus Parity Error:

CCS0(lba=0x0):CSRs=0x61100028,0x0

MASC0(lba=0x100000):

CSRs=0x422054,0x4c00b500

CCS 61100028

MASC 00422054

NPI 00000000

5. If a message similar to this appears, it is not necessarily a local bus parity error. Go directly to Chart 3B of Figure 1-3 for CDN assistance.

6. If the flowchart fails to clear the fault, call for assistance.


Contents

Description of the Ring Subsystem

General
Operation of the Ring
Ring Nodes
  • Ring Peripheral Controller Nodes
  • Basic IMS User Nodes
  • Direct Link Nodes (DLN)
  • Call Processor/Data Base Nodes (CDN)
      CDN-I
      CDN-II
      CDN-IIx
      CDN-III
  • Interframe Buffers
Node Names and Addresses
Ring Message Format
Reconfigurations
  • Node Quarantine
  • Node Isolation
  • The Ring Config Module
Initializations
  • Level-3 IMS Initializations (FPI and Boot)
  • Level-4 IMS Initializations (FPI and Boot)
Audits
  • Central Node Control Audit (AUD CNC)
  • Node State Audit (AUD NODEST)
  • Node Audit


Description of the Ring Subsystem

General

The Interprocess Message Switch (IMS) is a packet switch composed of ring-based communication nodes centered upon a 3B21D computer. Each ring node is controlled by a microcomputer called the node processor. The nodes are distributed around dual, parallel communication rings that propagate data in opposite directions. Ring 0, the outer ring in the illustration below, propagates data clockwise; and ring 1, the inner ring, propagates data counter-clockwise. Ordinarily, of the two ring paths, ring 0 is actively involved in transmitting user messages, while ring 1 performs as a path for internal IMS communications.

Each ring node contains one interface to each of the two rings and one interface either to the 3B21D or to a user's external system. Thus, IMS has two types of nodes: nodes interconnecting the ring and the 3B21D, the most important of which are called ring peripheral controller nodes (RPCNs), and nodes interconnecting the ring with the user's external system, most of which are called basic IMS user nodes (basic IUNs). As a processing resource, the centralized 3B21D is also available to users, but its principal purpose is to provide operational, administrative and maintenance control of the switch.


The following figure gives a conceptual illustration of the ring.

Figure 2-1. Conceptual Illustration of an IMS Ring

The real situation is somewhat more complicated than this description, because IMS has other types of nodes and because users are represented not only by an external communication system but also by internal hardware and software residing in certain nodes. A full discussion of all classes of IMS nodes appears shortly below.

IMS may be used either as a local area network or as a switching system. More commonly it is used as a switch to transfer user messages from incoming transmission facilities to user-specified outgoing transmission facilities. A user message typically enters IMS through the external or user interface of an IUN, is formatted and addressed to a destination IUN by the resident node processor, and is inserted on the ring by the resident ring interface. It then passes around the ring to the destination IUN, where it is recognized and extracted by the ring interface, reformatted by the node processor, delivered to the user interface and, then, returned to the user. In this typical transmission the 3B21D is not directly involved, though it can be involved, depending on user requirements. When access to the 3B21D is needed, a user message enters the ring as described above but is first removed by an RPCN or similarly functioning node, which delivers it to the 3B21D, which processes it. The 3B21D then returns the processed message to an RPCN, which inserts it on the ring, from which it is removed by the destination IUN, which further processes and returns it to the user.


In this illustration of IMS switching, a user message is transferred between processes residing in different processors. By itself the illustration is misleading, because IMS is not an interprocessor message switch but an interprocess message switch. It is capable of transmitting messages between any two processes, whether user- or IMS-owned, residing in the same or in different processors. This capability is provided by a major IMS software module called the message switch.

Operation of the Ring

All ring nodes contain a ring interface. Each ring interface is equipped with a pair of ring access circuits (RACs), one connected to each ring. Each RAC consists of three elements:

— a first-in, first-out buffer (FIFO) that is 10 bits wide,

— circuitry providing receive logic, and

— circuitry providing transmit logic.

The FIFO is actually a component of the ring, which is a mixed medium composed alternately of storage devices and transmission leads. The storage devices are the FIFOs. The transmission leads are a 12-bit ring bus that interconnects the FIFOs (and therefore the RACs). The ring bus contains eight data leads, two formatting leads, and two control leads. A data-available control lead permits the upstream RAC to signal the downstream RAC that it is offering a byte of data. A data-taken control lead allows the downstream RAC to acknowledge to the upstream RAC that it has accepted the offered byte. Data thus advances between adjacent RACs asynchronously, one byte at a time, by means of continuous handshakes. Upstream and downstream are relative terms. Each RAC is upstream of the RAC to which it offers data and downstream of the RAC from which it receives data.
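The byte-at-a-time handshake can be modeled in a few lines of Python. This is a toy model under stated assumptions, not a description of the hardware: the `Rac` class is hypothetical, the FIFO depth of 4 is an arbitrary placeholder (the text gives only the FIFO width), and formatting leads, parity, and timing are ignored.

```python
from collections import deque


class Rac:
    """Toy ring access circuit: a bounded FIFO plus one handshake method."""

    def __init__(self, name, depth=4):
        self.name = name
        self.fifo = deque()
        self.depth = depth  # placeholder value, not taken from the manual

    def offer_downstream(self, downstream):
        """Attempt one handshake; return True if a byte advanced."""
        if not self.fifo:
            return False  # nothing to offer, so data-available is not asserted
        if len(downstream.fifo) >= downstream.depth:
            return False  # downstream withholds data-taken; the byte waits
        downstream.fifo.append(self.fifo.popleft())  # data-taken: byte moves on
        return True


a, b, c = Rac("A"), Rac("B"), Rac("C")
a.fifo.extend(b"\x01\x02\x03")  # three bytes waiting in the upstream FIFO

# Repeat handshakes until everything has drained into C.
while a.fifo or b.fifo:
    b.offer_downstream(c)
    a.offer_downstream(b)

print(list(c.fifo))
```

The bytes arrive at C in the order they left A, and a full downstream FIFO simply stalls the upstream RAC, which is the behavior the handshake description implies.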

A byte of data may be offered to a RAC either by the upstream RAC or by the resident node processor, which connects with the RAC through an 18-bit DMA channel composed of 16 data leads and two formatting leads. The first 8 bytes of a message from either source consist of header information. As each header byte is offered, it is examined by the second element of the RAC, the receive logic. The receive logic checks for parity and formatting errors and determines message disposition. It also controls the loading of each data byte into the FIFO. The third RAC element, the transmit logic, disposes of the data in the FIFO according to instructions from the receive logic.


If the message was addressed to the resident node or was a broadcast message,¹ the bytes composing it are offered by means of handshakes to the node processor via the 18-bit DMA channel. If the message was not addressed to the resident node, the bytes composing it are offered by means of handshakes to the downstream node via the next segment of the ring bus.

Figure 2-2. A Ring Access Circuit on the IMS Ring

IMS employs a token message on each ring to ensure that only one node at a time writes messages to the ring. A token continuously traverses a ring. When a node is ready to insert a message or a block of messages on a ring, it waits for the upstream node to offer a data byte that its receive logic recognizes as the first byte of the token header. It delays accepting this byte (does not assert the data-taken lead) until it can insert its message or messages, byte by byte, on the ring. Then it accepts and transmits the token message downstream, making it available to the next node that has messages to write.
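The effect of the token on write ordering can be sketched as follows. This is a simplified model, assuming a hypothetical `pending` mapping of node names to queued messages whose insertion order stands in for ring order; it ignores byte-level transfer and message removal at the destination.

```python
def circulate_token(pending, laps=1):
    """Return the order in which messages are written to the ring.

    pending maps node name -> list of messages waiting to be written.
    Iterating the mapping stands in for the token travelling downstream.
    """
    written = []
    for _ in range(laps):
        for node, queue in pending.items():
            # The node holds the token while it inserts its block of messages.
            while queue:
                written.append((node, queue.pop(0)))
            # The token is then accepted and passed to the next node.
    return written


pending = {"LN00 1": ["m1"], "LN00 2": [], "LN00 3": ["m2", "m3"]}
print(circulate_token(pending))
```

Only the node currently holding the token appends to `written`, so writes from different nodes never interleave within a block.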

¹ IMS has two types of broadcast messages: general broadcasts, which are read by every node, and selective broadcasts, which are read by previously defined groups of nodes. Selective broadcasting, achieved by virtual addressing, allows such practices as parallel downloading of data or code into similar node types.


Ring Nodes

IMS has two classes of ring nodes: RPCNs and IUNs. RPCNs are nodes that contain no user software and that interconnect the ring and the 3B21D. IUNs, which contain both IMS and user software, perform a variety of functions. The class of IUNs has two subclasses: unextended IUNs, in which the node processor provides the only processing resource, and extended IUNs, in which the processing function is supplemented by an attached processor. At present, all unextended IUNs contain external user interfaces, but no extended IUNs do. This condition, however, is arbitrary and therefore subject to change. Currently there is one type of unextended IUN, the basic IUN. There are two types of extended IUNs: direct link nodes (DLNs) and call processor/database nodes (CDN-I). All ring nodes of either class have a ring interface and a node processor. In this document the units of a node other than the ring interface and the node processor are called auxiliary components.

Ring node hardware utilizes very large scale integration hardware, housing the ring-interface and the node-processor functions in a single integrated circuit pack. These are called integrated ring nodes (IRNs). There are two versions of IRNs: the IRN/IRNB (UN303/UN303B) and the IRN2/IRN2B (UN304/UN304B).

Node processors are microcomputers composed of a CPU, memory, interrupt logic, I/O ports, and DMA circuitry. They are supplemented in DLNs by an additional microcomputer called the attached processor and in CDNs by an additional minicomputer called the ring application processor. In unextended IUNs, the node processor contains both IMS and user code. In extended IUNs, user code resides only in the attached processor, whereas both node and attached processors contain IMS code. The content of user code is determined by user needs. Typically it provides or contributes to such functions as controlling user hardware resident in the node, managing the user's network, and providing real-time user services such as protocol conversion and message addressing.

The code provided by IMS manages the ring-interface and node-processor hardware. It includes code for initialization and automatic maintenance and for such switching functions as message formatting and temporary message storage. It provides an operating system, boot monitor, memory, timers, and measurements. Except for the boot monitor, all code residing in node processors and attached processors is downloaded from the 3B21D.


Ring Peripheral Controller Nodes

RPCNs allow messages to be passively exchanged between the ring and the 3B21D. The exchange is passive because the RPCNs contain no user code that could provide processing of message substance. By contrast, direct link nodes (discussed below) provide active exchange of messages between the ring and the 3B21D by supplementing certain real-time user functions housed in the 3B21D. To minimize the consequences of a wide failure, RPCNs are distributed about the ring with approximately equal numbers of IUNs between them. A minimum requirement exists of two RPCNs per ring. Typically, large rings will have more.

In addition to a ring interface and a node processor, RPCNs contain the following

circuit packs:

• A duplex dual serial bus selector (DDSBS) serves as a termination point between the ring and the dual serial channels of the 3B21D. It converts the parallel output of the ring to the serial format of the dual serial channels and vice versa. The DDSBS is duplexed, with one DDSBS function connected to the dual serial channel of the on-line 3B21D control unit and one to the off-line control unit.

• A 3B21D computer interface (3BI) circuit pack serves as a buffer between the node processor and the DDSBS. It also provides data conversion between the node processor's 16-bit data bus and the DDSBS's 36-bit data bus. The 3BI communication occurs either via a DMA channel or a program I/O utility of the 3B21D operating system. The DMA channel is ordinarily used for standard message interchange. The program I/O is initiated and used by the 3B21D to issue urgent commands to the RPCN or to synchronize data transfers.

Basic IMS User Nodes

Basic IUNs interconnect the ring and the user's external system. In addition to a ring interface and a node processor, a basic IUN contains an external user interface. The external user interface and node processor communicate with one another via a shared memory in the external user interface. The MDL circuit pack described below is available as an external user interface for these nodes; or, as with Common Network Interface (CNI) link nodes, users may supply their own interface.


Direct Link Nodes (DLN)

DLNs are designed to supplement real-time processing of user data in the 3B21D. Like RPCNs, DLNs provide message transmission between the ring and the 3B21D. But unlike RPCNs, DLNs contain user code, the presence of which enables them to reduce the processing demands upon the 3B21D by assuming some user processing functions that cannot be performed by basic IUNs.

In addition to a ring interface and a node processor that contains only IMS code, DLNs are composed of the following circuit packs:

• An attached processor that resides on the node-processor bus and communicates with the node processor via a dual-ported memory and hardware interrupts. The attached processor contains both IMS and user code.

• A 3B21D computer interface (3BI) and a duplex dual serial bus selector (DDSBS) that perform in the same way and serve the same functions as they do for RPCNs, as described above.

Call Processor/Data Base Nodes (CDN)

The CDN handles the call processing functions of the FLEXENT™/AUTOPLEX® Wireless Network Systems. There are several versions of the CDN: CDN-I, CDN-II, CDN-IIx, and CDN-III.

IMS offers an extended node for users who require more processing power in the nodes than can be supplied by basic IUNs. The node is called a CDN-I [sometimes referred to as a standard multi-application real time node (SMART node or SN)]. It serves as an alternative to the 3B21D for the substantive processing of user data. Currently, the CDN-I has only an interface to the ring. It is capable, however, of having an external user interface, and it may have one in the future.

In addition to a ring interface and a node processor that contains only IMS code, a CDN-I is composed of the following elements:

• An attached processor called a ring application processor (RAP). The RAP is a 3B15 computer mounted on an IMS backplane that has been redesigned to conform with the design of IMS ring-node frames/cabinets and the 3B15. The older version has 2 megabytes of memory and is capable of growing an additional 94 megabytes. The newer version has 16 megabytes of memory and is capable of growing an additional 112 megabytes. The following circuit packs compose the RAP:


— Central controller cache (CCC)

— Central controller support (CCS)

— Main store controller(s) (MASC)

— Main store arrays (MASAs)

• A power control interface and display (PCID) that provides manual-power, reset, and diagnostics controls and LEDs that indicate power and diagnostic failures.

• A node-processor interface (NPI) that provides message exchange between the node processor and the RAP.

CDN-II

The CDN-II (sometimes referred to as the Turbo CDN) is a newer node that is used to replace the CDN-I. The CDN-II requires only two boards and fits in a standard 3-node shelf or the new 5-node shelf.

The CDN-II provides a newer technology, higher performance CDN. The performance of CDN-II is about four times the performance of the CDN-I. CDN-II has a fixed 80 Mbytes of memory and consists of the IRN2B (UN304B) and an AP (TN1630B).

CDN-IIx

The CDN-IIx has identical features to the CDN-II, but different hardware. It uses the IRN2B (UN304B) and an AP (TN1720x) but can have up to 272 Mbytes of memory using multiple AP boards. A CDN-II can be upgraded to a CDN-IIx by ordering a memory growth upgrade kit.

CDN-III

The CDN-III is an improved CDN that may be used to upgrade CDN-II or CDN-IIx type nodes. The CDN-III consists of an IRN2 node core and AP60 attached processor, providing greater processing and memory capacity than previous CDNs. The AP60 uses an MC68LC060 processor.


Interframe Buffers

Interframe buffers (IFBs) are required to extend the parallel ring buses where the distance between adjacent ring nodes is greater than a few inches. In an IRN ring, the distance is 24 inches or more. Such internodal distances occur at the boundaries of frames or cabinets where the two rings must be extended by two lengths of cable. At times they may also occur within frames/cabinets. At these boundaries, an interframe-buffer circuit pack must be inserted at each end of the parallel cables, between the cables and the nodes that are separated by the cables.

Interframe-buffer circuit packs are always employed in pairs. Each member of a pair contains both send and receive circuitry. Therefore, the paired packs are mutually dependent, with each providing half of the buffering function for each parallel ring bus.

The following graphic illustrates the pairing of the interframe buffers.

Figure 2-3. Interframe Buffers

Thus, if either member of a pair fails, the pair fails.

In addition to providing necessary drive capability without slowing down the internodal byte transfer rate, interframe buffers in padded form may be used to increase the effective lengths of small rings, thereby permitting them to employ longer messages. For this purpose, two pairs of 4104-byte buffers may be inserted in small IRN rings. The pairs should be placed diametrically on the ring to minimize the possibility that both would be included in an isolation. If additional interframe buffers are needed, they should be of the standard 16-byte capacity. The 16-byte capacity is adequate for use on large rings where employment of long messages requires no buffer padding. Technicians should ensure that the actual sizes of their interframe buffers correspond to the sizes entered in equipment configuration data (ECD). See “ECD Values for Interframe Buffers” in Appendix B, Ring Maintenance Reference Material.


Node Names and Addresses

Ring nodes are named as members of the group in which they reside. A group is composed of a maximum of 16 member nodes numbered 00 through 15. Node 00 is always reserved for an RPCN. Nodes 01 through 15 are reserved for other node-types. If a node position is unequipped, the member number is nevertheless reserved for the position.

Node names consist of a node-type identification followed by a 2-digit group number followed by a 2-digit member number. IUN32 10, for example, is an IUN, and it is member 10 (or the 11th node or node position) in group 32. RPCN00 0 is an RPCN, and it is member 0 (or the first node or node position) in group 00.

Member numbers and group numbers are assigned so that they increase in the direction of traffic flow on ring 0. Unlike member numbers, however, group numbers do not necessarily increase by consecutive integers. Thus, a ring might consist of groups 00, 01, 02, 32, 33, and 34, for example. In IMS usage, nodes are identified by the formula RPCNa b or IUNa b, where a is the 2-digit group number and b is the 2-digit member number.

In addition to names, nodes have identifications and physical addresses. (Nodes may also have virtual addresses, but technicians will not encounter or use them.) The identification, a number between 0 and 1023, represents the physical location of the node on the ring. The identification is calculated with the formula 16(a) + b, where a is the group number and b is the member number. The identification appears in decimal or hexadecimal form in various IMS output messages. It is also the address that is strapped on the back of each node by grounding the node ID pins. The pins, which are numbered 0 through 9, represent sequential binary weights (ID 0 = 1, ID 1 = 2, ID 2 = 4, ID 3 = 8, and so on). The sum of the binary weights of all grounded pins is the node identification.
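The identification arithmetic described above can be sketched in a few lines of Python. This is an illustrative aid only, not part of IMS; the function names node_identification and grounded_pins are hypothetical.

```python
def node_identification(group: int, member: int) -> int:
    """Return the node identification 16(a) + b for group a, member b."""
    if not 0 <= member <= 15:
        raise ValueError("member number must be 00 through 15")
    ident = 16 * group + member
    if not 0 <= ident <= 1023:
        raise ValueError("identification must fit in 10 bits (0 through 1023)")
    return ident


def grounded_pins(ident: int) -> list[int]:
    """Return the ID pins (0 through 9) whose binary weights sum to ident."""
    return [pin for pin in range(10) if ident & (1 << pin)]


# IUN32 10 is member 10 of group 32.
ident = node_identification(32, 10)
print(ident, hex(ident), grounded_pins(ident))
```

For IUN32 10 the identification is 522 (hexadecimal 20A), which corresponds to pins ID 1, ID 3, and ID 9 (2 + 8 + 512).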

The physical node address, a number between 3072 and 4095, is used in IMS message headers to identify the source and destination addresses of messages. The physical address is calculated by adding 3072 (or in hexadecimal notation, C00) to the node identification. The number 3072 corresponds to the two most significant bits in the 12-bit source- and destination-address fields of message headers, the lower 10 bits being the node identification. Tables in the reference chapter of this document provide translations of both identifications and physical addresses into node names. Technicians will encounter the hexadecimal form of the physical node address in messages output in response to phase 1 and 2 diagnostic failures.
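As a quick cross-check of the translation tables, the relation between a physical address and a group/member pair can be sketched as follows. The function names are hypothetical and the sketch assumes only the arithmetic stated above (physical address = 3072 + identification, identification = 16(a) + b).

```python
PHYSICAL_BASE = 0xC00  # 3072: the two most significant bits of the 12-bit field
ID_MASK = 0x3FF        # lower 10 bits carry the node identification


def physical_address(ident: int) -> int:
    """Add 3072 (hex C00) to a node identification."""
    if not 0 <= ident <= 1023:
        raise ValueError("identification must be 0 through 1023")
    return PHYSICAL_BASE + ident


def group_and_member(address: int) -> tuple[int, int]:
    """Recover (group, member) from a physical node address."""
    if not 3072 <= address <= 4095:
        raise ValueError("physical address must be 3072 through 4095")
    return divmod(address & ID_MASK, 16)


addr = physical_address(522)           # identification of IUN32 10
print(hex(addr), group_and_member(addr))
```

A hexadecimal address of E0A in a diagnostic failure message would therefore translate to group 32, member 10.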


Ring Message Format

The figure below illustrates the format of IMS messages as they appear on the 12-bit ring bus (the two control leads are not shown).

Figure 2-4. IMS Message Format

[Figure: message layout by bit column (P, C, 7 through 0). Fields shown: DC, RR, CF, CC; word count; source address; destination address with DR; last data.]

LEGEND
CC = Control Code
CF = Control Flag
RR = RAC Reset
DC = Destination Control
SR = Source Ring ID
DR = Destination R

The illustration leaves blank fill bits and bits that are not examined by ring-interface hardware. The first 8 bytes constitute the message header. The first byte contains a 7-bit control field from which the RAC learns how to respond to the message. Within the first byte, the control code (CC) defines the message function. Functions are token, software, destroy, set/clear quarantine, set/clear isolation, and processor reset. The destination control (DC) identifies the address-type. Types are normal address match, general broadcast, selective broadcast, and take message. In addition to the 8 data bits, there is a ninth bit, called the control or C-bit, which is always set to logic-one to identify the beginning byte of every message. From association with this feature, the entire first message byte is often referred to in documentation as the control or C-byte. The tenth bit is a parity bit which provides odd parity over the data byte and C-bit. When a RAC writes a message to the ring, it generates the C-bit and modifies the parity bit from node-processor memory to include the C-bit. When a RAC reads a message from the ring, the C-bit is removed and parity is changed back to its original form before being written to node-processor memory.
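The parity rule can be illustrated with a short Python sketch. It assumes the usual meaning of odd parity, namely that the parity bit is chosen so that the total number of ones across the data byte, the C-bit, and the parity bit itself is odd. The function name is hypothetical, and the sketch does not model the actual RAC circuitry.

```python
def odd_parity_bit(data_byte: int, c_bit: int) -> int:
    """Return the parity bit giving odd parity over 8 data bits plus the C-bit."""
    ones = bin(data_byte & 0xFF).count("1") + (c_bit & 1)
    # If the covered bits already hold an odd number of ones, parity is 0.
    return 0 if ones % 2 == 1 else 1


# Setting the C-bit on a byte always flips the required parity bit,
# which is why parity must be adjusted when the C-bit is added or removed.
for byte in (0x00, 0x03, 0xA5):
    print(hex(byte), odd_parity_bit(byte, 0), odd_parity_bit(byte, 1))
```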

The word count in the second message byte informs the RAC of the total number of 32-bit words in the message. Each message contains 4N bytes, where N is the value of this 7-bit word count. All messages are padded out to contain an integral number of 32-bit words. The longest possible message that can be placed on the ring is limited to the maximum value of this word count, which is 127 32-bit words (508 bytes) for rings that allow the short message and 543 32-bit words (2172 bytes) for rings that allow the long message. For explanations of conditions that permit short and long messages, see the discussion of interframe buffers above.
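The length rule (4N bytes, padded to whole 32-bit words, bounded by the short or long message limit) can be sketched as follows. The function name is hypothetical, and the sketch assumes that the byte total passed in already includes the 8-byte header; the limits 127 and 543 are the values quoted above.

```python
SHORT_MAX_WORDS = 127  # 508 bytes, rings that allow the short message
LONG_MAX_WORDS = 543   # 2172 bytes, rings that allow the long message


def word_count(total_bytes: int, long_allowed: bool = False) -> int:
    """Return N, the number of 32-bit words after padding to a word boundary."""
    if total_bytes <= 0:
        raise ValueError("message must contain at least one byte")
    words = -(-total_bytes // 4)  # ceiling division: pad up to whole words
    limit = LONG_MAX_WORDS if long_allowed else SHORT_MAX_WORDS
    if words > limit:
        raise ValueError(f"message exceeds {limit} words")
    return words


n = word_count(10)
print(n, 4 * n)  # a 10-byte message is padded out to 3 words (12 bytes)
```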

The third and fourth header bytes contain the source address, and the fifth and sixth header bytes contain the destination address. The ring-interface hardware performs address matching on the 12-bit node address and the 1-bit ring ID (that identifies which of the two rings is used for the message). The lower 10 bits of the ring address are referred to as the node identification. Each node is assigned a unique 10-bit node identification via the ID0-ID9 backplane straps.

This header information enables the RAC to determine message disposition and the source and destination addresses, to check for errors in parity, format, and message length, and to perform hardware control functions required for ring maintenance.


Reconfigurations

The types and number of nodes composing any ring are selected to meet the requirements of a specific user. Thus, only a ring whose components are fully in service may be thought of as properly configured. Yet rings must sometimes be temporarily reconfigured for such reasons as the need to repair or replace equipment. IMS reconfigures a ring by removing one or more nodes from service. Nodes that have been removed from service are ordinarily in one of two states. They may be quarantined or they may be isolated.

Node Quarantine

Quarantining a node consists of electrically severing the node processor from its associated ring interface, an action that prevents the node processor from communicating through or to the ring interface. However, the action does not prevent the 3B21D or other nodes from limited communications with the node processor, which they accomplish by setting registers in the ring interface. When a node is placed in quarantine, both RACs are set to forced-propagate mode, which allows them to continue propagating messages on the rings but prevents them from reading messages from or writing messages to the rings. Quarantining is the appropriate response to a fault that occurs in a node processor or in any of the auxiliary components of a node. Quarantining has the advantage over isolation in that it disturbs the ring subsystem only slightly.

Throughout this document the term "quarantine" is used solely to represent a node that is in the state described above and that is in the active ring. Nodes in isolation or nodes during initialization or recovery sequences may have their node processors electrically severed from their ring interfaces, which are in forced-propagate mode. Such nodes will not be called "quarantined" since they are not in the active ring.

Node Isolation

Quarantining a node insulates the active ring from faults or activities in the node processor and in auxiliary components. Isolating a node insulates the active ring from the entire node. It is achieved by converting the ring subsystem from one dual-ring structure to two single-ring structures. Of the two single-ring structures, one is the active segment that continues to transmit user messages, and the other is the isolated segment that contains the isolated node or nodes. Isolated segments do not have a token message. The following figure schematically represents an isolated ring.


Figure 2-6. Before (top) and After (bottom) Becoming a BISO or EISO Node

[Figure legend: DS = Data Selector. Each panel shows the ring 0 and ring 1 data selectors, with selected and unselected ring paths marked.]

Because all nodes have this shunting capability, any node of any class can perform as a BISO or an EISO node. The nodes actually selected to perform these functions are determined by the location of the node(s)-to-be-isolated. The node selected to be the BISO node is ordinarily the first node upstream on ring 0 of the node(s)-to-be-isolated (and therefore the next lower-numbered node), and the node selected to be the EISO node is ordinarily the first node downstream on ring 0 of the node(s)-to-be-isolated (and therefore the next higher-numbered node). If more than one node must be isolated (a phenomenon called a multiple isolation), IMS software chooses to reconfigure the ring in such a way as to include the smallest number of nodes possible. Nodes included in a multiple isolation, not because they contain faults, but because they lie between faulty nodes, are called innocent victim nodes.
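The selection rule can be made concrete with a small Python sketch. It is not the IMS algorithm; it only illustrates the stated goal of isolating the smallest contiguous run of nodes that contains every faulty node, with the BISO node immediately upstream and the EISO node immediately downstream on ring 0. The function name, the example ring, and the requirement that at least two nodes remain active are assumptions made for the illustration.

```python
def plan_isolation(ring: list[str], faulty: set[str]):
    """ring lists node names in ring 0 traffic order (treated as circular).

    Returns (biso, eiso, isolated_nodes, innocent_victims).
    """
    n = len(ring)
    positions = sorted(ring.index(name) for name in faulty)
    if not positions:
        raise ValueError("no faulty nodes given")
    # Try each faulty node as the upstream end of the isolated segment and
    # keep the start that needs the fewest nodes to cover all faulty nodes.
    best_start, best_len = None, None
    for start in positions:
        length = max((p - start) % n for p in positions) + 1
        if best_len is None or length < best_len:
            best_start, best_len = start, length
    if best_len > n - 2:
        raise ValueError("assumed: distinct BISO and EISO nodes must remain")
    isolated = [ring[(best_start + k) % n] for k in range(best_len)]
    biso = ring[(best_start - 1) % n]
    eiso = ring[(best_start + best_len) % n]
    victims = [name for name in isolated if name not in faulty]
    return biso, eiso, isolated, victims


ring = ["RPCN00 0", "IUN00 1", "IUN00 2", "IUN00 3", "RPCN32 0", "IUN32 1"]
print(plan_isolation(ring, {"IUN00 1", "IUN00 3"}))
```

In this example IUN00 2 is an innocent victim: it contains no fault but lies between the two faulty nodes.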

The BISO and EISO nodes also provide the means by which maintenance messages are transmitted between the active and the isolated segments of an isolated ring. BISO and EISO nodes have one RAC participating in the active segment and one RAC participating in the isolated segment. Messages destined for either ring segment may be read from the sending segment by the EISO or BISO RAC participating in it, transmitted via the node processor to the RAC participating in the receiving segment, and then written to the receiving segment. It is by this means that diagnostic code is downloaded by the 3B21D into isolated nodes and diagnostic results are returned to the 3B21D.

Isolation is a more drastic means than quarantine for removing a faulty node from service. It is an appropriate response to a fault in the ring interface or in the medium between ring interfaces (this may be a fault that prevents messages from being propagated on the ring).

The Ring Config Module

When the ring is restarted or when an isolation is imposed or dissolved, the action is performed by the IMS ring config module whose principal acts are:

1. to inhibit the services provided by the message switch, thus preventing the nodes from writing to the ring, a condition known as ring silence

2. to set the data selectors of every node to positions that provide the desired ring structure

3. to test ring continuity, and, if continuity is good,
   - to issue one token message, when the ring contains an isolation, or two token messages, when it does not
   - to restart the message switch;
   or, if continuity is bad,
   - to abort and return control to the process that initiated ring config.
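The order of the ring config module's principal acts can be summarized in a short Python sketch. It merely lists the actions as strings under the two inputs named in the text (whether the requested structure contains an isolation, and whether the continuity test passes); the function name and the action labels are hypothetical and do not correspond to IMS code.

```python
def ring_config_actions(has_isolation: bool, continuity_ok: bool) -> list[str]:
    """Return the ordered principal acts of one ring config run."""
    actions = [
        "inhibit message switch (ring silence)",
        "set data selectors of every node",
        "test ring continuity",
    ]
    if continuity_ok:
        # One token when the ring contains an isolation, two when it does not.
        tokens = 1 if has_isolation else 2
        actions.append(f"issue {tokens} token message(s)")
        actions.append("restart message switch")
    else:
        actions.append("abort and return control to initiating process")
    return actions


for step in ring_config_actions(has_isolation=True, continuity_ok=True):
    print(step)
```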

The ring config module may be executed by IMS initialization software, by Error Analysis and Recovery (EAR) software, by Automatic Ring Restoral (ARR) software, or by manual commands to change the structure of the ring. The processes mentioned here are described at length later in this document.


Level-4 IMS Initializations (FPI and Boot)

Level-4 (FPI) initializations begin with a limited initialization of IMS in the 3B21D as described above. Level-4 (BOOT) initializations begin with a full initialization of IMS in the 3B21D as described above. Both level-4s then proceed to initialize the ring with the following sequence of events:

1. RPCNs are downloaded with new operational code and placed in execution.

2. Each node is tested for the ability of its ring-interface hardware to propagate messages on the ring and for the functionality of its data selectors.

3. The ring config module is called to establish a ring structure based on the results of these tests.

4. With the new ring structure in place, tests are made to determine the ability of each unisolated IUN to read messages from, and write messages to, the ring. Nodes that fail the tests are quarantined.

5. All unquarantined and nonisolated nodes are downloaded with operational code and placed into execution. The downloading occurs by means of selective broadcast messages that allow parallel downloading of similar node-types. When downloading is done, the IMS initialization process is done, and the ring is up. IMS level 4s are accompanied by ring silence.

Even if no nodes are operational, IMS level 4 initialization completes so that technicians can conduct diagnostics in an attempt to manually correct the problem.

IMS initializations are reported on the ROP by the REPT IMSDRV INIT output message. This message format will report first the completion of the critical stage of initialization and then the completion of the noncritical stage. Initialization of the ring and initialization or restarting of the IMS driver compose the critical stage. The noncritical stage consists of initializing such features in the 3B21D as display pages, measurements, and certain craft state reports.


Audits

The following information about IMS audits is offered chiefly because output messages concerning audits will occasionally appear on the ROP. Technicians should rarely have occasion to use the input commands that manually initiate

Central Node Control Audit (AUD CNC)

This is a routine audit that runs according to a user-specified schedule. IMS recommends a 15-minute interval. It also runs during level 0 and level 1A IMS initializations and in response to manual requests. The purpose of the audit is to find and correct inconsistencies in internal records that could interfere with the actions of automatic maintenance. The errors detected by this audit indicate mutilated internal data or other software problems, which often occur as side effects of other events, such as those reported by REPT IMSDRV FLT messages. The central node control audit attempts to correct an error by canceling the maintenance task associated with it. It does not verify that its action was successful. To verify that the error was corrected, a technician must run the audit again, using the AUD:CNC 1 input message.

If the central node control audit finds an error, it reports it in an AUD CNC output message. If it does not find an error, no output message is printed, unless the audit was manually requested. Problems in running the audit are reported in a REPT IMSDRV AUD message. Once started, the audit normally takes under 10 seconds to run.

Node State Audit (AUD NODEST)

This is a routine audit that runs according to a user-specified schedule. IMS recommends a 15-minute interval. It also runs during level 0 and level 1A IMS initializations and in response to manual requests. Its purpose is to detect and correct errors in the node availability map, which is used by software modules such as node audits to identify nodes whose major state is ACT (see the discussion below of IMS maintenance states). The audit compares the data in the node availability map with state data in the IMS driver and, when it finds inconsistencies, modifies the map to conform to the state data.

The errors detected by the node state audit indicate mutilated internal data or other software problems, which often occur as side effects of other events, such as those reported by REPT IMSDRV FLT messages. The audit's attempts to correct errors should always succeed. When the audit finds an error, an AUD NODEST output message is printed. When it does not find an error, no output message is printed, unless the audit was manually requested. Problems in running the audit are reported in a REPT IMSDRV AUD message.


Node Audit

An automatic, internal audit of nodes allows maintenance software in the 3B21D to continuously monitor the health of the ring and all ring nodes. The node audit is run routinely every few seconds. By this means, the 3B21D verifies that each active node is operating correctly, checks the communication paths of both rings, and finds nodes that have quarantined themselves or that need to be quarantined. The work of the node audit is transparent to technicians and users of IMS, unless it detects a problem that causes a node to be removed from service.


Contents

Manual Ring Maintenance
  Ring Maintenance Interfaces
    Alarms
    Critical Alarms
    Major Alarms
    Minor Alarms
    Special IMS Indicators
    Display Pages
    The Ring Status Summary Page
    The Ring Node Status Page
  Ring Diagnostics
    Obtaining Diagnostic Results
    Diagnostic Listings
    Using Diagnostics
  Guide to Critical Ring Maintenance
    IMS Input Messages
    Critical Maintenance Procedures for Nodes
    Critical Maintenance Procedures for Nodes in Isolation
    Low-Phase Ambiguity
    Guideline to Single-Node Isolations
    Guideline to Multiple-Node Isolations
    Responding to Ring Down
    Employing Manual Ring Mode
    Ring Application Processor Critical Maintenance Procedure
    Recognizing and Finding Intermittent Faults
    Other Suggestions for Troubleshooting
    New Circuit Pack; Old Failure
    Unconditional Restorals
    Unexplained Loss of Token
    Avoiding Trouble
    Recording Trouble
    New Installations or Ring Growth

Examples of Ring Maintenance
  Responses to Single, Ring-Related Faults
    Automatic Recovery from a Transient Fault by EAR Level 0
    Manual Recovery from a Hard Fault
    Automatic Recovery from a Transient Fault by ARR
    Manual Recovery from a Hard Fault on a Small Ring
  Responses to Multiple, Ring-Related Faults
    Manual Recovery from Multiple Hard Faults
    Automatic Recovery from Two Intermittent Faults


with nodes means the ring can respond to faults by removing nodes from service, either by quarantining or isolating them. The type of reconfiguration chosen depends on the impact of the fault. If the impact is confined to the internal operations of the node, then the node will be quarantined. But if the fault has disrupted operation of the ring, then the node associated with the fault will be isolated. Automatic node quarantine occurs in response to instructions from the node processor of the faulty node or from the 3B21D. Automatic node isolation occurs when the ring config module is called with instructions to set the data selectors in positions that create an isolated segment.

Reinstatement will succeed in response to most soft faults, while most hard faults require reconfiguration. Soft faults are transient hardware problems or glitches in software, either of which is likely to be temporary. Soft faults may often be corrected simply by resuming operation of the system or of the component they have disrupted. (Sometimes, however, the effects of soft faults are sufficiently severe that recovery requires reconfiguration.) By contrast, hard faults are failures in hardware or software which, once manifested, are likely to persist until they or their causes are corrected.

Both reinstatement and reconfiguration provide rapid recovery, with the former usually being faster but less rigorous. When confronted with a fault in the ring subsystem, ring maintenance software must always choose to resume operation by one of these two means. When its first choice is reinstatement, and that choice fails to achieve a stable and usable ring, it next tries reconfiguration. When, on the other hand, its first choice is reconfiguration, reinstatement will not ordinarily follow, since reconfiguration, being the more thorough action, should succeed in all but the rarest cases.

Reconfiguration precipitates the third type of recovery action employed by ring maintenance, node restoral. Node restoral occurs after operation of the reconfigured ring has resumed. It begins with ring maintenance software testing quarantined or isolated nodes to determine how best to treat them. In some cases, it can and does return them to service by automatic means. When it cannot or does not return them to service, it alerts technicians to repair or replace them and then to return them to service manually.

Reinstatement and reconfiguration occur automatically. The work of node restoral also begins with automatic procedures, which give way to manual means only if the automatic procedures fail repeatedly or if diagnostics reveal a hard fault. Thus the usual role of technicians is to support ring maintenance by manually completing tasks software has begun. In some instances, however, manual intervention in the automatic machinery may be indicated.

The organization of the next two chapters reflects the operational division between automatic and manual ring maintenance. The next chapter describes the maintenance procedures that occur automatically, and the chapter that follows explains the related responsibilities of technicians.


Ring Maintenance

Automatic Ring Maintenance

In the strategy of automatic ring maintenance described above, error analysis and recovery (EAR) software performs the nondeferrable task of reinstating or reconfiguring the ring, while automatic ring recovery (ARR) software performs the deferrable task of node restoral. The following explanation of automatic ring maintenance begins with EAR, and then proceeds to ARR.

EAR or Ring Recovery

This discussion of EAR describes events in the order of their occurrence. EAR recognizes the existence of a fault from audits or by detecting errors in message format or message delivery. The work of error detection occurs chiefly in the nodes, which report errors to EAR in the 3B21D. EAR in the 3B21D then analyzes the errors to determine the type and location of the fault. Its analysis distinguishes between ring-related faults that obstruct the transportation of messages on the ring and node-related faults that prevent the processing and transmission of messages within nodes. Based on this information, together with its knowledge of the current ring structure, it decides whether to reinstate or reconfigure the ring. Ring reinstatement and reconfiguration are achieved by overlapping mechanisms, and these mechanisms are also discussed below.

Error Detection

The ring assumes that faults will produce errors in message format or message delivery, so it searches for faults by looking for errors. Errors may occur as messages are propagated on the ring (that is, they may occur within ring interfaces or in the medium between ring interfaces), as messages are transmitted or processed by node processors or auxiliary components, or as messages are transmitted between the ring and the 3B21D.

The task of detecting and reporting errors is assigned chiefly to the ring nodes. By means of circuitry in their ring interfaces and software in their node processors, nodes are usually able to detect errors internal to themselves. Moreover, by means of failures in message delivery, nodes can often detect external errors, errors occurring in association with other nodes. When a node detects an error, it will, if it can, report the error to the 3B21D for analysis.

An error associated with a fault that disrupts traffic on the ring is ordinarily first detected by the circuitry of the ring interface. Every ring interface contains circuits for checking parity on the ring path as well as for detecting format errors in the messages it reads, writes, and propagates. When a ring-interface circuit detects an error, it informs its node processor by means of an interrupt. The node processor then interrogates the ring-interface hardware to determine the cause of the problem and reports, if it can, the identity and location of the error to the 3B21D via one or both rings.

An error associated with a fault that prevents the transmission or processing of messages within nodes will usually be detected by the node processor. Such an error is typically caused by a fault in the node processor or by a node-processor detectable fault in one of the auxiliary components. From some errors of this type, nodes can recover immediately by means of local reinstatement. They may, for example, be able to restart an attached processor that has incurred an error. Usually, however, reinstatement is not possible, and the node processor responds to the error by placing itself in quarantine, a condition that prevents it from reporting its state to the 3B21D. Instead the 3B21D usually learns of the condition from a report made by the first node that attempts to send a message to the quarantined node. During normal operation, messages are read from the ring by the destination node. A node in quarantine, however, cannot read messages. Instead, a message addressed to it will, after traversing the entire ring, be detected and removed from the ring by the sending node, which will understand this condition as a SOURCE MATCH error and report it to the 3B21D. If a source match fails to materialize, however, or if an injured node processor is unable to quarantine itself, the condition will be detected by a node audit and reported to the 3B21D, which responds, if needed, by quarantining the disabled node.
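The source-match condition can be pictured with a toy Python model of a single ring. The model is a deliberate simplification: nodes are plain names, quarantined nodes propagate but never read, and the function name deliver is hypothetical.

```python
def deliver(ring: list[str], quarantined: set[str], source: str, dest: str) -> str:
    """Walk a message downstream from source and report what happens to it."""
    if source == dest:
        raise ValueError("toy model expects distinct source and destination")
    n = len(ring)
    start = ring.index(source)
    for step in range(1, n + 1):
        node = ring[(start + step) % n]
        if node == dest and node not in quarantined:
            return "DELIVERED"      # destination reads the message
        if node == source:
            return "SOURCE MATCH"   # message came all the way around unread
        # Any other node, or a quarantined destination, simply propagates it.
    raise AssertionError("unreachable: the message always returns to its source")


ring = ["RPCN00 0", "IUN00 1", "IUN00 2", "IUN00 3"]
print(deliver(ring, set(), "RPCN00 0", "IUN00 2"))
print(deliver(ring, {"IUN00 2"}, "RPCN00 0", "IUN00 2"))
```

The second call models a message addressed to a quarantined node: it traverses the entire ring and is recognized by its sender, which would then report the error to the 3B21D.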

Source-match errors are one of two means by which ring nodes detect errors external to themselves. The other is ring blockage. Blockage is the condition that exists when an upstream node cannot propagate data to its downstream neighbor. Every node has a timer on the output of each of its two ring paths. The timer expires if a byte of data being offered by the upstream node is not taken by the downstream node within a specified interval. Expiration of the timer implies a problem in the downstream node, for a node processor ordinarily reacts to an error that implicates its ring interface by forcing blockage on its ring input path. In this context, all interconnections between nodes, including interframe buffer circuits, are considered part of the downstream node. When a node processor detects blockage, it immediately drains the ring of any remaining data, including the token message, and reports the blockage to the 3B21D via the alternate ring.1

Errors may also be detected during the testing phase of ring initialization. Testing, which is more extensive in level-4 than in level-3 initializations, is in neither of these levels of initialization so detailed as in diagnostics. Nevertheless, errors

1 The node that first detects blockage drains the ring to avoid confusing the 3B21D as to which node is immediately upstream of the faulty node. If it did not drain the ring, mass congestion would ensue, causing many upstream nodes to experience and report blockage. Even so, the initial blockage condition will often trigger two or three upstream blockage reports before the ring can be drained.


EAR Ring Recovery Intervals and Output Messages

In this document error messages have been classified according to whether they indicate a ring-related fault (a fault that obstructs the transportation of messages on the ring) or a node-related fault (a fault that prevents the processing or transmission of messages within nodes). A message of the first class is usually followed by ring restarts and, if restarts fail, by node isolation. A message of the second class is usually followed by node quarantine. A third class of messages exists that result in no change in ring or node connectivity.
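The three message classes and the actions that usually follow them can be tabulated in a few lines of Python. This is only a memory aid built from the paragraph above; the enumeration and function names are hypothetical and do not reflect how EAR is implemented.

```python
from enum import Enum


class FaultClass(Enum):
    RING_RELATED = "ring-related"   # obstructs transport of messages on the ring
    NODE_RELATED = "node-related"   # prevents processing within a node
    NO_CHANGE = "no change"         # no change in ring or node connectivity


def usual_response(fault_class: FaultClass, restarts_succeed: bool = True) -> list[str]:
    """Return the actions that usually follow a message of the given class."""
    if fault_class is FaultClass.RING_RELATED:
        actions = ["ring restart"]
        if not restarts_succeed:
            actions.append("node isolation")
        return actions
    if fault_class is FaultClass.NODE_RELATED:
        return ["node quarantine"]
    return []


print(usual_response(FaultClass.RING_RELATED, restarts_succeed=False))
```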

All three message types (including the third class) are reported, usually by nodes to the 3B21D, which in turn formats them and sends them to the MCRT and ROP as REPT RING TRANSPORT ERR messages. A descriptive list of these messages is included in Appendix B, Ring Maintenance Reference Material. The most common ring transport errors, the error types that technicians should probably know well, are:

- Blockage
- RAC Parity/Format Error
- Interframe Buffer Parity Error
- Source Match and SRC Match
- NAUD Failure, and
- Unexplained Loss of Token.

The outages that occur during ring recovery actions are chiefly the result of ring silence. Ring silence is a condition imposed upon the nodes while the ring is restarting, initializing, or reconfiguring to achieve an isolation. During ring silence the nodes are not permitted to write to the ring. Although the actions of the IMS ring config module to restart the ring or to achieve an isolation require only a brief period of ring silence, the periods of silence required by continuity tests are significantly longer. Nevertheless, most EAR ring recovery attempts will be completed very rapidly. The lower levels of EAR escalative recovery actions are brief. A level 0, 1, or 2 recovery attempt may take up to 1 second to complete, while a level 3 attempt will usually take from 1.3 to 2 seconds. The soak periods of levels 4 and 5 make them somewhat more expensive. Typically, a level 4 attempt consumes 11 to 14 seconds and a level 5 attempt 90 seconds to 3 minutes, depending on ring size.

5 Overall system tolerance to these partial ring outages depends on the application. Where applications require very high availability of a particular user-node function, that function can be replicated on two or more nodes. By spacing these nodes equally around the ring, at least one member of the set should remain in the active ring segment for most cases of multiple ring faults.


The brevity of all but the longest of these ring recovery attempts means that technicians will ordinarily learn of them after they have completed. Moreover, with one exception, it is the practice of the 3B21D to queue error messages and send them to the MCRT only after the recovery level to which they apply has completed its attempt to return the ring to service. Technicians may infer, however, that a high-level recovery attempt is underway from previous output messages indicating failed recovery attempts at lower levels, as well as from the blinking of the "no token" lights on the circuit packs of all ring nodes, indicating that tests are occurring.

The output messages concerning each ring recovery attempt will usually consist of the following items of information in the order shown:

1. A REPT RING CFR message announcing a specific level of EAR recovery attempt.

2. If the attempt was successful, a REPT RING CFR message indicating that the ring has been configured and identifying the new ring structure.

3. If the attempt was unsuccessful, a REPT RING CFR message indicating the reason for failure.

4. Separate REPT RING TRANSPORT ERR messages identifying each error that was received by the 3B21D in response to the fault that gave rise to the recovery attempt.

Notice that REPT RING TRANSPORT ERR messages ordinarily appear on the MCRT and ROP following the REPT RING CFR messages to which they apply. Yet, because each of these message types is stamped in milliseconds by the real-time clock, it is possible to confirm their relations. The real-time stamp on a REPT RING CFR message indicates the completion time of the attempt being reported. The real-time stamp on a REPT RING TRANSPORT ERR message indicates the time the report arrived at the 3B21D from a ring node. Remembering that, after receiving a ring transport error report that may lead to node isolation, the 3B21D observes a listening period of 100 milliseconds before analyzing its reports and acting upon them, technicians can reconstruct system events.
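One way to reconstruct the sequence by hand is to reorder the printed messages by their real-time stamps. The Python sketch below assumes the stamps have already been transcribed as millisecond values; the tuple layout, the sample values, and the function names are invented for the illustration and are not an IMS output format.

```python
LISTENING_PERIOD_MS = 100  # listening period observed by the 3B21D


def in_event_order(messages: list[tuple[int, str]]) -> list[str]:
    """Sort (timestamp_ms, text) pairs from print order into event order."""
    return [text for _, text in sorted(messages, key=lambda m: m[0])]


def earliest_analysis_time(error_arrivals_ms: list[int]) -> int:
    """Analysis begins no sooner than 100 ms after the first error report."""
    return min(error_arrivals_ms) + LISTENING_PERIOD_MS


# Messages in the order printed: the CFR message precedes its error reports.
printed = [
    (5200, "REPT RING CFR"),
    (5000, "REPT RING TRANSPORT ERR (first report)"),
    (5030, "REPT RING TRANSPORT ERR (second report)"),
]
print(in_event_order(printed))
print(earliest_analysis_time([5000, 5030]))
```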

One exception exists to the rule that the 3B21D queues error messages until the completion of the recovery attempt to which they give rise. If the 3B21D receives a loss-of-token report, then waits the 100-millisecond listening period without receiving another error report, it immediately reports REPT RING TRANSPORT ERR/UNEXPLAINED LOSS OF TOKEN to the MCRT and ROP before jumping to a level-3 recovery attempt. Therefore, in this single case the 3B21D reports events in the order of their occurrence. There is no time stamp on messages announcing loss of token.

Though quarantining a node reconfigures the ring, it is not accomplished by the ring config module and, therefore, produces no REPT RING CFR output message. Instead, technicians learn that a node has become quarantined from RMV RPCN or RMV IUN output messages and from indicators on display pages.

Also, when a node experiences a fault that leads to quarantine, it attempts to send a message to the 3B21D identifying the type of error that occurred. Currently EAR does not use the message for fault analysis. It does, however, report the error on the MCRT and ROP in the second line of a REPT ERROR output message. In the event of an intractable problem, technicians should record and report this line. The line will indicate, among other matters, whether the error was soft (requiring no system action), firm (requiring a restart), or hard (requiring a repump of the node software).

ARR or Deferrable Node Recovery

Fundamental to the recovery strategy of automatic ring maintenance is the complementary action of ARR to EAR software. When EAR reconfigures a suspected fault out of the ring, either by quarantining or isolating a node, ARR assumes its responsibility of either returning the node to service or, if it determines that the node should not be returned to service, of directing technicians to repair or replace its faulty equipment and then returning it to service manually. ARR determines not to return a node to service when it has failed diagnostics or when it has become a chronic problem. After either of these events, ARR immediately surrenders control of the node to technicians whose responsibility it becomes to perform maintenance on it manually.

Overview of ARR Treatment of Out-of-Service Nodes

ARR can return nodes to service by restarting or restoring them. The two methods are achieved under different circumstances and according to different rules.

Node restarts can occur only when a node has quarantined itself. Upon detecting an error in its node processor or in an auxiliary component, a node in the active ring attempts to quarantine itself. It then, in response to most error-types, runs an internal audit to test the integrity of its node-processor operational code and, if the audit passes, attempts, with the assistance of the 3B21D, to restart itself. (If the node is an extended IUN, it will audit the operational code of the attached processor as well.) A restart is done without downloading code. Rather, the node finds a safe place in its current code and places it in execution. A successful restart results in the node being returned to service almost immediately.6 On the other hand, if a node with a faulty node processor or auxiliary component is unable to detect internal faults, unable to quarantine itself, unable to pass an internal audit, or unable to restart after one attempt, the 3B21D will detect its disabled condition and, if it is not already quarantined, quarantine it. Then ARR in the 3B21D will restore the node to service.

6 In response to a few error-types, however, a self-quarantined node does not attempt to restart itself but waits for the 3B21D to detect its state and to return it to service by restoring it in the manner described below.

ARR restores a node by downloading it with new operational code and placing the code into execution. Nodes may be restored either unconditionally without being previously diagnosed or conditionally by having their return to service depend on their passing all automatically-run diagnostic tests.

Maintenance States

ARR is driven to do its work by system indicators called IMS maintenance states. Maintenance states identify the operational mode of the ring and the operational mode, functionality, and condition of each ring node. They are determined and announced by programs in the 3B21D, mainly by EAR software.

In addition to driving ARR to do its work, maintenance states serve as a primary source of system information for IMS users and for technicians who should always consult them before taking any manual action. Technicians may learn of current maintenance states from the IMS 1106 display page or from the OP:RING command. They should keep in mind that because maintenance states represent the central processor's knowledge of a distributed system, this knowledge under certain conditions may be temporarily incorrect. A node processor, for example, is allowed to quarantine itself if it detects certain irregularities in its software, but the 3B21D may not learn of this change of state until it has conducted a node audit or received a source match error.

The following are the different classes of maintenance states:

• Ring state

• Node major state

• Node minor state: ring position

• Node minor state: ring interface

• Node minor state: node processor

• Node minor state: maintenance mode.

These states are explained below.

Ring States

The ring state identifies the current operational mode of the ring. The following states are possible:

• Ring Normal - This state represents the two-ring configuration, with one ring serving as the active path that chiefly transmits user messages and the other serving as a standby path that may also transmit administrative and maintenance messages. A normal ring contains no isolated segment, but it may contain quarantined nodes.

• Ring Isolated - In this state the ring contains an isolated segment. The nodes that bound the isolation are active and are identified as the beginning-of-isolation (BISO) and the end-of-isolation (EISO) nodes. Any node, including an RPCN, may act as a BISO or an EISO node. The ring cannot contain more than one isolated segment.

• Ring Restoring - When Ring Restoring appears as a transitory state, it indicates a condition that occurs very briefly during ring reconfiguration. When Ring Restoring appears as an extended state, it indicates the responses of automatic maintenance to a failed BISO or EISO node. When a BISO or EISO node experiences a node-processor failure, critical node recovery (CNR) software first attempts to conditionally restore it. (Restoral software knows to run only those diagnostic phases that do not require isolation.) If the conditional restoral fails, ring config extends the isolated segment to include the faulty node. Attending to a failed BISO or EISO node is the highest priority activity of ARR/CNR.

• Ring Configuring - In this state the ring is initializing, restarting, being reconfigured to isolate or unisolate one or more nodes, or engaged in one or more levels of EAR escalative recovery action.

• Ring Down - Chief among conditions that cause the ring to go down are when the 3B21D cannot communicate with it through any RPCN or when it is so fragmented by faults that EAR cannot define an active segment long enough to satisfy the criterion for minimum length. The first condition is most likely to occur when, in a two-RPCN environment, one RPCN has been manually taken out of service, after which the other experiences a failure in its 3B interface or duplex dual serial bus selector. During the time the ring is down, it is possible in some applications of IMS that all IUNs will continue to receive and transmit messages on the ring.7 For a fuller discussion of this matter, see the section ``Responding to Ring Down'' in this chapter.

Node Major States

The node major state identifies the current operational mode of each node. The

following states are possible:

7 Technicians probably have no way of confirming this to be the case.


• ACT - Active. An active node is on-line and capable, unless the ring is silenced or configuring, of performing all required functions. An active node is neither quarantined nor isolated. In this document, the expression ``to return a node to service'' means to give it ACT status.

• OOS - Out of service. An out-of-service node is unavailable for certain uses. The uses depend upon whether the node is quarantined or isolated. If the ring position (see below) of an out-of-service node is NORM, then the node is quarantined and can propagate messages on the ring, although it cannot read, write, or otherwise process messages. If the ring position of an out-of-service node is ISOL, the node is entirely excluded from the active ring. Nodes in either OOS state are ordinarily able to receive and transmit only maintenance information and instructions.

• STBY - Standby. This designation is used for RPCNs only. It indicates that a healthy RPCN is prevented from doing its work by the circumstance that the ring is down or configuring. It also appears as a transitional condition when an RPCN is being grown and during system-wide initializations.

• INIT - Initializing. The attached processor of an extended node is being restarted or restored. The INIT state occurs as the second stage of restarting or restoring extended nodes. In the first stage, the node processor is restarted or, in the case of restorals, downloaded with operational code and set to executing. In the second or INIT stage, the attached processor is treated similarly. For DLNs the second stage also includes tests of the DMA channel.

• OFL - Off-line. The node is quarantined out-of-service preliminary to being assigned a role in the active ring. Nodes should not be allowed to remain long in this condition, because their quarantined state prevents their node processor from fulfilling its important and unassignable role of error detection and reporting.

• GROW - Grow. The node is physically being added to or removed from the ring. During growth or degrowth, the node must always be isolated.

• UNEQ - Unequipped. Either the unequipped node has no hardware, or ring connections physically bypass it. Still, a place holder for the node exists in IMS software.

Node Minor States: Ring Position

The ring position of each node indicates its function within the current structure of

the ring. The following are the four possible ring positions.

• NORM - Normal. The node is included in the active ring and is neither a BISO nor an EISO node. A node in the NORM state may be quarantined; if it is quarantined, its node major state will be OOS or OFL.

• BISO - The node is included in the active segment of an isolated ring and bounds the beginning of the isolated segment.

• EISO - The node is included in the active segment of an isolated ring and bounds the ending of the isolated segment.

• ISOL - Isolated. The node is contained in the isolated segment of an isolated ring. Its node major state will be OOS or OFL.

Node Minor States: Ring Interface

This state characterizes for each node the current condition of its ring interface.

• USBL - Usable. This is the default state. In other words, IMS regards ring-interface hardware as usable unless it has received an error message or a diagnostic result, or has detected a ring condition, indicating otherwise.

• QUSBL - Quarantine-usable, that is, usable by the ring to propagate data but not usable by the node processor, which is insulated from the ring as in the quarantine (OOS NORM) state. IMS sets ring-interface hardware of any node to QUSBL when diagnostics find or suspect a fault in the ring interface that does not prevent it from propagating messages on the ring. A node that fails only diagnostic phase 10, for example, would be set to QUSBL. When, under these circumstances, a ring interface is set to QUSBL, IMS unisolates the node if possible, quarantines it, and changes its maintenance mode (see below) to manual. Before diagnostics or other maintenance functions are performed on the ring interface of the node, however, the node must be isolated.

  IMS sets the ring interface of an IUN to QUSBL and the node processor to FLTY when, during a level-4 initialization, the node fails a communication test of its ability to receive downloaded code. If this occurs, the ring will return to service with the node in question quarantined and in the automatic maintenance mode.

  IMS sets the ring interface of a node to QUSBL as a way of unisolating a node that is suspected of being faulty but that, as a member of an isolated segment, has passed phases 1 and 2 diagnostics without being subjected to further diagnostic phases.

• FLTY - Faulty. The 3B21D has received information indicating that the ring-interface hardware is faulty. Thus the node is, or is about to be, isolated.

• UNTSTD - Untested. The minor states of nodes are maintained in core memory only, not on disk or in ECD. Therefore, during a level-3 or level-4 initialization, the system loses knowledge of the ring-interface states of out-of-service nodes and must retest them. The testing is done during initialization, during which time their ring-interface states will briefly be UNTSTD.


Node Minor States: Node Processor

This state characterizes for each node the condition of the node processor and/or of the auxiliary components.

• USBL - Usable. This is the default state. In other words, IMS regards node processors and auxiliary components as usable unless it has received an error message or a diagnostic result, or has detected a ring condition, indicating otherwise.

• FLTY - Faulty. The node processor and/or one or more auxiliary components is known or suspected to be faulty. The 3B21D sets the node-processor state to FLTY when it receives error messages implicating the node processor or an auxiliary component. It also sets the state to FLTY when it learns that a node has quarantined itself. Nodes ordinarily quarantine themselves when they detect a problem in their node processors or in an auxiliary component. Thus the node-processor FLTY state does not necessarily mean that a problem is in the node processor. It could be in the node processor or in any of the auxiliary components of the node.

• UNTSTD - Untested. Node minor states are maintained in current memory only, not on disk or in ECD. Therefore, during a level-3 or level-4 ring initialization, the system loses knowledge of the node-processor states of out-of-service nodes and must retest them. The testing is done during initialization, during which time their node-processor states will briefly be untested.8

Node Minor States: Maintenance Mode

The maintenance mode of a node is always either automatic or manual.

• AUTO - Automatic. In this mode a node is under control of IMS software. Nodes in the ACT state are always under automatic control. Nodes in the OOS state are under automatic control as long as ARR software is acting upon them.

• MAN - Manual. This mode indicates that an out-of-service node is under the control of technicians. Control will change to manual because of the following:

8 If, during ring initialization, a fault occurs requiring an isolation that includes innocent victim nodes, the node-processor hardware of the innocent victims might not have been tested before the isolation occurred and could not be tested during the isolation. In this case, the innocent victims would be quarantined, their ring-interface states set to usable, and their node-processor states set to untested. Then, when the isolation is dissolved, ARR, assuming that UNTSTD equals USBL, returns the nodes to service in accordance with its standard algorithm, which is explained below.
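The maintenance-state classes described in this section can be pictured as one small record per node. The following Python sketch is illustrative only: the class and method names are invented here, and the two helper predicates merely restate the definitions given in the text (a quarantined node has ring position NORM with major state OOS or OFL; an isolated node has ring position ISOL).

```python
from dataclasses import dataclass
from enum import Enum

# State names are taken from the text; the Python types are assumptions.
MajorState = Enum("MajorState", "ACT OOS STBY INIT OFL GROW UNEQ")
RingPosition = Enum("RingPosition", "NORM BISO EISO ISOL")
RingInterface = Enum("RingInterface", "USBL QUSBL FLTY UNTSTD")
NodeProcessor = Enum("NodeProcessor", "USBL FLTY UNTSTD")
MaintMode = Enum("MaintMode", "AUTO MAN")


@dataclass(frozen=True)
class NodeStatus:
    major: MajorState
    position: RingPosition
    ring_interface: RingInterface
    node_processor: NodeProcessor
    mode: MaintMode

    def is_quarantined(self):
        # Quarantined: still in the active ring (NORM) but OOS or OFL.
        return (self.position is RingPosition.NORM
                and self.major in (MajorState.OOS, MajorState.OFL))

    def is_isolated(self):
        # Isolated: contained in the isolated segment.
        return self.position is RingPosition.ISOL

    def in_service(self):
        # "To return a node to service" means to give it ACT status.
        return self.major is MajorState.ACT


node = NodeStatus(MajorState.OOS, RingPosition.NORM, RingInterface.USBL,
                  NodeProcessor.FLTY, MaintMode.AUTO)
print(node.is_quarantined(), node.is_isolated(), node.in_service())
```

Keeping the five states as separate fields mirrors the way the states are reported, since the same major state (OOS) means different things depending on the ring position.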


Three ARR Rules

In attempting to restore out-of-service nodes, ARR observes the following three rules:

• Restoral priorities rule

• One-restoral-at-a-time rule

• Fourth-time rule

Procedure 3-1. Restoral Priorities Rule

If several nodes are simultaneously out-of-service and still under automatic control, ARR acts to restore them in the order shown below:

1. Inactive BISO and EISO nodes

2. Nodes whose ring-interface state is FLTY (isolated). (In 3.4 and later generics, application-nominated critical nodes with faulty ring interfaces are restored before other nodes with faulty ring interfaces.)

3. Innocent victim RPCNs (isolated)

4. Application-nominated critical nodes with high priority (quarantined)

5. Other RPCNs (quarantined)

6. Application-nominated critical nodes with low priority (quarantined)

7. Innocent victim IUNs (isolated)

8. Other IUNs (quarantined)

Table 3-1. Node Problems Mapped to Maintenance States and EAR Actions (Page 2 of 2)

NODE PROBLEM                      EAR ACTION              NODE STATE  RING POSITION  RI STATE  NP STATE  MAINT. MODE
Faulty NP or auxiliary component  Isolate the node        OOS         ISOL           FLTY      FLTY      AUTO
  and faulty RI
Needed to begin an isolation      Configure as BISO node  ACT         BISO           USBL      USBL      AUTO
Needed to end an isolation        Configure as EISO node  ACT         EISO           USBL      USBL      AUTO
Untested NP                       Quarantine the node     OOS         NORM           USBL      UNTSTD    AUTO

Nodes awaiting ARR restoral efforts may be contained in the active ring segment; or they may be contained in, or as BISO and EISO nodes associated with, the isolated segment. Because ARR's highest priority is to dissolve isolations, it deals first with nodes contained in or associated with an isolated segment. First, it attempts to return to service any node that has become inactive after being designated a BISO or EISO node.9 Next, it attempts to restore nodes that, by virtue of having faulty ring interfaces, are responsible for the isolation. Then, it restores healthy nodes that were victims of the isolation. Finally, having dissolved the isolation by restoring all isolated nodes, ARR turns to restore any quarantined nodes. The restoral priority list does not apply to node restarts, however, which occur independent of, and may occur in parallel with, node restorals.
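The restoral priorities rule amounts to sorting the pending nodes by the position of their category in the eight-entry list. A minimal Python sketch follows; the category labels and node names are hypothetical, chosen only for this illustration, and the sketch assumes each pending node has already been assigned to one of the eight categories.

```python
# Categories in the order of the restoral priority list (labels invented here).
RESTORAL_PRIORITY = [
    "inactive_biso_eiso",      # 1. inactive BISO and EISO nodes
    "faulty_ring_interface",   # 2. nodes whose ring-interface state is FLTY
    "innocent_victim_rpcn",    # 3. innocent victim RPCNs
    "critical_high",           # 4. application-nominated critical, high priority
    "other_rpcn",              # 5. other RPCNs
    "critical_low",            # 6. application-nominated critical, low priority
    "innocent_victim_iun",     # 7. innocent victim IUNs
    "other_iun",               # 8. other IUNs
]
RANK = {name: position for position, name in enumerate(RESTORAL_PRIORITY, start=1)}


def restoral_order(pending):
    """Sort (node_name, category) pairs by restoral priority.

    sorted() is stable, so nodes in the same category keep their input order.
    """
    return sorted(pending, key=lambda item: RANK[item[1]])


pending = [
    ("node-c", "other_iun"),
    ("node-a", "other_rpcn"),
    ("node-b", "faulty_ring_interface"),
    ("node-d", "inactive_biso_eiso"),
]
ordered = restoral_order(pending)
print([name for name, _ in ordered])
```

Node restarts are deliberately left out of the sketch, since the priority list does not apply to them.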

The One-Restoral-at-a-Time Rule

When ARR undertakes to restore a node, whether conditionally or unconditionally, it cannot begin to restore another until any current restoral effort is completed or terminated. To conditionally restore a node, ARR must request that the RTR Maintenance Input Request Administrator (MIRA) do the job.10 To unconditionally restore a node, ARR does not use MIRA but performs the work itself.

Application-Nominated Critical Nodes. The rule that ARR cannot begin to restore a node until its previous restoral attempt completes has one exception. When an application-nominated critical node requires restoral, ARR aborts an ongoing restoral attempt in favor of the critical node, provided that the critical node is higher on the restoral priority list than the node currently being restored. Application-nominated critical nodes occupy the fourth and sixth positions on the restoral priority list.

The Fourth-Time Rule

To prevent a transient problem from repeatedly disrupting the ring, ARR keeps a leaky-bucket count of the number of times it has restored a node to service. If, within a 60-minute interval, ARR has restored a node to service three times and is then called upon to restore it a fourth, it refuses to do so. Instead, it leaves it

9 These are termed IMS critical nodes. Their recovery efforts go by the special title critical node recovery (CNR), a title that may appear on IMS display pages.

10 Technicians may learn of the status of IMS requests at MIRA from the RTR OP:DMQ command, as well as from IMS 1105 and 1106 display pages, which are discussed in this chapter.


Table 3-2. ARR Responses to Maintenance-States (Page 2 of 3)

NODE STATE  POSITION  RI STATE  NP STATE  CIRCUMSTANCE                  ARR ACTION 1               ARR ACTION 2
OOS         NORM      USBL      FLTY      1st or 2nd time in hour       pump & return to service
                                          3rd time in hour              isolate & diagnose (pass)  pump & return to service
                                                                        isolate & diagnose (fail)  manual maintenance
                                          4th time in hour              manual maintenance
OOS         NORM      USBL      UNTSTD    n/a                           pump & return to service
OOS         NORM      QUSBL     FLTY      n/a                           isolate & diagnose (pass)  pump & return to service
                                                                        isolate & diagnose (fail)  manual maintenance
OOS         NORM      USBL      FLTY      extended node                 isolate & diagnose (pass)  pump & return to service
                                                                        isolate & diagnose (fail)  manual maintenance
OOS         ISOL      FLTY      USBL      1st, 2nd or 3rd time in hour  isolate & diagnose (pass)  pump & return to service
                                                                        isolate & diagnose (fail)  manual maintenance
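The rows of Table 3-2 reproduced on this page can be read as a lookup from maintenance states to ARR's first action. The Python sketch below encodes only those rows and is not a statement of how ARR is implemented: the function name and parameters are invented, and giving the extended-node row precedence over the hourly count is an assumption, since the table does not say how the two circumstances combine.

```python
def first_arr_action(position, ri_state, np_state,
                     restorals_this_hour=0, extended=False):
    """Return ARR's first action for an OOS node, or None if no row applies.

    Covers only the Page 2 rows of Table 3-2. `restorals_this_hour` is the
    number of restorals already performed in the current hour.
    """
    key = (position, ri_state, np_state)
    attempt = restorals_this_hour + 1
    if key == ("NORM", "USBL", "FLTY"):
        if extended:                      # assumed to override the count
            return "isolate & diagnose"
        if attempt <= 2:
            return "pump & return to service"
        if attempt == 3:
            return "isolate & diagnose"
        return "manual maintenance"
    if key == ("NORM", "USBL", "UNTSTD"):
        return "pump & return to service"
    if key == ("NORM", "QUSBL", "FLTY"):
        return "isolate & diagnose"
    if key == ("ISOL", "FLTY", "USBL") and attempt <= 3:
        return "isolate & diagnose"
    return None  # state combination not covered by the rows encoded here


print(first_arr_action("NORM", "USBL", "FLTY", restorals_this_hour=2))
```

Whenever the first action is a diagnosis, the second action follows the pass or fail result shown in the table: pump and return to service on a pass, manual maintenance on a fail.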


Table 3-2. ARR Responses to Maintenance-States (Page 3 of 3)

NODE STATE  POSITION  RI STATE  NP STATE  CIRCUMSTANCE                  ARR ACTION 1               ARR ACTION 2
OOS         ISOL      FLTY      FLTY      1st, 2nd or 3rd time in hour  isolate & diagnose (pass)  pump & return to service
                                                                        isolate & diagnose (fail)  manual maintenance
OOS         ISOL      USBL      FLTY      n/a                           quarantine                 manual maintenance
OOS         ISOL      USBL      USBL      isolation ends                pump & return to service
ACT         BISO      USBL      USBL      isolation ends                chg. BISO to NORM
ACT         EISO      USBL      USBL      isolation ends                chg. EISO to NORM

ARR Recovery Intervals and Output Messages

ARR activities are reflected in the status information provided by the IMS 1105 and 1106 display pages, which are described in the next chapter of this document. In addition, results of ARR actions are reported by the following output messages.

Table 3-3. Output Messages that Report ARR Actions

ARR ACTION OR RESULT                                        OUTPUT MESSAGE
Request to quarantine an RPCN                               RMV RPCN...
Request to quarantine an IUN                                RMV IUN...
Request to diagnose an RPCN                                 DGN RPCN...
Request to diagnose an IUN                                  DGN IUN...
Request to diagnose and restore an IUN to service           RST IUN...
Request to diagnose and restore an RPCN to service          RST RPCN...
Abortion of a diagnostics request because of an error       DGN:AUDIT:RING...
Outcome of a request to reconfigure the ring                REPT RING CFR
Abortion of an IUN pump                                     REPT IUN PUMP...
Failure of an IUN restore                                   REPT IUN RST...
Failure of RPCN initialization during a restore or restart  REPT RPC INIT...
Start of an ARR recovery attempt                            REPT ARR AUTORST a b FOR c STARTED
Success of an ARR recovery attempt                          REPT ARR AUTORST a b FOR c SUCCEEDED
Failure of a diagnostic phase                               REPT ARR AUTORST a b FOR c FAILED
Abortion of a diagnostic request                            REPT ARR AUTORST a b FOR c ABORTED
Violation of the fourth-time rule                           REPT ARR AUTORST RECOVERY THRESHOLD EXCEEDED FOR c
Time out of a restoral request                              REPT ARR AUTORST TIMEOUT AWAITING MIRA FOR c
Inhibition of a restoral request                            REPT ARR AUTORST a b FOR c STOPPED <INHIBITED>

The time taken by ARR to return a node to service varies considerably, depending on such factors as the type of restoral and the number of jobs waiting in MIRA's queue. An unconditional restoral usually takes 30 to 90 seconds. A full and successful diagnosis of a basic IUN or RPCN may take 5 to 8 minutes, while a failing diagnosis usually takes somewhat longer. Diagnosis of an extended node takes longer still, perhaps as much as 15 minutes.
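The REPT ARR AUTORST message shapes listed in Table 3-3 are regular enough to be sorted mechanically, for example when scanning a captured ROP log. The Python sketch below is an illustration under assumptions: it treats `a b` and `c` as opaque fields exactly as they appear in the table, assumes each message arrives as a single line, and uses invented result labels (`FOURTH_TIME_RULE`, `MIRA_TIMEOUT`). Real ROP output may wrap or decorate these messages differently.

```python
import re

# Patterns follow the message shapes in Table 3-3; field contents are opaque.
_THRESHOLD = re.compile(r"^REPT ARR AUTORST RECOVERY THRESHOLD EXCEEDED FOR (?P<c>.+)$")
_TIMEOUT = re.compile(r"^REPT ARR AUTORST TIMEOUT AWAITING MIRA FOR (?P<c>.+)$")
_OUTCOME = re.compile(
    r"^REPT ARR AUTORST (?P<ab>.+?) FOR (?P<c>.+?) "
    r"(?P<outcome>STARTED|SUCCEEDED|FAILED|ABORTED|STOPPED)\b"
)


def classify_arr_message(line):
    """Return (label, c_field) for a REPT ARR AUTORST line, else None."""
    text = " ".join(line.split())          # collapse runs of whitespace
    match = _THRESHOLD.match(text)
    if match:
        return ("FOURTH_TIME_RULE", match.group("c"))
    match = _TIMEOUT.match(text)
    if match:
        return ("MIRA_TIMEOUT", match.group("c"))
    match = _OUTCOME.match(text)
    if match:
        return (match.group("outcome"), match.group("c"))
    return None


print(classify_arr_message("REPT ARR AUTORST a b FOR c SUCCEEDED"))
print(classify_arr_message("REPT ARR AUTORST RECOVERY THRESHOLD EXCEEDED FOR c"))
```

The two fixed-text messages are tested first so that the more general outcome pattern never has to cope with them.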


Manual Ring Maintenance

This chapter explains tools and procedures used in manual ring maintenance and offers suggestions to technicians for solving hard problems and avoiding easy mistakes.

Ring Maintenance Interfaces

Technicians who maintain the ring are supported in their responsibilities by various maintenance interfaces. The maintenance CRT terminal (MCRT) provides an interactive interface that outputs IMS and other system messages and status information while accepting as inputs IMS and other system commands. IMS input and output messages will be recorded on the maintenance read only printer (ROP), if it is turned on. In addition, various audible and visual alarms act to alert technicians to important IMS events. These maintenance interfaces as they pertain to IMS are explained below.

Alarms

The following alarms indicate trouble that may affect IMS equipment:

Critical Alarms

A critical condition or fault in or associated with the IMS ring will be indicated by an asterisk C (*C) preceding the ROP output message that identifies the problem. It may also be indicated by an audible alarm and a red CRITICAL indicator on each MCRT display-page header.

Major Alarms

A major condition or fault in the IMS ring is indicated by two asterisks (**) preceding the ROP output message that identifies the problem. It may also be indicated by the following:

• An audible alarm

• A red MAJOR indicator on each MCRT display-page header, and

• A red lamp on the aisle containing the frame/cabinet where the fault or failure occurred.

See the ``Special IMS Indicators'' section in this chapter for descriptions of other indicators that may appear with a major alarm.


If a major alarm is caused by a power failure, the POWER indicator on each MCRT display-page header will show red, and display page 1111 will identify the type and location of the problem. If the problem is a failed power converter circuit pack in an IMS frame/cabinet, the lamp at the aisle containing the disabled frame/cabinet will show red, and inside the frame/cabinet the power alarm light at the top-left will show red also.

Minor Alarms

A minor condition or fault in the IMS ring is indicated by one asterisk (*) preceding the ROP output message that identifies the problem. It may also be indicated by the following:

• An audible alarm

• A red MINOR indicator on each MCRT display-page header, and

• A yellow lamp on the aisle containing the frame/cabinet where the fault or failure occurred.

See ``Special IMS Indicators'' below for descriptions of other indicators that may appear with a minor alarm.

If a minor alarm is caused by a power failure, the POWER indicator on each MCRT display-page header will show red, and display page 1111 will identify the type and location of the problem. If the problem is a single failed fan in an IMS frame/cabinet, the lamp at the aisle containing the disabled frame/cabinet will show yellow, and inside the frame/cabinet the power alarm light at the top-left will show red.
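The alarm markers that precede ROP output messages (*C for critical, ** for major, * for minor) lend themselves to a simple prefix check when a captured log is reviewed. The Python sketch below is a minimal illustration; it assumes the marker is the first non-blank text on the line, which the text does not state explicitly, and the sample lines pair message names with severities purely for demonstration.

```python
def alarm_level(rop_line):
    """Classify a ROP line by its leading alarm marker.

    Assumes the marker is the first non-blank text on the line.
    """
    text = rop_line.lstrip()
    if text.startswith("*C"):
        return "critical"
    if text.startswith("**"):
        return "major"
    if text.startswith("*"):
        return "minor"
    return None  # no alarm marker


# Severities below are made up for the example, not taken from Table 3-4.
for sample in ("*C REPT RING INIT", "** REPT IUN", "* AUD NODEST", "REPT RING CFR"):
    print(sample, "->", alarm_level(sample))
```

The critical marker is tested before the major one, and the major before the minor, because each shorter marker is a prefix of a longer one.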

Special IMS Indicators

A ring-quarantine (RQ) LED is located on IRN circuit packs. When the RQ LED shows red, it indicates that the node containing the circuit pack is quarantined from the ring.

A no-token (NT) LED is located on IRN circuit packs. The chief purpose of the NT LED is to indicate, by lighting red, when the node is isolated. The NT LED mechanism works by detecting the absence of token messages. The ring interfaces in IRNs, however, cannot make this distinction; so, during periods when diagnostics are occurring, their NT LEDs will blink off and on as test messages pass. At other times, however, IRN NT LEDs on isolated nodes will show constant red. In addition, when all NT LEDs, of whatever type, in the ring are lighted, the ring is down.

Each circuit pack in the ring application processors (RAPs) of CDN-1 is equipped with an LED that indicates when the pack has failed a diagnostic phase. Some of these LEDs also turn on when the RAP is initializing and then turn off when initialization tests confirm that the firmware within the pack is executing. The nature and uses of these LEDs are explained in the section ``Ring Application Processor Critical Maintenance Procedure.''

The application-processor circuit pack in a direct link node (DLN) is equipped with green, red, and yellow LEDs. The green stays on during normal operation and goes off when the node is taken out-of-service, when a hard panic occurs in the node processor, or when diagnostic code begins to be downloaded, whichever occurs first. The red and yellow LEDs come into play as either diagnostic or operational code is downloaded. Diagnostic phase 41 begins with a firmware test. During the test the red and yellow LEDs come on and stay on permanently if the test fails. If the test passes, the red goes off briefly, then joins the yellow back on again as the diagnostic proper begins. If the diagnostic fails, the yellow goes off and the red stays on. If the diagnostic passes, the red goes off and the yellow stays on until the node processor receives the diagnostic results, at which time it goes off. Then red and yellow come on and go off again as operational code is downloaded, and the green comes on as the attached processor is placed in execution. If technicians wish to consult support about the performance of a DLN, they might first observe the behavior of these LEDs so they can report it.

Output messages on the ROP are preceded, when appropriate, by an M or an A, indicating that the action described in the message is the result of a manual or an automatic IMS request. Table 3-4 on page 3-27 shows the IMS output messages accompanied by the types of alarms.

Table 3-4. Alarms Associated with IMS Output Messages (Page 1 of 2)

MESSAGE / SEVERITY (CRT MAJ MIN)

REPT DB INIT X
REPT ERROR X X X
REPT IMSDRV AUD X
REPT IMSDRV FLT X
REPT IMSDRV INIT X X
REPT IUN X
REPT MSDC FLT X
REPT OP_RTM FLT X
REPT PSDO_UMS>P FLT X
REPT RING GROWTH X
REPT RING INIT X X


Other IMS output messages are not accompanied by audible or visual alarms.

Display Pages

IMS provides technicians with two MCRT display pages, page 1105, the Ring Status Summary Page, and page 1106, the Ring Node Status Page. These pages are similar in appearance and function to RTR display pages, and the procedure used to access them is also the same. The first three lines of the IMS pages, consisting of the standard header information that appears on all RTR display pages, are omitted from the illustrations that follow. For more information on Status Display Page(s), see 410-610-160, The FLEXENT™/AUTOPLEX® Wireless Networks, Executive Cellular Processor (ECP) Operations, Administration, and Maintenance Guide.

To access a particular display page, perform the following actions in the order indicated.

1. Press the NORM/DISP key.

2. Place the MCRT in the command mode by pressing the CMD/MSG key.

3. Type and enter 1105 or 1106 on the numeric key pad.

During ring initialization and configuration, indicators or data shown on display pages may be invalid or out of date; and during disk independent operation, the display page process is terminated.

The Ring Status Summary Page

The 1105 display page provides status information about the entire IMS ring.

Figure 3-1 is typical of an 1105 page for small IMS offices.

Table 3-4. Alarms Associated with IMS Output Messages (Page 2 of 2)

MESSAGE / SEVERITY (CRT MAJ MIN)

REPT RING TRANSPORT ERR X
REPT TDTP FLT X
AUD CNC X
AUD NODEST X


    CMD>                         -- 1105 RING STATUS SUMMARY --

    [Ring Major State]  [Ring Error Threshold State]        CMD Function
    [ARR Restore; System Indicator; IMSRTS.P indicator]     400 OP Ring Detailed
    [ARR Restart]       [ACNR Restore or Restart]

    00AAAOAAAiigAOO...  01.AAAAOOAA...AAAA  02.AAAAAAAAA...AAA
    32AAAAAAAAOOOAAA..  33.AAAAAAAAAAAAAAA  34.AOOOOOAAAAAAAAA

Figure 3-1. A 1105 Display Page

The 1105 page, as exemplified in the above figure, offers the following information and capabilities: The first line contains, on the left, the CMD> prompt for command entries and, on the right, the page title. To enter display commands, move the cursor to the CMD> prompt by typing the CMD/MSG key, then enter the command.

The next three lines identify, in square brackets, locations on the page where the types of information, shown within the square brackets, will appear, when appropriate. The brackets themselves will not appear on display pages.

• [Ring Major State] appears at the location where the current ring state will be displayed. One of the following states should always be present:

    RING STATE ACTIVE
    RING STAT ISOLATED SEGMENT
    RING STAT CONFIGURING
    RING STAT DOWN
    RING STAT RESTORE

• [Ring Error Threshold State] is the location where a message will appear when the Ring Error Threshold has been exceeded. The threshold is set by the user to indicate the number of faults per interval of time to be permitted before the IMS practice of responding initially to ring-related faults with EAR level-0 (restarting the ring) is discontinued and replaced by EAR level-1 (isolating the fault) or, in response to unexplained loss of token, by EAR level-3 (ring continuity testing). After the threshold is exceeded, an error-free period of time the length of the threshold interval is required before IMS returns to its normal practice concerning ring restarts. When IMS returns to its normal practice, the Ring Error Threshold Exceeded tag will disappear from the 1105 page, and the location will be blank.

• The information CMD Function/400 OP Ring Detailed appears permanently on the 1105 page to remind technicians that the page also allows entry, at the CMD> prompt, of the 400 command, which produces the same output as the input message OP:RING;DETD.

• [ARR Restore; System Indicator; imsrts.p Indicator] appears at the location where a, b, or c, below, will appear:

a. A node that ARR is currently attempting to restore, conditionally or unconditionally. The identification will read ARR followed by the method of restoral (UCL for unconditional, COND for conditional) followed by the node name in the form NODEa b. If ARR is attempting to restore an EISO or BISO node (see "Three ARR Rules" above), CNR will appear in place of ARR.

b. One of the following system states of IMS:

- IMS FPI PROLOGUE (appears during the initial stage of an FPI initialization)
- IMS SYS BOOT (appears during the initial stage of level-3 or -4 BOOT initialization)
- IMS LVL3 INIT (appears during subsequent stages of a level-3 initialization)
- IMS LVL4 INIT (appears during subsequent stages of a level-4 initialization)
- IMS SYS CRIT SEQ CMPL (appears at the conclusion of a level-3 or -4 FPI or BOOT initialization)
- IMS SYS ABORT (appears prior to a level-3 or level-4 BOOT initialization)
- IMSRTS.P CREATED (see below)

c. One of the following states of the imsrts.p process, which creates the IMS display pages:

- IMSRTS.P DIED
- IMSRTS.P CREATED

If ARR is not currently attempting to restore a node and none of the system or IMSRTS.P conditions exists, the location will be blank.


• [ARR Restart] appears at the location where any node (other than an application-nominated critical node) that ARR is currently attempting to restart will be identified. Node restarts that are initiated locally by the node processor are neither recognized nor recorded by this indicator.

• [ACNR Restore or Restart] appears at the location where any application-nominated critical node (see "Three ARR Rules" above) that ARR is currently attempting to restore or restart will be identified.

• Because one ARR restart and one ACNR restart may occur in parallel, and because one or both restarts may occur in parallel with a single restore, it is possible to have all three node-activity indicators lit simultaneously. It is not, however, possible to have two restorals occurring simultaneously, since IMS can restore only one node at a time (see "Three ARR Rules" above).

The next section of the display page, beginning in the above example with the fifth line, identifies all frames/cabinets in the IMS system, each node within each frame/cabinet, and the major state of each node. The nodes that occupy a frame/cabinet are called a group. The example shows six groups identified by their group numbers as 00, 01, 02, 32, 33, and 34. To the right of the group numbers are characters representing the sixteen nodes or node positions within each group. The first character represents the RPCN, and the next fifteen characters represent IUNs. In the IMS numbering scheme, nodes are identified by the formula RPCNa b or IUNa b, where a is the two-digit group number and b is a number between 00 and 15 that corresponds to the sequential location of the node within its group on the downstream path of ring 0. Thus RPCNs are always numbered 00 and IUNs are always numbered 01 to 15.

The characters also identify, in accordance with the following symbols, the current major state of each of the sixteen nodes. See Table 3-5.

Table 3-5. 1105-Page Symbols of Node Major States

Node Major State               Symbol
Active                         A
Standby                        s or S
Out of service, quarantined    O
Out of service, isolated       i
Grow                           g or G
Offline                        f or F
Unequipped                     . or blank space
Initializing                   b or B


In the instances that provide an alternative of an upper- or a lower-case letter, the lower-case signifies that the node is isolated, and the upper-case signifies that the node is in the active ring. In the example of an 1105 page above:

• RPCN00 00 is in the active node major state

• LN00 01 and LN00 02 are also active

• LN00 03 is out-of-service quarantined

• LN00 04, LN00 05, and LN00 06 are active

• LN00 07 and LN00 08 are out-of-service isolated

• LN00 09 is in the grow state and is isolated

• LN00 10 is active

• LN00 11 and LN00 12 are out-of-service quarantined, and

• LN00 13, LN00 14, and LN00 15 are unequipped. [12]
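The reading of a group entry described above can be sketched in a few lines of Python. This is an illustration only and not part of IMS: the names SYMBOLS and decode_group are hypothetical, nodes other than the RPCN are labeled with the generic IUN prefix (an office may use LN or another name), and the input is assumed to be a two-digit group number followed immediately by sixteen state symbols, as in Figure 3-1.

```python
# Symbols from Table 3-5. Where a letter has two cases, the lower-case
# form marks a node that is isolated.
SYMBOLS = {
    "A": "active",
    "S": "standby", "s": "standby (isolated)",
    "O": "out of service, quarantined",
    "i": "out of service, isolated",
    "G": "grow", "g": "grow (isolated)",
    "F": "offline", "f": "offline (isolated)",
    "B": "initializing", "b": "initializing (isolated)",
    ".": "unequipped", " ": "unequipped",
}


def decode_group(entry):
    """Decode one group entry such as '00AAAOAAAiigAOO...'."""
    group = entry[:2]
    # Pad with blanks so that missing trailing positions read as unequipped.
    symbols = entry[2:18].ljust(16)
    nodes = {}
    for position, symbol in enumerate(symbols):
        kind = "RPCN" if position == 0 else "IUN"
        name = f"{kind}{group} {position:02d}"
        nodes[name] = SYMBOLS.get(symbol, "unknown symbol")
    return nodes


states = decode_group("00AAAOAAAiigAOO...")
print(states["RPCN00 00"])  # active
print(states["IUN00 09"])   # grow (isolated)
```

Applied to group 00 of the example page, the sketch reproduces the reading given in the list above.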

The Ring Node Status Page

The 1106 display page provides status information about, and a command interface for, a technician-specified group of nodes. Figure 3-2 is typical of an 1106 page.

[12] When a group contains any out-of-service nodes, IMS color-codes the entire group with a red background and white lettering. For additional information on the node and ring maintenance states, refer to the "ARR or Deferrable Node Recovery" section of this chapter.


CMD>                                   -- 1106 - RING NODE STATUS --
NODE>                                           RING  MAJOR  RI     NP     MAINT
[Ring Status]                        NODE NAME  POS   STATE  STATE  STATE  MODE
[ARR Restore, etc.]               01 RPCN00 00  NORM  ACT    USBL   USBL   AUTO
[ARR Restart]                     02 LN00 01    NORM  ACT    USBL   USBL   AUTO
[ACNR Restore or Restart]         03 LN00 02    BISO  ACT    USBL   USBL   AUTO
CMS      FUNCTION                 04 LN00 03    ISO   OOS    FLTY   USBL   MAN
2xx      RMV node (line xx)       05 LN00 04    ISO   OOS    FLTY   USBL   MAN
3xx      RST node (line xx)(UCL)  06 LN00 09    EISO  ACT    USBL   USBL   AUTO
400      BISO-EISO                07 LN00 14    NORM  OOS    USBL   FLTY   AUTO
401/402  all non-ACT(next/prev)   08 LN00 15    NORM  ACT    USBL   USBL   AUTO
403/404  all Equipped(next/prev)  09
500      DGN Isolated Segment     10
5xx      DGN node (line xx)       11
6nn      Group nn                 12
7xx      RST node (line xx)(COND) 13
TOTAL 15

Figure 3-2. An 1106 Display Page

The 1106 page is composed of three areas. The area to the right, beginning with and including the column of line numbers 01 through 16, displays the major and minor states of a group of up to sixteen technician-specified nodes. In this document, this is called the display area. The area at the top left, beginning CMD> and ending ACNR Restore or Restart, is the command-interface and system-status area. In this document, this is called the command area. The area below the command area and to the left of the column of line numbers is a nonselectable command menu. In this document, this is called the menu area.

The Menu Area. Entries in the CMS column of the menu area list the input forms for commands identified under the FUNCTION column. These commands may be typed and entered at the CMD> prompt. The xx in the first, second, seventh, and ninth commands represents a line number (not a node number) from the column of numbers, beginning 01 and ending 16, at the center of the page. Each line number is associated with the node to its right. In the above example, line 02 represents LN00 01; to quarantine LN00 01, a technician would enter 202 at the CMD> prompt. By contrast, the nn in the next-to-the-last command represents not a line number but a group number. In the above example, to have the nodes contained in group 32 displayed, a technician would enter 632. Below is a listing of the results obtained from entering these 3-digit commands:

2xx Quarantines the node identified on line xx.

3xx Unconditionally restores the node identified on line xx.

400 Displays, if the ring has an isolated segment, the currently isolated nodes preceded by the BISO node and followed by the EISO node. If the isolated segment is greater than 14 nodes, the display will list first the BISO node, then the first seven isolated nodes downstream of the BISO node, then the last seven isolated nodes upstream of the EISO node, then the EISO node. After the 400 command is entered, the Total line below the menu area displays a number that includes all currently isolated nodes plus the BISO and EISO nodes. From this number it can be recognized that a portion of an isolated segment is missing from the display (because the isolation contains more than 14 nodes). The count on the Total line updates interactively.

401 Initially provides in the display area a list of nodes in the ring that are neither active nor unequipped. Thus it lists any nodes that are in the out-of-service, standby, initializing, and grow states. After the 401 command is entered, the total number of nonactive nodes will be given on the Total line below the menu area and updated interactively. If this number is greater than 16, technicians may page forward and backward in the list by reentering 401 and 402, respectively.

403 Entered the first time, provides a list of nodes in the ring that are equipped. Thus it lists all nodes that are in the active, out-of-service, standby, initializing, and grow states. After the 403 command is entered, the total number of equipped nodes will be given on the Total line below the menu area and updated interactively. If this number is greater than 16, technicians may page forward and backward in the list by reentering 403 and 404, respectively.

500 Runs diagnostic phases 1 and 2 on all RACs in the isolated ring segment.

5xx Runs all automatic diagnostic phases on the node identified at line xx.

6nn Displays all equipped nodes in group nn, where nn is not the line number but the group number. After the 6nn command is entered, the total number of equipped nodes within the group will be given on the Total line below the menu area and updated interactively.

7xx Conditionally restores the node identified on line xx.
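Two points in the listing above lend themselves to a short sketch: how the 3-digit commands are formed from line and group numbers, and how the 400 display is shortened when more than 14 nodes are isolated. The Python below is a minimal illustration under stated assumptions, not IMS code. The function names are hypothetical, and the node names passed to isolated_segment_view are placeholders supplied by the caller.

```python
def line_command(prefix, line):
    """Build a 2xx, 3xx, 5xx, or 7xx command from a display line number."""
    if prefix not in (2, 3, 5, 7):
        raise ValueError("prefix must be 2, 3, 5, or 7")
    if not 1 <= line <= 16:
        raise ValueError("display lines are numbered 01 through 16")
    return f"{prefix}{line:02d}"


def group_command(group):
    """Build the 6nn command from a two-digit group number."""
    if not 0 <= group <= 99:
        raise ValueError("group numbers have two digits")
    return f"6{group:02d}"


def isolated_segment_view(segment):
    """Return (nodes listed by the 400 command, value of the Total line).

    segment holds the BISO node, then the isolated nodes in downstream
    order, then the EISO node.
    """
    if len(segment) < 2:
        raise ValueError("a segment needs at least a BISO and an EISO node")
    isolated = segment[1:-1]
    if len(isolated) > 14:
        # BISO, first seven isolated, last seven isolated, EISO.
        shown = [segment[0], *isolated[:7], *isolated[-7:], segment[-1]]
    else:
        shown = list(segment)
    # The Total line counts every isolated node plus BISO and EISO.
    return shown, len(segment)


print(line_command(2, 2))  # 202
print(group_command(32))   # 632
```

When the total returned exceeds the number of nodes listed, part of the isolated segment is not shown, which is the condition the Total line is meant to reveal.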

The Command Area. CMD> is the prompt for any of the 3-character commands listed in the command menu. Entering a valid command here evokes an OK response. Entering an invalid command evokes an NG response. To enter a command, manipulate the cursor with the CMD/MSG key until it is at the prompt. Then type and enter a 3-character command from the CMS column of the menu area. The prompt also accepts as input display-page numbers to which the technician wishes to turn.

Node> is the prompt for a command that allows technicians to select the sequence of nodes displayed, after having entered a 401 or 403 command. To employ this feature, enter 401 or 403, manipulate the cursor with the arrow keys to the Node> prompt, and then type and enter the identification, in the form IUNa b or RPCNa b, of the node you wish to form the starting point of the sequence. The display will be redrawn with the specified node as the last entry in the 401 display and as the first entry in the 403 display. This feature is not available for the 400 and 6nn commands, where its reordering might be confusing.

[Ring Status] appears at the location where the current ring state will be displayed. One of the following states should always be present:

RING STATE ACTIVE

RING STAT RESTORING

RING STAT CONFIGURING

RING STAT DOWN

[ARR Restore, etc.], [ARR Restart], and [ACNR Restore or Restart] provide the same information as they do for the 1105 display page, as explained above.

Because one ARR restart and one ACNR restart may occur in parallel, and because one or both restarts may occur in parallel with a single restore, it is possible to have all three node-activity indicators appear simultaneously. It is not possible, however, to have two restorals appear simultaneously, since IMS can restore only one node at a time (see "Three ARR Rules" above).

The Display Area. The display area lists up to 16 nodes and identifies their major and minor maintenance states. Node major and minor states are explained above in the "ARR or Deferrable Node Recovery" section of this chapter. A listing of the maintenance states follows:

• Node Major States

- ACT: Active
- OOS: Out of service
- STBY: Standby
- INIT: Initializing
- OFL: Off-line
- GROW: Grow
- UNEQ: Unequipped

• Node Minor States: Ring Position

- NORM: Normal
- BISO: Beginning of Isolation
- EISO: End of Isolation
- ISOL: Isolated

• Node Minor States: Ring Interface

- USBL: Usable
- QUSBL: Quarantine-usable
- FLTY: Faulty
- UNTSTD: Untested

• Node Minor States: Node Processor

- USBL: Usable
- FLTY: Faulty
- UNTSTD: Untested

• Node Minor States: Maintenance Mode

- AUTO: Automatic
- MAN: Manual
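As an illustration of how one row of the display area maps onto these states, the following Python sketch splits a row into named fields. It assumes the whitespace-separated layout seen in Figure 3-2 (line number, two-part node name, then ring position, major state, ring-interface state, node-processor state, and maintenance mode). The names NodeRow and parse_row are hypothetical and are not part of IMS.

```python
from typing import NamedTuple


class NodeRow(NamedTuple):
    line: int
    node: str
    ring_position: str
    major_state: str
    ring_interface: str
    node_processor: str
    maintenance_mode: str


def parse_row(text):
    """Split one display-area row into its seven columns."""
    fields = text.split()
    if len(fields) != 8:
        raise ValueError("expected a line number, a two-part node name, "
                         "and five state fields")
    line, name, member, *states = fields
    return NodeRow(int(line), f"{name} {member}", *states)


row = parse_row("04 LN00 03 ISO OOS FLTY USBL MAN")
print(row.node)              # LN00 03
print(row.ring_interface)    # FLTY
print(row.maintenance_mode)  # MAN
```

The row used here is line 04 of Figure 3-2: an isolated, out-of-service node with a faulty ring interface, a usable node processor, and manual maintenance mode.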

Nodes may be added to 401 and 403 displays by manipulating the cursor to any vacant line in the display and typing and entering a node name in the form LNa b or RPCNa b. The display will provide status information for the node and also display the line number in reverse video, indicating its special status. The special status node will disappear when a new command is entered at the CMD> prompt. Prior to that time the node may be deleted manually by manipulating the cursor to the line and then pressing only the RETURN key.

Ring Diagnostics

IMS provides diagnostic tests for all circuit packs that reside in the ring node frames/cabinets except power supplies. These tests are submitted as requests to MIRA and performed in a manner similar to standard RTR diagnostics. They may be initiated automatically by ARR or manually by technicians through input messages or display-page commands.


Each IMS node-type is tested by a distinct diagnostic routine; each diagnostic routine is composed of units of sequential execution called phases; and each phase tests functionally related hardware. Phases are automatic or optional (available on demand). Automatic phases are executed when a diagnostic is run at the request of ARR or in response to a manual request without the PH option. Optional phases are executed only in response to manual requests in which they are specified in the PH option.

Phases are identified by the node-type on which they are executed and by phase numbers. Node-types are further distinguished by their hardware composition. The currently available node-types are IRN RPCNs, IRN2 RPCNs, IRN LNs (LIN-E/SS7), IRN LNs (LI4S/SS7), IRN DLNEs, IRN DLN30s, IRN CDN-Is, IRN CDN-IIs, IRN CDN-IIxs, CDN-IIIs, SS7NEs, DLN60s, and IRN MDLs. Phase numbers reflect the relative order in which phases are run within a routine.

Diagnostic phases 1 and 2 are special in two ways. They are common to all node-types; and when full, automatic diagnostics are requested, whether manually or by ARR, on any node (thus requiring that the node be isolated), phases 1 and 2 test the entire path within the isolation as a preliminary step to testing the specified node. Testing the isolated path requires partial tests of all nodes and interframe buffers within the isolated segment as well as tests of the isolated RACs of the EISO and BISO nodes. Running phases 1 and 2 also has the effect of clearing RAC status registers. RAC status registers may become improperly set as a consequence of a fault, of the node being powered down, or of the RAC circuit pack being removed or reset.

Phase 40 is a critical juncture in IMS diagnostics. When a diagnostic request includes only phases above 39, IMS quarantines the node before running the diagnostic phases on it. When, on the other hand, a diagnostic request includes any phases below 40, IMS attempts to isolate the node prior to running diagnostics on it. If, however, ring conditions do not permit the node to be isolated, IMS runs, while the node is quarantined, all requested phases that do not require the node to be isolated. These will include all requested phases above 39 and some requested phases below 40.
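The phase-40 rule can be summarized as a small decision function. The Python below is a sketch of the rule as stated in this section, not of the IMS implementation. The name plan_diagnostic is hypothetical, and because this section does not say which phases below 40 can run on a quarantined node, that set is passed in by the caller as an assumption.

```python
def plan_diagnostic(phases, can_isolate, runnable_quarantined=frozenset()):
    """Return (node state during diagnostics, phases that will run).

    phases               -- explicitly requested phase numbers
    can_isolate          -- whether ring conditions permit isolation
    runnable_quarantined -- phases below 40 assumed not to need isolation
    """
    requested = sorted(set(phases))
    if not requested:
        raise ValueError("this sketch covers explicit phase requests only")
    if all(phase >= 40 for phase in requested):
        # Only phases above 39: the node is quarantined, not isolated.
        return "quarantined", requested
    if can_isolate:
        # Some phase below 40: isolate the node and run everything.
        return "isolated", requested
    # Isolation refused: run only what can run on a quarantined node.
    runnable = [phase for phase in requested
                if phase >= 40 or phase in runnable_quarantined]
    return "quarantined", runnable


print(plan_diagnostic([40, 41], can_isolate=True))
print(plan_diagnostic([1, 2, 40], can_isolate=True))
print(plan_diagnostic([1, 12, 40], can_isolate=False,
                      runnable_quarantined={12}))
```

In the third call, phase 12 stands in for a phase below 40 that does not need isolation; the choice of 12 is purely illustrative.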

Most IMS diagnostic routines terminate at the end of a phase in which a test fails. A few terminate at the end of a failing test. Important exceptions to this statement are as follows: If phase 1 or 2 fails in any node-type, all of phases 1 and 2 are still run. If either or both of phases 1 and 2 fail in RPCNs, phases 10 through 27 are still run unless a test fails in these upper phases, in which case diagnostics terminate at the end of the failing upper phase.

Obtaining Diagnostic Results

Included in Appendix B, Ring Maintenance Reference Material, are two groups of tables that provide IMS diagnostic information. Diagnostic Phase Tables, available for each node type, identify and briefly describe the phases in each routine. Diagnostic Fault Tables, also available for each node type, associate phases with the circuit packs they test, thereby providing a list of suspect circuit packs for any failing phase.

Whether diagnostics are initiated automatically or manually, their results appear as output messages on the ROP. The DGN output message identifies failing phases and failing tests for a faulty node, and the ANALY TLPFILE output message provides a list of suspect circuit packs in the faulty node. The ANALY TLPFILE message, invoked by the TLP option of the RST command, is always included by ARR requests to restore a node. In the ANALY TLPFILE message, each circuit pack associated with a diagnostic failure is assigned a number between one and ten. The number represents the probability, as calculated by IMS software, that the fault is located in the pack; the higher the number, the greater the probability. The DGN and ANALY TLPFILE output messages are primary sources of diagnostic information for technicians.

Diagnostic Listings

If the information provided by ROP output messages fails to identify faulty equipment, further scrutiny of the diagnostic results is possible using diagnostic listings. A diagnostic listing is a document that describes a particular diagnostic phase. Common Network Interface has available the diagnostic listings that pertain to the CNI configuration of the ring. They consist of the listings for ring peripheral controller nodes, link nodes, attached processors, and ring application processors.

A diagnostic listing is composed of a prologue and a statement sequence. The prologue introduces the subject phase by explaining what it tests, how the testing is done, and what hardware is involved. All lines in the prologue begin with the character C, indicating they are comments. The statement sequence consists of information, arranged into numbered statements, about each command within the series of commands that constitutes the phase. Each statement contains a statement number, a source-file version of the command, and an ASCII representation of the executable version of the command. The ASCII representation is on a line that begins with the string * adr, unless the command generates a test, in which case the line begins with * test followed by the test number. Most statements are preceded by one or more comment lines that explain the purpose of the command that follows. Statement numbers correspond to numbers that appear in early termination output messages and in DGN AUDIT RING output messages. They are also used in the EX input message. Test numbers correspond to the test numbers that appear in DGN output messages. For technicians, test numbers are the most important information in diagnostic listings.


Some long diagnostic listings subdivide the statement sequence into program units. Program units correspond to divisions of phases that serve explanatory rather than programming functions. Each program unit is preceded by a prologue that provides introductory information about the commands within the unit.

Using Diagnostics

IMS ring diagnostics serve three principal purposes: to confirm faults, to locate faults, and to verify repairs. When IMS software removes a node suspected of being faulty from service, it sometimes employs diagnostics to confirm and to locate the fault. After replacing or repairing equipment indicated as faulty, technicians employ diagnostics manually to verify that the fault has been corrected before returning the node to service.

Because conditional restoral requests of ARR always include the TLP option, technicians usually have no need to manually diagnose a node in order to confirm or locate its fault. Instead, they should consult the diagnostic results on the ROP that were generated by ARR's restoral attempt. If, however, a restoral attempt fails for nondiagnostic reasons, technicians will ordinarily need to run diagnostics on the node before performing maintenance on it.

Guide to Critical Ring Maintenance

This document uses the term "critical maintenance" for manual actions undertaken to correct faults and to recover the ring. The faults are of the kind that obstruct the transportation of messages on the ring (ring-related faults) or the kind that prevent the processing or transmission of messages within nodes (node-related faults). As applied to nodes and their components, the principles of critical maintenance are essentially the same for all except the ring application processors (RAPs) of CDN-Is, which require unique treatment. Therefore, among the maintenance procedures set forth below, there is a special one for RAPs.

Critical maintenance most often occurs with the ring subsystem in operation, however fragmented the total ring might be by out-of-service nodes. Occasionally, however, critical maintenance is required when, because of ring conditions, the ring subsystem fails and cannot be recovered by automatic means. This state, known as ring down, is also discussed in this chapter and addressed with its own procedure.

The section begins with a discussion of the IMS commands technicians will most often employ in performing critical ring maintenance. The discussion is intended to amplify information contained in the IMS Output Manual; it is not to be used as reference material.


IMS Input Messages

IMS input messages allow technicians to practice critical maintenance by manually controlling various maintenance functions associated with the IMS ring. [13] A descriptive list of frequently used IMS input messages follows. Where the word NODE appears in the list, substitute RPCN or the user's name for an IUN (LN, for example).

RMV:NODE Quarantines the specified node. If the command is executed for a node that has been automatically quarantined, the maintenance mode of the node will change to manual, and the node will remain quarantined until it is manually returned to service by a version of the RST:NODE command.

Before entering RMV:NODE for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.

DGN:NODE Executes diagnostic phases on the specified node. If no phases are specified, DGN:NODE isolates the node and runs all automatic diagnostic phases on it, with the exceptions described in a and b:

a. If a node is in the active segment of an isolated ring but is not a BISO or EISO node, DGN:NODE with no phases specified quarantines the node (if it was not already quarantined) and runs all diagnostic phases that do not require the node to be isolated.

b. If the node is a BISO or EISO node, DGN:NODE with no phases specified extends the isolation to include the node and runs all automatic phases on it. If, however, the extended isolation would create an active ring that is too short to support message transport, the extension is not allowed and the subsequent action is that described in a. above.

[13] These commands may conform either to the Program Documentation Standards (PDS), except that terminal exclamation marks are supplied automatically by software, or to the Man-Machine Interface Language (MML). Technicians should select one or the other of these message conventions by setting the RTR ECD spooler flag to PDS or MML. For an explanation of the PDS input-message format, consult 3B21D Computer, UNIX RTR Operating System, Input Message Manual, PDS, "Section 2, User Guidelines." For a complete description of PDS, consult the Bell Laboratories Program Documentation Standards Reference Manual. For an explanation of the MML input-message format, consult 3B21D Computer, UNIX RTR Operating System, Input Message Manual, MML, "Section 2, User Guidelines." For a complete description of MML, consult the CCITT MML Recommendations (Z.301-Z.341), which are available from OMNICOM, Inc., Vienna, Virginia.

To set the spooler flag, see the layout for the ECD splrinfo form in the RTR Operating System, Recent Change and Verify Manual for the 3B21D Computer.


If any phases below 40 are specified, DGN:NODE behaves as above except that it attempts to run only the specified phases.

If only phases above 39 are specified, DGN:NODE runs the phases on the node after quarantining it (if it was not already quarantined).

If a node was active or quarantined prior to the request for diagnostics, DGN:NODE attempts to quarantine it after diagnostics have completed. If a node was in another state, DGN:NODE leaves the node in the state in which it found it, provided that diagnostic results do not require a different state. (Technicians would ordinarily return a quarantined node that had passed diagnostics to service by unconditionally restoring it.)

Before entering DGN:NODE for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.

RST:NODE Entered unconditionally for an out-of-service node that is not sandwiched in isolation between nodes with faulty ring interfaces, unisolates and/or unquarantines the node (thus placing it in the active ring), downloads operational code into it, places the code in execution, then changes the major state of the node to active. If the node is sandwiched in isolation, RST:NODE entered unconditionally leaves the node isolated, while placing it under ARR control so that it will be automatically restored when ring conditions permit.

Entered conditionally, RST:NODE completes the same actions as DGN:NODE with no phases specified, then restores the node, provided that it passes diagnostics and is not sandwiched in isolation. If it is sandwiched in isolation, RST:NODE leaves it isolated while placing it under ARR control so that it will be automatically restored when ring conditions permit.

If a node fails diagnostics, RST:NODE leaves it isolated, if its ring-interface state is FLTY, or quarantines it, if its ring-interface state is USBL or QUSBL and it is not sandwiched in an isolation.

If the RST:NODE command is followed by a resource failure that prevents downloading or executing code, a REPT IUN RST output message with failure code 43 will appear on the ROP. When this occurs, technicians should wait a few minutes and try the restoral again.

Before entering RST:NODE conditionally for an active node with an active external user interface, remove from service the communication link or links that terminate in the node.

After entering RST:NODE for a node whose communication link has been manually removed from service, it may be necessary to manually return the communication link to service.

OP:RING Produces an OP RING output message concerning the status or generic identity of specified nodes, groups of nodes, or of the ring.

CFR:RING

1. Isolates or attempts to end the isolation of specified nodes, or

2. Initializes the ring if it is down.

Because the DGN and RST commands provide automatic isolation and unisolation of nodes under most conditions, this command is rarely used. The command is intended primarily for use in the first sense when growing and degrowing nodes and in the second sense when a new ring is being installed under Manual Ring Mode, which is explained below. In daily operations, the first version of the command might be used with the exclude option to isolate a node whose ring-interface state is quarantine-usable prior to changing the ring-interface or IRN circuit pack. With the MOVFLT option, the first version of the command can be used to shift an isolation on a ring that is too small for the isolation to be extended.

Before the Exclude version of the CFR command is entered for an active node, the node must be removed from service with the RMV:NODE command.
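The state in which RST:NODE leaves a node that fails diagnostics, as described in the RST:NODE entry above, can be expressed as a short function. This Python sketch is illustrative only and the function name is hypothetical. The text above states the outcome for FLTY and for USBL or QUSBL when the node is not sandwiched; treating a sandwiched node as remaining isolated is an assumption drawn from the surrounding paragraphs, and the UNTSTD state is not covered.

```python
def state_after_failed_restore(ring_interface, sandwiched):
    """Return 'isolated' or 'quarantined' for a node that failed diagnostics.

    ring_interface -- ring-interface minor state: FLTY, USBL, or QUSBL
    sandwiched     -- True if the node sits in isolation between nodes
                      with faulty ring interfaces
    """
    if ring_interface == "FLTY":
        return "isolated"
    if ring_interface in ("USBL", "QUSBL"):
        # Assumption: a sandwiched node cannot leave the isolation.
        return "isolated" if sandwiched else "quarantined"
    raise ValueError(f"outcome not described for {ring_interface!r}")


print(state_after_failed_restore("FLTY", sandwiched=False))   # isolated
print(state_after_failed_restore("QUSBL", sandwiched=False))  # quarantined
```

A quarantined result with a QUSBL ring interface is the case in which the exclude option of CFR:RING, mentioned above, might be needed before the ring-interface circuit pack is changed.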

Tables providing brief descriptions of commonly used versions of IMS output messages appear in Chapter 5, Ring Critical Events.

Critical Maintenance Procedures for Nodes

Because of the automatic actions of IMS maintenance software, technicians ordinarily perform critical maintenance on nodes that ARR has attempted unsuccessfully to restore. Most restoral attempts that fail do so because of diagnostic failure. A few fail either because the attempt timed out waiting for a reply from MIRA or because a recurrent error condition caused a node to violate the fourth-time rule, which prevents ARR from restoring the same node for a fourth time within a 60-minute interval. When any restoral attempt fails, ARR announces the event with a version of the REPT ARR AUTORST message on the ROP and changes the maintenance mode of the node to manual, thereby directing technicians to perform maintenance on it.
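The fourth-time rule amounts to a sliding-window check. The Python sketch below shows one way to express it and is not the ARR implementation: the names WINDOW, MAX_RESTORES, and may_restore are hypothetical, and the treatment of a restore exactly 60 minutes old (here it no longer counts) is an assumption, since the text above does not specify the boundary.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)
MAX_RESTORES = 3  # a fourth restore within the window is refused


def may_restore(previous_restores, now):
    """Return True if another automatic restore of the node is allowed.

    previous_restores -- datetimes of earlier restores of the same node
    now               -- time of the proposed restore
    """
    recent = [t for t in previous_restores if now - t < WINDOW]
    return len(recent) < MAX_RESTORES


now = datetime(2000, 12, 1, 12, 0)
busy = [now - timedelta(minutes=m) for m in (50, 30, 10)]
calm = [now - timedelta(minutes=m) for m in (70, 30, 10)]
print(may_restore(busy, now))  # False
print(may_restore(calm, now))  # True
```

In the first case three restores already fall inside the last 60 minutes, so a fourth is refused and the node would be turned over to technicians in manual mode.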

This section contains three procedures for clearing faults in individual nodes and three procedures for dissolving isolations. Of the procedures for clearing faults, one is to be used when ARR has failed to restore a node, one when critical maintenance is manually initiated, and one when these procedures fail to clear a problem and it becomes necessary to consult diagnostic listings. The information provided by these three procedures is entirely sufficient for the maintenance of nodes that are quarantined. Maintenance of isolated nodes, however, involves these issues and others as well. The section ends with procedures for dissolving isolations. One is concerned with single-node isolations; one is concerned with multiple-node isolations; and one, to be used in conjunction with the other two, is concerned with the problems associated with a fault in a BISO or EISO node.

Procedure 3-2. Clearing Faults in Response to ARR Action

ARR turns a faulty node over to technicians isolated when diagnostics or error messages indicate a ring-interface problem that prevents the node from propagating messages on the ring. Otherwise, it turns a faulty node over to technicians quarantined. Thus technicians sometimes do and sometimes do not receive a node from ARR in the proper state for replacing the circuit packs that diagnostics have indicated as possibly faulty. Quarantined nodes with ring-interface problems (ring interface QUSBL) and IRN nodes with node processor problems are turned over to technicians quarantined yet must be isolated before their ring-interface circuit packs are replaced. Nodes requiring backplane repairs must also be isolated.

IMS circuit packs are designed to be replaced while the power supply to the node is on.

1. Learn of the failure of an ARR restoral attempt from a REPT ARR AUTORST RST RQST FOR a FAILED output message, where a is the node that failed. Confirm with the OP:RING command or from the 1106 display page that the failed node is in the manual mode.

2. Note the failing phases and tests from the DGN output message.

3. From the information concerning failing phases, compose a list of suspect circuit packs using the ANALY TLPFILE output message, and obtain from the supply of spare circuit packs one of each pack on your list.

Observing the circuit pack LEDs, ensure that the node containing the listed pack or packs is in the proper state for having the pack(s) replaced. Table 3-6 describes the various LED indications.

Nodes should be isolated before having any part of their backplanes repaired.

4. Replace the first circuit pack on the list, then proceed as follows:

• If you replaced a ring-interface, a node-processor, or an IRN circuit pack in any node-type other than an RPCN, restore the node conditionally with the RST:NODEa,b command.

• If you replaced any circuit pack in an RPCN other than the DDSBS circuit pack, restore the node conditionally with the RST:RPCNa,b command.

• If you replaced the DDSBS circuit pack of an RPCN, first run all automatic diagnostic phases with the DGN:RPCN command. If the automatic phases pass, next run optional diagnostic phase 14 with the command DGN:RPCNa,b:PH 14,CU c where c is 0 or 1, indicating the off-line control unit of the 3B21D. If the DDSBS circuit pack passes both optional and automatic diagnostic phases, restore the node to service unconditionally using the RST:RPCNa,b;UCL command.

• If you replaced an auxiliary circuit pack of any node other than an RPCN or CDN-I, enter the command DGN:NODEa,b:PHc where c is the range of phases that test the circuit pack you replaced. If the unit passes all specified diagnostic phases, restore the node unconditionally with the RST:NODEa,b;UCL command.

Table 3-6. Circuit Pack LED States

Circuit-Pack Type   Node Type   State                                        Indication
auxiliary           any         quarantined or isolated                      RQ LED red
IRN                 VLSI        isolated                                     NT LED red
IFB                 any         isolate the adjacent node in the same        NT LED red
                                unit as the IFB CP

NOTE: Before pulling any circuit pack in units not equipped with a connector assembly, isolate all nodes serviced by the power supply associated with the connector assembly. In 3-node units, the connector assembly is located at the rear of the backplane at the RI 1 position in the two external nodes and is associated with the nearest power supply. In two-node units, the connector assembly is located at the rear of the backplane at the RI 1 position in both nodes and is associated with the nearest power supply. In eight-node units, the connector assembly is located at the back of each power supply and is associated with that power supply.


- If you replaced the DDSBS circuit pack of a DLN, first run all automatic diagnostic phases with the DGN:NODEa,b command. If the automatic phases pass, next run optional diagnostic phase 34 with the command DGN:NODEa,b:PH 34,CU c, where c is 0 or 1, indicating the off-line control unit. If the DDSBS circuit pack passed both optional and automatic diagnostic phases, restore the node to service unconditionally using the RST:NODEa,b;UCL command.

- Consult the section ``Ring Application Processor Critical Maintenance Procedure'' for instructions on diagnosing and changing auxiliary circuit packs on a CDN-I.

- If you isolated an RPCN to replace an interframe buffer, restore the node conditionally with the RST:RPCNa,b command. If you isolated any other node type to replace an interframe buffer, run diagnostic phases 1 through 13 with the DGN:NODEa,b:PH 1-13 command and, if the phases pass, restore the node unconditionally. If you permanently removed an interframe buffer or substituted a buffer with different capacity, change the ECD HV field to reflect the change before restoring the node.

5. If the list of suspect circuit packs contained more than one entry and the node failed

to pass diagnostics after the first listed pack was replaced, reinstall the original pack,

replace the next pack on the list, then repeat the applicable portion of 4 and 5 above.

Continue in this fashion until either the node passes the specified diagnostic tests or

all circuit packs on the list have been replaced and tested. (If the node you are

troubleshooting is critically important or contributing to a multiple isolation, you may

wish to replace simultaneously all its circuit packs and then, at another time, reinstall

the original packs and test them individually to determine which pack was at fault.)

6. If you replaced all circuit packs without the node passing diagnostics, visually inspect the node and its housing. Look for unseated circuit packs, backplane damage, poor grounding connections, and unseated cable connections. Before repairing the backplane, isolate the node.

7. If the backplane is not at fault, consult the sections below on isolations and

trouble-shooting.
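
The restore rules in step 4 of the procedure above can be summarized as a lookup. The following Python sketch is illustrative only: the function name, the pack and node-type labels, and the string formatting are assumptions made for this example. The command strings reproduce the templates as they are written in this section, with a, b, and c left as placeholders; writing the first RPCN diagnostic as DGN:RPCNa,b is an assumption by analogy with DGN:NODEa,b. Each command in a returned list is entered only if the one before it passes.

```python
def restore_commands(node_type, pack, a="a", b="b", c="c"):
    """Hypothetical helper: commands to enter after replacing `pack`.

    node_type -- "RPCN", "DLN", "CDN-I", or any other node-type label
    pack      -- "ring-interface", "node-processor", "IRN", "DDSBS",
                 or "auxiliary" (labels chosen for this sketch)
    c         -- off-line control unit (0 or 1) for DDSBS packs, or the
                 phase range for auxiliary packs
    """
    if node_type == "RPCN":
        if pack == "DDSBS":
            return [
                f"DGN:RPCN{a},{b}",               # all automatic phases (format assumed)
                f"DGN:RPCN{a},{b}:PH 14,CU {c}",  # optional phase 14
                f"RST:RPCN{a},{b};UCL",           # unconditional restore
            ]
        return [f"RST:RPCN{a},{b}"]               # conditional restore
    if node_type == "DLN" and pack == "DDSBS":
        return [
            f"DGN:NODE{a},{b}",                   # all automatic phases
            f"DGN:NODE{a},{b}:PH 34,CU {c}",      # optional phase 34
            f"RST:NODE{a},{b};UCL",               # unconditional restore
        ]
    if pack in ("ring-interface", "node-processor", "IRN"):
        return [f"RST:NODE{a},{b}"]               # conditional restore
    if pack == "auxiliary" and node_type != "CDN-I":
        return [f"DGN:NODE{a},{b}:PH{c}", f"RST:NODE{a},{b};UCL"]
    # CDN-I auxiliary packs are handled by the RAP procedure instead.
    raise ValueError("combination not covered by step 4")
```

Interframe buffer replacement is left out of the sketch because it also depends on whether the ECD HV field must be changed first.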

Procedure 3-3. Manually Initiated Maintenance of Nodes

In general, technicians should avoid manual intervention of any kind while EAR is attempting to recover the ring and should avoid manually intervening with a node that ARR is attempting to restore.


IMS circuit packs are designed to be replaced while the power supply to the node

is on.

1. Before entering an RMV, DGN, conditional RST, or CFR:RING,NODExx yy;EXCLUDE command for an active node with an active external user interface, remove from service the communication link or links that terminate in the node. After entering an RST command for a node whose communication link was manually removed from service, it may be necessary to manually return the communication link to service.

2. Before manually initiating maintenance on a circuit pack or interframe buffer, remove

the resident or associated node from service. See Table 3-6.

Before replacing a power supply circuit pack in a 3-node unit, isolate the two nodes adjacent to the power supply. In a 2-node unit, isolate the node adjacent to the power supply. In an 8-node unit, isolate the four nodes adjacent to the power supply. In a 5-node unit, learn from the unit horizontal designation strip next to the power supply in question the nodes serviced by the power supply, and isolate either three or two nodes.

Nodes should be isolated before having any part of their backplanes repaired.

3. To quarantine a node, remove it from service with the RMV:NODEa b command.

This action has the effect of changing the maintenance mode of the node to manual,

thus preventing ARR from attempting to restore it.

4. To isolate a node, first remove it from service with the RMV:NODEa b command, and

then isolate it with the CFR:RING,NODExx yy;EXCLUDE command. This also has

the effect of changing the maintenance mode to manual.

5. If a quarantined or isolated node has not had a circuit pack replaced or reset, it may

be restored to service unconditionally.

6. If an isolated node has not had a circuit pack replaced but has been powered down

or had a circuit pack reset, run diagnostic phases 1 and 2 on it with the

DGN:NODEa,b:PH 1-2 command. If it passes it may be restored to service

unconditionally.

7. If a node has had a circuit pack replaced, observe the guidelines set forth in the fifth

step of the procedure ``Clearing Faults in Response to ARR Action.''
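
Steps 3 through 7 above can be sketched as two small lookups. This Python sketch is illustrative only: the function names, argument names, and returned wording are assumptions made for this example, and the command strings reproduce the templates as written in this procedure, with a, b, xx, and yy left as placeholders.

```python
def removal_commands(action, a="a", b="b", xx="xx", yy="yy"):
    """Hypothetical helper: commands that quarantine or isolate a node."""
    rmv = f"RMV:NODE{a} {b}"  # also changes the maintenance mode to manual
    if action == "quarantine":
        return [rmv]
    if action == "isolate":
        return [rmv, f"CFR:RING,NODE{xx} {yy};EXCLUDE"]
    raise ValueError(f"unknown action: {action!r}")


def restore_guidance(state, pack_replaced, powered_down_or_reset):
    """Hypothetical helper: which of steps 5 through 7 applies.

    state -- "quarantined" or "isolated"
    """
    if pack_replaced:
        # Step 7 defers to the procedure for clearing faults after ARR action.
        return "follow the restore guidelines of Procedure 3-2"
    if not powered_down_or_reset:
        return "restore unconditionally"          # step 5
    if state == "isolated":
        # Step 6: verify phases 1 and 2 before restoring.
        return "run DGN:NODEa,b:PH 1-2; if it passes, restore unconditionally"
    # Conservative choice for this sketch: anything else is left to the reader.
    return "not covered by steps 5 through 7"
```

For example, `removal_commands("isolate")` yields the RMV command followed by the CFR:RING command with the EXCLUDE option, in the order step 4 gives them.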


Procedure 3-4. Using Diagnostic Listings

If the information provided by ROP output messages fails to identify faulty equipment, further scrutiny of the diagnostic results is possible using diagnostic listings as explained below:

1. Note the failing phase and failing tests in the DGN output message.

2. Obtain the diagnostic listing(s) for the phase(s) that failed.

3. Read the prologue(s) to the failing phase(s) and, if one exists, the prologue to the

program unit in which failing tests appear. Pay particular attention to any

troubleshooting hints.

4. Read the individual comments on statements that contain failed tests.

5. If this information does not provide guidance on how to clear the fault, consult the

``Recognizing and Finding Intermittent Faults'' and the ``Other Suggestions for

Troubleshooting'' sections below for possible solutions.

6. If these sections provide no leads, seek assistance from the CTS.

Critical Maintenance Procedures for Nodes in Isolation

Under circumstances described previously in this document, EAR may respond to conditions on the ring by creating an isolated segment that ARR cannot dissolve. In these cases, dissolving the isolation becomes the responsibility of technicians. Generally, technicians should respond promptly to an isolation, since even a singly-isolated node creates the potential of a massive isolation in the event that another node must also be isolated.

Dissolving isolations sometimes requires that they be extended to include the BISO or EISO node. There are two reasons why this may need to be done. The first involves the ambiguity IMS experiences in detecting certain types of ring-related faults. The second involves the way in which diagnostic code is transmitted into an isolated segment.

The second can be stated simply. Messages, including messages containing diagnostic code, are sent from the 3B21D to an isolated segment of the ring through the BISO or the EISO node. BISO and EISO nodes have one RAC participating in the active-ring segment and one RAC participating in the isolated-ring segment. Messages destined for the isolated segment are read from the active ring by the active-ring RAC, then transmitted by the node processor to the isolated-ring RAC, which writes them to the isolated segment of the ring. A fault in the isolated-ring RAC of either the BISO or the EISO node might go undetected, since it would not affect the transportation of messages on the active ring and could show up misleadingly as a diagnostic failure in the isolated node. Therefore, technicians who find that they cannot clear a fault that appears to reside in the isolated node should extend the isolation to include the current BISO and EISO nodes and run diagnostics again.

Low-Phase Ambiguity

The other reason for extending isolations concerns the ambiguity that IMS experiences in detecting certain ring-related faults. Faults that prevent the propagation of messages on the ring usually produce phase-1 and phase-2 diagnostic failures. In the case of such failures, IMS often cannot decide in which of two adjacent RACs a fault resides. Because this problem is associated entirely with the parts of node hardware tested by diagnostic phases 1 and 2, this document calls it ``low-phase ambiguity.''

Low-phase ambiguity does not usually result in the isolation of two nodes because, while one suspect RAC is isolated, the other suspect RAC may be included in the isolated segment as the isolated RAC of the BISO or EISO node. The following figure illustrates the ring structure that permits this practice:

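[Figure: the BISO node, the isolated node, and the EISO node, each with a ring interface containing RAC 0 and RAC 1.]

Figure 3-3. Isolated RACs of BISO and EISO Nodes

Notice that either RAC 1 of the BISO node or RAC 0 of the EISO node could be included in the isolated segment as a suspect RAC.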

IMS has difficulty acknowledging by customary means the fact that it has included possibly faulty RACs in BISO or EISO nodes. A BISO or EISO node, being in the active ring, cannot have its ring interface marked faulty. Therefore, if a RAC of such a node is suspect, this fact will not be indicated in the minor state of the node nor in the TLP information. It will, however, be reflected in tests 5 and 10 of the ROP failure data for diagnostic phases 1 or 2, provided that the RAW option of the DGN command has been specified. (ARR does not specify the RAW option, so the automatically output DGN failure data does not contain this information in full. It does, however, contain failing test 5, which is a sure indication that low-phase ambiguity exists.)

The maintenance principle dictated by low-phase ambiguity is represented in the

following procedure:

Procedure 3-5. Determining the Nodes Involved in Low-Phase Ambiguity

1. After attempting to clear a fault in an isolated node that has failed test 5 of diagnostic phases 1 or 2, run verification diagnostics on the node with the RAW option using the command DGN:NODEa,b;RAW, where NODEa,b is the isolated node.

2. If the node passes all diagnostic phases, restore it to service unconditionally.

3. If the node still fails phases 1 or 2, consult the output message generated by the DGN command with the RAW option, and determine whether it is the BISO or EISO node that is suspected of being faulty. This is an example of an output message when the RAW option of the DGN command has been specified:

DGN LN32 1 PH 1 STF (14 X'00000000 x'00000000)

TEST MISMATCH ACTUAL MASK EXPECTED

001 X'00010000 N/A N/A N/A

004 X'FF012242 N/A N/A N/A

005 X'00000E01 N/A N/A N/A

006 X'00000044 N/A N/A N/A

007 X'0000002E N/A N/A N/A

008 X'00000E00 N/A N/A N/A

009 X'00000E04 N/A N/A N/A

010 X'00000E02 N/A N/A N/A

011 X'FF012242 N/A N/A N/A


Ignore everything except the mismatch data for tests 005 and 010. If either test 005 or test 010 appears in the DGN output message, the other will appear also, provided that the RAW option to the DGN command has been specified. These tests will always identify two nodes as possibly faulty.

4. Using the physical node-address table in the reference chapter of this document, translate the hexadecimal mismatch data for test numbers 005 and 010 into the node names of two nodes. For example, in the above DGN output message, 00000E01 translates into IUN32 1 and 00000E02 translates into IUN32 2. These are the nodes suspected by IMS of being faulty. In the case of single-node isolations, one of the suspect nodes will be the isolated node and the other will be the BISO or EISO node, the suspect component of which will be RAC 1 of the former or RAC 0 of the latter.

5. When one suspect node is an EISO or BISO node, manually remove its communication link (if it has an active one) from service, then remove the node from service with the RMV:NODEa b command, thus extending the isolation to include the suspect node in the isolated segment.

6. Perform maintenance on the newly isolated node.
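
Steps 3 and 4 of the procedure above amount to picking the mismatch data for tests 005 and 010 out of the RAW output and looking each value up in the physical node-address table. The Python sketch below is a minimal illustration under stated assumptions: the function and variable names are invented for this example, the row layout is inferred from the sample message shown in this section, and the lookup table holds only the two translations quoted in step 4 rather than the real table from the reference chapter.

```python
import re

# Abbreviated copy of the sample RAW output shown in this section.
SAMPLE = """\
DGN LN32 1 PH 1 STF (14 X'00000000 x'00000000)
TEST MISMATCH ACTUAL MASK EXPECTED
001 X'00010000 N/A N/A N/A
004 X'FF012242 N/A N/A N/A
005 X'00000E01 N/A N/A N/A
009 X'00000E04 N/A N/A N/A
010 X'00000E02 N/A N/A N/A
011 X'FF012242 N/A N/A N/A
"""

# Partial, illustrative table: only the translations quoted in step 4.
NODE_ADDRESSES = {"00000E01": "IUN32 1", "00000E02": "IUN32 2"}

# A data row: three-digit test number, then X' followed by eight hex digits.
ROW = re.compile(r"^\s*(\d{3})\s+X'([0-9A-Fa-f]{8})\b")


def suspect_nodes(dgn_output, table=NODE_ADDRESSES):
    """Map tests 5 and 10 to node names; unknown addresses map to 'unknown'."""
    mismatch = {}
    for line in dgn_output.splitlines():
        match = ROW.match(line)
        if match:
            mismatch[int(match.group(1))] = match.group(2).upper()
    return {
        test: table.get(mismatch[test], "unknown")
        for test in (5, 10)
        if test in mismatch
    }
```

Calling `suspect_nodes(SAMPLE)` returns the two node names given in step 4; all other rows of the message are ignored, as the procedure directs.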

Low-phase ambiguity has bearing on the procedures for treating single- and multiple-node isolations.

The procedures concerning isolations that follow are merely recommended. When circumstances, reason, or user practices dictate that you act differently, do so. The procedures are not self-sufficient but build upon the three procedures discussed above for clearing faults in nodes. The order of battle in these procedures is this: first perform maintenance on suspect nodes within the isolated segment. If this fails to dissolve the isolation, next check to see if the isolated RAC of an EISO or BISO node is suspected of being faulty. If so, perform maintenance on it after including it in the isolation. Finally, if no isolated RAC in the EISO or BISO node is suspected of being faulty, extend the isolation to include the BISO and EISO nodes, one at a time, and run diagnostics again on the chance that a fault in one of their isolated RACs is being misread by diagnostic code.
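
The order of battle described above can be read as a short decision sequence. The following Python sketch is only an illustration of that ordering: the function name, its arguments, and the returned wording are assumptions made for this example. Trying the BISO node before the EISO node and the final fallback both follow the procedures later in this section.

```python
def next_step(suspects_serviced, isolation_dissolved,
              boundary_rac_suspected, ends_tried=()):
    """Hypothetical helper: next action in the order of battle.

    suspects_serviced      -- maintenance done on suspect nodes in the segment
    isolation_dissolved    -- the isolation no longer exists
    boundary_rac_suspected -- isolated RAC of the BISO or EISO node is suspect
    ends_tried             -- boundary nodes ("BISO", "EISO") already included
    """
    if isolation_dissolved:
        return "done"
    if not suspects_serviced:
        return "perform maintenance on suspect nodes within the isolated segment"
    if boundary_rac_suspected:
        return "include the suspect BISO or EISO node in the isolation and perform maintenance on it"
    # No boundary RAC is suspected: extend one end at a time, BISO first.
    for end in ("BISO", "EISO"):
        if end not in ends_tried:
            return f"extend the isolation to include the {end} node and run diagnostics again"
    return "follow the troubleshooting section; call the CTS if the isolation persists"
```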


Guideline to Single-Node Isolations

Procedure 3-6. Responding to Single-Node Isolations

1. Recognize the existence of an isolated segment from output messages or from

information on 1105 or 1106 display pages. In some cases technicians will

themselves create an isolation, as for example when ARR turns over to technicians

a quarantined node that must be isolated before manual maintenance can be

performed on it.

2. If you are on-site, confirm that the node is isolated by checking its NT LED.

3. Follow the appropriate procedure for the isolated node from the procedures listed

below:

- Clearing Faults in Response to ARR Actions

- Manually Initiated Maintenance of Nodes

If test 5 of a phase-1 or phase-2 failure is indicated, verify your repair using the DGN command with the RAW option specified. If the isolated node still fails diagnostics, you will thereby learn whether the isolated RAC of the BISO or EISO node is also suspected by IMS of being faulty.

4. If the procedure that you employed on the isolated node in step 3 failed to end the

isolation and test 5 and test 10 of a phase-1 and/or phase-2 failure is indicated,

extend the isolation to include the BISO or EISO node identified by the mismatch

data for test 10. Use the command RMV:NODEa, b, where NODE is the node name

of the node identified by test 10 mismatch data. On small rings you may have to shift,

rather than extend, the isolation by employing the MOVFLT option of the CFR:RING

command. (If the BISO or EISO node has an active communication link, remove the

link from service before removing the node.)

5. Follow the procedure ``Clearing Faults in Response to ARR Actions'' for the newly isolated node.

6. If:


a. the procedure that you employed on the isolated node in step 3 failed to end the isolation

b. and test 5 of a phase-1 and/or phase-2 failure is not indicated,

extend the isolation to include the BISO node with the command RMV:NODEa, b, where NODE is the BISO node. On small rings you may have to shift, rather than extend, the isolation by employing the MOVFLT option of the CFR:RING command. (If the BISO node has an active communication link, remove the link from service before removing the node.)

7. With the former BISO node now in the isolated segment, again diagnose the

originally isolated node.

8. If the originally isolated node now passes diagnostics,

a. diagnose the former BISO node and, if it fails, perform maintenance

on it following the TLP instructions

b. but if it passes, change its ring-interface and node-processor circuit

pack(s), then conditionally restore it to service.

- If the former BISO node now enters the active ring (thereby dissolving the isolation), unconditionally restore the originally isolated node (which should now have become quarantined) to service, and end this procedure.

9. But if the originally isolated node still fails diagnostics after the former BISO node has

been included in the isolated segment, reduce the isolation by unconditionally

restoring the former BISO node, thereby making it once again the BISO node. (You

may have to manually return its communication link to service.)

10. Extend the isolation in the other direction to include the EISO node, and treat the

former EISO node as you did the former BISO node above.

[Figure: diagrams of the isolated segment, with nodes labeled BISO Node, Isolated Node, EISO Node, Former BISO Node, and Originally Isolated Node.]

11. If the originally isolated node still fails diagnostics after the isolation has been

extended in both directions, or if the isolation repeatedly dissolves and returns,

attempt any appropriate procedures described in the section below on

troubleshooting. Then, if the isolation still persists, call the CTS.

Guideline to Multiple-Node Isolations

Isolations of more than two nodes will often contain innocent victims, that is, nodes that are included in the isolation, not because they are faulty, but because they reside between faulty nodes. The ring interfaces and node processors of such nodes will be classified as usable. Unless technicians manually remove innocent victim nodes from service, they will remain in automatic maintenance mode, and ARR will automatically return them to service when the isolation is dissolved.

Procedure 3-7. Responding to Multiple-Node Isolations

1. Recognize the existence and extent of an isolated segment from output messages

or from information on 1105 or 1106 display pages.

2. Identify from DGN output messages the nodes within the isolation regarded by IMS

software as faulty. In nearly all cases the faulty nodes should be the isolated nodes

next to the BISO and EISO nodes. If an interior node is also indicated faulty, ignore

it until partial success in this procedure transforms it into a node next to an EISO or

BISO node.

3. If you are on-site, confirm that the nodes in question are indeed isolated by checking

their NT LEDs.

4. Choose to begin working on either the isolated node next to the BISO node or the isolated node next to the EISO node. Base your choice on the following considerations in the order shown:

a. If diagnostic failure data is given for only one of the two nodes, begin

with the node for which you have failure data.

[Figure: a multiple-node isolation, with nodes labeled BISO Node, Isolated Node next to the BISO Node, Innocent Victim Node, Isolated Node next to the EISO Node, and EISO Node.]

b. If failure data is given for both nodes, begin at the end of the

isolation that includes the nodes most important to your operation.

5. For the node you have chosen, follow the procedure ``Clearing Faults in Response to ARR Actions.'' If test 5 of a phase-1 or phase-2 failure is indicated for this node, verify your repair of the node using the DGN command with the RAW option specified. If the isolated node still fails diagnostics, you will thereby learn whether the isolated RAC of the adjacent BISO or EISO node is also suspected by IMS of being faulty.

6. If the procedure clears the fault of the isolated node next to the BISO or EISO node,

the ring should now contain only a singly-isolated node, since both the repaired node

and the innocent victim nodes will have returned to the active ring. (An exception to

this statement occurs when the isolated segment contains three faulty nodes. In this

case, restoring one of the external faulty nodes will result in a smaller multiple

isolation. If this occurs, return to the beginning of this procedure and repeat the steps

up to here, then continue on.) Treat the singly-isolated node according to the

procedure for ``Responding to Single-Node Isolations,'' and end this procedure.

7. If, however, the procedure that you employed failed to reduce the isolation and test 5

and test 10 of a phase-1 and/or phase-2 diagnostic failure are indicated, extend the

isolation to include the BISO or EISO node identified by the mismatch data for test

10. Use the command RMV:NODEa, b, where NODE is the name of the node

identified by test 10 mismatch data. On small rings you may have to shift, rather than

extend, the isolation by employing the MOVFLT option of the CFR:RING command.

(If the BISO or EISO node has an active communication link, remove the link from

service before removing the node.)

8. Follow for the newly isolated node the procedure ``Clearing Faults in Response to

ARR Actions.''

9. If the procedure clears the fault of the newly isolated node, the ring should now

contain only a singly isolated node, since the repaired node, the isolated node next

to the original BISO or EISO node, and the innocent victim nodes will have returned

to the active ring. (An exception to this statement occurs when the isolated segment

contains three faulty nodes. In this case, restoring one of the external faulty nodes

will result in a smaller multiple isolation. If this occurs, return to the beginning of this

procedure and repeat the steps.) Treat the singly-isolated node according to the

procedure for ``Responding to Single-Node Isolations,'' and end this procedure.

10. If the previous step of this procedure fails to reduce the isolation or test 5 and test 10 of a phase-1 and/or phase-2 diagnostic failure were not indicated after failure in Step 5 above, go to the other end of the isolated segment and repeat Steps 5 through 9 there.


11. If these steps fail to reduce the isolation, extend the isolation to include either the EISO or BISO node (if one has already been extended, choose the other; if neither has been extended, choose either) with the command RMV:NODEa, b, where NODE is the EISO or BISO node. (If the EISO or BISO node has an active communication link, remove the link from service before removing the node.)

12. With the former EISO or BISO node now in the isolated segment, diagnose the

isolated node next to the former EISO or BISO node; and if the isolated node next to

the former EISO or BISO node now passes diagnostics, change the ring-interface

and node-processor circuit pack(s) of the former EISO or BISO node, then

conditionally restore the former EISO or BISO node to service.

13. If the former EISO or BISO node enters the active ring (thereby reducing the

isolation), treat the remaining isolation according to the procedure for single-node

isolations.

14. If, however, the isolated node next to the former EISO or BISO node still fails

diagnostics, unconditionally restore the former EISO or BISO node to the active ring.

(If you manually removed its communication link from service, you may have to

manually return it to service.) Then extend the isolation at the other end of the

isolated segment (unless you have done so previously), and treat that end in the

same way you have treated this end.

15. If both originally faulty nodes still fail diagnostics after the isolation has been

extended in both directions, or if the isolation returns after nodes have been restored,

follow any appropriate procedures described below in the section on

troubleshooting. Then if the problem still persists, call the CTS.

[Figure: diagrams of a multiple-node isolation, with nodes labeled BISO Node, Isolated Node next to the BISO Node, Innocent Victim Node, Isolated Node next to the EISO Node, and EISO Node; a second diagram marks some of these nodes as Former.]

Responding to Ring Down

IMS in the 3B21D and IMS in the ring are independent of one another to the extent that either can fail while the other remains in operation. This section is concerned with the problems that confront technicians when the ring subsystem fails because of ring conditions and cannot be recovered by automatic means.

The ring subsystem will fail when the 3B21D cannot communicate with the active ring through any RPCN. This condition is most likely to occur in a two-RPCN environment when both RPCNs fail or when the active RPCN fails after the other RPCN had been manually removed from service. In a multiple-RPCN environment, the condition is most likely to occur because of a condition in the 3B21D that would simultaneously disable all RPCNs.

The ring subsystem will also fail if the data length of the active ring becomes shorter than the maximum message length for which the system was engineered. Small rings are susceptible to this problem. The problem is brought about by the ring fragmentation associated with an isolation. An isolation that includes padded interframe buffers may shorten the active ring severely. Padded interframe buffers are redundantly employed in pairs at opposite sides of the ring. Thus a single-node isolation would not usually include both pairs. Still, interframe buffers exist under a kind of quadruple jeopardy, because if either member of a pair fails, the pair fails and must be isolated, and because a pair must also be isolated if either of the nodes adjacent to it fails. Thus, while it is unlikely that both pairs will become isolated, it has happened.

Finally, a ring may go down and stay down because of an intermittent fault that confuses initialization tests, or a ring may repeatedly go down because of a fault that is transparent during initialization tests but not during normal operations.

The following procedure for recovering a ring that is down is intended as an instructional paradigm only. Technicians should freely depart from it as circumstances, reason, or user practices suggest. In particular, technicians should not manually intervene until they are certain that IMS software has exhausted all its efforts to recover a down ring. Such recovery efforts are ordinarily directed by user software. Therefore, technicians should consult user documentation to learn how to tell when automatic recovery efforts have ended.


Procedure 3-8. Ringdown Response Procedure

1. Following the termination of automatic recovery efforts, immediately attempt to bring

the ring up by submitting it to a level-3 and, if that fails, to a level-4 IMS initialization.

If it is important to the user that IMS in the 3B21D not abort itself should ring

initialization fail, initialize the ring at level 4 using manual ring mode, as explained

below.

2. If in response to level-4 initialization the ring fails to come up (as indicated by a REPT RING INIT output message) or to stay up (as indicated by a REPT RING CFR output message), determine the cause of its failure by examining the output messages. The REPT RING INIT messages in question are of two types. One type indicates the reason the ring failed to come up. These reasons include ``no standby RPC nodes available'' and ``no ring segment acceptable for active ring use,'' with the latter indicating either that no candidate for the active ring segment contains an RPCN or that no candidate is long enough to satisfy the requirement of minimum length. In the absence of the first message, the second message may be understood to indicate that the problem is length. The second type of REPT RING INIT message identifies nodes that tests conducted during initialization have determined to be faulty.

3. If RPCN failure is the apparent cause, replace all circuit packs with known good

packs in an RPCN that was not isolated before the ring went down. Then initialize

IMS at level 4. If this attempt fails, replace all circuit packs with known good packs in

another RPCN.

4. If ring length is the apparent cause, identify faulty nodes by examining the second

type REPT RING INIT message. Mentally construct the population and distribution

of nodes within the portion of the ring that is likely to become the isolated segment.

Ask yourself the following questions:

- Are any nodes adjacent to padded interframe buffers listed as faulty?

    - If so, are they all external nodes (adjacent to the BISO or EISO nodes) within the portion of the ring likely to become the isolated segment, or is one of them an internal node within that portion?

    - If not, are they innocent victim nodes within the candidate for the isolated segment?


5. If nodes adjacent to padded interframe buffers are faulty and one of them is likely to

be an external node in an isolated segment, replace (if you are in an emergency

situation) the ring-interface and node-processor circuit pack(s) on both nodes

adjacent to the interframe buffers and replace both interframe buffers. Then initialize

the ring at level 4.

6. If nodes adjacent to padded interframe buffers are internal nodes (either faulty or

innocent-victim) in the candidate for the isolated segment, approach the problem

following the procedure described above for responding to multiple isolations

(though of course under ring down conditions you will not be able to conduct

diagnostics). Then, if a node adjacent to padded interframe buffers becomes a

probable external node in a candidate for the isolated segment, treat it as in 5 above.

7. Study the MOVFLT option of the CFR:RING command. It may be useful in resolving

an isolation on a very small ring.

8. If none of the above approaches succeeds in recovering the ring, force faults by unseating various ring circuit packs and initializing at level 4. This is a desperate

attempt by trial and error to force an isolation in the hope of getting the ring up. Once

the ring is up, diagnostics can be run on the isolated portion.
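
The reading of REPT RING INIT reasons in step 2, which selects between steps 3 and 4, can be sketched as follows. This Python sketch is illustrative only: the function name and the returned labels are assumptions made for this example, and the matching relies solely on the two reason phrases quoted in step 2, not on the full format of the output message, which this section does not show.

```python
NO_STANDBY = "no standby rpc nodes available"
NO_SEGMENT = "no ring segment acceptable for active ring use"


def apparent_cause(init_messages):
    """Hypothetical helper: classify why the ring failed to come up.

    init_messages -- iterable of REPT RING INIT message texts
    """
    text = " ".join(init_messages).lower()
    if NO_STANDBY in text:
        return "RPCN failure"        # proceed as in step 3
    if NO_SEGMENT in text:
        # Without the first message, the second is taken to mean length.
        return "ring length"         # proceed as in step 4
    return "undetermined"
```

When both phrases appear, the sketch reports RPCN failure, since step 2 reads the second message as a length problem only in the absence of the first.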

Employing Manual Ring Mode

Manual ring mode allows the ring to be fully initialized without an accompanying initialization of IMS in the 3B21D. Ordinarily full ring initialization occurs as a stage in level-4 (BOOT) IMS initialization. Under certain circumstances and for certain users, however, the disruption that IMS initialization entails in the operation of the 3B21D may be unacceptable as, for example, when the ring is down or when ring hardware is being retrofitted to a system that has IMS as a subsystem. In these cases, the ring may be initialized manually.

Procedure 3-9. Manual Initialization of the Ring

Before manual initialization, the ring must be down and enough hardware must be in place to satisfy the requirement for minimum ring size. To initialize the ring manually:

1. Consult ``Setting the ECD Flag for Manual Ring Mode'' in Appendix B, Ring Maintenance Reference Material.

2. Set the ECD Manual Ring Mode flag as described in the above reference. IMS is

programmed to abort if, during initialization, the ring fails to come up. The ECD

manual ring mode flag inhibits this response.

3. If you are employing manual ring mode for a new installation, or if you are experiencing ring down and no RPCNs are in the standby state, restore as many

RPCNs as possible. When RPCNs are restored with the ring down, they will be in the

STBY, not the ACT, state. This state is expected and sufficient for moving on to Step 4.

4. Enter the command CFR:RING

5. Expect to receive a form of the REPT RING INIT message indicating that the

initialization was or was not successful and a CFR RING COMP message indicating

that the program has completed. Forms of the REPT RING FLT message may also

appear to identify nodes that failed to participate in the initialization.

6. If the initialization was successful, reset the manual ring mode flag to null.

7. If the initialization was not successful, leave the ECD flag set for manual ring mode

and use the information you gained in Step 5 to troubleshoot the ring in the manner

described in ``Responding to Ring Down.''

Ring Application Processor Critical Maintenance Procedure

The ring application processors (RAPs) of the CDN-I must be manually diagnosed

and maintained using special procedures. Automatically-initiated diagnostics of the RAP sometimes produce deceptive results. If RAP firmware is not executing, diagnostics run on RAP circuit packs (phases 42 through 53) will provide

erroneous data about phase and circuit pack failures; yet technicians cannot know from ROP output that the data they are receiving is incorrect. They can, however,

receive correct data if, during diagnostics, they are present at the RAP housing and observe the RAP LEDs.

Each RAP circuit pack is equipped with an LED that turns on to indicate that the pack has failed a diagnostic phase. In addition, each of the LEDs on certain packs

turns on when the RAP is initializing and then turns off when initialization tests confirm that the firmware is executing. The LEDs, thus, supply a means by which

technicians can observe the progress of RAP diagnostics and of RAP initialization, provided they are present at the RAP housing as these actions

occur. And they can be present, because power and diagnostic switches located

on each RAP power control interface and display (PCID) board allow them to

control these functions locally. Thus RAP initialization and diagnostics may be run centrally by the host or locally by means of PCID-board switches.

A RAP failure will usually be tested initially by central diagnostics at the request of ARR, and ROP output will indicate the phases that failed and the circuit pack(s)

suspected of being faulty. The procedure described below for fully diagnosing a RAP fault begins by tentatively accepting the results of the automatic diagnostics

and then proceeds to confirm them. (Notice in the procedure the requirement that a CDN be quarantined when its RAP circuit packs are diagnosed or replaced.)

Procedure 3-10. Manually Confirming RAP Diagnostic Results

1. Remove the CDN from service by quarantining it.

2. Turn off RAP power by toggling the top switch on the PCID board.

3. Replace the first circuit pack listed in the TLP.

4. Test as follows to determine that RAP firmware is capable of initializing the RAP:

Turn on RAP power, observing the LEDs on the following non-MASA circuit packs.

• The node processor interface (NPI) circuit pack.

• The central controller support (CCS) circuit pack.

• The central controller cache (CCC) circuit pack.

• All equipped main store controller (MASC) circuit packs.

When power is restored the LED of each pack should come on, go off, come back on, and finally go off; and this sequence of LED blinks should be completed for all

packs within [18 + (2 × the number of MASA boards) +/-2] seconds for systems with the 2-Mbyte memory and within [18 + (20 × the number of MASA boards) +/-2]

seconds for systems with the 16-Mbyte memory. If an LED fails to come on initially, turn off RAP power, replace the circuit pack, and repeat this step. If any LED fails to follow the full sequence of blinks, or if all LEDs fail to complete the

sequence of blinks within the allotted time, go to Step 7 of this procedure.

5. This step manually diagnoses the node. The following information is helpful in

understanding it:

When diagnostics begin, the LED on each non-MASA circuit pack turns on and stays on until the pack has passed diagnostics. Moreover, diagnostics run on

non-MASA packs early-terminate. Therefore, when a non-MASA pack fails

diagnostics, the diagnostic routine ends and the LEDs on the failed pack and on

all non-MASA packs that have not yet been diagnosed stay on. MASA LEDs, on the other hand, may or may not come on when diagnostics begin, but they will

come on if their circuit packs fail diagnostics. Moreover, MASA diagnostics do not

early-terminate. Therefore, it is possible during a single diagnostic routine for a MASA pack to fail and for another pack, perhaps a non-MASA pack further

downstream, to fail as well.

Depress the DIAG switch on the PCID board. All non-MASA LEDs should come on, then go off within 6 minutes for systems with the 2-Mbyte memory and within 4

minutes for systems with the 16-Mbyte memory. (If more than one MASC memory group is present, add 2 minutes and 40 seconds for each additional group.) If any LED fails to come on initially, turn off RAP power, replace the circuit pack, and

repeat this step. If any LED fails to go off in the time indicated, turn off RAP power, replace the circuit pack, and repeat this step. If more than one LED fails to go off

in the time indicated, turn off RAP power, replace the first circuit pack in the following list whose LED is on, and then repeat this step.

a. CCS

b. Memory group 0, that is, MASC_0 and all MASA packs associated

with it. (MASC diagnostics depend upon memory from the first memory board, MASA_0, so a fault in one pack may under some

circumstances cause the other to fail diagnostics. Therefore, if the situation here or elsewhere indicates that either of these related

packs should be replaced but replacing it does not solve the problem, try reinstalling the original pack and replacing the other pack.)

c. CCC

d. Each additional equipped memory group in numerical order.

e. NPI

If, upon repetition, a replaced circuit pack fails to pass diagnostics, leave RAP

power off, quarantine the node, and contact the CTS.

6. If Step 5 succeeded, unconditionally restore the node to service and end this

procedure.

7. Systematically search for the fault that is preventing initialization by following Steps 7

through 23.

Turn off RAP power. Reinstall the original circuit pack removed in Step 3.

8. Unplug the following circuit packs by opening their latches and pulling them out

about one inch:

• All MASC packs except MASC_0

• The NPI pack

• All MASA packs in memory group 0 except MASA_0.

9. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off

in 33 to 43 seconds, go to Step 24.

10. Turn off RAP power and replace the CCS pack.

12. Turn off RAP power. Reinstall the original CCS pack. Replace the CCC pack.

14. Turn off RAP power. Reinstall the original CCC pack. Replace the MASC pack.

16. Turn off RAP power. Reinstall the original MASC pack. Replace the MASA_0 pack.

18. Measure the voltage at each power converter (PWRB on the main unit and PWRC

on the growth unit) from + pin 056 to gnd pin 032. If the voltage is below the +5.1 to

+5.3 volt range, turn RAP power off and replace the appropriate converter.

19. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in

33 to 43 seconds, go to Step 24.

20. Steps 20-23 attempt to identify a problem that is not associated with the failure of a

circuit pack.

a. Turn off RAP power.

b. Reinstall the original MASA_0 pack.

c. Check backplane for shorted pins.

d. Check growth unit cables and bus terminators for proper installation, adjusting as needed.

e. Restore RAP power and observe the LED on the CCS pack. If it goes on, off, on, off in 33 to 43 seconds, go to Step 24.

21. If the RAP is not equipped with a growth unit, go to Step 23. Otherwise, turn off RAP

power and remove the basic-unit ends of the six growth cables, leaving them

hanging free. Remove the six terminator resistors from the growth unit and place

them in the positions formerly occupied by the basic-unit ends of the six growth

cables.

in 33 to 43 seconds, the problem is in the growth-unit backplane. Go to Step 24.

23. Leave the node quarantined, call the CTS, and end this procedure.

24. Manually diagnose the node as follows:

a. Depress the PCID DIAG switch.

b. Check that the CCS, CCC, and MASC_0 LEDs come on.

c. Check that the CCS LED goes off in 25 to 35 seconds for systems

with the 2-Mbyte memory and in 35 to 45 seconds for systems with the 16-Mbyte memory.

d. Check that the LEDs on the following circuit packs all go off in the order listed within 2 minutes for systems with the 2-Mbyte memory and within 75

seconds for systems with the 16-Mbyte memory.

1. MASA_0

2. MASC_0

3. CCC

Check that the yellow fail light on the PCID has gone out.

e. If the LED on any of the four circuit packs fails to go off on time or in

the indicated sequence, or if the PCID fail light fails to go off, turn off RAP power, replace the faulted pack, turn on RAP power, and repeat this step. If the repetition is unsuccessful, leave the node quarantined and call the CTS.
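
The time limits quoted in Steps 4 and 5 of Procedure 3-10 can be worked out before going to the RAP housing. The following Python sketch is illustrative only and is not part of any IMS tool. The function names are hypothetical, and the sketch assumes that the bracketed expressions in Step 4 multiply the per-board constant (2 or 20) by the number of MASA boards.

```python
def init_blink_window(masa_boards, memory_mbytes):
    """Return (min_s, max_s) allowed for the Step 4 LED blink sequence."""
    # Seconds added per MASA board: 2 for 2-Mbyte systems, 20 for 16-Mbyte.
    per_board = {2: 2, 16: 20}[memory_mbytes]
    nominal = 18 + per_board * masa_boards
    return nominal - 2, nominal + 2


def diag_limit_seconds(masc_groups, memory_mbytes):
    """Return the Step 5 limit, in seconds, for non-MASA LEDs to go off."""
    # Base limit: 6 minutes for 2-Mbyte systems, 4 minutes for 16-Mbyte.
    base = {2: 6 * 60, 16: 4 * 60}[memory_mbytes]
    # Each MASC memory group beyond the first adds 2 minutes 40 seconds.
    extra_groups = max(masc_groups - 1, 0)
    return base + extra_groups * (2 * 60 + 40)


print(init_blink_window(4, 2))    # window for four MASA boards, 2-Mbyte
print(diag_limit_seconds(2, 16))  # limit for two memory groups, 16-Mbyte
```

A technician could tabulate these values for the equipped configuration and keep them with the site records, so that the observed LED timing can be compared against the allotted time on the spot.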

Recognizing and Finding Intermittent Faults

Faults that occur in IMS hardware may be hard, transient, or intermittent. Hard

faults permanently disable a component and are easy to find. IMS automatic maintenance software dependably locates hard faults, removes them from the system, and directs technicians to repair them. One-time transient faults, if not

easy to find, are easy to deal with. They are caused by temporary hardware problems or glitches in software. Usually they are corrected by the IMS practice of

reinstating the ring or a component after a first failure. By contrast, intermittent or recurring transient faults are often neither easy to find nor easy to deal with. If the

interval between their occurrences is fairly short and fairly regular, IMS software can

usually locate them. But if the interval between their occurrences is long or very irregular,

they may escape the IMS net. In such cases, manual records kept by technicians are the indispensable tool for identifying, finding, and correcting them.

How will an intermittent fault show up? In a ring interface or IRN node processor, an intermittent fault may appear in several guises: as repeated losses of token, as

successful ring restarts following instances of blockage, as a node that EAR isolates but ARR returns to service because it passes diagnostics, as a node that

ARR turns over to technicians because it has violated the fourth-time rule, or as a combination of these automatic responses. It could also appear as a repeated

failure of EAR recovery level 3 to find a fault that levels 1 and 2 had attempted unsuccessfully to isolate. Again, the existence and history of faults of this kind are likely to be caught only in the manual records of technicians.

On nodes suspected of having intermittent faults, perform the following checks:

• Visually inspect the node and its housing. Look for poorly seated circuit packs, backplane damage or improper grounding, and poorly seated cable

connections.

• Run diagnostics on the node in the repeat mode.

• Tap on the front of the circuit packs and apply pressure to the backplane

with your thumb in an effort to stress cracks and stimulate an intermittent fault to recur.

• Move the circuit packs of a suspected node one by one to another location to see which hardware (if any) the intermittent failure follows. (Make sure you keep careful records of each move.)

IMS attempts to recover automatically from software faults. Thus no regular

software maintenance is required of the Craft. Intermittent faults are more likely to

be in hardware than in software. Nevertheless, when a troubled component consistently passes diagnostics, the fault could be in software.

Other Suggestions for Troubleshooting

The following are hints and advice based upon developer experience.

New Circuit Pack; Old Failure

Technicians are sometimes faced with the following anomaly. A node continues to fail diagnostics after its circuit packs have been replaced, yet no problem is visible

in the backplane or ring bus wiring. Faced with this problem, technicians should

consider that the fault might lie in the isolated RAC of the BISO or EISO node. An explanation follows:

Messages, including messages containing diagnostic code, are sent from the

3B21D to an isolated segment of the ring through the BISO or the EISO node. BISO and EISO nodes have one RAC participating in the active-ring segment and

one RAC participating in the isolated-ring segment. Messages destined for the

isolated segment are read from the active ring by the active-ring RAC, then transmitted by the node processor to the isolated-ring RAC, which writes them to

the isolated segment of the ring. A fault in the isolated-ring RAC of either the BISO or EISO node might go undetected, since it would not affect the transportation of

messages on the active ring and could show up misleadingly as a diagnostic failure in the isolated node, thereby creating the maintenance anomaly described above.

Therefore, technicians who face this problem should consider extending the isolation to include the current BISO and EISO nodes and running diagnostics on them.

Unconditional Restorals

Do not unconditionally restore a node unless you are certain it is without faults.

Even when you are certain, do not unconditionally restore a node that has been powered down, that contains a ring-interface circuit pack that has been reset, or

that exists in isolation with a node that has had a ring-interface circuit pack reset without first running diagnostic phases 1 and 2 on it. When a node or a circuit

pack has been powered down, the status registers of its ring-interface hardware may become improperly set, and an unconditional restoral of the node will likely

result in a ring transport error and an isolation. Diagnostic phases 1 and 2 reset all ring-interface status registers to their proper positions.

Be aware that some correlation exists between unexplained losses of token and the number of out-of-service nodes, because the node processors of quarantined

and isolated nodes cannot fulfill their important and unassignable role in error detection and reporting.

Avoiding Trouble

Be careful not to leave the system unattended with ARR or CNR inhibited.

Recording Trouble

When troubleshooting a ring-related problem, frequently enter the OP:RING;DETD command as a way of providing, on the ROP output, sequential records of ring status. Such records may be useful during postmortems. If a

problem is likely to be referred to developers at Bell Laboratories, save the current RPTERR0 and RPTERR1 log files in /etc/log.

Keep records on all circuit pack replacements and failures.

Keep records on all indications of transient and intermittent faults identifying, if

possible, the locations where they occur. Remember that a transient fault may be an intermittent fault in its infancy.

New Installations or Ring Growth

New installations may wish to use the manual ring mode, which is explained

above. Avoid growing nodes on a live system that is experiencing unexplained transient failures. When installing a new IMS ring or growing a new node, verify that the hardware specified in the ECD UCB hv field matches the hardware that is

physically present. Also execute full diagnostics (automatic and optional) on every new ring node, resolving problems until diagnostics indicate ATP. If you encounter

troubles, be suspicious of cables. Look for poor or open connectors, for cables connected to the wrong place, and for improper backplane grounding.

Examples of Ring Maintenance

This chapter exemplifies some of the maintenance principles and practices that were formulated in the previous two chapters. Its purposes are to familiarize technicians with the IMS ROP output, to suggest ways for technicians to monitor and interact with automatic maintenance, and to provide technicians with realistic examples of both manual and automatic maintenance activities. Most of the examples represent common scenarios. A few are special cases. Together they compose an IMS tutorial.

Each example is preceded by an introduction. The examples themselves are composed of two elements. A literal reproduction of ROP output in the left column of the page records maintenance-related events occurring in the ring subsystem. A commentary in the right column of the page provides a gloss on the adjacent ROP output. The gloss is selective and cumulative. It usually avoids explaining features that previous entries have explained.

The examples composing this chapter incorporate two recently developed features, ring restart and automatic TLP output. Readers whose systems do not have ring restart should ignore the level-0 recovery efforts in the examples and begin with the level-1s. Readers without the TLP feature may use the DGN output messages to identify probable faulty equipment.

A convention of this chapter is that data in ROP output messages that is not ordinarily used by technicians will be omitted and replaced by rows of periods.

Responses to Single, Ring-Related Faults

The following four examples of ring recovery occur in response to single faults of

the kind that disrupt the transportation of messages on the ring.

Automatic Recovery from a Transient Fault by EAR

Level 0

IMS software responds to faults that disrupt the transportation of messages on the ring with the EAR escalative recovery strategy. The first or 0 level of this strategy

consists of restarting the ring in conformity with its structure prior to the fault. Such a response will usually recover the ring subsystem from a transient fault, as it

does in this example. Technicians should record the occurrence and, if possible, identify the location of transient faults.

This example occurs on the following ring:

REPT RING CFR

LEVEL 0 RING CONFIGURATION INITIATED BY EAR

NORMAL CONFIGURATION REQUESTED

0 1 4 3600000..........................................(4030614766)

Announces the onset of a level-0 recovery attempt, stimulated by EAR’s receipt of one or more error messages indicating a ring-related fault. The onset time of the attempt appears in milliseconds in parentheses on the bottom line. Other numbers on the bottom line pertain to the ring error threshold. The first digit indicates EAR’s mode, where 0 = ``threshold not exceeded'' and 1 = ``threshold exceeded.'' The second digit identifies the number of ring errors that have occurred within the current threshold interval. The third digit is the user-specified number of errors per threshold interval that causes the threshold to be exceeded. And 3600000 is the user-specified threshold interval in milliseconds. When the second number equals the third, the threshold has been exceeded.
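
The field layout just described can be read mechanically when reviewing saved ROP output. The following Python sketch is a hypothetical helper, not an IMS utility. It assumes the bottom line has exactly the shape shown in this example: four integers, a run of periods, and the onset time in parentheses.

```python
import re

# Four integers, a run of filler periods, then the onset time in parentheses.
THRESHOLD_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\.*\((\d+)\)\s*$"
)


def parse_threshold_line(line):
    """Split the bottom line of a REPT RING CFR request into named fields."""
    match = THRESHOLD_LINE.match(line)
    if match is None:
        raise ValueError("not a threshold line: %r" % line)
    mode, errors, limit, interval_ms, onset_ms = map(int, match.groups())
    return {
        "threshold_exceeded_mode": mode == 1,
        "errors_in_interval": errors,
        "errors_allowed": limit,
        "interval_ms": interval_ms,
        "onset_ms": onset_ms,
        # The text says the threshold is exceeded when the two counts are
        # equal; ">=" is a defensive assumption of this sketch.
        "at_threshold": errors >= limit,
    }


fields = parse_threshold_line(
    "0 1 4 3600000..........................................(4030614766)"
)
print(fields["errors_in_interval"], "of", fields["errors_allowed"],
      "errors in", fields["interval_ms"] // 60000, "minutes")
```

For the line in this example the helper reports one error against an allowance of four per 60-minute interval, so the threshold has not been reached.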

REPT RING CFR

RING CONFIGURATION ESTABLISHED (455 ms)

NORMAL CONFIGURATION, NO NODES ISOLATED

.................................(4030614777)(4030615120)

Announces a successful restart of the ring. Thus no manual response is required. The 455 ms figure is the duration of ring silence resulting from the configuration attempt, and in parentheses are the times when the ring configuration job started and was completed.

00AAAAAAAAAAAA.... 01................ 02................

30................ 31.AAAAAAAAAAAAAAA 32AAAAAAAAAAAA....

63.AAAAAAAAAAAAAAA

CMD FUNCTION

400 OP RING DETAILED

RAC PARITY/FORMAT ERROR DETECTED, IUN31 11

.......................................................................

....................................................(4030614653)

IMS in the 3B21D received this and the following two ring transport error messages (at the times in parentheses) as a result of the fault that stimulated the above recovery attempt. This message (the first to arrive) identifies the error type and the node and RAC associated with the error. Notice that ring transport error messages appear on the ROP following the messages announcing the system response to the error.

BLOCKAGE DETECTED, IUN31 9 RAC 0

.......................................................................

.....................................................(4030614663)

The fault spawned two instances of blockage, one from this, the second node upstream of the faulty node...

REPT RING TRANSPORT ERR

BLOCKAGE DETECTED, IUN31 10 RAC 0

.......................................................................

.....................................................(4030614667)

and one from this, the first node upstream of the faulty node. IUN 31 9 detected blockage before IUN 31 10 could drain the ring. IUN 31 10 must have detected blockage prior to IUN 31 9, but IUN 31 9’s ring transport error report reached the 3B21D first.
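
Because error reports print after the messages that announce the system response, it can help during a postmortem to put the messages back in time order. The Python sketch below is a hypothetical aid, not an IMS tool. It sorts condensed one-line forms of the messages in this example by the first parenthesized millisecond time each one carries; the condensed strings are an assumption made for illustration, since real ROP messages span several lines.

```python
import re

# A parenthesized run of digits, such as (4030614653).
TIME_MS = re.compile(r"\((\d+)\)")


def first_time_ms(message):
    """Return the first parenthesized millisecond time in a message."""
    found = TIME_MS.search(message)
    if found is None:
        raise ValueError("no time field in: %r" % message)
    return int(found.group(1))


def chronological(messages):
    """Return the messages ordered by their first time field."""
    return sorted(messages, key=first_time_ms)


# Condensed forms of the messages in this example, in ROP print order.
rop_order = [
    "LEVEL 0 RING CONFIGURATION INITIATED BY EAR (4030614766)",
    "RING CONFIGURATION ESTABLISHED (4030614777)(4030615120)",
    "RAC PARITY/FORMAT ERROR DETECTED, IUN31 11 (4030614653)",
    "BLOCKAGE DETECTED, IUN31 9 RAC 0 (4030614663)",
    "BLOCKAGE DETECTED, IUN31 10 RAC 0 (4030614667)",
]

for message in chronological(rop_order):
    print(message)
```

Sorted this way, the parity/format error and the two blockage reports precede the level-0 request, which matches the sequence of events described in the commentary.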

Manual Recovery from a Hard Fault

After a hard fault, EAR level-0 will ordinarily try unsuccessfully to restart the ring.

Then based upon its analysis of ring transport error messages, EAR level-1 will

attempt to locate and isolate the fault. If EAR succeeds, ARR will then attempt to restore the isolated node conditionally and, if it fails, will change the node maintenance mode to manual, thereby directing technicians to perform

maintenance on it. This example is composed of the scenario just described.

REPT RING CFR

.....................................................(4030772385)

Prompted by a ring transport error report,

EAR level-0 requests that the ring config

module restart the ring.

REPT RING CFR

RING CONFIGURATION ATTEMPT FAILED 17

COULD NOT ESTABLISH A NORMAL RING CONFIGURATION

.......................................................................

(4030772397)(4030772536)

The continuity test run by the ring config

module failed, an indication that the fault

is probably hard.

REPT RING CFR

ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED

0 2 4 3600000..................................(4030772561)

EAR level-1 requests that the ring config module isolate the node indicated as faulty by the ring transport error messages.

OP:RING;DETD

RING STAT: ACTIVE

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

REPT RING CFR

RING CONFIGURATION ESTABLISHED (658 MS)

BISO NODE = IUN31 10, EISO NODE = IUN31 12

(4030772580)(4030772942)

IUN31 11 is isolated with IUN31 10 acting

as BISO node and IUN31 12 acting as

EISO node.

RAC 0.

................................................(4030772270)

BLOCKAGE DETECTED, IUN31 10 RAC 0.

................................................(4030772278)

BLOCKAGE DETECTED, IUN31 9 RAC 0.

................................................(4030772282)

REPT ARR AUTORST

ARR COND RST FOR IUN31 11 STARTED

ARR requests that MIRA conditionally restore the isolated node. This is ARR’s check that the removal and isolation of the node was necessary. The attempt will generate diagnostic data that the technician should use if called upon to perform maintenance on the node.

RST TERM LN31 11 TASK 3 MSG STARTED

RTR message announcing that ARR’s restoral request is on the active queue and being processed.

The 1105 display page now looks as follows:

RMV IUN31 11 STOPPED 5

RTR message announcing that it could not remove IUN31 11 from service (because EAR had done so previously).

DGN IUN31 11 PH 1 STF (9 X’00000000 X’00000000)

004...........................................................

005 X’00000dfb................................................

006...........................................................

008...........................................................

009...........................................................

Indicates that during phase 1 diagnostics, some tests (nine in all) failed and none (X’00000000 X’00000000) were skipped. IUN31 11 is not necessarily the node in which phase 1 failed, but the node specified in ARR’s diagnostic request. Since phases 1 and 2 test all RACs in the isolated segment, the fault that produces a phase 1 or 2 failure may not reside in the specified node. The failure of test 005 indicates that, in this instance, low-phase ambiguity exists; in other words, that both a RAC of the isolated node and a RAC of either the EISO or BISO node are suspected of being faulty. See the ``Low-Phase Ambiguity'' section in this chapter.

ARR RESTORE COND IUN31 11

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

CMD FUNCTION

DGN IUN31 11 PH 2 STF (10 X’00000000 X’00000000)

002...........................................................

004...........................................................

005 X’00000dfb................................................

006...........................................................

007...........................................................

Phase-1 diagnostics test the isolated segment beginning at the BISO node, and phase-2 diagnostics test it beginning at the EISO node. In the case of single-node isolations, the two phases should report failure data for the same node(s), but in the case of multiple isolations they usually report failure data for different nodes.

DGN IUN31 11 terminated at ph 2 stmnt 36 after test 17

Indicates the point in the diagnostic routine at which execution terminated.

ANALY:TLPFILE: IUN31 11 SUMMARY DATA MSG

STARTED

TLP: IUN31 11 PH=1....................................................

TLP: IUN31 11 PH=2....................................................

TLPFILE COMPLETED

Summarizes diagnostic failure data.

Phases cited are those that failed; but

because phases 1 and 2 are at issue,

IUN31 11 is not necessarily the location

of the failure.

DGN IUN 31 11 COMPLETED STF (19........................)

ANALY TLPFILE IUN31 11 TLPSRCH MSG IP

TLPFILE #983090

Short form of this message. The longer

form is next.

ANALY TLPFILE IUN31 11 SUSPECT FLTY EQUIP-

CODE GRP MEM CONT POS WT NOTE

UN303 31 11 -- -- 10 --

CABLE -- -- -- -- 10 3

This data is printed only after a test fails and only if the TLP option was specified in the DGN command (as it always is by ARR). The entry lists, in weighted (WT) order, equipment suspected of being faulty. The WT is a number between 1 and 10. The higher the WT, the greater the likelihood of the equipment being faulty. Because ARR does not specify the RAW option of the DGN command, failure data for test 010 is not given. (See the ``Low-Phase Ambiguity'' section of this chapter.)
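
When several suspect lists are being compared across repeated failures, the weighted ordering can be rebuilt from saved output. The Python sketch below is a hypothetical aid that assumes each suspect row has exactly the seven whitespace-separated columns shown in the heading line of this example (CODE GRP MEM CONT POS WT NOTE). It is not part of the TLP feature.

```python
def parse_suspects(rows):
    """Parse TLP suspect rows (CODE GRP MEM CONT POS WT NOTE) into dicts."""
    columns = ("code", "grp", "mem", "cont", "pos", "wt", "note")
    suspects = []
    for row in rows:
        fields = row.split()
        if len(fields) != len(columns):
            raise ValueError("unexpected row: %r" % row)
        entry = dict(zip(columns, fields))
        entry["wt"] = int(entry["wt"])
        suspects.append(entry)
    # sorted() is stable, so equally weighted entries keep their printed order.
    return sorted(suspects, key=lambda entry: entry["wt"], reverse=True)


# The two suspect rows printed in this example.
rows = [
    "UN303 31 11 -- -- 10 --",
    "CABLE -- -- -- -- 10 3",
]
for entry in parse_suspects(rows):
    print(entry["code"], entry["wt"])
```

Keeping the printed order for equal weights means the first-listed pack is still the first one to consider replacing.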

RST IUN31 11 STOPPED 1 Because of diagnostic failure (error code

DGN IUN31 11 STF..............................................MSG

REPT ARR AUTORST

ARR COND RST FOR IUN 31 11 FAILED

Confirms that ARR’s restoral request has failed. Many IMS processes write to the ROP, at times resulting in some redundancy.

OP:RING;DETD Manual input message.

RING STAT: ISOLATED SEGMENT

BISO: IUN31 10 EISO: IUN31 12

The subnumber 4 under the i in the above output message indicates that the ring interface of IUN31 11 is faulty. The numbers used in this way have the following meanings:

1 = manual mode

2 = RI QUSBL or NP faulty or untested

3 = combination of 1 and 2

4 = RI faulty or untested

7 = combination of 1, 2, and 4
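
The listed values behave like a sum of three flags (1, 2, and 4). The Python sketch below decodes a subnumber on that reading. Treating the value as a bit mask is an assumption of this sketch: the list above names only 1, 2, 3, 4, and 7, so the decoding of 5 and 6 is an extrapolation, and the function name is hypothetical.

```python
# Flag meanings taken from the list above; the bit-mask reading is assumed.
STATUS_BITS = {
    1: "manual mode",
    2: "RI QUSBL or NP faulty or untested",
    4: "RI faulty or untested",
}


def decode_subnumber(value):
    """Return the list of conditions that a display subnumber combines."""
    if not 1 <= value <= 7:
        raise ValueError("subnumber out of range: %r" % value)
    return [meaning for bit, meaning in STATUS_BITS.items() if value & bit]


for subnumber in (4, 3, 7):
    print(subnumber, decode_subnumber(subnumber))
```

On this reading the subnumber 4 in the example decodes to the single condition that the ring interface is faulty or untested, which agrees with the commentary.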

OP:RING, IUN31 11 Manual input message.

OP:RING IUN31 11 COMPL

IUN31 11: MJ = OOS; NM = MAN; RI = FLTY ; NP =

IN ISOL SEG

Like the TLP and OP:RING;DETD outputs above, this data does not reflect the low-phase ambiguity.

Following the procedures ``Responding to Single Node Isolations'' and ``Clearing Faults in Response to ARR Actions,'' a technician replaces circuit pack UN303 in IUN31 11...

RST:IUN31 11 and conditionally restores the node.

00AAAAAAAAAAAA.... 01................ 02................

30................ 31.AAAAAAAAAAiAAAA 32AAAAAAAAAAAA.... 4

63.AAAAAAAAAAAAAAA

Automatic Recovery from a Transient Fault by ARR

In this example a fault triggers a level-0 recovery attempt that fails; EAR level 1

then isolates the apparently faulty node; and ARR's attempt to restore the node succeeds. Though the fault triggers two levels of EAR responses, no manual action is required other than to record the occurrence and location of the problem

as a probable transient fault.

RST IUN31 11 TASK 4 MSG STARTED

RMV IUN31 11 STOPPED 5

DGN IUN31 11 COMPLETED ATP MESSAGE IN

PROGRESS

Repaired IUN31 11 now passes diagnostics.

REPT RING CFR

NORMAL CONFIGURATION, NO NODES ISOLATED

The isolation is dissolved automatically

as IUN31 11 is restored.

(4031118365)(4031118740)

RST IUN31 11 COMPLETED IUN31 11 has been returned to the active

ring, pumped with operational code and

placed in execution.

DGN IUN31 11 ATP MESSAGE COMPLETE

OP:RING;DETD

RING STAT: ACTIVE

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

REPT RING CFR

NORMAL CONFIGURATION REQUESTED.

0 3 4 3600000................(4031349825)

REPT RING CFR

COULD NOT ESTABLISH A NORMAL RING CONFIGURATION

.....................................................

(4031349837)(4031350005)

REPT RING CFR

LEVEL 1 RING CONFIGURATION INITIATED BY EAR

ISOLATION FROM IUN31 11 TO IUN31 11 REQUESTED.

0 3 4 3600000.................(4031350030)

REPT RING CFR

RING CONFIGURATION ESTABLISHED (695 ms)

(4031350049)(4031350422)

RAC PARITY/FORMAT ERROR DETECTED. IUN31 11 RAC 0.

........................................(4031349712)

........................................(4031349722)

BLOCKAGE DETECTED, IUN31 10 RAC 0.

........................................(4031349727)

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

CMD FUNCTION

OP:RING;DETD

DGN IUN31 11 COMPLETED ATP MESSAGE IN PROGRESS

REPT RING CFR

(4031519404)(4031519780)

RST IUN31 11 COMPLETED

DGN IUN31 11 ATP MESSAGE COMPLETE

REPT ARR AUTORST

ARR COND RST FOR IUN31 11 SUCCEEDED

OP:RING;DETD

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

RING STAT: ACTIVE

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

Manual Recovery from a Hard Fault on a Small Ring

Small rings with padded interframe buffers are subject to ring fragmentation, a condition that causes the ring to go down. Ring fragmentation occurs when an isolation that includes padded buffers shortens an active ring below its minimum data length. Padded buffers are employed redundantly in pairs at opposite sides of the ring. Thus a single-node isolation on a small ring will never include both pairs, while in many cases a two-node isolation will. Nevertheless, a single-node isolation on a small ring can pose problems because of the common need, arising from low-phase ambiguity, to extend the isolation to include the BISO or EISO node. (For a discussion of this issue, see the section ``Low-Phase Ambiguity'' in this chapter.) Isolations on small rings often include one pair of padded buffers, and extending the isolation would often include the other pair as well. The conditions that give rise to this problem are illustrated in the following two figures.
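The fragmentation condition described above can be pictured with a small sketch. It is an illustration only: the node names are taken from the four-node example in this section, but the `SmallRing` class, the placement of the two buffer pairs, and the rule that the ring goes down once both pairs are isolated are simplifying assumptions. IMS itself compares the remaining ring against its minimum data length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallRing:
    """Hypothetical model of a small ring with two padded-buffer pairs."""
    nodes: tuple          # node names in ring order
    padded_pairs: tuple   # each entry: the set of nodes that carries one buffer pair

    def isolation_fragments_ring(self, isolated):
        """Return True if the isolation removes every padded-buffer pair.

        Simplifying assumption: a pair is lost as soon as any node that
        carries it is isolated.
        """
        isolated = set(isolated)
        return all(pair & isolated for pair in self.padded_pairs)


# Assumed layout, for illustration only.
ring = SmallRing(
    nodes=("RPCN32 0", "IUN32 1", "RPCN00 0", "IUN00 1"),
    padded_pairs=(frozenset({"RPCN32 0"}), frozenset({"IUN32 1"})),
)

single = ring.isolation_fragments_ring({"RPCN32 0"})
extended = ring.isolation_fragments_ring({"RPCN32 0", "IUN32 1"})
print(single, extended)
```

Under these assumptions the single-node isolation leaves one buffer pair in the active ring, while the extended isolation removes both.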

Figure 3-4. Manual Recovery - Method One

[Figure labels: Padded Interframe Buffers; Isolated Ring; Active Ring; BISO Node; IUN32 1; RPCN32 0; RPCN00 0; EISO Node; RAC 0; RAC 1; Isolated Node]


The following example occurs on the four-node ring just illustrated:

REPT RING CFR

0 1 4 3600000.............................(242674464)

REPT RING CFR

URATION

.......................................................................

(242674474)(242674649)

REPT RING CFR

ISOLATION FROM RPCN32 0 TO RPCN32 0

REQUESTED

0 1 3 3600000.............................(242674676)

REPT RING CFR

(242674689)(242674963)

RAC 0.

......................................................................

............................................(242674346)

In this instance EAR did not receive or did not report blockage.

REPT ARR AUTORST

ARR COND RST FOR RPCN32 0 STARTED

RMV RPCN32 0 STOPPED 5


DGN RPCN32 0 PH 1 STF (11 X’00000000

X’00000000)

TEST..................................................................

002...................................................................

004...................................................................

005 (X’00000e00)......................................................

006...................................................................

007...................................................................

The failure of test 5 means that low-phase ambiguity exists in this case; in other words, the IMS regards either RAC 1 in the BISO node or RAC 0 in the EISO node, or both, as possibly faulty.

DGN RPCN32 0 PH 2 STF (11 X’00000000

X’00000000)

TEST..................................................................

002...................................................................

004...................................................................

005 (X’00000e00).........................................................

006...................................................................

007...................................................................

RPCN32 0 TERMINATED AT PH 27 STMNT 15 AFTER

TEST 8

ANALY:TLPFILE: RPCN32 0 SUMMARY DATA

TLP: RPCN32 0 PH=1....................................................

TLP: RPCN32 0 PH=2....................................................

TLPFILE COMPLETED

DGN RPCN32 0 COMPLETED STF (21 X’00000000 X’00000000)

ANALY TLPFILE RPCN32 0 TLPSRCH

TLPFILE #917573

ANALY TLPFILE RPCN32 0 SUSPECT FLTY EQUIPMENT

UN122C 32 0 -- -- 10 --

UN123B 32 0 -- -- 10 --

CABLE -- -- -- -- 10 3

The extended TLP output message does not identify equipment in the BISO or EISO node as faulty, because the ring interfaces of these nodes are necessarily classified as usable.

RST RPCN32 0 STOPPED 1

DGN RPCN32 0 STF (21 X’00000000 X’00000000)


REPT ARR AUTORST

ARR COND RST FOR RPCN32 0 FAILED

Failure of the ARR restoral attempt

results in the maintenance mode of the

node being changed to manual.

OP:RING;DETD

The isolation in this small ring during a time of heavy traffic creates an emergency condition. Following the procedures for ``Clearing Faults in Response to ARR Actions'' and ``Responding to Single-Node Isolations,'' the technician elects to change both UN122C and UN123B in RPCN32 0 but does not troubleshoot the cable. It is possible, of course, that the fault is in the cable, but because this situation involves low-phase ambiguity, it is far more likely that the fault, if it is not in the circuit packs of RPCN32 0, is in the isolated RAC of either the EISO or BISO node.

DGN RPCN32 0;RAW!

Then, this being a phase 1 and 2 failure, the technician diagnoses the node using the RAW option so that if phase 1 or 2 still fails, an indication will be given as to whether the isolated RAC of the BISO or EISO node is suspected of being faulty. Of course, the problem could be in the cable of RPCN32 0.

DGN RPCN32 0 TASK 5 MSG STARTED

00AA.............. 01................ 02................

30................ 31................ 32iA..............


RMV RPCN32 0 STOPPED 5

DGN RPCN32 0 PH 1 STF (11 X’00000000

X’00000000)

TEST MISMATCH........................

002...................................................................

004...................................................................

005 X’00000e00......................................................

006...................................................................

007...................................................................

008...................................................................

009...................................................................

010 X’00000e01......................................................

011...................................................................

016...................................................................

017...................................................................

The mismatch data for failing test 10 identifies both IUN32 1 and IUN00 1 as suspect nodes. (Hexadecimal e01 is translated by the ``Physical Node Address Hexadecimal Representation'' table in the reference chapter of this document as node 32 1, and hexadecimal c01 is translated as node 00 1.) In this situation, the standard procedure calls for technicians to extend the isolation to include IUN32 1 or IUN00 1 to perform maintenance on it. In this instance, however, extending the isolation to include IUN32 1 would bring the ring down, because it would result in the isolation of both pairs of padded interframe buffers.
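The hexadecimal values in the mismatch data can be related to node names with a little arithmetic. The sketch below is inferred only from value pairs quoted in this chapter (c01 for node 00 1, e01 for node 32 1, dfb for node 31 11, e06 for node 32 6). The base value 0xC00, the 16-members-per-group layout, and the function names are assumptions; the ``Physical Node Address Hexadecimal Representation'' table remains the authority.

```python
BASE = 0xC00            # assumed address of group 00, member 0
MEMBERS_PER_GROUP = 16  # assumed from the 16-position OP:RING display rows


def decode_node(mismatch_hex):
    """Map a mismatch value such as 'e01' to a (group, member) pair."""
    offset = int(mismatch_hex, 16) - BASE
    if offset < 0:
        raise ValueError(f"{mismatch_hex} is below the assumed base address")
    return divmod(offset, MEMBERS_PER_GROUP)


def encode_node(group, member):
    """Inverse of decode_node; returns lowercase hex without a prefix."""
    return format(BASE + group * MEMBERS_PER_GROUP + member, "x")


for value in ("c01", "e01", "dfb", "e06"):
    group, member = decode_node(value)
    print(f"X'{value} -> node {group:02d} {member}")
```

The leading zeros printed in the mismatch data (for example X’00000e01) do not affect the conversion.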

DGN RPCN32 0 PH 2 STF (10 X’00000000

X’00000000)

TEST MISMATCH

002...................................................................

004...................................................................

005 X’00000e00............................

006...................................................................

007...................................................................

008...................................................................

009...................................................................

010 X’00000c01............................

011...................................................................

016...................................................................

017...................................................................

(See the illustration of the ring that appears at the beginning of this section.) Therefore, the first action (which to conserve space is not shown here) was to extend the isolation to include IUN00 1 and to perform maintenance on it. This action, however, did not find a fault in IUN00 1, and so the isolation was reduced to include once again only RPCN32 0, and the MOVFLT option of the CFR command was employed to shift the isolation from RPCN32 0 to IUN32 1, as played out below.

DGN RPCN32 0 PH 10 ATP....................

DGN RPCN32 0 PH 11 ATP.....................

DGN RPCN32 0 PH 12 ATP.....................

DGN RPCN32 0 PH 13 ATP.....................

DGN RPCN32 0 PH 20 ATP.....................


DGN RPCN32 0 PH 23 ATP.....................

DGN RPCN32 0 PH 24 ATP.....................

DGN RPCN32 0 PH 26 ATP.....................

DGN RPCN32 0 PH 27 ATP.....................

Unneeded output generated by the DGN RAW option could have been stopped by terminating DGN with the STOP:DMQ command.

DGN RPCN32 0 TERMINATED AT PH 27

STMNT 15 AFTER TEST 3

DGN RPCN32 0 STF (21 X’00000000 X’0000000).........

RMV:LN32 1

In preparation for entering the CFR command, the node specified in the command must be removed from service.

RMV IUN32 1 TASK 0

RMV IUN32 1 COMPLETED

OP:RING;DETD

REPT RING CFR

WARNING: BISO AND/OR EISO NODE OOS

BISO NODE = IUN00 1, EISO NODE = IUN32 1

ACTIVE RING SEGMENT NOT LONG ENOUGH

Removing a BISO or EISO node from service would ordinarily cause the isolation to extend to include the out-of-service node. In this case it does not, however, because IMS calculates that doing so would shorten the ring below its minimum data length.

RING STAT: RESTORING

00AA.............. 01................ 02................

30................ 31................ 32iO..............


CFR:RING,IUN32 1;MOVFLT!

With the suspect IUN32 1 quarantined out-of-service, the technician enters the MOVFLT version of the CFR command to shift the isolation to include IUN32 1.

REPT RING CFR

BISO NODE = RPCN32 0, EISO NODE = RPCN00 0

(243506608) (243506934)

REPT ARR AUTORST

CNR UCL REST FOR RPCN32 0 STARTED

ARR undertakes its highest-priority task, the restoral of a node designated as a BISO or EISO node.

CFR RING IUN32 1 COMPL

The isolation shifted, the ring now has the structure of the second illustration at the beginning of this section, and the probable fault in IUN32 1 may now be corrected.

BISO: RPCN32 0 EISO: RPCN00 0

00AA.............. 01................ 02................

30................ 31................ 32Ai..............

Responses to Multiple, Ring-Related Faults

The following two examples of ring-recovery actions occur in response to multiple faults of the kind that disrupt the transportation of messages on the ring.

Manual Recovery from Multiple Hard Faults

Multiple faults have the potential of creating massive isolations. Because they usually develop as extensions of single faults, they are best avoided by prompt and effective attention to single faults. The history of the following massive isolation is typical. In the first stage, a single node is isolated, diagnosed at the request of ARR as RI faulty, and its maintenance mode changed to manual. Then, before the technician can repair and return it to service, another ring-related fault occurs on a distant part of the ring, with the result that the many nodes lying between the two faulty nodes must be removed from service as victims of the expanded isolation.

The first stage of this example is identical to the example recorded above in ``Manual Recovery from a Hard Fault,'' except that the massive isolation intervenes before the first fault can be repaired.
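The way a second fault expands an isolation can be sketched as a walk along the ring from one faulty node to the other. This is a simplified illustration: the ring order, the node tuples, and the function name are assumptions, and the sketch ignores the effect of low-phase ambiguity on which nodes are treated as suspect.

```python
def expanded_isolation(ring_order, first_fault, second_fault):
    """Return the nodes from first_fault through second_fault, inclusive, in ring order."""
    start = ring_order.index(first_fault)
    end = ring_order.index(second_fault)
    if start <= end:
        return ring_order[start:end + 1]
    # The segment wraps past the end of the list back to its beginning.
    return ring_order[start:] + ring_order[:end + 1]


# Hypothetical ring order, written as (group, member) pairs.
ring_order = [(31, m) for m in range(10, 16)] + [(32, m) for m in range(0, 9)]

segment = expanded_isolation(ring_order, (31, 11), (32, 6))
victims = [node for node in segment if node not in {(31, 11), (32, 6)}]
print(len(segment), len(victims))
```

Every node in `victims` is out of the active ring only because it lies between the two faults, which is why prompt repair of the first fault matters.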

REPT RING CFR

.....................................................(4030772385)

Prompted by a ring transport error

report, EAR level-0 requests that the ring

config module restart the ring.

REPT RING CFR

URATION

.......................................................................(4030772397)(4030772536)

The continuity test run by the ring config

module failed, an indication that the fault

is probably hard.

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

CMD FUNCTION


The 1105 display page now looks as follows:

RMV IUN31 11 STOPPED 5

RTR message announcing that it could not remove IUN31 11 from service (because EAR had done so previously).

DGN IUN31 11 PH 1 STF (9 X’00000000 X’00000000)

004...........................................................

005 X’00000dfb................................................

006...........................................................

008...........................................................

009...........................................................

Indicates that during phase 1 diagnostics, some tests (nine in all) failed and none (X’00000000 X’00000000) were skipped. IUN31 11 is not necessarily the node in which phase 1 failed, but the node specified in ARR’s diagnostic request. Since phases 1 and 2 test all RACs in the isolated segment, the fault that produces a phase 1 or 2 failure may not reside in the specified node. The failure of test 005 indicates that, in this instance, low-phase ambiguity exists; in other words, both a RAC of the isolated node and a RAC of either the EISO or BISO node are suspected of being faulty. See the ``Low-Phase Ambiguity'' section in this chapter.
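For readers who post-process ROP output, the STF field can be picked apart mechanically. The sketch reads the first number as the count of failed tests and the two hexadecimal words as the skipped-test indication, following the explanation above. The regular expression, the `StfResult` name, and the tolerance for several apostrophe styles are assumptions about the print format, not a documented grammar.

```python
import re
from typing import NamedTuple, Optional


class StfResult(NamedTuple):
    node: str
    phase: int
    failed_tests: int
    skipped_words: tuple


# Accepts a straight, curly, or backtick quote after the X, as seen in printed output.
_STF = re.compile(
    r"DGN\s+(?P<node>\S+\s+\d+)\s+PH\s+(?P<phase>\d+)\s+STF\s+"
    r"\((?P<count>\d+)\s+X['’`](?P<w0>[0-9A-Fa-f]{8})\s+X['’`](?P<w1>[0-9A-Fa-f]{8})\)"
)


def parse_stf(line: str) -> Optional[StfResult]:
    """Return the parsed STF fields, or None if the line is not a phase STF report."""
    match = _STF.search(line)
    if match is None:
        return None
    return StfResult(
        node=match["node"],
        phase=int(match["phase"]),
        failed_tests=int(match["count"]),
        skipped_words=(int(match["w0"], 16), int(match["w1"], 16)),
    )


result = parse_stf("DGN IUN31 11 PH 1 STF (9 X'00000000 X'00000000)")
print(result)
```

A line that wraps across two printed lines, as several do in these examples, would have to be joined before parsing.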

ARR RESTORE COND IUN31 11

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

CMD FUNCTION


DGN IUN31 11 PH 2 STF (10 X’00000000

X’00000000)

002...........................................................

004...........................................................

005 X’00000dfb................................................

006...........................................................

007...........................................................

DGN IUN31 11 terminated at ph 2 stmnt 36 after test

Phase-1 diagnostics test the isolated segment beginning at the BISO node, and phase-2 diagnostics test it beginning at the EISO node. In the case of single-node isolations, the two phases should report failure data for the same node(s), but in the case of multiple-node isolations they usually report failure data for different nodes.

The termination message indicates the point in the diagnostic routine at which execution terminated.

ANALY:TLPFILE: IUN31 11 SUMMARY DATA MSG

STARTED

TLP: IUN31 11 PH=1....................................................

TLP: IUN31 11 PH=2....................................................

TLPFILE COMPLETED

DGN IUN 31 11 COMPLETED STF

(19...................................)

Summarizes diagnostic failure data. Phases cited are those that failed; but because phases 1 and 2 are at issue, IUN31 11 is not necessarily the location of the failure.

ANALY TLPFILE IUN31 11 TLPSRCH MSG IP

TLPFILE #983090

Short form of this message. The longer

form is next.

UN303 31 11 -- -- 10 --

CABLE -- -- -- -- 10 3

This data is printed only after a test fails and only if the TLP option was specified in the DGN command (as it always is by ARR). The entry lists, in weighted (WT) order, equipment suspected of being faulty. The ``WT'' is a number between 1 and 10. The higher the WT, the greater the likelihood of the equipment being faulty. Because ARR does not specify the RAW option of the DGN command, failure data for test 010 is not given. (See the ``Low-Phase Ambiguity'' section of this chapter.)

RST IUN31 11 STOPPED 1

Because of diagnostic failure (error code

DGN IUN31 11 STF..............................................MSGCOMPL


REPT ARR AUTORST

ARR COND RST FOR IUN 31 11 FAILED

Confirms that ARR’s restoral request has failed. Many IMS processes write to the ROP, at times resulting in some redundancy.

OP:RING;DETD Manual input message.

OP:RING, IUN31 11 Manual input message.

IUN31 11: MJ = OOS; NM = MAN; RI = FLTY ; NP =

IN ISOL SEG

Like the TLP output above, this data

does not reflect the low-phase ambiguity.

REPT RING CFR

ISOLATION FROM IUN31 11 TO IUN31 11

REQUESTED.

0 1 4 3600000................(403082426)

Before the technician can respond to the

single isolation, another fault occurs.

EAR level-0 attempts to restart the ring

in conformity with its isolated structure

prior to the occurrence of the second

fault.

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA


REPT RING CFR

COULD NOT ESTABLISH BISO NODE = IUN31 10,

EISO NODE = IUN31 12

......................................................................

(403082441)(403082625)

Ring config’s continuity test failed...

REPT RING CFR

REQUESTED.

0 2 4 3600000.................(403082654)

so the isolation must be extended to

include both nodes suspected of having

faulty ring interfaces.

RMV RPCN 32 0 RQSTD; RPC ISOLATION RPTD

...................................(403082796)

This message notifies the technician that an innocent-victim RPCN is being included in the extended isolation.

REPT RING CFR

(403082671)(403082031)

The multiple-node isolation is now established.

RAC 0.

........................................(403082306)

......................................................................

........................................(403082316)

......................................................................

........................................(403082322)

REPT ARR AUTORST

Having failed previously (during the single isolation stage) to restore IUN31 11, ARR now selects IUN32 6 for a conditional restoral attempt.


DGN IUN32 6 PH 1 STF (9 X’00000000 X`00000000)

TEST....................................................................

004.....................................................................

005 X’00000dfb.........................................................

006.....................................................................

008.....................................................................

009.....................................................................

Phase-1 diagnostic tests begin running

from the BISO node. Therefore, they

identify IUN31 11 as faulty.

DGN IUN32 6 PH 2 STF (11 X’00000000

X`00000000)

TEST....................................................................

002.....................................................................

004.....................................................................

005 X’00000e06.........................................................

006.....................................................................

007.....................................................................

Phase-2 diagnostic tests begin running from the EISO node. Therefore, they identify IUN32 6 (e06) as faulty. The failure of test 005 of phase 2 indicates that low-phase ambiguity exists surrounding IUN32 6. Probably, though not certainly, IUN32 5, whose ring interface is suspected to be faulty, is the node involved in this instance of low-phase ambiguity.
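Why the two phases name different nodes in a multiple-node isolation can be shown with a short sketch. It assumes that each phase reports the first faulty node it meets when working inward from its own end of the isolated segment; the abbreviated segment, the set of faulty nodes, and the function name are hypothetical.

```python
def first_failure(segment, faulty, from_eiso=False):
    """Return the first faulty node met from the BISO end, or from the EISO end."""
    order = reversed(segment) if from_eiso else segment
    return next((node for node in order if node in faulty), None)


# Isolated segment listed from the BISO side to the EISO side (abbreviated, assumed order).
segment = ["IUN31 11", "IUN31 12", "IUN31 13", "IUN32 5", "IUN32 6"]
faulty = {"IUN31 11", "IUN32 6"}   # assumed for illustration

phase1 = first_failure(segment, faulty)
phase2 = first_failure(segment, faulty, from_eiso=True)
print(phase1, phase2)
```

With a single isolated node, both calls return the same node, which matches the statement that the two phases should then report failure data for the same node(s).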

DGN IUN32 6 TERMINATED AT PH 2 STMNT 36

AFTER TEST 17

ANALY:TLPFILE: IUN32 6 SUMMARY DATA

TLP: IUN32 6 PH=1........................................................

TLP: IUN32 6 PH=2........................................................

TLPFILE COMPLETED

DGN IUN32 6 COMPLETED STF (20..................)

ANALY TLPFILE IUN 32 6 TLPSRCH

TLPFILE # 1179716

UN303 31 12 -- -- 10 --

UN303 31 11 -- -- 10 --

CABLE -- -- -- -- 10 3

Contrast this output with the TLP output when IUN31 11 was singly isolated. Both then and now the ring interface of IUN31 12 was suspect. The difference is that when the suspect RAC of IUN31 12 was part of an EISO node, its ring interface could not be set to FLTY. IUN32 6 is not included because the TLP output reflects only the first failing phase.

RST IUN32 6 STOPPED 1

DGN IUN32 6 STF (20 X`00000000 X`00000000)


REPT ARR AUTORST

ARR COND RST FOR IUN32 6 FAILED

OP:RING;DETD

Notice that the subnumbers produced by the OP:RING;DETD command indicate that, as a result of low-phase ambiguity, four nodes are suspected of having faults in their ring interfaces. Because none of the four is now in the active ring as an EISO or BISO node, each can have its ring interface minor state marked FLTY.

DGN:IUN31 11;RAW!

In accordance with the procedures ``Responding to Multiple-Node Isolations'' and ``Clearing Faults in Response to ARR Actions,'' a technician replaces circuit pack UN303 in IUN31 11 and submits the node to automatic diagnostics with the RAW option.

DGN IUN31 11 TASK 8 MSG STARTED

00AAAAAAAAAAAA.... 01................ 02................

30................ 31.AAAAAAAAAAiiiii 32iiiiiiiAAAAA....

63.AAAAAAAAAAAAAAA


DGN IUN31 11 PH 1 STF (10 X’00000000 X’00000000)

TEST....................................................................

004.....................................................................

005 X’00000e05...........................................

006.....................................................................

007.....................................................................

008.....................................................................

009.....................................................................

010 X’00000e06........................................................

011.....................................................................

016.....................................................................

017.....................................................................

This output from the manual diagnostic

request with the RAW option shows

IUN32 5 and IUN32 6 as suspected of

having faulty ring interfaces, implying

that IUN31 11 and IUN31 12 have

passed phase 1, a condition that should

cause their ring interface states to

change to QUSBL.

REPT ARR AUTORST

Having failed to restore IUN31 11 and

IUN32 6, ARR now attempts to restore

IUN31 12. This automatic action occurs

at nearly the same time as the manual

diagnostic procedure.

RST IUN31 12 QUEUED TASK 0

DGN IUN31 11 PH 2 STF (11 X’00000000

X’00000000)

TEST....................................................................

002.....................................................................

004.....................................................................

005 X’00000e06..........................................................

006.....................................................................

007.....................................................................

008.....................................................................

009.....................................................................

010 X’00000e05........................................................

011.....................................................................

016.....................................................................

017.....................................................................

DGN IUN31 11 TERMINATED AT PH 2

STMNT 36 AFTER TEST 17


DGN IUN31 11 COMPLETED STF (21...........)

RST LN31 12 TASK 9

ARR restoral request on IUN31 12 started.

DGN IUN31 12 PH 1 STF (10 X’00000000 X’00000000)

TEST....................................................................

004.....................................................................

005 X’00000e05.........................................................

006.....................................................................

007.....................................................................

008.....................................................................

This is output from ARR’s restoral

request.

DGN IUN31 12 PH 2 STF (11 X’00000000 X’00000000)

TEST..................................................................

004.....................................................................

005 X’00000e06.........................................................

006.....................................................................

007.....................................................................

008.....................................................................

AFTER TEST 17

TLP: IUN31 12 PH=1......................................................

TLP: IUN31 12 PH=2......................................................

UN303 32 6 -- -- 10 --

UN303 32 5 -- -- 10 --

CABLE -- -- -- -- 10 3

Only the extended TLP message explicitly identifies the node(s) within the isolation that may have failed diagnostic phases 1 and 2.


REPT RING CFR

(403041870)(403042272)

This action was triggered by the automatic RST command, which concludes with a request that as much as possible of an isolated segment be included in the active ring. The isolated segment is now reduced to the two nodes whose ring interfaces are still suspected of being faulty.

DGN IUN 31 12 STF...................................................

REPT ARR AUTORST

CNR UCL RST FOR IUN32 4 STARTED

The new BISO node, having been an innocent victim of the isolation, was out-of-service. Restoring a BISO or EISO node is the highest priority of ARR.

REPT ARR AUTORST

CNR UCL RST FOR IUN32 4 SUCCEEDED

REPT ARR AUTORST

Having previously attempted and failed

to restore IUN32 6, ARR now attempts

to restore IUN32 5. Consult the section

``Restoral Priorities Rule” in this chapter

for an explanation of ARR’s behavior in

the remainder of this example.

DGN IUN32 5 PH 1 STF (10 X’00000000 X’00000000)

TEST....................................................................

004.....................................................................

005 X’00000e05.........................................................

006.....................................................................

007.....................................................................

008.....................................................................

This is output from ARR’s restoral

request for IUN32 5.


DGN IUN32 5 PH 2 STF (11 X’00000000 X’00000000)

TEST..................................................................

004.....................................................................

005 X’00000e06.........................................................

006.....................................................................

007.....................................................................

008.....................................................................

AFTER TEST 17

TLP: IUN32 5 PH=1........................................................

TLP: IUN32 5 PH=2........................................................

ANALY TLPFILE IUN31 12 / SUSPECT FLTY EQUIPMENT

UN303 32 6 -- -- 10 --

UN303 32 5 -- -- 10 --

CABLE -- -- -- -- 10 3

RST IUN32 5 STOPPED 10

DGN IUN32 5 STOPPED COMPLETED

REPT ARR AUTORST

ARR UCL RST FOR RPCN32 0 STARTED

Having attempted to restore all nodes

whose ring interfaces are possibly faulty,

ARR now unconditionally restores the

innocent victim RPCN...

RST RPC32 0 COMPLETED

REPT ARR AUTORST

ARR UCL RST FOR IUN31 13 STARTED

and then the innocent victim IUNs. (The

ROP output concerning restoral of the

innocent victim IUNs is omitted from this

example.)
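The order in which ARR works through the nodes in this example can be summarized as a sort key. The ranking below is read off this one example (BISO or EISO nodes first, then nodes with possibly faulty ring interfaces, then the innocent-victim RPCN, then the innocent-victim IUNs); the category names and the `pending` list are hypothetical, and the ``Restoral Priorities Rule'' section remains the authority.

```python
# Lower number = restored earlier. Ranking inferred from this example only.
PRIORITY = {
    "biso_or_eiso": 0,
    "suspect": 1,
    "victim_rpcn": 2,
    "victim_iun": 3,
}

# Hypothetical work list of (node, category) pairs in arbitrary order.
pending = [
    ("IUN31 13", "victim_iun"),
    ("RPCN32 0", "victim_rpcn"),
    ("IUN32 5", "suspect"),
    ("IUN32 4", "biso_or_eiso"),
]

restoral_order = [
    node for node, kind in sorted(pending, key=lambda item: PRIORITY[item[1]])
]
print(restoral_order)
```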

REPT ARR AUTORST

ARR UCL RST FOR IUN31 13 SUCCEEDED

OP:RING;DETD


OP:RING, IUN31 11

IUN31 11: MJ = OOS; NM = MAN; RI = QUSBL; NP =

IN ACT RING

OP:RING, IUN31 12

IUN31 12: MJ = OOS; NM = MAN; RI = QUSBL; NP =

IN ACT RING

Notice that IUN31 11 and IUN31 12 are

now quarantined and in the manual

mode. They are in the manual mode

because ARR previously failed to restore

them. They are quarantined—classified

as QUSBL—because no diagnostic

phases higher than 2 have been run on

them and, therefore, IMS cannot know

that their ring-interface hardware (except

for the hardware tested by phases 1 and

2—that is, the hardware that propagates

messages on the ring) is usable.

00AAAAAAAAAAAA.... 01................ 02................

30................ 31.AAAAAAAAAAOOAAA 32AAAAAiiAAAAA....

63.AAAAAAAAAAAAAAA
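The display rows above can be read positionally. The sketch assumes that each row is a two-digit group number followed by 16 positions, that position n stands for member n of the group, and that `A`, `i`, and `O` mean active, isolated, and out of service. These readings are inferred by comparing the displays in this chapter with the node states described in the text; they are not taken from a format definition.

```python
STATE_NAMES = {          # assumed meanings, inferred from this chapter's examples
    "A": "active",
    "i": "isolated",
    "O": "out of service",
}


def parse_display_row(row):
    """Split a row such as '31.AAAAAAAAAAOOAAA' into {(group, member): state}."""
    group, positions = int(row[:2]), row[2:]
    return {
        (group, member): STATE_NAMES.get(char, char)
        for member, char in enumerate(positions)
        if char != "."          # a dot is taken to mean no node at that position
    }


line = "30................ 31.AAAAAAAAAAOOAAA 32AAAAAiiAAAAA...."
states = {}
for row in line.split():
    states.update(parse_display_row(row))

isolated = sorted(node for node, state in states.items() if state == "isolated")
out_of_service = sorted(node for node, state in states.items() if state == "out of service")
print(isolated, out_of_service)
```

Read this way, the row shows IUN31 11 and IUN31 12 out of service and IUN32 5 and IUN32 6 still isolated, which agrees with the surrounding commentary.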


RST:IUN32 6:TLP

Following standard procedures, the technician now assigns priority to performing maintenance on the remaining isolated segment. Choosing IUN32 6 because it was an external isolated node in the massive isolation, the technician changes the circuit pack indicated in the original TLP message and then conditionally restores the node to service. (Although manual restoral requests take priority over automatically requested conditional restorals, the former can occur in parallel with automatically requested unconditional restorals, such as are occurring. Therefore, the technician felt free to conditionally restore IUN32 6. If a conflict had existed, allowing the rapid recovery of the many innocent victim nodes to proceed without interruption would usually make sense. The decision to conditionally restore IUN32 6 rather than to follow the somewhat slower procedure of running diagnostics on it with the RAW option was dictated by the high probability that IUN32 5 is the other node involved in this instance of low-phase ambiguity.)

REPT ARR AUTORST

RST:IUN31 11 TASK 1

REPT ARR AUTORST

DGN IUN31 11 COMPL CATP (X’00000000 X’40000000)

See the OM under DGN IUN, Bit 30, which indicates that not all phases ran because the node under test was not the only isolated node.
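The reference to Bit 30 can be checked against the printed word. If bits are numbered from 0 at the least significant end, X’40000000 has exactly bit 30 set. That numbering convention and the helper name below are assumptions; the OM entry for DGN IUN defines the actual bit meanings.

```python
def bit_is_set(word_hex, bit):
    """Return True if the given bit (0 = least significant) is set in a hex word."""
    return bool((int(word_hex, 16) >> bit) & 1)


second_word = "40000000"          # second word of the CATP report above
not_all_phases_ran = bit_is_set(second_word, 30)
print(not_all_phases_ran)
```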


RST IUN31 15 COMPLETED

ROP output concerning ARR’s unconditional restorals of the remaining innocent victims is omitted from this example.

DGN IUN32 6 COMPL CATP (X’00000000 X’40000000)

REPT RING CFR

(403431319)(403431699)

That IMS is dissolving the remaining isolation, returning the ring subsystem to a two-ring structure, indicates the fault was located in IUN32 6.

OP:RING;DETD!

OP RING COMP

RING STAT: ACTIVE

00AAAAAAAAAAAA.... 01................ 02................

30................ 31.AAAAAAAAAAOOAAA 32AAAAAOAAAAAA....

63.AAAAAAAAAAAAAAA


RST:IUN31 12!

Now the only task remaining for the technician is to conditionally restore the remaining out-of-service nodes, none of which will be handled by ARR, since they are all in the manual mode. Probably none of the out-of-service nodes will contain faults, since one has had its ring-interface circuit pack replaced and the other two were designated as possibly faulty as a result of low-phase ambiguity. Nevertheless, the technician restores them conditionally to be certain that a fault undetected in one of them does not lead to another massive isolation. If, while diagnostics are run on these nodes, a fault were to appear elsewhere in the ring, IMS would avoid a massive isolation by immediately returning the node being diagnosed to the active ring.

REPT RING CFR

(403490173)(403490559)

The predictable action that concludes this example is not reproduced.

Automatic Recovery from Two Intermittent Faults

In the following example of ring maintenance, two staggered intermittent faults occur at intervals that frustrate successive EAR recovery attempts by repeatedly violating the 5-second confidence intervals. In this manner the faults drive EAR to level 4 before it can establish a stable, usable ring. The sequence of automatic actions culminates in a restored system. It therefore requires the technicians only to record the occurrences and locations of the two intermittent faults.

This episode occurs in the following ring:


REPT RING CFR

0 1 4 3600000.......................(4034364845)

A ring-related fault stimulates EAR to a

level-0 attempt (restart) to recover the

REPT RING CFR

(4034364857)(4034365210)

The restart succeeds initially, but...

.......................................................................

............................................(4034364730)

.......................................................................

............................................(4034364740)

00AAAAAAAAAAAA.... 01................ 02................

63.AAAAAAAAAAAAAAA

CMD FUNCTION


.......................................................................

............................................(4034364745)

REPT RING CFR

REQUESTED

0 1 4 3600000.......................... (4034368158)

...another fault occurs less than 3 seconds into the recovery, thereby driving EAR to escalate to a level-1 attempt to isolate the faulty node.

REPT RING CFR

(4034368175)(4034368492)

The isolation succeeds momentarily,

but...

.......................................................................

............................................(4034368041)

.......................................................................

............................................(4034368051)

.......................................................................

............................................(4034368056)

UNEXPLAINED LOSS OF TOKEN REPORTED ON

BOTH RINGS.

...within the confidence interval the

3B21D receives notice that the token is

lost without receiving other error reports.

REPT TOKEN TRACK

TOKEN WAS LOST BETWEEN IUN32 5 AND IUN32 6

ON RING: 0

The token-track module reports the

probable location where the token left

the ring.

REPT RING CFR

0 1 4 3600000.............................(4034373503)

When unexplained loss of token occurs

during the confidence interval of levels 0

or 1, EAR jumps to level 3.


REPT RING CFR

(4034374032)(4034374330)

EAR level-3 tests for continuity in the rings. Because the tests succeed, EAR directs ring configuration to establish the normal, two-ring structure. The success of the ring continuity tests is the first clear indication that the recent faults are transient in nature.

REPT RING CFR

0 1 4 3600000..............................(4034376599)

But again the confidence interval fails, so

EAR escalates to level 4.

REPT RING CFR

(4034384478)(4034384790)

Level 4 also finds continuity in the rings and directs ring configuration to establish the normal, two-ring structure. In this instance the recovery outlasts the confidence interval, thereby ending this episode of EAR escalation. Evidently the episode was triggered by two transient faults. The location of one fault is suggested by the short-lived, level-1 isolation of IUN31 11. The location of the other was identified by token track as between IUN32 5 and IUN32 6. The technician who witnesses these events should record the occurrences and locations of the two intermittent faults and perhaps should retain the ROP output of this unusual episode.
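The escalation path in this example can be sketched as a small state machine. This is a minimal illustration, not the IMS implementation: the function names and event labels are hypothetical, and only the transitions seen in the example (level 0 to level 1 on a fault, a jump to level 3 on unexplained loss of token at level 0 or 1, and level 3 to level 4 on a failed confidence interval) come from the text. Treating every other failure as a one-level escalation is an assumption.

```python
# Hypothetical event labels for this sketch; they are not IMS identifiers.
FAULT = "fault"                         # fault reported within the confidence interval
TOKEN_LOSS = "unexplained_token_loss"   # token lost with no other error reports
MAX_LEVEL = 4


def next_ear_level(level, event):
    """Return the EAR level attempted after `event` interrupts recovery at `level`."""
    if event == TOKEN_LOSS and level in (0, 1):
        return 3  # stated in the text: levels 0 and 1 jump to level 3
    # Assumption: any other failure escalates by one level, capped at level 4.
    return min(level + 1, MAX_LEVEL)


def replay(events, start=0):
    """Return the sequence of EAR levels visited for a list of events."""
    levels = [start]
    for event in events:
        levels.append(next_ear_level(levels[-1], event))
    return levels


# The episode above: fault at level 0, token loss at level 1, failure at level 3.
print(replay([FAULT, TOKEN_LOSS, FAULT]))
```

Replaying the three events of the example yields the level sequence 0, 1, 3, 4, which matches the ROP output described above.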


Contents

Loading Memory

Reading Memory

Loading and Dumping RGRASP Utility Variables (UVARs)

Feature Activation

Feature Deactivation

• Equipment Configuration Data (ECD)

• Recent Change Procedures

• Measurement

• Network Management Impact

• Maintenance/Troubleshooting Impact

• Recording

• Output Messages

• Audits

• Critical Events

• Support Tools

• Related Documentation Cross-References


Ring and Ring Node Maintenance Procedures

Introduction

This guide serves as an aid in performing ring and ring hardware maintenance functions. It contains procedures used in detecting, troubleshooting, and clearing faults associated with the ring and ring hardware. The procedures detailed in this guide are only guidelines for resolving ring-associated maintenance problems, and are not the only methods that may be used in performing ring maintenance.

A system called trace provides a formal mechanism for embedding tracepoints within application code for use in testing and debugging. The system collects and forwards the trace messages produced by individual tracepoints to one or more destinations, including log files, ROPs, and MCRTs. The tracepoints are controlled, so a related group scattered throughout the software can be turned on or off at will. The parameters can also be set and changed using craft commands. The trace system is created automatically during initialization; the user may also create it manually. The tracepoints are designed to generate little overhead when disabled, but when used improperly, the trace system can consume large amounts of system resources while yielding little useful information.

Craft commands allow one to inhibit all tracepoints, so that no trace messages are generated and the trace system uses little overhead, or to enable subsets of the tracepoints, thus restricting trace output to that dealing with selected portions of application code. ALW:TRACE and INH:TRACE provide the basic on/off switch for trace. Until ALW:TRACE is invoked, no trace messages can be generated and logged under any circumstances. Similarly, once INH:TRACE is invoked, trace becomes totally dormant except for a certain amount of fixed overhead. If trace is inhibited, the SET:TRACE command allows one to specify which tracepoints are active once trace is again enabled or, if trace is active, the command allows one to control the tracepoints during operation. The OP:TRACE command presents a summary of the current status of trace. The output message REPT TRACE reports a tracepoint from a 3B21D computer process or a node processor. The output message REPT TDTP indicates that the trace process has encountered a hardware or software fault. Note also that the trace process is terminated when the system enters disk-independent operation; see the 401-610-055 FLEXENT™/AUTOPLEX® Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual.
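The relationship between the master switch (ALW:TRACE and INH:TRACE) and tracepoint selection (SET:TRACE) can be pictured with a toy model. This is a sketch only: the class, its method names, and the group name `ring_cfg` are hypothetical and do not reflect the actual command parameters, which are defined in the input messages manual.

```python
class TraceControl:
    """Toy model of the trace master switch and tracepoint selection."""

    def __init__(self):
        # Until ALW:TRACE is invoked, no trace messages can be generated.
        self.allowed = False
        # Tracepoint groups selected with SET:TRACE (group names are hypothetical).
        self.active_groups = set()

    def alw_trace(self):
        self.allowed = True

    def inh_trace(self):
        self.allowed = False

    def set_trace(self, group, on):
        # Selection can be changed whether trace is inhibited or active.
        if on:
            self.active_groups.add(group)
        else:
            self.active_groups.discard(group)

    def emits(self, group):
        """True if a tracepoint in `group` would produce a trace message."""
        return self.allowed and group in self.active_groups


tc = TraceControl()
tc.set_trace("ring_cfg", True)   # selected while trace is still inhibited
before = tc.emits("ring_cfg")    # no output yet: the master switch is off
tc.alw_trace()
after = tc.emits("ring_cfg")     # the selection takes effect once trace is allowed
print(before, after)
```

The point of the two-level design is that a selection made while trace is inhibited is remembered and takes effect as soon as trace is allowed again.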

Ring maintenance functions for an office serve to detect, troubleshoot, and clear all fault conditions associated with the ring and ring hardware. The most common fault conditions associated with the ring are the following:

• Ring node out-of-service (OOS)

• Single ring node isolation

• Multiple ring node (RN) isolation

• Ring down.

Another less common fault condition on the ring is unexplained loss of token.

These fault conditions are discussed in the remainder of this section. For additional information on ring maintenance.

Direct link nodes (DLNs) follow the same guidelines as link nodes (LNs) in this section. CDN-I nodes also follow these guidelines, except that removing ring application processor (RAP) circuit packs requires that power be turned off before circuit pack (CP) extraction.


Ring Fault Conditions and Maintenance Approach

The information contained in this guide provides a maintenance approach for each ring fault condition listed above. These guidelines should be used only after the automatic ring recovery (ARR) has completed its attempt or has restored faulty ring nodes. For additional information concerning the use of ARR, refer to the “Maintenance Description” section in this manual.

Ring Node Out-of-Service

A ring node can be removed from active service and placed in the Out-Of-Service (OOS) state for many reasons. An RN may be placed in either of the OOS maintenance states (OOS-NORMAL or OOS-ISOLATED). When a node is placed in the OOS-ISOLATED state, the node is first removed from service (OOS-NORMAL) and then isolated from the active ring (OOS-ISOLATED). When a node is removed from service for maintenance or fault-detection purposes and the removal does not interfere with the operation of system functions, the node may be taken OOS-NORMAL. The isolated node is not able to communicate or perform normal node functions with the ring, but is capable of performing and handling maintenance functions. In the OOS-NORMAL state, the node is said to be quarantined. The OOS maintenance states may be observed from the maintenance CRT (MCRT) on the 1106 display page. For additional information concerning OOS nodes in the quarantine state, refer to the “Maintenance Description” section in this manual.

Ring Node OOS Maintenance Approach

This maintenance approach provides information that aids in diagnosing and correcting faults and in restoring nodes to active service. When a node is quarantined, it is not allowed to communicate with either the 3B21D computer or the ring, and the state of the ring interface is quarantine usable (QUSBL). To verify this state, refer to the OP:RING command in the 401-610-055 FLEXENT™/AUTOPLEX® Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual. When a node is in the OOS (quarantined) state, the most likely cause of the failure is the node processor (NP) or link interface. Listed below are guidelines to be used in troubleshooting, correcting, and restoring quarantined nodes to service.

Assumption: An equipment malfunction has been detected, and the fault recovery software has removed the node from service and placed it in the OOS-NORMAL maintenance state, where xx and yy are active nodes. The ARR has attempted to restore the node to service and has failed (manual action is required).


Figure 4-1. Ring OOS Normal

Procedure 4-1. Ring Node OOS Maintenance Guidelines

1. Determine the reason(s) the node has been taken OOS and placed in the quarantine state. Diagnose the faulty (OOS-NORM) node. Use guidelines presented in Chapter 6, Diagnostic User's Guide.

Does the node remain OOS-NORMAL?

No—DONE.

Yes—Proceed to next step.

2. If the node remains OOS-NORMAL, then starting with the OOS-NORMAL node,

isolate and replace all RN CPs in the order of the NP, the link interface, ring interface

0 (RI0), and RI1, and then perform a conditional restore. For very large scale

integration (VLSI) RNs, replace the integrated ring node (IRN) circuit pack and then the link interface. If the trouble clears after replacing the CPs in the order listed,

when office traffic is minimal, the original CP(s) should be reinserted one at a time in

the node, and diagnostics should be run to determine the faulty CP(s). If the

diagnostics fail to detect the faulty CP(s), but the previous CP replacements cleared

the trouble, then the CP(s) should be saved, noting the failure conditions. Inform the

CTS of this condition.

3. After replacing the CP(s), if the node still remains OOS, then check the equipment

for shorts, loose wiring, bent or broken pins, etc., and correct any problems

discovered. Also, check to see if proper equipment has been used with the long

message option.

4. Diagnose node (xx) adjacent to the faulty node using guidelines in Chapter 6,

Diagnostic User's Guide .

If problems are located, correct and restore node (xx) to service.

[Figure 4-1 labels: xx, OOS-NORM, yy]


NOTE: Perform an unconditional restore on the OOS-NORMAL node using the command

RST:nodexx y ;UCL

where:

For LN—

node = LN

xx = group number

y = node member number

UCL = restores the node without performing diagnostics.

For RPCN—

xx = group number

! CAUTION: Do not perform an unconditional restore unless one of the following has occurred:

• A complete diagnostic run has produced an all-tests-passed (ATP) response.

• A complete diagnostic run has produced a conditional all-tests-passed (CATP) response, and the RI and NP minor states are both usable (USBL).

Does the faulty node remain OOS-NORMAL?

No—DONE.

5. Diagnose node (yy) adjacent to the faulty node.

If problems are located, correct and restore node (yy) to service.

NOTE: Perform an unconditional restore on the OOS-NORMAL node using the command

RST:nodexx y ;UCL

where:


Figure 4-2. Single Node Isolation

Procedure 4-2. Single-Ring Node Isolation Maintenance Guidelines

1. Diagnose the isolated and faulty node using diagnostic guidelines listed in Chapter 6, Diagnostic User's Guide. If the isolation still exists after using these guidelines, proceed to next step.

2. If after diagnosing and troubleshooting the isolated node, the node does not restore to active service (thereby eliminating the isolated segment), diagnose the BISO node using guidelines listed in Chapter 6, Diagnostic User's Guide.

If the ring is too small to allow the adjacent nodes to be isolated, the isolation must

be moved.

To diagnose the BISO node, the node must be excluded from the active ring. To accomplish this, use the RMV command. See the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual. When the BISO node is removed from service (OOS-NORM), it is automatically included in the isolated segment (OOS-ISOLATED). The application may restrict the RMV request.

If the request is accepted, proceed with diagnostics as usual.

If the request is denied, it may be necessary to input the command to remove the application's node from service and to diagnose the node.

To put the signaling link (SLK) in the AVAILABLE-Manual Out-of-Service (MOOS) state, type the following message into the MCRT, and proceed with diagnostics as usual:

CHG:SLK (a, b, [c, d] ); MOOS

where: a = group number (00 - 63)

[Figure 4-2 labels: BISO, isolated node, EISO]


to replace CPs, and to restore the ring to an operational state. The second approach (B) details guidelines that should be used when the load on the CNI is minimal. The first approach is not intended to be used as the total maintenance approach, and should only be used when time does not allow for diagnostic testing. Otherwise, approach “B” should be used whenever possible.


Multiple Node Isolation Maintenance Approach

A multiple node isolation occurs when two or more failures occur on the ring, causing a potentially large isolated segment. This maintenance approach provides information that aids in testing, repairing, and restoring nodes in isolation to minimize the effect on service. When there is an isolated segment of multiple nodes, with an established BISO and EISO node, the most probable faulty node(s) are the isolated nodes adjacent to the BISO and EISO nodes. This is assumed because both the BISO and EISO nodes of a multiple node isolation are most likely to be established adjacent to the faulty node when attempting to recover from ring error conditions. Therefore, by troubleshooting the nodes adjacent to the BISO and EISO nodes, faults are corrected with the least amount of time and service interruption. For a more complete explanation of BISO and EISO node information, refer to the “Maintenance Description” section in this manual.

Assumption: An equipment malfunction has been detected, and the fault recovery software has removed multiple nodes from service, reconfigured the ring, and formed an isolated ring segment around the faulty nodes. The ARR has attempted to restore the nodes to service and has failed.

NOTE: If multiple nodes are isolated within a segment, the test approach is to diagnose the isolated node adjacent to the BISO node first, and then the isolated node adjacent to the EISO node. See Figure 4-5. Next, the nodes (xx and yy) must be diagnosed. After these nodes are diagnosed, the BISO and then the EISO nodes are diagnosed. Nodes are diagnosed in this manner because the most probable trouble nodes are established next to, or close to, the BISO and EISO nodes. There may be other nodes within the isolated segment that are not faulty but are included in the isolated segment because they are between the two faulty nodes. When performing maintenance on a multiple node isolation, one should attempt to clear problems associated with either the BISO or the EISO end of the segment to form a single node isolation. Once the single node isolation has been established, follow the single-node isolation test approach.
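The diagnosis order described in this note can be sketched as a short function. This is an illustration under stated assumptions, not a procedure from the manual: the function name is hypothetical, the node labels are taken from the figures, and visiting the remaining interior nodes in ring order is an assumption (the note names only xx and yy).

```python
def diagnosis_order(biso, isolated, eiso):
    """Return the order in which to diagnose nodes of a multiple-node isolation.

    `isolated` lists the isolated nodes in ring order, from the BISO end
    of the segment to the EISO end.
    """
    if not isolated:
        return [biso, eiso]
    order = [isolated[0]]             # isolated node adjacent to the BISO node
    if len(isolated) > 1:
        order.append(isolated[-1])    # isolated node adjacent to the EISO node
    order.extend(isolated[1:-1])      # remaining interior nodes (assumed ring order)
    order.extend([biso, eiso])        # boundary nodes are diagnosed last
    return order


print(diagnosis_order("BISO", ["iso 0", "xx", "yy", "iso 1"], "EISO"))
```

For the four-node segment shown, the result is iso 0, iso 1, xx, yy, BISO, EISO, which follows the sequence given in the note.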

It has been determined that there are two or more faulty nodes in an isolated segment, and all faulty nodes have been removed from service and isolated from the active ring.


Figure 4-5. Two or More Faulty Nodes

The xx, yy, and zz represent nodes that are in the isolated segment and may or

may not be faulty.

Procedure 4-3. Multiple-Ring Node Isolation Maintenance Guidelines - A

This maintenance approach does not detail direct procedures, but instead provides the user with an understanding of what may be done differently from Approach B to reduce time consumed in restoring the ring and ring hardware.

1. Have “tested good” link node CPs available.

2. When a multiple fault occurs that isolates two or more nodes, causing innocent

nodes to become OOS and included in an isolated segment as depicted in the

diagram above (xx, yy, zz), then perform the following:

a. Replace all CPs within the node at either end of the isolated segment, and perform a conditional restore on the node. Be certain to place all replaced CPs in protected static packaging.

b. After problems are cleared at either end, and the isolation clears or is reduced in size, then the innocent OOS nodes should restore to active service automatically, possibly leaving only a single isolated node at the other end.

3. Diagnose and correct all problems associated with the node left isolated. Troubleshoot the node in this manner to avoid including innocent nodes in the isolated segment.

4. When office traffic is minimal, replace the original CPs in the faulty node where the CPs were originally replaced, and diagnose (troubleshoot) it until the faulty CP(s) are located.

[Figure 4-5 labels: BISO, iso 0, xx, yy, zz, iso 1]


5. Place all other CPs in the original static wrapping, and store them (the “tested good” CPs) for possible future faults.

Procedure 4-4. Multiple-Ring Node Isolation Maintenance Guidelines - B

1. Diagnose iso 0 using guidelines listed in Chapter 6, Diagnostic User's Guide .

NOTE: If the fault in iso 0 is corrected and the node is restored to service, then the isolated segment of the ring is shortened. This creates a new BISO node and changes the condition from a multiple node isolation to a single node isolation, restoring all the innocent OOS nodes.

Does the original isolation still exist, or is iso 0 OOS-NORMAL?

If an isolation still exists, but has been shortened, and iso 0 is OOS-NORMAL and known to be usable, unconditionally restore iso 0 to service, and then proceed to Step 6. Use one of the following commands to restore the node:

• For LNs, enter RST:LNxx y;UCL

• For RPCN, enter RST:RPCNxx yy;UCL

where: xx = group number

y = node member number

! CAUTION: Do not perform an unconditional restore unless one of the following has occurred:

• A complete diagnostic run has produced an ATP response.

• A complete diagnostic run has produced a CATP response, and the RI and NP minor states are both USBL.

If iso 0 remains OOS-NORMAL, refer to “Ring Node OOS Maintenance Approach” in this chapter.

If the original isolation still exists, proceed to next step.

2. Diagnose node xx using guidelines detailed in Chapter 6, Diagnostic User's Guide .


If node iso 0 is in the OOS-NORMAL state, and the original BISO node no longer exists after diagnosing and repairing node xx, then refer to “Ring Node OOS Maintenance Approach.”

If the above statement is true, and all problems are corrected concerning these nodes, then a single node isolation may be formed, including a new BISO node, iso 1, and the EISO node. If this occurs, then refer to “Single Node Isolation Maintenance Approach” for the remainder of these guidelines.

If the original isolation still exists after diagnosing node xx and correcting any problems, then repeat Steps 1 and 2 using nodes iso 1 and yy. If the original isolation still exists, then proceed to the next step.

3. Diagnose the BISO node.

NOTE: The BISO node is an active node on the ring. To diagnose the BISO node, the node must be excluded from the active ring. See Figure 4-6. To accomplish this, use the RMV command. See the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual. When the BISO node is removed from service (OOS-NORM), it is automatically included in the isolated segment (OOS-ISOLATED).

Figure 4-6. New BISO Node

The RMV request may or may not be accepted. If the request is accepted, proceed with diagnostics as usual, using guidelines listed in Chapter 6, Diagnostic User's Guide.

If the request is denied, it may be necessary to remove the node and SLK from service, and then diagnose the node.

To put the SLK in the AVAILABLE-MOOS state, type the following message into

the MCRT, and proceed with diagnostics as usual:

CHG:SLK (a, b, [c, d] ); MOOS

[Figure 4-6 labels: OLD BISO, NEW BISO, iso 0, xx, yy, zz, iso 1, EISO]


b = member number (01 - 15)

c = LI4 circuit pack (0 - 1)

d = LI4 port (0 - 3)

The following message should appear on the MCRT:

CHG SLK a b [c d]

NEW REQUESTED MINOR STATE = MOOS

NOTE: After diagnosing and clearing problems associated with the BISO node, if any are located, restore the node to service using guidelines for restoring all other nodes.

After diagnosing the BISO node, if problems are found and corrected, and if an ATP response is received, the BISO node may be deleted, leaving the iso 0 node in the OOS-NORMAL state. If this occurs, restore iso 0 to service. Refer to “Ring Node OOS Maintenance Approach” in this chapter.

A complete diagnostics has produced a CATP response, and the RI and the NP minor states are both USBL.

If problems are corrected with the BISO, iso 0, and xx nodes, then the isolated segment of the ring should shorten, leaving only a single isolated node. If this occurs, refer to “Single Node Isolation Maintenance Approach” in this chapter for the remainder of this test.

If the SLK was manually removed from service, put it back in the AVAILABLE-IS or

AVAILABLE-STBY state by entering the following message at the MCRT:

CHG:SLK (a, b, [c, d] );{ IS | ARST}


CHG SLK a b [c d]

NEW REQUESTED MINOR STATE = e

c = LI4 circuit pack (0 - 1)

d = LI4 port (0 - 3)

4. If the original ring isolation still exists, starting with node iso 0, then xx, and finally the

BISO node, replace all RN CPs in this order: ring interface 0 (RI0), RI1, the NP, and

the link interface. Perform a conditional restore. For VLSI RNs, replace the IRN

circuit pack and then the link interface. If the trouble clears after replacing the CPs in

the order listed, the original CPs should be reinserted one at a time in the node and

diagnostics run to determine the faulty CP(s). If the diagnostics fail to detect the

faulty CP(s), but the previous CP replacement cleared the trouble, then the CP(s)

should be saved, noting the failure conditions. Inform the CTS of the condition.

5. If the original ring isolation still exists, visually inspect affected equipment for shorts,

bent or broken pins, backplane faults, etc. Also ensure that proper equipment has

been used with the long message option. If problems are located, correct the

problems and perform a conditional restore on the affected equipment.

6. If the isolation still exists, or if all problems with the original BISO node, the iso 0

node, and node xx have been cleared, diagnose and attempt to correct problems

associated with nodes iso 1, yy, and the EISO node, using Steps 3 through 5 of

these guidelines. See Figure 4-7.

Figure 4-7. More Than One Faulty Node

NOTE: After correcting and restoring this portion of the isolated segment of the ring, attempt to restore iso 0, xx, and the BISO nodes if problems were not corrected in previous steps.

[Figure 4-7 labels: BISO, iso 0, xx, yy, zz, iso 1, NEW EISO, OLD EISO]
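The caution attached to the unconditional restore commands in this chapter (an ATP response, or a CATP response with the RI and NP minor states both USBL) amounts to a simple precondition check. The sketch below is illustrative only; the function name and the string encoding of results and states are assumptions made for the example.

```python
def may_restore_unconditionally(diag_result, ri_minor_state=None, np_minor_state=None):
    """Return True if the caution permits an unconditional restore (;UCL).

    diag_result: result of a complete diagnostic run, "ATP" or "CATP";
                 anything else means the restore is not permitted.
    ri_minor_state, np_minor_state: minor states of the ring interface and
                 the node processor, for example "USBL" or "QUSBL".
    """
    if diag_result == "ATP":
        return True
    if diag_result == "CATP":
        return ri_minor_state == "USBL" and np_minor_state == "USBL"
    return False


print(may_restore_unconditionally("CATP", "USBL", "USBL"))
print(may_restore_unconditionally("CATP", "QUSBL", "USBL"))
```

A CATP result alone is not sufficient: both minor states must also be usable before the restore is attempted.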


NOTE: For additional information on the initialization levels, refer to “Initialization,” Part 4 of this manual.

Does the ring initialize?

Yes—Proceed to next step.

No—Proceed to Step 5.

3. Are all nodes that were not previously OOS (except quarantined nodes) before the

ring down state restored to service?

Yes—Proceed to Step 8.

No—Proceed to next step.

4. For all nodes that were not previously OOS before the ring failure, perform an

unconditional RST. See Chapter 6, Diagnostic User's Guide, or the 401-610-055 FLEXENT™/AUTOPLEX ® Wireless Networks INPUT MESSAGES Message

Manual or the 401-610-057 FLEXENT™/AUTOPLEX ® Wireless Networks

OUTPUT MESSAGES Manual.

Did all nodes that were not OOS before the ring failure restore?

Yes—Proceed to Step 8.

No—Proceed to next step.

5. Attempt to reinitialize the ring. Perform a level-4 initialization (see the proper application in the 401-610-055 FLEXENT™/AUTOPLEX® Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual).

NOTE: For additional information on the initialization levels, refer to “Initialization,” Part 4 of Chapter 6, Diagnostic User's Guide.

Does the ring initialize?

No—Proceed to Step 9.

6. Are all nodes that were not previously OOS prior to the ring failure restored to

service?

Yes—Proceed to Step 8.

No—Proceed to next step.


7. For all nodes that were not previously OOS before the ring failure, perform an

unconditional RST. See Chapter 6, Diagnostic User's Guide .

8. Are there any other nodes OOS left on the ring?

No—DONE.

Yes—Determine the ring condition (single node isolation, multiple node isolation, etc.) and proceed to that condition's maintenance approach presented in this chapter.

9. If the system still doesn't initialize after the level-3 and level-4 initialization attempts,

call the CTS.

Ring Generic Access Package (RGRASP)

Feature Definition

RGRASP is a single-user utility system for the CNI ring nodes. RInteractions

! CAUTION: Care must be exercised when using the RGRASP tool. Improper use of RGRASP can result in program mutilation or excessive utilization of system resources. Both of these consequences of improper use of the tool can lead to call processing downtime and therefore interrupt the operation of a node on the ring or the whole ring.

Feature Description

The RGRASP tool can:

• Set (allow) breakpoints (a breakpoint corresponds to the address of the first byte of a target process instruction).

• Clear breakpoints.

• Report on current status for specified breakpoints.

• Inhibit breakpoints.

• Load a specified RGRASP utility variable (UVAR).

• Dump a specified RGRASP UVAR.

• Load a specified node with data.

• Dump the contents of a specified address in a given node.

• Direct the loading of an address.

• Dump the contents of a specified Application Processor or Node Processor register.

Software Impact

This feature does not impact customer engineerable software resources on APs.

This feature could impact customer engineerable software resources on NPs, depending on memory size.

Software Description

The software consists of the following processes:

RGP_KER   This is a UNIX process kernel for the feature. It acts as the interface between the AM (RGP_CFT and RGP_PRT) and the ring node (monitor) processes.

RGP_CFT   This UNIX process handles input commands from the craft shell. It parses and performs some preliminary checking on the input command. Then it relays the command to the RGP_KER process for further processing.

RGP_PRT   This UNIX process handles printing of output.

monitor   This system process performs the actual operations required to handle breakpoints, memory dumping, and memory loading. It communicates with RGP_KER.

User Profile

This feature and its associated input commands are intended for use by technicians in conjunction with the CTS.

Description of Feature Operation

The following paragraphs describe how this feature can be used.

Initial Setup

First, determine the address in memory that requires investigation. This can be done by using the latest PR/PK listings provided. This address may be provided by the CTS.


Determine which processor should be looked at. In the case of the DLN, there is an active and a standby processor. Use the OP:SLK command or poke the 118 page to determine this. As a precaution, it is a good idea to set breakpoints in only one processor at a time.

Setting a Breakpoint

You can set a breakpoint in a program using the WHEN:RUTIL input command. Before this can be done, the opcode (OPC) must be known. To verify the OPC, use the DUMP:RUTIL command to dump the memory at the breakpoint address. If the expected OPC does not match the dump output, then the listings do not match the memory. This discrepancy should be cleared up before continuing the procedure. One possible explanation is that the node software is out of date. To eliminate this possibility, you can remove and restore the target node (the node in which the breakpoint is to be set). Doing this ensures that the newest version of code has been pumped from disk. You can use the RMV:LN and RST:LN commands or the 118 poke to achieve this. After the node has been pumped, try dumping the breakpoint address again. If it still does not match, the listings are out of date. In this case, you should stop and get a current listing before proceeding.
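The opcode check described above is a two-try comparison, sketched below. This is not RGRASP code: the function name and return strings are hypothetical, and the `dump` and `repump` callables merely stand in for the craft actions (DUMP:RUTIL at the breakpoint address, and RMV:LN followed by RST:LN).

```python
def verify_breakpoint_opcode(expected_opc, dump, repump):
    """Compare the listing's opcode with node memory before setting a breakpoint.

    expected_opc: opcode bytes taken from the PR/PK listing.
    dump:   callable returning the bytes at the breakpoint address
            (stands in for DUMP:RUTIL).
    repump: callable that removes and restores the target node
            (stands in for RMV:LN and RST:LN).
    """
    if dump() == expected_opc:
        return "match"
    # The node software may be out of date: pump fresh code and dump again.
    repump()
    if dump() == expected_opc:
        return "match after repump"
    # Memory is now current, so the listing must be out of date.
    return "listing out of date"


# Example with canned dump results: the first dump differs, the second matches.
dumps = iter([b"\x10\x49", b"\x84\x4c"])
print(verify_breakpoint_opcode(b"\x84\x4c", lambda: next(dumps), lambda: None))
```

Only the first two outcomes allow the procedure to continue; the third means a current listing must be obtained first.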

The WHEN:RUTIL command allows you to specify actions (commands) to be executed when the breakpoint you set fires. The input message manual page for WHEN:RUTIL defines the actions. Up to 24 actions may be specified in the action list for a single breakpoint. The action list must be terminated by an END:WHEN command. The action list can contain only the END:WHEN command, in which case you will simply know whether a piece of code is being executed.

Only five breakpoints can be set in any one ring node processor.
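The limits stated above (24 actions per breakpoint, a terminating END:WHEN, and five breakpoints per ring node processor) can be checked before a breakpoint is entered. The following is a hedged sketch: the function names are hypothetical, the action strings are bare command names without their real arguments, and not counting END:WHEN toward the 24-action limit is an assumption.

```python
MAX_ACTIONS = 24       # actions per breakpoint (from the text)
MAX_BREAKPOINTS = 5    # breakpoints per ring node processor (from the text)
TERMINATOR = "END:WHEN"


def validate_action_list(lines):
    """Return a list of problems found in a WHEN:RUTIL action list."""
    problems = []
    if not lines or lines[-1] != TERMINATOR:
        problems.append("action list must end with END:WHEN")
    # Assumption: the terminator itself does not count as an action.
    actions = [line for line in lines if line != TERMINATOR]
    if len(actions) > MAX_ACTIONS:
        problems.append(f"{len(actions)} actions exceed the limit of {MAX_ACTIONS}")
    return problems


def can_add_breakpoint(existing_count):
    """True if the node processor has room for one more breakpoint."""
    return existing_count < MAX_BREAKPOINTS


print(validate_action_list(["DUMP:REG", "DUMP:ADDR", "END:WHEN"]))
print(validate_action_list(["DUMP:REG"]))
print(can_add_breakpoint(5))
```

An action list consisting of END:WHEN alone passes the check, which corresponds to the case where the breakpoint only reports that the code was executed.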

Loading Memory

You can load memory with the LOAD:ADDR, LOAD:WORD, LOAD:SHORT, or LOAD:BYTE commands within the WHEN:RUTIL command or with the LOAD:RUTIL command. Details on the use of these commands are provided under “Input Messages.”

! CAUTION:

Loading memory may drastically change program execution. If not done properly, this can interrupt or degrade service; for example, calls may be lost.


• LOAD:RUTIL

• OP:RUTIL or OP:RUTILFLAG

• WHEN:RUTIL command

Feature Deactivation

You can deactivate the feature (that is, clear all breakpoints in a specified node) with the CLR:RUTIL command. You can clear a specific breakpoint in a specified node with the CLR:RUTILFLAG command.

You can temporarily disable or inhibit all breakpoints in a specified node with the INH:RUTIL command. You can temporarily disable or inhibit a specific breakpoint in a specified node with the INH:RUTILFLAG command.

Equipment Configuration Data (ECD)

ECD are not affected by the RGRASP feature.

Recent Change Procedures

Recent change procedures are not associated with the use of the RGRASP tool.

Measurement

No measurements are provided as part of the RGRASP tool.

Network Management Impact

If the RGRASP tool is used improperly, service interruption or degradation can

occur.

Maintenance/Troubleshooting Impact

The RGRASP tool is a debugging tool for CNI ring nodes. It is usable only at nodes that are active from an IMS viewpoint, such as the IMS ACT state. Nodes that are quarantined or isolated cannot be accessed with RGRASP.

There are no new diagnostics related to this tool.


RGRASP breakpoints are affected by CNI initialization levels as follows:

Level              Effect

0, 1, FPI, 2, 3    None

4                  Clears all breakpoints
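The effect of an initialization level on the set of breakpoints can be expressed as a lookup. This is only a sketch of the table above; the function name and the breakpoint addresses are hypothetical.

```python
# Levels that leave RGRASP breakpoints untouched, per the table above.
LEVELS_KEEPING_BREAKPOINTS = {"0", "1", "FPI", "2", "3"}


def breakpoints_after_init(level, breakpoints):
    """Return the breakpoints that survive a CNI initialization of `level`."""
    level = str(level)
    if level == "4":
        return []                      # level 4 clears all breakpoints
    if level in LEVELS_KEEPING_BREAKPOINTS:
        return list(breakpoints)       # no effect
    raise ValueError(f"unknown initialization level: {level}")


print(breakpoints_after_init("FPI", [0x4000, 0x4A10]))
print(breakpoints_after_init(4, [0x4000, 0x4A10]))
```

After a level-4 initialization, any breakpoints still needed must be set again with WHEN:RUTIL.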

Recording

This tool has no impact on recording.

Procedure 4-6. Input Messages

The following input messages/commands are associated with the RGRASP tool. For more information about each of these messages, refer to the 401-610-055 FLEXENT™/AUTOPLEX® Wireless Networks INPUT MESSAGES Message Manual or the 401-610-057 FLEXENT™/AUTOPLEX® Wireless Networks OUTPUT MESSAGES Manual.

! CAUTION: Incorrect use of these commands may interrupt operation of a node on the ring or the whole ring. READ EACH PURPOSE CAREFULLY.

1. ALW:RUTIL or ALW:RUTILFLAG

The first command allows all breakpoints in the specified node; the second allows

a specific breakpoint in the specified node.

2. CLR:RUTIL or CLR:RUTILFLAG

The first command clears all breakpoints in the specified node; the second clears specific breakpoints in the specified node.

3. DUMP:ADDR

Dumps the contents of the specified address in the given node. This command is

allowed only within a WHEN:RUTIL command <action-list >.

4. DUMP:REG


Dumps the contents of the specified Application or Node Processor register in the

given node. This command is allowed only within a WHEN:RUTIL command <action-list>.

5. DUMP:RUTIL

Dumps the contents of memory in the given address range at the specified node. It can also dump the contents of memory starting at the given address for the specified number of bytes.

Currently a maximum length of 468 bytes is allowed for a single dump operation.

A formatted output of the node's memory contents will follow this input command.

6. DUMP:UVAR

Dumps the contents of the specified RGRASP UVAR. This command is allowed only within a WHEN:RUTIL command <action-list>.

7. INH:RUTIL or INH:RUTILFLAG

The first command inhibits all breakpoints in the specified node; the second

inhibits specific breakpoint(s) in the specified node.

8. LOAD:ADDR

Loads the specified address with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>.

9. LOAD:BYTE

Loads the address in the given node with the specified data. This command is allowed only within a WHEN:RUTIL command <action-list>.

10. LOAD:REG

Loads an Application or Node Processor register with the specified data in the given node. This command is allowed only within a WHEN:RUTIL command <action-list>.

11. LOAD:RUTIL

Loads the address at the given node with the specified data. The maximum amount of data that can be loaded is 128 bytes, or 32 4-byte words.
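The size limits stated for DUMP:RUTIL (468 bytes per dump operation) and LOAD:RUTIL (128 bytes, or 32 4-byte words) can be checked before any command is entered. The Python sketch below is a planning aid under those two stated limits only. The function names are hypothetical, and the sketch does not generate or model any actual RGRASP command syntax.

```python
# Illustrative planning helpers; the names are hypothetical, not RGRASP commands.
MAX_DUMP_BYTES = 468  # DUMP:RUTIL limit for a single dump operation
MAX_LOAD_BYTES = 128  # LOAD:RUTIL limit: 128 bytes, i.e. 32 4-byte words


def plan_dumps(start, length):
    """Split a memory range into (address, byte_count) pieces of at most 468 bytes."""
    if length < 0:
        raise ValueError("length must not be negative")
    pieces = []
    offset = 0
    while offset < length:
        count = min(MAX_DUMP_BYTES, length - offset)
        pieces.append((start + offset, count))
        offset += count
    return pieces


def load_fits(item_count, item_size=1):
    """Return True if item_count items of item_size bytes fit in one load."""
    if item_count < 0 or item_size <= 0:
        raise ValueError("item_count must be >= 0 and item_size must be > 0")
    return item_count * item_size <= MAX_LOAD_BYTES
```

For example, a 1000-byte region starting at address 4096 would need three dump operations: plan_dumps(4096, 1000) returns [(4096, 468), (4564, 468), (5032, 64)]. Likewise, load_fits(32, 4) is True, while load_fits(33, 4) is False.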

7. REPT RGP PRT

Prints when anomalies occur within the print process of the RGRASP tool.

Indicates the kind of anomaly that has occurred.

8. REPT RUTIL

This message has 40 formats. Formats [1] through [15] report an error condition encountered by the RGRASP RGP_KER process. Formats [16] through [40] print in response to the firing of a breakpoint.

9. WHEN RUTIL

Prints in response to a WHEN:RUTIL command.
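When REPT RUTIL output is reviewed, the format number alone indicates whether a message reports an RGP_KER error or a breakpoint firing. The Python sketch below encodes only the two ranges stated above. The function name and the returned labels are assumptions made for the example.

```python
# Illustrative only; the function name and the labels are hypothetical.
def classify_rept_rutil(format_number):
    """Classify a REPT RUTIL format number using the ranges given in the text."""
    if 1 <= format_number <= 15:
        # Formats 1 through 15: error condition in the RGP_KER process.
        return "error"
    if 16 <= format_number <= 40:
        # Formats 16 through 40: printed when a breakpoint fires.
        return "breakpoint"
    raise ValueError(f"REPT RUTIL has no format {format_number}")
```

For example, classify_rept_rutil(15) returns "error" and classify_rept_rutil(16) returns "breakpoint".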

Audits

The RGRASP tool does not affect any audits.

Critical Events

The RGRASP tool does not affect any critical events.

Support Tools

The RGRASP tool is a new support tool.
