
IPG_Log


Page 1: IPG_Log

Alarm: UtranCell_InternalResourceUnavailable

 86%

Usage Count: 28

Network: WCDMA

Service: W-RAN

Node: CPP RNC 3820

Alarm: UtranCell_InternalResourceUnavailable


Cold Restart of TX board in cell 3 Node does not work

Cold restart of Node B on older CV does not work

Remodule Node B in RNC and problem disappears

Module Error: [2011-01-11 15:57:09.388] RnhLmCellCPT(rnhCellRoC[52]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId xxx, iubLinkFroId xxx] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources

lhsh 000600 drh_ccrh_hostdata
0006:    0x21000067            24     10            23769                0  0xffffffff  releasing sendReleaseRspToClient
0006:    0x21000068            23     29            19088                0  0xffffffff
0006:    0x21000069            24     10            22361                0  0xffffffff  releasing sendReleaseRspToClient
0006:    0x21000026            24     10            23637                0  0xffffffff  releasing sendReleaseRspToClient

In the absence of crashes in the ET-IPG, search for cold restarts: lh etipg te log read | grep -i restart

[2011-03-09 02:07:05.032] Ipet_atish_proc atish_trafind.c:351 INFO:TrafficIndication: COLD restart
[2011-03-09 02:07:05.088] Ipet_scish_proc scish_trafind.c:435 INFO:TrafficIndication: COLD restart

ETIPG Log: lhsh 002500 dumpelg

LOG ENTRIES:
seqNr date   time   message
2     100729 104733 000;;Subrack 02;Slot 25
3     100903 102509 000;VANRNC1;Subrack 00;Slot 25
4     101103 200144 000;CXP9013831_R9YC/28
5     101130 050310 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
6     110222 180100 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
7     110222 191901 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
8     110223 064546 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
9     110226 202109 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
10    110226 233552 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
11    110227 055434 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
12    110309 020556 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0

Coli printouts from commands:
lh mod drh_ccrh_topdata
lh mod drh_ccrh_celldata all
lh mod drh_ccrh_hostdata

0202:   [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0202:   [737]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0216:   [1303]: cellRef= xxx, clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
0216:   [614]: cellRef= xxx,clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
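A saved copy of the printout above can be scanned for hanging entries with a short script. The following is a minimal Python sketch, not an Ericsson tool: the function names and the regular expression are illustrative assumptions, and the sample text is the four printout lines shown above. The four-digit prefix is simply carried through as it appears at the start of each line.

```python
import re

# Sample lines copied from the printout above (cellRef and spmFroId are
# anonymised as "xxx"/"xx" in the source).
PRINTOUT = """\
0202:   [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0202:   [737]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0216:   [1303]: cellRef= xxx, clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
0216:   [614]: cellRef= xxx,clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
"""

# Hypothetical pattern: line prefix, list index, clientModuleId and the
# content of the msgBoard brackets.
ENTRY = re.compile(
    r"^(?P<prefix>\d{4}):\s+\[(?P<index>\d+)\]:.*?"
    r"clientModuleId\s*=\s*(?P<module>\d+).*?"
    r"msgBoard\s*=\s*\[\s*(?P<state>[^\]]*?)\s*\]"
)


def hanging_entries(text, states=("releasing", "clearing")):
    """Return (prefix, index, clientModuleId, state) for entries in a hang state."""
    found = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group("state") in states:
            found.append((
                match.group("prefix"),
                int(match.group("index")),
                int(match.group("module")),
                match.group("state"),
            ))
    return found


def prefixes_with_hang(entries):
    """Distinct line prefixes that report at least one hanging entry."""
    return sorted({prefix for prefix, _, _, _ in entries})


if __name__ == "__main__":
    for entry in hanging_entries(PRINTOUT):
        print(entry)
```

Entries whose msgBoard is empty are ignored, so an empty result means that no "releasing" or "clearing" flags were found in the text that was scanned.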

HS configuration updated in Node B

References: TR HN50575 :


REMEDY:

CONDITIONS:

1. Ensure the SW version is below the version in which this is fixed

2. An alarm is present for UtranCell_InternalResourceUnavailable

3. The printouts from the Coli commands lh mod drh_ccrh_topdata, lh mod drh_ccrh_celldata all and lh mod drh_ccrh_hostdata show msgBoard = [ releasing ]

4. Customer permission is granted to use the workaround, which will affect traffic in the Module with the problematic cell

PROCEDURE:

Locate the RncLmCell load module in the affected RNC module and restart it with "lh modx progkill RncLmCell"

Note: X = RNC Module

SOLUTION:

CONDITIONS:

The fault occurs due to a hang in RncLmCell in the RNC Module. The UtranCell_InternalResourceUnavailable alarms are triggered by cells hanging in the 'releasing' or 'clearing' state in the DrhCcRh block. These cells cannot be released because there are IpTp (IP termination point) sessions associated with the cells which are also hanging in the 'releasing' state. Such hanging sessions are caused by a fault in an audit procedure, which is performed after an ET-IPG crash or restart. When the ET-IPG goes down, the application receives two signals: hostStateChangeInd and serverDownInd (from IPAPPLSCI). On the first of these signals, the IP sessions associated with the restarted ET-IPG are marked as 'releasing' and a sessionReleaseReq signal is sent to IPAPPLSCI (CPP interface) to release the sessions. However, instead of a release response, the application receives a serverDownInd signal, which triggers the audit. Unfortunately, the audit procedure skips the removal of IP sessions which are marked as 'releasing'.
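The effect of the audit fault can be illustrated with a small model. The following Python sketch is a toy simulation under stated assumptions, not the actual DrhCcRh code: the class, the function names, the board labels and the skip_releasing switch are all hypothetical, and the session Ids are only borrowed from the printout as examples.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: int
    board: str               # hypothetical label of the ET-IPG hosting the session
    releasing: bool = False  # set while a release response is awaited


def on_host_state_change(sessions, board):
    """First signal: flag the sessions on the restarted board as 'releasing'.

    In the real flow a sessionReleaseReq is sent here; the response never
    arrives because the board is down.
    """
    for session in sessions:
        if session.board == board:
            session.releasing = True


def audit(sessions, board, skip_releasing):
    """Second signal (serverDownInd): return the sessions that survive the audit.

    skip_releasing=True models the fault described above, where sessions
    flagged 'releasing' are not removed.
    """
    kept = []
    for session in sessions:
        on_restarted_board = session.board == board
        if on_restarted_board and not (skip_releasing and session.releasing):
            continue  # stale session is removed
        kept.append(session)
    return kept


def simulate(skip_releasing):
    sessions = [Session(0x21000067, "ET-IPG-A"), Session(0x21000068, "ET-IPG-B")]
    on_host_state_change(sessions, "ET-IPG-A")
    return audit(sessions, "ET-IPG-A", skip_releasing)


if __name__ == "__main__":
    print("faulty audit keeps:   ", [hex(s.session_id) for s in simulate(True)])
    print("corrected audit keeps:", [hex(s.session_id) for s in simulate(False)])
```

With the fault modelled, the session on the restarted board survives the audit while still flagged 'releasing', which corresponds to the hanging entries seen in the printouts. Without it, only the session on the unaffected board remains.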

+---------+                   +---------+                   +---------+                   +---------+                   +---------+
| RNHCell |                   | DrhCcRh |                   | Aal2Eri |                   | Aal2Nci |                   | ApplSci |
| (Mod_A) |                   | (Mod_B) |                   | (Mod_B) |                   | (Mod_A) |                   | (IP-ET) |
+----+----+                   +----+----+                   +----+----+                   +----+----+                   +----+----+
     |                             |                             |                             |                             |
     #----{initialResourceReq}---->*                             |                             |                             |
     *<----{initialResourceCfm}----#                             |                             |                             |
     #--------{ipTpUdpReq}-------->*                             |                             |                             |
     |                             #----------------------------------{setUpUdpSessionReq}---------------------------------->*
     |                             *<-----------------------------------{sessionSetUpCfm}------------------------------------#
     *<-------{ipTpUdpCfm}---------#                             |                             |                             |
     #------{modifyIpTpReq}------->*                             |                             |                             |
     |                             #---------------------------------{modufyUdpSessionReq}---------------------------------->*
     |                             *<----------------------------------{sessionModifyCfm}------------------------------------#
     *<-----{modifyIpTpReq}--------#                             |                             |                             |
     #----{reserveAal2CepReq}----->*                             |                             |                             |
     |                             #--{reserveLocalAal2CepReq}-->*                             |                             |
     |                             *<----{reserveAal2CepCfm}-----#                             |                             |
     *<----{reserveAal2CepCfm}-----#                             |                             |                             |
     #--------------------------------------{nodeConnReq}------------------------------------->*                             |
     |                             |                             |                             #--------{connectCep?}------->*
     |                             |                             |                             *<-----{disconnectCep?}-------#
     *<---------------------------------------{connCfm}----------------------------------------#                             |
     |                             |                             |                             |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>{restart of Mod_A}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
     |                             |                             |                             |                             |
     X       {connNotOkInd}------->*                             |                             X                             |
                                   #--{releaseLocalAal2CepReq}-->*                                                           |
                                   *<----{releaseAal2CepCfm}-----#                                                           |
                                   #-----------------------------------{sessionReleaseReq}---------------------------------->*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>{FLOW IS HANGED - no response from ApplSci towards DrhCcRh!}<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                   *<----------------------------------{sessionReleaseCfm}-----------------------------------#

PROCEDURE:

Upgrade the RNC to W10.1.3.7.

M3UA goes down when ET-MFG hanging  86%

Usage Count: 11

Network: WCDMA

Node: CPP RNC 3810

Service: W-RAN P7.1

Software: CPP RNC P7.1.4 EU4

Software: CPP RNC CXP9013831 R9YC/6

All M3UA goes down

All M3UA connections are disabled

Alarm: M3UA Association Down

Alarm: Contact to Default Router 1 Lost

Alarm: Contact to Default Router 0 Lost

ETMFG - te log read
Ipet_ipps_proc pcidrv_coli.c:3284 ERROR:ttyram: Could not send data

ETMFG - te log read
Ipet_scish_proc scish_root.c:429 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:437 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:456 INFO:ColiDumpReq not yet implemented for INTERNAL_HOST

Refer to UABtr79810; WRNae89971; WRNae88084; HM15053; HM27791; HM52198; HM52208; HM52231; HL93961

A memory leak in the ET-MFG causes the ET-MFG to hang. When the problem occurs, the ET-MFG is not able to handle traffic.

SOLUTION:

CONDITIONS:

PROCEDURE:

1. CPP has provided the solution for this. EU55 for P7.1.4 on 2010-09-03; W10.1.1-4 scheduled for delivery on Sept 15th and W10.1.2.

REMEDY:


CONDITIONS:

PROCEDURE:

1. Cold restart of ET-MFG or RNC node cold restart.

This memory problem is closely related to fragmented traffic. The counters pmIpReasmReqds, pmIpReasmOks and pmIpReasmFails (IpAccessHostEt) can be used to check for fragmented traffic.

# This helps to check the impact of each defined interface individually
IpInterface
    pmDot1qTpVlanPortInFrames
    pmDot1qTpVlanPortOutFrames
    pmIfStatsIpOutDiscards
    pmIfStatsIpOutRequests

# This helps to check whether any data sent or received at the ET-MFG GigaBitEthernet ports is discarded
GigaBitEthernet
    pmIfInDiscardsLink1
    pmIfInDiscardsLink2
    pmIfOutDiscardsLink1
    pmIfOutDiscardsLink2

# Information on the fragmented traffic, to be activated at the customer's convenience
IpAccessHostEt
    pmIpReasmFails
    pmIpReasmOks
    pmIpReasmReqds
    pmIpFragCreates
    pmIpFragFails
    pmIpFragOks

see also SCS1079568
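Once the reassembly counters have been read from IpAccessHostEt, they can be summarised as in the following minimal Python sketch. It assumes that pmIpReasmReqds, pmIpReasmOks and pmIpReasmFails follow the usual IP MIB meaning (fragments received for reassembly, datagrams reassembled successfully, reassembly failures). That meaning is an assumption here, the failure ratio is only an illustrative metric, and the function name and sample values are hypothetical.

```python
def reassembly_stats(reasm_reqds, reasm_oks, reasm_fails):
    """Summarise IP reassembly counters sampled from IpAccessHostEt.

    Assumption: the counters mirror the IP MIB definitions, so reasm_reqds
    counts received fragments, while reasm_oks and reasm_fails count the
    outcomes of reassembly attempts.
    """
    if min(reasm_reqds, reasm_oks, reasm_fails) < 0:
        raise ValueError("counters must be non-negative")
    attempts = reasm_oks + reasm_fails
    failure_ratio = reasm_fails / attempts if attempts else 0.0
    return {
        "fragments_received": reasm_reqds,
        "datagrams_reassembled": reasm_oks,
        "reassembly_failures": reasm_fails,
        "failure_ratio": failure_ratio,
        "fragmented_traffic_seen": reasm_reqds > 0,
    }


if __name__ == "__main__":
    # Hypothetical counter values for one measurement period.
    print(reassembly_stats(reasm_reqds=1000, reasm_oks=400, reasm_fails=100))
```

A non-zero pmIpReasmReqds value in a measurement period only shows that fragmented traffic reached the host; whether the level is a concern has to be judged against the traffic profile of the node.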

WCDMA RNC W10 : Failed to unlock utrancell in IUB over IP site  86%

Usage Count: 4

Network: WCDMA

Node: CPP RNC

Service: W-RAN

Alarm: UtranCell_InternalResourceUnavailable

Utrancell does not come up

ReadErrorLog: ModuleMP, te log read

RnhLmCellCPT(rnhCellRoC[n]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId n, iubLinkFroId n] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources

ReadErrorLog: ModuleMP, te e trace3 drhCcRhRouterC

RnhLmCellCPT(drhCcRhRouterC) ../src/DrhCcRhRouterC.cpp:491 TRACE3:cellRef=n is already in our list. We'll reject until the old one is removed

ReadErrorLog: ModuleMP, drh_ccrh_celldata all

00nn: List of all CC SPs owned by ccRhModule n:
00nn: List of cells from SpId n:
00nn: [n]: cellRef=<cellRef>, clientModuleId = 0, spmFroId = <spmFroId> msgBoard = [ releasing ]

ReadErrorLog: ModuleMP, drh_ccrh_hostdata

00nn: IpTp table:
00nn: ipTpSessionId  ipHostFroId   piuId  serverSessionId  clientPortIndex  clientId    msgBoard
…
00nn:    0x..                 ..      ..               ..                0  0xffffffff  releasing
00nn:    0x..                 ..      ..               ..                0  0xffffffff  releasing
00nn:    0x..                 ..      ..               ..                0  0xffffffff  releasing
…
00nn:    0x..                 ..      ..               ..                0  0xffffffff  releasing

ReadErrorLog: ET-IPG, te log read

Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned
Ipet_lh_proc ipplh_agent.c:783 INFO:Restart Rank COLD and updated State is: 4
Ipet_atish_proc atish_trafind.c:398 INFO:TrafficIndication: COLD restart

ReadErrorLog: ET-IPG, llog

Board restart rank=Coldwithtest Proc=Cs_boardManager_proc Err=0xB0AD0006 (eri_api). Board manager restart. Restart ordered by system manager

ET-IPG crash or ET-IPG restart with Cold With HW test

Transmission outage

Node B restart

Root cause of the problem found :

Some cells could not come up after IP-ET board had been restarted.  UtranCell_InternalResourceUnavailable alarms were raised.

Cause : The problem is triggered by an ET-IPG crash. DrhCcRh starts releasing all IpTp (IP termination point) sessions created on all IpAccessHostEts located on the restarted ET-IPG board. During this procedure a release request signal is sent towards ApplSci. All these sessions are flagged as "releasing" until a response from the ApplSci service is received. But no such response will be received, since the ET-IPG board has restarted. Instead, a serverDownInd signal is received. After that, the IP service initialization procedure is performed, including an audit between DrhCcRh and ApplSci whose purpose is to clean up unused (marked as faulty) IpTp sessions.

The fault is located in the audit handling: if an IpTp session is marked as faulty, it should be removed and the release procedure should go on. But if an IpTp session is marked as faulty and is also flagged as "releasing", it is not removed, and DrhCcRh still waits for the release response from ApplSci. This hanging IpTp session prevents the cell from being released, so the affected cell hangs after it is locked and eventually cannot be unlocked.

References:

HN49221 : W10B: Sector not come up after Node B Restart

Mapped to HN61121 : W10B: Sector not come up after Node B Restart

HM48225

SOLUTION:

CONDITIONS:

1. The ET-IPG board has crashed or has been restarted with rank "Cold With HW Test"

2. Utrancell does not come up after Node B restart or after a transmission outage.

3. Failed to unlock utrancell

4. Check for hanging IpTp sessions in the ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If there are either "releasing" or "clearing" flags on cells, the ModuleMP has hanging IpTp sessions

Note! This procedure requires software delivery. Please contact your local Ericsson Support for more information.

PROCEDURE:

The correction will be delivered in W11.0.1.2 (CXP9014711/3-R2C)

REMEDY:

CONDITIONS:

1. The ET-IPG board has crashed or has been restarted with rank "Cold With HW Test"

2. Utrancell does not come up after Node B restart or after a transmission outage.

3. Failed to unlock utrancell

4. Check for hanging IpTp sessions in the ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If there are either "releasing" or "clearing" flags on cells, the ModuleMP has hanging IpTp sessions

This procedure is for recovery of the problem

PROCEDURE:

Restart the RncLmCell process on the problematic ModuleMP. Refer to KCS document SCS1003029, "CPP : How To restart a board a process or JVM. Using telnet, NCLI, Moshell or EMAS"

WCDMA RNC : High Module MP load on extension subrack with only one ET-MFX

 86%

Usage Count: 3

Network: WCDMA

Service: W-RAN

Node: CPP RNC

Alarm: Ethernet Switch Port Fault

RRC degradation in one extension subrack

High RRC Failure in one subrack

RRCSucc degradation on all Module of an RNC Extension Subrack

High processor load can be observed in extension subrack.

Module MP overload in an entire RNC subrack (processor load >85%)

High MP load in ETMFX

ReadErrorLog: ET-MFX

Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 21 to 19
Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 19 to 21

No access/connectivity to ET-MFX board

Root cause of the fault found. Configuration problem:

- High RRC failures on sites belonging to Module MP's on extension subracks which have one ET-MFX board

- High MP load on module MP's on Extension subracks with one ET-MFX board

- No or bad connectivity to the impacted ET-MFX boards. Connectivity was restored after remoduling sites to other subracks that have two load sharing ET-MFX boards.

 

Investigation :


This problem is due to a dimensioning issue. There is too much Iublink activity on the extension subrack (ES) for one ET-MFX board. The dimensioning of the node did not follow the Ericsson recommendation and did not take full advantage of the Spanning Tree Protocol.

Connectivity to the ET-MFX boards was restored after remoduling sites to other subracks that have two load sharing ET-MFX boards.

 

General recommendations about ET-MFX usage:

1- It is recommended to have two ET-MFX boards per subrack for load sharing and redundancy, so that if one ET-MFX board is lost the other takes all the traffic.

2- ET-MFX load sharing is supported only within the subrack. Intersubrack ET-MFX load sharing is not supported.

3- If both ET-MFX boards in the subrack are lost, the Iublinks need to be re-allocated to a new subrack manually (Iublink preferredSubrack attribute) to remain operational.
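The first recommendation can be checked against a board inventory with a few lines of code. The following is a minimal Python sketch. The inventory format (subrack name mapped to the slots holding ET-MFX boards), the subrack names and the function name are hypothetical and would have to be filled in from the actual node configuration.

```python
def subracks_lacking_redundancy(et_mfx_by_subrack, minimum=2):
    """Return the subracks that hold fewer ET-MFX boards than recommended."""
    return sorted(
        subrack
        for subrack, slots in et_mfx_by_subrack.items()
        if len(slots) < minimum
    )


if __name__ == "__main__":
    # Hypothetical inventory: subrack name -> slots with an ET-MFX board.
    inventory = {"MS": [24, 25], "ES1": [24], "ES2": []}
    for subrack in subracks_lacking_redundancy(inventory):
        print(f"{subrack}: fewer than two ET-MFX boards, no load sharing or redundancy")
```

Because load sharing is supported only within a subrack, the check is done per subrack rather than on the total number of boards in the node.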

SOLUTION:

CONDITIONS:

1- High MP load on module MP's on Extension subracks with one ET-MFX board

2- Alarm: Ethernet Switch Port Fault

This procedure is for correction of the configuration problem.

PROCEDURE:

Add a second ET-MFX board to the subrack that has only one ET-MFX board.

REMEDY:

CONDITIONS:

1- High MP load on module MP's on Extension subracks with one ET-MFX board

2- Alarm: Ethernet Switch Port Fault

This procedure is a workaround to avoid the problem until a better configuration is used.

PROCEDURE:

Re-allocate the RBSs on Module MPs with high MP load to a new subrack manually (preferably to an extension subrack with two ET-MFX boards), so that the Iub links remain operational.

Board Restart: ET-IPG Error code: 0xB0AD0006 Process: Cs_boardManager_proc

 86%

Usage Count: 2

Data collection for ET-IPG restart Error code:  0xB0AD0006 Process:  Cs_boardManager_proc

Network: WCDMA

Service: W-RAN

Node: CPP RNC

Board Restart: Board manager restart. Restart ordered by system manager

Process Restart: ET-IPG
Error code:  0xB0AD0006 (Reported via CELLO:ERI IF)
Process:  Cs_boardManager_proc
Restart type:  Processor
ERROR NUMBER 0xB0AD0006 WITH EXTRA DATA 0x00A60ABC WAS REPORTED BY
PROCESS Cs_boardManager_proc
TYPE PRI-10
BLOCK   osemain

ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned


Refer to TR HO87223  for details

Root cause not found

A timing issue between the memory and the network processor (NPU) causes the NPU HW alarms and the ET-IPG board restarts.

The timing was already improved in TR HM97390. The delay value was determined experimentally on the basis of the worst board available at that moment. The faulty board needs to be analyzed in order to adjust the timing further.

SOLUTION:

CONDITIONS:

1. ET-IPG restarts without board alarms

2. ReadErrorLog: ET-IPG

Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned

3. This procedure is to send the faulty board to PLM for further analysis

PROCEDURE:

If the problem happens again, please do the following steps:

1. Collect dcgm/dcgi

2. Replace the ET-IPG board with a good one

3. Send the board to the following address with the new TR number

Ericsson AB
Ulf Wallgren
SE KI30 06401
Färögatan 6
SE-164 80 Stockholm
Sweden

SCS1198264, SCS1049250

PLM needs the board for further investigation. Since the event the board has been working fine, so the customer does not want to send the board.

They will wait until the problem occurs again.

ERROR:SciShSession applaudit req on non tagged session, true is returned.

 86%

Usage Count: 2

Network: WCDMA

Node: CPP RNC 3810

Software: CPP P7FP CU4 EU67

High speech drop on IP/IUB

ET-MFX shows Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned

Upgrade from P7FP CU4 EU44   to P7FP CU4 EU67

The error trace states that the application set up a session before the audit was finished. The reason for the call drops cannot be localized for now.

REMEDY:


CONDITIONS:

1.- During the failure, the following message is found in the error logs of the ET-MFX:

[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned

PROCEDURE:

1.- Cold restart the ET-MFX board
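To confirm the condition for this remedy from a saved "te log read" printout, the bursts of the SciShSession error can be counted per timestamp. The following is a minimal Python sketch. The function name and the regular expression are illustrative assumptions, and only lines that carry a leading [timestamp] are counted.

```python
import re
from collections import Counter

# Hypothetical pattern for the error line quoted in the conditions above.
AUDIT_ERROR = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+Ipet_scish_proc\s+scish_session\.c:\d+\s+"
    r"ERROR:SciShSession applaudit req on non tagged session"
)


def audit_error_bursts(log_lines):
    """Count occurrences of the SciShSession audit error per timestamp."""
    counts = Counter()
    for line in log_lines:
        match = AUDIT_ERROR.match(line.strip())
        if match:
            counts[match.group("timestamp")] += 1
    return counts


if __name__ == "__main__":
    error = ("[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 "
             "ERROR:SciShSession applaudit req on non tagged session, true is returned")
    for timestamp, count in audit_error_bursts([error] * 7).items():
        print(timestamp, count)
```

The line number after scish_session.c is left open in the pattern because it differs between the printouts quoted in this document (4168, 5233 and 5234).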

Speech performance degraded after ET-IPG restart  86%

Usage Count: 2

Network: WCDMA

Service: W-RAN W10.1

Node: CPP RNC

Software: CPP RNC W10.1.2

Software: CPP RNC CXP9014711/2 R3F

Product Name: ET-IPG

Product ID: ROJ1192345/1

Speech performance degraded after ET-IPG restart

Board Restart: Board manager restart. Restart ordered by system manager

Process Restart: ET-IPG
Error code: 0xB0AD0006
Process: Cs_boardManager_proc
Restart type: Processor

Speech performance affected on same subrack, where ET-IPG is located

ReadErrorLog: ET-IPG
Cls_Cls_atmPdr_proc atmpdr.c:596 INFO:Lost 1 packets on channel 14 due to Error: Errors: Length

ReadErrorLog: ET-IPG
Cls_atmPdr_proc atmpdr.c:598 INFO:Egress  VPI:0 VCI:134  Ingress VPI:0 VCI:131  Tag:0x4a1001bfbfbfbf

ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned

ReadErrorLog: ET-IPG
app6drProc app6dr_bh_hwsup.c:1519 INFO:NPU HW alarm: n2=0x80000002, n3=0x0, n4=0x0, n4top=0x0, n5=0x0, n8=0x0

The ET-IPG recovered by itself after the restart, but the statistics show that the RRC, RAB and CCSR degradation started when the ET-IPG board restarted.

After the ET-IPG restart the following was captured: Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned

This trace means that a session Id sent by the application in the CELLO_IPAPPLSCI_AUDIT_SET_REQ signal was not marked with the special flag at the CPP-IPET RO level. This can happen when the application begins setting up sessions over IPAPPLSCI before the session audit has ended: sessions which are set up after the beginning of the audit procedure do not have the AUDIT flag. A session audit is run after a warm restart of the ET board and after application reconnection, or the application can start the audit procedure by sending CELLO_IPAPPLSCI_AUDIT_SET_REQ.

This trace points out that the application should not set up sessions over IPAPPLSCI before the end of the audit procedure.

In any case these sessions will be kept after the end of the session audit and this will not affect traffic. See HN32045-AA001 for more details. Apart from this, no suspicious traces have been seen in the log. So the reason for the traffic degradation after the ET-IPG restart is unclear.

References: HN32045, Restart is handled in solution SCS1049250

REMEDY:

CONDITIONS:

1. Speech RAB, RRC and CCSR success rate degradation

2. For recovery

PROCEDURE:

1. Soft lock the ET-IPG that restarted

2. Cold restart ET-IPG

3. Unlock ET-IPG

REMEDY:

CONDITIONS:

For data collection

PROCEDURE:

1. Run the data collection script attached to the TR several times, with a 1 minute delay between runs, using the following command:
   ipg_dcs_r1.mos <node password> (see note, ipg_dcs_r1.mos)

and provide the captured logs for further analysis.

2. Log in to the affected ET-IPG, enable and capture the following traces:
   te e all SCISH_SESSION

Also capture the output from the following coli commands:
   SciShDump -o 0 -c
   EtHostDump -o 0 -c

ipg_dcs_r1.mos - Copy the script below the line and run it as described in the procedure
________________________________________________________________
l+mmo $tempdir/dummy
# to be silent...

l echo "### ET-IPG Data Collection Script - version R01"

##################### Script Variables #####################
$NumberOfRepeats = 1
$WaitTime = 0

###############################################  ipg data function step 1 call function ###############################################

func get_data_from_ipg_step1_call

  if $board ~ all
    for $board1 in group_ipg
      get_data_from_ipg_step1 $board1
    done
  else if $board ~ ^[0-9]+$
    get_data_from_ipg_step1 $board
  fi

endfunc

#################################  ipg data function step 1 #################################

func get_data_from_ipg_step1

  if $1 ~ ^[0-9]+$
    $etipg = $1
  else
    return
  fi

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_step0_$etipg_$date.log

  lhsh $etipg appdh info
  for $v1 = 0 to 8
    lhsh $etipg appdh ipif $v1
    lhsh $etipg appdh dist $v1
    lhsh $etipg appdh rps $v1
    lhsh $etipg apphost data $v1
  done
  lhsh $etipg apphost info
  lhsh $etipg applh info
  lhsh $etipg applh attr

  #stop logging
  l-

endfunc

###############################################  ipg data function step 2 call function ###############################################

func get_data_from_ipg_step2_call

  if $board ~ all
    for $board2 in group_ipg
      get_data_from_ipg_step2 $board2
    done
  else if $board ~ ^[0-9]+$
    get_data_from_ipg_step2 $board
  fi

endfunc

#################################  ipg data function step 2 #################################

func get_data_from_ipg_step2

  if $1 ~ ^[0-9]+$
    $etipg = $1
  else
    return
  fi

  lt InternalEthernetPort
  lt IpInterface
  lma vlnids GigabitEthernet
  mr vlnids GigabitEthernet

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$etipg_stage$var_$date.log

  lhsh $etipg apparp info
  for $mo in vlnids
    lhsh $etipg apparp print $mo
  done
  for $v2 = 0 to 8
    lhsh $etipg appdh cnt $v2
  done
  lhsh $etipg appph info
  lhsh $etipg appph cnt
  lhsh $etipg applh cnt
  lhsh $etipg;appapi;pm all;q;
  lhsh $etipg;appapi;npr 3.0.0xc 6;q
  lhsh $etipg;appapi;npr 8.0.0xe00 0x48;q;
  lhsh $etipg;appapi;npr 8.0.0x3002c0 0x48;q;
  lhsh $etipg;appapi;npr 8.0.0x320000 0x44;q;
  lhsh $etipg;appapi;npr 8.0.0x330000 0x4c;q;
  lhsh $etipg;appapi;npRGS;q;
  lhsh $etipg;appapi;npRSS 0 40;q;

  #stop logging
  l-

endfunc


################# BP traces #################

func get_BP_traces

  if $1 ~ ^[0-9]+$
    $intboardaddr = $1
  else
    return
  fi

  if $debuglevel = 3
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e send_sig Ipet_ipps_proc
  fi

  if $debuglevel = 4
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e rec_sig Ipet_ipps_proc
  fi

  if $debuglevel = 5 && $mycpp_version = old5
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace3 Ipet_ipps_proc
  else if $debuglevel = 5 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace3 IPET_NPCI_IF
  fi

  if $debuglevel = 6 && $mycpp_version = old5
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace4 Ipet_ipps_proc
  else if $debuglevel = 6 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace4 IPET_NPCI_IF
  fi

  if $debuglevel = 7 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e param Ipet_ipps_proc
  fi

  if $debuglevel = 8 && $mycpp_version = old5
    #start logging
    l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
    lhsh $intboardaddr; te log read
    lhsh $intboardaddr; te default Ipet_ipps_proc
    #stop logging
    l-
  else if $debuglevel = 8 && $mycpp_version = new
    #start logging
    l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
    lhsh $intboardaddr; te log read
    lhsh $intboardaddr; te default Ipet_ipps_proc
    lhsh $intboardaddr; te default IPET_NPCI_IF
    #stop logging
    l-
  fi

endfunc

############### MO data ###############

func get_MO_data

  l echo "\n## Collecting MO information ##\n"

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_MO_data_$date.log

  get GigaBitEthernet
  get IpInterface
  get IpAccessHostGpb

  pcr pmGigaBitEthernet GigaBitEthernet
  pcr pmIpInterface IpInterface
  pcr pmIpAccessHostGpb IpAccessHostGpb

  if $mycpp_version = old5
    get UdpHostMainMsb
    get IpAccessHostMsb
    pcr pmIpAccessHostMsb IpAccessHostMsb
  else
    get IpAccessHostEt
    get IpAccessHostSpb
    pcr pmIpAccessHostEt IpAccessHostEt
    pcr pmIpAccessHostSpb IpAccessHostSpb
  fi

  if $mycpp_version = old5 && $debuglevel = 2
    pdiff GigaBitEthernet|ipinterface|IpAccessUdpHostMsb|IpUdpHostMainMsb|IpAccessHostMsb|IpAccessHostGpb
  else if $mycpp_version != old5 && $debuglevel = 2
    pdiff GigaBitEthernet|ipinterface|IpAccessHostEt|IpAccessHostGpb|IpAccessHostSpb
  fi

  #stop logging
  l-

endfunc

####################### get PM counters #######################

func get_PM_counters

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_PM_counters_stage$var_$date.log

  pget GigaBitEthernet
  pget IpInterface
  pget IpAccessHostGpb

  if $mycpp_version = old5

    pget IpAccessHostMsb

  else

    pget IpAccessHostEt


    pget IpAccessHostSpb

  fi

  #stop logging
  l-
endfunc

####################### del PM scanners #######################

func del_PM_scanners

  pdel pmGigaBitEthernet
  pdel pmIpInterface
  pdel pmIpAccessHostGpb

  if $mycpp_version = old5
    pdel pmIpAccessHostMsb
  else
    pdel pmIpAccessHostEt
    pdel pmIpAccessHostSpb
  fi
endfunc

#########################################             SPAS statistics          #########################################

func get_spashwinfo

  l echo "### Collecting SPAS statistics ..."

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_SPAS_statistics_$date.log

  ########################################
  #            ipg Boards                #
  ########################################
  if $board ~ all
    for $board1 in group_ipg
      lhsh $board1; spashwinfo all
      lhsh $board1; spashwinfo egrq
      lhsh $board1; spashwinfo ingrq
    done
  else if $board ~ ^[0-9]+$
    lhsh $board; spashwinfo all
    lhsh $board; spashwinfo egrq
    lhsh $board; spashwinfo ingrq
  fi

  ########################################
  #            GPB Boards                #
  ########################################
  for $board1 in group_gpb
    lhsh $board1; spashwinfo all
    lhsh $board1; spashwinfo egrq
    lhsh $board1; spashwinfo ingrq
  done

  ########################################
  #            SCB Boards                #
  ########################################
  for $board1 in group_scb
    lhsh $board1; spashwinfo all
    lhsh $board1; spashwinfo egrq
    lhsh $board1; spashwinfo ingrq
  done

  #stop logging
  l-


   endfunc    

######################################### T&E, alarm, event logs and other info#########################################

func get_logs

  l echo "\n## Collecting Alarm and Event logs ##\n"

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_Alarm_and_Event_logs_$date.log

  lgaer

  #stop logging
  l-

  l echo "\n## Get boards configuration ##\n"

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_cabx_$date.log

  cabx

  #stop logging
  l-

endfunc

###################################     M F G - M A I N         ###################################

func focus_on_ipg

  get_MO_data
  get_logs
  get_data_from_ipg_step1_call

  for $var = 1 to $NumberOfRepeats
    get_data_from_ipg_step2_call
    get_PM_counters
    wait $WaitTime
  done

  del_PM_scanners

  if $debuglevel = 2
    get_spashwinfo
  fi

  ###################
  ## BP Traces     ##
  ###################
  if $board ~ all && $debuglevel > 2
    for $board4 in group_ipg
      get_BP_traces $board4
    done
  else if $board ~ ^[0-9]+$  && $debuglevel > 2
    get_BP_traces $board
  fi
endfunc

############# USAGE #############
func print_usage
  l echo "\n###########################################################################################"
  l echo "Syntax: run <script name> <password to node> <debuglevel> all|<specific>\n"
  l echo "where '<debuglevel>' is a value from 1 upwards telling type of info grabbed and"
  l echo "where 'all|<specific>' means all boards or a specific one which is referred as 012300"


  l echo "(If only password to node set script will run with debug level=1 and collect information"
  l echo "from all boards)"
  l echo "example: run /home/xxkuzyaa/tmp/ipg_dcg.mos x 2 000900"
  l echo "\n###########################################################################################"

  l echo "\n<debuglevel>"
  l echo "--------------------------------"
  l echo "  1\t Collect ipg TE Log, NP counters, MO and PM counters"
  l echo "      (without pdiff), Alarm and Event logs"
  l echo "  2\t Collect ipg TE Log, NP counters, MO and PM counters, Alarm and Event logs, SpasHwInfo,"
  l echo "  3\t BP traces: enable send_sig on Ipet_ipps_proc"
  l echo "  4\t BP traces: enable rec_sig on Ipet_ipps_proc"
  l echo "  5\t BP traces: enable trace3 on Ipet_ipps_proc (CPP5.1) or trace3 on IPET_NPCI_IF (CPP6,7)"
  l echo "  6\t BP traces: enable trace4 on Ipet_ipps_proc (CPP5.1) or trace4 on IPET_NPCI_IF (CPP6,7)"
  l echo "  7\t BP traces: enable param on Ipet_ipps_proc (CPP6,7)"
  l echo "  8\t BP traces: Read and store T&E log"
  l echo "\n###########################################################################################"

endfunc

##########################                    ####       M A I N      ####                    ##########################

# check arguments
if $1
  l echo "\nStarting ..."
  $password = $1
  unset $1
else
  print_usage
  l-
  return
fi

if $2 ~ ^[0-9]+$
  $debuglevel = $2
  unset $2
else
  $debuglevel = 1
fi

if $3 = all || $3 ~ ^[0-9]+$
  $board = $3
  unset $3
else
  $board = all
fi

#some info to the user
l echo "\n####################################################################################################"
l echo "### Data collection executing ..."
l echo "### Result is stored here: $logdir/$nodename_$ipaddress_$ipg_or_ipg_..."
l echo "####################################################################################################"

$date = `date +%y%m%d-%H%M`

#start logging
l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$date.log

ba group_ipg ipg

######################################################
#Print all user variables and scripting variables
######################################################
uv
pv


readclock

###################################        Get the MO's         ###################################
lt all

readclock

################################### Check the MOM version...#################################

#Possible printouts:
#$cellomomversion = 6-LSV31-1
#$cellomomversion = 6.1-LSV13-2
#$cellomomversion = 7-LSV26_13-3
#$celloversion = 7-LSV34.6BC1-1

if $cellomomversion >= 7 || $celloversion >= 7
  $mycpp_version = new
fi

if $cellomomversion >= 6 && $cellomomversion < 7
  $mycpp_version = old6
fi

if $celloversion >= 6 && $celloversion < 7
  $mycpp_version = old6
fi

if $cellomomversion >= 5 && $cellomomversion < 6
  $mycpp_version = old5
fi

if $celloversion >= 5 && $celloversion < 6
  $mycpp_version = old5
fi

l echo "\n###################################"
l echo "## MOM version is $mycpp_version "
l echo "###################################\n"

################################### Do the work!#################################
focus_on_ipg

readclock

unset $date

#stop logging
l-

#stop silent logging
l-

#Done
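
The MOM version check near the end of the script maps the version string reported by the node to one of three labels (new, old6, old5). As a rough illustration only, the same mapping can be sketched in Python. The function name `classify_cpp_version` is hypothetical and not part of the script, and the sketch assumes that the major version is the leading integer of the string, as in the printouts listed in the script comments.

```python
import re
from typing import Optional


def classify_cpp_version(version: str) -> Optional[str]:
    """Map a version string such as '6.1-LSV13-2' to new / old6 / old5.

    Hypothetical helper for illustration; it mirrors the thresholds used
    in the script (>= 7, 6 to < 7, 5 to < 6) on the leading integer only.
    """
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        return None  # no leading number, version cannot be classified
    major = int(match.group(1))
    if major >= 7:
        return "new"
    if major == 6:
        return "old6"
    if major == 5:
        return "old5"
    return None  # older than 5: the script sets no label either


if __name__ == "__main__":
    # Example strings taken from the comments in the script.
    for text in ("6-LSV31-1", "6.1-LSV13-2", "7-LSV26_13-3", "7-LSV34.6BC1-1"):
        print(text, "->", classify_cpp_version(text))
```

Note that the script only branches on old5 versus everything else in most functions, so a node classified as old6 follows the non-old5 path for MO and PM collection but matches none of the BP trace branches for debug levels 5 to 8.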

ET-MFX Board Restart : OSE_ECORRUPTED_POOL  86%

Usage Count: 2

Network: WCDMA

Node: CPP RNC 3810

Service: W-RAN P7.1

Software: CPP RNC P7.1.4 EU55

Software: CPP RNC CXP9013831 R9YC/65

ET-MFX Board Restart : OSE_ECORRUPTED_POOL

ReadErrorLog: ET-MFX
Exs_spi_proc exspi_proc_write_normal.c:280 ERROR:Normal IO write failed with 3, page 0x80, reg 0x38, size 2, data 0x7C sender 0x101DD
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295


Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295

Root cause not found. The problem is reported on a dbm2-based board, where all load modules share a common pool, called mainpool, for signal allocations. A signal buffer that is overrun can corrupt the signals that lie adjacent to it. In such cases, problems like OSE_ECORRUPTED_POOL can be reported on different processes.

REMEDY:

CONDITIONS:
1. The problem happens frequently in the RNC.

PROCEDURE:
1. Enable the traces below on the module MP that is using the ET-MFX board.

lh modx te e trace1 drhTrBrIpC
lh modx te e trace1 drhCcRhC
lh modx te e rec_sig send_sig param trace1 cpxApplSciC

The conditions under which OSE_ECORRUPTED_POOL is reported are most likely related to a user error. The kernel reports this kind of error when it detects that a buffer presented to it via system calls such as send(), sender(), or restore() is corrupted.

The usual cause is that some other process writes to a buffer beyond its allocated size, which overwrites the next buffer. The fault therefore most likely lies in some other part of the code, the part that overruns its buffer, but it is only reported when the corrupted signal is presented to the kernel via a system call such as send, receive, or restore.
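
The mechanism described above, where one process overruns its buffer and a different process is the one that gets blamed, can be illustrated with a toy model. The sketch below is an assumption-laden simplification in Python, not OSE's actual pool layout or API: `ToyPool`, `HEADER_MAGIC`, and the one-byte header are all hypothetical and exist only to show why the error surfaces in the victim rather than in the culprit.

```python
HEADER_MAGIC = 0xA5  # hypothetical one-byte buffer header, not OSE's real format


class ToyPool:
    """Toy shared pool: each buffer is [header byte][payload], packed back to back."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.buffers = {}   # owner name -> (header offset, payload length)
        self.next_free = 0

    def alloc(self, owner, length):
        start = self.next_free
        if start + 1 + length > len(self.mem):
            raise MemoryError("pool exhausted")
        self.mem[start] = HEADER_MAGIC
        self.buffers[owner] = (start, length)
        self.next_free = start + 1 + length

    def write(self, owner, data):
        # Deliberately no check against the payload length: this models the user error.
        start, _length = self.buffers[owner]
        first = start + 1
        if first + len(data) > len(self.mem):
            raise MemoryError("write beyond end of pool")
        self.mem[first:first + len(data)] = data

    def send(self, owner):
        # The "kernel" validates the buffer only when it is presented in a system call.
        start, _length = self.buffers[owner]
        if self.mem[start] != HEADER_MAGIC:
            raise RuntimeError("corrupted pool detected on buffer owned by " + owner)
        return True


if __name__ == "__main__":
    pool = ToyPool(32)
    pool.alloc("proc_a", 4)
    pool.alloc("proc_b", 4)
    pool.write("proc_a", bytes(6))   # proc_a writes 6 bytes into a 4-byte buffer
    print(pool.send("proc_a"))       # the culprit's own header is still intact
    try:
        pool.send("proc_b")          # the victim is the one that gets reported
    except RuntimeError as err:
        print(err)
```

In this model the culprit's send succeeds and the victim's send fails, which is why the traces in the procedure are enabled on several processes rather than only on the one named in the error.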