IPG_Log

Alarm: UtranCell_InternalResourceUnavailable

86%

Usage Count: 28

Network: WCDMA

Service: W-RAN

Node: CPP RNC 3820



Cold Restart of TX board in cell 3 Node does not work

Cold restart of Node B on older CV does not work

Remodule Node B in RNC and problem disappears

Module Error: [2011-01-11 15:57:09.388] RnhLmCellCPT(rnhCellRoC[52]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId xxx, iubLinkFroId xxx] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources

lhsh 000600 drh_ccrh_hostdata
0006: 0x21000067 24 10 23769 0 0xffffffff releasing sendReleaseRspToClient
0006: 0x21000068 23 29 19088 0 0xffffffff
0006: 0x21000069 24 10 22361 0 0xffffffff releasing sendReleaseRspToClient
0006: 0x21000026 24 10 23637 0 0xffffffff releasing sendReleaseRspToClient

In absence of crashes in ETIPG search for cold restarts:
Lh etipg te log read | grep -I restart

[2011-03-09 02:07:05.032] Ipet_atish_proc atish_trafind.c:351 INFO:TrafficIndication: COLD restart
[2011-03-09 02:07:05.088] Ipet_scish_proc scish_trafind.c:435 INFO:TrafficIndication: COLD restart

ETIPG Log: lhsh 002500 dumpelg

LOG ENTRIES:
seqNr date time message
2 100729 104733 000;;Subrack 02;Slot 25
3 100903 102509 000;VANRNC1;Subrack 00;Slot 25
4 101103 200144 000;CXP9013831_R9YC/28
5 101130 050310 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
6 110222 180100 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
7 110222 191901 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
8 110223 064546 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
9 110226 202109 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
10 110226 233552 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
11 110227 055434 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0
12 110309 020556 NPU HW alarm: n2=0x0, n3=0x0, n4=0x80000002, n4top=0x80000001, n5=0x0, n8=0x0

Coli printouts from commands:
lh mod drh_ccrh_topdata
lh mod drh_ccrh_celldata all
lh mod drh_ccrh_hostdata

0202: [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0202: [737]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0216: [1303]: cellRef= xxx, clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
0216: [614]: cellRef= xxx,clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]

HS configuration updated in Node B

References: TR HN50575 :

https://mhweb.ericsson.se/mhweb/servlet/trview?trid=HN50575

REMEDY:

CONDITIONS:
1. Ensure SW is below the version in which this is fixed.
2. The UtranCell_InternalResourceUnavailable alarm is present.
3. The printouts of the Coli commands lh mod drh_ccrh_topdata, lh mod drh_ccrh_celldata all and lh mod drh_ccrh_hostdata show msgBoard = [ releasing ].
4. Customer permission is granted to use the workaround, which will affect traffic in the Module with the problematic cell.
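Condition 3 can be checked mechanically on a saved printout. The following is a minimal Python sketch, assuming the celldata lines look like the samples quoted in this record; the helper name find_hanging_cells, the third sample line with an empty msgBoard and the inclusion of the 'clearing' state (mentioned in the SOLUTION text) are illustrative assumptions, not part of any Ericsson tool.

```python
import re

# Matches celldata lines such as:
#   0202: [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
CELL_LINE = re.compile(
    r"^(?P<board>\d{4}):\s*\[(?P<index>\d+)\]:\s*cellRef=\s*(?P<cell>[^,]+?)\s*,"
    r".*?msgBoard\s*=\s*\[\s*(?P<state>[^\]]*?)\s*\]"
)

# States treated as hanging (assumption based on the text of this record).
HANGING_STATES = {"releasing", "clearing"}


def find_hanging_cells(printout):
    """Return (board, index, cellRef, state) for every cell in a hanging state."""
    hits = []
    for line in printout.splitlines():
        match = CELL_LINE.match(line.strip())
        if match and match.group("state") in HANGING_STATES:
            hits.append((match.group("board"), int(match.group("index")),
                         match.group("cell"), match.group("state")))
    return hits


# First two lines follow the printout above; the last line is a made-up healthy cell.
sample = """\
0202: [723]: cellRef= xxx, clientModuleId = 3, spmFroId = xx, msgBoard = [ releasing ]
0216: [614]: cellRef= xxx,clientModuleId = 16, spmFroId = xx, msgBoard = [ releasing ]
0216: [615]: cellRef= yyy, clientModuleId = 16, spmFroId = xx, msgBoard = [ ]
"""

for hit in find_hanging_cells(sample):
    print(hit)
```

An empty result means no cell in the printout is stuck, so the workaround should not be needed for that module.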

PROCEDURE:
Locate the RncLmCell load module in the affected RNC module and restart it with "lh modx progkill RncLmCell". Note: X = RNC Module.

SOLUTION:

CONDITIONS:
The fault occurs due to hanging in the RNC Module RncLmCell. The UtranCell_InternalResourceUnavailable alarms are triggered by cells hanging in 'releasing' or 'clearing' state in the DrhCcRh block. These cells cannot be released because there are IpTp (IP termination point) sessions associated with the cells which are also hanging in 'releasing' state. Such hanging sessions are caused by a fault in an audit procedure, which is performed after an ET-IPG crash or restart. When ET-IPG goes down, an application receives two signals: hostStateChangeInd and serverDownInd (from IPAPPLSCI). On the first of these signals, the IP sessions associated with the restarted ET-IPG are marked as 'releasing' and a sessionReleaseReq signal is sent to IPAPPLSCI (CPP interface) to release the sessions. However, instead of a release response, the application receives a serverDownInd signal, which triggers the audit. Unfortunately, the audit procedure skips the removal of IP sessions which are marked as 'releasing'.
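The audit fault described above can be illustrated with a toy model. This is a sketch in Python under stated assumptions: the class IpTpSession, the functions on_et_ipg_down and audit, and the skip_releasing switch are all hypothetical names invented for illustration and do not reflect the actual DrhCcRh code.

```python
from dataclasses import dataclass


@dataclass
class IpTpSession:
    session_id: int
    faulty: bool = False      # marked faulty during the audit
    releasing: bool = False   # release request sent, response still pending


def on_et_ipg_down(sessions):
    """Model the two signals received when the ET-IPG goes down."""
    # First signal: every session on the restarted board is flagged 'releasing'
    # and a release request is sent.
    for session in sessions:
        session.releasing = True
    # Second signal: the server is down, so no release response will arrive;
    # the audit that follows regards all these sessions as faulty.
    for session in sessions:
        session.faulty = True


def audit(sessions, skip_releasing):
    """Return the sessions that remain after the audit.

    skip_releasing=True mimics the fault: faulty sessions that are also
    flagged 'releasing' are left in place instead of being removed.
    """
    survivors = []
    for session in sessions:
        if session.faulty and not (skip_releasing and session.releasing):
            continue  # removed by the audit
        survivors.append(session)
    return survivors


sessions = [IpTpSession(0x21000067), IpTpSession(0x21000069)]
on_et_ipg_down(sessions)
print(len(audit(sessions, skip_releasing=True)))   # 2: both sessions keep hanging
print(len(audit(sessions, skip_releasing=False)))  # 0: a correct audit removes them
```

In this model the hanging sessions are exactly those that are both faulty and 'releasing', which matches the rows shown as releasing in the drh_ccrh_hostdata printout.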

Signal flow between RNHCell (Mod_A), DrhCcRh (Mod_B), Aal2Eri (Mod_B), Aal2Nci (Mod_A) and ApplSci (IP-ET):

RNHCell -> DrhCcRh : initialResourceReq
DrhCcRh -> RNHCell : initialResourceCfm
RNHCell -> DrhCcRh : ipTpUdpReq
DrhCcRh -> ApplSci : setUpUdpSessionReq
ApplSci -> DrhCcRh : sessionSetUpCfm
DrhCcRh -> RNHCell : ipTpUdpCfm
RNHCell -> DrhCcRh : modifyIpTpReq
DrhCcRh -> ApplSci : modufyUdpSessionReq
ApplSci -> DrhCcRh : sessionModifyCfm
DrhCcRh -> RNHCell : modifyIpTpReq
RNHCell -> DrhCcRh : reserveAal2CepReq
DrhCcRh -> Aal2Eri : reserveLocalAal2CepReq
Aal2Eri -> DrhCcRh : reserveAal2CepCfm
DrhCcRh -> RNHCell : reserveAal2CepCfm
RNHCell -> Aal2Nci : nodeConnReq
Aal2Nci -> ApplSci : connectCep?
ApplSci -> Aal2Nci : disconnectCep?
Aal2Nci -> RNHCell : connCfm

>>> restart of Mod_A <<< (RNHCell and Aal2Nci are marked X)

        -> DrhCcRh : connNotOkInd
DrhCcRh -> Aal2Eri : releaseLocalAal2CepReq
Aal2Eri -> DrhCcRh : releaseAal2CepCfm
DrhCcRh -> ApplSci : sessionReleaseReq

>>> FLOW IS HANGED - no response from ApplSci towards DrhCcRh! <<<

ApplSci -> DrhCcRh : sessionReleaseCfm (the response that does not arrive)

PROCEDURE:

Upgrade the RNC to W10.1.3.7.

M3UA goes down when ET-MFG hanging

86%

Usage Count: 11

Network: WCDMA

Node: CPP RNC 3810

Service: W-RAN P7.1

Software: CPP RNC P7.1.4 EU4

Software: CPP RNC CXP9013831 R9YC/6

All M3UA goes down

All M3UA connections are disabled

Alarm: M3UA Association Down

Alarm: Contact to Default Router 1 Lost

Alarm: Contact to Default Router 0 Lost

ETMFG - te log read
Ipet_ipps_proc pcidrv_coli.c:3284 ERROR:ttyram: Could not send data
ETMFG - te log read
Ipet_scish_proc scish_root.c:429 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:437 INFO:ColiDumpReq not yet implemented for BHRI
Ipet_ethost_proc ethost_root.c:456 INFO:ColiDumpReq not yet implemented for INTERNAL_HOST

Refer to UABtr79810; WRNae89971; WRNae88084; HM15053; HM27791; HM52198; HM52208; HM52231; HL93961

An ET-MFG memory leak causes the ET-MFG to hang. When the problem happens, ET-MFG will not be able to handle traffic.

SOLUTION:

CONDITIONS:

PROCEDURE:

1. CPP has provided the solution for this. EU55 for P7.1.4 on 2010-09-03; W10.1.1-4 scheduled for delivery on Sept 15th and W10.1.2.

REMEDY:

https://mhweb.ericsson.se/mhweb/servlet/trview?trid=HL93961

https://mhweb.ericsson.se/mhweb/servlet/trview?trid=HM52231





CONDITIONS:

PROCEDURE:

1. Cold restart of ET-MFG or RNC node cold restart.

This memory problem is closely related to fragmented traffic. The counters pmIpReasmReqds, pmIpReasmOks and pmIpReasmFails (IpAccessHostEt) can be used to check fragmented traffic.

# This helps us to check the impact of each defined interface individually
Ipinterface pmDot1qTpVlanPortInFrames pmDot1qTpVlanPortOutFrames pmIfStatsIpOutDiscards pmIfStatsIpOutRequests

# This helps us to check if any discarded data is sent/received at the ETMFG GigaBitEthernet ports.
GigaBitEthernet pmIfInDiscardsLink1 pmIfInDiscardsLink2 pmIfOutDiscardsLink2 pmIfOutDiscardsLink2

# Information for the fragmented traffic, to be activated at the customer's convenience.
IpAccessHostEt pmIpReasmFails pmIpReasmOks pmIpReasmReqds pmIpFragCreates pmIpFragFails pmIpFragOks
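As a rough aid for reading the reassembly counters, the sketch below (Python) compares two counter snapshots and reports what share of the finished reassembly attempts failed. It assumes the counters follow the usual MIB-II meaning of their names (fragments received, datagrams reassembled, reassembly failures); the snapshot values, the function names and the neglect of counter wrap-around are all illustrative assumptions.

```python
COUNTERS = ("pmIpReasmReqds", "pmIpReasmOks", "pmIpReasmFails")


def reassembly_delta(before, after):
    """Counter increase between two snapshots of the same IpAccessHostEt."""
    return {name: after[name] - before[name] for name in COUNTERS}


def failure_share(delta):
    """Share of finished reassembly attempts that failed (0.0 if none finished)."""
    attempts = delta["pmIpReasmOks"] + delta["pmIpReasmFails"]
    return delta["pmIpReasmFails"] / attempts if attempts else 0.0


# Hypothetical snapshots taken some time apart.
before = {"pmIpReasmReqds": 1000, "pmIpReasmOks": 480, "pmIpReasmFails": 5}
after = {"pmIpReasmReqds": 5000, "pmIpReasmOks": 2400, "pmIpReasmFails": 85}

delta = reassembly_delta(before, after)
print(delta)
print(round(failure_share(delta), 4))
```

A steadily growing pmIpReasmReqds delta indicates that fragmented traffic is present at all, which is the situation this record links to the memory problem.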

see also SCS1079568

WCDMA RNC W10 : Failed to unlock utrancell in IUB over IP site

86%

Usage Count: 4

Network: WCDMA

Node: CPP RNC

Service: W-RAN


Utrancell does not come up

ReadErrorLog: ModuleMP, te log read

RnhLmCellCPT(rnhCellRoC[n]) ../src/RnhCellRoC.cpp:7901 INFO:rnhCellRoC[cellFroId n, iubLinkFroId n] failed to unlock cell, reason : RnhCellDataD::errorStatusNoDrhResources

ReadErrorLog: ModuleMP, te e trace3 drhCcRhRouterC

RnhLmCellCPT(drhCcRhRouterC) ../src/DrhCcRhRouterC.cpp:491 TRACE3:cellRef=n is already in our list. We'll reject until the old one is removed

ReadErrorLog: ModuleMP, drh_ccrh_celldata all

00nn: List of all CC SPs owned by ccRhModule n:
00nn: List of cells from SpId n:
00nn: [n]: cellRef=<cellRef>, clientModuleId = 0, spmFroId = <spmFroId> msgBoard = [ releasing ]

ReadErrorLog: ModuleMP, drh_ccrh_hostdata

00nn: IpTp table:
00nn: ipTpSessionId ipHostFroId piuId serverSessionId clientPortIndex clientId msgBoard
…
00nn: 0x.. .. .. .. 0 0xffffffff releasing
00nn: 0x.. .. .. .. 0 0xffffffff releasing
00nn: 0x.. .. .. .. 0 0xffffffff releasing
…
00nn: 0x.. .. .. .. 0 0xffffffff releasing

ReadErrorLog: ET-IPG, te log read

Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned

Ipet_lh_proc ipplh_agent.c:783 INFO:Restart Rank COLD and updated State is: 4

Ipet_atish_proc atish_trafind.c:398 INFO:TrafficIndication: COLD restart

ReadErrorLog: ET-IPG, llog

Board restart rank=Coldwithtest Proc=Cs_boardManager_proc Err=0xB0AD0006 (eri_api). Board manager restart. Restart ordered by system manager

ET-IPG crash or ET-IPG restart with Cold With HW test

Transmission outage

Node B restart

Root cause of the problem found :

Some cells could not come up after IP-ET board had been restarted. UtranCell_InternalResourceUnavailable alarms were raised.

Cause: The problem is triggered by an ET-IPG crash. DrhCcRh starts releasing all IpTp (IP termination point) sessions created on all IpAccessHostEts located on the restarted ET-IPG board. During this procedure a release request signal is sent towards ApplSci. All these sessions are flagged as "releasing" until a response from the ApplSci service is received. But such a response will not be received, since the ET-IPG board has restarted. Instead a serverDownInd signal is received. After that the IP service initialization procedure is performed, including an audit between DrhCcRh and ApplSci whose purpose is to clean up unused (marked as faulty) IpTp sessions.

The fault lies in the audit handling: if an IpTp session is marked as faulty, it should be removed and the release procedure should go on. But if an IpTp session is marked as faulty and is also flagged as "releasing", it is not removed; DrhCcRh still waits for the release response from ApplSci. This hanging IpTp session prevents the cell from being released, so the affected cell hangs after it is locked and eventually cannot be unlocked.

References:

HN49221 : W10B: Sector not come up after Node B Restart

Mapped to HN61121 : W10B: Sector not come up after Node B Restart

HM48225

SOLUTION:

CONDITIONS:

1. ET-IPG board has crashed or has been restarted with rank "Cold With HW Test"

2. Utrancell does not come up after Node B restart or after transmission outage.

3. Failed to unlock utrancell

4. Check whether any IpTp session is hanging in ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If cells show either the "releasing" or the "clearing" flag, ModuleMP has hanging IpTp sessions.

Note! This procedure requires software delivery. Please contact your local Ericsson Support for more information.

https://hn61121/

PROCEDURE:

The correction will be delivered in W11.0.1.2 (CXP9014711/3-R2C)

REMEDY:

CONDITIONS:

1. ET-IPG board has crashed or has been restarted with rank "Cold With HW Test"

2. Utrancell does not come up after Node B restart or after transmission outage.

3. Failed to unlock utrancell

4. Check whether any IpTp session is hanging in ModuleMP: check the printout of "lh mod drh_ccrh_celldata all". If cells show either the "releasing" or the "clearing" flag, ModuleMP has hanging IpTp sessions.

This procedure is for recovery of the problem

PROCEDURE:

Restart the RncLmCell process on the problematic ModuleMP. Refer to KCS document SCS1003029 "CPP : How To restart a board a process or JVM. Using telnet, NCLI, Moshell or EMAS"

WCDMA RNC : High Module MP load on extension subrack with only one ET-MFX

86%

Usage Count: 3

Network: WCDMA

Service: W-RAN

Node: CPP RNC

Alarm: Ethernet Switch Port Fault

RRC degradation in one extension subrack

High RRC Failure in one subrack

RRCSucc degradation on all Module of an RNC Extension Subrack

High processor load can be observed in extension subrack.

Module MP overload in an entire RNC subrack (processor load >85%)

High MP load in ETMFX

ReadErrorLog: ET-MFX

Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 21 to 19
Ipet_scish_proc scish_root.c:320 INFO:Changing priority from 19 to 21

No access/connectivity to ET-MFX board

Root cause of the fault found. Configuration problem:

- High RRC failures on sites belonging to Module MP's on extension subracks which have one ET-MFX board

- High MP load on module MP's on Extension subracks with one ET-MFX board

- No or bad connectivity to the impacted ET-MFX boards. Connectivity was restored after remoduling sites to other subracks that have two load sharing ET-MFX boards.

Investigation :

http://e-support.ericsson.se/reader_iview/ui/eserver.asp?App=iView&ID=SCS1003029

This problem is due to a dimensioning issue. There is too much Iublink activity on the extension subrack (ES) for one ET-MFX board. The dimensioning of the node did not follow the Ericsson recommendation and did not take full advantage of the Spanning Tree Protocol.

Connectivity to the ET-MFX boards was restored after remoduling sites to other subracks that have two load sharing ET-MFX boards.

General recommendations about ET-MFX usage:

1- It is recommended to have two ET-MFX boards per subrack for load sharing and redundancy, so that if one ET-MFX board is lost the other takes all the traffic.

2- ET-MFX load sharing is supported only within the subrack. Intersubrack ET-MFX load sharing is not supported.

3- If both ET-MFX boards in the subrack are lost, the Iublinks need to be re-allocated to a new subrack manually (Iublink preferredSubrack attribute) to remain operational.
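Recommendation 1 lends itself to a quick inventory check. The Python sketch below flags subracks that carry exactly one ET-MFX board; the inventory dictionary, the subrack names and the function name are hypothetical, and in practice the board list would have to be taken from the node's own board printout.

```python
def subracks_needing_second_mfx(boards_by_subrack, minimum=2):
    """Return subracks that have at least one ET-MFX board but fewer than `minimum`.

    boards_by_subrack maps a subrack name to the list of board types in it.
    Subracks without any ET-MFX board are not reported.
    """
    return sorted(
        subrack
        for subrack, boards in boards_by_subrack.items()
        if 0 < boards.count("ET-MFX") < minimum
    )


# Hypothetical inventory: main subrack plus two extension subracks.
inventory = {
    "MS": ["ET-MFX", "ET-MFX", "GPB", "GPB"],
    "ES1": ["ET-MFX", "GPB", "GPB"],
    "ES2": ["GPB", "GPB"],
}

print(subracks_needing_second_mfx(inventory))
```

Subracks returned by the check are the ones where the single ET-MFX board has no load-sharing partner and no redundancy.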

SOLUTION:

CONDITIONS:

1- High MP load on module MP's on Extension subracks with one ET-MFX board

2- Alarm: Ethernet Switch Port Fault

This procedure is for correction of the configuration problem.

PROCEDURE:

Add a second ET-MFX board to the subrack that has only one ET-MFX board.

REMEDY:

CONDITIONS:

1- High MP load on module MP's on Extension subracks with one ET-MFX board

2- Alarm: Ethernet Switch Port Fault

This procedure is a workaround to avoid the problem until a better configuration is used.

PROCEDURE:

Re-allocate the RBSs on module MPs with high MP load to a new subrack manually (preferably to an Extension subrack with two ET-MFX boards), so that the Iub links remain operational.

Board Restart: ET-IPG Error code: 0xB0AD0006 Process: Cs_boardManager_proc

86%

Usage Count: 2

Data collection for ET-IPG restart Error code: 0xB0AD0006 Process: Cs_boardManager_proc

Network: WCDMA

Service: W-RAN

Node: CPP RNC

Board Restart: Board manager restart. Restart ordered by system manager

Process Restart: ET-IPG
Error code: 0xB0AD0006 (Reported via CELLO:ERI IF)
Process: Cs_boardManager_proc
Restart type: Processor
ERROR NUMBER 0xB0AD0006 WITH EXTRA DATA 0x00A60ABC WAS REPORTED BY
PROCESS Cs_boardManager_proc
TYPE PRI-10
BLOCK osemain

ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned

Refer to TR HO87223 for details

Root cause not found

A timing issue between the memory and the network processor (NPU) causes the NPU HW alarms and the ET-IPG board restarts.

The timing issue was already improved in TR HM97390. The delay value was determined experimentally on the basis of the worst board available at that moment. The faulty board needs to be analyzed in order to adjust the timing further.

SOLUTION:

CONDITIONS:

1. ET-IPG restarts without board alarms

2. ReadErrorLog: ET-IPG

Ipet_scish_proc scish_session.c:5234 ERROR:SciShSession applaudit req on non tagged session, true is returned

3. This procedure is to send the faulty board to PLM for further analysis

PROCEDURE:

If the problem happens again please do the following steps:

1. Collect dcgm/dcgi

2. Replace the ET-IPG board with a good one

3. Send the board to the following address with a new TR number:

Ericsson AB
Ulf Wallgren
SE KI30 06401
Färögatan 6.
SE-164 80 Stockholm
Sweden

SCS1198264, SCS1049250

PLM needed the board for further investigation. After the event the board is working fine, so the customer does not want to send the board.

They will wait until it occurs again.

ERROR:SciShSession applaudit req on non tagged session, true is returned.

86%

Usage Count: 2

Network: WCDMA

Node: CPP RNC 3810

Software: CPP P7FP CU4 EU67

High speech drop on IP/IUB

ET-MFX shows Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned

Upgrade from P7FP CU4 EU44 to P7FP CU4 EU67

The error trace states that the application set up a session before the audit was finished. The reason for the call drops cannot be localized for now.

REMEDY:


https://mhweb.ericsson.se/mhweb/servlet/trview?trid=HO87223

CONDITIONS:

1.- During failure it is found in error logs of ET-MFX the following message:

[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 ERROR:SciShSession applaudit req on non tagged session, true is returned
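To see how densely this error repeats in a saved T&E log, the entries can be counted per timestamp. The Python sketch below is one possible way, assuming entries start with a bracketed timestamp as in the excerpt above; it works whether entries are on separate lines or run together. The function name and the third sample entry (with its timestamp) are made up for illustration.

```python
import re
from collections import Counter

# Timestamp prefix of a T&E log entry, e.g. [2010-12-07 20:44:42.660]
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")
NEEDLE = "applaudit req on non tagged session"


def count_audit_errors(log_text):
    """Count entries containing NEEDLE, grouped by entry timestamp."""
    counts = Counter()
    # With one capture group, split() yields [prefix, ts1, body1, ts2, body2, ...]
    parts = TIMESTAMP.split(log_text)
    for timestamp, body in zip(parts[1::2], parts[2::2]):
        if NEEDLE in body:
            counts[timestamp] += 1
    return counts


sample = (
    "[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 "
    "ERROR:SciShSession applaudit req on non tagged session, true is returned "
    "[2010-12-07 20:44:42.660] Ipet_scish_proc scish_session.c:4168 "
    "ERROR:SciShSession applaudit req on non tagged session, true is returned\n"
    # Hypothetical unrelated entry, to show that it is not counted.
    "[2010-12-07 20:44:43.000] Ipet_scish_proc scish_root.c:320 "
    "INFO:Changing priority from 21 to 19\n"
)

for timestamp, count in count_audit_errors(sample).items():
    print(timestamp, count)
```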

PROCEDURE:

1.- board ET-MFX cold restart

Speech performance degraded after ET-IPG restart

86%

Usage Count: 2

Network: WCDMA

Service: W-RAN W10.1

Node: CPP RNC

Software: CPP RNC W10.1.2

Software: CPP RNC CXP9014711/2 R3F

Product Name: ET-IPG

Product ID: ROJ1192345/1

Speech performance degraded after ET-IPG restart

Board Restart: Board manager restart. Restart ordered by system manager

Process Restart: ET-IPG
Error code: 0xB0AD0006
Process: Cs_boardManager_proc
Restart type: Processor

Speech performance affected on same subrack, where ET-IPG is located

ReadErrorLog: ET-IPG
Cls_Cls_atmPdr_proc atmpdr.c:596 INFO:Lost 1 packets on channel 14 due to Error: Errors: Length
ReadErrorLog: ET-IPG
Cls_atmPdr_proc atmpdr.c:598 INFO:Egress VPI:0 VCI:134 Ingress VPI:0 VCI:131 Tag:0x4a1001bfbfbfbf
ReadErrorLog: ET-IPG
Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned
ReadErrorLog: ET-IPG
app6drProc app6dr_bh_hwsup.c:1519 INFO:NPU HW alarm: n2=0x80000002, n3=0x0, n4=0x0, n4top=0x0, n5=0x0, n8=0x0

ET-IPG recovered by itself after the restart, but the statistics showed that RRC, RAB and CCSR degradation started when the ET-IPG board restarted.

After the ET-IPG restart the following was captured: Ipet_scish_proc scish_session.c:5233 ERROR:SciShSession applaudit req on non tagged session, true is returned

This trace means that a session Id sent by the application in the CELLO_IPAPPLSCI_AUDIT_SET_REQ signal was not marked with the special flag at CPP-IPET RO level. This can happen when the application begins to set up sessions over IPAPPLSCI before it finishes the session audit: sessions which are set up after the beginning of the audit procedure do not have the AUDIT flag. A session audit takes place after a warm restart of the ET board or after application reconnection, or the application can start the audit procedure by sending CELLO_IPAPPLSCI_AUDIT_SET_REQ.

This trace points out that the application should not set up sessions over IPAPPLSCI before the end of the audit procedure.

In any case, these sessions are kept after the end of the session audit and this does not affect traffic. See HN32045-AA001 for more details. Besides this, no suspicious traces have been seen in the log. So the reason for the traffic degradation after the ET-IPG restart is unclear.

References: HN32045, Restart is handled in solution SCS1049250

REMEDY:

CONDITIONS:

1. Speech RAB, RRC and CCSR success rate degradation

2. For recovery

PROCEDURE:

1. Soft lock the ET-IPG that restarted

2. Cold restart ET-IPG

3. Unlock ET-IPG

REMEDY:

CONDITIONS:

For data collection

PROCEDURE:

1. Run the data collection script attached to the TR several times, with a 1 minute delay between runs, using the following command: ipg_dcs_r1.mos <node password> (see note, ipg_dcs_r1.mos)

and provide the captured logs for further analysis.

2. Log in to the affected ET-IPG, then enable and capture the following traces: te e all SCISH_SESSION

Also capture the output from the following coli commands:
SciShDump -o 0 -c
EtHostDump -o 0 -c

ipg_dcs_r1.mos - Copy the script below the line and run it as described in the procedure
________________________________________________________________
l+mmo $tempdir/dummy
# to be silent...

l echo "### ET-IPG Data Collection Script - version R01"

#####################
# Script Variables
#####################
$NumberOfRepeats = 1
$WaitTime = 0

###############################################
# ipg data function step 1 call function
###############################################

func get_data_from_ipg_step1_call
  if $board ~ all
    for $board1 in group_ipg
      get_data_from_ipg_step1 $board1
    done
  else if $board ~ ^[0-9]+$
    get_data_from_ipg_step1 $board
  fi
endfunc

#################################
# ipg data function step 1
#################################

func get_data_from_ipg_step1
  if $1 ~ ^[0-9]+$
    $etipg = $1
  else
    return
  fi

  #start logging
  l+mmo $logdir/$nodename_$ipaddress_step0_$etipg_$date.log

  lhsh $etipg appdh info
  for $v1 = 0 to 8
    lhsh $etipg appdh ipif $v1
    lhsh $etipg appdh dist $v1
    lhsh $etipg appdh rps $v1
    lhsh $etipg apphost data $v1
  done
  lhsh $etipg apphost info
  lhsh $etipg applh info
  lhsh $etipg applh attr
  #stop logging
  l-
endfunc

###############################################
# ipg data function step 2 call function
###############################################

func get_data_from_ipg_step2_call
  if $board ~ all
    for $board2 in group_ipg
      get_data_from_ipg_step2 $board2
    done
  else if $board ~ ^[0-9]+$
    get_data_from_ipg_step2 $board
  fi
endfunc

#################################
# ipg data function step 2
#################################

func get_data_from_ipg_step2
  if $1 ~ ^[0-9]+$
    $etipg = $1
  else
    return
  fi
  lt InternalEthernetPort
  lt IpInterface
  lma vlnids GigabitEthernet
  mr vlnids GigabitEthernet
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$etipg_stage$var_$date.log
  lhsh $etipg apparp info
  for $mo in vlnids
    lhsh $etipg apparp print $mo
  done
  for $v2 = 0 to 8
    lhsh $etipg appdh cnt $v2
  done
  lhsh $etipg appph info
  lhsh $etipg appph cnt
  lhsh $etipg applh cnt
  lhsh $etipg;appapi;pm all;q;
  lhsh $etipg;appapi;npr 3.0.0xc 6;q
  lhsh $etipg;appapi;npr 8.0.0xe00 0x48;q;
  lhsh $etipg;appapi;npr 8.0.0x3002c0 0x48;q;
  lhsh $etipg;appapi;npr 8.0.0x320000 0x44;q;
  lhsh $etipg;appapi;npr 8.0.0x330000 0x4c;q;
  lhsh $etipg;appapi;npRGS;q;
  lhsh $etipg;appapi;npRSS 0 40;q;
  #stop logging
  l-
endfunc

#################
# BP traces
#################

func get_BP_traces
  if $1 ~ ^[0-9]+$
    $intboardaddr = $1
  else
    return
  fi
  if $debuglevel = 3
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e send_sig Ipet_ipps_proc
  fi
  if $debuglevel = 4
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e rec_sig Ipet_ipps_proc
  fi
  if $debuglevel = 5 && $mycpp_version = old5
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace3 Ipet_ipps_proc
  else if $debuglevel = 5 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace3 IPET_NPCI_IF
  fi
  if $debuglevel = 6 && $mycpp_version = old5
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace4 Ipet_ipps_proc
  else if $debuglevel = 6 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e trace4 IPET_NPCI_IF
  fi
  if $debuglevel = 7 && $mycpp_version = new
    lhsh $intboardaddr; te log clear
    lhsh $intboardaddr; te e param Ipet_ipps_proc
  fi

  if $debuglevel = 8 && $mycpp_version = old5
    #start logging
    l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
    lhsh $intboardaddr; te log read
    lhsh $intboardaddr; te default Ipet_ipps_proc
    #stop logging
    l-
  else if $debuglevel = 8 && $mycpp_version = new
    #start logging
    l+mmo $logdir/$nodename_$ipaddress_BP_traces_debuglevel_$debuglevel_$intboardaddr.log
    lhsh $intboardaddr; te log read
    lhsh $intboardaddr; te default Ipet_ipps_proc
    lhsh $intboardaddr; te default IPET_NPCI_IF
    #stop logging
    l-
  fi
endfunc

###############
# MO data
###############

func get_MO_data
  l echo "\n## Collecting MO information ##\n"
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_MO_data_$date.log
  get GigaBitEthernet
  get IpInterface
  get IpAccessHostGpb
  pcr pmGigaBitEthernet GigaBitEthernet
  pcr pmIpInterface IpInterface
  pcr pmIpAccessHostGpb IpAccessHostGpb
  if $mycpp_version = old5
    get UdpHostMainMsb
    get IpAccessHostMsb
    pcr pmIpAccessHostMsb IpAccessHostMsb
  else
    get IpAccessHostEt
    get IpAccessHostSpb
    pcr pmIpAccessHostEt IpAccessHostEt
    pcr pmIpAccessHostSpb IpAccessHostSpb
  fi
  if $mycpp_version = old5 && $debuglevel = 2
    pdiff GigaBitEthernet|ipinterface|IpAccessUdpHostMsb|IpUdpHostMainMsb|IpAccessHostMsb|IpAccessHostGpb
  else if $mycpp_version != old5 && $debuglevel = 2
    pdiff GigaBitEthernet|ipinterface|IpAccessHostEt|IpAccessHostGpb|IpAccessHostSpb
  fi
  #stop logging
  l-
endfunc

#######################
# get PM counters
#######################

func get_PM_counters
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_PM_counters_stage$var_$date.log
  pget GigaBitEthernet
  pget IpInterface
  pget IpAccessHostGpb
  if $mycpp_version = old5
    pget IpAccessHostMsb
  else
    pget IpAccessHostEt
    pget IpAccessHostSpb
  fi
  #stop logging
  l-
endfunc

#######################
# del PM scanners
#######################

func del_PM_scanners
  pdel pmGigaBitEthernet
  pdel pmIpInterface
  pdel pmIpAccessHostGpb
  if $mycpp_version = old5
    pdel pmIpAccessHostMsb
  else
    pdel pmIpAccessHostEt
    pdel pmIpAccessHostSpb
  fi
endfunc

#########################################
# SPAS statistics
#########################################

func get_spashwinfo
  l echo "### Collecting SPAS statistics ..."
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_SPAS_statistics_$date.log
  ########################################
  # ipg Boards #
  ########################################
  if $board ~ all
    for $board1 in group_ipg
      lhsh $board1; spashwinfo all
      lhsh $board1; spashwinfo egrq
      lhsh $board1; spashwinfo ingrq
    done
  else if $board ~ ^[0-9]+$
    # a single board was requested: use $board ($board1 is only set by the loop above)
    lhsh $board; spashwinfo all
    lhsh $board; spashwinfo egrq
    lhsh $board; spashwinfo ingrq
  fi
  ########################################
  # GPB Boards #
  ########################################
  for $board1 in group_gpb
    lhsh $board1; spashwinfo all
    lhsh $board1; spashwinfo egrq
    lhsh $board1; spashwinfo ingrq
  done

  ########################################
  # SCB Boards #
  ########################################
  for $board1 in group_scb
    lhsh $board1; spashwinfo all
    lhsh $board1; spashwinfo egrq
    lhsh $board1; spashwinfo ingrq
  done
  #stop logging
  l-
endfunc

#########################################
# T&E, alarm, event logs and other info
#########################################

func get_logs
  l echo "\n## Collecting Alarm and Event logs ##\n"
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_Alarm_and_Event_logs_$date.log
  lgaer
  #stop logging
  l-
  l echo "\n## Get boards configuration ##\n"
  #start logging
  l+mmo $logdir/$nodename_$ipaddress_cabx_$date.log
  cabx
  #stop logging
  l-
endfunc

###################################
# M F G - M A I N
###################################

func focus_on_ipg
  get_MO_data
  get_logs
  get_data_from_ipg_step1_call
  for $var = 1 to $NumberOfRepeats
    get_data_from_ipg_step2_call
    get_PM_counters
    wait $WaitTime
  done
  del_PM_scanners

  if $debuglevel = 2
    get_spashwinfo
  fi
  ###################
  ## BP Traces ##
  ###################
  if $board ~ all && $debuglevel > 2
    for $board4 in group_ipg
      get_BP_traces $board4
    done
  else if $board ~ ^[0-9]+$ && $debuglevel > 2
    get_BP_traces $board
  fi
endfunc

#############
# USAGE
#############
func print_usage
  l echo "\n###########################################################################################"
  l echo "Syntax: run <script name> <password to node> <debuglevel> all|<specific>\n"
  l echo "where '<debuglevel>' is a value from 1 upwards telling type of info grabbed and"
  l echo "where 'all|<specific>' means all boards or a specific one which is referred as 012300"
  l echo "(If only password to node set script will run with debug level=1 and collect information"
  l echo "from all boards)"
  l echo "example: run /home/xxkuzyaa/tmp/ipg_dcg.mos x 2 000900"
  l echo "\n###########################################################################################"
  l echo "\n<debuglevel>"
  l echo "--------------------------------"
  l echo " 1\t Collect ipg TE Log, NP counters, MO and PM counters"
  l echo " (without pdiff), Alarm and Event logs"
  l echo " 2\t Collect ipg TE Log, NP counters, MO and PM counters, Alarm and Event logs, SpasHwInfo,"
  l echo " 3\t BP traces: enable send_sig on Ipet_ipps_proc"
  l echo " 4\t BP traces: enable rec_sig on Ipet_ipps_proc"
  l echo " 5\t BP traces: enable trace3 on Ipet_ipps_proc (CPP5.1) or trace3 on IPET_NPCI_IF (CPP6,7)"
  l echo " 6\t BP traces: enable trace4 on Ipet_ipps_proc (CPP5.1) or trace4 on IPET_NPCI_IF (CPP6,7)"
  l echo " 7\t BP traces: enable param on Ipet_ipps_proc (CPP6,7)"
  l echo " 8\t BP traces: Read and store T&E log"
  l echo "\n###########################################################################################"
endfunc

##########################
#### M A I N ####
##########################

# check arguments
if $1
  l echo "\nStarting ..."
  $password = $1
  unset $1
else
  print_usage
  l-
  return
fi

if $2 ~ ^[0-9]+$
  $debuglevel = $2
  unset $2
else
  $debuglevel = 1
fi

if $3 = all || $3 ~ ^[0-9]+$
  $board = $3
  unset $3
else
  $board = all
fi

#some info to the user
l echo "\n####################################################################################################"
l echo "### Data collection executing ..."
l echo "### Result is stored here: $logdir/$nodename_$ipaddress_$ipg_or_ipg_..."
l echo "####################################################################################################"

$date = `date +%y%m%d-%H%M`

#start logging
l+mmo $logdir/$nodename_$ipaddress_ipg_dcg_$date.log

ba group_ipg ipg

######################################################
# Print all user variables and scripting variables
######################################################
uv
pv

readclock
###################################
# Get the MO's
###################################
lt all

readclock

###################################
# Check the MOM version...
###################################

#Possible printouts:
#$cellomomversion = 6-LSV31-1
#$cellomomversion = 6.1-LSV13-2
#$cellomomversion = 7-LSV26_13-3
#$celloversion = 7-LSV34.6BC1-1

if $cellomomversion >= 7 || $celloversion >= 7
  $mycpp_version = new
fi

if $cellomomversion >= 6 && $cellomomversion < 7
  $mycpp_version = old6
fi

if $celloversion >= 6 && $celloversion < 7
  $mycpp_version = old6
fi

if $cellomomversion >= 5 && $cellomomversion < 6
  $mycpp_version = old5
fi

if $celloversion >= 5 && $celloversion < 6
  $mycpp_version = old5
fi

l echo "\n###################################"
l echo "## MOM version is $mycpp_version "
l echo "###################################\n"

###################################
# Do the work!
###################################
focus_on_ipg

readclock

unset $date

#stop logging
l-

#stop silent logging
l-

#Done
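The MOM version check near the end of the script decides which board-specific branches run, so it can be useful to see the mapping in isolation. The Python sketch below restates it, assuming that only the leading number of strings such as 6.1-LSV13-2 matters for the comparison; how the script interpreter itself compares such strings is not stated in this record, and the function names are hypothetical.

```python
import re


def leading_number(version):
    """Return the leading numeric part of e.g. '6.1-LSV13-2' as a float, or None."""
    match = re.match(r"\d+(?:\.\d+)?", version or "")
    return float(match.group()) if match else None


def classify(cellomomversion=None, celloversion=None):
    """Mirror the script's sequence of checks; a later match overrides an earlier one."""
    versions = [v for v in (leading_number(cellomomversion),
                            leading_number(celloversion)) if v is not None]
    result = None
    if any(v >= 7 for v in versions):
        result = "new"
    if any(6 <= v < 7 for v in versions):
        result = "old6"
    if any(5 <= v < 6 for v in versions):
        result = "old5"
    return result


# The sample strings are the "possible printouts" listed in the script comments.
for text in ("6-LSV31-1", "6.1-LSV13-2", "7-LSV26_13-3"):
    print(text, classify(cellomomversion=text))
print("7-LSV34.6BC1-1", classify(celloversion="7-LSV34.6BC1-1"))
```

Note that the value old6 is set but never tested by the trace functions, which only act on old5 and new, so on such a node the trace levels 5 to 8 do nothing.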

ET-MFX Board Restart : OSE_ECORRUPTED_POOL

86%

Usage Count: 2

Network: WCDMA

Node: CPP RNC 3810

Service: W-RAN P7.1

Software: CPP RNC P7.1.4 EU55

Software: CPP RNC CXP9013831 R9YC/65

ET-MFX Board Restart : OSE_ECORRUPTED_POOL

ReadErrorLog: ET-MFX
Exs_spi_proc exspi_proc_write_normal.c:280 ERROR:Normal IO write failed with 3, page 0x80, reg 0x38, size 2, data 0x7C sender 0x101DD
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295
Ipet_scish_proc scish_session.c:2166 ERROR:Illegal sessionId=4294967295

Root cause not found. The current problem is reported on a dbm2 based board where all the load modules share a common pool called mainpool for signal allocations. It is quite possible that a signal buffer corrupts the other signals that lie adjacent to it. In such cases, problems like OSE_ECORRUPTED_POOL can be reported on different processes.

REMEDY:

CONDITIONS:
1. The problem happens frequently in the RNC.

PROCEDURE:
1. Enable the traces below on the module MP which is using the ET-MFX board.

lh modx te e trace1 drhTrBrIpC
lh modx te e trace1 drhCcRhC
lh modx te e rec_sig send_sig param trace1 cpxApplSciC

The conditions under which OSE_ECORRUPTED_POOL is reported are most likely related to a user error. This kind of error is reported by the kernel when it detects that a buffer presented to it via system calls such as send(), sender(), restore() etc. is corrupted. The usual cause of the fault is that some other process writes to a buffer outside its allocated size, which overwrites the next buffer. In other words, the problem is likely to be in some other part of the code that overwrites the buffer, but it is reported only when the corrupted signal is presented to the kernel via system calls such as send, receive, restore etc.
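Why the error tends to be reported on a process other than the one at fault can be shown with a toy pool. The Python sketch below is only an illustration of the mechanism described above: the layout (one header byte in front of each payload), the MAGIC value and the Pool class are invented for the example and say nothing about how the real kernel lays out or validates its pools.

```python
MAGIC = 0xA5  # hypothetical header byte that marks an intact buffer


class Pool:
    """Fixed-size buffers laid out back to back in one bytearray."""

    def __init__(self, count, size):
        self.size = size
        self.stride = 1 + size  # header byte + payload
        self.mem = bytearray(count * self.stride)
        for index in range(count):
            self.mem[index * self.stride] = MAGIC

    def write(self, index, data):
        # Deliberately unchecked, like a raw copy into a signal buffer.
        start = index * self.stride + 1
        self.mem[start:start + len(data)] = data

    def send(self, index):
        # Validation happens only when the buffer is handed over.
        if self.mem[index * self.stride] != MAGIC:
            raise RuntimeError(f"corrupted pool detected on buffer {index}")
        start = index * self.stride + 1
        return bytes(self.mem[start:start + self.size])


pool = Pool(count=2, size=8)
pool.write(0, b"x" * 9)  # owner of buffer 0 writes one byte too many
pool.send(0)             # its own buffer still passes the check
try:
    pool.send(1)         # the owner of buffer 1 is the one that gets the error
except RuntimeError as err:
    print(err)
```

The faulty write and the reported error are separated both in time and in ownership, which is why the traces in the procedure are enabled on several blocks rather than only on the process named in the restart.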
