3G Cell Optimization

  • INTRODUCTION

    Objective

    Statistical RNC/RXI/RBS counters reflect the performance of a 3G cell within the network. Specific counters are analysed to detect various faults. The objective of the procedure is therefore to trace any network-related faults to the source of the problem by analysing the fault symptoms evident in the counter values.

    Scope

    This procedure identifies and recommends solutions for network faults identified through the analysis of RNC/RXI/RBS statistical counters.

    References

    Additional references to this procedure are as follows:

    ALEX Libraries:
    o Radio Network Controller (RNC) 3810 (CXP 901 2011 RXX)
    o RXI 820 ATM R4.1 (CXP 901 102/3 RXX)
    o Radio Base Station (RBS) 3202/3206/3402/3412 (CXP 901 0811/X RXX)
    o WCDMA RAN (CXS 101 06/4 RXX)

    TEMS User's Manual

    For RNC/RXI/RBS counter descriptions refer to the Performance Statistics document within the relevant RNC/RXI/RBS ALEX Library.

    PROCEDURE

    ANALYSING COUNTERS

    1. The FACTS tool interfaces to and presents the relevant counters on an RNC, RXI and RBS basis. Counters are collected every 15 minutes (one reporting period) and are stored from the operational date of the cell, which allows historical analysis.

    2. Numerous counters are available from an RNC/RXI/RBS. However, this procedure concentrates on the counters that reflect the critical performance of the cells. These counters (and the formulae derived from them) are best analysed graphically using FACTS. The formulae used for statistics such as DCR and CSSR may be obtained within FACTS.

    3. Both the NMC and the Planning & Optimisation Engineer are responsible for monitoring counters. The NMC is responsible for maintaining the active status of all cells and must therefore act on all counters related to that status. The Planning & Optimisation Engineer monitors and acts on counters reflecting the cell's active performance.

    4. Alarms can be configured to be generated when counters exceed specific values. These alarms are then monitored by the NMC.

    5. For the Radio Planning & Optimisation Engineer the focus is on maintaining adequate cell performance in terms of Accessibility (call setup analysis), Retainability (drop call analysis) and Integrity (speech quality/video quality/packet throughput analysis).

    ACCESSIBILITY

    6. If a cell has poor accessibility it is typically due to some form of congestion, a hardware/software fault or a misconfiguration. It is also possible that an external source of interference (such as a microwave link on the same frequency) is affecting the accessibility.

    7. Accessibility should be monitored independently for the different RAB types (e.g. Speech, CS Video, PS Interactive R99, PS Interactive HSDPA, etc.) as in certain situations only one of the RAB types will be affected. For example, a disabled HS-TXB will affect the accessibility of the PS Interactive HSDPA RAB, but if the RBS also has a TXB (non-HS) installed then the other RABs may continue to have an acceptable accessibility.

    8. When a low CSSR is detected on a cell the first thing to check is whether Admission Control is rejecting the RRC/RAB setup attempts (pmNoReqDeniedAdm) or whether they are failing after admission (pmNoFailedAfterAdm). For high pmNoReqDeniedAdm refer to the Admission Control sections below. For high pmNoFailedAfterAdm refer to the Failure After Admission sections below.

  • Example: FACTS Report showing a low CSSR Speech caused by a high pmNoReqDeniedAdm. Note that pmNoReqDeniedAdm is not RAB specific so other RABs will most likely be affected in this case too.
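The first check in point 8 can be sketched as a small triage helper. This is only an illustration: the function name, the return strings and the rule of comparing the two counters directly are assumptions, not part of FACTS or of the node software.

```python
def triage_low_cssr(pm_no_req_denied_adm: int, pm_no_failed_after_adm: int) -> str:
    """Suggest which part of the procedure to follow for a cell with low CSSR.

    Assumption: both counters are summed over the same reporting periods,
    and the larger of the two points to the dominant failure type.
    """
    if pm_no_req_denied_adm == 0 and pm_no_failed_after_adm == 0:
        return "no setup failures counted"
    if pm_no_req_denied_adm >= pm_no_failed_after_adm:
        # Rejections by Admission Control dominate.
        return "see Admission Control sections"
    # Attempts were admitted but failed afterwards.
    return "see Failure After Admission sections"


if __name__ == "__main__":
    # Hypothetical counter values for one cell and one day.
    print(triage_low_cssr(pm_no_req_denied_adm=340, pm_no_failed_after_adm=12))
```

When both counters are high it is worth following both branches rather than relying on the larger value alone.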

  • Admission Control: DL Power

    9. If Admission Control rejects a RAB establishment due to a lack of DL power then the counter pmNoFailedRabEstAttemptLackDlPwr is incremented. Check that the feeder losses are configured correctly in the RBS and that the parameter maximumTransmissionPower is set correctly (typically to maxDlPowerCapability minus 0.2 dB). The value of parameter pwrAdm should also be verified (typically set to 75%). Also check for MCPA alarms: RBS 3202 sites with high feeder losses are sometimes configured with two MCPAs per sector, and if one of the MCPAs fails the maxDlPowerCapability of the sector is greatly reduced, causing a lack of DL power. Long term solutions are to increase the power capability of the sector by adding or upgrading an MCPA (RBS 3202) or RU (RBS 3206), to re-engineer the site to reduce feeder lengths, or to change the RBS type to one using RRUs (RBS 3402 or RBS 3412) if this provides higher power at the reference point. The short term solution is to reduce the traffic carried by the site (see the Traffic Offload sections).

    Example: FACTS Report showing a high number of RAB establishment failures due to Admission Control rejections caused by a lack of DL power. In this situation the site had only one of the two MCPAs in sector 1 functioning correctly, causing the DL power congestion. This is shown in the cabinet viewer snapshot below (red LED on MCPA). There was also an alarm in the RBS for the faulty MCPA.
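The parameter checks in point 9 amount to simple arithmetic, sketched below. The function names are hypothetical, and the values are taken in dBm and percent purely for illustration; the units used in the actual node configuration may differ.

```python
from typing import List


def expected_max_transmission_power(max_dl_power_capability_dbm: float,
                                    margin_db: float = 0.2) -> float:
    """Typical maximumTransmissionPower: the capability minus a 0.2 dB margin."""
    return round(max_dl_power_capability_dbm - margin_db, 1)


def dl_power_config_issues(maximum_transmission_power_dbm: float,
                           max_dl_power_capability_dbm: float,
                           pwr_adm_percent: float,
                           typical_pwr_adm_percent: float = 75.0) -> List[str]:
    """List deviations from the typical settings quoted in the procedure."""
    issues = []
    expected = expected_max_transmission_power(max_dl_power_capability_dbm)
    if abs(maximum_transmission_power_dbm - expected) > 0.05:
        issues.append(f"maximumTransmissionPower differs from expected {expected} dBm")
    if pwr_adm_percent != typical_pwr_adm_percent:
        issues.append(f"pwrAdm is {pwr_adm_percent}%, typical value is {typical_pwr_adm_percent}%")
    return issues


if __name__ == "__main__":
    # Hypothetical sector: 43.0 dBm capability, power set 3 dB too low.
    print(dl_power_config_issues(40.0, 43.0, 75.0))
```

A deviation is not necessarily a fault, since the typical values are guidelines, but it is a reasonable first thing to review when pmNoFailedRabEstAttemptLackDlPwr is high.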

  • Admission Control: DL Channelisation Codes

    10. If Admission Control rejects a RAB establishment due to a lack of DL channelisation codes then the counter pmNoFailedRabEstAttemptLackDlChnlCode is incremented. This will typically affect the PS Interactive R99 (DCH/FACH) CSSR more severely than the Speech CSSR, as the PS Interactive R99 RAB requires channelisation codes at a lower spreading factor (using more of the code tree). In the P4 software release a cell that supports R99 and HSDPA typically has 5 spreading factor 16 DL channelisation codes reserved for HSDPA. This means that approximately 32% of available codes are reserved for HSDPA. When this is the case it is common for DL channelisation code congestion to occur. Check the setting of parameter dlCodeAdm (typically set to 85% on MTN's network). The long term solution is to add another cell in the coverage area to take some of the traffic; this may be achieved by introducing a second carrier, another sector, or another site. The short term solution is to reduce the traffic carried by the site (see the Traffic Offload sections).

  • Example: FACTS Reports showing a high number of RAB establishment failures due to Admission Control rejections caused by a lack of DL channelisation codes, and the corresponding decrease in CSSR for Packet Interactive. In this case a large portion of the speech calls were already redirected to GSM so the R99 Packet Interactive RAB was worst affected; the required solution is sectorisation of the inbuilding antenna system or implementation of a second carrier frequency.
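The code tree reasoning in point 10 can be illustrated with a short calculation. This is a sketch under the usual OVSF assumption that one code at spreading factor SF occupies 1/SF of the tree; the helper name is hypothetical and common channel codes are ignored.

```python
from fractions import Fraction


def code_tree_share(spreading_factor: int, count: int = 1) -> Fraction:
    """Fraction of the DL code tree occupied by `count` codes at one spreading factor."""
    return Fraction(count, spreading_factor)


if __name__ == "__main__":
    hsdpa_reserved = code_tree_share(16, count=5)
    # 5 codes at SF 16 occupy 5/16 of the tree, roughly in line with the
    # approximate figure quoted for the HSDPA reservation.
    print(f"HSDPA reservation: {float(hsdpa_reserved) * 100:.2f}% of the code tree")

    # A lower spreading factor uses more of the tree per connection.
    print(f"One SF 8 code:  {float(code_tree_share(8)) * 100:.2f}%")
    print(f"One SF 32 code: {float(code_tree_share(32)) * 100:.2f}%")
```

This is why a handful of high-rate packet users can exhaust the remaining codes well before speech users are affected.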

  • Admission Control: UL/DL ASE

    11. If Admission Control rejects a RAB establishment due to a lack of UL or DL air speech equivalent (ASE) then the counter pmNoFailedRabEstAttemptLackUlAse or pmNoFailedRabEstAttemptLackDlAse is incremented. The ASE monitor accounts for the air interface resource usage in a cell (separately for UL and DL) by means of an average static load estimation of each radio link in the cell; for more information refer to the Capacity Management document in the WCDMA RAN ALEX library. Because an RL's ASE is an estimate, in certain situations it may over-estimate the load in a cell, e.g. for inbuilding cells on a different carrier frequency to the surrounding macro cells. In such situations it is possible to increase the UL/DL ASE admission control limit (parameters aseUlAdm/aseDlAdm) in order to prevent unnecessary admission control rejections of RAB establishments. A short term solution to relieve such congestion may therefore be to increase aseUlAdm/aseDlAdm, but the effect on DCR/CSSR should be closely monitored (note that the aseUlAdm default value on MTN's network is already less stringent than the Ericsson default). Another short term solution is to reduce the traffic carried by the site (see the Traffic Offload sections). The long term solution is to add another cell in the coverage area to take some of the traffic; this may be achieved by introducing a second carrier, another sector, or another site.

    Example: FACTS Report showing RAB establishment failures due to Admission Control rejections caused by a lack of UL ASE. In this case the UL ASE congestion was minor and lasted only a few days, so no action was taken.

  • Admission Control: Connection Limits

    12. If Admission Control rejects a RAB establishment due to exceeding the configured connection limit for SF 8, SF 16, or SF 32 then the counter pmNoFailedRabEstAttemptExceedConnLimit is incremented. These spreading factors are used by the PS64/384, PS64/128 and PS64/64 RBs respectively, so the connection limit blocking typically applies to channel switching between these RBs for an R99 packet interactive RAB. The connection limits are configured by parameters sf8Adm, sf16Adm and sf32Adm. The default settings allow the maximum possible number of RLs for each spreading factor, in which case Admission Control will not block for this reason. Lower settings have been tested (in combination with adjusted Class B QoS settings on the Iub interface), in which case some connection limit rejections were obtained. This is a special situation, however, and for the purpose of this document such connection limit rejections are not considered further.

    Admission Control: Hardware Usage (Channel Elements)

    13. It is possible for Admission Control to reject a RAB establishment attempt due to insufficient UL or DL RBS hardware capacity, i.e. too few channel elements available. The channel element capacity of an RBS may be software limited (according to the software license configured for the RBS) or hardware limited (according to the TXBs and RAXBs installed in the RBS). The two parameters that control the RBS hardware admission policy are ulHwAdm and dlHwAdm. By default these parameters should be set to 100%, in which case no hardware is reserved for handovers and Admission Control will not block RAB establishment attempts for this reason (see Failure After Admission: Hardware Usage). In software revision P4 there is no specific counter to indicate this type of Admission Control rejection, so if pmNoReqDeniedAdm is triggered without any of the other relevant counters indicating a reason then it is likely that this is the cause and that ulHwAdm or dlHwAdm is incorrectly configured to a value below 100%. In the P5 software release there are new counters that indicate when lack of hardware capacity causes RAB establishment failures in a cell: pmNoFailedRabEstAttemptLackDlHw, pmNoFailedRabEstAttemptLackDlHwBest, pmNoFailedRabEstAttemptLackUlHw, pmNoFailedRabEstAttemptLackUlHwBest.

  • Example: FACTS Reports showing RAB establishment failures due to Admission Control with no counter showing the reason (this is for P4). In this case the HW admission limits were suspected and found to be ulHwAdm=70 and dlHwAdm=70 (instead of both being 100). After correcting these settings the Admission Control rejections disappeared and, as can be seen in the second plot below, the Packet CSSR improved. In the third plot below the UL CE Usage is seen to peak around 45 CEs. This RBS had a capacity of 64 UL CEs; 70% of 64 CEs is 44.8 CEs. In other words, the peak UL CE Usage matches the Admission Control limit.
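The arithmetic in this example can be reproduced with a one-line helper. It is a minimal sketch: the function name is hypothetical, and it assumes the admission limit is simply the configured percentage of the channel element capacity.

```python
def hw_admission_limit_ce(ce_capacity: int, hw_adm_percent: float) -> float:
    """Channel elements usable before Admission Control starts rejecting (assumed linear)."""
    return ce_capacity * hw_adm_percent / 100


if __name__ == "__main__":
    # Values from the example: 64 UL CEs with ulHwAdm=70, then corrected to 100.
    print(hw_admission_limit_ce(64, 70))
    print(hw_admission_limit_ce(64, 100))
```

Comparing this limit with the observed peak CE usage is a quick way to confirm whether ulHwAdm or dlHwAdm is the cause of unexplained pmNoReqDeniedAdm increments.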

  • Failure After Admission: Iub Congestion

    14. If a UTRAN cell has a high number of RRC/RAB establishment request failures after being admitted by Admission Control (pmNoFailedAfterAdm), a common reason is Iub congestion. When considering the Iub interface it is important to remember that mainly RABs configured to use strict AAL2 QoS settings will be blocked at call setup by AAL2 CAC. Typically the R99 RABs (i.e. all RABs excluding HSDPA and EUL RABs) are configured to use AAL2 QoS class A or class B, with both classes configured to use a strict QoS. HSDPA and EUL will typically use AAL2 QoS class C and class D, with both classes configured to use a best effort QoS. Typically the R99 Packet Interactive RAB will be the first RAB to show signs of AAL2 congestion, with a poor Packet Interactive CSSR and a correspondingly high pmNoFailedAfterAdm. The AAL2 Setup Success Rate statistics from the relevant RXI towards the RBS may then be investigated. This rate should typically be 99% or above; if it is lower and the counter pmUnSuccOutConnsLocal indicates that the failures are local rejections (on the RXI) by CAC, then there is congestion on the Iub interface.

    Example: FACTS Reports showing high pmNoFailedAfterAdm (1st plot), low CSSR Packet Interactive (2nd plot), and low AAL2 Call Setup Success Rate with corresponding high pmUnSuccOutConnsLocal (3rd plot). From 2006-11-24 the problem disappears. In this case the solution was to activate Directed Retry to GSM and to change the AAL2 QoS class B traffic to use a best effort configuration thereby allowing more PS64/128 and PS64/384 users (as well as ordering a 2nd E1 to the site); note that this RBS did not have HSDPA configured therefore there was no concern about affecting the experience of HS users as described in section Considerations For HSDPA: Iub Bandwidth.
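The Iub check in point 14 can be sketched as follows. Only pmUnSuccOutConnsLocal and the 99% target come from the procedure; the attempt and success inputs, the function names, and the rule that local rejections must account for more than half of the failures are assumptions made for illustration.

```python
def aal2_setup_success_rate(attempts: int, successes: int) -> float:
    """AAL2 setup success rate as a fraction; 1.0 when there were no attempts."""
    if attempts == 0:
        return 1.0
    return successes / attempts


def iub_congestion_suspected(attempts: int, successes: int,
                             pm_unsucc_out_conns_local: int,
                             target_rate: float = 0.99) -> bool:
    """True when the success rate is below target and local CAC rejections dominate."""
    if aal2_setup_success_rate(attempts, successes) >= target_rate:
        return False
    failures = attempts - successes
    # Assumption: "dominate" means more than half of all failed setups.
    return failures > 0 and pm_unsucc_out_conns_local > failures / 2


if __name__ == "__main__":
    # Hypothetical daily totals for one RXI-to-RBS link.
    print(iub_congestion_suspected(10000, 9500, pm_unsucc_out_conns_local=450))
```

If the success rate is low but local rejections are few, the failures originate elsewhere and the other Failure After Admission sections should be consulted.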

  • Failure After Admission: Core Transport Network Congestion

    15. Related to the above point (Failure After Admission: Iub Congestion) is transport network congestion in links other than the Iub, e.g. RNC-MGW (Iu-cs), RNC-SGSN (Iu-ps) and inter-MGW links. If this is the case then the CSSR of one or more entire RNCs will deteriorate along with the AAL2 Setup Success Rate for a major link to the RNC. It would then be necessary to look at the link utilisation in order to confirm such link congestion, but that is beyond the scope of this document.

    Example: FACTS Reports showing poor CSSR Speech for CTRNC1 for two days and then an improvement for the next two days (1st plot); and the corresponding AAL2 Setup Success Rate for the CTMGW1->RBMGW1 (2nd plot) and RBMGW1->CTMGW1 (3rd plot) links for the same days. The CTMGW1-RBMGW1 link had a high utilisation (>80%), so the peak cell rate (PCR) for the link was increased, resulting in the noticeable improvement.

  • Failure after Admission: Hardware Usage (Channel Elements)

    16. A high number of RRC/RAB setup failures after admission (pmNoFailedAfterAdm) could be due to insufficient UL or DL RBS hardware capacity, i.e. too few channel elements available. The channel element capacity of an RBS may be software limited (according to the software license configured for the RBS) or hardware limited (according to the TXBs and RAXBs installed in the RBS). The two parameters that control the RBS hardware admission policy are ulHwAdm and dlHwAdm. If these parameters are set to a value lower than 100% then Admission Control should block any RRC/RAB setup attempts requiring more than the available channel elements (see Admission Control: Hardware Usage). By default, however, these parameters should be set to 100%, in which case no hardware is reserved for handovers and Admission Control will not block RAB establishment attempts for this reason, so the setup attempt fails after admission instead. The RBS counters pmSetupFailuresSfXX in the UplinkBasebandPool (ULSETUPFAILURESSFXX) and pmSetupFailuresSfXX in the DownlinkBasebandPool (DLSETUPFAILURESSFXX) indicate RL (at SF XX) setup failures due to a lack of UL and DL hardware capacity. If this is the case then a short term solution may be to reduce the traffic carried by the site (see the Traffic Offload sections). The long term solution is to upgrade the UL (RAXB) or DL (TXB) channel element capacity of the site. This may be achieved by swapping the relevant board with that of another site that has more capacity than it requires, or by sourcing a new board. Note that it is possible for these counters to increment even when there should be sufficient channel element capacity (for example due to a software bug in the software revision being used; see Failure After Admission: Other), so it is important to compare the channel element usage to the channel element capacity of the RBS to confirm that a lack of capacity is plausible as the root of the problem.

    Example: FACTS Reports showing poor CSSR Packet Interactive (1st plot); high pmNoFailedAfterAdm (2nd plot); and UL setup failures due to a lack of UL baseband hardware capacity (RAXB). Note that this RBS had a capacity of 64 UL channel elements until 31st August, when it was upgraded to 128 UL channel elements. The estimated UL CE Usage peaks above 64 channel elements even before the 31st, confirming that RAXB congestion is the source of the problem, and after the upgrade to 128 channel elements the UL CE Usage starts peaking above 100, indicating how necessary the upgrade was. The improvement to CSSR Packet Interactive and the decrease in pmNoFailedAfterAdm after the RAXB upgrade are clearly noticeable.
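The plausibility check recommended at the end of point 16 can be sketched as a comparison of peak channel element usage against capacity. The function name, the return strings and the 90% "near capacity" ratio are assumptions chosen for illustration only.

```python
def assess_setup_failures(setup_failures: int, peak_ce_usage: float,
                          ce_capacity: int, near_capacity_ratio: float = 0.9) -> str:
    """Judge whether baseband setup failures are consistent with CE congestion."""
    if setup_failures == 0:
        return "no hardware setup failures"
    if peak_ce_usage >= near_capacity_ratio * ce_capacity:
        return "channel element congestion plausible"
    # Failures counted while usage is far below capacity point elsewhere,
    # for example to a software fault.
    return "usage well below capacity, suspect another cause"


if __name__ == "__main__":
    # Hypothetical failure counts; usage and capacity loosely follow the example above.
    print(assess_setup_failures(setup_failures=250, peak_ce_usage=66, ce_capacity=64))
    # Hypothetical case with very low usage against an assumed capacity of 128.
    print(assess_setup_failures(setup_failures=250, peak_ce_usage=6, ce_capacity=128))
```

The second case mirrors the situation where the counters increment without real congestion, which is the cue to consult Failure After Admission: Other.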

  • Failure After Admission: Other

    17. If none of the above reasons for a poor CSSR are apparent, then the problem is likely to be more complicated to resolve, often relating to a software/hardware fault or perhaps an external source of interference in the area. At the time of writing, the 3G technology is not as mature as the current 2G system (as would be expected), and hence there are still numerous improvements being implemented in every software release, along with the continued development of new, more efficient and optimised hardware generations for the various 3G nodes. The example below illustrates one such problem.

    Example: FACTS Reports showing poor CSSR Speech with high pmNoFailedAfterAdm (1st plot), and high pmSetupFailuresSfXX indicating TXB congestion. However, the DL CE Usage is very low, seldom peaking above 6 channel elements, so congestion does not make sense as the cause. After investigating numerous RBSs showing these symptoms it was established that they all had a single HS-TXB, as opposed to the other RBSs which all had a TXB as well as an HS-TXB. Both configurations are valid and have more than sufficient downlink channel element capacity. It was also noted that if the RBS was restarted then the problem disappeared for a few days and then re-appeared; this is clearly visible in the plots, where the restart occurred on 2 January. This turned out to be a software fault for the single HS-TXB configuration (due to a failure to release some resources on the TXB). The fix was delivered from software release P4.0.20 (whereas the release installed on the nodes at the time was P4.0.12).

  • RETAINABILITY

    18. If a cell has poor retainability it is typically due to missing neighbour definitions (WCDMA and/or GSM), overshooting cell(s), a misbehaving neighbour site, a hardware/software fault or a misconfiguration. It is also possible that an external source of interference (such as a microwave link on the same frequency) is affecting the retainability.

    19. Retainability should be monitored independently for the different RAB types (e.g. Speech, CS Video, PS Interactive R99, PS Interactive HSDPA, etc.) as in certain situations only one of the RAB types will be affected. For example, a cell may be configured with GSM as the preferred HO type in which case Speech calls will perform IRAT handovers to GSM rather than performing IFHOs, but CS Video calls will perform IFHOs. In such a situation, missing inter-frequency neighbour cell relation definitions will impact the DCR of CS Video calls, but not Speech calls.

    20. However, in the majority of cases the factors that affect the Speech retainability will also affect the retainability of the other RABs. When a high speech DCR is detected on a cell the first thing to check is the type of drops occurring, as indicated by the counters pmNoSysRelSpeechSoHo, pmNoSysRelSpeechNeighbr, pmNoSysRelSpeechUlSynch and pmNoOfTermSpeechCong, and then to analyse the situation with the following in mind:
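The first check in point 20 can be sketched as a ranking of the four drop counters. The counter names are those listed above; the function name and the sample values are hypothetical, and the sketch does not assume that the counters are mutually exclusive.

```python
from typing import Dict, List, Tuple


def rank_drop_causes(counters: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (counter, value) pairs sorted from the largest to the smallest value."""
    return sorted(counters.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical daily totals for one cell.
    sample = {
        "pmNoSysRelSpeechSoHo": 40,
        "pmNoSysRelSpeechNeighbr": 35,
        "pmNoSysRelSpeechUlSynch": 10,
        "pmNoOfTermSpeechCong": 2,
    }
    for name, value in rank_drop_causes(sample):
        print(f"{name}: {value}")
```

The ranking only indicates where to start; the sections that follow describe how each type of drop should be interpreted.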

    Soft Handover Drops

    21. Typically a cell that has a high number of dropped calls due to SOHO failures (pmNoSysRelSpeechSoHo) will also have a high number of drops due to missing neighbours (pmNoSysRelSpeechNeighbr), indicating that the SOHO failures are due to missing neighbour relations; however, there are situations where SOHO failures happen for other reasons. Two common reasons are a neighbouring cell that is misbehaving (often due to faulty hardware/software) or a misconfiguration resulting in a failure to perform an inter-RNC SOHO across the Iur interface. These two situations are illustrated in the following examples.

    Example: FACTS Reports showing two cells in the same area (1st plot) with a high pmNoSysRelSpeechSoHo and a much lower pmNoSysRelSpeechNeighbr indicating that the soft handover failures are not due to missing neighbours (2nd & 3rd plot). After further investigation it was discovered that the cells on the neighbouring site U4554 were automatically locked (4th plot) and the Mub interface to the site was down. These cells were transmitting CPICH yet there were multiple channels (RACH, FACH, etc) that were disabled preventing the site from carrying any traffic. However, UEs in the neighbouring cells were measuring the CPICH from these cells and attempting to perform SOHO to them. Such SOHO attempts were failing leading to the SOHO drops. As is clearly visible in the FACTS Reports, when the site U4554 came back on air on 22 Jan the SOHO drops on the neighbouring cells disappeared along with a huge reduction in the DCR experienced by these cells.

  • Example: Referring to the three sites shown in the figure below (1st plot): U1393 and U0547 are on CTRNC1 while U3970 is on TBRNC1. The three FACTS Reports below (2nd, 3rd & 4th plots) show cells from these sites with a high pmNoSysRelSpeechSoHo. Note that although there are some drops due to missing neighbours (pmNoSysRelSpeechNeighbr), most of the SOHO drops are for another reason. In this case the soft handover counters (pmRlAddAttemptBestCellSpeech and pmRlAddSuccessBestCellSpeech) indicated SOHO success between 3970C1 and 1393B1/547B1; however, a GPEH trace of event INTERNAL_SOFT_HANDOVER_EXECUTION showed that these handovers actually failed (see snapshot of slide in 5th plot). It was established that a misconfiguration of an AAL2 routing case between the two RNCs caused all SOHO attempts across the Iur interface to fail. This was corrected on 9 Jan and the improvement is obvious from the FACTS Reports.

  • Missing Neighbour Drops

    22. A cell that has a high number of dropped calls due to missing neighbour relations will have a high pmNoSysRelSpeechNeighbr. A missing neighbour relation will only cause a dropped call if the RNC receives an Event 1a, 1c or 1d Measurement Report from the UE requesting the addition of an SC to the AS (or an HS cell change) for an SC that is not defined as a neighbour relation to any of the cells in the AS, and if the Ec/No reported for that SC is releaseConnOffset above the Ec/No of the best serving cell in the AS, where the RNC parameter releaseConnOffset is typically set to 12 dB. The reason for this system release is to prevent excessive UL interference in the network. This type of dropped call is relatively easy to solve using the General Performance Event Handler (GPEH) tool in OSS-RC. With this tool all details on Event 1a, 1c or 1d Measurement Reports containing an SC not in the AS neighbour list may be captured and analysed using the INTERNAL_SOHO_DS_MISSING_NEIGHBOUR event (including those Measurement Reports that do not cause a system release of the call). In this way the missing neighbour or interfering cell may be established and appropriate action taken, e.g. addition of the neighbour relation and/or antenna tilting. For more information on the GPEH tool refer to the relevant documentation in the ALEX RNC and OSS-RC libraries. Because missing neighbour drops are relatively easy to solve, it is recommended to optimise the neighbour relations and antenna configuration until the drops due to missing neighbour relations are less than 10% of the total number of drops in each RNC.

    Example: FACTS Report showing a high DCR Speech on cell 1379C1 with the majority of dropped calls due to missing neighbours as shown by the counter pmNoSysRelSpeechNeighbr (1st plot). A GPEH trace with event INTERNAL_SOHO_DS_MISSING_NEIGHBOUR was executed on 4 Jan where it was found that SC 24 and SC 88 were the major cause of these missing neighbour drops (2nd plot). The cells in the area with these scrambling codes were found to be 10C1 and 416B1 (3rd plot). With the addition of these two neighbours to 1379C1 on 4 Jan the improvement in the DCR Speech from around 5% to around 2% is clearly visible in the FACTS Report.
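The release condition and the 10% target described in point 22 can be expressed as two small checks. This is a sketch only: the function names are hypothetical, the Ec/No values are in dB, and treating a difference exactly equal to releaseConnOffset as a release is an assumption.

```python
def missing_neighbour_release(ecno_unlisted_sc_db: float, ecno_best_cell_db: float,
                              release_conn_offset_db: float = 12.0) -> bool:
    """True when the unlisted SC is at least releaseConnOffset above the best AS cell."""
    return ecno_unlisted_sc_db - ecno_best_cell_db >= release_conn_offset_db


def missing_neighbour_target_met(neighbour_drops: int, total_drops: int,
                                 target_share: float = 0.10) -> bool:
    """True when missing neighbour drops are below the target share of all drops."""
    if total_drops == 0:
        return True
    return neighbour_drops / total_drops < target_share


if __name__ == "__main__":
    # Hypothetical Measurement Report: unlisted SC at -4 dB, best AS cell at -17 dB.
    print(missing_neighbour_release(-4.0, -17.0))
    # Hypothetical RNC totals: 180 missing neighbour drops out of 1200 drops.
    print(missing_neighbour_target_met(180, 1200))
```

Measurement Reports that fall short of the offset do not drop the call, but they still appear in the GPEH event and are equally useful for finding missing neighbours.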