
IRT Troubleshooting 030408


Sponsored by Verisign

CDMA Troubleshooting

Ft. Lauderdale, FL, March 4, 2008

www.cdg.org

Ft Lauderdale, March ’08. Sponsored by Verisign

Agenda

9:10 Voice Trouble Resolution David Weixelman, Network Engineer, Sprint

9:55 Q&A

10:05 SMS Trouble Resolution Daniel Salek, Staff Engineer, Qualcomm

10:40 Q&A

10:50 Break

11:05 Packet Data

Nars Haran, US Cellular; Bryan Cook, Senior Staff Engineer, Qualcomm

11:50 Q&A


Contributions

• Many thanks to the following for their contributions to the materials:

– Bryan Cook, Qualcomm

– Nars Haran, US Cellular

– Jeff Kraus, US Cellular

– Devora Pippenger, Syniverse

– Daniel Salek, Qualcomm


CDMA Voice Troubleshooting

Ft. Lauderdale, FL, March 4, 2008


Opening Remarks

• This presentation will attempt to take a holistic view of the trouble ticketing process with Document 87 as the centerpiece.

• I will speak about processes leading up to the main points of information in Document 87 as well as discuss information within the document and address processes after a ticket is resolved.

• This will hopefully provide you a template for improving your customer service and employee development in the world of roaming.


Outline / Agenda

1. Organizational Preparation

2. Ticket Methodology

3. Entrance criteria

4. Tools

5. Object lesson: Checklist

6. Investigation Results Report

7. Work Load And Root Cause Analysis

8. Wrap Up


#1 Organizational Preparation

• Know your roaming network configuration as well as your own network configuration.

• Billing system access and basic navigational skills to access the customer’s account. This helps in validating any discrepancies between the billing system and the HLR.

• Network access to STPs, HLRs, MSCs, SMSCs and any other mission-critical applications needed for roaming troubleshooting, such as the trouble ticket system.

• IS-41 fundamental call flow and standards knowledge for analysis of call traces. Pictures, pictures, pictures: if you can’t draw it, you don’t know it as well as you should.


#1 Organizational Preparation

[Figure: Understanding your network configuration. Interconnection elements shown: Verisign, Syniverse, direct links with Carrier A, and direct links with Carrier B.]


#1 Organizational Preparation

• Detail-oriented people are usually best suited for this type of job.

• People who can empathize with the customer’s situation and go the extra mile for resolution are your key personnel. Employees who take personal ownership of the situation are well suited for troubleshooting.

• If you have cross-functional teams (Customer Care, Tier 2, Tier 3, etc.) handling roaming tickets, make sure all teams agree on best practices for trouble ticket resolution.

• Define where each team’s roles and responsibilities start and stop. This is usually best done through Service Level Agreements between the groups.


#2 Ticket Methodology

• Every ticket logged is an opportunity to evaluate and learn something at each of the following levels:

– At the customer level

• Carriers get a firsthand look at what customers are telling them about their service

– At the troubleshooting level (Customer Care, Tier 2 and Tier 3 levels)

• Are they prompted to ask the correct clarifying questions?

• Are they following the established processes?

• Do they have the proper tools at each level to correct the issues?


#2 Ticket Methodology

– At the implementation level

• Did something get overlooked during the implementation process?

• Does the implementation process need to be modified to accommodate new service enhancements?

– At the roaming partner level

• Are there particular areas having the same, repeated issues with the same roaming partner?

– At the device level

• Are there particular devices having certain issues?

• Are they newly launched devices?


#2 Ticket Methodology

• Evaluating these questions can identify and drive inefficiencies out of all levels of the troubleshooting process, and possibly out of other internal organizations, especially on the device front.

• At many carriers, roaming is an afterthought. If device testing is not thoroughly completed from both a network and a roaming perspective, carriers effectively put roaming ‘testing’ into the hands of their subscribers. That does not make for a positive roaming experience, especially for subscribers roaming internationally halfway around the world.

• Of course, a balance must be struck between time to launch and testing. This is easier said than done.


#2 Ticket Methodology

• What ticket metrics are important for determining quality work throughout the troubleshooting process?

– Is the process focused only on how fast tickets are closed?

• Does fast closure alone equal quality customer service?

– Or do the respective management teams perform post-mortem ticket analysis to gauge their teams’ strengths and weaknesses?

• Knowledgeable, skilled employees + consistent performance through defined best practices + ticket analysis = quality customer service
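To make "speed alone is not quality" concrete, here is a minimal Python sketch that reports closure time alongside two quality signals. The Ticket fields (hours_to_close, reopened, root_cause_recorded) are hypothetical illustrations. They are not fields defined by Document 87, and the sample tickets are made up.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Ticket:
    ticket_id: str
    hours_to_close: float       # elapsed time from logging to closure
    reopened: bool              # customer came back with the same problem
    root_cause_recorded: bool   # closure notes identify a root cause

def ticket_metrics(tickets: list) -> dict:
    """Summarize speed and quality together instead of speed alone."""
    if not tickets:
        raise ValueError("no tickets to summarize")
    n = len(tickets)
    return {
        "mean_hours_to_close": mean(t.hours_to_close for t in tickets),
        "reopen_rate": sum(t.reopened for t in tickets) / n,
        "root_cause_rate": sum(t.root_cause_recorded for t in tickets) / n,
    }

# Illustrative tickets only: one fast but reopened, one slow but root-caused.
tickets = [
    Ticket("T1", 2.0, reopened=True, root_cause_recorded=False),
    Ticket("T2", 8.0, reopened=False, root_cause_recorded=True),
]
print(ticket_metrics(tickets))  # mean 5.0 hours, reopen rate 0.5, root-cause rate 0.5
```

A team that only watches mean_hours_to_close would rate T1 as the better outcome. The reopen and root-cause rates show the opposite.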


#2 Ticket Methodology

[Figure: Knowledge, speed to closure and quality. Labels: Quality; Knowledge (small amount to large amount); Speed To Closure (slow to fast). Three cases are marked: great knowledge, great closing speed, great quality; poor knowledge, great closing speed, poor quality; great knowledge, slow closing speed, good quality.]


#2 Ticket Methodology

• Standardize what technical information to capture. This means establishing entrance and exit criteria (Document 87).

• Identify and document common problems and solutions, and use them to create a troubleshooting checklist.

• Establish a systematic methodology for trouble resolution.

– Once personnel are comfortable with a methodology, they can ‘freelance’ to match their individual skill sets and talents, provided the resolution and quality metrics are met. The key point is that resolution and quality must be monitored.


#3 Entrance Criteria

– MDN, MIN/IRM

– ESN/MEID/UIMID

– Detected date and time

– Roaming MSCID

– Problem description

– Location (City/State or City/Country)

– Duration of stay and alternate contact information

– Problem carrier’s contact information
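One way to enforce these entrance criteria is to have the ticket system refuse to escalate until every field is filled. The Python sketch below models the list above as a dataclass. The field names and the sample MDN are hypothetical, and the sketch does not reproduce any real ticketing system's schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class EntranceCriteria:
    """Technical information captured when a roaming ticket is opened."""
    mdn: Optional[str] = None
    min_or_irm: Optional[str] = None
    esn_meid_uimid: Optional[str] = None
    detected_datetime: Optional[str] = None
    roaming_mscid: Optional[str] = None
    problem_description: Optional[str] = None
    location: Optional[str] = None              # City/State or City/Country
    stay_and_alternate_contact: Optional[str] = None
    problem_carrier_contact: Optional[str] = None

def missing_fields(ticket: EntranceCriteria) -> list:
    """Return the names of criteria that are still empty."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name)]

# Hypothetical partially completed ticket.
t = EntranceCriteria(mdn="3055550100", problem_description="Can't originate")
print(len(missing_fields(t)))  # 7 criteria still missing
```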


#4 System Tools

– MSC access

– HLR access to validate customer profiles

– SMSC access

– STP access

– SS7 messaging analyzer (Access7)

– RSP messaging analyzer

– Troubleshooting ticketing system

– Billing system


#5 Object Lesson: Checklist

[Figure: Call termination and registration message flow (Registration = B, Call Termination = A). Elements: incoming call, Home MSC, Home HLR, Home STP, SS7 providers and direct connects, STPs, Serving MSC, Home Voice Mail.]

Call termination (A) messages:

• A1: LOC REQ

• A2: Routing Request

• A3: Route Request Return Result (TLDN#)

• A4: LOC REQ RET

• A5: Standard route through the PSTN

• A6: No Answer & Page TMO/Redirect Request

• A7, A8: XFER NUM

• A9: Redirect Request Response

• A10: ROUTE

Registration (B) messages: Reg Not; Reg Not Return Result (VLR Info)
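When you read a trace against this figure, a useful first step is to find the earliest call termination message that never appears. The Python sketch below does this for the normal path A1 to A5 only and leaves out the no-answer/redirect branch (A6 to A10). It assumes the trace has already been reduced to the figure's labels, which is a hypothetical format: real analyzers report IS-41 operation names, not A1/A2.

```python
from typing import Optional

# Call termination (A) messages in the order shown on the checklist figure.
# Only the normal path A1-A5 is modeled; the no-answer branch is omitted.
TERMINATION_STEPS = [
    ("A1", "LOC REQ"),
    ("A2", "Routing Request"),
    ("A3", "Route Request Return Result (TLDN)"),
    ("A4", "LOC REQ RET"),
    ("A5", "Route through the PSTN"),
]

def first_missing_step(observed: set) -> Optional[str]:
    """Return the first expected step absent from a trace, or None if all are present."""
    for label, name in TERMINATION_STEPS:
        if label not in observed:
            return f"{label} {name}"
    return None

# Hypothetical trace: the Routing Request went out but no TLDN came back.
print(first_missing_step({"A1", "A2"}))  # A3 Route Request Return Result (TLDN)
```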


#5 Object Lesson: Checklist

Type of Issue: Call origination: can’t originate calls because of ‘manual roaming’ / credit card prompt

Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events

Possible Causes:

• Handset has ‘locked’ on a network where no automatic roaming agreement is implemented

• The MBI/IRM block to which the phone belongs is not loaded in the roaming partner’s serving MSC

• The MBI/IRM block is not pointed to the correct HLR point code

• SS7 network capacity issues causing registration failure (example: Super Bowl or other large public events)

• The MSCID of the roaming partner’s serving MSC is not loaded in the home HLR


#5 Object Lesson: Checklist

Type of Issue: Call origination: fast busy / call failed

Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events

Possible Causes:

• Poor signal strength / too weak to connect to the cell site

• Network capacity / cell site in use is at capacity

• Handset / phone equipment transmitter failure or error

• PRL cycle has not yet acquired an available network


#5 Object Lesson: Checklist

Type of Issue: Call origination: fast busy / call failed

Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events

Possible Causes:

• Similar to ‘fast busy’, a channel could not be found due to network capacity constraints

• If the mobile is not stationary and the network attempts to hand off to a cell site in which all channels are being used by other callers, the call will drop

• Signal strength is lower than the minimum needed to maintain the call

• Network outage and/or maintenance


#5 Object Lesson: Checklist

Type of Issue: Call termination: caller receives the recording ‘The number you have dialed is incorrect. Please check the number and dial again’

Focus Areas: Customer education; Serving MSC; Home MSC

Possible Causes:

• The caller did not key the required number of digits correctly

• Validate that the roaming partner is sending the correct TLDN digits. If the Digits Identifier is labeled as International, neither the roaming partner nor its RSP should send 011 to the home carrier: per the standard, the TLDN in a Routing Request Response should not be prefixed with 011

• On a Lucent MSC, validate that ‘Apply Dialing Prefix For International TLDN’ is turned on in the switch
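The 011 rule above can be checked mechanically when you review a Routing Request Response. The Python sketch below assumes the Digits Identifier's international/national marking has already been decoded into a simple string (digits_identifier). That string is a hypothetical simplification of the real ANSI-41 encoding, and the sample digits are made up.

```python
INTERNATIONAL_PREFIX = "011"  # NANP international access code

def check_tldn(digits: str, digits_identifier: str) -> list:
    """Flag TLDN formatting problems from the 'number dialed is incorrect' checklist."""
    problems = []
    if not digits.isdigit():
        problems.append("TLDN contains non-digit characters")
    if digits_identifier == "international" and digits.startswith(INTERNATIONAL_PREFIX):
        problems.append("international TLDN should not be prefixed with 011")
    return problems

# Illustrative digits only.
print(check_tldn("0115491100000000", "international"))  # flags the 011 prefix
print(check_tldn("5491100000000", "international"))     # []
```

The home MSC adds the dialing prefix itself (the Lucent setting above). A TLDN that already carries 011 would therefore end up with a doubled prefix.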


#5 Object Lesson: Checklist

Type of Issue: Call termination: incoming calls go directly to voice mail without any ‘rings’ while the handset is roaming

Focus Areas: Home HLR; Home MSC; Customer education; Coverage

Possible Causes:

• The home carrier’s HLR point codes are not loaded in the roaming partner’s network

• The applicable home MSC point code is not loaded in the roaming partner’s network

• The HLR has not registered the handset on a roaming network (for various reasons) and has deleted the last registered location. The HLR therefore cannot tell the home MSC where to direct the call, so the traffic switch routes the call by default to the voice mail platform without paging any network

• Signal strength is lower than the minimum needed to ‘page’ the mobile on the roaming network, so the call is redirected to the home traffic switch and terminates on the voice mail platform
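The checklist slides can also live in the ticket tool as data, so an agent can look up focus areas from the customer's own words. The Python sketch below transcribes the focus areas from the slides. The keyword phrases and the naive substring matching are illustrative assumptions: a phrase such as "voice mail" will also match unrelated voice mail complaints.

```python
# Focus areas per symptom phrase, transcribed from the checklist slides.
CHECKLIST = {
    "manual roaming": ["Serving MSC", "Home HLR", "MBI/IRM block", "Special events"],
    "credit card prompt": ["Serving MSC", "Home HLR", "MBI/IRM block", "Special events"],
    "fast busy": ["Serving MSC", "Home HLR", "MBI/IRM block", "Special events"],
    "number you have dialed is incorrect": ["Customer education", "Serving MSC", "Home MSC"],
    "voice mail": ["Home HLR", "Home MSC", "Customer education", "Coverage"],
}

def focus_areas(symptom: str) -> list:
    """Collect focus areas for every checklist phrase found in the symptom text."""
    text = symptom.lower()
    areas = []
    for phrase, phrase_areas in CHECKLIST.items():
        if phrase in text:
            for area in phrase_areas:
                if area not in areas:  # keep first-seen order, no duplicates
                    areas.append(area)
    return areas

print(focus_areas("Incoming calls go straight to voice mail, no rings"))
```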


#6 Investigation Results Report

• The Investigation Results Report (IRR) provides the ticket analyst(s) with information on:

– What was done to resolve the issue

– The date and time it was resolved

– The root cause of the problem

– Any action items that need to be taken

• If this information is consistently captured and analyzed, it can be highly useful information to apply toward root cause analysis

• There is no reason to have a results report if it is not utilized further in the wider scope of root cause analysis. It essentially becomes reporting for the sake of reporting and increases inefficiency in the troubleshooting process.


#7 Work Load And Root Cause Analysis

• Trends should be analyzed from the entrance criteria data received:

– Trend tickets on volume of roaming tickets

– Handset types

– Categorized issues

– Root cause and other information

• Each problem category should have a troubleshooting checklist associated with it


#7 Work Load And Root Cause Analysis

Case Report | MDN | Device | City | ST/Country | Problem
17406207-080209 |  | RIM Blackberry 8830 | Buenos Aires | Argentina | Searching for service
17415876-080212 |  | RIM Blackberry 8830 | Adelaide | Australia | Can't originate/terminate
17335469-080117 |  | Handspring Treo 700W | Georgetown | Cayman Islands | Can't originate/terminate
17354258-080124 |  | Samsung SPH-A900M | Bogota | Colombia | Can't terminate
17362912-080127 |  | Handspring Treo 650 | Guayaquil | Ecuador | Can't originate
17364974-080128 |  | Handspring Treo 700W | Bangalore | India | Can't originate
17415744-080212 |  | RIM Blackberry 8830 | Bengal | Jamaica | Can't originate/terminate
17314585-080110 |  | Handspring Treo 650 | Acapulco | Mexico | Can't originate/terminate due to auth issue
17386313-080202 |  | Motorola Q | Bangkok | Thailand | Can't originate/terminate
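The trending step can start from something as simple as counting the case table above by device and by problem. The Python sketch below transcribes the table's device, country and problem columns; nothing else about these cases is assumed.

```python
from collections import Counter

# (device, country, problem) rows transcribed from the case table above.
CASES = [
    ("RIM Blackberry 8830", "Argentina", "Searching for service"),
    ("RIM Blackberry 8830", "Australia", "Can't originate/terminate"),
    ("Handspring Treo 700W", "Cayman Islands", "Can't originate/terminate"),
    ("Samsung SPH-A900M", "Colombia", "Can't terminate"),
    ("Handspring Treo 650", "Ecuador", "Can't originate"),
    ("Handspring Treo 700W", "India", "Can't originate"),
    ("RIM Blackberry 8830", "Jamaica", "Can't originate/terminate"),
    ("Handspring Treo 650", "Mexico", "Can't originate/terminate due to auth issue"),
    ("Motorola Q", "Thailand", "Can't originate/terminate"),
]

by_device = Counter(device for device, _country, _problem in CASES)
by_problem = Counter(problem for _device, _country, problem in CASES)

print(by_device.most_common(1))                  # [('RIM Blackberry 8830', 3)]
print(by_problem["Can't originate/terminate"])   # 4
```

Applied to a full month of tickets, the same counts show whether one device or one category is driving the roaming ticket volume.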


#7 Work Load And Root Cause Analysis

• Categories For CDMA Voice Issues

– Generally speaking, most voice issues will fall within one of the categories below

• Can’t originate

• Can’t originate. Fast busy

• Can’t originate to a specific number

• Can’t originate international calls

• Can’t originate or terminate

• Can’t originate or terminate. Welcome to…carrier’s name.

• Can’t retrieve voice mails

• Can’t deposit a voice mail

• Can’t terminate

• Can’t terminate from specific numbers

• Coverage complaints, dropped calls etc.

• Not a roaming issue

• Searching for service


#8 Wrap Up

[Flowchart: Resolution Process and Process Improvement]

Resolution Process: a roaming customer’s ticket moves from Customer Care to Tier 2 to Tier 3. At each level, a resolved ticket ends (Yes); an unresolved one is escalated (No).

• Customer Care: logs ticket; billing/account verification; clarify the issue if need be; categorize issue (where? what issue?)

• Tier 2: system checks of MBI/network elements; capture/analyze RSP and/or live traces; possibly test with customer or roaming partner

• Tier 3: further system checks in the network; analyze live traces received from Tier 2; possibly test with customer or roaming partner

• No resolution: new problem

Process Improvement: Investigation Results Report → root cause analysis → trending → lessons learned → updated training/tools/knowledge
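The escalation path in the flowchart can be sketched as an ordered list of tiers, each of which either resolves the ticket or passes it on. In the Python sketch below, the resolver functions are hypothetical stand-ins for each team's real checks. The history list is an assumed way to capture the trail that the Investigation Results Report needs.

```python
from typing import Callable, Optional

def run_resolution_process(ticket: dict,
                           tiers: list[tuple[str, Callable[[dict], bool]]]) -> Optional[str]:
    """Escalate a ticket through (name, resolver) tiers in order.

    Returns the resolving tier, or None when no tier resolves it
    (a new problem that should feed root cause analysis).
    """
    for name, resolves in tiers:
        ticket.setdefault("history", []).append(name)  # trail for the IRR
        if resolves(ticket):
            return name
    return None

# Hypothetical resolvers keyed on a made-up issue field.
tiers = [
    ("Customer Care", lambda t: t["issue"] == "billing"),
    ("Tier 2", lambda t: t["issue"] == "MBI datafill"),
    ("Tier 3", lambda t: False),
]
ticket = {"issue": "MBI datafill"}
print(run_resolution_process(ticket, tiers), ticket["history"])
# Tier 2 ['Customer Care', 'Tier 2']
```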


#8 Wrap Up

[Chart: Knowledge, Resolution Time and Tickets plotted over Month 1 to Month 4, on a 0 to 25 scale.]


#8 Wrap Up

• Questions

• Action Items


SMS Roaming Troubleshooting

Ft Lauderdale, March 2008


Contents

• Assumptions

• Background

• Reference documentation/Tools

• Possible Problems

• Troubleshooting Process


Assumptions

• Voice roaming working

– System Determination

– Registration

– ANSI-41 authorization

• Focus on SMS-specific issues

• Assume element/link failures are alarmed

– Focus here on subscriber-reported issues

• Not addressing billing issues

– In general, assume billing records are produced at the MC

• Post-implementation issues

– Assume initial testing completed


Background – Roaming Architecture

• ANSI-41 network elements involved in SMS Roaming:

– Message Center (MC) – aka Short Message Service Center (SMSC). Store and forward function for messages. End-point for SMS communication with a Mobile Station (MS)

– Mobile Switching Center (MSC) – Includes (for convenience) the VLR and Base Station. ANSI-41 to IS-2000 interface, and relay point for SMS messages

– Home Location Register (HLR) – Stores subscriber location and profile information. Doesn’t see actual SMS message contents

– Roaming Service Provider (RSP) – Usually present in CDMA-CDMA roaming today. Provides signaling connectivity and ANSI-41 translation. Looks like an MSC/VLR to the home network, and an HLR/MC to the serving network.


Background – Message Flows (1 of 3)

• Mobile-Terminated (MT):

[Figure: MT message flow among MC, HLR, MSC and MS, steps 1 to 5.]

1. Message arrives at MC, addressed to MS

2. MC queries HLR for MS location – SMS Request (SMSREQ) message

3. HLR checks that the subscriber is authorized and returns the address (SMS_Address from registration time)

4. MC sends message to MSC using the address received in the previous step – SMS Delivery Point To Point (SMDPP) message

5. MSC delivers message to MS over the air


Background – Message Flows (2 of 3)

• Mobile-Terminated with delayed delivery (MT):

[Figure: MT flow with delayed delivery among MC, HLR, MSC and MS, steps 1 to 9.]

1 - 4. As per previous slide

5. MS goes into a coverage hole and message delivery fails. MSC sets the “SMS Delivery Pending” flag for the MS

6. Some time later, MS returns to coverage and makes a system access

7. System access plus the pending flag trigger the MSC to advise the MC that the MS is available – SMSNotification (SMSNOT) message

8. MC resends SMDPP

9. Message is delivered successfully to MS

Other notification scenarios are possible – if the HLR knows that the subscriber is unavailable, it will issue the SMSNOT instead of the MSC
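The two MT flows above (immediate delivery, and delayed delivery through the pending flag and SMSNOT) can be modeled as a small simulation. The Python sketch below is a toy model built on several assumptions: a dictionary stands in for the HLR's SMSREQ answer, MS identities are plain strings, and return values such as "postponed" and "rejected" are simplified labels, not ANSI-41 encodings. It leaves out the HLR-issued SMSNOT variant.

```python
from dataclasses import dataclass, field

@dataclass
class ServingMSC:
    in_coverage: set = field(default_factory=set)       # MSs the MSC can page
    delivery_pending: set = field(default_factory=set)  # "SMS Delivery Pending" flags

    def smdpp(self, ms: str, text: str) -> str:
        """Try over-the-air delivery; on failure set the pending flag (step 5)."""
        if ms in self.in_coverage:
            return "delivered"
        self.delivery_pending.add(ms)
        return "postponed"

    def system_access(self, ms: str):
        """MS returns to coverage (step 6); emit SMSNOT if a delivery was pending (step 7)."""
        self.in_coverage.add(ms)
        if ms in self.delivery_pending:
            self.delivery_pending.discard(ms)
            return ("SMSNOT", ms)
        return None

class MessageCenter:
    def __init__(self, hlr: dict):
        self.hlr = hlr    # SMSREQ stand-in: authorized MS -> serving MSC (its SMS_Address)
        self.queue = {}   # messages waiting for an SMSNOT, per MS

    def _deliver(self, ms: str, text: str) -> str:
        result = self.hlr[ms].smdpp(ms, text)            # SMDPP to the serving MSC
        if result != "delivered":
            self.queue.setdefault(ms, []).append(text)   # keep for a later retry
        return result

    def submit(self, ms: str, text: str) -> str:
        if ms not in self.hlr:                           # HLR does not authorize SMS
            return "rejected"
        return self._deliver(ms, text)

    def on_smsnot(self, ms: str) -> list:
        """SMSNOT received: resend every queued message (step 8)."""
        return [self._deliver(ms, text) for text in self.queue.pop(ms, [])]

msc = ServingMSC()
mc = MessageCenter(hlr={"MS1": msc})
print(mc.submit("MS1", "hello"))   # postponed: MS1 is in a coverage hole
print(msc.system_access("MS1"))    # ('SMSNOT', 'MS1')
print(mc.on_smsnot("MS1"))         # ['delivered']
```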


Background – Message Flows (3 of 3)

• Mobile-Originated (MO) – Indirect Routing

• Indirect routing means that the message is routed through the originator’s MC:

[Figure: MO flow among MS, MSC and MC, steps 1 to 3.]

1. The MS originates a short message

2. The MSC sends the message to the MC for this MS (SMDPP)

3. The MC analyzes the destination address and routes the message on. If the destination is an MS that belongs to another MC, the message will be sent to that MC
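Step 3 is a routing decision at the originator's MC. The Python sketch below illustrates it with longest-prefix matching on the destination digits. The subscriber set, the prefix table and the MC names are hypothetical: a production MC may route on number portability data or other criteria instead.

```python
def route_mo_message(destination: str, home_subscribers: set, mc_by_prefix: dict) -> str:
    """Next hop chosen by the originator's MC for an MO short message (step 3).

    home_subscribers and mc_by_prefix are hypothetical stand-ins for the
    MC's own subscriber data and its routing table toward other MCs.
    """
    if destination in home_subscribers:
        return "local delivery"
    # Longest-prefix match, so that a more specific range wins.
    for prefix in sorted(mc_by_prefix, key=len, reverse=True):
        if destination.startswith(prefix):
            return f"forward to {mc_by_prefix[prefix]}"
    return "no route"

routes = {"305": "MC-A", "30555501": "MC-B"}
print(route_mo_message("3055550199", {"3055550100"}, routes))  # forward to MC-B
```

A "no route" result for a valid destination is the MO counterpart of the datafill problems listed under Possible Problems.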


Reference Documentation

• There are several sources of information you can turn to when faced with a problem:

– General reference

• ANSI-41 standard

• ANSI-41 textbook

• SMPP standard

– Roaming-specific

• SMS Roaming Reference Document

– Carrier-specific

• TDS

• SMS Roaming Partner Qualification Form

• SMS Test Plan Results


Tools

• Tools available to assist in troubleshooting:

– HLR

• O/b subscriber profile, registration status

– MC

• O/b message queue, maybe subscriber profile

• Billing records

– MSC/VLR

• I/b subscriber profile/status, SMSDPF?

• No billing records produced

– Protocol Analyzer

• Real time, may be swamped by roaming traffic

– RSP Trace

• See message delivery attempts, longer storage


Possible Problems (1 of 5)

• Some potential areas where problems can arise:

• Subscriber Provisioning

– Home vs Roaming

• Some HLRs define separate SMSTERMREST and SMSORIGREST values to be sent to MSCs designated as “roaming”.

– Unusual values

• The most common values for these parameters are 0 (Block all) and 3 (Allow all). Other values might be handled poorly…

– Service Option

• Specific service options are defined for SMS (6 & 14). Usually, however, these aren’t required to be present in the CDMASOL.
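A provisioning review can flag the two risks above: roaming values that differ from home values, and values other than the common 0 and 3. The Python sketch below treats the HLR profile as a plain dictionary, which is an assumption about how the data was exported. It deliberately does not interpret any value other than 0 and 3.

```python
# Values named on the slide; the meaning of any other value is not assumed here.
COMMON_VALUES = {0: "block all", 3: "allow all"}
PARAMS = ("SMSORIGREST", "SMSTERMREST")

def review_restrictions(profile: dict) -> list:
    """Flag missing, blocking, or unusual SMS restriction values in one profile."""
    notes = []
    for param in PARAMS:
        value = profile.get(param)
        if value is None:
            notes.append(f"{param} missing")
        elif value not in COMMON_VALUES:
            notes.append(f"{param}={value}: unusual value, confirm the serving MSC handles it")
        elif value == 0:
            notes.append(f"{param}=0: SMS blocked")
    return notes

def compare_home_and_roaming(home: dict, roaming: dict) -> list:
    """Show where the HLR's 'roaming' values differ from the home values."""
    return [f"{p}: home={home.get(p)} roaming={roaming.get(p)}"
            for p in PARAMS if home.get(p) != roaming.get(p)]

print(review_restrictions({"SMSORIGREST": 3, "SMSTERMREST": 2}))
print(compare_home_and_roaming({"SMSORIGREST": 3, "SMSTERMREST": 3},
                               {"SMSORIGREST": 3, "SMSTERMREST": 0}))
```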


Possible Problems (2 of 5)

• MSC Datafill

– SMSADDR Population

• MSC’s PC/GT in application layer

• ITU vs ANSI encoding can be tricky

• This value is usually overwritten by the RSP

– MC address

• Required for MO-SMS

• Associated by MIN range or HLR

• For roamers, typically the same as the HLR address – i.e. the RSP


Possible Problems (3 of 5)

• RSP Datafill

– SMSADDR Population

• Overwrite with their own address

– MC address

• Required for MO-SMS

• Info supplied by home operator

• MC defined as valid sender for MT-SMS

– Addressing

• Map serving-network-supplied addresses to home-required values – e.g. MDN in SMS_OOA

• HLR Datafill

– SMSADDR (Again)

• Some HLRs statically define the SMSADDR against the MSCID


Possible Problems (4 of 5)

• User Error

– Wrong dialplan

• Entering the destination address in the format used in the visited country

• Entering a short code that is only valid for the visited network’s subscribers

• Message “Jamming”

– Subscriber not able to receive any messages

• Can occur when an overlength message arrives – delivery fails, but the message remains at the front of the queue in the MC and is attempted again before any new incoming message

• Commercial Issues

– SMS Roaming not yet implemented in a particular market

• Customers often expect/assume SMS to be present wherever voice roaming is available
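Message jamming is a head-of-line blocking problem. The Python sketch below reproduces it with a strict first-in, first-out queue. The 160-character limit is an illustrative assumption: the real limit depends on the encoding and on what the delivery path can carry.

```python
from collections import deque

MAX_DELIVERABLE_LEN = 160  # hypothetical limit of the delivery path

def deliver_in_order(queue: deque, max_len: int = MAX_DELIVERABLE_LEN) -> list:
    """Strict head-of-queue delivery: a failed message is retried before anything newer."""
    delivered = []
    while queue:
        if len(queue[0]) > max_len:
            break  # overlength head fails and stays put, so the queue is 'jammed'
        delivered.append(queue.popleft())
    return delivered

inbox = deque(["x" * 200, "Meeting moved to 3pm", "Call me"])
print(deliver_in_order(inbox))  # []: nothing gets past the overlength message
inbox.popleft()                 # operator clears the jammed message at the MC
print(deliver_in_order(inbox))  # ['Meeting moved to 3pm', 'Call me']
```

From the subscriber's side, this looks like "I can't receive any messages," even though every message except the first one is valid.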


Possible Problems (5 of 5)

• Intermittent / Performance Issues

– Hardest to troubleshoot

• Often reported after subscriber returns home

• Roaming cases may actually provide more information – access to trace information after the fact via RSP

– Examples:

• “I never received an important message, but I received other messages”

• “I was powered on in good coverage for hours before my messages arrived”

– Trending/aggregation may be important to decide if a bigger problem exists


Troubleshooting Process

• General stages will be equivalent to other roaming services

• Specific details will vary for SMS within the stages:

– Clarifying the issue

– Confirming expected behavior

– Investigation

– Resolution actions

– Feedback/lessons


Clarifying the Issue

• Eliminate wider roaming issues

– Phone shows signal strength

– Make/receive voice calls

• SMS specific

– MO, MT or both affected?

– Exact destination address for MO issues

– Length of attempted message

• Impact

– User-, MSC-, HLR-, MC-, Application-, Operator-wide?

– Works at home?

• Time

– Used to work/never worked/past fault

Exchange troubleshooting information as specified by IRT


Confirming Expected Behavior

• Is SMS supposed to work for this market?

– Does the troubleshooting team have access to an up-to-date list of markets where MO/MT SMS is expected?

• Reference Check

– Test Results/TDS/RPQF

– Historical troubleshooting information

– Is this a new issue?


Investigation

• Checklist

– Subscriber authorized for SMS at HLR & VLR

– Check RSP tool for delivery attempts

• If not present, the message may not be reaching the RSP (datafill error, link/element outage) or may not be reaching the RSP application layer (overlength)

• If present, check the response. “Postponed” is the only SMSCAUSE value that indicates a notification is pending

– Check MC logs/queue

• Retest

– Recreate issue if possible

– Capture complete logs with protocol analyzer or MC/MSC tool

– MC retry schedule may mask SMSNOT functioning
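The RSP-trace branch of this checklist can be written as a small decision function. In the Python sketch below, the trace is assumed to be a list of dictionaries with "result" and "smscause" keys, and "Postponed" is a simplified label. This is a hypothetical export format, not any RSP's actual output.

```python
def interpret_rsp_attempts(attempts: list) -> str:
    """Apply the SMS investigation checklist to delivery attempts seen in an RSP trace.

    Each attempt is a hypothetical dict, e.g. {"result": "error", "smscause": "Postponed"}.
    """
    if not attempts:
        return ("no attempts at RSP: check datafill and link/element outages, "
                "or an overlength message not reaching the RSP application layer")
    last = attempts[-1]
    if last.get("result") == "ok":
        return "delivery acknowledged"
    if last.get("smscause") == "Postponed":
        return "Postponed: a notification (SMSNOT) should be pending"
    return (f"failed with SMSCAUSE={last.get('smscause')}: no notification pending, "
            "check MC logs/queue")

print(interpret_rsp_attempts([{"result": "error", "smscause": "Postponed"}]))
```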


Resolution Actions

• Datafill errors: fix per operational policy – e.g. maintenance window only

• Provisioning errors: fix

• Subscriber “reset” actions

– E.g. power cycle, VLR clear at RSP

– May fix an unexplained problem

– May prevent the problem from ever being explained

– Balance between short- and long-term benefit to subscriber base

• Capability Gaps

– Escalate per company procedures


Feedback

• How to ensure knowledge gained during the troubleshooting process is captured and available in the future?

– Knowledgebase

– Training

– Vendor follow-up

– Statistical analysis


Thank you


Packet Data Roaming Troubleshooting

Ft Lauderdale, March 2008


Packet Data Roaming

• For the purposes of this module, “data roaming” implies:

– A subscriber accessing data services in a foreign network

– 1xRTT and/or EV-DO used to access data services

– Voice roaming is also functioning


Roaming IP Access with Mobile IP

• IP address assigned by home agent (HA)

– Visited operator provides COA.

– Mobile IP tunnel created between visited PDSN/FA and HA.

• Public Internet access tunnels back to home network

• Access to home network servers without NAT

[Figure elements: Internet/CRX; home operator (AAA, RAN, PCF, PDSN); visited operator (AAA, RAN, PCF, PDSN/FA); HA; COA; application server (10.23.45.13).]


Roaming IP Access with Simple IP

• Serving network assigns IP address to roamer

• NAT required if a private IP address is assigned

• Direct access to the public Internet

• VPN over public Internet to access home application servers

[Figure elements: Internet/CRX; home network (AAA, RAN, PCF, PDSN); serving network (AAA, RAN, PCF, PDSN); NAT; application server (10.23.45.13).]
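With Simple IP, whether NAT is involved depends on the address the serving network hands out. The Python sketch below uses the standard ipaddress module to make that check. Treating the RFC 6598 shared address space as "behind NAT" is an added assumption: Python's is_private does not cover that range, yet carriers commonly place it behind NAT.

```python
import ipaddress

# RFC 6598 shared address space; not covered by Python's is_private,
# but normally used behind carrier NAT.
SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")

def needs_nat(assigned_ip: str) -> bool:
    """True if a Simple IP address from the serving network is not publicly routable."""
    addr = ipaddress.ip_address(assigned_ip)
    return addr.is_private or addr in SHARED_ADDRESS_SPACE

print(needs_nat("10.23.45.13"))  # True: the address shown in the figure is private
```

If the answer is True and an application breaks only while roaming, NAT traversal (for example, inbound push traffic) is a candidate cause worth checking.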


Implementing Roaming with L2TP

• Home operator LNS assigns roaming MS its IP address.

• L2TP tunnel is created between visited PDSN/LAC and LNS.

• Must tunnel back to home network to access public Internet

• Access application servers in home network without NAT

[Figure elements: Internet/CRX; home operator (AAA, RAN, PCF, PDSN); visited operator (AAA, RAN, PCF, PDSN/FA); LNS; application server (10.23.45.13).]


Aspects of Data Roaming Troubleshooting

• Pre vs. Post commercial implementation (focus here is post)

• Functional vs. performance

– Functional troubleshooting (It doesn’t work!)

– Performance troubleshooting (It works, but not very well!)

• Billing for data roaming out of scope of this training module


Organizational Procedures

• Essentially same as described in voice troubleshooting

– Prepare organization (personnel, trouble ticket system, etc.)

– Standardize what technical information to capture

– Identify and document common problems and solutions

– Establish systematic methodology


Functional Troubleshooting


Troubleshooting Scenarios

• Subscribers reporting trouble vs. engineers troubleshooting a known issue with a device:

– Engineers have access to many more tools than subscribers

– Different methodologies are used in each case

• Device scenarios

– Handset only: Depends strongly on network logs for troubleshooting

– Handset with data cable and laptop: More tools available

– Data card (or tethered handset): Allows access to the greatest number of network tools, although handset applications are more difficult to test


Clarifying Questions (1/2)

• The first step in troubleshooting is providing a high-level clarification of the situation

• Important to all troubleshooting scenarios

– Does a data roaming implementation exist?

• Obviously, this should be “yes”; otherwise there is no roaming fault to troubleshoot

– Does handset/application function in home network?

• If “no”, then focus on issues in home network first


Clarifying Questions (2/2)

• Does voice roaming work in the foreign network?

– If “no”, then focus on voice roaming first

– System selection or HLR authentication related?

• Do any data applications work at all?

– If “yes”, then many potential issues are eliminated

• System selection, authentication, basic network connectivity

– Shift focus to the specific application

• Does data authentication obviously fail?

– If “yes”, then focus on the data authentication component


Troubleshooting Subscriber Reported Issue (1/3)

• Assume clarifying questions have been answered

• Assume subscriber can’t access device tools (e.g. tracert, Wireshark)

• Important for home operator to gather information about the subscriber’s device

• The required device information is currently being standardized in a CDG reference document

• It identifies the troubleshooting information operators should gather:

– MSID (IMSI, IRM)

– MEID/ESN

– MDN

– NAI

– IP Address

– Technology

– MIP, SIP

– Application

– tracert (if available, but requires data card and subscriber sophistication)


Troubleshooting Subscriber Reported Issue (2/3)

• Essentially dependent on infrastructure logs, as subscribers don’t have access to or knowledge of device tools

Methodology:

• Use a systematic approach and eliminate categories of issues

• System selection failure

– Look at subscriber’s PRL and roaming partner’s TDS

– Work with roaming partner to determine possible issues

• Authentication failure

– Review relevant H-AAA logs

– Look for clues on reason for failure (bad password?)

Troubleshooting Subscriber Reported Issue (3/3)

• Routing issues

– Check the home HA or LNS logs (did the session pass authentication, etc.?)

– Look for possible firewall, port blocking, and routing table issues

– Work with CRX and roaming partner engineers

• PPP issues

– Obtain the roaming subscriber’s A10/A11 logs if available (e.g. from RADCOM)

– Otherwise, PPP issues are very difficult to diagnose

Field Engineering Troubleshooting

• Implies an engineer is troubleshooting in the roaming market

• The engineer could be from the home or the visited market

• In either case, coordination between the home and visited operators is usually required

• More tools are available, along with a greater level of technical knowledge

Network Tools

Assumes a data card or tethered laptop:

Tool Name | Purpose
ipconfig | Provides TCP/IP information (e.g., IP address, adapters, gateways)
netstat | Displays current TCP/IP connections and protocol information
ping, hrping, pathping | Generate ICMP echo requests to diagnose routing, address resolution, latency, etc.
tracert, traceroute | List the hops to a server and the RTT to each hop
nslookup | Provides DNS and IP address information for a remote host
route | Views and modifies the local routing table
hostname | Provides the local computer’s NetBIOS hostname
telnet | Terminal emulator that allows terminal-mode sessions with a host
FTP, TFTP | Allow file transfers to and from a server over TCP (FTP) and UDP (TFTP)
WireShark/Ethereal | Allows packet sniffing, stream analysis, TCP traces, throughput calculation, etc.

Mobile IP Error Code Values

• Code Values for Mobile IP Registration Reply Messages

– 0-8 Success Codes

• 0 = registration accepted

– 9-63 No allocation guidelines currently exist

– 64-127 Error Codes from the Foreign Agent

• 67 = MN Failed Authentication

• 68 = HA Failed Authentication

– 128-192 Error Codes from the Home Agent

• 129 = Administratively prohibited

– 193-200 Error Codes from the Gateway Foreign Agent

– 201-255 No allocation guidelines currently exist

• The error code values can help explain why a Mobile IP registration failed.

• General MIP numbers found at: http://www.iana.org/assignments/mobileip-numbers
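The range boundaries above are enough to sort any reply code by its source (FA, HA, or GFA) before looking it up in the IANA registry. Here is a small Python sketch. It assumes the code has already been extracted from the Registration Reply, for example from a packet capture. The function names are hypothetical, and only the codes listed on the slide are spelled out.

```python
def classify_mip_reply_code(code: int) -> str:
    """Map a Mobile IP Registration Reply code to its allocation range."""
    if not 0 <= code <= 255:
        raise ValueError("the Code field is one octet (0-255)")
    if code <= 8:
        return "success"
    if code <= 63:
        return "unallocated"
    if code <= 127:
        return "error from foreign agent"
    if code <= 192:
        return "error from home agent"
    if code <= 200:
        return "error from gateway foreign agent"
    return "unallocated"


# Only the specific codes named on the slide; others need the IANA registry
KNOWN_CODES = {
    0: "registration accepted",
    67: "MN failed authentication",
    68: "HA failed authentication",
    129: "administratively prohibited",
}


def describe(code: int) -> str:
    """Combine the range classification with any known specific meaning."""
    detail = KNOWN_CODES.get(code, "see the IANA registry")
    return f"{code}: {classify_mip_reply_code(code)} ({detail})"
```

For example, a reply code in the 64-127 range points the investigation at the visited network's FA, while 128-192 points at the home HA.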

PPP Connection Failures

• When PPP connections fail unexpectedly, a few items can be verified

• Checklist:

– Verify the correct networking interface/modem is selected for the connection

– Verify RF conditions are sufficient for establishing a connection

– Verify no other interfaces have active TCP/IP bindings on the device

– View PDSN, AAA, and PPP logs (from device)

Application Connectivity Issues

• A variety of reasons may cause application connectivity issues:

– Firewalls

• IP address ranges or specific application traffic may be blocked

• Examples: ICMP, SSH, Instant Messenger, Peer-to-peer traffic

– Port blocking

• Port ranges an application needs may be closed for security reasons

– Server availability

• A server may not exist or may have been moved

• The server may have exceeded its maximum number of connections

– Routing table

• Routes to an application server may not exist in routing tables

Application Connectivity Issues

• A few things can be tried to mitigate application connectivity issues:

• Checklist:

– Try pinging the local host to verify the network interface is up

– Try pinging the server (remote host)

– Check whether port blocking may be occurring

– Try different source/destination ports (if possible)

– Verify the route to the gateway host is defined

– Try another default gateway that may have a route to the host

– Try using another application server that may be less loaded
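A plain TCP connection attempt from the laptop can cover the port-blocking items in the checklist. It distinguishes three common outcomes. The Python sketch below uses only the standard `socket` module, and its function name is hypothetical. The interpretation of each outcome is a heuristic, not a definitive diagnosis: a firewall can also reject a connection actively, which looks the same as a refusal.

```python
import socket
import time
from typing import Optional, Tuple


def probe_tcp_port(host: str, port: int, timeout: float = 3.0) -> Tuple[str, Optional[float]]:
    """Attempt a TCP connection and classify the result.

    Returns (status, connect_time_ms); connect_time_ms is None unless the port opened.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open", (time.perf_counter() - start) * 1000.0
    except socket.timeout:
        return "timeout", None  # heuristic: often packets silently dropped by a firewall
    except ConnectionRefusedError:
        return "refused", None  # heuristic: host reachable, nothing accepting on this port
    except OSError:
        return "error", None    # e.g. no route to host or name resolution failure
```

Calling it with the same host and a few alternative destination ports mirrors the "try different ports" step. If one port consistently times out while another opens, raise that with the roaming partner or CRX.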

Performance Troubleshooting

Performance Troubleshooting

• Assumes the application(s) are working, but not well

• Obviously, geographic distance to home servers can add significant latency, which can’t be avoided

• Usually requires engineers to troubleshoot

• Most performance troubleshooting requires significant coordination among:

– Internal routing engineers

– CRX

– ISPs
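Distance alone sets a floor for the unavoidable part of that latency. This sketch estimates that floor under two assumptions. Signals are taken to propagate at roughly 200 km per millisecond in optical fiber (about two-thirds of the speed of light). The path is taken to be no longer than the straight-line distance. Real routes are longer, so the result is only a floor. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s signal speed in fiber


def min_rtt_ms(one_way_km: float) -> float:
    """Propagation-only floor on round-trip time for a given one-way distance."""
    return 2.0 * one_way_km / FIBER_KM_PER_MS


def excess_latency_ms(measured_rtt_ms: float, one_way_km: float) -> float:
    """Part of a measured RTT above the propagation floor (queuing, routing, radio access, etc.)."""
    return max(0.0, measured_rtt_ms - min_rtt_ms(one_way_km))
```

Consider a roaming subscriber whose traffic travels about 4,000 km each way to home servers. That subscriber cannot see an RTT below about 40 ms. Anything well above that floor is where the routing and loading checks apply.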

Performance Troubleshooting

Type of Issue | Focus Area | Possible Cause
Latency issues | Network and device | Number of hops; routing problems; spurious device traffic and laptop/device performance
Throughput issues | Network | IP fragmentation
Throughput issues | Transport | TCP congestion control; UDP packet loss
Throughput issues | Application and device | Spurious device traffic and laptop/device performance; application server settings; server selection and loading
High packet error/loss rate | Cables and devices | Physical cables and devices
High packet error/loss rate | Network | IP fragmentation; insufficient core network capacity
Sub-optimal media and application performance | Network / Transport / Core Network / Application / Physical Cables | Network loading; QoS; server settings; latency; IP fragmentation; high packet error/loss rates

Latency Issues

• A variety of issues may cause high or variable latency:

– Number of hops

• Too many hops between the client and server increase the RTT

– Routing problem

• Inefficiencies in routing tables may prevent packets from taking the shortest path

• Incorrect default gateway selection causes redirection to other hosts

– Network loading

• Other users sharing the same data pipe cause packets to be queued

– Spurious device traffic

• Unaccounted-for traffic generated by malware applications, spam, etc. shares the data pipe and reduces throughput

What to Verify for Latency Issues

• When performance does not meet expectations due to latency issues:

– Throughput may be lower than expected

– Application responsiveness may be poor

• Checklist:

– Verify number of hops to server (traceroute)

– Verify round-trip time to server (Ping)

– Verify network loading (# of other users)

– Verify no extraneous or foreign traffic being generated by the device
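Summarizing ping output the same way every time makes the RTT check comparable across tickets. The following Python sketch assumes the per-request RTTs have already been collected, with None for a request that got no reply. Jitter here is a simple mean of the differences between consecutive replies, not the RFC 3550 estimator. All names are hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_ping(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize ping samples in ms; None marks a request with no reply."""
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no samples")
    replies = [r for r in rtts_ms if r is not None]
    summary: Dict[str, float] = {
        "sent": sent,
        "received": len(replies),
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
    }
    if replies:
        summary["min_ms"] = min(replies)
        summary["avg_ms"] = statistics.mean(replies)
        summary["max_ms"] = max(replies)
        # Simple jitter: mean absolute difference between consecutive replies
        diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
        summary["jitter_ms"] = statistics.mean(diffs) if diffs else 0.0
    return summary
```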

Throughput Performance Issues

• A variety of reasons may cause throughput issues:

– IP fragmentation

• Fragmentation of IP packets causes additional physical layer packets to be generated

• Results in a high percentage of packets in error, retransmissions, and delays

– TCP Congestion Control Issues / UDP packet loss

• Retransmissions trigger TCP slow start and congestion avoidance, which reduce the sending rate

• Network congestion may cause lost UDP datagrams

– Spurious device traffic

• Unaccounted-for traffic generated by malware applications, spam, etc. shares the data pipe and reduces throughput

– Application server settings / server selection / server loading

• Sub-optimal FTP server settings will reduce data transfer capabilities

• A public server, or a server located too many hops away, may reduce throughput
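Two back-of-the-envelope checks help with the first two causes above. The first counts how many IPv4 fragments a datagram becomes on a link with a smaller MTU. Every fragment repeats the IP header, and all but the last carry a multiple of 8 payload bytes. The second bounds TCP throughput by window size divided by RTT, which is why congestion-control reductions of the window show up directly as lower throughput. The Python sketch assumes a 20-byte IPv4 header with no options, and the function names are hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options


def ipv4_fragment_count(total_len: int, mtu: int) -> int:
    """Number of IPv4 fragments for a datagram of total_len bytes (header included)."""
    if total_len <= mtu:
        return 1
    payload = total_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


def max_tcp_throughput_bps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: at most one window in flight per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms
```

Two examples show the scale. A 1,500-byte datagram crossing a link with a 1,480-byte MTU becomes two fragments. A 65,535-byte window over a 100 ms RTT caps TCP at about 5.2 Mbit/s, regardless of the radio link rate.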

Thank You