CDMA TroubleShooting
Ft. Lauderdale, FL
March 4, 2008
www.cdg.org
Sponsored by Verisign
Agenda
9:10 Voice Trouble Resolution David Weixelman, Network Engineer, Sprint
9:55 Q&A
10:05 SMS Trouble Resolution Daniel Salek, Staff Engineer, Qualcomm
10:40 Q&A
10:50 Break
11:05 Packet Data Nars Haran, US Cellular; Bryan Cook, Senior Staff Engineer, Qualcomm
11:50 Q&A
Contributions
• Many thanks to the following for their contributions to the materials.
– Bryan Cook, Qualcomm
– Nars Haran, US Cellular
– Jeff Kraus, US Cellular
– Devora Pippenger, Syniverse
– Daniel Salek, Qualcomm
CDMA Voice TroubleShooting
Opening Remarks
• This presentation will attempt to take a holistic view of the trouble ticketing process with Document 87 as the centerpiece.
• I will speak about processes leading up to the main points of information in Document 87 as well as discuss information within the document and address processes after a ticket is resolved.
• This will hopefully provide you with a template for improving your customer service and employee development in the world of roaming.
Outline / Agenda
1. Organizational Preparation
2. Ticket Methodology
3. Entrance criteria
4. Tools
5. Object lesson: Checklist
6. Investigation Results Report
7. Work Load And Root Cause Analysis
8. Wrap Up
#1 Organizational Preparation
• Know your roaming network configuration. Know your own network configuration.
• Billing system access and basic navigational skills to access the customer's account. This helps in validating any discrepancies between the billing system and the HLR.
• Network access to STPs, HLRs, MSCs, SMSCs and any other mission-critical applications needed for roaming troubleshooting, such as the trouble ticket system.
• IS41 fundamental call flow and standards knowledge for analysis of call traces. Pictures, pictures, pictures. If you can’t draw it, you don’t know it as well as you should.
#1 Organizational Preparation
[Figure: Understanding your network configuration. Diagram labels: Verisign, Syniverse, Direct Links with Carrier A, Direct Links with Carrier B.]
#1 Organizational Preparation
• Detail-oriented people are usually best suited for this type of job.
• Also, people who can empathize with the customer's situation and go the extra mile for resolution are your key personnel. Employees who personalize the situation are well suited for troubleshooting.
• If you have cross-functional teams (Customer Care, Tier II, Tier III, etc.) handling roaming tickets, make sure all teams are in agreement on best practices for trouble ticket resolution.
• Define where each team’s roles and responsibilities start and stop. This is usually best done through Service Level Agreements between the groups.
#2 Ticket Methodology
• Every ticket logged is an opportunity to evaluate and learn something at several levels:
– At the customer level
• Carriers get a first-hand look at what the customer is telling them about their service
– At the troubleshooting level (Customer Care, Tier 2 and Tier 3 levels)
• Are they prompted to ask the correct clarifying questions?
• Are they following the established processes?
• Do they have the proper tools at each level to correct the issues?
#2 Ticket Methodology
– At the implementation level
• Did something get overlooked during the implementation process?
• Does the implementation process need to be modified to accommodate new service enhancements?
– At the roaming partner level
• Are there particular areas having the same, repeated issues with the same roaming partner?
– At the device level
• Are there particular devices having certain issues?
• Are they newly launched devices?
#2 Ticket Methodology
• Evaluation of these types of questions can/will identify and drive inefficiencies out of all levels of the troubleshooting process and possibly other internal organizations, especially on the device front.
• Typically with many carriers, roaming is an afterthought, and if device testing is not thoroughly completed from a network and a roaming perspective, carriers put the roaming ‘testing’ into the hands of their subscribers. This is not a good way to have a positive roaming experience for your subscribers, especially if they are roaming internationally halfway around the world.
• Of course a balance must be struck between time to launch and testing. This is easier said than done.
#2 Ticket Methodology
• What are the ticket metrics that are important for determining quality work throughout the troubleshooting process?
– Is the purpose of the process only focused on how fast tickets are closed?
• Fast closure alone = quality customer service?
– Or is there post-mortem ticket analysis performed by the respective management teams to gauge their teams' strengths and weaknesses?
• Knowledgeable, skilled employees + consistent performance through defined best practices + ticket analysis = quality customer service
#2 Ticket Methodology
[Figure: Quality plotted against Knowledge (small amount to large amount) and Speed To Closure (slow to fast). Three points are marked: someone with great knowledge, great closing speed and great quality; someone with poor knowledge, great closing speed and poor quality; someone with great knowledge, slow closing speed and good quality.]
#2 Ticket methodology
• Standardize what technical information to capture. This means establishing entrance and exit criteria. (Document 87)
• Identify and document common problems and solutions and from that create a troubleshooting ‘check list’.
• Establish systematic methodology for trouble resolution
– Once personnel get comfortable with a methodology they can ‘freelance’ to match their individual skill sets and talents, provided the resolution and quality metrics are met. The key point is that resolution and quality are to be monitored.
#3 Entrance Criteria
– MDN, MIN/IRM
– ESN/MEID/UIMID
– Detected date and time
– Roaming MSCID
– Problem description
– Location (City/State or City/Country)
– Duration of stay and alternate contact information
– Problem carrier's contact information
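One way to enforce these entrance criteria is to reject or flag a ticket that arrives with fields missing. The following is a minimal sketch, assuming a hypothetical `RoamingTicket` record whose field names are invented here to mirror the list above; it is not taken from Document 87 or any real ticketing system.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class RoamingTicket:
    """Entrance criteria for a roaming ticket (field names are illustrative)."""
    mdn: Optional[str] = None
    min_or_irm: Optional[str] = None
    esn_meid_uimid: Optional[str] = None
    detected_at: Optional[str] = None
    roaming_mscid: Optional[str] = None
    problem_description: Optional[str] = None
    location: Optional[str] = None
    duration_and_alt_contact: Optional[str] = None
    problem_carrier_contact: Optional[str] = None


def missing_criteria(ticket: RoamingTicket) -> List[str]:
    """Return the names of entrance criteria that are empty or missing."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name)]


if __name__ == "__main__":
    ticket = RoamingTicket(mdn="5551230000", problem_description="Can't originate")
    print("Missing before escalation:", missing_criteria(ticket))
```

A check like this could run when Customer Care logs the ticket, so that Tier 2 does not have to go back to the customer for basic information.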
#4 System Tools
– MSC access
– HLR access to validate customer profiles
– SMSC access
– STP access
– SS7 messaging analyzer (Access7)
– RSP messaging analyzer
– Troubleshooting ticketing system
– Billing system
#5 Object Lesson: Checklist
[Figure: IS41 registration and call termination flow between the Serving MSC, STPs (direct connects or SS7 providers), Home STP, Home HLR, Home MSC and Home Voice Mail. Legend: Registration = B, Call Termination = A.]
Registration messages:
– Reg Not (B2)
– Reg Not Return Result (VLR Info) (B2)
Call termination messages:
– Incoming call
– LOC REQ (A1)
– Routing Request (A2)
– Route Request Return Result (TLDN#) (A3)
– LOC REQ RET (A4)
– Standard Route Through the PSTN (A5)
– No Answer & Page TMO/Redirect Request (A6)
– XFER NUM (A7)
– XFER NUM (A8)
– Redirect Request Response (A9)
– ROUTE (A10)
#5 Object Lesson: Checklist
Type of Issue: Call origination: Can’t originate calls because of ‘manual roaming’ / credit card prompt
Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events
Possible Causes:
• Handset has ‘locked’ on a network where no automatic roaming agreement is implemented
• MBI/IRM block to which the phone belongs is not loaded in the roaming partner's serving MSC
• MBI/IRM block not pointed to the correct HLR point code
• SS7 network capacity constraints and/or issues causing registration failure (example: Super Bowl or other large public events)
• MSCID of the roaming partner’s serving MSC is not loaded in the home HLR
#5 Object Lesson: Checklist
Type of Issue: Call origination: Fast busy / call failed
Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events
Possible Causes:
• Poor signal strength / too weak to connect to cell site
• Network capacity / cell site in use is at capacity
• Handset / phone equipment transmitter failure or error
• PRL cycle has not yet acquired an available network
#5 Object Lesson: Checklist
Type of Issue: Call origination: Fast busy / call failed
Focus Areas: Serving MSC; Home HLR; MBI/IRM block; Special events
Possible Causes:
• Similar to ‘fast busy’, a channel could not be found due to network capacity constraints
• If the mobile is not stationary and the network attempts to hand off to a cell site in which all channels are being used by other callers, the call will drop
• Signal strength is lower than the minimum needed to maintain the call
• Network outage and/or maintenance
#5 Object Lesson: Checklist
Type of Issue: Call termination: Receive recording ‘The number you have dialed is incorrect. Please check the number and dial again’
Focus Areas: Customer education; Serving MSC; Home MSC
Possible Causes:
• The required number of digits is not keyed correctly by the caller
• Validate that the TLDN received from the roaming partner contains the correct digits. If the Digits Identifier is labeled as International, 011 should not be sent to the home carrier by the roaming partner or by their RSP. The standard is that the TLDN on a Routing Request Response should not have 011 in front of it
• On a Lucent MSC, validate that ‘Apply Dialing Prefix For International TLDN’ is turned on in the switch.
#5 Object Lesson: Checklist
Type of Issue: Call termination: Incoming calls go directly to voice mail without any ‘rings’ while the handset is roaming
Focus Areas: Home HLR; Home MSC; Customer education; Coverage
Possible Causes:
• Home carrier's HLR point codes are not loaded in the roaming partner's network
• Applicable home MSC point code is not loaded in the roaming partner's network
• HLR has not registered the handset on a roaming network (for various reasons) and has deleted the last registered location; that is, the HLR does not know where to tell the home MSC to direct the call, so the traffic switch routes the call by default to the voice mail platform without paging any network
• Signal strength is lower than the minimum needed to ‘page’ the mobile on the roaming network, and the call is redirected to the home traffic switch to terminate on the voice mail platform
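A checklist like the one on these slides can be stored as data so that a ticket description points the analyst at the right focus areas. This is only a sketch: the keyword matching and the `focus_areas` helper are assumptions made for illustration, while the issue types and focus areas are copied from the slides above.

```python
from typing import List, Tuple

# (keywords that identify the issue, focus areas from the checklist slides)
CHECKLIST: List[Tuple[Tuple[str, ...], List[str]]] = [
    (("credit card", "manual roaming"),
     ["Serving MSC", "Home HLR", "MBI/IRM block", "Special events"]),
    (("fast busy", "call failed"),
     ["Serving MSC", "Home HLR", "MBI/IRM block", "Special events"]),
    (("dialed is incorrect",),
     ["Customer education", "Serving MSC", "Home MSC"]),
    (("voice mail",),
     ["Home HLR", "Home MSC", "Customer education", "Coverage"]),
]


def focus_areas(description: str) -> List[str]:
    """Return the focus areas for the first checklist entry whose keyword matches."""
    text = description.lower()
    for keywords, areas in CHECKLIST:
        if any(keyword in text for keyword in keywords):
            return areas
    return []  # no checklist entry yet; a candidate for a new category


if __name__ == "__main__":
    print(focus_areas("Incoming calls go straight to voice mail while roaming"))
```

An empty result is useful in its own right: it marks a problem that has not yet been categorized and may deserve a new checklist entry.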
#6 Investigation Results Report
• The Investigation Results Report (IRR) is to provide the ticket analyst(s) information on
– What was done to resolve the issue
– The date and time it was resolved
– The root cause of the problem
– And any action items needed to be taken
• If this information is consistently captured and analyzed, it can be highly useful information to apply toward root cause analysis
• There is no reason to have a results report if it is not utilized further in the wider scope of root cause analysis. It essentially becomes reporting for the sake of reporting and increases inefficiency in the troubleshooting process.
#7 Work Load And Root Cause Analysis
• Trends should be analyzed from the entrance criteria data received
– Trend tickets on volume of roaming tickets
– Handset types
– Categorized issues
– Root cause and other information
• For categorized problems there should be a troubleshooting check list associated with them
#7 Work Load And Root Cause Analysis
Case Report | MDN | Device | City | ST/Country | Problem
17406207-080209 | | RIM Blackberry 8830 | Buenos Aires | Argentina | Searching for service
17415876-080212 | | RIM Blackberry 8830 | Adelaide | Australia | Can't originate/terminate
17335469-080117 | | Handspring Treo 700W | Georgetown | Cayman Islands | Can't originate/terminate
17354258-080124 | | Samsung SPH-A900M | Bogota | Colombia | Can't terminate
17362912-080127 | | Handspring Treo 650 | Guayaquil | Ecuador | Can't originate
17364974-080128 | | Handspring Treo 700W | Bangalore | India | Can't originate
17415744-080212 | | RIM Blackberry 8830 | Bengal | Jamaica | Can't originate/terminate
17314585-080110 | | Handspring Treo 650 | Acapulco | Mexico | Can't originate/terminate due to auth issue
17386313-080202 | | Motorola Q | Bangkok | Thailand | Can't originate/terminate
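Trending on a report like this can start with simple counts per device and per problem category. The sketch below uses the nine rows from the table above; the tuple layout and variable names are illustrative assumptions rather than an export format from any particular ticketing system.

```python
from collections import Counter

# (device, country, problem) taken from the sample case report
TICKETS = [
    ("RIM Blackberry 8830", "Argentina", "Searching for service"),
    ("RIM Blackberry 8830", "Australia", "Can't originate/terminate"),
    ("Handspring Treo 700W", "Cayman Islands", "Can't originate/terminate"),
    ("Samsung SPH-A900M", "Colombia", "Can't terminate"),
    ("Handspring Treo 650", "Ecuador", "Can't originate"),
    ("Handspring Treo 700W", "India", "Can't originate"),
    ("RIM Blackberry 8830", "Jamaica", "Can't originate/terminate"),
    ("Handspring Treo 650", "Mexico", "Can't originate/terminate due to auth issue"),
    ("Motorola Q", "Thailand", "Can't originate/terminate"),
]

# Count tickets per handset type and per categorized issue
by_device = Counter(device for device, _, _ in TICKETS)
by_problem = Counter(problem for _, _, problem in TICKETS)

if __name__ == "__main__":
    for device, count in by_device.most_common():
        print(f"{count:2d}  {device}")
    for problem, count in by_problem.most_common():
        print(f"{count:2d}  {problem}")
```

Even on this small sample the counts point somewhere: one device accounts for three of the nine tickets, which is the kind of signal the device-level questions in the ticket methodology are meant to surface.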
#7 Work Load And Root Cause Analysis
• Categories For CDMA Voice Issues
– Generally speaking most voice issues are going to fall within one of the below categories
• Can’t originate
• Can’t originate. Fast busy
• Can’t originate to a specific number
• Can’t originate international calls
• Can’t originate or terminate
• Can’t originate or terminate. Welcome to…carrier’s name.
• Can’t retrieve voice mails
• Can’t deposit a voice mail
• Can’t terminate
• Can’t terminate from specific numbers
• Coverage complaints, dropped calls etc.
• Not a roaming issue
• Searching for service
#8 Wrap Up
[Figure: Flowchart with two parts, Resolution Process and Process Improvement.
Resolution Process: the roaming customer contacts Customer Care (Where? Issue? Billing/account verification; logs ticket). Unresolved tickets pass to Tier 2 (system checks of MBI/network elements; capture/analyze RSP and/or live traces; possibly test with customer or roaming partner; clarify the issue if need be) and then to Tier 3 (further system checks in the network; analyze live traces received from Tier 2; possibly test with customer or roaming partner). Decision branches are labeled Yes (End) and No, including "No resolution. New problem".
Process Improvement: Investigation Results Report; categorize issue; root cause analysis; trending; lessons learned; updated training/tools/knowledge.]
#8 Wrap Up
[Figure: Chart of Knowledge, Resolution Time and Tickets over Month 1 to Month 4; vertical axis from 0 to 25.]
#8 Wrap Up
• Questions
• Action Items
SMS Roaming Troubleshooting
Contents
• Assumptions
• Background
• Reference documentation/Tools
• Possible Problems
• Troubleshooting Process
Assumptions
• Voice Roaming working
– System Determination
– Registration
– ANSI-41 authorization
• Focus on SMS-specific issues
• Assume element/link failures alarmed
– Focus here on subscriber-reported issues
• Not addressing Billing issues
– In general assume billing records produced at MC
• Post-implementation issues
– Assume initial testing completed
Background – Roaming Architecture
• ANSI-41 network elements involved in SMS Roaming:
– Message Center (MC) – aka Short Message Service Center (SMSC). Store and forward function for messages. End-point for SMS communication with a Mobile Station (MS)
– Mobile Switching Center (MSC) – Includes (for convenience) the VLR and Base Station. ANSI-41 to IS-2000 interface, and relay point for SMS messages
– Home Location Register (HLR) – Stores subscriber location and profile information. Doesn’t see actual SMS message contents
– Roaming Service Provider (RSP) – Usually present in CDMA-CDMA roaming today. Provides signaling connectivity and ANSI-41 translation. Looks like an MSC/VLR to the home network, and an HLR/MC to the serving network.
Background – Message Flows (1 of 3)
• Mobile-Terminated (MT):
[Figure: MC, HLR, MSC and MS with numbered arrows 1 to 5 matching the steps below.]
1. Message arrives at MC, addressed to MS
2. MC queries HLR for MS location – SMS Request (SMSREQ) message
3. HLR checks subscriber is authorized, returns address (SMS_Address from registration time)
4. MC sends message to MSC using the address received in the previous step – SMS Delivery Point To Point (SMDPP) message
5. MSC delivers message to MS over the air
Background – Message Flows (2 of 3)
• Mobile-Terminated with delayed delivery (MT):
[Figure: MC, HLR, MSC and MS with numbered arrows 1 to 9 matching the steps below.]
1 - 4. As per previous slide
5. MS goes into coverage hole, message delivery fails. MSC sets “SMS Delivery Pending” flag for MS
6. Some time later, MS returns to coverage, makes system access
7. System access plus pending flag trigger MSC to send advice to MC that MS is available – SMSNotification (SMSNOT) message
8. MC resends SMDPP
9. Message is delivered successfully to MS
Other notification scenarios are possible – if HLR knows that subscriber is unavailable, it will issue the SMSNOT instead of the MSC
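The delayed-delivery flow can be walked through in a few lines of code. This is a toy simulation under stated assumptions: the `MSC` and `MC` classes and their method names are invented for illustration, the HLR query (steps 2 and 3) is left out, and nothing here models real ANSI-41 message encoding.

```python
class MSC:
    """Serving MSC that tracks coverage and the SMS Delivery Pending flag."""

    def __init__(self):
        self.in_coverage = set()
        self.delivery_pending = set()

    def deliver(self, ms, text):
        """Handle an SMDPP: deliver over the air, or set the pending flag."""
        if ms in self.in_coverage:
            return True
        self.delivery_pending.add(ms)  # step 5: delivery fails, flag is set
        return False

    def system_access(self, ms, mc):
        """Step 6: MS returns to coverage. Step 7: pending flag triggers SMSNOT."""
        self.in_coverage.add(ms)
        if ms in self.delivery_pending:
            self.delivery_pending.discard(ms)
            mc.on_smsnot(ms, self)


class MC:
    """Message Center with a per-subscriber store-and-forward queue."""

    def __init__(self):
        self.queue = {}      # ms -> list of stored messages
        self.delivered = []  # (ms, text) pairs delivered successfully

    def submit(self, ms, text, msc):
        self.queue.setdefault(ms, []).append(text)
        self._attempt(ms, msc)

    def on_smsnot(self, ms, msc):
        self._attempt(ms, msc)  # step 8: resend SMDPP

    def _attempt(self, ms, msc):
        pending = self.queue.get(ms, [])
        while pending and msc.deliver(ms, pending[0]):
            self.delivered.append((ms, pending.pop(0)))  # step 9


if __name__ == "__main__":
    msc, mc = MSC(), MC()
    mc.submit("MS-A", "hello", msc)   # MS is out of coverage
    print(mc.delivered, msc.delivery_pending)
    msc.system_access("MS-A", mc)     # MS comes back
    print(mc.delivered, msc.delivery_pending)
```

Note that in this sketch the message is only resent because of the notification; a real MC may also retry on its own schedule.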
Background – Message Flows (3 of 3)
• Mobile-Originated (MO) – Indirect Routing
• Indirect routing means that the message is routed through the originator’s MC:
[Figure: MS, MSC and MC with numbered arrows 1 to 3 matching the steps below.]
1. The MS originates a short message
2. The MSC sends the message to the MC for this MS (SMDPP)
3. The MC analyzes the destination address, and routes the message on. If the destination is an MS which belongs to another MC, the message will be sent to that MC
Reference Documentation
• There are several sources of information you can turn to when faced with a problem:
– General Reference
• ANSI-41 standard
• ANSI-41 textbook
• SMPP standard
– Roaming-specific
• SMS Roaming Reference Document
– Carrier-specific
• TDS
• SMS Roaming Partner Qualification Form
• SMS Test Plan Results
Tools
• Tools available to assist in troubleshooting:
– HLR
• O/b Subscriber profile, registration status
– MC
• O/b message queue, maybe subscriber profile
• Billing records
– MSC/VLR
• I/b subscriber profile/status, SMSDPF?
• No billing records produced
– Protocol Analyzer
• Real time, may be swamped by roaming traffic
– RSP Trace
• See message delivery attempts, longer storage
Possible Problems (1 of 5)
• List some potential areas where problems can arise:
• Subscriber Provisioning
– Home vs Roaming
• Some HLRs define a separate value of SMSTERMREST and SMSORIGREST to be sent to MSCs designated as “roaming”.
– Unusual values
• The most common values for these parameters are 0 (Block all) and 3 (Allow all). Other values might be handled poorly…
– Service Option
• Specific service options are defined for SMS (6 & 14). Usually, however, these aren’t required to be present in the CDMASOL.
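A provisioning audit can flag restriction values outside the two common ones so that an analyst looks at them before blaming the network. The sketch below is only illustrative: `review_restriction` is a hypothetical helper, and the only values it knows about are the two named on the slide (0 and 3).

```python
# The two values the slide calls out as most common
COMMON_VALUES = {0: "Block all", 3: "Allow all"}


def review_restriction(name: str, value: int) -> str:
    """Describe an SMSTERMREST/SMSORIGREST value, flagging uncommon ones."""
    if value in COMMON_VALUES:
        return f"{name}={value} ({COMMON_VALUES[value]})"
    # Any other value is not necessarily wrong, but may be handled poorly
    return f"{name}={value} (unusual value; confirm how the serving MSC handles it)"


if __name__ == "__main__":
    print(review_restriction("SMSTERMREST", 3))
    print(review_restriction("SMSORIGREST", 1))
```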
Possible Problems (2 of 5)
• MSC Datafill
– SMSADDR Population
• MSC’s PC/GT in application layer
• ITU vs ANSI encoding can be tricky
• This value usually overwritten by the RSP
– MC address
• Required for MO-SMS.
• Associated by MIN range or HLR
• For roamers typically the same as the HLR address – i.e. RSP.
Possible Problems (3 of 5)
• RSP Datafill
– SMSADDR Population
• Overwrite with their own address
– MC address
• Required for MO-SMS
• Info supplied by home operator
• MC defined as valid sender for MT-SMS
– Addressing
• Map serving-network-supplied addresses to home-required values – e.g. MDN in SMS_OOA.
• HLR Datafill
– SMSADDR (Again)
• Some HLRs statically define the SMSADDR against the MSCID
Possible Problems (4 of 5)
• User Error
– Wrong dialplan
• Enter destination address in format for visited country
• Enter a short code only valid for the visited network’s subscribers
• Message “Jamming”
– Subscriber not able to receive any messages
• Can occur when an overlength message arrives – this fails delivery but remains at the front of the queue in the MC – it is attempted again before any new incoming message
• Commercial Issues
– SMS Roaming not yet implemented in a particular market
• Customers often expect/assume SMS to be present wherever voice roaming available
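Message “jamming” is a head-of-line blocking problem, which a short simulation makes easy to see. This is a sketch under stated assumptions: `MAX_LEN` is an arbitrary illustrative limit, not a value from any standard or MC product, and `run_delivery` is a hypothetical stand-in for the MC retry logic.

```python
from collections import deque

MAX_LEN = 160  # assumed length limit, for illustration only


def run_delivery(queue: deque, attempts: int) -> list:
    """Retry the message at the head of the queue before any newer message."""
    delivered = []
    for _ in range(attempts):
        if not queue:
            break
        head = queue[0]
        if len(head) > MAX_LEN:
            continue  # delivery fails; the message stays at the front
        delivered.append(queue.popleft())
    return delivered


if __name__ == "__main__":
    queue = deque(["x" * 200, "short message"])
    print(run_delivery(queue, attempts=5))  # nothing gets through
    queue.popleft()                         # operator clears the jammed message
    print(run_delivery(queue, attempts=5))  # the waiting message is delivered
```

The same picture explains why checking the MC queue is part of the investigation checklist: the jammed message is visible at the head of the queue.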
Possible Problems (5 of 5)
• Intermittent / Performance Issues
– Hardest to troubleshoot
• Often reported after subscriber returns home
• Roaming cases may actually provide more information – access to trace information after-the-fact via RSP
– Examples:
• “I never received an important message, but I received other messages”
• “I was powered on in good coverage for hours before my messages arrived”
– Trending/aggregation may be important to decide if a bigger problem exists
Troubleshooting Process
• General stages will be equivalent to other roaming services
• Specific details will vary for SMS within the stages:
– Clarifying the issue
– Confirming expected behavior
– Investigation
– Resolution Actions
– Feedback/lessons
Clarifying the Issue
• Eliminate wider roaming issues
– Phone shows signal strength
– Make/receive voice calls
• SMS specific
– MO, MT or both affected?
– Exact destination address for MO issues
– Length of attempted message
• Impact
– User-, MSC-, HLR-, MC-, Application-, Operator-wide?
– Works at home?
• Time
– Used to work/never worked/past fault
Exchange troubleshooting information as specified by IRT
Confirming Expected Behavior
• Is SMS supposed to work for this market?
– Does troubleshooting team have access to an up-to-date list of markets where MO/MT SMS is expected?
• Reference Check
– Test Results/TDS/RPQF
– Historical troubleshooting information
– Is this a new issue?
Investigation
• Checklist
– Subscriber authorized for SMS at HLR & VLR
– Check RSP tool for delivery attempts
• If not present, may not be reaching RSP (datafill error, link/element outage) or may not be reaching RSP application layer (overlength)
• If present, check response. “Postponed” is the only SMSCAUSE value that indicates a notification is pending
– Check MC logs/queue
• Retest
– Recreate issue if possible
– Capture complete logs with protocol analyzer or MC/MSC tool
– MC retry schedule may mask SMSNOT functioning
Resolution Actions
• Datafill errors: fix per operational policy – e.g. maintenance window only
• Provisioning errors: fix
• Subscriber “reset” actions
– E.g. power cycle, VLR clear at RSP
– May fix an unexplained problem
– May prevent the problem from ever being explained
– Balance between short- and long-term benefit to subscriber base
• Capability Gaps
– Escalate per company procedures
Feedback
• How to ensure knowledge gained during troubleshooting process is captured and available in the future?
– Knowledgebase
– Training
– Vendor follow-up
– Statistical analysis
Packet Data Roaming Troubleshooting
Packet Data Roaming
• For the purposes of this module, “data roaming” implies:
– A subscriber accessing data services in a foreign network
– 1xRTT and/or EV-DO used to access data services
– Voice roaming is also functioning
Roaming IP Access with Mobile IP
• IP address assigned by home agent (HA)
– Visited operator provides COA.
– Mobile IP tunnel created between visited PDSN/FA and HA.
• Public Internet access tunnels back to home network
• Access to home network servers without NAT
[Figure: Mobile IP roaming. The Visited Operator (RAN, PCF, PDSN/FA, AAA) and the Home Operator (RAN, PCF, PDSN, AAA, HA, Application Server) are connected through the Internet/CRX. Labels include COA and the IP address 10.23.45.13.]
Roaming IP Access with Simple IP
• Serving network assigns IP address to roamer
• NAT required if private IP address assigned.
• Direct access to the public Internet
• VPN over public Internet to access home application servers
[Figure: Simple IP roaming. The Serving Network (RAN, PCF, PDSN, AAA, NAT) and the Home Network (RAN, PCF, PDSN, AAA, Application Server) are connected through the Internet/CRX. Labels include the IP address 10.23.45.13.]
Implementing Roaming with L2TP
• Home operator LNS assigns roaming MS its IP address.
• L2TP tunnel is created between visited PDSN/LAC and LNS.
• Must tunnel back to home network to access public Internet
• Access application servers in home network without NAT
[Figure: L2TP roaming. The Visited Operator (RAN, PCF, PDSN/FA, AAA) and the Home Operator (RAN, PCF, PDSN, AAA, LNS, Application Server) are connected through the Internet/CRX. Labels include the IP address 10.23.45.13.]
Aspects of Data Roaming Troubleshooting
• Pre vs. Post commercial implementation (focus here is post)
• Functional vs. performance
– Functional troubleshooting (It doesn’t work!)
– Performance troubleshooting (It works, but not very well!)
• Billing for data roaming out of scope of this training module
Organizational Procedures
• Essentially same as described in voice troubleshooting
– Prepare organization (personnel, trouble ticket system, etc.)
– Standardize what technical information to capture
– Identify and document common problems and solutions
– Establish systematic methodology
Functional Troubleshooting
Troubleshooting Scenarios
• Subscribers reporting trouble vs. engineers troubleshooting a known issue with a device:
– Engineers have access to many more tools than subscribers
– Different methodologies are used in each case
• Device scenarios
– Handset only: Depends strongly on network logs for troubleshooting
– Handset with data cable and laptop: More tools available
– Data card (or tethered handset): Allows access to greatest number of network tools, although handset applications more difficult to test
Clarifying Questions (1/2)
• The first step in troubleshooting is providing a high-level clarification of the situation
• Important to all troubleshooting scenarios
– Data roaming implementation exists?
• Obviously, this should be “yes”; otherwise there is no issue to troubleshoot
– Does handset/application function in home network?
• If “no”, then focus on issues in home network first
Clarifying Questions (2/2)
• Voice roaming work in foreign network?
– If “no”, then focus on voice roaming first
– System selection or HLR authentication related?
• Do any data applications work at all?
– If “yes” then many potential issues eliminated
• System selection, authentication, basic network connectivity
– Shift focus to the specific application
• Data authentication obviously fails?
– If “yes”, then focus on data authentication component
Troubleshooting Subscriber Reported Issue (1/3)
• Assume clarifying questions have been answered
• Assume subscriber can’t access device tools (e.g. tracert, WireShark)
• Important for home operator to gather information about the subscriber’s device
• The required device information is currently being standardized in a CDG reference document
• Identifies troubleshooting info operators should gather:
– MSID (IMSI, IRM)
– MEID/ESN
– MDN
– NAI
– IP Address
– Technology
– MIP, SIP
– Application
– tracert (if available, but requires data card and subscriber sophistication)
Troubleshooting Subscriber Reported Issue (2/3)
• Essentially, dependent on infrastructure logs as subscribers don’t have access to or knowledge of device tools
Methodology:
• Use systematic approach, and eliminate categories of issues
• System selection failure
– Look at subscriber’s PRL and roaming partner’s TDS
– Work with roaming partner to determine possible issues
• Authentication failure
– Review relevant H-AAA logs
– Look for clues on reason for failure (bad password?)
Troubleshooting Subscriber Reported Issue (3/3)
• Routing Issues
– Check Home HA or LNS logs (pass authentication, etc?)
– Look for possible firewall, port blocking, and routing table issues
– Work with CRX and roaming partner engineers
• PPP Issues
– Obtain roaming subscriber’s A10/A11 logs if available (e.g. RADCOM)
– Otherwise, very difficult
Field Engineering Troubleshooting
• Implies an engineer troubleshooting in roaming market
• Engineer could be from home or visited market
• In either case, coordination between home/visited operators is usually required
• More tools are available and, obviously, a greater level of technical knowledge
Network Tools
Assumes data card or tethered laptop:
Tool Name | Purpose
ipconfig | Provides TCP/IP information (i.e., IP address, adapters, gateways, etc.)
netstat | Displays current TCP/IP connections and protocol information
ping, hrping, pathping | Generates ICMP echo requests to diagnose routing, address resolution, latency, etc.
tracert, traceroute | Provides hop count and RTT for a server
nslookup | Provides DNS and IP address information of a remote host
route | View and modify the local routing table
hostname | Provides the local computer's NETBIOS hostname
telnet | Terminal emulator to allow terminal-mode sessions with a host
FTP, TFTP | Allows for TCP and UDP file transfers to and from a server
WireShark/Ethereal | Allows for packet sniffing, stream analysis, TCP traces, throughput calculation, etc.
Mobile IP Error Code Values
• Code Values for Mobile IP Registration Reply Messages
– 0-8 Success Codes
• 0 = registration accepted
– 9-63 No allocation guidelines currently exist
– 64-127 Error Codes from the Foreign Agent
• 67 = MN Failed Authentication
• 68 = HA Failed Authentication
– 128-192 Error Codes from the Home Agent
• 129 = Administratively prohibited
– 193-200 Error Codes from the Gateway Foreign Agent
– 201-255 No allocation guidelines currently exist
• The error code values can help explain why Mobile IP registration failed.
• General MIP numbers found at: http://www.iana.org/assignments/mobileip-numbers
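When reading registration replies out of a log, it helps to map each code to the element that generated it. The function below is a minimal sketch that encodes only the ranges and the four example codes listed on this slide; `classify_mip_code` and `KNOWN_CODES` are names invented for illustration, and the IANA registry remains the authoritative list.

```python
# Example codes named on the slide; the IANA registry has the full list
KNOWN_CODES = {
    0: "registration accepted",
    67: "MN failed authentication",
    68: "HA failed authentication",
    129: "administratively prohibited",
}


def classify_mip_code(code: int) -> str:
    """Map a Mobile IP Registration Reply code to the range it falls in."""
    if not 0 <= code <= 255:
        raise ValueError(f"code out of range: {code}")
    if code <= 8:
        return "success"
    if code <= 63:
        return "no allocation guidelines"
    if code <= 127:
        return "foreign agent error"
    if code <= 192:
        return "home agent error"
    if code <= 200:
        return "gateway foreign agent error"
    return "no allocation guidelines"


if __name__ == "__main__":
    for code in (0, 67, 129):
        print(code, classify_mip_code(code), "-", KNOWN_CODES.get(code, "see IANA registry"))
```

For a roaming ticket the range alone is already useful: a foreign agent error points at the visited network, while a home agent error points back at the home operator.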
PPP Connection Failures
• When PPP connections are unexpectedly failing, a few items can be verified
• Checklist:
– Verify the correct networking interface/modem is selected for the connection
– Verify RF conditions are sufficient for establishing a connection
– Verify no other interfaces have active TCP/IP bindings on the device
– View PDSN, AAA, and PPP logs (from device)
Application Connectivity Issues
• A variety of reasons may cause application connectivity issues:
– Firewalls
• IP address ranges or specific application traffic may be blocked
• Examples: ICMP, SSH, Instant Messenger, Peer-to-peer traffic
– Port blocking
• Port ranges an application needs may be closed for security reasons
– Server availability
• A server may not exist or may have been moved
• May have exceeded the maximum number of connections
– Routing table
• Routes to an application server may not exist in routing tables
Application Connectivity Issues
• A few things can be tried to mitigate application connectivity issues:
• Checklist:
– Try pinging the local host to verify the network interface is up
– Try pinging the server (remote host)
– Check whether port blocking is occurring
– Try different source/destination ports (if possible)
– Verify the route to the gateway host is defined
– Try another default gateway that may have a route to the host
– Try using another application server that may be less loaded
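The port-blocking check in the list above can be approximated from the tethered laptop with a plain TCP connection attempt. This is a minimal sketch under stated assumptions: the function name `tcp_port_open` and the 3-second timeout are illustrative choices, and the check applies only to TCP services.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable routes,
        # and name-resolution failures.
        return False
```

A `False` result does not by itself distinguish a firewall or blocked port from a server that is down or has moved, so it is best read together with the ping and routing checks in the list.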
Performance Troubleshooting
• Assumes the application(s) are working, but not well
• Obviously, geographic distance to home servers can add significant latency (can’t be avoided)
• Usually requires engineers to troubleshoot
• Most performance troubleshooting requires significant coordination of:
– Internal routing engineers
– CRX
– ISPs
Performance Troubleshooting
Type of Issue / Focus Area / Possible Cause
• Latency Issues
  – Network and Device
    • Number of hops and routing problems
    • Spurious device traffic and laptop/device performance
• Throughput Issues
  – Network
    • IP fragmentation
  – Transport
    • TCP congestion control
    • UDP packet loss
  – Application and Device
    • Spurious device traffic and laptop/device performance
    • Application server settings
    • Server selection and loading
• High Packet Error/Loss Rate
  – Cables and Devices
    • Physical cables and devices
  – Network
    • IP fragmentation
    • Insufficient core network capacity
• Sub-optimal Media and Application Performance
  – Network / Transport / Core Network / Application / Physical Cables
    • Network loading
    • QoS
    • Server settings
    • Latency
    • IP fragmentation
    • High packet error/loss rates
Latency Issues
• A variety of issues may cause high/variable latency:
– Number of hops
• Too many hops between the client and server increase the RTT
– Routing problem
• Inefficiencies in routing tables may cause packets to not take the minimum path
• Incorrect default gateway selection causes redirection to other hosts
– Network loading
• Other users sharing the same data pipe cause packets to be queued
– Spurious device traffic
• Unaccounted-for traffic generated by malware, spam, etc. shares the data pipe and reduces throughput
What to Verify for Latency Issues
• When performance does not meet expectations due to latency issues:
– Throughput may be lower than expected
– Application responsiveness may be poor
• Checklist:
– Verify number of hops to server (traceroute)
– Verify round-trip time to server (Ping)
– Verify network loading (# of other users)
– Verify no extraneous or foreign traffic being generated by the device
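Where ICMP is blocked and ping cannot be used for the round-trip check above, the time taken to complete a TCP handshake gives a rough stand-in. This is a minimal sketch, assuming the server exposes a reachable TCP port; the function names are hypothetical, and the figure is a proxy for RTT rather than an ICMP measurement.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int, count: int = 5,
                       timeout: float = 3.0) -> list:
    """Time `count` TCP handshakes to host:port, in milliseconds.

    If `host` is a name rather than an IP address, DNS lookup time is
    included in each sample.
    """
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        samples.append(elapsed * 1000.0)
    return samples


def summarize_rtt(samples: list) -> dict:
    """Reduce samples to min/avg/max; a wide spread suggests variable latency."""
    return {
        "min": min(samples),
        "avg": statistics.mean(samples),
        "max": max(samples),
    }
```

Comparing the summary taken while roaming with one taken on the home network helps separate unavoidable distance latency from routing or loading problems.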
Throughput Performance Issues
• A variety of reasons may cause throughput issues:
– IP fragmentation
• Fragmenting of IP packets causes additional physical layer packets to be generated
• Results in a high percentage of packets in error, retransmissions, and delays
– TCP Congestion Control Issues / UDP packet loss
• Retransmissions trigger TCP slow start and congestion avoidance, which reduce the sending rate
• Network congestion may cause lost UDP datagrams
– Spurious device traffic
• Unaccounted-for traffic generated by malware, spam, etc. shares the data pipe and reduces throughput
– Application server settings / server selection / server loading
• Sub-optimal FTP server settings will reduce data transfer capabilities
• A public server or a server located too many hops away may cause reduced throughput
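To make the IP fragmentation point concrete, the number of fragments produced for a given datagram and path MTU can be worked out directly. This is a minimal sketch assuming IPv4 with a 20-byte header and no IP options; the function name and the MTU values used in the example are illustrative and are not taken from any particular network.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def ipv4_fragment_count(datagram_bytes: int, mtu_bytes: int) -> int:
    """Number of IPv4 fragments needed to carry a datagram over a link MTU.

    `datagram_bytes` is the total datagram size including its IP header.
    """
    payload = datagram_bytes - IPV4_HEADER_BYTES
    if payload < 0 or mtu_bytes < IPV4_HEADER_BYTES + 8:
        raise ValueError("datagram or MTU too small for an IPv4 header")
    if datagram_bytes <= mtu_bytes:
        return 1  # fits without fragmentation
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu_bytes - IPV4_HEADER_BYTES) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


if __name__ == "__main__":
    # A full 1500-byte datagram crossing a hypothetical 1400-byte MTU hop.
    print(ipv4_fragment_count(1500, 1400))
```

In this example every full-size packet is split in two, and the loss of either fragment forces the whole datagram to be resent, which is consistent with the retransmissions and delays noted in the list.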