Troubleshooting Your System Intermediate
By: Matt Messinger and Ed Simmerson
2
Class Objectives
• This is troubleshooting primarily for issues with Sales
• Show key mechanisms in place
• Provide some methods for troubleshooting the most common situations
• Find common approaches that can apply to nearly any problem
• Help you resolve issues fastest, either on your own or with Siriusware Technical Support
3
Regular Technical Support
4
• Regular Technical Support
– You will need to provide detailed information on what is happening
• Software versions (Sales / middleware / database / etc.)
• Steps to recreate are best if it’s a bug
• Get the actual error message verbatim, even if it doesn’t make sense (“PC LOAD LETTER?”)
• Logs and INI files
– If it used to work, what has changed?
– Where is it happening? One machine or all? Is it reproducible? …
– The better the information you provide, the faster we can resolve the issue.
Working with Siriusware Technical Support
5
• In my opinion, TeamViewer has been the best thing since sliced bread
• Makes remoting into your system simple, reliable and secure.
• Available on http://www.siriusware.com/teamviewer
• Launch it and we can connect. Kill it and we can’t.
• Leave it running and we can get right back in…
TeamViewer
6
• Three test options available:
– Completely independent system
• Set up a local system for test purposes only. That way you can update your whole system and test out all of your products prior to rolling out the updates.
– Training mode
• A fully independent system that does not interact with your live system at all. This uses your existing versions, but a separate training database, so that it does not interfere with your live operations.
– A test license for a salespoint
• Uses your live configuration, but does not forward any data to the server database.
• If you have a seasonal operation, make sure you test out all of your products and versions prior to opening day.
How to prevent problems BEFORE they happen
7
First Problem
Demo
• Screen looks weird… what happened?
• What do we do now?
• There’s the scattershot approach
• And there’s an experimental approach
8
• Successively trying different things
• With each thing tried ask, “did that fix it?”
• Advantages
– If you pick the right thing, you might be done
• Disadvantages
– You may mess yourself up. You may be breaking more stuff.
– You aren’t building a model of what’s happening.
• What are some classic scattershot methods?
Scattershot Approach
9
• Use what you know in order to decide what possibilities there are for what’s happening
• Perform tests in order to narrow down these possibilities
• When the problem is better understood, potential fixes can be tried.
• Advantages
– Works better for more complex problems
– Allows you to get better at troubleshooting and better understand the system
• Disadvantages
– ?
Experimental Approach
10
• Nearly everything in this class is either
– How the system works, so that we can make up possibilities on what might be happening.
– How to perform a test to narrow down possibilities
– How to collect information to either provide to technical support or help you diagnose the problem yourself
• For example, because the system can be easily divided into server configuration and local configuration issues, the one salespoint versus many salespoints test is particularly important and easy to conduct.
This is all we’re talking about
11
• You might use a lot of the same tests as a scattershot approach.
• The difference is that after each test you need to ask “what have I learned?” and use that information to decide on the next test.
• The experimental approach is about building and refining a model.
• The better you understand the operation of the system, the better you can use the experimental approach to solve problems. We’ll get into that.
More on the Experimental Approach
12
Overall Salesware Architecture
13
• Architecture
– Salespoints operate with their own local data
– Salespoints communicate over TCP/IP to the middleware
– Middleware communicates “directly” with the database.
– Microsoft SQL Server is actually a service running somewhere.
– All of these components are necessary for normal “online” transactions.
Overview
14
• To break a salespoint
– The network might not permit communication to the middleware
– The middleware might be broken or not running
– The connection from the middleware to SQL might be broken
– SQL might not be running
– And many other things
Overview
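The chain of possible breaks listed above can be tested one hop at a time. The following is a minimal Python sketch, not a Siriusware tool: the function names are made up for illustration, and the host names and ports in the comment are placeholders you would replace with your own. It only checks whether a TCP connection can be opened, which tells you about the network and whether something is listening, not whether the service behind the port is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def first_broken_hop(checks, timeout=3.0):
    """checks is an ordered list of (label, host, port).

    Returns the label of the first hop that cannot be reached,
    or None if every hop accepted a connection.
    """
    for label, host, port in checks:
        if not can_connect(host, port, timeout=timeout):
            return label
    return None


# Example call (hypothetical host names and ports, substitute your own):
# first_broken_hop([
#     ("salespoint -> middleware", "pool-salesez-host", 4251),
#     ("middleware -> SQL Server", "sql-host", 1433),
# ])
```

Run the first check from a salespoint and the second from the middleware machine, since each hop has to be tested from the side that initiates the connection.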
15
• Questions you nearly always need to answer
– Did this work before and then stop?
• You may or may not know for certain, but knowing helps a lot.
– Consistent or sporadic, i.e. does it happen every time?
• Easy to test. Just try the thing again.
– One location or many/all?
• Usually pretty easy: try it somewhere else.
Questions to Answer
16
• If it used to work, what changed?
– The answer always seems to be nothing, but…
– Something must have changed because it used to work
– Possibilities (what tests do these imply?)
• New version
• Change to products / setup
• Installation of new software
• Changes in the data on the machine, in internet traffic, in RF interference, with your firewall
• Anything. Get creative. What have you seen? Please don’t leave stuff out assuming, “Well, that couldn’t affect it”
– You’re trying to put together a mental (or written) list of possibilities to test.
Did it used to work?
17
• What does it mean if the problem is sporadic? (e.g. sometimes an item isn’t in the item tree, or intermittent crashing)
– Something is different between your tests
– Timing, conditions on computer, data, memory, etc.
• What about if it’s consistent?
– Your test conditions allow it to happen every time
– Likely, changing the things above doesn’t really affect it.
– Could be a consistent setup issue or bug.
– Could be corrupt local data
Consistent vs. Sporadic
18
• One Salespoint means…
– Conditions on that one salespoint cause it not to work
– Local data, ini files, operating system, hardware, etc.
– If it’s one salespoint, rebooting and refreshing local data are usually the first steps to eliminate data corruption and operating system weirdness.
• Many Salespoints means
– Something common to those salespoints
• Configuration of central INI files
• printer layout
• SQL Server
• Middleware
• Operating system update, virus, etc
One Salespoint vs. Many or All
19
• Salespoints are different because of the following:
– Salespoint Record
• Configured in 2 places: in Sales (Tools menu) and in SysManager (Activities > System Lists > Salespoints)
– INI Settings
• Can check INI settings with the Tools > Diagnostics > Sales32c.INI button
• Also Ports.INI for Cashdrawer, Pole Display, Coin Dispenser settings (Tools > Diagnostics > Ports.INI button)
– Local data
– Item trees
– Hardware
– …
What makes salespoints different?
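One way to answer "what's different?" for the INI settings is to compare the INI file from a working salespoint against the one from the broken salespoint. The sketch below uses Python's standard configparser and is only an illustration: the function names are made up, and it assumes the files are plain [Section] / key=value INI files that configparser can read.

```python
import configparser


def load_ini(path):
    """Read an INI file into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names exactly as written in the file
    parser.read(path)
    return {section: dict(parser[section]) for section in parser.sections()}


def diff_ini(good, bad):
    """List (section, key, good_value, bad_value) for every setting that differs.

    A value of None means the setting is missing on that side.
    """
    differences = []
    for section in sorted(set(good) | set(bad)):
        good_keys = good.get(section, {})
        bad_keys = bad.get(section, {})
        for key in sorted(set(good_keys) | set(bad_keys)):
            if good_keys.get(key) != bad_keys.get(key):
                differences.append(
                    (section, key, good_keys.get(key), bad_keys.get(key))
                )
    return differences
```

Every line in the result is a candidate explanation to test, which fits the experimental approach better than eyeballing two files side by side.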
20
Configuration Problems
21
Back to our First Problem Demo
• Screen looks weird… what happened?
• All salespoints.
• Something funny with the INI file.
• Default INI is empty.
• Check the default INI file.
• Oops, defaults is messed up.
• How to mess up all your salespoints with a single character.
• What changed? Correct answer was “nothing I know of”.
22
• Each salespoint maintains its own copy of data locally. Why?
– So it can sell when offline.
• This data is kept in sync with the server through Forwarding and Updating
– Forwarding is sending data to the server
– Updating is getting new data back from the server
• Data can be refreshed through the Tools menu. Why is this necessary?
– This data can be recreated at any time. How? Risk?
– How? Tools > Data Files > Refresh
Local Data
23
• Shut down Sales
• Navigate to your local ticketing data folder
– C:\ProgramData\SiriuswareDemo\SalesTKT\Data
• Rename DATA to DATA-back-09212011
• Start Sales again
• Why do we rename and not delete?
Exercise – Local Data
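The rename step in this exercise can be scripted so the backup name always carries the date. This is a minimal Python sketch, assuming Sales is already shut down; the function names are illustrative, and the date format simply mirrors the DATA-back-09212011 example (month, day, year).

```python
from datetime import date
from pathlib import Path
from typing import Optional


def backup_name(folder: Path, today: date) -> Path:
    """Build the sibling name, e.g. Data -> Data-back-09212011."""
    return folder.with_name(f"{folder.name}-back-{today:%m%d%Y}")


def set_aside(folder: Path, today: Optional[date] = None) -> Path:
    """Rename the data folder instead of deleting it, so it can be restored."""
    target = backup_name(folder, today or date.today())
    if target.exists():
        # Never overwrite an earlier backup; pick a different name by hand.
        raise FileExistsError(f"{target} already exists")
    folder.rename(target)
    return target
```

Renaming rather than deleting keeps the old data available if it turns out that something had not yet been forwarded to the server.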
24
• What’s the difference between deleting the local data and just using the refresh button?
• What are the risks of doing this?
• How do I see if everything has been forwarded to the server?
Refreshing Local Data
25
• Index files (the ones that end in .cdx) sometimes get corrupted.
– How? Power loss. Sales crash.
• What is an index?
• What are some behaviors that might be caused by a bad index?
• Deleting a CDX file causes the file to get reindexed.
• It’s always safe to shut down sales, delete all CDX files, and restart sales.
• Try it. Look back and see that all the CDX files are back.
Bad Index
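The CDX cleanup can also be scripted. The slide says deleting the CDX files is safe once Sales is shut down; the Python sketch below is a slightly more cautious variation that moves them into a holding folder instead, which is a choice made for this example and not a Siriusware procedure. The function name and the holding folder are illustrative.

```python
import shutil
from pathlib import Path


def set_aside_indexes(data_dir: Path, holding_dir: Path):
    """Move every .cdx file out of data_dir; return the file names moved.

    Run this only while Sales is shut down. Sales rebuilds the
    indexes the next time it starts.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() takes a snapshot of the folder before anything is moved.
    for entry in sorted(data_dir.iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".cdx":
            shutil.move(str(entry), str(holding_dir / entry.name))
            moved.append(entry.name)
    return moved
```

After restarting Sales, look back in the data folder to confirm that the same CDX file names have reappeared.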
26
Let’s mess up our indexes.
Demo
• Delete accounts.cdx
• Copy the zipcode.cdx to be accounts.cdx
• Restart.
• Actually pretty tricky to do.
27
• The item might not show up due to a restriction (either Items or ItemTree):
– Date available for sale
– Time available for sale
– Day of week
– Blackout dates
– Salespoint type
– Insufficient Operator security level
Tech Support Scenario – An Item isn’t showing up
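The restriction list above works like a checklist: each restriction is one independent reason an item could be hidden. The Python sketch below models that reasoning only. The Restrictions class and all of its field names are hypothetical stand-ins and do not reflect the real Items or ItemTree columns.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time
from typing import List, Optional, Set


@dataclass
class Restrictions:
    """Hypothetical stand-in for the restrictions on an Items or ItemTree record."""
    first_sale_date: Optional[date] = None
    last_sale_date: Optional[date] = None
    sale_start_time: Optional[time] = None
    sale_end_time: Optional[time] = None
    weekdays: Optional[Set[int]] = None  # 0 = Monday ... 6 = Sunday, None = any day
    blackout_dates: Set[date] = field(default_factory=set)
    salespoint_types: Optional[Set[str]] = None  # None = any salespoint type
    min_security_level: int = 0


def hidden_reasons(r: Restrictions, now: datetime,
                   salespoint_type: str, operator_level: int) -> List[str]:
    """Return every restriction that would hide the item right now."""
    reasons = []
    today, clock = now.date(), now.time()
    if (r.first_sale_date is not None and today < r.first_sale_date) or \
       (r.last_sale_date is not None and today > r.last_sale_date):
        reasons.append("date available for sale")
    if (r.sale_start_time is not None and clock < r.sale_start_time) or \
       (r.sale_end_time is not None and clock > r.sale_end_time):
        reasons.append("time available for sale")
    if r.weekdays is not None and now.weekday() not in r.weekdays:
        reasons.append("day of week")
    if today in r.blackout_dates:
        reasons.append("blackout dates")
    if r.salespoint_types is not None and salespoint_type not in r.salespoint_types:
        reasons.append("salespoint type")
    if operator_level < r.min_security_level:
        reasons.append("operator security level")
    return reasons
```

Note that several of these depend on the clock, which is why a wrong system date or time on the salespoint can make an item disappear.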
28
• The item might not show up due to a configuration:
– Salespoint ItemTree
– Available quantity (Max4Sale / Points4Sale)
– Matrix item
– Inactive
– Hidden
– Deleted
– INI settings (Default, Group, Local)
• There may be an updating issue with that salespoint
Tech Support Scenario – An Item isn’t showing up
29
• How do we figure out what could be wrong?
– Is it affecting all salespoints or just a single salespoint?
• If it is all salespoints, we should start investigating in SysManager
• If it is a single salespoint, we should start investigating in Sales
– Verify the correct ItemTree is attached to the salespoint.
– Check the restrictions on the Items record
– Check the restrictions on the ItemTree record
– Check the salespoint type (configured vs. salespoint)
– Refresh Items and ItemTree tables (Sales Data Group)
– Try selling without ItemTree
• Set up a barcode/UPC value for the item (multikey lookup)
• Set up a key code for the item (single key lookup)
Tech Support Scenario – An Item isn’t showing up
30
• How do we figure out what could be wrong?
– Using Helper to look in the local data
• Can use helper to find out if the item is in the ItemTree table
• Then do the same for the Items table
• It’s in both, what have we figured out? Could it still be an updating issue?
Tech Support Scenario – An Item isn’t showing up
31
• How do we figure out what could be wrong?
– Look at the INI settings
• Go to the salespoint and use the Tools > Diagnostics > Sales32c.ini button to get:
Tech Support Scenario – An Item isn’t showing up
32
• How do we figure out what could be wrong?
– Everything looks OK, what now?
• Check outside of the Siriusware system
• Check the OS settings (e.g. system date/time)
• Others?
Tech Support Scenario – An Item isn’t showing up
33
Second Problem
Demo
• Where’s my 3 Day Adult item?
• What questions to ask?
34
Second Problem
Demo
• Again, why is that item not showing up?
35
• How do we figure out what could be wrong?
– Using Helper to look in the local data
• Helper is useful for a lot of troubleshooting; it’s good to get to know.
• Available on the web site under downloads – 4.0 & 4.1 Utilities
Tech Support Scenario – An Item isn’t showing up
36
• Close Sales
• Open Windows Explorer and navigate to C:\Program Files\SiriuswareDemo\SiriusFS\Updates
• Run Install_Helper_ and install to SiriuswareDemo\Helper
• It should create a desktop icon and run when complete
• Open your items.dbf table (in C:\ProgramData\SiriuswareDemo\SalesTKT\Data)
• In the filter expression box, type item='3DAY'
• Change the item nickname to ‘3DAYS’
• Restart Sales and notice how you’ve messed up your own system
Exercise – Install Helper and mess up your own data
37
• Why knowing more about the system and having basic troubleshooting skills is good for you and for us.
– Call: I can swipe a credit card into Notepad and it works fine. When I swipe in Sales, nothing happens. One salespoint only.
– Swapping the mag swipe reader doesn’t fix it.
– Checking payment tables in Helper, and it looks good.
– Result: Rebooting fixed it.
– Moral?
• Could have taken much longer without the background information and initial troubleshooting done by the client.
Support Call Example
38
• General Problems
– Update issues (local write permissions, Update folder access permissions, UAC, network connectivity)
– Hardware issues (e.g. failing HDD, RAM, etc.)
– Network issue
– Interfering (non-Sirius) software
– Version mismatch (How to revert versions)
– Registration issue (PrintEZ) (DEMO re-registration)
– Sales won’t start at all: run Sales without the loader to diagnose.
• Sometimes double clicking on the .exe helps (DEMO)
– Poles and cash drawers
• COM port diagnostics
• VirtualPoleDisplay .INI setting
Salespoint Issues
39
• Can be used to test out layouts in real-time
Text Layouts Test Tool
40
• It used to be (it may still be) that printing could cause crashes.
• If you have a product that crashes the system when finalized, a great step is to shut off printing and see if it still crashes.
• If that fixes it, you know you have a printing issue to address.
• The way to fix a layout issue is to make small, incremental changes to a layout that is known good, so that you don’t have too much to sift through to figure out what caused the problem
More about printing and layouts
41
• Possibilities
– You generate them pretty regularly.
– You just have some that are lingering.
• If regularly generating, it really can only be because of:
– Data that gets out of sync, usually from a backup scenario, a power outage, or something similar
– Or, you legitimately have 2 salespoints assigning numbers for the same suffix.
Duplicates
42
• Primary Key Generation
– Each salespoint has a suffix it uses to generate numbers.
– You can see it in SysManager:
– IDs are generated for lots of tables: transact, sale_hdr, guests, etc.
– So, when this salespoint is created it will first generate number 1003001, 2003001, etc.
– FYI: SysManager is salespoint 000.
Tech Support Call – I have duplicates in one of my files
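To see why two salespoints sharing a suffix produce duplicates, it helps to model the numbering. The Python sketch below is a guess at the layout, inferred only from the example numbers in this class (1003001, 2003001, 2001003000): it assumes a key is a running counter followed by a six-digit suffix. The real rules may differ, and the function names are illustrative.

```python
from collections import Counter

# Assumption: the last six digits of a key are the salespoint suffix.
SUFFIX_SPAN = 1_000_000


def make_key(counter: int, suffix: int) -> int:
    """Combine a running counter with a salespoint suffix."""
    return counter * SUFFIX_SPAN + suffix


def split_key(key: int):
    """Return (counter, suffix) for a key built by make_key."""
    return divmod(key, SUFFIX_SPAN)


def find_duplicates(keys):
    """Return the keys that occur more than once, in ascending order."""
    counts = Counter(keys)
    return sorted(key for key, seen in counts.items() if seen > 1)
```

Under this model, two machines that both use suffix 003001 collide as soon as their counters overlap, and a machine whose counter was rolled back (for example after restoring a backup) collides with its own earlier keys.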
43
• Duplicates – how do they get generated?
– Salespoints know what key to use because of a file called MaxKeys.
– These are reset by Tools > Diagnostics > Reset Max Keys
• Duplicates – how do they get repaired?
– We can prevent duplicates on future sales with Reset Max Keys
– We can then repair using Check Duplicates > Repair (automatically runs Reset Max Keys)
– Need a code from Siriusware tech support to repair “real” duplicates
Tech Support Call – I have duplicates in one of my files
44
• Using Helper, we can alter our keys to make a duplicate.
• Using Tools and Diagnostics, we can look at the dupe.
• Using Repair Duplicates, we can fix it.
• What are other ways to make a duplicate?
• Sharing salespoints, copying data files, corruption, altering server data.
• This is where Reset Max Keys comes in handy on salespoints and on the server.
Demo – Making and fixing duplicates
45
1) Shut down Sales
2) Delete the previous log
3) Change the verbosity in the INI file to 5
4) Start Sales
5) Recreate your problem
6) Shut down Sales
7) Grab the sales32c_log.txt file and send it off
8) Tell us what you have done in that log!
– e.g. I sold two passes. The first one, 2001003000 for Michael Guiness, was fine. The second one, 2002003000 for Joan Baez, was assigned the wrong number of starting points in the points 1 field.
How to collect a log file
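Steps 7 and 8 belong together: a log without a description of what you did is hard to use. The Python sketch below bundles the two into one zip file and refuses to build the bundle if the description is empty. The function name and the notes file name are made up for this example.

```python
import zipfile
from pathlib import Path


def package_log(log_path: Path, notes: str, out_zip: Path) -> Path:
    """Zip the log together with a plain-text description of what was done."""
    if not notes.strip():
        raise ValueError("describe what you did while the log was recording")
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(log_path, arcname=log_path.name)
        # Illustrative file name; any obvious name works.
        bundle.writestr("what_i_did.txt", notes)
    return out_zip
```

Zipping also shrinks a verbosity 5 log considerably, since log text compresses well.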
46
• Leave sales running at verbosity 5 and send us a 50MB file. (Sometimes this is the only way…)
• Grab a log and send it when the verbosity is at 2 (default). (There’s almost nothing in there to help.)
• Not tell us what we’re supposed to see in the log
The WRONG ways to collect a log
47
Let’s make a problem and collect a log.
• Stop Sales
• Navigate in Explorer to C:\Program Files\SiriuswareDemo\SalesTKT
• Rename siriusutil.dll to be siriusutil.dll.bak
• Copy zlib.dll and rename the copy to be siriusutil.dll
• Delete the sales32c_log.txt
• Navigate to C:\ProgramData\SiriuswareDemo\SalesTKT
• Delete sales32c_log.txt
• Set the log verbosity to 5 in the sales32c.ini file
• Restart Sales
48
• Say that you have a new sales32c.exe that you want to copy over the old one to test something (renegade).
• The way to make really sure that you’re copying over the old one is to try to do it while it’s in use.
• The operating system won’t let you.
• Then shut down Sales and try again.
• If it succeeds, you can feel pretty sure you found the right file.
Trick on how to make sure you’re looking at the right file
49
• Issue – salespoints are unable to connect to SalesEZ.
• Possibilities are network issues or something being broken on the pool-salesez machine.
• How do you test which?
– Ping machine from salespoint
– Ask the “what changed” question. Was there an update or, god forbid, a security patch?
• Reboot the machine (this can be a test to rule out temporary issues).
• Usually, this will be a level I failure and we’ll be on the phone.
Pool Manager / SalesEZ Issues
50
• If sales are too slow at the salespoint, the experimental approach should be to determine where the slowdown is occurring.
• To do that we need possibilities.
– Local salespoint slowdown
– Network traffic
– Middleware
– SQL Server
• How do we start narrowing this down?
• To determine if it is Pool/SalesEZ, you can look at the number of instances of SalesEZ running and the number of calls per instance.
– Three instances is “normal”
– It is expected to have non-zero values for the number of calls for each of the instances, but the third instance should have far fewer and should not be growing rapidly.
Performance Problems
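Narrowing down a slowdown means measuring each candidate stage and comparing. The Python sketch below is a generic illustration, not a Siriusware tool: time_runs times any repeatable action you give it, and slowest_stage picks the stage with the largest median. The stage labels are whatever you choose to call them.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(action: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run the action several times and return each duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return samples


def slowest_stage(samples_by_stage: Dict[str, List[float]]) -> str:
    """Return the stage label with the largest median duration."""
    return max(
        samples_by_stage,
        key=lambda stage: statistics.median(samples_by_stage[stage]),
    )
```

Using the median rather than a single run keeps one sporadic hiccup from pointing you at the wrong stage.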
51
• First question: is it local?
– Could knock the salespoint offline and see how well it works.
– Could try to isolate which sales take longer and why.
– When in the sale does the slowdown occur?
– Can look at the CPU on the local salespoint. It may be underpowered. How can we test if this is the cause of the slowdown?
Performance Problems
52
• How do we determine if it’s Pool/SalesEZ?
– You can look at the number of instances of SalesEZ running and the number of calls per instance.
– Three instances is “normal”
– It is expected to have non-zero values for the number of calls for each of the instances, but the third instance should have far fewer and should not be growing rapidly.
– Newer pool manager shows “lag”. There may not be enough pool managers.
– CPU usage here may tell us a lot.
Performance Problems
53
• You can separate the SalesEZ load by using another pool manager.
• If salespoints are slow and you shift to another pool-salesez, what are you testing?
• If this does speed things up, what do you know?
• If it doesn’t?
Performance Problems
54
• Is the performance problem on the SQL Server machine itself?
– If an alternate pool-salesez from a fast salespoint is slow, this is likely the issue.
– Often performance is affected by client in-house queries direct to SQL Server
– Can check the CPU usage
– Can go into SQL Server and look for jobs causing trouble.
Performance Problems
55
• Most problems are a big job running in SQL Server
• Most resorts have only 1 SQL Server. Running a long report can slow down your entire resort
• Some have a replicated database for reporting to avoid this problem
• Even with a reporting database, big jobs can slow down the resort
• You can find them using a SQL trace (e.g. from SQL 2005/2008/…):
• But not using Express…
SQL Server
56
• You can look at Activity Monitor (Current Activity in previous versions)
SQL Server
57
• Why would you ever want to do this?
• Navigate to where sales.exe is
• Copy and create a shortcut, rename it to “run wo loader”
• Edit the shortcut and put 1234567 afterwards.
• What does this test?
• Why don’t you want to leave it like this?
Special Bonus – Running Sales w/o the loader
Thank You!
59
• Initial questions for troubleshooting
– Have you rebooted?
– Are you able to launch and run Sales?
• Usually the only recourse if you can’t is to reinstall.
– Are you connected to SalesEZ?
– Is Pool Manager SalesEZ running?
– What’s different between this salespoint and the ones that work?
– Hardware? If applicable, you can swap with a machine that’s working (magnetic swipe reader, printer, touchscreen, PINPad, etc.).
More Philosophy of Troubleshooting
60
• Pool / SalesEZ diagnostic: separate the SalesEZ load by using another pool manager.
– What does this test exactly?
– Very useful for access control (GateKeeper/ScanMan)
– Changes need to be made to the RunExe.ini file, the Pool.ini file, and a new shortcut must be created to run the new pool manager instance.
– The Pool.ini file (usually found in c:\Program Files\Siriusware\Pool) needs a new section:
[SalesEZ-2]
Name=salesez40.basic_sales
ListenPort=4251
Pool=3
Performance Problems
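Adding the [SalesEZ-2] section by hand is easy to get subtly wrong, for example by reusing a port. The Python sketch below shows one cautious way to do it on the text of a Pool.ini file: it appends the section from this slide unchanged, skips the edit if the section already exists, and refuses if port 4251 is already taken. The function names are illustrative, and it assumes Pool.ini is a plain INI file that configparser can read. It does not touch RunExe.ini or the shortcut.

```python
import configparser

# Section text exactly as given for the second pool manager instance.
NEW_SECTION = (
    "[SalesEZ-2]\n"
    "Name=salesez40.basic_sales\n"
    "ListenPort=4251\n"
    "Pool=3\n"
)


def _parse(ini_text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, allow_no_value=True
    )
    parser.read_string(ini_text)
    return parser


def listen_ports(ini_text: str) -> dict:
    """Map each section that has a ListenPort setting to its port value."""
    parser = _parse(ini_text)
    return {
        section: parser[section]["ListenPort"]
        for section in parser.sections()
        if "ListenPort" in parser[section]
    }


def with_second_salesez(ini_text: str) -> str:
    """Return the INI text with [SalesEZ-2] appended, if it is not there yet."""
    if _parse(ini_text).has_section("SalesEZ-2"):
        return ini_text
    if "4251" in listen_ports(ini_text).values():
        raise ValueError("port 4251 is already used by another section")
    separator = "" if ini_text == "" or ini_text.endswith("\n") else "\n"
    return ini_text + separator + NEW_SECTION
```

Appending text, rather than rewriting the file through configparser, leaves the existing lines and any comments exactly as they were. Keep a copy of the original Pool.ini before saving the result.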
61
• Salespoint and other end user applications (e.g. ScanMan/GateKeeper)
– This is where you’re going to notice the problems in general. These are the applications that will have the most impact on your guests’ experience.
– Credit cards, printing, finalization, pass generation, access control.
– We will be spending most of our time here.
• Middleware (Pool Manager & SalesEZ)
– Generally very stable, but impacts whole resort when there’s a problem
• Back Office (SysManager & ReportManager)
• SQL Server
– These can affect the whole resort as well.
• Infrastructure
– Network, power, cabling, etc.
Key Components (Where can things go wrong?)