Is Evergreen Slowing Down – Basic Network Troubleshooting
Rogan Hamby, June 13th 2013
Is Evergreen slowing down?
There could be several culprits.
Staff Client Issues
There are known memory leaks in the staff client. These are being
actively addressed by the community.
If this is the problem, it probably isn't affecting all stations equally.
Reboot the troubled station.
Network Issues
From your local switch having fits, to a router in Tennessee dying, to
someone in Atlanta doing a thirteen-terabit backup, we are at the mercy
of the pipes in between.
Usually these problems grow slowly. All machines will be affected, but it may not seem that way at first because some activities are more prone to
interruption than others.
Staff facing patrons, and staff whose work moves large data frames (e.g., cataloging), will usually notice
first because lost packets and latency have the greatest
perceivable impact on them.
Next, look at your network path. The paths from SCLENDS member libraries to the
hosting facility share many common elements, but none are universal except the
last few hops.
If you use ICMP- or UDP-based tools, be aware of the false positives they
can give, since that traffic is often blocked.
I recommend using TCP-based traceroutes.
Windows – Pingplotter Pro
http://www.pingplotter.com/pro/
Linux – traceroute -T (requires root)
Mac – Path Analyzer Pro (uses protocol paths, not just hops)
http://www.pathanalyzer.com/
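If none of those tools is at hand, a rough TCP-based check can still be scripted. The sketch below is not a traceroute (it does not probe individual hops); it only times complete TCP handshakes to one service port, repeats them, and reports loss and latency. Because it uses the same protocol as real traffic, ICMP filtering cannot produce false positives here. The hostname and port in the usage line are hypothetical placeholders; substitute the address and port your staff clients actually connect to.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, attempts=5, timeout=2.0):
    """Time full TCP handshakes to host:port.

    Returns one entry per attempt: the handshake time in milliseconds,
    or None if the connection failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection completes the three-way handshake; close right away.
            with socket.create_connection((host, port), timeout=timeout):
                pass
            results.append((time.perf_counter() - start) * 1000.0)
        except OSError:  # includes refused connections and socket timeouts
            results.append(None)
    return results


def summarize(results):
    """Reduce a list of handshake times (None = failure) to loss and latency figures."""
    ok = [r for r in results if r is not None]
    lost = len(results) - len(ok)
    return {
        "sent": len(results),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(results) if results else 0.0,
        "min_ms": min(ok) if ok else None,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

Example (hypothetical host): `summarize(tcp_connect_times("evergreen.example.org", 443, attempts=20))`. Running it from several branches at the same time helps show whether the problem is local to one site or shared by all of them.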
If the issue is on your local LAN or anywhere in SC and is ongoing, you need to address it either
internally or with the state-level E-rate board.
If the issue is outside SC, we can try to appeal for a remedy or a routing change, but we can't
guarantee results.
If the issue is at the hosting facility we can fix the issues immediately.
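The escalation rules above can be sketched as a small helper, under stated assumptions: the `Hop` type, the region labels (`"local"`, `"sc"`, `"outside"`, `"hosting"`) and the 1% loss threshold are all hypothetical, and the per-hop data would come from whichever TCP-based trace tool you use. It also encodes a common rule of thumb for reading traces: loss that appears at one intermediate hop but disappears at later hops is usually that router declining to answer probes, not real loss, so only loss that persists all the way to the destination counts.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    region: str      # hypothetical labels: "local", "sc", "outside", "hosting"
    loss_pct: float  # packet loss measured at this hop


# Next steps paraphrased from the slides; keys are the hypothetical region labels.
ACTIONS = {
    "local": "Address internally or with the state-level E-rate board.",
    "sc": "Address internally or with the state-level E-rate board.",
    "outside": "SCLENDS can appeal for a remedy or routing change (no guarantees).",
    "hosting": "Report it; issues at the hosting facility can be fixed directly.",
}


def first_persistent_loss(hops, threshold=1.0):
    """Index of the first hop whose loss stays >= threshold at every later hop, else None."""
    for i in range(len(hops)):
        if all(h.loss_pct >= threshold for h in hops[i:]):
            return i
    return None


def next_step(hops, threshold=1.0):
    """Return (offending hop, suggested action), or (None, note) if no loss persists."""
    i = first_persistent_loss(hops, threshold)
    if i is None:
        return None, "No loss persists to the destination; the path is probably not the problem."
    return hops[i], ACTIONS[hops[i].region]
```

In a trace where only a Tennessee router shows 20% loss but every later hop shows none, `next_step` returns `None`, matching the advice to distrust isolated loss at a single hop.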
Standard Traceroute
TCP Based Path
So… what if everything so far looks clear?
It’s a SERVER(s)!
Our Setup
Load Balancer
App Servers
Production
Replication and Reporting
Database Servers
How can I tell which has just gone to meet Werner Jacob?
(warning: broad simplifications ahead)
If it's the DB servers, everything goes to heck, starting with database
retrieval, and the errors will usually mention 'SQL' somewhere. But it happens
quickly!
If it's only the replication server, then only reports, including notices,
will be affected.
App bricks – it's very rare for all four app bricks to fail at once, so usually some machines will do fine while others have issues, or the problems will appear
random.
Example: When catalogers have template issues, they may have lost them on one brick but not others.
When a brick crashes, you will usually get errors referencing
various .pm files (Perl modules) or specific scripts.
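Why can a single failed brick look random? A toy model helps. This sketch assumes, purely for illustration, that the load balancer sends each request to a brick chosen uniformly at random; the real balancing method, the brick and station names, and the request counts are all hypothetical.

```python
import random
from collections import Counter


def simulate_requests(bricks, failed, stations, requests_per_station, seed=0):
    """Toy model: route each request to a uniformly random brick.

    Returns {station: number of requests that landed on a failed brick}.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    failures = Counter()
    for station in stations:
        for _ in range(requests_per_station):
            if rng.choice(bricks) in failed:
                failures[station] += 1
    # Include stations that saw no failures at all.
    return {station: failures[station] for station in stations}
```

With one of four bricks down, roughly a quarter of requests fail overall, but with only a few requests per station (for example `simulate_requests(["b1", "b2", "b3", "b4"], {"b3"}, ["s1", "s2", "s3"], 5)`) some stations typically see no failures while others see several, which is exactly the "some machines fine, others not" pattern.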
When it's the load balancer, everything slows down painfully and then goes to heck. Eventually stations will time out, and the errors will
reflect that.
Don't jump to conclusions, but these examples should give you some
insight into what to look for.
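The rules of thumb above can be written down as a rough triage checklist. This is only a sketch: the observation flags and component labels are hypothetical names chosen to mirror the slides, and, like the slides, it is a broad simplification that points at where to look rather than a diagnosis.

```python
def suspect_components(errors_mention_sql=False,
                       only_reports_affected=False,
                       errors_mention_pm_files=False,
                       some_stations_fine=False,
                       everything_slow_with_timeouts=False):
    """Map observed symptoms to the server component(s) worth checking first."""
    suspects = []
    # Report and notice failures point at the replication/reporting DB,
    # even if those errors also mention SQL.
    if only_reports_affected:
        suspects.append("replication/reporting database server")
    elif errors_mention_sql:
        suspects.append("production database server")
    # Perl module errors, or uneven/random-looking failures, suggest a brick.
    if errors_mention_pm_files or some_stations_fine:
        suspects.append("one or more app bricks")
    # A uniform, painful slowdown ending in timeouts suggests the load balancer.
    if everything_slow_with_timeouts and not some_stations_fine:
        suspects.append("load balancer")
    return suspects
```

For example, `suspect_components(errors_mention_pm_files=True, some_stations_fine=True)` returns `["one or more app bricks"]`. An empty result means none of these patterns matched, which is a cue to gather and report more detail.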
Copy errors. Observe and report. Communicate on the listserv. An
SCLENDS-specific IRC channel is also available. Call Rogan in an
emergency (he's not always at his desk).