Root-Cause Network Troubleshooting: Optimizing the Process

Tim Titus, CTO, PathSolutions



Agenda

• Business disconnect
• Why is troubleshooting so hard?
• Troubleshooting methodology
• Tool selection
• Finding the root-cause
• Achieving Total Network Visibility


Business Disconnect

• You’re responsible for the entire network
• Most network engineers know less about their network’s health and performance than their user community

You can’t manage what you can’t measure
-- Peter Drucker


Why is Troubleshooting so Hard?

Business Reasons
• Networks are getting more complex
• Fewer staff remain to support the network

Technical Reasons
• Proper methodology is not utilized
• Wrong tools are employed


Troubleshooting Methodologies

What graduates a junior-level engineer to a senior-level engineer is their troubleshooting methodology


Bad Methodology

“Do something to try to fix the problem”

• Reboot the device
• Change the network settings
• Replace hardware
• Re-install the OS


Good Methodology

[Flowchart with the following steps:]
• Collect information
• Verify original problem is solved and no new problems exist
• Create hypothesis
• Test hypothesis
• Implement fix
• Document fix
• Notify users
• Undo changes


Tool Selection

Types of Tools
• Cable testers
• Packet analyzers/capture
• Application Performance Monitoring (APM)
• Flow collectors
• SNMP collectors


Cable Testers

Using a cable tester to solve a call quality problem

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing an actual VoIP call and a cable tester]

Cable tester results:
• 4.3 dB of loss
• NEXT detected

You have information about Layer 1 on one link in the network


Cable Testers

Good for:
• Confirming physical issues on one link in the network

Bad for:
• Finding physical issues on the network
• Determining application usage
• Finding bandwidth limitations
• Finding device limitations


Packet Capture

Using a sniffer to solve a call quality problem

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing an actual VoIP call and a packet capture]

Results of VoIP call:
• Latency: 127 ms
• Jitter: 87 ms
• Packet loss: 8.2%

You have confirmation that there is a problem, but no idea which device or link caused the packet loss
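The latency, jitter, and packet loss figures above are the kind of numbers a sniffer derives from the RTP stream of a call. As a rough sketch only: assuming each captured packet has been reduced to a hypothetical (sequence number, RTP timestamp, arrival time) tuple, loss and the RFC 3550 running interarrival jitter estimate could be computed as follows. The function name, tuple layout, and 8000 Hz clock rate are assumptions for illustration, and sequence-number wraparound is ignored.

```python
def rtp_stats(packets, clock_rate=8000):
    """Return (loss_pct, jitter_ms) for one RTP stream.

    packets: list of (seq, rtp_timestamp, arrival_seconds) in arrival order.
    This tuple layout is a hypothetical simplification of a real capture.
    """
    if not packets:
        raise ValueError("no packets captured")

    # Loss: gaps in the sequence numbers between the first and last seen.
    seqs = {seq for seq, _, _ in packets}
    expected = max(seqs) - min(seqs) + 1
    loss_pct = 100.0 * (expected - len(seqs)) / expected

    # Jitter: RFC 3550 running estimate, J += (|D| - J) / 16, where D is the
    # change in transit time between consecutive packets (timestamp units).
    jitter = 0.0
    prev_transit = None
    for _, rtp_ts, arrival_s in packets:
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit

    return loss_pct, 1000.0 * jitter / clock_rate
```

Note that this only reproduces what the slide says a capture gives you: it confirms that packets are missing or late, but says nothing about which device or link caused it.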


Packet Capture

Good for:
• Confirming packet loss
  (Are we missing packets?)
• Confirming packet contents issues
  (No QoS tagging on packets when there should be)
• Determining application-level issues
  (Source and destination IP and ports used for a session)

Bad for:
• Finding physical, data-link, or network issues
• Finding bandwidth limitations
• Finding device limitations


Application Performance Monitoring

Using APM to determine performance through the network

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing a simulated VoIP call between two agents]

Results of simulation:
• Latency: 127 ms
• Jitter: 87 ms
• Packet loss: 8.2%

You have knowledge of the experience across the network, but no understanding of the source or cause of the problem.


Application Performance Monitoring

Good for:
• Measuring user experience across the network
  (Are we having problems right now?)

Bad for:
• Finding physical, data-link, or network issues
• Finding bandwidth limitations
• Finding device limitations


Flow Collectors

Using a flow collector to determine usage of the network

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing an actual VoIP call, a flow record, and a flow collector]

Results of flow:
• Source IP: 192.168.1.12:80
• Destination IP: 172.16.3.98:3411
• Packets: 251
• Bytes: 19,386

You have knowledge of a transfer across the network, but no indication of whether there were any problems with the transfer.


Flow Collectors

Good for:
• Determining communications across the network
  Who is using a link?
  When do they use it?
  What do they use it for?

Bad for:
• Finding physical, data-link, or network issues
• Finding bandwidth limitations
• Finding device limitations
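To make the "who is using a link?" question concrete, here is a minimal sketch of how flow records might be totalled per source address. The `FlowRecord` fields and the `top_talkers` helper are hypothetical names chosen for illustration and do not correspond to any particular flow export format or product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified flow record (not a real NetFlow/IPFIX layout).
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets: int
    byte_count: int


def top_talkers(flows, n=3):
    """Return the n source IPs that sent the most bytes, largest first."""
    totals = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)
```

A record like the one on the earlier slide would be `FlowRecord("192.168.1.12", 80, "172.16.3.98", 3411, 251, 19386)`. Nothing in it indicates loss or delay, which matches the "bad for" list above.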


SNMP Collectors

Collecting information from switches and routers to discover faults

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing an actual VoIP call and an SNMP collector]

Results of collection:
• WAN link is overloaded at 2:35pm

You have data about conditions on some parts of the network, but no analysis of the problem or correlation to events


SNMP Collectors

Good for:
• Tracking packet loss per interface/device
  (Are we dropping packets on a link? Why?)
• Monitoring device and link resource limitations
  (Are we over-utilizing a link? Is the router CPU pegged?)

Bad for:
• Determining who is using the network
• Finding application layer problems
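The "are we over-utilizing a link?" check usually comes down to arithmetic on two successive polls of an interface octet counter (for example IF-MIB `ifInOctets` or `ifOutOctets`) and the interface speed (`ifSpeed`). The sketch below assumes 32-bit counters that wrap at most once between polls; the function names are illustrative, and no actual SNMP polling is shown.

```python
def counter_delta(old, new, bits=32):
    """Difference between two counter readings, allowing for one wrap."""
    return (new - old) % (1 << bits)


def utilization_pct(old_octets, new_octets, interval_s, if_speed_bps, bits=32):
    """Average link utilization over one polling interval, in percent."""
    delta_bits = counter_delta(old_octets, new_octets, bits) * 8
    return 100.0 * delta_bits / (interval_s * if_speed_bps)
```

For example, 375,000,000 octets in a 300-second interval on a 10 Mbps link works out to 100% utilization. On fast links a 32-bit octet counter can wrap more than once in five minutes, which is why 64-bit counters are generally preferred where the device supports them.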


Finding the Root-Cause

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53, showing a poor quality VoIP call]

Step 1: Identify the involved endpoints and where they are connected into the network


Finding the Root-Cause

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53]

Step 2: Identify the full layer-2 path through the network from the first phone to the second phone
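One way to picture Step 2: if each switch's MAC address table and the inter-switch links are already known, the layer-2 path can be walked hop by hop. This is only a sketch under those assumptions. The `fdb` and `links` dictionaries and the `trace_l2_path` function are hypothetical data structures for illustration, not the output format of any switch or tool.

```python
def trace_l2_path(fdb, links, start_switch, dst_mac, max_hops=32):
    """Follow dst_mac through per-switch MAC tables.

    fdb:   {switch: {mac: egress_port}}        (hypothetical layout)
    links: {(switch, port): neighbor_switch}   (inter-switch links only)
    Returns a list of (switch, egress_port) hops ending at the edge port.
    """
    path = []
    visited = set()
    switch = start_switch
    for _ in range(max_hops):
        if switch in visited:
            raise ValueError(f"forwarding loop detected at {switch}")
        visited.add(switch)

        port = fdb.get(switch, {}).get(dst_mac)
        if port is None:
            raise LookupError(f"{switch} has no entry for {dst_mac}")
        path.append((switch, port))

        neighbor = links.get((switch, port))
        if neighbor is None:
            return path  # port is not an inter-switch link: endpoint reached
        switch = neighbor
    raise ValueError("path exceeds max_hops")
```

The walk covers one direction only; the return path would be traced separately, starting from the switch where the second phone is attached.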


Finding the Root-Cause

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53]

Step 3: Investigate involved switch and router health (CPU & memory) for acceptable levels


Finding the Root-Cause

[Network diagram with labels E, H, FDB, x41, x42, x43, x51, x52, x53]

Step 4: Investigate involved interfaces for:
• VLAN assignment
• DiffServ/QoS tagging
• Queuing configuration
• 802.1p priority settings
• Duplex mismatches
• Cable faults
• Half-duplex operation
• Broadcast storms
• Incorrect speed settings
• Over-subscription

TRANSIENT PROBLEM WARNING: If the error condition is no longer occurring when this investigation is performed, you may not catch the problem
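A few of the Step 4 checks (duplex mismatches, half-duplex operation, incorrect speed settings) amount to comparing the two ends of each link on the path. The sketch below assumes the speed and duplex of both ends have already been collected into plain dictionaries; the field names and the `find_link_mismatches` function are hypothetical.

```python
def find_link_mismatches(links):
    """Compare both ends of each link and describe any disagreement.

    links: list of (end_a, end_b) pairs, each end a dict with the
    hypothetical keys "name", "speed_mbps", and "duplex" ("full"/"half").
    """
    problems = []
    for a, b in links:
        label = f"{a['name']} <-> {b['name']}"
        if a["duplex"] != b["duplex"]:
            problems.append(
                f"{label}: duplex mismatch ({a['duplex']} vs {b['duplex']})"
            )
        elif a["duplex"] == "half":
            problems.append(f"{label}: half-duplex operation")
        if a["speed_mbps"] != b["speed_mbps"]:
            problems.append(
                f"{label}: speed mismatch "
                f"({a['speed_mbps']} vs {b['speed_mbps']} Mbps)"
            )
    return problems
```

A configuration comparison like this is not affected by the transient-problem warning, but counter-based checks (cable faults, broadcast storms, over-subscription) are, since they only reveal a problem if data was collected while it was happening.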


Optimizing the Methodology

In a perfect world, you want:

• Monitoring of:
  • Every switch, router, and link in the entire infrastructure
  • All error counters on the interfaces
  • QoS configuration and performance
• Continuous collection of information
• Automatic layer-1, 2, and 3 mapping from any IP endpoint to any other IP endpoint
• Problems identified in plain-English for rapid remediation

This is what PathSolutions TotalView does


Deployment

Install TotalView

All switches and routers are queried for information

Result: One location is able to monitor all devices and links in the entire network for performance and errors


Total Network Visibility®

• Broad: All ports on all routers & switches
• Continuous: Health collected every 5 minutes
• Deep: 18 different error counters collected and analyzed
• Network Prescription engine provides plain-English descriptions of errors:

“This interface is dropping 12% of its packets due to a cable fault”
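To illustrate how a loss percentage like the one quoted above could in principle be derived from counters, here is a minimal sketch. It is not a description of how TotalView or its Network Prescription engine works. The function name, the input layout, and the idea of labelling each error counter with a likely cause are all assumptions made for illustration.

```python
def loss_report(interface, ok_packets, errors_by_cause):
    """Summarize one polling interval for one interface.

    ok_packets:      packets handled without error during the interval
    errors_by_cause: {cause description: packets lost} (hypothetical mapping)
    Returns a plain-English sentence, or None if nothing was lost.
    """
    lost = sum(errors_by_cause.values())
    total = ok_packets + lost
    if total == 0 or lost == 0:
        return None

    # Report the cause that accounts for the most lost packets.
    main_cause = max(errors_by_cause, key=errors_by_cause.get)
    loss_pct = 100.0 * lost / total
    return (
        f"{interface} is dropping {loss_pct:.0f}% of its packets, "
        f"mostly due to {main_cause}"
    )
```

Called with 880 good packets and `{"a cable fault": 100, "discards": 20}`, this reports a 12% drop rate attributed mostly to a cable fault.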


Results Within 12 Minutes

Establish Baseline of Network Health

• 7% loss from cabling fault
• 12% loss from alignment errors
• 11% loss from jumbo frame misconfiguration
• 28% loss from duplex mismatch


Results Within 12 Minutes

Repair Issues

• 7% loss from cabling fault
• 12% loss from alignment errors
• 28% loss from duplex mismatch
• 11% loss from jumbo frame misconfiguration


Path Analysis Report

• 11:32am: 100% transmit utilization, 15% loss from discards (latency & jitter penalty incurred)
• 7:56am: 18% loss from cable fault
• 12:02pm: 12% loss from collisions


Demo


Don’t turtle your network


Free Network Equipment Magnet Set

With it, you will always have an easy way to map out your network on any white board!

www.PathSolutions.com
