Cisco IT and ThousandEyes
How Cisco IT Gains Visibility into Cloud Service Stability and Troubleshooting

Andrea Di Lecce
IT Technical Project Manager, Cisco Systems
03/2016


Why ThousandEyes?

Cisco Confidential 3© 2013-2014 Cisco and/or its affiliates. All rights reserved.

Cisco’s Requirements

• Once off the Cisco network, the “Cloud” is basically a black box

• Ping and traceroute have limited capability and no ability to alert or keep historical information

• The growing importance and criticality of Cloud solutions require a comprehensive solution!

Cisco’s Goals

• Monitor a growing suite of Cloud solutions and end-to-end network health – latency, packet loss, Web transactions, BGP reachability

• Constant monitoring and alerting from critical, strategic network locations

Cisco’s Deployment of ThousandEyes


Cisco’s Enterprise Agent Deployment

• Strategic enterprise agent placement at high-priority sites
  • Call centers
  • iPoPs
  • High-priority sales sites

• Business-critical Cloud and internal services monitored
  • Salesforce
  • WebEx
  • TAC tools via Akamai


Cisco’s Cloud Agent Usage

• Business-critical internal services monitored from ThousandEyes’ Cloud agents
  • WebEx
  • TAC tools via Akamai

• BGP reachability of the public IP address ranges that host our services, monitored from Cloud agents

Success Stories & Lessons Learned


Metrics Definition

• Mean Time to Troubleshoot (MTTT)
  • The time it takes from the start of the incident to when the Engineer has narrowed down the source of the issue

• Mean Time to Restore (MTTR)
  • The time it takes from the start of the incident to when the service is restored
  • For incidents with external providers, the timeline after handoff is beyond Cisco’s control
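As an illustration of the two definitions above, the following is a minimal sketch of how MTTT and MTTR could be computed from incident timestamps. The Incident record, its field names, and the sample incidents are hypothetical and do not come from Cisco's tooling or data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record layout; field names are illustrative only.
    started: datetime    # start of the incident
    isolated: datetime   # Engineer has narrowed down the source of the issue
    restored: datetime   # service is restored


def mean_minutes(durations):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mttt(incidents):
    """Mean Time to Troubleshoot: incident start -> source narrowed down."""
    return mean_minutes([i.isolated - i.started for i in incidents])


def mttr(incidents):
    """Mean Time to Restore: incident start -> service restored."""
    return mean_minutes([i.restored - i.started for i in incidents])


# Made-up sample incidents, for illustration only.
t0 = datetime(2016, 1, 1, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=30), t0 + timedelta(minutes=90)),
    Incident(t0, t0 + timedelta(minutes=50), t0 + timedelta(minutes=150)),
]
print(mttt(sample), mttr(sample))  # 40.0 120.0
```

Both metrics are measured from the start of the incident, so MTTR always includes the troubleshooting time counted in MTTT.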


ThousandEyes Success Story - Business Outcomes

1. Reduce Mean Time to Troubleshoot (MTTT) for applicable network events by 43% (measured)

2. Reduce Mean Time to Restore (MTTR) for applicable network events by 8% (measured)

NOTE: Reducing MTTT and MTTR reduces Engineering time and total outage time.
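The percentages above are relative reductions against a baseline. A minimal sketch of that arithmetic follows; the baseline and current values are hypothetical, chosen only to show what a 43% reduction looks like, and are not figures from this deck.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


# Hypothetical MTTT values in minutes; not measurements from Cisco IT.
baseline_mttt = 100
current_mttt = 57
print(round(percent_reduction(baseline_mttt, current_mttt)))  # 43
```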


ThousandEyes Success Story - WebEx

• Business case for ThousandEyes: The issue was automatically detected by the program, which also pinpointed that the packet loss was occurring internal to the Cisco network. This allowed Engineers to concentrate their troubleshooting efforts on the device in question, and resolve the issue quickly.

• Troubleshooting: It was determined via ThousandEyes that there was packet loss into the WebEx service from an internal network device.

• Resolution: Within 90 minutes, the Engineers resolved the case, because they knew where the packet loss was occurring.


ThousandEyes Success Story - Salesforce

• Business case for ThousandEyes: The issue was automatically detected by the program, which also pinpointed that the packet loss was occurring external to the Cisco network. This allowed Engineers to perform only the basic internal network checks, and then hand off to Salesforce to fix their network.

• Troubleshooting: It was determined via ThousandEyes that there was packet loss into salesforce.com from two India sites.

• Resolution: Within one hour, the P2 case was handed to Salesforce.com for investigation.
  • The issue was isolated to a saturated Level3 ISP link in Salesforce's network and resolved.
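The internal-versus-external distinction drives the hand-off decision in this story and in the WebEx one. The sketch below shows one simple way such a classification could work on per-hop loss data. It is not how ThousandEyes implements path analysis: the hop data is made up, RFC 1918 ranges stand in for the enterprise address space, and "first hop above a loss threshold" is a deliberate simplification.

```python
import ipaddress

# Hypothetical internal address space; RFC 1918 ranges stand in for the
# enterprise network here.
INTERNAL_NETS = [ipaddress.ip_network(n) for n in
                 ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_internal(addr):
    """True if `addr` falls inside one of the assumed internal ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def locate_loss(hops, threshold=1.0):
    """Return (address, 'internal' | 'external') for the first hop whose
    loss percentage exceeds `threshold`, or None if no hop does.

    `hops` is an ordered list of (address, loss_percent) pairs, source first.
    """
    for addr, loss in hops:
        if loss > threshold:
            return addr, ("internal" if is_internal(addr) else "external")
    return None


# Made-up path: clean inside the enterprise, lossy at a provider hop.
path = [("10.1.1.1", 0.0), ("10.9.0.1", 0.0),
        ("203.0.113.7", 12.5), ("198.51.100.20", 11.0)]
print(locate_loss(path))  # ('203.0.113.7', 'external')
```

An "external" result corresponds to the case above: run the basic internal checks, then hand the case to the provider.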


ThousandEyes Success Story – India Firewall

• Business case for ThousandEyes: The issue was automatically detected by ThousandEyes, which also pinpointed that the packet loss was occurring on a specific device within the Cisco network. This allowed Engineers to resolve the issue quickly by failing over to the redundant gateway.

• Troubleshooting: Alert from ThousandEyes detected packet loss on our India corporate gateway, which affected all India sites.

• Resolution: Within one hour, the P1 case was identified with root cause, which was 100% CPU utilization on the corporate gateway device.
  • The issue was resolved by failing over to the backup corporate gateway.


ThousandEyes Success Story – India Support Apps

• Business case for ThousandEyes: The issue was proactively detected by ThousandEyes, and troubleshooting with ThousandEyes pinpointed the root cause. Since the issue was intermittent, ThousandEyes prevented multiple subsequent P2 outages.

• Troubleshooting: Alert from ThousandEyes detected packet loss from a TAC site to the Cisco TAC apps portal (served by Akamai).

• Resolution: ThousandEyes indicated a problematic link between Bharti and Akamai. This prompted Akamai to remove the server from rotation, which immediately restored services.


Lessons Learned

1. Email alert volume was extremely high due to transient network issues
  • Solution: Configure alerts only if event occurs more than 2x in a row

2. VirtualBox VM did not auto-start after software update and reload of Mac Mini box
  • Solution: Procedure provided by ThousandEyes
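The fix for the alert-volume lesson above amounts to suppressing alerts until a failure repeats across consecutive test rounds. Below is a minimal sketch of that logic. The ConsecutiveAlert class and its parameter names are hypothetical and are not the ThousandEyes alert-rule configuration.

```python
class ConsecutiveAlert:
    """Fire only after more than `min_repeats` consecutive failing rounds."""

    def __init__(self, min_repeats=2):
        self.min_repeats = min_repeats
        self.streak = 0  # number of failing rounds in a row so far

    def observe(self, failed):
        """Record one test round; return True if an alert should fire."""
        self.streak = self.streak + 1 if failed else 0
        return self.streak > self.min_repeats


rule = ConsecutiveAlert(min_repeats=2)
rounds = [True, False, True, True, True, True, False]
print([rule.observe(r) for r in rounds])
# [False, False, False, False, True, True, False]
```

A single transient failure (the first round) never alerts, because any passing round resets the streak. In this sketch the rule keeps returning True on every round while the failure persists, so a real notifier would likely send email only on the first True of a streak.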

Looking Toward the Future


• ThousandEyes has certified their application to run on the Cisco 4451 ISR Service Container
  • Cisco Design is currently testing this for deployment on our network
  • Our goal is to install ThousandEyes on the service containers on our existing WAN routers

No more Mac Minis!


• ThousandEyes is being integrated with our Network Operations standard alerting system

• ThousandEyes is working on certifying its application on the Cisco ASR 1000 Service Container

Thank you.