Operational Analytics “Data is the New Soil” –David McCandless October 16th 2015 Darrell Westbury Director of Operational Analytics

Operational Analytics at Credit Suisse from ThousandEyes Connect

Page 1: Operational Analytics at Credit Suisse from ThousandEyes Connect

Operational Analytics “Data is the New Soil” –David McCandless

October 16th 2015

Darrell Westbury Director of Operational Analytics

Page 2: Operational Analytics at Credit Suisse from ThousandEyes Connect

About me….

§  I’m Darrell Westbury

§  I work in Global Technology Services for Credit Suisse

§  Credit Suisse is a Fortune 200, Swiss-based investment bank with about 46,000 employees worldwide

§  I’ve worked at CS for ~7 years

§  I’ve held various roles, including:
   • Head of Storage Ops for the Americas
   • Head of Capacity and Inventory Services
   • Director of Operational Analytics (current)

Page 3: Operational Analytics at Credit Suisse from ThousandEyes Connect

“Data is the new Soil” - David McCandless

Page 4: Operational Analytics at Credit Suisse from ThousandEyes Connect

What is Operational Analytics?

“Operations Analytics (OA) is an approach or method of applying big data principles and data analytics to the IT Operations realm”

“OA centers on discovering trends and patterns in high volume, complex and noisy IT systems data and making predictions that will help avoid impact to key services where possible and ‘recover quickly’ when issues do occur” (that is, reduced MTTR)
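
As a rough illustration of the MTTR figure mentioned above, the sketch below computes a mean time to recover from a list of incident records. It assumes MTTR is read as "mean time to recover"; the function name `mean_time_to_recover` and the incident timestamps are invented for illustration and are not Credit Suisse data.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average (recovered - detected) duration across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (recovered - detected for detected, recovered in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident records: (detected, recovered).
incidents = [
    (datetime(2015, 10, 1, 9, 0), datetime(2015, 10, 1, 9, 30)),
    (datetime(2015, 10, 5, 14, 0), datetime(2015, 10, 5, 15, 0)),
    (datetime(2015, 10, 9, 22, 15), datetime(2015, 10, 9, 22, 45)),
]

mttr = mean_time_to_recover(incidents)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")  # MTTR: 40 minutes
```

Tracking this number over time is one way to check whether analytics work is actually shortening recovery.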

Page 5: Operational Analytics at Credit Suisse from ThousandEyes Connect

What Types of Data Does OA Target?

§  Machine Data: System and Application logs, System Events, Performance and Capacity Metrics…

§  Wire Data: Network Packet Captures that have been pre-decoded for ease of use

§  Agent Data: Intercepted system calls and Application Method Invocations

§  Synthetic Transactions: Simulated views of a customer’s experience while interacting with a service

§  Human Maintained: Asset Inventories, Lifecycle Status, Data classes, App Names and the people who manage them, etc.

Page 6: Operational Analytics at Credit Suisse from ThousandEyes Connect

So, What Does OA Actually Do? Phase I: Data Onboarding

§  Identify Golden Data Sources
§  Extract and Transform (sometimes)
§  Load & Ingest

§  Manage Data Quality
§  Reference Data, Maturity Scales
§  Accountable Data Owners

(Chart: EFFORT on a 0% to 100% scale; phase shown: Data Onboarding)

§  Track Progress with Score cards
§  Remove any Blockers
§  Update Data Documentation

Page 7: Operational Analytics at Credit Suisse from ThousandEyes Connect

So, What Does OA Actually Do? Phase II: Data Science

§  Identify trends & seasonal patterns in data
§  Look for baselines & outliers using statistics and linear algebra techniques
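
A minimal sketch of the baseline-and-outlier idea, assuming a trailing-window mean as the baseline and a z-score test for outliers. This is one common statistical approach chosen for illustration, not necessarily the technique used at Credit Suisse; the function name, window size, threshold, and sample values are all hypothetical.

```python
from statistics import mean, stdev


def find_outliers(values, window=5, threshold=3.0):
    """Flag points that deviate from a trailing-window baseline.

    For each point, the baseline is the mean of the previous `window`
    points; the point is an outlier when it lies more than `threshold`
    sample standard deviations away from that mean.
    """
    outliers = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Flat history: any change at all counts as a deviation.
            is_outlier = values[i] != baseline
        else:
            is_outlier = abs(values[i] - baseline) / spread > threshold
        if is_outlier:
            outliers.append((i, values[i]))
    return outliers


# Hypothetical response-time samples in milliseconds.
samples = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]
print(find_outliers(samples))  # [(7, 250)]
```

A trailing window keeps the baseline local, so slow drift is absorbed while sudden spikes stand out. Seasonal data would need a baseline drawn from the same hour or weekday rather than from the immediately preceding points.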

(Chart: EFFORT on a 0% to 100% scale; phases shown: Data Onboarding, Data Science)

Page 8: Operational Analytics at Credit Suisse from ThousandEyes Connect

So, What Does OA Actually Do? Phase III: Data Visualizations

(Chart: EFFORT on a 0% to 100% scale; phases shown: Data Onboarding, Data Science, Data Visualization)

Page 9: Operational Analytics at Credit Suisse from ThousandEyes Connect

How are we using ThousandEyes?

1. Basic DNS Testing: Alert on internet domain name resolution failures

2. Port Listener Health: Tickle a TCP port over the internet to ensure it’s listening and responding

3. Data Path Testing: Observe the end-to-end path through the internet to a target service and monitor for route changes, packet loss, latency & jitter

4. Page Load Testing: Ensure internet facing web sites and services are responding correctly and consistently

5. Synthetic Transactions: Test authentication and site navigation; collect performance stats at the page object level
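
The first two test types can be approximated with a few lines of standard-library Python. This is only a sketch of what such checks do conceptually; it is not the ThousandEyes implementation or API, and the function names `resolve_host` and `port_is_listening` are hypothetical.

```python
import socket


def resolve_host(hostname):
    """Return the sorted IP addresses for hostname, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `resolve_host("example.com")` followed by `port_is_listening("example.com", 443)` would cover tests 1 and 2 for a placeholder target. A real monitoring service adds what a single script cannot: many vantage points around the world, scheduling, and alerting.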

Page 10: Operational Analytics at Credit Suisse from ThousandEyes Connect

Detecting Internet Client Access Issues

§  ThousandEyes probes began reporting a 50-100% drop in authentication responses from a public-facing Web Service

§  Evidence of the issue was provided to Web Operations, who were unaware of it, as all of their monitoring tools depicted what they believed to be a healthy and stable infrastructure (CPU, Memory, Disk & Network I/O, Capacity, Performance, etc.)

Page 11: Operational Analytics at Credit Suisse from ThousandEyes Connect

Evidence of Improved Performance

§  Implemented a synthetic transaction to compare the relative End User Experience of using the infrastructure with and without acceleration

§  Collected concrete evidence of a ~33% service time / latency performance improvement when accessing an accelerated URL

§  Also able to demonstrate a relative smoothing of service-time inconsistency
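
The arithmetic behind a claim like this can be sketched as follows. The sample values are invented purely to illustrate the calculation (they are not the measurements from the talk), and `relative_improvement` is a hypothetical helper name. Population standard deviation stands in here for "service time inconsistency".

```python
from statistics import mean, pstdev


def relative_improvement(baseline_ms, accelerated_ms):
    """Fractional reduction in mean service time (0.33 means 33% faster)."""
    base = mean(baseline_ms)
    return (base - mean(accelerated_ms)) / base


# Hypothetical service-time samples in milliseconds.
direct = [900, 1200, 800, 1500, 600]
accelerated = [650, 700, 660, 680, 660]

print(f"improvement: {relative_improvement(direct, accelerated):.0%}")
print(f"spread, direct: {pstdev(direct):.0f} ms")
print(f"spread, accelerated: {pstdev(accelerated):.0f} ms")
```

A lower spread for the accelerated samples is what "smoothing" means in practice: response times become more predictable as well as faster.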

Page 12: Operational Analytics at Credit Suisse from ThousandEyes Connect

Insight into a Production Incident: Pre-Incident, Paths Look OK

(Screenshot timeline: 10:45 AM to 11:00 AM)

Page 13: Operational Analytics at Credit Suisse from ThousandEyes Connect

Insight into a Production Incident: Alert Received - BGP Route Disruption

(Screenshot timeline: 10:45 AM to 11:00 AM)

Page 14: Operational Analytics at Credit Suisse from ThousandEyes Connect

Insight into a Production Incident: Internet Paths are Failing

(Screenshot timeline: 10:45 AM to 11:00 AM)

Page 15: Operational Analytics at Credit Suisse from ThousandEyes Connect

Insight into a Production Incident: No BGP Routes – SPoF Detected

(Screenshot timeline: 10:45 AM to 11:00 AM)

Page 16: Operational Analytics at Credit Suisse from ThousandEyes Connect

Insight into a Production Incident: Path Failover to DR Site Successful

(Screenshot timeline: 10:45 AM to 11:00 AM)

Page 17: Operational Analytics at Credit Suisse from ThousandEyes Connect

Some Final Thoughts…

§ ThousandEyes lets us run multiple types of tests at various levels of sophistication and granularity from all over the world (DNS Test, TCP Port Tickle, Internet Path Test, Page Loads, Full Synthetic Transactions)

§ We see exactly how our clients are experiencing our services (Internet Path, BGP Route health, packet loss, jitter & latency)

§ We receive alerts on any significant variations from our established baselines (Paths, Routes, Performance, Service Quality, etc.)

§ We’re leveraging real quantifiable data to take the guesswork and subjectivity off the table

§ We’re empowering important business decisions

Page 18: Operational Analytics at Credit Suisse from ThousandEyes Connect

Thank You

October 16th 2015

Darrell Westbury Director of Operational Analytics