Automate Problem Detection with Operations Analytics April 7, 2016 © Copyright 2016 Vivit Worldwide


Automate Problem Detection with Operations Analytics

April 7, 2016

© Copyright 2016 Vivit Worldwide


Brought to you by


Today’s Speakers


Gabriel Martinez

Senior Product Marketing Manager

Hewlett Packard Enterprise

Gary Brandt

Senior Product Manager

Hewlett Packard Enterprise

Naama Shwartzblat

Technical Marketing Manager

Hewlett Packard Enterprise

Housekeeping

• This “LIVE” session is being recorded

Recordings are available to all Vivit members

• Session Q&A:

Please type questions in the Questions Pane

Webinar Control Panel

Toggle View Window between full screen/window mode.

Questions

Automate problem detection with Operations Analytics

Gabriel Martinez, Senior Product Marketing Manager

March 2016


Topics

– Introductions

– Overview of Operations Analytics

– Poll Questions

– Deep Dive into Automated Detection

– Poll Questions

– How HPE IT uses Big Data in IT Operations

– Q&A

– Webinar Survey

Speakers

Gabriel Martinez

Senior Product Marketing Manager

Hewlett Packard Enterprise

Gary Brandt

Senior Product Manager

Hewlett Packard Enterprise

Naama Shwartzblat

Technical Marketing Manager

Hewlett Packard Enterprise

HPE Operations Analytics

Market Trends and Growth Drivers

ITOA Growth Drivers

• Outdated Rule-Based Systems

• Point Tools Limitations

• Complex Diverse Environments

Customer Expectations

• Flexible, Scalable Architecture

• Transform Data into Intelligence

• Problem Detection & Prediction


Poll question


What do you find most challenging in detecting IT issues?

a) Too many internal resources required

b) Overwhelming manual effort

c) Current tool doesn’t easily find issues

d) Not sure where to begin

e) Other

What caused your last outage?

There can be a false sense of security when all looks well

Warning indicators showing total meltdown may not be true either

Usually, it is a domino effect stemming from just one problem that is affecting all systems

The answer lies in your data

But how do you make sense of it?

[Figure labels: siloed data sources; types of data; device types; different operating systems; data per server/day]

Mobile App

Network

Cloud

System LOB data

Storage

Introducing HPE Operations Analytics

Standalone, scalable platform

• Predictive Analytics

• Machine Learning

• Relationship Score

• Automated Log and Event Analysis

• Anomaly Detection & Alerting

• Visual Analytics & RCA

HPE Vertica

Data Types

• Mobile App

• Network

• Cloud

• System LOB data

• Storage

Results

• Reduce outages

• Faster resolution

• Optimize resources

• Increased productivity

Machine Learning Powers HPE Operations Analytics

Developed in collaboration with HPE Labs

Machine Learning

• Predicting Future Behavior

• Event Analytics

• Behavioral Learning

• Anomaly Detection

• Unstructured Text Indexing, Search & Inference

• Clustering


Poll question


What approach do you currently take related to IT issues?

a) Proactive

b) Reactive

c) Not sure

d) We don’t have IT issues


Overview of Operations Analytics


HPE Operations Analytics

Key features

Log and Event Analytics

Focus on relevant items for quicker resolution


Root Cause Analysis (RCA)

Find problems with ease

Visual Analytics

Clear, intuitive dashboards


Intelligent Search

Deep-dive into messages

Relationship Score

Connection between metrics


Predictive Analytics

Forecast future performance

Anomaly Alerting

Real-time problem warnings

Old-fashioned log search is not enough

Move beyond chance-based resolution with advanced analytics

[Figure: comparison of HPE Ops Analytics and others across Log Searches, Statistics, Scalable Data Store, Log Analysis, Predictive Analytics, and Anomaly Detection; label: Automated]

HPE Operations Analytics

Standard use cases

Anomaly detection and troubleshooting

Historical and predictive analytics

Business insights

Big Data store and analysis


Poll question


Do you feel your current IT process requires better automation?

a) Yes

b) No

c) Unclear/Not Sure


Deep dive into automated detection


Prevent downtimes with predictive analytics

• Self-learning engine to establish baselines in real-time

• Compare past behavior to forecast behavior

• Comprehensive predictive analytics

• Analytics include insights from log files

Predict future performance trends

Establish baselines of acceptable performance


Detection of metric seasonality

Black line – data

Blue line – upper and lower baseline sleeve

[Chart annotations: weekdays, weekend, business hours, off hours]

A 1-week season was found automatically
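The slide does not say how the season is found. One common approach, shown here only as a sketch, is to compare the autocorrelation of the metric at a few candidate lags and pick the strongest. The function names, the candidate lags, and the synthetic hourly data are all assumptions made for illustration.

```python
def autocorrelation(values, lag):
    """Autocorrelation of a series with itself shifted by `lag` samples."""
    n = len(values)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def detect_season(values, candidate_lags):
    """Return the candidate lag (in samples) with the highest autocorrelation."""
    return max(candidate_lags, key=lambda lag: autocorrelation(values, lag))


def synthetic_load(hour_index):
    """Hypothetical hourly metric: busy on weekday business hours, quiet otherwise."""
    day = (hour_index // 24) % 7   # 0-4 are weekdays, 5-6 the weekend
    hour = hour_index % 24
    return 100.0 if day < 5 and 9 <= hour < 17 else 10.0


# Eight weeks of hourly samples; candidates are a daily and a weekly season.
series = [synthetic_load(h) for h in range(24 * 7 * 8)]
season = detect_season(series, candidate_lags=[24, 24 * 7])
print(f"Detected season length: {season} hours")
```

With weekday and weekend behavior differing, the weekly lag scores higher than the daily lag, which mirrors the weekly season shown on the slide.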


Prediction and trend analysis

– Generate a prediction line for multiple metrics based on past behavior and seasonality

– Identify the metric trend
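As a rough illustration of a prediction line built from past behavior and seasonality, the sketch below fits a straight-line trend and adds the average detrended value for each position in the season. This is a generic textbook-style decomposition, not the product's algorithm; `fit_trend`, `predict`, and the sample data are hypothetical.

```python
def fit_trend(values):
    """Least-squares line through (index, value); returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def predict(values, season_length, horizon):
    """Extend the series by `horizon` points using trend plus seasonal offsets."""
    slope, intercept = fit_trend(values)
    # Average detrended value observed at each position within the season.
    sums = [0.0] * season_length
    counts = [0] * season_length
    for i, v in enumerate(values):
        sums[i % season_length] += v - (slope * i + intercept)
        counts[i % season_length] += 1
    seasonal = [s / c for s, c in zip(sums, counts)]
    n = len(values)
    return [
        slope * i + intercept + seasonal[i % season_length]
        for i in range(n, n + horizon)
    ]


# Hypothetical metric: steady growth with a spike at the start of each 4-sample season.
history = [2.0 * i + (5.0 if i % 4 == 0 else 0.0) for i in range(40)]
slope, _ = fit_trend(history)
forecast = predict(history, season_length=4, horizon=4)
print("trend is", "rising" if slope > 0 else "flat or falling")
print([round(v, 2) for v in forecast])
```

The sign of the fitted slope gives the metric trend, and the forecast keeps the seasonal spike at the same position in each season.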


Abnormality-based and threshold-based alerts

– Identify issues proactively and send early alerts before they impact your business

– Discover issues as they occur without performing an investigation

– Execute actions (email, script execution, SNMP traps, and events sent to OpsB) when an alert is triggered

– Can be defined for a single metric or up to 10 metrics per alert


Proactive management with early warning alerting

• Near real-time alerting when performance exceeds dynamic baseline

• Alert on multiple metrics

• Flexible alerting parameters

  – Normal range defines the baseline sleeve width in multiples of the standard deviation

  – Alert above, below, or on both sides of the baseline sleeve

  – Optional additional condition for a static threshold

• Preview analysis of alert

Dynamic Baseline

Performance Metrics

Zone of Prevention

Fixed Threshold
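The alerting parameters on this slide can be sketched in a few lines. The example below assumes a very simple baseline (mean and standard deviation of recent history) and assumes the optional static threshold is combined with the abnormality check using AND; the product's baseline is seasonal and its exact semantics are not given in the slides. All names here are hypothetical.

```python
from statistics import mean, stdev


def baseline_sleeve(history, normal_range=2.0):
    """Return (lower, upper): mean minus/plus `normal_range` standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - normal_range * spread, center + normal_range * spread


def should_alert(value, history, normal_range=2.0, direction="both",
                 static_threshold=None):
    """Decide whether `value` breaches the sleeve in the configured direction."""
    lower, upper = baseline_sleeve(history, normal_range)
    above = value > upper
    below = value < lower
    if direction == "above":
        abnormal = above
    elif direction == "below":
        abnormal = below
    elif direction == "both":
        abnormal = above or below
    else:
        raise ValueError("direction must be 'above', 'below', or 'both'")
    # Assumption: the static threshold is an extra condition that must also hold.
    if static_threshold is not None:
        abnormal = abnormal and value > static_threshold
    return abnormal


# Hypothetical CPU utilization samples (percent).
recent = [50, 52, 48, 51, 49, 50, 53, 47]
print(baseline_sleeve(recent))
print(should_alert(60, recent))
print(should_alert(60, recent, static_threshold=80))
```

A wider `normal_range` makes the sleeve more tolerant, which trades earlier warnings for fewer false alerts.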


Power of analyzed data hidden in log messages

Valuable information is hidden in the log message behavior as well as the log message data itself

– Analyze log file data as metrics

– Track message group statistics over time

– Track parameter distribution

– Use baseline, prediction, correlations and alerts on information coming from log files


Expose log & event count as metric

Understanding the message behavior over time

– Analyze log message group count as metric

– Ability to baseline, predict and alert based on log behavior over time

– Ability to correlate log behavior against system and performance metrics as well as non-IT metrics
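Turning log behavior into a metric can be as simple as counting messages per group in fixed time buckets; the resulting series can then be baselined or alerted on like any other metric. The sketch below assumes events arrive as (timestamp, group) pairs and uses a five-minute bucket; the function name, group names, and sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def count_per_bucket(events, bucket_seconds=300):
    """Count events per (group, bucket start) for timezone-aware timestamps."""
    counts = Counter()
    for timestamp, group in events:
        epoch = int(timestamp.timestamp())
        bucket_start = epoch - epoch % bucket_seconds
        key = (group, datetime.fromtimestamp(bucket_start, tz=timezone.utc))
        counts[key] += 1
    return counts


def utc(hour, minute, second=0):
    """Helper for building sample timestamps on the webinar date."""
    return datetime(2016, 4, 7, hour, minute, second, tzinfo=timezone.utc)


# Hypothetical message groups produced by an earlier grouping step.
events = [
    (utc(10, 0, 30), "db_timeout"),
    (utc(10, 1), "login_ok"),
    (utc(10, 2), "db_timeout"),
    (utc(10, 6), "db_timeout"),
]

metric = count_per_bucket(events)
for (group, bucket), count in sorted(metric.items()):
    print(group, bucket.isoformat(), count)
```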

Using IT data for operational business insights

Operational visibility to revenue, customer experience, and cost

Global Telco Company

[Diagram: Mobile SIM chip activation, IVR application, Service bus, Call Center; HPE Operations Analytics (OpsA)]

Metrics:

• CPF

• Number of calls

• Number of successes

• Time-outs

• Network issues

• Response time

Business outcomes:

• Streamlined process

• Reduced call center activity

• Improved customer experience

• Near real-time data insights


Fast root cause analysis

• Visualizing multiple data sources and data types in the same time-window context leads to a better understanding of complex problems

• “Time machine for the Data Center” guides you to faster problem resolution through play-back and play-forward of IT data

• Get to the most significant log and event messages with the advanced analytics

• Combining analytics of all data (metrics, events, topology, logs) reduces the time to identify and correct problems

Playback

Automated log and event analytics

The power of pattern recognition

– Clustering: From raw messages to clusters

– Severity and keywords: Discovering interesting clusters

– Abnormal detection: Baseline for logs / events

– Expert Sourcing: Learning the importance of logs / events from SMEs

– Root cause messages: Visual correlation with application performance

1,000,000s to 10,000s to 1,000s to 100s to 10s to 1s
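The first step of that funnel, going from raw messages to clusters, can be approximated by masking the variable parts of each message and grouping on the remaining template. The slides do not describe the clustering method Operations Analytics actually uses, so this is a deliberately simple stand-in; the regular expression, function names, and sample messages are assumptions.

```python
import re
from collections import defaultdict

# Tokens treated as variable: IPv4 addresses, hex values, and plain integers.
_VARIABLE = re.compile(r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+)\b")


def template_of(message):
    """Replace variable tokens with a placeholder to get a message template."""
    return _VARIABLE.sub("<*>", message)


def cluster(messages):
    """Group raw messages by their template."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[template_of(message)].append(message)
    return dict(clusters)


# Hypothetical raw log lines.
raw = [
    "Connection to 10.0.0.12 timed out after 3000 ms",
    "Connection to 10.0.0.7 timed out after 1500 ms",
    "User 42 logged in",
    "User 7 logged in",
    "Disk /dev/sda1 is 91% full",
]

clusters = cluster(raw)
for template, members in clusters.items():
    print(len(members), template)
```

Later funnel stages (severity, keywords, baselines, SME feedback) would then rank or filter these clusters rather than individual messages.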


How HPE IT uses Big Data in IT Operations


Troubleshooting without Operations Analytics in HPE IT

– Many subject matter experts involved in major incidents

– Manual analysis in isolation

– Manual correlation of data

– Long time to identify root cause

[Diagram: Operation support team working with Application, Network, Security, Server, Storage, and Database SMEs across the application ecosystem (business application, physical or virtual server, network, storage)]


Troubleshooting with Operations Analytics in HPE IT

– All relevant data in a single dashboard

– Data is timely and correlated

– Data easily viewed in visual analytics

– Historical view of data instantly available

– Faster time to identify root cause with fewer people involved

[Diagram: Operation support using OpsA across the application ecosystem (business application, physical or virtual server, network, storage)]

HPE Operations Analytics in HPE-IT

Operations Analytics is an analytics platform for IT to proactively manage its operational performance and reduce mean time to repair. It is able to take in data from all sources and utilize different data types, not just performance metrics and events, but topology data and logs.

Data sources

• Application (SiteScope)

• Network (Network Node Mgr + iSPIs)

• Cloud VMs (Operations Agents)

• System Perf (Operations Agents)

• Third-party tools (SCOM, Lync, Exchange)

• Event Data (OMi)

• Log Data (ArcSight)

• Custom Data (CSV, XML)

Data types: Metrics, Events, Topology, Logs

Big Data Store: Vertica

HPE Operations Analytics

• Visual Analytics

• Automated Log Analytics

• Predictive Analytics

• Content Framework

• Intelligent Search

• Guided Troubleshooting

Outcomes: Trend analysis, Predictive insights, Anomaly detection, Unknown root cause resolution

Environment: 8 data centers, 2600 apps, 25K databases, 56K servers, 5M objects, 66K network devices

HPE IT use cases and scenarios

Troubleshooting application and database production issues

Troubleshooting application and database production issues

Traditional troubleshooting approach

Application performance issues detected by monitoring, pointing to database problem

Challenges

– Solving problems before service performance is affected

– Siloed teams each working on the problem separately

– No one has the overall picture

– Highly manual efforts


Analyze millions of messages to reveal root cause with automated log analytics

Automatically reveals time and count of most significant log events.


Drill down to actual root cause log messages

Automatically reveals time and count of most significant log events.

View the log message content to identify root cause

Number of occurrences of significant log data over time

Troubleshooting application and database production issues

Troubleshooting with Analytics

Solution

– Automated log analytics

– Automatic troubleshooting guidance based on machine learning

– Automated No-search analytics

Benefit

– Fast root cause identification (¼ the time)

– Fewer experts involved (5 SMEs to 1)

– Cuts order backlog by 50%

HPE IT 3PAR use case

OpsA increasing value to LOB


Hewlett Packard Enterprise 3PAR Storage line of business

HPE premier Storage business

Proactive “phone home” monitoring service available to 3PAR customers

Service provides 3PAR customers with the latest capabilities and proactive protection against potential problems

HPE IT systems enable/support “phone home” services


Optimizing HPE 3PAR Operations using Big Data Analytics

Business Challenge

– Isolate problems

– Reduction of late file transfers

– Reduce overdue file transfers

– Difficult to isolate problem

– Manually interpreting behavior

Solution

– Near real-time metric collection using OpsA

– Define & measure big-picture view of 3PAR ecosystem

– OpsA baselines define “normal” behavior

– OpsA guided troubleshooting

Benefits

– Quickly identify what is not ‘normal’

– Faster to diagnose problems

– Eliminated manual efforts of collecting and correlating data

– Decreased Mean Time To Recover (MTTR)


Baselines to define “what’s normal” in the environment

– Baselining metrics helps IT define normal behavior of the application ecosystem.

– Quickly find outliers. Starting point for troubleshooting.

Analyze the ecosystem

– Network Metrics (nfs call rate)

– Application Metrics (processing rate)

– Database Metrics (active session count)

– Application Metrics (file queues)

– File System Metrics (disk queues)

– Define services that describe the application ecosystem.

– Correlate metrics from disparate areas to identify areas of impact.
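One plain way to correlate metrics from different areas is the Pearson correlation coefficient between time-aligned series, ranking the other metrics by how strongly they move with a metric of interest. The slides do not say which measure the product computes, so treat this as an illustrative sketch; the metric names echo the slide, but the values and function names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def rank_related(target_name, metrics):
    """Rank the other metrics by absolute correlation with the target metric."""
    target = metrics[target_name]
    scores = {
        name: pearson(target, values)
        for name, values in metrics.items()
        if name != target_name
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical, time-aligned samples for three of the metrics named on the slide.
metrics = {
    "file_queue": [1, 2, 3, 4, 5, 6],
    "disk_queue": [2, 4, 6, 8, 10, 12],
    "nfs_call_rate": [5, 3, 6, 2, 5, 4],
}

ranked = rank_related("file_queue", metrics)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

Correlation only suggests where to look next; it does not by itself establish which metric is the cause.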

Identify trends and take action before problems occur

… take action before a problem occurs

Dynamic baselines are automatically created for all metrics collected

Advance notification with abnormality alerts

[Chart labels: Performance Metrics, Dynamic baseline, Zone of prevention, Fixed threshold; annotation: Dangerous rate of file system growth]

HPE IT realized value with Ops Analytics

“Enterprise scale” solution: multiple data centers, 50K+ servers

Automated troubleshooting / reduced manual efforts of collecting and correlating data. Faster time to identify root cause with fewer people involved.

Turnkey analytics: “data scientist in a box”, “No search” analytics. Gained insights to the “unknowns” in the environment.

Decreased Mean Time To Resolution (MTTR). 80% labor reduction in troubleshooting.


Q&A


What’s Next


– Visit us online: hpe.com/software/opsanalytics

Visit us in person or online


Thank you


Discover 2016 is Hewlett Packard Enterprise’s must-attend global customer and partner event. Why attend?

• Explore how Hewlett Packard Enterprise is delivering IT solutions for the New Style of Business to help you go further, faster

• Network with 10,000+ attendees, including C-level executives, IT directors, engineers and HPE experts

• Find content for you, choosing from our broad array of technical and business sessions

• Explore the latest innovations from HPE in the Transformation Zone

• Find thousands of experts on hand to answer your questions and address your challenges

• Exchange ideas, information and best practices with other IT professionals and industry leaders

Register Now and receive your member discount with this Vivit registration link: https://www.hpe.com/events/discoverSWVivit


Thank you

• Complete the short survey and opt-in for more information from Hewlett Packard Enterprise.

www.hpe.com

www.vivit-worldwide.org
