Conor Beverland, Wavefront SE Lead, VMware
@conor_bev @WavefrontHQ
#VMworld MMC3164BU
How Data Science is Transforming Ops: The Wavefront Story
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
1 Introduction to Wavefront by VMware
2 Smart Alerts = Metrics Anomalies
3 How to think about Anomalies
Consistent Infrastructure: VM Infrastructure • Container Infrastructure
Consistent Operations: Management and Operations • Across Clouds
VMware Cloud Infrastructure Public Cloud IaaS
VISIBILITY OPERATIONS AUTOMATION SECURITY GOVERNANCE
Cloud Management
VMware Cloud Services
Cloud Native Apps: Time to market • Innovation • Scale • Differentiation
Existing Apps: Reduce Costs • Security • Reliability • Control
CONTAINERS • VIRTUAL MACHINES
VMware Cloud: Run, Manage, Connect, Secure Any App on Any Cloud to Any Device
VMware Cloud on AWS for VMware
VMware Cloud Services: Manage, Govern and Secure Public and Private Cloud Apps
Discovery
Cost Insight
NSX Cloud
Network Insight
AppDefense
Wavefront
ON PREMISES DATA CENTER
Visibility into apps and resources they consume. Analyze usage and utilization across clouds.
Accounting and cost optimization for multiple clouds. Track and analyze your costs and trends.
Secure networks with micro-segmentation. Create private networks within or across clouds.
Operational visibility, control, and compliance across clouds. Optimize performance, health, and availability.
Metrics-driven monitoring and real-time analytics.
Governance for running workloads.
1. Introduction to Wavefront by VMware
Wavefront is a hosted platform for ingesting, storing, visualizing, and alerting on metrics. It is based on a stream processing approach invented at Google, which allows engineers to manipulate time series data with unparalleled power.
Q: What are Metrics?
A: Metrics are any ongoing numerical measurement of a given system or activity.
Q: Why Metrics?
A: Metrics are easier to parse than log data, directly computable, less resource intensive, lower latency to query, and much more…
Anatomy of a Metric (we often refer to these as “points”)
Sample log:
60.255.10.159 - - [15/Feb/2017:00:17:44 +0000] "POST /sp-admin/admin-ajax.php HTTP/1.0" 200 46 "https://www.mydummysite.com/wp-admin/post.php?post=8981&action=edit" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.16"
Sample metric:
telegraf.apache.response.200.count 25398 1487117864 source=Srl.27-par port=80 endPoint=admin
Metric data fields:
• Name: telegraf.apache.response.200.count
• Value: 25398
• Timestamp: 1487117864
• Source: source=Srl.27-par
• Point Tags (optional): port=80 endPoint=admin
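To make the anatomy concrete, here is a minimal Python sketch that splits the sample metric line into its fields. The `Point` type and `parse_point` function are hypothetical names invented for this illustration, not part of any Wavefront library, and the parser assumes unquoted, space-free tag values as in the sample.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """One metric point (hypothetical container for this sketch)."""
    name: str
    value: float
    timestamp: int
    source: str
    tags: dict


def parse_point(line: str) -> Point:
    """Split 'name value timestamp source=... key=value ...' into fields."""
    name, value, timestamp, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    source = fields.pop("source")  # what remains are the optional point tags
    return Point(name, float(value), int(timestamp), source, fields)


point = parse_point(
    "telegraf.apache.response.200.count 25398 1487117864 "
    "source=Srl.27-par port=80 endPoint=admin"
)
```

The timestamp is in epoch seconds: 1487117864 is 15 Feb 2017 00:17:44 UTC, the same instant as the sample log line.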
Introducing Wavefront by VMware: SaaS-Based Metrics Monitoring and Analytics Platform
UI and API Backend
Advanced Analytics Engine
Metrics Collection and Storage
Iterate & Troubleshoot Issues
Trend & Alert on Anomalies
Visualize Metrics at Scale
Self-Service Metrics Analytics for All
Engineering & Business
Four Ways of Sending Metrics into Wavefront
1. Wavefront Agents
2. Metrics Library/Application Code
3. Directly from AWS (APIs, CloudWatch, CloudTrail)
4. Logs via TCP or via Filebeat
Wavefront Proxy
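As a rough illustration of the application-code path, the sketch below formats a point as a text line and writes it to a proxy over TCP. The helper names, the `localhost` host, and port 2878 are assumptions for this example; check your own proxy configuration for the actual address and port.

```python
import socket
import time


def format_point(name, value, source, timestamp=None, **tags):
    """Build one 'name value timestamp source=... tag="..."' line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_text = "".join(f' {key}="{val}"' for key, val in tags.items())
    return f"{name} {value} {ts} source={source}{tag_text}\n"


def send_point(line, host="localhost", port=2878):
    """Write one formatted line to the proxy (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


line = format_point("cpu.loadavg.1m", 0.42, "app-1", timestamp=1487117864, env="dev")
# send_point(line)  # uncomment once a proxy is reachable at host:port
```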
Easy Integration for Getting Data In and Out: Complete API for Extensibility
Web & Proxy
Containers
Cloud OS/Hypervisor
DevOps/ChatOps Tools
Applications & Message Queues
Databases Storage
“The beauty of using Wavefront, as opposed to point or silo’d tools, is that my metrics and alerting all stay in the same place. I only need one maintenance window, one Slack webhook, or one PagerDuty configuration.”
THE WAVEFRONT DIFFERENCE
Cloud Native with unrivalled Scale & Performance.
Never Roll-up or Lose Data.
Intelligent Alerting.
Real-time Iterative Investigation.
Answer Smarter Questions.
Full Stack Correlation.
Business Understanding.
2. Smart Alerts = Metrics Anomalies
SMART ALERTS = METRICS ANOMALIES
Not-so-smart alert
• Simple thresholds or up/down
• Univariate data only
• All-or-nothing severity
Problems
• Too much noise (false positives + false negatives)
• Can’t use relationships of data
• Too many people involved
Smart alert
• More expressive language to detect anomalies
• Express data relationships, not just individual behavior
• Severity hierarchy with targeted escalation; self-serve
Severity Hierarchies
Ops design pattern: scale out to devs rather than scaling up the ops team
Informational
• Business hours
• Developers only (email)
Warn
• 24/7
• Ops (email) + dev (email)
Severe
• 24/7
• Ops (paging) + dev (email)
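The hierarchy above can be sketched as a small routing table. This is only an illustration: the `ROUTES` structure, the function names, and the definition of business hours (Monday to Friday, 09:00 to 17:00) are assumptions, not Wavefront configuration.

```python
from datetime import datetime

# Severity -> when it is active and who is notified through which channel.
ROUTES = {
    "informational": {"always_on": False, "notify": [("dev", "email")]},
    "warn": {"always_on": True, "notify": [("ops", "email"), ("dev", "email")]},
    "severe": {"always_on": True, "notify": [("ops", "page"), ("dev", "email")]},
}


def in_business_hours(now):
    # Assumption: Monday-Friday, 09:00-17:00 in the caller's local time.
    return now.weekday() < 5 and 9 <= now.hour < 17


def targets(severity, now):
    """Return the (team, channel) pairs to notify for an alert firing at 'now'."""
    route = ROUTES[severity]
    if not route["always_on"] and not in_business_hours(now):
        return []
    return route["notify"]


monday_3am = datetime(2017, 8, 28, 3, 0)
monday_10am = datetime(2017, 8, 28, 10, 0)
```

With this table, an informational alert at 3 a.m. notifies nobody, while a severe alert at the same time pages ops and emails the developers.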
What Is an Anomaly?
Oxford English Dictionary, 2016
Why Do We Care?
Anomalies (sometimes) tell us when to act.
Planned work Unplanned work (post-anomaly)
Why Metrics Anomalies?
Events (logs)
• An occurrence
• Timestamped
• Arbitrary annotations
• Duplicates valid
• List metaphor
– Weakly connected
20160821/12:37:01 Login [[email protected]]
20160821/14:12:45 Login [[email protected]]
20160821/14:12:45 Login [[email protected]]
20160821/15:11:28 Login [[email protected]]
Metrics
• A (numerical) measurement
• Timestamped
• Arbitrary annotations
• Duplicates/conflicts invalid
• Function metaphor
– Strongly connected
Metrics
cpu.loadavg.1m host=app-1
[0.42,0.37,0.41,0.41,0.5,…]
(Plot: Value over Time)
1. Function machinery
2. Columnize/Vectorize for speed/cost
3. Visual interpretations @ scale
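The list-versus-function distinction can be sketched in a few lines of Python. The `Series` class is a hypothetical helper written for this illustration. It shows why duplicates are fine in an event list, while a metric series, treated as a function of time, rejects conflicting values.

```python
# Events: a list, so two identical entries are both kept.
events = []
events.append(("20160821/14:12:45", "Login"))
events.append(("20160821/14:12:45", "Login"))


class Series:
    """A metric as a function from timestamp to value (hypothetical helper)."""

    def __init__(self, name, source):
        self.name, self.source = name, source
        self._points = {}

    def put(self, timestamp, value):
        # A function has one value per input, so a conflicting write is rejected.
        if self._points.get(timestamp, value) != value:
            raise ValueError(f"conflicting value at t={timestamp}")
        self._points[timestamp] = value

    def __call__(self, timestamp):
        return self._points[timestamp]

    def column(self):
        # Columnized view: values only, ordered by time.
        return [self._points[t] for t in sorted(self._points)]


load = Series("cpu.loadavg.1m", "app-1")
for t, v in zip(range(0, 300, 60), [0.42, 0.37, 0.41, 0.41, 0.5]):
    load.put(t, v)
```

Because the series is a plain column of numbers, any function (a moving median, a ratio, a sum across hosts) can be applied to it directly.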
3. How to Think About Anomalies
Defining Anomalies
Anomaly = Normal^C (the complement of Normal: everything that is not normal behavior)
10 examples (with a shared theme)
Anomalies are in the eye of the beholder.
(Not just about the data, but about who is using the data)
Examples of Anomalies
1. A range
2. A windowed deviation
(${a} - mmedian(4h, ${a})) /
(mpercentile(4h, 75, ${a}) - mpercentile(4h, 25, ${a}))
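The expression above measures how far the current value sits from the moving median, in units of the moving interquartile range. The Python sketch below shows the same idea under stated simplifications: the window is a number of points rather than 4 hours, and the `percentile` helper uses linear interpolation, which may differ from how `mpercentile` computes percentiles.

```python
from statistics import median


def percentile(values, p):
    """Linear-interpolation percentile of a non-empty list (p in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def windowed_deviation(series, window):
    """(value - moving median) / moving IQR over the trailing 'window' points."""
    scores = []
    for i, value in enumerate(series):
        recent = series[max(0, i - window + 1): i + 1]
        iqr = percentile(recent, 75) - percentile(recent, 25)
        # A flat window has no spread; report 0.0 instead of dividing by zero.
        scores.append((value - median(recent)) / iqr if iqr else 0.0)
    return scores


load = [10, 11, 10, 12, 11, 10, 11, 30]
scores = windowed_deviation(load, window=8)
```

For this data the final point (30) scores about 15 interquartile ranges above the median, while the earlier points stay within a few units of zero. An alert could fire when the score crosses a chosen limit.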
3. A seasonal ratio
${a} / lag(1w, ${a})
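A seasonal ratio compares each value with the value one season earlier, so a ratio near 1 means "same as last week". A minimal sketch, assuming one sample per day so that a lag of 7 points stands in for `lag(1w, ...)`. The function name and the data are invented for illustration.

```python
def seasonal_ratio(series, period):
    """Ratio of each value to the value 'period' points earlier.

    Returns None where no earlier value exists or where it is zero.
    """
    ratios = []
    for i, value in enumerate(series):
        if i < period or series[i - period] == 0:
            ratios.append(None)
        else:
            ratios.append(value / series[i - period])
    return ratios


last_week = [100, 120, 130, 125, 110, 60, 50]
this_week = [102, 118, 131, 60, 111, 61, 49]
ratios = seasonal_ratio(last_week + this_week, period=7)
```

Here the fourth day of the second week has a ratio of 0.48, roughly half of the same day a week earlier, even though 60 is a perfectly ordinary value for a weekend.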
4. A constant rate
rate(${a})
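For a steadily growing counter, the raw value always changes, so the interesting signal is its rate of change. The sketch below computes a per-second rate between consecutive samples. It is a simplification that does not handle counter resets, which a real `rate()` function may treat differently.

```python
def rate(points):
    """Per-second change between consecutive (timestamp, value) samples."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


# A counter of processed items, sampled every 60 seconds (illustrative data).
processed = [(0, 0), (60, 600), (120, 1200), (180, 1230)]
rates = rate(processed)
```

The first two intervals run at 10 items per second and the third drops to 0.5, which is the kind of departure from a constant rate that an alert on `rate(${a})` would look for.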
Serial Pipeline
5. Group range
max(${a}) - min(${a})
Parallel Balancer
6. A group variance
sqrt(variance(${a}))
A different team says there is no anomaly here. Why?
7. A group sum
sum(${a})
Anomalies are in the eye of the beholder.
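The three group aggregations (range, variance, sum) can disagree about the same data, which is one plausible reading of why a different team sees no anomaly. The sketch below compares them for one instant across four hosts behind a balancer. The data and the `group_stats` helper are invented, and using the population variance is an assumption about how `variance()` aggregates.

```python
from math import sqrt
from statistics import pvariance


def group_stats(samples):
    """Aggregate one timestamp's values across all series in a group."""
    return {
        "range": max(samples) - min(samples),
        "stddev": sqrt(pvariance(samples)),  # assumption: population variance
        "sum": sum(samples),
    }


balanced = group_stats([25, 25, 25, 25])  # requests/s per host, evenly spread
skewed = group_stats([70, 10, 10, 10])    # same total, one overloaded host
```

The team that owns the balancer sees a range of 60 and a standard deviation of about 26, a clear anomaly. The team that only cares about total throughput sees a sum of 100 in both cases and no anomaly at all.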
${cpu}
${cpu}, ${requests}
CPU
REQ
8. A ratio of two series
(correlation)
${cpu} / ${requests}
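When two series normally move together, their ratio stays roughly flat, so a change in the ratio is easier to spot than a change in either series alone. A minimal sketch with invented data. The `ratio` helper is hypothetical.

```python
def ratio(numerator, denominator):
    """Pointwise ratio of two aligned series (None where the denominator is 0)."""
    return [n / d if d else None for n, d in zip(numerator, denominator)]


cpu = [20, 40, 60, 80, 80]            # percent busy
requests = [100, 200, 300, 400, 100]  # requests per second
cpu_per_request = ratio(cpu, requests)
```

CPU rising from 20 to 80 is not alarming while requests rise with it, since the ratio holds at 0.2. In the last sample requests fall but CPU stays at 80, and the ratio jumps to 0.8.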
9. Non-correlation
mcorr(1d, ${a}, ${b})
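A moving correlation flags the moment two series that usually track each other stop doing so. The sketch below uses the Pearson coefficient over a trailing window of points. Both the choice of Pearson and the point-based window (rather than 1 day) are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (None if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def moving_corr(xs, ys, window):
    """Correlation over the trailing 'window' points (None until it is full)."""
    return [
        pearson(xs[i - window + 1: i + 1], ys[i - window + 1: i + 1])
        if i >= window - 1 else None
        for i in range(len(xs))
    ]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [2, 4, 6, 8, 8, 6, 4, 2]
corr = moving_corr(a, b, window=4)
```

Over the first four points the series rise together and the correlation is 1. Over the last four they move in opposite directions and it falls to -1. An alert could fire when the value drops below a chosen floor.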
10. Consistent frequency
mcount(20m, ${a})
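Counting the points received in a moving window detects a series that has stopped (or slowed) reporting, regardless of the values it reports. A minimal sketch, assuming timestamps in seconds and a source that should report once a minute. The function name and data are invented.

```python
def moving_count(timestamps, now, window_seconds):
    """Number of points whose timestamp falls in (now - window_seconds, now]."""
    return sum(1 for t in timestamps if now - window_seconds < t <= now)


WINDOW = 20 * 60  # 20 minutes, as in mcount(20m, ...)

healthy = list(range(60, 1801, 60))  # one point per minute for 30 minutes
stalled = list(range(60, 601, 60))   # stops reporting after 10 minutes

healthy_count = moving_count(healthy, now=1200, window_seconds=WINDOW)
stalled_count = moving_count(stalled, now=1200, window_seconds=WINDOW)
silent_count = moving_count(stalled, now=1800, window_seconds=WINDOW)
```

The healthy series yields 20 points per window. The stalled one drops to 10 and then to 0, so an alert on a count below the expected 20 catches the problem before the series disappears entirely.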
11. Any boolean combination or functional composition
default(0, 1m, 100 *
(sum(rate(ts(httpstatus.api.daemon._sshDaemonId_.pushdata._workUnitId_.POST.406.count, tag=prod and (tag="*-primary" or tag="*-secondary") and not tag=customer)), hosttags)) /
((sum(rate(ts(httpstatus.api.daemon._sshDaemonId_.pushdata._workUnitId_.POST.406.count, tag="*-primary" or tag="*-secondary")), hosttags)) +
(sum(rate(ts(httpstatus.api.daemon._sshDaemonId_.pushdata._workUnitId_.POST.2*.count, tag="*-primary" or tag="*-secondary")), hosttags)))) > 10

mavg(10m, avg(last(4w, ts(custom.globalAllowedRate, tag="*-primary" or tag="*-secondary")), hosttags) as globalAllowedRate * 5/6
- sum(rate(ts(dataingester.report-points, (tag=retired or tag=prod) and (tag="*-primary" or tag="*-secondary" or tag="*-tertiary"))), hosttags)
+ 0 * sum(rate(ts(dataingester.report-points, tag="*-primary" or tag="*-secondary" or tag="*-tertiary")), hosttags))
- $globalAllowedRate * 5/6 * 0.02 < 0
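The first query above follows a common shape: turn two counters into rates, compose them into a percentage, fall back to 0 when there is no data, and compare with a threshold. The Python sketch below mirrors that shape with plain numbers. The function names are hypothetical, and the minimum-traffic guard is an extra condition added here only to show a boolean combination. It is not part of the original query.

```python
def error_percentage(rejected_rate, accepted_rate):
    """Share of rejected requests, in percent (0.0 when there is no traffic)."""
    total = rejected_rate + accepted_rate
    # Returning 0.0 for an empty denominator plays the role of default(0, ...).
    return 100 * rejected_rate / total if total else 0.0


def should_alert(rejected_rate, accepted_rate, min_traffic=50, threshold=10):
    """Boolean combination: a high error share AND enough traffic to trust it."""
    enough_traffic = rejected_rate + accepted_rate >= min_traffic
    return enough_traffic and error_percentage(rejected_rate, accepted_rate) > threshold


busy_and_failing = should_alert(rejected_rate=30, accepted_rate=70)
quiet_and_failing = should_alert(rejected_rate=3, accepted_rate=1)
```

Each building block stays simple. The expressiveness comes from composing them, which is the point of this last anomaly type.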
Types of Anomalies
1. A range
2. A windowed deviation
3. A seasonal ratio
4. A rate
5. A group range
6. A group variance
7. A group sum
8. A correlation (of two series)
9. A non-correlation (of two series)
10. A consistent frequency
11. Any boolean combination or functional composition of the above
Anomaly Pipeline
(Diagram: the eleven anomaly types listed above, shown with the labels Value differences, Behavioral differences, and Alert)
Positional outliers
Value Differences
Which series is reporting least frequently?
Visual Behavior
Higher Level Anomalies
Detection Diagnose (Localize)
a. Scatter (visual)
b. Correlation
Free trial at: www.wavefront.com
Sessions, Booth and Theatre Presentations for VMware Cloud Services
Session ID | Title | Type | Speakers
MMC1464QU | How to Use Cloud Formations in vRealize Automation to Build Hybrid Applications That Span and Reside On-Premises & on VMware Cloud on AWS and AWS Cloud | Quick Talk | Vijay Raghavan, Manu Prasanna
MMC1532BU | Using VMware NSX for Enhanced Networking and Security for AWS Native Workloads: Part 2 | Breakout Session | Amol Tipnis, Percy Wadia
MMC2046BU | Using VMware NSX for Enhanced Networking and Security for AWS Native Workloads: Part 1 | Breakout Session | Amol Tipnis, Percy Wadia
MMC2210BU | Best Practices: How the City of New York Has Configured AWS for the Best vRealize Automation Integration | Breakout Session | Stefan Andrieux
MMC2256BU | Watching the Clouds: Challenges with Monitoring Hybrid Cloud Environments | Breakout Session | Craig Lee, John Dias
MMC2455BU | On-Demand Disaster Recovery for Enterprise Applications with the VMware Cloud on AWS | Breakout Session | GS Khalsa, Mohan Potheri, Potheri Mohan
MMC2623BU | Integrated Multicloud Management for Automating Standardized Security and Governance in Federal Agencies | Breakout Session | Kris Ostergard, Sean VanDruff, Douglas Bourgeois
MMC2820BU | Deploying Applications into AWS EC2 with VMware Cross-Cloud Services | Breakout Session | Bahubali Shetti, Bill shetti
MMC2877BU | Deep Dive into Cost Insight: Understand, Analyze, and Optimize Your Cloud Expenses (Cross-Cloud Service) | Breakout Session | Kumar Gaurav, Kameswaran Subramanian
MMC2884GU | Manage Cross-Cloud Applications Using vRealize Operations Insight | Group Discussion | Karl Fultz, Manish Bhaskar
MMC2888GU | How We’ve Accelerated Innovation While Keeping Our Cloud Spending in Check | Group Discussion | Burt Toma
MMC3062BU | How Customer XYZ Secures and Monitors On-Premises Software-Defined Data Center Virtual and Physical Networks Using Network Insight SaaS | Breakout Session | Sean O'Dell, Manish Bhaskar
MMC3066BU | How Do You Use Network Insights' SaaS to Secure Multitier Hybrid Apps Running on vSphere, VMware Cloud on AWS, and AWS Native? | Breakout Session | Sean O'Dell, Anuj Jaiswal
MMC3074BU | 3 ways to use VMware’s new Cross-Cloud SaaS Services to efficiently run workloads across AWS, Azure and vSphere: VMware and Customer technical session | Breakout Session | Jason Walker, Burt Toma
MMC3110PU | How IT Can Enable Development Teams to Build Apps on AWS, Azure, and VMware Without Compromising on Costs and Security | Panel Discussion | Mark Leake, Ben Mitchell
MMC3112BU | Customer Story: Monitoring Costs and Rightsizing Workloads in AWS, Azure, and VMware-Based Clouds | Breakout Session | Nikhil Girdhar
MMC3164BU | How Data Science is Transforming Operations: The Wavefront Story | Breakout Session | Conor Beverland
MMC3165BU | Becoming a DevOps Superhero: Introduction to Wavefront for Optimizing Cloud-Native Applications. (Tuesday 12:30pm – 1:30pm) | Breakout Session | Stela Udovicic, Demetri Mouratis
MMC3321BUS | Move, Manage, Use: The New Hybrid IT | Breakout Session | Donald Foster, Don Foster, Deepak Verma
MMC3406BUS | Cloudy Days Ahead!! Leverage F5 to provide application continuity and consistent security policy provisioning and enforcement in an intercloud world. | Breakout Session | Kent Munson
MMC3424SU | VMware Cloud Services and how you can leverage SaaS for your vSphere data center or the public cloud. | Spotlight Session | Guido Appenzeller