AIOps in a cloud-native world Steven Huels, AI Center of Excellence Director


RED HAT DALLAS EMERGING TECH SUMMIT - DEC 5, 2019

Introductions

Steven Huels
AI Center of Excellence Director
Based in Raleigh

Steven is the Director of the Red Hat AI Center of Excellence with responsibility for Red Hat’s AI strategy and how that impacts internal use cases, products, customers, and partner ecosystems.

Steven has been involved with real-time data management and analytic platforms for over 15 years and has a keen interest in large-scale data problems and using analytics and machine learning to solve those problems.


“By 2022, 50% of IT assets in enterprise datacenters will have the ability to run autonomously using embedded AI functionality that leverages smart IT and facilities systems.”


Common Foundations

AI Ops: Artificial Intelligence for IT Operations
!=
ML Ops: Machine Learning Operations

The cultural change we saw brought about by Dev & Ops = DevOps is being repeated in the combination of AI + DevOps = AIOps


AI Ops platforms are software systems that combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation.

Source: Gartner Market Guide for AIOps Platforms Published: 03 August 2017 ID: G00322184


Increase Efficiency with AI

[Figure: Number of Systems per Admin across three stages: Assisted AI, Augmented AI, Autonomous AI]


SUPPORT EFFICIENT SEPARATION OF DUTY

I.T. OPERATIONS

DEVELOPERS


= distributed systems

SCALABLE DISTRIBUTED SYSTEMS


DISTRIBUTED SYSTEMS ARE COMPLEX

I.T. OPERATIONS

DEVELOPERS


TELEMETRY & DIAGNOSTICS REQUIRED


Data, AI and Automation

Alert, Logs, Tickets, Monitoring, Automation

ChatOps: Data + AI + Automation


Iterative Approach to Implementing Data Science

EXPERIMENT

● Rapid iteration & ad-hoc analysis
● Focus on ideation
● Little concern for reproducibility
● Introduction of new technology (stacks)
● Access to pseudonymized or anonymized data

OPTIMIZE & TUNE

● Promising experiments promoted
● Focus on performance, accuracy, repeatability, and auditability
● Concerned with data governance, provenance, and security
● Employs approved technology (stacks)
● Access to production data

OPERATIONALIZE

● Optimized routines promoted
● Focus on ops (automated deployment/roll-back, workflow, auditing of results)
● Archive old models, decisioning, and automate training and deployment
● Audited access to production data


Red Hat is implementing and enabling AI Ops


AI Ops Architecture

[Architecture diagram; recoverable labels listed below]

OpenShift Platform Services on Hybrid Cloud Infrastructure

● Infrastructure Operator, Node Operator(s), Prometheus Operator, Scheduler, ..., Insights Operator, AIOps Operator(s)
● Application: Component Operators, Application Operators
● Data Sources, Application / Service - Local Decisions, Action Providers
● Cluster-wide Data Aggregation, Filtering, Cluster-wide Decisions, Representation

Red Hat Analytics & ML and ISV Analytics / ML (access to data based on customer opt-in)

● Customer-wide Data Aggregation, Customer-wide Decisions
● Global Data Aggregation, Global Decisions
● Aggregate representation, Platform for ISV plugins
● Console
● Event Correlation, Cost Management, Vendor Plugin, System Comparison, Root Cause Analysis, ..., Visualization, ChatOps / Assist
● Analyze aggregate data, derive recommendations, train decision models, implement / update operators and applications

Data from other systems

● ISV Systems: data from other sources, such as CI/CD, etc.
● Red Hat Systems: data from other sources, such as CI/CD, container factory, container, operator, HW certification processes, etc.
● Enrich with data from other systems, e.g. internal CI/CD and certification systems. Guided by learned information from actual deployments.
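The tiering in this architecture (cluster-wide, then customer-wide, then global aggregation, with global access gated by customer opt-in) can be illustrated with a small sketch. Everything below is hypothetical: the report fields (`customer`, `cluster`, `opt_in`, `firing_alerts`), the sample values, and the function names are assumptions made for illustration and do not describe any Red Hat API or data format.

```python
from collections import defaultdict

# Hypothetical per-cluster reports; field names and values are illustrative only.
cluster_reports = [
    {"customer": "acme", "cluster": "acme-prod", "opt_in": True, "firing_alerts": 4},
    {"customer": "acme", "cluster": "acme-dev", "opt_in": True, "firing_alerts": 1},
    {"customer": "globex", "cluster": "globex-prod", "opt_in": False, "firing_alerts": 7},
]


def aggregate_by_customer(reports):
    """Customer-wide aggregation: sum cluster-level counts per customer."""
    totals = defaultdict(int)
    for report in reports:
        totals[report["customer"]] += report["firing_alerts"]
    return dict(totals)


def aggregate_globally(reports):
    """Global aggregation: only clusters whose owner opted in contribute."""
    return sum(r["firing_alerts"] for r in reports if r["opt_in"])


customer_totals = aggregate_by_customer(cluster_reports)
global_total = aggregate_globally(cluster_reports)
print(customer_totals, global_total)
```

The point of the sketch is only that each tier sees a coarser view than the one below it, and that the global tier filters on opt-in before aggregating.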


Harnessing the power of AI Ops for your needs…


Detecting Cluster Degradation

OCP4 Pro-Active Support

● Clustering of OCP4 deployments
● Guide support to similar issues & resolutions
● Gating for releases

Artifacts

● https://github.com/AICoE/prometheus-flatliner


Constructed Using Smaller Building Blocks

Prometheus Metrics Processing

● Stream processing of Prometheus data
● Predict time series
● Alert on anomalies
● Aggregated scoring based on metrics

Artifacts

● https://github.com/AICoE/prometheus-anomaly-detector
● https://github.com/AICoE/prometheus-api-client-python/
● https://github.com/AICoE/prometheus-flatliner
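The "predict time series, alert on anomalies" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used by the linked projects: the forecast is simply the mean of the previous few samples, the alert fires when a sample deviates by more than a multiple of the recent standard deviation, and the function name `find_anomalies`, the `window` and `threshold` parameters, and the sample values are all made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of samples that deviate from a rolling forecast.

    The forecast for each sample is the mean of the previous `window`
    samples (a hypothetical stand-in for a real time-series model).
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        predicted = mean(history)
        spread = stdev(history)
        # Guard against a perfectly flat history (stdev == 0).
        tolerance = max(threshold * spread, 1e-9)
        if abs(values[i] - predicted) > tolerance:
            anomalies.append(i)
    return anomalies


# Made-up metric samples with one obvious spike at index 6.
samples = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
anomalous_indices = find_anomalies(samples)
print(anomalous_indices)
```

Note that the spike inflates the rolling standard deviation for the next few samples, so a detector this simple is temporarily less sensitive right after an anomaly; a production model would need to handle that.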


Let’s move on to something a bit more advanced…


Log Anomaly Detection

Motivation

● There is too much data to have a human manually search through logs
● Our internal customer wants tooling that will assist in root cause analysis
● We think there is lots of potential to use unsupervised machine learning in identifying anomalies
● We would like to have the ability for our users to give feedback so we can improve our models

https://github.com/AICoE/log-anomaly-detector


Overcoming Common Misconceptions

Misconception: error messages are a reliable indicator of issues

Timestamp Severity Message

2015-10-17 15:37:56,900 INFO dfs.DataNode$DataXceiver: Receiving block blk_5102849382819239340 src /10.0.0.1:47915 dest /10.0.0.1:50010

2015-10-17 15:37:57,036 INFO dfs.FSNamesystem: BLOCK* NameSystem.allocateBlock: /user/root/foo/_temporary/_task_200811101024_0014_m_001575_0/part-01575. blk_5102849382819239340

2015-10-17 15:37:57,634 INFO dfs.DataNode$DataXceiver: Receiving block blk_5102849382819239340 src: /10.0.0.1:57800 dest: /10.0.0.1:50010

2015-10-17 15:37:57,720 INFO dfs.DataNode$DataXceiver: writeBlock blk_5102849382819239340 received exception java.io.IOException: Could not read from stream
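In the sample above, the line that reports an exception is logged at INFO severity, so filtering on severity alone would miss it. The sketch below makes that concrete. The regular expression is an assumption based only on the layout of these sample lines (timestamp, severity, message); it is not taken from any particular tool.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm SEVERITY message"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>[A-Z]+) "
    r"(?P<message>.*)$"
)

lines = [
    "2015-10-17 15:37:56,900 INFO dfs.DataNode$DataXceiver: Receiving block "
    "blk_5102849382819239340 src /10.0.0.1:47915 dest /10.0.0.1:50010",
    "2015-10-17 15:37:57,720 INFO dfs.DataNode$DataXceiver: writeBlock "
    "blk_5102849382819239340 received exception java.io.IOException: "
    "Could not read from stream",
]

records = [LOG_PATTERN.match(line).groupdict() for line in lines]

# Filtering on severity finds nothing, because every line is INFO.
by_severity = [r for r in records if r["severity"] in ("ERROR", "FATAL")]

# Looking at the message content does surface the failing write.
by_content = [r for r in records if "exception" in r["message"].lower()]

print(len(by_severity), len(by_content))
```

A keyword search like `by_content` is itself brittle, which is part of the motivation for learning what "normal" log messages look like instead of hand-writing rules.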


Behind the Scenes

Elasticsearch, Word2Vec, Self-Organizing Map
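Reading these three components as a pipeline (logs stored in Elasticsearch, messages turned into vectors with Word2Vec, vectors scored with a self-organizing map), the scoring step can be sketched as follows. This is a toy, pure-Python self-organizing map written for illustration and is not the implementation in log-anomaly-detector. The two-dimensional vectors are hypothetical stand-ins for Word2Vec embeddings, and `train_som`, `anomaly_score`, and all parameter values are assumptions.

```python
import math
import random


def train_som(vectors, grid_size=3, epochs=50, learning_rate=0.5, seed=0):
    """Train a small self-organizing map and return {grid position: weights}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid node, initialised from random training samples.
    nodes = {
        (x, y): list(rng.choice(vectors))
        for x in range(grid_size)
        for y in range(grid_size)
    }
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        rate = learning_rate * decay
        radius = max(grid_size / 2 * decay, 0.5)
        for vec in vectors:
            # Best matching unit: the node whose weights are closest to the sample.
            bmu = min(nodes, key=lambda pos: math.dist(nodes[pos], vec))
            for pos, weights in nodes.items():
                grid_dist = math.dist(pos, bmu)
                if grid_dist <= radius:
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for k in range(dim):
                        weights[k] += rate * influence * (vec[k] - weights[k])
    return nodes


def anomaly_score(nodes, vec):
    """Distance from a vector to its closest node; larger means more unusual."""
    return min(math.dist(weights, vec) for weights in nodes.values())


# Hypothetical embeddings of "normal" log messages, plus one unusual message.
normal_vectors = [[0.10, 0.20], [0.12, 0.18], [0.09, 0.21], [0.11, 0.19]]
unusual_vector = [0.90, -0.70]

som = train_som(normal_vectors)
normal_score = anomaly_score(som, normal_vectors[0])
unusual_score = anomaly_score(som, unusual_vector)
print(normal_score < unusual_score)
```

Because the map is trained only on normal messages, its nodes stay close to them, and a message unlike anything seen in training ends up far from every node. Choosing the score above which a message is flagged is a separate tuning decision that this sketch leaves out.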


Validating the Results and Improving the Model

● Ask the user to validate whether the prediction was correct or a false positive.

● With this feedback, we can improve our ML model.

● We prevent our user from receiving the same false-positive notification more than once.
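The feedback loop in the bullets above can be sketched as a small store of user verdicts that is consulted before notifying. This is a minimal illustration, not the project's implementation: the class name `FeedbackStore`, its methods, and the choice to match on the exact message text are all assumptions, and a real system would more likely match on similarity and feed the verdicts back into training.

```python
class FeedbackStore:
    """Remembers messages that a user has marked as false positives."""

    def __init__(self):
        self._false_positives = set()

    def record(self, message, is_anomaly):
        """Store the user's verdict on a message that was flagged."""
        if not is_anomaly:
            self._false_positives.add(message)

    def should_notify(self, message):
        """Suppress notifications the user has already rejected."""
        return message not in self._false_positives


store = FeedbackStore()
flagged = "writeBlock received exception java.io.IOException"

first_time = store.should_notify(flagged)    # no feedback yet, so notify
store.record(flagged, is_anomaly=False)      # user marks it as a false positive
second_time = store.should_notify(flagged)   # same message is now suppressed
print(first_time, second_time)
```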


Discussion, Collaboration, Standards

AIOps Community Success


STAY ENGAGED

developers.redhat.com
Your access point for no-cost developer tools and product subscriptions, how-tos, and demos

Red Hat User Groups
Meetups for networking and tech deep dives
www.meetup.com/Dallas-Red-Hat-Users-Group/

DevNation
Virtual and live events
Catch replays at https://developers.redhat.com/devnation/

next.redhat.com
Stay in touch with the Office of the CTO


linkedin.com/company/red-hat

youtube.com/user/RedHatVideos

facebook.com/redhatinc

twitter.com/RedHat

Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.

Thank you
