Logging, Metrics and Monitoring as a Service
Lakshminarayanan (LN) Renganarayana, Architect, Cloud Platform Engineering
Nov 4, 2014, LMM, Cloud Platform Engineering

A Scalable Multi-tenant Logging, Metrics and Monitoring as Service for OpenStack and Cloud Applications


Page 1: A Scalable Multi-tenant Logging, Metrics and Monitoring as Service for OpenStack and Cloud Applications

Logging, Metrics and Monitoring as a Service

Lakshminarayanan (LN) Renganarayana, Architect, Cloud Platform Engineering

Nov 4, 2014

Page 2


The Symantec Team

• Cloud Platform Engineering

– We are building a consolidated cloud platform that provides infrastructure and platform services for next-generation Symantec products and services

– Starting small, but scaling to tens of thousands of nodes across multiple data centers

• Me

– Architect / Lead for LMM in Cloud Platform Engineering

– Love to build and break systems

– Previous life: Research Staff Member, IBM Research

– Automatic parallelization, Compilers, Deployment Automation (Heat)

Page 3

Context for LMM: Symantec Cloud Platform


Figure: Symantec Cloud Platform stack. Labels recoverable from the diagram:

Layers: Bare Metal; IaaS (OpenStack); Platform Services (BP, SP, KV, OBS); Symantec Services & Apps

Common Services: LMM, Identity Manager, CI / CD

Page 4

LMM: Goals we started with …

• Design for both Developers and Ops

• Multi Tenant, Elastic Scalability

• One system for logs, metrics, monitoring, & other events

• Designed for under and over the cloud

• Build using open source tools and for open sourcing


Page 5

Guiding principle


Make it extremely simple to generate logs & metrics

Provide powerful query and visualizations for free

• Logs: generate with 1 line: logger.warn("WARN KVaaS too many failed writes")

– Query with all the power of ElasticSearch and visualize with Kibana

• Metrics: generate with 1 line: statsd_client.incr('myapp_num_requests')

– Query with all the power of InfluxDB and visualize with Grafana
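
The two one-liners on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the LMM client: it uses the standard `logging` module and builds the StatsD counter datagram by hand instead of using a StatsD client library. The helper names `format_counter` and `incr`, the logger name, and the address 127.0.0.1:8125 are assumptions made for illustration.

```python
import logging
import socket

logger = logging.getLogger("kvaas")  # logger name is an assumption


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter datagram: '<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def incr(name: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter increment to a StatsD-compatible listener over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name), (host, port))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # One line to generate a log event.
    logger.warning("KVaaS too many failed writes")
    # One line to generate a metric event.
    incr("myapp_num_requests")
```

Because the metric is sent as a single UDP datagram, the application does not wait for the collector, which keeps the cost of instrumenting code low.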

Page 6

LMM Client Interface


Figure: LMM client interface. Labels recoverable from the diagram:

Sources: App, Platform Services, Infrastructure, VM logs

Push Metrics: writes app metrics directly

Pull Metrics: SNMP, Vars, JMX (expose metrics)

LMM Shipper

LMM: metrics, log events

Alerts & Config UI

Push: StatsD, metrics extension for OpenStack

Pull: CollectD

Shipper: logstash, lumberjack

Page 7

Security and Multi-tenancy

• User authentication with Keystone for Query API & UI

• Tenant id and API key used for events sent to LMM

– Tenant ids from Keystone and API keys generated by LMM

• Every event is tagged with a tenant id

– Log events: tenant id as a field

– Metric events: tenant id prefixed to the metric name

• Keystone group membership used for sophisticated cross-tenant event visibility / correlation
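
The tenant tagging described on this slide could look roughly like the sketch below. The slide only says that log events carry the tenant id as a field and that metric names are prefixed with it; the field name `tenant_id`, the `.` separator, and the function names are assumptions for illustration.

```python
def tag_log_event(event: dict, tenant_id: str) -> dict:
    """Return a copy of a log event with the tenant id added as a field."""
    tagged = dict(event)
    tagged["tenant_id"] = tenant_id  # field name is an assumption
    return tagged


def tag_metric_name(name: str, tenant_id: str) -> str:
    """Prefix the tenant id to a metric name."""
    return f"{tenant_id}.{name}"  # '.' separator is an assumption


def metric_belongs_to(name: str, tenant_id: str) -> bool:
    """Check whether a stored metric name carries the given tenant prefix."""
    return name.startswith(f"{tenant_id}.")
```

With a prefix on the metric name, a query proxy can restrict a tenant to its own series by matching on the prefix, while log queries can filter on the tenant field.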


Page 8


LMM Under the Hood

Figure: LMM architecture. Labels recoverable from the diagram:

Client App / Service, with log & metrics shipper (log, metric & alert events)

Load Balancer

Kafka cluster: knode1, knode2, knode3, ... knodeN (log, metric, alert events)

Storm cluster

s1, s2, s3, s4, ... sn

Authentication, Validation, Alerts Processing

Keystone

Quota & Policy

Log Store: ElasticSearch

Metrics Store: InfluxDB

Redis

Alerts: email & callbacks

Front End Cluster: Multi-tenancy and Kibana, Grafana Proxies
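
As an illustration of the "Authentication, Validation" step named in the diagram, the sketch below checks an incoming event against a tenant id and API key before it would be processed further. It is not LMM's implementation: the event layout, the in-memory `API_KEYS` table, and the function names are all hypothetical.

```python
import hmac

# Hypothetical in-memory key table; a real deployment would look keys up elsewhere.
API_KEYS = {"tenant-a": "key-a-123"}

# Event types taken from the diagram's "log, metric, alert events" label.
EVENT_TYPES = {"log", "metric", "alert"}


def is_authenticated(tenant_id: str, api_key: str) -> bool:
    """Compare the presented API key with the stored one in constant time."""
    expected = API_KEYS.get(tenant_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), api_key.encode())


def validate_event(event: dict) -> bool:
    """Accept an event only if it has a known type and valid tenant credentials."""
    tenant_id = event.get("tenant_id")
    api_key = event.get("api_key")
    if event.get("type") not in EVENT_TYPES:
        return False
    if not isinstance(tenant_id, str) or not isinstance(api_key, str):
        return False
    return is_authenticated(tenant_id, api_key)
```

Rejecting events early, before they reach the stores, keeps one tenant's malformed or unauthenticated traffic from being indexed alongside valid data.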

Page 9

Current deployment

• Multiple deployments: on bare KVM nodes, on OpenStack VMs

– On KVM nodes: 40+ VMs, 80+ TB storage, many large memory nodes

– All components are deployed in clustered mode for HA

– Some with active/active replication, some with active/passive

• Used by Platform and Infrastructure Services

– Hundreds of thousands of events per second

– A few TBs of data collected and indexed per day

– Queries currently come from Kibana and Grafana; in the future, from APIs


Page 10

In the Next Iteration …

• Support Health checks

– Exploring Sensu and related technologies

• Multi data center deployment

• Refactor to a data pipeline architecture

• Deployment automation using Murano

• Open Source LMM


Page 11

Thank you!

Questions, Comments, Suggestions?

We are interested in open sourcing LMM and are looking for collaborators …

Interested? Feel free to contact [email protected]
