Logging, Metrics and Monitoring as a Service
Lakshminarayanan (LN) Renganarayana
Architect, Cloud Platform Engineering
Nov 4, 2014 | LMM, Cloud Platform Engineering
The Symantec Team
• Cloud Platform Engineering
– We are building a consolidated cloud platform that provides infrastructure and platform services for next-generation Symantec products and services
– Starting small, but scaling to tens of thousands of nodes across multiple data centers
• Me
– Architect / Lead for LMM in Cloud Platform Engineering
– Love to build and break systems
– Previous life: Research Staff Member, IBM Research
– Automatic parallelization, Compilers, Deployment Automation (Heat)
Context for LMM: Symantec Cloud Platform
[Diagram: platform stack. Layers: Bare Metal; IaaS (OpenStack); Platform Services (BP, SP, KV, OBS); Symantec Services & Apps. Common Services: LMM, Identity Manager, CI / CD.]
LMM: Goals we started with …
• Design for both Developers and Ops
• Multi Tenant, Elastic Scalability
• One system for logs, metrics, monitoring, & other events
• Designed for under and over the cloud
• Build using open source tools and for open sourcing
Guiding principle
Generate with 1 line: logger.warn("WARN KVaaS too many failed writes")
Query with all the power of ElasticSearch and visualize with Kibana
Generate with 1 line: statsd_client.incr('myapp_num_requests')
Query with all the power of InfluxDB and visualize with Grafana
Make it extremely simple to generate logs & metrics
Provide powerful query and visualizations for free
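To make the "1 line" idea concrete, here is a minimal sketch, not the LMM client itself. It assumes only the standard StatsD counter wire format ("name:value|c" over UDP) and Python's standard logging module; the helper statsd_incr, the logger name "kvaas", and the local receiver socket that stands in for a StatsD daemon are hypothetical.

```python
import logging
import socket


def statsd_incr(sock, addr, name, count=1):
    """Send one StatsD counter increment ("<name>:<count>|c") over UDP."""
    sock.sendto(f"{name}:{count}|c".encode("ascii"), addr)


def demo():
    # A local UDP socket stands in for a StatsD daemon (assumption for the demo).
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # One line to generate a log event ...
        logging.getLogger("kvaas").warning("WARN KVaaS too many failed writes")
        # ... and one line to generate a metric event.
        statsd_incr(sender, receiver.getsockname(), "myapp_num_requests")
        packet, _ = receiver.recvfrom(1024)
        return packet.decode("ascii")
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

In a real deployment the log line would be picked up by a shipper and the UDP packet would go to a StatsD endpoint; both are replaced by local stand-ins here.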
LMM Client Interface
[Diagram: LMM client interface. Sources: App, Platform Services, Infrastructure, VM logs. Labels: "Writes app metrics directly" (Push Metrics); SNMP, Vars, JMX "expose metrics" (Pull Metrics); LMM Shipper. Target: LMM, receiving metrics and log events, with an Alerts & Config UI.]
Push: StatsD, metrics extension for OpenStack
Pull: collectd
Shipper: Logstash, Lumberjack
Security and Multi-tenancy
• User authentication with Keystone for Query API & UI
• Tenant id and API key used for events sent to LMM
– Tenant ids from Keystone and API keys generated by LMM
• Every event is tagged with a tenant id
– Log events: tenant id as a field
– Metric events: tenant id prefixed to the metric name
• Keystone group membership used for sophisticated cross-tenant event visibility / correlation
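The tagging scheme above can be sketched as follows. This is an illustration under stated assumptions, not LMM's implementation: the function names, the "tenant_id" field name, the "." separator between tenant id and metric name, and the in-memory key store are all hypothetical.

```python
import hmac
import secrets


def generate_api_key():
    """Generate a random API key (assumed format: 32 hex characters)."""
    return secrets.token_hex(16)


def authenticate(tenant_id, api_key, key_store):
    """Check the (tenant id, API key) pair sent with an event."""
    expected = key_store.get(tenant_id)
    # Constant-time comparison avoids leaking key contents through timing.
    return expected is not None and hmac.compare_digest(expected, api_key)


def tag_log_event(event, tenant_id):
    """Log events carry the tenant id as a field (field name is an assumption)."""
    tagged = dict(event)
    tagged["tenant_id"] = tenant_id
    return tagged


def tag_metric_name(name, tenant_id, sep="."):
    """Metric events carry the tenant id as a name prefix (separator is an assumption)."""
    return f"{tenant_id}{sep}{name}"
```

For example, tag_metric_name("myapp_num_requests", "t1") gives "t1.myapp_num_requests", so a query for one tenant's metrics can filter on the name prefix.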
LMM Under the Hood
[Architecture diagram; recoverable labels:]
• Client App / Service with log & metrics shipper, sending log, metric & alert events
• Load Balancer
• Kafka cluster (knode1, knode2, knode3, ... knodeN) carrying log, metric, alert events
• Storm cluster: Authentication, Validation, Alerts Processing
• Nodes s1, s2, s3, s4, ... sn
• Keystone
• Quota & Policy
• Redis
• Alerts email & callbacks
• Metrics Store: InfluxDB (clustered)
• Log Store: ElasticSearch (clustered)
• Front End Cluster: Multi-tenancy and Kibana, Grafana proxies
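One way to read the architecture is as a queue-then-process pipeline. The sketch below is a toy model under explicit assumptions: a deque stands in for the Kafka cluster, a plain function stands in for the Storm processing stage, and lists stand in for the log store, metrics store, and alert sink. The event shape ("type", "tenant_id") and all names are hypothetical; the real flow and routing rules are not spelled out in the slides.

```python
from collections import deque

VALID_TYPES = {"log", "metric", "alert"}


def process(queue, known_tenants, log_store, metric_store, alerts):
    """Drain the queue: authenticate, validate, then route each event.

    Returns the number of rejected events.
    """
    rejected = 0
    while queue:
        event = queue.popleft()
        # Authentication: the tenant must be known (stand-in for a Keystone check).
        # Validation: the event type must be one of the supported kinds.
        if (event.get("tenant_id") not in known_tenants
                or event.get("type") not in VALID_TYPES):
            rejected += 1
            continue
        if event["type"] == "log":
            log_store.append(event)      # stand-in for the log store
        elif event["type"] == "metric":
            metric_store.append(event)   # stand-in for the metrics store
        else:
            alerts.append(event)         # stand-in for email / callback alerts
    return rejected
```

Buffering events in a queue before processing lets ingestion and processing scale independently, which is one plausible reason for placing Kafka in front of Storm.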
Current deployment
• Multiple deployments: on bare KVM nodes and on OpenStack VMs
– On KVM nodes: 40+ VMs, 80+ TB storage, many large memory nodes
– All components are deployed in clustered mode for HA
– Some with active/active replication, some with active/passive
• Used by Platform and Infrastructure Services
– Hundreds of thousands of events per second
– A few TB of data collected and indexed per day
– Queries currently come from Kibana and Grafana; in the future, also from APIs
In the Next Iteration …
• Support Health checks
– Exploring Sensu and related technologies
• Multi data center deployment
• Refactor to a data pipeline architecture
• Deployment automation using Murano
• Open Source LMM
Thank you!
Questions, Comments, Suggestions?
We are interested in open sourcing LMM and are looking for collaborators …
Interested? Feel free to contact [email protected]