AWS Meetup Chicago

Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)



Page 2: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Who am I
Asaf Yigal
Co-Founder and VP Product @logz.io
Email: [email protected]
@asafyigal

Page 3: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Agenda
• Why do we need log analytics?
• Intro to ELK
• What is Logz.io
• Installing ELK on your own
• Our architecture
• EC2 machine comparison

Page 4: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Why do we need Log analytics?

Page 5: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Werner Vogels, AWS CTO

“Log Analytics is Fundamental for Building Cloud Applications”

Page 6: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Multiple Use-Cases
• Product Management
• Business Analysis
• Customer Success
• BI
• Monitoring
• DevOps
• IoT
• Troubleshooting
• Support
• QA
• IT Ops, ITOA
• Compliance
• SecOps
• SIEM

Page 7: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Log driven development
• Errors, warnings and exceptions
• Metrics
• Alerts
• Dashboard

Page 8: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)
Page 9: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Why Open Source

Page 10: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

The Market is Dominated by Open Source Solutions

Over the past 3 years, the market shifted attention from proprietary to open source*
• ELK Stack: 400,000+ companies
• Splunk, Sumo Logic, Loggly: 20,000 companies
• Graphite: more than 1M companies using it

*based on Logz.io research

Page 11: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

ELK Popularity

Page 12: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Intro to ELK

Logstash
• Streaming data digestion
• Time normalization
• Field extraction

Elasticsearch
• Schema-less search DB
• Highly scalable

Kibana
• Visualization
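
To make the Logstash bullets concrete, here is a minimal Python sketch of what field extraction and time normalization mean for a single log line. It is not Logstash itself and not taken from the talk: the log format, the `parse_line` helper, and the `parse_failure` tag are assumptions made only for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical log format, chosen only for this example:
# "<day>/<month>/<year>:<time> <utc offset> <LEVEL> <message>"
LINE_RE = re.compile(
    r"(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)"
)

def parse_line(line):
    """Extract fields from one raw line and normalize its timestamp to UTC."""
    match = LINE_RE.match(line)
    if match is None:
        # Keep unparsed lines, tagged, instead of dropping them silently.
        return {"message": line, "tags": ["parse_failure"]}
    event = match.groupdict()
    local = datetime.strptime(event.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = local.astimezone(timezone.utc).isoformat()
    return event

print(parse_line("12/Mar/2016:14:05:09 -0600 ERROR payment service timed out"))
```

Normalizing every event to one time zone before indexing is what lets events from servers in different regions line up on a single timeline.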

Page 13: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Open source ELK +/-

Simple and beautiful: It’s simple to get started and play with ELK, and the UI is just beautiful.

Open Source: The largest user base, with a vibrant open source community that supports and improves the product.

Fast. Very fast: Built on the Elasticsearch search engine, ELK provides blazing quick responses even when searching through millions of documents.

Hard to Scale: Data piles up and organizations experience usage bursts. It’s super-complex to build elastic ELK deployments that can scale up and down.

Poor Security: Logs include sensitive data, and open source ELK offers no real security solution, from authentication to role based access.

Not Production Ready: Building a production ready ELK deployment is a great challenge organizations face. With hundreds of different configurations and a large support matrix, making sure it’s always up is difficult.

Page 14: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Logz.io Enterprise ELK Cloud Service

Up and running in minutes: Sign up and get insights into your data in minutes.

Production ready: Predefined and community designed dashboards, visualizations and alerts are all bundled and ready to provide insights.

Infinitely scalable: Ship as much data as you want, whenever you want.

Alerts: A proprietary alerting system built on top of open source ELK turns ELK into a proactive system.

Highly Available: The data and the entire data ingestion pipeline can sustain the loss of a full datacenter without losing data or service.

Advanced Security: 360 degree security with role based access and multi-layer security.

Page 15: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Installing ELK on your own

Page 16: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Prototype
• Installing ELK stack on a single server – 1 hr
• Shipping one type of log – 1 hr
• Log parsing – 2 hr
• Building Kibana Dashboard – 2 hr
• 6 hours to get a simple prototype

Page 17: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Turning ELK Production ready

Page 18: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Elasticsearch

OS Level Optimization: Elasticsearch requires a lot of OS level optimization in order to run properly.

Shard Allocation: Optimizing insert and query times can be tricky and requires a lot of attention.

Index Management: Because deletion is an expensive operation, index management is required for log analytics solutions.
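
One common way to avoid expensive per-document deletes is to write each day of logs to its own index and drop whole indices once they age out. The sketch below assumes that approach; the `logs` prefix, the 7-day retention, and both helper names are hypothetical, and the date pattern simply mirrors Logstash's default daily index naming.

```python
from datetime import date, timedelta

def daily_index(prefix, day):
    """Name of the index holding one day of logs, e.g. logs-2016.03.12."""
    return f"{prefix}-{day:%Y.%m.%d}"

def expired_indices(prefix, today, retention_days, existing):
    """Return the existing daily indices that fall outside the retention window."""
    keep = {daily_index(prefix, today - timedelta(days=n))
            for n in range(retention_days)}
    return sorted(name for name in existing
                  if name.startswith(prefix + "-") and name not in keep)

# Twelve days of indices, keep the most recent seven.
existing = [daily_index("logs", date(2016, 3, d)) for d in range(1, 13)]
print(expired_indices("logs", date(2016, 3, 12), 7, existing))
```

Each name returned can then be removed with a single index-level delete, which is far cheaper than deleting the documents one by one.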

Zone awareness: This is specific to AWS and is required to achieve high availability.

Cluster Topology: Elasticsearch clusters require 3 master nodes, data nodes and client nodes.
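
The slide does not say why three master nodes, but a usual reason is majority voting: a master can only be elected by more than half of the master-eligible nodes. The sketch below works through that arithmetic. Both function names are illustrative; in Elasticsearch releases before 7.0 the first value is what gets configured as `discovery.zen.minimum_master_nodes`.

```python
def minimum_master_nodes(master_eligible):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1

def tolerated_master_failures(master_eligible):
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible - minimum_master_nodes(master_eligible)

for nodes in (1, 2, 3):
    print(nodes, minimum_master_nodes(nodes), tolerated_master_failures(nodes))
```

With one or two master-eligible nodes no failure can be tolerated; three is the smallest count that survives the loss of one node while still preventing a split brain.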

Bulk inserts Optimization: Optimizing insert time and latency.
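
As a rough illustration of what bulk inserts involve, the sketch below builds the newline-delimited body that the Elasticsearch `_bulk` endpoint accepts and splits documents into fixed-size batches. The helper names and the batch size are assumptions, not settings from the talk; older Elasticsearch releases also expect a type, either in the action line or in the URL.

```python
import json

def bulk_body(index, docs):
    """Serialize documents into the newline-delimited body used by _bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # the body must end with a newline

def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

docs = [{"message": f"event {n}"} for n in range(5)]
for batch in batches(docs, 2):
    print(bulk_body("logs-2016.03.12", batch), end="")
```

The batch size is the main tuning knob: larger batches amortize request overhead, while smaller ones reduce latency and memory pressure, so the right value is usually found by measurement.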

Page 19: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Elasticsearch (2)

Capacity provisioning: Need to account for log bursts and be able to provision enough capacity.
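
A back-of-the-envelope calculation helps when sizing for the capacity provisioning point. In the sketch below every factor (replica count, burst headroom, indexing overhead) is an assumed placeholder rather than a number from the talk, and should be replaced with measurements from your own cluster.

```python
def required_storage_tb(daily_tb, retention_days, replicas=1,
                        burst_factor=1.5, index_overhead=1.1):
    """Estimate disk needed for a retention window, with headroom for bursts.

    All default factors are illustrative assumptions.
    """
    primary = daily_tb * retention_days * index_overhead
    return primary * (1 + replicas) * burst_factor

# 1 TB/day kept for 14 days, one replica, 50% burst headroom.
print(f"{required_storage_tb(1.0, 14):.1f} TB")
```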

Archive (DR): Snapshot the data to a different repository for disaster recovery.

Mapping management: Mapping conflicts and sync issues need to be detected and addressed.
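
A mapping conflict typically arises when the same field arrives with different types, for example a number in one log and a string in another. The following sketch shows one simple way to detect that before indexing. It is a simplified assumption of how such a check could work, covering top-level fields only, and `find_conflicts` is a hypothetical helper.

```python
def json_type(value):
    """Coarse JSON type of a value; None means no type information."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    return None

def find_conflicts(docs):
    """Return fields that were seen with more than one type."""
    seen = {}
    for doc in docs:
        for field, value in doc.items():
            # An array is mapped like its elements, so look inside lists.
            for item in (value if isinstance(value, list) else [value]):
                kind = json_type(item)
                if kind is not None:
                    seen.setdefault(field, set()).add(kind)
    return {field: sorted(kinds) for field, kinds in seen.items() if len(kinds) > 1}

docs = [{"status": 200, "user": "a"}, {"status": "OK", "user": "b"}]
print(find_conflicts(docs))
```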

Monitoring: Marvel does a good job but requires constant DevOps attention.

Curator: Remove or optimize old indices.

Alias management: For better cluster control you need to define and use aliases.
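
Aliases give clients a stable name while the underlying indices change. The sketch below builds the request body for the Elasticsearch `_aliases` endpoint that moves an alias from one index to another in a single request. The alias and index names, and the helper itself, are made up for the example.

```python
import json

def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = swap_alias_actions("logs-current", "logs-2016.03.11", "logs-2016.03.12")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, readers of the alias never see a moment where it points at nothing.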

Page 20: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Logstash

Data parsing: Extracting values from text messages and enriching them with geo, user agent, and similar data.

High Availability: Running Logstash in a cluster is not trivial.

Scalability: Dealing with increased load on the Logstash servers.

Burst Protection: Logs tend to be bursty. A buffer such as Redis or Kafka is required in front of Logstash.
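
To see why a buffer helps, the sketch below replays per-second arrival counts through a bounded buffer that drains at a fixed rate. It is a toy model, not how Redis or Kafka work internally, and the arrival pattern, drain rate, and capacities are invented for illustration.

```python
def simulate(arrivals, drain_rate, capacity):
    """Replay arrival counts through a bounded buffer.

    Returns (peak_backlog, dropped): the largest backlog held and the
    number of events that did not fit in the buffer.
    """
    backlog = peak = dropped = 0
    for arrived in arrivals:
        accepted = min(arrived, capacity - backlog)
        dropped += arrived - accepted
        backlog += accepted
        peak = max(peak, backlog)
        backlog -= min(backlog, drain_rate)  # downstream consumes at its own pace
    return peak, dropped

burst = [100, 100, 900, 100, 100]  # events per second, with one spike
print(simulate(burst, drain_rate=300, capacity=1000))
print(simulate(burst, drain_rate=300, capacity=500))
```

With enough buffer capacity the spike is absorbed and drained over the following seconds; with too little, part of the burst is lost even though average throughput is well within the drain rate.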

Rejection from Elasticsearch: Elasticsearch rejects about 1% of messages due to mapping issues. This needs to be addressed.

Configuration management: A special infrastructure needs to be in place to allow config changes with no data loss.

Page 21: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Kibana

Security: Kibana has no protection by default. User authentication needs to be implemented.

High Availability: Running Kibana in a cluster for upgrades and high availability.

Role based access: If you want to restrict access to certain information, this capability needs to be developed.

Alerts: Alerting is not part of the open source stack.

Anomaly Detection: Basic anomaly detection is missing from Kibana.

Pre Canned Dashboards: Building dashboards and visualizations in Kibana is tricky and requires special knowledge.

Page 22: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Turning ELK Production ready

~ 4-6 weeks of work

Page 23: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Maintenance

Upgrades: Challenging to upgrade; need to be aware of backward compatibility.

Overall cluster health: Monitor the health of the environment.

AWS Issues: Dealing with AWS stability issues.

Mapping conflicts: Deal with arising mapping conflicts.

Personnel redundancy: Need to have multiple people with deep knowledge of the stack.

Capacity increase: Provision additional capacity and grow the cluster.

Page 24: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Our Architecture

Page 25: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

[Architecture diagram. Components shown: HAProxy; Listener (x4); Kafka; Log Engine (x2); Logstash; S3; Elasticsearch Play server; Curator; Hot/Cold migration; DLQ; Alert Engine; Kibana; API Gateway; Shard optimizer; Cluster Protection; Monitoring: ELK, Graphite, Nagios etc.]

Page 26: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Demo

Page 27: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

AWS Server Comparison

Machine                 Number   TB/Day
M1.xlarge               4        0.6
i2.xlarge               4        1
C3.8xlarge              6        1.5
C4.2xlarge + 1TB EBS    3        1.3
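
One way to read these numbers is to normalize them per machine. The sketch below assumes that "Number" is the count of machines in each test and that "TB/Day" is the total volume that group handled; the slide does not state this explicitly, and the calculation ignores instance pricing.

```python
# (machine type, number of machines, TB/day), copied from the slide.
clusters = [
    ("M1.xlarge", 4, 0.6),
    ("i2.xlarge", 4, 1.0),
    ("C3.8xlarge", 6, 1.5),
    ("C4.2xlarge + 1TB EBS", 3, 1.3),
]

def tb_per_machine(number, tb_per_day):
    """Daily volume handled per machine, assuming an even spread."""
    return tb_per_day / number

for machine, number, tb_per_day in clusters:
    print(f"{machine:<22} {tb_per_machine(number, tb_per_day):.2f} TB/day per machine")
```

Under that reading, the C4.2xlarge with EBS handles the most data per machine of the four options, though cost per TB would be needed for a full comparison.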

Page 28: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

We’re Hiring
• Technical evangelist
• Business Development
• Marketing

[email protected]

Page 29: Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Questions?