Log Analytics with Amazon Elasticsearch Service - September Webinar Series



Jon Handler (handler@amazon.com)

What we'll cover

• Understanding Elasticsearch capabilities
• Elasticsearch, the technology
• Aggregations; ad-hoc analysis
• Amazon Elasticsearch Service is a drop-in replacement for self-managed Elasticsearch
• Q&A

Understanding Elasticsearch capabilities

CloudTrail delivers a log of your AWS API calls to you

• AWS API call monitoring

• You need to understand the changing landscape of your AWS resources

• You need to do security analysis and compliance auditing

• You want the ability to dig into your logs in an intuitive, fine-grained way

How Elasticsearch can help

• Combined with Kibana, Elasticsearch provides a tool for search, real-time analytics, and data visualization


Demo Architecture

[Diagram: AWS Resources → CloudTrail Logs → Amazon CloudWatch Logs → log lines → Amazon Elasticsearch Service]

Demo

Scenario: Log data analytics

• Application monitoring and event diagnosis

• You need to monitor the performance of your application, web servers, and hardware

• You need easy-to-use yet powerful data visualization tools to detect issues in near real time

• You want the ability to dig into your logs in an intuitive, fine-grained way

• Kibana provides fast, easy visualization

Scenario: Batch data analytics

• Reporting and Analysis

• You are a mobile app developer
• You have to monitor/manage users across multiple app versions
• You want to analyze and report on usage and migration between app versions

• Use Kibana for dashboarding. Use the query API for deeper analysis

Scenario: Full-text search

• Traditional search

• Your application or website provides search capabilities over diverse documents

• You are tasked with making this knowledge base searchable and accessible

• You need key search features, including text matching, faceting, filtering, fuzzy search, autocomplete, and highlighting

• Use the query API to support application search
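
As a sketch of what "use the query API" can look like for application search, the Python below builds a request body for the `_search` endpoint that combines a fuzzy `match` query with highlighting. The field name `title` and the helper name `build_search_body` are hypothetical; `match`, `fuzziness`, and `highlight` are standard parts of the Elasticsearch query DSL.

```python
import json


def build_search_body(text, field="title", fuzzy=True, size=10):
    """Build a _search request body (hypothetical helper; the field name is an assumption)."""
    match_clause = {"query": text}
    if fuzzy:
        # "AUTO" lets Elasticsearch choose the allowed edit distance from the term length.
        match_clause["fuzziness"] = "AUTO"
    return {
        "size": size,
        "query": {"match": {field: match_clause}},
        # Ask for highlighted fragments of the matched field in each hit.
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    body = build_search_body("elastcsearch launch")  # note the typo; fuzziness tolerates it
    # Send as the JSON body of: POST https://<endpoint>/<index>/_search
    print(json.dumps(body, indent=2))
```

Keeping the body as a plain dict makes it easy to add filters or facets later before serializing it.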

Elasticsearch, the technology

Elasticsearch is like a database

Search ↔ Database
• Value ↔ Value
• Field ↔ Column
• Document ↔ Row
• Index ↔ Table
• Cluster ↔ Database
• Queries ↔ SQL

Documents are the core entity

[Diagram: a document has an ID and fields (F1, F2, ...), each holding a value]

{
  "eventVersion": "1.03",
  "eventTime": "2016-06-01T00:16:19Z",
  "eventSource": "dynamodb.amazonaws.com",
  "eventName": "DescribeStream",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": "52.51.24.XX",
  "userAgent": "leb-kcl-580935a6-5f94-4ce0-ac69-cdeb609ba16a,amazon-kinesis-client-library-java-lambda_1.2.1, aws-internal/3",
  "requestParameters": {
    "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837"
  },
  "responseElements": null,
  "requestID": "KC608PH8POAF2I184E2SL1PS2FVV4KQNSO5AEMVJF66Q9ASUAAJG",
  "eventID": "49b56379-903b-4f04-8ce5-d21bbfcf8ab3",
  "eventType": "AwsApiCall",
  "apiVersion": "2012-08-10",
  "recipientAccountId": "17816119XXXX",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAJBQVRM7LN25CAHX7Y:awslambda_338_20160531233813522",
    "arn": "arn:aws:sts::178161197791:assumed-role/geospatial-rec-engine-ApplicationExecutionRole-9LPKB77QMR97/awslambda_338_20160531233813522", ...

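Nested objects in a document like the CloudTrail event above are indexed as dotted field paths (for example `userIdentity.type`), which is how you refer to them in queries and aggregations. A minimal sketch of that flattening, using a trimmed copy of the event; the helper `flatten` is hypothetical and only approximates how Elasticsearch names object sub-fields. It skips `null` values because Elasticsearch does not index them by default.

```python
def flatten(doc, prefix=""):
    """Return {dotted.path: value} for every non-null leaf of a nested dict (hypothetical helper)."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Object fields contribute their children under "parent.child" names.
            fields.update(flatten(value, prefix=path + "."))
        elif value is not None:
            # Null values are not indexed, so they produce no field here.
            fields[path] = value
    return fields


event = {
    "eventName": "DescribeStream",
    "awsRegion": "eu-west-1",
    "requestParameters": {
        "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837"
    },
    "responseElements": None,
    "userIdentity": {"type": "AssumedRole"},
}

for name, value in flatten(event).items():
    print(name, "=", value)
```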
Lucene provides text analysis and indexing

Term ID   Term    Postings
0         quick   1,3,5
1         brown   2,3,4,6
2         fox     1,7,9
3         lazy    2,8
4         dog     24

[Diagram: IndexWriter and IndexSearcher operating on an index Segment]
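
The term table above is an inverted index: each term maps to the postings list of document IDs that contain it. A toy Python version follows; it lowercases and splits on non-alphanumeric characters as a stand-in for a real Lucene analyzer, and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    # Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_index(docs):
    """Map each term to a sorted postings list of doc IDs (toy inverted index)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    # Sort terms (the term dictionary) and each postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "The quick fox",
    2: "A lazy brown dog",
    3: "Quick brown cats",
}
index = build_index(docs)
for term_id, (term, ids) in enumerate(index.items()):
    print(term_id, term, ids)
```

Lookup by term is then a dictionary access, which is why term queries stay fast as the number of documents grows.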

Elasticsearch query processing

[Diagram: Query → index lookup over the terms (quick, brown, fox, lazy, lorem, ipsum, dolor, sit) → matches (id: 216, 305, 486, 713) → query logic and post-filtering; scoring, aggs → sorted matches (results): id 713, 305, 486, 216]
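
The pipeline above (look up each query term, collect matching IDs, filter, score, sort) can be sketched over a hand-built postings map. The scoring here is a simplified IDF-weighted sum, not Lucene's actual TF-IDF or BM25 similarity; the postings, the collection size, and the `status` field used for post-filtering are all hypothetical.

```python
import math

# Hand-built postings: term -> doc IDs (illustrative data).
POSTINGS = {
    "quick": [216, 305, 713],
    "brown": [305, 713],
    "fox": [486, 713],
    "lazy": [216],
}
TOTAL_DOCS = 1000  # assumed collection size for the IDF calculation
# Hypothetical per-document field used for filtering.
STATUS = {216: "published", 305: "published", 486: "draft", 713: "published"}


def search(query_terms, required_status=None):
    scores = {}
    for term in query_terms:
        ids = POSTINGS.get(term, [])  # index lookup
        if not ids:
            continue
        idf = math.log(TOTAL_DOCS / len(ids))  # rarer terms weigh more
        for doc_id in ids:  # OR semantics: any matching term contributes
            scores[doc_id] = scores.get(doc_id, 0.0) + idf
    if required_status is not None:
        # Post-filter: drop matches without changing the remaining scores.
        scores = {d: s for d, s in scores.items() if STATUS.get(d) == required_status}
    # Sort by descending score, breaking ties by doc ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print([doc_id for doc_id, _ in search(["quick", "brown", "fox"])])
print([doc_id for doc_id, _ in search(["quick", "brown", "fox"], required_status="published")])
```

Documents matching more (and rarer) query terms rise to the top, which is the same ordering idea the diagram shows.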

Aggregations; ad-hoc analysis

Faceting: basic aggregation

• Query: shirt

Facets: Carhartt (1092), Russell Athletic (1087), Dickies (954), RALPH LAUREN (823), Wrangler (701), Doublju (259), Levi's (12)
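
A facet like the brand list above is a terms aggregation: among the documents that match the query, count how many share each value of a field. Below is a local Python sketch with made-up products, plus the request body that would ask Elasticsearch for the same thing. The `title` and `brand` field names are assumptions; `brand` would need to be a not_analyzed field so that multi-word values such as "Russell Athletic" are counted whole.

```python
from collections import Counter

products = [
    {"title": "Work shirt", "brand": "Carhartt"},
    {"title": "Flannel shirt", "brand": "Carhartt"},
    {"title": "Crew neck shirt", "brand": "Russell Athletic"},
    {"title": "Work pants", "brand": "Dickies"},
]


def brand_facets(docs, word):
    # Query step: keep documents whose title contains the word.
    hits = [d for d in docs if word in d["title"].lower().split()]
    # Aggregation step: count hits per brand, most common first.
    return Counter(d["brand"] for d in hits).most_common()


print(brand_facets(products, "shirt"))

# Equivalent request body for POST <endpoint>/<index>/_search
facet_request = {
    "query": {"match": {"title": "shirt"}},
    "size": 0,  # return only the aggregation, not the hits
    "aggs": {"brands": {"terms": {"field": "brand"}}},
}
```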


Elasticsearch Aggregations

• Buckets – a collection of documents meeting some criterion

• Metrics – calculations on the content of buckets.

[Chart: Bucket: time; Metric: count]
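
The chart above buckets events by time and counts each bucket, which is a `date_histogram` aggregation with the default document-count metric. A local sketch that buckets ISO-8601 timestamps (in the CloudTrail `eventTime` format) by hour, followed by the matching request body. The hourly interval is an arbitrary choice, this local version omits empty buckets, and the `interval` parameter name follows the Elasticsearch 2.x-era syntax (newer versions use `calendar_interval` or `fixed_interval`).

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count events per hour bucket (local stand-in for a date_histogram)."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        # Truncate to the start of the hour: that is the bucket key.
        buckets[t.replace(minute=0, second=0)] += 1
    return sorted(buckets.items())


events = ["2016-06-01T00:16:19Z", "2016-06-01T00:42:01Z", "2016-06-01T02:05:00Z"]
for bucket, count in hourly_counts(events):
    print(bucket.isoformat(), count)

histogram_request = {
    "size": 0,
    "aggs": {
        "events_over_time": {
            "date_histogram": {"field": "eventTime", "interval": "hour"}
        }
    },
}
```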

A more complicated aggregation

Bucket: ARN → Bucket: Region → Bucket: eventName → Metric: Count
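
Nesting buckets (ARN, then region, then eventName) and counting at the innermost level corresponds to nested `terms` aggregations. The sketch below builds that request body and computes the same nested counts locally. The field names `userIdentity.arn`, `awsRegion`, and `eventName` come from the CloudTrail event earlier; the helper names, the aggregation names, and the short placeholder ARNs are hypothetical.

```python
from collections import defaultdict


def nested_terms_request(fields):
    """Build nested terms aggregations, outermost field first (hypothetical helper)."""
    aggs = {}
    for field in reversed(fields):
        inner = {"terms": {"field": field}}
        if aggs:
            inner["aggs"] = aggs  # the previously built level nests inside this one
        aggs = {"by_" + field.replace(".", "_"): inner}
    return {"size": 0, "aggs": aggs}


def nested_counts(docs, fields):
    """Count documents per combination of field values, as nested dicts."""
    if len(fields) == 1:
        counts = defaultdict(int)
        for d in docs:
            counts[d[fields[0]]] += 1
        return dict(counts)
    groups = defaultdict(list)
    for d in docs:
        groups[d[fields[0]]].append(d)
    return {key: nested_counts(group, fields[1:]) for key, group in groups.items()}


fields = ["userIdentity.arn", "awsRegion", "eventName"]
docs = [  # flattened events; "role-a"/"role-b" stand in for real ARNs
    {"userIdentity.arn": "role-a", "awsRegion": "eu-west-1", "eventName": "DescribeStream"},
    {"userIdentity.arn": "role-a", "awsRegion": "eu-west-1", "eventName": "DescribeStream"},
    {"userIdentity.arn": "role-a", "awsRegion": "us-east-1", "eventName": "GetRecords"},
    {"userIdentity.arn": "role-b", "awsRegion": "eu-west-1", "eventName": "ListTables"},
]
print(nested_terms_request(fields))
print(nested_counts(docs, fields))
```

One thing the local version hides: each `terms` level in Elasticsearch returns only its top 10 values by default unless you raise `size`.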

More kinds of aggregations

Buckets:
• Date histogram
• Histogram
• Range
• Terms
• Filters
• Significant terms

Metrics:
• Count
• Average
• Sum
• Min
• Max
• Std. Dev
• Unique Count
• Percentiles
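
To make the metric list concrete, here is what each metric computes over one bucket's values, done exactly in Python. Elasticsearch computes unique count (the `cardinality` aggregation) and percentiles approximately, so its results can differ slightly on large data; the latency values and the linear-interpolation percentile method are assumptions for illustration.

```python
import statistics


def percentile(sorted_values, p):
    """Linear-interpolation percentile on pre-sorted data, p in [0, 100]."""
    k = (len(sorted_values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def bucket_metrics(values):
    ordered = sorted(values)
    return {
        "count": len(values),
        "avg": statistics.mean(values),
        "sum": sum(values),
        "min": ordered[0],
        "max": ordered[-1],
        "std_dev": statistics.pstdev(values),  # population standard deviation
        "unique_count": len(set(values)),  # exact here; Elasticsearch's is approximate
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
    }


latencies_ms = [120, 80, 95, 80, 300, 110]
print(bucket_metrics(latencies_ms))
```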

Setting up your cluster

Shards: independent collections of documents

[Diagram: an Index/Type split into Shard 1, Shard 2, Shard 3, and Shard 4, each holding documents identified by Id]
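
Each document lands in exactly one primary shard. Elasticsearch picks it by hashing the document's routing value (by default its `_id`) and taking the result modulo the number of primary shards, which is why you cannot simply change the primary shard count of an existing index. The sketch below uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster; the document IDs are made up.

```python
import zlib
from collections import Counter


def shard_for(doc_id, num_primary_shards):
    # Stand-in hash: Elasticsearch uses Murmur3 on the routing value, not CRC32.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"event-{n}" for n in range(1000)]
placement = Counter(shard_for(d, 4) for d in doc_ids)
print(sorted(placement.items()))  # a roughly even spread across shards 0-3

# Changing the shard count would send most documents to a different shard.
moved = sum(shard_for(d, 4) != shard_for(d, 5) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would change shard")
```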

Deployment of indices to a cluster

• Index 1: Shard 1, Shard 2, Shard 3
• Index 2: Shard 1, Shard 2, Shard 3

[Diagram: Amazon ES cluster of three instances, with Instance 1 acting as master; each instance holds a mix of primary and replica copies of shards 1, 2, and 3]
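
A basic goal of shard placement, and the pattern the diagram suggests, is that no instance holds both the primary and the replica of the same shard, so losing one instance does not lose data. A simplified round-robin allocator that follows that rule (primary of shard *i* on one instance, its replica on the next) is sketched below. Elasticsearch's real allocator also weighs disk usage and shard balance, so treat this only as an illustration; the index and instance names are placeholders.

```python
from collections import defaultdict


def allocate(index_names, shards_per_index, instances):
    """Place each primary and one replica per shard on different instances."""
    n = len(instances)
    if n < 2:
        raise ValueError("a replica needs at least two instances")
    layout = defaultdict(list)
    for index in index_names:
        for shard in range(1, shards_per_index + 1):
            primary = instances[(shard - 1) % n]
            replica = instances[shard % n]  # always the next instance over
            layout[primary].append((index, shard, "primary"))
            layout[replica].append((index, shard, "replica"))
    return dict(layout)


layout = allocate(["index1", "index2"], 3, ["instance1", "instance2", "instance3"])
for instance, copies in layout.items():
    print(instance, copies)
```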

Determining storage

• Data:index size ratio is typically close to 1:1
• Add a replica, double the storage
• Figure out the data node count based on storage
– Current limits: 10 TB EBS, 32 TB instance store
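
The storage rules above turn into a small calculation: source data times the index-to-data ratio, times one plus the replica count, divided by the storage per data node. A sketch with a hypothetical `data_nodes_needed` helper; it adds no headroom for growth or other overhead, so any such margin is up to you.

```python
import math


def data_nodes_needed(source_gb, per_node_storage_gb, replicas=1, index_ratio=1.0):
    """Minimum data nodes to hold the primary and replica copies (no headroom)."""
    index_gb = source_gb * index_ratio  # on-disk index size, ~1:1 with source data
    total_gb = index_gb * (1 + replicas)  # each replica adds another full copy
    return math.ceil(total_gb / per_node_storage_gb)


# 1 TB of source data, one replica, 512 GB of storage per data node
print(data_nodes_needed(1000, 512))
```

Here 1000 GB with one replica needs 2000 GB, and 2000 / 512 rounds up to 4 data nodes.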

Determining instance type

• Instance type is workload-dependent
• T2: dev, test, QA
• M3: solid performance
• R3: heavier queries, aggs
• I2: largest storage option

Best practices

• Use the minimum number of shards that keeps each shard at or below 50 GB
• Number of replicas = 1
• For all production workloads, use 3 dedicated master nodes
• Use the _bulk API; some ingest mechanisms do this automatically
• Increase index.refresh_interval for higher indexing throughput
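
The first rule of thumb, the fewest shards that keep each one at or under about 50 GB, is a ceiling division. A sketch with a hypothetical helper; the 50 GB default mirrors the soft limit used in this deck, not a hard Elasticsearch limit.

```python
import math


def primary_shard_count(expected_index_gb, max_shard_gb=50):
    """Fewest primary shards so that no shard exceeds max_shard_gb (at least 1)."""
    return max(1, math.ceil(expected_index_gb / max_shard_gb))


for size_gb in (10, 50, 120, 400):
    print(size_gb, "GB ->", primary_shard_count(size_gb), "primary shards")
```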

Indexing strategy

[Diagram: Integration with the AWS ecosystem. Ingest components and data sources around Amazon Elasticsearch Service: Logstash, REST, the CloudWatch Logs (CWL) agent, EC2 instances, Amazon Kinesis, Amazon RDS, Amazon DynamoDB, an Amazon SQS queue, a Logstash cluster, Amazon CloudWatch, AWS Lambda, AWS CloudTrail, access logs, Amazon VPC Flow Logs, an Amazon S3 bucket, AWS IoT, Amazon Kinesis Firehose, and Amazon ECS]

Indexing strategy for streaming data

• Use one index per time period: typically one index per day, or one per hour for high-volume data

• Shard each index according to its data size; use 50 GB as a soft limit per shard

• Dedicated master nodes increase cluster stability
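
Index-per-day (or per-hour) naming is usually derived from each event's timestamp. The sketch below builds names in a `<prefix>-YYYY.MM.DD` style and picks out the daily indices older than a retention window, since dropping a whole index is much cheaper than deleting individual documents. The `cwl` prefix, the date format, the retention period, and the helper names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def index_name(prefix, event_time, hourly=False):
    # One index per day, or per hour for high-volume streams.
    fmt = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return f"{prefix}-{event_time.strftime(fmt)}"


def expired_indices(prefix, today, existing, retention_days):
    """Daily indices older than the retention window (hypothetical helper)."""
    cutoff = index_name(prefix, today - timedelta(days=retention_days))
    # Zero-padded YYYY.MM.DD names with a shared prefix sort in date order.
    return sorted(name for name in existing if name < cutoff)


event_time = datetime(2016, 6, 1, 0, 16, 19, tzinfo=timezone.utc)
print(index_name("cwl", event_time))               # cwl-2016.06.01
print(index_name("cwl", event_time, hourly=True))  # cwl-2016.06.01.00

existing = [index_name("cwl", event_time - timedelta(days=d)) for d in range(10)]
print(expired_indices("cwl", event_time, existing, retention_days=7))
```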

Index settings control sharding and more

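curl -XPUT '<endpoint>/<index>' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1, "refresh_interval": "5s" }
}'

number_of_shards can only be set when the index is created; number_of_replicas and refresh_interval can also be changed later with PUT <endpoint>/<index>/_settings.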

Mappings control how data is indexed

curl -XPUT '<endpoint>/<index>' -d '{
  "mappings": {
    "<type>": {
      "properties": { "eventName": { "type": "string", "index": "not_analyzed" } }
    }
  }
}'
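
Why `not_analyzed` matters: an analyzed string field is broken into lowercase terms, so a terms aggregation over it reports word fragments rather than whole values. The Python below contrasts the two, using a crude regex tokenizer as an approximation of an analyzer (it is not Lucene's actual tokenization) and brand names like those in the faceting example.

```python
import re
from collections import Counter


def analyzed_terms(value):
    # Approximation of an analyzed field: lowercase and split into words.
    return [t for t in re.split(r"\W+", value.lower()) if t]


def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as a single exact term.
    return [value]


brands = ["Russell Athletic", "RALPH LAUREN", "Russell Athletic", "Wrangler"]

print(Counter(t for b in brands for t in analyzed_terms(b)))
print(Counter(t for b in brands for t in not_analyzed_terms(b)))
```

The first counter yields facets like "russell" and "athletic"; the second yields "Russell Athletic", which is what a brand facet should show.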

Index templates simplify mapping creation

curl -XPUT '<endpoint>/_template/<name>' -d '{
  "template": "<wildcard, e.g. cwl-*>",
  "settings": { "number_of_shards": 2 },
  "mappings": {
    "<type, e.g. _default_>": {
      "dynamic_templates": [
        { "<template name>": { "mapping": { "index": "not_analyzed" }, "match": "*" } }
      ],
      "properties": { "@timestamp": { "type": "date" } }
    }
  }
}'

Don't forget the query API!

Direct access to the Elasticsearch API

$ curl -XPUT 'https://<endpoint>/blog' -d '{
    "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }'
$ curl -XPOST 'https://<endpoint>/blog/post/1' -d '{
    "author":"jon handler",
    "title":"Amazon ES Launch" }'
$ curl -XPOST 'https://<endpoint>/blog/post/_bulk' --data-binary '{ "index" : { "_index" : "blog", "_type" : "post", "_id" : "2" } }
{ "title":"Amazon ES for search", "author": "carl meadows" }
{ "index" : { "_index":"blog", "_type":"post", "_id":"3" } }
{ "title":"Analytics too", "author": "vivek sriram" }
'
$ curl -XGET 'https://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":{"total":2,"max_score":0.13424811,"hits":[{"_index":"blog","_type":"post","_id":"1","_score":0.13424811,"_source":{"author":"jon handler", "title":"Amazon ES Launch" }},{"_index":"blog","_type":"post","_id":"2","_score":0.11506981,"_source":{"title":"Amazon ES for search", "author": "carl meadows"}}]}}
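
The `_bulk` body is newline-delimited JSON: an action line, then the document source, for each document, with a newline after the last line as well. Building it programmatically avoids hand-editing mistakes such as trailing commas or a missing final newline. A minimal sketch using only the standard library; the `build_bulk_body` name is hypothetical, and the `_type` field matches the Elasticsearch 2.x-era examples in this deck (mapping types were removed in later versions).

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return an NDJSON _bulk payload for {doc_id: source} pairs."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(source))  # document source line
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("blog", "post", {
    "2": {"title": "Amazon ES for search", "author": "carl meadows"},
    "3": {"title": "Analytics too", "author": "vivek sriram"},
})
print(body, end="")
```

If you write the payload to a file, send it with `curl -XPOST 'https://<endpoint>/_bulk' --data-binary @bulk.json`; `--data-binary` keeps the newlines, which plain `-d @file` strips.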

Elasticsearch is a full-featured search engine

• Built on Lucene, the popular open-source search library
• Search structured and unstructured data with complex Boolean queries
• Supports common search features: geo search, aggregations, highlighting, search suggestions, and more

Challenges with self-managed Elasticsearch

• Easy to get started, challenging to scale
• Scaling ingest pipelines is difficult
• Undifferentiated heavy lifting

Amazon Elasticsearch Service

Amazon ES overview

[Diagram: Amazon ES with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Easy cluster configuration and reconfiguration

Configure through the AWS API, SDK, CLI, or Console:
• Elasticsearch version
• Data nodes: count and type
• Master nodes: count and type
• Storage option: EBS or instance store
• HA option
• Advanced options
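
Each option above maps onto a parameter of the domain configuration API. The sketch below builds the keyword arguments for boto3's `create_elasticsearch_domain` call; the domain name, instance types, counts, volume size, Elasticsearch version, and advanced option are placeholder assumptions, and the function only assembles the request so it runs without AWS credentials.

```python
import json


def domain_config(name, data_type="m3.medium.elasticsearch", data_count=4,
                  master_type="m3.medium.elasticsearch", ebs_gb=100,
                  zone_awareness=True):
    """Keyword arguments for boto3's es.create_elasticsearch_domain (not sent)."""
    return {
        "DomainName": name,
        "ElasticsearchVersion": "2.3",  # Elasticsearch version (assumed)
        "ElasticsearchClusterConfig": {
            "InstanceType": data_type,  # data nodes: type
            "InstanceCount": data_count,  # data nodes: count
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": master_type,  # master nodes: type
            "DedicatedMasterCount": 3,  # master nodes: count
            "ZoneAwarenessEnabled": zone_awareness,  # HA option
        },
        # Storage option: EBS volumes rather than instance store
        "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": ebs_gb},
        "AdvancedOptions": {"rest.action.multi.allow_explicit_index": "true"},
    }


cfg = domain_config("logs-domain")
print(json.dumps(cfg, indent=2))

# With AWS credentials configured, you could send it with:
#   import boto3
#   boto3.client("es").create_elasticsearch_domain(**cfg)
```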

High availability with Zone Awareness

[Diagram: Amazon ES cluster of four instances split across Availability Zone 1 and Availability Zone 2; the primary and replica copies of shards 1, 2, and 3 are spread so that each shard has a copy in each zone]

Monitor with CloudWatch metrics

• FreeStorageSpace – monitor and alarm before the cluster runs out of space

• CPUUtilization – alarm at 80% CPU to signal the need to scale up

• ClusterStatus.yellow – check whether replication requires additional nodes

• JVMMemoryPressure – check instance type and count for sufficient resources

• MasterCPUUtilization – master node metrics are reported separately from data node metrics
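
As a sketch of the 80% CPU alarm suggested above, the helper below builds keyword arguments for boto3's CloudWatch `put_metric_alarm` call. Amazon ES publishes its metrics in the `AWS/ES` namespace with `DomainName` and `ClientId` (AWS account ID) dimensions; the domain name, account ID, alarm name, period, and evaluation count here are placeholder assumptions, and the helper only builds the request.

```python
def cpu_alarm_params(domain_name, account_id, threshold_percent=80.0):
    """Keyword arguments for CloudWatch put_metric_alarm (request only, not sent)."""
    return {
        "AlarmName": f"{domain_name}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Average",
        "Period": 300,  # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,  # consecutive breaching datapoints (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


params = cpu_alarm_params("logs-domain", "123456789012")
print(params["AlarmName"], params["Threshold"])

# With AWS credentials configured, you could send it with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

The same shape works for the other metrics in the list by changing `MetricName`, `Threshold`, and `ComparisonOperator` (for example `LessThanOrEqualToThreshold` for FreeStorageSpace).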

Security with IAM

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/susan" },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    }
  ]
}

Pay for compute and storage you use

• With Amazon Elasticsearch Service, you pay only for the compute and storage resources you use. The AWS Free Tier is available for qualifying customers.

Wrap up

• Combined with Kibana, Elasticsearch provides search and visualization for streaming data and full-text use cases.

• Elasticsearch is based on Lucene, which reads and writes search indices

• Aggregations let you analyze your data by splitting it into buckets and computing metrics

• Amazon Elasticsearch Service makes it easy to set up and manage your Elasticsearch cluster on AWS

• Amazon ES is a great way to get started with Elasticsearch!

Q&A

• Jon Handler: handler@amazon.com
• Vivek Sriram, Business Development Manager: vsriram@amazon.com
• https://run.qwiklab.com/searches/elasticsearch
