© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jon Handler, Principal Solutions Architect Pravin Pillai, Senior Product Manager October 2015 BDT209 Amazon Elasticsearch Service for Real-time Data Analysis and Visualization

(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Page 1: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Jon Handler, Principal Solutions Architect

Pravin Pillai, Senior Product Manager

October 2015

BDT209

Amazon Elasticsearch Service

for Real-time Data Analysis and

Visualization

Page 2: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon Elasticsearch Service

Page 3: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

What to Expect from the Session

• Context: Managing your growing data

• Introducing Amazon Elasticsearch Service (Amazon ES)

• Configuring, securing, connecting, monitoring, and

scaling your Amazon ES cluster

Page 4: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Your data is constantly growing

Product usage

Page 5: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Your data is constantly growing

System logs

Page 6: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Your data is constantly growing

Customer conversations

Page 7: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

That’s a lot of data!

Page 8: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

“Big data is not about the data” - Gary King, Harvard University, making the point that while data is plentiful and easy to collect, the real value is in the analytics.

Page 9: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

So what can you do with all this data?

• Share information

• Extract insight

• Recognize patterns

• Track performance

Ultimately, make better business,

technical, and operational decisions

Page 10: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Scenario 1: Full-text search

Knowledge Sharing Systems

• Your team is constantly generating

content

• You are tasked with making this

knowledge base searchable and

accessible

• You need key search features including text matching, faceting, filtering, fuzzy search, autocomplete, and highlighting

Page 11: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Scenario 2: Streaming data analytics

Intrusion detection

• You have to protect your system from

attacks

• You need easy-to-use yet powerful analytics and data visualization tools to detect issues in near real time

• Easy and flexible data ingestion is

important to capture information from a

variety of key data sources

Page 12: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Scenario 3: Batch data analytics

Usage Monitoring

• You are a mobile app developer

• You have to monitor/manage users

across multiple app versions

• You want to analyze and report on

usage and migration between app

versions

Page 13: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

What options do you have?

Page 14: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

How Elasticsearch can help

A powerful, real-time, distributed, open-source search and analytics engine:

• Built on top of Apache Lucene

• Schema free

• Developer friendly RESTful API

Page 15: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

How Elasticsearch can help

Combined with Logstash and Kibana, the ELK stack

provides a tool for real-time analytics and data visualization

Page 16: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Operating Elasticsearch is time-consuming

“Elasticsearch allows us to easily and quickly build bleeding edge big data and analytics applications using the ELK stack. By offering direct access to the Elasticsearch API while offloading administrative tasks, Amazon Elasticsearch Service gives us the manageability, flexibility and control we need.”

Sean Curtis, SVP Engineering at Major League Baseball Advanced Engineering

Page 17: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Introducing Amazon Elasticsearch Service

Amazon Elasticsearch Service is

a managed service from AWS that

makes it easy to set up, operate,

and scale Elasticsearch clusters

in the cloud.

Page 18: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Key benefits

• Easy cluster creation and configuration management

• Support for ELK

• Security with AWS IAM

• Monitoring with Amazon CloudWatch

• Auditing with AWS CloudTrail

• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)

Page 19: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Create the cluster

Page 20: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
Page 21: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

AWS CLI commands

add-tags

create-elasticsearch-domain

delete-elasticsearch-domain

describe-elasticsearch-domain

describe-elasticsearch-domain-config

describe-elasticsearch-domains

list-domain-names

list-tags

remove-tags

update-elasticsearch-domain-config

aws es create-elasticsearch-domain --domain-name my-domain \
    --elasticsearch-cluster-config InstanceType=m3.xlarge.elasticsearch,InstanceCount=3 \
    --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=512
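The same domain settings can be expressed as keyword arguments for an SDK call. The sketch below only builds the argument dictionary; the helper name `domain_request` is made up for illustration, and the commented-out boto3 call assumes boto3 is installed and credentials are configured.

```python
def domain_request(domain_name, instance_type, instance_count, volume_gb):
    """Build keyword arguments that mirror the CLI options on the slide."""
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": volume_gb,  # size in GB
        },
    }


params = domain_request("my-domain", "m3.xlarge.elasticsearch", 3, 512)

# With boto3 available, the request would be sent roughly like this:
#   import boto3
#   boto3.client("es").create_elasticsearch_domain(**params)
print(params)
```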

Page 22: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon ES domain overview

[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Page 23: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon ES domain overview

Nodes under management

[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Page 24: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon ES domain overview

Single endpoint, REST API

[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Page 25: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon ES domain overview

IAM integration

[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Page 26: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon ES domain overview

CloudWatch/CloudTrail for monitoring

[Diagram: an Amazon ES domain with Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API]

Page 27: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Scale for your

workload

Page 28: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Data partitioning for search

[Diagram: an index made up of documents, each with an ID, split across Shard 1 and Shard 2]

• Document: The unit of search

• ID: Unique identifier, one per

document

• Field: Documents comprise a

collection of fields

• Shard: An instance of Lucene with

a portion of an index

• Index: A collection of data
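The relationship between document IDs and shards can be sketched as a hash of the ID taken modulo the number of shards. This is a minimal illustration, not Elasticsearch's actual routing code: `zlib.crc32` stands in for the engine's own hash function, and `shard_for` and `partition` are hypothetical helper names.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash chosen for this sketch only.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def partition(doc_ids, number_of_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = {n: [] for n in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, number_of_shards)].append(doc_id)
    return shards


index = partition([str(i) for i in range(1, 11)], number_of_shards=2)
for shard, ids in index.items():
    print(f"shard {shard}: {ids}")
```

Because the shard is a pure function of the ID, the same document always lands on the same shard, which is why every document needs exactly one unique ID.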

Page 29: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Deployment of indices to a cluster

• Index 1

• Shard 1

• Shard 2

• Shard 3

• Index 2

• Shard 1

• Shard 2

• Shard 3

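[Diagram: an Amazon ES cluster of three instances holding the primary and replica copies of shards 1 to 3 for both indices. Instance 1 holds shard copies 1, 3, 3, 1; Instance 2 holds 2, 1, 1, 2; Instance 3 holds 3, 2, 2, 3.]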
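One way to picture how shard copies spread over a cluster is a round-robin placement in which each replica goes to the instance after the one holding its primary. This is an illustrative sketch under that assumption, not the allocation algorithm Elasticsearch actually runs; `allocate` is a hypothetical helper.

```python
def allocate(indices, shards_per_index, instances):
    """Place each primary and its single replica on different instances."""
    layout = {name: [] for name in instances}
    n = len(instances)
    for index in indices:
        for shard in range(1, shards_per_index + 1):
            primary = instances[(shard - 1) % n]
            replica = instances[shard % n]  # the next instance, so never the primary's
            layout[primary].append((index, shard, "primary"))
            layout[replica].append((index, shard, "replica"))
    return layout


layout = allocate(
    ["Index 1", "Index 2"], 3, ["Instance 1", "Instance 2", "Instance 3"]
)
for instance, copies in layout.items():
    print(instance, copies)
```

Keeping a replica away from its primary means the loss of one instance never removes both copies of a shard.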

Page 30: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Instance type recommendations

Instance   Workload
T2         Entry point. Dev and test. OK for dedicated masters.
M3         Equal read and write volumes. Up to 5 TB of storage with EBS.
R3         Read-heavy or workloads with high query demands (e.g., aggregations).
I2         Up to 16 TB of SSD instance storage.

Page 31: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access

to your domain

Page 32: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access to your domain

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/susan"
      },
      "Action": [ "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
                  "es:CreateElasticsearchDomain",
                  "es:ListDomainNames" ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    }
  ]
}

Page 33: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access to your domain

Control access by user

with signed requests
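Signed requests use AWS Signature Version 4. The sketch below shows only the signing-key derivation step, an HMAC-SHA256 chain over the date, region, and service name; building the canonical request and the string to sign is omitted. The secret key, date, and string to sign are placeholder values.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str,
                       service: str = "es") -> bytes:
    """Derive the SigV4 signing key for one day, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign: str, signing_key: bytes) -> str:
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Placeholder inputs for illustration only.
key = derive_signing_key("EXAMPLE_SECRET_KEY", "20151008", "us-east-1")
print(sign("example string to sign", key))
```

In practice an AWS SDK performs all of these steps; the point here is that the signature ties each request to an IAM identity, which is what lets the policy name a specific user.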

Page 34: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access to your domain

Allow/Deny HTTP

methods and Config

operations per policy

Page 35: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access to your domain

Fine-grained control to the

index level

Page 36: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Secure access to your domain

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [ "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
                  "es:CreateElasticsearchDomain",
                  "es:ListDomainNames" ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [ "xx.xx.xx.xx/yy" ]
        }
      }
    }
  ]
}

And/or use IP-based

access control
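A policy like the ones on these slides can also be generated programmatically. The following is a minimal sketch, assuming a hypothetical helper `access_policy`; the account ID `123456789012` and the CIDR range `192.0.2.0/24` are placeholders, not values from the talk.

```python
import json


def access_policy(resource_arn, actions, principal="*", source_ips=None):
    """Build an access policy document, optionally restricted by source IP."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal},
        "Action": list(actions),
        "Resource": resource_arn,
    }
    if source_ips:
        # The condition block is only added when an IP restriction is requested.
        statement["Condition"] = {"IpAddress": {"aws:SourceIp": list(source_ips)}}
    return {"Version": "2012-10-17", "Statement": [statement]}


policy = access_policy(
    "arn:aws:es:us-east-1:123456789012:domain/logs-domain/*",
    ["es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost"],
    source_ips=["192.0.2.0/24"],
)
print(json.dumps(policy, indent=2))
```

Generating the document with `json.dumps` avoids hand-editing mistakes such as unbalanced braces around the `Condition` element.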

Page 37: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Load data

Page 38: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Direct access to the Elasticsearch API

$ curl -XPUT 'https://<endpoint>/blog' -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 1}}'

$ curl -XPOST 'https://<endpoint>/blog/post/1' -d '{"author": "jon handler", "title": "Amazon ES Launch"}'

$ curl -XPOST 'https://<endpoint>/blog/post/_bulk' --data-binary '{"index": {"_index": "blog", "_type": "post", "_id": "2"}}
{"title": "Amazon ES for search", "author": "pravin pillai"}
{"index": {"_index": "blog", "_type": "post", "_id": "3"}}
{"title": "Analytics too", "author": "vivek sriram"}
'

$ curl -XGET 'https://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":{"total":2,"max_score":0.13424811,"hits":[{"_index":"blog","_type":"post","_id":"1","_score":0.13424811,"_source":{"author":"jon handler", "title":"Amazon ES Launch"}},{"_index":"blog","_type":"post","_id":"2","_score":0.11506981,"_source":{"title":"Amazon ES for search", "author": "pravin pillai"}}]}}

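The bulk endpoint expects newline-delimited JSON: one action line followed by one document line per document, with a final newline and no commas between lines. A small sketch of building such a body follows; `build_bulk_body` is a hypothetical helper, and the index, type, and documents are the ones used in the slide's example.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a newline-delimited bulk body for (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("blog", "post", [
    ("2", {"title": "Amazon ES for search", "author": "pravin pillai"}),
    ("3", {"title": "Analytics too", "author": "vivek sriram"}),
])
print(body, end="")
```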
Page 39: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Loading data using Logstash

[Diagram: Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service]

Page 40: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Logstash plugin for Amazon ES

https://github.com/awslabs/logstash-output-amazon_es

output {
    amazon_es {
        hosts => ["foo.us-east-1.es.amazonaws.com"]    # required
        region => "us-east-1"                          # required
        access_key => 'ACCESS_KEY'                     # optional
        secret_key => 'SECRET_KEY'                     # optional
        codec => "plain"
        workers => 1
        index => "logstash-%{+YYYY.MM.dd}"
    }
}
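The `index` setting above produces one index per day, named from the event timestamp. The sketch below reproduces that naming in Python to show which index a given event would land in; `daily_index` is a hypothetical helper, and it assumes the date is taken in UTC.

```python
from datetime import datetime, timedelta, timezone


def daily_index(timestamp: datetime, prefix: str = "logstash-") -> str:
    """Return the daily index name for a timezone-aware event timestamp."""
    # Convert to UTC first so events near midnight are bucketed consistently.
    return prefix + timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")


utc_event = datetime(2015, 10, 8, 23, 30, tzinfo=timezone.utc)
local_event = datetime(2015, 10, 8, 23, 30, tzinfo=timezone(timedelta(hours=-5)))

print(daily_index(utc_event))    # logstash-2015.10.08
print(daily_index(local_event))  # logstash-2015.10.09
```

Daily indices make it straightforward to expire old log data by deleting whole indices.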

Page 41: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Loading data using Lambda

[Diagram: Amazon S3, DynamoDB, and Amazon Kinesis -> AWS Lambda -> Amazon Elasticsearch Service]

Page 42: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Lambda code snippet (node.js) for upload

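var AWS = require('aws-sdk');

// Credentials come from the Lambda execution environment
var creds = new AWS.EnvironmentCredentials('AWS');

function postDocumentToES(doc, context) {
    // endpoint: the Amazon ES domain endpoint (its definition is not shown on the slide)
    var req = new AWS.HttpRequest(endpoint);

    // Sign the request with Signature Version 4 for the 'es' service
    var signer = new AWS.Signers.V4(req, 'es');
    signer.addAuthorization(creds, new Date());

    var send = new AWS.NodeHttpClient();
    send.handleRequest(req, null, function(httpResp)...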

https://github.com/awslabs/amazon-elasticsearch-lambda-samples

Page 43: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Export logs to

Amazon ES

[Diagram: CloudWatch -> Amazon Elasticsearch Service]

Page 44: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Export CloudWatch Logs

Demo

Page 45: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Monitor and audit

CloudWatch

CloudTrail

Page 46: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Monitoring

Page 47: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

What should I monitor?

• FreeStorageSpace – monitor and alarm before the

cluster runs out of space

• CPUUtilization – alarm at 80% CPU to signal the need to

scale up

• ClusterStatus.yellow – check whether replication

requires additional nodes

• JVMMemoryPressure – check instance type and count

for sufficient resources

• MasterCPUUtilization – monitoring for master nodes is

separated from data nodes
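As an illustration of the CPU guidance above, the sketch below builds the parameters for a CloudWatch alarm at the 80% threshold. Only the threshold comes from the slide; the alarm name, statistic, period, and evaluation periods are assumptions chosen for the example, and `cpu_alarm_params` is a hypothetical helper.

```python
def cpu_alarm_params(domain_name, account_id, threshold_percent=80.0):
    """Build put_metric_alarm arguments for an Amazon ES CPU alarm."""
    return {
        "AlarmName": f"{domain_name}-cpu-high",  # assumed naming scheme
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Average",        # assumption
        "Period": 300,                 # seconds, assumption
        "EvaluationPeriods": 3,        # assumption
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


alarm = cpu_alarm_params("my-domain", "123456789012")

# With boto3 available, the alarm would be created roughly like this:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm)
```

The same shape works for the other metrics in the list by changing `MetricName`, the threshold, and the comparison operator (for example, a lower bound for FreeStorageSpace).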

Page 48: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Snapshot and

restore for data

durability

Page 49: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Daily automated snapshots

• No additional charges

• Snapshots retained for 14 days

Page 50: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Taking manual snapshots

Amazon S3

role

Snapshot

repository

Trust relationship:

{

"Version": "2012-10-17",

"Statement": [

{

"Sid": "",

"Effect": "Allow",

"Principal": {

"Service": "es.amazonaws.com"

},

"Action": "sts:AssumeRole"

}

]

}

Page 51: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Taking manual snapshots

Amazon S3

Snapshot

repository

{ "Version":"2012-10-17",

"Statement":[

{

"Action":[ "s3:ListBucket" ],

"Effect":"Allow",

"Resource":

[ "arn:aws:s3:::bucket" ] },

{ "Action":[

"s3:GetObject",

"s3:PutObject",

"s3:DeleteObject",

"iam:PassRole" ],

"Effect":"Allow",

"Resource":[ "arn:aws:s3:::bucket/*"

] } ] }

role

Page 52: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Taking manual snapshots

Register the bucket:

$ curl -XPUT 'https://<endpoint>/_snapshot/<repo-name>' -d '{"type": "s3", "settings": {"bucket": "<bucket>", "region": "<region>", "role_arn": "<arn>"}}'

Take a snapshot:

$ curl -XPUT 'https://<endpoint>/_snapshot/<repo-name>/snapshot1'

Snapshot time is proportional to size.
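The two snapshot calls differ only in URL and body, which the following sketch makes explicit. The helper names are hypothetical, and the endpoint, bucket, and role ARN are placeholders; the `role_arn` setting name follows the AWS documentation for manual snapshots. The sketch builds the requests without sending them.

```python
import json


def repository_request(endpoint, repo_name, bucket, region, role_arn):
    """Return the URL and JSON body for registering an S3 snapshot repository."""
    url = f"https://{endpoint}/_snapshot/{repo_name}"
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    return url, json.dumps(body)


def snapshot_url(endpoint, repo_name, snapshot_name):
    """Return the URL to PUT in order to take a snapshot."""
    return f"https://{endpoint}/_snapshot/{repo_name}/{snapshot_name}"


# Placeholder values for illustration only.
url, body = repository_request(
    "search-example.us-east-1.es.amazonaws.com",
    "my-repo",
    "my-snapshot-bucket",
    "us-east-1",
    "arn:aws:iam::123456789012:role/ExampleSnapshotRole",
)
print(url)
print(body)
print(snapshot_url("search-example.us-east-1.es.amazonaws.com", "my-repo", "snapshot1"))
```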

Page 53: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Built-in Kibana

Page 54: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Application overview

[Diagram: Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service]

Page 55: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Kibana UI

Page 56: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Securing Kibana

IAM

Proxy (optional)

Page 57: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

IAM policy for Kibana

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [ "es:ESHttpGet",
                  "es:ESHttpPut",
                  "es:ESHttpPost",
                  "es:ESHttpHead" ],
      "Resource": [ "arn:aws:es:us-east-1:####:domain/<domain>/*" ],
      "Condition": { "IpAddress": { "aws:SourceIp": [ "xx.xx.xx.xx" ] } }
    }
  ]
}

Page 58: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Pay for what you

use

Page 59: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Pay for compute and storage you use

With Amazon Elasticsearch Service, you pay only for the

compute and storage resources you use. AWS Free Tier for

qualifying customers.

Page 60: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Amazon Elasticsearch Service is publicly available now!

You can use Amazon Elasticsearch Service in these regions:

• us-east-1

• us-west-1

• us-west-2

• eu-west-1

• eu-central-1

• ap-southeast-1

• ap-southeast-2

• ap-northeast-1

• sa-east-1

Page 61: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Wrap up

1. Elasticsearch is a tool for full-text search, analysis, and

visualization of time series data that helps you get the

most out of your growing data set

2. Amazon Elasticsearch Service makes it easy to deploy

and manage an Elasticsearch cluster in the AWS cloud

3. Amazon Elasticsearch Service is a drop-in replacement

for your existing Elasticsearch cluster

Page 62: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Thank you!

Page 63: (BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

Remember to complete

your evaluations!