(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics


© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Jon Handler, Principal Solutions Architect

Pravin Pillai, Senior Product Manager

October 2015

BDT209

Amazon Elasticsearch Service for Real-time Data Analysis and Visualization

What to Expect from the Session

• Context: Managing your growing data

• Introducing Amazon Elasticsearch Service (Amazon ES)

• Configuring, securing, connecting, monitoring, and scaling your Amazon ES cluster

Your data is constantly growing

• Product usage

• System logs

• Customer conversations

That’s a lot of data!

“Big data is not about the data” - Gary King, Harvard University, making the point that while data is plentiful and easy to collect, the real value is in the analytics.

So what can you do with all this data?

• Share information

• Extract insight

• Recognize patterns

• Track performance

Ultimately, make better business, technical, and operational decisions

Scenario 1: Full-text search

Knowledge Sharing Systems

• Your team is constantly generating content

• You are tasked with making this knowledge base searchable and accessible

• You need key search features including text matching, faceting, filtering, fuzzy search, autocomplete, and highlighting

Scenario 2: Streaming data analytics

Intrusion detection

• You have to protect your system from attacks

• You need easy-to-use yet powerful analytics and data visualization tools to detect issues in near real-time

• Easy and flexible data ingestion is important to capture information from a variety of key data sources

Scenario 3: Batch data analytics

Usage Monitoring

• You are a mobile app developer

• You have to monitor/manage users across multiple app versions

• You want to analyze and report on usage and migration between app versions

What options do you have?

How Elasticsearch can help

A powerful, real-time, distributed, open-source search and analytics engine:

• Built on top of Apache Lucene

• Schema free

• Developer friendly RESTful API

How Elasticsearch can help

Combined with Logstash and Kibana, the ELK stack provides a tool for real-time analytics and data visualization

Operating Elasticsearch is time-consuming

“Elasticsearch allows us to easily and quickly build bleeding edge big data and analytics applications using the ELK stack. By offering direct access to the Elasticsearch API while offloading administrative tasks, Amazon Elasticsearch Service gives us the manageability, flexibility and control we need.”

Sean Curtis, SVP Engineering at Major League Baseball Advanced Engineering

Introducing Amazon Elasticsearch Service

Amazon Elasticsearch Service is a managed service from AWS that makes it easy to set up, operate, and scale Elasticsearch clusters in the cloud.

Key benefits

• Easy cluster creation and configuration management

• Support for ELK

• Security with AWS IAM

• Monitoring with Amazon CloudWatch

• Auditing with AWS CloudTrail

• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)

Create the cluster

AWS CLI commands

add-tags

create-elasticsearch-domain

delete-elasticsearch-domain

describe-elasticsearch-domain

describe-elasticsearch-domain-config

describe-elasticsearch-domains

list-domain-names

list-tags

remove-tags

update-elasticsearch-domain-config

aws es create-elasticsearch-domain --domain-name my-domain \
    --elasticsearch-cluster-config InstanceType=m3.xlarge.elasticsearch,InstanceCount=3 \
    --ebs-options EBSEnabled=true,VolumeType=gp2,VolumeSize=512
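The values passed to --elasticsearch-cluster-config and --ebs-options use the AWS CLI shorthand syntax of comma-separated Key=Value pairs. As a minimal sketch, the same command line could be assembled from plain objects; the helper name toShorthand is hypothetical and not part of the AWS CLI, and the option values are the ones shown on the slide.

```javascript
// Hypothetical helper: render an object as AWS CLI shorthand (Key=Value,Key=Value).
function toShorthand(options) {
  return Object.entries(options)
    .map(([key, value]) => `${key}=${value}`)
    .join(',');
}

// Values taken from the slide; "my-domain" is a placeholder domain name.
const clusterConfig = { InstanceType: 'm3.xlarge.elasticsearch', InstanceCount: 3 };
const ebsOptions = { EBSEnabled: true, VolumeType: 'gp2', VolumeSize: 512 };

// Assemble the full command as a single string.
const command = [
  'aws es create-elasticsearch-domain',
  '--domain-name my-domain',
  `--elasticsearch-cluster-config ${toShorthand(clusterConfig)}`,
  `--ebs-options ${toShorthand(ebsOptions)}`,
].join(' ');

console.log(command);
```

Keeping the options as objects makes it easier to review instance type, count, and volume size before a domain is created.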

Amazon ES domain overview

[Diagram: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, Elasticsearch API]

• Nodes under management

• Single endpoint, REST API

• IAM integration

• CloudWatch/CloudTrail for monitoring

Scale for your workload

Data partitioning for search

[Diagram: an index made of documents, each with an ID, partitioned into Shard 1 and Shard 2]

• Document: The unit of search

• ID: Unique identifier, one per document

• Field: Documents comprise a collection of fields

• Shard: An instance of Lucene with a portion of an index

• Index: A collection of data
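To make the relationship between IDs, shards, and an index concrete, here is a rough sketch of how a document ID can be mapped to a shard by hashing it and taking the remainder modulo the number of primary shards. The hash below is a simple FNV-1a-style stand-in chosen for illustration; Elasticsearch uses its own internal hash function, and the names hashId and shardFor are hypothetical.

```javascript
// Illustrative 32-bit string hash in the style of FNV-1a. Elasticsearch uses
// its own hash internally; this stand-in only demonstrates the idea.
function hashId(id) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned
  }
  return hash;
}

// Map a document ID to one of numberOfShards primary shards (0-based).
function shardFor(id, numberOfShards) {
  return hashId(String(id)) % numberOfShards;
}

// Distribute a few hypothetical document IDs across an index with 2 shards.
const shards = { 0: [], 1: [] };
for (const id of ['doc-1', 'doc-2', 'doc-3', 'doc-4', 'doc-5']) {
  shards[shardFor(id, 2)].push(id);
}
console.log(shards);
```

Because the shard is derived from the ID alone, the same document always lands on the same shard, which is also why the number of primary shards is fixed when an index is created.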

Deployment of indices to a cluster

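• Index 1: Shard 1, Shard 2, Shard 3

• Index 2: Shard 1, Shard 2, Shard 3

[Diagram: Amazon ES cluster of three instances. Every shard has a primary and a replica copy. For each index, Instance 1 holds copies of shards 1 and 3, Instance 2 holds copies of shards 2 and 1, and Instance 3 holds copies of shards 3 and 2.]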
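The placement in the diagram can be approximated with a simple round-robin rule: put each primary on one instance and its replica on the next one, so that no instance holds both copies of a shard. This is only a sketch of the idea, not the allocator Elasticsearch actually runs; the function name allocate and the index names are hypothetical.

```javascript
// Place each primary on one instance and its replica on the next instance,
// so no instance ever holds both copies of the same shard.
function allocate(indexName, shardCount, instanceCount) {
  if (instanceCount < 2) {
    throw new Error('Need at least two instances to separate primaries from replicas');
  }
  const placement = [];
  for (let shard = 0; shard < shardCount; shard++) {
    const primaryInstance = shard % instanceCount;
    const replicaInstance = (shard + 1) % instanceCount;
    placement.push({ index: indexName, shard: shard + 1, role: 'primary', instance: primaryInstance + 1 });
    placement.push({ index: indexName, shard: shard + 1, role: 'replica', instance: replicaInstance + 1 });
  }
  return placement;
}

// Two indices with three shards each, deployed to three instances.
const cluster = [...allocate('index-1', 3, 3), ...allocate('index-2', 3, 3)];
for (let instance = 1; instance <= 3; instance++) {
  const held = cluster
    .filter((copy) => copy.instance === instance)
    .map((copy) => `${copy.index}/shard-${copy.shard}(${copy.role})`);
  console.log(`Instance ${instance}: ${held.join(', ')}`);
}
```

With this rule every instance ends up with four shard copies, and losing any single instance still leaves one copy of every shard available.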

Instance type recommendations

Instance   Workload
T2         Entry point. Dev and test. OK for dedicated masters.
M3         Equal read and write volumes. Up to 5 TB of storage with EBS.
R3         Read-heavy or workloads with high query demands (e.g., aggregations).
I2         Up to 16 TB of SSD instance storage.

Secure access to your domain

Secure access to your domain

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/susan"
      },
      "Action": [
        "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    }
  ]
}

• Control access by user with signed requests

• Allow/Deny HTTP methods and Config operations per policy

• Fine-grained control to the index level
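A policy of this shape can also be generated rather than written by hand. The sketch below builds one for a single IAM user and a single index; the function name buildAccessPolicy and all argument values are placeholders, and it assumes the user and the domain live in the same account. Note that IAM ARNs have an empty region field, hence the double colon after "iam".

```javascript
// Build a resource-based access policy for one user and one index.
// All argument values are caller-supplied placeholders.
function buildAccessPolicy({ region, accountId, userName, domainName, indexName, actions }) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: '',
        Effect: 'Allow',
        // IAM is a global service, so the region field of the ARN is empty.
        Principal: { AWS: `arn:aws:iam::${accountId}:user/${userName}` },
        Action: actions,
        // Restrict the statement to a single index within the domain.
        Resource: `arn:aws:es:${region}:${accountId}:domain/${domainName}/${indexName}/*`,
      },
    ],
  };
}

const policy = buildAccessPolicy({
  region: 'us-east-1',
  accountId: '123456789012',
  userName: 'susan',
  domainName: 'logs-domain',
  indexName: 'app-logs', // hypothetical index name
  actions: ['es:ESHttpGet', 'es:ESHttpPut', 'es:ESHttpPost'],
});

console.log(JSON.stringify(policy, null, 2));
```

Generating the document from parameters keeps the per-index resource ARN consistent when several users or indices need their own statements.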

Secure access to your domain

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [ "xx.xx.xx.xx/yy" ]
        }
      }
    }
  ]
}

• And/or use IP-based access control
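The aws:SourceIp condition compares the caller's address against an address or CIDR range. As an illustration of what such a match means (this is not AWS's implementation, and the helper names are hypothetical), an IPv4 address can be tested against a CIDR block like this:

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(address) {
  const parts = address.split('.');
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${address}`);
  }
  return parts.reduce((acc, part) => acc * 256 + Number(part), 0);
}

// Return true when address falls inside the CIDR block (a bare IP means /32).
function isInCidr(address, cidr) {
  const [network, prefixText = '32'] = cidr.split('/');
  const prefix = Number(prefixText);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid prefix length: ${prefixText}`);
  }
  // Build the netmask; a /0 prefix matches every address.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipv4ToInt(address) & mask) >>> 0) === ((ipv4ToInt(network) & mask) >>> 0);
}

// Addresses from the documentation range 192.0.2.0/24.
console.log(isInCidr('192.0.2.10', '192.0.2.0/24')); // inside the range
console.log(isInCidr('192.0.3.10', '192.0.2.0/24')); // outside the range
```

A narrow range (for example a single office NAT address) keeps anonymous access limited to known networks.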

Load data

Direct access to the Elasticsearch API

$ curl -XPUT https://<endpoint>/blog -d '{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }'

$ curl -XPOST http://<endpoint>/blog/post/1 -d '{ "author": "jon handler", "title": "Amazon ES Launch" }'

$ curl -XPOST https://<endpoint>/blog/post/_bulk -d '{ "index": { "_index": "blog", "_type": "post", "_id": "2" } }
{ "title": "Amazon ES for search", "author": "pravin pillai" }
{ "index": { "_index": "blog", "_type": "post", "_id": "3" } }
{ "title": "Analytics too", "author": "vivek sriram" }
'

$ curl -XGET 'http://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":{"total":2,"max_score":0.13424811,"hits":[{"_index":"blog","_type":"post","_id":"1","_score":0.13424811,"_source":{"author":"jon handler","title":"Amazon ES Launch"}},{"_index":"blog","_type":"post","_id":"2","_score":0.11506981,"_source":{"title":"Amazon ES for search","author":"pravin pillai"}}]}}

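The _bulk endpoint expects newline-delimited JSON: an action line followed by a source line for each document, with no commas between lines and a newline after the last one. A small sketch of a builder for that body follows; buildBulkBody is a hypothetical helper name, and the index, type, and documents are the ones used in the curl examples.

```javascript
// Build an NDJSON body for the _bulk API: one action line, then one source
// line per document, each terminated by a newline (including the last one).
function buildBulkBody(indexName, typeName, docs) {
  const lines = [];
  for (const { id, source } of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName, _type: typeName, _id: id } }));
    lines.push(JSON.stringify(source));
  }
  return lines.join('\n') + '\n';
}

const body = buildBulkBody('blog', 'post', [
  { id: '2', source: { title: 'Amazon ES for search', author: 'pravin pillai' } },
  { id: '3', source: { title: 'Analytics too', author: 'vivek sriram' } },
]);

process.stdout.write(body);
```

Serializing each line with JSON.stringify avoids the stray separators that are easy to introduce when a bulk body is typed by hand.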
Loading data using Logstash

[Diagram: Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service]

Logstash plugin for Amazon ES

https://github.com/awslabs/logstash-output-amazon_es

output {
  amazon_es {
    hosts => ["foo.us-east-1.es.amazonaws.com"]   # required
    region => "us-east-1"                         # required
    access_key => 'ACCESS_KEY'                    # optional
    secret_key => 'SECRET_KEY'                    # optional
    codec => "plain"
    workers => 1
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
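The index setting contains a date pattern, so events are written to one index per day. As a rough illustration of how such a pattern resolves (handling only the YYYY, MM, and dd tokens, and assuming the event timestamp is interpreted in UTC), consider this sketch; resolveIndexName is a hypothetical helper, not part of Logstash.

```javascript
// Resolve an index pattern such as "logstash-%{+YYYY.MM.dd}" for an event time.
// Only the YYYY, MM and dd tokens are handled here; Logstash supports more.
function resolveIndexName(pattern, timestamp) {
  const pad = (n) => String(n).padStart(2, '0');
  return pattern.replace(/%\{\+([^}]+)\}/g, (_, format) =>
    format
      .replace('YYYY', String(timestamp.getUTCFullYear()))
      .replace('MM', pad(timestamp.getUTCMonth() + 1))
      .replace('dd', pad(timestamp.getUTCDate()))
  );
}

// An event from 8 October 2015 (UTC) goes to that day's index.
const eventTime = new Date(Date.UTC(2015, 9, 8, 14, 30));
console.log(resolveIndexName('logstash-%{+YYYY.MM.dd}', eventTime)); // logstash-2015.10.08
```

Daily indices make it straightforward to expire old log data by deleting whole indices.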

Loading data using Lambda

[Diagram: Amazon S3, DynamoDB, Amazon Kinesis -> Amazon Lambda -> Amazon Elasticsearch Service]

Lambda code snippet (node.js) for upload

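var AWS = require('aws-sdk');

// Credentials are read from the Lambda execution environment
var creds = new AWS.EnvironmentCredentials('AWS');

function postDocumentToES(doc, context) {
    // endpoint is defined elsewhere in the full sample
    var req = new AWS.HttpRequest(endpoint);

    // Sign the request with Signature Version 4 for the 'es' service
    var signer = new AWS.Signers.V4(req, 'es');
    signer.addAuthorization(creds, new Date());

    // Send the signed request (the rest of the call is elided on the slide)
    var send = new AWS.NodeHttpClient();
    send.handleRequest(req, null, function(httpResp)...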

https://github.com/awslabs/amazon-elasticsearch-lambda-samples
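The snippet relies on the SDK to sign the request with Signature Version 4. To give a feel for one part of that process, the sketch below derives a SigV4 signing key with Node's built-in crypto module. It is an illustration of the published key-derivation steps, not the SDK's own code, and the secret key shown is a placeholder.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 helper that returns the raw digest as a Buffer.
function hmac(key, data) {
  return crypto.createHmac('sha256', key).update(data, 'utf8').digest();
}

// Signature Version 4 key derivation: each step keys the next HMAC.
function deriveSigningKey(secretKey, dateStamp, region, service) {
  const kDate = hmac('AWS4' + secretKey, dateStamp); // dateStamp is YYYYMMDD
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, 'aws4_request');
}

// Placeholder secret for illustration only; never hard-code real credentials.
const signingKey = deriveSigningKey('EXAMPLE_SECRET_KEY', '20151008', 'us-east-1', 'es');
console.log(signingKey.toString('hex'));
```

Because the key is scoped to a date, region, and service, a key derived for 'es' cannot be used to sign requests to another service.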

Export logs to Amazon ES

[Diagram: CloudWatch -> Amazon Elasticsearch Service]

Export CloudWatch Logs

Demo

Monitor and audit

[Diagram: CloudWatch, CloudTrail]

Monitoring

What should I monitor?

• FreeStorageSpace – monitor and alarm before the cluster runs out of space

• CPUUtilization – alarm at 80% CPU to signal the need to scale up

• ClusterStatus.yellow – check whether replication requires additional nodes

• JVMMemoryPressure – check instance type and count for sufficient resources

• MasterCPUUtilization – monitoring for master nodes is separated from data nodes
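These checks can be expressed as a small set of rules applied to the latest metric values. In the sketch below only the 80% CPU threshold comes from the talk; the other thresholds, the assumed unit for FreeStorageSpace, and the names rules and evaluate are assumptions made for illustration. In practice these would be CloudWatch alarms rather than application code.

```javascript
// Alarm rules loosely following the list above. Only the 80% CPU threshold
// comes from the talk; the other numbers are assumptions for illustration.
const rules = [
  { metric: 'CPUUtilization', breaches: (v) => v >= 80, advice: 'scale up' },
  { metric: 'FreeStorageSpace', breaches: (v) => v < 20480, advice: 'add storage' }, // assumed threshold, assumed MB
  { metric: 'ClusterStatus.yellow', breaches: (v) => v >= 1, advice: 'check replica allocation' },
  { metric: 'JVMMemoryPressure', breaches: (v) => v >= 80, advice: 'review instance type and count' }, // assumed threshold
  { metric: 'MasterCPUUtilization', breaches: (v) => v >= 80, advice: 'review dedicated master size' }, // assumed threshold
];

// Return a message for every rule whose metric is present and in breach.
function evaluate(datapoints) {
  return rules
    .filter((rule) => rule.metric in datapoints && rule.breaches(datapoints[rule.metric]))
    .map((rule) => `${rule.metric}: ${rule.advice}`);
}

// Hypothetical latest values for one domain.
const findings = evaluate({ CPUUtilization: 91, FreeStorageSpace: 51200, 'ClusterStatus.yellow': 1 });
console.log(findings);
```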

Snapshot and restore for data durability

Daily automated snapshots

• No additional charges

• Snapshots retained for 14 days

Taking manual snapshots

[Diagram: Amazon S3 snapshot repository, accessed through an IAM role]

Trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Taking manual snapshots

[Diagram: Amazon S3 snapshot repository, accessed through an IAM role]

Role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [ "s3:ListBucket" ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket" ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "iam:PassRole"
      ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket/*" ]
    }
  ]
}

Taking manual snapshots

Register the bucket

curl -XPUT http://<endpoint>/_snapshot/<repo-name> -d '{
  "type": "s3",
  "settings": {
    "bucket": "<bucket>",
    "region": "<region>",
    "role_arn": "<arn>"
  }
}'

Take a snapshot

curl -XPUT http://<endpoint>/_snapshot/<repo-name>/snapshot1

Snapshot time is proportional to size.

Built-in Kibana

Application overview

[Diagram: Application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service -> Kibana UI]

Securing Kibana

[Diagram: IAM, Proxy (optional)]

IAM policy for Kibana

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:ESHttpHead"
      ],
      "Resource": [ "arn:aws:es:us-east-1:####:domain/<domain>/*" ],
      "Condition": { "IpAddress": { "aws:SourceIp": [ "xx.xx.xx.xx" ] } }
    }
  ]
}

Pay for what you use

Pay for compute and storage you use

With Amazon Elasticsearch Service, you pay only for the compute and storage resources you use. The AWS Free Tier is available for qualifying customers.

Amazon Elasticsearch Service is publicly available now!

You can use Amazon Elasticsearch Service in these regions:

• us-east-1

• us-west-1

• us-west-2

• eu-west-1

• eu-central-1

• ap-southeast-1

• ap-southeast-2

• ap-northeast-1

• sa-east-1

Wrap up

1. Elasticsearch is a tool for full-text search, analysis, and visualization of time series data that helps you get the most out of your growing data set

2. Amazon Elasticsearch Service makes it easy to deploy and manage an Elasticsearch cluster in the AWS cloud

3. Amazon Elasticsearch Service is a drop-in replacement for your existing Elasticsearch cluster

Thank you!

Remember to complete your evaluations!
