
Customer Sharing: Trend Micro - Analytic Engine - A common Big Data computation platform on AWS



© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Scott Miao, SPN, Trend Micro

2016/5/20

Analytic Engine

A common Big Data computation service on AWS

Who am I

• Scott Miao

• RD, SPN, Trend Micro

• Worked on the Hadoop ecosystem for about 6 years

• Worked on AWS for Big Data for about 3 years

• Expertise in HDFS/MR/HBase

• Speaker at several Hadoop-related conferences

• @takeshi.miao

Agenda

• What problems did we suffer?

• Why AWS?

• Analytic Engine

• The benefits AWS brings to AE

• AE roadmap on AWS

What problems did we suffer?

Hadoop Expansion

Data volume increases 1.5 ~ 2x every year

Data center issues

• network bottleneck

• server depreciation

Growth becomes 2x

Why AWS?

Return on Investment

• On traditional infrastructure, we put a lot of effort into operating services

• On the Cloud, we can leverage its elasticity to automate our services

• More focus on innovation!

[Chart: Revenue and Cost (money) plotted over time]

AWS is a leader in the IaaS market

https://www.gartner.com/doc/reprints?id=1-2G2O5FC&ct=150519&st=sb
Source: Gartner (May 2015)

AWS Evaluation

Cost acceptable

Functionalities satisfied

Performance satisfied

Analytic Engine

A common Big Data computation service on AWS

High Level Architecture

[Diagram: RDs, researchers, and services call two common cloud services in Trend via RESTful APIs: the Analytic Engine (AE), the common computation service built on AWS EMR, and Cloud Storage (CS), the common storage service. AE exposes createCluster, submitJob, and deleteCluster; the EMR clusters take their input from and write their output to CS.]

Analytic Engine

• Computation service for Trenders

• Based on AWS EMR

• Simple RESTful API calls

• Computing on demand

• Short-lived

• Long-running

• No operations effort

• Pay by computing resources used

Cloud Storage

• Storage service for Trenders

• Based on AWS S3

• Simple RESTful API calls

• Share data to all in one place

• Metadata search for files

• No operations effort

• Pay by storage size used

Why do we use AE instead of EMR directly?

• Abstraction

• Avoid lock-in

• Hide implementation details behind the scenes

• AWS EMR was not designed for long-running jobs

• >= AMI 3.1.1 – 256 ACTIVE or PENDING jobs (steps)

• < AMI 3.1.1 – 256 jobs in total

• Better integration with other common services

• Keep our hands off AWS-native code (a direct-call sketch follows below)

• Centralized authentication & authorization

• No AWS/SSH keys for users

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/AddingStepstoaJobFlow.html
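To make the "hands off AWS-native code" point concrete, below is a minimal sketch, not taken from AE's implementation, of what a direct EMR call looks like with boto3. The cluster id, JAR path, and arguments are placeholders; AE hides exactly this kind of SDK plumbing (plus the AWS credentials) behind its own RESTful API, and also works around the per-cluster step limits listed above.

```python
# Hypothetical direct EMR call for illustration only; AE wraps calls like this
# behind its RESTful API so users never handle AWS keys or SDK code themselves.
import boto3

emr = boto3.client("emr", region_name="us-west-2")

response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",            # placeholder cluster id
    Steps=[{
        "Name": "example-wordcount",        # placeholder step
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "s3://example-bucket/jars/wordcount.jar",
            "Args": ["s3://example-bucket/input/", "s3://example-bucket/output/"],
        },
    }],
)
print(response["StepIds"])                  # ids of the newly added steps
```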

Common use cases for AE

• User creates a cluster

• User can create multiple clusters

• User submits a job to a target cluster

• AE delivers the job to a secondary cluster if the target cluster is down

• User wants to know their cost

Usecase #1 – User creates a cluster

1. User invokes createCluster.

2. AE launches an EMR cluster for the user, with tags attached (e.g. 'sched:routine', 'env:prod', m3.xlarge * 10).

"It is a RESTful API, so I can use any client I am familiar with!"
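Since the slides only name the operation, here is a hypothetical createCluster call from a Python client; the AE endpoint URL, JSON field names, and auth header are illustrative assumptions, not the documented AE API.

```python
# Hypothetical AE createCluster request; endpoint, payload fields, and auth
# scheme are assumptions for illustration, not AE's documented API.
import requests

payload = {
    "tags": {"sched": "routine", "env": "prod"},   # tags from the slide
    "instanceType": "m3.xlarge",
    "instanceCount": 10,
}

resp = requests.post(
    "https://ae.example.trendmicro.com/v1/clusters",   # assumed URL
    json=payload,
    headers={"Authorization": "Bearer <token>"},       # assumed centralized auth
    timeout=30,
)
resp.raise_for_status()
print(resp.json())   # e.g. an AE cluster id to use in later submitJob calls
```

Because it is plain HTTPS plus JSON, any REST client (curl, Postman, or an SDK in any language) works, which is the point of the quote above.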

Usecase #2 – User can create multiple clusters as he/she needs

1. User invokes createCluster again.

2. AE launches another new EMR cluster for the user, with tags attached (e.g. 'sched:adhoc', 'env:prod', c3.4xlarge * 20).

3. User can create as many clusters as he/she needs, for example a routine m3.xlarge * 10 cluster and an ad-hoc c3.4xlarge * 20 cluster running side by side.
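Behind createCluster, AE presumably drives the EMR RunJobFlow API. The sketch below shows how a tagged ad-hoc cluster like the one in this use case could be launched with boto3; the cluster name, AMI version, and IAM roles are illustrative assumptions, only the tags and sizes mirror the slide.

```python
# Minimal sketch of launching a tagged EMR cluster with boto3; the name, AMI
# version, and roles are assumptions, only the tags/sizes mirror the slide.
import boto3

emr = boto3.client("emr", region_name="us-west-2")

cluster = emr.run_job_flow(
    Name="ae-adhoc-cluster",                      # assumed name
    AmiVersion="3.1.1",                           # AMI line mentioned in the deck
    Instances={
        "MasterInstanceType": "c3.4xlarge",
        "SlaveInstanceType": "c3.4xlarge",
        "InstanceCount": 20,
        "KeepJobFlowAliveWhenNoSteps": True,      # long-running cluster
    },
    Tags=[
        {"Key": "sched", "Value": "adhoc"},
        {"Key": "env", "Value": "prod"},
    ],
    ServiceRole="EMR_DefaultRole",                # assumed default roles
    JobFlowRole="EMR_EC2_DefaultRole",
)
print(cluster["JobFlowId"])
```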

Usecase #3 – User submits a job to a target cluster

1. User invokes submitJob with clusterCriteria, e.g. [['sched:adhoc', 'env:prod'], ['env:prod']].

2. AE matches the job against the running clusters' tags and picks the target cluster (tagged 'sched:adhoc', 'env:prod'); another cluster tagged 'sched:routine', 'env:prod' is also running but does not match the first criteria set.

3. AE submits the job to that cluster.

4. EMR pulls the input data from CS.

5. The job runs on the target cluster.

6. EMR writes the result back to CS.

7. AE sends a message to an SNS topic if the user specified one.
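The clusterCriteria shown above is an ordered list of tag sets: the most specific set is tried first, then the next one. A simplified, stand-alone sketch of that matching idea (not AE's or Genie's actual code):

```python
# Simplified tag-based cluster matching in the spirit of the slides: the first
# criteria set that matches a live cluster wins.
def match_cluster(cluster_criteria, clusters):
    """cluster_criteria: ordered list of tag lists, e.g.
    [["sched:adhoc", "env:prod"], ["env:prod"]];
    clusters: list of dicts like {"id": ..., "tags": {...}, "alive": True}."""
    for criteria in cluster_criteria:              # try strictest criteria first
        wanted = dict(tag.split(":", 1) for tag in criteria)
        for cluster in clusters:
            if cluster["alive"] and all(
                cluster["tags"].get(k) == v for k, v in wanted.items()
            ):
                return cluster["id"]
    return None                                    # no eligible cluster found

clusters = [
    {"id": "j-ROUTINE", "tags": {"sched": "routine", "env": "prod"}, "alive": True},
    {"id": "j-ADHOC", "tags": {"sched": "adhoc", "env": "prod"}, "alive": True},
]
print(match_cluster([["sched:adhoc", "env:prod"], ["env:prod"]], clusters))  # j-ADHOC
```

Flipping the ad-hoc cluster's alive flag to False makes the second criteria set match the routine cluster instead, which is exactly the fallback behaviour in Usecase #4 below.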

Usecase #4 – AE delivers the job to a secondary cluster if the target cluster is down

1. User invokes submitJob with the same clusterCriteria, e.g. [['sched:adhoc', 'env:prod'], ['env:prod']].

2. The cluster tagged 'sched:adhoc', 'env:prod' is down, so AE falls back to the next criteria set and matches the secondary cluster (tagged 'sched:routine', 'env:prod').

3. AE submits the job to the secondary cluster.

4. EMR pulls the input data from CS.

5. The job runs on the secondary cluster.

6. EMR writes the result back to CS.

Usecase #5 – User wants to know what their current cost is

Billing & Cost Management -> Cost Explorer -> Launch Cost Explorer
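Because every AE cluster carries tags such as sched and env, the same cost question can also be answered per tag. Beyond the console path above, here is a minimal sketch with the Cost Explorer API, assuming those tags have been activated as cost-allocation tags; the time period and tag key are illustrative.

```python
# Minimal sketch: monthly unblended cost grouped by the 'sched' tag, assuming
# 'sched' is activated as a cost-allocation tag in the billing console.
import boto3

ce = boto3.client("ce", region_name="us-east-1")   # Cost Explorer endpoint

result = ce.get_cost_and_usage(
    TimePeriod={"Start": "2016-05-01", "End": "2016-06-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "sched"}],
)
for group in result["ResultsByTime"][0]["Groups"]:
    print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])
```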


Middle Level Architecture

[Diagram: the on-premises IDC reaches the Oregon (us-west-2) region over VPC peering (HTTPS) and over the Internet (HTTPS/HTTP Basic or VPN). The AE API servers run in an Auto Scaling group spread across multiple AZs (AZa, AZb, AZc) behind an internal ELB, register themselves in Eureka, and keep their state in a multi-AZ RDS. The EMR worker clusters run in their own Auto Scaling groups, read input from and write output to cross-account S3 buckets and Cloud Storage (HTTPS/HTTP Basic), and send job notifications to Amazon SNS.]

The benefits AWS brings to AE

Pros & Cons

Aspect: Data Capacity
  IDC: Limited by physical rack space
  AWS: No limitation, within a reasonable amount

Aspect: Computation Capacity
  IDC: Limited by physical rack space
  AWS: No limitation, within a reasonable amount

Aspect: DevOps
  IDC: Hard, because it runs on a physical machine / VM farm
  AWS: Easy, because code is everything (Continuous Deployment)

Aspect: Scalability
  IDC: Hard, because it runs on a physical machine / VM farm
  AWS: Easy, relying on ELB and Auto Scaling groups from AWS

Pros & Cons (cont.)

Aspect: Disaster Recovery
  IDC: Hard, because it runs on a physical machine / VM farm
  AWS: Easy, because code is everything

Aspect: Data Location
  IDC: Limited to the IDC location
  AWS: Varied and easy, thanks to AWS's multiple regions

Aspect: Cost
  IDC: Implied in the Total Cost of Ownership
  AWS: Acceptable cost with a Cloud-optimized design

AE roadmap on AWS

Roadmap

Backups

[Diagram series: provisioning AE in the Oregon (us-west-2) region, step by step]

1. Pre-built infrastructure by AWS CloudFormation: VPC peering over HTTPS, multiple AZs (AZa, AZb, AZc), and cross-account S3 buckets for input/output.

2. User permissions granted.

3. Pre-launched RDS (multi-AZ).

4. Provision the AE SaaS by CI/CD: AE API servers behind an internal ELB in an Auto Scaling group, Eureka for service discovery, and EMR worker Auto Scaling groups.

5. Users can access via VPN; firewall opened for Trend (the IDC reaches AWS over the Internet with HTTPS/HTTP Basic or VPN, and over peering).

6. Input from CS or S3.

7. Computation in an AWS EMR cluster.

8. Output to CS or S3.

9. Job-end message to Amazon SNS (optional).
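For roadmap step 1, here is a minimal sketch, assuming the pre-built infrastructure is captured in a CloudFormation template, of launching it with boto3; the stack name, template location, and parameters are placeholders, not AE's real ones.

```python
# Minimal sketch of roadmap step 1: create the pre-built infrastructure from a
# CloudFormation template. Stack name, template URL, and parameters are
# illustrative placeholders.
import boto3

cf = boto3.client("cloudformation", region_name="us-west-2")

cf.create_stack(
    StackName="ae-base-infra",                                        # assumed name
    TemplateURL="https://s3.amazonaws.com/example-bucket/ae-infra.yaml",
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "prod"}],
    Capabilities=["CAPABILITY_IAM"],   # needed if the template creates IAM roles
)
cf.get_waiter("stack_create_complete").wait(StackName="ae-base-infra")
```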

What is Netflix Genie

• A practice from Netflix

• A Hadoop client to submit different kinds of jobs

• Flexible data model design to accommodate different kinds of clusters

• Flexible job/cluster matching design (based on tags)

• Cloud characteristics built in by design

• e.g. auto-scaling, load balancing, etc.

• Its goal is plain & simple

• We use it as an internal component

https://github.com/Netflix/genie/wiki

What is Netflix Eureka

• A RESTful service registry

• Built by Netflix

• A critical component for Genie to do load balancing and failover

[Diagram: multiple API clients connecting to Genie, with Eureka providing load balancing and failover]