BloomReach and AWS Elastic MapReduce Prateek Gupta – Lead Engineer 10/24/2014

BloomReach AWS Tech

Page 1: BloomReach AWS Tech

BloomReach and AWS Elastic MapReduce

Prateek Gupta – Lead Engineer

10/24/2014

Page 2: BloomReach AWS Tech

The BloomReach Personalized Discovery Platform

http://bloomreach.com/what-we-do/

Page 3: BloomReach AWS Tech

About BloomReach’s applications

Content understanding

Organic Search
• What it does: Content optimization, management and measurement
• Benefit: Enhanced discoverability and customer acquisition in organic search

SNAP
• What it does: Personalized onsite search and navigation across devices
• Benefit: Relevant and consistent onsite experiences for new and known users

Compass
• What it does: Merchandising tool that understands products and identifies opportunities
• Benefit: Prioritize and optimize online merchandising

Page 4: BloomReach AWS Tech

BloomReach Organic Search - Merchant Integration

[Diagram labels]

Merchant domain

Bloomreach domain (Amazon Web Services)
• CloudFront, domain: brcdn.com (br-trk.js)
• Elastic Compute Cloud, domain: brsrvr.com (pix.gif)
• Elastic Compute Cloud, domain: brsrvr.com (REST API request, API response)

Javascript

Page 5: BloomReach AWS Tech

BloomReach Organic Search Architecture

[Diagram labels]
• REST API request / API response
• Domain Name Server (DNS)
• Domain request / Domain response
• AWS Load balancer
• Instance (four shown)
• Multiple Availability Zones
• Alternate Cloud Provider

Page 6: BloomReach AWS Tech

Example Workflow - Personalization

[Diagram labels]
• Compute User Features
• Compute Recommendations
• Compute User Profile
• User/Product Database
• Pixel Logs (S3)
• Extract Related Users
• Extract User Session

Page 7: BloomReach AWS Tech

Elastic MapReduce (EMR) Usage

• We serve 150+ customer websites
• 100+ million pages processed/day
• Users we see per day > 400M
• Multiple hadoop steps (clusters)

Usage Metric              BloomReach Volume
Clusters per day          1500-2000
Hadoop jobs per day       5000-6000
Instance hours per day    25,000-30,000
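To put the daily volumes in proportion, the short sketch below divides the range endpoints to get rough per-cluster bounds. It assumes the ranges can be combined freely (lowest numerator with highest denominator, and the reverse), which the slide does not state; the helper name `ratio_bounds` is made up for illustration.

```python
def ratio_bounds(numerator, denominator):
    """Return (low, high) bounds for numerator/denominator.

    Both arguments are (min, max) ranges. The low bound pairs the smallest
    numerator with the largest denominator, and the high bound the reverse.
    """
    num_lo, num_hi = numerator
    den_lo, den_hi = denominator
    return num_lo / den_hi, num_hi / den_lo


# Daily ranges taken from the usage table.
clusters_per_day = (1500, 2000)
jobs_per_day = (5000, 6000)
instance_hours_per_day = (25_000, 30_000)

jobs_per_cluster = ratio_bounds(jobs_per_day, clusters_per_day)
hours_per_cluster = ratio_bounds(instance_hours_per_day, clusters_per_day)

print("Hadoop jobs per cluster:", jobs_per_cluster)
print("Instance hours per cluster:", hours_per_cluster)
```

Under that assumption, a typical cluster runs a handful of Hadoop jobs and consumes on the order of ten to twenty instance hours.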

Page 8: BloomReach AWS Tech

Elastic MapReduce Usage Growth

[Chart: instance hours/month by quarter, Q4 2009 to Q3 2014. Y-axis: 0 to 800,000. Legend: Spot Instance, SNAP Mobile, SNAP Desktop, Compass, Organic]

Page 9: BloomReach AWS Tech

Challenges

• Cost containment: on-demand vs. spot usage

• Cost tracking: EMR tags

• Cluster setup delay: sharing clusters

• Cluster lifecycle management: terminate long-running clusters
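As an illustration of the lifecycle point, here is a minimal sketch that flags clusters running longer than a threshold. The `ClusterRecord` type, the 24-hour limit, and the sample data are assumptions for the example; the slides do not say how clusters are tracked or what limit is used, and no AWS call is made here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClusterRecord:
    """Hypothetical record of a running cluster (not an AWS API type)."""
    cluster_id: str
    started_at: datetime


def find_long_running(clusters, now, max_age):
    """Return the ids of clusters that have been running longer than max_age."""
    return [c.cluster_id for c in clusters if now - c.started_at > max_age]


# Sample data: one fresh cluster and one that has been up for 30 hours.
now = datetime(2014, 10, 24, 12, 0)
clusters = [
    ClusterRecord("cluster-a", now - timedelta(hours=2)),
    ClusterRecord("cluster-b", now - timedelta(hours=30)),
]

stale = find_long_running(clusters, now, max_age=timedelta(hours=24))
print("Candidates for termination:", stale)
```

In practice the list of running clusters and the termination step would come from the EMR API; this sketch only covers the age check.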

Page 10: BloomReach AWS Tech

Resource Selection

• Dynamic resource (instance type) selection based on CPU, memory

maxCpuPerUnitPrice = 0
optimalInstanceType = null
For each instance_type in (Availability Zone, Region) {
    cpuPerUnitPrice = instance_type.cpuCores / instance_type.spotPrice
    if (maxCpuPerUnitPrice < cpuPerUnitPrice) {
        // remember the best ratio seen so far, not just the type
        maxCpuPerUnitPrice = cpuPerUnitPrice
        optimalInstanceType = instance_type
    }
}
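The selection loop above could be written in Python roughly as follows. This is a sketch under stated assumptions: the `InstanceOffer` type, the instance type names, and the prices are invented for illustration, and real spot prices would come from the AWS API for a given Availability Zone and Region. Note that the running maximum has to be updated together with the chosen type, otherwise the last type with any positive ratio wins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InstanceOffer:
    """Hypothetical spot offer for one instance type in one zone."""
    instance_type: str
    cpu_cores: int
    spot_price: float  # USD per instance hour


def pick_optimal(offers: Iterable[InstanceOffer]) -> Optional[InstanceOffer]:
    """Return the offer with the most CPU cores per unit of spot price."""
    best = None
    best_ratio = 0.0
    for offer in offers:
        if offer.spot_price <= 0:
            continue  # skip invalid or missing prices
        ratio = offer.cpu_cores / offer.spot_price
        if ratio > best_ratio:
            best_ratio = ratio  # track the best ratio seen so far
            best = offer
    return best


# Illustrative offers only; names and prices are made up.
offers = [
    InstanceOffer("type-small", cpu_cores=2, spot_price=0.05),
    InstanceOffer("type-medium", cpu_cores=4, spot_price=0.08),
    InstanceOffer("type-large", cpu_cores=8, spot_price=0.20),
]

best = pick_optimal(offers)
print("Selected:", best.instance_type if best else None)
```

The same pattern extends to memory by swapping `cpu_cores` for a memory field, or by ranking on a weighted combination of the two.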

Page 11: BloomReach AWS Tech

Workflow Management

• Makefile
• A framework for flow control using python meta programming

[Diagram: dependency graph over steps A, B, C, D]

Valid Flows:
A->B->C->D
A->B->D->C
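The two valid flows listed above are consistent with a dependency graph in which B depends on A, and both C and D depend on B. That graph is an assumption read off the flows, and the brute-force enumeration below is not the metaprogramming framework the slide refers to; it only shows how such a graph determines the allowed orderings.

```python
from itertools import permutations


def valid_flows(steps, depends_on):
    """List every ordering of steps in which each step follows its dependencies."""
    flows = []
    for order in permutations(steps):
        position = {step: index for index, step in enumerate(order)}
        respects_deps = all(
            position[dep] < position[step]
            for step, deps in depends_on.items()
            for dep in deps
        )
        if respects_deps:
            flows.append("->".join(order))
    return flows


# Assumed dependencies: B needs A; C and D each need B.
depends_on = {"B": ["A"], "C": ["B"], "D": ["B"]}

flows = valid_flows("ABCD", depends_on)
for flow in flows:
    print(flow)
```

Because C and D do not depend on each other, either can run first once B has finished, which is what yields exactly two flows.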

Page 12: BloomReach AWS Tech

EMR Best Practices

• Use spot instances for cost optimization
• Use EMR tags for cost tracking
• Share EMR clusters for small jobs
• Keep track of long-running clusters
• Use optimal resource type based on resource usage (e.g. CPU, memory)
• Workflow management
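For the cost-tracking practice, one way to use tags is to roll up instance hours per tag value. The sketch below assumes each cluster is described by a plain dictionary with `tags` and `instance_hours` keys, and that a `product` tag is in use; these names and the sample figures are hypothetical and do not come from the slides or the EMR API.

```python
from collections import defaultdict


def instance_hours_by_tag(clusters, tag_key):
    """Sum instance hours per value of tag_key; untagged clusters are grouped together."""
    totals = defaultdict(float)
    for cluster in clusters:
        label = cluster["tags"].get(tag_key, "untagged")
        totals[label] += cluster["instance_hours"]
    return dict(totals)


# Made-up cluster records for illustration.
clusters = [
    {"tags": {"product": "organic"}, "instance_hours": 120.0},
    {"tags": {"product": "snap"}, "instance_hours": 80.0},
    {"tags": {"product": "organic"}, "instance_hours": 30.0},
    {"tags": {}, "instance_hours": 5.0},
]

totals = instance_hours_by_tag(clusters, "product")
for label, hours in sorted(totals.items()):
    print(f"{label}: {hours} instance hours")
```

Keeping an explicit "untagged" bucket makes it easy to spot clusters that were launched without a tag.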