Page 1: Amazon EMR Facebook Presto Meetup

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

March 19, 2015 | Facebook Presto Meetup

Interactive SQL on Amazon S3 using Presto on Amazon EMR

Steve McPherson

Page 2: Amazon EMR Facebook Presto Meetup

Amazon EMR

Page 3: Amazon EMR Facebook Presto Meetup

Amazon EMR makes Cluster Management easy

Amazon EMR

• Setup and configuration

• Node monitoring and replacement

• Log aggregation

• CloudWatch integration

• Expand and shrink on demand

• Integration with Spot

• AWS Support

Page 4: Amazon EMR Facebook Presto Meetup

Data Warehousing on Amazon EMR

Extract Transform & Load Data Warehouse Report Generation & Ad Hoc Analysis

Amazon S3 Amazon EMR Amazon EMR

• MapReduce API • Sqoop

• Spark • Cascading • Pig • MR

• Hive • Spark • Cascading • Pig

• Presto • Hive • Spark-SQL • Lingual

• Parquet • ORC • SEQ • Text

Extract Transform & Load

Data Warehouse Report Generation

Ad Hoc Analysis

write read

Page 5: Amazon EMR Facebook Presto Meetup

Different Clusters for different workloads

Hive, Pig, Cascading

Presto

Spark HBase

Amazon S3

Page 6: Amazon EMR Facebook Presto Meetup

Why do our customers like Presto?

• It works directly on S3

• It integrates with Hive

• It’s fast

• It’s Java

Page 7: Amazon EMR Facebook Presto Meetup

Demo: Launch a cluster

#> aws emr create-cluster \
     --name="PRESTO-0-95" \
     --ami-version=3.5.0 \
     --applications Name=hive \
     --ec2-attributes KeyName=[KEY_NAME] \
     --instance-groups \
       InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
       InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
     --bootstrap-action Name="install presto",Path="s3://github-emr-bootstrap-actions/presto/0.95/install-presto",Args="[-p,8989,-m,1024,-n,128]"

# wait 5 minutes
#> emrscreen
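The launch command can be wrapped in a small shell function so that the Presto version and key pair become parameters. This is only a sketch: the function name `launch_cluster` and the `DRY_RUN` switch are hypothetical, the values are copied from the slide, and the bootstrap flag is written in its documented form, `--bootstrap-actions`. By default it only prints the arguments, one per line, so nothing is launched.

```shell
# Hypothetical helper (not part of the talk): assembles the demo's
# "aws emr create-cluster" call from a Presto version and an EC2 key name.
# DRY_RUN is an assumed switch: 1 (default) prints the arguments, 0 runs them.
launch_cluster() {
  version="$1"
  key_name="$2"
  # Cluster name as on the slide: dots in the version become dashes.
  name="PRESTO-$(printf '%s' "$version" | tr '.' '-')"

  # Positional parameters keep each argument intact, including the space
  # inside the bootstrap action name.
  set -- aws emr create-cluster \
    --name "$name" \
    --ami-version 3.5.0 \
    --applications Name=hive \
    --ec2-attributes "KeyName=$key_name" \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
      InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
    --bootstrap-actions \
      "Name=install presto,Path=s3://github-emr-bootstrap-actions/presto/$version/install-presto,Args=[-p,8989,-m,1024,-n,128]"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$@"   # inspect only: one argument per line
  else
    "$@"                 # actually call the AWS CLI
  fi
}

# Dry run with the slide's placeholder key name.
launch_cluster 0.95 "[KEY_NAME]"
```

Setting `DRY_RUN=0` would run the command for real, which requires configured AWS credentials and starts billable instances.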

Page 8: Amazon EMR Facebook Presto Meetup

Run a Query

#> hive
CREATE EXTERNAL TABLE test(id int, name string, surname string, emails string, country string, ip string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION "s3://support.elasticmapreduce/bootstrap-actions/presto/0.95/Query_Sample/";

#> presto-cli --catalog hive
show tables;
SELECT name, COUNT(name) FROM test GROUP BY name;
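The same query can also be run without the interactive prompt, which is handy for scripts. The sketch below uses the Presto CLI's `--execute` option. Several things in it are assumptions rather than facts from the talk: the wrapper name `run_presto_query` and the `DRY_RUN` switch are hypothetical, the port 8989 is inferred from the `-p,8989` bootstrap argument, and the table is assumed to live in Hive's `default` schema.

```shell
# Hypothetical wrapper (not part of the talk) for non-interactive queries.
# Assumptions: the coordinator listens on localhost:8989 (inferred from the
# "-p,8989" bootstrap argument) and the table is in Hive's "default" schema.
run_presto_query() {
  sql="$1"
  set -- presto-cli \
    --server "localhost:${PRESTO_PORT:-8989}" \
    --catalog hive \
    --schema default \
    --execute "$sql"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$@"   # inspect only: one argument per line
  else
    "$@"                 # run the query on the master node
  fi
}

# Dry run of the slide's aggregation query.
run_presto_query "SELECT name, COUNT(name) FROM test GROUP BY name;"
```

On the cluster's master node, `DRY_RUN=0` would send the query to Presto and print the result rows to standard output.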

Page 9: Amazon EMR Facebook Presto Meetup

What’s next

• Formal packaging of Presto

• Graceful shrink

• CloudWatch integration

• Identity and Authorization integration with AWS services

Page 10: Amazon EMR Facebook Presto Meetup

Get started today

Amazon EMR

http://aws.amazon.com/elasticmapreduce/

