Hadoop & HBase: Dr. Matt Wood with Amazon Web Services

Hadoop and HBase on Amazon Web Services

DESCRIPTION

Introducing big data and analytics with Hadoop, HBase and Amazon Elastic MapReduce.

Page 2: Hadoop and HBase on Amazon Web Services

Thank you.

Page 3: Hadoop and HBase on Amazon Web Services

Introducing Hadoop

Page 4: Hadoop and HBase on Amazon Web Services

HBase on AWS

Introducing Hadoop

Page 5: Hadoop and HBase on Amazon Web Services

Cost optimization

HBase on AWS

Introducing Hadoop

Page 6: Hadoop and HBase on Amazon Web Services

Data for competitive advantage.

Page 7: Hadoop and HBase on Amazon Web Services

Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence...

Using data

Page 8: Hadoop and HBase on Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 9: Hadoop and HBase on Amazon Web Services

Cost of data generation is falling.

Page 10: Hadoop and HBase on Amazon Web Services

lower cost, increased throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 11: Hadoop and HBase on Amazon Web Services

HIGHLY CONSTRAINED

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 12: Hadoop and HBase on Amazon Web Services

Very high barrier to turning data into information.

Page 13: Hadoop and HBase on Amazon Web Services

Move from a data generation challenge to an analytics challenge.

Page 14: Hadoop and HBase on Amazon Web Services

Enter the AWS Cloud.

Page 15: Hadoop and HBase on Amazon Web Services

Remove the constraints.

Page 16: Hadoop and HBase on Amazon Web Services

Enable data-driven innovation.

Page 17: Hadoop and HBase on Amazon Web Services

Move to a distributed data approach.

Page 18: Hadoop and HBase on Amazon Web Services

Maturation of two things.

Page 19: Hadoop and HBase on Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Page 20: Hadoop and HBase on Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Infrastructure for distributed storage and analysis

Page 21: Hadoop and HBase on Amazon Web Services

Frameworks for data-intensive workloads.

Software

Distributed by design.

Page 22: Hadoop and HBase on Amazon Web Services

Platform for data-intensive workloads.

Infrastructure

Distributed by design.

Page 23: Hadoop and HBase on Amazon Web Services

Support the data life cycle.

Page 24: Hadoop and HBase on Amazon Web Services

HIGHLY CONSTRAINED

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 25: Hadoop and HBase on Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 26: Hadoop and HBase on Amazon Web Services

Lower the barrier to entry.

Page 27: Hadoop and HBase on Amazon Web Services

Accelerate time to market and increase agility.

Page 28: Hadoop and HBase on Amazon Web Services

Enable new business opportunities.

Page 29: Hadoop and HBase on Amazon Web Services

Washington Post

Pinterest

NASA

Page 30: Hadoop and HBase on Amazon Web Services

“AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly”

Michael Miller, Pfizer

Page 31: Hadoop and HBase on Amazon Web Services

Introducing Hadoop

Page 32: Hadoop and HBase on Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Infrastructure for distributed storage and analysis

Page 34: Hadoop and HBase on Amazon Web Services

Apache Hadoop

Software for distributed storage and analysis

Implements the map/reduce pattern

Focus on your data
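
As a rough illustration of the map/reduce pattern named on this slide, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (map_words, reduce_counts, run_map_reduce) and the sample input are made up for the example, and a real cluster would spread the map, shuffle and reduce steps across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values seen for one key into a total."""
    return word, sum(counts)


def run_map_reduce(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values under their key
    # Each key is reduced independently, which is what makes the step parallelizable.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical input, for illustration only.
word_counts = run_map_reduce(
    ["big data on AWS", "big clusters on demand"], map_words, reduce_counts
)
```

Because each key is reduced on its own, the framework only has to distribute the work; the author of the job writes the two small functions.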

Page 35: Hadoop and HBase on Amazon Web Services

Built for uncertainty

Hadoop provides tools to navigate data

Allows discovery

Query flexibility at scale

Page 36: Hadoop and HBase on Amazon Web Services

Built for flexibility

Java native

Executes code in any language

Just a distribution mechanism
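
One way Hadoop runs code in other languages is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines and write tab-separated key/value lines. The sketch below imitates that contract in Python under stated assumptions: the function names and input are hypothetical, the functions take iterables of lines so they can be run locally, and the call to sorted() stands in for the sort that the framework performs between the two phases. In a real streaming job each function would be wired to standard input and standard output.

```python
from itertools import groupby
from typing import Iterable, Iterator


def stream_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def stream_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Local simulation: sorted() plays the role of the framework's shuffle and sort.
mapped = stream_mapper(["Hadoop runs code", "code in any language"])
reduced = list(stream_reducer(sorted(mapped)))
```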

Page 37: Hadoop and HBase on Amazon Web Services

Rich ecosystem

Diverse tools

Machine learning, recommendations, predictive analytics, segmentation, real time analysis

Lots of innovation

Page 38: Hadoop and HBase on Amazon Web Services

But...

A very big project

500k+ lines of code

Challenging to configure and optimize

Page 39: Hadoop and HBase on Amazon Web Services

Undifferentiated heavy lifting

Page 40: Hadoop and HBase on Amazon Web Services

Amazon Elastic MapReduce

Page 41: Hadoop and HBase on Amazon Web Services

Amazon Elastic MapReduce

Web service for data processing

Hosted Hadoop

Configured and optimized

Page 42: Hadoop and HBase on Amazon Web Services

Amazon Elastic MapReduce

Job flows

Elastic platform

Maintain clusters or run once and terminate

Debugging tools

Page 43: Hadoop and HBase on Amazon Web Services

Input data

S3

Page 44: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Input data

S3

Page 45: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Page 46: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

Page 47: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

HDFS

Page 48: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

HDFS

Queries + BI

Via JDBC, Pig, Hive

Page 49: Hadoop and HBase on Amazon Web Services

Elastic MapReduce

Code

Name node

Output

S3 + SimpleDB

Input data

S3

Elastic cluster

HDFS

Queries + BI

Via JDBC, Pig, Hive

Page 50: Hadoop and HBase on Amazon Web Services

Output

S3 + SimpleDB

Input data

S3

Page 62: Hadoop and HBase on Amazon Web Services

Hadoop all the way down

Amazon Hadoop distribution

HDFS

Streaming interface

Hive, Pig, Mahout, Spark, Shark

Page 63: Hadoop and HBase on Amazon Web Services

Data integration

Optimized and integrated into AWS environment

Reads and writes to S3

Analytics on DynamoDB data

Can process data from any source: Cassandra, Mongo, Couch, Amazon RDS

Page 64: Hadoop and HBase on Amazon Web Services

Data movement

Multi-part upload

Import/Export

AWS Direct Connect

Aspera

Page 65: Hadoop and HBase on Amazon Web Services

Cluster scalability

Resize running job flows

Add capacity for shorter runs

Remove capacity during off-peak hours

Balance scale and cost

Page 66: Hadoop and HBase on Amazon Web Services

Cluster scalability

14 hours remaining

Page 67: Hadoop and HBase on Amazon Web Services

Cluster scalability

7 hours remaining

Page 68: Hadoop and HBase on Amazon Web Services

Cluster scalability

3 hours remaining
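
The 14, 7 and 3 hour figures on these slides can be read as the effect of adding nodes to a running job flow. The sketch below is a back-of-the-envelope estimate only. It assumes perfectly linear scaling, which real jobs rarely achieve, and the starting cluster size of 10 nodes is a hypothetical number chosen for the arithmetic, not a figure from the talk.

```python
import math


def hours_remaining(node_hours_of_work: float, nodes: int) -> float:
    """Estimated wall-clock hours left, assuming work divides evenly across nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return node_hours_of_work / nodes


def nodes_needed(node_hours_of_work: float, target_hours: float) -> int:
    """Smallest whole number of nodes that meets the target under the same assumption."""
    if target_hours <= 0:
        raise ValueError("target_hours must be positive")
    return math.ceil(node_hours_of_work / target_hours)


BASE_NODES = 10                 # hypothetical starting cluster size
work_left = 14 * BASE_NODES     # 14 hours remaining on 10 nodes = 140 node-hours

estimate_at_base = hours_remaining(work_left, BASE_NODES)      # 14.0
estimate_doubled = hours_remaining(work_left, 2 * BASE_NODES)  # 7.0
nodes_for_three_hours = nodes_needed(work_left, 3)             # ceil(140 / 3) = 47
```

Under this model the total node-hours stay constant, so a larger cluster finishes sooner for roughly the same compute spend.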

Page 69: Hadoop and HBase on Amazon Web Services

Cluster scalability

Steady state, large batch task, steady state

Page 70: Hadoop and HBase on Amazon Web Services

Cluster availability

Canonical source of data

Anyone in the engineering team

IAM integration

Monitoring

Page 71: Hadoop and HBase on Amazon Web Services

Click stream analysis for retail

3.5 billion records

71 million unique cookies

1.7 million targeted ads

13 TB of clickstream logs

Each day

Page 72: Hadoop and HBase on Amazon Web Services

Click stream analysis for retail

Workflow time from 2 days to 8 hours

Procurement time from 2 months to 5 minutes

$13k per month

500% increase in return on advertising spend

Page 74: Hadoop and HBase on Amazon Web Services

Months of user click-through data

Search terms

Ads displayed

Premium listing inventory

Amazon S3

Log data stored in Amazon S3

Page 75: Hadoop and HBase on Amazon Web Services

Hadoop Cluster

Amazon EMR

Amazon S3

Elastic MapReduce spins up a 200-instance cluster

Page 76: Hadoop and HBase on Amazon Web Services

Hadoop Cluster

Amazon EMR

Amazon S3

Find patterns across logs. Write results to S3.

Page 77: Hadoop and HBase on Amazon Web Services

Hadoop in the AWS Cloud

Elastic MapReduce for hosted Hadoop

Optimized, configured, ready to roll

Focus on the business benefit of data

Hadoop all the way down

Page 78: Hadoop and HBase on Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Infrastructure for distributed storage and analysis

Page 79: Hadoop and HBase on Amazon Web Services

HBase on AWS

Page 80: Hadoop and HBase on Amazon Web Services

Vibrant ecosystem

Mahout for machine learning

Mesos for cluster management

Spark for fast analytics

HBase for unstructured data

Page 81: Hadoop and HBase on Amazon Web Services

HBase

NoSQL data store

Runs on top of HDFS

Scalable

Rapid retrieval across large datasets

Page 82: Hadoop and HBase on Amazon Web Services

Architecture

Huge, distributed map/hash

Distributed

Implements Bloom filters

Sortable
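
To make the Bloom filter point concrete, here is a small illustrative Bloom filter in Python. It is a teaching sketch, not HBase's implementation: the class name, the sizes and the SHA-256 based hashing are all assumptions made for the example. The idea it shows is the one that matters for a store like HBase: a cheap membership test that can say "definitely not present" and so avoid reading data that cannot contain a key.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, kept simple for clarity

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means "possibly added".
        return all(self.bits[position] for position in self._positions(key))


# Hypothetical row key, for illustration only.
row_keys = BloomFilter()
row_keys.add("user#1001")
```

A key that was added always tests positive. A key that was never added usually tests negative, and the rate of false positives depends on the filter size, the number of hashes and how many keys have been added.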

Page 83: Hadoop and HBase on Amazon Web Services

Column based

Columns are similar to fields

Rows are records
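
The "huge, sortable map" and "rows are records, columns are similar to fields" descriptions can be sketched as a nested map keyed by row and then by column. The Python below is a toy in-memory model under stated assumptions, not the HBase client API: TinyTable, its methods, the row keys and the "info:name" style column names are all invented for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


class TinyTable:
    """Toy model: row key -> {column name -> value}, scanned in row-key order."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Rows need not share the same columns; each row stores only what it has.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str) -> Dict[str, str]:
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key: str, stop_key: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return rows with start_key <= row key < stop_key, in sorted order."""
        keys = sorted(self._rows)
        low = bisect_left(keys, start_key)
        high = bisect_left(keys, stop_key)
        return [(key, dict(self._rows[key])) for key in keys[low:high]]


# Hypothetical rows, for illustration only.
table = TinyTable()
table.put("user#002", "info:name", "Grace")
table.put("user#001", "info:name", "Ada")
table.put("user#001", "info:city", "London")
table.put("video#001", "info:title", "Intro")

# "$" sorts immediately after "#", so this range covers every "user#..." row.
users = table.scan("user#", "user$")
```

Keeping row keys sorted is what turns a prefix lookup into a contiguous range scan, which is why row key design matters so much in stores of this kind.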

Page 84: Hadoop and HBase on Amazon Web Services

Built for data

Built to scale across billions of rows

The more data, the better the relative performance

Page 85: Hadoop and HBase on Amazon Web Services

But...

Large, complex project

Running in production can be challenging

Distributed system

Page 86: Hadoop and HBase on Amazon Web Services

Undifferentiated heavy lifting

Page 87: Hadoop and HBase on Amazon Web Services

HBase for Elastic MapReduce

Page 96: Hadoop and HBase on Amazon Web Services

Using HBase

Social media firehose

Customer information

Usage and application logs

Hadoop analytics

Page 97: Hadoop and HBase on Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 98: Hadoop and HBase on Amazon Web Services

Amazon DynamoDB

NoSQL database service

Provisioned throughput

Unlimited storage

Very easy to use

Page 99: Hadoop and HBase on Amazon Web Services

DynamoDB & Amazon EMR

SQL like queries

Query flexibility at scale

Integrate queries across datasets

Hive

Page 100: Hadoop and HBase on Amazon Web Services

NoSQL on the AWS Marketplace

CouchDB

Cassandra

MongoDB

aws.amazon.com/marketplace

Page 101: Hadoop and HBase on Amazon Web Services

Cost optimization

Page 102: Hadoop and HBase on Amazon Web Services

Lowered prices 19 times in the past six years.

Page 103: Hadoop and HBase on Amazon Web Services

On-demand

Page 104: Hadoop and HBase on Amazon Web Services

Reserved capacity

Page 105: Hadoop and HBase on Amazon Web Services

100%

Reserved capacity

Page 106: Hadoop and HBase on Amazon Web Services

100%

Reserved capacity

On-demand

Page 108: Hadoop and HBase on Amazon Web Services

Spot market

Page 112: Hadoop and HBase on Amazon Web Services

$0.08 vs $0.007 (yesterday evening)
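
To put the two prices side by side, here is a short worked calculation in Python. The slide does not state units, so the sketch assumes both figures are hourly prices per instance, with $0.08 as the on-demand price and $0.007 as the spot price at that moment. The cluster size and run length are hypothetical inputs chosen for the arithmetic. Spot prices fluctuate and spot instances can be interrupted, so this is an illustration rather than a forecast.

```python
from decimal import Decimal

# Prices from the slide; the per-instance-hour unit is an assumption.
ON_DEMAND_PRICE = Decimal("0.08")
SPOT_PRICE = Decimal("0.007")


def cluster_cost(price_per_instance_hour: Decimal, instances: int, hours: int) -> Decimal:
    """Total cost of running `instances` machines for `hours` at a flat hourly price."""
    return price_per_instance_hour * instances * hours


# Hypothetical workload: 200 instances for 8 hours.
INSTANCES = 200
HOURS = 8

on_demand_cost = cluster_cost(ON_DEMAND_PRICE, INSTANCES, HOURS)  # 128 dollars
spot_cost = cluster_cost(SPOT_PRICE, INSTANCES, HOURS)            # 11.2 dollars

# Fraction saved by paying the spot price instead of the on-demand price.
savings_fraction = 1 - SPOT_PRICE / ON_DEMAND_PRICE               # 0.9125
```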

Page 115: Hadoop and HBase on Amazon Web Services

Reserved Instance Marketplace

Page 117: Hadoop and HBase on Amazon Web Services

Cost optimization

HBase on AWS

Introducing Hadoop

Page 118: Hadoop and HBase on Amazon Web Services

aws.amazon.com/elasticmapreduce