Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

DESCRIPTION

Hadoop has made it into the enterprise mainstream as a Big Data technology. But what about Hadoop as a private or public cloud service on a shared infrastructure? This session looks at a Hadoop solution with virtualization, shared storage, and multi-tenancy, and discusses how service providers can use Pivotal Hadoop Distribution, Isilon, and Serengeti to offer Hadoop-as-a-Service.

After this session you will be able to:
Objective 1: Understand Hadoop and its deployment challenges.
Objective 2: Understand the EMC HDaaS solution architecture and the use cases it addresses.
Objective 3: Understand Pivotal Hadoop Distribution, Serengeti and Isilon's Hadoop features.

Page 1: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

1 © Copyright 2013 EMC Corporation. All rights reserved.

Building Hadoop-as-a-Service
Using Pivotal HD, Project Serengeti, And EMC Isilon

Bernd Kaponig
EMC Solutions Group

Page 2: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Roadmap Information Disclaimer

EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).

Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.

Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.

Page 3: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Goal Of This Session

Demonstrate How Greenplum/Pivotal HD, Project Serengeti And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context

Page 4: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

What Is Hadoop-As-A-Service?

Service layers: Analytics-as-a-Service, Hadoop-as-a-Service, Infrastructure-as-a-Service

Roles: Service Provider, Tenants, Data Scientists

Service functions: Provisioning, Metering, Tenant/User Management, Self-Service Portal

Page 5: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

How “Classic” Hadoop Works

Diagram: on physical hardware, one master node runs the JobTracker and NameNode, and three worker nodes each run a TaskTracker and DataNode.

HDFS client write path: 1: Create file 2: Write 3: Replicate
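The three HDFS client steps on this slide can be illustrated with a toy in-memory model. This is a minimal sketch, not HDFS code: the class names, the round-robin placement, and the tiny block size are assumptions made for illustration, and the replication factor of 3 is the HDFS default.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    datanodes: list
    replication: int = 3  # HDFS default replication factor
    files: dict = field(default_factory=dict)  # path -> [(block_id, [node names])]

    def create(self, path):
        """Register an empty file in the namespace."""
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def allocate_block(self, path):
        """Pick target DataNodes for the next block (toy round-robin placement)."""
        index = len(self.files[path])
        block_id = f"{path}#blk{index}"
        count = min(self.replication, len(self.datanodes))
        targets = [self.datanodes[(index + i) % len(self.datanodes)]
                   for i in range(count)]
        self.files[path].append((block_id, [d.name for d in targets]))
        return block_id, targets


def write_file(namenode, path, data, block_size=4):
    namenode.create(path)                          # 1: Create file
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_id, pipeline = namenode.allocate_block(path)
        pipeline[0].blocks[block_id] = chunk       # 2: Write to the first DataNode
        for replica in pipeline[1:]:               # 3: Replicate to the others
            replica.blocks[block_id] = chunk


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
master = NameNode(nodes)
write_file(master, "/demo.txt", b"hello world")
print({n.name: len(n.blocks) for n in nodes})
```

With three DataNodes and three replicas, every node ends up holding a copy of every block, which is the storage cost that the later slides on shared storage set out to reduce.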

Page 6: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

How “Classic” Hadoop Works

Diagram: on physical hardware, one master node runs the JobTracker and NameNode, and three worker nodes each run a TaskTracker and DataNode. A MapReduce application (MR APP) submits work to the cluster.

Job flow: 1: Submit job 2: Check for tasks 3: Retrieve task resources
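The job flow on this slide can be sketched as a small polling loop. This is a toy model under stated assumptions, not Hadoop code: the classes, the `shared_store` dictionary standing in for job resources kept in HDFS, and the job name are all hypothetical.

```python
from collections import deque


class JobTracker:
    """Toy master that queues tasks and hands them out on request."""

    def __init__(self):
        self.pending = deque()

    def submit(self, job_name, num_tasks):
        # 1: Submit job (split into independent tasks)
        for index in range(num_tasks):
            self.pending.append((job_name, index))

    def next_task(self):
        # 2: Check for tasks (called by polling workers)
        return self.pending.popleft() if self.pending else None


class TaskTracker:
    """Toy worker that polls the master and runs whatever it is given."""

    def __init__(self, name, shared_store):
        self.name = name
        self.shared_store = shared_store  # stand-in for job files in HDFS
        self.completed = []

    def poll(self, tracker):
        task = tracker.next_task()
        if task is None:
            return False
        job_name, index = task
        resources = self.shared_store[job_name]  # 3: Retrieve task resources
        self.completed.append((job_name, index, resources))
        return True


def run_until_idle(tracker, workers):
    """Let workers poll in turn until no task is left."""
    progress = True
    while progress:
        progress = False
        for worker in workers:
            progress = worker.poll(tracker) or progress


store = {"wordcount": "wordcount.jar"}
jt = JobTracker()
tts = [TaskTracker(f"tt{i}", store) for i in range(1, 4)]
jt.submit("wordcount", 5)
run_until_idle(jt, tts)
print({tt.name: len(tt.completed) for tt in tts})
```

The point of the sketch is the direction of the calls: workers ask the master for work, and they fetch what they need to run it from shared storage rather than receiving it from the client.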

Page 7: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

How “Classic” Hadoop Works

Diagram: on physical hardware, one master node runs the JobTracker and NameNode, and three worker nodes each run a TaskTracker and DataNode.

Physical Hardware Is Dedicated To Node

Each Node Works With Local Storage

Physical Network Topology

Page 8: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Pivotal HD Architecture

Apache components: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, Zookeeper)

Pivotal HD Added Value (Pivotal HD Enterprise): Command Center (Configure, Deploy, Monitor, Manage); Hadoop Virtualization (HVE); DataLoader

Page 9: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

“Classic” Hadoop Challenges

Hard To Deploy And Operate

Poor Utilization Of Storage And/Or CPU

Inefficient Data Staging And Loading Processes

Lack Of Multi-Tenancy

Backup And Disaster Recovery Missing

Cluster Sprawl

Page 10: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

The Road To Hadoop-As-A-Service

Metering

Provisioning

Tenant/User Management

Self-Service Portal

Physical Dedicated Single Tenant

Virtual Shared, Elastic Compute Multi-App

Shared, Elastic Storage Multi-Tenant As-A-Service

Page 11: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Virtualized Hadoop With Local Storage

Diagram: the master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each) run as virtual machines, each with a VMDK, on a virtual infrastructure layer. The physical hardware underneath is four servers with direct-attached storage (Server + DAS).

Page 12: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Virtualized Hadoop With Local Storage

Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each), virtualized on four Server + DAS hosts.

Unified Operations

Shared Resources = Higher Utilization

Elastic Resources = Faster Provisioning

5-10x Better CPU Utilization!

Page 13: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Hadoop Runs Well Virtualized

Chart: elapsed time in seconds (lower is better) for TeraGen, TeraSort, and TeraValidate, comparing Native against 1 VM. The axis runs from 0 to 450 seconds.

Source: http://www.vmware.com/files/pdf/techpaper/VMW-Hadoop-Performance-vSphere5.pdf

Page 14: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Project Serengeti

Deploy Hadoop Cluster In 10 minutes

Customize Hadoop Cluster

One-Stop Command Center

Open Source Project Backed By VMware, Launched In June 2012

Page 15: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Virtualized Hadoop With Shared Storage

Diagram: starting point with local storage. One master (JobTracker, NameNode) and three workers (TaskTracker, DataNode each) run in the virtual infrastructure on four Server + DAS hosts.

Page 16: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Virtualized Hadoop With Shared Storage

Diagram: compute and storage are separated. The JobTracker master and the three TaskTracker workers stay in the virtual infrastructure on servers, while the NameNode and DataNode roles move to Isilon nodes.

Page 17: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Virtualized Hadoop With Isilon

Diagram: the JobTracker master and three TaskTracker workers run on servers, while each Isilon node provides both NameNode and DataNode services.

Multi-App Scale-Out Storage Platform

Independent Scaling

Native HDFS Support (Plus NFS, CIFS etc.)

Efficient Data Loading

No SPOF

End-To-End Data Protection

Leading Storage Efficiency

Replication Overhead Only 20% Rather Than 200%!
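The 200% figure follows from the HDFS default of three copies per block: two extra copies on top of the data itself. The 20% figure is the overhead the slide claims for Isilon's data protection. A minimal sketch of the arithmetic, where the function names and the 100 TB example are assumptions chosen for illustration:

```python
def replication_overhead_pct(copies):
    """Overhead of keeping `copies` full copies, as a percentage of the data size."""
    return (copies - 1) * 100


def raw_capacity_tb(usable_tb, overhead_pct):
    """Raw capacity needed to store `usable_tb` of data at a given overhead."""
    return usable_tb * (100 + overhead_pct) / 100


usable = 100  # TB of data to store (example value)

hdfs_overhead = replication_overhead_pct(3)  # three copies per block
print(f"3x replication: {hdfs_overhead}% overhead, "
      f"{raw_capacity_tb(usable, hdfs_overhead):.0f} TB raw")
print(f"20% protection overhead: {raw_capacity_tb(usable, 20):.0f} TB raw")
```

For 100 TB of data this works out to 300 TB of raw capacity with triple replication against 120 TB at 20% overhead.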

Page 18: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Hadoop With Software-Defined Storage

Diagram: a JobTracker master and two TaskTracker workers run alongside an Isilon VM in the virtual infrastructure. The Isilon VM provides the NameNode and DataNode roles. Physical hardware: two servers and two storage systems labeled “Any NAS”.

Page 19: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Making It As-A-Service

Diagram: the virtualized Hadoop stack (JobTracker, TaskTrackers, NameNodes, DataNodes) with a service layer added on top.

Functional blocks: Portal, Self-Service, Metering, User Management, Tenant Management, Workflows, HD LCM, Infrastructure Management

Products shown: WaveMaker, Serengeti, vCenter O & CB, Postgres, HD Command Center, vCenter

Page 20: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

HDaaS Solution Component Interaction

Components:
Portal UI (WaveMaker)
HDaaS Workflows (vCenter Orchestrator)
User/Tenant Management (Postgres)
Isilon (Isilon REST API)
Serengeti Server (Serengeti)
vCenter & ChargeBack (vC & CB APIs)
Serengeti Agent on the Pivotal HD Master
Serengeti Agent on each Pivotal HD Worker
Serengeti Client

Flow: 1: AAA 2: Invoke 3: Provision 4: Instantiate

Data Scientist: Manage (Serengeti), Analyze (Pivotal HD)

Service levels: Platinum, Gold, Silver, Bronze

API
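The numbered flow on this slide (AAA, Invoke, Provision, Instantiate) can be sketched as a plain sequence of steps. This is an illustrative in-memory sketch only: it calls no real WaveMaker, vCenter Orchestrator, Isilon, or Serengeti API, and the user table, request fields, and step descriptions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical user-to-tenant table standing in for the user/tenant database.
USERS = {"alice": "tenant1"}


@dataclass
class ClusterRequest:
    user: str
    tenant: str
    cluster_name: str
    workers: int


def authenticate(request):
    """Step 1 (AAA): check that the user belongs to the tenant."""
    return USERS.get(request.user) == request.tenant


def handle_request(request):
    """Walk through the slide's steps and return a log of what would happen."""
    if not authenticate(request):
        raise PermissionError(f"{request.user} is not a member of {request.tenant}")
    log = [f"1: AAA ok for {request.user}@{request.tenant}"]
    log.append(f"2: Invoke workflow for cluster {request.cluster_name}")
    log.append(f"3: Provision storage for {request.tenant}")
    log.append(f"3: Provision cluster {request.cluster_name}")
    log.append("4: Instantiate master")
    for index in range(1, request.workers + 1):
        log.append(f"4: Instantiate worker {index}")
    return log


for line in handle_request(ClusterRequest("alice", "tenant1", "demo", 2)):
    print(line)
```

Keeping the authentication check ahead of every provisioning step mirrors the ordering on the slide, where nothing is invoked until the AAA step has passed.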

Page 21: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Tenant Isolation On Isilon

One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist

Access Controlled By Group And User Rights

Leverage SmartQuotas To Set Resource Limits And Report Usage

Separate Subnets For Tenants, Load-Balanced With SmartConnect

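Directory layout: /ifs/HDFS contains one directory per tenant (/tenant1, /tenant2), and each tenant directory contains one subdirectory per data scientist (/ds1, /ds2).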
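The directory scheme on this slide can be expressed as a small path helper. This is a sketch of the naming convention only, assuming tenant and data scientist names map directly to directory names. The `may_access` check is a hypothetical stand-in for the group and user rights that OneFS would enforce, and it does not touch a real file system.

```python
from pathlib import PurePosixPath

HDFS_ROOT = PurePosixPath("/ifs/HDFS")  # root directory shown on the slide


def tenant_dir(tenant):
    """One directory within OneFS per tenant."""
    return HDFS_ROOT / tenant


def scientist_dir(tenant, scientist):
    """One subdirectory per data scientist inside the tenant directory."""
    return tenant_dir(tenant) / scientist


def is_within(path, base):
    """True if `path` is `base` or lies below it (rejects '..' components)."""
    path = PurePosixPath(path)
    if ".." in path.parts:
        return False
    return path == base or base in path.parents


def may_access(tenant, scientist, path):
    """Hypothetical check: a data scientist may only touch their own subtree."""
    return is_within(path, scientist_dir(tenant, scientist))


print(scientist_dir("tenant1", "ds1"))
print(may_access("tenant1", "ds1", "/ifs/HDFS/tenant1/ds1/input.csv"))
print(may_access("tenant1", "ds1", "/ifs/HDFS/tenant2/ds1/input.csv"))
```

Because every tenant maps to exactly one directory, a quota or usage report set on that directory covers the whole tenant, which is what makes the SmartQuotas bullet on this slide work.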

Page 22: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Demo

Page 23: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Summary

HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!

Storage: Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS

Compute: Pivotal HD Brings Features Like Virtualization Support To Hadoop

Compute: Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems

Page 24: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

What’s Next? HAWQ

Apache components: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (YARN, Zookeeper)

Pivotal HD Added Value (Pivotal HD Enterprise): Command Center (Configure, Deploy, Monitor, Manage); Hadoop Virtualization (HVE); DataLoader

HAWQ – Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics

Page 25: Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Resources

HDaaS Solution Collateral
– White Paper, Presentations, Demos
– http://powerlink.emc.com

EMC Solution Pavilion

Related Sessions
– Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
– Virtualize Big Data to Make the Elephant Dance
– Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight
