emc-academic-alliance
DESCRIPTION
Hadoop has made it into the enterprise mainstream as a Big Data technology. But what about Hadoop as a private or public cloud service on shared infrastructure? This session looks at a Hadoop solution with virtualization, shared storage, and multi-tenancy, and discusses how service providers can use Pivotal Hadoop Distribution, Isilon, and Serengeti to offer Hadoop-as-a-Service. After this session you will be able to: Objective 1: Understand Hadoop and its deployment challenges. Objective 2: Understand the EMC HDaaS solution architecture and the use cases it addresses. Objective 3: Understand Pivotal Hadoop Distribution, Serengeti and Isilon's Hadoop features.
1 © Copyright 2013 EMC Corporation. All rights reserved.
Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, And EMC Isilon
Bernd Kaponig, EMC Solutions Group
Roadmap Information Disclaimer
EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
Goal Of This Session
Demonstrate How Greenplum/Pivotal HD, Project Serengeti And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context
What Is Hadoop-As-A-Service?
Analytics-as-a-Service
Hadoop-as-a-Service
Infrastructure-as-a-Service
Data Scientist
Data Scientist
Service Provider
Tenant
Tenant
Provisioning
Metering
Tenant/User Management
Self-Service Portal
How “Classic” Hadoop Works
Physical Hardware
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
HDFS CLIENT
1: Create file 2: Write 3: Replicate
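The three steps on this slide (create file, write, replicate) can be illustrated with a toy in-memory model. This is a minimal sketch, not the Hadoop API: the classes `ToyNameNode` and `ToyDataNode` and the function `write_file` are hypothetical names invented for the example, and each file is treated as a single block for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: keeps block data in memory."""
    name: str
    blocks: dict = field(default_factory=dict)

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: keeps metadata only."""
    datanodes: list
    replication: int = 3  # HDFS default replication factor
    files: dict = field(default_factory=dict)

    def create_file(self, path: str) -> list:
        # Step 1: record the file and pick the DataNodes that will hold it.
        targets = self.datanodes[: self.replication]
        self.files[path] = [dn.name for dn in targets]
        return targets


def write_file(namenode: ToyNameNode, path: str, data: bytes) -> list:
    targets = namenode.create_file(path)      # 1: Create file
    previous = targets[0]
    previous.store(path, data)                # 2: Write to the first DataNode
    for dn in targets[1:]:                    # 3: Replicate node to node
        dn.store(path, previous.blocks[path])
        previous = dn
    return [dn.name for dn in targets]


workers = [ToyDataNode(f"worker{i}") for i in range(1, 4)]
nn = ToyNameNode(datanodes=workers)
print(write_file(nn, "/data/sample.txt", b"hello"))
```

The point of the sketch is the division of labor: the NameNode only tracks where data lives, while the data itself moves between the client and the DataNodes.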
How “Classic” Hadoop Works
Physical Hardware
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
MR APP
1: Submit job 2: Check for tasks 3: Retrieve task resources
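The job flow on this slide (submit job, check for tasks, retrieve task resources) can be sketched as a simple polling model. This is an illustration only, assuming that workers pull work from the master; `ToyJobTracker`, `ToyTaskTracker`, and the `store` dictionary standing in for job files kept in HDFS are hypothetical names, not Hadoop classes.

```python
from collections import deque


class ToyJobTracker:
    """Hypothetical stand-in for the JobTracker: queues tasks for workers."""

    def __init__(self):
        self.pending = deque()

    def submit_job(self, job_name, num_tasks):
        # 1: Submit job - split the job into tasks and queue them.
        for index in range(num_tasks):
            self.pending.append((job_name, index))

    def next_task(self):
        # 2: Check for tasks - called by a worker; None means no work left.
        return self.pending.popleft() if self.pending else None


class ToyTaskTracker:
    """Hypothetical stand-in for a TaskTracker on a worker node."""

    def __init__(self, name, shared_store):
        self.name = name
        self.shared_store = shared_store  # stands in for job files in HDFS
        self.done = []

    def poll(self, tracker):
        task = tracker.next_task()
        if task is None:
            return False
        job_name, index = task
        resources = self.shared_store[job_name]  # 3: Retrieve task resources
        self.done.append((job_name, index, resources))
        return True


store = {"wordcount": "wordcount.jar"}
jt = ToyJobTracker()
jt.submit_job("wordcount", num_tasks=4)
trackers = [ToyTaskTracker(f"worker{i}", store) for i in range(1, 4)]
# Every worker polls once per round until no worker receives a task.
while any([t.poll(jt) for t in trackers]):
    pass
print({t.name: len(t.done) for t in trackers})
```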
How “Classic” Hadoop Works
Physical Hardware
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
Physical Hardware Is Dedicated To Node
Each Node Works With Local Storage
Physical Network Topology
Pivotal HD Architecture
Pivotal HD Enterprise
Legend: Apache, Pivotal HD Added Value
HDFS
HBase
Pig, Hive, Mahout
Map Reduce
Sqoop, Flume
Resource Management & Workflow: Yarn, Zookeeper
Command Center: Configure, Deploy, Monitor, Manage
Hadoop Virtualization (HVE)
DataLoader
“Classic” Hadoop Challenges
Hard To Deploy And Operate
Poor Utilization Of Storage And/Or CPU
Inefficient Data Staging And Loading Processes
Lack Of Multi-Tenancy
Backup And Disaster Recovery Missing
Cluster Sprawl
The Road To Hadoop-As-A-Service
Metering
Provisioning
Tenant/User Management
Self-Service Portal
Physical Dedicated Single Tenant
Virtual Shared, Elastic Compute Multi-App
Shared, Elastic Storage Multi-Tenant As-A-Service
Virtualized Hadoop With Local Storage
Physical Hardware: Server + DAS (x4)
Virtual Infrastructure: VM + VMDK (x4)
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
Virtualized Hadoop With Local Storage
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
Server + DAS (x4)
Unified Operations
Shared Resources = Higher Utilization
Elastic Resources = Faster Provisioning
5-10x Better CPU Utilization!
Hadoop Runs Well Virtualized
Chart: elapsed time in seconds (lower is better) for TeraGen, TeraSort, and TeraValidate, Native vs. 1 VM
Source: http://www.vmware.com/files/pdf/techpaper/VMW-Hadoop-Performance-vSphere5.pdf
Project Serengeti
Deploy Hadoop Cluster In 10 minutes
Customize Hadoop Cluster
One-Stop Command Center
Open Source Project Backed By VMware, Launched In June 2012
Virtualized Hadoop With Shared Storage
Physical Hardware: Server + DAS (x4)
Virtual Infrastructure
Master: JOB TRKR, NAMENODE
Worker (x3): TASK TRKR, DATA NODE
Virtualized Hadoop With Shared Storage
Physical Hardware: Server (x2), Isilon (x2)
Virtual Infrastructure
Master: JOB TRKR
Worker (x3): TASK TRKR
Isilon: NAMENODE, DATA NODE
Virtualized Hadoop With Isilon
Master: JOB TRKR
Worker (x3): TASK TRKR
Server (x2), Isilon (x2)
Isilon: NAMENODE, DATA NODE
Multi-App Scale-Out Storage Platform
Independent Scaling
Native HDFS Support (Plus NFS, CIFS etc.)
Efficient Data Loading
No SPOF
End-To-End Data Protection
Leading Storage Efficiency
Replication Overhead Only 20% Rather Than 200%!
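The overhead comparison can be made concrete with a little arithmetic. The 200% figure matches HDFS's default three-way replication, where every block is stored three times (two extra copies); the 20% figure is taken from the slide as stated. The function name `raw_capacity_tb` and the 100 TB data set are hypothetical values chosen for illustration.

```python
def raw_capacity_tb(usable_tb: float, overhead_pct: float) -> float:
    """Raw capacity needed to hold usable_tb of data at a given protection overhead."""
    return usable_tb * (1 + overhead_pct / 100)


usable = 100.0  # TB of data to store (hypothetical figure)

three_way = raw_capacity_tb(usable, 200)  # three-way replication
isilon = raw_capacity_tb(usable, 20)      # overhead quoted on the slide

print(f"3x replication: {three_way:.0f} TB raw")
print(f"20% overhead:   {isilon:.0f} TB raw")
print(f"Raw capacity saved: {three_way - isilon:.0f} TB")
```

Under these assumptions the same data set needs 2.5 times less raw capacity at 20% overhead than at 200%.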
Hadoop With Software-Defined Storage
Physical Hardware: Server (x2), Any NAS (x2)
Virtual Infrastructure
Master: JOB TRKR
Worker (x2): TASK TRKR
Isilon VM: NAMENODE, DATA NODE
Making It As-A-Service
JOB TRKR
TASK TRKR
TASK TRKR
NAMENODE
NAMENODE
DATA NODE
DATA NODE
Portal
SELF SERV
WaveMaker Serengeti
METERING
vCenter O & CB Postgres
USER MGMT
TEN’T MGMT
WORK FLOWS
HD LCM
HD Cmd Center
vCenter
Infrastr. Mgmt.
HDaaS Solution Component Interaction
PORTAL UI (WaveMaker)
USER/TENANT MGMT (Postgres)
HDAAS WORKFLOWS (vCenter Orchestrator)
ISILON REST API (Isilon)
SERENGETI SERVER (Serengeti)
vC & CB APIs (vCenter & ChargeBack)
SERENGETI AGENT, PIVOTAL HD MASTER
SERENGETI AGENT, PIVOTAL HD WORKER
SERENGETI CLIENT
1: AAA 2: Invoke 3: Provision 4: Instantiate
Data Scientist: Manage (Serengeti), Analyze (Pivotal HD)
PLATINUM, GOLD, SILVER, BRONZE
API
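The numbered interactions on this slide (1: AAA, 2: Invoke, 3: Provision, 4: Instantiate) could be sequenced roughly as below. This is a sketch under stated assumptions, not the solution's actual vCenter Orchestrator workflow: the function names, the `TIERS` worker counts, and the assignment of the Provision step to three targets are all hypothetical and chosen only to show the ordering.

```python
# Hypothetical worker counts per service level; the slide names the levels only.
TIERS = {"BRONZE": 2, "SILVER": 4, "GOLD": 8, "PLATINUM": 16}


def authenticate(user, tenant, directory):
    # 1: AAA - look the user up in the user/tenant store.
    return user in directory.get(tenant, set())


def provision_cluster(user, tenant, tier, directory):
    """Return the ordered list of actions for one cluster request."""
    if not authenticate(user, tenant, directory):
        raise PermissionError(f"{user} is not a member of {tenant}")
    log = ["1: AAA ok"]
    log.append("2: Invoke workflow")
    # 3: Provision - assumed here to fan out to storage, cluster, and metering.
    for target in ("Isilon", "Serengeti", "vCenter & ChargeBack"):
        log.append(f"3: Provision {target}")
    log.append(f"4: Instantiate 1 master + {TIERS[tier]} workers")
    return log


directory = {"tenant1": {"ds1", "ds2"}}
for line in provision_cluster("ds1", "tenant1", "SILVER", directory):
    print(line)
```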
Tenant Isolation On Isilon
One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist
Access Controlled By Group And User Rights
Leverage SmartQuotas To Set Resource Limits And Report Usage
Separate Subnets For Tenants, Load-Balanced With SmartConnect
/ifs/HDFS
/tenant1 /tenant2
/ds1 /ds2
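The directory convention described on this slide (one directory per tenant under `/ifs/HDFS`, one subdirectory per data scientist) can be expressed as a small path helper. This is only a sketch of the naming scheme: the function names are hypothetical, and actual isolation would be enforced by OneFS permissions and SmartQuotas, not by code like this.

```python
from pathlib import PurePosixPath

HDFS_ROOT = PurePosixPath("/ifs/HDFS")  # root directory shown on the slide


def tenant_dir(tenant: str) -> PurePosixPath:
    """Directory that holds all data for one tenant."""
    return HDFS_ROOT / tenant


def scientist_dir(tenant: str, scientist: str) -> PurePosixPath:
    """Subdirectory for one data scientist within a tenant."""
    return tenant_dir(tenant) / scientist


def owning_tenant(path: str):
    """Return the tenant a path belongs to, or None if it lies outside the root."""
    try:
        relative = PurePosixPath(path).relative_to(HDFS_ROOT)
    except ValueError:
        return None
    return relative.parts[0] if relative.parts else None


print(scientist_dir("tenant1", "ds1"))
print(owning_tenant("/ifs/HDFS/tenant2/ds2/part-00000"))
```

A helper such as `owning_tenant` shows why the layout is convenient for metering: any file path maps back to exactly one tenant by its first component under the root.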
Demo
Summary
HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!
Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS
Pivotal HD Brings Features Like Virtualization Support to Hadoop
Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
Storage
Compute
What’s Next? HAWQ
Pivotal HD Enterprise
Legend: Apache, Pivotal HD Added Value
HDFS
HBase
Pig, Hive, Mahout
Map Reduce
Sqoop, Flume
Resource Management & Workflow: Yarn, Zookeeper
Command Center: Configure, Deploy, Monitor, Manage
Hadoop Virtualization (HVE)
DataLoader
HAWQ – Advanced Database Services
Xtension Framework
Catalog Services
Query Optimizer
Dynamic Pipelining
ANSI SQL + Analytics
Resources
HDaaS Solution Collateral – White Paper, Presentations, Demos – http://powerlink.emc.com
EMC Solution Pavilion
Related Sessions
– Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
– Virtualize Big Data to Make the Elephant Dance
– Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight