
Page 1: Hadoop Everywhere & Cloudbreak

Hadoop Everywhere
Hortonworks. We do Hadoop.

Page 2: Hadoop Everywhere & Cloudbreak

$ whoami
Sean Roberts
Partner Solutions Engineer
London, EMEA & everywhere

@seano
linkedin.com/in/seanorama

MacGyver. Data Freak. Cook. Autodidact. Volunteer. Ancestral Health. Fito. Couchsurfer. Nomad

Page 3: Hadoop Everywhere & Cloudbreak

What’s New!
- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/

Page 4: Hadoop Everywhere & Cloudbreak

Agenda
● Hadoop Everywhere
● Deployment challenges & requirements
● Cloudbreak & our Docker approach
● Workshop: Your own Cloudbreak
  ○ And auto-scaling with Periscope
● Cloud best practices

Reminder:
● Attendee phone lines are muted
● Please ask questions in the chat

Page 5: Hadoop Everywhere & Cloudbreak

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Page 6: Hadoop Everywhere & Cloudbreak

Hadoop Everywhere

Page 7: Hadoop Everywhere & Cloudbreak

Hadoop Everywhere
● Any application: batch, interactive, and real-time
● Any data: existing and new datasets
● Anywhere: complete range of deployment options (commodity, appliance, cloud)

Diagram: existing applications, new analytics, and partner applications share a common data access layer (batch, interactive, real-time) on top of YARN, the data operating system.

Page 8: Hadoop Everywhere & Cloudbreak

Hadoop Up There, Down Here... Everywhere!
● Hybrid deployment choice: Windows, Linux, on-premises or cloud; data “gravity” guides the choice
● Compatible clusters: run applications and data processing workloads wherever and whenever needed
● Replicated datasets: democratize Hadoop data access via automated sharing of datasets using Apache Falcon

Diagram: on-premises clusters working alongside cloud clusters for dev/test, BI/ML, and IoT apps.

Page 9: Hadoop Everywhere & Cloudbreak

Anywhere? Up There or Down Here?

Use case | Where?
Active Archive / Compliance Reporting | Sensitive data = “down here”; “up there” valid for many scenarios
ETL / Data Warehouse Optimization | Usually has “down here” gravity; DW in the cloud is changing that
Smart Meter Analysis | Data typically flows “up there”
Single View of Customer | May have “down here” gravity, unless you’re using SaaS apps
Supply Chain Optimization | May have heavy “down here” gravity
New Data for Product Management | “Up there” could be considered for many scenarios
Vehicle Data for Transportation/Logistics | Why not “up there”?
Vehicle Data for Insurance | May have “down here” gravity (e.g. join with existing risk data)

Page 10: Hadoop Everywhere & Cloudbreak

Deployment Challenges & Requirements

Page 11: Hadoop Everywhere & Cloudbreak

Deployment challenges
● Infrastructure is different everywhere
  ○ e.g. each cloud provider has its own API
  ○ e.g. each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?

See prior operations workshops.

Page 12: Hadoop Everywhere & Cloudbreak

Deployment requirements
- Infrastructure
- Operating system
- Environment prepared (see docs)
- Ambari agent/server installed & registered
- Deploy HDP cluster
  - Ambari Blueprints or Cluster Wizard
- Ongoing configuration/management

Page 13: Hadoop Everywhere & Cloudbreak

Options for automation
- Many combinations of tools
  - e.g. Foreman, Ansible, Chef, Puppet, docker-ambari, shell scripts, CloudFormation, …
- Provider specific
  - Cisco UCS, Teradata, HP, Google’s bdutil, …
- Docker with Cloudbreak

Using Ambari with all of the above!

Page 14: Hadoop Everywhere & Cloudbreak

Demo: Basic script-based example
https://github.com/seanorama/ambari-bootstrap/

Page 15: Hadoop Everywhere & Cloudbreak

ambari-bootstrap
https://github.com/seanorama/ambari-bootstrap

Requirements:
● Infrastructure prepped (see HDP docs)
● Nodes running RedHat EL or CentOS 6
● HDFS paths mounted (see HDP docs)
● sudo or root access
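A minimal sketch of running the script across a cluster (the install_ambari_server / ambari_server variable names are assumptions taken from the repo README of the time; verify there before use):

  # on the node that will run the Ambari server (variable name assumed from the README)
  export install_ambari_server=true
  curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh

  # on every other node, point the agent at the server host (variable name assumed from the README)
  export ambari_server=my-ambari-server.example.com
  curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh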

Page 16: Hadoop Everywhere & Cloudbreak

After Ambari deployment
● (optional) Configure local YUM/APT repos
● Deploy HDP with the Ambari Wizard or a Blueprint
● Ongoing configuration/management

Page 17: Hadoop Everywhere & Cloudbreak

Using Ansible
https://github.com/rackerlabs/ansible-hadoop

Page 18: Hadoop Everywhere & Cloudbreak

Build once. Deploy anywhere.

Docker

Page 19: Hadoop Everywhere & Cloudbreak


Page 20: Hadoop Everywhere & Cloudbreak

Docker is a “Shipping Container” System for Code

An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container.

Diagram: a multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) crossed with a multiplicity of hardware environments (development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center).

Page 21: Hadoop Everywhere & Cloudbreak

Docker
• Container-based virtualization
• Lightweight and portable
• Build once, run anywhere
• Ease of packaging applications
• Automated and scripted
• Isolated
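As a minimal illustration of “build once, run anywhere” (the image and commands below are just an example, not from the original deck):

  # run an isolated, throwaway container from a stock image
  docker run --rm -it centos:6 bash
  # inside the container: its own filesystem, process space and network namespace
  cat /etc/redhat-release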

Page 22: Hadoop Everywhere & Cloudbreak

Why Is Docker So Exciting?

For developers: build once… run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages, etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services

For DevOps: configure once… run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly lighter weight than VMs

Page 23: Hadoop Everywhere & Cloudbreak

More technical explanation

Why:
• Run on any Linux
  • Regardless of kernel version (2.6.32+)
  • Regardless of host distro
  • Physical or virtual, cloud or not
  • Container and host architecture must match
• Run anything
  • If it can run on the host, it can run in the container
  • i.e. if it can run on a Linux kernel, it can run

What:
• High level: it’s a lightweight VM
  • Own process space
  • Own network interface
  • Can run stuff as root
• Low level: it’s chroot on steroids
  • Container = isolated processes
  • Shares the kernel with the host
  • No device emulation (neither HVM nor PV)

Page 24: Hadoop Everywhere & Cloudbreak

Docker – how it works

Diagram: with a type-2 hypervisor, each app (App A, App A’, App B) runs on its own guest OS plus bins/libs on top of the host OS and server. With Docker, containers share the host OS kernel and carry only their own bins/libs; containers are isolated from each other.

…result is significantly faster deployment, much less overhead, easier migration, faster restart.

Page 25: Hadoop Everywhere & Cloudbreak

Cloudbreak
A tool for provisioning and managing Hadoop clusters in the cloud

Page 26: Hadoop Everywhere & Cloudbreak

Cloudbreak
• Developed by SequenceIQ
• Open source under the Apache 2.0 license [Apache project soon]
• Cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API
• Elastic: can spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management post-deployment

Page 27: Hadoop Everywhere & Cloudbreak

Key features of Cloudbreak

Elastic
• Provision a cluster with an arbitrary number of nodes
• Commission/decommission nodes from the cluster
• Policy- and time-based scaling of the cluster

Flexible
• Declarative and flexible Hadoop cluster creation using blueprints
• Provision to multiple public cloud providers or OpenStack-based private clouds using the same common API
• Access all of this functionality through a rich UI, secured REST API or automatable shell

Enterprise-ready
• Supports basic, token-based and OAuth2 authentication
• The cluster is provisioned in a logically isolated network
• Tracks usage and cluster metrics

Page 28: Hadoop Everywhere & Cloudbreak

Launch HDP on Any Cloud for Any Application

Cloudbreak:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!

Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)

Page 29: Hadoop Everywhere & Cloudbreak

Cloudbreak approach
• Use Ambari for the heavy lifting
  • Provisioning of Hadoop services
  • Monitoring
• Use Ambari Blueprints
  • Assign host groups to physical instance types
• Public/private cloud provider APIs abstracted
  • Azure / Google / Amazon / OpenStack
• Run the Ambari agent/server in Docker containers
  • Networking: docker run --net=host
  • Service discovery: Consul (previously Serf)
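As a small illustration of host networking (image and command chosen purely for illustration), a container started with --net=host shares the host’s network stack instead of getting its own:

  # prints the host's interfaces rather than a container-private one
  docker run --rm --net=host alpine ip addr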

Page 30: Hadoop Everywhere & Cloudbreak

Workshop: Your own Cloudbreak

Page 31: Hadoop Everywhere & Cloudbreak

Workshop: Your Own Cloudbreak

cloudbreak-deployer
● https://github.com/sequenceiq/cloudbreak-deployer

Requirements:
● A Docker host (laptop, server or cloud infrastructure)
● Resources: very little; tested with 2 GB of RAM

Page 32: Hadoop Everywhere & Cloudbreak

Requirement: a Docker host
● OS X or Windows: http://boot2docker.io/
  ○ boot2docker init
  ○ boot2docker up
  ○ eval "$(boot2docker shellinit)"
  ○ boot2docker ssh
● Linux: install the Docker daemon
● Anywhere: docker-machine “lets you create Docker hosts on your computer, on cloud providers, and inside your own data center”
  ○ Example on Rackspace:
    ■ docker-machine create --driver rackspace \
        --rackspace-api-key $OS_PASSWORD \
        --rackspace-username $OS_USERNAME \
        --rackspace-region DFW docker-rax
    ■ docker-machine ssh docker-rax

Page 33: Hadoop Everywhere & Cloudbreak

Install cloudbreak-deployer
https://github.com/sequenceiq/cloudbreak-deployer

● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version
● cbd init
● cbd start

You’ll then have your own Cloudbreak & Periscope server with an API and web UI.

Page 34: Hadoop Everywhere & Cloudbreak

Done: Your own Cloudbreak

Page 35: Hadoop Everywhere & Cloudbreak

Deploy a cluster with your Cloudbreak

Page 37: Hadoop Everywhere & Cloudbreak

2. Create Cluster

Page 38: Hadoop Everywhere & Cloudbreak

3. Use your cluster
Ambari is available as expected.

To reach your Hadoop hosts:
● SSH to the Docker host
  ○ Hosts are listed in the “Cloud stack description”
  ○ ssh cloudbreak@IPofHost
● Shell into the “ambari-agent” container
  ○ sudo docker ps | grep ambari-agent
    ■ note the CONTAINER ID
  ○ sudo docker exec -it CONTAINERID bash
● Use the hosts as usual, e.g.:
  ○ hadoop fs -ls /

Page 39: Hadoop Everywhere & Cloudbreak

Cloudbreak internals

Page 40: Hadoop Everywhere & Cloudbreak

Cloudbreak internals

Components (each running as a Docker container):
● Uluwatu: Cloudbreak UI, accessed from the browser
● Sultans: user management UI
● Cloudbreak shell
● OAuth2 identity server (UAA), backed by uaa-db (PostgreSQL)
● Cloudbreak REST API, backed by cb-db (PostgreSQL)
● Periscope (autoscaling), backed by ps-db (PostgreSQL)
● consul, registrator, ambassador

Page 41: Hadoop Everywhere & Cloudbreak

Docker

Page 42: Hadoop Everywhere & Cloudbreak

Swarm
• Native clustering for Docker
• Distributed container orchestration
• Same API as Docker

Page 43: Hadoop Everywhere & Cloudbreak

Swarm – how it works
• Swarm managers/agents
• Discovery services
• Advanced scheduling

Page 44: Hadoop Everywhere & Cloudbreak

Consul
• Service discovery/registry
• Health checking
• Key/value store
• DNS
• Multi-datacenter aware
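For example, a registered service can be looked up through Consul’s DNS interface (the service name “ambari” below is illustrative, not from the deck):

  # query the local Consul agent's DNS interface (default port 8600)
  dig @127.0.0.1 -p 8600 ambari.service.consul SRV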

Page 45: Hadoop Everywhere & Cloudbreak

Consul – how it works
• Consul servers/agents
• Consistency through a quorum (Raft)
• Scalability due to a gossip-based protocol (SWIM)
• Decentralized and fault tolerant
• Highly available
• Consistency over availability (CP)
• Multiple interfaces: HTTP and DNS
• Support for watches

Page 46: Hadoop Everywhere & Cloudbreak

Apache Ambari
• Easy Hadoop cluster provisioning
• Management and monitoring
• Key feature: Blueprints
• REST API, CLI shell
• Extensible: Stacks, Services, Views

Page 47: Hadoop Everywhere & Cloudbreak

Apache Ambari – how it works
• Ambari server/agents
• Define a blueprint (blueprint.json)
• Define a host mapping (hostmapping.json)
• POST the cluster create
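A minimal sketch of that flow against the Ambari REST API (host name, blueprint/cluster names, file names and credentials are placeholders):

  # register the blueprint
  curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
    -d @blueprint.json http://ambari-server:8080/api/v1/blueprints/my-blueprint

  # create the cluster from the blueprint plus the host mapping
  curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
    -d @hostmapping.json http://ambari-server:8080/api/v1/clusters/my-cluster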

Page 48: Hadoop Everywhere & Cloudbreak

Run Hadoop as Docker containers: HDP as Docker containers via Cloudbreak

• Fully automated Ambari cluster installation
• Avoid the GUI, use the REST API only (ambari-shell)
• Fully automated HDP installation with blueprints
• Quick installation (pre-pulled RPMs)
• Same process/images for dev/qa/prod
• Same process for single-node and multi-node

Diagram: Cloudbreak provisions VMs from the cloud provider (or bare metal), installs Ambari in Docker containers on the VMs, then instructs Ambari to build the HDP cluster.

Page 49: Hadoop Everywhere & Cloudbreak

Provisioning – how it works
1. Start VMs, each with a running Docker daemon
2. Cloudbreak bootstrap:
   • Start the Consul cluster
   • Start the Swarm cluster (Consul for discovery)
3. Start Ambari server/agents via the Swarm API
4. Ambari services are registered in Consul (Registrator)
5. Post the Blueprint

Page 50: Hadoop Everywhere & Cloudbreak

Run Hadoop as Docker containers (diagram sequence spanning slides 50 to 53):
1. Cloudbreak starts Docker hosts on the provider.
2. Ambari server (amb-ser) and agent (amb-agn) containers are started on those hosts.
3. A Blueprint is posted to the Ambari server.
4. Ambari installs the HDP services (NameNode, HDFS, YARN, Hive, HBase, ZooKeeper, …) inside the agent containers.

Page 54: Hadoop Everywhere & Cloudbreak

Workshop: Auto-scale your cluster with Periscope

Page 55: Hadoop Everywhere & Cloudbreak

Optimize Cloud Usage via Elastic HDP Clusters
• Auto-scaling policies based on any Ambari metric
• Dynamically scale to achieve physical elasticity
• Coordinates with YARN to achieve elasticity based on the policies

Diagram: a Dev/Test cluster growing and shrinking under an auto-scaling policy.

Page 56: Hadoop Everywhere & Cloudbreak

Scaling for Static and Dynamic Clusters

Diagram: Ambari Metrics and Ambari Alerts feed Cloudbreak/Periscope, which enforces the auto-scale policies and scales the cluster and YARN apps; Cloudbreak handles provisioning for both static and dynamic clusters.

Page 57: Hadoop Everywhere & Cloudbreak

Scale by Ambari monitoring metric
1. Ambari: review the metric
2. Cloudbreak: set an alert
3. Cloudbreak: set a scaling policy

Page 58: Hadoop Everywhere & Cloudbreak

Scale up/down by time
1. Set a time-based alert
2. Set a scaling policy

Repeat with an alert and policy which scale down.

Page 59: Hadoop Everywhere & Cloudbreak

Roadmap

Page 60: Hadoop Everywhere & Cloudbreak

Release summary

Cloudbreak
● Its own project (separate from Ambari)
● Supported on Linux flavors which support Docker

Periscope
● Feature of Cloudbreak 1.0
● Will be embedded in Ambari later in 2015

Page 61: Hadoop Everywhere & Cloudbreak

Release timeline (milestones, approximate order):
● Cloudbreak 1.0 GA: June/July 2015
● Ambari 2.1.0 / HDP 2.3 “Dal”
● Cloudbreak Incubator proposal: July/August 2015 (est)
● Cloudbreak 1.1: August 2015 (est)
● Ambari 2.1.1 / HDP “Dal-M10”
● Cloudbreak 2.0 GA: 2H 2015
● Ambari 2.2 / HDP 2.4 “Erie”

Page 62: Hadoop Everywhere & Cloudbreak

Supported cloud environments

Provider | Cloudbreak + HDP 2.3
Microsoft Azure | GA
AWS | GA
Google Compute | GA

Provider | Cloudbreak + HDP 2.3 | Cloudbreak + HDP 2.4
OpenStack Community | Tech Preview | Tech Preview
Red Hat OSP | TBD
HP Helion | GA (tentative)
Mirantis OpenStack |

Page 63: Hadoop Everywhere & Cloudbreak

HDP as a Service

Page 64: Hadoop Everywhere & Cloudbreak

Hortonworks Data Platform On Azure

Page 65: Hadoop Everywhere & Cloudbreak

Rackspace

Cloud Big Data Platform
● Rapidly spin up on-demand HDP clusters
● Integrated with Cloud Files (OpenStack Swift)
● Opt in to Managed Services by Rackspace

Managed Big Data Platform
● Fully managed HDP on dedicated and/or cloud infrastructure
● Leverage Fanatical Support and industry-leading SLAs
● Supported by Rackspace with escalation to Hortonworks

Page 66: Hadoop Everywhere & Cloudbreak

CSC

Page 67: Hadoop Everywhere & Cloudbreak

HDP on IaaS - Best Practices

Page 68: Hadoop Everywhere & Cloudbreak

Microsoft Azure
● Deployment
  ○ Deploy using Cloudbreak
  ○ Deploy using the HWX Azure Gallery image
● Integrated with Azure Blob Storage
● Supported directly by Hortonworks
● Other offerings
  ○ Microsoft HDInsight
  ○ HDP Sandbox

Page 69: Hadoop Everywhere & Cloudbreak

Azure deployment guidelines
● Keep everything in the same region
● Instance types
  ○ Typical: A7
  ○ Performance: D14
  ○ 8x 1 TB Standard LRS (x3) virtual hard disks per server
● Multiple storage accounts are recommended
  ○ Recommend no more than 40 virtual hard disks per storage account

Page 70: Hadoop Everywhere & Cloudbreak

Azure Blob Store
Azure Blob Store (object storage):
● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
● Can be used as a replacement for HDFS
● Thoroughly tested in HDP release test suites
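For instance (the container and account names below are placeholders, and the WASB connector must already be configured with the storage account key), a WASB path can be used wherever an HDFS path is accepted:

  hadoop fs -ls wasb://mycontainer@myaccount.blob.core.windows.net/data/
  hadoop fs -put localfile.csv wasb://mycontainer@myaccount.blob.core.windows.net/data/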

Page 71: Hadoop Everywhere & Cloudbreak

Amazon Web Services
● Deploy using Cloudbreak
● Integrated with AWS S3 (object storage)
● Supported directly by Hortonworks

Page 72: Hadoop Everywhere & Cloudbreak

Amazon deployment guidelines
● Keep everything in the same region/AZ
● Use instances with Enhanced Networking

Master nodes:
● Choose EBS-optimized
● Boot: 100 GB on EBS
● Data: 4+ 1 TB EBS volumes

Worker nodes:
● Boot: 100 GB on EBS
● Data: instance storage
  ○ EBS can be used, but local is preferred

Instance types:
● Typical: d2 family
● Performance: i2 family
https://aws.amazon.com/ec2/instance-types/

Page 73: Hadoop Everywhere & Cloudbreak

AWS RDS
● Some services rely on MySQL, Oracle or PostgreSQL:
  ○ Apache Ambari
  ○ Apache Hive
  ○ Apache Oozie
  ○ Apache Ranger
● Use RDS for these instead of managing the databases yourself.

Page 74: Hadoop Everywhere & Cloudbreak

AWS S3 (object storage)
● s3n:// with HDP 2.2 (Hadoop 2.6)
● s3a:// with HDP 2.3 (Hadoop 2.7)
● Not currently a direct replacement for HDFS
● Recommended to configure access with an IAM role/policy
  ○ https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3
  ○ Example: http://git.io/vLoGY


Page 76: Hadoop Everywhere & Cloudbreak

Google Cloud
● Deploy using
  ○ Cloudbreak
  ○ Google bdutil with the Apache Ambari plug-in
● Integrated with Google Cloud Storage
● Supported directly by Hortonworks

Page 77: Hadoop Everywhere & Cloudbreak

Google deployment guidelines
● Instance types
  ○ Typical: n1-standard-4 with a single 1.5 TB persistent disk
  ○ Performance: n1-standard-8 with 1 TB SSD
● Google GCS (object storage)
  ○ gs://<CONFIGBUCKET>/dir/file
  ○ Not currently a replacement for HDFS

Page 78: Hadoop Everywhere & Cloudbreak

S3 & GCS as secondary storage systems
The connectors are currently eventually consistent, so they do not replace HDFS.

Backup
● Falcon, DistCp, hadoop fs, HBase ExportSnapshot
● A Kafka+Storm bolt sends messages to S3/GCS, providing a backup & point-in-time recovery source

Input/Output
● Convenient & broadly used upload/download method
  ○ As middleware to ease integration with Hadoop & limit access
● Publishing static content (optionally with CloudFront)
  ○ Removes the need to manage any web services
● Storage for temporary/ephemeral clusters
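As an example of the backup path (bucket name and paths are placeholders, credentials or an IAM role are assumed to be configured, and s3a requires HDP 2.3 / Hadoop 2.7 as noted above):

  # copy an HDFS directory to S3 as a backup
  hadoop distcp hdfs:///apps/hive/warehouse s3a://my-backup-bucket/warehouse/
  # inspect the result
  hadoop fs -ls s3a://my-backup-bucket/warehouse/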

Page 79: Hadoop Everywhere & Cloudbreak

Questions

Page 80: Hadoop Everywhere & Cloudbreak

$ shutdown -h now

- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/