© Copyright 2014 EMC Corporation. All rights reserved.

EMC Hadoop Starter Kit - ViPR Edition

Uploaded by walshe1


DESCRIPTION

Are you deploying Hadoop and want enterprise infrastructure manageability, reliability, and availability? The new EMC Hadoop Starter Kit shows you how to do this without building HDFS data silos.



EMC Hadoop Starter Kit - ViPR Edition

EMC Open Innovation Lab


The Digital Universe

Less than 1% of the world’s data is analyzed.

By 2020, the Internet will connect 7.6B people and 200B things (sensors, machines, cars, appliances…).

Data Volumes

• 2000: 2 exabytes a year
• 2011: 2 exabytes a day
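Taken at face value, the two data-volume figures imply that the world went from producing 2 exabytes in a year to producing that much in a day. A back-of-the-envelope compound annual growth rate, assuming steady exponential growth between 2000 and 2011 and 365 days per year:

```latex
\frac{2\ \text{EB/day}}{2\ \text{EB/year}} = 365,
\qquad
r = 365^{1/11} - 1 \approx 1.71 - 1 = 0.71
```

That is, roughly 71% more data each year, compounded over the eleven years.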


Location & Types Of Big Data

Figure: Location & Types Of Big (& Fast!) Data. Structured and unstructured data from enterprise, partner, and public sources, including forecast, location, credit, shipping, social and video, and telemetry data.


Hadoop Challenges

• Depends on HDFS as the data repository
  – Legacy data must be made accessible through HDFS

• Hadoop HDFS inefficiencies:
  – 3 copies for protection
  – No advanced data efficiency features such as de-duplication or thin provisioning
  – Security

• Integration with robust traditional data center products: compute virtualization, enterprise storage
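The "3 copies for protection" point is easiest to see as raw capacity. HDFS stores each block `dfs.replication` times (default 3), so the disk you buy is a multiple of the data you keep. A minimal sketch, assuming plain N-way replication with no compression or de-duplication; the function names and the 100 TB figure are hypothetical:

```python
# Raw-capacity cost of HDFS-style N-way block replication.
# Assumption: every block is stored `replication` times (HDFS dfs.replication,
# default 3) and no compression or de-duplication reduces the footprint.


def raw_capacity_tb(usable_tb: float, replication: int = 3) -> float:
    """Raw disk capacity needed to hold `usable_tb` of logical data."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return usable_tb * replication


def overhead_pct(replication: int = 3) -> float:
    """Extra raw capacity beyond the logical data size, as a percentage."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return (replication - 1) * 100.0


if __name__ == "__main__":
    usable = 100.0  # hypothetical: 100 TB of logical data
    print(f"{usable} TB logical -> {raw_capacity_tb(usable)} TB raw "
          f"({overhead_pct():.0f}% overhead)")
```

With the default factor of 3, 100 TB of data needs 300 TB of disk, a 200% overhead.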


Hadoop Storage Options

Hadoop HDFS

• Leverage the Hadoop distribution's HDFS data services

• Compute and data converged on a cluster of servers

Storage Array

• NameNode and DataNode services from the storage array (e.g., EMC Isilon)

Storage OS

• NameNode and DataNode services from the storage OS (e.g., EMC ViPR)


ViPR HDFS

• HDFS is becoming the de facto file system for distributed applications

• ViPR is a great platform for HDFS:
  – Addresses limitations of off-the-shelf HDFS
  – Brings HDFS to existing storage hardware
  – Enables HDFS/object/file scenarios
  – Flexible software model allows colocation


Support Mixed Workloads

Object, file, and HDFS operations on the same data

Figure: virtual array (Isilon, VNX5500, third-party storage) with Object, HDFS, and Object & HDFS buckets.

ViPR Data Services offer three bucket options:

– Object
– HDFS
– Object and HDFS

Object and HDFS buckets give users access to the same data through either S3 or HDFS:

– Full compatibility with existing object-based APIs

▪ Amazon S3, OpenStack Swift, Atmos
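The three bucket options differ only in which protocols can reach the data. The sketch below is a hypothetical model of that rule, not ViPR's API: the enum names are invented, and the mapping assumes Object buckets accept the listed object APIs, HDFS buckets accept only HDFS, and Object and HDFS buckets accept both.

```python
# Hypothetical model of the three ViPR bucket options and the protocols that
# can access each one. Names and protocol sets are illustrative assumptions
# drawn from the slide, not ViPR's actual API.
from enum import Enum


class Protocol(Enum):
    S3 = "s3"
    SWIFT = "swift"
    ATMOS = "atmos"
    HDFS = "hdfs"


# Object APIs listed on the slide: Amazon S3, OpenStack Swift, Atmos.
OBJECT_APIS = frozenset({Protocol.S3, Protocol.SWIFT, Protocol.ATMOS})


class BucketType(Enum):
    OBJECT = "Object"
    HDFS = "HDFS"
    OBJECT_AND_HDFS = "Object and HDFS"


# Assumed mapping from bucket type to the protocols allowed to read/write it.
ALLOWED = {
    BucketType.OBJECT: OBJECT_APIS,
    BucketType.HDFS: frozenset({Protocol.HDFS}),
    BucketType.OBJECT_AND_HDFS: OBJECT_APIS | {Protocol.HDFS},
}


def can_access(bucket: BucketType, protocol: Protocol) -> bool:
    """True if `protocol` can reach data stored in a bucket of this type."""
    return protocol in ALLOWED[bucket]
```

For example, `can_access(BucketType.OBJECT_AND_HDFS, Protocol.S3)` and `can_access(BucketType.OBJECT_AND_HDFS, Protocol.HDFS)` are both true, which is what lets a file loaded over S3 be read by a Hadoop job without copying it.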


Simple, Easy, Cost-Effective: EMC Starter Kit for Hadoop – ViPR Edition

• Deployment guides for major Hadoop distributions:
  – Pivotal, Cloudera, and Hortonworks

• Four-step deployment:
  – Deploy your preferred Hadoop distribution
  – Deploy EMC ViPR with Object and HDFS data services
  – Configure the Hadoop distribution to use the ViPR HDFS target
  – Validation process:
    ▪ Load a data file via the S3 interface
    ▪ Run a test MapReduce job
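In Hadoop 2 distributions, the default file system that clients and MapReduce jobs use is set by the standard `fs.defaultFS` property in `core-site.xml`. The sketch below renders that file from a dictionary. The `viprfs://bucket.namespace.site/` URI is an assumption used as a placeholder; take the real scheme, any additional ViPR client properties, and the client JARs from the deployment guide for your distribution.

```python
# Sketch of step 3: point a Hadoop distribution at an HDFS-compatible target
# by writing fs.defaultFS into core-site.xml.
# Assumption: the viprfs:// URI used below is a hypothetical placeholder; the
# real URI format and any extra properties come from the deployment guide.
import xml.etree.ElementTree as ET


def core_site_xml(properties: dict[str, str]) -> str:
    """Render Hadoop-style <configuration><property> XML from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical bucket, namespace, and site names.
    print(core_site_xml({"fs.defaultFS": "viprfs://mybucket.mynamespace.mysite/"}))
```

Generating the file from one dictionary keeps the target URI in a single place, so switching between a local HDFS NameNode and the ViPR target during validation is a one-line change.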
