Unstructured Storage: The Foundation for Your Information Advantage
Alexander Graf, Advisory Systems Engineer, Unstructured Storage Solutions

Graf DellEMC Infra AG...TRADITIONAL “SHARE-NOTHING” HADOOP Existing Virtualized Data Center SHARE-NOTHING HadoopInfrastructure Unstructured Data 1 Existing Primary Storage 2 342


Digital Transformation Is Disrupting Every Industry

• 50%+ of global GDP will be digitized by 2021 [1]
• 48% are unsure what their industry will look like in 3 years
• 45% fear they will be obsolete in 3-5 years [2]
• 92% see digital business initiatives as critical [2]

MEDIA · MANUFACTURING · HEALTHCARE · LIFE SCIENCES · AUTOMOTIVE · AND MORE…

[1] IDC FutureScape: Worldwide IT Industry 2018 Predictions, Oct 2017, Doc # US43171317
[2] Dell Digital Transformation Index

AI drives business outcomes

Automotive

Science

Retail

Finance

Manufacturing

…and more

New revenue streams · Increase revenue run rate · Operational efficiency

• Recommendation engines
• Cross sell and up-sell
• Risk analysis
• Fraud detection

• Sentiment analysis
• Chatbots
• Speech to text to speech
• Intent analysis to actions

• Similar products
• Visual search
• Object recognition
• Anomaly detection


WHAT IS AI, MACHINE LEARNING AND DEEP LEARNING?

Artificial Intelligence refers to the simulation of any intellectual task by a machine, with little to no input from a human or programmer, typically by means of machine learning techniques.

Machine learning refers to the process of “training” the machine by feeding it large amounts of data, so that it learns how to respond rather than being explicitly programmed.

Deep learning is a form of machine learning that uses many-layered artificial neural networks, parallel processing, and massive volumes of data to enable faster and more accurate artificial intelligence.
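The difference between "trained" and "explicitly programmed" can be illustrated with a very small sketch. It assumes a toy one-variable problem; the function names and the sample data are hypothetical and do not come from the deck.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from examples using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data. The rule y = 2x + 1 is never written down
# in the code; it is recovered from the examples.
samples_x = [0.0, 1.0, 2.0, 3.0]
samples_y = [1.0, 3.0, 5.0, 7.0]
model = fit_line(samples_x, samples_y)
```

Deep learning follows the same idea with far more parameters and data, which is why storage throughput and capacity become part of the discussion.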

CUSTOMERS DEMAND OUTCOMES FROM DATA

THE DATA CENTRE OF TODAY
• DATA LIVES ON DISK AND TAPE
• MOVE DATA TO THE CPU AS NEEDED
• FOCUS ON DEEP STORAGE HIERARCHY

THE DATA CENTRE OF TOMORROW
• DATA RESIDES NEAR THE CPU AND MEMORY
• OUTCOMES ARE DRIVEN BY COMPUTE CENTRIC DESIGN
• MOVE FASTER, STORE MORE, COMPUTE EVERYTHING

CHALLENGE: Transforming Data Into Value

• CREATING BUSINESS IMPACT FROM DATA
• KEEPING UP WITH DATA GROWTH
• UNLOCKING DATA TRAPPED IN SILOS

Dell EMC Isilon: Support the most demanding workloads with the ability to scale performance and capacity as needed.

• SIMPLICITY AT SCALE: Seamlessly extend management and policies across a massively growing data set.
• EXTRACT VALUE FROM DATA: Support high performance workloads and deliver faster time to insights with all-flash.
• UNIFIED DATA LAKE: Eliminate data silos and reduce obstacles across the Edge, Core and cloud.

Isilon and OneFS

Single File System, One Namespace
• High Performance
• Unmatched Efficiency
• Simplicity & Ease of Use
• Linear Scalability
• Easy Growth

Isilon Workload Consolidation

TCO Optimization: Simplicity and Ease of Use

• Automation:
    NO manual intervention
    NO reconfiguration
    NO server or client mount point or application changes
    NO data migrations
    NO RAID

Single File System Spans All Nodes

Directories and Files Striped Across Cluster
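As a rough illustration of what "striped across the cluster" means, the sketch below splits a file into fixed-size blocks and places them round-robin on nodes. This is a deliberately simplified model, not the OneFS layout: it ignores data protection and metadata, and the block size, node count and function names are assumptions made for the example.

```python
def stripe(data: bytes, node_count: int, block_size: int = 4):
    """Split data into fixed-size blocks and place them round-robin on nodes."""
    layout = {node: [] for node in range(node_count)}
    for offset in range(0, len(data), block_size):
        block_number = offset // block_size
        node = block_number % node_count
        layout[node].append((block_number, data[offset:offset + block_size]))
    return layout


def reassemble(layout):
    """Rebuild the original byte string from the blocks held by all nodes."""
    blocks = sorted(block for node_blocks in layout.values() for block in node_blocks)
    return b"".join(chunk for _, chunk in blocks)


# Hypothetical file content striped over three nodes.
payload = b"abcdefghijklmnopqrstuvwxyz"
layout = stripe(payload, node_count=3)
```

Because every node holds part of every large file, reads and writes are served by many nodes at once, which is the intuition behind the single file system spanning all nodes.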

TCO Optimization : AutoBalance Capacity and Performance

[Figure: AutoBalance Across Nodes. Nodes are shown as EMPTY or FULL before balancing and as BALANCED afterwards.]

• Automation Balances Data
    Reduces Costs, Complexity, Risk
    Eliminates Hot Spots
    NO data migrations
    NO RAID

Delivers Over 80% Storage Utilization
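The effect of balancing after adding empty nodes can be sketched with a few lines of arithmetic. The example assumes equally sized nodes and hypothetical usage figures; it models only the end state, not how AutoBalance actually schedules the data movement.

```python
def autobalance(used_tb):
    """Return (balanced per-node usage, total TB moved) for equally sized nodes."""
    target = sum(used_tb) / len(used_tb)
    # Only data above the target level has to leave a node.
    moved = sum(used - target for used in used_tb if used > target)
    return [target] * len(used_tb), moved


# Hypothetical cluster: four nodes holding 80 TB each, plus one new empty node.
balanced, moved_tb = autobalance([80.0, 80.0, 80.0, 80.0, 0.0])
```

In this example each node ends up with 64 TB, and 64 TB in total moves onto the new node without any client-visible migration step.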

• Push-Button Linear Scaling
• Under 60 Seconds
• Transparent to Users and Applications

Unconstrained Scale

ISILON SIMPLICITY AND EASE OF USE

• Single volume and file system
• Directories and files striped across cluster nodes

• Automation
    NO manual intervention
    NO reconfiguration
    NO client mount point changes
    NO application changes
    NO data migrations
    NO RAID or LUNs

© Copyright 2017 Dell Inc.

Enterprise Grade Features: In-Place Analytics

Speeds Time to Insight
Enterprise Data Protection for Hadoop

Lower costs
• Eliminates dedicated Hadoop infrastructure

Increase flexibility
• Simultaneous support for any Apache-compliant Hadoop distribution

Native HDFS

HADOOP ARCHITECTURE – DAS VS ISILON

[Figure: Two architectures side by side. DAS: one NameNode and six combined Data Node + Compute Node servers connected over Ethernet. Isilon: six Compute Nodes connected over Ethernet to shared storage that is labelled with the name node and data node roles.]

TRADITIONAL “SHARE-NOTHING” HADOOP

[Figure: An Existing Virtualized Data Center with Unstructured Data (copy 1) on Existing Primary Storage, next to a SHARE-NOTHING Hadoop Infrastructure whose nodes hold the copies labelled 2, 3 and 4.]

• Hadoop on a Stick (R=3) means 5 data copies ($$$$)

• Data has to be copied to the Hadoop cluster before analysis can begin (Time to Results)

How will you maintain data consistency when a file changes on your primary storage?

ISILON “SHARE-EVERYTHING” HADOOP

• Start using Hadoop NOW with unused processing and RAM available in your VMware environment
• No replication required (use your existing data)
• Access to the same data via NAS and HDFS protocols
• Time to results is extremely fast, using already existing data with NO COPIES or wasted $$$$
• Analysis can begin with the 1st VM

[Figure: An Existing Virtualized Data Center with New Hadoop Compute Nodes that use the native HDFS protocol to reach the Unstructured Data (copy 1) on Existing Primary Storage.]
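The idea that one copy of the data is reachable over both NAS and HDFS can be sketched as a simple path mapping. Everything specific here is an assumption for illustration: the host name, the NFS mount point, the ports and the helper name are hypothetical, and the mapping between an HDFS root and an NFS export depends on how the cluster is configured. The WebHDFS URL follows the standard `/webhdfs/v1/<path>?op=OPEN` form of the Apache Hadoop REST API.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def protocol_views(relative_path, nfs_mount="/mnt/datalake",
                   host="isilon.example.com", hdfs_port=8020, webhdfs_port=8082):
    """Return the addresses under which the same file is visible per protocol.

    All defaults are placeholders, not values taken from the deck.
    """
    rel = PurePosixPath(relative_path.lstrip("/"))
    return {
        "nfs": str(PurePosixPath(nfs_mount) / rel),
        "hdfs": f"hdfs://{host}:{hdfs_port}/{rel}",
        "webhdfs": f"http://{host}:{webhdfs_port}/webhdfs/v1/{quote(str(rel))}?op=OPEN",
    }


# One file written by an application over NFS, read by Hadoop over HDFS.
views = protocol_views("logs/2018/app.log")
```

No copy step appears anywhere in the mapping, which is the point of the share-everything design described on this slide.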

TIME-TO-RESULTS

Data Copy Analysis
[Figure: Data is copied from Existing Primary Storage over the Data Center Network to Hadoop on a Stick.]

Have you ever copied 100TB from Primary Storage to a Hadoop system?

How long does it take to copy 100TB from one place to another over a 10Gb link?

>24 Hours

In-Place Analysis
[Figure: Hadoop Compute Nodes read only the relevant data for analysis from Existing Primary Storage over the Data Center Network.]
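The ">24 hours" figure can be checked with a short calculation. The sketch assumes decimal terabytes, reads "10Gb" as 10 Gbit/s, and uses a hypothetical 90% link efficiency to account for protocol overhead; none of these assumptions are stated in the deck.

```python
def copy_hours(terabytes, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move the given volume over a link at the given efficiency."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


ideal = copy_hours(100, 10)            # full line rate, no overhead
realistic = copy_hours(100, 10, 0.9)   # assumed 90% usable bandwidth
```

At full line rate the copy takes roughly 22 hours, and at the assumed 90% efficiency it exceeds 24 hours, before any time is spent on the analysis itself.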

Support for Multiple Hadoop Landscapes (or even different versions/distros)

[Figure: A single DATA LAKE accessed by virtual servers over HDFS, NFS, FTP and SMB. Several Hadoop landscapes (Cloudera and IBM are shown), each running its own MapReduce jobs, use the name node and data node services of the data lake.]

Increase Utilization to Control Costs

Hadoop 1

Hadoop 2

HBase

• Consolidated cluster has access to entire pool of physical resources
• Take advantage of multi-tenancy to increase utilization during non-peak hours
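Why consolidation raises utilization can be shown with a small calculation: separate clusters must each be sized for their own peak, while a shared pool only needs to cover the highest combined demand. The demand figures below are invented for illustration and the function name is hypothetical.

```python
def required_capacity(demand_by_tenant):
    """Compare capacity for separate clusters with capacity for one shared pool."""
    # Separate clusters: each one is sized for its own peak.
    separate = sum(max(series) for series in demand_by_tenant.values())
    # Shared pool: sized for the highest combined demand in any time slot.
    combined = max(sum(slot) for slot in zip(*demand_by_tenant.values()))
    return separate, combined


# Hypothetical demand per time slot, in arbitrary resource units.
demand = {
    "Hadoop 1": [10, 40, 10, 10],
    "Hadoop 2": [10, 10, 40, 10],
    "HBase": [30, 10, 10, 30],
}
separate_units, shared_units = required_capacity(demand)
```

With these made-up numbers the separate clusters need 110 units in total, while the consolidated pool needs 60, because the tenants peak at different times.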


Object

• CLOUD SCALE STORAGE
• GLOBAL DATA ACCESS
• CLOUD-NATIVE / MODERN APPS

What Is ECS?

ECS is a universal object content store
Primarily Object: S3, Swift, CAS

Most Cost Effective Tier of Storage
• Lowest cost per TB
• Same data protection overhead for small and large files

Data and Metadata Stored as Objects
• Metadata search native capability

No Coded Limits
• Easily scalable
• Infinitely expandable

Geodistributed
• Data globally accessible by Web, mobile and cloud apps
• Reduce data protection overhead with 3+ sites
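The claim that protection overhead drops with three or more sites can be made plausible with a simplified model. The sketch assumes a 12+4 erasure-coding scheme within each site and an XOR-style scheme across sites, in which the overhead scales with N/(N-1). Both the scheme and the resulting numbers are assumptions for illustration and are not taken from the deck; actual values depend on the ECS version and configuration.

```python
def storage_overhead(sites, data_fragments=12, coding_fragments=4):
    """Raw-to-usable capacity ratio under the assumed protection model."""
    if sites < 1:
        raise ValueError("at least one site is required")
    local = (data_fragments + coding_fragments) / data_fragments
    if sites == 1:
        return local
    # Assumed geo model: overhead shrinks as more sites share the protection data.
    return local * sites / (sites - 1)


# Overhead for one to four sites under the stated assumptions.
overheads = {n: round(storage_overhead(n), 2) for n in range(1, 5)}
```

Under this model two sites cost the most per usable terabyte, and each additional site from the third onward lowers the overhead.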

ECS Target Use Cases

• Cloud Backup
• Tiered Archive
• Cloud Native Apps (web/mobile)
• Sync & Share
• Analytics
• Cloud Gateway
• IoT

[Figure: SITE 1, SITE 2, SITE 3]

Scale Effortlessly - Store Efficiently - Access Globally

Ready Solution: Hortonworks Hadoop with Isilon
High density Consolidated Data Lake
Compute Configuration: Modular Infrastructure

Solution benefits
• Scale Storage independently from Compute
• Minimize data movement
• Eliminate Shadow IT projects
• Current Isilon customers: leverage existing File Management processes

Differentiation
• Industry Leading storage density and scaling
• Consolidates data silos with one copy of data
• Enterprise-grade File Management
• File-level regulatory compliance out-of-the-box
• Current Isilon customers: brings analytics to where the data exists in Isilon

Hortonworks Data Platform Ent+
Isilon OneFS

Pod Network
• 2x Dell EMC Networking S4048 10GbE Pod Switches
• 1x Dell EMC Networking S3048 10GbE Pod Switches

Cluster Network
• 2x Dell EMC Networking S6000 40GbE Cluster Switches

Shared Storage Nodes
• 4x Isilon X410 with 102TB HDD / 3.2TB SSD / 256 GB
• 2x QDR Infiniband Switch 8 ports

Infrastructure Nodes
• 4x PowerEdge FC630 with 3x 1.2TB HDD per Sled

Compute Nodes
• 6x PowerEdge FC630 with 8x 1.2TB HDD per Sled

Scales from 100TB to 64 PB

Ready Solution: Cloudera Hadoop with Isilon
High density Consolidated Data Lake
Compute Configuration: Rack-Server Infrastructure

Solution benefits
• Scale Storage independently from Compute
• Minimize data movement
• Eliminate Shadow IT projects
• Current Isilon customers: leverage existing File Management processes

Differentiation
• Industry Leading storage density and scaling
• Consolidates data silos with one copy of data
• Enterprise-grade File Management
• File-level regulatory compliance out-of-the-box
• Current Isilon customers: brings analytics to where the data exists in Isilon

Cloudera Enterprise Data Hub
Isilon OneFS

Pod Network
• 2x Dell EMC Networking S4048 10GbE Pod Switches
• 1x Dell EMC Networking S3048 10GbE Pod Switches

Cluster Network
• 2x Dell EMC Networking S6000 40GbE Cluster Switches

Shared Storage Nodes
• 4x Isilon X410 with 102TB HDD / 3.2TB SSD / 256 GB
• 2x QDR Infiniband Switch 8 ports

Infrastructure Nodes
• 4x PowerEdge R630 each with 3x 1.2TB HDD

Compute Nodes
• 6x PowerEdge R630 each with 8x 1.2TB HDD

Scales from 100TB to 64 PB

Introducing Ready Solutions for AI

Validated stack built to handle most demanding AI workloads

Deep Learning with

Machine Learning with

Simpler AI Experience · Faster, Deeper AI Insights · Proven AI Expertise

• 30% Improved data scientist productivity
• Up to 2.9X Performance vs. competition
• 98% Lower training time

Self-service for data scientists

Selection of AI frameworks & libraries

Industry-leading, scale-out architecture

Single point of support

Experienced Partners

Data Thinking
• Consulting: Data, Algorithms, Compute, Mindset
• Guiding companies to data leadership and creatorship

Data Science
• Ideation & scoping of use cases
• Data analysis
• Development of machine learning algorithms
• Proofs of concept

Data Engineering
• Architecture design and concepts
• Engineering and deployment
• Testing and test management
• Application management

DataOps
• Managed, hybrid, cloud infrastructures
• DevOps application management
• Hadoop and beyond: solutions at scale
• Security concepts and system design

*um Hadoop-as-a-Service

1 Hadoop hardware on-premises in the customer data center or off-premises in the UM data center

2 *um provides fully managed platform services, including the Hadoop layer

3 Customer-specific analytics software (Tableau, SAS or others)

managed by

Compute nodes
