Hype, Hopes, Hell & Hadoop
Big Data: Reality Check and Infrastructure Implications of “The Enterprise of Everything”
Jean-Luc Chatelain, EVP & CTO
StampedeCon 2014

Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


DESCRIPTION

The amount of data in our world has been exploding, and storing and analyzing large data sets (so-called big data) will become a key basis of competition for the new “Enterprise of Things”, underpinning fresh waves of productivity growth, innovation, and consumer surplus. Leaders in every sector, from government to healthcare to finance, will have to grapple with the implications of big data, as data growth continues unabated for the foreseeable future. The quest to make sense of all this big data begins with breaking down data silos within organizations, using cost-appropriate, shared infrastructure to ensure optimal extraction and analysis of data, knowledge and insight. This presentation highlights all aspects of #bigdata exploitation, good or bad. It also covers the associated infrastructure challenges, the place of #hadoop in the big picture, and areas of opportunity for innovation.


Page 1: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

Hype, Hopes, Hell & Hadoop
Big Data: Reality Check and Infrastructure Implications of “The Enterprise of Everything”

Jean-Luc Chatelain, EVP & CTO
StampedeCon 2014

Page 2: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


© 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com

And now, a quick word from my sponsor

Page 3: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


DDN | Who We Are

• Main Office: Santa Clara, California, USA
• Employees: ~550 in 20 Countries
• Installed Base: End Customers in 50 Countries
• Go To Market: Partner & Reseller Assisted, Direct
• DDN: World’s Largest Private Storage Company

We Design, Deploy and Optimize Storage Systems that Solve HPC, Big Data and Cloud Business Challenges at Scale

World-Renowned & Award-Winning
All Time Winner

Page 4: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Big Data & Cloud Infrastructure: DDN’s Award-Winning Product Portfolio

Analytics Reference Architectures

• EXAScaler™: Petascale Lustre® Storage
– 10Ks of Clients
– 1TB/s+, HSM
– Linux HPC Clients
– NFS & CIFS [2014]

• GRIDScaler™: Enterprise Scale-Out File Storage
– ~10K Clients
– 1TB/s+, HSM
– Linux/Windows HPC Clients
– NFS & CIFS

Storage Fusion Architecture™ Core Storage Platforms

• SFA12KX™
– 48GB/s, 1.7M IOPS
– 1,680 Drives in 2 Racks
– Optional Embedded Computing

• SFA7700™
– 13GB/s; 600K IOPS
– 7700X
– 7700E

• Flexible Drive Configuration: SATA, SSD, SAS

• SFX™ Automated Flash Caching
– Adaptive Transparent Flash Cache
– SFX API Gives Users Control [pre-staging, alignment, bypass]

Cloud Foundation

• WOS® 3.0
– 32 Trillion Unique Objects
– Geo-Replicated Cloud Storage
– 256 Million Objects/Second
– Self-Healing Cloud
– Embedded metadata mgmt
– S3/Swift

• WOS7000
– 60 Drives in 4U
– Self-Contained Servers

• Cloud Tiering

Big Data Platform Management

• DirectMon®

• Infinite Memory Engine™ (SC13)
– Distributed File System Buffer Cache

Page 5: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

Hype & Hopes

Page 6: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Hype

[Figure: hype curve running from 2011 to 2014, with “Today” marked]

#bigdata in the trough of disillusionment is great news for the enterprise!

Page 7: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Back To The Future?

The term “Big Data” coined circa 1999(1)

• Pervasive in some existing markets since the late 90’s
– HPC sensu latissimo
– Life Sciences
– Intelligence
– ASP (remember that word?)

Is there anything new here? Why the hype?

(1) “A Personal Perspective on the Origin(s) and Development of Big Data”, Diebold, 2012

Page 8: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Is There a #bigdata Definition?

For some yes; for others no – or maybe there are multiple definitions

• It is “a basket of technologies”

• It creates “a mindset change in decision making”

“Data sets that exceed the boundaries and sizes of current infrastructure capabilities, forcing technologists to take a non-traditional approach”

[Figure: “Normal Processing Capabilities” shown against three axes: File/Object Size, Content; Volume; Activity: IOPS. Annotations: Lots of data, Large file sizes, Lots of transactions.]

Page 9: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


#bigdata: 2 Dimensions of the 3 V’s

• Volume: Petabytes of Data, but also Trillions of Information Objects
• Velocity: GB/s to TB/s, but also Millions of Information Objects per second
• Variety: Structured & Unstructured, but also Streams & Batches workloads

The “trillions” & “millions” are the primary drivers of complexity and challenge “Time to Results”

Remember . . . 1ms lost per operation on a billion-operation workload = 11.5 days lost!
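The “1ms lost per operation” figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `time_lost` is hypothetical, and it assumes the operations run strictly one after another, so the per-operation delays simply add up. Under that assumption the total comes to about 11.57 days, which the slide states as 11.5.

```python
from datetime import timedelta


def time_lost(delay_ms: int, operations: int) -> timedelta:
    """Total time lost when each of `operations` sequential operations
    is delayed by `delay_ms` milliseconds (assumes no parallelism)."""
    return timedelta(milliseconds=delay_ms * operations)


# 1 ms lost per operation on a workload of one billion operations.
lost = time_lost(delay_ms=1, operations=1_000_000_000)
lost_days = lost.total_seconds() / 86_400  # 86,400 seconds per day

print(f"{lost.total_seconds():,.0f} seconds lost, about {lost_days:.2f} days")
```

With parallel execution the wall-clock loss would be divided across workers, but the aggregate compute time wasted stays the same, which is why per-operation latency matters so much at the “trillions” and “millions” scale.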

Page 10: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


So, is #bigdata the new thing?

No! But its DEMOCRATIZATION is!

Page 11: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Quiz!

Page 12: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


The Dawn of a Telemetry Revolution

• Internet of Things
• Social
• Sensors

Telemetry Revolution

The Birth of a Mindset Change in Business Decision Making

Page 13: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

Hell

Page 14: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Governance, Regulation, Compliance

The Universe of Big Data is a massive black hole into which GRC has fallen

• Governance
• Regulation
• Compliance
• Security
• Privacy

Now, welcome to the era of shadow data and behold the plague of hyper-scalability

Page 15: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Tackling #bigdata Is Non-trivial

• Value extraction (insights driving business results) is only done on 1% of total enterprise data

• Time to value & time to result are business critical
– Inadequate infrastructure = failure & credibility loss

• The cardinality dimensions of the 3V’s are the infrastructure killers
– Material: network, compute, storage
– Human: DBA, sysadmin & storadmin

• Today, a #bigdata project cannot live in IT or it will fail

• Dare to be different

• #bigdata nullifies the feature race and favors the benefit race

Page 16: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Let’s Talk Real #bignumbers

HPC is a forward looking time machine that eats #bigdata for lunch

• Enterprise’s #bigdata problems of today were HPC problems 3 to 5 years ago

• HPC & Web architectures are converging

Page 17: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


The #bigdata Effect on Existing IT Infrastructures

Page 18: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Top 3 #bigdata Infrastructure Challenges

• I/O
• I/O
• I/O

Page 19: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


The Scalability Devil Effect on Typical Analytics

• Economics of large-capacity EDW storage
• Scalability of NAS/SAN file systems
• Bandwidth demand of OLAP engine
• IOPS demand of modeling
• Memory requirements of visualization
• MPP drives I/O blending

[Diagram: Structured Data and Unstructured Data pass through ETL into the EDW and NAS/SAN, then through ETL again into the OLAP Engine and Semantic Engine, followed by Model, Visualize, Report]

Page 20: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

Hadoop

Page 21: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Hadoop

• IS NOT a person or the solution to world famine or a BI platform or an analytics platform or an EDW or a CEP engine or …

• IS a growing basket of technologies facilitating BI and/or analytics especially if there is a lot of unstructured data

• IS at the core of many “science projects”

• IS in the infancy of deployment in the traditional enterprise

• HDFS “data lake” concept is very important

Page 22: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


BI & Analytics Today

[Diagram components: Database; File System; ETL (primary); Enterprise Data Warehouse; Reporting & Visualization; ETL (secondary); Analytics; CEP; Business Auditing & Planning]

Page 23: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Hadoop Effect

[Diagram components: Database; ETL; Enterprise Data Warehouse; Reporting & Visualization; Analytics; CEP; Business Auditing & Planning; Business Data Warehouse; Dynamic ETL]

Page 24: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


#bigdata “At Work” with DDN: Case Studies

Page 25: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Accelerating Fraud Awareness: Harnessing Hadoop and Big Data

DDN helps PayPal’s Financial Linking System achieve 200–250ms processing and customer transparency

“On the cost side, the same performance at 3-4 times less cost, that’s clearly important. The fact is, you’ve got scalability you didn’t have previously.”

Ryan Quick, Principal Architect, PayPal

Case Study

Page 26: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Accelerating Financial Insights

Case Study

“Other technologies paled in comparison to the performance levels achieved with DDN’s SFA12K.”

Brian Alexseychuk, Managing Director of Infrastructure

• Resolved scaling challenges and parallelized workflows

• Exceeded competitors on metrics such as scalability, speed, density, and TCO

• Improved revenues, reduced trade slippage by 70% & cut telecom expenses

Page 27: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Accelerating Time To Cure

Case Study

“If you can serve some of the fastest computers on the planet, then you can help us.”

Phil Butcher, Head IT

“If you need 10K cores to perform an extra layer of analysis in an hour … you need a real solution that can address everything from very small to extremely large data sets.”

Tim Cutts, Head of Scientific Computing

Page 28: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


Accelerating Intelligence Insights

Case Study

Naval Research Lab Large Data Program

Application
• Deep storage & fast distributed search
• Super-HD, 2/3-D, and streaming data

DDN enables rapid threat detection by speeding up real-time data and imagery up to 500%.

Page 29: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)

In Conclusion

Page 30: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


2 Faces of #bigdata = Opportunities for Innovation

Technology
– Hyper-scalability: DB & FS
– Privacy (masking, obfuscation)
– Keyless security
– Visualization and navigation of large datasets
– HDFS persistence
– Provenance
– In-memory computing
– In-Storage Processing
– GraphDB on MPP
– Brute force or machine learning?
– Predictive & prescriptive analytics

Business
– Agility
– Narrowcast solutions with higher stickiness
– Data-driven business decisions
– Retain existing customers and gain new ones

Information is the currency of today’s global business

Page 31: Hype, Hopes, Hell & Hadoop (#bigdata and the enterprise of everything)


@informationcto

Thank You! Merci!