DESCRIPTION
The amount of data in our world has been exploding, and storing and analyzing large data sets (so-called big data) will become a key basis of competition for the new “Enterprise of Things”, underpinning fresh waves of productivity growth, innovation, and consumer surplus. Leaders in every sector, from government to healthcare to finance, will have to grapple with the implications of big data, as data growth continues unabated for the foreseeable future. The quest to make sense of all this big data begins with breaking down data silos within organizations, using cost-appropriate, shared infrastructure to ensure optimal extraction and analysis of data, knowledge and insight. This presentation highlights all aspects of #bigdata exploitation, good or bad. It also covers the associated infrastructure challenges, the place of #hadoop in the big picture, and areas of opportunity for innovation.
Hype, Hopes, Hell & Hadoop
Big Data: Reality Check and Infrastructure Implications of “The Enterprise of Everything”
Jean-Luc Chatelain, EVP & CTO
StampedeCon 2014
© 2014 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change. ddn.com
And now, a quick word from my sponsor
DDN | Who We Are
• Main Office: Santa Clara, California, USA
• Employees: ~550 in 20 Countries
• Installed Base: End Customers in 50 Countries
• Go To Market: Partner & Reseller Assisted, Direct
• DDN: World’s Largest Private Storage Company
We Design, Deploy and Optimize Storage Systems that Solve HPC, Big Data and Cloud Business Challenges at Scale
World-Renowned & Award-Winning
All Time Winner
Big Data & Cloud Infrastructure: DDN’s Award-Winning Product Portfolio
Analytics Reference Architectures

EXAScaler™: Petascale Lustre® Storage
• 10Ks of Clients
• 1TB/s+, HSM
• Linux HPC Clients
• NFS & CIFS [2014]

GRIDScaler™: Enterprise Scale-Out File Storage
• ~10K Clients
• 1TB/s+, HSM
• Linux/Windows HPC Clients
• NFS & CIFS

Storage Fusion Architecture™ Core Storage Platforms
• SFA12KX™: 48GB/s, 1.7M IOPS; 1,680 Drives in 2 Racks; Optional Embedded Computing
• SFA7700™ (7700X, 7700E): 13GB/s; 600K IOPS
• Flexible Drive Configuration: SATA, SSD, SAS
• SFX™ Automated Flash Caching: Adaptive Transparent Flash Cache; SFX API Gives Users Control [pre-staging, alignment, bypass]

WOS® 3.0
• 32 Trillion Unique Objects
• Geo-Replicated Cloud Storage
• 256 Million Objects/Second
• Self-Healing Cloud
• Embedded metadata mgmt
• WOS7000: 60 Drives in 4U; Self-Contained Servers

Cloud Foundation
Big Data Platform Management: DirectMon®
Cloud Tiering
Infinite Memory Engine™: Distributed File System Buffer Cache
SC13
S3/Swift
Hype & Hopes
Hype
[Figure: hype curve spanning 2011 to 2014, with “Today” marked]
#bigdata in the trough of disillusionment is great news for the enterprise!
Back To The Future?
The term “Big Data” was coined circa 1999 (1)
• Pervasive in some existing markets since the late ’90s
– HPC sensu latissimo
– Life Sciences
– Intelligence
– ASP (remember that word?)
Is there anything new here? Why the hype?
(1) “A Personal Perspective on the Origin(s) and Development of Big Data”, Diebold, 2012
Is There a #bigdata Definition?
For some, yes; for others, no. Or maybe there are multiple definitions
• It is “a basket of technologies”
• It creates “a mindset change in decision making”
“Data sets that exceed the boundaries and sizes of current infrastructure capabilities, forcing technologists to take a non-traditional approach”
[Figure: “Normal Processing Capabilities” bounded along three axes: Volume (lots of data), File/Object Size, Content (large file sizes), and Activity: IOPS (lots of transactions)]
#bigdata: 2 Dimensions of the 3 V’s
• Volume: Petabytes of Data, but also Trillions of Information Objects
• Velocity: GB/s to TB/s, but also Millions of Information Objects per second
• Variety: Structured & Unstructured, but also Streams & Batches workloads
The “trillions” & “millions” are the primary drivers of complexity and challenge “Time to Results”
Remember . . . 1 ms lost per operation on a billion-operation workload = 11.5 days lost!
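The 11.5-day figure can be checked with simple arithmetic, assuming the operations run strictly one after another with no parallelism (the slide does not state this; it is an assumption here). A minimal Python sketch, where `days_lost` is a hypothetical helper name used only for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def days_lost(latency_ms: float, operations: int) -> float:
    """Days of wall-clock time added when every operation pays
    `latency_ms` extra and operations run serially (assumed)."""
    total_seconds = latency_ms * operations / 1000
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1 ms lost per operation across one billion operations
    print(f"{days_lost(1, 1_000_000_000):.2f} days")
```

One billion operations at 1 ms each add up to 1,000,000 seconds, or roughly 11.57 days, which matches the slide's 11.5 after truncation. Under ideal scaling, spreading the work across N parallel workers would divide the loss by N, which is one reason the “trillions” and “millions” push toward parallel infrastructure.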
So, is #bigdata the new thing?
No! But its DEMOCRATIZATION is!
Quiz!
The Dawn of a Telemetry Revolution
Internet of Things
Social
Sensors
Telemetry Revolution
The Birth of a Mindset Change in Business Decision Making
Hell
Governance, Regulation, Compliance
The Universe of Big Data is a massive black hole into which GRC has fallen
• Governance
• Regulation
• Compliance
• Security
• Privacy
Now, welcome to the era of shadow data and behold the plague of hyper-scalability
Tackling #bigdata Is Non-trivial
Value extraction (insights driving business results) is only done on 1% of total enterprise data
Time to value & time to result are business critical
– Inadequate infrastructure = failure & credibility loss
The cardinality dimensions of the 3V’s are the infrastructure killers
– Material: network, compute, storage
– Human: DBA, sysadmin & storadmin
Today, a #bigdata project cannot live in IT or it will fail
Dare to be different
#bigdata nullifies the feature race and favors the benefit race
Let’s Talk Real #bignumbers
HPC is a forward looking time machine that eats #bigdata for lunch
• Enterprise’s #bigdata problems of today were HPC problems 3 to 5 years ago
• HPC & WEB architectures are converging
The #bigdata Effect on Existing IT Infrastructures
Top 3 #bigdata Infrastructure Challenges
I/O, I/O, I/O
The Scalability Devil Effect on Typical Analytics
• Economics of large capacity EDW storage
• Scalability of NAS/SAN file systems
• Bandwidth demand of OLAP engine
• IOPS demand of modelization
• Memory requirements of visualization
• MPP drives I/O blending
[Figure: typical analytics flow with the nodes Structured Data, Unstructured Data, ETL (four stages), EDW, NAS/SAN, OLAP Engine, Semantic Engine, Model, Visualize, Report]
Hadoop
Hadoop
• IS NOT a person or the solution to world famine or a BI platform or an analytics platform or an EDW or a CEP engine or …
• IS a growing basket of technologies facilitating BI and/or analytics especially if there is a lot of unstructured data
• IS at the core of many “science projects”
• IS in the infancy of deployment in the traditional enterprise
• HDFS “data lake” concept is very important
BI & Analytics Today
[Figure: today's BI and analytics flow with the nodes Database, File System, ETL (primary), Enterprise Data Warehouse, ETL (secondary), Reporting & Visualization, Analytics, CEP, Business Auditing & Planning]
Hadoop Effect
[Figure: the same flow after Hadoop, with the nodes Database, ETL, Enterprise Data Warehouse, Business Data Warehouse, Dynamic ETL, Reporting & Visualization, Analytics, CEP, Business Auditing & Planning]
#bigdata “At Work” with DDN
Case Studies
Accelerating Fraud Awareness
Harnessing Hadoop and Big Data
DDN helps PayPal’s Financial Linking System achieve 200–250ms processing and customer transparency
“On the cost side, the same performance at 3-4 times less cost, that’s clearly important. The fact is, you’ve got scalability you didn’t have previously.”
Ryan Quick, Principal Architect, PayPal
Case Study
Accelerating Financial Insights
Case Study
“Other technologies paled in comparison to the performance levels achieved with DDN’s SFA12K.”
Brian Alexseychuk, Managing Director of Infrastructure
• Resolved scaling challenges and parallelized workflows
• Exceeded competitors on metrics such as scalability, speed, density, and TCO
• Improved revenues, reduced trade slippage by 70% & cut telecom expenses
Accelerating Time To Cure
Case Study
“If you can serve some of the fastest computers on the planet, then you can help us.”
Phil Butcher, Head IT
“If you need 10K cores to perform an extra layer of analysis in an hour … you need a real solution that can address everything from very small to extremely large data sets.”
Tim Cutts, Head of Scientific Computing
Accelerating Intelligence Insights
Case Study
Naval Research Lab Large Data Program
Application
• Deep storage & fast distributed search
• Super-HD, 2/3-D, and streaming data
DDN enables rapid threat detection by speeding up real-time data and imagery up to 500%.
In Conclusion
2 Faces of #bigdata = Opportunities for Innovation
Technology
– Hyper-scalability: DB & FS
– Privacy (masking, obfuscation)
– Keyless security
– Visualization and navigation of large datasets
– HDFS persistence
– Provenance
– In-memory computing
– In-Storage Processing
– GraphDB on MPP
– Brute force or machine learning?
– Predictive & prescriptive analytics
Business
– Agility
– Narrow casted solutions with higher stickiness
– Data driven business decision
– Retain existing customers and gain new ones
Information is the currency of today’s global business
@informationcto
Thank You! Merci!