Mark Lewis, Senior Marketing Director EMEA, Cloudera. Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
1
Hadoop and the Future of Data Management
Mark J. Lewis
Senior Director Marketing
Europe, Middle East & Africa
Twitter: @markjlewis
2
How do seed selection, planting density, ground slope, temperature, soil composition & weather interact to impact yields?
How much corn did my farm produce last year?
3
Can we just keep all our data on-line?
How do we set policies about what data is worth keeping?
4
Can we avoid that cost altogether?
What’s our annual budget for enterprise data warehouse expansion?
5
How many cars are there in each parking lot? Can we use that information to refine our prediction?
How will seasonality affect our store’s earnings next quarter?
6
Which one of these people is likely to be carrying a bomb?
Do you have any liquids in your carry-on?
7
Which new technologies actually improve patient health?
What’s our budget for new equipment?
8
Is it possible to set rates based on actual risk for each particular house?
How big is your house? What are comparable insurance claims rates?
9
All of these organizations are
Information Driven
10
All of these organizations needed to change to become
Information Driven
11
©2014 Cloudera, Inc. All rights reserved.
Expanding Data Requires A New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
Now: Bring Compute to Data
Information-centric businesses use all data:
• Multi-structured, internal & external data of all types
[Diagram: relative size & complexity of Data and Compute in each era]
12
The Old Way: Bringing Data to Compute
Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
Missing Data
• Leaving data behind
• Risk and compliance
• High cost of storage
Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
Cost of Analytics
• Existing systems strained
• No agility
• “BI backlog”
[Diagram with numbered callouts 1 to 4 over these layers:]
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
ERP, CRM, RDBMS, MACHINES
FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS
EXTERNAL DATA SOURCES
13
From Hadoop to an Enterprise Data Hub
Open Source, Scalable, Flexible, Cost-Effective ✔
Managed ✖ ✔
Open Architecture ✖ ✔
Secure and Governed ✖ ✔
3RD PARTY APPS
STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE
CLOUDERA’S ENTERPRISE DATA HUB
BATCH PROCESSING: MAPREDUCE
ANALYTIC SQL: IMPALA
SEARCH ENGINE: SOLR
MACHINE LEARNING: SPARK
STREAM PROCESSING: SPARK STREAMING
WORKLOAD MANAGEMENT: YARN
FILESYSTEM: HDFS
ONLINE NOSQL: HBASE
DATA MANAGEMENT: CLOUDERA NAVIGATOR
SYSTEM MANAGEMENT: CLOUDERA MANAGER
SENTRY, SECURE
14
15
The New Way: Bringing Compute to Data
[Diagram with numbered callouts 1 to 4 over these layers:]
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
ERP, CRM, RDBMS, MACHINES
FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS
EXTERNAL DATA SOURCES
1. Active Compliance Archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Persistent Staging
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3. Self-Service Exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Diverse Analytic Platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True analytic agility
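The “schema on read” idea listed under Self-Service Exploratory BI can be illustrated with a minimal Python sketch. This is not Cloudera's implementation or any Hadoop API: the sample events, the `read_with_schema` helper, and the field names are all hypothetical, chosen only to show raw data being stored untouched and a schema being applied when a consumer reads it.

```python
import json

# Raw events are kept exactly as they arrived; no table schema is imposed at write time.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "ana", "page": "/home", "ms": 120}',
    '{"user": "bo", "page": "/cart", "ms": "340", "coupon": "SPRING"}',
    '{"user": "cy", "page": "/home"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading raw JSON lines.

    Fields missing from a record become None; fields not named in the
    schema are ignored. The raw lines themselves are never modified.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two consumers read the same raw data with different schemas.
latency_view = list(read_with_schema(RAW_EVENTS, {"page": str, "ms": int}))
coupon_view = list(read_with_schema(RAW_EVENTS, {"user": str, "coupon": str}))
```

Because the schema lives with the reader rather than the store, a new question (here, coupon usage) needs only a new schema dictionary, not a reload or up-front remodeling of the data.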
16
Your Journey to Achieve Full Potential
[Diagram labels: Operational Efficiency, Information Advantage; IT, Business; Cheap Storage, ETL Acceleration, EDW Optimization, Consolidation, 360° View, Exploration, Data Science]
Advance from Strategy to ROI with Best Practices and Peak Performance
17
80% of Those Surveyed Are Planning, or Have Already Begun, Big Data Projects
Are Augmenting or Have Replaced Existing Infrastructure (46% of All Respondents):
[Bar chart, axis 0% to 60%; bar values not recoverable]
• ETL process (Extract Transform Load)
• Analytic databases
• Storage
• EDW (Enterprise Data Warehouse)
• Mainframes
Source: King Research survey, September 2013, 3,922 Respondents
Key Insight: Multiple Overlapping Use Cases Require Converged Analytics
18
Thank you!
Mark Lewis
Twitter: @markjlewis