The First Step in Information Management
www.firstsanfranciscopartners.com
Produced by:
MONTHLY SERIES
Brought to you in partnership with:
February 2, 2017
Data Lake vs. Data Warehouse
Topics for Today’s Webinar
§ Defining the Data Lake and Data Warehouse
§ Key differences between the Data Lake and Data Warehouse
§ How to optimize the Data Lake
§ How to optimize the Data Warehouse
§ Sample Data Lake and Data Warehouse architectures (+ use cases)
§ How a Data Lake can solve the problems of a Data Warehouse
§ Key findings and takeaways
§ Wrap-up: Combine?
© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
Poll
Which type of data repository does your organization currently have and actively use?
§ Data Lake
§ Data Warehouse
§ Both a Data Lake and Data Warehouse
§ Neither
If your organization has a data repository, are there plans to enhance (i.e., improve, streamline and/or upgrade or replace) it in 2017?
§ Yes, we will likely make some changes this year.
§ No, we are not likely to make any changes this year.
§ Not sure what we'll do this year.
Defining the Data Lake and Data Warehouse (Gartner)
§ A Data Warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.
§ A Data Lake is a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores.
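As a rough illustration of the two Gartner definitions above, the Python sketch below keeps an exact copy of each source record (lake-style) and, separately, builds an aggregated summary for one predefined report (warehouse-style). The record fields, the `lake` list and the `warehouse_summary` function are hypothetical names chosen for this example; they are not part of the webinar.

```python
import json
from collections import defaultdict

# Hypothetical source records, as they might arrive from a transaction system.
source_records = [
    '{"order_id": 1, "region": "West", "amount": 120.0, "note": "gift wrap"}',
    '{"order_id": 2, "region": "East", "amount": 80.0}',
    '{"order_id": 3, "region": "West", "amount": 30.0, "coupon": "SPRING"}',
]

# Lake-style storage: keep an exact copy of each record in its source format.
lake = list(source_records)


def warehouse_summary(raw_records):
    """Warehouse-style storage: total amount per region for a predefined report."""
    totals = defaultdict(float)
    for raw in raw_records:
        record = json.loads(raw)
        totals[record["region"]] += record["amount"]
    return dict(totals)


summary = warehouse_summary(source_records)
print(summary)
```

The summary keeps only what the predefined report needs, so fields such as `note` and `coupon` are not carried into it, while the lake copy still holds them for later, as yet undefined, questions.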
Defining the Data Lake and Data Warehouse
Think of a Data Mart as a store of bottled water: it’s cleansed, packaged, and structured for easy consumption. The Data Lake, meanwhile, is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in or take samples.
James Dixon, Pentaho CTO and creator of the term Data Lake
Key Differences Between the Data Lake and Data Warehouse
Analysis Source: “A Big Data Cheat Sheet: What Marketers Want to Know” by Tamara Dull
Data Lake Challenges
Clutter
Sandbox-like
Limited # of SMEs
Governance not built in
Security
Privacy
Inflexible
Resource-intensive
Over-confidence in capabilities
Traditional Enterprise Data Warehouse Challenges
[Diagram: Existing EDW. Components shown: traditional data sources, Stage, ODS, Data Warehouse, Marts 1 through n, BI/analytical tools, analysts and research users]
§ Ideally based on an enterprise model, which has proven difficult
§ Adding new data and subjects takes a long time
§ When the desired number of users is achieved, performance is not acceptable
§ Without a lot of governance, usability rarely meets expectations
§ Without the new data I need, I am better off gathering and storing it myself
§ Scalability is limited w/o significant investment in migration and infrastructure
§ Hadoop requires training and tends to silo
Use Cases: Business Strategy Drives Everything in the Data Lake*
§ Cost Center (i.e., not designed for revenue generation)
− Analytical tools for internal use
− Elastic computing for internal infrastructure optimization
§ Operational Differentiator (i.e., technology is used to differentiate the offering)
− Cloud and analytics are supporting the product offering
− Analytics are used to provide differentiating features
Example: Johnson and Johnson
§ Revenue Multiplier (i.e., cloud and Big Data analytics integrated into the business offerings)
− Elastic infrastructure is offered to other companies as an infrastructure service (marketing analysis, back-up, analytics testing, etc.)
− Big Data analytics are sold as a data service
Example: eBay, Netflix
Example: NY Stock Exchange
The starting point for roles and responsibilities is to decide the purpose of the Data Lake.
*Orbis Technologies
Use Case: Data Warehouse Architecture to Address Massive Market Changes
Large Medical Device Manufacturer
[Architecture diagram, by layer]
Sources: Apps, Internal Data Sources, External Data Sources, Others (examples noted on the slide: Axciom, MDM)
Collection, Cleansing, Integration: ETL (Extract, Transform, Load)
Enterprise Data Repositories: Data “Warehouse”, Analytical Data, Operational Data, Marts
Presentation and Delivery: Delivery/Presentation Layer
End-Users: Reporting, Query/Analytics, Standard Extracts
Business drivers shown on the slide:
§ Agent and Policyholder Retention
§ Independent Agents - Easy to switch allegiances
§ Sales and Marketing
§ Customer - Widely changing demographics and markets
§ Underwriting
§ Claims Management
But This is Not About Choosing
Data Insight Maturity
Usage basis: What happened? | Why did it happen? | What will happen? | Make it happen by itself | What do I want to happen? | What should we do next?
Perceived maturity: Reporting | Analyzing | Predictive | Operationalize | Adaptive | Foresight
Capability: Survival | Defined | Managed | Optimized / Autonomous
Characteristics: Batch ETL | ETL / EAI | Web Services | Streaming
“DW-ish” “DL-ish”
How to Optimize the Data Lake
§ Design for rapid and scalable ingestion.
§ Know, govern and protect your data.
§ Remove data silos and mitigate chaos with pervasive data quality/management.
§ Ensure your Data Lake is not isolated or crudely bolted onto existing infrastructure.
§ Monitor performance (ensure no degrading).
§ Assess your users’ capabilities and identify gaps and critical skills required to expertly swim in the lake. Then create a skills development program to cultivate required skills and bridge any gaps.
How to Optimize the Data Warehouse
§ Understand and reinforce the strengths of the four essential components of the Data Warehouse, including:
− Focus on supporting self-service and ease of use of Metadata
− Strengthen the Structure by boosting consistency and understanding of performance
− Establish trust by ensuring Quality
− Improve accuracy and reduce risk with Governance
How a Data Lake Can Solve the Problems of a Data Warehouse
Data Warehouse blended with Data Lake
Data Warehouse:
§ Lack of agility
§ Performance
§ Hard to extend
§ Structured data only
Data Lake:
§ Enables experimentation
§ Satisfies timing and turnaround issues
§ Allows unstructured data
Data Lake technology to leapfrog the Data Warehouse
§ Organizations without a data warehouse simply start with a Data Lake
§ Or organizations that need to evolve their warehouse replace it with a Data Lake
Key Findings and Takeaways
§ Over time, this is not an “either/or” debate.
§ Understand your overall requirements (based on business needs) and BLEND Data Warehouse and Data Lake capabilities, as well as the other structures related to these, for a best fit or logical data warehouse.
Wrap-up: When in Doubt…
[Sample architecture diagram: Business Need (Operational, Managerial, Analysis, Analytics) across Business Areas; outputs: Reports, Ad hoc, BI, Alerts, Trends, Predictive Analytics, Models; components: Metadata, Ingestion, Landing, Data Lake, Data Warehouse, Sandbox, BI, Analytics]
§ Document the business needs, drivers, metadata and actions around data.
§ Analyze the characteristics, patterns and classify if the lake or warehouse or ??? supports the need.
§ Gather together the various groupings of how you can satisfy your data insights needs.
§ Craft your version of the best fit “logical Data Warehouse.”
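The "document" and "analyze" steps above could be captured in a small lookup like the Python sketch below. It is an illustrative assumption, not a method from the webinar: the `DataNeed` fields and the rules in `suggest_store` are hypothetical, loosely based on the earlier slides (warehouses for predefined, structured reporting; lakes for experimentation and unstructured data).

```python
from dataclasses import dataclass


@dataclass
class DataNeed:
    """One documented business need (hypothetical fields for illustration)."""
    name: str
    structured: bool   # is the data structured?
    predefined: bool   # is the business question known in advance?
    exploratory: bool  # does the need involve experimentation?


def suggest_store(need: DataNeed) -> str:
    """Suggest a store for a need; the rules are assumptions, not a standard."""
    if not need.structured or need.exploratory:
        return "data lake"
    if need.predefined:
        return "data warehouse"
    # Mirrors the "???" on the slide: some needs fit neither store cleanly.
    return "undecided"


needs = [
    DataNeed("monthly sales report", structured=True, predefined=True, exploratory=False),
    DataNeed("clickstream exploration", structured=False, predefined=False, exploratory=True),
    DataNeed("ad hoc finance query", structured=True, predefined=False, exploratory=False),
]

# Group the needs by suggested store, as in the "gather together" step.
groupings = {}
for need in needs:
    groupings.setdefault(suggest_store(need), []).append(need.name)

print(groupings)
```

Grouping the results this way gives a starting inventory for the "logical Data Warehouse" discussion; real criteria would come from the documented business needs rather than three boolean flags.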
Q&A
Thank you! See you Thursday, March 2 for the next webinar, Descriptive, Prescriptive and Predictive Analytics
John Ladley @[email protected]
Kelle O’Neal @[email protected]