www.firstsanfranciscopartners.com Produced by: MONTHLY SERIES Brought to you in partnership with: February 2, 2017 Data Lake vs. Data Warehouse

DI&A Slides: Data Lake vs. Data Warehouse


Page 1: DI&A Slides: Data Lake vs. Data Warehouse

The First Step in Information Management

www.firstsanfranciscopartners.com

Produced by:

MONTHLY SERIES

Brought to you in partnership with:

February 2, 2017

Data Lake vs. Data Warehouse

Page 2: DI&A Slides: Data Lake vs. Data Warehouse

Topics for Today’s Webinar

© 2017 First San Francisco Partners www.firstsanfranciscopartners.com

§ Defining the Data Lake and Data Warehouse

§ Key differences between the Data Lake and Data Warehouse

§ How to optimize the Data Lake

§ How to optimize the Data Warehouse

§ Sample Data Lake and Data Warehouse architectures (+ use cases)

§ How a Data Lake can solve the problems of a Data Warehouse

§ Key findings and takeaways

§ Wrap-up: Combine?

Page 3: DI&A Slides: Data Lake vs. Data Warehouse

Poll

Which type of data repository does your organization currently have and actively use?

§ Data Lake

§ Data Warehouse

§ Both a Data Lake and Data Warehouse

§ Neither

If your organization has a data repository, are there plans to enhance (i.e., improve, streamline and/or upgrade or replace) it in 2017?

§ Yes, we will likely make some changes this year.

§ No, we are not likely to make any changes this year.

§ Not sure what we'll do this year.

Page 4: DI&A Slides: Data Lake vs. Data Warehouse

Defining the Data Lake and Data Warehouse (Gartner)

§ A Data Warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.

§ A Data Lake is a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores.
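
As a rough illustration of the two Gartner definitions above, the following Python sketch lands one source extract in a "lake" as an exact copy and loads the same extract into a "warehouse" as an aggregate summary. The record layout, the function names (`land_in_lake`, `load_into_warehouse`) and the in-memory dictionaries are assumptions made for illustration only; they are not part of the webinar material.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract from a transaction system (assumed layout).
RAW_EXTRACT = (
    "order_id,region,amount\n"
    "1,West,100.50\n"
    "2,East,20.00\n"
    "3,West,9.50\n"
)


def land_in_lake(lake: dict, name: str, payload: str) -> None:
    """Store an exact copy of the source payload, untouched."""
    lake[name] = payload


def load_into_warehouse(warehouse: dict, payload: str) -> None:
    """Combine the extract into an aggregate, summary form (sales by region)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(payload)):
        totals[row["region"]] += float(row["amount"])
    warehouse["sales_by_region"] = dict(totals)


lake, warehouse = {}, {}
land_in_lake(lake, "orders_2017-02-02.csv", RAW_EXTRACT)
load_into_warehouse(warehouse, RAW_EXTRACT)
```

The lake keeps every source field for later, as yet unknown questions, while the warehouse keeps only what the predefined report needs.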

BIG DATA

Page 5: DI&A Slides: Data Lake vs. Data Warehouse

Defining the Data Lake and Data Warehouse

Think of a Data Mart as a store of bottled water: it’s cleansed, packaged, and structured for easy consumption. The Data Lake, meanwhile, is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in or take samples.

James Dixon

Pentaho CTO and creator of the term Data Lake

Page 6: DI&A Slides: Data Lake vs. Data Warehouse

Key Differences Between the Data Lake and Data Warehouse

Analysis Source: “A Big Data Cheat Sheet: What Marketers Want to Know” by Tamara Dull

Page 7: DI&A Slides: Data Lake vs. Data Warehouse

Data Lake Challenges

Clutter

Sandbox-like

Limited # of SMEs

Governance not built in

Security

Privacy

Inflexible

Resource-intensive

Over-confidence in capabilities

Page 8: DI&A Slides: Data Lake vs. Data Warehouse

Traditional Enterprise Data Warehouse Challenges

[Diagram labels: Traditional Data Sources (Source), Stage, ODS, Existing EDW (Data Warehouse), Mart 1 to Mart n, BI/Analytical Tools, Analyst, Research]

§ Ideally based on an enterprise model, which has proven difficult

§ Adding new data and subjects takes a long time

§ When the desired number of users is achieved, performance is not acceptable

§ Without a lot of governance, usability rarely meets expectations

§ Without the new data I need, I am better off gathering and storing it myself

§ Scalability is limited w/o significant investment in migration and infrastructure

§ HADOOP requires training and tends to silo

Page 9: DI&A Slides: Data Lake vs. Data Warehouse

Use Cases: Business Strategy Drives Everything in the Data Lake*

§ Cost Center (i.e. not designed for revenue generation)
− Analytical tools for internal use
− Elastic computing for internal infrastructure optimization

§ Operational Differentiator (i.e. technology is used to differentiate offering)
− Cloud and analytics are supporting the product offering
− Analytics are used to provide differentiating features

Example: Johnson and Johnson

§ Revenue Multiplier (i.e. cloud and Big Data analytics integrated into the business offerings)
− Elastic infrastructure is offered to other companies as an infrastructure service (marketing analysis, back-up, analytics testing, etc.)
− Big Data analytics are sold as a data service

Example: eBay, Netflix

Example: NY Stock Exchange

The starting point for roles and responsibilities is to decide the purpose of the Data Lake.

* Orbis Technologies

Large Medical Device Manufacturer

Page 10: DI&A Slides: Data Lake vs. Data Warehouse

Use Case: Data Warehouse Architecture to Address Massive Market Changes

[Diagram labels: Data “Warehouse”; Sources: Apps, Internal Data Sources, External Data Sources, Others (e.g., Acxiom; e.g., MDM); Operational Data; Analytical Data; Collection, Cleansing, Integration; ETL (Extract, Transform, Load); Enterprise Data Repositories; Marts; Presentation and Delivery; Delivery/Presentation Layer; End-Users: Reporting, Query/Analytics, Standard Extracts]

Agent and Policyholder Retention

Independent Agents - Easy to switch allegiances

Sales and Marketing

Customer - Widely changing demographics and markets

Underwriting

Claims Management

Page 11: DI&A Slides: Data Lake vs. Data Warehouse

But This is Not About Choosing

Data Insight Maturity

Usage basis: What happened? | Why did it happen? | What will happen? | Make it happen by itself | What do I want to happen? | What should we do next?

Perceived Maturity: Reporting | Analyzing | Predictive | Operationalize | Adaptive | Foresight

Capability: Survival | Defined | Managed | Optimized / Autonomous

Characteristics: Batch ETL | ETL / EAI | Web Services | Streaming

Page 12: DI&A Slides: Data Lake vs. Data Warehouse

But This is Not About Choosing

Data Insight Maturity (chart repeated from the previous slide, with two added labels: “DW-ish” and “DL-ish”)

Page 13: DI&A Slides: Data Lake vs. Data Warehouse

How to Optimize the Data Lake

§ Design for rapid and scalable ingestion.

§ Know, govern and protect your data.

§ Remove data silos and mitigate chaos with pervasive data quality/management.

§ Ensure your Data Lake is not isolated or crudely bolted onto existing infrastructure.

§ Monitor performance (ensure no degrading).

§ Assess your users’ capabilities and identify gaps and critical skills required to expertly swim in the lake. Then create a skills development program to cultivate required skills and bridge any gaps.

Tips
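
One way to read the "know, govern and protect your data" tip is that nothing lands in the lake without minimal metadata attached. The Python sketch below shows that idea under stated assumptions: the `CatalogEntry` fields, the sensitivity levels and the `ingest` function are hypothetical names chosen for illustration, and the lake and catalog are plain dictionaries rather than any real storage or catalog product.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalogEntry:
    """Minimal metadata kept for every object landed in the lake (assumed fields)."""
    name: str
    source: str
    owner: str
    sensitivity: str  # assumed levels: "public", "internal", "restricted"
    sha256: str
    ingested_at: str


ALLOWED_SENSITIVITY = {"public", "internal", "restricted"}


def ingest(lake: dict, catalog: dict, name: str, payload: bytes,
           source: str, owner: str, sensitivity: str) -> CatalogEntry:
    """Land the payload unchanged, but only with governance metadata attached."""
    if not owner:
        raise ValueError("every data set needs a named owner")
    if sensitivity not in ALLOWED_SENSITIVITY:
        raise ValueError(f"unknown sensitivity level: {sensitivity!r}")
    entry = CatalogEntry(
        name=name,
        source=source,
        owner=owner,
        sensitivity=sensitivity,
        sha256=hashlib.sha256(payload).hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )
    lake[name] = payload      # exact copy of the source bytes
    catalog[name] = entry     # what we know about it
    return entry


# Usage with made-up values.
lake, catalog = {}, {}
entry = ingest(lake, catalog, "claims.json", b'{"claim_id": 1}',
               source="claims_app", owner="claims-team", sensitivity="restricted")
```

Rejecting un-owned or unclassified data at ingestion time is a design choice in this sketch; a real program might quarantine such data instead of refusing it.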

Page 14: DI&A Slides: Data Lake vs. Data Warehouse

How to Optimize the Data Warehouse

§ Understand and reinforce the strengths of the four essential components of the Data Warehouse, including:
− Focus on supporting self-service and ease of use of Metadata
− Strengthen the Structure by boosting consistency and understanding of performance
− Establish trust by ensuring Quality
− Improve accuracy and reduce risk with Governance

Tips

Page 15: DI&A Slides: Data Lake vs. Data Warehouse

How a Data Lake Can Solve the Problems of a Data Warehouse

Data Warehouse blended with Data Lake

Data Warehouse:

§ Lack of agility

§ Performance

§ Hard to extend

§ Structured data only

Data Lake:

§ Enables experimentation

§ Satisfies timing and turnaround issues

§ Allows unstructured data

Page 16: DI&A Slides: Data Lake vs. Data Warehouse

How a Data Lake Can Solve the Problems of a Data Warehouse

Data Warehouse blended with Data Lake (lists repeated from the previous slide)

Data Lake technology to leapfrog Data Warehouse

§ Organizations without a data warehouse simply start with a Data Lake

§ Or organizations that need to evolve their warehouse replace it with a Data Lake

Page 17: DI&A Slides: Data Lake vs. Data Warehouse

Key Findings and Takeaways

§ Over time, this is not an “either/or” debate.

§ Understand your overall requirements (based on business needs) and BLEND Data Warehouse and Data Lake capabilities, as well as the other structures related to these, for a best fit or logical data warehouse.

Tips

Page 18: DI&A Slides: Data Lake vs. Data Warehouse

Wrap-up – When in Doubt…

Sample Architecture

[Diagram labels: Business Need (Operational, Managerial, Analysis, Analytics); Business Area; Reports, Ad hoc, BI, Alerts, Trends, Predictive Analytics, Models; Metadata; Ingestion; Landing; Data Lake; Data Warehouse; Sandbox; BI; Analytics]

Page 19: DI&A Slides: Data Lake vs. Data Warehouse

Wrap-up – When in Doubt…

Sample Architecture (diagram repeated from the previous slide)

§ Document the business needs, drivers, metadata and actions around data.

§ Analyze the characteristics, patterns and classify if the lake or warehouse or ??? supports the need.

§ Gather together the various groupings of how you can satisfy your data insights needs.

§ Craft your version of the best fit “logical Data Warehouse.”
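
The "document, analyze, classify" steps above can be sketched as a small rule of thumb in Python. The three yes/no traits and the rules in `classify` are assumptions pieced together from earlier slides (predefined reporting needs for the warehouse; experimentation and unstructured data for the lake); the webinar does not prescribe this logic, and `BusinessNeed` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass
class BusinessNeed:
    """One documented business need and its data characteristics (assumed traits)."""
    name: str
    predefined_reporting: bool  # aggregate reporting for a known, stable question
    unstructured_data: bool     # text, logs, images and similar sources
    experimentation: bool       # exploratory or trial analysis


def classify(need: BusinessNeed) -> str:
    """Suggest which structure supports the need; a starting point, not a verdict."""
    wants_lake = need.unstructured_data or need.experimentation
    wants_warehouse = need.predefined_reporting
    if wants_lake and wants_warehouse:
        return "blend"
    if wants_lake:
        return "data lake"
    if wants_warehouse:
        return "data warehouse"
    return "unclassified"  # the "???" case: needs a closer look


# Made-up needs for illustration.
needs = [
    BusinessNeed("Monthly claims report", True, False, False),
    BusinessNeed("Churn model prototyping", False, True, True),
    BusinessNeed("Agent retention dashboard with call notes", True, True, False),
]
groupings = {need.name: classify(need) for need in needs}
```

Grouping the results, as in `groupings`, mirrors the "gather together the various groupings" step before crafting a best-fit logical data warehouse.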

Page 20: DI&A Slides: Data Lake vs. Data Warehouse

Q&A


Page 21: DI&A Slides: Data Lake vs. Data Warehouse

Thank you! See you Thursday, March 2 for the next webinar, Descriptive, Prescriptive and Predictive Analytics

John Ladley @[email protected]

Kelle O’Neal @[email protected]

© 2016 First San Francisco Partners www.firstsanfranciscopartners.com