@SnowflakeDB #CloudAnalytics17 LONDON

London 4 - Enabling Agile Data Warehouse… · Enabling the Agile Data Warehouse Steve Herskovitz VP Sales Engineering, Snowflake Computing • Agile Warehouse Scaling • Separation



Page 2: Enabling the Agile Data Warehouse

Steve Herskovitz, VP Sales Engineering, Snowflake Computing

Page 3: Enabling the Agile Data Warehouse - Agenda

• Agile Warehouse Scaling
  • Separation of Workloads
  • Virtual WH Scaling Techniques
• Agile Data Lifecycle
  • Cloning
• Agile Data Analytics
  • Time Travel
• Real Customer Story
  • GTA (Gulliver's Travel Associates)

Page 4: Agile Warehouse Scaling

Separation of Workloads

Page 5: Separation of Workloads

• Struggle: multiple workloads sharing a fixed resource
• Overnight batch ETL
  • ETL must complete before business workloads start
  • Planned or unexpected data surges can cause ETL to run late
  • Worse yet, overnight in the US is exactly the UK business day
  • ETL and business workloads impact each other's SLAs
• Competing business workloads
  • Sales, Marketing, Finance, Data Science
• Conventional solutions
  • Divide fixed resource into time slots for each workload
  • Complex workload prioritization schemes
  • Periodic or seasonal surges handled by locking out some users

Page 6: Separation of Workloads

• Struggle: multiple workloads sharing a fixed resource
• Snowflake solution
  • Assign each workload its own Virtual WH
  • At a minimum, two WH: ETL and Business
    • ETL can run continuously if it makes sense for the business
    • ETL/Business workload contention is eliminated
  • Further subdivide business workloads into own clusters as needed
    • Eliminate contention between Sales, Marketing, Finance, Data Science workloads
    • International groups can operate clusters on their own local time schedules
    • Permits departmental chargebacks

[Diagram: shared Databases with a separate Virtual Warehouse for each of ETL & Data Loading, Finance, Test/Dev, Marketing, Sales, and Research (Business Workloads)]

Page 7: Agile Warehouse Scaling

Techniques

Page 8: Warehouse Scaling Techniques

• Increase T-shirt size
  • More data being analyzed
  • More complex queries
  • Get some concurrency boost
• Multi-cluster WH is best for concurrency
  • Workload queries have usual weight
  • But more of them, e.g. 20 dashboard users rather than the usual 5
• Combine these two techniques for best effect

Page 9: MCWH (Multi-cluster WH) Scaling Algorithms - Scaling Up

• Scheduler checks if it should spin up another cluster
  • Queries must queue for > 30 seconds
  • Spinning up cluster is often immediate (for XXL or smaller)
  • Queries begin to get load-balanced across new clusters
  • One-minute rule: 60 seconds of load balancing before next queuing interval
• Repeat up to maximum clusters configured for warehouse
• Designed to
  • Balance responsiveness against cost
  • Ensure SLAs
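
The scale-up rule above can be read as a small decision function. The sketch below is a toy model, not Snowflake's scheduler: the 30-second queue threshold and the 60-second load-balancing interval come from the slide, while the function name, its parameters, and passing the timings in explicitly are assumptions made for illustration.

```python
QUEUE_THRESHOLD_S = 30      # from the slide: queries must queue for > 30 seconds
REBALANCE_INTERVAL_S = 60   # from the slide: the one-minute rule


def should_add_cluster(longest_queue_wait_s, seconds_since_last_scale_up,
                       running_clusters, max_clusters):
    """Return True if this toy scheduler would start one more cluster."""
    if running_clusters >= max_clusters:
        return False  # never exceed the maximum configured for the warehouse
    if seconds_since_last_scale_up < REBALANCE_INTERVAL_S:
        return False  # let load balancing settle before the next check
    return longest_queue_wait_s > QUEUE_THRESHOLD_S


print(should_add_cluster(45, 120, running_clusters=1, max_clusters=3))  # True
```

Calling the function repeatedly as clusters are added mirrors the "repeat up to maximum clusters" step.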

Page 10: MCWH Scaling Algorithms - Load Balancing

• Scheduler checks for clusters to distribute queries
  • Cluster is active (not quiesced)
  • Cluster has latest version of software (in case system update in progress)
  • Cluster has head-room for more queries
  • Cluster is the least busy
  • Session affinity breaks any ties (for cache)
• Designed to maximize
  • Individual query performance
  • Overall throughput
  • Overall concurrency
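
The selection criteria above amount to a filter followed by a tie-break. This is a minimal sketch under stated assumptions: "least busy" is measured here by the number of running queries, and the `Cluster` fields, the `capacity` limit, and `pick_cluster` are hypothetical names, not part of any Snowflake interface.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    active: bool            # False once the cluster is quiesced
    latest_software: bool   # False while a system update is still rolling out
    running_queries: int
    capacity: int           # hypothetical per-cluster concurrency limit


def pick_cluster(clusters, session_cluster=None):
    """Pick a cluster for the next query, or None if the query must queue."""
    eligible = [c for c in clusters
                if c.active and c.latest_software
                and c.running_queries < c.capacity]
    if not eligible:
        return None
    least = min(c.running_queries for c in eligible)
    tied = [c for c in eligible if c.running_queries == least]
    # Session affinity only breaks ties, so a warm cache is reused when that costs nothing.
    for c in tied:
        if c.name == session_cluster:
            return c
    return tied[0]


clusters = [
    Cluster("c1", True, True, 3, 8),
    Cluster("c2", True, True, 1, 8),
    Cluster("c3", True, True, 1, 8),
    Cluster("c4", False, True, 0, 8),   # quiesced, so never chosen
]
print(pick_cluster(clusters, session_cluster="c3").name)  # c3
```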

Page 11: MCWH Scaling Algorithms - Scaling Down

• Scheduler checks if it can spin down a cluster
  • One-minute rule: 60 seconds of load balancing before checking if WH is underloaded
  • Check if WH with one less cluster could have handled the load over 15 minutes
  • Quiesce cluster: finish current queries but accept no new queries
  • Wait another 15 minutes before checking if it can quiesce another cluster
• Repeat down to minimum clusters configured for warehouse
• Designed to
  • Maximize the value of the running clusters
  • Minimize the cost of the MCWH
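
The "one less cluster" check above can be sketched as follows. The model is deliberately simple and rests on assumptions: load is represented as a list of concurrent-query counts sampled over the last 15 minutes, and each cluster is given a fixed hypothetical capacity. The real scheduler's load metric is not described in the slides.

```python
def can_quiesce_one(load_samples, running_clusters, min_clusters,
                    per_cluster_capacity):
    """True if one fewer cluster could have carried every sampled load.

    load_samples: concurrent-query counts observed over the last 15 minutes.
    """
    if running_clusters <= min_clusters:
        return False  # never drop below the minimum configured for the warehouse
    reduced_capacity = (running_clusters - 1) * per_cluster_capacity
    return all(load <= reduced_capacity for load in load_samples)


window = [5, 9, 12, 7]
print(can_quiesce_one(window, running_clusters=3, min_clusters=1,
                      per_cluster_capacity=8))  # True
```

After a cluster is quiesced, the same check would run again only after a fresh 15-minute window, which is what keeps scale-down more conservative than scale-up.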

Page 12: Warehouse Scaling - Best Practices

• Anticipated surges
  • Explicitly increase WH nodes (T-shirt size) when expecting more data
  • Explicitly increase MCWH minimum clusters when expecting more queries
  • Can do both at once with ALTER WAREHOUSE
  • Use cron or other scheduling/orchestration tool
• Unanticipated surges
  • Rely on MCWH maximum clusters for some extra headroom
• Maximize
  • Responsiveness for users
  • Throughput and value extracted from variable compute power
• Minimize
  • Cost and administrative overhead
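
As an illustration of doing both at once, the helper below builds a single ALTER WAREHOUSE statement that changes the T-shirt size and the cluster counts together. The warehouse name `bi_wh`, the helper name, and the subset of size names are assumptions. The helper only builds the text, so it can be handed to whichever client or scheduler is in use.

```python
# A subset of warehouse size names, listed here only for input validation.
VALID_SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE"]


def surge_statement(warehouse, size, min_clusters, max_clusters):
    """Build one ALTER WAREHOUSE statement that resizes and sets cluster counts.

    `warehouse` is assumed to be a trusted identifier, not user input.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    return (f"ALTER WAREHOUSE {warehouse} SET "
            f"WAREHOUSE_SIZE = '{size}' "
            f"MIN_CLUSTER_COUNT = {min_clusters} "
            f"MAX_CLUSTER_COUNT = {max_clusters};")


print(surge_statement("bi_wh", "LARGE", 2, 4))
```

A cron entry or orchestration task could issue one such statement before a known peak and a second one afterwards to restore the usual settings.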

Page 13: Agile Data Lifecycle

Page 14: Agile Data Lifecycle

• Separation of Workloads
  • Individual virtual warehouse for each dev/test/prod functional area
• CLONE for dev/test
  • Full logical copy of the data, but uses no extra storage
  • Test/dev operations against clone have no effect on original data
  • Security
    • RBAC limits dev/test access to clone and not production data
    • Secure Views permit role- or user-based obfuscation/masking/projection
  • Clone to TRANSIENT reduces storage usage by dev/test operations
    • TRANSIENT tables can have retention period set to 0 days if time-travel is not part of your app
• Business Impact: better quality code
  • Dev and test teams are working on data at scale, see true app performance
  • Full range of values means fewer surprises when app encounters live data

Page 15: Demo

Page 16: Scenario 1

• Create development (DEV) and integration (INT) databases from production (PROD)

[Diagram: PROD database (schema PUBLIC, Table A and Table B) with CLONE arrows to INT and DEV, each showing schema PUBLIC with Table A and Table B]

Page 17: Scenario 2: new development

• Create two new tables, C and D, in the development (DEV) database

[Diagram: PROD and INT each show schema PUBLIC with Table A and Table B; DEV shows Table A, Table B, Table C, and Table D]

Page 18: Scenario 2: new development

• Mini-release: promote table C for integration testing

[Diagram: PROD, INT, and DEV databases (schema PUBLIC); Table C appears outside DEV, labelled CREATE TABLE LIKE]

Page 19: Scenario 2: new development

• Deploy to production: promote table C to PROD database

[Diagram: PROD, INT, and DEV databases (schema PUBLIC); Table C appears in two further places, labelled CREATE TABLE LIKE]

Page 20: Scenario 2: new development

• Refresh dev: get latest PROD data into DEV and INT

[Diagram: databases PROD, INT, DEV, and DEV2 with PUBLIC schemas and Tables A to D, connected by CLONE arrows]
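
The demo steps above map onto a handful of SQL statements. The sketch below collects them as text. It is one plausible reading of the diagrams, not the presenter's script: the database names `prod_db`, `int_db`, `dev_db`, and `dev2_db`, the table name `c`, and the choice to refresh INT in place while cloning a fresh DEV2 are all assumptions. `CREATE TABLE … LIKE` copies only the table definition, which matches promoting a new table without its development data.

```python
def lifecycle_statements(prod="prod_db", integ="int_db",
                         dev="dev_db", dev2="dev2_db"):
    """Return the SQL text for each demo step, keyed by a short step name."""
    return {
        # Scenario 1: zero-copy clones of production
        "create_envs": [
            f"CREATE DATABASE {integ} CLONE {prod};",
            f"CREATE DATABASE {dev} CLONE {prod};",
        ],
        # Mini-release: copy only the definition of table C into integration
        "promote_to_int": [
            f"CREATE TABLE {integ}.public.c LIKE {dev}.public.c;",
        ],
        # Deploy: copy the tested definition into production
        "promote_to_prod": [
            f"CREATE TABLE {prod}.public.c LIKE {integ}.public.c;",
        ],
        # Refresh: re-clone production; unfinished work stays in the old DEV
        "refresh": [
            f"CREATE OR REPLACE DATABASE {integ} CLONE {prod};",
            f"CREATE DATABASE {dev2} CLONE {prod};",
        ],
    }


for step, statements in lifecycle_statements().items():
    print(step)
    for sql in statements:
        print("  " + sql)
```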

Page 21: Agile Data Lifecycle

• CLONE for Data Scientists
  • Quick and safe sandbox for discovery and testing
  • Combine with own virtual warehouse for complete isolation
  • Business Impact: better data science
    • More fine-grained data over longer time intervals
    • Deeper insights, better forecasting, more monetizable results
• CLONE for Compliance
  • Monthly, quarterly, annual clones: financial reporting, auditing requirements
  • Business Impact: simpler compliance
    • Your "backups" are live and immediately available

Page 22: Time Travel

Page 23: Agile Data Analytics

• CLONE operates on metadata repo
  • Table micro-partitions are tracked by pointers in metadata repo
  • Cloning copies pointers only, not the micro-partitions
• Time Travel leverages metadata pointers
  • Pointer has millisecond-granularity timestamp
  • Snowflake knows which micro-partitions are active in your table at any moment
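
A toy model can make the pointer idea above concrete. It is a sketch for intuition only and does not describe Snowflake's actual metadata format: the `Pointer` and `Table` classes, their fields, and the millisecond integers are all hypothetical. It shows that a clone duplicates pointers rather than data, and that timestamped pointers are enough to answer "what was active at time T".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pointer:
    partition_id: str
    added_ms: int                      # when the micro-partition became active
    removed_ms: Optional[int] = None   # None while it is still active


@dataclass
class Table:
    pointers: list = field(default_factory=list)

    def add(self, partition_id, now_ms):
        self.pointers.append(Pointer(partition_id, now_ms))

    def remove(self, partition_id, now_ms):
        for p in self.pointers:
            if p.partition_id == partition_id and p.removed_ms is None:
                p.removed_ms = now_ms

    def active_at(self, ts_ms):
        """Time Travel: ids of the partitions that were active at ts_ms."""
        return sorted(p.partition_id for p in self.pointers
                      if p.added_ms <= ts_ms
                      and (p.removed_ms is None or ts_ms < p.removed_ms))

    def clone(self):
        """Zero-copy clone: duplicate the pointers, never the partition data."""
        return Table([Pointer(p.partition_id, p.added_ms, p.removed_ms)
                      for p in self.pointers])


t = Table()
t.add("mp1", 1_000)
t.add("mp2", 2_000)
dev = t.clone()
t.remove("mp1", 3_000)   # an update retires mp1 ...
t.add("mp3", 3_000)      # ... and writes its replacement
print(t.active_at(2_500))    # ['mp1', 'mp2']
print(t.active_at(3_500))    # ['mp2', 'mp3']
print(dev.active_at(3_500))  # ['mp1', 'mp2']
```

The clone keeps seeing `mp1` because later changes to the original touch only the original's pointers.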

Page 24: Agile Data Analytics

• Data Retention for Time Travel
  • Default is 24 hours, maximum 90 days
  • Configurable per-table by table owner
  • Uses more storage because it keeps the micro-partitions around longer
• Simple SQL syntax
  • SELECT cols… FROM t1 AT(TIMESTAMP => timestamp);
  • CREATE obj2 CLONE obj1 BEFORE(STATEMENT => query-id);

Page 25: Agile Data Analytics

• SELECT count(*) FROM lineitem AT(TIMESTAMP => '2020-01-01 12:00:00'::timestamp);
• SELECT (SELECT count(*) FROM lineitem AT(OFFSET => -60*2)) before_etl, (SELECT count(*) FROM lineitem) after_etl;

Page 26: Demo

Page 27: Agile Data Analytics

• Business Impact of Time Travel
  • UNDROP: table, schema, database
  • Un-TRUNCATE table (CLONE table, then swap names)
  • Recover from ETL/ELT update (CLONE database, then swap names)
  • Temporal queries
    • For example, what was inventory on a given date?
    • Type 2 slowly changing dimensions
      • Up to 90-day running window
      • Fast prototype for longer-window Type 2 analytics
  • Test predictive models against historical data
  • Don't need to make and store daily backups
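
The "CLONE table, then swap names" recovery above could look like the two statements built below. This is a sketch with assumed names: the `_restored` suffix and the helper are hypothetical, and the query id must be the id of the statement being undone, shown here only as a placeholder. `ALTER TABLE … SWAP WITH` is used as one way to swap the names in a single step.

```python
def untruncate_statements(table, bad_query_id):
    """SQL text to bring `table` back to its state just before `bad_query_id`."""
    restored = f"{table}_restored"
    return [
        # Clone the table as it was immediately before the offending statement
        f"CREATE TABLE {restored} CLONE {table} "
        f"BEFORE (STATEMENT => '{bad_query_id}');",
        # Exchange the two names so the original name holds the recovered data
        f"ALTER TABLE {restored} SWAP WITH {table};",
    ]


for sql in untruncate_statements("lineitem", "<query-id>"):
    print(sql)
```

After the swap, the damaged version remains available under the `_restored` name until it is dropped.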

Page 28: Agile Data Warehouse - Summary

• Separation of Workloads
  • Individual virtual warehouse for each dev/test/prod functional area
• Virtual Warehouse scaling
  • T-shirt sizes, number of WHs, and MCWH
• CLONE for dev/test and other use cases
  • Full logical copy of the data, but uses no extra storage
• Time Travel and CDP
  • SELECT "as of" for testing of predictive models, type 2 changing dimensions
  • Easy "undo" of updates: UNDROP, un-TRUNCATE, CLONE "as of"

Page 29: Customer Story - GTA

Page 30: Case Study: Enabling the Agile Data Warehouse with Snowflake

Page 31: Data Warehousing projects…

Complex, Long, Change = Risky

• How Snowflake helped GTA be more Agile.

1. Focus on Value: Choosing the right tools
2. Our plumbing: Pipeline transparency
3. Agile Development: Iteration, prototyping & testing
4. Scaling: start small, remain flexible

Page 32: 1. Choosing your tools

• What would let us focus on adding value quickest?
• What is going to give highest productivity?
• What is lowest risk, but still future proof?

Page 33: 1. Our new stack…

- Don't re-invent the wheel
- Standard, proven, & skills availability
- Lower dependency on IT
- Python, open source
- SQL
- On-demand: get going quickly
- Zero admin: we lost our DBA
- Managed platform

[Diagram: stack components Airflow, Snowflake, EC2 + S3, GoldenGate]

Page 34: 2. Plumbing

Extract, Load, then Transform

Captain Obvious is here to say something obvious…

We do everything in Snowflake
- Replicate source
- Transparency!
- No ETL tool

Page 35: 2. We ingest every 5 mins… Data age: 1.5 days -> 1.5 hrs

[Diagram: pipeline orchestrated by Airflow; sources Bookings, Inventory, Finance, Salesforce; labels On-premise, AWS S3, BI/Viz, Python Jupyter notebooks; data-age axis marked 5 min, 60 min, 90 min]

Page 36: 3. Agile Development - Cloning!

[Diagram: S3 pre-prod data feed and S3 prod data feed; environments Pre-Production, Production, Dev(n)]

Page 37: 3. Using Cloning - next

[Diagram: S3 prod data feed, Deltas, Production]

• Maintain 1 feed
• (n) Dev & Test pairs on-demand (cost)
• Live deltas feed from point of cloning

Page 38: 4. Scaling - flexibility!

Before you start your project, how confident are you on:

– Load & concurrency? Storage? Test & Dev?
– Ad-hoc Analytics / Data Science pulls?

Started small, and scaled up or out as needed:

– Direct BI querying on large datasets
– Project workloads
– Re-processing

Page 39: 4. Snowflake reduces opportunity cost to try things

My new pricing analysis code takes 3m to run…

Can I run it 7000 times?

Snowflake scaling = Reducing opportunity costs for experimentation
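
A quick calculation shows why scaling changes the answer to that question. The sketch assumes "3m" means three minutes and that runs spread evenly over identical parallel workers with no overhead. The worker counts are illustrative and are not figures from the talk.

```python
import math


def wall_clock_hours(runs, minutes_per_run, parallel_workers):
    """Elapsed hours if runs are spread evenly over identical parallel workers."""
    batches = math.ceil(runs / parallel_workers)
    return batches * minutes_per_run / 60


for workers in (1, 10, 100):
    print(workers, wall_clock_hours(7000, 3, workers))
# 1 350.0
# 10 35.0
# 100 3.5
```

Run one at a time, the experiment takes roughly two weeks of wall-clock time, while 100 parallel workers would bring it down to an afternoon under these idealized assumptions.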

Page 40: Agile tips

– Agile modelling on whiteboards with the business
– Prototyping: share early in Excel & BI tool
– Iterate: 1st version in use early
– Milestones: no big bang
– Verification: plan time for this and tackle early

Page 41: Takeaways on Snowflake + agile

– Choose tools that fit your team's skillset
– Choose tools that move you quickly to delivering business value
– Transform in your target environment
– Create an agile development environment
– Choose tools that are flexible and on-demand to start small

Page 42: Q&A

Page 43: Thank You to Our Partners

Platinum

Gold