Postgres as BI platform
Andy Fefelov
mastery.pro
Agenda
• Problem statement
• Open source solution
• ROLAP
• Our architecture review
• Postgres features suitable for BI
• ETL vs ELT (stage-nds-ddm)
• Column data storage
• Configuration
• Special features
• Pros and cons of our solution
Problem statement
• Our customer is one of the largest pharmacy supply chain groups in Ireland
• 4 types of dispensary software
• 250 pharmacies
• To be analyzed:
  • Orders
  • Scripts (prescriptions)
  • Claims
• Goals to be achieved:
  • Purchasing policy optimization
  • A killer feature for marketing
Open source
• SpagoBI
• Pentaho
• Mondrian
• Saiku
• Cubes (databrewery)
ROLAP (R-ROLAP)
• Star schema
  • Facts
  • Dimensions
  • Measures
• No pre-calculated aggregates
• SSD
• Column storage
• ???
• Profit!
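To make the star schema idea concrete, here is a minimal sketch of one fact table joined to one dimension, with the measures aggregated at query time rather than pre-calculated. It uses Python's built-in sqlite3 as a stand-in for Postgres so that it runs anywhere; the table and column names (dim_pharmacy, quantity, amount) are illustrative assumptions, not the schema from the talk.

```python
import sqlite3

# sqlite3 stands in for Postgres here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_pharmacy (pharmacy_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_order_item (
        pharmacy_id INTEGER REFERENCES dim_pharmacy (pharmacy_id),
        quantity    INTEGER,  -- measure
        amount      REAL      -- measure
    );
    INSERT INTO dim_pharmacy VALUES (1, 'Pharmacy A'), (2, 'Pharmacy B');
    INSERT INTO fact_order_item VALUES (1, 2, 10.0), (1, 1, 5.5), (2, 4, 20.0);
""")


def totals_by_pharmacy(conn):
    """Aggregate the measures per dimension member at query time."""
    return conn.execute("""
        SELECT d.name, SUM(f.quantity), SUM(f.amount)
        FROM fact_order_item AS f
        JOIN dim_pharmacy AS d USING (pharmacy_id)
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()


rows = totals_by_pharmacy(conn)
```

Every report pays for the GROUP BY on each request, which is presumably why SSDs and column storage appear on the same slide.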
Our architecture
• Extractors
• Postgres (load, transform)
• Cubes (API)
• Rails + React (UI)
• Saiku (UI)
Architecture - extractors
• Cyclone_client
  • Mssql (2008-2012)
  • Golang
  • CSV + rsync over ssh
• Kachok
  • Web scraper
• Skytools replication
  • From existing products
Architecture – API + UI
• Cubes - cubes.databrewery.org
  • Easy drilling down
  • Slicing and dicing
  • Serves aggregates, dimension details, facts
  • Provides all necessary metadata for a reporting application
• Rails, React
  • Authorization
  • d3, dc, crossfilter
• Saiku
  • Only for back office
Architecture – Postgres (load, transform)
• stage
  • raw data
  • load_something_to_nds(_pharmacy_id integer)
• nds
  • normalized data store
  • load_something_to_ddm(_pharmacy_id integer)
• ddm
  • cubes and snapshots
  • views
Architecture – Postgres (load, transform)
• Stage
  • «Raw» data
  • Cleaned up completely in every ELT cycle
  • Serves as the data source for NDS
Architecture – Postgres (load, transform)
• Normalized Data Store
  • Data is normalized and validated here
  • Serves as the source for ddm
  • Measures for ddm are calculated here
  • The delta for loading into ddm is calculated from the last_updated field
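A delta load driven by last_updated could look roughly like the sketch below. It again uses sqlite3 as a stand-in for Postgres, and the table names (nds_order_item, ddm_fact_order_item), the columns, and the "high-water mark" approach are assumptions made for illustration; the slides only say that the delta is based on last_updated.

```python
import sqlite3

# sqlite3 stands in for Postgres; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nds_order_item (id INTEGER PRIMARY KEY, pharmacy_id INTEGER,
                                 amount REAL, last_updated TEXT);
    CREATE TABLE ddm_fact_order_item (id INTEGER PRIMARY KEY, pharmacy_id INTEGER,
                                      amount REAL, last_updated TEXT);
    INSERT INTO nds_order_item VALUES
        (1, 7, 10.0, '2016-01-01 10:00:00'),
        (2, 7, 20.0, '2016-01-02 10:00:00'),
        (3, 8, 30.0, '2016-01-03 10:00:00');
    INSERT INTO ddm_fact_order_item VALUES (1, 7, 10.0, '2016-01-01 10:00:00');
""")


def load_delta_to_ddm(conn, pharmacy_id):
    """Copy only the rows that changed since the newest row already in ddm."""
    # High-water mark: the latest last_updated already loaded for this pharmacy.
    (high_water,) = conn.execute(
        "SELECT COALESCE(MAX(last_updated), '') FROM ddm_fact_order_item "
        "WHERE pharmacy_id = ?",
        (pharmacy_id,),
    ).fetchone()
    # INSERT OR REPLACE is SQLite syntax; it overwrites rows that were loaded before.
    conn.execute(
        """INSERT OR REPLACE INTO ddm_fact_order_item
           SELECT id, pharmacy_id, amount, last_updated
           FROM nds_order_item
           WHERE pharmacy_id = ? AND last_updated > ?""",
        (pharmacy_id, high_water),
    )
    conn.commit()


load_delta_to_ddm(conn, 7)
loaded_ids = [r[0] for r in conn.execute(
    "SELECT id FROM ddm_fact_order_item ORDER BY id")]
```

Only row 2 is copied for pharmacy 7; row 3 belongs to another pharmacy and waits for its own run.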
Architecture – Postgres (load, transform)
• Dimensional data model
  • Cubes
  • Snapshots
    • Deploy calmly
    • Analyze before/after release states
  • A view is the entry point for the application
Architecture – Postgres (snapshots)
fact_order_item
vw_order_item
s1_order_item
s2_order_item
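The diagram itself did not survive, so the following is only one plausible reading of the four names: s1_order_item and s2_order_item are frozen copies of fact_order_item, and vw_order_item is the view the application reads, repointed at whichever snapshot is current. The sketch uses sqlite3 as a stand-in for Postgres, and the publish_snapshot helper is hypothetical.

```python
import sqlite3

# sqlite3 stands in for Postgres; the snapshot workflow is an assumed reading of the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_order_item (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO fact_order_item VALUES (1, 10.0), (2, 20.0);
""")


def publish_snapshot(conn, snapshot):
    """Freeze fact_order_item into `snapshot` and repoint vw_order_item at it."""
    if snapshot not in ("s1_order_item", "s2_order_item"):
        raise ValueError(snapshot)  # the name is interpolated into SQL, so whitelist it
    conn.executescript(f"""
        BEGIN;
        DROP TABLE IF EXISTS {snapshot};
        CREATE TABLE {snapshot} AS SELECT * FROM fact_order_item;
        DROP VIEW IF EXISTS vw_order_item;
        CREATE VIEW vw_order_item AS SELECT * FROM {snapshot};
        COMMIT;
    """)


def visible_rows(conn):
    """Row count as the application sees it, through the view only."""
    return conn.execute("SELECT COUNT(*) FROM vw_order_item").fetchall()[0][0]


publish_snapshot(conn, "s1_order_item")
conn.execute("INSERT INTO fact_order_item VALUES (3, 30.0)")
conn.commit()
before = visible_rows(conn)   # the view still reads the s1 snapshot
publish_snapshot(conn, "s2_order_item")
after = visible_rows(conn)    # the view now reads the s2 snapshot
```

Because the application only ever touches the view, new data can be loaded into the fact table without changing what users see until the next snapshot is published, and the previous snapshot stays available for before/after comparison.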
Column storage
• Suitable for:
  • aggregations
  • showing a fixed number of columns
• cstore_fdw -> https://github.com/citusdata/cstore_fdw
  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
Column storage
• Our experience:
  • Not faster than vanilla Postgres (say hello to Cubes)
  • Volume reduced up to 12 times. Wow.
  • No way to back up the traditional way (no need?)
  • No support for delete/update (snapshots)
Configuration
• Load profile:
  • Big volume of RW I/O
  • Most of the I/O happens in stage and nds
  • ddm is not heavily loaded
• shared_buffers = ½ RAM
• work_mem = 2GB
• maintenance_work_mem = 3GB
• temp_buffers = 2GB
• effective_cache_size = ½ RAM
• max_wal_size = 32GB
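As a small illustration of how the "½ RAM" rules turn into concrete values, the sketch below renders the settings from this slide for a given amount of RAM. The helper names and the 64 GB example are assumptions; the fixed values are copied from the slide, not tuned recommendations.

```python
def postgres_settings(ram_gb):
    """Return the settings from the slide for a server with `ram_gb` GB of RAM."""
    half_ram = f"{ram_gb // 2}GB"
    return {
        "shared_buffers": half_ram,
        "work_mem": "2GB",
        "maintenance_work_mem": "3GB",
        "temp_buffers": "2GB",
        "effective_cache_size": half_ram,
        "max_wal_size": "32GB",
    }


def render_conf(settings):
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Hypothetical 64 GB server.
conf = render_conf(postgres_settings(64))
```

Note that work_mem applies per sort or hash operation, not per server, so a value this large only fits a load profile with a few heavy ELT sessions rather than many concurrent queries.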
Features
• DDM could be placed on a dedicated server (londiste, pglogical)
• Use COPY/bulk INSERTs, don't use UPDATE (ke ke ke)
• You should think about horizontal and vertical partitioning; please find proper keys for that
• You should think about parallelism from the very beginning
• Use TABLESPACES/PARTIAL INDEXES (and more and more disks)
• You should use a data storage policy
• Statistics should be collected on a tmpfs volume
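Since the load functions take a pharmacy id, parallelism can be as simple as running one unit of work per pharmacy. The sketch below shows that shape with a thread pool; load_pharmacy is a placeholder that only echoes its argument, and the worker count is an arbitrary assumption. In a real setup each worker would open its own database connection and call something like load_something_to_nds.

```python
from concurrent.futures import ThreadPoolExecutor


def load_pharmacy(pharmacy_id):
    """Placeholder for one pharmacy's ELT step (hypothetical).

    A real implementation would open its own connection and run the
    per-pharmacy load function; here it only reports success.
    """
    return pharmacy_id, "ok"


def run_elt(pharmacy_ids, workers=4):
    """Run the per-pharmacy load concurrently; each id is independent work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(load_pharmacy, pharmacy_ids))


# 250 pharmacies, as in the problem statement.
results = run_elt(range(1, 251))
```

Threads are enough here because the heavy lifting would happen inside Postgres; the Python side only waits on the database.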
Features vol. 2
• Use migrations – sqitch by theory
• You'd better test ELT - sqitch by theory
• Use pg_stat_statements (add this to monitoring)
• Use profiling – PLPROFILER 3
• Sometimes you have (not) to use cstore_fdw
• Sometimes you have (not) to use unlogged tables
Pros and cons
• Cons
  • No easy way to scale horizontally
  • Reasonably difficult deployment
• Pros
  • Local data (no big network transfers)
  • Effectively parallelized (thanks to pharmacy_id)
  • PL/pgSQL
Speed limit
• cubes is not fast (due to serialization)
  • json (12 sec)
  • ujson (4 sec)
  • postgres json output (1.5 sec)
• db self time 0.3-0.7 sec