Postgres as BI platform
Andy Fefelov, mastery.pro

Agenda

• Problem statement
• Open source solutions
• ROLAP
• Our architecture review
• Postgres features suitable for BI
• ETL vs ELT (stage - nds - ddm)
• Column data storage
• Configuration
• Special features
• Pros and cons of our solution

Problem statement

• Our customer is one of the largest pharmacy supply chain groups in Ireland
• 4 types of dispensary software
• 250 pharmacies
• To be analyzed:
  • Orders
  • Scripts (prescriptions)
  • Claims
• Goals to be achieved:
  • Purchasing policy optimization
  • A killer feature for marketing

Open source

• SpagoBI
• Pentaho
• Mondrian
• Saiku
• Cubes (databrewery)

ROLAP (R-ROLAP)

• Star schema
  • Facts
  • Dimensions
  • Measures

• No pre-calculated aggregates
• SSD
• Column storage
• ???
• Profit!
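
To make "no pre-calculated aggregates" concrete, here is a minimal sketch of a star schema that is only aggregated at query time. It assumes Python's built-in sqlite3 as a stand-in for Postgres; apart from fact_order_item, which appears later in the deck, the tables and columns (dim_pharmacy, dim_product, county, category, quantity, amount) are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# SQLite stands in for Postgres here. Only fact_order_item comes from the deck;
# the dimension tables and all column names are hypothetical.
SCHEMA = """
CREATE TABLE dim_pharmacy (pharmacy_id INTEGER PRIMARY KEY, county TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_order_item (
    pharmacy_id INTEGER REFERENCES dim_pharmacy,  -- dimension key
    product_id  INTEGER REFERENCES dim_product,   -- dimension key
    quantity    INTEGER,                          -- measure
    amount      REAL                              -- measure
);
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_pharmacy VALUES (?, ?)",
                     [(1, "Dublin"), (2, "Cork")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(10, "OTC"), (20, "Rx")])
    conn.executemany("INSERT INTO fact_order_item VALUES (?, ?, ?, ?)",
                     [(1, 10, 5, 50.0), (1, 20, 2, 80.0),
                      (2, 10, 3, 30.0), (2, 20, 1, 40.0)])
    conn.commit()
    return conn


def aggregate(conn: sqlite3.Connection, attribute: str) -> List[Tuple]:
    """Sum the measures per dimension attribute at query time (nothing is stored).

    `attribute` is interpolated into SQL, so it must come from a trusted whitelist.
    """
    sql = f"""
        SELECT {attribute}, SUM(f.quantity), SUM(f.amount)
        FROM fact_order_item AS f
        JOIN dim_pharmacy AS ph ON ph.pharmacy_id = f.pharmacy_id
        JOIN dim_product  AS pr ON pr.product_id  = f.product_id
        GROUP BY {attribute}
        ORDER BY {attribute}
    """
    return conn.execute(sql).fetchall()
```

Switching from `aggregate(conn, "ph.county")` to `aggregate(conn, "pr.category")` needs no new stored aggregate, only a different GROUP BY; this full scan-and-group pattern is presumably what the SSD and column storage points on the slide are meant to speed up.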

Our architecture

• Extractors
• Postgres (load, transform)
• Cubes (API)
• Rails + React (UI)
• Saiku (UI)

Architecture – extractors

• Cyclone_client
  • MSSQL (2008-2012)
  • Golang
  • CSV + rsync over SSH
• Kachok
  • Web scraper
• Skytools replication
  • From existing products
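
The Cyclone_client path (dump from the dispensary database, then ship CSV files over SSH with rsync) might look roughly like the sketch below. The deck's extractor is written in Go; this Python version is only an illustration, and the file naming scheme, directory layout and remote target (`etl@bi-host:/data/incoming/`) are hypothetical.

```python
import csv
import datetime as dt
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def dump_to_csv(out_dir: Path, table: str, pharmacy_id: int, day: dt.date,
                header: Sequence[str], rows: Iterable[Sequence]) -> Path:
    """Write one extract file per table, pharmacy and day (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{table}_{pharmacy_id}_{day:%Y%m%d}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def rsync_command(local_dir: Path, remote: str) -> List[str]:
    """Build the argv for shipping extracts; run it with subprocess.run(cmd, check=True)."""
    # -a: archive mode, -z: compress in transit, -e ssh: use SSH as the transport.
    # The trailing slash copies the directory's contents, not the directory itself.
    return ["rsync", "-az", "-e", "ssh", f"{local_dir}/", remote]


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    print(dump_to_csv(out, "orders", 42, dt.date(2016, 5, 1),
                      ["order_id", "qty"], [(1, 5), (2, 3)]))
    print(rsync_command(out, "etl@bi-host:/data/incoming/"))
```

Keeping one file per pharmacy and day likely makes re-sending a single failed extract cheap, since rsync only transfers files that changed.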

Architecture – API + UI

• Cubes: cubes.databrewery.org
  • Easy drilling down
  • Slicing and dicing
  • Serves aggregates, dimension details, facts
  • Provides all necessary metadata for a reporting application
• Rails, React
  • Authorization
  • d3, dc, crossfilter
• Saiku
  • Only for back office
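
The operations Cubes is used for here (slicing, dicing, drilling down) boil down to filtering facts by a cut and grouping by one more hierarchy level. The plain-Python sketch below only illustrates that idea; it is not the Cubes API, and the fact fields (year, month, pharmacy_id, amount) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

# Hypothetical fact rows; in the deck, Cubes computes this on top of Postgres.
FACTS = [
    {"year": 2016, "month": 1, "pharmacy_id": 1, "amount": 10.0},
    {"year": 2016, "month": 1, "pharmacy_id": 2, "amount": 5.0},
    {"year": 2016, "month": 2, "pharmacy_id": 1, "amount": 7.0},
    {"year": 2017, "month": 1, "pharmacy_id": 1, "amount": 3.0},
]


def aggregate(facts: Iterable[Mapping], drilldown: List[str],
              cut: Optional[Mapping] = None,
              measure: str = "amount") -> Dict[Tuple, float]:
    """Slice/dice with `cut`, then sum `measure` per drill-down key."""
    cut = cut or {}
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in facts:
        if all(row[attr] == value for attr, value in cut.items()):  # slice / dice
            key = tuple(row[level] for level in drilldown)          # drill-down levels
            totals[key] += row[measure]
    return dict(sorted(totals.items()))


# Drill down from years to months, sliced to a single pharmacy.
by_year = aggregate(FACTS, ["year"])
by_month = aggregate(FACTS, ["year", "month"], cut={"pharmacy_id": 1})
```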

Architecture – Postgres (load, transform)

• stage: raw data
• load_something_to_nds(_pharmacy_id integer): stage -> nds
• nds: normalized data store
• load_something_to_ddm(_pharmacy_id integer): nds -> ddm
• ddm: cubes and snapshots, views
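
Because every load function takes a _pharmacy_id, the chain can run for many pharmacies at once while each pharmacy's own steps stay in order. A minimal Python sketch of such a driver, assuming each load function only touches the rows of its own pharmacy. The steps are plain callables here; in a real setup each one would presumably run something like `SELECT load_something_to_nds(%s)` through a database driver, with one connection per worker (that wiring is an assumption and is not shown in the deck).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence

# A step takes a pharmacy_id, e.g. a hypothetical wrapper around load_something_to_nds.
Step = Callable[[int], None]


def run_pharmacy(pharmacy_id: int, steps: Sequence[Step]) -> int:
    # Within one pharmacy the order matters: stage -> nds -> ddm.
    for step in steps:
        step(pharmacy_id)
    return pharmacy_id


def run_elt(pharmacy_ids: Iterable[int], steps: Sequence[Step],
            max_workers: int = 8) -> List[int]:
    """Run the full chain for all pharmacies in parallel, one task per pharmacy."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_pharmacy, pid, steps) for pid in pharmacy_ids]
        # result() re-raises any exception from a worker, so failures are not lost.
        return [f.result() for f in futures]
```

Threads are likely sufficient for this driver because the heavy lifting happens inside Postgres and the Python side mostly waits on the database.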

Stage
• «Raw» data
• Cleaned up completely in every ELT cycle
• Serves as the data source for NDS
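
A minimal sketch of one stage cycle, again with sqlite3 standing in for Postgres: the stage table is emptied completely, refilled with raw text values, and only rows that pass validation are normalized into NDS. The tables stage_order and nds_order and the integer cycle timestamp are hypothetical. In Postgres the DELETE would be a TRUNCATE, the inserts a COPY, and INSERT OR REPLACE an INSERT ... ON CONFLICT.

```python
import sqlite3
from typing import Iterable, Sequence

# Hypothetical tables: stage keeps raw text, NDS keeps typed, validated rows.
DDL = """
CREATE TABLE stage_order (pharmacy_id TEXT, order_id TEXT, qty TEXT);  -- raw text values
CREATE TABLE nds_order (
    pharmacy_id  INTEGER NOT NULL,
    order_id     INTEGER NOT NULL,
    qty          INTEGER NOT NULL,
    last_updated INTEGER NOT NULL,
    PRIMARY KEY (pharmacy_id, order_id)
);
"""

# A value is valid when it is non-empty and contains only digits.
IS_INT = "{0} <> '' AND {0} NOT GLOB '*[^0-9]*'"


def elt_cycle(conn: sqlite3.Connection, raw_rows: Iterable[Sequence[str]],
              cycle_ts: int) -> None:
    with conn:  # the whole cycle commits or rolls back as one transaction
        conn.execute("DELETE FROM stage_order")  # stage is cleaned up every cycle
        conn.executemany("INSERT INTO stage_order VALUES (?, ?, ?)", raw_rows)
        # Normalize (cast to integers) and validate (drop malformed rows) into NDS.
        conn.execute(f"""
            INSERT OR REPLACE INTO nds_order (pharmacy_id, order_id, qty, last_updated)
            SELECT CAST(pharmacy_id AS INTEGER), CAST(order_id AS INTEGER),
                   CAST(qty AS INTEGER), ?
            FROM stage_order
            WHERE {IS_INT.format('pharmacy_id')}
              AND {IS_INT.format('order_id')}
              AND {IS_INT.format('qty')}
        """, (cycle_ts,))
```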

NDS
• Normalized data store
• Data is normalized and validated here
• Serves as the source for DDM
• Measures for DDM are calculated here
• The delta for loading into DDM is calculated from the last_updated field
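
Delta loading based on last_updated can be sketched with a stored watermark: copy only rows newer than the last value seen, then advance the watermark in the same transaction. This uses sqlite3 in place of Postgres, and the etl_watermark table, the ddm_fact_order table and the integer timestamps are hypothetical.

```python
import sqlite3

# Hypothetical NDS table, DDM target and watermark bookkeeping.
DDL = """
CREATE TABLE nds_order (order_id INTEGER PRIMARY KEY, qty INTEGER,
                        last_updated INTEGER NOT NULL);
CREATE TABLE ddm_fact_order (order_id INTEGER PRIMARY KEY, qty INTEGER);
CREATE TABLE etl_watermark (target TEXT PRIMARY KEY, value INTEGER NOT NULL);
INSERT INTO etl_watermark VALUES ('ddm_fact_order', 0);
"""


def load_delta_to_ddm(conn: sqlite3.Connection) -> int:
    """Copy only NDS rows changed since the previous run, then advance the watermark."""
    with conn:  # data and watermark move together or not at all
        (since,) = conn.execute(
            "SELECT value FROM etl_watermark WHERE target = 'ddm_fact_order'"
        ).fetchone()
        delta = conn.execute(
            "SELECT order_id, qty, last_updated FROM nds_order WHERE last_updated > ?",
            (since,),
        ).fetchall()
        if not delta:
            return 0
        conn.executemany(
            "INSERT OR REPLACE INTO ddm_fact_order (order_id, qty) VALUES (?, ?)",
            [(order_id, qty) for order_id, qty, _ in delta],
        )
        conn.execute(
            "UPDATE etl_watermark SET value = ? WHERE target = 'ddm_fact_order'",
            (max(ts for _, _, ts in delta),),
        )
        return len(delta)
```

The strict `>` comparison assumes last_updated only grows and that no row is committed with an older timestamp after the watermark has moved past it; if that can happen, re-reading a small overlap window is safe because the upsert is idempotent.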

DDM
• Dimensional data model
• Cubes
• Snapshots
  • Deploy calmly
  • Analyze before- and after-release states
  • A view is the entry point for the application

Architecture – Postgres (snapshots)

[Diagram: fact_order_item, vw_order_item, s1_order_item, s2_order_item]
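
One reading of the snapshot diagram: the application only queries vw_order_item, a release rebuilds whichever of s1_order_item / s2_order_item is not in use from fact_order_item, and the view is then repointed in one transaction, so the previous snapshot stays available for before/after comparison. The sketch below implements that interpretation with sqlite3 in place of Postgres (which could use CREATE OR REPLACE VIEW instead of DROP + CREATE); the snapshot_state bookkeeping table and the columns are hypothetical.

```python
import sqlite3
from typing import Optional

SNAPSHOTS = ("s1_order_item", "s2_order_item")


def open_demo_db() -> sqlite3.Connection:
    # isolation_level=None: we issue BEGIN/COMMIT ourselves, DDL included.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE fact_order_item (order_id INTEGER, qty INTEGER)")
    conn.execute("CREATE TABLE snapshot_state (active TEXT)")  # hypothetical bookkeeping
    return conn


def active_snapshot(conn: sqlite3.Connection) -> Optional[str]:
    row = conn.execute("SELECT active FROM snapshot_state").fetchone()
    return row[0] if row else None


def publish_snapshot(conn: sqlite3.Connection) -> str:
    """Rebuild the inactive snapshot from the fact table and repoint the view to it."""
    target = SNAPSHOTS[1] if active_snapshot(conn) == SNAPSHOTS[0] else SNAPSHOTS[0]
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {target}")
        conn.execute(f"CREATE TABLE {target} AS SELECT * FROM fact_order_item")
        conn.execute("DROP VIEW IF EXISTS vw_order_item")
        conn.execute(f"CREATE VIEW vw_order_item AS SELECT * FROM {target}")
        conn.execute("DELETE FROM snapshot_state")
        conn.execute("INSERT INTO snapshot_state (active) VALUES (?)", (target,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return target
```

Until `publish_snapshot` runs, readers of vw_order_item keep seeing the old snapshot even while fact_order_item changes underneath, which is what lets a release be deployed calmly.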

Column storage

• Suitable for:
  • aggregations
  • showing a fixed number of columns
• cstore_fdw -> https://github.com/citusdata/cstore_fdw
  • Compression: reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: only reads column data relevant to the query. Improves performance for I/O-bound queries.
  • Skip indexes: stores min/max statistics for row groups and uses them to skip over unrelated rows.
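
The two read-side ideas listed for cstore_fdw, column projections and min/max skip indexes per row group, can be shown with a toy in-memory column store. This is an illustration under simplifying assumptions (no compression, no on-disk format, tiny row groups), not how cstore_fdw is implemented; the column names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class RowGroup:
    columns: Dict[str, List[Any]]         # column-major storage
    minmax: Dict[str, Tuple[Any, Any]]    # skip index: (min, max) per column


@dataclass
class ColumnTable:
    column_names: Sequence[str]
    row_group_size: int = 2
    groups: List[RowGroup] = field(default_factory=list)

    def load(self, rows: Sequence[Sequence[Any]]) -> None:
        for start in range(0, len(rows), self.row_group_size):
            chunk = rows[start:start + self.row_group_size]
            cols = {name: [row[i] for row in chunk]
                    for i, name in enumerate(self.column_names)}
            stats = {name: (min(vals), max(vals)) for name, vals in cols.items()}
            self.groups.append(RowGroup(cols, stats))

    def scan(self, project: Sequence[str], where: str, lo: Any, hi: Any
             ) -> Tuple[List[Tuple[Any, ...]], int]:
        """Return projected rows with lo <= where <= hi, plus the number of groups read."""
        result: List[Tuple[Any, ...]] = []
        groups_read = 0
        for group in self.groups:
            gmin, gmax = group.minmax[where]
            if gmax < lo or gmin > hi:
                continue  # skip index: no row in this group can match
            groups_read += 1
            keys = group.columns[where]
            wanted = [group.columns[c] for c in project]  # projection: other columns untouched
            for i, key in enumerate(keys):
                if lo <= key <= hi:
                    result.append(tuple(col[i] for col in wanted))
        return result, groups_read


table = ColumnTable(["day", "pharmacy_id", "amount"])
table.load([(1, 1, 10.0), (2, 2, 20.0), (3, 1, 30.0), (4, 2, 40.0), (5, 1, 50.0)])
rows, groups_read = table.scan(["amount"], where="day", lo=3, hi=4)
```

Here only one of the three row groups is read and the pharmacy_id column is never touched; skipping works best when data is loaded roughly sorted by the filtered column, as with time-ordered facts.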

• Our experience:
  • Not faster than vanilla Postgres (say hello to Cubes)
  • Volume reduced up to 12 times. Wow.
  • No way to back up the traditional way (no need?)
  • No support for DELETE/UPDATE (hence the snapshots)

Configuration

• Load profile:
  • Big volume of read/write I/O
  • Most of the I/O is in stage and nds
  • ddm is not heavily loaded

• shared_buffers = ½ RAM
• work_mem = 2GB
• maintenance_work_mem = 3GB
• temp_buffers = 2GB
• effective_cache_size = ½ RAM
• max_wal_size = 32GB
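
The settings above can be rendered for a given host with a small helper. This is a sketch assuming RAM is given in whole gigabytes (odd sizes round down); the values are the ones from the slide, not general recommendations.

```python
from typing import List, Tuple


def render_bi_conf(ram_gb: int) -> str:
    """Render the slide's postgresql.conf values for a host with `ram_gb` GB of RAM."""
    half = f"{ram_gb // 2}GB"  # "½ RAM" from the slide, rounded down to whole GB
    settings: List[Tuple[str, str]] = [
        ("shared_buffers", half),
        ("work_mem", "2GB"),              # applies per sort/hash operation, not per session
        ("maintenance_work_mem", "3GB"),  # used by VACUUM, CREATE INDEX, etc.
        ("temp_buffers", "2GB"),          # per session, for temporary tables
        ("effective_cache_size", half),   # planner hint only, allocates nothing
        ("max_wal_size", "32GB"),         # fewer WAL-triggered checkpoints during bulk loads
    ]
    return "".join(f"{name} = {value}\n" for name, value in settings)
```

With work_mem = 2GB, a single query with several sorts or hashes, multiplied by concurrent sessions, can exceed physical RAM, so this profile presumably relies on few concurrent heavy ELT sessions.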

Features

• DDM could be placed on a dedicated server (londiste, pg_logical)
• Use COPY / bulk inserts, don't use UPDATE (ke ke ke)
• Think about horizontal and vertical partitioning, and find proper keys for it
• Think about parallelism from the very beginning
• Use TABLESPACES / PARTIAL INDEXES (and more and more disks)
• Use a data storage policy
• Statistics should be collected on a tmpfs volume
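
Bulk loading with COPY needs data in COPY's text format: tab-separated columns, one row per line, \N for NULL, and backslash escapes for backslash, newline, carriage return and tab. A minimal serializer as a sketch; with psycopg2 the result could be streamed via `cursor.copy_expert("COPY stage_order FROM STDIN", io.StringIO(payload))`, where stage_order is a hypothetical table.

```python
from typing import Iterable, Optional, Sequence

# Characters that must be backslash-escaped inside a COPY text-format field.
_COPY_ESCAPES = str.maketrans({"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"})

Row = Sequence[Optional[object]]


def copy_text_row(values: Row) -> str:
    """One COPY text-format line: tab-separated, \\N for NULL, newline-terminated."""
    fields = ("\\N" if v is None else str(v).translate(_COPY_ESCAPES) for v in values)
    return "\t".join(fields) + "\n"


def copy_text_payload(rows: Iterable[Row]) -> str:
    """Concatenate many rows so they can be sent with a single COPY ... FROM STDIN."""
    return "".join(copy_text_row(row) for row in rows)
```

Building one payload per pharmacy and issuing a single COPY replaces many single-row INSERT or UPDATE round trips, which is the point of the slide's advice.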

Features, vol. 2

• Use migrations: sqitch by theory
• You'd better test your ELT: sqitch by theory
• Use pg_stat_statements (add it to monitoring)
• Use profiling: PLPROFILER 3
• Sometimes you have (not) to use cstore_fdw
• Sometimes you have (not) to use unlogged tables
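
One way to add pg_stat_statements to monitoring, as a sketch: fetch the top statements by total time and flag those whose mean time per call exceeds a threshold. The threshold, the LIMIT and the alert format are hypothetical. The extension must be in shared_preload_libraries, and its timing column is total_time before PostgreSQL 13 and total_exec_time from 13 on (both in milliseconds).

```python
from typing import List, NamedTuple, Sequence, Tuple

# Run periodically with a DB driver (PostgreSQL 13+ column names; use
# total_time instead of total_exec_time on older versions).
TOP_STATEMENTS_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20
"""


class StatementStat(NamedTuple):
    query: str
    calls: int
    total_ms: float

    @property
    def mean_ms(self) -> float:
        return self.total_ms / self.calls if self.calls else 0.0


def slow_statements(rows: Sequence[Tuple[str, int, float]],
                    mean_threshold_ms: float) -> List[Tuple[str, float]]:
    """Statements whose mean time per call exceeds the threshold, worst total first."""
    stats = sorted((StatementStat(*row) for row in rows),
                   key=lambda s: s.total_ms, reverse=True)
    return [(s.query, round(s.mean_ms, 1)) for s in stats
            if s.mean_ms > mean_threshold_ms]
```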

Pros and cons

• Cons
  • No easy way to scale horizontally
  • Reasonably difficult deployment
• Pros
  • Local data (no big network transfers)
  • Effectively parallelized (thanks to pharmacy_id)
  • PL/pgSQL

Speed limit

• Cubes is not fast (due to serialization):
  • json (12 sec)
  • ujson (4 sec)
  • Postgres JSON output (1.5 sec)
• DB self time: 0.3-0.7 sec
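
The Postgres JSON output figure suggests letting the database build the JSON and passing it through untouched. A minimal sketch of both paths follows; the query, the quantity column and the {"cells": ...} envelope are hypothetical. The `::text` cast matters with psycopg2, which otherwise decodes json values into Python objects and brings the serialization cost back.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical query: Postgres aggregates and serializes in one pass.
CELLS_SQL = """
SELECT json_agg(t)::text
FROM (SELECT pharmacy_id, SUM(quantity) AS quantity
      FROM fact_order_item
      GROUP BY pharmacy_id) AS t
"""


def envelope_from_python(cells: List[Dict[str, Any]]) -> str:
    # Slow path: rows were decoded into Python objects and are re-encoded here.
    return json.dumps({"cells": cells})


def envelope_from_db_json(cells_json: Optional[str]) -> str:
    # Fast path: splice the JSON text built by Postgres into the response as-is.
    # json_agg returns NULL (None in Python) when there are no rows.
    return '{"cells": ' + (cells_json if cells_json is not None else "[]") + "}"
```

The fast path trusts that the database produced valid JSON and never parses it, so its cost stays close to a string copy regardless of how many cells there are.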
