TOASTing an Elephant: Building a Custom Data Warehouse Using PostgreSQL



Illustration by Zoe Lubitz

David Kohn

Chief Elephant Toaster and Data Engineer at Moat

david.kohn@moat.com

We measure attention online, for both advertisers and publishers.

We don't track cookies or IP addresses.

Rather we process billions of events per day that allow us to measure how many people saw an ad or interacted with it.

We are a neutral third party and our metrics are used by both advertisers and publishers to measure their performance and agree on a fair price.

Those billions of events are aggregated in our realtime system and end up as millions of rows per day added to our stats databases.

Moat Interface (screenshots)

tuple: client | date | filter1 … filterN | metric1 … metricN
Partition Keys: client, date; Filters (~10 text); Metrics (~170 int8)

Production queries have a single client.
Production queries sum all of these. Subset(s) are hierarchical.

Basic Row Structure

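As a concrete reference, a minimal sketch of this row structure as DDL, assuming just two filters and two metrics (the real table has ~10 and ~170; rollup_table_name is the table referenced in the Typical Query below):

CREATE TABLE rollup_table_name (
    client  text,
    date    date,
    filter1 text,
    filter2 text,   -- ... ~10 text filter columns in total
    metric1 int8,
    metric2 int8    -- ... ~170 int8 metric columns in total
);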

SELECT filter1, filter2, … SUM(metric1), SUM(metric2), … SUM(metricN)
FROM rollup_table_name
WHERE client = 'foo' AND date >= 'bar' AND date <= 'baz'
GROUP BY filter1, filter2, …

Typical Query

Moat Interface (screenshot): Client, Filters, Date Range; Metrics (and there's a lot more of them you can choose)

Lots of Data

• Sum large amounts of data quickly (but only a small fraction of total data, easily partitionable)
• Sum all columns of very wide rows
• Compress data (for storage and I/O reasons)
• Support medium read concurrency (or at least degrade predictably), i.e. 4-12 requests/second, some of which can take minutes to finish
• Data is derivative and structured to meet the needs of the client-facing app: high read/aggregation throughput for clients
• ETL quickly; some bulk delete/redo operations, once per day

Requirements

Should we choose a row store or a column store?

Old Systems

Postgres
• 2 masters + 2 replicas each
• Handled last 7 days
• High concurrency
• Highly disk bound
• Heavily partitioned
• Shield for the column stores

Vertica
• ~3 mos/cluster (30 TB license, 8 nodes, $$$)
• Fast, but slowed down under concurrency
• Performance degradation unpredictable
• Projections can lead to slow ETL

Redshift
• 1 cluster (8 nodes, spinning disk)
• 2012-Present
• No rollup tables, too big
• Incredibly slow for client-facing queries (many columns)
• Bulk insert ETL, delete/update hard

[Diagram: a table on disk is a collection of pages; each page holds several tuples; each tuple is a header plus its attrs]

Row Store

A table is a collection of rows, each row split into columns/attrs.
Each row must fit into a page.

Row Store

• Accesses small subsets of rows quickly
• Little penalty for many columns selected
• Great for individual inserts, updates and deletes
• Often normalized data structure
• OLTP workloads
• High concurrency, less throughput per user
• Data stored uncompressed, unless too large for a block

[Diagram: a table on disk keeps one run of pages per attr (attr A, attr B, …); pages hold compressed values, possibly with surrogate keys]

Column Store

A table is a collection of columns.
Each column is split into values; a value's position corresponds to its row.
Values in columns are often compressed.

Column Store

• Scans and aggregates large numbers of rows quickly
• Best when selecting a subset of columns
• Great for bulk inserts, harder to delete or update
• Often denormalized data structure
• OLAP workloads
• Lower concurrency, much higher throughput per user
• Data can be compressed

[Diagram: a tuple with an attr too large for its page]

What happens when an attr is too big to fit in a page?

TOAST: The Oversized Attribute Storage Technique

[Diagram: the main tuple keeps a small pointer in place of the oversized attr; the TOAST table stores that attr LZ-compressed and split into segments, each segment a row looked up by the id the pointer carries]
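Not from the talk, but a quick way to see TOAST at work is to compare a table's heap size with its total size including the TOAST table (array_table_name, the array table introduced later, is used here as an example):

-- heap_only excludes TOAST; table_with_toast includes the TOAST table (plus FSM/VM)
SELECT pg_size_pretty(pg_relation_size('array_table_name')) AS heap_only,
       pg_size_pretty(pg_table_size('array_table_name'))    AS table_with_toast;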

Project Marjory

Moat Interface (screenshot recap): Client, Filters, Date Range; Metrics

tuple: client | date | filter1 … filterN | metric1 … metricN
Partition Keys: client, date; Filters (~10 text); Metrics (~170 int8)

Original Row

subtype: (filter1, … filterN, metric1, … metricN)

Subtype

tuple: client | date | segment | subtype[]
Partition Keys: client, date, segment; Array of Composite Type (~5000 subtype rows per array)

MegaRow
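A minimal sketch of what this can look like as DDL, assuming two filters and two metrics (array_subtype and array_table_name are the names used elsewhere in the talk; the array column name row_arr is illustrative):

-- the composite "subtype" holding one original row's filters and metrics
CREATE TYPE array_subtype AS (
    filter1 text,
    filter2 text,
    metric1 int8,
    metric2 int8
);

-- the MegaRow table: partition keys plus one big array of subtype values
CREATE TABLE array_table_name (
    date    date,
    client  text,
    segment text,
    row_arr array_subtype[]   -- ~5000 subtype rows per array; large arrays get TOASTed and compressed
);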

INSERT INTO array_table_name
SELECT date, client, segment,
       ARRAY_AGG((filter1, filter2, … metric1, metric2, … metricN)::subtype)
FROM temp_table_for_etl
GROUP BY date, client, segment

Typical ETL Query

Reporting Query

SELECT a.date, a.client,
       s.filter1 … s.filterN,
       SUM(s.metric1) … SUM(s.metricN)
FROM array_table_name a,
     LATERAL UNNEST(subtype[]) s (filter1, filter2, … metricN)
WHERE client = 'foo' AND date >= 'bar' AND date <= 'baz'
GROUP BY a.date, a.client, s.filter1 … s.filterN
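Under the same two-filter assumption as the DDL sketch above, a concrete, runnable form of that reporting query (row_arr is the illustrative array column; the dates are placeholders):

SELECT a.date, a.client,
       s.filter1, s.filter2,
       SUM(s.metric1) AS metric1,
       SUM(s.metric2) AS metric2
FROM array_table_name a,
     LATERAL unnest(a.row_arr) AS s   -- expands each stored array back into subtype rows
WHERE a.client = 'foo'
  AND a.date >= DATE '2017-01-01'
  AND a.date <= DATE '2017-01-10'
GROUP BY a.date, a.client, s.filter1, s.filter2;

The partition keys keep the scan small; the CPU spent unpacking the arrays is the I/O-for-CPU trade discussed in the Stats section below.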

[Benchmark charts comparing Marjory and Redshift]

• 1 Client, 10 days, ~150,000 rows/day (~1.5m rows total)
• 1 Client, 10 days, ~3,000,000 rows/day (~30m rows total)
• 1 Client, 4 months, ~150,000 rows/day (~18m rows total)

The Good
• Performs quite well on our typical queries (lots of columns, large subset of rows)
• Sort order matters less than in column stores
• Query time scales with the number of rows unpacked and aggregated, and depends only lightly on the number of columns
• Utilizes resources efficiently for concurrency (Postgres' stinginess can serve us well)
• 8-10x compression for our data (with a bit of extra tuning of our composite type)
• All done in PL/pgSQL etc., no C code required

The Not-So-Good
• Doesn't do as well on general SQL queries; you have to unpack all of the rows
• Not getting you much compared to a column store if you're accessing only a few columns (one might be able to design it differently, though)
• Doesn't dynamically scale the number of workers with the size of the query (Postgres' stinginess doesn't serve us well for more typical BI cases, but that wasn't what we optimized for)
• Isn't going to do as well when scanning very large numbers of rows (i.e. more typical BI)
• All done in PL/pgSQL etc., no C code required

Trade generality for fit to our use case.

I’ll Drink to That!

Illustration by Zoe Lubitz

Rollups

SELECT filter1, filter2, … filterN,
       SUM(metric1), … SUM(metricN)
FROM …
GROUP BY GROUPING SETS (
    (filter1, filter2, … filterN-1, filterN),
    (filter1, … filterN-1),
    …
    (filter1, filter2),
    (filter1)
)

INSERT INTO byfilter1 … INSERT INTO byfilter2 …
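A minimal runnable sketch of the grouping-sets idea, again assuming just two filters and one metric (temp_table_for_etl is the staging table from the ETL query above; filter2_rolled_up is an illustrative output column):

SELECT filter1,
       filter2,
       SUM(metric1)      AS metric1,
       GROUPING(filter2) AS filter2_rolled_up   -- 1 on rows produced by the (filter1)-only grouping set
FROM temp_table_for_etl
GROUP BY GROUPING SETS ((filter1, filter2), (filter1));

Rows from each grouping set can then be ARRAY_AGGed into the corresponding byfilterN rollup array, as in the INSERTs above.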

[Diagram: the MegaRow gains rollup arrays: tuple: client | date | segment | byfilter1[] | byfilter2[] | byfilter3[] | byfilter4[], each an array of subtype rows]

MegaRow

[Diagram, continued: the rollup arrays hold far fewer subtype rows than the full-detail array, and rollup array columns can be NULL]

tuple: client | date | segment | rollup arrays | total_rows | metadata
Partition Keys: client, date, segment; Summary Statistics: total_rows, metadata

Summary Statistics

SELECT date, client, SUM(total_rows) as rows_per_day FROM array_table_name GROUP BY date, client

Count Rows/Day by Client

[Diagram: tuple: client | date | segment (Partition Keys) | Rollup Arrays | Distinct Lists (arrays of distinct filter values) | Summary Stats (total rows, metadata)]

Distinct Filter Values
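Not the talk's exact ETL, but a sketch of how the distinct lists can be built in the same aggregation pass (names follow the earlier two-filter sketch; distinct_filter1 is the column the targeted query below checks):

SELECT date, client, segment,
       ARRAY_AGG((filter1, filter2, metric1, metric2)::array_subtype) AS row_arr,
       ARRAY_AGG(DISTINCT filter1) AS distinct_filter1,
       ARRAY_AGG(DISTINCT filter2) AS distinct_filter2
FROM temp_table_for_etl
GROUP BY date, client, segment;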

Targeted Reporting Query

SELECT a.date, a.client, s.filter1, s.filter2, … s.metricN
FROM array_table_name a,
     LATERAL UNNEST(subtype[]) s (filter1, filter2, … metricN)
WHERE client = 'foo' AND date >= 'bar' AND date <= 'baz'
  AND s.filter1 = 'fizz'


Checking the (small) distinct list in the WHERE clause lets Postgres skip rows whose arrays cannot contain the value before unpacking them:

SELECT a.date, a.client, s.filter1, s.filter2, … s.metricN
FROM array_table_name a,
     LATERAL UNNEST(subtype[]) s (filter1, filter2, … metricN)
WHERE client = 'foo' AND date >= 'bar' AND date <= 'baz'
  AND s.filter1 = 'fizz'
  AND a.distinct_filter1 @> '{fizz}'::text[]

Stats

• Marjory (all data since 2012) has about the same on-disk footprint as Elmo (the last 33ish days)
• ~20x compression compared to normal-format Postgres (~10x from TOAST + ~2x from avoided storage of rollups)
• 5 Marjory instances, each with all of the data for all time (on locally attached spinning disks), have basically taken over what we had on our Vertica and Redshift instances (at least 16 instances)
• The overall tradeoff is I/O for CPU, so we had to do some tuning to get parallel planning/execution working properly

ALTER TABLE array_table_name ALTER client SET STATISTICS 10000;
ALTER TABLE array_table_name ALTER byfilter1 SET STATISTICS 0;
ALTER TABLE array_table_name ALTER byfilter2 SET STATISTICS 0;
...
ALTER TABLE array_table_name ALTER byfilterN SET STATISTICS 0;

Only Do Meaningful Statistics (But Make Them Good)

Useful Tuning Tips


Make Data-Type-Specific Functions For Unnest With Proper Stats

CREATE FUNCTION unnest(byfilter4)
RETURNS SETOF array_subtype AS $func$
...
$func$ LANGUAGE plpgsql ROWS 5000 COST 5000;
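The function body is elided on the slide; a hedged sketch of the pattern, assuming the rollup column is stored as array_subtype[] and giving the wrapper its own name (unnest_byfilter4) rather than overloading unnest:

CREATE FUNCTION unnest_byfilter4(arr array_subtype[])
RETURNS SETOF array_subtype AS $func$
BEGIN
    -- Delegates to the built-in unnest; the value of the wrapper is the
    -- ROWS/COST estimates below, which give the planner realistic numbers.
    RETURN QUERY SELECT * FROM unnest(arr);
END;
$func$ LANGUAGE plpgsql ROWS 5000 COST 5000;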


min_parallel_relation_size, parallel_setup_cost, parallel_tuple_cost, max_worker_processes, max_parallel_workers_per_gather, cpu_operator_cost?

Futz With Parallelization Parameters Until They Work
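A hedged sketch of the kind of settings meant here, with illustrative values rather than the production ones (min_parallel_relation_size is the 9.6 name; PostgreSQL 10 renamed it min_parallel_table_scan_size):

-- encourage parallel plans for these CPU-heavy aggregations
SET min_parallel_relation_size = '1MB';
SET parallel_setup_cost = 100;
SET parallel_tuple_cost = 0.01;
SET max_parallel_workers_per_gather = 4;
-- max_worker_processes needs a restart, so it lives in postgresql.conf;
-- a per-table worker hint is also available:
ALTER TABLE array_table_name SET (parallel_workers = 4);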

Yep. CPU Bound

david.kohn@moat.com

Illustration by Zoe Lubitz

We’re hiring!

http://grnh.se/os4er71
