Materialized views in PostgreSQL

Ashutosh Bapat | 28th March, 2014 | © 2013 EDB All rights reserved.


DESCRIPTION

Presentation introducing materialized views in PostgreSQL, with use cases. These slides were used for my talk at the Indian PostgreSQL Users Group meetup in Hyderabad on 28th March, 2014.


Page 1: Materialized views in PostgreSQL

© 2013 EDB All rights reserved. 1

Materialized views in PostgreSQL

Ashutosh Bapat | 28th March, 2014

Page 2: Outline

● Theoretical background

● PostgreSQL's support

● Use cases

Page 3: (SQL) View

● “Virtual relation” defined by a query

● Represents the result of the query

● Can be queried like a table

● Referencing a view in a query requires its defining query to be executed each time

View: emp_with_good_salary

SELECT emp_name, salary
FROM emp
WHERE salary > 15000;

Table: emp

 emp_name | salary
----------+--------
 Kiran    |  10000
 Mohan    |  20000
 Leela    |  30000
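A minimal sketch of this example in SQL, assuming a scratch database in which a table named emp can be (re)created; the column types are assumptions, since the slide shows only names and values. The view selects salary as well as emp_name, matching the columns of the materialized view on the next slide. Because a plain view stores only its definition, a change to emp is visible the next time the view is referenced.

```sql
-- Start from a clean slate (CASCADE also drops views that depend on emp)
DROP TABLE IF EXISTS emp CASCADE;

-- Column types are assumed; the slide shows only names and values
CREATE TABLE emp (emp_name varchar(100), salary numeric);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- A plain view stores only its defining query, not its result
CREATE VIEW emp_with_good_salary AS
    SELECT emp_name, salary FROM emp WHERE salary > 15000;

SELECT * FROM emp_with_good_salary;  -- Mohan and Leela

-- The defining query runs again on every reference,
-- so this change is visible immediately
UPDATE emp SET salary = 16000 WHERE emp_name = 'Kiran';
SELECT * FROM emp_with_good_salary;  -- Mohan, Leela and now Kiran
```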

Page 4: Materialized View (MV)

● A “view” whose query results are stored in the database

● Referencing a materialized view does not require execution of the query

● Needs to be “maintained” to keep up with changes in the underlying objects (tables or views)

● Can be indexed, unlike a non-materialized view

Table: emp

 emp_name | salary
----------+--------
 Kiran    |  10000
 Mohan    |  20000
 Leela    |  30000

MV: emp_with_good_salary

 emp_name | salary
----------+--------
 Mohan    |  20000
 Leela    |  30000
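The same data as a materialized view, under the same assumptions (a scratch database and assumed column types; the index name is hypothetical). The stored result does not follow the UPDATE until the view is refreshed, and, unlike a plain view, the materialized view accepts an index.

```sql
-- Start from a clean slate (CASCADE also drops dependent views)
DROP TABLE IF EXISTS emp CASCADE;
CREATE TABLE emp (emp_name varchar(100), salary numeric);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- The query runs once here and its result (Mohan, Leela) is stored
CREATE MATERIALIZED VIEW emp_with_good_salary AS
    SELECT emp_name, salary FROM emp WHERE salary > 15000;

-- Unlike a plain view, a materialized view can be indexed
CREATE INDEX emp_with_good_salary_name_idx ON emp_with_good_salary (emp_name);

UPDATE emp SET salary = 16000 WHERE emp_name = 'Kiran';
SELECT * FROM emp_with_good_salary;  -- still only Mohan and Leela: stale

-- Maintenance: re-run the defining query and replace the stored rows
REFRESH MATERIALIZED VIEW emp_with_good_salary;
SELECT * FROM emp_with_good_salary;  -- now includes Kiran
```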

Page 5: Outline

● Theoretical background

● PostgreSQL's support

● Use cases

Page 6: Materialized Views in PostgreSQL

● Creation

– CREATE MATERIALIZED VIEW

● Maintenance

– REFRESH MATERIALIZED VIEW

● Destruction

– DROP MATERIALIZED VIEW

● Supported from 9.3

● Enhancements in 9.4

– REFRESH MATERIALIZED VIEW CONCURRENTLY
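A sketch of the creation, maintenance and destruction commands listed above, run against a hypothetical emp table like the one in the earlier example. It also uses the WITH NO DATA clause of CREATE MATERIALIZED VIEW and the ispopulated column of the pg_matviews catalog view, both available since 9.3, to show that an unpopulated view is filled only by REFRESH.

```sql
-- Start from a clean slate with hypothetical data
DROP TABLE IF EXISTS emp CASCADE;
CREATE TABLE emp (emp_name varchar(100), salary numeric);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- Creation: WITH NO DATA records the definition but stores no rows yet;
-- selecting from the view in this state raises an error
CREATE MATERIALIZED VIEW emp_with_good_salary AS
    SELECT emp_name, salary FROM emp WHERE salary > 15000
    WITH NO DATA;

SELECT ispopulated FROM pg_matviews
WHERE matviewname = 'emp_with_good_salary';  -- false

-- Maintenance: REFRESH runs the defining query and stores its result
REFRESH MATERIALIZED VIEW emp_with_good_salary;

SELECT ispopulated FROM pg_matviews
WHERE matviewname = 'emp_with_good_salary';  -- true
SELECT count(*) FROM emp_with_good_salary;   -- 2

-- Destruction: removes both the definition and the stored rows
DROP MATERIALIZED VIEW emp_with_good_salary;
```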

Page 7: Refreshing MV

● Lazy refresh

– Materialized view usually contains stale data

– REFRESH periodically, or at a suitable time, independent of DML activity

● Aggressive refresh

– Materialized view contains the latest data in serializable transactions and nearly fresh data at other isolation levels

– REFRESH using triggers/rules

Page 8: What's not supported in 9.4

● Incremental refresh

– Refreshing only those rows affected by changes to the underlying tables

– Being worked on in the community

● Using materialized views for query optimization

– Planner using MVs automatically to answer queries

● Auto-refresh

– Refreshing the materialized view automatically when the underlying tables change

Page 9: Outline

● Theoretical background

● PostgreSQL's support

● Use cases

Page 10: Reporting using stale data

● Very frequently updated tables

● Approximate reports are fine

● Create materialized views for reporting queries

● Refresh every night, or on a weekly/monthly basis

Page 11: Reporting region-wise sales

● Table schema

CREATE TABLE salesman (salesman_no integer PRIMARY KEY,
                       name varchar(100),
                       region varchar(100));
CREATE TABLE invoice (invoice_no integer PRIMARY KEY,
                      salesman_no integer REFERENCES salesman,
                      invoice_amt numeric(13, 2),
                      invoice_date date);

● Reporting query

SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region
ORDER BY region_sale
LIMIT 10;
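The slides do not say how the test data was generated. A hypothetical generator along the following lines produces data of roughly the size the EXPLAIN ANALYZE output reports (10,000 salesmen, 26 regions, 1,000,000 invoices); the names, region labels, amounts and dates are made up, so timings will differ from those on the slides.

```sql
-- Start from a clean slate (CASCADE also drops dependent materialized views)
DROP TABLE IF EXISTS invoice, salesman CASCADE;
CREATE TABLE salesman (salesman_no integer PRIMARY KEY, name varchar(100), region varchar(100));
CREATE TABLE invoice (invoice_no integer PRIMARY KEY, salesman_no integer REFERENCES salesman,
                      invoice_amt numeric(13, 2), invoice_date date);

-- Hypothetical data: 10,000 salesmen spread over 26 regions ('region A' .. 'region Z')
INSERT INTO salesman
SELECT n, 'salesman ' || n, 'region ' || chr(ascii('A') + n % 26)
FROM generate_series(1, 10000) AS n;

-- 1,000,000 invoices, each pointing at an existing salesman (1 .. 10000)
INSERT INTO invoice
SELECT n,
       1 + n % 10000,
       round((random() * 10000)::numeric, 2),  -- amount between 0 and 10000
       date '2014-01-01' + n % 365             -- dates spread over one year
FROM generate_series(1, 1000000) AS n;

-- Refresh planner statistics after the bulk load
ANALYZE salesman;
ANALYZE invoice;
```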

Page 12: Reporting region-wise sales

EXPLAIN ANALYZE SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region ORDER BY region_sale LIMIT 10;

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=44294.16..44294.18 rows=10 width=234) (actual time=2609.868..2609.870 rows=10 loops=1)
   ->  Sort  (cost=44294.16..44294.66 rows=200 width=234) (actual time=2609.860..2609.861 rows=10 loops=1)
         Sort Key: (sum(i.invoice_amt))
         Sort Method: top-N heapsort  Memory: 26kB
         ->  HashAggregate  (cost=44287.84..44289.84 rows=200 width=234) (actual time=2609.347..2609.366 rows=26 loops=1)
               ->  Hash Join  (cost=559.84..39828.84 rows=891800 width=234) (actual time=29.751..1374.305 rows=1000000 loops=1)
                     Hash Cond: (i.salesman_no = s.salesman_no)
                     ->  Seq Scan on invoice i  (cost=0.00..15288.00 rows=891800 width=20) (actual time=0.048..398.745 rows=1000000 loops=1)
                     ->  Hash  (cost=345.15..345.15 rows=5015 width=222) (actual time=29.602..29.602 rows=10000 loops=1)
                           Buckets: 1024  Batches: 2  Memory Usage: 685kB
                           ->  Seq Scan on salesman s  (cost=0.00..345.15 rows=5015 width=222) (actual time=0.009..5.221 rows=10000 loops=1)
 Total runtime: 2610.316 ms

Page 13: Reporting region-wise sales

CREATE MATERIALIZED VIEW sales_by_region AS
SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region;

EXPLAIN ANALYZE SELECT * FROM sales_by_region ORDER BY region_sale LIMIT 10;

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=19.17..19.19 rows=10 width=250) (actual time=0.065..0.066 rows=10 loops=1)
   ->  Sort  (cost=19.17..19.89 rows=290 width=250) (actual time=0.064..0.064 rows=10 loops=1)
         Sort Key: region_sale
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Seq Scan on sales_by_region  (cost=0.00..12.90 rows=290 width=250) (actual time=0.007..0.013 rows=26 loops=1)
 Total runtime: 0.094 ms
(6 rows)
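For the nightly refresh, the 9.4 option REFRESH MATERIALIZED VIEW CONCURRENTLY lets reporting queries keep reading the old contents while the new ones are computed. It requires a unique index on the materialized view; sales_by_region has one row per region, so region qualifies. A self-contained sketch with a few hypothetical rows (the index name is also hypothetical):

```sql
-- Start from a clean slate (CASCADE also drops sales_by_region if it exists)
DROP TABLE IF EXISTS invoice, salesman CASCADE;
CREATE TABLE salesman (salesman_no integer PRIMARY KEY, name varchar(100), region varchar(100));
CREATE TABLE invoice (invoice_no integer PRIMARY KEY, salesman_no integer REFERENCES salesman,
                      invoice_amt numeric(13, 2), invoice_date date);

-- A few hypothetical rows so the example runs on its own
INSERT INTO salesman VALUES (1, 'Kiran', 'North'), (2, 'Mohan', 'South');
INSERT INTO invoice VALUES (1, 1, 100.00, '2014-03-01'), (2, 2, 250.00, '2014-03-02');

CREATE MATERIALIZED VIEW sales_by_region AS
SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region;

-- One row per region, so region can carry the unique index CONCURRENTLY needs
CREATE UNIQUE INDEX sales_by_region_region_idx ON sales_by_region (region);

-- The day's activity: the report does not see this invoice yet
INSERT INTO invoice VALUES (3, 1, 50.00, '2014-03-03');

-- Nightly job: recompute the report without locking out concurrent SELECTs
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_region;
```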

Page 14: Complex queries

● Relatively stable underlying tables

● Complex and slow-running queries

● Bonus

– Stale data not tolerable: use triggers to refresh

– Faster query results: use indexes on the MV

Page 15: Shortest route problem

● Table schema

CREATE TABLE roads (source char, dest char, length numeric(5, 2));

● Slow query

WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[]
    FROM roads
    WHERE source = 'A'
  UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source AND not (r.dest = ANY(p.path))
)
SELECT * FROM paths WHERE dest = 'L' ORDER BY length LIMIT 1;
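The contents of roads are not shown; the plan on the next slide only reports 80 roads, 5 of which start at A. With a tiny hypothetical network the query's behaviour is easy to check by hand: A reaches L directly (length 5) or via B (1 + 2 = 3), and path records only the intermediate nodes.

```sql
-- Start from a clean slate (CASCADE also drops a dependent paths view)
DROP TABLE IF EXISTS roads CASCADE;
CREATE TABLE roads (source char, dest char, length numeric(5, 2));

-- Hypothetical network: A -> L directly, or A -> B -> L
INSERT INTO roads VALUES ('A', 'B', 1), ('B', 'L', 2), ('A', 'L', 5);

WITH RECURSIVE paths (source, dest, length, path) AS (
    -- every road leaving A is a path with no intermediate nodes
    SELECT source, dest, length::float, '{}'::bpchar[]
    FROM roads WHERE source = 'A'
  UNION ALL
    -- extend a path by one road, recording the node passed through,
    -- and skip roads leading back to an already visited intermediate node
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source AND NOT (r.dest = ANY(p.path))
)
SELECT * FROM paths WHERE dest = 'L' ORDER BY length LIMIT 1;
-- returns: source A, dest L, length 3, path {B}
```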

Page 16: SRP: without MV

EXPLAIN ANALYZE output

WITH RECURSIVE paths (source, dest, length, path) AS (
...
ORDER BY length LIMIT 1;

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=686.43..686.43 rows=1 width=56) (actual time=897.159..897.159 rows=1 loops=1)
   CTE paths
     ->  Recursive Union  (cost=0.00..581.31 rows=4667 width=76) (actual time=0.039..720.175 rows=138640 loops=1)
           ->  Seq Scan on roads  (cost=0.00..27.52 rows=7 width=28) (actual time=0.036..0.061 rows=5 loops=1)
                 Filter: (source = 'A'::bpchar)
                 Rows Removed by Filter: 75
           ->  Hash Join  (cost=2.28..46.04 rows=466 width=76) (actual time=9.528..38.388 rows=8665 loops=16)
                 Hash Cond: (r.source = p.dest)
                 Join Filter: (r.dest <> ALL (p.path))
                 ->  Seq Scan on roads r  (cost=0.00..24.00 rows=1400 width=28) (actual time=0.010..0.025 rows=80 loops=16)
                 ->  Hash  (cost=1.40..1.40 rows=70 width=56) (actual time=9.159..9.159 rows=8665 loops=16)
                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                       ->  WorkTable Scan on paths p  (cost=0.00..1.40 rows=70 width=56) (actual time=0.008..3.959 rows=8665 loops=16)
   ->  Sort  (cost=105.12..105.18 rows=23 width=56) (actual time=897.154..897.154 rows=1 loops=1)
         Sort Key: paths.length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  CTE Scan on paths  (cost=0.00..105.01 rows=23 width=56) (actual time=0.696..896.652 rows=912 loops=1)
               Filter: (dest = 'L'::bpchar)
               Rows Removed by Filter: 137728
 Total runtime: 900.970 ms
(20 rows)

Page 17: SRP: Materialized View

CREATE MATERIALIZED VIEW paths AS
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[]
    FROM roads
  UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source AND not (r.dest = ANY(p.path))
)
SELECT * FROM paths;

EXPLAIN ANALYZE SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length LIMIT 1;

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=10623.33..10623.33 rows=1 width=56) (actual time=125.326..125.327 rows=1 loops=1)
   ->  Sort  (cost=10623.33..10623.35 rows=10 width=56) (actual time=125.324..125.324 rows=1 loops=1)
         Sort Key: length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on paths  (cost=0.00..10623.28 rows=10 width=56) (actual time=0.283..124.988 rows=912 loops=1)
               Filter: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
               Rows Removed by Filter: 281233
 Total runtime: 125.377 ms
(8 rows)

Page 18: SRP: MV with indexes

CREATE INDEX i_paths_source on paths(source, dest);

EXPLAIN ANALYZE SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length LIMIT 1;

                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=31.80..31.80 rows=1 width=56) (actual time=1.265..1.265 rows=1 loops=1)
   ->  Sort  (cost=31.80..31.81 rows=7 width=56) (actual time=1.264..1.264 rows=1 loops=1)
         Sort Key: length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on paths  (cost=4.49..31.76 rows=7 width=56) (actual time=0.327..0.982 rows=912 loops=1)
               Recheck Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
               ->  Bitmap Index Scan on i_paths_source  (cost=0.00..4.49 rows=7 width=0) (actual time=0.304..0.304 rows=912 loops=1)
                     Index Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
 Total runtime: 1.317 ms
(9 rows)
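A possible refinement of this index, sketched here as an assumption rather than something measured in the talk: adding length as a trailing column lets an index scan on source = 'A' AND dest = 'L' return rows already ordered by length, so the planner has the option of reading only the first matching entry instead of sorting all 912 matches. On the tiny hypothetical network below the planner will likely still prefer a sequential scan; any benefit would show at the scale of the talk's view (about 282,000 rows). The index name is hypothetical.

```sql
-- Start from a clean slate (CASCADE also drops the dependent paths view)
DROP TABLE IF EXISTS roads CASCADE;
CREATE TABLE roads (source char, dest char, length numeric(5, 2));
INSERT INTO roads VALUES ('A', 'B', 1), ('B', 'L', 2), ('A', 'L', 5);

CREATE MATERIALIZED VIEW paths AS
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[] FROM roads
  UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source AND NOT (r.dest = ANY(p.path))
)
SELECT * FROM paths;

-- For fixed source and dest, entries of this index are sorted by length
CREATE INDEX i_paths_source_dest_length ON paths (source, dest, length);
ANALYZE paths;

-- Inspect which plan the planner picks for the shortest A -> L route
EXPLAIN SELECT * FROM paths
WHERE source = 'A' AND dest = 'L'
ORDER BY length LIMIT 1;
```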

Page 19: SRP: latest data using triggers

CREATE FUNCTION refresh_mvs() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Recompute the whole materialized view from roads
    REFRESH MATERIALIZED VIEW paths;
    -- The return value of an AFTER ... FOR EACH STATEMENT trigger is ignored
    RETURN NULL;
END;
$$;

-- Fire once per modifying statement on roads (TRUNCATE triggers must be statement-level)
CREATE TRIGGER paths_trig
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON roads
    FOR EACH STATEMENT EXECUTE PROCEDURE refresh_mvs();

Page 20: SRP: latest data using triggers

SELECT * FROM paths WHERE source = 'T';
 source | dest | length | path
--------+------+--------+------
(0 rows)

EXPLAIN ANALYZE INSERT INTO roads VALUES ('T', 'Z', 100.4);
                                QUERY PLAN
---------------------------------------------------------------------------
 Insert on roads  (cost=0.00..0.01 rows=1 width=0) (actual time=0.033..0.033 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Trigger paths_trig: time=9080.960 calls=1
 Total runtime: 9081.028 ms
(4 rows)

SELECT * FROM paths WHERE source = 'T';
 source | dest | length | path
--------+------+--------+------
 T      | Z    |  100.4 | {}
(1 row)
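Because paths_trig is a statement-level trigger and each REFRESH recomputes the whole view (about 9 seconds in the run above), changes to roads are cheaper when batched into as few statements as possible: a multi-row INSERT fires the trigger once. A self-contained sketch on a hypothetical network; the extra road Z to Y is made up for illustration.

```sql
-- Start from a clean slate (CASCADE drops the dependent view; triggers go with the table)
DROP TABLE IF EXISTS roads CASCADE;
CREATE TABLE roads (source char, dest char, length numeric(5, 2));
INSERT INTO roads VALUES ('A', 'B', 1), ('B', 'L', 2), ('A', 'L', 5);

CREATE MATERIALIZED VIEW paths AS
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[] FROM roads
  UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source AND NOT (r.dest = ANY(p.path))
)
SELECT * FROM paths;

CREATE OR REPLACE FUNCTION refresh_mvs() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW paths;
    RETURN NULL;
END;
$$;

CREATE TRIGGER paths_trig
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON roads
    FOR EACH STATEMENT EXECUTE PROCEDURE refresh_mvs();

-- Two new roads in one statement: the trigger fires once, so the
-- whole view is recomputed once rather than once per road
INSERT INTO roads VALUES ('T', 'Z', 100.4), ('Z', 'Y', 3.5);

SELECT * FROM paths WHERE source = 'T';  -- T to Z, plus the two-hop route T to Y
```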

Page 21: Caching foreign data

● Materialized views on foreign tables

– Data availability in case of foreign server failure

– Faster data access

– Possibly stale data

● Aggressive refresh

– Triggers on foreign tables not supported

● Being discussed in the community

– External method for firing REFRESH when foreign data changes

● Lazy refresh

– Fire REFRESH periodically

Page 22: Caching foreign data

postgres=# \d+ remote_emp
                                 Foreign table "public.remote_emp"
 Column |         Type          | Modifiers | FDW Options | Storage  | Stats target | Description
--------+-----------------------+-----------+-------------+----------+--------------+-------------
 empno  | numeric(4,0)          |           |             | main     |              |
 ename  | character varying(10) |           |             | extended |              |
 job    | character varying(10) |           |             | extended |              |
Server: local_ppas
FDW Options: (schema_name 'public', table_name 'emp')
Has OIDs: no

postgres=# create materialized view cached_remote_emp as select * from remote_emp;

postgres=# explain analyze select * from cached_remote_emp;
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on cached_remote_emp  (cost=0.00..16.90 rows=690 width=88) (actual time=0.020..0.024 rows=14 loops=1)
 Planning time: 0.076 ms
 Total runtime: 0.068 ms
(3 rows)

postgres=# explain analyze select * from remote_emp;
                                QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan on remote_emp  (cost=100.00..131.93 rows=731 width=88) (actual time=0.834..0.836 rows=14 loops=1)
 Planning time: 0.077 ms
 Total runtime: 1.451 ms
(3 rows)
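The slide does not show how remote_emp was defined. Assuming postgres_fdw (its table options schema_name and table_name match the FDW Options above), a setup along these lines would produce a similar foreign table and its cached copy; the host, port, database, user and password are placeholders. It needs a reachable remote server with a table public.emp, so it is a configuration sketch rather than a self-contained test.

```sql
-- Assumption: postgres_fdw is installed and the remote database has public.emp
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Placeholder connection settings for the remote server
CREATE SERVER local_ppas FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'remote_db');

CREATE USER MAPPING FOR CURRENT_USER SERVER local_ppas
    OPTIONS (user 'remote_user', password 'secret');

-- Columns as listed by \d+ on the slide
CREATE FOREIGN TABLE remote_emp (
    empno numeric(4,0),
    ename varchar(10),
    job   varchar(10)
) SERVER local_ppas OPTIONS (schema_name 'public', table_name 'emp');

-- Local copy: readable even when the foreign server is down, but possibly stale
CREATE MATERIALIZED VIEW cached_remote_emp AS SELECT * FROM remote_emp;

-- Lazy refresh: run periodically (e.g. from an external scheduler)
REFRESH MATERIALIZED VIEW cached_remote_emp;
```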

Page 23: Thank you