© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Redshift
Insider Series
Prasad Varakur, Product Manager, Redshift, AWS
Christian Romming, Founder and CEO, Etleap
May 2020
Accelerating performance with Amazon Redshift materialized views
Speakers
Prasad Varakur, Product Manager, Amazon Redshift, AWS
Christian Romming, Founder and CEO, Etleap
Agenda
• Introduction
• Use cases for Amazon Redshift materialized views
• Materialized views – details
• Customer success story – by Etleap
• Demo
Amazon Redshift benefits
Tens of thousands of customers use Amazon Redshift and process over 2 EB of data per day
3x faster than other cloud data warehouses
Costs up to 75% less than other cloud data warehouses
Predictable costs
Lake Formation catalog & security
Exabyte-scale querying & AWS integration (e.g., AWS DMS, Amazon CloudWatch)
AWS-grade security (e.g., VPC, encryption with AWS KMS, AWS CloudTrail)
Certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA
Easy to provision & manage, automated backups, AWS support, and 99.9% SLAs
Virtually unlimited elastic linear scaling
Features Delivered to Meet Customer Needs
200+ new features in the past 18 months
• Robust result set caching
• Large # of tables support (~20,000)
• COPY command support for ORC, Parquet
• IAM role chaining
• Elastic resize
• Groups
• Amazon Redshift Spectrum: date formats, scalar JSON and ION file formats support, region expansion, predicate filtering
• Auto analyze
• Health and performance monitoring w/Amazon CloudWatch
• Automatic table distribution style
• CloudWatch support for WLM queues
• Performance enhancements: hash join, vacuum, window functions, resize ops, aggregations, console, union all, efficient compile code cache
• Cost controls
• Auto WLM
• ~25 query monitoring rules (QMR) support
• AQUA (Advanced Query Accelerator)
• Concurrency scaling
• DC1 migration to DC2
• Resiliency of ROLLBACK processing
• Manage multi-part query in AWS console
• Auto analyze for incremental changes on table
• Spectrum Request Accelerator
• Apply new distribution key
• Amazon Redshift Spectrum: row group filtering in Parquet and ORC, nested data support, enhanced VPC routing, multiple partitions
• Faster classic resize with optimized data transfer protocol
• Performance: Bloom filters in joins, complex queries that create internal table, communication layer
• Amazon Redshift Spectrum: concurrency scaling
• Integration with AWS Lake Formation
• Auto-vacuum sort, auto-analyze, and auto-table sort
• Auto WLM with query priorities
• Snapshot scheduler
• Performance: join pushdowns to subquery, mixed workloads temporary tables, rank functions, null handling in join, single row insert
• Advisor recommendations for distribution keys
• AZ64 compression encoding
• Console redesign
• Stored procedures
• Spatial processing
• Column level access control with AWS Lake Formation
• RA3
• Performance of inter-region snapshot transfers
• Federated Query
• Materialized views
• Pause and resume
Amazon Redshift Materialized Views
What are Materialized Views?
o A new kind of database object to consider in database modeling
o Combines the benefits of tables and views
Designed to deliver orders-of-magnitude performance improvement
o Actual gains vary depending on multiple factors
Stores the precomputed results of a query and efficiently maintains them
o Converts a complex SPJA (select-project-join-aggregate) query into a simple SELECT query
Typically useful for pre-canned workloads
o Predictable and repeated query patterns, for example ETL, BI, dashboards
Benefits of Materialized Views
1. Speed up queries by orders of magnitude
• For predictable workloads
• Save work with precomputed, materialized views
2. Simplify and accelerate maintenance of precomputed results
3. Easier and faster migration to Amazon Redshift
Use case: ELT and BI
Amazon Redshift
Simplifies maintenance and boosts performance of pre-aggregated tables and reporting tables
Speedup: Original query
“Query: What were the total sales in SF?”
Join-Aggregate over sales and store_info
sales
item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
store_info
store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF
Result
loc  total_sales
SF   12.00
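As a rough sketch, the slide's question can be written as the join-aggregate below. Table and column names come from the slide; the column types are assumptions, since the slide shows only names and values.

```sql
-- Base tables from the slide; column types are assumed for illustration.
CREATE TABLE sales (
    item_key  VARCHAR(8),
    store_key VARCHAR(8),
    cust_key  VARCHAR(8),
    price     DECIMAL(10,2)
);
CREATE TABLE store_info (
    store_key VARCHAR(8),
    owner     VARCHAR(32),
    loc       VARCHAR(32)
);

INSERT INTO sales VALUES
    ('i1', 's1', 'c1', 12.00),
    ('i2', 's2', 'c1', 3.00),
    ('i3', 's2', 'c2', 7.00);
INSERT INTO store_info VALUES
    ('s1', 'Joe',  'SF'),
    ('s2', 'Ann',  'Chicago'),
    ('s3', 'Lisa', 'SF');

-- Join each sale to its store, keep the SF stores, and sum the prices.
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s
JOIN store_info si ON s.store_key = si.store_key
WHERE si.loc = 'SF'
GROUP BY si.loc;
-- With the rows above: SF, 12.00 (store s3 is in SF but has no sales yet)
```

Every execution of this query repeats the join and the aggregation over the base tables, which is the work a materialized view precomputes.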
Speedup: Materialized Views Precomputed Results
loc_sales (materialized view, precomputed from sales and store_info)
loc      total_sales
SF       12.00
Chicago  10.00
sales
item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
store_info
store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF
Defining a Materialized View
CREATE MATERIALIZED VIEW loc_sales DISTKEY (loc) AS (
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s, store_info si
WHERE s.store_key = si.store_key
GROUP BY si.loc);
Materialized Views Speedup Query
“Query: What were the total sales in SF?”
Use the MV like a table:
SELECT loc, total_sales
FROM loc_sales
WHERE loc = 'SF';
loc_sales
loc      total_sales
SF       12.00
Chicago  10.00
Result
loc  total_sales
SF   12.00
Use case: Decouple Read and Write Workloads
Consider a hot table that is frequently updated and read
Create an MV for the read queries; the hot table acts as the base table for writes
REFRESH the MV periodically
[Diagram: readers and writers sharing the hot table, then readers on the MV and writers on the base table]
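The pattern above might look like the following sketch, reusing the sales table and loc_sales view from the earlier slides. The inserted row and the idea of running the refresh from a scheduled job are illustrative assumptions, not part of the talk.

```sql
-- Writers: keep loading the hot base table as before.
INSERT INTO sales VALUES ('i4', 's1', 'c5', 4.00);  -- hypothetical new sale

-- Readers: query the materialized view instead of the base table.
SELECT loc, total_sales
FROM loc_sales;

-- Periodic job (for example, after each load): bring the MV up to date.
REFRESH MATERIALIZED VIEW loc_sales;
```

Between refreshes, readers see the totals as of the last REFRESH, so the refresh interval sets how stale the read side is allowed to be.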
Amazon Redshift Materialized Views
1. Speed up queries by orders of magnitude
2. Simplify and accelerate maintenance of precomputed results
• Fast refresh: Efficient, incremental
• Example: ETL/BI pipelines
Fast Refresh: Amazon Redshift Incrementally Maintains
sales
item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
New rows inserted into sales:
i1        s3         c3         5.00
i2        s2         c4         8.00
store_info
store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF
loc_sales
loc      total_sales
SF       12.00
Chicago  10.00
db> REFRESH MATERIALIZED VIEW loc_sales;
Fast Refresh: Amazon Redshift Incrementally Maintains
sales
item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
New rows inserted into sales:
i1        s3         c3         5.00
i2        s2         c4         8.00
store_info
store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF
loc_sales
loc      total_sales
SF       12.00+5.00
Chicago  10.00+8.00
db> REFRESH MATERIALIZED VIEW loc_sales;
Incremental changes!
Fast Refresh: Amazon Redshift Incrementally Maintains
sales
item_key  store_key  cust_key  price
i1        s1         c1        12.00
i2        s2         c1         3.00
i3        s2         c2         7.00
i1        s3         c3         5.00
i2        s2         c4         8.00
store_info
store_key  owner  loc
s1         Joe    SF
s2         Ann    Chicago
s3         Lisa   SF
loc_sales
loc      total_sales
SF       17.00
Chicago  18.00
db> REFRESH MATERIALIZED VIEW loc_sales;
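The three refresh slides can be replayed as the short sketch below. It assumes the sales and store_info tables and the loc_sales view already exist with the rows shown on the slides, and that the view is refreshed manually.

```sql
-- Two new sales arrive: one in store s3 (SF) and one in store s2 (Chicago).
INSERT INTO sales VALUES
    ('i1', 's3', 'c3', 5.00),
    ('i2', 's2', 'c4', 8.00);

-- Before the refresh, the view still returns the stored totals:
--   Chicago 10.00, SF 12.00
SELECT loc, total_sales FROM loc_sales ORDER BY loc;

-- Apply the changes made to the base tables since the last refresh.
REFRESH MATERIALIZED VIEW loc_sales;

-- After the refresh, the deltas have been added to the stored totals:
--   Chicago 10.00 + 8.00 = 18.00, SF 12.00 + 5.00 = 17.00
SELECT loc, total_sales FROM loc_sales ORDER BY loc;
```

Only the two new rows need to be joined and summed, which is why the refresh cost tracks the size of the change and not the size of the sales table.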
Refresh Types and Limitations
Incremental refresh
• One or more tables: INNER JOIN
• Aggregates: count(), sum()
• Expressions, pure functions, WHERE, GROUP BY, HAVING
Recompute refresh
• Everything else: window functions, set operations, ORDER BY, etc.
• MV fully recomputed (basically, a CTAS)
Unsupported MV
• Table types: external, views, other MVs, temps, system tables
• Function types: volatile, unstable
• Clauses: order, limit, offset
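To make the two refresh categories concrete, here is a hedged sketch of one view in each shape, built on the sales and store_info tables from the earlier slides. The view names store_sales_summary and sales_ranked are hypothetical, and which refresh type Redshift actually picks depends on the service version, so treat the comments as a reading of the slide and not a guarantee.

```sql
-- Shape the slide lists as incrementally refreshable:
-- inner join, count()/sum() aggregates, WHERE, GROUP BY.
CREATE MATERIALIZED VIEW store_sales_summary AS
SELECT si.loc       AS loc,
       COUNT(*)     AS num_sales,
       SUM(s.price) AS total_sales
FROM sales s
JOIN store_info si ON s.store_key = si.store_key
WHERE s.price > 0
GROUP BY si.loc;

-- Shape the slide lists under "recompute refresh": a window function.
-- Each REFRESH rebuilds the stored result, much like a CTAS.
CREATE MATERIALIZED VIEW sales_ranked AS
SELECT item_key,
       store_key,
       price,
       RANK() OVER (PARTITION BY store_key ORDER BY price DESC) AS price_rank
FROM sales;
```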
Operations That Force Recompute on REFRESH
VACUUM
TRUNCATE
ALTER DISTKEY
ALTER SORTKEY
DISTSTYLE ALL -> DISTSTYLE EVEN conversion (small table)
DDL operations on the base tables
Materialized Views: System Tables
STV_MV_INFO
• Contains a row for every materialized view, whether its data is stale, and state information.
STL_MV_STATE
• Contains a row for every state transition of a materialized view.
SVL_MV_REFRESH_STATUS
• Contains a row for the refresh activity of materialized views.
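A minimal sketch of how these system tables might be queried follows. The column names (name, is_stale, state, mv_name, starttime, endtime, status) are assumptions based on the Redshift documentation and should be checked against the documentation for the cluster's version before use.

```sql
-- Which materialized views are stale? (column names assumed; verify in the docs)
SELECT name, is_stale, state
FROM stv_mv_info;

-- Recent refresh activity, newest first. (column names assumed; verify in the docs)
SELECT mv_name, starttime, endtime, status
FROM svl_mv_refresh_status
ORDER BY starttime DESC
LIMIT 10;
```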
Amazon Redshift Materialized Views
1. Speed up queries by orders of magnitude
2. Simplify and accelerate maintenance of precomputed results
3. Easier and faster migration to Amazon Redshift
Success story
Example: Web Store DW
[Diagram. Sources: MySQL (Purchase, User, LineItem, Item, linked by many-to-one relationships) and S3 event logs. Redshift schemas/tables, managed by Etleap: Purchase, User, LineItem, Item, LoginEvents. Materialized view: Logins by User.]
Demo
Speeding up Models @ AXS
1. Regular ingest of 3.8 M records per day
2. Before: CTAS (CREATE TABLE AS)
3. After: Materialized Views
Faster ingest times!
Time to ingest new records is constant
                      CTAS  MV   Speedup
Steady-state average  371s  49s  7.9x
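The before/after difference can be sketched as follows. This is not the actual AXS model: it reuses the sales and store_info tables from the earlier slides, and the names model_output and model_output_mv are hypothetical.

```sql
-- Before: rebuild the whole model table on every run (CTAS pattern).
DROP TABLE IF EXISTS model_output;
CREATE TABLE model_output AS
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s
JOIN store_info si ON s.store_key = si.store_key
GROUP BY si.loc;

-- After: define the model once as a materialized view ...
CREATE MATERIALIZED VIEW model_output_mv AS
SELECT si.loc AS loc, SUM(s.price) AS total_sales
FROM sales s
JOIN store_info si ON s.store_key = si.store_key
GROUP BY si.loc;

-- ... and on each run only refresh it.
REFRESH MATERIALIZED VIEW model_output_mv;
```

With CTAS, each run pays for the full join and aggregation; with an incrementally refreshable view, each run pays only for the newly ingested records, which matches the slide's point that ingest time stays constant.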
Summary
Combines benefits of tables/CTAS and views
Compute once, use multiple times, efficient maintenance
Typically useful for ETL, BI, and dashboarding workloads
Boost performance of repeated and predictable queries
Simplify maintenance of precomputed results
User creates materialized views that use one or more tables:
CREATE MATERIALIZED VIEW mv_name AS (<SELECT_query>);
Speed up queries by accessing materialized views:
SELECT * FROM mv_name WHERE ...;
REFRESH to incrementally maintain the materialized views:
REFRESH MATERIALIZED VIEW mv_name;
Resources
• Amazon Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html
• Blog: Materialize your Amazon Redshift Views to Speed Up Query Execution: https://aws.amazon.com/blogs/aws/materialize-your-amazon-redshift-views-to-speed-up-query-execution/
• Blog: Speeding up Etleap models at AXS with Amazon Redshift materialized views: https://aws.amazon.com/blogs/big-data/speeding-up-etleap-models-at-axs-with-amazon-redshift-materialized-views/
• Blog: Speed up your ELT and BI queries with Amazon Redshift materialized views: https://aws-preview.aka.amazon.com/blogs/big-data/speed-up-your-elt-and-bi-queries-with-amazon-redshift-materialized-views/