21
The Model Clause explained Tony Hasler, UKOUG Birmingham 2012 Tony Hasler, Anvil Computer Services Ltd.

The Model Clause explained Tony Hasler, UKOUG Birmingham 2012 Tony Hasler, Anvil Computer Services Ltd

Embed Size (px)

Citation preview

Tony Hasler, Anvil Computer Services Ltd.

The Model Clause explainedTony Hasler, UKOUG Birmingham 2012

Who is Tony Hasler?

Google Tony Hasler! My blog contains all the material from this presentation on

the front page Additional blog entry (October) on the model clause

Tony Hasler http://tonyhasler.wordpress.com

Health Warning about the model clause

No support for Active Session History (ASH) whilst on the CPU

Limited support for SQL Performance Monitor No session statistics related to modelling No trace events for diagnosis (that I can find) Performance is poor when data spills to disk Introduced in 10g but no new features in 11g The main example in chapter 22 (10g) or chapter 23 (11g)

in the Data Warehousing Guide (calculating mortgage amortization) is incorrect and always has been (mortgage fact ‘Payment’ should be ‘PaymentAmt’)

Perhaps this is not a flagship component! However, the model clause:

−Enables you to do stuff that cannot otherwise be (easily) done in SQL

−The parallelisation capabilities can be useful in some cases

Tony Hasler http://tonyhasler.wordpress.com

What is in this talk and what is not

IN:−What the model clause is−What it is used for and what it isn’t−Syntax and semantics of the main features−Recommendations on how to use the model clause−A real life example

Not in due to time constraints:−An exhaustive list of all model clause features−This can be found in the Data Warehousing Guide

Chapter 22 (10g) or Chapter 23 (11g)−No demos

Tony Hasler http://tonyhasler.wordpress.com

What the model clause is and what is it used for

Part of a query block meaning it can appear after any select keyword.

Provides spreadsheet like capabilities inside SQL Can perform unimplemented SQL features, e.g. moving

medians. => Just because something is inefficient to execute

doesn’t mean you don’t need to do it! Iteration is possible, such as with ‘Feedback loops’ (see

Example 4   Calculating Using Simultaneous Equations in the Data Warehousing guide)

A series of calculations within the model can be sequenced manually or dependencies can be determined by the engine (only use when necessary, not because of laziness)

Tony Hasler http://tonyhasler.wordpress.com

Model terms compared with Excel

Tony Hasler http://tonyhasler.wordpress.com

Rows/columns <-> Dimensions

Worksheets <-> Partitions

Formulas <-> RulesValues <-> Measures

Comparison of Model clause terms with Excel spreadsheet terms

Model Term Excel term Differences

PARTITION Worksheet It is possible for formulas in one Excel worksheet to reference cells in another. It is not possible for partitions in a model clause to reference each other.

DIMENSION Row and column In Excel there are always exactly two dimensions. In a model clause there are one or more.

MEASURE A cell value In Excel only one value is referenced by the dimensions. The model clause allows one or more measures.

RULES Formulas Nested cell references possible with models

Tony Hasler http://tonyhasler.wordpress.com

Overview of a query block with no model clause

Tony Hasler http://tonyhasler.wordpress.com

ORDER BY clause Scalar, aggregate, and analytic clauses allowed

Joins and outer join predicates Only scalar functions allowed

WHERE clause: inner join predicates and selection predicates

Only scalar functions allowed

Logically precedes

GROUP BY clause Only scalar functions allowed

Logically precedes

HAVING clause Scalar and aggregate functions allowed

DISTINCT

Scalar, aggregate, and analytic functions allowed

Logically precedes

Logically precedes

Analy

tic calcu

latio

ns

SELECT list

Logically precedes

Logically precedes

Overview of a query block with a model clause

Tony Hasler http://tonyhasler.wordpress.com

Joins, join predicates, selection predicates and GROUP BY clause

Only scalar functions allowed

HAVING clause Scalar and aggregate functions allowed

Logically precedes

Model PARTITION, DIMENSION and MEASURES clauses

Scalar, aggregate, and analytic functions allowed

Logically precedes

Model RULES clause Independent calculations

SELECT list Only scalar functions on model outputs allowed

Logically precedes

ORDER BY clause Only scalar functions on model outputs allowed

Logically precedes

Logically precedes

Analy

tic calcu

latio

ns

DISTINCT

Logically precedes

Recommendations on use of the model clause

Use factored subqueries to avoid mixing aggregate functions and analytic functions with the model clause

Use only scalar values as inputs to the model clause Perform all calculations in the model clause so that… The final select list is just a simple list of identifiers Be careful using the model clause in a sub query as the

CBO assumes that the cardinality of the output of a model clause is the same as the cardinality of the input.

Tony Hasler http://tonyhasler.wordpress.com

A simplified version of the real world problem

CREATE TABLE stock_holdings( customer_name VARCHAR2 (100) ,stock_name VARCHAR2 (100) ,business_date DATE ,VALUE NUMBER);

Calculate the moving one year average (mean) value and moving standard deviation, and zscore for each customer, stock, and business date.

Tony Hasler http://tonyhasler.wordpress.com

Mathematical terms loosely explained

The Average (mean) is a “typical value” The Standard Deviation is a “typical difference between

a value and the mean” The Zscore is the number of standard deviations that a

particular value deviates from the mean. In other words it is a measure of how unusual a value is.

Tony Hasler http://tonyhasler.wordpress.com

Problem 1: Oracle doesn’t support moving intervals of months or years properly!!!

WITH q1 AS ( SELECT DATE '2012-01-01' + ROWNUM mydate FROM DUAL CONNECT BY LEVEL <= 400) ,q2 AS (SELECT mydate ,COUNT (*) OVER ( ORDER BY mydate RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND CURRENT ROW) cnt FROM q1)SELECT * FROM q2 WHERE mydate = DATE '2013-01-10';

Result: MYDATE CNT

10/01/2013 367

Tony Hasler http://tonyhasler.wordpress.com

Model clause solution (1)

WITH q1 AS ( SELECT DATE '2012-01-01' + ROWNUM mydate, 0 cnt FROM DUAL CONNECT BY LEVEL <= 400)SELECT mydate, cnt FROM q1MODEL RETURN UPDATED ROWS DIMENSION BY (mydate) MEASURES (cnt) RULES (cnt [DATE '2013-01-10'] = COUNT (*)

[mydate BETWEEN ADD_MONTHS (CV () + 1, -12) AND CV ()]);

Tony Hasler http://tonyhasler.wordpress.com

Model clause solution (2)CREATE TABLE business_dates( business_date PRIMARY KEY NOT NULL ,business_days_in_year NOT NULL ,first_day_in_year NOT NULL ,business_day_number NOT NULL)ORGANIZATION INDEXASWITH q1 AS (SELECT DISTINCT business_date, 0 AS business_days_in_year, SYSDATE AS first_day_in_year, 0 AS business_day_number FROM stock_holdings)SELECT business_date ,business_days_in_year ,first_day_in_year ,business_day_number FROM q1MODELDIMENSION BY (business_date)MEASURES(business_days_in_year,first_day_in_year,business_day_number)RULES ( first_day_in_year[ANY] = ADD_MONTHS(CV(business_date)+1,-12), business_days_in_year[ANY] = COUNT(*)[business_date BETWEEN first_day_in_year[CV()] AND CV()], business_day_number[ANY] = ROW_NUMBER() OVER(ORDER BY BUSINESS_DATE));

Tony Hasler http://tonyhasler.wordpress.com

Execution plan for previous statement

-------------------------------------------------------| Id | Operation | Name |-------------------------------------------------------| 0 | CREATE TABLE STATEMENT | || 1 | LOAD AS SELECT | BUSINESS_DATES || 2 | SQL MODEL ORDERED | || 3 | VIEW | || 4 | HASH UNIQUE | || 5 | TABLE ACCESS FULL | STOCK_HOLDINGS || 6 | WINDOW (IN SQL MODEL) SORT| |-------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

Problem 2: avoiding data densification

Oracle recommends the use of partitioned outer joins for data densification to simplify analytic functions.

The model clause can also be used for data densification In my case, however, this would have multiplied the

number of rows enormously as customers tended to hold particular stocks for about 10% of the time.

We can use the model clause to avoid this by performing multiple calculations in sequence

Tony Hasler http://tonyhasler.wordpress.com

Model clause solutionSELECT customer_name,stock_name,business_date,first_day_in_year,VALUE,mov_avg,mov_stdd,zscore FROM stock_holdings sh, business_dates bd WHERE sh.business_date=bd.business_dateMODELPARTITION BY (customer_name,stock_name)DIMENSION BY (sh.business_date)MEASURES (VALUE,business_days_in_year, first_day_in_year,0 AS mov_avg, 0 AS mov_stdd, 0 AS zscore, 0 AS mov_sum, 0 AS mov_cnt)RULES (mov_sum[ANY] = SUM(VALUE)[business_date BETWEEN first_day_in_year[CV()]AND CV()],mov_cnt[ANY] = COUNT(*) [business_date BETWEEN first_day_in_year[CV()] AND CV()],mov_avg[ANY] = mov_sum[CV()]/ business_days_in_year[CV()],mov_stdd[ANY] = SQRT(( (VAR_POP(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()] *mov_cnt[CV()]) +(mov_sum[CV()]* (AVG(VALUE)[business_date BETWEEN first_day_in_year[CV()] AND CV()] - mov_avg[CV()]))) /business_days_in_year[CV()] ),zscore[ANY] = DECODE(mov_stdd[CV()],0,0,(VALUE[CV()]-mov_avg[CV()])/mov_stdd[CV()]));

Tony Hasler http://tonyhasler.wordpress.com

Parallel execution plan

--------------------------------------------------------| Id | Operation | Name | --------------------------------------------------------| 0 | SELECT STATEMENT | | | 1 | PX COORDINATOR | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | | 3 | BUFFER SORT | | | 4 | SQL MODEL ORDERED | | | 5 | PX RECEIVE | | | 6 | PX SEND HASH | :TQ10000 | | 7 | NESTED LOOPS | | | 8 | PX BLOCK ITERATOR | | | 9 | TABLE ACCESS FULL| STOCK_HOLDINGS | |* 10 | INDEX UNIQUE SCAN | SYS_IOT_TOP_78729 | --------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

Summary

The model clause allows you to build your own analytical functions and/or your own analytical windows – amongst other things.

The model clause also allows you to parallelise calculations and can be useful even if the calculations are supported without the model clause

However, model clause aggregates will be slower than standard analytic functions

Performance degrades rapidly when partitions spill to disk.

Tony Hasler http://tonyhasler.wordpress.com

Questions

Tony Hasler http://tonyhasler.wordpress.com