Analytic & Windowing functions in oracle

Analytic and Window Functions in Oracle

Logan Palanisamy

Agenda

Difference between aggregate and analytic functions

Introduction to various analytic functionsFunctions that are both aggregate and

analyticBreakMore examplesEnhanced Aggregation (CUBE, ROLLUP)

Meeting Basics

Put your phones/pagers on vibrate/mute

Messenger: Change the status to offline or in-meeting

Remote attendees: Mute yourself (*6). Ask questions via Adobe Connect.

Aggregates vs. Analytics

Aggregate functions Rows are collapsed. One row per group Non-Group-By columns not allowed in SELECT list.

Analytic functions Rows are not collapsed As many rows in the output as in the input No restrictions on the columns in the SELECT list Evaluated after joins, WHERE, GROUP BY, HAVING

clauses Nesting not allowed Can appear only in the SELECT or ORDER BY clause

analytic_aggr_diff.sql

Analytics vs. other methods

Show the dept, empno, sal and the sum of all salaries in their dept

Three possible ways Using Joins Using Scalar Sub-queries Using Analytic Functions

analytics_vs_others.sql

Anatomy of an analytic funcion

function (arg1, ..., argN) OVER ([partition_by_clause] [order_by_clause [windowing_clause]])

The OVER keywordpartition_by_clause: Optional. Not related to

table/index partitions. Analogous to GROUP BYorder_by_clause: Mandatory for Ranking and

Windowing functions. Optional or meaningless for others

windowing_clause: Optional. Should always be preceded by ORDER BY clause

Types of analytical functions

Ranking functionsFIRST_VALUE/LAST_VALUE/NTH_VALUEWindowing functionsReporting functionsLAG/LEADFIRST/LAST

Ranking Functions

ROW_NUMBER()RANK() – Skips ranks after duplicate ranksDENSE_RANK() – Doesn't skip rank after duplicate

ranksNTILE(n) – Sorts the rows into N equi-sized bucketsCUME_DIST() – % of rows with values lower or equal PERCENT_RANK() - (rank of row -1)/(#rows – 1)function OVER ([PARTITION BY <c1,c2..>] ORDER BY

<c3, ..>)PARTITION BY clause: OptionalORDER BY clause: Mandatoryrank_dense_rank.sql

FIRST_VALUE/LAST_VALUE/NTH_VALUE

Returns the first/last/nth value from an ordered set

FIRST_VALUE(expr, [IGNORE NULLS]) OVER ([partitonby_clause] orderby_clause)

IGNORE NULLS options helps you "carry forward". Often used in "Data Densification"

Operates on Default Window (unbounded preceding and current row) when a window is not explicitly specified.

NTH_VALUE introduced in 11gR2flnth_value.sql

Window functions

Used for computing cumulative/running totals (YTD, MTD, QTD), moving/centered averages

function(args) OVER([partition_by_clause] order_by_clause [windowing_clause])

ORDER BY clause: mandatory.Windowing Clause: Optional. Defaults to: UNBOUNDED

PRECEDING and CURRENT ROWanchored or sliding windowsTwo ways to specify windows: ROWS, RANGE[ROW | RANGE ] BETWEEN <start_exp> AND

<end_exp>window.sql

ROWS type windows

Physical offset. Number of rows before or after current row

Non deterministic results if rows are not sorted uniquely

Any number of columns in the ORDER BY clauseORDER BY columns can be of any typefunction(args) OVER ([partition_by_clause] order

by c1, .., cN ROWS between <start_exp> and <end_exp>)

windows_rows.sql

RANGE type Windows

Logical offsetnon-unique rows treated as one logical rowOnly one column allowed in ORDER BY

clauseORDER BY column should be numeric or datefunction(args) OVER ([partition_by_clause]

order by c1 RANGE between <start_exp> and <end_exp>)

windows_range.sql

Reporting function

Computes the ratio of a value to the sum of a set of values

RATIO_TO_REPORT(arg) OVER ([PARTITION BY <c1, .., cN>]

PARTITION BY clause: Optional. ratio_to_report.sql

LAG/LEAD

Gives the ability to access other rows without self-join.Allows you to treat cursor as an arrayUseful for making inter-row calculations (year-over-year

comparison, time between events)LEAD (expr, <offset>, <default value>) [IGNORE

NULLS] OVER ([partioning_clause] orderby_clause)Physical offset. Can be fixed or varying. default offset is 1default value: value returned if offset points to a non-

existent rowIGNORE NULLS determines whether null values of are

included or eliminated from the calculation. lead_lag.sql

FIRST/LAST

Very different from FIRST_VALUE/LAST_VALUEReturns the results of aggregate/analytic function

applied on column B on the first or last ranked rows sorted by column A

function (expr_with_colB) KEEP (DENSE_RANK FIRST/LAST ORDER BY colA) [OVER (<partitioning_clause)>)]

Slightly different syntax. Note the word KEEPanalytic clause is optional. first_last.sql

Above/Below average calculation

Find the list of employees whose salary is higher than the department average.

above_average.sql

Top-N queries

Find the full details of "set of" employees with the top-N salaries

Find the two most recent hires in each department

List the names and employee count of departments with the highest employee count

top_n.sql

Top-N% queries

List the top 5% of the customers by revenuetop_np.sql

Multi Top-N queries

For each customer, find out the maximum sale in the last 7 days the date of that sale the maximum sale in the last 30 days the date of that sale

multi_top.sql

De-Duping

Deleting duplicate recordsdedup.txt

Hypothetical Ranking

CUME_DIST, DENSE_RANK, RANK, PERCENT_RANK

Used for what-if analysishypothetical_rank.sql

Inverse Percentile functions

Return the value corresponding to a certain percentile (opposite of CUME_DIST)

PERCENTILE_CONT (continuous)PERCENTILE_DISC (discrete)PERCENTILE_CONT(0.5) is the same as

MEDIANinverse_p.sql

String Aggregation: LISTAGG, STRAGG

Concatenated string of values for a particular group (e.g. employees working in a dept)

Tom Kyte's STRAGG11gR2 has LISTAGG10g has COLLECTlistagg.sql

Pivoting/Unpivoting

Pivoting transposes rows to columns DECODE/CASE and GROUP BY used

Unpivoting Converts columns to rows Join the base table with a one column serial number

table

11gR2 introduced PIVOT and UNPIVOT clauses to SELECT

pivot.sql

Data Densification

Data normally stored in sparse form (e.g. No rows if there is no sales for a particular period)

Missing data needed for comparison (e.g. month-over-month comparison)

Data Densification comes in handyLAG (col, INGORE NULLS), and PARTITION BY

OUTER JOIN are used. http://hoopercharles.wordpress.com/

2009/12/07/sql-filling-in-gaps-in-the-source-data/

When not to use analytics

When a simple group by would do the jobwhen_not_to_use_analytics.sq

Drawback of analytics

Lot of sorting. Set

PGA_AGGREGATE_TARGET/SORT_AREA_SIZE appropriately

New versions reduce the number of sorts (same partition_by and order_by clauses on multiple analytic functions use single sort)

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:1137250200346660664

http://jonathanlewis.wordpress.com/2009/09/07/analytic-agony/

Recap of Analytic Functions

Analytic Functions: Were introduced in 8.1.6 (~1998) Are supported within PL/SQL only from 10g. Use "view"

or "dynamic sql" older versions. Compute the 'aggregates' while preserving the 'details' Eliminate the need for self-joins or multiple passes on

the same table Reduce the amount of data transferred between DB and

client. Can be used only in SELECT and ORDER BY clauses.

Use sub-queries if there is a need to filter. Are computed at the end - after join, where, group by,

having clauses

Advanced Aggregation

GROUP BY col1, col2GROUP BY ROLLUP(col1, col2)GROUP BY CUBE(col1, col2)GROUP BY GROUPING SETS ((col1, col2),

col1)

ROLLUP

GROUP BY ROLLUP(col1, col2)Generates subtotals automaticallyGenerally used in hierarchical dimensions

(region, state, city), (year, quarter, month, day)n + 1 different groupings where n is the

number of expressions in the ROLLUP operator in the GROUP BY clause.

Order of the columns in ROLLUP matter. ROLLUP(col1, col2), ROLLUP(col2, col1) produce different outputs

CUBE

GROUP BY CUBE(col1, col2)Gives subtotals automatically for every possible combinationUsed in cross-tabular reports.Suitable when dimensions are independent of each other2n different groupings where n is the number of expressions

in the CUBE operator in the GROUP BY clause. Have to be careful with higher values for nOrder of the columns in CUBE doesn’t really matter.

CUBE(col1, col2), CUBE(col2, col1) produce same results, but in a different order.

Grouping Sets

GROUP BY GROUPING SETS (col1, (col1, col2))

Explicitly lists the needed groupingsGROUPING, GROUPING_ID, GROUP_ID

functions help you differentiate one grouping from the other.

Advanced aggregation functions more efficient than their UNION ALL equivalents (why?)

advanced_agg.sql

Grouping Equivalent GROUPING SETS

CUBE(a,b) GROUPING SETS((a,b), (a), (b), ())

ROLLUP(a,b)

GROUPING SETS((a,b), (a), ())

ROLLUP(b,a)

GROUPING SETS((a,b), (b), ())

ROLLUP(a) GROUPING SETS((a), ())

Composite Columns

Treat multiple columns as a single columnComposite_columns.sql

Concatenated Groupings

GROUP BY GROUPING SETs (a,b), GROUPING SETS (c,d)

The above is same as GROUP BY GROUPING SETS ((a,c), (a,d), (b,c), (b,d))

References

http://orafaq.com/node/55http://orafaq.com/node/56http://www.orafaq.com/node/1874http://www.morganslibrary.org/reference/

analytic_functions.htmlhttp://morganslibrary.org/reference/rollu

p.htmlhttp://www.oracle.com/technology/orama

g/oracle/05-mar/o25dba.htmlhttp://www.gennick.com/magic.html

http://orafaq.com/node/55

http://orafaq.com/node/56

http://www.morganslibrary.org/reference/analytic_functions.html

http://www.morganslibrary.org/reference/analytic_functions.html

http://morganslibrary.org/reference/rollup.html

http://morganslibrary.org/reference/rollup.html

http://www.oracle.com/technology/oramag/oracle/05-mar/o25dba.html

http://www.oracle.com/technology/oramag/oracle/05-mar/o25dba.html

References

Chapter 12 of "Expert Database Architecture" by Tom Kyte

Business-Savy SQL by Ganesh Variar, Oracle magazine, Mar/April 2002

http://asktom.oracle.com/pls/asktom/asktom.search?p_string=rock+and+roll

http://forums.oracle.com/forums/search.jspa?q=analytic&objID=f75&dateRange=all&numResults=30&forumID=75&rankBy=10001&start=0







Q&A

devel_oracle@

Predicate merging in views with analytics

create view v select .. over(partition by ...) from t; select ... from v where col1 = 'A' In some cases predicates don't get merged. Reasons:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:12864646978683#30266389821111


http://forums.oracle.com/forums/thread.jspa?messageID=4169151&#4169151





Technology

Analytic & Windowing functions in oracle