SQL Model Clause

  • Oracle Database 10g SQL Model Clause
    Andy Witkowski, Architect
    Thomas Kyte, VP
    Oracle Corporation
    40166

  • What's now in SQL for Modeling
    - Aggregation enhancements
      - Cube, Rollup, Grouping Sets
      - New aggregates: Inverse Distribution, FIRST/LAST, etc.
    - Analytic functions
      - Window functions: Rank, Moving, Cumulative
      - Statistical functions: Correlation, Linear Regression, etc.
    - Old tools still have more modeling power than SQL
      - Spreadsheets, MOLAP engines
    - SQL Model enhances SQL with modeling power

  • Case Study: Modeling with Excel
    - Excel fits well at the personal scale
      - UI and formatting
      - Calculations (built-in functions, formulas)
      - What-if analysis
    - Excel fits poorly at corporate scale for modeling
      - Cryptic row-column addressing
      - No metadata, no standards, no mathematical model
      - 100s of spreadsheets and consolidation by hand
      - Does not scale (1000s of formulas, TB of data)
      - Perpetual data exchange: databases -> Excel
    - Replace Excel modeling with SQL modeling

  • Modeling with SQL Model
    - Language: spreadsheet-like calculations in SQL
      - Inter-row calculation; treats relations as an N-dim array
      - Symbolic references to cells and their ranges
      - Multiple formulas over N-dim arrays
      - Automatic formula ordering
      - Recursive model solving
      - Model is a relation and can be processed further in SQL
      - Multiple arrays with different dimensionality in one query
    - Performance
      - Parallel processing in partitioning and formulas
      - Multiple self-joins with one data access structure
      - Multiple UNIONs with one data access structure
    - Why better?
      - Automatic consolidation (models as views combine using SQL)
      - Self-adjusting (as the database changes, no need to redefine)
      - One version of truth (calculations run directly over the database, no exchange)

  • SQL Model Concepts

  • Define Relation as Array
    SELECT prod, time, s FROM sales
    DIMENSION BY (prod, time) MEASURES (s)

    Relation (prod, time, s), with rows such as (vcr, 2001, 9) and (dvd, 2001, 0), viewed as an array:

    time  vcr  dvd  tv  pc
    1999    1    2   3   4
    2000    5    6   7   8
    2001    9    0   1   2

  • Define Business Rules
    - Sales in 2000 are 2x those of the previous year
    - Predict vcr sales in 2002
    - Predict dvd sales in 2002

    RULES UPSERT(
      s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
      s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
      s[dvd, 2002] = AVG(s) [CV(prod), time

  • Evaluate Formulas
    - 1st formula: the 2000 row becomes 2 4 6 8 (twice the 1999 row)
    - 2nd formula: s[vcr, 2002] = 9 + 2 = 11
    - 3rd formula: s[dvd, 2002] = 3

    time  vcr  dvd  tv  pc
    1999    1    2   3   4
    2000    2    4   6   8
    2001    9    0   1   2
    2002   11    3

  • Return as Relation
    The array is returned as a relation again, now with the new rows (vcr, 2002, 11) and (dvd, 2002, 3).
    Equivalent without Model: a self-join for the 1st formula and a join + UNION for each of the 2nd and 3rd.
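
    The first two rules of the walkthrough can be emulated in plain Python. This is only a sketch of the semantics, not how Oracle evaluates a Model. It assumes the array values are 1999: 1 2 3 4, 2000: 5 6 7 8, 2001: 9 0 1 2 (columns vcr, dvd, tv, pc), and the helper name apply_rules is hypothetical. The third rule is left out because its cell reference is cut off in the slide text.

```python
# Cells keyed by (prod, time), mirroring DIMENSION BY (prod, time) MEASURES (s).
s = {
    ("vcr", 1999): 1, ("dvd", 1999): 2, ("tv", 1999): 3, ("pc", 1999): 4,
    ("vcr", 2000): 5, ("dvd", 2000): 6, ("tv", 2000): 7, ("pc", 2000): 8,
    ("vcr", 2001): 9, ("dvd", 2001): 0, ("tv", 2001): 1, ("pc", 2001): 2,
}


def apply_rules(cells):
    """Apply the first two slide rules in order and return a new dict."""
    out = dict(cells)
    # Rule 1: s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2
    # ANY matches every product; CV() reuses the left-side coordinates.
    for (prod, time) in list(out):
        if time == 2000:
            out[(prod, time)] = out[(prod, time - 1)] * 2
    # Rule 2: s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000]
    # The cell does not exist yet, so UPSERT creates it.
    out[("vcr", 2002)] = out[("vcr", 2001)] + out[("vcr", 2000)]
    return out


result = apply_rules(s)
print(result[("vcr", 2000)], result[("vcr", 2002)])
```

    Rule 2 reads the value of s[vcr, 2000] that rule 1 has already doubled, which is why the new cell holds 9 + 2 = 11.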
  • Model Clause Components (diagram of the Model clause)

  • Key Concepts (1)
    - New SQL Model clause
      - Data as N-dim arrays with DIMENSIONS and MEASURES
      - Data can be PARTITION-ed, which creates an array per partition
      - Formulas defined over the arrays express a (business) model
    - Formulas within a Model
      - Use symbolic addressing with familiar array notation
      - Can be ordered automatically based on dependencies between cells
      - Can be recursive with a convergence condition (recursive models)
      - Can UPDATE or UPSERT cells
      - Support most SQL functions, including aggregates

  • Key Concepts (2)
    - Result of a SQL Model is a relation
      - Can participate further in processing via joins, etc.
      - Can define views containing Model computations
    - SQL Model is the last query clause
      - Executed after joins, aggregation, window functions
      - Before ORDER BY
    - Main Model and Reference Models
      - Can relate models of different dimensionality

  • Formula Fundamentals (1)
    - Formulas: SQL expressions over cells with aggregates, functions, etc.
    - A formula has a left and a right side and represents an assignment
        s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000]    single ref
        s[vcr, 2002] = AVG(s)[vcr, t

  • Formula Fundamentals (2)
    - Function CV(dimension) propagates values from the left side to the right side. E.g., products in 2002 are the sum of the two previous years:
        s[ANY, 2002] = s[CV(p), CV(t) - 1] + s[CV(p), CV(t) - 2]
    - Formula result can depend on processing order. Can specify the order in each formula. E.g., shift by time:
        s[vcr, ANY] ORDER BY t = s[vcr, CV(t) - 1]

    prod  t     s       result
    vcr   2001  300.00    0
    vcr   2002  350.00  300.00
    vcr   2003  400.00  350.00
    vcr   2004  450.00  400.00
    vcr   2005  500.00  450.00
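
    Why the processing order matters can be seen in a small Python emulation of the shift rule. It is a sketch under two assumptions: the rule overwrites s in place, one cell at a time, and a missing previous year reads as 0. The function name shift_by_time is hypothetical. In this emulation the shifted column from the slide (0, 300, 350, 400, 450) appears when the cells are visited from the latest year backwards. Visiting them in ascending order overwrites each cell before the next one reads it.

```python
def shift_by_time(cells, descending):
    """Emulate s[vcr, ANY] = s[vcr, CV(t) - 1], one cell at a time, in place."""
    out = dict(cells)
    for t in sorted(out, reverse=descending):
        # CV(t) - 1 addresses the previous year; a missing cell reads as 0.
        out[t] = out.get(t - 1, 0)
    return out


vcr = {2001: 300.0, 2002: 350.0, 2003: 400.0, 2004: 450.0, 2005: 500.0}

desc = shift_by_time(vcr, descending=True)   # each cell reads a still-unchanged neighbour
asc = shift_by_time(vcr, descending=False)   # each cell reads an already-overwritten neighbour

print(desc)
print(asc)
```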

  • Model Options Fundamentals (diagram: global options, rule options)


  • NAV Options: Handling Sparse Data
    Input:
      West  dvd  2001  300.00
      West  tv   2002  500.00
      West  vcr  2001  200.00
      West  vcr  2002  400.00

    Result with keep nav (a missing value stays unknown: ?):
      West  dvd  2001  300.00
      West  tv   2002  500.00
      West  dvd  2003  -
      West  tv   2003  500.00
      West  vcr  2001  200.00
      West  vcr  2002  400.00

    Result with ignore nav (assume 0):
      West  dvd  2001  300.00
      West  tv   2002  500.00
      West  dvd  2003  300.00
      West  tv   2003  500.00
      West  vcr  2001  200.00
      West  vcr  2002  400.00
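
    The difference between the two options can be sketched in Python for the dvd cell. The rule producing the 2003 rows is not in the slide text, so the rule used here, s[dvd, 2003] = s[dvd, 2002] + s[dvd, 2001], is an assumption chosen to illustrate the idea. The helper names cell, add and dvd_2003 are hypothetical. None stands in for SQL NULL.

```python
west = {
    ("dvd", 2001): 300.0, ("tv", 2002): 500.0,
    ("vcr", 2001): 200.0, ("vcr", 2002): 400.0,
}


def cell(cells, key, ignore_nav):
    """Read a cell; a missing cell is None (keep nav) or 0 (ignore nav)."""
    if key in cells:
        return cells[key]
    return 0 if ignore_nav else None


def add(a, b):
    """SQL-style addition: a NULL operand makes the whole sum NULL."""
    return None if a is None or b is None else a + b


def dvd_2003(ignore_nav):
    # Assumed rule: s[dvd, 2003] = s[dvd, 2002] + s[dvd, 2001]
    return add(cell(west, ("dvd", 2002), ignore_nav),
               cell(west, ("dvd", 2001), ignore_nav))


print(dvd_2003(ignore_nav=False))  # keep nav: the missing 2002 cell makes the sum NULL
print(dvd_2003(ignore_nav=True))   # ignore nav: the missing 2002 cell counts as 0
```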

  • Automatic Formula Ordering


  • UPDATE, UPSERT & Partitions
    Input:
    Region  Product  Time  s
    East    dvd      2001  100
    East    dvd      2002  150
    East    vcr      2002  100
    West    dvd      2001  200

    Result:
    Region  Product  Time  Old s  New s
    East    dvd      2001  100    100
    East    dvd      2002  150    100    updated
    East    vcr      2002  100    120    updated
    East    dvd      2003  -      250    upserted
    West    dvd      2001  200    200
    West    dvd      2003  -      200    upserted
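
    The slide text does not include the rules behind this table, so the Python sketch below uses three hypothetical rules, chosen only because they reproduce the New s column. One UPSERT rule sets dvd 2003 to the sum of dvd 2001 and dvd 2002. Two UPDATE rules set dvd 2002 to dvd 2001 and raise vcr 2002 by 20%. Missing cells are read as 0. The point is the semantics: UPDATE touches only cells that already exist in a partition, UPSERT also creates them, and every partition (region) is processed as its own array.

```python
rows = [
    ("East", "dvd", 2001, 100), ("East", "dvd", 2002, 150),
    ("East", "vcr", 2002, 100), ("West", "dvd", 2001, 200),
]


def update(cells, key, value):
    """UPDATE: assign only if the cell already exists."""
    if key in cells:
        cells[key] = value


def upsert(cells, key, value):
    """UPSERT: assign, creating the cell if it is missing."""
    cells[key] = value


def run_partition(cells):
    out = dict(cells)
    read = lambda p, t: out.get((p, t), 0)  # missing cells read as 0 (assumption)
    upsert(out, ("dvd", 2003), read("dvd", 2001) + read("dvd", 2002))
    update(out, ("dvd", 2002), read("dvd", 2001))
    update(out, ("vcr", 2002), read("vcr", 2002) * 1.2)
    return out


# PARTITION BY region: one independent array per region.
partitions = {}
for region, product, time, s in rows:
    partitions.setdefault(region, {})[(product, time)] = s

result = {region: run_partition(cells) for region, cells in partitions.items()}
print(result["East"])
print(result["West"])
```

    In the West partition both UPDATE rules are no-ops because the target cells do not exist, while the UPSERT rule still adds the dvd 2003 row.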

  • Different dimensions: Reference
    Relate Models with different dimensions. Represent each as an n-dimensional array: one main, others as reference or lookup arrays.

    Sales table
    c       p    t     s
    USA     dvd  2001  300.00 $
    USA     tv   2001  500.00 $
    Poland  vcr  2001  200.00 zl
    France  vcr  2001  100.00 fr

    Conv table (converts currency to $)
    c       ratio
    USA     1
    Poland  0.24
    France  0.12

    Converted values
    c       p    t     s
    USA     dvd  2001  300.00 $
    USA     tv   2001  500.00 $
    Poland  vcr  2001   48.00 $
    France  vcr  2001   12.00 $
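
    A reference model behaves like a lookup array with fewer dimensions than the main model. As a rough Python analogy (not the Model syntax), the one-dimensional conversion table is a dict keyed by country, and each cell of the three-dimensional sales array looks its ratio up by its own country value. The variable names are illustrative.

```python
# Main array: cells addressed by (c, p, t), measure s in local currency.
sales = [
    ("USA", "dvd", 2001, 300.00),
    ("USA", "tv", 2001, 500.00),
    ("Poland", "vcr", 2001, 200.00),
    ("France", "vcr", 2001, 100.00),
]

# Reference array: one dimension (c), one measure (ratio to $).
ratio = {"USA": 1, "Poland": 0.24, "France": 0.12}

# For each main cell, look up the reference cell with the same country.
converted = [(c, p, t, round(s * ratio[c], 2)) for c, p, t, s in sales]

for row in converted:
    print(row)
```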

  • Recursive Model Solving
    - Model can contain cyclic (recursive) formulas
      - Cyclic formulas are detected automatically and an error is reported
      - Unless the cycles are intentional, which is indicated with the ITERATE option
    - Use the ITERATE clause to specify the number of iterations, or
    - Use the UNTIL clause to specify a convergence condition; iteration stops when it is true

    Iteration  1     2    3    4    5   6   7   8  9  10
    s value    1024  512  256  128  64  32  16  8  4  2

  • Recursive Model Solving with Until
    The UNTIL condition compares the current value with the value from the previous iteration:
    previous(s[1]) - s[1] = 512 after iteration 2, 256 after iteration 3, and so on down to 4 after iteration 9.
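
    The interplay of ITERATE and UNTIL can be sketched as a bounded loop with an early exit. The Python below is an illustration only. The halving rule, the starting value 2048 and the threshold 4 are assumptions, since the slide text shows the values but not the rule or the comparison used in UNTIL. The function name iterate_until is hypothetical.

```python
def iterate_until(s, max_iterations, threshold):
    """Repeat a cyclic rule up to max_iterations times, stopping early on convergence."""
    history = []
    for _ in range(max_iterations):       # ITERATE (max_iterations)
        previous = s                      # what previous(s[1]) would return
        s = s / 2                         # assumed cyclic rule: s[1] = s[1] / 2
        history.append(s)
        if previous - s <= threshold:     # assumed UNTIL condition
            break
    return history


history = iterate_until(2048, max_iterations=20, threshold=4)
print(history)

# With a small iteration cap the loop stops before the condition is met.
capped = iterate_until(2048, max_iterations=3, threshold=4)
print(capped)
```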

  • SQL Model Business Examples

  • Time Series Calculation (1)
    Compute the ratio of current month sales of each product to sales one year ago, one quarter ago and one month ago.
    Assume: a sales cube with product sales per year, quarter and month, and a time table mapping periods to the prior year, quarter and month.

    time table: maps t to y_ago, q_ago, m_ago
    t         y_ago     q_ago     m_ago
    1999-m01  1998-m01  1998-m10  1998-m12
    1999-m02  1998-m02  1998-m11  1999-m01
    1999-q01  1998-q01  1998-q04  NULL
    1999-y    1998-y    NULL      NULL

    Sales cube: prod sales per y, q, m
    t         product  sales
    1999-m01  vcr       100.00
    1999-m02  vcr       120.00
    1999-q01  vcr       360.00
    1999-y    vcr      2200.00

  • Time Series Calculation (2)
    SELECT product, sales, r_y_ago, r_q_ago, r_m_ago
    FROM sales_cube
    MODEL
      REFERENCE r ON (SELECT * FROM time)
        DIMENSION BY (t) MEASURES (y_ago, q_ago, m_ago)
      MAIN PARTITION BY (product) DIMENSION BY (t)
        MEASURES (sales, 0 r_y_ago, 0 r_q_ago, 0 r_m_ago)
      RULES (
        r_y_ago[ANY] = sales[CV(t)] / sales[ y_ago[CV(t)] ],  -- year ago
        r_q_ago[ANY] = sales[CV(t)] / sales[ q_ago[CV(t)] ],  -- quarter ago
        r_m_ago[ANY] = sales[CV(t)] / sales[ m_ago[CV(t)] ]   -- month ago
      );

    - The reference model over the time table acts like a look-up table
    - CV carries values from the left side to the right side
    - Without Model, you need 3 outer joins and a regular join


  • Time Series Calculation (3)
    Compute the ratio of current period sales of each product to sales a year ago, a quarter ago and a month ago. For each row, we use the reference Model to find 3 other rows.

    Sales cube: prod sales per y, q, m
    t         product  sales    r_y_ago  r_q_ago  r_m_ago
    1999-m01  vcr       100.00  0.050    0.280    0.830
    1999-m02  vcr       120.00  0.055    0.330    ...
    1999-q01  vcr       360.00  0.160    0.970    null
    1998-q04  vcr       370.00                    null
    1999-y    vcr      2200.00  1.050    null     null
    2000-y    vcr      2100.00           null     null
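
    The two-step lookup behind each ratio (find the prior period in the time table, then find that period's sales) can be sketched in Python for one product partition. Only rows that appear in the slide text are used, so most prior periods are missing and yield None. The function name ratio is hypothetical, and returning None for a missing prior period is an assumption of this sketch.

```python
# Reference array: t -> prior periods (rows from the slide's time table).
time_table = {
    "1999-m01": {"y_ago": "1998-m01", "q_ago": "1998-m10", "m_ago": "1998-m12"},
    "1999-m02": {"y_ago": "1998-m02", "q_ago": "1998-m11", "m_ago": "1999-m01"},
    "1999-q01": {"y_ago": "1998-q01", "q_ago": "1998-q04", "m_ago": None},
    "1999-y": {"y_ago": "1998-y", "q_ago": None, "m_ago": None},
}

# Main array for the vcr partition: t -> sales.
sales = {
    "1999-m01": 100.0, "1999-m02": 120.0, "1999-q01": 360.0,
    "1998-q04": 370.0, "1999-y": 2200.0,
}


def ratio(t, which):
    """sales[t] / sales[prior period of t], or None if the prior cell is missing."""
    prior = time_table.get(t, {}).get(which)
    if prior is None or prior not in sales:
        return None
    return sales[t] / sales[prior]


print(ratio("1999-q01", "q_ago"))  # 360 / 370
print(ratio("1999-m02", "m_ago"))  # 120 / 100
print(ratio("1999-y", "q_ago"))    # a year has no quarter-ago period
```

    The quarter-ago ratio for 1999-q01 comes out near 0.97, in line with the value shown in the table.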

  • Recursive Model Solving: Ledger (1)
    In my ledger, I have accounts: Net income, Interest, Taxes, etc.
    - I want to have 30% of my Net income as Interest (F1)
    - My Net income is Salary minus Interest, minus Tax (F2)
    - Taxes are 38% of Gross (salary - interest) and 28% of Capital_gain (F3)

    F1 (interest) depends on net, F2 (net) depends on interest and tax, and F3 (tax) depends on interest: two cycles in the formulas.

  • Recursive Model Solving: Ledger (2)
    In my ledger, I know Salary and Capital_gains. What are my Net income, Interest expense and Taxes?
    Iterate till accuracy of .01.

    Account        Input ledger  After 1st iteration  After 2nd iteration  After reaching accuracy (26 iterations)
    salary         100,000       100,000              100,000              100,000
    capital_gains   15,000        15,000               15,000               15,000
    net                  0       100,000               27,800               48,735
    tax                  0        42,200               30,800               36,644
    interest             0        30,000                8,340               14,620
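
    The ledger can be solved with a simple fixed-point loop. The Python sketch below makes two assumptions: the formulas are evaluated in the order net, tax, interest on each pass, and iteration stops when net changes by less than the accuracy. With that order, the first two passes give net 100,000 then 27,800, tax 30,800 and interest 8,340 on the second pass, and 38% of 100,000 plus 28% of 15,000 = 42,200 for tax on the first pass. The function name solve_ledger is hypothetical, and the iteration count depends on the exact stopping test.

```python
def solve_ledger(salary, capital_gains, accuracy=0.01, max_iterations=1000):
    """Iterate F2, F3, F1 until net income changes by less than `accuracy`."""
    net = tax = interest = 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        previous_net = net
        net = salary - interest - tax                             # F2
        tax = 0.38 * (salary - interest) + 0.28 * capital_gains   # F3
        interest = 0.30 * net                                     # F1
        if abs(net - previous_net) < accuracy:                    # convergence test
            break
    return net, tax, interest, iteration


first = solve_ledger(100_000, 15_000, max_iterations=1)
second = solve_ledger(100_000, 15_000, max_iterations=2)
final = solve_ledger(100_000, 15_000)

print(first[:3])
print(second[:3])
print([round(x) for x in final[:3]], "iterations:", final[3])
```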


  • Financial Functions: NPV
    NPV: net present value of a series of periodic cash flows.
    npv for period i: amount[i]/power(1+rate, i) + npv[i-1]
    e.g. for i = 1: amount[1]/power(1+rate, 1) + npv[1-1]

    Cash_Flow table
    year  i  prod  amount   npv
    1999  0  vcr   -100.00
    2000  1  vcr     12.00
    2001  2  vcr     10.00
    2002  3  vcr     20.00
    1999  0  dvd   -200.00
    2000  1  dvd     22.00
    2001  2  dvd     12.00
    2002  3  dvd     14.00

  • Financial Functions: NPV (2)
    NPV: net present value of a series of periodic cash flows.

    Cash_Flow table and npv for rate = 0.14
    year  i  prod  amount   npv
    1999  0  vcr   -100.00  -100.00
    2000  1  vcr     12.00   -89.47
    2001  2  vcr     10.00   -81.78
    2002  3  vcr     20.00   -68.28
    1999  0  dvd   -200.00  -200.00
    2000  1  dvd     22.00  -180.70
    2001  2  dvd     12.00  -171.47
    2002  3  dvd     14.00  -162.02
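
    The running NPV column can be checked with a few lines of Python that apply amount[i]/power(1+rate, i) + npv[i-1] period by period. The sketch assumes the value before period 0 is 0, and the function name npv_series is hypothetical. Each product is computed separately, as a partition would be.

```python
def npv_series(amounts, rate):
    """Running NPV: npv[i] = amounts[i] / (1 + rate) ** i + npv[i - 1]."""
    result = []
    running = 0.0  # value before period 0 (assumption)
    for i, amount in enumerate(amounts):
        running += amount / (1 + rate) ** i
        result.append(round(running, 2))
    return result


vcr = npv_series([-100.00, 12.00, 10.00, 20.00], rate=0.14)
dvd = npv_series([-200.00, 22.00, 12.00, 14.00], rate=0.14)

print(vcr)
print(dvd)
```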

  • SQL Model Performance

  • SQL Model Time Series
    Earlier example: ratio of sales to year, quarter and month ago (the query shown in Time Series Calculation (2)).
    - The ANSI SQL version needs an outer join for each formula plus a join for the reference model
    - N formulas and M reference models need N+M joins
    - 4 joins in this example: sales_cube joined with time, then with sales_cube three more times

  • SQL Model vs. ANSI Joins (chart: query response time against number of rules or joins, for SQL Model and ANSI joins)

  • Summary
    - New facility for spreadsheet-like computations in SQL
    - High performance
      - Replaces multiple joins, unions
      - Scalable in size and parallel processing
      - Powerful optimizations
    - Collaborative analysis
      - Move external processing such as spreadsheets into the RDBMS for manageability and consolidation

  • Next Steps
    - Demonstration at Oracle DEMOgrounds
      Exhibit hall, Booth 1326, Database Area
      Monday: 5:00 PM - 8:00, Tuesday: 10:30 - 1:00, 3:00 - 6:00, Wednesday: 11:00 - 4:30, Thursday: 10:30 - 2:00
    - Hands-on Lab
      Marriott Hotel - Golden Gate B1
      Lab Section: Use Information from your Data Warehouse
      Lesson 1: Using the SQL Model clause
      Monday: 10:30 - 5:00, Tuesday: 8:30 - 12:30, 3:00 - 5:00, Wednesday: 8:30 - 4:30, Thursday: 8:30 - 2:30

  • Reminder: please complete the OracleWorld online session survey

    Thank you.
