Oracle Business Intelligence Enterprise Edition: Some Guidelines for Data Modeling. Kurt Wolff, March 14, 2008

Data Modelling


  • Topics
    Normalized versus denormalized data modeling strategies.
    General approaches to modeling operational vs. star and snowflake schemas.
    How to create logical table sources when you have metrics from tables that don't join to a logical dimension.
    How, when and where to set outer joins.
    General OBIEE metadata modeling best practices.

  • Normalized vs. Denormalized: Definition
    A schema is said to be normalized when it minimizes data storage redundancy. All values depend on the key and only the key.

    Denormalized product table: notice, for example, that the value BigG is stored 10 times.

  • Normalized Model

    BigG stored only once
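
    The contrast between the two product tables can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not the tables from the slides: the table and column names (product_denorm, product, brand) and the ten sample rows are assumptions made up for the example. Only the value BigG and the "10 times versus once" count come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized (hypothetical layout): the brand name is repeated on every product row.
cur.execute("CREATE TABLE product_denorm (product_id INTEGER PRIMARY KEY, product TEXT, brand TEXT)")
cur.executemany(
    "INSERT INTO product_denorm VALUES (?, ?, ?)",
    [(i, f"Product {i}", "BigG") for i in range(1, 11)],
)

# Normalized (hypothetical layout): the brand name lives in its own table,
# and product rows reference it by key.
cur.execute("CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, brand TEXT)")
cur.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, product TEXT, "
    "brand_id INTEGER REFERENCES brand(brand_id))"
)
cur.execute("INSERT INTO brand VALUES (1, 'BigG')")
cur.executemany(
    "INSERT INTO product VALUES (?, ?, 1)",
    [(i, f"Product {i}") for i in range(1, 11)],
)

# How many times is the string 'BigG' stored in each design?
denorm_copies = cur.execute(
    "SELECT COUNT(*) FROM product_denorm WHERE brand = 'BigG'"
).fetchone()[0]
norm_copies = cur.execute(
    "SELECT COUNT(*) FROM brand WHERE brand = 'BigG'"
).fetchone()[0]

# Renaming the brand touches every product row in one design and a single row in the other.
denorm_updated = cur.execute(
    "UPDATE product_denorm SET brand = 'BigG Foods' WHERE brand = 'BigG'"
).rowcount
norm_updated = cur.execute(
    "UPDATE brand SET brand = 'BigG Foods' WHERE brand = 'BigG'"
).rowcount

print(denorm_copies, norm_copies, denorm_updated, norm_updated)
```

    The update counts hint at why normalized schemas suit transaction systems: a change to one fact is written in one place.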

  • Normalized: Most Efficient for Data Inserts, Updates
    Therefore popular in transaction (e.g. ERP) systems, where the key measure is transactions/second.
    DBAs are trained to normalize; it's in their genes.
    Quasi-debates about whether data warehouses should have normalized table structures. Debates feature multiple gurus, analysts, etc.
    Therefore you may encounter normalized data warehouses, too.

  • Normalized Schemas and BI
    More tables in FROM clause
    More tables to join
    Optimizer able to pick best join strategy?

  • Common DWH Schemas
    Star
    Snowflake
    Constellation
    3NF (bring out your ERwin diagram)

  • Commonly Encountered Views About Star Schemas
    Business intelligence schemas should be built as a single star, i.e. all facts in a single fact table.
    Drilling across (facts from multiple stars) is technically difficult.

    "Wouldn't it be easier if users just pointed their query tool at a single fact table? If the metrics are frequently compared to one another ... it makes more sense to physically combine the data into a single fact table." (Margy Ross)

    Star schemas are limiting.
    "Star schema designs in traditional [relational] databases require that business users declare all queries they are likely to run so that the appropriate dimensions and facts may be brought together. Each query run must fit within a single star schema, thus eliminating the ability to ask ad hoc or unplanned queries." (Claudia Imhoff)

  • OBI EE Provides Flexibility
    Multiple stars mapped within the business model
    Drilling across (from a measure in one fact table to a measure in another fact table) is easy
    No worries about chasm traps
    No worries about fan traps
    Add additional measures or stars
    Stars can be at different grain
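
    To make the fan trap and the drill-across idea concrete, here is a small sketch using Python's built-in sqlite3 module. It shows the general technique (aggregate each fact table separately at a shared grain, then combine the result sets), not the SQL that the BI Server actually generates. The tables sales and targets, their columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Two hypothetical fact tables at different grain.
    CREATE TABLE sales   (product TEXT, month TEXT, revenue INTEGER);  -- product/month grain
    CREATE TABLE targets (product TEXT, target INTEGER);               -- product grain

    INSERT INTO sales   VALUES ('A', 'Jan', 100), ('A', 'Feb', 150), ('B', 'Jan', 200);
    INSERT INTO targets VALUES ('A', 300), ('B', 250);
""")

# Fan trap: joining the two fact tables directly repeats each target row
# once per matching sales row, so SUM(target) is inflated for product A.
naive = cur.execute("""
    SELECT s.product, SUM(s.revenue), SUM(t.target)
    FROM sales s JOIN targets t ON t.product = s.product
    GROUP BY s.product
    ORDER BY s.product
""").fetchall()

# Drill across: aggregate each fact table on its own, then join the two
# result sets on the shared dimension. An inner join is enough here because
# both products appear in both tables.
drill_across = cur.execute("""
    SELECT s.product, s.revenue, t.target
    FROM (SELECT product, SUM(revenue) AS revenue FROM sales   GROUP BY product) s
    JOIN (SELECT product, SUM(target)  AS target  FROM targets GROUP BY product) t
      ON t.product = s.product
    ORDER BY s.product
""").fetchall()

print(naive)
print(drill_across)
```

    In the naive query the target for product A is counted twice (once per month of sales); aggregating each star separately avoids that.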

  • Another Advantage of Star Schemas
    In Oracle, star join transformations: high-performance joins
    Not talked about much within Oracle (??)
    Requires bitmap indexes on fact table foreign keys (among other things)
    Has been used in analytic applications

  • Myth: Logical Schema Has to Be Star
    Importing a snowflake works quite nicely

    Demo

  • Snowflake Logical Schemas
    Benefits:
    Create Dimension creates all levels and level keys
    Estimate Levels works better
    Get Levels for aggregate tables works better

    Drawbacks:
    More complex business model: more tables, joins, columns
    Logical dimension columns are mapped to a single physical column
    Logical joins don't cover as many physical joins

  • Importing Full 3NF Database As Is
    Likely to produce an inconsistent business model that BI Server cannot navigate: bridge tables, table self-joins, a single table that has multiple roles.

    Modeling is needed to dimensionalize the business model. All 3NF models can be dimensionalized.

  • Dimensionalize
    Separate aggregatable from non-aggregatable columns.
    Logical dimension tables are collections of non-aggregatable columns whose values are functionally dependent on the logical table key: if X is the logical table key and Y is another attribute, each X value is associated with precisely one Y value.
    Logical fact tables are collections of aggregatable columns (or columns defined by formulas that include aggregatable columns).
    Logical dimension tables have a 1:N relationship to fact tables (expressed in business model joins).
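
    The functional-dependency test (each X value associated with precisely one Y value) is easy to check on sample data. The helper below is a hypothetical illustration in Python, not part of any OBIEE tool; the column names and rows are made up.

```python
from collections import defaultdict


def is_functionally_dependent(rows, key, attr):
    """Return True if every value of `key` maps to exactly one value of `attr`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[attr])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical source rows mixing a dimension attribute (brand) and a measure (amount).
rows = [
    {"product_id": 1, "brand": "BigG", "amount": 10},
    {"product_id": 1, "brand": "BigG", "amount": 25},
    {"product_id": 2, "brand": "BigG", "amount": 10},
]

# brand depends on product_id, so it is a candidate dimension column.
brand_is_dimension_attr = is_functionally_dependent(rows, "product_id", "brand")

# amount varies for the same product_id, so it does not belong in the dimension table.
amount_is_dimension_attr = is_functionally_dependent(rows, "product_id", "amount")

print(brand_is_dimension_attr, amount_is_dimension_attr)
```

    A column that passes the check against the candidate key is a candidate for the logical dimension table; an aggregatable column that fails it belongs with the facts.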

  • Dimensionalization Corollaries
    No non-aggregatable columns in logical fact tables
    No logical fact table keys
    Model non-aggregatable columns as a separate dimension table
    SQL generated will reflect physical joins
    Logical joins will determine join type (inner, outer)

  • SOP Modeling Sequence
    Begin with logical fact table (usually only one unless multi-user development expected)
    Build base measures mapped to sources at lowest grain
    Add logical dimension tables and logical joins
    Create dimensions and hierarchies
    Add additional base measures (from higher grain sources); set aggregate levels of sources
    Add aggregate sources (fact and dimension)
    Create compound measures
    Test
    Rename
    Presentation layer (folders, column names) finalized but extendable
    Security (groups, authentication, permissions, initialization blocks, filters)

  • Operational Data Sources
    Common question: can we use BI Server to query operational data? For example, SAP?
    Operational system schema (3NF, 4NF, BCNF) is not the only issue
    Operational system logic has to be duplicated
    Can result in very complex SQL; use SELECT objects in the physical layer
    Sometimes table structure itself is an issue

  • Multiple Table Types in SAP
    Transparent: Can be read from outside SAP using SQL. Store transaction data. Query performance is an issue unless you know indexes and access methods.
    Pooled: Logical tables that can be combined in a table pool (i.e. 10 to 1000 small tables stored in a single physical table). Data combined in one field. Store control data. Cannot be read from outside SAP.
    Cluster: Logical tables that are assigned to a table cluster (1-10 very large tables combined). Data combined in one field. Primarily used to store control data or temporary data. Cannot be read from outside SAP.

  • Complex SQL
    See sample from Siebel forecasting
    Metadata development will take a lot of time

  • OLTP/DWH Fragmentation on Time
    OLTP: Forecasting.Forecasts."Forecast Date" > VALUEOF("ETLRunDateMinusInterval")
    DWH: Forecasting.Forecasts."Forecast Date"

  • Join Elimination Rules

  • Inner Joins in LTS: Complex Joins Not Eliminated
    Join type in the LTS: complex. Result columns: Employee, Amount.

    select sum(T18915."Amount") as c1, T18912."Employee" as c2
    from "EmpDept" T18909, "Employees" T18912, "Facts" T18915
    where ...

  • Inner Joins in LTS: K/FK Joins Eliminated Depending on Cardinality
    Join type in the LTS: key/FK. Result columns: Employee, Amount.

    select sum(T18915."Amount") as c1, T18912."Employee" as c2
    from "Employees" T18912, "Facts" T18915
    where ...

  • Inner Joins in LTS: K/FK Joins Eliminated Depending on Cardinality
    Join type in the LTS: key/FK, reversed. Result columns: Employee, Amount.

    select sum(T18915."Amount") as c1, T18912."Employee" as c2
    from "Employees" T18912, "EmpDept" T19037, "Facts" T18915
    where ...

  • Outer Joins in Logical Table Sources Never Eliminated
    SQL generated so that BI Server can do the OJ; OJ not supported in DB

  • Outer Joins Between Logical Tables Are Eliminated
    select sum(T18915."Amount") as c1, T18912."Employee" as c2
    from "Employees" T18912, "Facts" T18915
    where ...

  • Joins Across LTSs: Joins Can Occur Across Dimension Table Sources But Not in Time Dimension
    No need to have this in the CUSTOMERS logical table source
    If time dimension, use aliases to avoid joins across sources

  • Outer Joins To Preserve Dimensions: Two Options
    Outer joins in the business model:
    Result in OJs in SQL
    Joins, if performed, will always be OJs
    OJ syntax can be ambiguous; you may not get what you want
    OJs can be expensive and SLOW

    Outer joins in result sets:
    Create pseudo-measure that will always return all dimension rows
    Include pseudo-measure in logical query (can be in a filter)
    Let BI Server do the outer join of result sets
    Lets users control when OJs occur

  • How to Preserve Dimensions
    Data: Facts, Items, Months
    Desired output: all Months, all Products with Amount > 0, show 0 instead of Null

  • Metadata Setup to Preserve Dimensions
    Strategy: use fact-based partitioning
    Fact exists for all Item/Month combinations. Set filter for Fact.
    BI Server will full outer join result sets
    Use IfNull function to convert Nulls to 0s
    Filter Items via subquery

    One-row fact table
    Complex join to Dummy where 1=1
    If Months is being used as a formal time dimension, the complex join is not allowed. Create K/FK join where K=1 for all rows, FK=1.
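
    The effect of this setup can be imitated outside the BI Server. The sketch below uses Python's built-in sqlite3 module and hand-written SQL; it is not the SQL the BI Server generates. Table names (months, items, facts, dummy) and all sample rows are assumptions for illustration. Because the pseudo-measure result set already contains every Month/Item row, a left join stands in here for the full outer join of result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE months (month TEXT PRIMARY KEY);
    CREATE TABLE items  (item  TEXT PRIMARY KEY);
    CREATE TABLE facts  (item TEXT, month TEXT, amount INTEGER);
    CREATE TABLE dummy  (k INTEGER);  -- hypothetical one-row fact table

    INSERT INTO months VALUES ('2008-01'), ('2008-02'), ('2008-03');
    INSERT INTO items  VALUES ('Widget'), ('Gadget'), ('Gizmo');
    INSERT INTO facts  VALUES ('Widget', '2008-01', 5),
                              ('Widget', '2008-03', 7),
                              ('Gadget', '2008-02', 4);
    INSERT INTO dummy  VALUES (1);
""")

rows = cur.execute("""
    SELECT p.month, p.item, IFNULL(f.amount, 0) AS amount
    FROM (
        -- Pseudo-measure result set: the 1=1 join to the one-row table
        -- returns a row for every Month/Item combination.
        -- The subquery keeps only items that have some amount > 0.
        SELECT m.month, i.item
        FROM months m, items i, dummy d
        WHERE 1 = 1
          AND i.item IN (SELECT item FROM facts WHERE amount > 0)
    ) p
    LEFT JOIN (
        -- Real measure, aggregated at Month/Item grain.
        SELECT month, item, SUM(amount) AS amount
        FROM facts
        GROUP BY month, item
    ) f ON f.month = p.month AND f.item = p.item
    ORDER BY p.month, p.item
""").fetchall()

for row in rows:
    print(row)
```

    Every month appears for every item that has sales, missing combinations show 0 rather than Null, and Gizmo (no sales at all) is filtered out by the subquery.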

  • Query Setup
    Subquery: PreserveDim Sub
    Prefer this over setting up expensive outer joins in the metadata. Gives users control.

  • Metrics That Don't Join to Dimensions

  • Avoiding Errors Using an Empty Table
    Outcomes shown: "OK" versus the error "None of the fact tables are compatible with the query request".

  • Avoiding Errors Using an Empty Table

  • Additional Thoughts re Best Practices
    Advice: don't pay much attention to the Admin Tool consistency checker "Best Practices".
    Analytic Apps repository is a good model (perhaps overly complex in number of logical fact tables)
    Only uses aliases in business model mapping
    Consistent naming conventions for aliases so they group together in a convenient way in the Admin Tool
    Average aggregation rarely used. Use Sum/Count instead.
    Focus on usability in the presentation layer: not too many things (
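
    The slide does not say why Sum/Count is preferred over Average. One common reason, offered here as an assumption rather than as the author's stated rationale, is that averages do not roll up correctly from aggregated sources while sums and counts do. A small Python sketch with made-up numbers:

```python
# Order-level rows: (region, order_amount). Values are made up for illustration.
orders = [("East", 10), ("East", 20), ("East", 30), ("West", 100)]

# A pre-aggregated source at region grain that stores Sum and Count, not Average.
by_region = {}
for region, amount in orders:
    total, count = by_region.get(region, (0, 0))
    by_region[region] = (total + amount, count + 1)

# Rolling up per-region averages gives a misleading grand average...
region_avgs = [total / count for total, count in by_region.values()]
avg_of_avgs = sum(region_avgs) / len(region_avgs)

# ...while Sum / Count rolls up correctly from any grain.
grand_total = sum(total for total, _ in by_region.values())
grand_count = sum(count for _, count in by_region.values())
true_avg = grand_total / grand_count

print(avg_of_avgs, true_avg)
```

    With these numbers the average of averages is 60.0, while the true average order amount is 40.0.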
  • Alias Naming Conventions

  • Just because everything could be in a single business model doesn't mean it has to be!

  • One More Thing
    Don't use BI Server time series functions Ago, ToDate unless the BI Server can function-ship the Rank function to the database(s)!