Data Warehouse 2.0: Master techniques for EPM guys (powered by ODI)
Ricardo Giampaoli
Rodrigo Radtke
DevEpm.com
@RZGiampaoli
@RodrigoRadtke
@DEVEPM

About the Speakers
Giampaoli, Ricardo
• Oracle Ace
• Master in Business Administration and IT management
• EPM Consultant @ Dell
• Essbase/Planning/OBIEE/ODI Certified Specialist
• Blogger @ devepm.com
Radtke, Rodrigo
• Oracle Ace
• Graduated in Computer Engineering
• Software Developer Sr. Advisor at Dell
• ODI, Oracle and Java Certified
• Blogger @ devepm.com
What we'll learn
• EPM Application Processes
• Traditional Data Warehouse
• DW for EPM Applications
• Metadata Process
• Data Load Process
• Data Extract Process
• Oracle Partitioning
EPM Tools
• The architectures of EPM applications are very similar, so for simplicity this presentation uses Planning/Essbase as the example
• Three main processes that an EPM application can have:
• Metadata process: syncs the metadata between the source systems and the EPM applications
• Data Load process: loads data into the EPM applications
• Data Extract process: extracts data from the EPM applications
• Normally these processes are done manually, or with a script that loads a text file or SQL with all the data/metadata into the EPM application
Why is this not so good?
• Manual processes are always error prone
• Tons of files to load/manage
• Not centralized
• Not scalable for big environments
• Not change friendly
• Data quality issues
• Harder to achieve audit standards
• Not feasible for huge volumes of data
All of this can be fixed by creating a supporting Data Warehouse (DW)
Traditional Data Warehouse
• The DW should be implemented in a relational database (RDBMS), since relational databases are more suitable for the central Data Warehouse role than multidimensional databases (OLAP servers)
• The data model for the DW should be based on a dimensional design (Star Schema, Snowflake or Hybrid) to facilitate integration and scalability and to provide greater performance for analytical processing
• Whether Star Schema, Snowflake or Hybrid, all of these models are based on dimension tables joined to a fact table through PKs and FKs
• The DW can then provide data directly to other systems like EPM Applications
DW for EPM Applications
Traditional DW
• The data is spread over numerous tables
• The data is related between tables by PKs and FKs
• We can have different data in different tables that have no direct relationship
• We can query any table to get any data
• The metadata inside the tables has no meaning for the database (it's just data)
EPM Applications
• The data is confined inside a cube
• The data is directly related to the members of the dimensions
• It's impossible to have data that is not related to all dimensions
• To query, we must specify at least one member of each dimension
• The metadata has a parent/child relationship and a specific order, and each member behaves according to its dimension type
DW for EPM Applications
• The problem is: a DW for EPM applications should be totally different from a traditional DW
• An EPM application is already a "DW", since it has all dimensions in it and stores all data inside the cubes
• We don’t need a Star Schema, Snow Flake or Hybrid model to manage dimensions inside EPM
• We can manage dimensions more efficiently using a “metadata repository”
• The relationship between the EPM apps and the outside systems is the members' POV, and this information is already inside our data table
• We don't need any PKs or FKs in our "metadata repository"
• We need to model our DW thinking about EPM concepts/needs
DW for EPM Applications
[Diagram: a traditional star schema (a central Fact table joined to Dim 1 through Dim 6) contrasted with an EPM cube, where each Cell sits at the intersection of one member from every dimension]
Metadata Process
• The first process needs to be the Metadata Process, since without members in the EPM application we cannot load data into the cubes
• Depending on the EPM application and the dimension we want to load, we will have different properties and values
• But across the entire EPM suite we always have the basic member information, like its parent, storage type, consolidation sign and more
• To create a good Metadata Process we need to design our table in the most efficient way, and for that we need to know what each EPM application requires
Metadata Process: Dimensions
• Planning/Essbase has 4 different types of dimensions:
• Account
• Entity
• User Defined Dimension
• Attribute Dimension
• Each Dimension has its own properties but most of them are the same
• All four dimension types (Account, Entity, User Defined, Attribute) have: Member, Parent, Alias: Default, Operation
• The Account, Entity and User Defined dimensions also have: Valid For Consolidations, Data Storage, Two Pass Calculation, Description, Formula, UDA, Smart List, Data Type, Aggregation, Plan Type
• The Account dimension alone also has: Account Type, Time Balance, Skip Value, Exchange Rate Type, Variance Reporting, Source Plan Type, Base Currency
Metadata Process: Generic Table
• One table to "rule" them all:
• Instead of having one table per dimension, a generic table has one unique column for each source property (white)
• One extra set of columns identifies where each member belongs (yellow)
• Any other useful information (orange)
• Mapping of dimension properties to the Metadata Table columns:
• Account / Entity / Products / Prod_Attrib → MEMBER
• Parent → PARENT
• Alias: Default → ALIAS
• Operation → OPERATION
• Valid For Consolidations → VALID_FOR_CONSOL
• Data Storage → DATASTORAGE
• Two Pass Calculation → TWOPASS_CALC
• Description → DESCRIPTION
• Formula → FORMULA
• UDA → UDA
• Smart List → SMARTLIST
• Data Type → DATA_TYPE
• Aggregation → CONS_PLAN_TYPE1
• Plan Type → PLAN_TYPE1
• Account Type → ACCOUNT_TYPE (Account only)
• Time Balance → TIME_BALANCE (Account only)
• Skip Value → SKIP_VALUE (Account only)
• Exchange Rate Type → EXC_RATE (Account only)
• Variance Reporting → VARIANCE_REP (Account only)
• Source Plan Type → SRC_PLAN_TYPE (Account only)
• Base Currency → CURRENCY (Account only)
• Extra identification columns: APP_NAME, DIM_TYPE, HIER_NAME, GENERATION, HAS_CHILDREN, POSITION
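A minimal DDL sketch of such a generic table, using a subset of the columns above. Table name, data types and sizes are assumptions for illustration; as noted in the slides, no PKs or FKs are declared.

```sql
-- Sketch of the generic metadata table (names/types are assumptions).
CREATE TABLE metadata_tbl (
  app_name     VARCHAR2(80) NOT NULL,  -- which EPM application
  hier_name    VARCHAR2(80) NOT NULL,  -- which dimension/hierarchy
  dim_type     VARCHAR2(30),           -- Account, Entity, UserDefined...
  member       VARCHAR2(80) NOT NULL,
  parent       VARCHAR2(80),
  alias        VARCHAR2(80),
  operation    VARCHAR2(30),
  datastorage  VARCHAR2(30),
  formula      CLOB,
  account_type VARCHAR2(30),           -- Account dimension only
  generation   NUMBER,
  has_children CHAR(1),
  position     NUMBER                  -- order among siblings
);
```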
Metadata Process: Connect By
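Oracle's CONNECT BY clause can walk one hierarchy of the generic metadata table. A sketch, assuming the table and column names from the previous slides:

```sql
-- Walk one application's Account hierarchy top-down, siblings in the
-- order stored in POSITION (names are assumptions from the slides).
SELECT LEVEL AS generation,
       LPAD(' ', 2 * (LEVEL - 1)) || member AS indented_member,
       parent,
       CONNECT_BY_ISLEAF AS is_leaf
  FROM metadata_tbl
 START WITH parent IS NULL
        AND app_name = 'PLANAPP1' AND hier_name = 'Account'
CONNECT BY PRIOR member = parent
        AND app_name = 'PLANAPP1' AND hier_name = 'Account'
 ORDER SIBLINGS BY position;
```

Repeating the application/hierarchy filter in both START WITH and CONNECT BY keeps members of other hierarchies from being joined in mid-walk.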
Metadata Process: Generic Table Benefits
• Centralized: a metadata repository that contains all metadata for all EPM applications
• Scalable: an architecture that can hold any number of metadata sources without changes
• Dynamic: generic objects can load any number of EPM applications
• Accessible: all metadata from all EPM applications is easily available if needed (data quality, queries, as metadata for other systems…)
• Performance: the table can be partitioned by application and/or hierarchy
Metadata Process: Overview
[Diagram: metadata process overview. Sources (Oracle, SQL Server, Teradata, Excel, XML) flow through stage area tables (Table 1..N) into the Metadata Table; generic Send and Error Handling components then load it into the EPM applications (App 1..N)]
DW Powered by ODI: Metadata Process
• ODI can read the EPM application repositories to understand the structure and configuration of each application
• Based on the repository, ODI can create dynamic code
• ODI can tie out metadata from the source based on the application repository
• Metadata load becomes more efficient and powerful, allowing better management of moved members, attribute member movement, sibling member reordering, and deleted or moved shared members
• No extra code to add new applications/dimensions
• Complete details at https://devepm.com/2014/12/18/otnarchbeat-publication/
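For classic on-premise Planning, reading the repository might look like the sketch below. HSP_OBJECT and its OBJECT_ID/OBJECT_NAME/PARENT_ID columns are assumptions that vary by Planning version and are not part of the slides; verify against your own repository.

```sql
-- Hedged sketch: rebuild the Account hierarchy straight from the
-- Planning repository (HSP_OBJECT holds every object and its parent).
SELECT LEVEL         AS generation,
       o.object_name AS member,
       p.object_name AS parent
  FROM hsp_object o
  JOIN hsp_object p ON p.object_id = o.parent_id
 START WITH o.object_name = 'Account'
CONNECT BY PRIOR o.object_id = o.parent_id;
```

Comparing this result against the generic metadata table is what lets ODI detect moved, reordered or deleted members before loading.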
Data Load Processes
• To load data into any EPM application we must specify one member for each dimension, plus the value we want to load
• Depending on the application we can have more or fewer dimensions, but by default there are some standard dimensions that exist in all apps:
• Accounts
• Entity
• Years
• Periods
• Scenario
• Version
• Currency
• We can create a single generic inbound table (fact table) that contains one column for each Planning dimension (the distinct set of dimensions across all applications), building a centralized structure to hold all data
Data Load Processes: Inbound Tables
• We can go further in the inbound design and create one column for each period:
• Smaller table (fewer rows) and faster to query
• Load performance greatly improved (one row holds the entire year's information)
• In either case we have:
• Centralized repository of data (easy to add new applications)
• Scalable to all EPM Applications
• Data is reusable (No data replication)
• Generic objects (to load, error handling, email sending…)
Data Load Processes: Multi-Periods
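A sketch of the multi-period inbound table: one column per period instead of one row per period. Table and column names are assumptions based on the slides.

```sql
-- Multi-period inbound fact: one row carries a full year of data.
CREATE TABLE inbound_fact_mp (
  app_name VARCHAR2(80),
  account  VARCHAR2(80),
  entity   VARCHAR2(80),
  scenario VARCHAR2(30),
  version  VARCHAR2(30),
  currency VARCHAR2(10),
  year     VARCHAR2(10),
  p_jan NUMBER, p_feb NUMBER, p_mar NUMBER,
  p_apr NUMBER, p_may NUMBER, p_jun NUMBER,
  p_jul NUMBER, p_aug NUMBER, p_sep NUMBER,
  p_oct NUMBER, p_nov NUMBER, p_dec NUMBER
);
```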
Data Load Processes: Pivot/Unpivot
• To use the multi-period architecture we need the ability to pivot and unpivot data
• Most source systems cannot provide data in multi-period format, nor receive it in that format
• For this we can use the PIVOT/UNPIVOT commands of the Oracle database:
• The PIVOT operator takes data in separate rows, aggregates it and converts it into columns
• The UNPIVOT operator converts column-based data into separate rows
Data Load Processes: Pivot
1. Define the columns to be pivoted
2. Use an aggregation function on the data column
1. SUM, AVG, MIN, MAX, COUNT…
2. Specify the values to be pivoted
3. The values MUST be constants in the "IN" clause
3. Data is pivoted
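The steps above can be sketched as the following Oracle PIVOT query (table and column names are assumptions from the slides; only three months shown for brevity):

```sql
-- Pivot a one-row-per-period fact into one column per period.
SELECT *
  FROM (SELECT account, entity, period, data_value
          FROM inbound_fact)
 PIVOT (SUM(data_value)                          -- aggregation function
        FOR period IN ('Jan' AS p_jan,           -- constants in IN clause
                       'Feb' AS p_feb,
                       'Mar' AS p_mar));
```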
Data Load Processes: Unpivot
1. Define the columns to be unpivoted
2. Select a name for the data column and the member column
1. Specify the columns to be unpivoted
2. The column list MUST be constant in the "IN" clause
3. Data is unpivoted
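The reverse operation, again as a sketch against the assumed multi-period table (only three months shown):

```sql
-- Unpivot one-column-per-period back into one row per period.
-- Note: UNPIVOT skips NULL cells by default; add INCLUDE NULLS to keep them.
SELECT account, entity, period, data_value
  FROM inbound_fact_mp
 UNPIVOT (data_value                             -- name of the data column
          FOR period IN (p_jan AS 'Jan',         -- name of the member column
                         p_feb AS 'Feb',
                         p_mar AS 'Mar'));
```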
Data Load Processes: Data Quality
• EPM applications do not like bad data
• For example, if we try to load an invalid member into Essbase using ODI, it switches to cell-by-cell mode, greatly impacting load performance
• Having just one metadata table and one inbound table makes the data quality process way simpler:
• All metadata is stored in a single place
• All data is stored in a single place
• Data quality check can be done for all applications in a single process
• Error handling and email notification processes are easy to create, since everything is gathered in the same place
• With only one generic inbound table, we will have only one generic E$ table:
• It stores the full POV and the data that fails validation
• ODI_Cons_Name, Interface_Name, App_Name, Cube and ODI_Sess_No identify what the error was, which package it came from and which application the data should have been loaded to
DW Powered by ODI: Data Quality
Data Load Processes: Overview
[Diagram: data load overview. Sources (Oracle, SQL Server, Teradata, Excel, XML) flow through stage area tables (Table 1..N) into the generic Inbound Table; generic Send and Error Handling components load valid data into the EPM applications (App 1..N), while rows failing validation go to the generic E$ table]
Data Extract Processes
• The structure of the outbound table is the same as the inbound table, and the benefits are almost the same:
• Faster to export (mainly for a one-year export from a BSO cube)
• Centralized repository of data (easy to add new applications)
• Scalable to all EPM Applications
• Data is reusable (No data replication)
• Create views for the target systems to access the data
• In the same way that we have multi-periods in the inbound table we can have it in the outbound table
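The view layer can be as simple as one filtered view per application or target. A sketch, with table, view and column names assumed:

```sql
-- Expose one application's extract to a target system without
-- replicating the data: the view filters the shared outbound table.
CREATE OR REPLACE VIEW v_extract_planapp1 AS
SELECT account, entity, scenario, version,
       currency, year, period, data_value
  FROM outbound_fact
 WHERE app_name = 'PLANAPP1';
```

Because targets read from views rather than the table itself, adding a new consumer is just one more CREATE VIEW, with no change to the extract process.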
Data Extract Processes: Overview
[Diagram: data extract overview. EPM applications (App 1..N) are extracted by generic components, with Send and Error Handling, into the Outbound Table; a View Layer (View 1..N) exposes the data to the target systems (Oracle, SQL Server, Teradata)]
Oracle Partitioning
• Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data
• Partitioning allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity
• Oracle provides a rich variety of partitioning strategies and extensions to address every business requirement
• Since it is entirely transparent, partitioning can be applied to almost any application without the need for potentially expensive and time consuming application changes.
Oracle Partitioning: Types
• Hash partitioning: rows distributed across hash partitions (H1, H2, H3, H4)
• List partitioning, e.g. on Scenario: Actual, Forecast, Budget
• Range partitioning, e.g. on Period: Jan to Mar, Apr to Jun, Jul to Sep, Oct to Dec
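The list-partitioning example above can be sketched in DDL, with table, partition and column names assumed:

```sql
-- Inbound fact list-partitioned by scenario: truncating a partition
-- wipes one scenario without touching the others.
CREATE TABLE inbound_fact (
  scenario   VARCHAR2(30),
  account    VARCHAR2(80),
  entity     VARCHAR2(80),
  period     VARCHAR2(3),
  data_value NUMBER
)
PARTITION BY LIST (scenario) (
  PARTITION p_actual   VALUES ('Actual'),
  PARTITION p_forecast VALUES ('Forecast'),
  PARTITION p_budget   VALUES ('Budget')
);
```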
Oracle Subpartitioning: Types
• Composite Partitioning:
• Range-Range
• Range-Hash
• Range-List
• List-Range
• List-Hash
• List-List
• Example (List-Range): partition by Scenario (Actual, Forecast, Budget), subpartition each by Period (Jan to Mar, Apr to Jun, Jul to Sep, Oct to Dec)
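The List-Range example can be sketched in DDL. Names are assumptions, and the period is modeled as a numeric month so RANGE bounds apply:

```sql
-- Composite List-Range: partition by scenario, subpartition by quarter.
CREATE TABLE inbound_fact_cr (
  scenario   VARCHAR2(30),
  period_num NUMBER(2),    -- assumed 1..12 numeric month
  account    VARCHAR2(80),
  data_value NUMBER
)
PARTITION BY LIST (scenario)
SUBPARTITION BY RANGE (period_num)
  SUBPARTITION TEMPLATE (
    SUBPARTITION q1 VALUES LESS THAN (4),
    SUBPARTITION q2 VALUES LESS THAN (7),
    SUBPARTITION q3 VALUES LESS THAN (10),
    SUBPARTITION q4 VALUES LESS THAN (13)
  )
(
  PARTITION p_actual   VALUES ('Actual'),
  PARTITION p_forecast VALUES ('Forecast'),
  PARTITION p_budget   VALUES ('Budget')
);
```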
DW Powered by ODI: Partitioning
• ODI can be used to manage table partitions:
• Using a command on source to query ALL_TAB_PARTITIONS and verify whether a partition exists
• Using a command on target to truncate/drop/create partitions
• ODI can also manage subpartitions:
• Harder to maintain
• Better to use a composite key
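The "command on source / command on target" pair can be sketched as two SQL statements (owner, table and partition names are assumptions):

```sql
-- Command on source: does the partition exist?  The count (0 or 1) can
-- drive the ODI step that follows.
SELECT COUNT(*) AS part_exists
  FROM all_tab_partitions
 WHERE table_owner    = 'DW'
   AND table_name     = 'INBOUND_FACT'
   AND partition_name = 'P_ACTUAL';

-- Command on target: if it exists, truncate it before the new load.
ALTER TABLE dw.inbound_fact TRUNCATE PARTITION p_actual;
```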
Overview of our environment
• 10000+ users around the world
• 24x7 operation
• 10+ source systems:
• 18 billion+ inserts/month
• 50 million+ updates/month
• 60 million+ deletes/month
• 14,000+ ODI sessions/month
Ricardo Giampaoli – TeraCorp
Rodrigo Radtke de Souza - Dell
Thank you!