
Blackboard Intelligence

HEA Core ETL Process

HEA is the core ETL process for Blackboard Analytics modules. The ETL process is dynamic and is controlled by populating several tables with meta data. This document describes the meta data used in the ETL process and provides an overview of the ETL process.

Contents

Customizing the ETL Process
ETL Process Flow - Overview
Source Extract Phase
    HEA.TargetTable
    HEA.TargetColumn
Processing Dimensions and Facts (Entities)
    Entity Transformation Process (ETL Sub Process)
    HEA.Entity
    HEA.SystemDimension
    HEA.Dimension
    HEA.Fact
    HEA.EntityPrimaryKey
    HEA.EntitySourceKey
    HEA.FactDeleteControlKey
    HEA.EntityHelper
ETL Helpers
    HEA.ETLHelper
Set Dimension Members Active


Customizing the ETL Process

The ETL process has several points where custom components can be added. When adding a new custom object, add it to the appropriate Custom schema so that it is identified as a customization in the data warehouse.

Baseline Schemas – objects that are part of the baseline deliverable

• Source – A copy of the source data extracted from an ERP

• Stage – Tables and objects used after the transform stage of the ETL Process

• Final – Tables and objects intended to be queried by reports

• HEA – Contains the core procedures, objects, and meta data used by the ETL process

Custom Schemas – used to identify customizations

• CustomSource – Custom tables identifying custom sources; contains a copy of the source data extracted from an ERP

• CustomStage – Custom tables and objects used after the transform stage of the ETL Process

• CustomFinal – Custom tables and objects intended to be queried by reports


ETL Process Flow - Overview

The flow of the ETL process can be broken down into a few main phases:

1. Source Extract – Load data from source system

2. Process Dimensions – Transform data into Dimension entities

3. Process Facts – Transform data into Fact entities

4. Process OLAP – Process Facts and Dims into OLAP DB.

Details of the ETL phases and other ETL processes are listed in the following diagram.

Figure: ETL Process Overview


Source Extract Phase

Source extracts run in parallel, so multiple tables are extracted from the ERP at the same time. The source extract operations are set up using the meta data stored in HEA.TargetTable and HEA.TargetColumn. This meta data tells the ETL process where the data comes from and where it will be loaded. The meta data is checked before the source is loaded, and the tables and columns it defines must exactly match the table name and column names of the Source or CustomSource table. No column that exists in the Source or CustomSource table’s schema can be omitted from the meta data.
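The exact-match requirement can be checked by hand before a load. The query below is a minimal sketch, assuming the data warehouse runs on SQL Server and using the standard INFORMATION_SCHEMA.COLUMNS view together with the TargetTableName and Name columns of HEA.TargetColumn. PersonLookup is used as the example table and is assumed here to live in the Source schema.

```sql
-- Columns that exist in Source.PersonLookup but are missing from the meta data.
-- Assumption: PersonLookup lives in the Source schema; use 'CustomSource' for custom tables.
SELECT c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS AS c
LEFT JOIN HEA.TargetColumn AS tc
    ON tc.TargetTableName = c.TABLE_NAME
   AND tc.Name = c.COLUMN_NAME
WHERE c.TABLE_SCHEMA = 'Source'
  AND c.TABLE_NAME = 'PersonLookup'
  AND tc.Name IS NULL          -- no matching meta data row
ORDER BY c.ORDINAL_POSITION;
```

An empty result means every column of the table has a matching meta data row. The same join reversed would list meta data rows that have no matching column.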

HEA.TargetTable

Defines table information for source extract. This meta data is used to dynamically build a source extract query.

Name The Source or CustomSource schema table where data will be loaded to.

SourceTable The name of the ERP table from where data will be extracted

SourceAlias Typically a single or two character string used when generating the source extract query. In iBBLA installs, this has further meaning to define Learn sources versus ERP sources.

DoConvert 1 if source system is Oracle, 0 if source system is MSSQL

LinkedServer The name of the linked server where the SourceTable resides

SourcePrefix The schema of the ERP source table

IsLoadEnabled 1 to load this source, 0 to not load the source

IsStandard 1 for baseline sources, 0 for custom sources. This is also used to determine if the data will be loaded to Source.<Name> or to CustomSource.<Name>

LoadType Either FULL or INCR. FULL will cause the Source or CustomSource table to be truncated before loading data; INCR does not delete any data before loading. Setting to INCR (incremental) will require additional coding to handle incremental loads.

IsLinkedServer 1 if the source data is loaded from a Linked Server, 0 if data is loaded via the Learn web service extract method
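As an illustration of how these columns fit together, the statement below registers a custom source. It is a sketch only: the table ShoeSize, the ERP table SHOE_SIZE, the alias, and the linked server name ERP_LINK are hypothetical, and the real HEA.TargetTable may contain or require columns that are not listed in this document.

```sql
-- Hypothetical custom source: ERP table dbo.SHOE_SIZE loaded to CustomSource.ShoeSize.
-- All values are illustrative assumptions, not baseline settings.
INSERT INTO HEA.TargetTable
    (Name, SourceTable, SourceAlias, DoConvert, LinkedServer, SourcePrefix,
     IsLoadEnabled, IsStandard, LoadType, IsLinkedServer)
VALUES
    ('ShoeSize',    -- target table; IsStandard = 0 sends it to CustomSource.ShoeSize
     'SHOE_SIZE',   -- ERP table name (hypothetical)
     'ss',          -- short alias used in the generated extract query
     0,             -- 0 = MSSQL source, 1 = Oracle source
     'ERP_LINK',    -- linked server name (hypothetical)
     'dbo',         -- schema of the ERP source table
     1,             -- load this source
     0,             -- custom source
     'FULL',        -- truncate the target table before loading
     1);            -- data is loaded through a linked server
```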


HEA.TargetColumn

Defines column information for source extract. This meta data is used to dynamically build a source extract query.

Name The name of the column in the Source or CustomSource schema table where this data will be loaded to

TargetTableName The name of the Source or CustomSource schema table where data will be loaded to. This is a foreign key reference to HEA.TargetTable.Name

SourceColumnName The name of the column in the ERP, from where data will be extracted

IsLoadEnabled 1 if the column will get data from the ERP, 0 if the column will not be loaded (the value is loaded as NULL)

IsStandard 1 for baseline columns, 0 for custom columns.

DoTrim 1 to truncate data value to match Source or CustomSource table column text data length; 0 to not truncate data.
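The column meta data can be populated in the same way as the table meta data. The rows below are a sketch for a hypothetical custom table CustomSource.ShoeSize that is assumed to be registered already in HEA.TargetTable; every column name and ERP column name shown is an assumption made for illustration.

```sql
-- One row per column of CustomSource.ShoeSize (hypothetical table and columns).
-- Every column in the table's schema needs a row, even if it is not loaded.
INSERT INTO HEA.TargetColumn
    (Name, TargetTableName, SourceColumnName, IsLoadEnabled, IsStandard, DoTrim)
VALUES
    ('ShoeSizeCode',        'ShoeSize', 'SIZE_CODE', 1, 0, 1),  -- loaded, trimmed to target length
    ('ShoeSizeDescription', 'ShoeSize', 'SIZE_DESC', 1, 0, 1),  -- loaded, trimmed to target length
    ('Notes',               'ShoeSize', 'NOTES',     0, 0, 0);  -- not loaded; value arrives as NULL
```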

The meta data is used to generate a query which will be used to extract the source data.

The following only applies when the data is sourced via a linked server setup, and should not be confused with extracting data from Learn via the web service extract method.

The stored procedure HEA.BuildClauses is executed to generate the dynamic query. The resulting query can be easily viewed by querying HEA.ViewStatement. After the query is generated, the data is extracted by executing the query using the stored procedure HEA.ExtractSourceData.

This source extract can be manually invoked by executing the following SQL commands. In this case, a full extract for TargetTableName “PersonLookup” will be executed.

exec HEA.BuildClauses
    @DomainCode='Full Extract',
    @TargetTableName='PersonLookup'
go

exec HEA.ExtractSourceData
    @DomainCode='Full Extract',
    @TargetTableName='PersonLookup'
go

Query: Manually extract source data


To view the query used to extract the source data for a specified target table name, execute the following query.

select CompositeFragment
from HEA.ViewStatement
where TargetTableName = 'PersonLookup'
order by Ordinal

Query: View source extract statement

A predicate can be defined to further limit the data being extracted from the source system. The predicate can be thought of as adding a where clause to the query.

Source data that comes from Bb Learn via the web service extract method does use the setup in HEA.TargetTable, but the load is not invoked in the way described above. The Learn web service is invoked by a separate application, and the data is loaded to the data warehouse using a bulk insert. To achieve this, a set of Source views is used in place of the HEA.TargetColumn setup to load the data from a file to the appropriate Source schema table.


Processing Dimensions and Facts (Entities)

Dimensions and Facts are considered entities in the ETL process. These entities follow the same sub process when transforming the entity from Source to Stage to Final table. The differences are primarily in how each phase processes these entities.

When processing dimensions, all dimension entities process in parallel. This means the transformation logic for a dimension should not reference any other dimension.

When processing facts, an execution order is observed. Facts will execute in parallel for fact entities that share the same execution order. This allows a later executing fact entity to reference a prior executed fact in its transformation logic.

Entity Transformation Process (ETL Sub Process)

When processing an entity, the following steps occur (in order):

1. Execute Source type helpers (Stored Procedures)

2. Execute Transform view to populate Stage table

3. Execute Source Count view for comparing record counts

4. Execute Stage type helpers to further transform stage data

5. Execute data Quality Firewall routine to report errors and warnings. Records identified as an error (ex: duplicates) will not be represented in the Final schema table.

6. Delete conflicting records from final table

7. Load Final table with data from Stage table (except for records flagged as an error)

8. Perform version snapshot (Facts only)

9. Execute Final type helpers

10. Execute order by routine (dimensions only)

An entity needs to be defined in the meta data before it can be processed. The entity will need to be registered in HEA.Entity, and in addition it must also be registered in the appropriate entity type table: HEA.Dimension, HEA.SystemDimension, or HEA.Fact.

A System Dimension is a dimension that exists as a final table only. It does not go through a transformation process, but there are portions of the ETL process that can affect the member ordering and whether or not a member will be visible in the OLAP database.


HEA.Entity

Defines the existence of an entity that may be transformed during the ETL process. All entities (Dims and Facts) must be registered in HEA.Entity.

Code The name of the Entity, typically matching its related final table name. Examples: DimAgeBand, FactStudentTerm.

Type Fact, Dimension, or SystemDimension

FinalTable The final table for the entity, containing the table schema. Examples: CustomFinal.ShoeSize, Final.FactStudentTerm

IsActive 1 if the entity should be processed during the ETL execution, 0 if it should not process.

IsLogged 1 if the entity should be logged when processed, 0 if it should not be logged. If IsActive = 1, IsLogged should be 1.

IsCustom 1 if this is a custom entity, 0 if it is a baseline entity.

LoadType FULL or INCR. This will be set to FULL unless specific code is in place to use INCR. INCR prevents the delete from final statement from executing.
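A registration row for a custom dimension might look like the following sketch. The entity code DimShoeSize and its final table name are hypothetical, and the real table may hold columns beyond the ones described here.

```sql
-- Register a hypothetical custom dimension as an entity.
INSERT INTO HEA.Entity
    (Code, Type, FinalTable, IsActive, IsLogged, IsCustom, LoadType)
VALUES
    ('DimShoeSize',              -- entity code (hypothetical)
     'Dimension',                -- Fact, Dimension, or SystemDimension
     'CustomFinal.DimShoeSize',  -- schema-qualified final table (hypothetical)
     1,                          -- process during ETL execution
     1,                          -- logged; should be 1 whenever IsActive = 1
     1,                          -- custom entity
     'FULL');                    -- FULL unless specific code supports INCR
```

The entity must also be registered in the matching entity type table before it can be processed.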

HEA.SystemDimension

Further defines attributes that are specific to a System Dimension entity.

EntityCode The entity code from HEA.Entity.Code

Type SystemDimension

ActiveConfig XML fragment indicating if the set active members routine should execute

OrderByConfig XML fragment indicating how the dimension members should be ordered

HEA.Dimension

Further defines attributes that are specific to a Dimension entity.

EntityCode The entity code from HEA.Entity.Code

Type Dimension

StageTable The table used as this dimension’s staging table. Examples: Stage.DimCourse, CustomStage.DimShoeSize.

TransformView The view used to populate the stage table. Examples: Stage.ViewDimCourseTransform, CustomStage.ViewDimShoeSizeTransform

SourceCountView The view used to assess record count as a data quality count during the ETL process. Examples: Stage.ViewDimCourseSourceCount, CustomStage.DimShoeSizeSourceCount.

DataQualityFirewallConfig XML setup for the entity’s data quality routine.

HistoryConfig XML setup for tracking changes in dimension members

OrderByConfig XML fragment indicating how the dimension members should be ordered

ActiveConfig XML fragment indicating if the set active members routine should execute
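The dimension-specific row can be sketched as follows, reusing the example object names given above. The entity code DimShoeSize is hypothetical and is assumed to exist in HEA.Entity. Only the ActiveConfig fragment format (<SetActive>1</SetActive>) appears in this document, so the other XML columns are left out rather than guessed; a real installation may require them.

```sql
-- Dimension attributes for the hypothetical entity DimShoeSize.
INSERT INTO HEA.Dimension
    (EntityCode, Type, StageTable, TransformView, SourceCountView, ActiveConfig)
VALUES
    ('DimShoeSize',
     'Dimension',
     'CustomStage.DimShoeSize',               -- staging table
     'CustomStage.ViewDimShoeSizeTransform',  -- view that populates the staging table
     'CustomStage.DimShoeSizeSourceCount',    -- view used for the record count check
     '<SetActive>1</SetActive>');             -- run the set active members routine
-- DataQualityFirewallConfig, HistoryConfig and OrderByConfig are omitted because
-- their XML formats are not described in this document.
```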


HEA.Fact

Further defines attributes that are specific to a Fact entity.

EntityCode The entity code from HEA.Entity.Code

Type Fact

StageTable The table used as this Fact’s staging table. Example: Stage.FactStudentTerm

TransformView The view used to populate the stage table. Example: Stage.ViewFactStudentTermTransform

SourceCountView The view used to assess record count as a data quality count during the ETL process. Example: Stage.ViewFactStudentTermSourceCount

DataQualityFirewallConfig XML setup for the entity’s data quality routine.

VersionType Value is specific to module. Used to determine how version snapshot should function for this entity.

ExecutionOrder Numerical order in which facts execute. Facts can share an execution order. Facts that share an execution order will process in parallel.

DisableForeignKeys Only one fact per execution order can have this set to 1. May cause deadlock issues, and should only be set in special circumstances.
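The two queries below are a small sketch for reviewing the fact setup. They rely only on the EntityCode, ExecutionOrder, and DisableForeignKeys columns described above and assume DisableForeignKeys is stored as 0 or 1.

```sql
-- Facts listed by execution order; facts that share an order process in parallel.
SELECT ExecutionOrder, EntityCode
FROM HEA.Fact
ORDER BY ExecutionOrder, EntityCode;

-- Execution orders where more than one fact has DisableForeignKeys = 1,
-- which breaks the one-per-execution-order rule.
SELECT ExecutionOrder, COUNT(*) AS FactsWithDisabledKeys
FROM HEA.Fact
WHERE DisableForeignKeys = 1
GROUP BY ExecutionOrder
HAVING COUNT(*) > 1;
```

The second query should return no rows in a correctly configured installation.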

Each entity also needs to have its primary key defined in the meta data. This value is used in several phases of the entity processing: when processing the dimension order by, when activating dimension members, and when validating data in the data quality firewall routines.

HEA.EntityPrimaryKey

Defines the single column that is used as the entity’s primary key. This is typically an identity column on the final table.

EntityCode The entity code from HEA.Entity.Code

KeyColumn The primary key column for the entity. Typically the identity column on the entity’s final table. Example: ShoeSizeKey

The meta data in HEA.EntitySourceKey must be specified for Dimensions and Facts; it is not used by System Dimensions. This meta data is used in several steps when processing entities, such as validating data and reporting errors when executing the data quality firewall routines.

HEA.EntitySourceKey

Defines the source system values for an entity, one record per source column. This is the set of columns that identify the uniqueness of the records in the entity. When referencing a fact table this is considered its grain (granularity). Note: The VersionKey column should not be specified in this meta data.

EntityCode The entity code from HEA.Entity.Code

KeyColumn The column that contains the source values that identify uniqueness for records in the entity. Example: SourceKey. Fact tables will typically have more than one source key column; each column is specified as a separate record.
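The key meta data in HEA.EntityPrimaryKey and HEA.EntitySourceKey for a fact could be populated as in the sketch below. FactStudentTerm is one of the example entities named in this document, but the key column names FactStudentTermKey, StudentKey, and TermKey are assumptions chosen to illustrate a two-column grain.

```sql
-- Primary key: a single column, typically the identity column of the final table.
INSERT INTO HEA.EntityPrimaryKey (EntityCode, KeyColumn)
VALUES ('FactStudentTerm', 'FactStudentTermKey');  -- column name is hypothetical

-- Source keys: one row per column that makes up the grain of the fact.
-- VersionKey is deliberately not listed.
INSERT INTO HEA.EntitySourceKey (EntityCode, KeyColumn)
VALUES
    ('FactStudentTerm', 'StudentKey'),  -- hypothetical grain column
    ('FactStudentTerm', 'TermKey');     -- hypothetical grain column
```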


Each fact needs to have a set of records defined in HEA.FactDeleteControlKey. This meta data determines which records will be deleted from the final fact table before loading the final fact data.

HEA.FactDeleteControlKey

Defines the columns used to determine what records need to be removed from the Final fact table before loading the final fact table. A separate row should be defined for each column used as a delete control column.

EntityCode The entity code from HEA.Entity.Code

DeleteControlColumn The fact table column used in the where clause when deleting records from the final fact table. “All” can be specified if the final fact table should be purged in full before loading. Example: TermKey
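A delete control row, and a check for facts that lack one, can be sketched as follows. TermKey is the example column given above; pairing it with FactStudentTerm is an assumption made for illustration.

```sql
-- Use TermKey to decide which rows are removed from the final fact table before loading.
INSERT INTO HEA.FactDeleteControlKey (EntityCode, DeleteControlColumn)
VALUES ('FactStudentTerm', 'TermKey');

-- Facts that have no delete control key defined yet.
SELECT f.EntityCode
FROM HEA.Fact AS f
LEFT JOIN HEA.FactDeleteControlKey AS k
    ON k.EntityCode = f.EntityCode
WHERE k.EntityCode IS NULL;
```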

Additional transformation logic can be performed through the use of stored procedures (referred to as helpers in this document). Helpers can be defined for an entity in the HEA.EntityHelper table. Any helper registered here will execute when the entity is processed. The helper will process during the phase specified, and in the execution order specified.

HEA.EntityHelper

Defines the stored procedures that execute when processing an entity.

EntityCode The entity code from HEA.Entity.Code

Type Source, Stage, Fact

ExecutionOrder Numerical order of execution

Helper Name of the stored procedure. Examples: Stage.HelperDimTermStage, CustomStage.HelperDimCourseStage

IsCustom 1 if a custom procedure, 0 if baseline procedure.
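Registering a helper is a single row per stored procedure. The sketch below uses the example procedure name given above; the execution order value 10 is arbitrary, and the procedure itself is assumed to exist already.

```sql
-- Run a custom stage helper for DimCourse after its stage table is populated.
INSERT INTO HEA.EntityHelper (EntityCode, Type, ExecutionOrder, Helper, IsCustom)
VALUES ('DimCourse', 'Stage', 10, 'CustomStage.HelperDimCourseStage', 1);

-- Review all helpers for the entity in the order they will run within each phase.
SELECT Type, ExecutionOrder, Helper
FROM HEA.EntityHelper
WHERE EntityCode = 'DimCourse'
ORDER BY Type, ExecutionOrder;
```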

An entity can be processed manually by executing the HEA.ProcessEntity stored procedure. A debug mode can be defined to return the manifest following or during the execution (1: return at end of execution, 2: return at each step of execution.) The following statement will process the Course dimension and return the manifest after processing completes.

exec HEA.ProcessEntity @EntityCode='DimCourse', @DebugMode=1

Query: Manually processing an entity


ETL Helpers

ETL Helpers were introduced in HEA version 4.1. ETL Helpers are phases where stored procedures can be executed between the main phases of the ETL process. The ETL Helper stages are illustrated in the ETL Process Overview image at the beginning of this document. The ETL Helpers execute at the following times:

1. Start – Before the extract phase

2. Extract – After the extract phase

3. EntityCopy – After the EntityCopy phase (Entity Copies are not discussed in this document.)

4. Dimension – After all the Dimension entities process

5. Fact – After all the Fact entities process

6. OLAP – After the OLAP DB is processed

The ETL Helpers are defined in HEA.ETLHelper. ETL Helpers can be enabled or disabled, and they can share an execution order so that they execute in parallel.

HEA.ETLHelper

Defines the ETL Helpers used during the ETL process.

Helper Name of the stored procedure. Example: CustomSource.HelperCourseSource

Type Start, Extract, Dimension, Fact, OLAP

ExecutionOrder Numerical execution order within the stage (type) in which the ETL helper is executed.

IsCustom 1 for custom helpers, 0 for baseline helpers.

IsEnabled 1 for enabled, 0 for disabled.
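An ETL helper registration could look like the sketch below. It uses the example procedure name given above; choosing the Extract stage and execution order 1 is an assumption for illustration.

```sql
-- Run a custom procedure after the extract phase.
INSERT INTO HEA.ETLHelper (Helper, Type, ExecutionOrder, IsCustom, IsEnabled)
VALUES ('CustomSource.HelperCourseSource', 'Extract', 1, 1, 1);

-- Disable the helper later without removing its registration.
UPDATE HEA.ETLHelper
SET IsEnabled = 0
WHERE Helper = 'CustomSource.HelperCourseSource';
```

Giving two enabled helpers of the same type the same ExecutionOrder lets them run in parallel, so helpers that depend on each other need distinct values.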


Set Dimension Members Active

Before processing the OLAP database, the routine to mark dimension members active is executed. This routine compares the fact to dimension foreign key relationships to determine which members in the dimension table should have Active set to 1. This only happens if the dimension’s ActiveConfig property is set to 1 (<SetActive>1</SetActive>). For this routine to function properly, there must be a foreign key defined on the fact table to the appropriate dimension table.

If you are processing dimensions and facts manually and also want to process the OLAP database manually, you will likely need to execute the HEA.SetDimensionAttributeActive stored procedure before processing the OLAP database. This stored procedure is executed using the following query.

exec HEA.SetDimensionAttributeActive

Query: Manually invoking the process to set dimension members active
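Putting the manual steps together, a full manual run ahead of OLAP processing might be sequenced as in this sketch. It uses only the procedures and parameters shown in this document; the choice of DimCourse and FactStudentTerm as the entities to process is an assumption.

```sql
-- 1. Process the dimension first; the manifest is returned at the end of execution.
exec HEA.ProcessEntity @EntityCode='DimCourse', @DebugMode=1
go

-- 2. Process a fact once its dimensions are loaded.
exec HEA.ProcessEntity @EntityCode='FactStudentTerm', @DebugMode=1
go

-- 3. Mark dimension members active before processing the OLAP database.
exec HEA.SetDimensionAttributeActive
go
```

Dimensions are processed before facts here to mirror the phase order of the ETL process flow.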

Blackboard.com Copyright © 2017. Blackboard Inc. All rights reserved. Blackboard, the Blackboard logo, BbWorld, Blackboard Learn, Blackboard Transact, Blackboard Connect, Blackboard Mobile, Blackboard Collaborate, Blackboard Analytics, Blackboard Engage, Edline, the Edline logo, the Blackboard Outcomes System, Behind the Blackboard, and Connect-ED are trademarks or registered trademarks of Blackboard Inc. or its subsidiaries in the United States and/or other countries. Blackboard products and services may be covered by one or more of the following U.S. Patents: 8,265,968, 7,493,396; 7,558,853; 6,816,878; 8,150,925