B I - Extraction, Transformation & Loading - POWER POINT SHOW


  • 7/22/2019 B I - Extraction, Transformation & Loading - POWER POINT SHOW


    Extraction, Transformation and Loading (ETL) in Business Intelligence (BI)


    Topics to be covered

    Overview of the ETL process

    Different Types of Source Systems

    Overview of Data Sources

    PSA

    Transformations / Rule Types

    DTPs / Infopackage

    Data Reconciliation


    Overview of a Generic ETL Process

    It is the process of taking raw data from a source system,

    applying transformation rules to it, and loading it to an

    InfoProvider (target).

    The ETL Process


    BI ETL Process: BI Data Flow Details


    Source Systems in SAP

    A Source System is any system that is available to BI for data extraction and
    transfer purposes. Examples include mySAP ERP, mySAP CRM, and custom
    systems based on an Oracle DB.

    Source System Types and Interfaces


    Overview of DataSources

    DataSources are BI objects used to extract and stage data from source systems.
    A DataSource contains a number of logically related fields that are arranged in
    a flat structure (the extraction structure) and contain the data to be
    transferred into BI.

    There are 2 types of DataSources:

    - DataSource for Transaction Data
    - DataSource for Master Data
        - DataSource for Attributes
        - DataSource for Texts
        - DataSource for Hierarchies


    Overview of DataSource Use

    - DataSources supply the metadata description of source data.
    - They are used to extract data from a source system and to transfer the data to the BI system.


    Persistent Staging Area (PSA) / InfoPackage

    The Persistent Staging Area (PSA)

    - is a transparent database table in which request data is stored
    - is created per DataSource and source system.

    It represents an initial store in BI, in which the requested data is saved
    unchanged from the source system.

    Key Point

    The PSA can be bypassed, but this is strongly discouraged because it is crucial for data backup purposes.


    Transformation

    The transformation process allows you to consolidate, cleanse, and integrate
    data. The data can be semantically synchronized from heterogeneous sources.

    When you load data from one BI object into another BI object, the data is
    passed through a transformation.

    A transformation converts the fields of the source into the format of the target.
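
To make the idea of converting source fields into the target format concrete, here is a minimal Python sketch. It is not SAP code: the field names (`CUST_NO`, `AMT`, `DOC_DATE`) and the function `transform` are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical source record, as it might arrive from a source system.
source_record = {"CUST_NO": "0000012345", "AMT": "150.50", "DOC_DATE": "20190722"}


def transform(record):
    """Convert the fields of one source record into the (assumed) target format."""
    raw_date = record["DOC_DATE"]
    return {
        # Cleanse: drop the leading zeros of the customer number.
        "customer": record["CUST_NO"].lstrip("0"),
        # Convert the amount from text to a number.
        "amount": float(record["AMT"]),
        # Convert the YYYYMMDD string into a date value.
        "doc_date": date(int(raw_date[:4]), int(raw_date[4:6]), int(raw_date[6:8])),
    }


target_record = transform(source_record)
print(target_record)
```

Each target field is derived from exactly one source field here; real transformations may combine several fields or read additional data.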



    Transformation

    Transformations ranging from very simple to highly complex can be created using:

    Rule types

    Aggregation types

    Routines

    Rule Types

    A rule type determines whether and how a characteristic or key figure, or a
    data field or key field, is updated into the target.

    The different rule types are as follows:

    KEY FIGURES

    Direct Assignment

    Formula

    No Transformation

    Routine

    Routine with Unit


    Transformation Rule Types

    CHARACTERISTICS

    Constant

    Direct Assignment

    Formula

    Initial (Only for Key fields)

    Read Master Data

    Routine
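
The rule types listed above can be pictured as one small function per target field. The following Python sketch is only an analogy, not SAP syntax: the table `CUSTOMER_MASTER`, the field names, and `apply_rules` are all hypothetical.

```python
# Hypothetical master data table: customer -> attributes.
CUSTOMER_MASTER = {"C1": {"region": "EMEA"}, "C2": {"region": "APAC"}}

# One rule per target field; each rule receives the source record.
rules = {
    # Direct Assignment: copy the source value unchanged.
    "customer": lambda src: src["customer"],
    # Constant: always fill the same fixed value.
    "currency": lambda src: "EUR",
    # Formula: calculate the value from other source fields.
    "net_amount": lambda src: src["gross"] - src["tax"],
    # Read Master Data: look up an attribute of the customer.
    "region": lambda src: CUSTOMER_MASTER[src["customer"]]["region"],
}


def apply_rules(src):
    """Build one target record by applying every rule to the source record."""
    return {field: rule(src) for field, rule in rules.items()}


result = apply_rules({"customer": "C1", "gross": 120.0, "tax": 20.0})
print(result)
```

A rule of type Routine would correspond to an arbitrary function in place of one of the lambdas.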

    Aggregation Types

    Controls how a key figure or a data field is updated in the InfoProvider.

    For InfoCubes:

    Always Summation

    For DataStore Objects:

    Either Summation or Overwrite
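
The difference between Summation and Overwrite can be shown with a short Python sketch. It assumes a target that holds one key figure value per record key; the function `update_target` and the sample records are hypothetical.

```python
def update_target(target, records, aggregation):
    """Update key figure values in `target`, a dict keyed by record key."""
    if aggregation not in ("summation", "overwrite"):
        raise ValueError(f"unknown aggregation type: {aggregation}")
    for key, value in records:
        if aggregation == "summation" and key in target:
            target[key] += value  # add to the value already stored
        else:
            target[key] = value   # overwrite, or first value for this key
    return target


records = [("order-1", 100), ("order-1", 30), ("order-2", 50)]

summed = update_target({}, records, "summation")
overwritten = update_target({}, records, "overwrite")

print(summed)
print(overwritten)
```

With Summation the two values for `order-1` are added together, while with Overwrite only the last value loaded for `order-1` remains.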


    Transformation Routine Types

    Start Routine

    - runs for each data package at the start of the transformation
    - has a table in the format of the source structure as input and output parameters
    - is used to perform preliminary calculations and store them in a global data structure
    - can modify or delete data in the data package

    End Routine

    - is a routine with a table in the target structure format as input and output parameters
    - is used to post-process data after the transformation on a package-by-package basis
    - for example, it can delete records that are not to be updated, or perform data checks

    Characteristic Routine

    - is available as a transformation rule for a key figure or a characteristic
    - its input and output values depend on the selected field in the transformation rule

    Expert Routine

    - is only intended for use in special cases in which the available functions are not sufficient to perform a transformation
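
The order in which start routine, field rules, and end routine act on a data package can be sketched as follows. In an SAP system these routines are written in ABAP; the Python below is only an illustration, and all function and field names are hypothetical.

```python
def start_routine(source_package):
    """Runs once per data package before the rules; here it deletes cancelled rows."""
    return [row for row in source_package if row["status"] != "cancelled"]


def field_rule(row):
    """Per-record rule: converts one source row into the target structure."""
    return {"order": row["order"], "amount_eur": row["amount"] * row["rate"]}


def end_routine(target_package):
    """Runs once per data package after the rules; here it drops zero amounts."""
    return [row for row in target_package if row["amount_eur"] > 0]


def run_transformation(source_package):
    package = start_routine(source_package)          # source structure in and out
    package = [field_rule(row) for row in package]   # source -> target structure
    return end_routine(package)                      # target structure in and out


package = [
    {"order": "A", "amount": 10.0, "rate": 2.0, "status": "open"},
    {"order": "B", "amount": 5.0, "rate": 2.0, "status": "cancelled"},
    {"order": "C", "amount": 0.0, "rate": 2.0, "status": "open"},
]
result = run_transformation(package)
print(result)
```

Order B is removed by the start routine and order C by the end routine, so only order A reaches the target.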


    InfoPackage / Data Transfer Process (DTP)

    InfoPackages and DTPs initiate the data flow.

    InfoPackage

    An InfoPackage is used to load data into the PSA from any source using the
    DataSource structure.

    An InfoPackage is a BI object that contains all the settings directing exactly
    how this data should be uploaded from the source system.

    The target of the InfoPackage is the PSA table tied to the specific DataSource
    associated with the InfoPackage.


    InfoPackage / Data Transfer Process (DTP)

    Data Transfer Process

    This is the object that controls the actual data flow (filters, update mode
    (delta or full)) for a specific transformation.


    Data Loads

    Data loading is a two-step process:

    The first step is loading the data from the source system. This involves multiple steps that differ depending on which source system is involved.

    For example, if it is an SAP source system, a function call must be made to the other system, and an
    extractor program associated with the DataSource might be initiated.

    An InfoPackage is the BI object that contains all the settings directing exactly how this data should
    be uploaded from the source system.

    The target of the InfoPackage is the PSA table tied to the specific DataSource associated with the
    InfoPackage.

    The second step is the data transfer process. The DTP controls the actual data flow (filters, update mode (delta or full)) for a specific transformation.

    There can be more than one data transfer process if there is more than one transformation step or target in the ETL flow.
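
The two-step load can be modelled roughly as below. This is a simplified Python sketch, not the SAP implementation: `psa`, `target`, `run_infopackage`, and `run_dtp` are hypothetical names, and the delta logic is reduced to a per-request flag.

```python
psa = {}      # (datasource, source_system) -> list of load requests
target = []   # stands in for the data target, e.g. a DataStore object


def run_infopackage(datasource, source_system, extract):
    """Step 1: store the extracted rows unchanged in the PSA as a new request."""
    requests = psa.setdefault((datasource, source_system), [])
    requests.append({"rows": list(extract()), "transferred": False})


def run_dtp(datasource, source_system, row_filter, mode="delta"):
    """Step 2: move PSA rows to the target, applying a filter and an update mode."""
    if mode not in ("delta", "full"):
        raise ValueError(f"unknown update mode: {mode}")
    moved = 0
    for request in psa.get((datasource, source_system), []):
        # Delta: skip requests already transferred. Full: take every request.
        if mode == "delta" and request["transferred"]:
            continue
        rows = [row for row in request["rows"] if row_filter(row)]
        target.extend(rows)  # this sketch does not deduplicate on a full load
        request["transferred"] = True
        moved += len(rows)
    return moved


def extract_sales():
    return [{"region": "EU", "amount": 10}, {"region": "US", "amount": 20}]


def only_eu(row):
    return row["region"] == "EU"


run_infopackage("SALES", "ERP", extract_sales)
first = run_dtp("SALES", "ERP", only_eu)    # transfers the new request
second = run_dtp("SALES", "ERP", only_eu)   # nothing new to transfer
print(first, second, target)
```

The PSA keeps both extracted rows unchanged, while the filter of the transfer step decides which of them reach the target.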


    Data Reconciliation

    An important aspect of ensuring the quality of data in BI is the consistency
    of the data.

    Data reconciliation allows you to check the integrity of the loaded data, for
    example by comparing the totals of a key figure in the DataStore object with
    the corresponding totals that the PSA stores directly from the source system.
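
A totals comparison of this kind can be sketched in a few lines of Python. The row layout, the key figure name `amount`, and the function `reconcile` are assumptions made for the example; in practice the rows would be read from the PSA table and the DataStore object.

```python
import math


def total(rows, key_figure):
    """Sum one key figure over all rows."""
    return sum(row[key_figure] for row in rows)


def reconcile(psa_rows, dso_rows, key_figure, tolerance=1e-9):
    """Compare the key figure total in the PSA with the total in the DataStore object."""
    psa_total = total(psa_rows, key_figure)
    dso_total = total(dso_rows, key_figure)
    return {
        "psa_total": psa_total,
        "dso_total": dso_total,
        "consistent": math.isclose(psa_total, dso_total, abs_tol=tolerance),
    }


# Hypothetical sample data: three PSA rows, aggregated to two rows in the target.
psa_rows = [
    {"order": "1", "amount": 100.0},
    {"order": "1", "amount": 30.0},
    {"order": "2", "amount": 50.0},
]
dso_rows = [{"order": "1", "amount": 130.0}, {"order": "2", "amount": 50.0}]

report = reconcile(psa_rows, dso_rows, "amount")
print(report)
```

Matching totals do not prove that every record is correct, but a mismatch is a quick signal that a load or transformation step needs to be checked.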