Introduction to Extraction, Transportation and Loading Methods in Data Warehouses


    Introduction to Extraction Methods in Data Warehouses

Extraction is the operation of extracting data from a source system for further use in a data warehouse environment. This is the first step of the ETL process. After the extraction, this data can be transformed and loaded into the data warehouse.

The extraction method you should choose depends highly on the source system and on the business needs in the target data warehouse environment. Very often there is no possibility to add additional logic to the source systems to enhance an incremental extraction of data, due to the performance impact or the increased workload on these systems. Sometimes the customer is not even allowed to add anything to an out-of-the-box application system.

The estimated amount of data to be extracted and the stage in the ETL process (initial load or maintenance of data) may also impact the decision of how to extract, from both a logical and a physical perspective. Basically, you have to decide how to extract data logically and physically.

Logical Extraction Methods

There are two kinds of logical extraction:

• Full Extraction

• Incremental Extraction

Full Extraction

The data is extracted completely from the source system. Since this extraction reflects all the data currently available on the source system, there is no need to keep track of changes to the data source since the last successful extraction. The source data will be provided as-is and no additional logical information (for example, timestamps) is necessary on the source site. An example of a full extraction may be an export file of a distinct table or a remote SQL statement scanning the complete source table.
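As a minimal sketch of the remote-SQL variant, the complete source table can be pulled through a database link in a single statement; the link name source_db and the staging table name stage_customers are illustrative assumptions, not part of the original example.

-- Sketch only: full extraction of a complete source table over a database link
CREATE TABLE stage_customers NOLOGGING AS
  SELECT * FROM customers@source_db;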


    Incremental Extraction

At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted. This event may be the last time of extraction or a more complex business event like the last booking day of a fiscal period. To identify this delta change there must be a possibility to identify all the changed information since this specific time event. This information can be provided either by the source data itself, such as an application column reflecting the last-changed timestamp, or by a change table where an appropriate additional mechanism keeps track of the changes besides the originating transactions. In most cases, using the latter method means adding extraction logic to the source system.
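A minimal sketch of the first variant, assuming the source rows carry a last-changed timestamp column and that the time of the last successful extraction is tracked on the warehouse side; all object names here are illustrative assumptions.

-- Sketch only: delta extraction driven by a last-changed timestamp
SELECT *
FROM   orders@source_db o
WHERE  o.last_updated > (SELECT last_extract_time
                         FROM   extract_control
                         WHERE  source_table = 'ORDERS');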

Many data warehouses do not use any change-capture techniques as part of the extraction process. Instead, entire tables from the source systems are extracted to the data warehouse or staging area, and these tables are compared with a previous extract from the source system to identify the changed data. This approach may not have a significant impact on the source systems, but it clearly can place a considerable burden on the data warehouse processes, particularly if the data volumes are large.

Oracle's Change Data Capture mechanism can extract and maintain such delta information.

See Also:

The "Change Data Capture" chapter, for further details about the Change Data Capture framework

Physical Extraction Methods

Depending on the chosen logical extraction method and the capabilities and restrictions on the source side, the extracted data can be physically extracted by two mechanisms. The data can either be extracted online from the source system or from an offline structure. Such an offline structure might already exist or it might be generated by an extraction routine.


There are the following methods of physical extraction:

• Online Extraction

• Offline Extraction

Online Extraction

The data is extracted directly from the source system itself. The extraction process can connect directly to the source system to access the source tables themselves, or to an intermediate system that stores the data in a preconfigured manner (for example, snapshot logs or change tables). Note that the intermediate system is not necessarily physically different from the source system.

With online extractions, you need to consider whether the distributed transactions are using original source objects or prepared source objects.

Offline Extraction

The data is not extracted directly from the source system but is staged explicitly outside the original source system. The data either already has an existing structure (for example, redo logs, archive logs, or transportable tablespaces) or was created by an extraction routine.

You should consider the following structures:

Flat files

Data in a defined, generic format. Additional information about the source object is necessary for further processing.


Dump files

Oracle-specific format. Information about the containing objects is included.

Redo and archive logs

Information is in a special, additional dump file.

Data transformation is the process of converting data from one format (for example, a database file) to another.


The structure of stored data may also vary between applications, requiring semantic mapping prior to the transformation process. For instance, two applications might store the same customer credit card information using slightly different structures. - See more at: https://www.mulesoft.com/resources/esb/data-transformation#sthash.NQUyNYSU.dpuf

    Transportation in Data Warehouses

The following topics provide information about transporting data into a data warehouse:

    • Overview of Transportation in Data Warehouses

    • Introduction to Transportation Mechanisms in Data Warehouses

    Overview of Transportation in Data Warehouses

Transportation is the operation of moving data from one system to another system. In a data warehouse environment, the most common requirements for transportation are in moving data from:

• A source system to a staging database or a data warehouse database

• A staging database to a data warehouse

• A data warehouse to a data mart

Transportation is often one of the simpler portions of the ETL process, and can be integrated with other portions of the process. For example, as shown in Chapter 11, "Extraction in Data Warehouses", distributed query technology provides a mechanism for both extracting and transporting data.

Introduction to Transportation Mechanisms in Data Warehouses



You have three basic choices for transporting data in warehouses:

• Transportation Using Flat Files

• Transportation Through Distributed Operations

• Transportation Using Transportable Tablespaces

    Transportation Using Flat Files

The most common method for transporting data is by the transfer of flat files, using mechanisms such as FTP or other remote file system access protocols. Data is unloaded or exported from the source system into flat files using techniques discussed in Chapter 11, "Extraction in Data Warehouses", and is then transported to the target platform using FTP or similar mechanisms.

Because source systems and data warehouses often use different operating systems and database systems, using flat files is often the simplest way to exchange data between heterogeneous systems with minimal transformations. However, even when transporting data between homogeneous systems, flat files are often the most efficient and most easy-to-manage mechanism for data transfer.
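As a rough sketch, a flat file could be produced with nothing more than SQL*Plus spooling and then pushed to the target platform; the file name, directory, and column list are assumptions for illustration.

-- Sketch only: unload to a delimited flat file with SQL*Plus, then transfer it
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 COLSEP '|'
SPOOL /stage/sales_extract.dat
SELECT prod_id, cust_id, time_id, amount_sold FROM sales;
SPOOL OFF
-- The resulting file would then be sent with FTP (or a similar mechanism) to the target host.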

    Transportation Through Distributed Operations

Distributed queries, either with or without gateways, can be an effective mechanism for extracting data. These mechanisms also transport the data directly to the target systems, thus providing both extraction and transformation in a single step. Depending on the tolerable impact on time and system resources, these mechanisms can be well suited for both extraction and transformation.

As opposed to flat file transportation, the success or failure of the transportation is recognized immediately with the result of the distributed query or transaction.
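For example, a single direct-path insert over a database link both extracts and transports the rows in one statement; the link name prod_db, the target table, and the predicate are illustrative assumptions.

-- Sketch only: extraction and transportation combined in one distributed statement
INSERT /*+ APPEND */ INTO dw_orders
  SELECT *
  FROM   orders@prod_db
  WHERE  order_date >= TRUNC(SYSDATE) - 1;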

    Transportation Using Transportable Tablespaces

Oracle8i introduced an important mechanism for transporting data: transportable tablespaces. This feature is the fastest way for moving large volumes of data between two Oracle databases.



Previous to Oracle8i, the most scalable data transportation mechanisms relied on moving flat files containing raw data. These mechanisms required that data be unloaded or exported into files from the source database. Then, after transportation, these files were loaded or imported into the target database. Transportable tablespaces entirely bypass the unload and reload steps.

Using transportable tablespaces, Oracle data files (containing table data, indexes, and almost every other Oracle database object) can be directly transported from one database to another. Furthermore, like import and export, transportable tablespaces provide a mechanism for transporting metadata in addition to transporting data.

Transportable tablespaces have some notable limitations: source and target systems must be running Oracle8i (or higher), must be running the same operating system, must use the same character set, and, prior to Oracle9i, must use the same block size. Despite these limitations, transportable tablespaces can be an invaluable data transportation technique in many warehouse environments.

The most common applications of transportable tablespaces in data warehouses are in moving data from a staging database to a data warehouse, or in moving data from a data warehouse to a data mart.

See Also:

Oracle9i Database Concepts for more information on transportable tablespaces

    Transportable Tablespaces Example

Suppose that you have a data warehouse containing sales data, and several data marts that are refreshed monthly. Also suppose that you are going to move one month of sales data from the data warehouse to the data mart.

Step 1: Place the Data to be Transported into its own Tablespace

The current month's data must be placed into a separate tablespace in order to be transported. In this example, you have a tablespace ts_temp_sales, which will hold a copy of the current month's data. Using the CREATE TABLE ... AS SELECT statement, the current month's data can be efficiently copied to this tablespace:

CREATE TABLE temp_jan_sales NOLOGGING TABLESPACE ts_temp_sales
AS SELECT * FROM sales
WHERE time_id BETWEEN '31-DEC-1999' AND '01-FEB-2000';

Following this operation, the tablespace ts_temp_sales is set to read-only:

ALTER TABLESPACE ts_temp_sales READ ONLY;

A tablespace cannot be transported unless there are no active transactions modifying the tablespace. Setting the tablespace to read-only enforces this.

The tablespace ts_temp_sales may be a tablespace that has been especially created to temporarily store data for use by the transportable tablespace features. Following "Step 3: Copy the Datafiles and Export File to the Target System", this tablespace can be set to read/write, and, if desired, the table temp_jan_sales can be dropped, or the tablespace can be reused for other transportations or for other purposes.

In a given transportable tablespace operation, all of the objects in a given tablespace are transported. Although only one table is being transported in this example, the tablespace ts_temp_sales could contain multiple tables. For example, perhaps the data mart is refreshed not only with the new month's worth of sales transactions, but also with a new copy of the customer table. Both of these tables could be transported in the same tablespace. Moreover, this tablespace could also contain other database objects such as indexes, which would also be transported.

Additionally, in a given transportable-tablespace operation, multiple tablespaces can be transported at the same time. This makes it easier to move very large volumes of data between databases. Note, however, that the transportable tablespace feature can only transport a set of tablespaces which contain a complete set of database objects without dependencies on other tablespaces. For example, an index cannot be transported without its table, nor can a partition be transported without the rest of the table. You can use the DBMS_TTS package to check that a tablespace is transportable.

See Also:

Oracle9i Supplied PL/SQL Packages and Types Reference for detailed information about the DBMS_TTS package
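A minimal self-containment check with DBMS_TTS might look as follows; the tablespace name follows this example, and violations are reported through the TRANSPORT_SET_VIOLATIONS view.

-- Sketch only: check whether ts_temp_sales forms a self-contained transportable set
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('ts_temp_sales', TRUE);
SELECT * FROM TRANSPORT_SET_VIOLATIONS;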

In this step, we have copied the January sales data into a separate tablespace; however, in some cases, it may be possible to leverage the transportable tablespace feature without even moving data to a separate tablespace. If the sales table has been partitioned by month in the data warehouse and if each partition is in its own tablespace, then it may be possible to directly transport the tablespace containing the January data. Suppose the January partition, sales_jan2000, is located in the tablespace ts_sales_jan2000. Then the tablespace ts_sales_jan2000 could potentially be transported, rather than creating a temporary copy of the January sales data in ts_temp_sales.

However, the same conditions must be satisfied in order to transport the tablespace ts_sales_jan2000 as are required for the specially created tablespace. First, this tablespace must be set to READ ONLY. Second, because a single partition of a partitioned table cannot be transported without the remainder of the partitioned table also being transported, it is necessary to exchange the January partition into a separate table (using the ALTER TABLE statement) to transport the January data.

The EXCHANGE operation is very quick, but the January data will no longer be a part of the underlying sales table, and thus may be unavailable to users until this data is exchanged back into the sales table after the export of the metadata. The January data can be exchanged back into the sales table after you complete Step 3.
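A sketch of that exchange, reusing the partition and tablespace names from this example; the empty exchange table and the WITHOUT VALIDATION clause are illustrative choices, not the original statements.

-- Sketch only: exchange the January partition into a standalone table for transport
CREATE TABLE sales_jan2000_x TABLESPACE ts_sales_jan2000
  AS SELECT * FROM sales WHERE 1 = 0;

ALTER TABLE sales EXCHANGE PARTITION sales_jan2000
  WITH TABLE sales_jan2000_x WITHOUT VALIDATION;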

Step 2: Export the Metadata

    The "%port utility is used to e%port the metadata describing the ob2ects contained in

    the transported tablespace. $or our e%ample scenario, the "%port command could be:

    EP TRANSPORT_TABLESPACE+,TABLESPACES+ts_temp_sales

      FILE+jan_sales-dmp

    This operation will generate an e%port file, jan_sales-dmp. The e%port file will be

    small, because it contains only metadata. In this case, the e%port file will contain

    information describing the tabletemp_jan_sales, such as the column names, column

    datatype, and all other information that the target Oracle database will need in order to

    access the ob2ects in ts_temp_sales.

    tep &" 'op( the Datafiles and Export File to the Target (stem

    &opy the data files that ma4e up ts_temp_sales, as well as the e%port

    file jan_sales-dmp to the data mart platform, using any transportation mechanism forflat files.

    Once the datafiles have been copied, the tablespace ts_temp_sales can be set

    to REA$ WRITE mode if desired.

Step 4: Import the Metadata


Once the files have been copied to the data mart, the metadata should be imported into the data mart:

IMP TRANSPORT_TABLESPACE=y DATAFILES='/db/tempjan.f'
  TABLESPACES=ts_temp_sales FILE=jan_sales.dmp

At this point, the tablespace ts_temp_sales and the table temp_sales_jan are accessible in the data mart. You can incorporate this new data into the data mart's tables.

You can insert the data from the temp_sales_jan table into the data mart's sales table in one of two ways:

INSERT /*+ APPEND */ INTO sales SELECT * FROM temp_sales_jan;

Following this operation, you can delete the temp_sales_jan table (and even the entire ts_temp_sales tablespace).

Alternatively, if the data mart's sales table is partitioned by month, then the new transported tablespace and the temp_sales_jan table can become a permanent part of the data mart. The temp_sales_jan table can become a partition of the data mart's sales table:

ALTER TABLE sales ADD PARTITION sales_00jan VALUES
  LESS THAN (TO_DATE('01-feb-2000','dd-mon-yyyy'));
ALTER TABLE sales EXCHANGE PARTITION sales_00jan
  WITH TABLE temp_sales_jan INCLUDING INDEXES WITH VALIDATION;

    Other Uses of Transportable Tablespaces

The previous example illustrates a typical scenario for transporting data in a data warehouse. However, transportable tablespaces can be used for many other purposes. In a data warehousing environment, transportable tablespaces should be viewed as a utility (much like Import/Export or SQL*Loader), whose purpose is to move large volumes of data between Oracle databases. When used in conjunction with parallel data movement operations such as the CREATE TABLE ... AS SELECT and INSERT ... AS SELECT statements, transportable tablespaces provide an important mechanism for quickly transporting data for many purposes.


Loading and Transformation

This chapter helps you create and manage a data warehouse, and discusses:

• Overview of Loading and Transformation in Data Warehouses

• Loading Mechanisms

• Transformation Mechanisms

• Loading and Transformation Scenarios

Overview of Loading and Transformation in Data Warehouses

Data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data conversions to extremely complex data scrubbing techniques. Many, if not all, data transformations can occur within an Oracle9i database, although transformations are often implemented outside of the database (for example, on flat files) as well.

This chapter introduces techniques for implementing scalable and efficient data transformations within Oracle9i. The examples in this chapter are relatively simple. Real-world data transformations are often considerably more complex. However, the transformation techniques introduced in this chapter meet the majority of real-world data transformation requirements, often with more scalability and less programming than alternative approaches.

This chapter does not seek to illustrate all of the typical transformations that would be encountered in a data warehouse, but to demonstrate the types of fundamental technology that can be applied to implement these transformations and to provide guidance in how to choose the best techniques.

    Transformation Flow

From an architectural perspective, you can transform your data in two ways:

    • Multistage Data Transformation

• Pipelined Data Transformation



    Multistage Data Transformation

The data transformation logic for most data warehouses consists of multiple steps. For example, in transforming new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.

Figure 13-1 offers a graphical way of looking at the transformation logic.

    Figure 13-1 Multistage Data Transformation


When using Oracle9i as a transformation engine, a common strategy is to implement each different transformation as a separate SQL operation and to create a separate, temporary staging table (such as the tables new_sales_step1 and new_sales_step2 in Figure 13-1) to store the incremental results for each step. This load-then-transform strategy also provides a natural checkpointing scheme for the entire transformation process, which enables the process to be more easily monitored and restarted. However, a disadvantage to multistaging is that the space and time requirements increase.
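A minimal sketch of that staging pattern, reusing the step-table names from Figure 13-1; the source table new_sales_raw and the validation and lookup logic shown are assumptions for illustration.

-- Sketch only: each transformation step materializes its result in a staging table
CREATE TABLE new_sales_step1 NOLOGGING AS
  SELECT * FROM new_sales_raw
  WHERE  prod_id IS NOT NULL;                 -- step 1: basic key validation

CREATE TABLE new_sales_step2 NOLOGGING AS
  SELECT s.*, p.prod_status                   -- step 2: look up dimension attributes
  FROM   new_sales_step1 s, products p
  WHERE  s.prod_id = p.prod_id;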

It may also be possible to combine many simple logical transformations into a single SQL statement or single PL/SQL procedure. Doing so may provide better performance than performing each step independently, but it may also introduce difficulties in modifying, adding, or dropping individual transformations, as well as recovering from failed transformations.

Pipelined Data Transformation

With the introduction of Oracle9i, Oracle's database capabilities have been significantly enhanced to address specifically some of the tasks in ETL environments. The ETL process flow can be changed dramatically and the database becomes an integral part of the ETL solution.

The new functionality renders some of the formerly necessary process steps obsolete, while some others can be remodeled to enhance the data flow and the data transformation to become more scalable and non-interruptive. The task shifts from a serial transform-then-load process (with most of the tasks done outside the database) or load-then-transform process, to an enhanced transform-while-loading.

Oracle9i offers a wide variety of new capabilities to address all the issues and tasks relevant in an ETL scenario. It is important to understand that the database offers toolkit functionality rather than trying to address a one-size-fits-all solution. The underlying database has to enable the most appropriate ETL process flow for a specific customer need, and not dictate or constrain it from a technical perspective. Figure 13-2 illustrates the new functionality, which is discussed throughout later sections.

    Figure 13-2 Pipelined Data Transformation


Loading Mechanisms



You can use the following mechanisms for loading a warehouse:

• SQL*Loader

• External Tables

• OCI and Direct-Path APIs

• Export/Import

SQL*Loader

Before any data transformations can occur within the database, the raw data must become accessible to the database. One approach is to load it into the database. Chapter 12, "Transportation in Data Warehouses", discusses several techniques for transporting data to an Oracle data warehouse. Perhaps the most common technique for transporting data is by way of flat files.

SQL*Loader is used to move data from flat files into an Oracle data warehouse. During this data load, SQL*Loader can also be used to implement basic data transformations. When using direct-path SQL*Loader, basic data manipulation, such as datatype conversion and simple NULL handling, can be automatically resolved during the data load. Most data warehouses use direct-path loading for performance reasons.

Oracle's conventional-path loader provides broader capabilities for data transformation than the direct-path loader: SQL functions can be applied to any column as those values are being loaded. This provides a rich capability for transformations during the data load. However, the conventional-path loader is slower than the direct-path loader. For these reasons, the conventional-path loader should be considered primarily for loading and transforming smaller amounts of data.

See Also:

Oracle9i Database Utilities for more information on SQL*Loader

The following is a simple example of a SQL*Loader control file to load data into the sales table of the sh sample schema from an external file sh_sales.dat. The external flat file sh_sales.dat consists of sales transaction data, aggregated on a daily level. Not all columns of this external file are loaded into sales. This external file will also be used as a source for loading the second fact table of the sh sample schema, which is done using an external table.

The following shows the control file (sh_sales.ctl) to load the sales table:

LOAD DATA
INFILE sh_sales.dat
APPEND INTO TABLE sales
FIELDS TERMINATED BY "|"
(PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD)

It can be loaded with the following command:

sqlldr sh/sh control=sh_sales.ctl direct=true

    External Tables

Another approach for handling external data sources is using external tables. Oracle9i's external table feature enables you to use external data as a virtual table that can be queried and joined directly and in parallel without requiring the external data to be first loaded in the database. You can then use SQL, PL/SQL, and Java to access the external data.

External tables enable the pipelining of the loading phase with the transformation phase. The transformation process can be merged with the loading process without any interruption of the data streaming. It is no longer necessary to stage the data inside the database for further processing inside the database, such as comparison or transformation. For example, the conversion functionality of a conventional load can be used for a direct-path INSERT AS SELECT statement in conjunction with the SELECT from an external table.

The main difference between external tables and regular tables is that externally organized tables are read-only. No DML operations (UPDATE/INSERT/DELETE) are possible and no indexes can be created on them.

Oracle9i's external tables are a complement to the existing SQL*Loader functionality, and are especially useful for environments where the complete external source has to be joined with existing database objects and transformed in a complex manner, or where the external data volume is large and used only once. SQL*Loader, on the other hand, might still be the better choice for loading of data where additional indexing of the staging table is necessary. This is true for operations where the data is used in independent complex transformations or the data is only partially used in further processing.

See Also:

Oracle9i SQL Reference for a complete description of external table syntax and restrictions and Oracle9i Database Utilities for usage examples

You can create an external table named sales_transactions_ext, representing the structure of the complete sales transaction data, represented in the external file sh_sales.dat. The product department is especially interested in a cost analysis on product and time. We thus create a fact table named cost in the sales history schema. The operational source data is the same as for the sales fact table. However, because we are not investigating every dimensional information that is provided, the data in the cost fact table has a coarser granularity than in the sales fact table; for example, all different distribution channels are aggregated.

We cannot load the data into the cost fact table without applying the previously mentioned aggregation of the detailed information, due to the suppression of some of the dimensions.

Oracle's external table framework offers a solution to this. Unlike SQL*Loader, where you would have to load the data before applying the aggregation, you can combine the loading and transformation within a single SQL DML statement, as shown in the following. You do not have to stage the data temporarily before inserting into the target table.

The Oracle object directories must already exist, and point to the directory containing the sh_sales.dat file as well as the directory containing the bad and log files.

CREATE TABLE sales_transactions_ext
(
  PROD_ID        NUMBER(6),
  CUST_ID        NUMBER,
  TIME_ID        DATE,
  CHANNEL_ID     CHAR(1),
  PROMO_ID       NUMBER(6),
  QUANTITY_SOLD  NUMBER(3),
  AMOUNT_SOLD    NUMBER(10,2),
  UNIT_COST      NUMBER(10,2),
  UNIT_PRICE     NUMBER(10,2)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY data_file_dir
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    BADFILE log_file_dir:'sh_sales.bad_xt'
    LOGFILE log_file_dir:'sh_sales.log_xt'
    FIELDS TERMINATED BY "|"
    ...
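The statement above is cut off in this copy. As a sketch of the combined load-and-transform step described in the text, a direct-path insert can aggregate straight out of the external table into the cost fact table; the column list and the use of SUM here are assumptions for illustration, not the original statement.

-- Sketch only: load and aggregate from the external table in one DML statement
INSERT /*+ APPEND */ INTO costs (time_id, prod_id, unit_cost, unit_price)
  SELECT time_id, prod_id, SUM(unit_cost), SUM(unit_price)
  FROM   sales_transactions_ext
  GROUP BY time_id, prod_id;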


Transformation Mechanisms

You have the following choices for transforming data inside the database:

• Transformation Using SQL

• Transformation Using PL/SQL

• Transformation Using Table Functions

Transformation Using SQL

Once data is loaded into an Oracle9i database, data transformations can be executed using SQL operations. There are four basic techniques for implementing SQL data transformations within Oracle9i:

• CREATE TABLE ... AS SELECT and INSERT /*+APPEND*/ AS SELECT

• Transformation Using UPDATE

• Transformation Using MERGE

• Transformation Using Multitable INSERT

CREATE TABLE ... AS SELECT and INSERT /*+APPEND*/ AS SELECT

The CREATE TABLE ... AS SELECT statement (CTAS) is a powerful tool for manipulating large sets of data. As shown in the following example, many data transformations can be expressed in standard SQL, and CTAS provides a mechanism for efficiently executing a SQL query and storing the results of that query in a new database table. The INSERT /*+APPEND*/ ... AS SELECT statement offers the same capabilities with existing database tables.

In a data warehouse environment, CTAS is typically run in parallel using NOLOGGING mode for best performance.
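For instance, a parallel, non-logged CTAS transformation might be written as follows; the degree of parallelism, the target table name, and the predicate are assumptions.

-- Sketch only: CTAS run in parallel with NOLOGGING
CREATE TABLE sales_recent PARALLEL 4 NOLOGGING AS
  SELECT /*+ PARALLEL(s, 4) */ *
  FROM   sales s
  WHERE  time_id >= TO_DATE('01-JAN-2000', 'DD-MON-YYYY');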

A simple and common type of data transformation is data substitution. In a data substitution transformation, some or all of the values of a single column are modified. For example, our sales table has a channel_id column. This column indicates whether a given sales transaction was made by a company's own sales force (a direct sale) or by a distributor (an indirect sale).

You may receive data from multiple source systems for your data warehouse. Suppose that one of those source systems processes only direct sales, and thus the source system does not know indirect sales channels. When the data warehouse initially receives sales data from this system, all sales records have a NULL value for the sales.channel_id field. These NULL values must be set to the proper key value. For example, you can do this efficiently using a SQL function as part of the insertion into the target sales table statement.

The structure of the source table sales_activity_direct is as follows:

SQL> DESC sales_activity_direct
 Name            Null?    Type
 --------------- -------- --------
 SALES_DATE               DATE
 PRODUCT_ID               NUMBER
 CUSTOMER_ID              NUMBER
 PROMOTION_ID             NUMBER
 AMOUNT                   NUMBER
 QUANTITY                 NUMBER

INSERT /*+ APPEND NOLOGGING PARALLEL */
INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 'S',
       promotion_id, quantity, amount
FROM sales_activity_direct;

Transformation Using UPDATE

Another technique for implementing a data substitution is to use an UPDATE statement to modify the sales.channel_id column. An UPDATE will provide the correct result. However, if the data substitution transformations require that a very large percentage of the rows (or all of the rows) be modified, then it may be more efficient to use a CTAS statement than an UPDATE.

Transformation Using MERGE

Oracle's merge functionality extends SQL by introducing the SQL keyword MERGE, in order to provide the ability to update or insert a row conditionally into a table or out of line single table views. Conditions are specified in the ON clause. This is, besides pure bulk loading, one of the most common operations in data warehouse synchronization.

Prior to Oracle9i, merges were expressed either as a sequence of DML statements or as PL/SQL loops operating on each row. Both of these approaches suffer from deficiencies in performance and usability. The new merge functionality overcomes these deficiencies with a new SQL statement. This syntax has been proposed as part of the upcoming SQL standard.

    When to Use Merge


There are several benefits of the new MERGE statement as compared with the two other existing approaches:

• The entire operation can be expressed much more simply as a single SQL statement.

• You can parallelize statements transparently.

• You can use bulk DML.

• Performance will improve because your statements will require fewer scans of the source table.

    Merge Examples

The following discusses various implementations of a merge. The examples assume that new data for the dimension table products is propagated to the data warehouse and has to be either inserted or updated. The table products_delta has the same structure as products.

Example 1 Merge Operation Using SQL in Oracle9i

MERGE INTO products t
USING products_delta s
ON (t.prod_id = s.prod_id)
WHEN MATCHED THEN UPDATE SET
  t.prod_list_price = s.prod_list_price,
  t.prod_min_price = s.prod_min_price
WHEN NOT MATCHED THEN INSERT
  (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
   prod_category, prod_cat_desc, prod_status, prod_list_price, prod_min_price)
VALUES
  (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory, s.prod_subcat_desc,
   s.prod_category, s.prod_cat_desc, s.prod_status, s.prod_list_price,
   s.prod_min_price);

Example 2 Merge Operation Using SQL Prior to Oracle9i

A regular join between source products_delta and target products:

UPDATE products t
SET (prod_name, prod_desc, prod_subcategory, prod_subcat_desc, prod_category,
     prod_cat_desc, prod_status, prod_list_price, prod_min_price) =
    (SELECT prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
            prod_category, prod_cat_desc, prod_status, prod_list_price,
            prod_min_price
     FROM products_delta s
     WHERE s.prod_id = t.prod_id);

An anti-join between source products_delta and target products:

INSERT INTO products t
SELECT * FROM products_delta s
WHERE s.prod_id NOT IN (SELECT prod_id FROM products);

The advantage of this approach is its simplicity and lack of new language extensions. The disadvantage is its performance. It requires an extra scan and a join of both the products_delta and the products tables.

    Example & $re-5i Merge Using $*.+*

CREATE OR REPLACE PROCEDURE merge_proc
IS
  CURSOR cur IS
    SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
           prod_category, prod_cat_desc, prod_status, prod_list_price,
           prod_min_price
    FROM products_delta;
  crec cur%ROWTYPE;
BEGIN
  OPEN cur;
  LOOP
    FETCH cur INTO crec;
    EXIT WHEN cur%NOTFOUND;
    UPDATE products SET
      prod_name        = crec.prod_name,
      prod_desc        = crec.prod_desc,
      prod_subcategory = crec.prod_subcategory,
      prod_subcat_desc = crec.prod_subcat_desc,
      prod_category    = crec.prod_category,
      prod_cat_desc    = crec.prod_cat_desc,
      prod_status      = crec.prod_status,
      prod_list_price  = crec.prod_list_price,
      prod_min_price   = crec.prod_min_price
    WHERE crec.prod_id = prod_id;
    IF SQL%NOTFOUND THEN
      INSERT INTO products
        (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
         prod_category, prod_cat_desc, prod_status, prod_list_price,
         prod_min_price)
      VALUES
        (crec.prod_id, crec.prod_name, crec.prod_desc, crec.prod_subcategory,
         crec.prod_subcat_desc, crec.prod_category, crec.prod_cat_desc,
         crec.prod_status, crec.prod_list_price, crec.prod_min_price);
    END IF;
  END LOOP;
  CLOSE cur;
END merge_proc;
/

Transformation Using Multitable INSERT

Many times, external data sources have to be segregated based on logical attributes for insertion into different target objects. It is also frequent in data warehouse environments to fan out the same source data into several target objects. Multitable inserts provide a new SQL statement for these kinds of transformations, where data can end up in several targets or in exactly one target, depending on the business transformation rules. This insertion can be done conditionally based on business rules or unconditionally.

Multitable insert offers the benefits of the INSERT ... SELECT statement when multiple tables are involved as targets. In doing so, it avoids the drawbacks of the alternatives available to you using functionality prior to Oracle9i. You either had to deal with n independent INSERT ... SELECT statements, thus processing the same source data n times and increasing the transformation workload n times, or you had to choose a procedural approach with a per-row determination of how to handle the insertion. This solution lacked direct access to the high-speed access paths available in SQL.

As with the existing INSERT ... SELECT statement, the new statement can be parallelized and used with the direct-load mechanism for faster performance.

    Example 13-1 Unconditional Insert 

The following statement aggregates the transactional sales information, stored in sales_activity_direct, on a daily basis and inserts into both the sales and the costs fact tables for the current day.

INSERT ALL
  INTO sales VALUES (product_id, customer_id, today, 'S', promotion_id,
                     quantity_per_day, amount_per_day)
  INTO costs VALUES (product_id, today, product_cost, product_price)
SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id,
       s.promotion_id, SUM(s.amount_sold) AS amount_per_day,
       SUM(s.quantity) quantity_per_day, p.product_cost, p.product_price
FROM sales_activity_direct s, product_information p
WHERE s.product_id = p.product_id
  AND TRUNC(sales_date) = TRUNC(sysdate)
GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id,
         p.product_cost, p.product_price;

Example 13-2 Conditional ALL Insert

The following statement inserts a row into the sales and costs tables for all sales transactions with a valid promotion and stores the information about multiple identical orders of a customer in a separate table cum_sales_activity. It is possible two rows will be inserted for some sales transactions, and none for others.

INSERT ALL
  WHEN promotion_id IN (SELECT promo_id FROM promotions) THEN
    INTO sales VALUES (product_id, customer_id, today, 'S', promotion_id,
                       quantity_per_day, amount_per_day)
    INTO costs VALUES (product_id, today, product_cost, product_price)
  WHEN num_of_orders > 1 THEN
    INTO cum_sales_activity VALUES (today, product_id, customer_id,
                                    promotion_id, quantity_per_day,
                                    amount_per_day, num_of_orders)
SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id,
       s.promotion_id, SUM(s.amount) AS amount_per_day,
       SUM(s.quantity) quantity_per_day, COUNT(*) num_of_orders,
       p.product_cost, p.product_price
FROM sales_activity_direct s, product_information p
WHERE s.product_id = p.product_id
  AND TRUNC(sales_date) = TRUNC(sysdate)
GROUP BY TRUNC(sales_date), s.product_id, s.customer_id,
         s.promotion_id, p.product_cost, p.product_price;

Example 13-3 Conditional FIRST Insert

The following statement inserts into an appropriate shipping manifest according to the total quantity and the weight of a product order. An exception is made for high-value orders, which are also sent by express, unless their weight classification is too high. It assumes the existence of appropriate tables large_freight_shipping, express_shipping, and default_shipping.

INSERT FIRST
  WHEN (sum_quantity_sold > 10 AND prod_weight_class < 5) OR
       (sum_quantity_sold > 5 AND prod_weight_class > 5) THEN
    INTO large_freight_shipping VALUES
      (time_id, cust_id, prod_id, prod_weight_class, sum_quantity_sold)
  WHEN sum_amount_sold > 1000 THEN
    INTO express_shipping VALUES
      (time_id, cust_id, prod_id, prod_weight_class,
       sum_amount_sold, sum_quantity_sold)
  ELSE
    INTO default_shipping VALUES
      (time_id, cust_id, prod_id, sum_quantity_sold)
SELECT s.time_id, s.cust_id, s.prod_id, p.prod_weight_class,
       SUM(amount_sold) AS sum_amount_sold,
       SUM(quantity_sold) AS sum_quantity_sold
FROM sales s, products p
WHERE s.prod_id = p.prod_id
  AND s.time_id = TRUNC(sysdate)
GROUP BY s.time_id, s.cust_id, s.prod_id, p.prod_weight_class;

Example 13-4 Mixed Conditional and Unconditional Insert

The following example inserts new customers into the customers table and stores all new customers with cust_credit_limit higher than 4500 in an additional, separate table for further promotions.

INSERT FIRST
  WHEN cust_credit_limit >= 4500 THEN
    INTO customers
    INTO customers_special VALUES (cust_id, cust_credit_limit)
  ELSE
    INTO customers
SELECT * FROM customers_new;

Transformation Using PL/SQL

In a data warehouse environment, you can use procedural languages such as PL/SQL to implement complex transformations in the Oracle9i database. Whereas CTAS operates on entire tables and emphasizes parallelism, PL/SQL provides a row-based approach and can accommodate very sophisticated transformation rules. For example, a PL/SQL procedure could open multiple cursors and read data from multiple source tables, combine this data using complex business rules, and finally insert the transformed data into one or more target tables. It would be difficult or impossible to express the same sequence of operations using standard SQL statements.
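A minimal sketch of such a row-based PL/SQL transformation follows; all table and column names, and the cleansing rules, are illustrative assumptions rather than part of the original text.

-- Sketch only: row-by-row transformation combining two staging tables
CREATE OR REPLACE PROCEDURE load_clean_sales IS
BEGIN
  FOR r IN (SELECT s.prod_id, s.amount, c.region
            FROM   staging_sales s, staging_customers c
            WHERE  s.cust_id = c.cust_id) LOOP
    INSERT INTO sales_clean (prod_id, amount, region)
    VALUES (r.prod_id, NVL(r.amount, 0), UPPER(r.region));
  END LOOP;
  COMMIT;
END;
/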

Using a procedural language, a specific transformation (or number of transformation steps) within a complex ETL processing can be encapsulated, reading data from an intermediate staging area and generating a new table object as output. A previously generated transformation input table and a subsequent transformation will consume the table generated by this specific transformation. Alternatively, these encapsulated transformation steps within the complete ETL process can be integrated seamlessly, thus streaming sets of rows between each other without the necessity of intermediate staging. You can use Oracle9i's table functions to implement such behavior.

    Transformation Using Table Functions

Oracle9i's table functions provide support for pipelined and parallel execution of transformations implemented in PL/SQL, C, or Java. Scenarios as mentioned earlier can be done without requiring the use of intermediate staging tables, which interrupt the data flow through the various transformation steps.

What is a Table Function?

A table function is defined as a function that can produce a set of rows as output. Additionally, table functions can take a set of rows as input. Prior to Oracle9i, PL/SQL functions:

• Could not take cursors as input

• Could not be parallelized or pipelined

Starting with Oracle9i, functions are not limited in these ways. Table functions extend database functionality by allowing:

• Multiple rows to be returned from a function

• Results of SQL subqueries (that select multiple rows) to be passed directly to functions

• Functions to take cursors as input

• Functions to be parallelized

• Result sets to be returned incrementally for further processing as soon as they are created. This is called incremental pipelining.

Table functions can be defined in PL/SQL using a native PL/SQL interface, or in Java or C using the Oracle Data Cartridge Interface (ODCI).

See Also:

PL/SQL User's Guide and Reference for further information and Oracle9i Data Cartridge Developer's Guide

Figure 13-3 illustrates a typical aggregation where you input a set of rows and output a set of rows, in that case, after performing a SUM operation.

Figure 13-3 Table Function Example


    The pseudocode for this operation would be similar to:

INSERT INTO Out SELECT * FROM ("Table Function"(SELECT * FROM In));

The table function takes the result of the SELECT on In as input and delivers a set of records in a different format as output for a direct insertion into Out.

Additionally, a table function can fan out data within the scope of an atomic transaction. This can be used for many occasions like an efficient logging mechanism or a fan out for other independent transformations. In such a scenario, a single staging table will be needed.

Figure 13-4 Pipelined Parallel Transformation with Fanout


    The pseudocode for this would be similar to:

INSERT INTO target SELECT * FROM (tf2(SELECT * FROM (tf1(SELECT * FROM source))));

This will insert into target and, as part of tf1, into Stage Table 1 within the scope of an atomic transaction.

INSERT INTO target SELECT * FROM tf3(SELECT * FROM stage_table1);

Example 13-5 Table Functions Fundamentals

The following examples demonstrate the fundamentals of table functions, without the usage of complex business rules implemented inside those functions. They are chosen for demonstration purposes only, and are all implemented in PL/SQL.

Table functions return sets of records and can take cursors as input. Besides the Sales History schema, you have to set up the following database objects before using the examples:

REM object types
CREATE TYPE product_t AS OBJECT (
  prod_id              NUMBER(6),
  prod_name            VARCHAR2(50),
  prod_desc            VARCHAR2(4000),
  prod_subcategory     VARCHAR2(50),
  prod_subcat_desc     VARCHAR2(2000),
  prod_category        VARCHAR2(50),
  prod_cat_desc        VARCHAR2(2000),
  prod_weight_class    NUMBER(2),
  prod_unit_of_measure VARCHAR2(20),
  prod_pack_size       VARCHAR2(30),
  supplier_id          NUMBER(6),
  prod_status          VARCHAR2(20),
  prod_list_price      NUMBER(8,2),
  prod_min_price       NUMBER(8,2)
);
/
CREATE TYPE product_t_table AS TABLE OF product_t;
/
COMMIT;

REM package of all cursor types
REM we have to handle the input cursor type and the output cursor collection
REM type
CREATE OR REPLACE PACKAGE cursor_PKG AS
  TYPE product_t_rec IS RECORD (
    prod_id              NUMBER(6),
    prod_name            VARCHAR2(50),
    prod_desc            VARCHAR2(4000),
    prod_subcategory     VARCHAR2(50),
    prod_subcat_desc     VARCHAR2(2000),
    prod_category        VARCHAR2(50),
    prod_cat_desc        VARCHAR2(2000),
    prod_weight_class    NUMBER(2),
    prod_unit_of_measure VARCHAR2(20),
    prod_pack_size       VARCHAR2(30),
    supplier_id          NUMBER(6),
    prod_status          VARCHAR2(20),
    prod_list_price      NUMBER(8,2),
    prod_min_price       NUMBER(8,2));
  TYPE product_t_rectab IS TABLE OF product_t_rec;
  TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec;
  TYPE refcur_t IS REF CURSOR;
END;
/

REM artificial help table, used to demonstrate figure 13-4
CREATE TABLE obsolete_products_errors (prod_id NUMBER, msg VARCHAR2(2000));

The following example demonstrates a simple filtering; it shows all obsolete products except the prod_category Boys. The table function returns the result set as a set of records and uses a weakly typed ref cursor as input.

CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t)
RETURN product_t_table
IS
  prod_id NUMBER(6); prod_name VARCHAR2(50); prod_desc VARCHAR2(4000);
  prod_subcategory VARCHAR2(50); prod_subcat_desc VARCHAR2(2000);
  prod_category VARCHAR2(50); prod_cat_desc VARCHAR2(2000);
  prod_weight_class NUMBER(2); prod_unit_of_measure VARCHAR2(20);
  prod_pack_size VARCHAR2(30); supplier_id NUMBER(6); prod_status VARCHAR2(20);
  prod_list_price NUMBER(8,2); prod_min_price NUMBER(8,2);
  sales NUMBER := 0;
  objset product_t_table := product_t_table();
  i NUMBER := 0;
BEGIN
  LOOP
    -- Fetch from cursor variable
    FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
      prod_category, prod_cat_desc, prod_weight_class, prod_unit_of_measure,
      prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price;
    EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
    IF prod_status = 'obsolete' AND prod_category != 'Boys' THEN
      -- append to collection
      i := i + 1;
      objset.extend;
      objset(i) := product_t(prod_id, prod_name, prod_desc, prod_subcategory,
        prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
        prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
        prod_list_price, prod_min_price);
    END IF;
  END LOOP;
  CLOSE cur;
  RETURN objset;
END;
/

You can use the table function in a SQL statement to show the results. Here we use additional SQL functionality for the output.

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products)));

UPPER(PROD_CATEGORY) PROD_STATUS
-------------------- -----------
GIRLS                obsolete
MEN                  obsolete

2 rows selected.

The following example implements the same filtering as the first one. The main differences between those two are:

• This example uses a strongly typed REF cursor as input and can be parallelized based on the objects of the strongly typed cursor, as shown in one of the following examples.

• The table function returns the result set incrementally as soon as records are created.

REM Same example, pipelined implementation
REM strong ref cursor (input type is defined)
REM a table function without a strong typed input ref cursor cannot be parallelized
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
  prod_id NUMBER(6); prod_name VARCHAR2(50); prod_desc VARCHAR2(4000);
  prod_subcategory VARCHAR2(50); prod_subcat_desc VARCHAR2(2000);
  prod_category VARCHAR2(50); prod_cat_desc VARCHAR2(2000);
  prod_weight_class NUMBER(2); prod_unit_of_measure VARCHAR2(20);
  prod_pack_size VARCHAR2(30); supplier_id NUMBER(6); prod_status VARCHAR2(20);
  prod_list_price NUMBER(8,2); prod_min_price NUMBER(8,2);
  sales NUMBER := 0;
BEGIN
  LOOP
    -- Fetch from cursor variable
    FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
      prod_category, prod_cat_desc, prod_weight_class, prod_unit_of_measure,
      prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price;
    EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
    IF prod_status = 'obsolete' AND prod_category != 'Boys' THEN
      PIPE ROW (product_t(prod_id, prod_name, prod_desc, prod_subcategory,
        prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
        prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
        prod_list_price, prod_min_price));
    END IF;
  END LOOP;
  CLOSE cur;
  RETURN;
END;
/

You can use the table function as follows:

SELECT DISTINCT prod_category,
  DECODE(prod_status, 'obsolete', 'NO LONGER AVAILABLE', 'N/A')
FROM TABLE(obsolete_products_pipe(CURSOR(SELECT * FROM products)));

PROD_CATEGORY DECODE(PROD_STATUS,
------------- -------------------
Girls         NO LONGER AVAILABLE
Men           NO LONGER AVAILABLE

2 rows selected.

    We now change the degree of parallelism for the input table products and issue the

    same statement again:

ALTER TABLE products PARALLEL 4;


The session statistics show that the statement has been parallelized:

SELECT * FROM V$PQ_SESSTAT WHERE statistic = 'Queries Parallelized';

STATISTIC            LAST_QUERY SESSION_TOTAL
-------------------- ---------- -------------
Queries Parallelized          1             3

1 row selected.

Table functions are also capable of fanning out results into persistent table structures. This is demonstrated in the next example. The function returns all obsolete products except those of a specific prod_category (default Men), which was set to status obsolete by error. The detected wrong prod_ids are stored in a separate table structure. Its result set consists of all other obsolete product categories. It furthermore demonstrates how normal variables can be used in conjunction with table functions:

CREATE OR REPLACE FUNCTION obsolete_products_dml(cur cursor_pkg.strong_refcur_t,
  prod_cat VARCHAR2 DEFAULT 'Men') RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  prod_id NUMBER(6); prod_name VARCHAR2(50); prod_desc VARCHAR2(4000);
  prod_subcategory VARCHAR2(50); prod_subcat_desc VARCHAR2(2000);
  prod_category VARCHAR2(50); prod_cat_desc VARCHAR2(2000);
  prod_weight_class NUMBER(2); prod_unit_of_measure VARCHAR2(20);
  prod_pack_size VARCHAR2(30); supplier_id NUMBER(6); prod_status VARCHAR2(20);
  prod_list_price NUMBER(8,2); prod_min_price NUMBER(8,2);
  sales NUMBER := 0;
BEGIN
  LOOP
    -- Fetch from cursor variable
    FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
      prod_category, prod_cat_desc, prod_weight_class, prod_unit_of_measure,
      prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price;
    EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
    IF prod_status = 'obsolete' THEN
      IF prod_category = prod_cat THEN
        INSERT INTO obsolete_products_errors VALUES
          (prod_id, 'correction: category ' || UPPER(prod_cat) || ' still available');
      ELSE
        PIPE ROW (product_t(prod_id, prod_name, prod_desc, prod_subcategory,
          prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
          prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
          prod_list_price, prod_min_price));
      END IF;
    END IF;
  END LOOP;
  COMMIT; CLOSE cur; RETURN;
END;
/

The following query shows all obsolete product groups except the prod_category Men, which was wrongly set to status obsolete.

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT * FROM products)));

PROD_CATEGORY PROD_STATUS
------------- -----------
Boys          obsolete
Girls         obsolete

2 rows selected.

As you can see, there are some products of the prod_category Men that were obsoleted by accident:

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
----------------------------------------
correction: category MEN still available

1 row selected.

Taking advantage of the second input variable changes the result set as follows:

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT * FROM products), 'Boys'));

PROD_CATEGORY PROD_STATUS
------------- -----------
Girls         obsolete
Men           obsolete

2 rows selected.

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
-----------------------------------------
correction: category BOYS still available

1 row selected.

Because table functions can be used like a normal table, they can be nested, as shown in the following:

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
  FROM TABLE(obsolete_products_pipe(CURSOR(SELECT * FROM products))))));

PROD_CATEGORY PROD_STATUS
------------- -----------
Girls         obsolete

1 row selected.

Because the table function obsolete_products_pipe filters out all products of the prod_category Boys, our result no longer includes products of the prod_category Boys. The prod_category Men is still set to obsolete by accident.

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
----------------------------------------
correction: category MEN still available

The biggest advantage of Oracle9i ETL is its toolkit functionality, where you can combine any of the previously discussed functionality to improve and speed up your ETL processing. For example, you can take an external table as input, join it with an existing table and use it as input for a parallelized table function to process complex business logic. This table function can be used as an input source for a MERGE operation, thus streaming the new information for the data warehouse, provided in a flat file, through the complete ETL process within one single statement.
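A rough sketch of such a combined statement might look as follows. The external table sales_delta_ext, the table function sales_transform, and the target table sales_fact (with their columns) are hypothetical names used only for illustration; they are not objects defined in this chapter:

MERGE INTO sales_fact sf
USING (SELECT *
       FROM TABLE(sales_transform(CURSOR(
              SELECT d.*, p.prod_id
              FROM sales_delta_ext d, products p  -- external flat-file data joined to an existing table
              WHERE d.upc_code = p.upc_code)))) delta
ON (sf.prod_id = delta.prod_id AND sf.time_id = delta.time_id)
WHEN MATCHED THEN UPDATE SET
  sf.quantity_sold = sf.quantity_sold + delta.quantity_sold,
  sf.amount_sold   = sf.amount_sold   + delta.amount_sold
WHEN NOT MATCHED THEN INSERT (prod_id, time_id, quantity_sold, amount_sold)
  VALUES (delta.prod_id, delta.time_id, delta.quantity_sold, delta.amount_sold);

In this sketch the flat file is read through the external table, the lookup join and the business logic run inside the parallelized table function, and MERGE applies the result, all in a single SQL statement.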

Loading and Transformation Scenarios

The following sections offer examples of typical loading and transformation tasks:

• Parallel Load Scenario

• Key Lookup Scenario


• Exception Handling Scenario

• Pivoting Scenarios

Parallel Load Scenario

This section presents a case study illustrating how to create, load, index, and analyze a large data warehouse fact table with partitions in a typical star schema. This example uses SQL*Loader to explicitly stripe data over 30 disks.

• The example 120 GB table is named facts.

• The system is a 10-CPU shared memory computer with more than 100 disk drives.

• Thirty disks (4 GB each) are used for base table data, 10 disks for indexes, and 30 disks for temporary space. Additional disks are needed for rollback segments, control files, log files, possible staging area for loader flat files, and so on.

• The facts table is partitioned by month into 12 partitions. To facilitate backup and recovery, each partition is stored in its own tablespace.

• Each partition is spread evenly over 10 disks, so a scan accessing few partitions or a single partition can proceed with full parallelism. Thus there can be intra-partition parallelism when queries restrict data access by partition pruning.

• Each disk has been further subdivided using an operating system utility into 4 operating system files with names like /dev/D1.1, /dev/D1.2, ..., /dev/D30.4.

• Four tablespaces are allocated on each group of 10 disks. To better balance I/O and parallelize tablespace creation (because Oracle writes each block in a datafile when it is added to a tablespace), it is best if each of the four tablespaces on each group of 10 disks has its first datafile on a different disk. Thus the first tablespace has /dev/D1.1 as its first datafile, the second tablespace has /dev/D4.2 as its first datafile, and so on, as illustrated in Figure 13-5.

Figure 13-5 Datafile Layout for Parallel Load Example


Step 1: Create the Tablespaces and Add Datafiles in Parallel

The following is the command to create a tablespace named TSfacts1. Other tablespaces are created with analogous commands. On a 10-CPU machine, it should be possible to run all 12 CREATE TABLESPACE statements together. Alternatively, it might be better to run them in two batches of 6 (two from each of the three groups of disks).

CREATE TABLESPACE TSfacts1
DATAFILE /dev/D1.1 SIZE 1024MB REUSE, DATAFILE /dev/D2.1 SIZE 1024MB REUSE,
DATAFILE /dev/D3.1 SIZE 1024MB REUSE, DATAFILE /dev/D4.1 SIZE 1024MB REUSE,
DATAFILE /dev/D5.1 SIZE 1024MB REUSE, DATAFILE /dev/D6.1 SIZE 1024MB REUSE,
DATAFILE /dev/D7.1 SIZE 1024MB REUSE, DATAFILE /dev/D8.1 SIZE 1024MB REUSE,
DATAFILE /dev/D9.1 SIZE 1024MB REUSE, DATAFILE /dev/D10.1 SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts2
DATAFILE /dev/D4.2 SIZE 1024MB REUSE, DATAFILE /dev/D5.2 SIZE 1024MB REUSE,
DATAFILE /dev/D6.2 SIZE 1024MB REUSE, DATAFILE /dev/D7.2 SIZE 1024MB REUSE,
DATAFILE /dev/D8.2 SIZE 1024MB REUSE, DATAFILE /dev/D9.2 SIZE 1024MB REUSE,
DATAFILE /dev/D10.2 SIZE 1024MB REUSE, DATAFILE /dev/D1.2 SIZE 1024MB REUSE,
DATAFILE /dev/D2.2 SIZE 1024MB REUSE, DATAFILE /dev/D3.2 SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts4
DATAFILE /dev/D10.4 SIZE 1024MB REUSE, DATAFILE /dev/D1.4 SIZE 1024MB REUSE,
DATAFILE /dev/D2.4 SIZE 1024MB REUSE, DATAFILE /dev/D3.4 SIZE 1024MB REUSE,
DATAFILE /dev/D4.4 SIZE 1024MB REUSE, DATAFILE /dev/D5.4 SIZE 1024MB REUSE,
DATAFILE /dev/D6.4 SIZE 1024MB REUSE, DATAFILE /dev/D7.4 SIZE 1024MB REUSE,
DATAFILE /dev/D8.4 SIZE 1024MB REUSE, DATAFILE /dev/D9.4 SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts12
DATAFILE /dev/D30.12 SIZE 1024MB REUSE, DATAFILE /dev/D21.12 SIZE 1024MB REUSE,
DATAFILE /dev/D22.12 SIZE 1024MB REUSE, DATAFILE /dev/D23.12 SIZE 1024MB REUSE,
DATAFILE /dev/D24.12 SIZE 1024MB REUSE, DATAFILE /dev/D25.12 SIZE 1024MB REUSE,
DATAFILE /dev/D26.12 SIZE 1024MB REUSE, DATAFILE /dev/D27.12 SIZE 1024MB REUSE,
DATAFILE /dev/D28.12 SIZE 1024MB REUSE, DATAFILE /dev/D29.12 SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);

Extent sizes in the STORAGE clause should be multiples of the multiblock read size, where multiblock read size = block size * MULTIBLOCK_READ_COUNT.

INITIAL and NEXT should normally be set to the same value. In the case of parallel load, make the extent size large enough to keep the number of extents reasonable, and to avoid excessive overhead and serialization due to bottlenecks in the data dictionary.
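As a worked example (the block size and multiblock read count here are illustrative assumptions, not values taken from this case study), an 8 KB block size with a multiblock read count of 16 gives a 128 KB multiblock read size, and the 100 MB extents used above are an exact multiple of it:

DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0)
-- assumed: 8 KB block size * 16 multiblock read count = 128 KB multiblock read size
-- 100 MB = 102400 KB = 800 * 128 KB, so every extent is a whole number of multiblock reads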

When PARALLEL=TRUE is used for parallel loader, the INITIAL extent is not used. In this case you can override the INITIAL extent size specified in the tablespace default storage clause with the value specified in the loader control file, for example, 64 KB.

Tables or indexes can have an unlimited number of extents, provided you have set the COMPATIBLE initialization parameter to match the current release number, and use the MAXEXTENTS keyword on the CREATE or ALTER statement for the tablespace or object. In practice, however, a limit of 10,000 extents for each object is reasonable. A table or index has an unlimited number of extents, so set the PERCENT_INCREASE parameter to zero to have extents of equal size.


Note:

If possible, do not allocate extents faster than about 2 or 3 for each minute. Thus, each process should get an extent that lasts for 3 to 5 minutes. Normally, such an extent is at least 50 MB for a large object. Too small an extent size incurs significant overhead, which affects performance and scalability of parallel operations. The largest possible extent size for a 4 GB disk evenly divided into 4 partitions is 1 GB. 100 MB extents should perform well. Each partition will have 100 extents. You can then customize the default storage parameters for each object created in the tablespace, if needed.

Step 2: Create the Partitioned Table

We create a partitioned table with 12 partitions, each in its own tablespace. The table contains multiple dimensions and multiple measures. The partitioning column is named dim_2 and is a date. There are other columns as well.

CREATE TABLE facts (dim_1 NUMBER, dim_2 DATE, ...
  meas_1 NUMBER, meas_2 NUMBER, ... )
PARALLEL
PARTITION BY RANGE (dim_2)
(PARTITION jan95 VALUES LESS THAN ('02-01-1995') TABLESPACE TSfacts1,
 PARTITION feb95 VALUES LESS THAN ('03-01-1995') TABLESPACE TSfacts2,
 ...
 PARTITION dec95 VALUES LESS THAN ('01-01-1996') TABLESPACE TSfacts12);

Step 3: Load the Partitions in Parallel

This section describes four alternative approaches to loading partitions in parallel. The different approaches to loading help you manage the ramifications of the PARALLEL=TRUE keyword of SQL*Loader that controls whether individual partitions are loaded in parallel. The PARALLEL keyword entails the following restrictions:

• Indexes cannot be defined.

• You must set a small initial extent, because each loader session gets a new extent when it begins, and it does not use any existing space associated with the object.

• Space fragmentation issues arise.


However, regardless of the setting of this keyword, if you have one loader process for each partition, you are still effectively loading into the table in parallel.

Example 13-6 Loading Partitions in Parallel Case 1

In this approach, assume 12 input files are partitioned in the same way as your table. You have one input file for each partition of the table to be loaded. You start 12 SQL*Loader sessions concurrently in parallel, entering statements like these:

SQLLDR DATA=jan95.dat DIRECT=TRUE CONTROL=jan95.ctl
SQLLDR DATA=feb95.dat DIRECT=TRUE CONTROL=feb95.ctl
. . .
SQLLDR DATA=dec95.dat DIRECT=TRUE CONTROL=dec95.ctl

In the example, the keyword PARALLEL=TRUE is not set. A separate control file for each partition is necessary because the control file must specify the partition into which the loading should be done. It contains a statement such as the following:

LOAD INTO facts PARTITION (jan95)

The advantage of this approach is that local indexes are maintained by SQL*Loader. You still get parallel loading, but on a partition level, without the restrictions of the PARALLEL keyword.

A disadvantage is that you must manually partition the input prior to loading.
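For illustration only, a minimal control file for the jan95 partition might look like the following sketch; the field list and the comma delimiter are assumptions, not part of the case study, and each of the 12 control files would differ only in the partition it targets:

-- jan95.ctl (illustrative sketch)
LOAD DATA
INFILE 'jan95.dat'
APPEND
INTO TABLE facts PARTITION (jan95)
FIELDS TERMINATED BY ','
(dim_1, dim_2 DATE 'MM-DD-YYYY', meas_1, meas_2)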

Example 13-7 Loading Partitions in Parallel Case 2

In another common approach, assume an arbitrary number of input files that are not partitioned in the same way as the table. You can adopt a strategy of performing parallel load for each input file individually. Thus if there are seven input files, you can start seven SQL*Loader sessions, using statements like the following:

SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE

Oracle partitions the input data so that it goes into the correct partitions. In this case all the loader sessions can share the same control file, so there is no need to mention it in the statement.

The keyword PARALLEL=TRUE must be used, because each of the seven loader sessions can write into every partition. In Case 1, every loader session would write into only one partition, because the data was partitioned prior to loading. Hence all the PARALLEL keyword restrictions are in effect.


In this case, Oracle attempts to spread the data evenly across all the files in each of the 12 tablespaces; however, an even spread of data is not guaranteed. Moreover, there could be I/O contention during the load when the loader processes are attempting to write to the same device simultaneously.

Example 13-8 Loading Partitions in Parallel Case 3

In this example, you want precise control over the load. To achieve this, you must partition the input data in the same way as the datafiles are partitioned in Oracle. This example uses 10 processes loading into 30 disks. To accomplish this, you must split the input into 120 files beforehand. The 10 processes will load the first partition in parallel on the first 10 disks, then the second partition in parallel on the second 10 disks, and so on through the 12th partition. You then run the following commands concurrently as background processes:

SQLLDR DATA=jan95.file1.dat  DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1.1
. . .
SQLLDR DATA=jan95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D10.1
WAIT;
. . .
SQLLDR DATA=dec95.file1.dat  DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30.12
. . .
SQLLDR DATA=dec95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D29.12

For Oracle Real Application Clusters, divide the loader sessions evenly among the nodes. The datafile being read should always reside on the same node as the loader session.

The keyword PARALLEL=TRUE must be used, because multiple loader sessions can write into the same partition. Hence all the restrictions entailed by the PARALLEL keyword are in effect. An advantage of this approach, however, is that it guarantees that all of the data is precisely balanced, exactly reflecting your partitioning.

Note:

Although this example shows parallel load used with partitioned tables, the two features can be used independently of one another.

Example 13-9 Loading Partitions in Parallel Case 4


For this approach, all partitions must be in the same tablespace. You need to have the same number of input files as datafiles in the tablespace, but you do not need to partition the input the same way in which the table is partitioned.

For example, if all 30 devices were in the same tablespace, then you would arbitrarily partition your input data into 30 files, then start 30 SQL*Loader sessions in parallel. The statement starting up the first session would be similar to the following:

SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1
. . .
SQLLDR DATA=file30.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30

The advantage of this approach is that, as in Case 3, you have control over the exact placement of datafiles because you use the FILE keyword. However, you are not required to partition the input data by value because Oracle does that for you.

A disadvantage is that this approach requires all the partitions to be in the same tablespace. This minimizes availability.

Example 13-10 Loading External Data

This is probably the most basic use of external tables, where the data volume is large and no transformations are applied to the external data. The load process is performed as follows:

1. You create the external table. Most likely, the table will be declared as parallel to perform the load in parallel. Oracle will dynamically perform load balancing between the parallel execution servers involved in the query.

2. Once the external table is created (remember that this only creates the metadata in the dictionary), data can be converted, moved and loaded into the database using either a PARALLEL CREATE TABLE AS SELECT or a PARALLEL INSERT statement.

CREATE TABLE products_ext3
  (prod_id NUMBER, prod_name VARCHAR2(50), ..., price NUMBER(6,2), discount NUMBER(6,2))
ORGANIZATION EXTERNAL
  (DEFAULT DIRECTORY stage_dir
   ACCESS PARAMETERS
   (RECORDS FIXED 30
    BADFILE 'bad/bad_products_ext' LOGFILE 'log/log_products_ext'
    FIELDS (prod_id   POSITION (1:8)    CHAR,
            prod_name POSITION (*,+50)  CHAR,
            prod_desc POSITION (*,+200) CHAR,
            . . . ))
   LOCATION ('new/new_prod1.txt', 'new/new_prod2.txt'))
PARALLEL 5
REJECT LIMIT 200;

-- load it in the database using a parallel insert
ALTER SESSION ENABLE PARALLEL DML;
INSERT INTO products SELECT * FROM products_ext3;

In this example, stage_dir is a directory where the external flat files reside.

Note that loading data in parallel can also be performed in Oracle9i by using SQL*Loader. But external tables are probably easier to use and the parallel load is automatically coordinated. Unlike SQL*Loader, dynamic load balancing between parallel execution servers will be performed as well, because there will be intrafile parallelism. The latter implies that the user will not have to manually split input files before starting the parallel load; this will be accomplished dynamically.
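The same load could also be expressed with the CTAS variant mentioned in step 2; the target table name products_copy is only an illustrative assumption:

CREATE TABLE products_copy
  PARALLEL NOLOGGING
  AS SELECT * FROM products_ext3;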

Key Lookup Scenario

Another simple transformation is a key lookup. For example, suppose that sales transaction data has been loaded into a retail data warehouse. Although the data warehouse's sales table contains a product_id column, the sales transaction data extracted from the source system contains Uniform Price Codes (UPC) instead of product IDs. Therefore, it is necessary to transform the UPC codes into product IDs before the new sales transaction data can be inserted into the sales table.

In order to execute this transformation, a lookup table must relate the product_id values to the UPC codes. This table might be the product dimension table, or perhaps another table in the data warehouse that has been created specifically to support this transformation. For this example, we assume that there is a table named product, which has a product_id and an upc_code column.

This data substitution transformation can be implemented using the following CTAS statement:

CREATE TABLE temp_sales_step2 NOLOGGING PARALLEL AS
SELECT
  sales_transaction_id,
  product.product_id sales_product_id,
  sales_customer_id,
  sales_time_id,
  sales_channel_id,
  sales_quantity_sold,
  sales_dollar_amount
FROM temp_sales_step1, product
WHERE temp_sales_step1.upc_code = product.upc_code;

This CTAS statement will convert each valid UPC code to a valid product_id value. If the ETL process has guaranteed that each UPC code is valid, then this statement alone may be sufficient to implement the entire transformation.

Exception Handling Scenario

In the preceding example, if you must also handle new sales data that does not have valid UPC codes, you can use an additional CTAS statement to identify the invalid rows:

CREATE TABLE temp_sales_step1_invalid NOLOGGING PARALLEL AS
  SELECT * FROM temp_sales_step1
  WHERE temp_sales_step1.upc_code NOT IN (SELECT upc_code FROM product);

This invalid data is now stored in a separate table, temp_sales_step1_invalid, and can be handled separately by the ETL process.

Another way to handle invalid data is to modify the original CTAS to use an outer join:

CREATE TABLE temp_sales_step2 NOLOGGING PARALLEL AS
SELECT
  sales_transaction_id,
  product.product_id sales_product_id,
  sales_customer_id,
  sales_time_id,
  sales_channel_id,
  sales_quantity_sold,
  sales_dollar_amount
FROM temp_sales_step1, product
WHERE temp_sales_step1.upc_code = product.upc_code (+);

Using this outer join, the sales transactions that originally contained invalid UPC codes will be assigned a product_id of NULL. These transactions can be handled later.

Additional approaches to handling invalid UPC codes exist. Some data warehouses may choose to insert null-valued product_id values into their sales table, while other data warehouses may not allow any new data from the entire batch to be inserted into the sales table until all invalid UPC codes have been addressed. The correct approach is determined by the business requirements of the data warehouse. Regardless of the


specific requirements, exception handling can be addressed by the same basic SQL techniques as transformations.
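As one sketch of these alternatives, a conditional multitable INSERT can split the outer-join result from temp_sales_step2 into the sales table and an error table in a single pass. The error table sales_upc_errors and the target column names are assumptions made only for this illustration:

INSERT ALL
  WHEN sales_product_id IS NOT NULL THEN
    INTO sales (prod_id, cust_id, time_id, channel_id, quantity_sold, amount_sold)
    VALUES (sales_product_id, sales_customer_id, sales_time_id, sales_channel_id,
            sales_quantity_sold, sales_dollar_amount)
  WHEN sales_product_id IS NULL THEN
    -- rows with unresolved UPC codes go to a separate error table
    INTO sales_upc_errors (transaction_id, customer_id, time_id)
    VALUES (sales_transaction_id, sales_customer_id, sales_time_id)
SELECT * FROM temp_sales_step2;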

Pivoting Scenarios

A data warehouse can receive data from many different sources. Some of these source systems may not be relational databases and may store data in very different formats from the data warehouse. For example, suppose that you receive a set of sales records from a nonrelational database having the form:

product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat
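Assuming these records are first loaded into a staging table called sales_input_table (a name used here only for illustration), a multitable INSERT can pivot each weekly record into one row per day; the simplified target column list is likewise an assumption of this sketch:

INSERT ALL
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date,     sales_sun)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 1, sales_mon)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 2, sales_tue)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 3, sales_wed)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 4, sales_thu)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 5, sales_fri)
  INTO sales (prod_id, cust_id, time_id, amount_sold)
    VALUES (product_id, customer_id, weekly_start_date + 6, sales_sat)
SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue,
       sales_wed, sales_thu, sales_fri, sales_sat
FROM sales_input_table;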