26
6 Extraction, Transformation, and Loading (ETL) Transformation

6 Extraction, Transformation, and Loading (ETL) Transformation

Embed Size (px)

Citation preview

Page 1: 6 Extraction, Transformation, and Loading (ETL) Transformation

6Extraction, Transformation,and Loading (ETL)

Transformation

Page 2: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-2

Objectives

After completing this lesson, you should be able to do the following:

• Explain transformation using SQL– CREATE TABLE AS SELECT (CTAS)– UPDATE– MERGE– Multitable INSERT

• Describe transformation using table functions and PL/SQL

• Implement DML Error Logging

Page 3: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-3

Data Transformation

• Data transformations are often the most complex and costly part of the ETL process.

• Transformations can range from simple data conversion to complex scrubbing techniques.

• Many, if not all, transformations can occur within the Oracle database.

• External transformations outside the database are supported (flat files, for instance).

• Data can be transformed in two ways:– Multistage data transformation – Pipelined data transformation

Page 4: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-4

Multistage Data Transformation

Load into staging table.

NEW_SALES_STEP1

NEW_SALES_STEP2 Convert source

product keys

to warehouse

product keys.

SALES

Flat files

Validate customer

keys (look up in

customer

dimension table).

NEW_SALES_STEP3Insert into SALES

warehouse table.

Page 5: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-5

Pipelined Data Transformation

SALES

Flat files

Validate customer keys (look up in

customer dimension table).

External table

Convert source

product keys

to warehouse

product keys.

Insert into SALES

warehouse table.

Page 6: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-6

Transformation Mechanisms

You have the following choices for transforming data inside the database:

• SQL

• PL/SQL

• Table functions

Page 7: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-7

Transformation Using SQL

After data is loaded into the database, transformations can be executed using SQL operations:

• CREATE TABLE ... AS SELECT

• INSERT /*+APPEND*/ AS SELECT• UPDATE• MERGE• Multitable INSERT

Page 8: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-8

CREATE TABLE … AS SELECTand UPDATE

DESC sales_activity_directName Null? Type------------ ----- ---------SALES_DATE DATEPRODUCT_ID NUMBERCUSTOMER_ID NUMBERPROMOTION_ID NUMBERAMOUNT NUMBERQUANTITY NUMBER

INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */INTO sales asSELECT product_id, customer_id, TRUNC(sales_date), 3,promotion_id, quantity, amountFROM sales_activity_direct;

Page 9: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-10

MERGE Statement: Overview

• MERGE statements provide the ability to conditionally UPDATE/INSERT into the database.

• The DELETE clause for UPDATE branch is provided for implicit data maintenance.

• Conditions are specified in the ON clause.

• MERGE statements do an UPDATE if the row exists and an INSERT if it is a new row.

• Using MERGE statements avoids multiple updates.– MERGE statements use a single SQL

statement to complete an UPDATE or INSERT or both.

Page 10: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-11

Data Warehousing MERGE: Example

MERGE INTO customer C

USING cust_src S

ON (c.customer_id = s.src_customer_id)

WHEN MATCHED THEN

UPDATE SET c.cust_address = s.cust_address

WHEN NOT MATCHED THEN

INSERT ( Customer_id, cust_first_name,…) VALUES (src_customer_id, src_first_name,…);

Page 11: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-12

Data Maintenance with MERGE/DELETE

MERGE USING product_changes SINTO products D ON (d.prod_id = s.prod_id)WHEN MATCHED THENUPDATE SET d.prod_list_price = s.prod_new_price, d.prod_status = s.prod_newstatusDELETE WHERE (d.prod_status = "OBSOLETE")WHEN NOT MATCHED THENINSERT (prod_id, prod_list_price, prod_status)VALUES (s.prod_id, s.prod_new_price, s.prod_new_status);

Page 12: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-13

Overview of Multitable INSERT Statements

• Allows the INSERT … AS SELECT statement to insert rows into multiple tables as part of a single DML statement

• Can be used in data warehousing systems to transfer data from one or more operational sources to a set of target tables

• Types of multitable INSERT statements:– Unconditional INSERT– Pivoting INSERT– Conditional ALL INSERT– Conditional FIRST INSERT

Page 13: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-14

Example of Unconditional INSERT

INSERT ALLINTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day)INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price)SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_priceFROM sales_activity_direct s, products pWHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE)GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;

Page 14: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-15

Example of Conditional ALL INSERT

INSERT ALL

WHEN product_id IN (SELECT product_id

FROM promotional_items) THEN

INTO promotional_sales VALUES(product_id,list_price)

WHEN order_mode = 'online' THEN

INTO web_orders VALUES(product_id, order_total)

SELECT product_id, list_price, order_total, order_mode

FROM orders;

Page 15: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-16

Example of Pivoting INSERT

INSERT ALL

INTO sales (product_id, customer_id, week, amount)

VALUES (product_id, customer_id, weekly_start_date, sales_sun)

INTO sales (product_id, customer_id, week, amount)

VALUES (product_id, customer_id, weekly_start_date+1, sales_mon)

INTO sales (product_id, customer_id, week, amount)

VALUES (product_id, customer_id, weekly_start_date+2, sales_tue)

INTO sales (product_id, customer_id, week, amount)

VALUES (product_id, customer_id, weekly_start_date+3, sales_wed)

...

SELECT product_id, customer_id, weekly_start_date, sales_sun,

sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat

FROM sales_summary;

Page 16: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-17

Example of Conditional FIRST INSERT

INSERT FIRST

WHEN order_total > 10000 THEN

INTO priority_handling VALUES (id)

WHEN order_total > 5000 THEN

INTO special_handling VALUES (id)

WHEN total > 3000 THEN

INTO privilege_handling VALUES (id)

ELSE

INTO regular_handling VALUES (id)

SELECT order_total , order_id id

FROM orders ;

Page 17: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-18

Overview of Table Functions

• A table function is a function that can produce a set of rows as output.– Input can be a set of rows.

• Table functions support pipelined and parallel execution of transformations using:– PL/SQL– C programming language– Java

• Table functions are used in the FROM clause of a SELECT statement.

Page 18: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-19

Creating Table Functions

CREATE OR REPLACE FUNCTION transform(p ref_cur_type)

RETURN table_order_items_type

PIPELINED

PARALLEL_ENABLE( PARTITION p BY ANY) IS

BEGIN

FOR rec IN p LOOP

… -- Transform this record

PIPE ROW (rec);

END LOOP;

RETURN;

END;

Page 19: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-21

Using Table Functions

SELECT *

FROM TABLE(transform(cursor(SELECT *

FROM

order_items_ext)));

INSERT /*+ APPEND, PARALLEL(order_items) */

INTO order_items

SELECT *

FROM TABLE(transform(cursor(SELECT *

FROM

order_items_ext)));

Page 20: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-22

DML Error Logging: Overview

• Aborting long-running bulk DML operations is wasteful of time and system resources.

• DML Error Logging allows bulk DML operations to continue processing after a DML error occurs.– Errors are logged to an error-logging table.

• DML Error Logging combines the power and speed of bulk processing with the functionality of row-level error handling.

Page 21: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-23

DML Error Logging Concepts

• An error-logging table is created and associated with an existing table by using the DBMS_ERRLOG package.

• The LOG ERRORS INTO... clause is added to the bulk DML load statement providing:– Error-logging table name– Statement tag– Reject limit

INSERT INTO bonusesSELECT employee_id, salary*1.1 FROM employeesWHERE commission_pct > .2LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10;

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');

Page 22: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-25

Error-Logging Table

• The error-logging table logically consists of two parts:– Fixed: Columns that describe the error– Variable: Columns that contain data values

corresponding to the row in error

• Column names for the variable part determine the DML table columns logged when an error occurs.

• The columns of the DML table to be logged is the intersection of the column names of the DML table and the error-logging table.

Page 23: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-27

Error-Logging Table Format

Fixed Portion: Error information columns

User-supplied tag value via DML statement

varchar2(4000) ORA_ERR_TAG$

Type of operation: (I)nsert, (U)pdate, (D)elete

varchar2(2) ORA_ERR_OPTYP$

Row ID of row in error (UPDATE/DELETE)

row ID ORA_ERR_ROWID$

Oracle error message text varchar2(4000) ORA_ERR_MESG$

Oracle error number varchar2(10) ORA_ERR_NUMBER$

Information Data type Column name

Page 24: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-28

Inserting into a Table with Error Logging

Example:

CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER CHECK(sal > 8000));

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');

INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct > .2 LOG ERRORS INTO errlog (’my_bad’) REJECT LIMIT 10;

1

2

3

Page 25: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-29

Summary

In this lesson, you should have learned how to:

• Explain transformation using SQL– CREATE TABLE AS SELECT (CTAS)– UPDATE– MERGE– Multitable INSERT

• Describe transformation using table functions

• Implement DML Error Logging

Page 26: 6 Extraction, Transformation, and Loading (ETL) Transformation

Copyright © 2005, Oracle. All rights reserved.6-30

Practice 6: Overview

This practice covers the following topics:

• Loading data using multitable inserts

• Loading data using multitable conditional insert

• Loading data using MERGE• Logging DML load errors