Informatica confidential. For discussion purposes only. Pushdown Optimization Jason Hamby

PowerCenter Pushdown


Page 1: PowerCenter Pushdown

Pushdown Optimization

Jason Hamby

Page 2: PowerCenter Pushdown

Agenda

• Pushdown Optimization Overview and Benefits

• How it works

• How to Configure Pushdown Optimization

• What Is and What Is Not Supported
  – What can and cannot be pushed down
  – Limitations: details of the rules
  – When Pushdown Optimization is appropriate

• Demo

Page 3: PowerCenter Pushdown

Overview

Page 4: PowerCenter Pushdown

Pushdown Optimization Overview

Push transformation processing to data sources

Benefits
- Reduce data moved when source and target are the same
- Utilize database-specific processing that may be more optimal
- Maintain metadata and lineage in PowerCenter

Page 5: PowerCenter Pushdown

Customer Scenario

Batch transformation and load -- staging and target tables in the same target database

Transformation and load from real-time status table to data warehouse in the same database

[Diagram: Data Sources → Step 1 → Staging → Step 2 → Warehouse; Staging and Warehouse are both in the Target Database]

Page 6: PowerCenter Pushdown

Solution Overview

Pushdown optimization is an option that the user selects

The SQL to be processed in the database is generated automatically

A session may be partially or completely pushed down

[Diagram: DI Server (Metadata Repository, Optimizer) sends SQL to the Target Database; Data Sources → Step 1 → Staging → Step 2 → Warehouse]

Page 7: PowerCenter Pushdown

How Does It Work

Page 8: PowerCenter Pushdown

How It Works

• Available as a session property

• Pushdown Optimization options
  – Partial pushdown optimization to source
  – Partial pushdown optimization to target
  – Full pushdown optimization

• Integration Service analyzes the mapping and generates one or more SQL statements based on the mapping transformation logic

• Integration Service executes SQL against the database instead of processing the transformation logic itself

Page 9: PowerCenter Pushdown

How It Works (cont’d)

• Integration Service analyzes the mapping and session to determine the transformation logic it can push to the database

• Integration Service processes transformation logic that it cannot push down to the database

• Generated SQL is not saved in the repository

• Results are displayed in the session Mapping tab (in Workflow Manager)
  – Transformations that can and cannot be pushed down
  – Generated SQL
  – Reason why certain transformations cannot be pushed down

Page 10: PowerCenter Pushdown

Configuration (from Workflow Mgr)

Page 11: PowerCenter Pushdown

Viewing the Result

Page 12: PowerCenter Pushdown

Preview from Session—Mapping Tab

Transformations Pushed to Source or Target Database

Generated SQL Statement

Page 13: PowerCenter Pushdown

What Is and What Is Not Supported

Page 14: PowerCenter Pushdown

Supported Databases

• Teradata (V2R5 or above)

• Oracle (9i or above)

• DB2 (v8 or above)

• SQL Server (7 and above)

• Sybase (ASE 12.5)

• ODBC source/target

Page 15: PowerCenter Pushdown

Supported Transformations

• To Source
  – Aggregator
  – Expression
  – Filter
  – Joiner
  – Lookup
  – Sorter
  – Union

• To Target
  – Expression
  – Lookup

Page 16: PowerCenter Pushdown

Unsupported Transformations

• Custom Transformation

• External Procedure

• XML

• Normalizer

• Rank

• Router

• Sequence Generator

• Stored Procedure

• Transaction Control (TCT)

• Update Strategy

Page 17: PowerCenter Pushdown

Partial Source Pushdown

• Condition:
  – One or more transformations can be processed in the source database

• Virtual source – transformations pushed to source

• Generated SQL:
  – SELECT … FROM s … WHERE (filter/join condition) … GROUP BY …

[Diagram: Source DB → Target, with Extract, Transform, and Load stages]
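
A minimal sketch of partial source pushdown, not PowerCenter's actual generated SQL. It assumes Python's built-in sqlite3 as a stand-in source database; the table `s`, its columns, and the filter and aggregation logic are hypothetical. The same logic is run once outside the database and once as a single SELECT, to show how many rows have to be extracted in each case.

```python
import sqlite3

# Stand-in source database (SQLite in memory); table and columns are hypothetical.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE s (region TEXT, amount INTEGER)")
src.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("west", 50), ("north", 2)],
)

# Without pushdown: extract every row, then filter and aggregate outside the database.
all_rows = src.execute("SELECT region, amount FROM s").fetchall()
totals = {}
for region, amount in all_rows:
    if amount >= 10:  # Filter transformation
        totals[region] = totals.get(region, 0) + amount  # Aggregator transformation
without_pushdown = sorted(totals.items())

# With partial source pushdown: the same logic expressed as one SELECT,
# so the source database filters and aggregates before anything is extracted.
pushed_sql = (
    "SELECT region, SUM(amount) FROM s "
    "WHERE amount >= 10 GROUP BY region ORDER BY region"
)
with_pushdown = src.execute(pushed_sql).fetchall()

rows_moved_without = len(all_rows)
rows_moved_with = len(with_pushdown)
```

Both paths produce the same totals, but the pushed-down query moves 2 rows out of the source instead of 5.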

Page 18: PowerCenter Pushdown

Partial Target Pushdown

• Condition:
  – One or more transformations can be processed in the target database

• Virtual target – transformations pushed to target

• Generated SQL:
  – INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))

[Diagram: Source → Target DB, with Extract, Transform, and Load stages]
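
A minimal sketch of partial target pushdown, assuming Python's built-in sqlite3 as a stand-in target database. The table `t`, its columns, and the sample rows are hypothetical, and `UPPER` is swapped in for `SOUNDEX` because default SQLite builds do not include `SOUNDEX`. The point is that the expression logic lives inside the parameterized INSERT, so the target database evaluates it per row.

```python
import sqlite3

# Stand-in target database; table t and its columns are hypothetical.
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE t (id INTEGER, name_key TEXT)")

# Rows as they arrive from the extract stage (outside the target database).
extracted = [(1, "smith"), (2, "jones")]

# The expression logic is embedded in the INSERT, so the target database
# evaluates it for each bound row. UPPER stands in for SOUNDEX here.
pushed_sql = "INSERT INTO t (id, name_key) VALUES (? + 1, UPPER(?))"
tgt.executemany(pushed_sql, extracted)
tgt.commit()

loaded = tgt.execute("SELECT id, name_key FROM t ORDER BY id").fetchall()
```

The rows still travel from the source to the target, but the arithmetic and string function run in the target database rather than in the process that extracted the rows.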

Page 19: PowerCenter Pushdown

Full Pushdown

• Condition:
  – Source and target are in the same RDBMS
  – All transformations can be processed in the database

• Data is not extracted outside of the DB

• Generated SQL:
  – INSERT INTO t (…) SELECT … FROM s …

[Diagram: Source DB and Target DB, with Extract, Transform, and Load stages]
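
A minimal sketch of full pushdown, assuming Python's built-in sqlite3 as a stand-in for a database that holds both source and target. The tables `s` and `t`, their columns, and the aggregation are hypothetical. Extract, transform, and load collapse into one INSERT … SELECT, so no rows leave the database.

```python
import sqlite3

# One database holds both source table s and target table t (names hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE s (region TEXT, amount INTEGER)")
db.execute("CREATE TABLE t (region TEXT, total INTEGER)")
db.executemany(
    "INSERT INTO s VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 50)],
)

# The whole pipeline is a single statement executed inside the database.
db.execute(
    "INSERT INTO t (region, total) "
    "SELECT region, SUM(amount) FROM s GROUP BY region"
)
db.commit()

loaded = db.execute("SELECT region, total FROM t ORDER BY region").fetchall()
```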

Page 20: PowerCenter Pushdown

Design (Two-Pass)

• Pass 1:
  – Start from the source, traverse transformations downstream, and build a SQL query (SELECT statement).
  – Stop if a transformation cannot be processed in the source database and settle for partial pushdown to source.
  – If the target is reached, full pushdown can be done with an INSERT … SELECT statement.

Page 21: PowerCenter Pushdown

Design (Two-Pass)

• Pass 2:
  – Bypass if pass 1 results in full pushdown optimization.
  – Start from the target, traverse transformations upstream, and build SQL statements (INSERT, DELETE, and UPDATE) for partial pushdown to target.
  – Stop if a transformation cannot be processed in the target database or has already been pushed to the source database.
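
The two passes can be sketched roughly as follows. This is an illustration of the traversal order only, not the optimizer's implementation: it models a mapping as a flat, ordered list of transformation names, takes the capability sets from the Supported Transformations slide, and ignores every other rule (overrides, partitioning, function equivalence). `plan_pushdown` and `same_database` are hypothetical names.

```python
# Capability sets copied from the "Supported Transformations" slide.
SOURCE_OK = {"Aggregator", "Expression", "Filter", "Joiner", "Lookup", "Sorter", "Union"}
TARGET_OK = {"Expression", "Lookup"}


def plan_pushdown(pipeline, same_database):
    """Return (mode, to_source, in_service, to_target) for an ordered pipeline."""
    # Pass 1: walk downstream from the source until a transformation
    # cannot be processed in the source database.
    n_source = 0
    for name in pipeline:
        if name not in SOURCE_OK:
            break
        n_source += 1

    # Reaching the target in the same database means full pushdown
    # (INSERT ... SELECT); pass 2 is bypassed.
    if n_source == len(pipeline) and same_database:
        return "full", list(pipeline), [], []

    # Pass 2: walk upstream from the target. Only the transformations that
    # pass 1 did not push to the source are candidates.
    n_target = 0
    for name in reversed(pipeline[n_source:]):
        if name not in TARGET_OK:
            break
        n_target += 1

    split = len(pipeline) - n_target
    return "partial", pipeline[:n_source], pipeline[n_source:split], pipeline[split:]


full_plan = plan_pushdown(["Filter", "Aggregator"], same_database=True)
mixed_plan = plan_pushdown(["Filter", "Rank", "Expression"], same_database=True)
cross_db_plan = plan_pushdown(["Expression", "Filter"], same_database=False)
```

In `mixed_plan`, the unsupported Rank stops both passes: Filter goes to the source, Expression goes to the target, and Rank stays with the Integration Service.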

Page 22: PowerCenter Pushdown

Considerations

• Error handling is subject to DBMS error handling

• No row-level error logging

• For mappings that generate long transactions:
  – More database resources are required (locks and log space)
  – No partial commit: the entire transaction is rolled back when an error is encountered

• Results when executing in PowerCenter vs. pushed to the DB may differ based on DB configuration:
  – Case sensitivity
  – How null is treated in sort order
  – Formats (numeric value conversion to char; date conversion to char)
  – Data precision
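
The "no partial commit" and "no row-level error logging" points can be illustrated with a small sketch, assuming Python's built-in sqlite3 as a stand-in database. All table names and the NOT NULL rule are hypothetical, and other databases may report and handle the error differently. One bad row fails the whole pushed-down statement, while a row-by-row load outside the database can keep the good rows and record the rejected one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE s (id INTEGER, name TEXT)")
db.execute("CREATE TABLE t_pushed (id INTEGER, name TEXT NOT NULL)")
db.execute("CREATE TABLE t_rowwise (id INTEGER, name TEXT NOT NULL)")
db.executemany("INSERT INTO s VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])
db.commit()

# Pushed down: one INSERT ... SELECT. The row with a NULL name violates the
# constraint, the statement fails as a whole, and nothing is loaded.
try:
    db.execute("INSERT INTO t_pushed SELECT id, name FROM s ORDER BY id")
    db.commit()
except sqlite3.IntegrityError:
    db.rollback()
pushed_count = db.execute("SELECT COUNT(*) FROM t_pushed").fetchone()[0]

# Row by row outside the database: good rows load and the bad row is recorded.
rejected = []
for row in db.execute("SELECT id, name FROM s ORDER BY id").fetchall():
    try:
        db.execute("INSERT INTO t_rowwise VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])
db.commit()
rowwise_count = db.execute("SELECT COUNT(*) FROM t_rowwise").fetchone()[0]
```

Here the pushed-down load ends with 0 rows in the target, while the row-by-row load ends with 2 rows and a reject list of `[2]`.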

Page 23: PowerCenter Pushdown

Limitations

A transformation will not be pushed down / stops the optimization if:

• A Source Qualifier, lookup, or update transformation contains a SQL override
  – Optimizer does not parse user-defined SQL overrides (i.e. lookup, update, DSQ)
  – DSQ SQL override limitation will be removed in GA by using temporary views
• It uses a mapping variable
• It contains a variable port
• It overrides default values for input/output ports
• An expression uses a function that has no equivalent function in the database
• It is part of a data profiling session
• Debugging is turned on
• An external loader is used (can only push to source, not to target)
• Row error logging is enabled

Page 24: PowerCenter Pushdown

Limitations

• A transformation will not be pushed down / stops the optimization if:
  – The mapping is too complex, i.e. has too many pipeline branches (max 64 two-way branches, 43 three-way branches, or 32 four-way branches)
  – Partitioning is configured where:
    • The partition type is not pass-through
    • There are different partition types for transformations in the pipeline and the optimizer can’t remerge the partitions
  – Multiple match for lookup is configured (except for error report)

• Limited by the single SQL statement generated at the target (INSERT INTO). Optimizer doesn’t use temp tables or views (in FCS; GA will use temporary views)

• Generated SQL can’t be modified

Page 25: PowerCenter Pushdown

Appropriate Use of Pushdown Optimization

• Pushdown Optimization is ideal where:
  – Source and target are located in the same database
  – Transformations processed in the source DB reduce the amount of data moved
    • Such as filters, aggregators

• Processing within PowerCenter is used when:
  – Operation can’t be done in the database (i.e. using SQL)
  – Source or target is not a database