Informatica Power Center Advanced


AGENDA

• Mapplets and types of mapplets
• Reusable transformations
• User-defined functions
• Types of batch processing
• Link conditions
• Tasks and types of tasks (reusable, non-reusable)
• Worklets and types of worklets
• Scheduling workflows
• Constraint-based load ordering
• Target Load Plan (TLP)
• Importing and exporting objects
• PMCMD utility
• PMREP utility
• SCD Type II implementation using start date and end date
• Lookup caches
• Performance optimization (source, transformation, session, system)
• ETL unit testing
• ETL performance testing
• Caches
• Mapping debugger
• Pushdown optimization
• PowerCenter 8 enhancements
• Session recovery
• Mapping parameters
• Mapping variables
• Parameterization of sessions
• Difference between normal and bulk loading
• Session partitions and types

Mapplets and Types of Mapplets

• A mapplet is a reusable object that implements business logic using a set of transformations.

• A mapplet is created using the Mapplet Designer tool.

• There are two types of mapplets:
a. Active mapplet
b. Passive mapplet

• Limitation of mapplets: when a Stored Procedure transformation is used in a mapplet, it must be of type Normal.

Re-usable Transformations

• A reusable transformation is a reusable object that implements business logic using a single transformation.

• A reusable transformation can be created in two ways:
a. Using the Transformation Developer tool
b. Converting a non-reusable transformation into a reusable transformation

• Limitation: the Source Qualifier transformation cannot be used as a reusable transformation.

User Defined Functions

• A user-defined function is a PowerCenter object created using the PowerCenter transformation language.

• The PowerCenter transformation language is a set of built-in functions (nearly 74 in number) used to define business logic.

Batch processing and types of batch processing

• Running multiple sessions in a single workflow is called batch processing.

• A workflow can execute one or more sessions. Batch processing is of two types:
a. Parallel batch processing
b. Sequential batch processing
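The difference between the two batch types can be illustrated with a small Python sketch. This is only an analogy, not PowerCenter code: `run_session` is a hypothetical stand-in for one session, and the session names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_session(name):
    # Hypothetical stand-in for one session run; returns (name, status).
    return (name, "SUCCEEDED")


def run_sequential(sessions):
    # Sequential batch: each session starts only after the previous one ends.
    return [run_session(name) for name in sessions]


def run_parallel(sessions):
    # Parallel batch: all sessions are submitted together and run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        return list(pool.map(run_session, sessions))
```

Both functions return one result per session; only the execution order differs.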

Link Conditions

• A link condition controls the execution of sessions during a workflow run.

• A link condition is defined using a predefined variable called Status.
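In the Workflow Manager a link condition is typically written as an expression on the previous task's status, for example `$s_load.Status = SUCCEEDED` (the session name `s_load` is hypothetical). The Python sketch below only models the idea: a link is followed when its condition on the upstream status holds. All names in it are assumptions made for illustration.

```python
def run_workflow(tasks, links, start):
    """Run tasks from `start`, following a link only if its condition holds.

    tasks: task name -> callable returning "SUCCEEDED" or "FAILED"
    links: list of (from_task, to_task, condition); condition receives the
           status of from_task and returns True or False
    """
    status = {}
    queue = [start]
    while queue:
        name = queue.pop(0)
        status[name] = tasks[name]()
        for src, dst, condition in links:
            # Evaluate the link condition against the upstream task status.
            if src == name and condition(status[name]):
                queue.append(dst)
    return status
```

A task whose incoming link condition evaluates to false never appears in the returned status map, which mirrors a session that is skipped during the workflow run.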

Tasks and Types of Tasks

• A task is defined as a set of executable actions, commands, and functions.

• There are two types of tasks:
a. Reusable task (created using the Task Developer tool)
b. Non-reusable task (created using the Workflow Designer tool)

• Examples of tasks: Session, Command, Email, Worklet, Event Wait, Event Raise, Timer, Decision, Assignment, Control

Worklets and types of worklets

• A worklet is defined as a group of tasks.

• There are two types of worklets:
a. Reusable worklet
b. Non-reusable worklet

• Business purpose: a worklet simplifies complex workflow designs and helps meet the operational order of the process.

• A workflow that contains a worklet is known as a parent workflow.

Scheduling of workflow

• A schedule is an administrative task that specifies the date and time to run the workflow.

• A schedule automates the running of the workflow.

• There are two types of schedules:
a. Reusable schedule
b. Non-reusable schedule

Constraint Based Load Ordering

• Constraint-based load ordering defines the order in which data is loaded into multiple targets, based on their primary and foreign key relationships.

• Business purpose: use constraint-based load ordering to load data into snowflake dimensions, which are related through primary and foreign key relationships (recall the Joiner transformation).
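The ordering rule (parent tables before the child tables that reference them) amounts to a topological sort over the foreign key relationships. A minimal Python sketch, assuming hypothetical table names and using the standard-library `graphlib` module (Python 3.9+); this shows the idea, not how the Integration Service implements it:

```python
from graphlib import TopologicalSorter


def load_order(foreign_keys):
    """Return target tables in an order that loads parents before children.

    foreign_keys: child table -> set of parent tables it references
    """
    # TopologicalSorter treats the mapped values as predecessors, so every
    # parent table is emitted before any table that references it.
    return list(TopologicalSorter(foreign_keys).static_order())


# Hypothetical snowflake chain: EMP references DEPT, DEPT references LOCATION.
order = load_order({"EMP": {"DEPT"}, "DEPT": {"LOCATION"}, "LOCATION": set()})
```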

Target Load Plan

• A target load plan defines the order in which data is extracted from the Source Qualifier transformations in a mapping.

Importing and Exporting Objects

• Repository objects such as mappings, sessions, workflows, and worklets can be exported into .xml files (backup files).

• Procedure of exporting

• Procedure of importing

PMCMD Utility

• PMCMD is a command-line client program that communicates with the Integration Service.

• Use PMCMD to start a workflow on the Integration Service.

• Issue the following commands to work with PMCMD:
a. pmcmd>connect
b. pmcmd>startworkflow
c. pmcmd>setfolder
d. pmcmd>unsetfolder
e. pmcmd>disconnect
f. pmcmd>exit
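Besides the interactive prompt, pmcmd can also be called in one shot from a script. The Python sketch below only builds such a command line; the flags (`-sv`, `-d`, `-u`, `-p`, `-f`, `-wait`) reflect my understanding of pmcmd's command-line mode and should be checked against the Command Reference for your version. The service, folder, and workflow names are hypothetical.

```python
def start_workflow_command(service, domain, user, password, folder, workflow):
    """Build the argument list for a one-shot `pmcmd startworkflow` call."""
    return [
        "pmcmd", "startworkflow",
        "-sv", service,    # Integration Service name (assumed flag)
        "-d", domain,      # domain name
        "-u", user,
        "-p", password,
        "-f", folder,      # repository folder that holds the workflow
        "-wait",           # block until the workflow finishes
        workflow,
    ]


# Hypothetical values; pass the list to subprocess.run(cmd, check=True)
# on a machine where pmcmd is installed.
cmd = start_workflow_command(
    "is_example", "domain_nipuna", "administrator", "administrator",
    "example_folder", "wf_example",
)
```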

PMREP Utility

• PMREP is a command-line client program that connects to the Repository Service to perform administrative tasks.

• Issue the following commands to work with PMREP:
a. pmrep>connect -r repository -d domain -n user -x password
   Ex. pmrep>connect -r nipuna_rep -d domain_nipuna -n administrator -x administrator
b. createfolder
c. deletefolder
d. deleteobject
e. backup
f. restore
g. exit

SCD type 2 implementation using start date and end date

• Example: an employee starts a career on 26/04/2011 with the designation SE. The row is stored with that start date and a null end date.

• After some time the designation changes to SSE, so the null end date of the SE row must be updated and a new row inserted for SSE.
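The employee example can be sketched in Python as follows. This is a minimal illustration of the start date / end date technique, not the PowerCenter mapping itself; the employee id and the promotion date are hypothetical values.

```python
from datetime import date


def apply_scd2(history, emp_id, new_designation, change_date):
    """Close the current row for emp_id and append a new current row."""
    for row in history:
        if row["emp_id"] == emp_id and row["end_date"] is None:
            if row["designation"] == new_designation:
                return history  # no change, keep the current row open
            row["end_date"] = change_date  # update the null end date
    history.append({
        "emp_id": emp_id,
        "designation": new_designation,
        "start_date": change_date,
        "end_date": None,  # null end date marks the current version
    })
    return history


history = [{"emp_id": 101, "designation": "SE",
            "start_date": date(2011, 4, 26), "end_date": None}]
apply_scd2(history, 101, "SSE", date(2013, 1, 1))  # hypothetical promotion date
```

After the call, the SE row carries the promotion date as its end date and the SSE row is the only one with a null end date, so the full history is preserved.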

Lookup Caches

• There are two types of cache memory: index cache and data cache.

• The index cache contains all port values from the lookup table where the port is part of the lookup condition.

• The data cache contains all port values from the lookup table that are not in the lookup condition and that are specified as output ports.

• After the cache is loaded, values from the lookup input ports that are part of the lookup condition are compared to the index cache.

• Upon a match, the rows from the cache are included in the stream.

• The types of lookup caches are:
a. Static lookup cache
b. Dynamic lookup cache
c. Persistent lookup cache
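The split between index cache and data cache can be modelled with two dictionaries. The Python sketch below is a simplified picture of a static cache under assumed port names; it is not the Integration Service's actual cache layout.

```python
def build_lookup_cache(lookup_rows, condition_ports, output_ports):
    """Build a static cache: condition ports go to the index cache,
    output ports go to the data cache."""
    index_cache = {}  # condition port values -> row id
    data_cache = {}   # row id -> output port values
    for row_id, row in enumerate(lookup_rows):
        key = tuple(row[p] for p in condition_ports)
        index_cache.setdefault(key, row_id)  # keep the first matching row
        data_cache[row_id] = {p: row[p] for p in output_ports}
    return index_cache, data_cache


def lookup(input_row, condition_ports, index_cache, data_cache):
    """Compare the input condition values to the index cache and, on a
    match, return the cached output values (None when nothing matches)."""
    key = tuple(input_row[p] for p in condition_ports)
    row_id = index_cache.get(key)
    return None if row_id is None else data_cache[row_id]
```

A static cache like this is built once and never changes during the run; a dynamic cache would also insert or update entries while rows are processed.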

Performance Optimization

• Source: use the following techniques to improve the performance of data extraction:
a. Create source filters
b. Create indexes

• Transformation: Filter, Joiner, Aggregator, Expression, Router, Update Strategy, Lookup, Sequence Generator, Sorter

• Session: tune parameters, create partitions

• System: increase CPU performance, increase network speed

ETL Unit Testing

• A unit test for a data warehouse is white-box testing.

• We should check the ETL specification, the mappings, and the ETL procedures.

• The test cases are:
a. Test case 1: Data availability
b. Test case 2: Data load: insert
c. Test case 3: Data load: update
d. Test case 4: Incremental data load
e. Test case 5: Data accuracy
f. Test case 6: Data loss
g. Test case 7: Column mappings
h. Test case 8: Naming standards

ETL Performance Testing

• Most performance issues are encountered when the IS writes data into the target.

• The first step in performance tuning is to identify the performance bottlenecks, in the following order:
a. Test case 1: Identify the target bottleneck
b. Test case 2: Identify the source bottleneck
c. Test case 3: Identify the mapping bottleneck
d. Test case 4: Identify the session bottleneck
e. Test case 5: Identify the system bottleneck

Caches

• The following transformations need cache memory to process data:
a. Joiner transformation
b. Lookup transformation
c. Aggregator transformation
d. Rank transformation
e. Sorter transformation

Mapping Debugger

• The debugger is used to debug mappings while performing data validation.

• Ex. Create a mapping to load the employees whose ename starts with 'S' and calculate tax as 20% of salary.

• Procedure to use the debugger
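When stepping through the example mapping in the debugger, it helps to know what each row should look like after the filter and the tax calculation. A small Python sketch of the expected logic, assuming the column names `ename` and `sal` (hypothetical):

```python
def load_employees(rows, tax_percent=20):
    """Keep employees whose ename starts with 'S' and add a tax column."""
    result = []
    for row in rows:
        if row["ename"].startswith("S"):           # filter condition
            tax = row["sal"] * tax_percent / 100   # tax as a share of salary
            result.append({**row, "tax": tax})
    return result
```

The values computed here can be compared with the port values shown for each row in the debugger.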

Pushdown Optimization

• Pushdown optimization is a session property that analyzes the mapping and determines which transformations can be sent to the source or target database.

• When we configure the session for pushdown optimization, the IS analyzes the transformations, converts the transformation logic into SQL, and sends the SQL to the source or target database.

• It improves the performance of the session.

• Configure the session to perform pushdown optimization in one of the following ways:
a. Source-side pushdown optimization
b. Target-side pushdown optimization
c. Full pushdown optimization

PowerCenter 8 Enhancements

1. Pushdown optimization
2. User-defined functions
3. Service-Oriented Architecture (SOA)
4. SQL transformation
5. Java transformation

Session Recovery

• When we stop a session, or an error causes the session to stop, identify the reason for the failure and then start the session using one of the following options:
a. Restart the session: if the IS has not issued at least one commit.
b. Perform session recovery: if the IS has issued at least one commit.

• When we start the session in recovery mode, the IS reads the row ID of the last committed record from the table OPB_SRVR_RECOVERY (Repository table 522). The IS then starts processing the data records from the next row ID.
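The restart versus recovery decision can be illustrated with a toy loader in Python. This is only a sketch of the idea (resume after the last committed row); the function, its parameters, and the simulated failure are all hypothetical and do not reflect the Integration Service's internals.

```python
def run_session(rows, target, commit_interval, last_committed=0, fail_at=None):
    """Load rows after `last_committed` into target, committing every
    `commit_interval` rows. Returns the row ID of the last committed row."""
    pending = []
    for row_id in range(last_committed + 1, len(rows) + 1):
        if fail_at == row_id:
            # Simulated failure: uncommitted rows in `pending` are lost.
            return last_committed
        pending.append(rows[row_id - 1])
        if len(pending) == commit_interval:
            target.extend(pending)  # commit
            pending = []
            last_committed = row_id
    target.extend(pending)          # final commit at end of data
    return len(rows)


rows = list(range(1, 11))
target = []
last = run_session(rows, target, commit_interval=4, fail_at=7)
# At least one commit was issued, so recover from the next row ID.
run_session(rows, target, commit_interval=4, last_committed=last)
```

Had the failure happened before the first commit, `last` would be 0 and the second call would simply be a restart from the first row.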

Mapping Parameters

• A mapping parameter represents a constant value that is defined before the mapping runs.

• A mapping parameter is created with a name, type, data type, precision, and scale.

• The values for mapping parameters are defined in a parameter file.

• Save the parameter file with the extension .prm or .pst.

• A mapping parameter is represented with the $$ symbol.

• Syntax of the parameter file heading: [folder_name.WF:workflow_name.ST:session_name]

• Mapping parameters reduce development overhead (they avoid creating multiple mappings when only a constant value changes).

• A mapping parameter is specific to its particular mapping.

• Mapping parameters are created to standardize the business logic.

• Mapping parameters and variables can also be used in a Source Qualifier transformation.

• Mapping parameters can also be defined while creating a mapplet.
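A parameter file following the heading syntax above is plain text: a heading line, then one `name=value` line per parameter. The Python sketch below generates such a file's contents; the folder, workflow, session, and parameter names are hypothetical.

```python
def parameter_file_text(folder, workflow, session, parameters):
    """Build parameter file contents: a heading, then name=value lines."""
    lines = [f"[{folder}.WF:{workflow}.ST:{session}]"]
    lines += [f"{name}={value}" for name, value in parameters.items()]
    return "\n".join(lines) + "\n"


# Hypothetical names; $$TaxPercent would be declared in the mapping.
text = parameter_file_text(
    "example_folder", "wf_example", "s_example", {"$$TaxPercent": 20}
)
```

Changing the value in the file changes the constant for the next run without editing the mapping.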

Mapping Variables

• A mapping variable represents a value that can change during the mapping run.

• After each successful completion of the session, the IS stores the variable with its current value in the repository. The IS uses that value for the next run.

• The value of a mapping variable is set using the following variable functions: SetVariable(), SetCountVariable(), SetMaxVariable(), SetMinVariable()

• Mapping variables for sequence numbers

• Mapping variables for incremental extraction or reading
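Incremental extraction with a mapping variable boils down to two steps: read only rows newer than the stored value, then keep the maximum seen so the next run starts from there (the role SetMaxVariable() plays). A Python sketch of that pattern, assuming a hypothetical `updated_at` column:

```python
def incremental_run(source_rows, last_max):
    """Return the rows newer than last_max and the new value to persist."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_max]
    # Keep the larger of the stored value and the values just read.
    new_max = max([last_max] + [r["updated_at"] for r in new_rows])
    return new_rows, new_max
```

The returned `new_max` stands in for the value the IS saves to the repository after a successful session.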

Parameterization of Sessions

• A connection defines the path to a database or file system.

Difference Between Normal and Bulk Loading

• Normal load: when we configure the session with target load type Normal, the target database server records the transaction details in the database log and enters the records into the target database via that log.

Since the database log is maintained by the database server, transactions can be rolled back on errors. As a result, the IS can perform session recovery.

• Advantage

• Disadvantage

• Bulk loading: when we configure the session with target load type Bulk, the IS improves the performance of sessions that insert large amounts of data into the target database.

We can enable bulk loading for the following database types:
a. Oracle
b. SQL Server
c. Sybase
d. DB2

When we enable bulk loading for other database types, the IS reverts to normal loading. Bulk loading cannot be performed on an indexed target table.

• Advantage

• Disadvantage

Session Partitions

• A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in that stage.

• By default the IS creates one partition in every pipeline stage.

• Partition points mark the boundaries between threads in a pipeline. The IS redistributes rows of data at partition points.

• We can add partition points to increase the number of transformation threads and improve session performance.

• Types of partitions: Key Range, Pass-through, Round-robin, Hash, Database
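How rows are redistributed at a partition point depends on the partition type. The Python sketch below illustrates three of the types on plain lists; it is a conceptual model only, and the key names and boundary values are hypothetical.

```python
import bisect
import zlib


def round_robin(rows, n):
    """Distribute rows evenly across n partitions in turn."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 gives a hash that is stable across runs, unlike hash(str).
        bucket = zlib.crc32(str(row[key]).encode()) % n
        parts[bucket].append(row)
    return parts


def key_range(rows, key, boundaries):
    """Partition i holds values v with boundaries[i-1] <= v < boundaries[i]."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts
```

Round-robin balances row counts, hash keeps related rows together (useful before an aggregation), and key range follows explicit value ranges. Pass-through would leave each row in the partition it arrived in.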
