Performance Tuning PDD FINAL

Page 1: Performance Tuning PDD FINAL

1

Page 2: Performance Tuning PDD FINAL

2

Performance Tuning Version 8.6

Bert Peters, Global Education Services, Principal Instructor

Page 3: Performance Tuning PDD FINAL

3

Objectives

After completing this course you will be able to:

• Control how PowerCenter uses memory

• Control how PowerCenter uses CPUs

• Understand the performance counters

• Isolate source, target and engine bottlenecks

• Tune different types of bottlenecks

• Configure Workflow and Session on Grid

Page 4: Performance Tuning PDD FINAL

4

Agenda

• Memory optimization

• Performance tuning methodology

• Tuning source, target, & mapping bottlenecks

• Pipeline partitioning

• Server Grid

• Q & A

• Course evaluation

Page 5: Performance Tuning PDD FINAL

5

Anatomy of a Session

Diagram: within the Integration Service, the Data Transformation Manager (DTM) runs reader, transformer, and writer threads; source data flows through the DTM buffer and transformation caches out to the target data.

Page 6: Performance Tuning PDD FINAL

6

Memory Optimization

Diagram: the reader, transformer, and writer threads exchange data through the DTM buffer; transformations also use their own transformation caches.

Page 7: Performance Tuning PDD FINAL

7

DTM Buffer

• Temporary storage area for data

• Buffer is divided into blocks

• Buffer size and block size are tunable

• Default setting for each is Auto

Page 8: Performance Tuning PDD FINAL

8

DTM Buffer Size – Session Property

• Default is Auto meaning DTM estimates optimal size

• Check session log for actual size allocation

Page 9: Performance Tuning PDD FINAL

9

DTM Buffer Block Size

• Default is Auto

• Check session log for actual size allocation

Page 10: Performance Tuning PDD FINAL

10

Reader Bottleneck

Diagram: a slow reader starves the DTM buffer, so the transformer and writer threads sit idle waiting for data.

Page 11: Performance Tuning PDD FINAL

11

Transformer Bottleneck

Diagram: a slow transformer leaves the reader waiting for free blocks and the writer waiting for data.

Page 12: Performance Tuning PDD FINAL

12

Writer Bottleneck

Diagram: a slow writer leaves both the reader and the transformer waiting for free blocks.

Page 13: Performance Tuning PDD FINAL

13

Source Row Logging

Diagram: with source row logging, source rows must remain in the buffers until the transformation and writer threads have processed the corresponding rows downstream, so the reader waits for free blocks.

Page 14: Performance Tuning PDD FINAL

14

Large Commit Interval

Target rows remain in the buffers until the DTM reaches the commit point

Diagram: rows accumulate in the DTM buffer, and downstream threads wait, until the commit point is reached.

Page 15: Performance Tuning PDD FINAL

15

Tuning the DTM Buffer

Diagram: extra blocks in the DTM buffer can keep the reader, transformer, and writer threads busy.

Page 16: Performance Tuning PDD FINAL

16

Tuning the DTM Buffer

• Temporary slowdowns in reading, transforming or writing may cause large fluctuations in throughput

• A “slow” thread typically provides data in spurts

• Extra memory blocks can act as a “cushion”, keeping other threads busy in case of a bottleneck

Page 17: Performance Tuning PDD FINAL

17

Tuning the DTM Buffer

• Buffer block size
  • Recommendation: at least 100 rows per block
  • Compute based on the largest source or target row size
  • Typically not a significant bottleneck unless below 10 rows per buffer block
• Number of blocks
  • Minimum of 2 blocks required for each source, target, and XML group
  • (number of blocks) = 0.9 x ((DTM buffer size) / (buffer block size))
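As a minimal illustration of the two rules above, the block-size and block-count arithmetic can be sketched as follows; the row size and buffer size used here are hypothetical example values, not figures from the course.

```python
# Sketch of the buffer block arithmetic above; row and buffer sizes are
# hypothetical example values, not recommendations.

def buffer_block_size(largest_row_bytes, rows_per_block=100):
    """Size a block to hold roughly 100 of the largest source/target rows."""
    return largest_row_bytes * rows_per_block

def number_of_blocks(dtm_buffer_bytes, block_size_bytes):
    """(number of blocks) = 0.9 x (DTM buffer size) / (buffer block size)."""
    return int(0.9 * dtm_buffer_bytes / block_size_bytes)

block = buffer_block_size(largest_row_bytes=2048)              # ~2 KB rows -> 200 KB block
blocks = number_of_blocks(dtm_buffer_bytes=24 * 1024 * 1024,   # 24 MB DTM buffer
                          block_size_bytes=block)
print(f"block size: {block} bytes, usable blocks: {blocks}")
```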

Page 18: Performance Tuning PDD FINAL

18

Tuning the DTM Buffer

• Determine the minimum DTM buffer size:
  (DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9

• Increase by a multiple of the block size

• If performance does not improve, return to previous setting

• There is no “formula” for optimal DTM buffer size

• Auto setting may be adequate for some sessions
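A minimal sketch of the minimum-buffer formula above, using hypothetical counts of sources and targets; the point is simply that the buffer is grown in whole block-size increments from this floor.

```python
# Sketch of the minimum DTM buffer size formula; the counts below are
# hypothetical example values.

def min_blocks(n_sources, n_targets, n_xml_groups=0):
    """At least 2 blocks are needed per source, target, and XML group."""
    return 2 * (n_sources + n_targets + n_xml_groups)

def min_dtm_buffer(block_size_bytes, minimum_blocks):
    """(DTM buffer size) = (buffer block size) x (minimum number of blocks) / 0.9"""
    return int(block_size_bytes * minimum_blocks / 0.9)

block_size = 64 * 1024                               # 64 KB block (example)
needed = min_blocks(n_sources=2, n_targets=3)        # -> 10 blocks
floor = min_dtm_buffer(block_size, needed)
print(f"minimum DTM buffer: {floor} bytes")
next_attempt = floor + block_size                    # grow by one block size at a time
```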

Page 19: Performance Tuning PDD FINAL

19

Transformation Caches

• Temporary storage area for certain transformations

• Except for Sorter, each is divided into a Data & Index Cache

• The size of each transformation cache is tunable

• If runtime cache requirement > setting, overflow written to disk

• The default setting for each cache is Auto

Page 20: Performance Tuning PDD FINAL

20

Tuning the Transformation Caches

Default is Auto

Page 21: Performance Tuning PDD FINAL

21

Max Memory for Transformation Caches

Only applies to transformation caches set to Auto

Page 22: Performance Tuning PDD FINAL

22

Max Memory for Transformation Caches

• Two settings: a fixed number & a percentage
  • The system uses the smaller of the two
  • If either setting is 0, the DTM assigns a default size to each transformation cache that's set to Auto
• Recommendation: use the fixed limit if this is the only session running; otherwise, use the percentage
• Use the percentage if running in a grid or HA environment

Page 23: Performance Tuning PDD FINAL

23

Tuning the Transformation Caches

• If a cache setting is too small, the DTM writes overflow to disk
• Determine whether transformation caches are overflowing:
  • Watch the cache directory on the file system while the session runs
  • Use the session performance counters
• Options to tune:
  • Increase the maximum memory allowed for Auto transformation cache sizes
  • Set the cache sizes for individual transformations manually

Page 24: Performance Tuning PDD FINAL

24

Session Performance Counters

Page 25: Performance Tuning PDD FINAL

25

Performance Counters

Page 26: Performance Tuning PDD FINAL

26

Tuning the Transformation Caches

• Non-zero counts for readfromdisk and writetodisk indicate sub-optimal settings for the transformation index or data caches

• This may indicate the need to tune transformation caches manually

• Any manual setting allocates memory outside of the previously set maximum

• Cache calculators provide guidance for manual tuning of transformation caches

Page 27: Performance Tuning PDD FINAL

27

Aggregator Caches

• Unsorted input
  • Must read all input before releasing any output rows
  • Index cache contains group keys
  • Data cache contains non-group-by ports
• Sorted input
  • Releases an output row as each input group is processed
  • Does not require a data or index cache (both = 0)
  • May run much faster than unsorted, BUT must consider the expense of sorting
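The difference between the two modes can be sketched outside PowerCenter in a few lines of Python (hypothetical rows, not the product's implementation): unsorted input forces every group into a cache before any output, while input sorted on the group key lets a group be emitted as soon as the key changes.

```python
from collections import defaultdict
from itertools import groupby

rows = [("A", 10), ("B", 5), ("A", 7), ("B", 1)]   # (group key, value) example rows

def aggregate_unsorted(rows):
    """Unsorted input: every group stays cached until all rows are read."""
    cache = defaultdict(int)                        # stands in for index + data caches
    for key, value in rows:
        cache[key] += value
    return sorted(cache.items())                    # output only after the full read

def aggregate_sorted(sorted_rows):
    """Sorted input: emit each group as soon as its key changes; no group cache."""
    for key, group in groupby(sorted_rows, key=lambda r: r[0]):
        yield key, sum(v for _, v in group)

print(aggregate_unsorted(rows))                     # [('A', 17), ('B', 6)]
print(list(aggregate_sorted(sorted(rows))))         # same result, sorting cost paid up front
```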

Page 28: Performance Tuning PDD FINAL

28

Aggregator Caches – Manual Tuning

Page 29: Performance Tuning PDD FINAL

29

Joiner Caches: Unsorted Input

Staging algorithm (master and detail inputs):

• All master data is loaded into the cache
• Specify the smaller data set as the master
• Index cache contains join keys
• Data cache contains non-key connected outputs

Page 30: Performance Tuning PDD FINAL

30

Joiner Caches: Sorted Input

Streaming algorithm (both inputs must be sorted on the join keys):

• Selected master data is loaded into the cache
• Specify the data set with the fewest records under a single key as the master
• Index cache contains up to 100 keys
• Data cache contains the non-key connected outputs associated with those 100 keys
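A minimal sketch (example data, not PowerCenter code) of why the two algorithms cache differently: the staging join caches every master row up front, while the streaming join over sorted inputs only ever needs the master rows that share the current key.

```python
from collections import defaultdict

master = [(1, "M1"), (2, "M2")]                    # (join key, payload), smaller data set
detail = [(1, "D1"), (1, "D2"), (2, "D3")]

def join_unsorted(master, detail):
    """Staging: cache ALL master rows, then stream the detail rows against the cache."""
    cache = defaultdict(list)                      # stands in for the index + data caches
    for key, payload in master:
        cache[key].append(payload)
    return [(key, m, d) for key, d in detail for m in cache[key]]

def join_sorted(master_sorted, detail_sorted):
    """Streaming merge: hold only the master rows that share the current key."""
    out, i, n = [], 0, len(master_sorted)
    for key, d in detail_sorted:
        while i < n and master_sorted[i][0] < key:
            i += 1
        j = i
        while j < n and master_sorted[j][0] == key:
            out.append((key, master_sorted[j][1], d))
            j += 1
    return out

print(join_unsorted(master, detail))
print(join_sorted(master, detail))                 # same result, far smaller cache
```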

Page 31: Performance Tuning PDD FINAL

31

Joiner Caches – Manual Tuning

Cache calculator detects the sorted input property

Page 32: Performance Tuning PDD FINAL

32

Lookup Caches

• To cache or not to cache?
  • Large number of invocations – cache
  • Large lookup table – don't cache
  • A flat file lookup is always cached

Page 33: Performance Tuning PDD FINAL

33

Lookup Caches

• Data cache
  • Only connected output ports are included in the data cache
  • For an unconnected lookup, only the "return" port is included in the data cache
• Index cache
  • Only lookup keys are included in the index cache

Page 34: Performance Tuning PDD FINAL

34

Lookup Caches

• Lookup transformation – fine-tuning the cache
  • SQL override
  • Persistent cache (if the lookup data is static)
  • Optimize the sort
    • Default: lookup keys, then connected output ports in port order
    • The sort can be commented out or overridden in the SQL override
    • The indexing strategy on the table may impact performance
    • The "Use Any Value" property suppresses the sort

Page 35: Performance Tuning PDD FINAL

35

Lookup Caches

• Lookup caches can be built concurrently
  • May improve session performance when there is significant activity upstream from the lookup & the lookup cache is large
  • This option applies to the individual session
  • The Integration Service builds lookup caches at the beginning of the session run, even if no row has entered a Lookup transformation
  • Session properties > Config Object tab > Advanced settings

Page 36: Performance Tuning PDD FINAL

36

Lookup Caches – Manual Tuning

Page 37: Performance Tuning PDD FINAL

37

Rank Caches

• Index cache contains group keys

• Data cache contains non-group-by ports

• Cache sizes related to the number of groups & the number of ranks

Page 38: Performance Tuning PDD FINAL

38

Rank Caches – Manual Tuning

Page 39: Performance Tuning PDD FINAL

39

Sorter Cache

• Sorter transformation
  • May be faster than a database sort or a 3rd-party sorter
  • An index read from the RDBMS = pre-sorted data
  • SQL SELECT DISTINCT may reduce the volume of data across the network versus a Sorter with the "Distinct" property set
• Single cache (no separation of index & data)

Page 40: Performance Tuning PDD FINAL

40

Sorter Cache – Manual Tuning

Page 41: Performance Tuning PDD FINAL

41

64-bit vs. 32-bit OS

• Take advantage of large memory support in 64-bit

• Cache-based transformations such as Sorter, Lookup, Aggregator, Joiner, and XML Target can address larger blocks of memory

Page 42: Performance Tuning PDD FINAL

42

Maximum Memory Allocation Example

• Parameters
  • 64-bit OS
  • Total system memory: 32 GB
  • Maximum allowed for transformation caches: 5 GB or 10%
  • DTM buffer: 24 MB
  • One transformation manually configured: index cache 10 MB, data cache 20 MB
  • All other transformations set to Auto

Page 43: Performance Tuning PDD FINAL

43

Maximum Memory Allocation Example

• Result
  • 10% of 32 GB = 3.2 GB < 5 GB, so the maximum allowed for transformation caches = 3.2 GB = 3200 MB
  • The manually configured transformation uses 30 MB
  • The DTM buffer uses 24 MB
  • 3200 + 30 + 24 = 3254 MB
  • Note that 3254 MB represents an upper limit; cached transformations may use less than the 3200 MB maximum
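The arithmetic on this slide can be reproduced directly (treating 1 GB as 1000 MB, as the example does):

```python
# Reproduce the upper-bound arithmetic from the example above (1 GB taken as 1000 MB).
total_memory_mb  = 32_000                                 # 32 GB system
auto_cache_limit = min(5_000, 0.10 * total_memory_mb)     # smaller of 5 GB and 10% -> 3200 MB
manual_caches_mb = 10 + 20                                # manually tuned index + data caches
dtm_buffer_mb    = 24

upper_limit_mb = auto_cache_limit + manual_caches_mb + dtm_buffer_mb
print(upper_limit_mb)   # 3254.0 -- an upper limit, not a guaranteed allocation
```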

Page 44: Performance Tuning PDD FINAL

44

Performance Tuning Methodology

• It is an iterative process:
  • Establish a benchmark
  • Optimize memory
  • Isolate the bottleneck
  • Tune the bottleneck
  • Take advantage of under-utilized CPU & memory

Page 45: Performance Tuning PDD FINAL

45

The Production Environment

Diagram: PowerCenter sits among operating systems, databases, LAN/WAN, and many disks.

• Multi-vendor, multi-system environment with many components: operating systems, databases, networks, and I/O
• You usually need to monitor performance in several places
• You usually need to monitor outside Informatica as well

Page 46: Performance Tuning PDD FINAL

46

The Production Environment

• Tuning involves an iterative approach:
  1. Identify the biggest performance problem
  2. Eliminate or reduce it
  3. Return to step 1

Page 47: Performance Tuning PDD FINAL

47

Preliminary Steps

• Eliminate transformation errors & data rejects: "first make it work, then make it faster"
• Source row logging requires the reader to hold onto buffers until the data is written to the target, EVEN IF THERE ARE NO ERRORS; this can significantly increase the DTM buffer requirement
• You may want to set "stop on errors" to 1

Page 48: Performance Tuning PDD FINAL

48

Preliminary Steps

• Override the tracing level to terse or normal
  • Override at the session level to avoid having to examine each transformation in the mapping
  • Only use verbose tracing during development & only with very small data sets
  • If you expect row errors that you will not need to correct, avoid logging them by overriding the tracing level to terse (not recommended as a long-term error handling solution)

Page 49: Performance Tuning PDD FINAL

49

Benchmarking

• Hardware (CPU bandwidth, RAM, disk space, etc.) should be similar to production

• Database configuration should be similar to production

• Data volume should be similar to production

• Challenge: production data is constantly changing
  • Optimal tuning may be data dependent
  • Estimate "average" behavior
  • Estimate "worst case" behavior

Page 50: Performance Tuning PDD FINAL

50

Benchmarking – Conditional Branching

Scenario: a high percentage of test data goes to TARGET1, but a high percentage of production data goes to TARGET2

Tuning of sorter & aggregator could be overlooked in test

Page 51: Performance Tuning PDD FINAL

51

Benchmarking – Conditional Branching

Scenario: a high percentage of production data goes to TARGET1 on Monday’s load; but a high percentage of production data goes to TARGET2 on Tuesday’s load

Performance of 2 loads may differ significantly

Page 52: Performance Tuning PDD FINAL

52

Benchmarking – Conditional Branching

• Conditional branching poses a challenge in performance tuning

• Volume & CHARACTERISTICS of data should be consistent between test & production

• May need to estimate average behavior

• May want to tune for worst-case scenario

Page 53: Performance Tuning PDD FINAL

53

Identifying Bottlenecks

• The first challenge is to identify the bottleneck:
  • Target
  • Source
  • Transformations
  • Mapping/Session

• Tuning the most severe bottleneck may reveal another one

• This is an iterative process

Page 54: Performance Tuning PDD FINAL

54

Thread Statistics

• The DTM spawns multiple threads

• Each thread has busy time & idle time

• Goal – maximize the busy time & minimize the idle time

Page 55: Performance Tuning PDD FINAL

55

Thread Statistics - Terminology

• A pipeline consists of:
  • A source qualifier
  • The sources that feed that source qualifier
  • All transformations and targets that receive data from that source qualifier

Page 56: Performance Tuning PDD FINAL

56

Thread Statistics - Terminology

Diagram: two pipelines feed a Joiner; a pipeline on the master input of a Joiner terminates at the Joiner.

Page 57: Performance Tuning PDD FINAL

57

Thread Statistics - Terminology

• Stage: a portion of a pipeline; implemented at runtime as a thread

• Partition point: the boundary between 2 stages; always associated with a transformation

Page 58: Performance Tuning PDD FINAL

58

Using Thread Statistics

• By default, PowerCenter assigns a partition point at each Source Qualifier, Target, Aggregator, and Rank

Diagram: the partition points split the pipeline into a reader thread (first stage), two transformation threads (second and third stages), and a writer thread (fourth stage).

Page 59: Performance Tuning PDD FINAL

59

Target Bottleneck

• The Aggregator transformation stage is waiting for target buffers

Diagram: the writer thread (fourth stage) is 95% busy while the Aggregator transformation stage (third stage) is only 15% busy, waiting for target buffers.

Page 60: Performance Tuning PDD FINAL

60

Transformation Bottleneck

• Both the reader & writer are waiting for buffers

Diagram: the third-stage transformation thread is 95% busy while the reader (15% busy) and writer (10% busy) wait for buffers.

Page 61: Performance Tuning PDD FINAL

61

Thread Statistics in Session Log

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****

Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_SortMergeDataSize_Detail] has completed.

Total Run Time = [318.271977] secs

Total Idle Time = [176.488675] secs

Busy Percentage = [44.547843]

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_SortMergeDataSize_Detail] has completed.

Total Run Time = [707.803168] secs

Total Idle Time = [105.303059] secs

Busy Percentage = [85.122550]

Thread work time breakdown:

JNRTRANS: 10.869565 percent

SRTTRANS: 89.130435 percent
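The busy percentage reported above is simply the share of run time not spent idle; a small helper (not part of PowerCenter) reproduces the reader and transformation figures from this log excerpt:

```python
def busy_percentage(total_run_secs, total_idle_secs):
    """Busy % = (run time - idle time) / run time x 100."""
    return (total_run_secs - total_idle_secs) / total_run_secs * 100.0

print(round(busy_percentage(318.271977, 176.488675), 6))   # ~44.547843 (reader thread)
print(round(busy_percentage(707.803168, 105.303059), 6))   # ~85.122550 (transformation thread)
```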

Page 62: Performance Tuning PDD FINAL

62

Performance Counters in WF Monitor

Page 63: Performance Tuning PDD FINAL

63

Integration Service Monitor in WF Monitor

Page 64: Performance Tuning PDD FINAL

64

Session Statistics in WF Monitor

Page 65: Performance Tuning PDD FINAL

65

Other Methods of Bottleneck Isolation

• Write to a flat file: if this is significantly faster than the relational target – target bottleneck

• Place a FALSE filter right after the Source Qualifier: if this is significantly faster – transformation bottleneck

• If target & transformation bottlenecks are ruled out – source bottleneck

Page 66: Performance Tuning PDD FINAL

66

Target Optimization

• Target optimization often involves non-Informatica components
• Drop indexes and constraints
  • Use pre/post SQL to drop and rebuild
  • Use pre/post-load stored procedures
• Use constraint-based loading only when necessary

Page 67: Performance Tuning PDD FINAL

67

Target Optimization

• Use bulk loading
  • Informatica bypasses the database log
  • The target cannot perform a rollback
  • Weigh the importance of performance against recovery
• Use an external loader
  • Similar to the bulk loader, but the database reads from a flat file

Page 68: Performance Tuning PDD FINAL

68

Target Optimization

• Target commit type
  • Best performance, least precise control
  • The system avoids writing partially-filled buffers
• Source commit type
  • The last active source to feed a target becomes a transaction generator
  • The commit interval provides precise control
  • Slower than the target commit type
  • Avoid setting the commit interval too low
• User-defined commit type
  • Required when the mapping contains a Transaction Control transformation
  • Provides precise, data-driven control
  • Slower than the target and source commit types

Page 69: Performance Tuning PDD FINAL

69

Target Optimization

• "Update else insert" session property
  • Works well if you rarely insert
  • An index is required for the update key but slows down inserts
  • PowerCenter must wait for the database to return an error before inserting
• Alternative – a Lookup followed by an Update Strategy

Page 70: Performance Tuning PDD FINAL

70

Source Bottlenecks

• Source optimization often involves non-Informatica components
• The generated SQL is available in the session log
  • Execute it directly against the database
  • Update statistics on the database
  • Use the tuned SELECT as a SQL override
• Set the Line Sequential Buffer Length session property to correspond with the record size

Page 71: Performance Tuning PDD FINAL

71

Source Bottlenecks

• Avoid transferring more than once from remote machine

• Avoid reading same data more than once

• Filter at source if possible (reduce data set)

• Minimize connected outputs from the source qualifier
  • Only connect what you need
  • The DTM only includes connected outputs when it generates the SQL SELECT statement

Page 72: Performance Tuning PDD FINAL

72

Reduce Data Set

• Remove unnecessary ports
  • Not all ports are needed
  • Fewer ports = better performance & lower memory requirements
• Reduce rows in the pipeline
  • Place a Filter transformation as far upstream as possible
  • Filter before an Aggregator, Rank, or Sorter if possible
  • Filter in the source qualifier if possible

Page 73: Performance Tuning PDD FINAL

73

Avoid Unnecessary Sorting

Diagram: a mapping containing many Sorter (srt_*) and Joiner (jnr_*) transformations fed by an XML parser, used to illustrate unnecessary sorting.

Page 74: Performance Tuning PDD FINAL

74

Expressions Language Tips

• Functions are more expensive than operators
  • Use || instead of CONCAT()

• Use variable ports to factor out common logic

Page 75: Performance Tuning PDD FINAL

75

Expressions Language Tips

• Simplify nested functions when possible

Instead of:

  IIF(condition1, result1, IIF(condition2, result2, IIF(…)))

try:

  DECODE(TRUE,
         condition1, result1,
         …,
         conditionN, resultN)

Page 76: Performance Tuning PDD FINAL

76

General Guidelines

• Data Type Conversions are expensive, avoid if possible

• All-input transformations (such as Aggregator, Rank, etc.) are more expensive than pass-through transformations
  • An all-input transformation must process multiple input rows before it can produce any output

Page 77: Performance Tuning PDD FINAL

77

General Guidelines

• High precision (session property) is expensive but only applies to “decimal” data type

• UNICODE requires 2 bytes per character; ASCII requires 1 byte per character
  • The performance difference depends only on the number of string ports

Page 78: Performance Tuning PDD FINAL

78

Transformation Specific

Reusable Sequence Generator – Number of Cached Values property

• Purpose: enables different sessions to share the same sequence without generating the same numbers

• A value > 0 allocates the specified number of values & updates the current value in the repository at the end of each block (each session gets a different block of numbers)

Page 79: Performance Tuning PDD FINAL

79

Transformation Specific

Reusable Sequence Generator – Number of Cached Values property

• Setting it too low causes frequent repository access, which impacts performance

• Unused values in a block are lost, which leads to gaps in the sequence (see the sketch below)

• Consider alternatives, for example two non-reusable sequence generators: one generates even numbers & the other generates odd numbers
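A minimal sketch of the block-allocation behavior described above (a hypothetical RepositorySequence class, not the actual repository API): each session reserves a block, the repository's current value jumps by the block size, and any unused values in a reserved block become gaps.

```python
# Sketch of "Number of Cached Values" block allocation: each session reserves a
# block of values and the repository's current value jumps by the block size.
# Unused values in a reserved block are lost, producing gaps in the sequence.

class RepositorySequence:
    def __init__(self, start=1):
        self.current = start            # current value stored in the repository

    def reserve_block(self, cached_values):
        block = range(self.current, self.current + cached_values)
        self.current += cached_values   # one repository update per block
        return iter(block)

seq = RepositorySequence()
session_a = seq.reserve_block(cached_values=100)   # values 1..100
session_b = seq.reserve_block(cached_values=100)   # values 101..200

print(next(session_a), next(session_b))            # 1, 101 -- no overlap between sessions
# If session A only consumes 10 rows, values 11..100 are never used: a gap.
```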

Page 80: Performance Tuning PDD FINAL

80

Other Transformations

• Normalizer
  • This transformation INCREASES the number of rows
  • Place it as far downstream as possible
• XML Reader / Mid-Stream XML Parser
  • Remove groups that are not projected
  • Memory is not allocated for these groups, but PK/FK relationships still need to be maintained
  • Don't leave port lengths as infinite; use appropriate lengths

Page 81: Performance Tuning PDD FINAL

81

Iterative Process

• After tuning your bottlenecks, revisit memory optimization

• Tuning often REDUCES memory requirements (you might even be able to change some settings back to Auto)

• Change one thing at a time & record your results

Page 82: Performance Tuning PDD FINAL

82

Partitioning

• Apply after optimizing source, target, & transformation bottlenecks

• Apply after optimizing memory usage

• Exploit under-utilized CPU & memory

• To customize partitioning settings, you need the partitioning license

Page 83: Performance Tuning PDD FINAL

83

Partitioning Terminology

• Partition: a subset of the data

• Stage: a portion of a pipeline

• Partition point: the boundary between 2 stages

• Partition type: the algorithm for distributing data among partitions; always associated with a partition point

Page 84: Performance Tuning PDD FINAL

84

Threads, Partition Points and Stages

• The DTM implements each stage as a thread; hence, stages run in parallel
• You may add or remove partition points

Diagram: a reader thread (first stage), two transformation threads (second and third stages), and a writer thread (fourth stage).

Page 85: Performance Tuning PDD FINAL

85

Rules for Adding Partition Points

• You cannot add a partition point to a Sequence Generator

• You cannot add a partition point to an unconnected transformation

• You cannot add a partition point on a source definition

• If a pipeline is split and then concatenated, you cannot add a partition point on any transformation between the split and the concatenation

• Adding or removing partition points requires the partitioning license

Page 86: Performance Tuning PDD FINAL

86

Guidelines for Adding Partition Points

• Make sure you have ample CPU bandwidth

• Make sure you have gone through other optimization techniques

• Add on complex transformations that could benefit from additional threads

• If you have more than 1 partition, add partition points where data needs to be redistributed:
  • At an Aggregator, Rank, or Sorter, where data must be grouped
  • Where data is distributed unevenly
  • On partitioned sources and targets

Page 87: Performance Tuning PDD FINAL

87

Partition Points & Partitions

• Partitions subdivide the data
  • Each partition represents a thread within a stage
  • Each partition point distributes the data among the partitions

Diagram: with 3 partitions and 4 stages, the session runs 3 reader threads, 6 transformation threads, and 3 writer threads (one set of threads per partition).

Page 88: Performance Tuning PDD FINAL

88

Session Partitioning GUI

• The number next to each flag shows the number of partitions
• The color of each flag indicates the partition type

Page 89: Performance Tuning PDD FINAL

89

Rules for Adding Partitions

• The master input of a joiner can only have 1 partition unless you add a partition point at the joiner

• A pipeline with an XML target can only have 1 partition

• If the pipeline has a relational source or target and you define n partitions, each database must support n parallel connections

• A pipeline containing a custom or external procedure transformation can only have 1 partition unless those transformations are configured to allow multiple partitions

Page 90: Performance Tuning PDD FINAL

90

Rules for Adding Partitions

• The number of partitions is constant along a given pipeline
  • If you have a partition point on a Joiner, the number of partitions on both inputs will be the same

• At each partition point, you can specify how you want the data distributed among the partitions (this is known as the partition type)

Page 91: Performance Tuning PDD FINAL

91

Guidelines for Adding Partitions

• Make sure you have ample CPU bandwidth & memory

• Make sure you have gone through other optimization techniques

• Add 1 partition at a time & monitor the CPU
  • When CPU usage approaches 100%, don't add any more partitions

• Take advantage of database partitioning

Page 92: Performance Tuning PDD FINAL

92

Partition Types

• Each partition point is associated with a partition type

• The partition type defines how the DTM is to distribute the data among the partitions

• If the pipeline has only 1 partition, the partition point serves only to add a stage to the pipeline

• There are restrictions, enforced by the GUI, on which partition types are valid at which partition points

Page 93: Performance Tuning PDD FINAL

93

Partition Types – Pass Through

• Data is processed without redistributing the rows among partitions

• Serves only to add a stage to the pipeline

• Use when you want an additional thread for a complex transformation but you don’t need to redistribute the data (or you only have 1 partition)

Page 94: Performance Tuning PDD FINAL

94

Partition Types – Key Range

• The DTM passes data to each partition depending on user-specified ranges

• You may use several ports to form a compound partition key

• The DTM discards rows not falling in any specified range

• If 2 or more ranges overlap, a row can go down more than 1 partition resulting in duplicate data

• Use key range partitioning when the sources or targets in the pipeline are partitioned by key range

Page 95: Performance Tuning PDD FINAL

95

Partition Types – Round Robin

• The Integration Service distributes rows of data evenly to all partitions

• Use when there is no need to group data among partitions

• Use when reading flat file sources of different sizes

• Use when data has been partitioned unevenly upstream and requires significantly more processing before arriving at the target

Page 96: Performance Tuning PDD FINAL

96

Partition Types – Hash Auto Keys

• The DTM applies a hash function to a partition key to group data among partitions

• Use hash partitioning to ensure that groups of rows are processed in the same partition

• The DTM automatically determines the partition key based on:
  • Aggregator or Rank group keys
  • Join keys
  • Sort keys
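Conceptually, hash partitioning just maps each group key to a partition number so that all rows of a group land in the same partition; the sketch below is illustrative only (Python's built-in hash and a hypothetical partition count), not PowerCenter's internal hash function.

```python
# Illustrative only: pick a partition from a hash of the group key so that all
# rows of a group land in the same partition (not PowerCenter's actual hash).
N_PARTITIONS = 3

def partition_for(group_key):
    return hash(group_key) % N_PARTITIONS

rows = [("ACME", 10), ("ZETA", 4), ("ACME", 7)]
for key, value in rows:
    print(key, "-> partition", partition_for(key))
# Both ACME rows map to the same partition, so a downstream Aggregator or Rank
# sees the whole group in a single thread.
```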

Page 97: Performance Tuning PDD FINAL

97

Partition Types – Hash User Keys

• This is similar to hash auto keys except the user specifies which ports make up the partition key

• Alternative to hard-coded key range partition on relational target (if DB table is partitioned)

Page 98: Performance Tuning PDD FINAL

98

Partition Types – Database

• Only valid for DB2 and Oracle databases in a multi-node database
  • Sources: Oracle and DB2
  • Targets: DB2 only
• The number of partitions does not have to equal the number of database nodes
  • Performance may be better if they are equal, however

Page 99: Performance Tuning PDD FINAL

99

Partitioning with Relational Sources

• PowerCenter creates a separate source database connection for each partition

• If you define n partitions, the source database must support n parallel connections

• The DTM generates a separate SQL Query for each partition

• Each query can be overridden

• PowerCenter reads the data concurrently

Page 100: Performance Tuning PDD FINAL

100

Partitioning with Flat File Sources

• Multiple flat files
  • Each partition reads a different file
  • PowerCenter reads the files in parallel
  • If the files are of unequal sizes, you may want to repartition the data round-robin
• Single flat file
  • PowerCenter makes multiple parallel connections to the same file based on the number of partitions specified
  • PowerCenter distributes the data randomly to the partitions
  • Over a large volume of data, this random distribution tends to have an effect similar to round robin: partition sizes tend to be equal

Page 101: Performance Tuning PDD FINAL

101

Partitioning with Relational Targets

• The DTM creates a separate target database connection for each partition

• The DTM loads data concurrently

• If you define n partitions, database must support n concurrent connections

Page 102: Performance Tuning PDD FINAL

102

Partitioning with Flat File Targets

• The DTM writes output for each partition to a separate file

• Connection settings and properties can be configured for each partition

• The DTM can merge the target files if all have connections local to the Integration Service machine

• The DTM writes the data concurrently

Page 103: Performance Tuning PDD FINAL

103

Partitioning—Memory Requirements

• Minimum number of buffer blocks = (2 blocks per source, target, & XML group) x (number of partitions)

• Optimal number of buffer blocks = (optimal number for 1 partition) x (number of partitions)
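A minimal sketch of how both block counts scale with the partition count; the source and target counts below are hypothetical examples.

```python
# Buffer block requirements grow linearly with the number of partitions.
def min_blocks_partitioned(n_sources, n_targets, n_xml_groups, n_partitions):
    return 2 * (n_sources + n_targets + n_xml_groups) * n_partitions

def optimal_blocks_partitioned(optimal_for_one_partition, n_partitions):
    return optimal_for_one_partition * n_partitions

print(min_blocks_partitioned(n_sources=1, n_targets=2, n_xml_groups=0, n_partitions=3))  # 18
print(optimal_blocks_partitioned(optimal_for_one_partition=40, n_partitions=3))          # 120
```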

Page 104: Performance Tuning PDD FINAL

104

Cache Partitioning

• DTM may create separate caches for each partition for each cached transformation; this is called cache partitioning

• The DTM treats cache size settings as per-partition values; for example, if you configure an Aggregator with 2 MB for the index cache & 3 MB for the data cache and you create 2 partitions, the DTM will allocate up to 4 MB & 6 MB in total (see the sketch below)

• The DTM does not partition Lookup or Joiner caches unless the Lookup or Joiner itself is a partition point
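A minimal sketch of the per-partition arithmetic from the Aggregator example above:

```python
# Configured cache sizes apply per partition, so totals scale with partition count.
def total_cache_mb(index_mb, data_mb, n_partitions):
    return index_mb * n_partitions, data_mb * n_partitions

index_total, data_total = total_cache_mb(index_mb=2, data_mb=3, n_partitions=2)
print(index_total, data_total)   # up to 4 MB of index cache and 6 MB of data cache in total
```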

Page 105: Performance Tuning PDD FINAL

105

Cache Partitioning

Diagram: with cache partitioning, each partition has its own cache(s), for example its own Sorter cache and its own index & data caches.

Page 106: Performance Tuning PDD FINAL

106

Cache Partitioning

Diagram: with a partition point on the Joiner, each partition has its own index & data caches.

Page 107: Performance Tuning PDD FINAL

107

Cache Partitioning

Diagram: with no partition point on the Joiner, however, all partitions share 1 set of index & data caches.

Page 108: Performance Tuning PDD FINAL

108

Monitoring Partitions

• The Workflow Monitor provides runtime details for each partition

• Per partition, you can determine the following:
  • Number of rows processed
  • Memory usage
  • CPU usage

• If one partition is doing more work than the others, you may want to redistribute the data

Page 109: Performance Tuning PDD FINAL

109

Pipeline Partitioning Example

• Scenario:
  • Student record processing
  • XML source and Oracle target
  • The XML source is split into 3 files

Page 110: Performance Tuning PDD FINAL

110

Pipeline Partitioning Example

Solution: define a partition for each of the 3 source files (Partition 1, Partition 2, Partition 3)

Page 111: Performance Tuning PDD FINAL

111

Pipeline Partitioning Example

Problem: the source files vary in size, resulting in unequal workloads for each partition

Solution: use round-robin partitioning on the filter to balance the load

Page 112: Performance Tuning PDD FINAL

112

Pipeline Partitioning Example

Problem: potential for splitting rank groups

Solution: use hash auto-keys partitioning on the Rank to group rows appropriately

Page 113: Performance Tuning PDD FINAL

113

Pipeline Partitioning Example

Problem: the target tables are partitioned on Oracle by key range

Solution: use key range partitioning at the target to optimize writing to the target tables

Page 114: Performance Tuning PDD FINAL

114

Dynamic Partitioning

• Integration Service can automatically set the number of partitions at runtime.

• Useful when the data volume increases or the number of CPUs available changes

• Basis for the number of partitions is specified as a session property

Page 115: Performance Tuning PDD FINAL

115

Concurrent Workflow Execution (8.5)

• Prior to 8.5:
  • Only one instance of a workflow could run
  • Users duplicated workflows – maintenance issues
  • Concurrent sessions required duplicating the session

Page 116: Performance Tuning PDD FINAL

116

Concurrent Workflow Execution

• Allows workflow instances to be run concurrently

• Parameters/variables can be overridden across run instances

• Same scheduler across multiple instances

• Supports independent recovery/ failover semantics

Page 117: Performance Tuning PDD FINAL

117

Concurrent Workflow Execution

Page 118: Performance Tuning PDD FINAL

118

Workflow on Grid (WonG)

• Integration Service is deployed on a Grid – an IS service process (pmserver) runs on each node in the grid

• Allows tasks of a workflow to be distributed across a grid – no user configuration is necessary if all nodes are homogeneous

Page 119: Performance Tuning PDD FINAL

119

Workflow on Grid (WonG)

• Different sessions in a workflow are dispatched on different nodes to balance load

• Use workflow on grid if:
  • There are many concurrent sessions and workflows
  • You want to leverage multiple machines in the environment

Page 120: Performance Tuning PDD FINAL

120

Load Balancer Modes

• Round robin
  • Honors Max Number of Processes per Node
• Metric-based
  • Evaluates nodes in round-robin order
  • Honors resource provision thresholds
  • Uses statistics from the last 3 runs; if no statistics have been collected yet, defaults are used (40 MB memory, 15% CPU)

Page 121: Performance Tuning PDD FINAL

121

Load Balancer Modes

• Adaptive
  • Selects the node with the most available CPU
  • Honors resource provision thresholds
  • Uses statistics from the last 3 runs of a task to determine whether a task can run on a node
  • Bypass in dispatch queue: skips tasks in the queue that are more resource-intensive and cannot be dispatched to any currently available node

• CPU profile – ranks node CPU performance against a baseline system

• All modes take into account the service level assigned to workflows

Page 122: Performance Tuning PDD FINAL

122

Session on Grid (SonG)

• Session partitioned and dispatched across multiple nodes

• Allows Unlimited Scalability

• Source and targets may be on different nodes

• More suited for large sessions

• Smaller machines in a grid are a lower-cost option than large multi-CPU machines

Page 123: Performance Tuning PDD FINAL

123

Session on Grid (SonG)

• Session on grid will scale if:
  • Sessions are CPU/memory intensive and overcome the overhead of data movement over the network
  • I/O is kept local to each node running a partition
  • There is fast shared storage (e.g. NAS or a clustered file system)
  • Partitions are independent

• Source and target have different connections that are only available on different machines
  • E.g. source Excel files on Windows while the target is only available on UNIX

• Supported on a homogeneous grid

Page 124: Performance Tuning PDD FINAL

124

Configuring Session on Grid

• Enable Session on Grid attribute in session configuration tab

• Assign workflow to be executed by an integration service that has been assigned to a grid

Page 125: Performance Tuning PDD FINAL

125

Dynamic Partitioning

• Based on user specification (# of partitions)
  • Can be parameterized as $DynamicPartitionCount

• Based on # of nodes in grid

• Based on source partitioning (Database partitioning)

Page 126: Performance Tuning PDD FINAL

126

SonG Partitioning Guidelines

• Set # of partitions = # of nodes to get an even distribution
  • Tip: use the dynamic partitioning feature to ease expansion of the grid

• In addition, continue to create partition points to achieve parallelism

Page 127: Performance Tuning PDD FINAL

127

SonG Partitioning Guidelines

• To minimize data traffic across nodes:
  • Use the pass-through partition type, which will try to keep transformations on the same node
  • Use a resource map to dispatch the source and target transformations to the node where the source or target is located
  • Keep the target files unmerged whenever possible (e.g. if they are being used for staging)

• Resource requirements should be specified at the lowest granularity, e.g. transformation instead of session (as far as possible)
  • This will ensure better distribution in SonG

Page 128: Performance Tuning PDD FINAL

128

File Placement Best Practices

• Files that should be placed on a high-bandwidth shared file system (CFS / NAS):
  • Source files
  • Lookup source files [sequential file access]
  • Target files [sequential file access]
  • Persistent cache files for lookup or incremental aggregation [random file access]

• Files that should be placed on a shared file system where the bandwidth requirement is low (NFS):
  • Parameter files
  • Other configuration files
  • Indirect source or target files
  • Log files

Page 129: Performance Tuning PDD FINAL

129

File Placement Best Practices

• Files that should be put on local storage:
  • Non-persistent cache files (i.e. sorter temporary files)
  • Intermediate target files for sequential merge
  • Other temporary files created during session execution

• $PmTempFileDir should point to a local file system

• For best performance, ensure sufficient bandwidth for the shared file system and local storage (possibly by using additional disk I/O controllers)

Page 130: Performance Tuning PDD FINAL

130

Data Integration Certification Path

Informatica Certified Administrator
• Recommended training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days)
• Required exams: Architecture & Administration; Advanced Administration

Informatica Certified Developer
• Recommended training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days)
• Required exams: Architecture & Administration; Mapping Design; Advanced Mapping Design

Informatica Certified Consultant
• Recommended training: PowerCenter QuickStart (eLearning); PowerCenter 8.5+ Administrator (4 days); PowerCenter Developer 8.x Level I (4 days); PowerCenter Developer 8 Level II (4 days); PowerCenter 8 Data Migration (4 days); PowerCenter 8 High Availability (1 day)
• Required exams: Architecture & Administration; Advanced Administration; Mapping Design; Advanced Mapping Design; Enablement Technologies

Additional training: PowerCenter 8.5 New Features; PowerCenter 8.6 New Features; PowerCenter 8 Upgrade; PowerCenter 8 Team-Based Development; PowerCenter 8.5 Unified Security

Page 131: Performance Tuning PDD FINAL

131

Q & A

Bert Peters, Global Education Services, Principal Instructor

Page 132: Performance Tuning PDD FINAL

132

Course Evaluation

Bert Peters, Global Education Services, Principal Instructor

Page 133: Performance Tuning PDD FINAL

133

Appendix: Informatica Services by Solution

Page 134: Performance Tuning PDD FINAL

134

B2B Data Exchange Recommended Services

Professional Services
• Strategy engagements: B2B Data Transformation Architectural Review
• Baseline engagements: B2B Data Transformation Baseline Architecture
• Implement engagements: B2B Full Project Lifecycle; Transaction/Customer/Payment Hub

Education Services
• Recommended courses: Informatica B2B Data Transformation (D); Informatica B2B Data Exchange (D)

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 135: Performance Tuning PDD FINAL

135

Data Governance Recommended Services

Professional Services
• Strategy engagements: Informatica Environment Assessment Service; Metadata Strategy and Enablement; Data Quality Audit
• Baseline engagements: Data Governance Implementation; Metadata Manager Quick Start; Informatica Data Quality Baseline Deployment
• Implement engagements: Metadata Manager Customization; Data Quality Management Implementation

Education Services
• Recommended courses: PowerCenter Level I Developer (D); Informatica Data Explorer (D); Informatica Data Quality (D)
• Related courses: PowerCenter Administrator (A); Metadata Manager (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 136: Performance Tuning PDD FINAL

136

Data Migration Recommended Services

Professional Services
• Strategy engagements: Data Migration Readiness Assessment; Informatica Data Quality Audit
• Baseline engagements: PowerCenter Baseline Deployment; Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
• Implement engagements: Data Migration Jumpstart; Data Migration End-to-End Implementation

Education Services
• Recommended courses: Data Migration (M); Informatica Data Explorer (D); Informatica Data Quality (D); PowerCenter Level I Developer (D)
• Related courses: PowerExchange Basics (D); PowerCenter Administrator (A)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 137: Performance Tuning PDD FINAL

137

Data Quality Recommended Services

Professional Services
• Strategy engagements: Data Quality Management Strategy; Informatica Data Quality Audit
• Baseline engagements: Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment; Informatica Data Quality Web Services Quick Start
• Implement engagements: Data Quality Management Implementation

Education Services
• Recommended courses: Informatica Data Explorer (D); Informatica Data Quality (D)
• Related courses: Informatica Identity Resolution (D); PowerCenter Level I Developer (D)
• Certifications: Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 138: Performance Tuning PDD FINAL

138

Data Synchronization Recommended Services

Professional Services
• Strategy engagements: Project Definition and Assessment
• Baseline engagements: PowerExchange Baseline Architecture Deployment; PowerCenter Baseline Architecture Deployment
• Implement engagements: Data Synchronization Implementation

Education Services
• Recommended courses: PowerCenter Level I Developer (D); PowerCenter Level II Developer (D); PowerCenter Administrator (A)
• Related courses: PowerExchange Basics Oracle Real-Time CDC (D); PowerExchange SQL RT (D); PowerExchange for MVS DB2 (D)
• Certifications: PowerCenter

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 139: Performance Tuning PDD FINAL

139

Enterprise Data Warehousing Recommended Services

Professional Services
• Strategy engagements: Enterprise Data Warehousing (EDW) Strategy; Informatica Environment Assessment Service; Metadata Strategy & Enablement
• Baseline engagements: PowerCenter Baseline Architecture Deployment
• Implement engagements: EDW Implementation

Education Services
• Recommended courses: PowerCenter Level I Developer (D); PowerCenter Level II Developer (D); PowerCenter Metadata Manager (D)
• Related courses: Informatica Data Quality (D); Data Warehouse Development (D)
• Certifications: PowerCenter

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 140: Performance Tuning PDD FINAL

140

Integration Competency Centers Recommended Services

Professional Services
• Strategy engagements: ICC Assessment
• Baseline engagements: ICC Master Class Series; ICC Director
• Implement engagements: ICC Launch; ICC Implementation; Informatica Production Support

Education Services
• Recommended courses: ICC Overview (M); PowerCenter Level I Developer (D); PowerCenter Administrator (A)
• Related courses: Metadata Manager (D); Informatica Data Explorer (D); Informatica Data Quality (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 141: Performance Tuning PDD FINAL

141

Master Data Management Recommended Services

Professional Services
• Strategy engagements: Master Data Management (MDM) Strategy; Informatica Data Quality Audit
• Baseline engagements: Informatica Data Explorer (IDE) Baseline Deployment; Informatica Data Quality (IDQ) Baseline Deployment; PowerCenter Baseline Architecture Deployment
• Implementation engagements: MDM Implementation

Education Services
• Recommended courses: Informatica Data Explorer (D); Informatica Data Quality (D); PowerCenter Level I Developer (D)
• Related courses: Metadata Manager (D); Informatica Identity Resolution (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 142: Performance Tuning PDD FINAL

142

Services Oriented Architecture Recommended Services

Professional Services
• Strategy engagements: Data Services (SOA) Strategy
• Baseline engagements: Informatica Web Services Quick Start; Informatica Data Quality Web Services Quick Start
• Implement engagements: Data Services (SOA) Implementation

Education Services
• Recommended courses: PowerCenter Level I Developer (D); Informatica Data Quality (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 143: Performance Tuning PDD FINAL

143

Governance, Risk & Compliance (GRC) Recommended Services

Professional Services
• Strategy engagements: Informatica Environment Assessment Service; Enterprise Data Warehouse Strategy; Data Quality Audit
• Baseline engagements: Informatica Data Quality Baseline Deployment; Metadata Manager Quick Start
• Implement engagements: Risk Management Enablement Kit; Enterprise Data Warehouse Implementation

Education Services
• Recommended courses: PowerCenter Level I Developer (D); Informatica Data Explorer (D); Informatica Data Quality (D)
• Related courses: Data Warehouse Development (D); ICC Overview (M); Metadata Manager (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 144: Performance Tuning PDD FINAL

144

Mergers & Acquisitions (M&A) Recommended Services

Professional Services
• Strategy engagements: Data Migration Readiness Assessment; Informatica Data Quality Audit
• Baseline engagements: PowerCenter Baseline Deployment; Informatica Data Quality (IDQ) and/or Informatica Data Explorer (IDE) Baseline Deployment
• Implement engagements: Data Migration Jumpstart; Data Migration End-to-End Implementation

Education Services
• Recommended courses: Data Migration (M); PowerCenter Level I Developer (D)
• Related courses: Informatica Data Explorer (D); Informatica Data Quality (D); PowerExchange Basics (D)
• Certifications: PowerCenter; Data Quality

Target audience for courses: D = Developer, A = Administrator, M = Project Manager

Page 145: Performance Tuning PDD FINAL

145

Deliver Your Project Right the First Time with Informatica Professional Services

Page 146: Performance Tuning PDD FINAL

146

Informatica Global Education Services

"We launched an aggressive data migration project that was to be completed in one year. The complexity of the data schema along with the use of Informatica PowerCenter tools proved challenging to our top colleagues.

We believe that Informatica training led us to triple productivity, helping us to complete the project on its original 1-year schedule.”

Joe Caputo, Director, Pfizer

Page 147: Performance Tuning PDD FINAL

147

Informatica Contact Information

Informatica Corporation Headquarters
100 Cardinal Way, Redwood City, CA 94063
Tel: 650-385-5000; Toll-free: 800-653-3871; Toll-free Sales: 888-635-0899; Fax: 650-385-5500

Informatica EMEA Headquarters
Informatica Nederland B.V., Edisonbaan 14a, 3439 MN Nieuwegein; Postbus 116, 3430 AC Nieuwegein
Tel: +31 (0) 30-608-6700; Fax: +31 (0) 30-608-6777

Informatica Asia/Pacific Headquarters
Informatica Australia Pty Ltd, Level 5, 255 George Street, Sydney, N.S.W. 2000, Australia
Tel: +612-8907-4400; Fax: +612-8907-4499

Global Customer Support: [email protected]
Visit my.informatica.com to open a new service request or to check on the status of an existing SR.

http://www.informatica.com