Informatica Tuning
06/21/2000 - www.coreintegration.com

Page 2: Introduction

• These slides are a visual presentation of tuning an Informatica map.
• The supporting material for these slides is the tuning guidelines on our web site, but for purposes of discussion, some key points will be presented here.
• This is only one case of how and why to architect maps in a specific manner. It is only our opinion that this is the best method to get Informatica tools to perform optimally.

Page 3: Troubled Map

[Diagram: Source → Source Qualifier → Expression → Filter → Maplets 1-4 → Aggregators 1-4 → Update Strategies → Targets 1-4.]

The following is defined for this map:

• The expression passes the whole row through but only performs a single calculation on a single port.
• The source is wide - 25 to 40 columns. The source will be discussed as both a flat file and a database connection.
• The filter condition is huge: it contains IIF logic with OR statements on 5 or more fields.
• Each maplet contains at least 3 cached lookups. Maplet 4 contains an aggregator. Maplet 1 splits the flow into 4 outputs.
• There are 4 aggregators on this path.
• Each target has an update strategy. One of the update strategies is DD_UPDATE only.
• For purposes of discussion, we will talk about both target scenarios: all targets as flat files, and all targets as database tables.


Page 4: Troubled Map, Color Coded

[Diagram: the same troubled map, color coded by problem area per the legend below.]

Legend:

Disk Contention (I/O Problems)

Disk Contention & RAM Contention

Simple Tuning Issues (speed)

Statements:
• Each target is a flat file; the source is a flat file.
• The Cache Directory, source, and target files reside on the same physical device.
• The expression parses the whole row for only one transformation.
• The filter houses a large IIF condition.
• All previous slide statements apply.

Potential Problems (not all, just some):
• The source, targets, and Cache Directory are all on the same disk.
• Multiple targets, but a single thread is used for write processes.
• Multiple aggregators, but a single thread is used for moving data.
• Stacked aggregators fight for memory, disk, and the Cache Directory in a single session.
• The Reader, Transformer, and Writer threads fight for memory in a single session with cache settings for indexes and aggregators.
• There are too many lookups in the maplets - they also fight for cache (index/data) and memory in a single session.
• The filter condition is too lengthy - not optimized.
• The expression performs only a single calculation, forcing the entire row through processing when only the one field should flow through (see the sketch after this list).
• Disk contention is high - with 4 targets on a single writer thread, I/O and seek operations become hot spots.
• The OS fights RAM contention by THRASHING the cache, reader, writer, and transform memory blocks to its own SWAP directory.
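A minimal Python sketch (not Informatica code) of that wide-row point: the expression stage computes one new port, yet all 40 source columns are copied through it, when only the field the calculation needs should flow through. Column names and the column count are invented for the demo.

    wide_row = {f"col_{i}": i for i in range(40)}   # the 40-column source row

    def expression_all_ports(row):
        # Troubled map: the whole row in, the whole row out, one calculation.
        out = dict(row)                  # every port is copied through
        out["calc"] = row["col_0"] * 2   # the single calculation
        return out

    def expression_needed_ports(row):
        # Tuned map: pass only the ports the downstream path actually uses.
        return {"col_0": row["col_0"], "calc": row["col_0"] * 2}

    print(len(expression_all_ports(wide_row)))     # 41 ports moved per row
    print(len(expression_needed_ports(wide_row)))  # 2 ports moved per row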

Page 5: I/O Single-Thread Contention

[Diagram, flat-file scenario: the single writer thread services Targets 1-4 in execution order (1, 2, 3, 4); each target write requires a Seek, then a Write.]

In examining the execution order of just the DISK activity of the map, we discover that each row or block of rows to be written to disk (flat files) requires at minimum 2 operations: Seek and Write. Not only that, but when the WRITER DTM blocks are full, it must commit (write) to the disk more often. We also discover that each time the WRITER DTM fills up, it must write to 4 different files, in order, because the WRITER DTM is only one single thread.

Now add to that the potential for multiple seeks because a disk block is full, the fact that the Aggregators are ALSO using disk, and the Reader DTM reading from a flat file on the same disk - all of a sudden you have a lot of disk activity. Also add the potential for the OS to begin "SWAPPING"/thrashing, because it doesn't have enough memory to keep all the processes running properly.

The monitoring devices may not show "hot spots" or contention for disk activity, because the processing is slowed to a point where the single thread must decide what to write and how. This method is simply inefficient.
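A minimal Python sketch (not Informatica code) of the single-writer behavior described above: one thread services all four flat-file targets in strict order, with a seek and a write each time its buffer fills. File names and row counts are invented.

    import os

    targets = [open(f"target_{i}.out", "w") for i in range(1, 5)]

    def flush_writer_block(rows):
        # The single writer thread must visit every target file in turn.
        for f in targets:
            f.seek(0, os.SEEK_END)   # 1. Seek to the end of the file
            f.writelines(rows)       # 2. Write the block
            f.flush()                # commit when the DTM block is full

    # Each time the writer buffer fills, all four files are written serially.
    for block in range(3):
        flush_writer_block([f"row {block}-{n}\n" for n in range(1000)])

    for f in targets:
        f.close()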

[Diagram, database scenario: a single SQL Buffer connection services Targets 1-4 in execution order (1, 2, 3, 4); each write repeats the steps Clear, Parse, Execute, Acknowledge.]

In the SQL Buffer world, you have 1 SQL connection for ALL 4 targets. This not only makes it tough on the database, but tough on throughput as well - each target must make use of the SAME memory area set up for the connection to the database. Each WRITE to the database goes through between 4 and 6 steps to execute - not to mention waiting for the server replies/responses.

Thus the map can only be as fast as the slowest possible write. This goes for the FLAT FILES too.

The execution order is obeyed, and each of the targets writes in its respective order through the SINGLE DATABASE CONNECTION - not a whole lot of chances here to do much tuning (IF we stick with these configurations).

All of the DTM buffer information is packed in and waits for the SQL connection and Informatica to execute the writes one target at a time.
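A minimal sketch of that single SQL buffer, using Python's sqlite3 as a stand-in DBMS: one connection serves all four targets, so every write is serialized through the same memory area in execution order. Table names and row counts are invented for the demo.

    import sqlite3

    conn = sqlite3.connect(":memory:")        # the single shared connection
    for i in range(1, 5):
        conn.execute(f"CREATE TABLE target_{i} (id INTEGER, val TEXT)")

    rows = [(n, f"val{n}") for n in range(1000)]
    for i in range(1, 5):                     # targets written one at a time
        # Each write cycles through the same buffer - clear, parse, execute,
        # wait for the acknowledgement - before the next target's turn.
        conn.executemany(f"INSERT INTO target_{i} VALUES (?, ?)", rows)
    conn.commit()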

Page 6: I/O Target Contention Solution

[Diagram: one map per target. Flat files: Targets 1-4 each get their own Seek/Write thread, executing simultaneously. Database: each target gets its own SQL Buffer connection, executing simultaneously with only Execute and Acknowledge per write.]

By placing each target in a separate map, you win in both cases, taking advantage of parallel processing as much as possible.

Potential gain: 3x to 6x performance.

Flat files:
• The OS is built to multi-thread output files.
• Each target can be written at the same time, allowing the OS to do the proper buffering and I/O blocking that it might need (see the sketch after this list).
• Each session in Informatica has its own RAM, so each target can go as fast as possible - no single target "waits" for another.
• The overall timing could improve UP TO a factor of 4 (dividing the work by 4).
• Because each session has its own RAM, there is less contention for RAM, and each WRITER DTM thread can be optimized with rows for that given target.
• HETEROGENEOUS TARGETS BECOME A REALITY!! Each session can specify a database target, a flat file, etc.

Database tables:
• 4 connections are now open to the database.
• The DBMS is built for bulk parallel processing of multiple connections; it isn't built for BULK operations against 4 target tables in a single connection.
• Each connection buffer removes CLEAR/PARSE operations (in bulk mode), reducing the number of operations by 8 in this case. The CLEAR/PARSE only happens for the first row/set of rows.
• Each database connection can be sized appropriately to handle maximum throughput to the target (including network packet size if necessary).
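A minimal Python sketch of the split-map idea for the flat-file case: one "session" per target - here, a thread - so the OS can schedule all four writes simultaneously. File names and row counts are invented.

    from concurrent.futures import ThreadPoolExecutor

    def write_target(name, rows):
        # Each thread owns its own file handle and buffer - no target waits
        # on another, analogous to one map/session per target.
        with open(name, "w") as f:
            f.writelines(rows)

    rows = [f"row {n}\n" for n in range(100_000)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(1, 5):
            pool.submit(write_target, f"target_{i}.out", rows)
    # The pool's context manager waits for all four writers to finish.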


Page 7: Update Strategies

[Diagram: Targets 1-4, each fed by its own Update Strategy.]

• The problem with this map causes a need for update strategies. By splitting the map into 1 map per target, the need for update strategies can almost always be eliminated.
• Use update strategies only if the session has a low number of rows to process, or the map doesn't seem well suited to 1 map per target.
• The other problem with this map is that when the targets are flat files, update strategies don't do any good. Deleting and updating rows in a target flat file is not supported (as far as I know).
• Update strategies force each row to be analyzed. Each row then assumes its own SQL statement against the RDBMS. This can be a performance bottleneck. Changing the map to 1 map per target allows the update strategy to be removed, and the session to be set specifically for the operation (see the sketch after this list).
• If each row must be examined, speed will be negatively impacted.
• Remove the update strategies. If there is still a need to "break" apart the operations, take a single map and split the functionality into two maps - one for update, one for delete (for instance). This will increase speed as well as allow parallel operations.
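A minimal sketch of that contrast, using Python's sqlite3 as a stand-in RDBMS: a data-driven update strategy issues one statement per analyzed row, while a session set for the update operation can push the same change set through in bulk. The schema and row counts are invented for the demo.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO target VALUES (?, ?)",
                     [(n, "old") for n in range(10_000)])

    changes = [("new", n) for n in range(10_000)]

    # With an update strategy: each row is examined, then gets its own
    # SQL statement against the database.
    for val, key in changes:
        conn.execute("UPDATE target SET val = ? WHERE id = ?", (val, key))

    # With the session set for the operation: no per-row analysis, one
    # bulk operation for the whole change set.
    conn.executemany("UPDATE target SET val = ? WHERE id = ?", changes)
    conn.commit()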

Page 8: Aggregator Contention

[Diagram: left - Aggregators 1-4 stacked in a single map; right - each aggregator in its own individual map with its own target.]

A single map means:
• A single data/index cache; each aggregator fights for memory.
• Single-threaded I/O going to the Cache Directory, fighting for OS resources.
• If I/O to the Cache Directory is utilized and the target files are flat files, then the two fight for I/O resources: the aggregator reads from disk while the target has to write to disk, dramatically increasing SEEK/READ and SEEK/WRITE operations and dramatically reducing overall speed.
• Single-threaded, row-by-row aggregation. Each row is pushed out from Agg1 to Agg2, then again to Agg3, and again to Agg4. One row pushed out 3 times multiplies the amount of RAM needed to MOVE the row/block of rows from Agg1 to each target aggregator.
• Single-threaded execution order. Agg2 through Agg4 execute in a single order: 2, then 3, then 4.
• Your map only runs as fast as the slowest aggregation process.

Benefits of splitting the map:
• Multiple I/O threads - each map has its own single-aggregator I/O process. The OS is thus forced to handle multi-threaded I/O requests (fewer attempts by PMServer to block I/O), and the OS is better equipped to handle them.
• Only one aggregator per map is necessary to achieve the same goal, thus increasing speed.
• Each map can now run at its own speed, but in parallel with the other "split" maps, so CPU and RAM are utilized more efficiently on the PMServer machine (see the sketch after this list).
• Because each session runs on its own SET of threads (Reader/Transformer/Writer), each session and each aggregator gets its own index and cache memory to operate under. It's also more tunable this way.
• The only potential contention left in the aggregators is the I/O to the Cache Directory; memory contention is no longer an issue.
• Rows can be processed in bulk.
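A minimal Python sketch of the split-aggregator idea (illustrative only, not PMServer internals): each "map" runs as its own process with its own private memory, so the four aggregations proceed in parallel instead of serially. The sum-by-key aggregation, keys, and row counts are invented stand-ins.

    from collections import defaultdict
    from multiprocessing import Pool

    def aggregate(rows):
        # Each process gets a private "index/data cache" (this dict) - no
        # contention with the other three aggregators.
        totals = defaultdict(int)
        for key, amount in rows:
            totals[key] += amount
        return dict(totals)

    if __name__ == "__main__":
        feeds = [[(n % 10, n) for n in range(100_000)] for _ in range(4)]
        with Pool(processes=4) as pool:
            results = pool.map(aggregate, feeds)   # four aggregators at once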


Page 9: Maplet Contention

[Diagram: left - a single Filter feeds Maplets 1-4, which all flow to a single Aggregator; right - four split maps, each with its own Filter and Maplet, flowing to Aggregators 1-3 and Target 4 respectively.]

Single map, many maplets - issues:
• These maplets may contain many lookups, an aggregator, update strategies, filters, etc. By placing them in a single feed, out to a single aggregator, Informatica is again forced to handle them in a single serial process.
• The maplets in our example contain 3 to 5 lookups - thus they fight with the aggregator for index cache, data cache, and I/O utilization (Cache Directory).
• Execution order again is 1, then 2, then 3, then 4. And again, the entire mapping is only as fast as the slowest maplet.
• Each single row is pushed from the filter 4 times - repetition and memory constraints.
• Speed is extremely difficult to achieve in this situation.

Multiple maps, 1 per maplet:
• Each map has its own maplet, as well as its own filter and target (which for maps 1 through 3 is an aggregator).
• Each maplet has less contention for memory.
• Rows can be processed in bulk through a single maplet, and are only pushed from the filter 1 time.
• Each map (again) is given its own threads, RAM basis, index cache, and data cache - therefore it has more room to work, and less contention arises.

Page 10: Final Flow - Split Maps/Sessions

Source → Src Qual → Expression → Filter → Maplet 2 → Aggregator 2 → Target 1 (session setting: insert)

Source → Src Qual → Expression → Filter → Maplet 3 → Aggregator 3 → Target 2 (session setting: update)

Source → Src Qual → Expression → Filter → Maplet 4 → Aggregator 4 → Target 3 (session setting: update)

Source → Src Qual → Expression → Filter → Maplet 1 → Aggregator 1 → Target 4 (session setting: update)

Final Notes:

• Splitting the maps could increase performance by a factor between 2x and 6x.
• Separating the source, Cache Directory, and target files onto separate disks will reduce or remove the contention still outlined in RED here.
• The expression for the filter was built in the expression object, thus making the filter faster. Both the expression and the filter can now be made reusable, and even placed in each of the maplets (see the sketch below).
• The aggregators can also be placed in the maplets.
• Each map is independently responsible for its own I/O cache.
• Each map has its own set of threads: Reader, Transformer, Writer.
• Each map has its own memory - more flexible.
• Each map is independently tunable.
• Each map can run at its own speed.
• Parallel processing is pushed onto the OS and hardware, where the architecture is built to support it.
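A minimal Python sketch of the "build the filter expression upstream" note: the expression stage computes one boolean port per row, and the filter then tests only that single port. The field names and the condition are invented stand-ins for the map's real IIF logic.

    def expression_stage(row):
        # The lengthy IIF/OR logic is evaluated once here, in the expression.
        row["keep_flag"] = (row["status"] == "A"
                            or row["region"] in ("E", "W")
                            or row["amount"] > 1000)
        return row

    def filter_stage(rows):
        # The filter condition is now a trivial single-port test.
        return (r for r in rows if r["keep_flag"])

    rows = [{"status": s, "region": "N", "amount": a}
            for s, a in (("A", 10), ("B", 50), ("B", 5000))]
    for kept in filter_stage(expression_stage(r) for r in rows):
        print(kept)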