
Informatica Interview FAQs


What is a source qualifier?

What is a surrogate key?

What is the difference between a Mapplet and a reusable transformation?

What is DTM session?

What is a Mapplet?

What is a lookup function? What is the default transformation for the lookup function?

What is the difference between a connected lookup and an unconnected lookup?

What is an update strategy and what are the options for update strategy?

What is subject area?

What is the difference between truncate and delete statements?

What kinds of update strategies are normally used (Type 1, 2 & 3) and what are the differences?

What is the exact syntax of an update strategy?

What are bitmap indexes and how and why are they used?

What is bulk bind? How does it improve performance?

What are the different ways to filter rows using Informatica transformations?

What is a referential integrity error? How do you rectify it?

What is DTM process?

What is target load order?

What exactly is a shortcut and how do you use it?

What is a shared folder?

What are the different transformations where you can use a SQL override?

What is the difference between a Bulk and Normal mode and where exactly is it defined?

What is the difference between Local & Global repository?

What are data driven sessions?

What are the common errors while running an Informatica session?

What are worklets and what is their use?

What is change data capture?

What exactly is tracing level?

What is the difference between constraint-based load ordering and target load plan?

What is a deployment group and what is its use?

When and how is a partition defined in Informatica?

How do you improve performance in an Update strategy?

How do you validate all the mappings in the repository at once?

How can you join two or more tables without using the source qualifier override SQL or a Joiner transformation?

How can you define a transformation? What are the different types of transformations in Informatica?

How many repositories can be created in Informatica?

How many minimum groups can be defined in a Router transformation?

How do you define partitions in Informatica?

How can you improve performance in an Aggregator transformation?

How does Informatica know that the input is sorted?

How many worklets can be defined within a workflow?

How do you define a parameter file? Give an example of its use.

If you join two or more tables, pull about two columns from each table into the source qualifier, pull just one column from the source qualifier into an Expression transformation, and then do a 'generate SQL' in the source qualifier, how many columns will show up in the generated SQL?

In a Type 1 mapping with one source and one target table, what is the minimum number of Update Strategy transformations to be used?

At what levels can you define parameter files and what is the order?

In a session log file where can you find the reader and the writer details?

For joining three heterogeneous tables, how many Joiner transformations are required?

Can you look up a flat file using Informatica?

While running a session what default files are created?

Describe the use of materialized views and how they differ from a normal view.

Contributed by Mukherjee, Saibal (ETL Consultant)

Many readers are asking "Where's the answer?" Well, it will take some time before I get time to write them… but there is no reason to get upset… the Informatica help files should have all of these answers!


Loading & testing fact/transactional/balance data which is valid between dates!

Tuesday, July 25th, 2006

This is going to be a very interesting topic for ETL & data modelers who design processes/tables to load fact or transactional data that keeps changing between dates, e.g. prices of shares, company ratings, etc.

The source system contains an entity with time-variant values that don't change daily: the values are valid over a period of time, and then they change.

1. What table structure should be used in the data warehouse?

Maybe Ralph Kimball or Bill Inmon could come up with a better data model! But for ETL developers or ETL leads the decision is already made, so let's look for a solution.


2. What should be the ETL design to load such a structure?

Design A

There is a one-to-one relationship between the source row and the target row.

There is a CURRENT_FLAG attribute, which means that every time the ETL process gets a new value it has to add a new row with the current flag set and then go back to the previous row and retire it. This is a very costly ETL step and it will slow down the ETL process (see the sketch below).

From the report writer's perspective this model is a major challenge to use, because what if the report wants a rate which is not current? Imagine the complex query.
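As a rough sketch of the Design A maintenance step (the table share_rate_a and all of its columns are hypothetical, chosen only to illustrate the flag handling):

-- Design A: retire the old current row, then insert the new one.
-- Two statements per changed value is what makes this step costly.
UPDATE share_rate_a
SET    current_flag = 'N'
WHERE  share_id = 101
  AND  current_flag = 'Y'

INSERT INTO share_rate_a (share_id, rate, load_dt, current_flag)
VALUES (101, 42.50, GETDATE(), 'Y')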

Design B

In this design a snapshot of the source table is taken every day (see the sketch below).

The ETL is very easy. But can you imagine the size of the fact table when the source table has more than 1 million rows? (1 million x 365 days = 365 million rows per year.) And what if the values change every hour or minute?

But you have a very happy user who can write SQL reports very easily.
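A sketch of the Design B load, assuming hypothetical tables share_rate_src (the source) and share_rate_snapshot (the fact):

-- Design B: stamp today's date on a full copy of the source table.
-- Trivial ETL, but the fact table grows by the full source row count
-- every single day.
INSERT INTO share_rate_snapshot (snapshot_dt, share_id, rate)
SELECT GETDATE(), share_id, rate
FROM   share_rate_src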

Design C

Can there be a compromise? How about using a from date (time) and a to date (time)! The report writer can simply provide a date (time) and a straight SQL query can return the value/row that was valid at that moment.

However, the ETL is indeed as complex as in Design A: while the current row runs from the current date to infinity, the previous row has to be retired from its from date to today's date - 1.

This kind of ETL coding also creates lots of testing issues, as you want to make sure that for any given date and time only one instance of the row exists (for the primary key).
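Here is a minimal T-SQL sketch of Design C; the table share_rate and its columns from_dt/to_dt are hypothetical, and a closed from/to interval with '9999-12-31' standing in for infinity is assumed:

-- Report query: one date in, exactly one row out.
SELECT share_id, rate
FROM   share_rate
WHERE  share_id = 101
  AND  '2006-07-20' BETWEEN from_dt AND to_dt

-- ETL when a new rate arrives on @load_dt:
DECLARE @load_dt datetime
SET @load_dt = '2006-07-25'

-- 1) Retire the current row to @load_dt - 1 day ...
UPDATE share_rate
SET    to_dt = DATEADD(dd, -1, @load_dt)
WHERE  share_id = 101
  AND  to_dt = '9999-12-31'

-- 2) ... then insert the new, open-ended current row.
INSERT INTO share_rate (share_id, rate, from_dt, to_dt)
VALUES (101, 42.50, @load_dt, '9999-12-31')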

Which design is better? I have used all of them, depending on the situation.

3. What should be the unit test plan?

There are various cases that the ETL can miss; when planning for test cases, your plan should be to test precisely those. Here are some examples of test plans (cases a and c are illustrated with queries after the list):

a. There should be only one value for a given date/date time.

b. During the initial load, when data is available for multiple days, the process should go sequentially and create the snapshots/ranges correctly.

c. At any given time there should be only one current row.

d. etc.
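For example, cases (a) and (c) can be checked with straight SQL against the hypothetical share_rate table from the Design C sketch above:

-- Test (a): no share may have two rows valid on the same date.
-- Any row returned here is an overlap defect.
SELECT a.share_id, a.from_dt, a.to_dt, b.from_dt, b.to_dt
FROM   share_rate a, share_rate b
WHERE  a.share_id = b.share_id
  AND  a.from_dt < b.from_dt          -- two distinct rows, ordered
  AND  b.from_dt <= a.to_dt           -- b starts before a ends: overlap

-- Test (c): exactly one current row per share.
SELECT share_id, COUNT(*)
FROM   share_rate
WHERE  to_dt = '9999-12-31'
GROUP BY share_id
HAVING COUNT(*) <> 1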


NOTE: This post is applicable to all ETL tools and databases, like Informatica, DataStage, Syncsort DMExpress, Sunopsis or Oracle, Sybase, SQL Server Integration Services (SSIS)/DTS, Ab Initio, MS SQL Server, RDB, etc.


What ETL performance (processing speed) is good enough?

Sunday, May 14th, 2006

Every ETL developer / data warehouse manager wants to know if the ETL processing speed (usually measured in NUMBER OF ROWS PROCESSED PER SECOND on the target side) is good enough. What is the industry standard? I can give a politically correct answer like… well, it depends…

But I would rather be blunt and say that usually 1000 rows/sec for INSERTs and 300 rows/sec for UPDATEs should be good enough. If you don't have special requirements, then any average around these numbers should make you comfortable.


Why Informatica sequences & Sybase/SQL Server identity columns should not be used

Sunday, May 7th, 2006

Informatica provides the sequence object to create surrogate keys during a load. This object is also sharable across mappings and hence can be used in parallel. But I recommend never using it.

Here's the reason why…

1. MIGRATION ISSUE: The sequence is stored in the Informatica repository, which means it is disconnected from the target database. So during migration of code between environments, the sequence has to be reset manually. A bigger problem arises when tables with data are brought from the production environment to the QA environment; the mappings cannot be run immediately because the sequences are out of sync.

2. Sequences belong to the target schema and the database, not to the processes: the table object should define and hold values for its attributes, not something external to the system.

3. At first it might seem that Sybase/SQL Server does not support sequences, but check out the post below ("Simulating Oracle Sequences in Sybase & SQL Server"); that will solve the problem.


4. Logically it seems that Informatica calls to an Oracle procedure will slow down the ETL process, but in the real world I have found that not to be true. Additionally, this procedure is only called when new reference data / a new dimension row is added. New reference data is not as volatile as transactions, so the adverse effect is nullified.

5. If identity columns are used instead, they cause more problems, as you lose programmatic control over them. For example, any Type II dimension changes become a nightmare to manage.


Simulating Oracle Sequences in Sybase & SQL Server

Wednesday, April 26th, 2006

Programmatic control is lost when identity columns are used in Sybase and SQL Server. I do not recommend using identity columns to create surrogate keys during the ETL process; there are many more reasons for that. Oracle has the sequence feature, which is used extensively by Oracle programmers, and I have no clue why other vendors do not provide the same. The custom code below has been used extensively by me and thoroughly tested: I ran multiple processes simultaneously to check for deadlocks and also made sure that the process returns different sequences to different client processes.
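For reference, this is the Oracle feature being simulated (the sequence name is just an example):

-- Oracle: create once, then fetch values as needed.
CREATE SEQUENCE customer_key_seq START WITH 1 INCREMENT BY 1;
SELECT customer_key_seq.NEXTVAL FROM dual;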

Notes:

1. The table should have 'ROW LEVEL LOCKING'.

2. The sequence generator process is stateless (see more details in object-oriented programming).

3. Create one row for each target table in the sequence master table. Do not try to use one sequence for multiple tables; it will work, but it is probably not a good idea.

Step 1: Create a table with the following structure.

CREATE TABLE sequence_master (
    sequence_nm  varchar(55) NOT NULL,
    sequence_num integer     NOT NULL
)
GO

Step 2: Create a stored procedure that will return the next sequence.

CREATE PROCEDURE p_get_next_sequence
    @sequence_name varchar(55)
AS
BEGIN
    DECLARE @sequence_num INTEGER

    -- Returns an error value (-1) if no row for this sequence
    -- has been entered into sequence_master.
    SET @sequence_num = -1

    -- Increment and read back in a single atomic UPDATE, so that
    -- concurrent callers each receive a distinct value.
    UPDATE sequence_master
    SET    @sequence_num = sequence_num = sequence_num + 1
    WHERE  sequence_nm = @sequence_name

    RETURN @sequence_num
END
GO
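A quick usage sketch (the seed row and the table name customer_dim are just examples): seed one row per target table, then have the ETL call the procedure for each new surrogate key.

-- Seed: one row per target table, starting the sequence at 0.
INSERT INTO sequence_master (sequence_nm, sequence_num)
VALUES ('customer_dim', 0)
GO

-- Fetch the next surrogate key for customer_dim.
DECLARE @next_key integer
EXEC @next_key = p_get_next_sequence 'customer_dim'
SELECT @next_key AS next_customer_key
GO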