INFORMATICA COOKBOOK

INFORMATICA DEVELOPER’S GUIDE

Author : Sastry Kolluru
Creation Date :
Last Modified :
Version : 1.00

Approvals
Stephen Musgrove :

:

Change Record

DATE         Author          Version  Reference
19-Apr-2004  Sastry Kolluru  1.00     Added section 7.7 and 7.8

Reviewers

NAME  POSITION

Table of Contents

1.0 OVERVIEW
2.0 GETTING STARTED
    2.1 ABOUT INFORMATICA
        2.1.1 Version in use
3.0 INFORMATICA DEVELOPMENT CYCLE
    3.1 STARTING A NEW PROJECT
        3.1.1 Project Initialization
        3.1.2 Login
        3.1.3 Folders and Groups setup
    3.2 DEVELOPMENT AND TESTING PROCESS
    3.3 MIGRATION TO PRODUCTION
        3.3.1 Information to be provided
        3.3.2 Review before movement
    3.4 CHANGES TO AN EXISTING PROJECT
4.0 TRANSITION OF PROJECTS FOR SUPPORT
    4.1 REQUIREMENTS FOR SUPPORT
    4.2 SUPPORT PROCESS ON FAILURE
    4.3 SUPPORT WINDOW
5.0 INFORMATICA ENVIRONMENTS
    5.1 DEVELOPMENT
    5.2 PRODUCTION
6.0 ENGINE MANAGEMENT
    6.1 MANAGING THE ENGINE
    6.2 RESTARTING THE ENGINE
7.0 BEST PRACTICES
    7.1 NAMING STANDARDS
        7.1.1 Challenge
        7.1.2 Description
    7.2 TEMPLATES
        7.2.1 Challenge
        7.2.2 Description
    7.3 USAGE OF CONNECTION OBJECTS
        7.3.1 Challenge
        7.3.2 Description
    7.4 FAILURE SCRIPTS
        7.4.1 Challenge
        7.4.2 Description
    7.5 TRUNCATING DATA
        7.5.1 Challenge
        7.5.2 Description
    7.6 BUILT-IN RE-STARTABILITY
        7.6.1 Challenge
        7.6.2 Description
    7.7 PROJECT DIRECTORY STRUCTURE IN UNIX
        7.7.1 Challenge
        7.7.2 Description
    7.8 PARAMETERIZATION OF SESSION INFORMATION
        7.8.1 Challenge
        7.8.2 Description

1.0 OVERVIEW

The objective of the Informatica Cookbook is to provide the Informatica user community at Fidelity Investments with information regarding:

- Informatica infrastructure at FEB
- Processes for the development life cycle
- Best practices/tips and techniques

The cookbook is intended as a starting point for developers so that they can understand standards, processes and best practices before starting work on the FEB Informatica infrastructure. It will also act as a refresher for experienced developers on best practices and lessons learned from other users.

We hope to update this document on a regular basis to incorporate better practices and lessons learned.

2.0 GETTING STARTED

2.1 ABOUT INFORMATICA

Informatica PowerCenter is a data integration platform for building, deploying, and managing enterprise data warehouses, and other data integration projects. Informatica PowerCenter enables users to easily transform data from disparate enterprise systems and sources into reliable information to support strategic business initiatives.

2.1.1 Version in use

The versions of Informatica currently running are 5.1 and 6.2. All new developments should be done in version 6.2. All projects currently in Informatica 5.1 will be migrated to Informatica 6.2.

3.0 INFORMATICA DEVELOPMENT CYCLE

3.1 STARTING A NEW PROJECT

3.1.1 Project Initialization

A mail has to be sent to the Informatica Support team before the start of any project. The mail should contain the following information:

1. Project Name
2. Project Contact
3. Folder Name
4. List of users accessing the folder
5. Informatica version planned to be used
6. Expected date of moving to Production
7. Expected number of sessions/mappings in the project

A minimum of 5 days' notice should be given before code is to be moved to production, so that the move can be planned.

3.1.2 Login

Every user should have a login for development as well as production. The Corp id will be used as the login for individual users. In development, users will be given access to both create and execute mappings/sessions, whereas in production only read access will be given. The request for creating a new login may come as a part of the project initialization mail, or a separate mail may be sent to the Informatica Support Group. A selective execute privilege can be requested for some sessions or workflows.

3.1.3 Folders and Groups setup

Folders will be created based on the information provided to the Informatica Support Group as a part of the project initialization process. Groups will be set up in Informatica to manage access of users to various folders.

3.2 DEVELOPMENT AND TESTING PROCESS

All development should be done in the development instance of Informatica and Oracle. Separate folders should be created in the same development repository for development, QA and SIT.

The folders marked as <folder_name>_prod will be moved to production on request. The naming convention to be followed will be as described in the Naming Convention best practice in section 7.1.

3.3 MIGRATION TO PRODUCTION

3.3.1 Information to be provided

After coding and testing have been done in development, the following information should be provided to the Informatica Development Team so as to facilitate movement of code. This also applies to enhancements and bug fixes to existing mappings/sessions.

1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same
5. Date when the movement has to be made

3.3.2 Review before movement

The Informatica Support group will review mappings and sessions before they are moved from development to production. The following are some of the important points checked:

1. Check if existing database connectors/FTP connectors/external loaders have been used
2. Check if the failure scripts have been added; refer to the Best Practices section for more details
3. Location of scripts/intermediate files and any data files
4. Location of lookup and other caches
5. If any intermediate files are being generated as a part of the process then they should be deleted at the end of the process
6. Check if any existing code or setting can cause a known bug
7. Check if the scheduling could affect the performance of existing sessions
8. Suggest process improvements to improve efficiency. Any project team could approach the Informatica Support Team during the initial stages of the project for process review. If the methods used to code may affect the existing systems then they will not be moved into production.
9. Restartability

3.4 CHANGES TO AN EXISTING PROJECT

If enhancements or bug fixes have been made to existing mappings/sessions, they need to be tested in development and the following information should be provided to the Informatica Support Team:

1. Project name
2. Folder in development
3. List of session/mapping names
4. If any scripts need to be moved then the list of the same
5. Date when the movement has to be made

4.0 TRANSITION OF PROJECTS FOR SUPPORT

4.1 REQUIREMENTS FOR SUPPORT

1. An operations document on the functionality of the sessions/mappings to be supported.
2. A re-start and recovery document explaining the actions to be taken if there is a failure, for every session/batch. It is recommended to create cleanup scripts so as to avoid unnecessary manual intervention.
3. Information on support contacts should be provided so that they can be contacted if needed. A primary contact and a secondary contact should be provided.

4.2 SUPPORT PROCESS ON FAILURE

1. On failure the session will send out a mail/Page to the support team. The Informatica support team shall follow the Re-start and recovery process provided.

2. A mail will be sent to the primary and secondary contacts summarizing the reason for failure and the action taken

4.3 SUPPORT WINDOW

Support for Informatica jobs will be provided between the following hours:

Monday to Friday
OnSite Support - 9:00AM to 6:00PM
OffShore Support - 11:00PM to 6:00PM

Saturday/Sunday and Holidays
OffShore Support - 11:00PM to 6:00PM

5.0 INFORMATICA ENVIRONMENTS

5.1 DEVELOPMENT

The Informatica Development Engine is set up on webstatdev. The repository is in Oracle and is hosted on smmk94 so that backups of it are taken from time to time. There are development instances in version 5.1 and 6.2.

PowerCenter 5.1
Repository name - EsiteDev
Repository Database -

PowerCenter 6.2
Repository name - PMNEW
Host Name - webstatdev
Port number - 5031

5.2 PRODUCTION

The Informatica Production Engine is set up on smmk94. The repository is in Oracle and is hosted on smmk94. There are production instances in version 5.1 and 6.2.

PowerCenter 5.1
Repository name - EsiteProd
Repository Database -

PowerCenter 6.2
Repository name - eSite62test
Host Name - smmk94
Port number - 5031

6.0 ENGINE MANAGEMENT

6.1 MANAGING THE ENGINE

The development and production engines shall be managed by the Informatica support Team. Information regarding planned downtime/ upgrades shall be provided to the user community from time to time.

6.2 RESTARTING THE ENGINE

A mail shall be sent to the user community before the engine is restarted. After the engine has been brought up, a confirmation will be sent so that users can double-check the status of their sessions. If the sessions have not been scheduled properly, the users should inform the Informatica support team.

7.0 BEST PRACTICES

7.1 NAMING STANDARDS

7.1.1 Challenge

Define standards to be used during development in Informatica.

7.1.2 Description

Folders

Folders are a collection of mappings, sources, targets, sessions, and batches.

Syntax:
ProjectName_phase

Description:

Phase        ‘DEV’  - Development
             ‘SIT’  - Integration Testing
             ‘UAT’  - Acceptance Testing
             ‘PROD’ - Production

ProjectName  Acronym of Group Project

Note: not all phases may be required by each development group. Additional folders can be created to meet the testing needs of the development teams.
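
As an illustration only, the folder naming rule can be checked mechanically. The following is a minimal Python sketch, not part of the Informatica tooling; the function name `folder_name` and the project acronym `SAMPLE` are hypothetical, and the sketch covers only the four phases listed above (additional testing folders would need to be added to the list).

```python
# Phases listed in the folder naming standard.
PHASES = ("DEV", "SIT", "UAT", "PROD")


def folder_name(project_acronym, phase):
    """Build a folder name of the form ProjectName_phase."""
    phase = phase.upper()
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}; expected one of {PHASES}")
    if not project_acronym:
        raise ValueError("project acronym must not be empty")
    return f"{project_acronym}_{phase}"


# "SAMPLE" is a made-up project acronym used only for illustration.
for phase in PHASES:
    print(folder_name("SAMPLE", phase))
```

Rejecting unknown phases is a design choice of this sketch; a team that uses extra testing folders may prefer to warn rather than fail.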

Ports

Ports are another name for fields. There are many kinds of ports: Input, Output, Variable, Lookup etc.

Variable port names begin with the ‘v_’ prefix. Output ports that have been added during coding should begin with the ‘o_’ prefix.

All other port names are at the discretion of the programming team.

Transforms

The names of these objects should describe what the transform does. Be as clear and concise as possible. Prefixes are:

exp_ - Expression
jnr_ - Joiner
fil_ - Filter
lkp_ - Lookup
agg_ - Aggregator
seq_ - Sequence Generator
sq_ - Source Qualifier
upd_ - Update Strategy
sp_ - Stored Procedure
nrm_ - Normalizer
rnk_ - Rank
rtr_ - Router
xsq_ - XML Source Qualifier
srt_ - Sorter
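
The prefix list lends itself to a simple automated check during code review. The sketch below is a hypothetical Python helper (the names `TRANSFORM_PREFIXES` and `has_standard_prefix` are assumptions, as are the example transform names); it only tests the prefix and that some descriptive text follows it.

```python
# Prefixes from the naming standard, keyed by transformation type.
TRANSFORM_PREFIXES = {
    "Expression": "exp_",
    "Joiner": "jnr_",
    "Filter": "fil_",
    "Lookup": "lkp_",
    "Aggregator": "agg_",
    "Sequence Generator": "seq_",
    "Source Qualifier": "sq_",
    "Update Strategy": "upd_",
    "Stored Procedure": "sp_",
    "Normalizer": "nrm_",
    "Rank": "rnk_",
    "Router": "rtr_",
    "XML Source Qualifier": "xsq_",
    "Sorter": "srt_",
}


def has_standard_prefix(transform_type, name):
    """Return True if `name` starts with the prefix for `transform_type`
    and has some descriptive text after the prefix."""
    prefix = TRANSFORM_PREFIXES[transform_type]
    return name.startswith(prefix) and len(name) > len(prefix)


# Made-up transform names, for illustration only.
print(has_standard_prefix("Lookup", "lkp_customer_region"))
print(has_standard_prefix("Filter", "Filter1"))
```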

Sources and Targets

For database tables, default Source and Target names are derived from the ODBC data source name and the table name/view name of the object in the DBMS.

For files, default Source names are derived from FLATFILE:name of file.

Mappings

There are no standards for this category of object. However, it is strongly suggested NOT to use the default name. It is suggested that all mappings begin with the letter m.

Sessions/Batches and workflows

Sessions and Batches are the descriptive components that wrap the mappings and provide the detail regarding how, when and with what sources/targets to use during a mapping execution.

Syntax:
Qualifier_Batch/SessionName

Description:
Qualifier - ‘s’ for Session
            ‘b’ for Batch
            ‘wf’ for workflow
            ‘wl’ for worklet

Batch/SessionName - Free form text, usually the Mapping Name without the prefix ‘m’.
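
A small sketch of how a session, batch, workflow or worklet name could be derived from a mapping name under this rule. It is written in Python purely for illustration; the helper `task_name` is hypothetical, and it assumes mappings are named `m_<text>` (the standard above only says that mapping names should begin with the letter m).

```python
# Qualifiers from the naming standard.
QUALIFIERS = {"session": "s", "batch": "b", "workflow": "wf", "worklet": "wl"}


def task_name(kind, mapping_name):
    """Build Qualifier_Name, dropping an assumed leading 'm_' from the mapping name."""
    if mapping_name.startswith("m_"):
        base = mapping_name[len("m_"):]
    else:
        base = mapping_name
    return f"{QUALIFIERS[kind]}_{base}"


# "m_load_customer" is a made-up mapping name.
print(task_name("session", "m_load_customer"))
print(task_name("workflow", "m_load_customer"))
```

Since the name part is free form text, a team may equally choose to keep the mapping prefix in the session name; the sketch shows only the "usual" case described above.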

Database Connections at the Server

The PowerMart™ engine requires database connections on the machine on which the engine is running. In order to establish clear connection names the following standard should be used:

For Oracle Connections:

Syntax:
database_LogonID

Description:
database - The Oracle Schema
LogonID - The user id to use when logging into the source/target

Example:
CAP1_powerm

For Sybase Connections:

Syntax:
server_database_LogonID

Description:
Server - The server name
LogonID - The user id to use when logging into the source/target

Example:
dbp1_powerm

For MS-SQLServer Connections:

Syntax:
Server_Database_LogonID

Description:
Database - The Database name
LogonID - The user id to use when logging into the source/target

Example:
dbp1_powerm

External loader at the Server

The PowerMart™ engine requires an external loader on the machine on which the engine is running, so that bulk loading utilities can be used to load data into databases. In order to establish clear loader names the following standard should be used:

For Oracle loader:

Syntax:
SQLLDR_Schema_LogonID

Description:
Schema - The Oracle Schema
LogonID - The user id to use when logging into the source/target

Example:
CAP1_powerm

7.2 TEMPLATES

7.2.1 Challenge

Develop a method by which the code in Informatica can be documented so that it is easy for development and transitioning to a support team.

7.2.2 Description

A template document has been created to document the logic in the Informatica transforms. This document will be a master list of all activities to be done. One template document will be created for every mapping. The template document consists of the following sections:

Setup
This section would contain the details of source and target, the intermediate data elements and any comments at the template level.

Process overview
This section would consist of the pictorial representation of the mapping for clarifying the data flow.

Target to source mapping
This section would have details on transformations to be done between the source and the target fields. These transformations would be mapped with respect to each target field.

Error handling
This section would contain the error conditions and the actions to be taken for each of the error conditions.

Re-start and Recovery
This section would detail the restart and recovery strategy in cases of failure.

Setup

Setup has the following details:

#    Name             Description
1.   Mapping Name     The name of the mapping document.
2.   Description      Any detailed description found necessary for the document.
3.   Source           Details the source for the mapping.
4.   Target           Details the target for the mapping.
5.   Initial Rows     The average number of records expected to be processed; this will be used for database size estimation and load window.
6.   Load Frequency   The frequency of loads; this could be daily, weekly, monthly etc.
7.   Load Window      The time period during which the upload will take place.
8.   Pre-processor    The activities to be done before processing the transformations. Any specific checks will have to be added here.
9.   Post Processing  The activities to be done after the transformation process is complete. Any specific checks will have to be added here.
10.  Remarks          Any remarks applicable at the mapping level.

Sources

#    Name             Description
1.   Tables           The source table name, the schema/owner name and any filter condition to be applied for the table. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2.   File             The source file name, the location of the file, the file type, the file format, relationship between various files and information regarding presence of header and footer.

Target

#    Name             Description
1.   Tables           The target table name and the schema/owner name. If multiple tables are present then all the table names will have to be added. The relationship between the tables will be provided in the relationship column.
2.   File             The target file name, the location of the file, the file type, the file format, relationship between various files and information regarding presence of header and footer.

Lookups

#    Name             Description
1.   Lookup name      The name of the lookup.
2.   Lookup Table     The source of data.
3.   Table Owner      The owner of the table.
4.   Lookup Columns   The columns that are to be included in the lookup.
5.   Filter           The condition to be applied to the data to be fetched from the table.
6.   Comments         The context of usage of the lookup.

Source to target mapping

#    Name                                   Description
1.   Target Table name                      The table name of the ODS table
2.   Target field name                      Field name in the target field
3.   Target datatype                        The datatype of the Target field
4.   Target mandatory                       To indicate if the field is mandatory
5.   Default value                          The default value if field is null
6.   Source Table/File name                 The table/file name of the source
7.   Source field name                      Field name in the source field
8.   Comments and detailed transformations  The details of all transformations to be done

Error Handling

Any specific error handling needs can be specified in this section of the template.

Re-start and Recovery

Any recovery needs of the mapping should be described in this section. If any special script needs to be run or data needs to be deleted before re-running a session it should be described here.

7.3 USAGE OF CONNECTION OBJECTS

7.3.1 Challenge

Define and use connection objects like database connectors, FTP connections and external loader connections so that redundancies are eliminated and management of these objects becomes easy.

7.3.2 Description

- When connecting to the database the administrative user should not be used; an application specific batch user should be used
- The naming convention to be followed is as specified in the naming convention section 7.1
- The name of the connection object in QA and production should be the same
- When using the external loader, for the external loader executable name, instead of using /webstatmmk1/oracle/product/9.2.0.2/bin/sqlldr use the shell script /webstatmmk1/ia/pm47/sh_load or /webstatmmk1/ia/pm47/sh_load_parallel_direct

7.4 FAILURE SCRIPTS

7.4.1 Challenge

Develop a mechanism by which errors can be tracked and comprehended

7.4.2 Description

Implementation of the failure script

Failure Scripts in Informatica 5.1

Failure Scripts in Informatica 6.2

General guidelines – from Failure perspective

- All sessions should have a failure call in the post processing
- If there is a requirement to call an SQL block before or after a session it is better to write it as a stored procedure and call that, rather than writing an SQL block
- It is good practice to call the stored procedure as a part of the mapping rather than calling it in a shell script
- "Run if previous successful" should be set for every session so as to avoid runaway sessions
- The "Fail parent if session fails" property should be checked in every session when coding in Informatica 6.2
- The limit on the number of acceptable errors should always be set. It should preferably be 1000.

7.5 TRUNCATING DATA

7.5.1 Challenge

Truncate data before loading, when an application user is being used to connect to the database.

7.5.2 Description

If existing data needs to be truncated and re-loaded then a procedure should be written in Oracle to truncate the data, instead of setting the "truncate before load" property at the target. By this method data can be truncated even when the Informatica sessions are connecting to the database using a non-DBA user. A sample of the procedure is given below. Only batch ids should have access to execute this procedure.

This procedure can then be called from Informatica within the mapping or in the preprocessing using a shell script.

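    CREATE OR REPLACE PROCEDURE TruncateTable (p_tname  IN VARCHAR2,
                                               p_towner IN VARCHAR2)
    IS
        v_ddl_line VARCHAR2(1000);
    BEGIN
        -- TRUNCATE is DDL, so it has to be built and run as dynamic SQL.
        v_ddl_line := 'truncate table ' || p_towner || '.' || p_tname || ' drop storage';

        EXECUTE IMMEDIATE v_ddl_line;
    EXCEPTION
        WHEN OTHERS THEN
            -- Log the error, then re-raise so that the calling session sees the failure.
            dbms_output.put_line('Error : ' || TO_CHAR(SQLCODE) || ' ' || SQLERRM);
            RAISE;
    END;
    /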

7.6 BUILT-IN RE-STARTABILITY

7.6.1 Challenge

Design sessions such that the support and maintenance effort is low.

7.6.2 Description

Sessions should be created with built-in re-startability. In case of failure it should be easy to re-start from the point of failure.

In case aggregates are being populated, existing data for the period being loaded should first be deleted before the new data is inserted.

Tasks should be broken into different sessions rather than calling all scripts as a part of one session. That way, if a given script fails, re-starting is easy.
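
The delete-before-insert rule for aggregates can be illustrated outside Informatica. The sketch below uses Python with an in-memory SQLite database as a stand-in for the real target; the table `sales_agg`, its columns and the sample figures are all hypothetical. It shows that re-running the same load for a period leaves the table in the same state instead of duplicating rows.

```python
import sqlite3


def load_period(conn, period, rows):
    """Reload the aggregates for one period; safe to re-run after a failure.

    Existing rows for `period` are deleted first and the new rows are then
    inserted in the same transaction, so a re-run never double-counts.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM sales_agg WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO sales_agg (period, region, total) VALUES (?, ?, ?)",
            [(period, region, total) for region, total in rows],
        )


# Illustrative in-memory table; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_agg (period TEXT, region TEXT, total REAL)")

rows = [("EAST", 100.0), ("WEST", 250.0)]
load_period(conn, "2004-04", rows)
load_period(conn, "2004-04", rows)  # simulated re-run after a failure

count = conn.execute("SELECT COUNT(*) FROM sales_agg").fetchone()[0]
print(count)  # 2 rows, not 4
```

In an Informatica session the delete step would typically be a pre-session SQL call or stored procedure; the point of the sketch is only the ordering of delete and insert.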

7.7 PROJECT DIRECTORY STRUCTURE IN UNIX

7.7.1 Challenge

Define a standard for organization of directories in Unix.

7.7.2 Description

All examples are for a project named sample.

The following directories should be created inside the home directory for each project:

Bin - Directory for all the scripts used in the project (E.g. /webstatmmk1/post/sample/bin)
Env - Directory for parameter and environment settings files (E.g. /webstatmmk1/post/sample/env)
Incoming - Directory where the files that act as the source for the project should reside (E.g. /webstatmmk1/post/sample/incoming)
Outgoing - Directory where the output files created by various processes should reside (E.g. /webstatmmk1/post/sample/outgoing)
Temp - Directory for temporary files created by various processes; the bad files and lookup cache files created by Informatica should also reside in this directory (E.g. /webstatmmk1/post/sample/temp)
Log - Directory for the log files generated by various processes in the project. The Informatica log files should be saved into this directory (E.g. /webstatmmk1/post/sample/log)
Archive - Directory for storing files that need to be archived as a part of the project (E.g. /webstatmmk1/post/sample/archive)
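
Creating the standard layout by script keeps projects consistent. The following Python sketch is one possible way to do it; the function names are hypothetical, and the demonstration runs against a temporary directory rather than the real /webstatmmk1/post home.

```python
import tempfile
from pathlib import Path

# Sub-directories from the project directory standard.
PROJECT_SUBDIRS = ("bin", "env", "incoming", "outgoing", "temp", "log", "archive")


def project_directories(home, project):
    """Return the standard directory paths for a project under `home`."""
    base = Path(home) / project
    return [base / sub for sub in PROJECT_SUBDIRS]


def create_project_directories(home, project):
    """Create the standard directories; existing ones are left untouched."""
    paths = project_directories(home, project)
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Demonstration in a throwaway directory instead of the real project home.
with tempfile.TemporaryDirectory() as home:
    created = create_project_directories(home, "sample")
    created_names = [p.name for p in created if p.is_dir()]
print(created_names)
```

Because `exist_ok=True` is used, the helper can be re-run safely when a new sub-directory is added to the standard.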

The directory where the log files are stored should be added to the script in the crontab that checks the number of errors and warnings in Informatica log files, so that it becomes easy to track sessions with many errors/warnings.
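
The crontab script itself is not reproduced in this document. Purely as an illustration of the idea, the Python sketch below counts error and warning lines per log file. It assumes, without confirmation from the Informatica log format, that such lines can be recognised by the words ERROR and WARNING; the function name and the sample log content are made up.

```python
import tempfile
from pathlib import Path


def count_errors_and_warnings(log_dir):
    """Return {log file name: (error_count, warning_count)} for *.log files.

    Assumption: a line counts as an error or warning if it contains the
    word ERROR or WARNING (case-insensitive). Adjust to the real log format.
    """
    results = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        n_errors = n_warnings = 0
        with log_file.open(errors="replace") as handle:
            for line in handle:
                upper = line.upper()
                if "ERROR" in upper:
                    n_errors += 1
                elif "WARNING" in upper:
                    n_warnings += 1
        results[log_file.name] = (n_errors, n_warnings)
    return results


# Demonstration on a throwaway directory with a made-up log file.
with tempfile.TemporaryDirectory() as log_dir:
    sample = Path(log_dir) / "s_m_first_sample_session.log"
    sample.write_text("INFO start\nERROR bad row\nWARNING truncated\nERROR bad row\n")
    demo_counts = count_errors_and_warnings(log_dir)
print(demo_counts)
```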

7.8 PARAMETERIZATION OF SESSION INFORMATION

7.8.1 Challenge

Session information should be parameterized as far as possible so that migration of code between dev/QA and production can be done with minimum changes. The log files, bad files, target files etc. can be kept separate for each application so that applications do not affect each other.

7.8.2 Description

The session information that can be parameterized is:

Srl. #  Session Information
1.      Session log file directory/name
2.      Source database connector
3.      Source file directory/name
4.      Target database connector
5.      Target file directory/name
6.      Reject file directory/name
7.      $Source connection value in the properties tab
8.      $Target connection value in the properties tab

A sample parameter file

    [SAMPLE.s_m_first_sample_session]
    $PMSessionLogFile=/webstatmmk1/post/sample/log/s_m_first_sample_session.log
    $DBConnection_sample_source=sample_source
    $DBConnection_sample_target=sample_target
    $RejectFile_sample=/webstatmmk1/post/sample/temp
    $TargetFileDir_test=/webstatmmk1/post/sample/outgoing
    $SrcFileDir_test=/webstatmmk1/post/sample/incoming

Parameter file header

The header should be FolderName.SessionName, enclosed in square brackets as in the sample above. The folder name is not required, but it is advisable to include it.
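
Since most values in the sample parameter file above differ between environments only in the project home directory, the file can be generated rather than hand-edited. The Python sketch below is one hypothetical way to do that; the function names are assumptions, and the parameter names and values are copied from the sample.

```python
def render_parameter_file(folder, session, parameters):
    """Return parameter file text with a [FolderName.SessionName] header."""
    lines = [f"[{folder}.{session}]"]
    lines.extend(f"{name}={value}" for name, value in parameters.items())
    return "\n".join(lines) + "\n"


def sample_parameters(project_home, session):
    """Parameters of the sample file, derived from a single project home."""
    return {
        "$PMSessionLogFile": f"{project_home}/log/{session}.log",
        "$DBConnection_sample_source": "sample_source",
        "$DBConnection_sample_target": "sample_target",
        "$RejectFile_sample": f"{project_home}/temp",
        "$TargetFileDir_test": f"{project_home}/outgoing",
        "$SrcFileDir_test": f"{project_home}/incoming",
    }


session = "s_m_first_sample_session"
text = render_parameter_file(
    "SAMPLE", session, sample_parameters("/webstatmmk1/post/sample", session)
)
print(text)
```

Passing a different project home for QA or production would yield the corresponding file without touching the session definition.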

Session log file Directory/name

The session log file name and directory can be parameterized. If only the file name needs to be parameterized, the property “Session Log File Name” needs to be set to $PMSessionLogFile. If both the log file name and the directory need to be parameterized, the property “Session Log File directory” should be left blank and the property “Session Log File Name” should be set to $PMSessionLogFile.

Database connection

The source and target database connection information can be parameterized.

Source/Target/reject File Directory/Name

The Source/Target or Reject file names can be parameterized. If only the file name needs to be parameterized, the file name property should be set to a parameter such as $TargetFileDir_test, and the value of the parameter can then be set to a different file name in the parameter file. If the file as well as the directory needs to be parameterized, the property “Output file directory” should be left blank and the file name should be populated as $TargetFileDir_test.

Session Information that cannot be parameterized using a value in the parameter file

1. Information in the transformation tab
   a. Lookup and Stored proc connection information
      i. The $Source and $Target that is defined in the properties tab can be used for the lookup and the stored proc connection information
   b. Cache file location
      i. Unix soft links should be used so that the same string can be used in Development/QA and Production
2. Parameter Filename in the properties tab, an exception being if the session is being scheduled by pmcmd. When using pmcmd the parameter file name is taken as an input parameter.
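
The soft link approach for cache file locations can be sketched as follows. This is an illustrative Python example only: the helper `point_cache_link` and the directory names are hypothetical, and the demonstration uses a temporary directory. The idea is that the session always refers to the same link path, while the link points to a different real directory in each environment.

```python
import os
import tempfile
from pathlib import Path


def point_cache_link(link_path, real_cache_dir):
    """Create (or replace) a soft link at `link_path` pointing to `real_cache_dir`."""
    link = Path(link_path)
    if link.is_symlink():
        link.unlink()  # replace an existing link; real directories are not touched
    os.symlink(real_cache_dir, link, target_is_directory=True)


# Demonstration: "cache" is the fixed path a session would use,
# "dev_cache" stands in for the environment-specific directory.
with tempfile.TemporaryDirectory() as root:
    real = Path(root) / "dev_cache"
    real.mkdir()
    link = Path(root) / "cache"
    point_cache_link(link, real)
    resolved_ok = link.resolve() == real.resolve()
print(resolved_ok)
```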
