
Informatica Power Center NOTES


Informatica Power Center

PowerCenter provides an environment that allows you to load data into a centralized location, such as a datamart, data warehouse, or operational data store (ODS). You can extract data from multiple sources, transform the data according to business logic you build in the client application, and load the transformed data into file and relational targets. PowerCenter provides the following integrated components:

PowerCenter repository. The PowerCenter repository is at the center of the PowerCenter suite. You create a set of metadata tables within the repository database that the PowerCenter applications and tools access. The PowerCenter Client and Server access the repository to save and retrieve metadata.

PowerCenter Repository Server. The PowerCenter Repository Server manages connections to the repository from client applications. It inserts, updates, and fetches objects from the repository database tables. It also maintains object consistency.

PowerCenter Client. Use the PowerCenter Client to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client has the following client applications: Repository Manager, Repository Server Administration Console, Designer, Workflow Manager, and Workflow Monitor.

PowerCenter Server. The PowerCenter Server extracts the source data, performs the data transformation, and loads the transformed data into the targets. 

Sources

PowerCenter accesses the following sources:

Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.

File. Fixed and delimited flat file, COBOL file, and XML.

Application. You can purchase additional PowerConnect products to access business sources, such as PeopleSoft, SAP R/3, Siebel, IBM MQSeries, and TIBCO.

Mainframe. You can purchase PowerConnect for Mainframe for faster access to IBM DB2 on MVS.

Other. Microsoft Excel and Access.

Note: The Designer imports relational sources, such as Microsoft Excel, Microsoft Access, and Teradata, using ODBC and native drivers.


For more information about sources, see “Working with Sources” in the Designer Guide.

Targets

PowerCenter can load data into the following targets:

Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata.

File. Fixed and delimited flat file and XML.

Application. You can purchase additional PowerConnect products to load data into SAP BW. You can also load data into IBM MQSeries message queues and TIBCO.

Other. Microsoft Access.

You can load data into targets using ODBC or native drivers, FTP, or external loaders. For more information about targets, see “Working with Targets” in the Designer Guide.

PowerCenter Repository

The PowerCenter repository resides on a relational database. The repository database tables contain the instructions required to extract, transform, and load data. PowerCenter Client applications access the repository database tables through the Repository Server. You add metadata to the repository tables when you perform tasks in the PowerCenter Client application, such as creating users, analyzing sources, developing mappings or mapplets, or creating workflows. The PowerCenter Server reads metadata created in the Client application when you run a workflow. The PowerCenter Server also creates metadata, such as start and finish times of a session or session status. You can develop global and local repositories to share metadata:

Global repository. The global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or application source definitions, reusable transformations, mapplets, and mappings.

Local repositories. A local repository is any repository within the domain that is not the global repository. Use local repositories for development. From a local repository, you can create shortcuts to objects in shared folders in the global repository. These objects typically include source definitions, common dimensions and lookups, and enterprise standard transformations. You can also create copies of objects in non-shared folders.


Version control. A versioned repository can store multiple copies, or versions, of an object. Each version is a separate object with unique properties. PowerCenter version control features allow you to efficiently develop, test, and deploy metadata into production.

You can use pmrep, a command line program, to connect to a repository and to back up, delete, or restore repositories. For more information on pmrep, see “Using pmrep”.

Repository Server

The Repository Server manages repository connection requests from client applications. For each repository database registered with the Repository Server, it configures and manages a Repository Agent process. The Repository Server also monitors the status of running Repository Agents, and sends repository object notification messages to client applications.

The Repository Agent is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables. The Repository Agent ensures the consistency of metadata in the repository by employing object locking.

PowerCenter Client

The PowerCenter Client consists of the following applications that you use to manage the repository, design mappings and mapplets, and create sessions to load the data:

Repository Server Administration Console. Use the Repository Server Administration Console to administer the Repository Servers and repositories.

Repository Manager. Use the Repository Manager to administer the metadata repository. You can create repository users and groups, assign privileges and permissions, and manage folders and locks.

Designer. Use the Designer to create mappings that contain transformation instructions for the PowerCenter Server. Before you can create mappings, you must add source and target definitions to the repository. The Designer has five tools that you use to analyze sources, design target schemas, and build source-to-target mappings:

o Source Analyzer. Import or create source definitions.

o Warehouse Designer. Import or create target definitions.


o Transformation Developer. Develop reusable transformations to use in mappings.

o Mapplet Designer. Create sets of transformations to use in mappings.

o Mapping Designer. Create mappings that the PowerCenter Server uses to extract, transform, and load data.

Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the links connecting the tasks. You can run a task by placing it in a workflow.

Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each PowerCenter Server. You can choose a Gantt Chart or Task view. You can also access details about those workflow runs.
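The Workflow Manager entry above says the PowerCenter Server runs workflow tasks according to the links that connect them. As a rough mental model only, the Python sketch below treats a workflow as a dependency graph. It groups tasks into waves, where a wave can start once every upstream task has finished. The task names are hypothetical. The sketch assumes a task waits for all of its incoming links and ignores link conditions, failures, and scheduling, so it does not describe how the PowerCenter Server itself is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks linked into it.
# Assumes a task starts only after ALL of its upstream tasks finish.
workflow_links = {
    "Start": set(),
    "s_load_customers": {"Start"},
    "s_load_products": {"Start"},
    "s_load_sales": {"s_load_customers", "s_load_products"},
    "cmd_archive_files": {"s_load_sales"},
}


def run_waves(links: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave have no link
    between them, so they could run concurrently."""
    sorter = TopologicalSorter(links)
    sorter.prepare()  # raises graphlib.CycleError if the links form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking successors
    return waves


for number, wave in enumerate(run_waves(workflow_links), start=1):
    print(number, wave)
```

The two session loads land in the same wave because neither links to the other. The final task waits for the sales load.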

Install the client tools on a Microsoft Windows machine. For more information about installation requirements, see Minimum System Requirements.

PowerCenter Server

The PowerCenter Server reads mapping and session information from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules that you configure in the mapping. The PowerCenter Server loads the transformed data into the mapping targets.

The PowerCenter Server can achieve high performance using symmetric multi-processing systems. It can start and run multiple workflows concurrently. It can also concurrently process partitions within a single session. When you create multiple partitions within a session, the PowerCenter Server creates multiple database connections to a single source and extracts a separate range of data for each connection, according to the properties you configure.

Database Connections

The Repository Server maintains a pool of reusable database connections for serving client applications. The server generates a Repository Agent process for each repository database. The Repository Agent creates new database connections only if all the current connections are in use.

For example, if 10 clients send requests to the Repository Agent one at a time, the agent requires only one connection. It reuses the same database connection for all the requests. If the 10 clients send requests simultaneously, the Repository Agent opens 10 connections. You can set the maximum number of open connections using the DatabasePoolSize parameter in the repository configuration file.
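The pooling rule above has three parts: reuse an idle connection, open a new one only when every existing connection is busy, and cap the total with DatabasePoolSize. The Python sketch below is a toy model of that rule, not the Repository Agent's code. The class name is hypothetical and connections are plain integers. Raising an error at the cap is an assumption, since the text does not say what the agent does when the limit is reached. The cap of 20 is an arbitrary value, not a documented default.

```python
class ToyConnectionPool:
    """Hypothetical model of the Repository Agent pooling rule."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size    # stands in for DatabasePoolSize
        self.idle: list[int] = []   # open connections not currently in use
        self.opened = 0             # total connections opened so far

    def acquire(self) -> int:
        if self.idle:
            return self.idle.pop()  # reuse an idle connection first
        if self.opened >= self.max_size:
            # Assumption: fail at the cap (real behavior not documented here).
            raise RuntimeError("connection limit reached")
        self.opened += 1            # every connection is busy: open another
        return self.opened

    def release(self, conn: int) -> None:
        self.idle.append(conn)      # keep the connection open for reuse


# Ten clients, one at a time: the single connection is reused each time.
one_at_a_time = ToyConnectionPool(max_size=20)
for _ in range(10):
    conn = one_at_a_time.acquire()
    one_at_a_time.release(conn)
print(one_at_a_time.opened)  # 1

# Ten clients at once: no connection is free, so ten are opened.
all_at_once = ToyConnectionPool(max_size=20)
in_use = [all_at_once.acquire() for _ in range(10)]
print(all_at_once.opened)    # 10
```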

For a session, a reader object holds the connection for as long as it needs to read the data from the source tables. A writer object holds a connection for as long as it needs to write data to the target tables.

The PowerCenter Server maintains a database connection pool for stored procedure or lookup databases in a workflow. You can optionally set the MaxLookupSPDBConnections parameter to limit connections when you configure the PowerCenter service. By default, the PowerCenter Server allows an unlimited number of connections to lookup or stored procedure databases. If a database user does not have permission for the number of connections a session requires, the session fails.

For pre-session, post-session, and load stored procedures, consecutive stored procedures reuse a connection if they have identical connection attributes. Otherwise, the connection for one stored procedure closes and a new connection begins for the next stored procedure.

PowerCenter Metadata Reporter

PowerCenter provides PowerCenter Metadata Reporter, a web-based application that allows you to run reports against PowerCenter repository metadata. It gives you insight into your repository, which enhances your ability to analyze and manage your repository efficiently. The Metadata Reporter provides a number of reports, including reports on transformations, mapplets, mappings, sources, targets, sessions, worklets, and workflows.

Using the Repository Server Administration Console

Use the Repository Server Administration Console to administer your Repository Servers and repositories. A Repository Server can manage multiple repositories. You use the Repository Server Administration Console to create and administer the repository through the Repository Server. You can use the Administration Console to perform the following tasks:

o Add, edit, and remove repository configurations.

o Export and import repository configurations.

o Create a repository.

o Promote a local repository to a global repository.

o Copy a repository.

o Delete a repository from the database.

o Back up and restore a repository.

o Start, stop, enable, and disable repositories.

o Send repository notification messages.

o Register and unregister a repository.

o Propagate domain connection information for a repository.

o View repository connections and locks.

o Close repository connections.

o Register and remove repository plug-ins.

o Upgrade a repository.

Repository Objects

You create repository objects using the Repository Manager, Designer, and Workflow Manager client tools. You can view the following objects in the Navigator window of the Repository Manager:

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions. Definitions of database objects or files that contain the target data.

Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.

Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the PowerCenter Server uses to transform and move data.

Reusable transformations. Transformations that you can use in multiple mappings.

Mapplets. A set of transformations that you can use in multiple mappings.

Sessions and workflows. Sessions and workflows store information about how and when the PowerCenter Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

The Design Process

The goal of the design process is to create mappings that depict the flow of data between sources and targets, including changes made to the data before it reaches the targets. However, before you can create a mapping, you must first create or import source and target definitions. You might also want to create reusable objects, such as reusable transformations or mapplets. For a list of objects you create in the design process, see Repository Objects.

Perform the following design tasks in the Designer:

1. Import source definitions. Use the Source Analyzer to connect to the sources and import the source definitions.

2. Create or import target definitions. Use the Warehouse Designer to define relational, flat file, or XML targets to receive data from sources. You can import target definitions from a relational database or a flat file, or you can manually create a target definition.

3. Create the target tables. If you add a target definition to the repository that does not exist in a relational database, you need to create target tables in your target database. You do this by generating and executing the necessary SQL code within the Warehouse Designer.

4. Design mappings. Once you have source and target definitions in the repository, you can create mappings in the Mapping Designer. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. A transformation is an object that performs a specific function in a mapping, such as looking up data or performing aggregation.

5. Create mapping objects. Optionally, you can create reusable objects for use in multiple mappings. Use the Transformation Developer to create reusable transformations. Use the Mapplet Designer to create mapplets. A mapplet is a set of transformations that may contain sources and transformations.

6. Debug mappings. Use the Mapping Designer to debug a valid mapping to gain troubleshooting information about data and error conditions.

7. Import and export repository objects. You can import and export repository objects, such as sources, targets, transformations, mapplets, and mappings to archive or share metadata. 
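In PowerCenter you build a mapping graphically in the Designer rather than writing it as code. The data flow it describes can still be illustrated in plain Python: source rows pass through transformations such as a lookup and an aggregation, then reach a target. Everything in this sketch is hypothetical. The column names, the customer-to-region lookup data, and the function names only stand in for a source definition, a Lookup transformation, an Aggregator transformation, and a target definition.

```python
from collections import defaultdict

# Hypothetical source definition: rows as they might be read from a flat file.
source_rows = [
    {"order_id": 1, "cust_id": "C1", "amount": 120.0},
    {"order_id": 2, "cust_id": "C2", "amount": 80.0},
    {"order_id": 3, "cust_id": "C1", "amount": 30.0},
]

# Hypothetical lookup data, such as a customer dimension table.
customer_region = {"C1": "EAST", "C2": "WEST"}


def lookup_region(rows):
    """Lookup-style step: add a column taken from reference data."""
    for row in rows:
        yield {**row, "region": customer_region.get(row["cust_id"], "UNKNOWN")}


def aggregate_by_region(rows):
    """Aggregator-style step: sum the amount for each region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return [{"region": r, "total_amount": t} for r, t in sorted(totals.items())]


# Target: the rows a session would load (here, just a Python list).
target_rows = aggregate_by_region(lookup_region(source_rows))
print(target_rows)
# [{'region': 'EAST', 'total_amount': 150.0}, {'region': 'WEST', 'total_amount': 80.0}]
```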

Workflow Manager

The Workflow Manager consists of three tools to help you develop a workflow:

Task Developer. Create tasks you want to accomplish in the workflow in the Task Developer.

Workflow Designer. Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.


Worklet Designer. Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can nest multiple worklets inside a workflow.

Before you create a workflow, you must configure the following connection information:

PowerCenter Server connection. Register the PowerCenter Server with the repository before you can start it or create a session to run against it.

Database connections. Create connections to source and target systems.

Other connections. If you want to use external loaders or FTP, you configure these connections in the Workflow Manager.

Workflow Monitor

After you create a workflow, you run the workflow in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Monitor is a tool that displays details about workflow runs in two views, Gantt Chart view and Task view. You can monitor workflows in online and offline modes. The Workflow Monitor consists of the following windows:

Navigator window. Displays monitored repositories, servers, and repository objects.

Output window. Displays messages from the PowerCenter Server.

Time window. Displays progress of workflow runs.

Gantt Chart view. Displays details about workflow runs in chronological format.

Task view. Displays details about workflow runs in a report format.

Getting Started

Before you can begin using PowerCenter, you must create the environment and perform the following administration tasks to allow access to the repository and the PowerCenter Server:

1. Configure the sources. If you extract data from relational sources, ask the database administrator to create user profiles with read access. These user profiles allow you to import source definitions into the repository and access the sources at runtime.

If you extract data from file sources, the files must be accessible to the PowerCenter Server and Client machines.

2. Configure the targets. Ask the database administrator to create user profiles with read and write access. These user profiles allow you to import target definitions into the repository and write to the targets at runtime.


If the target database does not exist, create it using the database administration tools included with your RDBMS. After you create the target database, you can use the Designer to design and create target tables. For flat file targets, you need a target directory large enough to process the resulting files.

3. Choose globalization settings and data movement modes. The data movement mode you use depends on whether you want the PowerCenter Server to process single-byte data or multibyte character data. You select code pages for the repository, PowerCenter Client and PowerCenter Server.

4. Create repository database. Create a database for the repository. Users accessing the repository database need full rights in that database. If you upgrade the repository to a new version, you need database rights to drop or modify the repository tables.

5. Install the PowerCenter Client. Install the client software on a machine that accesses the sources, targets, and repository databases, as well as the PowerCenter Server.

6. Install and configure the Repository Server. Install and configure the Repository Server on a machine that accesses the repository database, the PowerCenter Client, and the PowerCenter Server.

7. Install and configure the PowerCenter Server. Install the PowerCenter Server on a Windows or UNIX system that accesses the sources, targets, and the repository database.

8. Configure connectivity. Configure network, native, and ODBC connectivity. Create ODBC data sources to connect the PowerCenter Client to the sources and targets. You must also have network connections between all databases and PowerCenter Servers.

9. Create the repository. After you configure connectivity between source, target, and repository databases, you can create the metadata repository. Connect to the Repository Server from within the Repository Server Administration Console to create the metadata repository. The Repository Server connects to the repository database and runs the SQL to create the repository tables. All the objects you create with PowerCenter are stored as metadata in the repository.

10. Create repository users and groups. Create groups and user profiles, then assign privileges and permissions that determine tasks that users can perform.


11. Register the PowerCenter Server. Before you can start the PowerCenter Server, you must register the PowerCenter Server so the Workflow Manager can direct the PowerCenter Server to the repository.

Changing Data Movement Modes

You can change the PowerCenter Server data movement mode in the PowerCenter Server configuration parameters. After you change the data movement mode, the PowerCenter Server runs in the new data movement mode the next time you start the PowerCenter Server. When the data movement mode changes, the PowerCenter Server handles character data differently. To avoid creating data inconsistencies in your target tables, the PowerCenter Server performs additional checks for sessions that reuse session caches and files.

Table 2-1 describes how the PowerCenter Server handles session files and caches after you change the data movement mode:

Table 2-1. Session and File Cache Handling After Data Movement Mode Change

Session Log File (*.log)
Time of creation or use: Each session.
Behavior after mode change: No change in behavior. Creates a new session log for each session using the PowerCenter Server code page.

Workflow Log
Time of creation or use: Each workflow.
Behavior after mode change: No change in behavior. Creates a new workflow log file for each workflow using the PowerCenter Server code page.

Reject File (*.bad)
Time of creation or use: Each session.
Behavior after mode change: No change in behavior. Appends rejected data to the existing reject file using the PowerCenter Server code page.

Output File (*.out)
Time of creation or use: Sessions writing to flat file.
Behavior after mode change: No change in behavior for delimited flat files. Creates a new output file for each session using the target code page.

Indicator File (*.in)
Time of creation or use: Sessions writing to flat file.
Behavior after mode change: No change in behavior. Creates a new indicator file for each session.

Incremental Aggregation Files (*.idx, *.dat)
Time of creation or use: Sessions with Incremental Aggregation enabled.
Behavior after mode change: When files are removed or deleted, the PowerCenter Server creates new files. When files are not removed or deleted, the PowerCenter Server fails the session with the following error message:

TE_7038 Aggregate Error: ServerMode: [server data movement mode] and CachedMode: [data movement mode that created the files] mismatch.

You should also remove or delete files created using a different code page.

Unnamed Persistent Lookup Files (*.idx, *.dat)
Time of creation or use: Sessions with a Lookup transformation configured for an unnamed persistent lookup cache.
Behavior after mode change: Rebuilds the persistent lookup cache.

Named Persistent Lookup Files (*.idx, *.dat)
Time of creation or use: Sessions with a Lookup transformation configured for a named persistent lookup cache.
Behavior after mode change: Fails the session.
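The Incremental Aggregation entry above amounts to a small decision rule, restated in the Python sketch below under stated assumptions. The function and exception names are hypothetical. The sketch assumes the mode that created the cache files is known, with None meaning the *.idx and *.dat files were removed. ASCII and Unicode are the two PowerCenter Server data movement modes, and the error text follows the TE_7038 message quoted in the table. Two further limits apply. When the modes match, the sketch simply assumes the files are reused as usual. It also does not model the separate advice about files created with a different code page.

```python
from enum import Enum
from typing import Optional


class MovementMode(Enum):
    ASCII = "ASCII"
    UNICODE = "Unicode"


class SessionFailed(Exception):
    """Hypothetical stand-in for a failed session."""


def incremental_aggregation_action(server_mode: MovementMode,
                                   cached_mode: Optional[MovementMode]) -> str:
    """Decide what happens to incremental aggregation cache files.

    cached_mode is None when the cache files were removed or deleted;
    otherwise it is the data movement mode that created them (assumed known).
    """
    if cached_mode is None:
        return "create new cache files"
    if cached_mode is not server_mode:
        raise SessionFailed(
            f"TE_7038 Aggregate Error: ServerMode: [{server_mode.value}] "
            f"and CachedMode: [{cached_mode.value}] mismatch."
        )
    return "reuse existing cache files"  # assumption: no mode change


print(incremental_aggregation_action(MovementMode.UNICODE, None))
try:
    incremental_aggregation_action(MovementMode.UNICODE, MovementMode.ASCII)
except SessionFailed as err:
    print(err)
```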

Code Page Overview

A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set. You use code pages to identify data that might be in different languages. For example, if you are importing Japanese data into a mapping, you must select a Japanese code page for the source data.


When you choose a code page, the program or application for which you set the code page refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.
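The Python sketch below shows what choosing a code page means in practice. It uses Python's built-in codecs as stand-ins for code pages; PowerCenter has its own code page definitions, which this sketch does not use. It checks whether Japanese sample text can be represented in each code page. It also shows that the same character is assigned different byte values by two Japanese encodings. The helper name can_encode is hypothetical.

```python
def can_encode(text: str, code_page: str) -> bool:
    """Return True if every character in text has a mapping in code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


sample = "日本語データ"  # Japanese sample data

for code_page in ("shift_jis", "euc_jp", "latin-1", "ascii"):
    print(f"{code_page:<10} {can_encode(sample, code_page)}")

# The same character is assigned different numbers in different code pages.
print("日".encode("shift_jis").hex(), "日".encode("euc_jp").hex())
```

Both Japanese code pages can represent the sample and the two Western ones cannot. This is why the source code page must match the data it describes.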

Table 2-2. Code Page Compatibility

Source (including relational, flat file, and XML file)
Code page compatibility: Subset of target. Subset of PowerCenter Server.

Target (including relational, XML files, and flat files)
Code page compatibility: Superset of source. Superset of PowerCenter Server. PowerCenter Server creates external loader data and control files using the target flat file code page.

Lookup and Stored Procedures
Code page compatibility: Compatible with PowerCenter Server and repository.

PowerCenter Server
Code page compatibility: Superset of source. Subset of target. Identical to PowerCenter Server operating system and machine hosting pmcmd. Compatible with repository and PowerCenter Client. Compatible with database connection code page used by Lookup and Stored Procedure transformations.

Repository Server
Code page compatibility: Compatible with repository. Compatible with PowerCenter Client and PowerCenter Server.

Global Repository
Code page compatibility: Compatible with local repository. Can also be a subset of local repository. Compatible with PowerCenter Client and Server.

Local Repository
Code page compatibility: Compatible with global repository. Can also be a superset of global repository. Compatible with PowerCenter Client and Server.

Standalone Repository
Code page compatibility: Compatible with PowerCenter Client and Server.

PowerCenter Client
Code page compatibility: Compatible with PowerCenter Server and repository.

Machine hosting pmcmd
Code page compatibility: Identical to PowerCenter Server.
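Once each code page is treated as a set of characters, the subset and superset rules in Table 2-2 can be checked mechanically. The Python sketch below does this for single-byte code pages only. It again uses Python codecs as stand-ins for PowerCenter code pages and checks two of the rules. The first is that the source code page must be a subset of the PowerCenter Server code page. The second is that the server code page must be a subset of the target code page. The function names are hypothetical.

```python
def repertoire(code_page: str) -> set[str]:
    """Characters a single-byte code page can represent, found by decoding
    every byte value 0-255 and skipping bytes the code page leaves undefined."""
    chars = set()
    for value in range(256):
        try:
            chars.add(bytes([value]).decode(code_page))
        except UnicodeDecodeError:
            pass
    return chars


def compatibility_problems(source: str, server: str, target: str) -> list[str]:
    """Check source <= server <= target, following Table 2-2."""
    src, srv, tgt = repertoire(source), repertoire(server), repertoire(target)
    problems = []
    if not src <= srv:
        problems.append(f"source {source} is not a subset of server {server}")
    if not srv <= tgt:
        problems.append(f"server {server} is not a subset of target {target}")
    return problems


print(compatibility_problems("ascii", "latin-1", "latin-1"))
# []
print(compatibility_problems("latin-1", "latin-1", "cp1252"))
# ['server latin-1 is not a subset of target cp1252']
```

The second check fails even though both code pages are Western European. Bytes 0x80 to 0x9F are control characters in Latin-1 but mostly printable characters, such as the euro sign, in cp1252.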

PowerCenter Server Variable Directories

The installation program creates the following directories under the installation directory to store session files and caches associated with each PowerCenter Server:

BadFiles, Cache, ExtProc, LkpFiles, SessLogs, SrcFiles, Temp, TgtFiles, WorkflowLogs

All workflows use these directories by default.

Server Variables

You can define server variables for each PowerCenter Server you register. Server variables define the path and directories for session and workflow output files and caches. You can also use server variables to define workflow properties, such as the number of workflow logs to archive.

The installation process creates default directories in the location where you install the PowerCenter Server. By default, the PowerCenter Server writes output files in these directories when you run a workflow. To use these directories as the default location for the session and workflow output files, you must configure the server variable $PMRootDir to define the path to the directories.

Sessions and workflows are configured to use server directories by default. You can override the default by entering different directories in the session or workflow properties. For example, you might have a PowerCenter Server running all workflows in a repository. If you define the server variable for the workflow logs directory as c:\pmserver\workflowlog, the PowerCenter Server saves the workflow log for each workflow in c:\pmserver\workflowlog by default.

If you change the default server directories, make sure the designated directories exist before running a workflow. If the PowerCenter Server cannot resolve a directory during the workflow, it cannot run the workflow.

By using server variables instead of hard-coding directories and parameters, you simplify the process of changing the PowerCenter Server that runs a workflow. If each workflow in a development folder uses server variables, then when you copy the folder to a production repository, the production server can run the workflow as configured. When the production server runs the workflow, it uses the directories configured for its server variables. If, instead, you configure workflows to use hard-coded directories, the workflows fail if those directories do not exist on the production server.

Table 11-1 lists the server variables you configure when you register a PowerCenter Server:

Table 11-1. Server Variables

$PMRootDir (Required). A root directory to be used by any or all other server variables. Informatica recommends you use the PowerCenter Server installation directory as the root directory.

$PMSessionLogDir (Required). Default directory for session logs. Defaults to $PMRootDir/SessLogs.

$PMBadFileDir (Required). Default directory for reject files. Defaults to $PMRootDir/BadFiles.

$PMCacheDir (Required). Default directory for the lookup cache, index and data caches, and index and data files. To avoid performance problems, always use a drive local to the PowerCenter Server for the cache directory. Do not use a mapped or mounted drive for cache files. Defaults to $PMRootDir/Cache.

$PMTargetFileDir (Required). Default directory for target files. Defaults to $PMRootDir/TgtFiles.

$PMSourceFileDir (Required). Default directory for source files. Defaults to $PMRootDir/SrcFiles.

$PMExtProcDir (Required). Default directory for external procedures. Defaults to $PMRootDir/ExtProc.

$PMTempDir (Required). Default directory for temporary files. Defaults to $PMRootDir/Temp.

$PMSuccessEmailUser (Optional). Email address to receive post-session email when the session completes successfully. Use to address post-session email.

$PMFailureEmailUser (Optional). Email address to receive post-session email when the session fails. Use to address post-session email. Default is an empty string. For details, see “Sending Emails” in the Workflow Administration Guide.

$PMSessionLogCount (Optional). Number of session logs the PowerCenter Server archives for the session. Defaults to 0. Use to archive session logs. For details, see “Log Files” in the Workflow Administration Guide.

$PMSessionErrorThreshold (Optional). Number of non-fatal errors the PowerCenter Server allows before failing the session. Non-fatal errors include reader, writer, and DTM errors. If you want to stop the session on errors, enter the number of non-fatal errors you want to allow before stopping the session. The PowerCenter Server maintains an independent error count for each source, target, and transformation. Use to configure the Stop On option in the session properties. Defaults to 0. If you use the default setting, non-fatal errors do not cause the session to stop.

$PMWorkflowLogDir (Required). Default directory for workflow logs. Defaults to $PMRootDir/WorkflowLogs.

$PMWorkflowLogCount (Optional). Number of workflow logs the PowerCenter Server archives for the workflow. Use to archive workflow logs. Defaults to 0.

$PMLookupFileDir (Optional). Default directory for lookup files. Defaults to $PMRootDir/LkpFiles.
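Every directory default in Table 11-1 is defined relative to $PMRootDir, and that is what makes a workflow portable between servers. The Python sketch below resolves the directory variables for two hypothetical servers with different root directories. It also shows an explicit value taking precedence over the default. The function name and all paths are hypothetical. The sketch only does simple text substitution with forward slashes; it is not the PowerCenter Server's own resolution logic.

```python
from typing import Dict, Optional

# Directory defaults from Table 11-1, relative to $PMRootDir.
DEFAULT_SUBDIRS = {
    "$PMSessionLogDir": "SessLogs",
    "$PMBadFileDir": "BadFiles",
    "$PMCacheDir": "Cache",
    "$PMTargetFileDir": "TgtFiles",
    "$PMSourceFileDir": "SrcFiles",
    "$PMExtProcDir": "ExtProc",
    "$PMTempDir": "Temp",
    "$PMWorkflowLogDir": "WorkflowLogs",
    "$PMLookupFileDir": "LkpFiles",
}


def resolve_server_dirs(root_dir: str,
                        overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Expand each directory variable, preferring an explicit value."""
    overrides = overrides or {}
    resolved = {"$PMRootDir": root_dir}
    for name, subdir in DEFAULT_SUBDIRS.items():
        value = overrides.get(name, f"$PMRootDir/{subdir}")
        resolved[name] = value.replace("$PMRootDir", root_dir)
    return resolved


# The same session setting, $PMSessionLogDir, resolves differently per server.
dev = resolve_server_dirs("/opt/pmserver_dev")
prod = resolve_server_dirs("/opt/pmserver_prod")
print(dev["$PMSessionLogDir"])   # /opt/pmserver_dev/SessLogs
print(prod["$PMSessionLogDir"])  # /opt/pmserver_prod/SessLogs

# An explicitly defined directory replaces the default.
custom = resolve_server_dirs("/opt/pmserver_prod",
                             {"$PMWorkflowLogDir": "/data/pm/workflowlog"})
print(custom["$PMWorkflowLogDir"])  # /data/pm/workflowlog
```

An explicit value may itself refer to $PMRootDir, for example "$PMRootDir/Logs", and it is expanded the same way.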