
INFORMATICA

Overview

A data warehouse is a collection of subject-oriented databases. It is supported by a series of processes, procedures and tools (hardware and software). From the data warehouse, data flows to various customized databases. If this data is periodically extracted from the data warehouse and loaded into local databases, each local database is called a data mart.

Figure: Complete Warehouse Solution Architecture. Data sources (operational data, legacy data, The Post, VISA, external data sources) feed an organizationally structured enterprise data warehouse through extract, transform and load. Departmentally structured data marts (Sales, Inventory, Purchase) are built from the warehouse. Metadata spans the data sources, data management and access layers, and the diagram is annotated with the stages data, information and knowledge and with the phases asset assembly (and management) and asset exploitation.

Use of Informatica in Datawarehousing

The data in a data warehouse comes from various sources running on different platforms. An ETL tool is used to integrate data from these sources and load it into the data warehouse. Informatica is an ETL tool used to extract data, transform it and load it into a data warehouse. Informatica has two products to carry out this ETL process:

PowerCenter

PowerMart
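The extract, transform and load steps can be illustrated with a minimal Python sketch. This is not Informatica code: the two in-memory sources, their column names and the `extract`/`transform`/`load` functions are hypothetical, and a plain Python list stands in for the warehouse table.

```python
import csv
import io

# Hypothetical source 1: a flat file (CSV text) produced on one platform.
ORDERS_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,80.00
"""

# Hypothetical source 2: rows from a legacy relational table on another platform.
LEGACY_ROWS = [(3, "CAROL", "45.25")]


def extract():
    """Read both sources into one list of dicts with a common shape."""
    rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    rows += [
        {"order_id": str(oid), "customer": cust, "amount": amt}
        for oid, cust, amt in LEGACY_ROWS
    ]
    return rows


def transform(rows):
    """Standardize types and formatting so rows from both sources can be integrated."""
    return [
        {
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().lower(),
            "amount": round(float(r["amount"]), 2),
        }
        for r in rows
    ]


def load(rows, warehouse):
    """Append the cleaned rows to the target table (here, a list)."""
    warehouse.extend(rows)
    return len(rows)


warehouse_table = []
loaded = load(transform(extract()), warehouse_table)
print(loaded, warehouse_table[-1])
```

Keeping the three steps as separate functions mirrors the separation the text describes: sources can change without touching the load step.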

Figure: Overview. The server reads data from the source, transforms it according to instructions stored in the repository, and writes the transformed data to the target.

Components

Informatica PowerCenter has the following components:

• ODBC

• PowerCenter Server: an application that reads data from sources, transforms it and writes it to targets.

• PowerCenter Client: the client has five different tools:

The Source Analyzer: used to add source definitions to the repository.

The Warehouse Designer: used to create targets and add their definitions to the repository.

The Transformation Developer: used to create reusable transformations.

The Mapplet Designer: used to create mapplets.

The Mapping Designer: used to create mappings from sources to targets.


Connectivity and Setup

Configuring Server Manager

• Informatica Server name

• Type of network protocol to access the server – TCP/IP or IPX/SPX

• Port number on which the client communicates with the server (for TCP/IP): 4001

• Address of the machine on which the server runs (for IPX/SPX)

• Timeout – number of seconds the Server Manager waits for a response from the Informatica Server

• Default directories for session files and caches, e.g. $PMRootDir, $PMSessionLogDir, $PMBadFileDir

• Defining Database Connections

• Defining FTP connections
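The connection settings listed above can be modelled as a small validated record. This is a hypothetical sketch, not an Informatica configuration format: the `ServerConnection` class, the 60-second default timeout and the directory paths are assumptions for illustration. Only the protocol rule (a port for TCP/IP, a machine address for IPX/SPX) and the `$PM...` variable names come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerConnection:
    """Hypothetical model of the settings entered when configuring the Server Manager."""
    server_name: str
    protocol: str                  # "TCP/IP" or "IPX/SPX"
    port: Optional[int] = None     # needed for TCP/IP (the text above uses 4001)
    address: Optional[str] = None  # needed for IPX/SPX
    timeout_seconds: int = 60      # arbitrary default chosen for this sketch

    def validate(self) -> None:
        """Raise ValueError if the settings do not fit the chosen protocol."""
        if self.protocol == "TCP/IP":
            if self.port is None or not 0 < self.port < 65536:
                raise ValueError("TCP/IP connections need a valid port number")
        elif self.protocol == "IPX/SPX":
            if not self.address:
                raise ValueError("IPX/SPX connections need the server machine address")
        else:
            raise ValueError(f"unsupported protocol: {self.protocol}")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout must be a positive number of seconds")


# Default directories keyed by the server variables named above;
# the paths themselves are illustrative only.
SESSION_DIRS = {
    "$PMRootDir": "/opt/informatica",
    "$PMSessionLogDir": "$PMRootDir/SessLogs",
    "$PMBadFileDir": "$PMRootDir/BadFiles",
}


def resolve_dir(name: str) -> str:
    """Expand a leading $PMRootDir reference in a directory setting."""
    root = "$PMRootDir"
    value = SESSION_DIRS[name]
    if name != root and value.startswith(root):
        return SESSION_DIRS[root] + value[len(root):]
    return value


conn = ServerConnection("etl_server", "TCP/IP", port=4001, timeout_seconds=30)
conn.validate()
print(resolve_dir("$PMBadFileDir"))  # /opt/informatica/BadFiles
```

Defining the other directories relative to `$PMRootDir` means moving the installation only requires changing one setting.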

Features

• Informatica Server: reads data from sources, transforms it as instructed by the repository metadata, and writes it to targets.

• Repository Manager: used to create and manage repositories.

A repository is a database containing a set of instructions that specify where to get data (source), how to process or transform it, and where to write it (target). This set of instructions is called metadata.

With the Repository Manager you can create repository users and groups, assign privileges and permissions, manage folders and locks, and import and export from ODBC data sources.

• Designer: used to create mappings and target tables.

• Server Manager: used to create sessions and configure the schedule for running them.
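The idea that the server acts on instructions stored as metadata can be sketched as a tiny interpreter. Everything here is hypothetical: the `REPOSITORY` dictionary, the `filter`/`upper` instruction types and the in-memory sources and targets stand in for a real repository database and real systems.

```python
# Hypothetical repository: declarative metadata describing one mapping.
REPOSITORY = {
    "m_load_customers": {
        "source": "src_customers",
        "transformations": [
            {"type": "filter", "column": "active"},  # keep rows where the column is truthy
            {"type": "upper", "column": "name"},     # upper-case one column
        ],
        "target": "tgt_customers",
    }
}

# Stand-ins for real source and target systems.
SOURCES = {
    "src_customers": [
        {"name": "ann", "active": True},
        {"name": "ben", "active": False},
    ]
}
TARGETS = {"tgt_customers": []}


def apply_step(rows, step):
    """Interpret one transformation instruction from the metadata."""
    col = step["column"]
    if step["type"] == "filter":
        return [r for r in rows if r[col]]
    if step["type"] == "upper":
        return [{**r, col: r[col].upper()} for r in rows]
    raise ValueError(f"unknown transformation type: {step['type']}")


def run_mapping(name):
    """Read the source, transform it as the metadata instructs, write to the target."""
    meta = REPOSITORY[name]
    rows = list(SOURCES[meta["source"]])
    for step in meta["transformations"]:
        rows = apply_step(rows, step)
    TARGETS[meta["target"]].extend(rows)
    return len(rows)


loaded = run_mapping("m_load_customers")
print(loaded, TARGETS["tgt_customers"])  # 1 [{'name': 'ANN', 'active': True}]
```

Because the mapping is data rather than code, changing what the "server" does only requires editing the repository entry.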

Repository User Management

Multiple developers can use the same repository to create and manage multiple projects or the same project. Informatica allows you to create a separate user profile, with its own username and password, for each developer.

Privileges such as Administer Server, Create Sessions and Use Designer can be assigned to each user on the repository. Groups of users can be created and privileges can be granted to the groups. A user can be a member of one or more groups.

Access can be restricted to individual folders within a repository. The following types of folder permissions can be granted to the owner, the owner's group and repository users:

Read: allows viewing the folder and the objects within it.

Write: allows creating and editing objects within the folder.

Execute: allows executing or scheduling a session in the folder.
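Group-based privileges and folder permissions can be modelled as follows. This is a hypothetical sketch, not Informatica's security implementation: the users, groups and folder are invented, and the rule that the most specific matching class (owner, then owner's group, then all repository users) decides access is an assumption made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical users, groups and privileges; privilege names follow the text above.
USER_GROUPS = {"alice": {"developers"}, "bob": {"developers"}, "carol": {"analysts"}}
GROUP_PRIVILEGES = {"developers": {"Use Designer"}, "analysts": set()}
USER_PRIVILEGES = {"alice": {"Create Sessions"}}


def privileges(user):
    """A user's privileges: those granted directly plus those of each group they belong to."""
    result = set(USER_PRIVILEGES.get(user, set()))
    for group in USER_GROUPS.get(user, set()):
        result |= GROUP_PRIVILEGES.get(group, set())
    return result


@dataclass
class Folder:
    owner: str
    owner_group: str
    # Permissions ("read", "write", "execute") per class: "owner", "group", "repository".
    permissions: dict = field(default_factory=dict)


def has_permission(user, action, folder):
    """Assumption: the most specific matching class (owner, group, all users) applies."""
    if user == folder.owner:
        granted = folder.permissions.get("owner", set())
    elif folder.owner_group in USER_GROUPS.get(user, set()):
        granted = folder.permissions.get("group", set())
    else:
        granted = folder.permissions.get("repository", set())
    return action in granted


sales = Folder(
    owner="alice",
    owner_group="developers",
    permissions={
        "owner": {"read", "write", "execute"},
        "group": {"read", "execute"},
        "repository": {"read"},
    },
)
print(sorted(privileges("alice")))             # ['Create Sessions', 'Use Designer']
print(has_permission("bob", "write", sales))   # False: the group may only read/execute
print(has_permission("carol", "read", sales))  # True
```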


Designer

• Creation of mappings

MAPPING

A mapping is a type of metadata, stored in the repository, that you create to specify how to move and transform data from sources to targets. A mapping includes:

• Sources

• Targets

• Transformations

Figure: Sample mapping

Transformations

A transformation is a component of a mapping that describes how the Informatica Server should transform data.

There are two categories of transformations, depending on their scope:

Standard transformation: created in a mapping and exists only within that mapping. It cannot be used in other mappings.

Reusable transformation: created and stored independently in the repository. It can be used by all mappings.
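The difference in scope can be mimicked with Python scoping. In this hypothetical sketch a module-level registry plays the role of the repository and a nested function plays the role of a standard transformation; none of these names come from Informatica.

```python
# Hypothetical repository-level registry of reusable transformations.
REUSABLE_TRANSFORMATIONS = {}


def register_reusable(name, func):
    """Store a transformation independently so that any mapping can use it."""
    REUSABLE_TRANSFORMATIONS[name] = func


def trim_upper(value):
    return value.strip().upper()


register_reusable("trim_upper", trim_upper)


def mapping_orders(values):
    # Standard transformation: defined inside this mapping and invisible elsewhere.
    def add_order_prefix(value):
        return "ORD-" + value

    reusable = REUSABLE_TRANSFORMATIONS["trim_upper"]
    return [add_order_prefix(reusable(v)) for v in values]


def mapping_customers(values):
    # Uses the same reusable transformation; add_order_prefix is not in scope here.
    return [REUSABLE_TRANSFORMATIONS["trim_upper"](v) for v in values]


print(mapping_orders([" a1 "]), mapping_customers([" smith"]))  # ['ORD-A1'] ['SMITH']
```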

The following are the types of transformations:

• Expression – Calculate a value or modify text. Operates on individual rows.

• Aggregator – Perform aggregate calculations. Operates on sets of rows.
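The row-level versus set-level distinction can be shown in a few lines of Python. The column names and the `expression`/`aggregator` functions are hypothetical stand-ins for the two transformation types, not Informatica objects.

```python
from collections import defaultdict

rows = [
    {"region": "east", "qty": 2, "price": 10.0},
    {"region": "west", "qty": 1, "price": 5.0},
    {"region": "east", "qty": 3, "price": 4.0},
]


def expression(rows):
    """Row-level: each output row depends only on its own input row."""
    return [{**r, "total": r["qty"] * r["price"]} for r in rows]


def aggregator(rows, group_by, column):
    """Set-level: one output row per group, computed over all rows in the group."""
    sums = defaultdict(float)
    for r in rows:
        sums[r[group_by]] += r[column]
    return [{group_by: key, "sum_" + column: value} for key, value in sums.items()]


print(aggregator(expression(rows), "region", "total"))
# [{'region': 'east', 'sum_total': 32.0}, {'region': 'west', 'sum_total': 5.0}]
```

The expression output has as many rows as its input, while the aggregator collapses the three input rows into one row per region.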

• Source Qualifier – Filter records read from relational sources only, and order the records queried by the Informatica Server.

• Filter – Filter records sent to the targets. Applicable to any source.

• Stored Procedure – Call a stored procedure.

• External Procedure/Advanced External Procedure – Call a procedure in a shared library (e.g. a DLL) or in the COM layer of Windows NT.
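The reason a Source Qualifier can only filter relational sources, while a Filter works on any source, is that the former pushes the condition into the query the database runs. This sketch uses Python's built-in `sqlite3` as a stand-in relational source; the `orders` table and both functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0), (3, 300.0)])


def source_qualifier(conn, where, order_by):
    """Filter and sort inside the SQL the database executes (relational sources only).

    The clauses are hard-coded strings in this sketch; never build SQL
    from untrusted input this way.
    """
    sql = f"SELECT id, amount FROM orders WHERE {where} ORDER BY {order_by}"
    return conn.execute(sql).fetchall()


def filter_transformation(rows, condition):
    """Filter rows already in the pipeline, whatever their source."""
    return [r for r in rows if condition(r)]


sq_rows = source_qualifier(conn, "amount > 100", "amount DESC")
flat_file_rows = [(4, 20.0), (5, 500.0)]  # e.g. rows parsed from a flat file
filtered = filter_transformation(flat_file_rows, lambda r: r[1] > 100)
print(sq_rows, filtered)  # [(3, 300.0), (2, 150.0)] [(5, 500.0)]
```

Filtering at the source reduces the number of rows read, which is why it is preferred when the source is relational.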

• Sequence Generator – Generate primary keys.

• Rank – Limit records to a top or bottom range.

• Normalizer – Normalize records, including those read from COBOL sources.

• Lookup – Get related values.
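Simplified Python analogues of these four transformations are sketched below. They are hypothetical illustrations of the behaviour described in the list, not Informatica's implementation, and the sample columns (`rep`, `amt`, `q1`, `q2`) are invented.

```python
import heapq
import itertools


def sequence_generator(start=1, increment=1):
    """Sequence Generator: an endless stream of key values."""
    return itertools.count(start, increment)


def rank(rows, column, n, top=True):
    """Rank: keep only the top (or bottom) n rows by one column."""
    pick = heapq.nlargest if top else heapq.nsmallest
    return pick(n, rows, key=lambda r: r[column])


def normalizer(row, key_column, repeating_columns, value_column):
    """Normalizer: turn one row with repeating columns into several rows."""
    return [{key_column: row[key_column], value_column: row[c]} for c in repeating_columns]


def lookup(rows, table, key_column, result_column, default=None):
    """Lookup: fetch a related value from a reference table for each row."""
    return [{**r, result_column: table.get(r[key_column], default)} for r in rows]


keys = sequence_generator(start=100)
sales = [{"rep": "ann", "amt": 50}, {"rep": "ben", "amt": 90}, {"rep": "cy", "amt": 70}]
with_keys = [{**r, "sk": next(keys)} for r in sales]
top_two = rank(with_keys, "amt", 2)
regions = lookup(top_two, {"ann": "east", "ben": "west"}, "rep", "region", default="unknown")
quarters = normalizer({"rep": "ann", "q1": 10, "q2": 20}, "rep", ["q1", "q2"], "amount")
print(regions, quarters)
```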

• Update Strategy – Determine whether to insert, update, delete or reject data.

• Joiner – Join records from different databases or flat file systems.
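An update strategy assigns each row a flag, and a joiner combines rows from two different systems. In this sketch the flag names mirror Informatica's `DD_INSERT`, `DD_UPDATE`, `DD_DELETE` and `DD_REJECT` constants (values 0 to 3). The flagging rule itself, the inner-join behaviour and the sample data are hypothetical.

```python
# Numeric flags named after Informatica's update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def update_strategy(row, existing_keys):
    """Hypothetical rule for flagging one row against keys already in the target."""
    if row.get("amount") is None:
        return DD_REJECT  # invalid data is rejected
    if row.get("deleted"):
        return DD_DELETE if row["id"] in existing_keys else DD_REJECT
    return DD_UPDATE if row["id"] in existing_keys else DD_INSERT


def joiner(master, detail, key):
    """Inner join of two row sets that may come from different systems."""
    index = {}
    for m in master:
        index.setdefault(m[key], []).append(m)
    return [{**m, **d} for d in detail for m in index.get(d[key], [])]


db_customers = [{"id": 1, "name": "ann"}, {"id": 2, "name": "ben"}]   # e.g. from a database
file_orders = [{"id": 2, "amount": 40.0}, {"id": 3, "amount": 15.0}]  # e.g. from a flat file
joined = joiner(db_customers, file_orders, "id")
flags = [update_strategy(r, existing_keys={2}) for r in joined]
print(joined, flags)  # [{'id': 2, 'name': 'ben', 'amount': 40.0}] [1]
```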

Every mapping needs at least one Source Qualifier transformation (or a Normalizer transformation for COBOL sources).

Ports

A port represents a single column of data. Every source definition, target definition and transformation contains a collection of ports.

There are four types of ports:

Input port – receives data.

Output port – provides data.

Input/output port – passes data through.

Variable port – stores intermediate values used in expressions.

Source definitions contain only output ports, since they provide data.

Target definitions contain only input ports, since they receive data.

Transformations contain a combination of input, output and input/output ports since, depending on their type, they can pass data through unchanged or modify it.
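The role of each port type can be sketched as a single Python function standing in for an Expression transformation. The port names (`CUST_ID`, `FIRST_NAME`, `V_FULL_NAME` and so on) are hypothetical; the point is which values come in, which are only intermediate, and which leave.

```python
def customer_expression(row):
    """Hypothetical Expression transformation, with each port type marked."""
    # Input ports: receive data but are not passed on to the next transformation.
    first_name = row["FIRST_NAME"]
    last_name = row["LAST_NAME"]
    # Input/output port: passes its value through unchanged.
    cust_id = row["CUST_ID"]
    # Variable port: an intermediate value that other ports' expressions use.
    v_full_name = f"{first_name} {last_name}"
    # Output ports: values computed by the transformation.
    full_name = v_full_name.upper()
    name_length = len(v_full_name)
    # Only input/output and output ports leave the transformation.
    return {"CUST_ID": cust_id, "FULL_NAME": full_name, "NAME_LENGTH": name_length}


print(customer_expression({"CUST_ID": 7, "FIRST_NAME": "Ann", "LAST_NAME": "Lee"}))
# {'CUST_ID': 7, 'FULL_NAME': 'ANN LEE', 'NAME_LENGTH': 7}
```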


Transformation Language

The transformation language is used to write expressions for transformations. It consists of functions, similar to SQL functions, that modify or validate data.

Expressions can be written in the following types of transformations:

Aggregator

Expression

Filter

Rank

Update Strategy

The transformation language consists of the following components:

• Functions: e.g. AVG, COUNT, ISNULL, SUBSTR, IIF

• Operators: e.g. addition, subtraction, multiplication, division

• Constants: e.g. built-in constants such as TRUE

• Variables: e.g. SYSDATE, which represents the current date

• Return values
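To make the behaviour of a few of these functions concrete, here is a hedged Python emulation of `IIF`, `ISNULL`, `SUBSTR` and `SYSDATE`, used to evaluate an expression such as `IIF(ISNULL(CITY), 'UNK', SUBSTR(CITY, 1, 3))`. These are simplified stand-ins, not the real functions: `SUBSTR` here handles only non-negative start positions (1-based, 0 treated as 1), and `IIF` always requires its third argument. The upper-case names break Python naming conventions on purpose, to match the expression text.

```python
from datetime import datetime


def IIF(condition, value_if_true, value_if_false):
    """Return one of two values depending on a condition."""
    return value_if_true if condition else value_if_false


def ISNULL(value):
    """True when the value is NULL (None in Python)."""
    return value is None


def SUBSTR(string, start, length=None):
    """Simplified SUBSTR: 1-based start (0 treated as 1); NULL in gives NULL out."""
    if string is None:
        return None
    if start < 0:
        raise NotImplementedError("negative start positions are not modelled here")
    begin = max(start, 1) - 1
    if length is None:
        return string[begin:]
    if length <= 0:
        return ""
    return string[begin:begin + length]


def SYSDATE():
    """Current date and time, standing in for the SYSDATE variable."""
    return datetime.now()


def city_code(city):
    # IIF(ISNULL(CITY), 'UNK', SUBSTR(CITY, 1, 3))
    return IIF(ISNULL(city), "UNK", SUBSTR(city, 1, 3))


print(city_code("Boston"), city_code(None))  # Bos UNK
```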


Mapplets

A Mapplet is a reusable object created in a repository that represents a set of transformations.
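A mapplet bundles several transformations into one object that different mappings can reuse. The sketch below models that with function composition; `make_mapplet` and the three name-cleaning steps are hypothetical and only illustrate the idea of a reusable set of transformations.

```python
from functools import reduce


def make_mapplet(*transformations):
    """Bundle transformations into one reusable unit, applied in the given order."""
    def mapplet(rows):
        return reduce(lambda data, step: step(data), transformations, rows)
    return mapplet


def trim_names(rows):
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_blank_names(rows):
    return [r for r in rows if r["name"]]


def title_case_names(rows):
    return [{**r, "name": r["name"].title()} for r in rows]


# Hypothetical mapplet, defined once and reused by two different mappings.
clean_names = make_mapplet(trim_names, drop_blank_names, title_case_names)

customers = clean_names([{"name": " ann lee "}, {"name": "   "}])
suppliers = clean_names([{"name": "acme corp"}])
print(customers, suppliers)  # [{'name': 'Ann Lee'}] [{'name': 'Acme Corp'}]
```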

Summary

Basic steps to create a project:

Create the database that contains the repository.

Create a data model for the target.

Create repositories.

Create folders within the repositories.

Import the source definitions.

Create the targets that will receive the data.

Create mappings between sources and targets, including transformations that modify the data.

Create source and target connections in the Server Manager.

Create sessions for transferring data between sources and targets.

Schedule and run the sessions.
