ETL Using Informatica Power Center

Page 1: ETL Using Informatica Power Center

www.edureka.co

View Informatica course details at www.edureka.co/informatica

ETL Using Informatica Power Center

For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN

For more details please contact us:
US: 1800 275 9730 (toll free)
INDIA: +91 88808 62004
Email Us: [email protected]

www.edureka.co/informatica

Slide 2 www.edureka.co/informatica

At the end of this session, you will be able to understand:

The Information Economy

ETL - an Overview

Why ETL is still relevant

Informatica Overview

The Informatica Platform

Why Informatica

Informatica Partners & Customers

Informatica Architecture Overview & Components

Use case 1 - Loading the Product Dimension table using Slowly Changing Dimensions (SCD)

Use case 2 - Populating the Sales summary table using Incremental Aggregation

Job trends

Scope of this course

Objectives

Slide 3 www.edureka.co/informatica

The Information Economy

Lack of Trustworthy Data Impedes Key Business Imperatives

[Diagram: lack of relevant, trustworthy and timely data, shown against the following business imperatives]

» Mergers, Acquisitions & Divestitures
» Acquire & Retain Customers
» Outsource Non-core Functions
» Improve Decisions
» Modernize Business
» Improve Efficiency & Reduce Costs
» Governance, Risk, Compliance
» Increase Partner Network Efficiency
» Increase Business Agility

Categories: Consolidation, Globalization, Growth, Operational Efficiency, Governance

Slide 4 www.edureka.co/informatica

ETL - An Overview

ETL stands for Extract, Transform and Load

The "E" represents the ability to consistently and reliably extract data with high performance and minimal impact to the source system

The "T" represents the ability to transform one or more data sets in batch or real-time into a consumable format

The "L" stands for loading data into a persistent or virtual data store
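The three stages above can be sketched in a few lines of Python. This is a minimal illustration of the E, T and L steps, not how PowerCenter implements them: the CSV text, the `products` table and the function names `extract`, `transform`, `load` and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the persistent data store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or a database.
SOURCE_CSV = "id,name,price\n1, widget ,9.50\n2,GADGET,12\n"

def extract(text):
    """E: read rows from the source without modifying it."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: convert raw rows into a consistent, consumable format."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["price"]))
        for r in rows
    ]

def load(rows, conn):
    """L: write the transformed rows into a data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order and return what landed in the target."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, price FROM products ORDER BY id"
    ).fetchall()
```

Calling `run_etl(SOURCE_CSV, sqlite3.connect(":memory:"))` runs the whole flow against a throwaway database, which keeps the example free of any external setup.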

Slide 5 www.edureka.co/informatica

Why ETL is Still Relevant

Is ETL becoming history with the advent of Big Data?

Data needs to flow from source applications into analytic data stores in a controlled, reliable, secure manner

Information needs to be standardized, with regard to semantics, format and lexicon, for accurate analysis

Operational results need to be consistent and repeatable

Operational results need to be verifiable and transparent

Slide 6 www.edureka.co/informatica

Facilitates integration of data from various data sources for building a data warehouse

Businesses have data in multiple databases with different codification and formats

Transformation is required to convert and summarize operational data into a consistent, business-oriented format

Pre-computation of any derived information

Summarization is also carried out to pre-compute summaries and aggregates

Makes data available in a queryable format

Why ETL is Still Relevant

Mergers and acquisitions also create disparities in data representation and pose more difficult challenges in ETL.
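Pre-computing summaries and aggregates can be sketched as follows. This is a plain-Python illustration of the general idea behind incremental aggregation (folding only the new rows into stored totals instead of recomputing from the full history), not Informatica's implementation; the `SalesSummary` class and the `(product, amount)` row shape are assumptions made for the example.

```python
from collections import defaultdict

class SalesSummary:
    """Keeps running per-product totals so each load only processes new rows."""

    def __init__(self):
        self.total = defaultdict(float)   # product -> summed amount
        self.count = defaultdict(int)     # product -> number of sales rows

    def apply_batch(self, rows):
        """Fold a batch of new (product, amount) rows into the stored aggregates."""
        for product, amount in rows:
            self.total[product] += amount
            self.count[product] += 1

    def snapshot(self):
        """Return the summary as {product: (total, count)}, sorted by product."""
        return {p: (self.total[p], self.count[p]) for p in sorted(self.total)}

summary = SalesSummary()
summary.apply_batch([("pen", 2.0), ("ink", 5.0), ("pen", 3.0)])   # initial load
summary.apply_batch([("pen", 1.0)])                               # later increment only
```

The second call touches one row rather than four, which is the point of keeping the aggregates between loads.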

Slide 7 www.edureka.co/informatica

Informatica – A Product Company

Informatica Corp. provides data integration software and services for various businesses, industries and government organizations, including telecommunications, health care, financial and insurance services.

Founded: 1993

2012 Revenue: $811.6 million

7-year CAGR: 17% per year

Employees: 2,810+

Partners: 450+
» Major SI, ISV, OEM and On-Demand Leaders

Customers: Over 5,000
» Customers in 82 countries
» Direct presence in 28 countries
» #1 in Customer Loyalty Rankings (7 years in a row)

Slide 8 www.edureka.co/informatica

The Informatica Approach

Comprehensive, Unified, Open and Economical Approach

Slide 9 www.edureka.co/informatica

Informatica Products & Their Functionalities

A wide range of products is available under the Informatica product suite to help satisfy data integration requirements within the enterprise and beyond

Informatica's product portfolio is focused on data integration:
» Data Integration & ETL
» Information Lifecycle Management
» Complex Event Processing
» Data Masking
» Data Quality
» Data Replication
» Data Virtualization
» Master Data Management
» Ultra Messaging

Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data warehouses

Slide 10 www.edureka.co/informatica

Informatica Products & Their Functionalities

Slide 11 www.edureka.co/informatica

A Singular Focus on Data Integration

Why Informatica?

Proven technology leadership

A track record of continuous innovation

The most neutral trusted partner

Long history of customer success

Slide 12 www.edureka.co/informatica

Why Informatica?

A Track Record of Continuous Innovation

[Timeline of releases]

Q4 2008: Informatica 8.6.1, Cloud Synch. (Business Glossary, ICC Manageability)
Q1 2009: Application ILM (Application Information Lifecycle Management)
Q2 2009: Address Validation (Address Validation for DQ)
Q3 2009: CEP, PowerCenter CE (Complex Event Processing and Cloud IaaS)
Q4 2009: Informatica 9.0, Informatica Cloud 9 (Collaboration, Pervasive DQ, Data Services)
Q1 2010: MDM, Ultra Messaging (Multi-domain MDM, Ultra-low Latency Messaging)
Q2 2010: Informatica Marketplace (Online exchange for solutions)
Q3 2010: MDM, ILM (Test data mgmt)
Q4 2010: Cloud (Trust framework, plug-ins)

Slide 13 www.edureka.co/informatica

Financial Services and Insurance

Telecommunications

Manufacturing

Retail and Services

Healthcare and Life Sciences

Utilities and Energy

Government and Public Sector

Transportation and Distribution

Over 4,200 Leaders Rely on Informatica

Why Informatica?

Slide 14 www.edureka.co/informatica

PowerCenter:

It is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover and integrate data from virtually any business system, in any format and deliver that data throughout the enterprise at any speed.

An ETL tool (Extract, Transform and Load)

PowerCenter can read from a variety of different sources and write to as many targets, while transforming data in between.

The main advantages of PowerCenter over other ETL tools, and hence the reasons for its popularity, are as follows:

» It is robust and can be used on both Windows and UNIX-based systems
» It is high-performing, yet simple to develop with, maintain and administer

Introduction to PowerCenter

Slide 15 www.edureka.co/informatica

The architecture of Informatica PowerCenter (version 9.x onwards) is based on the Service-Oriented Architecture (SOA) concept.

A service-oriented architecture (SOA) can be defined as a group of services that communicate with each other. The communication can involve simple data passing, or two or more services coordinating some activity.

Informatica 9.x represents a major change in the architecture of the product line.

Aim: To provide improved performance and high availability.

Approach: The underlying architecture has been re-engineered to be even more services-based.

PowerCenter Architecture - SOA

Slide 16 www.edureka.co/informatica

PowerCenter Architecture

Single Unified Architecture

Slide 17 www.edureka.co/informatica

PowerCenter Architecture - Proven Scalability

Threaded Parallel Processing

Slide 18 www.edureka.co/informatica

PowerCenter Architecture - Proven Scalability

Pipeline Parallel Processing

Slide 19 www.edureka.co/informatica

Client Components of PowerCenter

PowerCenter Repository Manager

PowerCenter Designer

PowerCenter Workflow Manager

PowerCenter Workflow Monitor

PowerCenter Administration Console (browser based)

Slide 20 www.edureka.co/informatica

The PowerCenter server components comprise the following services:

Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.

Integration service: The Integration service runs sessions and workflows.

SAP BW service: The SAP BW service listens for RFC requests from SAP BW and initiates workflows to extract data from, or load data into, SAP BW.

Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services.

Server Components of PowerCenter

Slide 21 www.edureka.co/informatica

Overall Architecture of PowerCenter

[Architecture diagram. Components shown: Sources, Targets, PowerCenter Client, Administrator, Security, Domain Metadata, Repository, and a Domain containing the Repository Service, Repository Service Process and Integration Service. Connection labels shown: ODBC, native drivers, native drivers/ODBC, TCP/IP, HTTPS.]

Slide 22 www.edureka.co/informatica

The salient features of a Domain are as follows:

» A Domain is a logical collection or set of nodes and services.
» The PowerCenter Domain is the fundamental administrative unit of PowerCenter.
» A Domain can be a single PowerCenter installation, or it can consist of multiple PowerCenter installations.

The salient features of a node are as follows:

» A node is a logical representation of a physical machine. It has physical attributes such as a hostname and a port number.
» Each node runs a service manager which is responsible for the application and core services.
» A node can be a gateway node or a worker node, but it can belong to only one Domain.

Informatica - Domain & Nodes
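The relationships described on this slide (a domain groups nodes; a node has a hostname and a port, is either a gateway or a worker, and belongs to only one domain) can be modeled as a small data structure. The class and field names below are illustrative assumptions, not part of any Informatica API.

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeRole(Enum):
    GATEWAY = "gateway"
    WORKER = "worker"

@dataclass
class Node:
    name: str
    hostname: str
    port: int
    role: NodeRole
    domain: "Domain | None" = None   # a node belongs to at most one domain

@dataclass
class Domain:
    name: str
    nodes: dict = field(default_factory=dict)   # node name -> Node

    def add_node(self, node):
        """Attach a node, enforcing the one-domain-per-node rule."""
        if node.domain is not None:
            raise ValueError(f"{node.name} already belongs to {node.domain.name}")
        node.domain = self
        self.nodes[node.name] = node

    def gateways(self):
        """Names of the gateway nodes in this domain."""
        return [n.name for n in self.nodes.values() if n.role is NodeRole.GATEWAY]
```

Trying to add the same node to a second domain raises `ValueError`, which mirrors the rule that a node can belong to only one domain.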

Slide 23 www.edureka.co/informatica

A service can be described as follows:

A service is a resource that provides specialized functions. All PowerCenter processes run as services on a node.

PowerCenter has two types of services:

» Application Services represent server-based functions, including the Repository and Integration Services.

» Core Services represent functions that manage and maintain the environment in which PowerCenter operates, and include services such as the Log Service, Licensing Service, and Domain Service, among many others.

Informatica - Services

Slide 24 www.edureka.co/informatica

Component-based development is a technique where predefined components or functional units, or both, with specific functionalities are used to assemble the final product.

PowerCenter follows the component-based development methodology by letting developers build a data flow from a source to a target using different components (called transformations), linked to each other as required.

Component Based Development Techniques
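The idea of assembling a data flow from predefined components can be sketched in plain Python. The components here (`filter_rows`, `add_field`) and the `build_flow` helper are hypothetical stand-ins for PowerCenter transformations, which are configured in the Designer rather than written as code.

```python
def filter_rows(predicate):
    """Reusable component: keep only rows matching the predicate."""
    def component(rows):
        return (r for r in rows if predicate(r))
    return component

def add_field(name, fn):
    """Reusable component: derive a new field from each row."""
    def component(rows):
        return ({**r, name: fn(r)} for r in rows)
    return component

def build_flow(*components):
    """Link components in order, from source to target."""
    def flow(rows):
        for component in components:
            rows = component(rows)
        return list(rows)
    return flow

# Assemble a flow from prebuilt parts instead of writing it from scratch.
flow = build_flow(
    filter_rows(lambda r: r["qty"] > 0),
    add_field("total", lambda r: r["qty"] * r["price"]),
)
```

Because each component only takes rows in and hands rows out, a faulty one can be replaced without touching the rest of the flow.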

Slide 25 www.edureka.co/informatica

The advantages of a component-based development model are listed below:

As the functional units are already built, the developer need not build them from scratch and can instead use them directly. Apart from making the entire process easier, this reduces the development time as well.

This approach makes bug-fixing easier and aids maintenance, since only the malfunctioning components need to be fixed.

Reusability is another factor that works in favor of a component-based development model

Component Based Development Techniques

Slide 26 www.edureka.co/informatica

Transformation is the process in ETL where we actually apply the business rules in the data flow

It is here that the data cleansing and formatting activities are actually performed along with data validation, which is one of its main functionalities

In PowerCenter, transformations are the functional components

To meet all kinds of requirements, a wide range of transformations is available within Informatica

The hierarchy is as follows:

» Transformation
» Mapping
» Session
» Workflow

Transformation -> Mapping -> Session -> Workflow
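The containment hierarchy above can be modeled as nested objects: a mapping groups transformations, a session runs a mapping, and a workflow orders sessions. This is an illustrative model only. The class names mirror the slide, but the fields and the `run` methods are assumptions and do not correspond to PowerCenter's actual object model or API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transformation:
    name: str
    apply: Callable[[dict], dict]   # business rule applied to one row

@dataclass
class Mapping:
    name: str
    transformations: List[Transformation]

    def run(self, rows):
        """Apply each transformation, in order, to every row."""
        for t in self.transformations:
            rows = [t.apply(r) for r in rows]
        return rows

@dataclass
class Session:
    name: str
    mapping: Mapping

    def run(self, rows):
        return self.mapping.run(rows)

@dataclass
class Workflow:
    name: str
    sessions: List[Session]

    def run(self, rows):
        # Simplification for the sketch: rows are handed from one session
        # to the next in memory rather than read from sources and targets.
        for s in self.sessions:
            rows = s.run(rows)
        return rows
```

A workflow built this way is just a list of sessions, each wrapping one mapping, which keeps the four levels easy to tell apart.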

Slide 27 www.edureka.co/informatica

Informatica PowerCenter is the premium data integration solution available today

“Database neutral” - will communicate with any database

Powerful data transformations convert one application’s data to another’s format

Informatica PowerCenter – DI Solution

[Diagram: Informatica PowerCenter connected to the following systems]

» Manufacturing (DB2)
» Sales (SalesForce)
» Billing (Sybase)
» Resource Planning (PSFT)
» Inventory (SQL Server)
» Marketing (ORCL)
» Accounting (upgraded)

Slide 28 www.edureka.co/informatica

A company purchases a new accounts payable application

PowerCenter can move the existing account data to the new application

» Preserves data lineage for tax, accounting, and other legally mandated purposes

Data Migration

[Diagram: Accounting (Old) -> Informatica PowerCenter -> Accounting (New)]

Slide 29 www.edureka.co/informatica

Company A purchases Company B

To achieve the benefits of consolidation, Company B’s billing system must be integrated into Company A’s billing system

Application Integration

[Diagram: Informatica PowerCenter connecting Billing A and Billing B]

Slide 30 www.edureka.co/informatica

Data Warehousing

Data warehouses put information from many sources together for analysis

Data is moved from many databases to the Data warehouse

[Diagram: Informatica PowerCenter moving data from the following systems into the data warehouse]

» Inventory (SQL Server)
» Marketing (ORCL)
» Accounting (upgraded)
» Manufacturing (DB2)
» Resource Planning (PSFT)
» Sales (SalesForce)
» Billing (Sybase)

Slide 31 www.edureka.co/informatica

Middleware

Informatica can connect to a variety of sources, including most application sources

SAP-certified data integration tool

Can pull data from and push data into SAP R/3 and SAP BW systems

Has connectivity adapters for the majority of application sources

Can be used as middleware between two applications, such as SAP R/3 and SAP BW

Slide 32 www.edureka.co/informatica

Some Unique Features of Informatica

Single administration console to administer all the application services

Unified Users, Groups, Privileges and Roles admin across PC AE Tools

Single sign-on for all the client tools: once you log in to one client tool, you are automatically logged in to the others

Built-in version control

Grid and high availability

Built-in scheduling tool

Slide 33 www.edureka.co/informatica

Loading Product Dimension table using Slowly changing dimension (SCD)

Populate Sales summary table using Incremental Aggregation

Demonstrating Informatica PowerCenter Partitioning capability

Use Cases
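As a rough illustration of the first use case, the sketch below applies one common slowly changing dimension strategy, Type 2 (expire the current row and insert a new version), to an in-memory product dimension. The slides do not say which SCD type the course uses, so Type 2, the column names and the `apply_scd2` function are all assumptions; in PowerCenter this logic would be built from transformations rather than written in Python.

```python
from datetime import date

def apply_scd2(dimension, incoming, today):
    """Apply Type 2 changes to the dimension list in place.

    dimension: list of dict rows with keys product_id, name, price,
               valid_from, valid_to, is_current
    incoming:  list of dict rows with keys product_id, name, price
    """
    current = {r["product_id"]: r for r in dimension if r["is_current"]}
    for new in incoming:
        old = current.get(new["product_id"])
        if old and (old["name"], old["price"]) == (new["name"], new["price"]):
            continue                      # unchanged: nothing to do
        if old:                           # changed: expire the old version
            old["valid_to"] = today
            old["is_current"] = False
        row = {**new, "valid_from": today, "valid_to": None, "is_current": True}
        dimension.append(row)             # new or changed: insert a version
        current[new["product_id"]] = row
    return dimension
```

Keeping the expired rows means a report can still see what a product looked like on any past date, which is the reason to prefer this approach over overwriting in place.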

Slide 34 www.edureka.co/informatica

Fresher
» Data Warehouse Developer
» ETL Developer

Mid Level
» Data Specialist
» Sr. ETL Developer
» Informatica Designer
» Informatica Administrator

Senior Level
» ETL Architect
» Informatica Architect
» Technical Manager

Job Trends

Informatica - Role Wise Comparison

Slide 35 www.edureka.co/informatica

Job Trends

Informatica Skill Requirements

Slide 36 www.edureka.co/informatica

Job Trends

Informatica Other Skill Requirements

Slide 37 www.edureka.co/informatica

Informatica PowerCenter Basic

Informatica PowerCenter Advanced Transformations

Informatica PowerCenter Installation and Configuration

Informatica PowerCenter Administration and Operation Basics

PowerCenter Troubleshooting & Performance Tuning

Best Practices and Methodology

Ample lab work to follow each module

Scope of This Course

Slide 38 www.edureka.co/informatica

Course Topics

Module 1
» Informatica PowerCenter 9.X – An overview

Module 2
» ETL Fundamentals

Module 3
» PowerCenter Designer

Module 4
» PowerCenter Workflow Manager & Monitor

Module 5
» Advanced Transformation Techniques

Module 6
» Parameters & Variables

Module 7
» Debugging, Troubleshooting, Error Handling & Recovery

Module 8
» Cache

Module 9
» Performance Tuning & Optimization

Module 10
» PowerCenter Repository Manager

Module 11
» Informatica Administration Console & Security

Module 12
» Informatica 9.X - Technical Architecture

Module 13
» Informatica Installation & Operations Manual

Module 14
» Command line utilities

Module 15
» ETL Scenarios using Informatica

Module 16
» Best Practices & Velocity Methodologies

Slide 39

LIVE Online Class

Class Recording in LMS

24/7 Post Class Support

Module Wise Quiz

Project Work

Verifiable Certificate

www.edureka.co/informatica

How it Works
