mrsudhanshu
Informatica
1
Data Integration
• Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example).
• Data integration appears with increasing frequency as the volume of existing data, and the need to share it, explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. In management circles, people frequently refer to data integration as "Enterprise Information Integration" (EII).
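As a minimal sketch of what a "unified view" over two sources can look like, the example below uses Python's built-in sqlite3 module. The two company tables, their column names, and the sample rows are hypothetical and exist only for illustration; they are not taken from the slides.

```python
import sqlite3


def build_unified_view(conn):
    """Create two hypothetical company tables and one view that spans both."""
    conn.executescript(
        """
        CREATE TABLE company_a_customers (id INTEGER, name TEXT);
        CREATE TABLE company_b_clients (client_no INTEGER, client_name TEXT);
        INSERT INTO company_a_customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO company_b_clients VALUES (10, 'Carol');
        CREATE VIEW all_customers AS
            SELECT 'A' AS source, id AS customer_id, name FROM company_a_customers
            UNION ALL
            SELECT 'B' AS source, client_no, client_name FROM company_b_clients;
        """
    )


def query_unified(conn):
    """Users query one view and need not know where each row lives."""
    return conn.execute(
        "SELECT source, customer_id, name FROM all_customers "
        "ORDER BY source, customer_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_unified_view(connection)
    print(query_unified(connection))
```

The view leaves both source tables in place and only presents them under one schema. The ETL approach described in the following slides instead copies the data into a separate target.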
2
How to enable Data Integration
USING ETL PROCESS
3
ETL (Extract, Transform, Load)
• ETL stands for extract, transform and load, the processes that enable companies to move data from multiple sources, reformat and cleanse it, and load it into another database, a data mart or a data warehouse for analysis, or into another operational system to support a business process.
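The three steps can be sketched in a few lines of Python. This is an illustrative sketch only, not how any particular ETL tool works: the two CSV sources, their field names, and the `customer` target table are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Two hypothetical source extracts with different layouts (illustrative data).
CRM_CSV = "id,name,country\n1, Alice ,us\n2,Bob,DE\n"
BILLING_CSV = "customer_id;full_name;country_code\n3;Carol;fr\n"


def extract(text, delimiter):
    """Extract: read raw rows from one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(row, id_field, name_field, country_field):
    """Transform: reformat and cleanse one row into the target layout."""
    return (
        int(row[id_field]),
        row[name_field].strip(),
        row[country_field].strip().upper(),
    )


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform and load for both sources, then read back."""
    rows = [transform(r, "id", "name", "country") for r in extract(CRM_CSV, ",")]
    rows += [
        transform(r, "customer_id", "full_name", "country_code")
        for r in extract(BILLING_CSV, ";")
    ]
    load(conn, rows)
    return conn.execute(
        "SELECT id, name, country FROM customer ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

An in-memory SQLite database stands in for the data warehouse here so the sketch runs without any setup.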
4
ETL (Extract, Transform, Load)
“A properly designed ETL system extracts data from the source systems, enforces data quality and consistency standards, conforms data so that separate sources can be used together, and finally delivers data in a presentation-ready format so that application developers can build applications and end users can make decisions… ETL makes or breaks the data warehouse…”
Ralph Kimball
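To make "enforces data quality" and "conforms data" concrete, here is a small hedged sketch. The gender code table and the record fields are hypothetical; real systems would define their own standards and checks.

```python
# Hypothetical code table: each source encodes gender differently.
GENDER_CONFORMED = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}


def conform_gender(raw):
    """Map a source-specific gender code to the shared standard, or None."""
    return GENDER_CONFORMED.get(str(raw).strip().lower())


def conform_record(record):
    """Return (conformed_record, errors); errors lists failed quality checks."""
    errors = []
    gender = conform_gender(record.get("gender", ""))
    if gender is None:
        errors.append("unknown gender code: %r" % record.get("gender"))
    name = str(record.get("name", "")).strip()
    if not name:
        errors.append("missing name")
    return {"name": name, "gender": gender}, errors


if __name__ == "__main__":
    print(conform_record({"name": " Ann ", "gender": "Female"}))
    print(conform_record({"name": "Raj", "gender": 1}))
```

Returning the errors alongside the record, rather than raising, lets a caller decide whether to reject, repair, or log a row that fails a check.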
5
ETL (Extract, Transform, Load)
6
ETL (Extract, Transform, Load)
7
ETL – Process Flow
8
ETL – Process Flow
9
ETL Glossary
• Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
• Mapping: The definition of the relationship and data flow between source and target objects.
• Metadata: Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
• Staging Area: A place where data is processed before entering the warehouse.
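The glossary terms above can be illustrated with a short sketch in which a mapping is stored as data (that is, as metadata) and rows pass through a staging list before reaching the target. The field names such as `CUST_NO` and the plain Python list standing in for the warehouse are assumptions for illustration only.

```python
# Hypothetical mapping metadata: target column -> (source field, conversion).
MAPPING = {
    "customer_id": ("CUST_NO", int),
    "customer_name": ("CUST_NM", str.strip),
    "balance": ("BAL_AMT", float),
}


def apply_mapping(source_row, mapping):
    """Build one target row by following the mapping definition."""
    return {
        target: convert(source_row[source])
        for target, (source, convert) in mapping.items()
    }


def stage_and_load(source_rows, mapping, warehouse):
    """Process rows in a staging list first, then append them to the target."""
    staging_area = [apply_mapping(row, mapping) for row in source_rows]
    warehouse.extend(staging_area)
    return len(staging_area)


if __name__ == "__main__":
    target = []
    source = [{"CUST_NO": "7", "CUST_NM": " Dana ", "BAL_AMT": "12.50"}]
    print(stage_and_load(source, MAPPING, target), target)
```

Because every row is converted in the staging list before anything is appended, a row that fails conversion raises an error before the target is touched.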
10
ETL Glossary
• Cleansing: The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
• Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
• Transportation: The process of moving copied or transformed data from a source to a data warehouse.
• Target System: A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
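The three example transformations named above (cleansing, aggregating, and integrating data from multiple sources) can be sketched as follows. The sales rows are invented sample data, and treating a missing amount as 0 is an assumption chosen for the example, not a general rule.

```python
from collections import defaultdict

# Hypothetical sales rows from two sources with inconsistent formatting.
STORE_SALES = [
    {"region": " north", "amount": "100.0"},
    {"region": "South", "amount": "40"},
]
WEB_SALES = [
    {"region": "NORTH", "amount": "25.5"},
    {"region": "south", "amount": None},
]


def cleanse(row):
    """Cleanse: normalize region text; assume a missing amount means 0."""
    return {
        "region": row["region"].strip().title(),
        "amount": float(row["amount"] or 0),
    }


def aggregate(rows):
    """Aggregate: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def transform(*sources):
    """Integrate several sources, cleanse each row, then aggregate."""
    integrated = [cleanse(row) for source in sources for row in source]
    return aggregate(integrated)


if __name__ == "__main__":
    print(transform(STORE_SALES, WEB_SALES))
```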
11
ETL Tools
12
Informatica 8.6 – What Is It and How Does It Work?
• What is Informatica 8.6?
– Informatica is an ETL tool that delivers an open, scalable data integration solution addressing the complete life cycle for data warehouse and analytic application development.
– Informatica provides an environment that can extract data from multiple sources, transform the data according to the business logic that is built in the Informatica Client application and load the transformed data into files or relational targets.
13
Informatica 8.6 – PowerCenter
PowerCenter provides an environment that allows you to load data into a centralized location, such as a data warehouse or operational data store (ODS). You can extract data from multiple sources, transform the data according to business logic you build in the client application, and load the transformed data into file and relational targets.
14
Informatica Architecture 8.6
15
Informatica Architecture 8.6 – Data Flow
16
Informatica Architecture 8.6 – Components
17
PowerCenter - Components
18
PowerCenter - Components
19
Informatica – PowerCenter Domain
20
PowerCenter - Domain
21
PowerCenter – Admin Console
22
PowerCenter – Application Services
23
Informatica – PowerCenter Repository Service
24
Informatica – PowerCenter Integration Service
25
PowerCenter – Client Components
The Informatica Client is used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create sessions to run the mapping logic.
The Informatica Client has the following main applications:
• Repository Manager
• Designer
• Workflow Manager
• Workflow Monitor
26
PowerCenter – Repository
27
PowerCenter – Client Components