Informatica Material Beginner

Page 1: Informatica Material Beginner

Business Case: Why do we need ETL Tools?

Think of GE: the company has more than 100 years of history and a presence in almost every industry. Over those years its management style has moved from bookkeeping to SAP. This transition did not happen in a single day. Along the way the company used a wide array of technologies: hardware ranging from mainframes to PCs, data storage ranging from flat files to relational databases, and programming languages ranging from COBOL to Java. As a result, different businesses, or more precisely different sub-businesses within a business, came to run different applications on different hardware and different architectures. Technologies were introduced as they were invented and as they were required.

This led directly to a scenario in which the HR department runs on Oracle Applications, Finance runs SAP, part of the process chain is supported by mainframes, some data is stored in Oracle, some on mainframes, some in VSAM files, and the list goes on. If one day the company requires a consolidated report of its assets, there are two ways to produce it.

First, completely manually: generate different reports from the different systems and integrate them.

Second, fetch all the data from the different systems/applications, build a Data Warehouse, and generate reports as per the requirement.

Obviously the second approach is the better one. Fetching the data from different systems, making it coherent, and loading it into a Data Warehouse requires some kind of extraction, cleansing, integration, and load. ETL stands for Extraction, Transformation & Load.

ETL tools provide the facility to extract data from different non-coherent systems, cleanse it, merge it and load it into target systems.
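The extract, cleanse, merge and load steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how any ETL tool works internally: the HR flat file, the finance rows, the `employee_dim` table and all function names are made up for the example, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical HR extract, as it might arrive in a flat file.
HR_CSV = "emp_id,name,dept\n1, alice ,HR\n2,Bob,Finance\n"

# Hypothetical finance records, as they might come from a relational source.
FINANCE_ROWS = [(1, 5200.0), (2, 6100.0)]


def extract_hr(text):
    """Extract: read rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(hr_rows, finance_rows):
    """Transform: cleanse names and merge the two sources on emp_id."""
    salary_by_id = dict(finance_rows)
    merged = []
    for row in hr_rows:
        emp_id = int(row["emp_id"])
        name = row["name"].strip().title()  # cleansing step
        merged.append((emp_id, name, row["dept"], salary_by_id.get(emp_id)))
    return merged


def load(rows, conn):
    """Load: write the merged rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE employee_dim (emp_id INTEGER, name TEXT, dept TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO employee_dim VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract_hr(HR_CSV), FINANCE_ROWS), conn)
warehouse_rows = conn.execute(
    "SELECT * FROM employee_dim ORDER BY emp_id"
).fetchall()
```

Once the data sits in one table, a consolidated report becomes a single query against the warehouse rather than a manual merge of several reports.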

Informatica: What is it all about?

What is Informatica?

Informatica is a tool that supports all the steps of the Extraction, Transformation and Load process. Nowadays Informatica is also used as an integration tool.

Informatica is easy to use. It has a simple visual interface, much like forms in Visual Basic. You drag and drop different objects (known as transformations) and design a process flow for data extraction, transformation and load. These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it into the target systems/databases.

Informatica can communicate with all major data sources (mainframe, RDBMS, flat files, XML, VSAM, SAP etc.) and can move and transform data between them. It can move huge volumes of data very effectively, often better than bespoke programs written for one specific data movement. It can throttle transactions (perform big updates in small chunks to avoid long locks and a full transaction log). It can effectively join data from two distinct data sources (even an XML file can be joined with a relational table). In all, Informatica has the ability to integrate heterogeneous data sources and convert raw data into useful information.
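The idea of throttling, doing one big update as a series of small committed chunks, can be sketched as follows. This is a generic illustration using SQLite from the Python standard library, not Informatica's own mechanism; the `orders` table, the `update_in_chunks` function and the chunk size are assumptions made for the example.

```python
import sqlite3


def update_in_chunks(conn, ids, chunk_size):
    """Mark orders as processed, committing after every chunk of rows."""
    commits = 0
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        conn.executemany(
            "UPDATE orders SET status = 'processed' WHERE id = ?",
            [(order_id,) for order_id in chunk],
        )
        conn.commit()  # short transaction: locks are released after each chunk
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, 'new')", [(i,) for i in range(1, 11)])
conn.commit()

commit_count = update_in_chunks(conn, list(range(1, 11)), chunk_size=4)
remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'new'"
).fetchone()[0]
```

With ten rows and a chunk size of four, the update is split into three short transactions instead of one long one.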

Before we start actually working in Informatica, let’s have an idea about the company owning this wonderful product.

Some facts and figures about Informatica Corporation:

Founded in 1993, based in Redwood City, California

1400+ employees; 3450+ customers; 79 of the Fortune 100 companies

NASDAQ stock symbol: INFA; stock price: $18.74 (09/04/2009)

Revenues in fiscal year 2008: $455.7M

Informatica Developer Network: 20,000 members

In short, Informatica is the world's leading ETL tool, and it is rapidly gaining market share as an Enterprise Integration Platform.

Informatica PowerCenter Components

Informatica PowerCenter is not just a tool but an end-to-end data processing and data integration environment. It enables organizations to collect, centrally process and redistribute data. It can be used simply to integrate two different systems, such as SAP and MQ Series, or to load data warehouses or Operational Data Stores (ODS).

Informatica PowerCenter now also includes many add-on tools that report on the data being processed, the business rules applied, and the quality of the data before and after processing.

To support this, PowerCenter is divided into different components:

PowerCenter Domain: As Informatica says, “The PowerCenter domain is the primary unit for management and administration within PowerCenter”. Doesn’t make much sense? Right... So here is a simpler version. The PowerCenter domain is the collection of all the servers required to support PowerCenter functionality. Each domain has gateway hosts (called domain servers). Whenever you want to use PowerCenter services, you send a request to the domain server. Based on the request type, it redirects your request to one of the PowerCenter services.

PowerCenter Repository: The repository is nothing but a relational database that stores all the metadata created in PowerCenter. Whenever you develop a mapping, session or workflow, execute them, or do anything meaningful (literally), entries are made in the repository.

Integration Service: The Integration Service does all the real work. It extracts data from sources, processes it as per the business logic and loads data to targets.

Repository Service: The Repository Service is the one that understands the content of the repository, fetches data from the repository and sends it back to the requesting components (mostly the client tools and the Integration Service).

PowerCenter Client Tools: The PowerCenter Client consists of multiple tools. They are used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client connects to the repository through the Repository Service to fetch details. It connects to the Integration Service to start workflows. So essentially the client tools are used to code and give instructions to the PowerCenter servers.

PowerCenter Administration Console: This is simply a web-based administration tool you can use to administer the PowerCenter installation.

There are some more not-so-essential-to-know components discussed below:

Web Services Hub: Web Services Hub exposes PowerCenter functionality to external clients through web services.

SAP BW Service: The SAP BW Service extracts data from and loads data to SAP BW.

Data Analyzer: Data Analyzer is like a reporting layer to perform analytics on data warehouse or ODS data.

Metadata Manager: Metadata Manager is a metadata management tool that you can use to browse and analyze metadata from disparate metadata repositories. It shows, in readable reports, how the data is acquired, what business rules are applied and where data is populated.

PowerCenter Repository Reports: PowerCenter Repository Reports are a set of prepackaged Data Analyzer reports and dashboards to help you analyze and manage PowerCenter metadata.

Informatica: Architecture Made Easy

Informatica Software Architecture illustrated

Informatica's ETL product, known as Informatica PowerCenter, consists of three main components.

1. Informatica PowerCenter Client Tools:

These are the development tools installed at the developer's end. These tools enable a developer to:

Define the transformation process, known as a mapping (Designer)

Define run-time properties for a mapping, known as sessions (Workflow Manager)

Monitor execution of sessions (Workflow Monitor)

Manage the repository, useful for administrators (Repository Manager)

Report metadata (Metadata Reporter)

2. Informatica PowerCenter Repository:

The repository is the heart of the Informatica tools. It is a kind of data inventory where all the data related to mappings, sources, targets etc. is kept. This is the place where all the metadata for your application is stored. All the client tools and the Informatica Server fetch data from the repository. An Informatica client and server without a repository is like a PC without memory or a hard disk: it has the ability to process data but has no data to process. The repository can be treated as the backend of Informatica.

3. Informatica PowerCenter Server:

The server is the place where all the executions take place. The server makes physical connections to sources/targets, fetches data, applies the transformations specified in the mapping, and loads the data into the target system.

This architecture is visually explained in diagram below:

Sources

Standard: RDBMS, Flat Files, XML, ODBC

Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2

EAI: MQ Series, Tibco, JMS, Web Services

Legacy: Mainframes (DB2, VSAM, IMS, IDMS, Adabas), AS400 (DB2, Flat File)

Remote Sources

Targets

Standard: RDBMS, Flat Files, XML, ODBC

Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2

EAI: MQ Series, Tibco, JMS, Web Services

Legacy: Mainframes (DB2), AS400 (DB2)

Remote Targets

This is sufficient knowledge to start with Informatica. So let’s go straight to development in Informatica.

Informatica Product Line

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software.

The important products provided by Informatica Corporation are listed below:

Power Center

Power Mart

Power Exchange

Power Center Connect

Power Channel

Metadata Exchange

Power Analyzer

Super Glue

Power Center & Power Mart: Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such as data marts. Power Center supports global repositories and networked repositories, and it can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment helps developer productivity.

Power Exchange: Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options on mainframe (DB2, VSAM, IMS etc.), mid-range (AS400 DB2 etc.) and relational databases (Oracle, SQL Server, DB2 etc.), and for flat files on Unix, Linux and Windows systems.

Power Center Connect: This is an add-on to Informatica Power Center. It helps to extract data and metadata from ERP systems such as PeopleSoft, SAP and Siebel, from messaging systems such as IBM's MQSeries, and from other third-party applications.

Power Channel: This helps to transfer large amounts of encrypted and compressed data over LAN and WAN and through firewalls, to transfer files over FTP, etc.

Metadata Exchange: Metadata Exchange, when used with Power Center, enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer or Sybase PowerDesigner for developing data models. The functional and technical teams will have spent much time and effort in creating the data model's data structures (tables, columns, data types, procedures, functions, triggers etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which saves time and effort. There is no need for the Informatica developer to create these data structures once again.

Power Analyzer: Power Analyzer provides organizations with reporting facilities. It makes accessing, analyzing, and sharing enterprise data simple and makes that data easily available to decision makers. It enables organizations to gain insight into business processes and develop business intelligence.

With Power Analyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. Power Analyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.

Super Glue: Super Glue is used for loading metadata from several sources into a centralized place. Reports can be run against Super Glue to analyze metadata.

Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on Informatica in the near future. Please check back soon. To know more about Informatica, visit its official website www.informatica.com

Informatica Transformations

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations can be of two types:

Active Transformation

An active transformation can change the number of rows that pass through the transformation, change the transaction boundary, or change the row type. For example, Filter, Transaction Control and Update Strategy are active transformations.

The key point to note is that the Designer does not allow you to connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the Integration Service may not be able to concatenate the rows passed by active transformations. However, the Sequence Generator transformation (SGT) is an exception to this rule. An SGT does not receive data. It generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by an SGT and an active transformation.

Passive Transformation

A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary, and maintains the row type.

The key point to note is that the Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive.
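The difference between active and passive behaviour can be illustrated with two small Python functions. This is only an analogy over lists of dictionaries; the function names and the sample rows are invented for the example and are not PowerCenter objects.

```python
def filter_transformation(rows, condition):
    """Active: may return fewer rows than it receives."""
    return [row for row in rows if condition(row)]


def expression_transformation(rows):
    """Passive: returns exactly one output row per input row."""
    return [{**row, "full_name": f"{row['first']} {row['last']}"} for row in rows]


employees = [
    {"first": "Ada", "last": "Lovelace", "city": "London"},
    {"first": "Alan", "last": "Turing", "city": "London"},
    {"first": "Grace", "last": "Hopper", "city": "New York"},
]

filtered = filter_transformation(employees, lambda r: r["city"] == "New York")
expressed = expression_transformation(employees)
```

The filter returns one row out of three, while the expression returns all three rows with an extra column. That change in row count is what makes it hard to line up the output of an active branch with rows coming from another branch.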

Transformations can be Connected or Unconnected to the data flow.

Connected Transformation

A connected transformation is connected to other transformations or directly to the target table in the mapping.

Unconnected Transformation

An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Informatica Transformations

The following transformations are available in Informatica:

Aggregator Transformation

Application Source Qualifier Transformation

Custom Transformation

Data Masking Transformation

Expression Transformation

External Procedure Transformation

Filter Transformation

HTTP Transformation

Input Transformation

Java Transformation

Joiner Transformation

Lookup Transformation

Normalizer Transformation

Output Transformation

Rank Transformation

Reusable Transformation

Router Transformation

Sequence Generator Transformation

Sorter Transformation

Source Qualifier Transformation

SQL Transformation

Stored Procedure Transformation

Transaction Control Transformation

Union Transformation

Unstructured Data Transformation

Update Strategy Transformation

XML Generator Transformation

XML Parser Transformation

XML Source Qualifier Transformation

Advanced External Procedure Transformation

External Transformation

In the following pages, we explain the above Informatica transformations and their significance in the ETL process in detail.

Informatica Transformations

Aggregator Transformation

The Aggregator transformation performs aggregate functions such as average, sum and count on multiple rows or groups. The Integration Service performs these calculations as it reads, storing group and row data in an aggregate cache. It is an Active & Connected transformation.

Difference between Aggregator and Expression transformation? The Expression transformation lets you perform calculations on a row-by-row basis only. In the Aggregator you can perform calculations on groups.

For example, an Aggregator transformation might have the following ports: State, State_Count, Previous_State and State_Counter.

Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.

Aggregate Expressions: These are allowed only in Aggregator transformations. They can include conditional clauses and non-aggregate functions. They can also include one aggregate function nested within another aggregate function.

Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
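The contrast between row-by-row and group calculations can be sketched in Python. This is a conceptual illustration only: the `expression` and `aggregator` functions, the 10% tax rate and the sample rows are assumptions, and the dictionary named `cache` merely plays the role of the aggregate cache.

```python
from collections import defaultdict

rows = [
    {"state": "CA", "sales": 100.0},
    {"state": "NY", "sales": 50.0},
    {"state": "CA", "sales": 25.0},
]


def expression(rows):
    """Row by row: one output row per input row (Expression-style)."""
    return [{**r, "sales_with_tax": round(r["sales"] * 1.1, 2)} for r in rows]


def aggregator(rows, group_by, value):
    """Per group: accumulate SUM and COUNT in a cache, then derive AVG."""
    cache = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for r in rows:
        group = cache[r[group_by]]
        group["sum"] += r[value]
        group["count"] += 1
    return {
        key: {**group, "avg": group["sum"] / group["count"]}
        for key, group in cache.items()
    }


per_row = expression(rows)
per_group = aggregator(rows, "state", "sales")
```

Here `per_row` still has three rows, while `per_group` has one entry per state, which is also why the Aggregator is an active transformation.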

Application Source Qualifier Transformation

It represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session. It is an Active & Connected transformation.

Custom Transformation

It works with procedures you create outside the Designer interface to extend PowerCenter functionality. It calls a procedure from a shared library or DLL. It is of the Active/Passive & Connected type.

You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.

The Custom transformation allows you to develop the transformation logic in a procedure. Some of the PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom transformations, such as blocking rules, also apply to transformations built using Custom transformations. PowerCenter provides two sets of functions, called generated functions and API functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API functions in the procedure code to develop the transformation logic.

Difference between Custom and External Procedure transformation? In the Custom transformation, input and output functions occur separately. The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all the ports of the transformation.

Data Masking Transformation

Passive & Connected. It is used to change sensitive production data into realistic test data for non-production environments. It creates masked data for development, testing, training and data mining. Data relationships and referential integrity are maintained in the masked data. For example, it returns a masked value that has a realistic format for an SSN, credit card number, birthdate, phone number, etc., but is not a valid value. Masking types: Key Masking, Random Masking, Expression Masking, Special Mask format. The default is no masking.

Expression Transformation

Passive & Connected. It is used to perform non-aggregate functions, i.e. to calculate values in a single row. Examples: calculating the discount for each product, concatenating first and last names, or converting a date to a string field.

You can create an Expression transformation in the Transformation Developer or the Mapping Designer. Components: Transformation, Ports, Properties, Metadata Extensions.

External Procedure Transformation

Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of Windows and bind them to the External Procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an experienced programmer to use TX and to use multi-threaded code in external procedures.

Filter Transformation

Active & Connected. It passes rows that meet the specified filter condition and removes the rows that do not. For example, you can use it to find all the employees who work in New York, or all the faculty members teaching Chemistry in a state. The input ports for the filter must come from a single transformation. You cannot concatenate ports from more than one transformation into the Filter transformation. Components: Transformation, Ports, Properties, Metadata Extensions.

HTTP Transformation

Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or post data to the server. Retrieved data is passed to the target or a downstream transformation in the mapping. Authentication types: Basic, Digest and NTLM. Methods: GET, POST and SIMPLE POST.

Java Transformation

Active or Passive & Connected. It provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.

Joiner Transformation

Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations, or to join data from the same source. In order to join two sources, there must be at least one pair of matching columns between the sources, and you must specify one source as the master and the other as the detail. For example, you can join a flat file and a relational source, two flat files, or a relational source and an XML source. The Joiner transformation supports the following types of joins:

Normal

Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.

Master Outer

Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.

Detail Outer

Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.

Full Outer

Full outer join keeps all rows of data from both the master and detail sources.

Limitations on the pipelines you connect to the Joiner transformation:

* You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.

* You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.
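The four join types (Normal, Master Outer, Detail Outer, Full Outer) can be made concrete with a small Python sketch. It follows the definitions of master and detail given in this section. The `join` function, the join-type strings and the sample department and employee rows are all hypothetical, and real Joiner behaviour (caching, sorted input, NULL handling) is not modelled.

```python
def join(master, detail, key, join_type="normal"):
    """Join detail rows to master rows on `key`, by Joiner-style join type."""
    master_by_key = {}
    for m in master:
        master_by_key.setdefault(m[key], []).append(m)

    output = []
    matched_master_keys = set()
    for d in detail:
        matches = master_by_key.get(d[key], [])
        if matches:
            matched_master_keys.add(d[key])
            output.extend({**m, **d} for m in matches)
        elif join_type in ("master_outer", "full_outer"):
            output.append(dict(d))  # unmatched detail row is kept

    if join_type in ("detail_outer", "full_outer"):
        for m in master:
            if m[key] not in matched_master_keys:
                output.append(dict(m))  # unmatched master row is kept
    return output


departments = [  # master source
    {"dept_id": 10, "dept": "HR"},
    {"dept_id": 20, "dept": "Finance"},
    {"dept_id": 30, "dept": "Legal"},
]
employees = [  # detail source
    {"emp": "Alice", "dept_id": 10},
    {"emp": "Bob", "dept_id": 20},
    {"emp": "Cara", "dept_id": 40},
]

row_counts = {
    jt: len(join(departments, employees, "dept_id", jt))
    for jt in ("normal", "master_outer", "detail_outer", "full_outer")
}
```

With this data, the normal join keeps only Alice and Bob, the master outer join also keeps Cara (an unmatched detail row), the detail outer join keeps Legal instead (an unmatched master row), and the full outer join keeps both.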

Lookup Transformation

Passive & Connected or Unconnected. It is used to look up data in a flat file, relational table, view, or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup condition. The returned values can then be passed to other transformations. You can create a lookup definition from a source qualifier, and you can use multiple Lookup transformations in a mapping.

You can perform the following tasks with a Lookup transformation:

* Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.

* Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.

* Update slowly changing dimension tables. Determine whether rows exist in a target.

Lookup Components: Lookup source, Ports, Properties, Condition.

Types of Lookup:

1) Relational or flat file lookup.

2) Pipeline lookup.

3) Cached or uncached lookup.

4) Connected or unconnected lookup.
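The "get a related value" task with a cached lookup can be sketched as follows. This is a simplified analogy, assuming a lookup source small enough to hold in a dictionary; `LOOKUP_TABLE`, `build_cache` and `lookup_name` are names invented for the example.

```python
# Hypothetical lookup source: employee ID -> employee name.
LOOKUP_TABLE = [(101, "Alice"), (102, "Bob")]


def build_cache(lookup_rows):
    """Cached lookup: read the lookup source once into memory."""
    return dict(lookup_rows)


def lookup_name(cache, emp_id):
    """Return the related value, or None (NULL) when no row matches."""
    return cache.get(emp_id)


source_rows = [{"emp_id": 101, "amount": 200.0}, {"emp_id": 999, "amount": 50.0}]
cache = build_cache(LOOKUP_TABLE)
enriched = [
    {**row, "emp_name": lookup_name(cache, row["emp_id"])} for row in source_rows
]
```

Calling `lookup_name` from inside another calculation, rather than wiring its result into every row, is loosely comparable to how an unconnected lookup is invoked from another transformation.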

Normalizer Transformation

Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources, where data is most often stored in denormalized format.

You can create the following Normalizer transformations:

* VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframes.

* Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is the default when you create a Normalizer transformation.

Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.
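Turning multiple-occurring columns into one row per occurrence can be illustrated in Python. This is a minimal sketch of the idea only; the `normalize` function, the `occurrence` and `value` output names and the quarterly sales row are assumptions for the example.

```python
def normalize(row, key, occurring_columns):
    """Return one output row per occurrence of the repeated column."""
    return [
        {key: row[key], "occurrence": index, "value": row[column]}
        for index, column in enumerate(occurring_columns, start=1)
    ]


# One denormalized source row holding four quarterly sales figures.
store_row = {"store": "S1", "q1": 100, "q2": 120, "q3": 90, "q4": 150}
normalized = normalize(store_row, "store", ["q1", "q2", "q3", "q4"])
```

A single input row becomes four output rows, which is why the Normalizer is classed as an active transformation.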

Rank Transformation

Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric value in a port or group, or to return the strings at the top or the bottom of a session sort order. For example, you can select the top 10 regions by sales volume, or the 10 lowest-priced products. As an active transformation, it might change the number of rows passed through it. For example, if you pass 100 rows to the Rank transformation but select only the top 10, only 10 rows pass from the Rank transformation to the next transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.
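Selecting the top or bottom N rows by a port can be sketched with the standard-library `heapq` module. The `rank` function and the regional sales figures are hypothetical, and grouping is left out to keep the example short.

```python
import heapq


def rank(rows, port, n, top=True):
    """Return the n rows with the largest (or smallest) value in `port`."""
    pick = heapq.nlargest if top else heapq.nsmallest
    return pick(n, rows, key=lambda row: row[port])


regions = [
    {"region": "North", "sales": 500},
    {"region": "South", "sales": 300},
    {"region": "East", "sales": 800},
    {"region": "West", "sales": 650},
]

top_two = rank(regions, "sales", 2)
bottom_one = rank(regions, "sales", 1, top=False)
```

Four rows go in and at most `n` rows come out, which mirrors the 100-rows-in, 10-rows-out behaviour described above.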

Router Transformation

Active & Connected. It is similar to the Filter transformation because both allow you to apply a condition to test data. The difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture that data and route it to a default output group.

If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient.
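Testing the same input against several conditions in one pass, with a default group for everything else, might look like this in Python. It is a sketch under assumptions: the `route` function, the group names and the sample orders are invented, and a row that satisfies more than one condition is sent to every matching group, which is my understanding of how the Router behaves.

```python
def route(rows, groups):
    """Send each row to every group whose condition it meets, else to DEFAULT."""
    output = {name: [] for name in groups}
    output["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                output[name].append(row)
                matched = True
        if not matched:
            output["DEFAULT"].append(row)
    return output


orders = [
    {"amount": 1500, "region": "EU"},
    {"amount": 200, "region": "US"},
    {"amount": 300, "region": "EU"},
]

routed = route(
    orders,
    {
        "HIGH_VALUE": lambda r: r["amount"] >= 1000,
        "EU": lambda r: r["region"] == "EU",
    },
)
```

The input is read once and each condition is evaluated per row, instead of passing the same rows through several separate filters.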

Sequence Generator Transformation

Passive & Connected transformation. It is used to create unique primary key values, to cycle through a sequential range of numbers, or to replace missing primary keys.

It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot add ports to the transformation. The NEXTVAL port generates a sequence of numbers when you connect it to a transformation or target. CURRVAL is the NEXTVAL value plus the Increment By value (plus one by default). You can make a Sequence Generator reusable and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.

For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session. For non-reusable Sequence Generator transformations, setting Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also causes sections of skipped values, since unused cached values are discarded at the end of each session.

For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values; however, it must be greater than one. When you reduce the Number of Cached Values, you might increase the number of times the Integration Service accesses the repository to cache values during the session.
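Generating key values and cycling through a range can be sketched with a Python generator. The parameter names loosely echo the idea of a start value, an increment, an end value and a cycle option, but they are assumptions for this example, and caching and the CURRVAL port are not modelled.

```python
from itertools import islice


def sequence_generator(start=1, increment=1, end=None, cycle=False):
    """Yield sequence values; restart at `start` after `end` when cycling."""
    value = start
    while True:
        if end is not None and value > end:
            if not cycle:
                return
            value = start
        yield value
        value += increment


surrogate_keys = list(sequence_generator(start=10, increment=5, end=25))
cycled = list(islice(sequence_generator(start=1, end=3, cycle=True), 7))
```

Without cycling, the sequence stops after the end value. With cycling, it wraps back to the start value, which suits a repeating range but not unique primary keys.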

Sorter Transformation

Active & Connected transformation. It is used to sort data in either ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order.

Source Qualifier Transformation

Active & Connected transformation. When adding a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, to filter rows when the Integration Service reads source data, to specify an outer join rather than the default inner join, and to specify sorted ports. It is also used to select only distinct values from the source and to create a custom query that issues a special SELECT statement for the Integration Service to read source data.

SQL Transformation

Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.

Stored Procedure Transformation

Passive & Connected or Unconnected transformation. It is useful for automating time-consuming tasks, and it is also used for error handling, for dropping and recreating indexes, for determining the space available in a database, for specialized calculations, etc. The stored procedure must exist in the database before you create a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables and conditional statements.

Transaction Control Transformation

Active & Connected. You can control commit and roll back of transactions based on a set of rows that pass through a Transaction Control transformation. Transaction control can be defined within a mapping or within a session.

Components: Transformation, Ports, Properties, Metadata Extensions.

Union Transformation

Active & Connected. The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the UNION ALL SQL statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement, the Union transformation does not remove duplicate rows.

Rules

1) You can create multiple input groups, but only one output group.

2) All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.

3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.

4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.

5) The Union transformation does not generate transactions.

Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.
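To see the UNION ALL behaviour that the Union transformation is compared to, here is a minimal sketch using Python's built-in sqlite3 module. The table names `orders_east` and `orders_west` and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_east (order_id INTEGER, amount INTEGER);
CREATE TABLE orders_west (order_id INTEGER, amount INTEGER);
INSERT INTO orders_east VALUES (1, 100), (2, 250);
INSERT INTO orders_west VALUES (2, 250), (3, 75);
""")

# UNION ALL keeps every row from both inputs, including the duplicate (2, 250).
union_all_rows = conn.execute(
    "SELECT order_id, amount FROM orders_east "
    "UNION ALL "
    "SELECT order_id, amount FROM orders_west"
).fetchall()

# Plain UNION removes duplicates, which the Union transformation does not do.
union_rows = conn.execute(
    "SELECT order_id, amount FROM orders_east "
    "UNION "
    "SELECT order_id, amount FROM orders_west "
    "ORDER BY order_id"
).fetchall()

print(len(union_all_rows), len(union_rows))
```

Both inputs have the same columns with the same datatypes, which mirrors the rule that all input groups must have matching ports.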

Unstructured Data Transformation

Active/Passive and connected. The Unstructured Data transformation is a transformation that processes unstructured and semi-structured file formats, such as messaging formats, HTML pages and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.

Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation

Active & Connected transformation. It is used to update data in the target table, either to maintain a history of the data or to keep only the most recent changes. It flags rows for insert, update, delete or reject within a mapping.
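The idea of flagging each row can be sketched in Python. The constant names below mirror PowerCenter's DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT flags; the function `flag_row`, the row layout and the decision rules are assumptions made up for this illustration, not the logic of any real mapping.

```python
# Numeric flags named after PowerCenter's update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, target):
    """Return a flag for one source row.

    `target` maps an existing customer id to its current name (hypothetical).
    """
    if row.get("id") is None:
        return DD_REJECT      # no key: the row cannot be applied
    if row.get("deleted"):
        return DD_DELETE      # source marks the row as removed
    if row["id"] not in target:
        return DD_INSERT      # new key: insert into the target
    return DD_UPDATE          # known key: update the target row


target = {1: "Ann"}
source = [
    {"id": 1, "name": "Ann Lee"},
    {"id": 2, "name": "Raj"},
    {"id": None, "name": "?"},
    {"id": 1, "name": "Ann", "deleted": True},
]
flags = [flag_row(row, target) for row in source]
print(flags)
```

In a real mapping the equivalent decision is usually written as an expression inside the Update Strategy transformation; the sketch only shows that every row leaves with exactly one flag.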

XML Generator Transformation

Active & Connected transformation. It lets you create XML inside a pipeline. The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.

XML Parser Transformation

Active & Connected transformation. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is similar to the XML source functionality, except it parses the XML in the pipeline.

XML Source Qualifier Transformation

Active & Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources. It has one input or output port for every column in the XML source.

External Procedure Transformation

Active & Connected/UnConnected transformation. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases an External Procedure transformation lets you develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Advanced External Procedure Transformation

Active & Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

Informatica Certification Information

If you have broad hands-on experience with Informatica and want to grow in the data integration field, an Informatica certification can help you achieve this. Informatica offers a variety of certifications.

Various Certifications from Informatica

(1) Informatica Certified Administrator

For PowerCenter administrators, testers and project managers. Requirements: attain a passing score (70 percent or higher) on two exams:

Architecture and Administration

Advanced Administration

(2) Informatica Certified Designer

For PowerCenter transformation, mapping and mapplet developers. Requirements: attain a passing score on three exams:

Architecture and Administration

Mapping Design

Advanced Mapping Design

(3) Informatica Certified Consultant

For PowerCenter experts. Requirements: attain a passing score on five exams:

Architecture and Administration

Mapping Design

Advanced Administration

Advanced Mapping Design

Enablement Technologies

Several tracks are available for the above certifications, depending on the version of PowerCenter you are using.

PowerCenter 5

PowerCenter 6

PowerCenter 7

However, candidates who earn titles in PowerCenter 5 can upgrade the certification to PowerCenter 6 by taking one PowerCenter 6 upgrade examination. Similarly, a title in PowerCenter 6 can be upgraded to PowerCenter 7.

Following is the list of exams available under the above mentioned tracks.

PowerCenter 5 Track

Exam A: PowerCenter 5 Architecture and Administration
Exam B: PowerCenter 5 Mapping Design
Exam C: PowerCenter 5 Advanced Administration
Exam D: PowerCenter 5 Advanced Mapping Design
Exam E: Enablement Technologies

PowerCenter 6 Track

Exam G: PowerCenter 6 Architecture and Administration
Exam H: PowerCenter 6 Mapping Design
Exam J: PowerCenter 6 Advanced Administration
Exam I: PowerCenter 6 Advanced Mapping Design
Exam E: Enablement Technologies

PowerCenter 7 Track

Exam M: PowerCenter 7 Architecture and Administration
Exam N: PowerCenter 7 Mapping Design
Exam O: PowerCenter 7 Advanced Administration
Exam P: PowerCenter 7 Advanced Mapping Design
Exam E: Enablement Technologies

Update Exams

Exam F: PowerCenter 6 Update
Exam L: PowerCenter 7 Update
Exam Q: PowerCenter 8 Update

Some Important Links:

Search Informatica Database for the list of Certified professionals

List of Certifications available from Informatica

Webinar: The 38 Subsystems for ETL by Dr Ralph Kimball

Interesting whitepaper: Alternative of ETL, E-LT

Data Warehousing and Business Intelligence

In this section of techtiks, you will find detailed information and must-read articles related to Data Warehousing and Business Intelligence. The target audience ranges from beginners to experts. These articles are written by highly qualified Data Warehouse Engineers.

Whenever you see or hear the words "Data Warehouse", you might think of a large building with bits of information stored on shelves, waiting for someone to retrieve them. Think of a traditional warehouse: what does it contain? It contains goods stored in such a way that they are easy to identify and can be quickly retrieved. A Data Warehouse functions in a similar way. How, then, is a Data Warehouse different from a traditional relational database? A relational database is similar to a Data Warehouse, but there are certain defining differences. You will see the differences in the following articles.

Datawarehouse Defined

Elements of Data Warehouse

History of Data Warehousing

Dimensional Modeling

Data Warehouse Definition and Introduction

Consider an example where a business analyst uses the systems containing operational data (the data that runs the daily transactions of your business). Analysts can use information about which products were sold in which regions at what time of the year to look for anomalies or to project future sales. However, there are several problems if an analyst accesses operational data directly:

He might not have the expertise to query the operational database. For example, querying IMS databases requires an application program that uses a specialized type of data manipulation language. In general, those programmers who have the expertise to query the operational database have a full-time job in maintaining the database and its applications.

Performance is critical for many operational databases, such as databases for a bank. The system cannot handle users making ad-hoc queries.

The operational data generally is not in the best format to be used for reporting queries.

Data warehousing solves these problems. In data warehousing, you create stores of informational data: data that is extracted from the operational data and then transformed for reporting and decision making. For example, a data warehousing tool might copy all the sales data from the operational database, perform calculations to summarize the data, and write it to a new database. End users can query the new database (the warehouse) without impacting the operational databases.
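The copy, summarize and write steps in the example above can be sketched with two in-memory SQLite databases standing in for the operational system and the warehouse. The table names `sales` and `sales_summary`, the columns and the sample rows are all assumptions made up for this illustration.

```python
import sqlite3

# Stand-in for the operational database (hypothetical schema and data).
operational = sqlite3.connect(":memory:")
operational.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
    ('East', 'Tea', 10.0),
    ('East', 'Tea', 5.0),
    ('West', 'Tea', 7.5),
    ('East', 'Coffee', 4.0);
""")

# Stand-in for the warehouse: a separate database with a summary table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_summary (region TEXT, product TEXT, total_amount REAL)"
)

# Extract: copy the sales rows out of the operational database.
rows = operational.execute("SELECT region, product, amount FROM sales").fetchall()

# Transform: summarize the amount per region and product.
totals = {}
for region, product, amount in rows:
    totals[(region, product)] = totals.get((region, product), 0.0) + amount

# Load: write the summarized rows into the warehouse.
warehouse.executemany(
    "INSERT INTO sales_summary VALUES (?, ?, ?)",
    [(region, product, total) for (region, product), total in totals.items()],
)
warehouse.commit()

# Reporting queries now run against the warehouse only.
summary = warehouse.execute(
    "SELECT region, product, total_amount FROM sales_summary "
    "ORDER BY region, product"
).fetchall()
print(summary)
```

The reporting query at the end never touches the operational connection, which is the point of the separation.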

To summarize

The purpose of a data warehouse is to store data consistently across the organization and to make the organizational information accessible.

It is an adaptive and resilient source of information. When new data is added to the Data Warehouse, the existing data and technologies are not disrupted. The design of separate data marts that make up the data warehouse must be distributed and incremental. Anything else is a compromise.

The data warehouse not only controls access to the data, but gives its owners great visibility into the uses and abuses of the data, even after it has left the data warehouse.

Data warehouse is the foundation for decision-making.

Elements of Data Warehouse

Source Systems

Typically in any organization the data is stored in various databases, usually divided up by the systems. There may be data for marketing, sales, payroll, engineering, etc. These systems might be legacy/mainframe systems or relational database systems.

Staging Area

The data coming from various source systems is first kept in a staging area. The staging area is used to clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse. The data coming from the source system is kept as it is in this area. This need not be based on relational terminology. Sometimes managers of the data are comfortable with a normalized set of data. In these cases, a normalized structure for the data staging storage is certainly acceptable. Also, the staging area does not provide querying/presentation services.

Presentation Server

Once the data is in the staging area, it is cleansed, transformed and then sent to the data warehouse. You may or may not have an ODS (operational data store) before transferring data to the data warehouse.

OLAP

The data in Data Warehouse has to be easily manipulated in order to answer the business questions from management and other users. This is accomplished by connecting the data to fast and easy-to-use tools known as Online Analytical Processing (OLAP) tools. OLAP tools can be thought of as super high-speed forklifts that have knowledge of the warehouse and the operators built into them in order to allow ordinary people off the street to jump in and quickly find products by asking English-like questions. Within the OLAP server, data is reorganized to meet the reporting and analysis requirements of the business, including:

Exception reporting

Ad-hoc analysis

Actual vs. budget reporting

Data mining (looking for trends or anomalies in the data)

In order to process business queries at high speed, answers to common questions are preprocessed in some OLAP servers, resulting in exceptional query responses at the cost of having an OLAP database that may be several times bigger than the data warehouse itself.

Data Mart

A data mart is a logical subset of the complete data warehouse. It is often viewed as the restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group. For example, an organization may have a data mart for Sales or Inventory.

Data Warehouse History

Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." He has 35 years of experience in database technology management and data warehouse design. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association and many industry conferences, seminars, and tradeshows.

As an author, Bill has written about a variety of topics on the building, usage, and maintenance of the data warehouse and the Corporate Information Factory. He has written more than 650 articles, many of which have been published in major computer journals such as Datamation, ComputerWorld, and Byte Magazine. Bill is currently a columnist with Data Management Review, and has been since its inception. He has published 45 books; one sold over half a million copies, and 21 have been book club selections with publishers such as Prentice-Hall, John Wiley, and QED. Translations of various books have been done in Chinese, Dutch, French, German, Japanese, Korean, Portuguese, Russian, and Spanish.

Ralph Kimball Biography

Ralph Kimball is known worldwide as an innovator, writer, educator, speaker and consultant in the field of data warehousing. He has remained steadfast in his long-term conviction that data warehouses must be designed to be understandable and fast. His books on dimensional design techniques have become the all time best sellers in data warehousing. To date Ralph has written more than 100 articles and columns for Intelligent Enterprise and its predecessors, winning the Readers Choice Award five years in a row.

After receiving a Ph.D. in 1972 from Stanford in electrical engineering (specializing in man-machine systems), Ralph joined the Xerox Palo Alto Research Center (PARC). At PARC Ralph co-invented the Xerox Star Workstation, the first commercial product to use mice, icons and windows.

Ralph then became vice president of applications at Metaphor Computer Systems, a pioneering decision support software and services provider. As a hands-on manager, he developed the Capsule Facility in 1982. The Capsule was a graphical programming technique which connected icons together in a logical flow, allowing a very visual style of programming for non-programmers. The Capsule was used to build reporting and analysis applications at Metaphor.

Ralph founded Red Brick Systems in 1986, serving as CEO until 1992. Red Brick Systems, now owned by IBM, was known for its lightning fast relational database optimized for data warehousing. Ralph Kimball Associates incorporated in 1992 to provide data warehouse consulting and education.

Ralph Kimball Vs. Bill Inmon's Paradigm of Data Warehouse

In the data warehousing field, we often hear discussions about whether a person's or organization's philosophy falls into Bill Inmon's camp or into Ralph Kimball's camp. The difference between the two philosophies is described below:

Bill Inmon's paradigm

Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

Ralph Kimball's paradigm

Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse in most enterprises is closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.

Dimensional Modeling

Quick Reference Guide to Dimensional Modeling

Dimensional modeling is the design concept used by many data warehouse designers to build their data warehouse. Dimensional model is the underlying data model used by many of the commercial OLAP products available today in the market. Designing a data warehouse is very different from designing an online transaction processing (OLTP) system. In contrast to an OLTP system in which the purpose is to capture high rates of data changes and additions, the purpose of a data warehouse is to organize large amounts of stable data for ease of analysis and retrieval. Because of these differing purposes, there are many considerations in data warehouse design that differ from OLTP database design. In dimensional model, all data is contained in two types of tables called Fact Table and Dimension Table.

Fact Table

Each data warehouse or data mart includes one or more fact tables. The fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows, sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization. A key characteristic of a fact table is that it contains numerical data (facts) that can be summarized to provide information about the history of the operation of the organization. Each fact table also includes a multipart index that contains as foreign keys the primary keys of related dimension tables, which contain the attributes of the fact records. Fact tables should not contain descriptive information or any data other than the numerical measurement fields and the index fields that relate the facts to corresponding entries in the dimension tables. An example of a fact table is a Sales_Fact table that might contain measures such as sale_amount, unit_price and discount.

Dimension Table

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. For example, a dimension containing product information would often contain a hierarchy that separates products into categories such as food, drink, and non-consumable items, with each of these categories further subdivided a number of times until the individual product is reached at the lowest level.

Dimensional modeling produces dimension tables in which each table contains fact attributes that are independent of those in other dimensions. For example, a customer dimension table contains data about customers, a product dimension table contains information about products, and a store dimension table contains information about stores. Queries use attributes in dimensions to specify a view into the fact information. For example, a query might use the product, store, and time dimensions to ask the question "What was the cost of non-consumable goods sold in the northeast region in 1999?" Subsequent queries might drill down along one or more dimensions to examine more detailed data, such as "What was the cost of kitchen products in New York City in the third quarter of 1999?" In these examples, the dimension tables are used to specify how a measure (sale_amount) in the fact table is to be summarized.

Consider a Sales_Fact table whose facts are described by the attributes Store, Product, Time and, say, Sales Person. In this case we will have four dimension tables: Store_Dimension, Product_Dimension, Time_Dimension and Sales_Person_Dimension.

Figure 1

You may notice that all of these dimensions contain a Key field. This is called a Surrogate Key. This key is a substitute for the natural key in a dimension (e.g., in Sales_Person_Dimension, the natural key is ID). In a data warehouse, a surrogate key is a generalization of the natural production key and is one of the basic elements of a data warehouse.

As the fact table is described by the four dimension tables above, it will contain the Surrogate Keys of all these dimensions. This is what the Sales_Fact table will look like:

Figure 2

Now if you look carefully at the structure of the above tables and how they are linked, the schema will look like this:

Figure 3

You can easily tell that this looks like a star. Hence it is known as a Star Schema.
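A star schema and a typical query against it can be sketched with Python's sqlite3 module. The table names follow the example above (with Sales_Person_Dimension left out for brevity); the column names, surrogate key values and sample rows are assumptions, since the original figures are not reproduced here. The query answers a simplified version of the question "What was the cost of non-consumable goods sold in the northeast region in 1999?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one surrogate key each, plus descriptive attributes.
CREATE TABLE store_dimension (
    store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE product_dimension (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE time_dimension (
    time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

-- Fact table: surrogate keys of the dimensions plus a numeric measure.
CREATE TABLE sales_fact (
    store_key INTEGER REFERENCES store_dimension(store_key),
    product_key INTEGER REFERENCES product_dimension(product_key),
    time_key INTEGER REFERENCES time_dimension(time_key),
    sale_amount REAL);

INSERT INTO store_dimension VALUES
    (1, 'NYC Midtown', 'Northeast'), (2, 'Austin Central', 'South');
INSERT INTO product_dimension VALUES
    (10, 'Frying pan', 'Non-consumable'), (11, 'Orange juice', 'Drink');
INSERT INTO time_dimension VALUES (100, 1999, 3), (101, 2000, 1);
INSERT INTO sales_fact VALUES
    (1, 10, 100, 40.0), (1, 10, 101, 25.0), (1, 11, 100, 3.0), (2, 10, 100, 30.0);
""")

# Dimensions select the slice; the fact table supplies the measure to sum.
query = """
SELECT SUM(f.sale_amount)
FROM sales_fact f
JOIN store_dimension s ON s.store_key = f.store_key
JOIN product_dimension p ON p.product_key = f.product_key
JOIN time_dimension t ON t.time_key = f.time_key
WHERE p.category = 'Non-consumable'
  AND s.region = 'Northeast'
  AND t.year = 1999
"""
total = conn.execute(query).fetchone()[0]
print(total)
```

Every join goes from the fact table out to one dimension, which is what gives the schema its star shape and keeps such queries short.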

Advantages of having Star Schema

Star Schema is very easy to understand, even for non-technical business managers

Star Schema provides better performance and smaller query times

Star schema is easily extensible and will handle future changes easily

Slowly Changing Dimensions

Handling changes to dimensional data across time is the most important aspect in designing a data warehouse. In dimensional modeling, it is very rare for a dimension to remain static over time. For example, a customer's address may change; a company may phase out old products and introduce new products. What if a customer's name changes, a sales person changes his region of sale, or a company assigns a new sales territory? How do you record the history or preserve the old version? This is where the concept of Slowly Changing Dimensions comes in. The term Slowly Changing Dimension is about variation in dimensional attributes over time. The word slowly, in this context, might seem incorrect: a sales person may change his territory rapidly. But in general, when compared to measures in the fact table, the changes in dimensions occur slowly.

Types of Slowly Changing Dimensions

With reference to Figure 3 above, let's say a sales person changes his region of sale. We may handle this change in several ways. These methods fall into various categories based on the company's need to preserve an accurate history of dimensional changes. Ralph Kimball categorized the dimensional changes into three categories:

Type One: Changes that overwrite history

Type Two: Preserve history

Type Three: Preserve a version of history

Type One (Overwrite History)

A Type One change overwrites the existing dimensional attribute with new information. In the sales person region change example, the old region name will be overwritten by the new region. Say a sales person, Rob Doe, has ASIA as his territory.

Sales_Person_Dimension

Sales_Person_Key  ID      Name     Region     ...
100               203234  Rob Doe  ASIA       ...

Now, if he starts looking after NorthWest Region, by implementing Type 1 dimension, the dimension table will look like:

Sales_Person_Dimension

Sales_Person_Key  ID      Name     Region     ...
100               203234  Rob Doe  NorthWest  ...

Advantages:

This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:

All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Rob previously covered the ASIA region.
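A Type One change amounts to an in-place overwrite. The following Python sketch applies it to the Rob Doe row from the example; the function name `apply_type_one` and the dictionary layout are hypothetical.

```python
def apply_type_one(dimension, natural_id, new_region):
    """Overwrite the region of every row with the given natural key."""
    for row in dimension:
        if row["id"] == natural_id:
            row["region"] = new_region  # the old value is not kept anywhere
    return dimension


sales_person_dimension = [
    {"sales_person_key": 100, "id": 203234, "name": "Rob Doe", "region": "ASIA"},
]
apply_type_one(sales_person_dimension, 203234, "NorthWest")
print(sales_person_dimension)
```

The table still has one row with the same surrogate key, and nothing in it records that the region was ever ASIA.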

Type Two (Preserve History)

A Type Two change writes a record with the new attribute information and preserves a record of the old dimensional data. Type Two changes let you preserve historical data. Implementing Type Two changes within a data warehouse might require significant analysis and development. Type Two changes accurately partition history across time more effectively than other types. However, because Type Two changes add records, they can significantly increase the database's size.

In our example, let's say we identify Region as a Type Two attribute. The change can then be handled like this:

Sales_Person_Dimension

Sales_Person_Key  ID      Name     Region     ...
100               203234  Rob Doe  ASIA       ...
153               203234  Rob Doe  NorthWest  ...

Advantages:

This allows us to accurately keep all historical information.

Disadvantages:

This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.

This necessarily complicates the ETL process.
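A Type Two change can be sketched as "add a new row under a new surrogate key". In the Python sketch below, the function `apply_type_two` and the `is_current` flag are assumptions added for illustration; the table in the example only shows the two rows, and real implementations often track the current version with effective dates or a similar flag.

```python
def apply_type_two(dimension, natural_id, new_region, new_key):
    """Keep the old row and append a new version under a new surrogate key."""
    current = next(
        row for row in dimension
        if row["id"] == natural_id and row["is_current"]
    )
    if current["region"] == new_region:
        return dimension  # nothing changed, so no new version is needed
    current["is_current"] = False
    dimension.append({
        **current,
        "sales_person_key": new_key,
        "region": new_region,
        "is_current": True,
    })
    return dimension


sales_person_dimension = [
    {"sales_person_key": 100, "id": 203234, "name": "Rob Doe",
     "region": "ASIA", "is_current": True},
]
apply_type_two(sales_person_dimension, 203234, "NorthWest", new_key=153)
print(sales_person_dimension)
```

Both versions share the natural key 203234 but have different surrogate keys, so older fact rows keep pointing at the ASIA version while new fact rows point at the NorthWest version.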

Type Three (Preserve a Version of History)

You usually implement Type Three changes only if you have a limited need to preserve and accurately describe history, such as when someone gets married and you need to retain the previous name. Instead of creating a new dimensional record to hold the attribute change, a Type Three change places a value for the change in the original dimensional record. You can create multiple fields to hold distinct values for separate points in time. In the case of a region change example, you could create an OLD_REGION and NEW_REGION field and a REGION_CHANGE_EFF_DATE field to record when the change occurs. This method preserves the change. But how would you handle a second name change, or a third, and so on? The side effects of this method are increased table size and, more important, increased complexity of the queries that analyze historical values from these old fields. After more than a couple of iterations, queries become impossibly complex, and ultimately you're constrained by the maximum number of attributes allowed on a table.

This is what the table will look like after a Type Three change:

Sales_Person_Dimension

Sales_Person_Key  ID      Name     Old Region  New Region  ...
100               203234  Rob Doe  ASIA        NorthWest   ...

Advantages:

This does not increase the size of the table, since new information is updated.

This allows us to keep some part of history.

Disadvantages:

Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Rob later moves from NorthWest to a third region, the ASIA information will be lost.
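The Type Three approach, and its limit when an attribute changes more than once, can be sketched in Python. The field names follow the OLD_REGION, NEW_REGION and REGION_CHANGE_EFF_DATE idea described above; the function `apply_type_three`, the dates and the third region are made up for illustration.

```python
def apply_type_three(row, new_region, effective_date):
    """Shift the current region into old_region and store the new one."""
    row["old_region"] = row["new_region"]  # whatever was in old_region is lost
    row["new_region"] = new_region
    row["region_change_eff_date"] = effective_date
    return row


row = {
    "sales_person_key": 100, "id": 203234, "name": "Rob Doe",
    "old_region": None, "new_region": "ASIA", "region_change_eff_date": None,
}

apply_type_three(row, "NorthWest", "2009-01-01")
after_first_change = dict(row)

apply_type_three(row, "Europe", "2010-06-01")
after_second_change = dict(row)

print(after_first_change)
print(after_second_change)
```

After the first change the row still remembers ASIA; after the second change only NorthWest and the newest region remain, which is the disadvantage described above.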

Because most business requirements include tracking changes over time, data warehouse architects commonly implement Type Two changes. A data warehouse might use Type Two changes for all attributes in all tables. As an alternative, you can implement a mix of Type One and Type Two changes at an attribute level by implementing Type 2 changes for only attributes whose historical values are important when you're slicing and dicing. For example, users might not need to know an individual's previous name if a name change occurs, so a Type One change would suffice. Users might want the system to show only the person's current name. However, if the company reassigns sales territories, users might need to track who sold what, at what time, and in what territory, necessitating a Type Two change.

Although most data warehouses include Type Two changes, you need to seriously examine the business need to record historical data. Implementing Type Two changes might be necessary, but those changes will increase the database size, degrade performance, and lengthen the development time. You need to carefully evaluate using a Type Two implementation, a Type One implementation, or a hybrid implementation.

Technologist Blog

In this blog we discuss topics related to technology or, to be precise, topics related to technologists. We talk about the topics you will be most interested in while working in the technology department of an organization. You can treat this as general knowledge for technologists.

The topics discussed here include, but are not limited to, databases (DB2 and Oracle), UNIX, Autosys, Software Development Methodologies, Software Testing, Industrywide Standards, Finance, Financial Crisis etc.

Error Handling in DB2 UDB Stored Procedures

Recently somebody asked us how to ensure that when a DB2 UDB stored procedure fails, the Informatica workflow/session also fails. This is not a very tough task, but it becomes easier if you have a code template at hand.

Here is a sample procedure which returns the result of a division operation. It is quite self-explanatory.

SET SCHEMA SCHEMA_NAME;

DROP PROCEDURE SCHEMA_NAME.DIVIDE_BY;

CREATE PROCEDURE SCHEMA_NAME.DIVIDE_BY (
    IN NUMBER1 INTEGER,
    IN NUMBER2 INTEGER,
    OUT V_RESULT INTEGER)
LANGUAGE SQL
SPECIFIC SCHEMA_NAME.DIVIDE_BY
BEGIN
    DECLARE SQLSTATE CHAR(5) DEFAULT '00000';
    DECLARE O_SQLSTATE CHAR(5) DEFAULT '00000';

    DECLARE V_PROC_DATE TIMESTAMP;
    DECLARE V_PROC_NAME VARCHAR(20);
    DECLARE V_BATCH INTEGER;
    DECLARE V_MSG_TXT VARCHAR(500);
    -- DECLARE V_RESULT INTEGER;

    -- On any SQL error: capture the details, log them, then re-raise to the caller.
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        GET DIAGNOSTICS EXCEPTION 1 V_MSG_TXT = MESSAGE_TEXT;

        SELECT 'DIVIDE_BY', CURRENT TIMESTAMP, SQLSTATE
        INTO V_PROC_NAME, V_PROC_DATE, O_SQLSTATE
        FROM SYSIBM.SYSDUMMY1;

        ROLLBACK;

        INSERT INTO AUDIT_TRAIL (CREATE_TMS, PROC_NAME, MSG_TEXT)
        VALUES (V_PROC_DATE, V_PROC_NAME, V_MSG_TXT);

        COMMIT;

        SIGNAL SQLSTATE O_SQLSTATE
            SET MESSAGE_TEXT = 'PROCEDURE FAILED. SEE AUDIT_TRAIL TABLE';
    END;

    SET V_RESULT = NUMBER1 / NUMBER2;
END
/

CALL SCHEMA_NAME.DIVIDE_BY(20,2,?); --returns 10
CALL SCHEMA_NAME.DIVIDE_BY(20,0,?); --Populates error in the database table and throws the exception back.

The important statement used here is SIGNAL. The SIGNAL statement is used to signal an error or warning condition. It causes an error or warning to be returned with the specified SQLSTATE, along with optional message text. This is the same as throwing an exception in Java. If you need more information, visit: http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0004232.htm
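For readers more used to exceptions than to SIGNAL, the same "roll back, log, then fail loudly" pattern can be sketched in Python with the sqlite3 module. This is only an analogy, not DB2 code: the `audit_trail` table layout and the function `divide_by` are simplified stand-ins for the procedure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_trail (proc_name TEXT, msg_text TEXT)")
conn.commit()


def divide_by(number1, number2):
    """Integer division that logs a failure and re-raises it to the caller."""
    try:
        return number1 // number2
    except ZeroDivisionError as exc:
        conn.rollback()  # undo any work of the failed unit
        conn.execute(
            "INSERT INTO audit_trail VALUES (?, ?)", ("DIVIDE_BY", str(exc))
        )
        conn.commit()    # keep the audit row even though the call fails
        # Plays the role of SIGNAL: the caller sees a failure, not a silent return.
        raise RuntimeError("PROCEDURE FAILED. SEE AUDIT_TRAIL TABLE") from exc


result = divide_by(20, 2)
try:
    divide_by(20, 0)
    failed = False
except RuntimeError:
    failed = True
print(result, failed)
```

Because the error is re-raised after logging, the caller fails as well, which is what makes the Informatica session fail in the stored procedure case.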

Mac OS X 10.6 Snow Leopard Top 10 New Features

Mac OS X version 10.6 "Snow Leopard" is the seventh major release of Mac OS X, Apple's operating system. This release focuses on improving performance and efficiency and on reducing the overall memory footprint compared to its predecessor, Mac OS X v10.5 "Leopard".

Snow Leopard is only available for Intel-based Macintosh computers. Cost of upgrade for users running Leopard is US$29 (£25) for the single-user license, or US$49 (£39) for up to five computers with the family pack.

Top 10 Features/Improvements in Snow Leopard:

1. Apple's Boot Camp utility under Mac OS X 10.6 Snow Leopard includes Windows HFS+ drivers. Boot Camp is Apple's software package that allows customers to boot Microsoft Windows operating system on their Intel Macs. As of now Windows does not recognize Mac formatted hard drives and is unable to read or write to them. The newest version of Snow Leopard's Boot Camp includes special drivers to allow read access to Mac data even under Windows.

2. QuickTime has a new built-in function that allows you to record video of actions you take on the Mac screen. The QuickTime video player has been upgraded, with a clean new interface for playback, and the new ability to record and trim videos. Icons can be more easily enlarged, and you can preview the files they represent, even playing videos in miniature or paging through multipage PDF or PowerPoint files.

3. Opening and closing applications, flicking between windows, and even booting up and shutting down feels slicker. Snow Leopard's support for 64-bit computing is key to this, as is its use of OpenCL, which utilises the power of the Mac's graphics processors to run other programs and applications, giving the entire system more grunt.

4. Text substitution is a new feature which lets you create shortcuts for phrases you use frequently that will expand automatically as you type. Common substitutions are built in for example, changing (c) to a copyright symbol (©) and fractions from 1/2 to ½. You can also add your own substitutions; for example, 'ars' can expand to 'All Rights Reserved' and your initials can expand to your full name.

5. Spotlight can now index Faces and Places in iPhoto '09. You can search for pictures of your friends by simply typing their name in Spotlight. You can even search for your photos based on where they were taken.

6. Snow Leopard takes up less than half the disk space of its predecessor, freeing about 7GB+ for you - enough for about 1,750 more songs or a few thousand more photos.

7. When upgraded to OS X 10.6, all MacBooks with Multi-Touch trackpads now support three- and four-finger gestures.

8. A new technology called Grand Central Dispatch takes full advantage of multicore systems by making all of Mac OS X multicore aware and optimizing it for allocating tasks across multiple cores and processors. Grand Central Dispatch also makes it much easier for developers to create programs that utilize all the power of multicore systems.

9. Activate Exposé from the Dock - You can now click and hold an application icon in the Dock, and all open windows in the application you selected will unshuffle so you can quickly change to another window. You can press the tab key while in Exposé to move to the next application in the Dock and show the windows for that application.

10. Snow Leopard includes native support for Cisco IPsec VPN connections, which makes it easy to connect to your office network. Snow Leopard also includes the second edition of the Oxford American Writer’s Thesaurus.

FAQs:

1. Can I upgrade my Intel Mac to both Snow Leopard and Windows 7?

Yes. Microsoft confirmed Windows 7 will be licensed to work within a virtual machine on a Mac just like XP and Vista. Apple also plans to offer Windows 7 drivers for its hardware soon after the new Microsoft system comes out on Oct. 22. You can choose to keep running older versions of Windows on Macs equipped with Snow Leopard, or run Windows 7 on Macs that still run Leopard.

2. Can I use the US$29 disc to upgrade from Tiger to Snow Leopard?

Yes, you can, although it is not clear whether this is legal. Also, this way your iLife applications (iPhoto etc.) will still be the old versions.

For more information visit: http://www.apple.com/macosx/refinements/enhancements-refinements.html