Data Integration Quick Start Bundle User Guide

© 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Overview The Data Integration Quick Start bundle enables your development team to quickly automate common integration patterns. The advanced transformation functionality in the Data Integration Quick Start bundle addresses complex integration scenarios.

A bundle is a set of prebuilt integration templates you execute through custom integration tasks. Bundles improve the productivity of your developers and enhance the overall quality of your data integration projects.

The integration templates in this bundle perform the following tasks:

Aggregation. Use aggregate functions, such as SUM, MAX, COUNT, or AVG.

Error record routing. Evaluate data and route valid and invalid data to different targets based on a user-defined condition.

Join heterogeneous sources. Join two sources from different systems or sources of different types.

File list bulk processing. Merge multiple flat files of the same structure.

Pivot rows to columns. Denormalize source data by pivoting rows to columns.

Lookup multiple fields. Return multiple fields from a lookup. The lookup can also return multiple rows.

Installing the Bundle The Data Integration Quick Start bundle appears as an available bundle in your organization. To view and install the bundle, in your organization, click Administration > Available Bundles.

After you install the bundle, you can use the integration templates in the bundle. For more information about installing and working with bundles, see the Informatica Cloud User Guide or online help.

Sample Files The bundle includes sample files that you can use to work with the integration templates. You can download the sample files from the community article or marketplace block where you downloaded this user guide. Download the following zip for all sample files: DataIntegrationQuickStart_SampleFiles.zip.

See the template documentation for the sample files to use.

Basic Aggregation Template Use the Basic Aggregation template to perform aggregation in a custom integration task. You perform aggregation with aggregate functions such as SUM, MAX, COUNT, or AVG.

You cannot perform aggregation in data synchronization tasks.

For more information about aggregate functions, see the Informatica Cloud User Guide or online help.

Template Data Flow This template uses the Aggregate transformation to aggregate data based on a specified group-by field. A filter is applied before the aggregation so that you can exclude unwanted rows from the source data.

The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Connections to source and target systems.

Relational or flat file source.

Sample Sources and Targets The following figure shows how the template performs the SUM, AVG, MAX, MIN, and COUNT aggregate functions using JOB as the group-by field.
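To make the group-by logic concrete, the following Python sketch performs the same SUM, AVG, MAX, MIN, and COUNT aggregation with JOB as the group-by field. The rows and salary values are hypothetical stand-ins for the EMP.txt sample, not taken from the actual file:

```python
from collections import defaultdict

# Hypothetical (JOB, SAL) rows standing in for the EMP.txt sample.
rows = [
    ("CLERK", 800.0),
    ("CLERK", 1100.0),
    ("MANAGER", 2450.0),
    ("MANAGER", 2975.0),
]

def aggregate_by_job(rows):
    """Group rows by JOB and compute SUM, AVG, MAX, MIN, and COUNT of SAL."""
    groups = defaultdict(list)
    for job, sal in rows:
        groups[job].append(sal)
    return {
        job: {
            "SUM": sum(sals),
            "AVG": sum(sals) / len(sals),
            "MAX": max(sals),
            "MIN": min(sals),
            "COUNT": len(sals),
        }
        for job, sals in groups.items()
    }

result = aggregate_by_job(rows)
```

In the template, the same grouping is driven by the $GroupBy$ parameter and the aggregate expressions you define in the field mappings.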

Template Parameters The following table describes the parameters in the template:

Parameters Description

$Src$ Relational or flat file source and connection.

$FilterCondition$ Filter condition. Excludes records based on the condition. The task applies the filter to source data before aggregation.

By default the filter condition is set to TRUE, which passes all rows.

$SortBy$ Sort-by fields. The task sorts filtered data in ascending order on the specified field or fields.

Sorting can improve task performance. Best practice is to sort by the group-by fields.

$GroupBy$ Group-by fields. The task performs aggregation based on the specified field or fields.

$FieldMap$ Field mappings. You can map source fields to target fields using the field mapping input control associated with this parameter.

$Tgt$ Relational or flat file target and connection.

Using the Integration Template After you import the template to your organization, you can use it in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. Select the source and target that you want to use.

3. On the Other Parameters page, enter a group by field. Optionally add a filter or sort-by fields.

4. Configure field mappings and define expressions with aggregate functions as needed.

5. Save and run the task.

Additional Resources You can use the following resources to help you use this template.

Sample Files You can use the following sample files to work with the template:

Source: EMP.txt.

Target: EMP_Agg.txt.

Aggregator Transformation The Aggregator transformation performs aggregate calculations, such as averages and sums. As the application reads data, it stores group and row data in an aggregate cache and performs the calculations on groups of rows. In contrast, the Expression transformation performs calculations on a row-by-row basis.

When you create aggregate expressions, you can use conditional clauses to filter rows, which provides more flexibility than the SQL language.
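The effect of a conditional clause inside an aggregate expression can be sketched in Python as a sum restricted to matching rows. The field names and the CLERK condition here are hypothetical examples, not from the sample files:

```python
# Hypothetical rows; a conditional clause sums only the rows that
# satisfy a condition, rather than every row in the group.
rows = [
    {"JOB": "CLERK", "SAL": 800.0},
    {"JOB": "CLERK", "SAL": 1100.0},
    {"JOB": "MANAGER", "SAL": 2450.0},
]

def conditional_sum(rows, field, condition):
    """Sum `field` over the rows for which `condition` holds."""
    return sum(r[field] for r in rows if condition(r))

clerk_total = conditional_sum(rows, "SAL", lambda r: r["JOB"] == "CLERK")
```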

Aggregate Functions You can nest one aggregate function within another aggregate function. You can use the following aggregate functions within an Aggregator transformation:

AVG

COUNT

FIRST

LAST

MAX

MEDIAN

MIN

PERCENTILE

STDDEV

SUM

VARIANCE

When you use these functions, you must use them in an expression within an Aggregator transformation. For more information about aggregate functions, see the Informatica Cloud User Guide or online help.

Error Record Routing Template Use the Error Record Routing template to separate valid and invalid records based on a user-defined validation condition.

With this template, you configure an expression that determines if a record is valid. The template routes valid data to one target and invalid data to a separate target.

You cannot route data through different data flows or write to two different targets in data synchronization tasks.
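The routing described above can be sketched in Python. This sketch assumes the sample validation rule used later in this guide, where a record is invalid when its MGR field is null; the row values are hypothetical:

```python
# Hypothetical source rows; a null (None) MGR marks a record invalid.
rows = [
    {"EMPNO": 7369, "MGR": 7902},
    {"EMPNO": 7839, "MGR": None},   # invalid: no manager
    {"EMPNO": 7844, "MGR": 7698},
]

def valid_check(record):
    """Return 0 for valid records and 1 for invalid ones, as $validcheck$ requires."""
    return 1 if record["MGR"] is None else 0

# Route each record to the matching target.
valid_target = [r for r in rows if valid_check(r) == 0]
invalid_target = [r for r in rows if valid_check(r) == 1]
```

In the template, the 0/1 output of the $validcheck$ expression drives a router that writes each record to the valid or invalid target.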

Template Data Flow The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Connections to source and target systems.

A relational or flat file source that has valid and invalid records.

Sample Source and Targets The following figure shows source data with an invalid record highlighted. The expression to validate data checks for null values in the MGR field:

The following figure shows valid target data:

The following figure shows invalid target data:

Template Parameters The following table describes the parameters in the template:

Parameters Description

$Src1$ Relational or flat file source and connection.

$FilterCondition$ Filter condition. Excludes records based on the condition. The task performs the filter before validating data.

By default the filter condition is set to TRUE, which passes all rows.

$validcheck$ Expression used to validate data. Configure the expression to identify valid and invalid records for the router. The output of the expression should be an integer: 0 for valid records, 1 for invalid records.

$ValidFieldMap$ Field mappings for valid data. You can map source fields to target fields using the field mapping input control associated with this parameter.

$RejectFieldMap$ Field mappings for invalid data. You can map source fields to target fields using the field mapping input control associated with this parameter.

$Tgt1$ Relational or flat file target and connection for valid records.

$Tgt2$ Relational or flat file target and connection for invalid records.

Using the Integration Template After you import the template to your organization, you can use the template in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. On the Sources page, select the source that you want to use.
3. On the Targets page, select the targets that you want to use for valid and invalid data.
4. On the Other Parameters page, configure a filter condition if desired. By default, all records are passed.
5. For $validcheck$, enter the expression that you want to use to route valid and invalid data. The expression should evaluate to 0 for valid records and 1 for invalid records.
6. Configure the field mappings for each target.
7. Save and run the task.

Additional Resources You can use the following resources to help you use this template.

Sample Files You can use the following sample files to work with the template:

Source file: EMP.txt.

Target files: EMP_VALID.txt and EMP_INVALID.txt.

Joiner Template Use the Joiner template to join two relational sources, two flat file sources, or a relational source to a flat file source in a custom integration task.

You can join sources from different source systems, and you can use the following join types: normal, master outer, detail outer, or full outer. You can also filter joined records based on a condition.

The data synchronization application does not support joining flat files or heterogeneous sources.

Template Data Flow The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Connections to source and target systems.

Two related relational or flat file sources.

Sample Sources and Target The following figure shows two related sample sources:

The following figure shows the results of a full outer join on DEPTNO:

Template Parameters The following table describes the parameters in the template:

Parameters Description

$Src1$ Detail source object and connection. Use a relational or flat file source.

$Src2$ Master source object and connection. Use a relational or flat file source.

$JoinType$ Join type. Use one of the following join types: Normal Join, Master Outer, Detail Outer, or Full Outer.

$JnrCondition$ Join condition, such as Src2_ID = Src1_ID.

$FilterCondition$ Filter condition. Excludes records based on the condition. The task performs the filter after joining source data.

By default the filter condition is set to TRUE, which passes all rows.

$FieldMap$ Field mappings. You can map source fields to target fields using the field mapping input control associated with this parameter.

$Target$ Relational or flat file target and connection.

Using the Integration Template After you import the template to your organization, you can use the template in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. Select the sources and target that you want to use.
3. On the Other Parameters page, define the join type and join condition. Optionally enter a filter condition.
4. Map the source fields to target fields.
5. Save and run the task.

Additional Resources You can use the following resources to help you use this template.

Sample Files You can use the following sample files to work with the template:

Source files: EMP.txt and DEPTNO.txt.

Target file: EMP_DEPTNO.txt.

Joiner Transformation Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.

Join Condition The join condition contains fields from both input sources that must match to join two rows. Depending on the type of join selected, the task either adds the row to the result set or discards the row.
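A full outer join on a matching column can be sketched in Python. The employee and department rows below are hypothetical stand-ins for the EMP.txt and DEPTNO.txt samples; unmatched rows from either side are kept, with None filling the missing side's fields:

```python
# Hypothetical (ENAME, DEPTNO) detail rows and (DEPTNO, DNAME) master rows.
emp = [("SMITH", 20), ("ALLEN", 30), ("KING", 99)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]

def full_outer_join(emp, dept):
    """Full outer join on DEPTNO: keep unmatched rows from both sides."""
    dept_by_no = {no: name for no, name in dept}
    # Every emp row, with the matching department name or None.
    joined = [(ename, no, dept_by_no.get(no)) for ename, no in emp]
    # Department rows with no matching employee.
    matched = {no for _, no in emp}
    joined += [(None, no, name) for no, name in dept if no not in matched]
    return joined

result = full_outer_join(emp, dept)
```

A master outer or detail outer join would keep unmatched rows from only one side, and a normal join would keep only the matching rows.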

File List Bulk Processing Template Use the File List Bulk Processing template in a custom integration task to process a list of files with a single task.

You can use the task to consolidate files that have the same data structure but are saved in multiple locations, and write the data to a single target. Source files can reside in different directories, but must be local to the Secure Agent. When you configure the task, you select one source file to provide the file format.

Use a text file to list the names of the source files that you want to use. If the source files reside in different directories, provide a fully-qualified path for each file. Save the file list in the same directory as the source file you plan to select in the task.

You cannot process a set of flat files with a data synchronization task.

Template Data Flow The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Connections to source and target systems.

A set of flat files with the same format that are local to the Secure Agent.

A list file: a text file that contains a fully-qualified path and file name for each source file that you want to use.

Sample Sources and Targets The following figures show three source files with the same format:

The following figure shows the contents of a list file. EMP1.txt is a source in the same directory as the file list. Note the use of a fully-qualified path for sources that reside in other directories.
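The path-resolution rule for list-file entries can be sketched in Python. This is a simplified illustration, not the product's actual implementation; the directory and file names are hypothetical:

```python
import os

def resolve_sources(list_file_path, entries):
    """Resolve each list-file entry to a full path: entries without a
    path are taken from the list file's own directory, and entries
    with a fully-qualified path are used as-is."""
    base_dir = os.path.dirname(os.path.abspath(list_file_path))
    resolved = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines in the list file
        if os.path.isabs(entry):
            resolved.append(entry)                          # fully-qualified
        else:
            resolved.append(os.path.join(base_dir, entry))  # same directory
    return resolved

paths = resolve_sources("/data/lists/Emp_Ind.txt",
                        ["EMP1.txt", "/data/other/EMP2.txt"])
```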

The following figure shows the consolidated data in the target:

Template Parameters The following table describes the parameters in the template:

Parameters Description

$Src$ Source flat file with the common data structure, and source connection.

$FilterCondition$ Filter condition. Excludes records based on the condition.

By default the filter condition is set to TRUE, which passes all rows.

$IndFile_Name$ List file. Enter the name of the text file that lists the source files that you want to use.

The list file must reside in the same directory as the source file configured for $Src$.

$FieldMap$ Field mappings. You can map source fields to target fields using the field mapping input control associated with this parameter.

$Tgt$ Relational or flat file target and connection.

Using the Integration Template After you import the template to your organization, you can use it in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. On the Source page, select a source file with the data structure that you want to use.
3. On the Target page, select the target that you want to use.
4. On the Other Parameters page, for $IndFile_Name$, enter the name of the file that lists the source files for the task.
5. Optionally, enter a filter condition. By default, all records are passed.

6. Configure the field mappings for the task.

7. Save and run the task.

Additional Resources You can use the following resources to help you use this template.

Sample Files You can use the following files to work with the template:

Sources: EMP1.txt, EMP2.txt, and EMP3.txt.

Target: EMP_Consolidated.txt.

List file: Emp_Ind.txt.

Pivot Rows to Columns Template Use the Pivot Rows to Columns template in a custom integration task to pivot and denormalize source data. When you pivot a source, the task converts attribute names in name-value pairs to columns.

You might use the Pivot Rows to Columns template to provide data for analytical applications that require denormalized data.

You cannot perform pivots in a data synchronization task.

Template Data Flow Note the following details about the template:

1. The source must include attribute name-value pairs with the following structure: ID, Attribute Name, Attribute Value.

2. Attribute data that you do not pivot is collected in a field named Other_o. You can map data in this field to any other field on the target.

The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Connections to source and target systems.

A relational or flat file source with attribute name-value pairs and the appropriate data structure.

Sample Sources and Targets The following figure shows how the template pivots sample source data. Notice how the attribute name-value pairs in the source, such as name/Tim Smith and phone/333-4444, are pivoted and denormalized into a single row of data in the target.
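The pivot can be sketched in Python: name-value pairs sharing an ID become one denormalized row, and any attribute not in the pivot list lands in the catch-all Other_o field. The row values and the fax attribute below are hypothetical:

```python
# Hypothetical (ID, attribute name, attribute value) source rows.
rows = [
    (1, "name", "Tim Smith"),
    (1, "phone", "333-4444"),
    (1, "fax", "555-6666"),   # not in the attribute list -> Other_o
]
attribute_names = ["name", "email", "phone"]   # the $AttributeNames$ list

def pivot(rows, attribute_names):
    """Denormalize name-value pairs into one row per ID."""
    out = {}
    for rec_id, attr, value in rows:
        row = out.setdefault(rec_id, {a: None for a in attribute_names})
        if attr in attribute_names:
            row[attr] = value
        else:
            row["Other_o"] = value   # catch-all for unpivoted attributes
    return out

result = pivot(rows, attribute_names)
```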

Template Parameters The following table describes the parameters in the template:

Parameters Description

$Src$ Relational or flat file source and connection.

$FilterCondition$ Filter condition. Excludes records based on the condition. The task performs the filter after denormalizing and pivoting source data.

By default the filter condition is set to TRUE, which passes all rows.

$AttrColumn$ The source column that contains attribute names.

$AttrColValue$ The source column that contains attribute values.

$AttributeNames$ List of attribute values present in the source. For example, Name, Email, Phone.

$IdCol$ The key field in the source file.

$FieldMap$ Field mappings. You can map source fields to denormalized target fields using the field mapping input control associated with this parameter.

$Tgt$ Relational or flat file target and connection.

Using the Integration Template After you import the template to your organization, you can use it in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. On the Sources page, select a source with attribute name-value pairs.
3. On the Targets page, select an appropriate target or choose Create File.
4. On the Other Parameters page, select the Attribute Name Column and Attribute Value Column.
5. For Attribute Value List, enter a comma-separated list of the attribute values that you want to pivot and that are included in your target file or table.

6. Select the ID Column and configure field mappings.

The Other_o field is a catch-all field. It contains attribute data that you did not include in the Attribute Value List. You can map this field to any other field on the target.

7. Save and run the task.

Additional Resources You can use the following resources to help you use this template.

Sample Files You can use the following sample files to work with the template:

Source file: PivotRows2Column_Input.txt.

Target file: PivotRows2Column_Out.txt.

Connected Lookup Template Use the Connected Lookup template in a custom integration task to return multiple fields and multiple rows from a lookup.

You can use relational or flat file sources with this template to perform lookups on relational tables, flat files, or Salesforce objects. For information on how to use the template with other connection types, contact Informatica Global Customer Support.

You cannot return multiple fields from a lookup in a data synchronization task.

Template Data Flow The following figure shows the data flow of the template:

Prerequisites Informatica Cloud Standard Edition.

Relational or flat file source.

A relational table, flat file, or Salesforce object for the lookup.

The source and lookup fields that you want to compare must have different field names.

Template Parameters

Name Description

$Source$ Relational or flat file source connection and object.

$Target$ Target connection and object.

$LookupObject$ Relational table, flat file, or Salesforce object for the lookup.

$LookupPolicy$ Behavior when the lookup condition returns multiple rows. You can return the first or last row. You can also return any row, or all rows that match the lookup condition.

$LookupCondition$ Lookup condition. Used to evaluate data in the lookup for matches. Place the lookup field on the left side of the expression. For example:

<lookup field name> = <source field name>

The field name from the source should be different from the field name in the lookup.

$FieldMap$ Configure as a field map so that you can link source and lookup fields to the target. To move this parameter to the bottom of the list of other parameters, edit the template.

Using the Integration Template After you import the template to your organization, you can use it in a custom integration task.

Use the Custom Integration Task wizard to create a new task and configure it as follows:

1. Select the template.
2. On the Sources page, select the source connection and object.
3. On the Targets page, select the target connection and object.

4. On the Other Parameters page, select the lookup connection and object.

5. For $LookupPolicy$, select the behavior to perform if the lookup returns multiple matches.
6. For $LookupCondition$, create the lookup condition.
7. In the $FieldMap$ parameter, configure the field mappings that you want to use.

The Source Object table includes the lookup input fields and the lookup return fields. Lookup return fields are appended with _Lkp, as follows: <lookup field name>_Lkp.
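The lookup behavior described above can be sketched in Python: each source row is matched against the lookup rows, return field names get the _Lkp suffix, and the lookup policy decides which matching row (or rows) comes back. The field names and values are hypothetical, and note that the source and lookup key fields have different names, as the prerequisites require:

```python
# Hypothetical source and lookup data; DEPTNO and LKP_DEPTNO are
# deliberately different field names.
source = [{"DEPTNO": 20}]
lookup = [
    {"LKP_DEPTNO": 20, "DNAME": "RESEARCH"},
    {"LKP_DEPTNO": 20, "DNAME": "R_AND_D"},
]

def connected_lookup(source, lookup, src_key, lkp_key, policy="first"):
    """Join each source row with matching lookup rows, suffixing
    returned lookup field names with _Lkp."""
    out = []
    for row in source:
        matches = [l for l in lookup if l[lkp_key] == row[src_key]]
        if policy == "first":
            matches = matches[:1]
        elif policy == "last":
            matches = matches[-1:]
        # policy == "all": keep every matching row
        for m in matches:
            merged = dict(row)
            merged.update({k + "_Lkp": v for k, v in m.items()})
            out.append(merged)
    return out

first_match = connected_lookup(source, lookup, "DEPTNO", "LKP_DEPTNO", "first")
all_matches = connected_lookup(source, lookup, "DEPTNO", "LKP_DEPTNO", "all")
```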
