03 Sort Dedup and Reformat Components

Embed Size (px)

Citation preview

  • 8/12/2019 03 Sort Dedup and Reformat Components

    1/41

    Accenture Ab Initio Training 1

    Introduction toAb Initio

    Prepared By : Ashok Chanda

  • 8/12/2019 03 Sort Dedup and Reformat Components

    2/41

    Accenture Ab Initio Training 2

    Ab initio Session 3

    SortSort within GroupDeDup SortedReformat: Example showing multipleoutput

  • 8/12/2019 03 Sort Dedup and Reformat Components

    3/41

    Accenture Ab Initio Training 3

    Graphical DevelopmentEnvironment GDE

  • 8/12/2019 03 Sort Dedup and Reformat Components

    4/41

    Accenture Ab Initio Training 4

    Components

    Components may run on any computer runningthe Co>Operating System.

    Different components do different jobs.The particular work a component accomplishesdepends upon its parameter settings.

    Some parameters are data transformations, thatis business rules to be applied to an input (s) toproduce a required output.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    5/41

    Accenture Ab Initio Training 5

    The Sort Component

    Reads records from input port, sorts themby key, and writes the result on the outputport.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    6/41

    Accenture Ab Initio Training 6

    Sorting

  • 8/12/2019 03 Sort Dedup and Reformat Components

    7/41

    Accenture Ab Initio Training 7

    Sorting - The Key SpecifierEditor

  • 8/12/2019 03 Sort Dedup and Reformat Components

    8/41

    Accenture Ab Initio Training 8

    Parameters for Sort

    key(key specifier, required)Name(s) of the key field(s) and the sequence specifier(s)

    you want Sort to use when it orders data records.max-core

    (integer, required)Maximum memory usage in bytes. The default value of

    max-core is 100663296 (100 megabytes). When Sortreaches the number of bytes specified in the max-core parameter, it sorts the records it has read and writes atemporary file to disk.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    9/41

    Accenture Ab Initio Training 9

    Runtime Behavior of Sort

    The Sort component:Reads the records from all the flows connected to the in port until it reaches the number of bytes specified in themax-core parameter.Sorts the records and writes the results to a temporaryfile on disk.Repeats this procedure until it has read all records.

    Merges all the temporary files, maintaining the sort orderWrites the result to the out port Sort stores temporaryfiles in the working directories specified by its layout.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    10/41

    Accenture Ab Initio Training 10

    Sort within Groups

    Sort within Groups refines the sorting ofdata records already sorted according toone key specifier: it sorts the recordswithin the groups formed by the first sortaccording to a second key specifier.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    11/41

    Accenture Ab Initio Training 11

    Parameters for Sort within

    Groups major-key

    (key specifier, required) :Name(s) of the key field(s) and the sequencespecifier(s) by which Sort within Groups assumes input is ordered.minor-key

    (key specifier, required) :Name(s) of the key field(s) and the sequencespecifier(s) you want Sort within Groups to use when it orders datarecords.max-core

    (integer, required) :Maximum memory usage in bytes before Sortwithin Groups stops the execution of the graph. The default value ofmax-core is 10485760 (10 megabytes).

  • 8/12/2019 03 Sort Dedup and Reformat Components

    12/41

    Accenture Ab Initio Training 12

    Runtime Behavior of Sort

    within Groups Sort within Groups assumes input records are sortedaccording to the major-key parameter. Sort withinGroups reads data records from all the flows connected

    to the in port until it either reaches the end of a groupor reaches the number of bytes specified in the max-core parameter. When Sort within Groups reaches theend of a group, it does the following:Sorts the records in the group according to the minor-key parameterWrites the results to the out portRepeats this procedure with the next group

  • 8/12/2019 03 Sort Dedup and Reformat Components

    13/41

    Accenture Ab Initio Training 13

    Sort Within Groups : Example

    Input data sorted by Cust Id Output data Sorted by Cust Id And Tran Id

    Major Key

    Minor Key

  • 8/12/2019 03 Sort Dedup and Reformat Components

    14/41

    Accenture Ab Initio Training 14

    Sort Within Group

    Major Key : classMinor Key : roll_nbr

  • 8/12/2019 03 Sort Dedup and Reformat Components

    15/41

    Accenture Ab Initio Training 15

    Dedup Sorted

    Dedup Sorted separates one specifieddata record in each group of data recordsfrom the rest of the records in the group.Dedup Sorted requires grouped input

  • 8/12/2019 03 Sort Dedup and Reformat Components

    16/41

    Accenture Ab Initio Training 16

    Input

    Output

    Removing Duplicates

    Occurrence of DuplicateRecords in Customer Infofile

  • 8/12/2019 03 Sort Dedup and Reformat Components

    17/41

    Accenture Ab Initio Training 17

    Delete Duplicates

    Deletes duplicates from a group of records based on the key/s Data should be sorted on the same key/s before using Dedup keep property can be used to select either first, last or unique

    record from within the group

  • 8/12/2019 03 Sort Dedup and Reformat Components

    18/41

    Accenture Ab Initio Training 18

    Runtime Behavior of Dedup

    Sorted The Dedup Sorted component:Reads a grouped flow of records from thein port. If your records are not alreadygrouped, use Sort to group them.

    Applies the expression in the select parameter to the records, if you havedefined the select parameter

  • 8/12/2019 03 Sort Dedup and Reformat Components

    19/41

    Accenture Ab Initio Training 19

    Runtime Behavior of Dedup

    SortedIf the expression evaluates to 0 for a particular record,Dedup Sorted does not process the record (that is, therecord does not appear on any output port).

    If the expression produces NULL for a particular record,Dedup Sorted writes the record to the reject port andwrites a descriptive error message to the error port.Dedup Sorted discards the information if you do notconnect flows to the reject or error ports.If the expression evaluates to anything other than 0 orNULL for a particular record, Dedup Sorted processes therecord. If you do not supply an expression for the select parameter, Dedup Sorted processes all the records onthe in port.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    20/41

    Accenture Ab Initio Training 20

    Runtime Behavior of Dedup

    SortedDedup Sorted considers any consecutive recordswith the same key value to be in the samegroup:If a group consists of one record, Dedup Sortedwrites that record to the out port.If a group consists of more than one record,Dedup Sorted uses the value of the keep

    parameter to determine:Which record if any to write to the out port.Which record or records to write to the dup port.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    21/41

    Accenture Ab Initio Training 21

    Runtime Behavior of Dedup

    SortedIf you have chosen unique-only for thekeep parameter, Dedup Sorted does not

    write records to the out port from anygroups consisting of more than onerecord.

    Both the out and dup ports are optional;if you do not connect flows to them,Dedup Sorted discards the records.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    22/41

    Accenture Ab Initio Training 22

    More Complex Components

    In these componentsthe record format

    metadata typicallychanges (goesthrough atransformation) from

    input to output

  • 8/12/2019 03 Sort Dedup and Reformat Components

    23/41

    Accenture Ab Initio Training 23

    Reformat-Transform

    ComponentTransform components modify ormanipulate data records by using one or

    more transform functions.Reformat: Changes the record format ofyour data by dropping fields or by using

    DML expressions to add fields, combinefields, or modify the data.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    24/41

    Accenture Ab Initio Training 24

    Data Transformation

    0345,090263John,Smith;

    1000345Smith 1963.09.02

    Drop

    id+1000000

    Reformat

    Reformat Reorder

    Input record format:recorddecimal(,) id; date(MMDDYY) bday; string(,)first_name; string(;) last_name;

    end

    Output record format: record

    decimal(7) id;string(8) last_name;date(YYYY.MM.DD) bday;

    end

  • 8/12/2019 03 Sort Dedup and Reformat Components

    25/41

    Accenture Ab Initio Training 25

    The Reformat Component

    Reads records from input port, reformats eachaccording to a transform function (optional in thecase of the Reformat Component), and writes theresult records to the output (out0) port.

    Additional output ports (out1, ...) can be created byadjusting the count parameter.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    26/41

    Accenture Ab Initio Training 26

    The Transform Function Editor

  • 8/12/2019 03 Sort Dedup and Reformat Components

    27/41

    Accenture Ab Initio Training 27

    Reformat

    Reformat with 5 ports

  • 8/12/2019 03 Sort Dedup and Reformat Components

    28/41

    Accenture Ab Initio Training 28

    About Transform Functions

    A transform function (or transform ) is thelogic that drives data transformation

    most commonly, transform functionsexpress record reformatting logic. Ingeneral, however, you can use transform

    functions in data cleansing, recordmerging, and record aggregation.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    29/41

    Accenture Ab Initio Training 29

    About Transform Functions

    To be more specific, a transform function is a collectionof business rules, local variables, and statements. Thetransform expresses the connections between the rules,

    variables, and statements, as well as the connectionsbetween these elements and the input and output fields.Transform functions are always associated withtransform components; these are components that havea transform parameter: the Aggregate, DenormalizeSorted, Fuse, Join, Match Sorted, MultiReformat,Normalize, Reformat, Rollup, and Scan components.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    30/41

    Accenture Ab Initio Training 30

    About Transform Functions

    Each component that has a transform parameter:

    Determines the values that are passed tothe transform functionInterprets the results of the transformfunction

  • 8/12/2019 03 Sort Dedup and Reformat Components

    31/41

    Accenture Ab Initio Training 31

    Runtime Behavior of Reformat

    The n in out n gives each out port a unique number.Each out n port has a corresponding reject n and error n port.

    The Reformat component:Reads records from the in port.If you supply an expression for the select parameter,the expression filters the records on the in port:

    If the expression evaluates to 0 for a particular record, Reformatdoes not process the record, which means that the record doesnot appear on any output port.If the expression produces NULL for any record, Reformat writesa descriptive error message and stops execution of the graph.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    32/41

    Accenture Ab Initio Training 32

    Runtime Behavior of Reformat

    If the expression evaluates to anything otherthan 0 or NULL for a particular record, Reformatprocesses the record. If you do not supply anexpression for the select parameter, Reformatprocesses all the records on the in port.Passes the records to the transform functions,calling the transform function on each port, inorder, for each record, beginning with out port0 and progressing through out port count - 1.Writes the results to the out ports.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    33/41

    Accenture Ab Initio Training 33

    Parameters for Reformat

    Reformat :Parameterscount

    limitlogramp

    reject-thresholdselectTransform n

  • 8/12/2019 03 Sort Dedup and Reformat Components

    34/41

    Accenture Ab Initio Training 34

    Parameters for Reformat

    count :(integer, required)Integer from 1 to 20 that sets the numberof each of the following. The default is 1 .

    out portsreject ports

    error portstransform parameters

  • 8/12/2019 03 Sort Dedup and Reformat Components

    35/41

    Accenture Ab Initio Training 35

    Parameters for Reformat

    Transform: (filename or string, optional)Either the name of the file, or a transform string,

    containing a transform function corresponding toan out port; n represents the number of an out port .Transform functions for Reformat shouldhave one input and one output.select : (expression, optional)Filter for data records before reformatting.

  • 8/12/2019 03 Sort Dedup and Reformat Components

    36/41

    Accenture Ab Initio Training 36

    Parameters for Reformat

    limit : (integer, required) A number representing reject events.When the reject-threshold parameter isset to Use ramp/limit , the componentuses the values of the ramp and limit

    parameters in a formula to determine thecomponent's tolerance for reject events.Default is 0 .

  • 8/12/2019 03 Sort Dedup and Reformat Components

    37/41

    Accenture Ab Initio Training 37

    A Look Inside the Reformat

    Component b c a

    x z y

  • 8/12/2019 03 Sort Dedup and Reformat Components

    38/41

    Accenture Ab Initio Training 38

    45 QF 9

    out :: trans(in) = beginout.x :: in.b - 1;out.y :: in.a;out.z :: fn(in.c);

    end;

    A R ecord arrives at the

    input port

  • 8/12/2019 03 Sort Dedup and Reformat Components

    39/41

    Accenture Ab Initio Training 39

    45 QF9

    out :: trans(in) = beginout.x :: in.b - 1;out.y :: in.a;out.z :: fn(in.c);

    end;

    The Transformation

    Function is evaluated

    Th l d i i

  • 8/12/2019 03 Sort Dedup and Reformat Components

    40/41

    Accenture Ab Initio Training 40

    out :: trans(in) = beginout.x :: in.b - 1;out.y :: in.a;out.z :: fn(in.c);

    end;

    44 RG 9

    The result record is written tothe output port of the

    component

  • 8/12/2019 03 Sort Dedup and Reformat Components

    41/41

    Thank You

    End of Session 3