CapGemini Datastage Exercise


  • 8/18/2019 CapGemini Datastage Exercise

Training course DataStage (part 1)

V. BEYET

03/07/2006

Presentation...

Who am I?

Who are you?

Summary

- General presentation (DataStage: what is it?)
- DataStage: how to use it?
- The other components (part 2)

General presentation

DataStage: what is it?
- An ETL tool: Extract-Transform-Load
- A graphic environment
- A tool integrated in a suite of BI tools
- Developed by Ascential (IBM)

DataStage: why use it?
- Large data volumes
- Multi-source and multi-target: files, databases (Oracle, SQL Server, Access, …)
- Data transformation: Select, Format, Combine, Aggregate, Sort

DataStage: how does it work?
- Development is done:
  - in client-server mode,
  - with a graphical design of flows,
  - with simple and basic elements, and a simple language (BASIC).
- Treatments are:
  - compiled and run by an engine,
  - written in a Universe database.

The different tools

- Server
- Designer
- Manager
- Administrator
- Director

Server

The server contains programs and data.
- The programs, called jobs: stored first as source code and then as executable programs, written in the Universe database. This generated source code is not meant to be read by us.
- Data: may be written in the Universe database, but it is better to keep it in server directories.

What is a Project for DataStage?
- A server is organized in different environments called "Projects".
- A Project is a separate environment for jobs, table definitions and routines.
- A Project can be created at any time.
- The number of projects is unlimited.
- The number of jobs in each project is unlimited, but the number of simultaneous client connections is limited.

Universe Database:
- The Universe database is a relational database with files.
- Tables are called "Hash Files".
- A Hash File is an indexed file; it is the central element for using all the possibilities of the DataStage engine.
- A Hash File with incorrectly defined keys may create disastrous problems.
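
The key behaviour described above can be illustrated with a minimal Python model (not DataStage code): a hashed file keeps one record per key, so writing a record whose key already exists replaces the stored record. The column names `client_id` and `cassette_id` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def write_hash_file(rows: List[Row], key_columns: List[str]) -> Dict[Tuple, Row]:
    """Model of writing rows to a hashed file keyed on key_columns."""
    table: Dict[Tuple, Row] = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        table[key] = row  # same key: the previous record is overwritten
    return table


hires = [
    {"client_id": 1, "cassette_id": 10},
    {"client_id": 1, "cassette_id": 11},
    {"client_id": 2, "cassette_id": 10},
]

correct_key = write_hash_file(hires, ["client_id", "cassette_id"])
too_short_key = write_hash_file(hires, ["client_id"])
print(len(correct_key), len(too_short_key))  # 3 2
```

With the key on `client_id` only, the first hire of client 1 is silently lost: no error is raised, the row count simply drops.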


The Designer

The Designer is used to design jobs (look at the icon).
The jobs are composed of:
- Stages:
  - active stages: actions
  - passive stages: data storage
- Links: between the stages

Passive stages: a place for data storage (the data flows from the stage or to the stage).
- Text File: sequential file.
- Hash File: can be processed only by DataStage, not by WordPad or similar tools (but simultaneous access to a Hash File is possible).
- UV Stage: the file is in the Universe database (the core of the DataStage engine).
- ODBC Stage, OLEDB, ORAOCI: representation of a database; they allow direct access to a database (for example with an ODBC link).

Active stages: an active stage represents a transformation of the data flow:
- Sort: sorts a file
- Aggregator: calculations
- Transformer: selection, transformation, transport of properties
- …

Links:
- between active and passive stages
- between passive stages
- between active stages

A job in the Designer (figure: a passive stage linked to an active stage).

DataStage Designer:
Each job has:
- one or more data sources
- one or more transformations
- one or more destinations for the data
The toolbar contains the stage icons used to design the jobs.
The jobs have to be compiled to create executable programs.

(Figure: the Designer window, with the repository, the toolbar with the stage icons (palette), and the buttons to compile the job and to run the job.)

Let's now study the different stages:
- Sequential Files (text files)
- Transformer
- Hash Files
- Sort
- Aggregator
- Routines
- UV Stages

Sequential File Stage:
- can be read
- can be written
- can be read and written in the same job
- can be written with or without a write cache
- can be a DOS file or a Unix file
- …
- can be read by two jobs at the same time
- can't be written by two jobs at the same time

Sequential File (figure callouts: stage name; file type; stage description).

Sequential File (figure callouts: output link; stage name (to be written)).

Sequential File (figure callouts: data format of the output file; always use those values).

Sequential File (figure callouts: button to test the connection and view the data in the file; the different columns of the output file, with type and length; size to display for View Data).

Sequential File:
To describe a file easily, use or create a "table definition".
- Create or modify the table definitions for files, databases, transformers, …
- Group your table definitions by application.

Sequential File:
The table definition can then be used in different jobs (click on Load to find the right definition).

Sequential File (figure: View Data).

Transformer Stage:
- multi-source and multi-target
- waits for the availability of the data sources
- makes a lookup between 2 flows (reference)
- transforms or propagates the data of each flow
- allows you to select, filter and create a refusals file

Transformer Stage:
Can do treatments with:
- native BASIC functions or functions created in the Manager
- DataStage functions or DataStage macros
- routines (before/after type)
- or only propagate columns

Transformer Stage (figure callouts: input data; output data; right click to propagate all the columns).

Transformer Stage (figure callouts: input data; output data).

Exercise n°1:
Objective: read a sequential file and create a new one (save the file).
The catalogue.in file has to be read and the catalogue_save.tmp file has to be written.
Source file: catalogue.in (in /in directory)
Target file: catalogue_save.tmp (in /tmp directory)
Steps:
1/ Create a table definition (structure of the Catalogue table).
2/ Design the job with 2 Sequential Files and 1 Transformer.
3/ Create the links (data flow).
4/ Save and compile the job.
5/ Run the job.
6/ Look at the performance statistics (right click).

Transformer Stage:
Look at the performance of your job: right click on the grid and then select "Show performance statistics".

Create the parameters of the job:
Menu Edit > Job Properties (Parameters tab).

Exercise n°2:
Objective: use environment variables.
- Create a job parameter: directory.
- Use it in all the paths of the job from the first exercise (example: directory/tmp).
- Compile.
- Modify your input file (add your best film).
- Run with a different path (other groups).

Hash File Stage:
- Necessary for a lookup.
- A Hash File is entirely written before it can be read (the FromTrans link must be finished before FromFilmTypeHF can start).
- Allows you to group multiple records with the same key (suppresses duplicate keys).
- Can be read in different jobs simultaneously.
- Can be written by different links simultaneously (in the same job or in different jobs).

Hash File (figure callouts: stage name; account name (DataStage project); file path).

Hash File:
- File name
- For files to write: select this check box to specify that all records should be cached, rather than written to the hashed file immediately. This is not recommended where your job writes and reads to the same hashed file in the same stream of execution.

Hash File:
A key must be defined (it can be a single or a multiple key).

Transformer Stage: Lookup
- The main flow can come from any type of stage.
- The secondary (reference) flow should normally come from a Hash File to design a lookup (so very often you will have to design a temporary Hash File).
- The lookup is done with the key of the secondary flow.
- The number of records in the main flow can't be higher after the lookup than before the lookup.
- The lookup is shown with a dotted line.
- When a lookup is "exclusive", the number of records after the lookup can be smaller than the number of records before the lookup (main records without a match are dropped).
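
The lookup rules above can be sketched as a small Python model, assuming the reference flow is held as a dictionary keyed like a hashed file (one record per key). This illustrates the semantics only, it is not DataStage code; `films`, `film_types` and their columns are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def lookup(main_rows: List[Row], reference: Dict[object, Row], key: str,
           ref_columns: List[str], exclusive: bool = False) -> List[Row]:
    out: List[Row] = []
    for row in main_rows:
        ref: Optional[Row] = reference.get(row[key])
        if ref is None and exclusive:
            continue  # exclusive lookup: unmatched main rows are dropped
        merged = dict(row)
        for col in ref_columns:
            # Unmatched rows get None, roughly like NULL reference columns.
            merged[col] = ref[col] if ref is not None else None
        out.append(merged)  # at most one output row per main row
    return out


films = [
    {"film_id": 1, "type_id": "C"},
    {"film_id": 2, "type_id": "X"},  # no matching film type
    {"film_id": 3, "type_id": "D"},
]
film_types = {"C": {"type_name": "Comedy"}, "D": {"type_name": "Drama"}}

normal = lookup(films, film_types, "type_id", ["type_name"])
exclusive = lookup(films, film_types, "type_id", ["type_name"], exclusive=True)
print(len(films), len(normal), len(exclusive))  # 3 3 2
```

Because the reference holds one record per key, each main row can match at most once, which is why the main flow never grows.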

Transformer Stage: Lookup (figure callouts: principal flow, horizontal; reference flow, vertical).

Exercise n°3:
Objective: make a lookup between the Catalogue file and the FilmType file to put the film type in the output file.
Source file: catalogue.in (in /in directory)
Target file: catalogue.out (in /out directory)
Steps:
1/ Create a table definition (structure of the FilmType table).
2/ Modify your job to create a Hash File from the FilmType.in file.
3/ Create the link to show the lookup (data flow).
4/ Save and compile the job.
5/ Run the job.
6/ Look at the performance statistics (right click).

Exercise n°4:
Objective: put the director name and the film name together, separated by a '.'. If the film type is not found, put … in the output file. What happens when the director name is empty? Find a solution.

Exercise n°5:
Objective: if the film type is not found (use a constraint), put the film in a refusals file (first a Sequential File and then a Hash File).
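
A minimal Python sketch of the constraint logic, assuming the same dictionary model of a hashed-file reference: each main row goes either to the output link (lookup found) or to the refusals link (lookup not found), so no row is lost. In the Transformer itself this would roughly correspond to constraints on the output links testing the reference link's NOTFOUND variable; the function and column names here are hypothetical.

```python
def split_on_lookup(main_rows, reference, key):
    """Route each main row to the output or to the refusals."""
    accepted, refusals = [], []
    for row in main_rows:
        ref = reference.get(row[key])
        if ref is not None:
            accepted.append({**row, **ref})  # lookup found: output link
        else:
            refusals.append(row)             # not found: refusals link
    return accepted, refusals


films = [
    {"film_id": 1, "type_id": "C"},
    {"film_id": 2, "type_id": "X"},
    {"film_id": 3, "type_id": "D"},
]
film_types = {"C": {"type_name": "Comedy"}, "D": {"type_name": "Drama"}}

accepted, refusals = split_on_lookup(films, film_types, "type_id")
print(len(accepted), len(refusals))  # 2 1
```

Checking that accepted plus refused rows equals the input count is a quick way to validate such a job in the Monitor.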

Stage Lookup with selection (exclusive lookup).
Don't forget: a lookup can be designed with an ODBC stage or a UV stage, but it is better with hash files.

Exercise n°6:
Objective: select only the films for which the type is known (that means that the lookup is OK).

Exercise n°7:
Objective: select all the clients who are female to put them in an output file (the SEXE column contains M (male) or F (female)).
Then create an annotation for this job (all the jobs must have annotations).

The Director

The Director is the job controller; it allows you to:
- Run jobs: immediately or later, with more options than in the Designer.
- Control the job status: Compiled, Running, Aborted, Validated, Failed validation, …
- Monitor jobs: to control the number of lines treated by each active stage of a job.

Run jobs with the Director (figure callouts: select the job and click here; then enter the parameters).

To run a job later (figure callouts: click here; then choose the date and time).

To modify the running parameters of a job: Limits tab.
- Warnings limit: the job stops after x warnings.
- Rows limit: the job stops after x rows (on each flow).

Verify the status of jobs with the Director.
The statuses:
- "Not compiled"
- "Compiled"
- "Failed validation"
- "Validated OK"
- "Aborted"
- "Finished"
- "Running"

Example: list of jobs (figure callouts: to run jobs; to stop jobs; to run jobs later; to view the log; to reset the job status).

Example of a Monitor:
For each step: the number of treated lines (input and output), the start time, the execution duration (elapsed time), the status and the performance (rows/sec).
Link type:
- Pri: principal flow
- Ref: reference flow (lookup)
- Out: output flow
The monitor allows you to follow the different stages of a job. This shows the importance of good names for the stages and the links!

Example of a log:
- Green: OK, no problem
- Yellow: warning
- Red: blocking problem
To look at error messages, choose the job and click on the "Log" button.
Don't forget: clear the log from time to time (Job > Clear Log).

The Manager

The Manager is the tool to export/import elements from one DataStage project to another.
All the elements:
- jobs
- routines
- table definitions
are classified in categories, but the name must be unique within a project.
- To import or export elements, click on the appropriate button.
- File > Open Project to change project.
- Drag and drop an element to change its category.

EXPORT:
- Choose what you want to export (this creates a .dsx file):
  - jobs
  - routines (always check the "Source Code" box)
  - table definitions
- To change the selection options: by category or by individual components.
- An option allows you to append to an existing file.

IMPORT:
Choose what you want to import and make your choice. This will create/modify elements in the DataStage project.

With the Manager, you can compile many jobs at the same time (multiple job compile):
- Tools > Run Multiple Job Compile.
- Select the type of jobs you want to compile, select "Show manual selection page" and click on the "Next" button.
- Select the jobs and click on the "Next" button.
- Click on the "Start compile" button.

The Designer

Sort Stage:
- The sorting criteria are filled in in the Stage tab / Properties tab.
- Modify those parameters if the file to sort has a lot of lines.

Exercise n°8:
Objective: when you have selected all the women, sort the file (in alphabetical order).

Aggregator Stage:
- Allows data to be aggregated into a smaller number of records.
- Intermediate treatments are executed in memory.
- Allows you to execute a before/after routine before or after the stage treatment (when all the lines have been treated).
- Performance is better if the data is sorted (Input tab).
- The aggregator does not sort the records.
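
Why sorted input helps can be sketched in Python (an illustrative model, not the Aggregator's actual implementation): without sorting, every group must stay in memory until the last input row; with input sorted on the group-by key, each group can be emitted as soon as the key changes. Column names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows, key, value):
    # Every group is kept in memory until the whole input has been read.
    totals = defaultdict(lambda: [0, 0])  # key -> [count, sum]
    for row in rows:
        acc = totals[row[key]]
        acc[0] += 1
        acc[1] += row[value]
    return {k: tuple(v) for k, v in totals.items()}


def aggregate_sorted(rows, key, value):
    # Input must already be sorted on `key`: a group is finished when the
    # key changes (unsorted input would produce the same key several times).
    for k, group in groupby(rows, key=itemgetter(key)):
        values = [r[value] for r in group]
        yield k, (len(values), sum(values))


hires = [
    {"cassette_id": 10, "price": 3},
    {"cassette_id": 10, "price": 4},
    {"cassette_id": 11, "price": 5},
]
print(aggregate_unsorted(hires, "cassette_id", "price"))      # {10: (2, 7), 11: (1, 5)}
print(dict(aggregate_sorted(hires, "cassette_id", "price")))  # {10: (2, 7), 11: (1, 5)}
```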

Aggregator Stage: Input tab (figure callout: when the input data is sorted).

Aggregator Stage: Output tab (figure callouts: Group by; the different aggregate functions).

Exercise n°9:
Objective: create a job which reads location.in and calculates the hit-parade of the most hired cassettes (ordered by number of hires, descending), with the name of the film and not only the number of the cassette (lookup with catalogue.in).
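
One possible shape for Exercise n°9, sketched in Python under the assumption that `location.in` yields one row per hire with a cassette number and that `catalogue.in` maps a cassette number to a film name (both hypothetical simplifications). The three steps mirror an Aggregator (count per cassette), a Sort (descending) and a lookup.

```python
from collections import Counter


def hit_parade(hires, catalogue):
    # Aggregator: number of hires per cassette
    counts = Counter(h["cassette_id"] for h in hires)
    # Sort: most hired first (ties broken by cassette number)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Lookup: add the film name (None when the cassette is unknown)
    return [(cid, catalogue.get(cid), n) for cid, n in ranked]


hires = [{"cassette_id": c} for c in (2, 1, 2, 3, 2, 1)]
catalogue = {1: "Film A", 2: "Film B", 3: "Film C"}
print(hit_parade(hires, catalogue))
# [(2, 'Film B', 3), (1, 'Film A', 2), (3, 'Film C', 1)]
```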

Exercise n°10:
Objective: create a job which reads location.in and calculates the average number of hires per cassette (2 different methods can be used).
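
Two ways of computing this average are sketched in Python (one reading of the exercise, assuming one `location.in` row per hire): averaging the per-cassette hire counts, or dividing the total number of hires by the number of distinct cassettes. Both give the same value. Note that cassettes never hired appear in neither denominator, which may or may not be what the exercise intends.

```python
from collections import Counter
from statistics import mean


def average_method_1(hires):
    # Aggregate per cassette first, then average the counts.
    per_cassette = Counter(h["cassette_id"] for h in hires)
    return mean(per_cassette.values()) if per_cassette else 0.0


def average_method_2(hires):
    # Total number of hires divided by the number of distinct cassettes.
    distinct = {h["cassette_id"] for h in hires}
    return len(hires) / len(distinct) if distinct else 0.0


hires = [{"cassette_id": c} for c in (10, 10, 11)]
print(average_method_1(hires), average_method_2(hires))  # 1.5 1.5
```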

Exercise n°9: job to design (figure).

Exercise n°10: job to design (figure).

Hash File Stage:
- We have seen that the Hash File is necessary for a lookup.
- We have also seen that a Hash File allows you to suppress duplicate keys.
- Let's now see how it is useful to group different flows.

Exercise n°11:
Objective: with the job from exercise 10, use the 2 methods in the same job: create a Hash File to put the different results in the same Hash File.
- Column 1: … or …
- Column 2: the result of each method
In the Hash File you must have 2 lines.

Exercise n°11: job to design (figure).

Stage Variables:
Simple treatments can be made easily with stage variables.
- A stage variable is a piece of data which remains "active" for the whole duration of the stage. So you can find a max (per key, if the data is sorted), calculate a sum or count something.
- In the Transformer, click on the right button and then select "Show Stage Variables". Example (figure).
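
How a stage variable keeps its value from one row to the next can be modelled in Python as local variables that are initialised once and then updated for every row, in declaration order. This is an illustrative sketch, not Transformer code; the `amount` column and the variable names are hypothetical.

```python
def add_running_values(rows):
    # "Stage variables": initialised once, before the first row.
    sv_count = 0
    sv_sum = 0
    sv_max = None
    for row in rows:
        amount = row["amount"]
        # Re-evaluated for each row, starting from the previous row's values.
        sv_count = sv_count + 1
        sv_sum = sv_sum + amount
        sv_max = amount if sv_max is None or amount > sv_max else sv_max
        yield {**row, "count": sv_count, "sum": sv_sum, "max": sv_max}


rows = [{"amount": 3}, {"amount": 7}, {"amount": 5}]
last = list(add_running_values(rows))[-1]
print(last)                         # {'amount': 5, 'count': 3, 'sum': 15, 'max': 7}
print(last["sum"] / last["count"])  # 5.0
```

The last line shows that an average can be derived from the final sum and count.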

Another example (figure).

Exercise n°12:
Objective: try to calculate the average with stage variables.

Exercise n°13:
Objective: create a job that creates a file with all the client keys and, in a second column, the list of the films (separated by a dot).

Exercise n°13: job to design (figure).

Exercise n°13: job to design.
- The order of the different variables is important: the instructions are executed in the order of the stage variables (to change the order: right click > Stage Properties > Link Ordering tab).
- The variables must be initialized (right click > Stage Properties > Variables).
- There must be a hash file after the stage.
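
The three rules above can be seen in a small Python model of Exercise n°13, assuming input sorted on a hypothetical `client_id` column and a dictionary standing in for the output hash file: the list variable is recomputed before the previous-key variable is updated, and because a hash file keeps only the last write per key, only the complete list for each client survives.

```python
def films_per_client(rows):
    hash_file = {}           # output hash file, keyed on client_id
    sv_prev_client = None    # stage variables, initialised before the first row
    sv_list = ""
    for row in rows:         # rows must be sorted on client_id
        # sv_list must be evaluated before sv_prev_client is updated.
        if row["client_id"] != sv_prev_client:
            sv_list = row["film"]                  # new client: restart the list
        else:
            sv_list = sv_list + "." + row["film"]  # same client: append
        sv_prev_client = row["client_id"]
        hash_file[row["client_id"]] = sv_list      # one write per input row
    return hash_file


sorted_rows = [
    {"client_id": 1, "film": "A"},
    {"client_id": 1, "film": "B"},
    {"client_id": 2, "film": "C"},
]
print(films_per_client(sorted_rows))  # {1: 'A.B', 2: 'C'}
```

With unsorted input (client 1, client 2, client 1), the list for client 1 would restart and end up as only the last film, which is why the data must be sorted first.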

DataStage Variables:
Different variables are defined by DataStage:
- @NULL
- @INROWNUM, @OUTROWNUM
- @DATE
- @TRUE, @FALSE
- @PATH
Link Variables:
The most useful is NOTFOUND.

Routines:
- Source code written in the BASIC language.
- A routine is external to the jobs and can be used many times at many levels.
- It can be a Transform function or a Before/After subroutine:
  - a transform function is called for each line;
  - a before subroutine is called before the first line (example: empty a file);
  - an after subroutine is called when all the lines have been treated.
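
When each kind of routine runs can be sketched in Python (a model of the calling pattern only, not DataStage code; `run_stage` and the routine names are hypothetical):

```python
def run_stage(rows, transform, before=None, after=None):
    if before is not None:
        before()                            # before subroutine: once, before the first line
    out = [transform(row) for row in rows]  # transform function: once per line
    if after is not None:
        after()                             # after subroutine: once, after the last line
    return out


calls = []


def empty_output_file():   # hypothetical before subroutine
    calls.append("before")


def to_upper(value):       # hypothetical transform function
    calls.append("row")
    return value.upper()


def report():              # hypothetical after subroutine
    calls.append("after")


result = run_stage(["a", "b"], to_upper, empty_output_file, report)
print(result, calls)  # ['A', 'B'] ['before', 'row', 'row', 'after']
```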

Routines (1/2) (figure callouts: type of routine; name of the routine; always fill in the short description).

Routines (figure).

Routines (2/2) (figure callouts: code, using the argument names; Save, Compile and Test of the routine).

Routines: access to a sequential file
OpenSeq Fic To xxx Then … End Else … End      (open the file whose path is in Fic)
WEOFSeq xxx                                   (to empty the file)
WriteSeq Line To xxx Then … End Else … End    (write one line)
ReadSeq Line From xxx Then … End Else … End   (read one line)
CloseSeq xxx                                  (close the file)

Routines:
If … Then
   …
End Else
   …
End
GoTo label
For i = … To …
   …
Next i
Loop While … Repeat
Loop Until … Repeat
Call DSLogInfo("Information", "RoutineName")
Call DSLogWarn("Warning", "RoutineName")
Call DSLogFatal("Abort", "RoutineName")
A = "Hello "
B = "World"
C = A : B                (C = "Hello World")
Field(…, ",", 2, 1)      returns the 2nd comma-delimited field of the string
Trim(…, " ", "T")        suppresses the trailing spaces
Upcase(…)                converts to upper case
Iconv(…)                 converts to internal format

Routines: Test (double-click on the Result column).

Exercise n°14:
Step 1:
Objective: write a routine which calculates the number of days between two dates.
- If the begin date is null, then return 0.
- If the end date is null, then initialize it with today's date.
Save, compile and test the routine.
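
A sketch of the routine's rules in Python (the exercise asks for a DataStage BASIC routine; this only illustrates the logic). The `today` parameter is a hypothetical addition so the result can be tested. In a BASIC routine, dates would typically be handled as internal day numbers (for example with `Iconv` and `@DATE`), so the difference is a plain subtraction.

```python
from datetime import date
from typing import Optional


def days_between(begin: Optional[date], end: Optional[date],
                 today: Optional[date] = None) -> int:
    if begin is None:
        return 0  # rule 1: no begin date -> 0
    if end is None:
        # rule 2: no end date -> today's date
        end = today if today is not None else date.today()
    return (end - begin).days


print(days_between(date(2006, 7, 3), date(2006, 7, 13)))             # 10
print(days_between(None, date(2006, 7, 13)))                         # 0
print(days_between(date(2006, 7, 3), None, today=date(2006, 7, 5)))  # 2
```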


Exercise n°14:
Step 2:
Objective: read location.in and generate a file with the hire duration (returned cassettes only).
Cassettes not returned after 10 days (end date null) will be written to a refusals file with the name and address of the client (to send them a mail).

Exercise n°14: job to be designed (figure).

Exercise n°15:
Objective: with a routine (use CASE), calculate the amount for the cassette hire (number of days * hire price * coefficient).
The coefficient is calculated with this rule:
- < 5 days => days * hire price
- >= 5 and < 10 days => days * hire price * 1.20
- >= 10 and < 30 days => days * hire price * 1.50
- >= 30 days => days * hire price * 3
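
The coefficient rule can be checked with a short Python sketch (the exercise itself asks for a BASIC routine using `Begin Case` / `Case` / `End Case`; this only mirrors the same branches, and the function name is hypothetical). Each branch tests only its upper bound because the earlier branches have already excluded the lower ones, exactly as in a Case block.

```python
def hire_amount(days: int, hire_price: float) -> float:
    if days < 5:
        coefficient = 1.0
    elif days < 10:      # >= 5 and < 10
        coefficient = 1.20
    elif days < 30:      # >= 10 and < 30
        coefficient = 1.50
    else:                # >= 30
        coefficient = 3.0
    return days * hire_price * coefficient


print([hire_amount(d, 2.0) for d in (4, 5, 10, 30)])  # [8.0, 12.0, 30.0, 180.0]
```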

UV Stage:
- works with internal hash files (in the DataStage project)
- can make a Cartesian product
- uses SQL queries (select … from … where … order by …)

Exercise n°16: execute the Cartesian product on the Clients file and the Cassettes file.
Objective: propose to each client the cassettes he has never hired.
- Step 1: create the job parameter.
- Step 2: create a job to write the clients hash file and the cassettes hash file in the DS project (with the account parameter).
- Step 3: in a new job, use those hash files to make the Cartesian product.
Look at your job performance!!
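
The idea of Exercise n°16 (Cartesian product, then removal of the pairs already hired) can be sketched in Python with `itertools.product`; all identifiers are hypothetical. The product has (number of clients) × (number of cassettes) rows, which is why the job's performance deserves a look.

```python
from itertools import product


def never_hired(clients, cassettes, hires):
    hired = set(hires)  # (client_id, cassette_id) pairs already hired
    # Cartesian product: every client paired with every cassette ...
    pairs = product(clients, cassettes)
    # ... minus the pairs that already appear in the hires.
    return [pair for pair in pairs if pair not in hired]


clients = [1, 2]
cassettes = [10, 11, 12]
hires = [(1, 10), (2, 11), (2, 12)]
print(never_hired(clients, cassettes, hires))  # [(1, 11), (1, 12), (2, 10)]
```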

Exercise n°16: Step 1 and Step 2 (figure).

Exercise n°16: Step 3 (figure).


The number of records (figure).

Normalization / un-normalization (figure: a multi-valued file, where key 12 holds several values in a single record, and the equivalent normalized file, with one record per key/value pair).

Normalization:
A multi-valued file must have:
1- a key

Exercise n°17: normalization/un-normalization
- Step 1: create a job which reads the location.in file and writes a hash file (Id_Cli as the key and the list of all Id_Cas separated by …) (use a Sort stage and Stage Variables!)
  => View Data on the Input Link of the Hash File
- Step 2: modify the job to add the normalization of this file
  => View Data on the Output Link of the Hash File
- Step 3: compare the sequential file with the location.in file
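
Normalization and un-normalization can be modelled in Python as converting between one record per key holding a list of values and one record per key/value pair (an illustration only; DataStage stores the values of a multi-valued field inside a single hashed-file record). Step 3's comparison then amounts to checking that the round trip gives back the same pairs, ignoring order.

```python
def normalize(multivalued):
    # {key: [v1, v2, ...]} -> [(key, v1), (key, v2), ...]
    return [(key, value) for key, values in multivalued.items() for value in values]


def unnormalize(pairs):
    # [(key, v1), (key, v2), ...] -> {key: [v1, v2, ...]}, keeping input order
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


hires = [(12, "A"), (12, "B"), (13, "C"), (12, "D")]
multivalued = unnormalize(hires)
print(multivalued)                                      # {12: ['A', 'B', 'D'], 13: ['C']}
print(sorted(normalize(multivalued)) == sorted(hires))  # True
```

The records come back grouped by key rather than in the original file order, so the comparison with location.in should be done on sorted data.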

Exercise n°17: job to design and View Data (figure).

The ORAOCI Stages:
- The version of Oracle used is 9i, so use the ORAOCI9 stage. You can:
  - either use a query generated by DataStage,
  - or use a user-defined query,
  - or a combination of both.
- The access parameters have to be defined by job parameters.
- The stage can access one or more tables.
- Different actions can be programmed: read, insert, update.
- You can also use stored procedures.

The ORAOCI Stages: the access parameters have to be defined by job parameters (figure).

The ORAOCI Stages: Output link (figure callout: query generated by DataStage or user-defined query).

Query generated by DataStage (figure callouts: selection of the table(s); selection of the columns; "Group by" clause; sort parameters).

Generate SELECT clause from column list; enter other clauses (figure).

Enter custom SQL statement: when you want to add something specific (to format a date, for example).

The ORAOCI Stages: Input link, to write into a table (figure callouts: choose the table; choose the action; important parameters).

The ORAOCI Stages: Input link (figure callout: number of lines between 2 commits).
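
The effect of the "number of lines between 2 commits" setting can be sketched with Python's built-in `sqlite3` module standing in for Oracle (an assumption made only so the example runs anywhere; the `film_type` table is hypothetical): rows are inserted one by one, a commit is issued every `commit_every` rows, and a final commit covers the remainder.

```python
import sqlite3


def load_with_commit_interval(conn, rows, commit_every):
    """Insert rows one by one, committing every `commit_every` rows."""
    cur = conn.cursor()
    pending = 0   # rows inserted since the last commit
    commits = 0
    for row in rows:
        cur.execute("INSERT INTO film_type (type_id, type_name) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:   # final commit for the last, incomplete batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film_type (type_id TEXT PRIMARY KEY, type_name TEXT)")
rows = [("C", "Comedy"), ("D", "Drama"), ("W", "Western"), ("H", "Horror"), ("S", "Sci-fi")]
commits = load_with_commit_interval(conn, rows, commit_every=2)
print(commits)                                                     # 3
print(conn.execute("SELECT COUNT(*) FROM film_type").fetchone()[0])  # 5
```

A larger interval generally means fewer commits, but more uncommitted work is lost if the load fails in the middle.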

  • 8/18/2019 CapGemini Datastage Exercise

    108/122

    108

    The 9A9*I (tages : verify error code G=H"

    If the 2ob must abort+hen there is a(L error


The ORAOCI stages: verify error code


The ORAOCI stages: verify error code (2/2)

Process the lines one by one

To receive the SQL error code

To select the errors
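Processing rows one at a time is what makes it possible to attach an SQL error to the row that caused it (Oracle reports, for example, ORA-00001 for a unique-constraint violation) and to route that row to a reject output instead of aborting. A sketch of the idea in Python, with `sqlite3` standing in for Oracle and a hypothetical `clients` table; the DataStage stage does this through its link settings, not through code like this.

```python
import sqlite3


def insert_with_rejects(conn: sqlite3.Connection, rows):
    """Insert rows one at a time so that each failure can be traced to its row.
    Returns (number loaded, list of (row, error message))."""
    loaded, rejects = 0, []
    for row in rows:
        try:
            conn.execute("INSERT INTO clients (id_cli, name) VALUES (?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as err:
            # Only this statement fails; rows already inserted stay in the transaction.
            rejects.append((row, str(err)))
    conn.commit()
    return loaded, rejects


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (id_cli INTEGER PRIMARY KEY, name TEXT)")
    loaded, rejects = insert_with_rejects(conn, [(1, "A"), (2, "B"), (1, "duplicate")])
    print(loaded, rejects)
```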


The ORA Bulk stages:

- to insert into a table (like SQL*Loader)
- Very fast (deactivates the index before the load and reactivates it after the load)
- But no warning if the index is in an unusable state after the load (when there are duplicate keys, for example)
- Not many date and time formats: DD.MM.YYYY, YYYY-MM-DD, DD-MON-YYYY, MM/DD/YYYY - hh
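Because the bulk stage gives no warning when duplicate keys leave the index unusable, one possible safeguard is to check the keys of the batch before loading. A minimal Python sketch, assuming the keys fit in memory; the key values are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def duplicate_keys(keys: Iterable[Hashable]) -> list:
    """Return each key that occurs more than once, in order of first appearance."""
    counts = Counter(keys)  # Counter keeps insertion order (Python 3.7+)
    return [key for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Hypothetical client keys about to be bulk-loaded.
    batch = ["A001", "A002", "A001", "B001", "A002"]
    dups = duplicate_keys(batch)
    if dups:
        print("duplicate keys found, do not bulk load:", dups)
```

After the load, the index state can also be checked on the Oracle side, for example with `SELECT index_name FROM user_indexes WHERE status = 'UNUSABLE'`.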


The ORA Bulk stages

DSN

Date and time format

Password

Table name (with oracle.tableName)

Number of lines between commits

User
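Since the bulk stage accepts only a few date formats, dates may need to be rendered in one of them before the load. A Python sketch covering the four date formats listed for the bulk stage (the time part is left out); the function name is hypothetical, and the month abbreviations are hard-coded in English because `strftime("%b")` depends on the locale.

```python
from datetime import date

# Oracle-style MON abbreviations, fixed in English so the result does not
# depend on the machine's locale.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def format_for_loader(d: date, fmt: str) -> str:
    """Render a date in one of the formats listed for the bulk stage."""
    if fmt == "DD.MM.YYYY":
        return f"{d.day:02d}.{d.month:02d}.{d.year:04d}"
    if fmt == "YYYY-MM-DD":
        return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"
    if fmt == "DD-MON-YYYY":
        return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"
    if fmt == "MM/DD/YYYY":
        return f"{d.month:02d}/{d.day:02d}/{d.year:04d}"
    raise ValueError(f"format not supported by this sketch: {fmt}")


if __name__ == "__main__":
    for fmt in ("DD.MM.YYYY", "YYYY-MM-DD", "DD-MON-YYYY", "MM/DD/YYYY"):
        print(fmt, format_for_loader(date(2006, 7, 3), fmt))
```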


How to create a table definition from a table in the database?

In the repository, right-click on Table Definitions, then choose "Import" and then "Plug-in Meta Data Definitions".


Then choose the table(s) and click on "Import".

The table definitions will be created in the "ODBC" category.
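Importing a table definition amounts to reading column metadata (names, types, nullability, keys) from the database catalogue. On Oracle that information lives in dictionary views such as ALL_TAB_COLUMNS; the sketch below shows the same idea with SQLite's `PRAGMA table_info` through Python's `sqlite3` module. The `clients` table is hypothetical, and this is not what the DataStage import does internally.

```python
import sqlite3


def table_definition(conn: sqlite3.Connection, table: str) -> list:
    """Read column metadata from the database catalogue: name, declared type,
    nullability and primary-key flag. `table` is assumed to be a trusted name."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {"name": name, "type": col_type, "nullable": not notnull, "key": pk > 0}
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (id_cli INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
    for column in table_definition(conn, "clients"):
        print(column)
```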


Exercise n°18: Read a database

Objective: create a job which reads the table RE,($E in the Q?B+ database.

Step 1: create the table definition from the database.

Step 2: create the job that reads the table.


Exercise n°19: Write to a database

Objective: create a job which writes only the first 2 columns (keys) into the table $+$(?J(GF in the Q?B+ database:

Location.in $+$(?J(GF: IdCli MMMMMMMM == ;R1

IdCas MMMMMMMM == ;R2

In ;R1, put a different letter for each group before the client number (IdCli).

Step 1: use the ORAOCI stage.

Step 2: same exercise with the ORABULK stage.


Exercise n°20: Update a database

Objective: create a job to update the columns QEG?J($E and EJ($E in the table $+$(?J(GF in the Q?B+ database from the location.in file.

QEG?J($E and EJ($E are defined as timestamp.

The Administrator


The Administrator is used to:

- Create a DataStage project
- Unlock a job

Sometimes, due to server problems, the Designer (or the Manager) crashes and some elements may remain locked (jobs, table definitions, routines, ...). In that case, in the Administrator (with administrator security rights):


Unlock a job (1/2)

Choose your project

and click on the Command button

To create a project


Unlock a job


Unlock a job (2/2)

Unlock your job with the device number,

or with the user number: UNLOCK USER UserNumber READULOCK

or unlock everything: UNLOCK ALL


Project name

Create a project

Location for the project (jobs, routines, hash files, table definitions, ...) on the server. It must be different from the location of the data directories.