Spoon 3 0 0 User Guide

  • 8/13/2019 Spoon 3 0 0 User Guide

    1/265

    Last Modified on October 26th, 2007

    Pentaho Data Integration

    Spoon 3.0 User Guide

    Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their
    respective owners. For the latest information, please visit our web site at www.pentaho.org

    http://www.pentaho.org/
    1. Contents

    1. Contents
    2. About This Document
    2.1. What it is
    2.2. What it is not
    3. Introduction to Spoon
    3.1. What is Spoon?
    3.2. Installation
    3.3. Launching Spoon
    3.4. Supported platforms
    3.5. Known Issues
    3.6. Screen shots
    3.7. Command line options
    3.8. Repository
    3.8.1. Repository Auto-Login
    3.9. License
    3.10. Definitions
    3.10.1. Transformation Definitions
    3.11. Toolbar
    3.12. Options
    3.12.1. General Tab
    3.12.2. Look & Feel tab
    3.14. Search Meta data
    3.15. Set environment variable
    3.16. Execution log history
    3.17. Replay
    3.18. Generate mapping against target step
    3.18.1. Generate mappings example
    3.19. Safe mode
    3.20. Welcome Screen
    4. Creating a Transformation or Job
    4.1. Notes
    4.2. Screen shot
    4.3. Creating a new database connection
    4.3.1. General
    4.3.2. Pooling
    4.3.3. MySQL
    4.3.4. Oracle
    4.3.5. Informix
    4.3.6. SQL Server
    4.3.7. SAP R

    4.3.8. Generic
    4.3.9. Options
    4.3.10. SQL
    4.3.11. Cluster
    4.3.12. Advanced
    4.3.13. Test a connection
    4.3.14. Explore
    4.3.15. Feature List
    4.4. Editing a connection
    4.5. Duplicate a connection
    4.6. Copy to clipboard
    4.7. Execute SQL commands on a connection
    4.8. Clear DB Cache option
    4.9. Quoting
    4.10. Database Usage Grid
    4.11. Configuring JNDI connections
    4.12. Unsupported databases
    5. SQL Editor
    5.1. Description
    5.2. Limitations
    6. Database Explorer
    7. Hops
    7.1. Description
    7.1.1. Transformation Hops
    7.1.2. Job Hops
    7.2. Creating A Hop
    7.3. Loops
    7.4. Mixing rows: trap detector
    7.5. Transformation hop colors
    8. Variables
    8.1. Variable usage
    8.2. Variable scope
    8.2.1. Environment variables
    8.2.2. Kettle variables
    8.2.3. Internal variables
    9. Transformation Settings
    9.1. Description
    9.2. Transformation Tab
    9.3. Logging
    9.4. Dates
    9.5. Dependencies
    9.6. Miscellaneous
    9.7. Partitioning
    9.8. SQL Button
    10. Transformation Steps

    10.1. Description
    10.2. Launching several copies of a step
    10.3. Distribute or copy?
    10.4. Step error handling
    10.5. Apache Virtual File System (VFS) support
    10.5.1. Example: Referencing remote job files
    10.5.2. Example: Referencing files inside a Zip
    10.6. Transformation Step Types
    10.6.1. Text File Input
    10.6.2. Table input
    10.6.3. Get System Info
    10.6.4. Generate Rows
    10.6.5. De-serialize from file (formerly Cube Input)
    10.6.6. XBase input
    10.6.7. Excel input
    10.6.8. Get File Names
    10.6.9. Text File Output
    10.6.10. Table output
    10.6.11. Insert / Update
    10.6.12. Update
    10.6.13. Delete
    10.6.14. Serialize to file (formerly Cube File Output)
    10.6.15. XML Output
    10.6.16. Excel Output
    10.6.17. Microsoft Access Output
    10.6.18. Database lookup
    10.6.19. Stream lookup
    10.6.20. Call DB Procedure
    10.6.21. HTTP Client
    10.6.22. Select values
    10.6.23. Filter rows
    10.6.24. Sort rows
    10.6.25. Add sequence
    10.6.26. Dummy (do nothing)
    10.6.27. Row Normaliser
    10.6.28. Split Fields
    10.6.30. Unique rows
    10.6.31. Group By
    10.6.32. Null If
    10.6.33. Calculator
    10.6.34. XML Add
    10.6.35. Add constants
    10.6.36. Row Denormaliser
    10.6.37. Flattener
    10.6.38. Value Mapper

    10.6.39. Blocking step
    10.6.40. Join Rows (Cartesian product)
    10.6.41. Database Join
    10.6.42. Merge rows
    10.6.43. Sorted Merge
    10.6.44. Merge Join
    10.6.45. JavaScript Values
    10.6.46. Modified Java Script Value
    10.6.47. Execute SQL script
    10.6.48. Dimension lookup

    12.2.4. Job
    12.2.5. Shell
    12.2.6. Mail
    12.2.7. SQL
    12.2.8. Get a file with FTP
    12.2.9. Table Exists
    12.2.10. File Exists
    12.2.11. Get a file with SFTP
    12.2.12. HTTP
    12.2.13. Create a file
    12.2.14. Delete a file
    12.2.15. Wait for a file
    12.2.16. File compare
    12.2.17. Put a file with SFTP
    12.2.18. Ping a host
    12.2.19. Wait for
    12.2.20. Display Msgbox info
    12.2.21. Abort job
    12.2.22. XSL transformation
    12.2.23. Zip files
    12.2.24. Bulkload into MySQL
    12.2.25. Get Mails from POP
    12.2.26. Delete Files
    12.2.27. Success
    12.2.28. XSD Validator
    12.2.29. Write to log
    12.2.30. Copy Files
    12.2.31. DTD Validator
    12.2.32. Put a file with FTP
    12.2.33. Unzip
    12.2.34. Dummy Job Entry
    13. Graphical View
    13.1. Description
    13.2. Adding steps or job entries
    13.2.1. Create steps by drag and drop
    13.3. Hiding a step
    13.4. Transformation Step options (right-click menu)
    13.4.1. Edit step
    13.4.2. Edit step description
    13.4.3. Data movement
    13.4.4. Change number of copies to start
    13.4.5. Copy to clipboard
    13.4.6. Duplicate Step
    13.4.7. Delete step
    13.4.8. Hide Step

    13.4.9. Show input fields
    13.4.10. Show output fields
    13.5. Job entry options (right-click menu)
    13.5.1. Open Transformation

    2. About This Document

    2.1. What it is

    This document is a technical description of Spoon, the graphical transformation and job designer of the
    Pentaho Data Integration suite, also known as the Kettle project.

    2.2. What it is not

    This document does not attempt to describe in great detail how to create jobs and transformations for all
    possible situations. Recognizing that different developers have different approaches to designing their data
    integration solutions, Spoon gives users the freedom and flexibility to design solutions in the
    manner they feel most appropriate to the problem at hand, and that is the way it should be!

    Other documentation

    Here are links to other documents that you might find useful when you are building
    transformations:

    Flash demos, screen shots, and an introduction to building a simple transformation:

    http:

    3. Introduction to Spoon

    3.1. What is Spoon?

    Kettle is an acronym for "Kettle E.T.T.L. Environment". This means it has been designed to help you with
    your ETTL needs: the Extraction, Transformation, Transportation and Loading of data.

    Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with
    the Kettle tools Pan and Kitchen. Pan is a data transformation engine that is capable of performing a
    multitude of functions such as reading, manipulating and writing data to and from various data sources.
    Kitchen is a program that can execute jobs designed by Spoon in XML or in a database repository. Usually
    jobs are scheduled in batch mode to be run automatically at regular intervals.

    NOTE: For a complete description of Pan or Kitchen, please refer to the Pan and Kitchen user guides.

    Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database
    repository. This information can then be read by Pan or Kitchen to execute the described steps in the
    transformation or run the job.

    In short, Pentaho Data Integration makes data warehouses easier to build, update and maintain!

    3.2. Installation

    The first step is the installation of Sun Microsystems Java Runtime Environment version 1.4 or higher. You
    can download a JRE for free at http:

    3.3. Launching Spoon

    To launch Spoon on the different platforms these are the scripts that are provided:

    Spoon.bat: launch Spoon on the Windows platform.
    spoon.sh: launch Spoon on a Unix-like platform: Linux, Apple OSX, Solaris, ...

    If you want to make a shortcut under the Windows platform, an icon is provided: "spoon.ico" to set the
    correct icon. Simply point the shortcut to the Spoon.bat file.
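
    As an illustration of choosing between the two launch scripts, here is a minimal Python sketch. It is not
    part of the Spoon distribution: the function name spoon_launcher and the install directory passed to it
    are hypothetical, and only the script names Spoon.bat and spoon.sh come from this guide.

```python
import platform
from pathlib import Path


def spoon_launcher(install_dir, system=None):
    """Return the path of the launch script for a platform.

    install_dir is a hypothetical location of the unpacked Spoon files.
    system defaults to the value reported by platform.system().
    """
    system = system or platform.system()
    # Spoon.bat is provided for Windows; spoon.sh for Unix-like platforms.
    script = "Spoon.bat" if system == "Windows" else "spoon.sh"
    return Path(install_dir) / script


if __name__ == "__main__":
    # Print the launcher that would be used on the current machine.
    print(spoon_launcher("."))
```

    Passing the platform name explicitly, as in spoon_launcher("kettle", "Windows"), makes the choice easy
    to check without running on that platform.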

    3.4. Supported platforms

    The Spoon GUI is supported on the following platforms:

    Microsoft Windows: all platforms since Windows 95, including Vista
    Linux GTK: on i386 and x86_64 processors, works best on Gnome
    Apple's OSX: works both on PowerPC and Intel machines
    Solaris: using a Motif interface (GTK optional)
    AIX: using a Motif interface
    HP-UX: using a Motif interface (GTK optional)
    FreeBSD: preliminary support on i386, not yet on x86_64

    3.5. Known Issues

    Linux

    Occasional JVM crashes running SuSE Linux and KDE. Running under Gnome has no problems. (Detected on
    SUSE Linux 10.1, but earlier versions suffer the same problem.)

    FreeBSD

    Problems with drag and drop. Workaround is to use the right-click popup menu on the canvas. (Insert new
    step)

    Please check the Tracker lists at http:

  • 8/13/2019 Spoon 3 0 0 User Guide

    11/265

3.6. Screen shots

The Main tree in the upper-left panel of Spoon allows you to browse connections along with the jobs and
transformations you currently have open. When designing a transformation, the Core Objects palette in the
lower-left panel contains the available steps used to build your transformation including input, output,
lookup, transform, joins, scripting steps and more. When designing a job, the Core Objects palette contains
the available job entry types.

Designing a Transformation

Designing a job

These items are described in detail in the chapters below: Database Connections, Hops, Transformation
Steps, Job Entries and Graphical View.

3.7. Command line options

These are the command line options that you can use when starting the Spoon application:

-file=filename
This option runs the specified transformation (.ktr : Kettle Transformation).

-logfile=Logging Filename
This option allows you to specify the location of the log file. The default is the standard output.

-level=Logging Level
The level option sets the log level for the transformation being run.
These are the possible values:
Nothing: Do not show any output
Error: Only show errors
Minimal: Use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: Show very detailed output for debugging purposes.
Rowlevel: Detailed logging at a row level. Warning: this will generate a lot of data.

-rep=Repository name
Connect to the repository with name "Repository name".
Note: You also need to specify the options -user, -pass and -trans described below. The repository
details are loaded from the file repositories.xml in the local directory or in the Kettle directory:
HOME

-trans
Use this option to select the transformation to run from the repository.

-job=Job Name
Use this option to select the job to run from the repository.

Important Notes:
On Windows, we advise you to use the /option:value format to avoid command line parsing problems by the
MS-DOS shell.
Fields in italic represent the values that the options use.
If spaces are present in the option values, it is important that you use quotes or double quotes to keep
them together. Take a look at the examples below for more info.
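
The option formats described above can be sketched in a few lines of Python. This is only an illustration, not part of Spoon: the helper name spoon_args and the file names are hypothetical, and the script merely formats and prints the arguments without starting Spoon.

```python
import shlex


def spoon_args(options, windows=False):
    """Format option names and values as Spoon command line arguments.

    Unix-like shells use -name=value; on Windows the guide advises
    the /name:value format instead.
    """
    if windows:
        return [f"/{name}:{value}" for name, value in options.items()]
    return [f"-{name}={value}" for name, value in options.items()]


# Hypothetical values, chosen only for illustration.
options = {
    "file": "/tmp/load customers.ktr",  # note the space in the file name
    "logfile": "/tmp/spoon.log",
    "level": "Detailed",
}

# shlex.join quotes any argument that contains a space, which is
# what keeps such an option value together on a POSIX shell.
print(shlex.join(["sh", "spoon.sh"] + spoon_args(options)))
print(spoon_args(options, windows=True))
```

Passing the arguments as a list, as done here, avoids most quoting problems because each option and its value stay together as one element.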


3.8. Repository

Spoon provides you with the ability to store transformation and job files to the local file system or in the
Kettle repository. The Kettle repository can be housed in any common relational database. This means that
in order to load a transformation from a database repository, you need to connect to this repository.
To do this, you need to define a database connection to this repository. You can do this using the
repositories dialog you are presented with when you start up Spoon:

The information concerning repositories is stored in a file called "repositories.xml". This file resides in the
hidden directory ".kettle" in your default home directory. On Windows this is C:\Documents and
Settings\<username>\.kettle

Note: The complete path and filename of this file is displayed on the Spoon console.
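
If you want to check this location from a script, a minimal sketch could look like the one below. It assumes only what is stated above (a ".kettle" directory in the home directory containing "repositories.xml"); the function name repositories_file is made up for this example.

```python
from pathlib import Path


def repositories_file(home=None):
    """Return the expected location of repositories.xml.

    If no home directory is given, the current user's home directory
    is used, matching the description in this guide.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".kettle" / "repositories.xml"


path = repositories_file()
print(path, "exists" if path.exists() else "not found")
```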

If you don't want this dialog to be shown each time Spoon starts up, you can disable it by unchecking the
'Present this dialog at startup' checkbox or by using the Options dialog under the Edit / Options menu. See
also the Options section.

Note: The default password for the admin user is also admin. You should change this default password
right after the creation using the Repository Explorer or the "Repository/... User" menu.

The Repository login screen

3.10. Definitions

3.10.1. Transformation Definitions

Value: values are part of a row and can contain any type of data: Strings, floating point Numbers,
unlimited precision BigNumbers, Integers, Dates or Boolean values.
Row: a row consists of 0 or more values.
Output stream: an output stream is a stack of rows that leaves a step.
Input stream: an input stream is a stack of rows that enters a step.
Hop: a hop is a graphical representation of one or more data streams between 2 steps. A hop
always represents the output stream for one step and the input stream for another. The
number of streams is equal to the number of copies of the destination step (1 or more).
Note: a note is a descriptive piece of information that can be added to a transformation.

Job Definitions

Job Entry: a job entry is one part of a job and performs a certain task.
Hop: a hop is a graphical representation of the link between two job entries. Depending on the type of
originating job entry, it can be set to execute the next job entry unconditionally, after successful
execution or after failed execution.
Note: a note is a descriptive piece of information that can be added to a job.

3.11. Toolbar

The icons on the toolbar of the main screen are from left to right:

Icon    Description
        Create a new job or transformation
        Open transformation

3.12. Options

Kettle options allow you to customize a number of properties related to the behavior and look and feel of
the graphical user interface. Examples include startup options like whether or not to display tips and the
Kettle Welcome Page, and user interface options like fonts and colors. To access the options dialog,
select Edit|Options... from the menubar.

3.12.1. General Tab

Options General tab

Feature
    Description

Maximum Undo Level
    This parameter sets the maximum number of steps that can be undone (or redone) by Spoon.

Default number of lines in preview dialog
    This parameter allows you to change the default number of rows that are requested from a step
    during transformation previews.

Maximum nr of lines in the logging windows
    Specify the maximum limit of rows to display in the logging window.

Show tips at startup?
    This option sets the display of tips at startup.

Show welcome page at startup?
    This option controls whether or not to display the welcome page when launching Spoon.

Use database cache?
    Spoon caches information that is stored on source and target databases. In some cases this can
    lead to incorrect results when you're in the process of changing those very databases. In those
    cases it is possible to disable the cache altogether instead of clearing the cache every time.
    NOTE: Spoon automatically clears the database cache when you launch DDL (Data Definition
    Language) statements towards a database connection. However, when using 3rd party tools,
    clearing the database cache manually may be necessary.

Open last file at startup?
    Enable this option to automatically (try to) load the last transformation you used (opened or
    saved) from XML or repository.

Auto save changed files?
    This option automatically saves a changed transformation before running.

Only show the active file in the main tree?
    This option reduces the number of transformation and job items in the main tree on the left by
    only showing the currently active file.

Only save used connections to XML?
    This option limits the XML export of a transformation to the used connections in that
    transformation. This comes in handy while exchanging sample transformations to avoid having all
    defined connections included.

Ask about replacing existing connections on open

Display tooltips?
    This option controls whether or not to display tooltips for the buttons on the main toolbar.

3.12.2. Look & Feel tab

Feature
    Description

Fixed width font
    This is the font that is used in the dialog boxes, trees, input fields, etc.

Font on workspace
    This is the font that is used on the graphical view.

Font for notes
    This font is used in the notes that are displayed in the Graphical View.

Background color
    Sets the background color in Spoon. It affects all dialogs too.

Workspace background color
    Sets the background color in the Graphical View of Spoon.

Tab color
    This is the color that is being used to indicate tabs that are active

    above the canvas.

Dialog middle percentage
    By default, a parameter is drawn at 35% of the width of the dialog, counted from the left. You can
    change this with this parameter. This can be useful in cases where you use unusually large fonts.

Canvas antialiasing?
    Some platforms like Windows, OSX and Linux support antialiasing through GDI, Carbon or Cairo.
    Check this to enable smoother lines and icons in your graph view. If you enable this and your
    environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing"
    to "N" in file HOME

3.13. Search Meta data

This option will search in any available fields, connectors or notes of all loaded jobs and transformations for
the string specified in the Filter field. The Meta data search returns a detailed result set showing the
location of any search hits. This feature is accessed by choosing Edit|Search Meta data from the menubar.

3.14. Set environment variable

The Set Environment Variable feature allows you to explicitly create and set environment variables for the
current user session. This is a useful feature when designing transformations for testing variable
substitutions that are normally set dynamically by another job or transformation.

This feature is accessible by choosing Edit|Set Environment Variable from the menubar.

Note: This screen is also presented when you run a transformation that uses undefined variables. This
allows you to define them right before execution time.

Show environment variables

This feature will display the current list of environment variables and their values. It is accessed by
selecting the Edit|Show environment variables option from the menubar.

Search Meta data Dialog

Set Environment Variable Dialog

3.15. Execution log history

If you have configured your Job or Transformation to store log information in a database table, you can
view the log information from previous executions by right-clicking on the job or transformation in the Main
Tree and selecting 'Open History View'.

NOTE: The log history for a job or transformation will also open by default the next time you execute the
file.

3.16. Replay

The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for
Text File Input and Excel input. It allows you to send files that had errors back to the source and have the
data corrected. ONLY the lines that failed before are then processed during the replay if a .line file is
present. It uses the date in the filename of the .line file to match the entered replay date.

Transformation History Tab

3.17. Generate mapping against target step

In cases where you have a fixed target table, you will want to map the fields from the stream to their
corresponding fields in the target output table. This is normally accomplished using a Select Values step in
your transformation. The 'Generate mapping against target' option provides you with an easy-to-use dialog
for defining these mappings. It automatically creates the resulting Select Values step, which can be
dropped into your transformation flow prior to the table output step.

The 'Generate mapping against target' option is accessed by right-clicking on the table output step.

After defining your mappings, select OK and the Select Values step containing your mappings will appear on
the workspace. Attach the mapping step into your transformation just before the table output step.

3.17.1. Generate mappings example

Here is an example of a simple transformation in which we want to generate mappings to our target output
table:

Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'. Add all
necessary mappings using the Generate Mapping dialog and click OK. You will now see that a
Table output mapping step has been added to the canvas:

Generate Mapping Dialog

Split hop before generating mappings

Finally, drag the generated Table output Mapping step into your transformation flow prior to the table
output step:

3.18. Safe mode

In cases where you are mixing the rows from various sources, you need to make sure that these rows all
have the same layout in all conditions. For this purpose, we added a "safe mode" option that is available in
the Spoon logging window or on the Execute a Transformation

The welcome screen

5. Database Connections

A database connection describes the method by which Kettle can connect to a database. You can create
connections specific to a Job or Transformation or store them in the Kettle repository for re-use within
multiple transformations or jobs.

5.1. Screen shot

5.2. Creating a new database connection

This section describes how to create a new database connection, including a detailed description
of each connection property available in the Connection information dialog.

You begin creating a new connection by right-clicking on the 'Database Connections' tree entry and
selecting 'New' or 'New Connection Wizard', by double-clicking on 'Database Connections', or simply by
pressing F3.

The Connection information dialog

This will launch the 'Connection information' dialog shown above. The following topics describe the
configuration options available on each tab of the Connection information dialog.

5.2.1. General

The general tab is where you set up the basic information about your connection like the connection name,
type, access method, server name and login credentials. The table below provides a more detailed
description of the options available on the General tab:

Feature
    Description

Connection Name
    Uniquely identifies a connection across transformations and jobs

Connection Type
    The type of database you are connecting to (e.g. MySQL, Oracle, etc.)

Method of access
    This will be either Native (JDBC), ODBC, or OCI. Available access types are dependent on the type
    of database you are connecting to

Server host name
    Defines the host name of the server on which the database resides. You can also specify the host
    by IP address

Database name
    Identifies the database name you want to connect to. In case of ODBC, specify the DSN name here

Port number
    Sets the TCP/IP port number

Username
    Optionally specifies the username to connect to the database

Password
    Optionally specifies the password to connect to the database

5.2.2. Pooling

The pooling tab allows you to configure your connection to use connection pooling and define options
related to connection pooling like the initial pool size, maximum pool size and connection pool parameters.
The table below provides a more detailed description of the options available on the Pooling tab:

Creating a new database connection

Feature
    Description

Use a connection pool
    Check this option to enable connection pooling.

The initial pool size
    Sets the initial size of the connection pool.

The maximum pool size
    Sets the maximum number of connections in the connection pool.

Parameter Table
    Allows you to define additional custom pool parameters.

5.2.3. MySQL

Because MySQL by default gives back complete query results in one block to the client (Kettle in this case),
we had to enable "result streaming" by default. The big drawback of this is that it allows only 1 (one) single
query to be opened at any given time. If you run into trouble because of that, you can disable this option in
the MySQL tab of the database connection dialog.

Another issue you might come across is that the default timeout in the MySQL JDBC driver is set to 0 (no
timeout). This leads to a problem in certain situations as it doesn't allow Kettle to detect a server crash or
sudden network failure if it happens in the middle of a query or open database connection. This in turn
leads to the infinite stalling of a transformation or job. To solve this, set the "connectTimeout" and
"socketTimeout" parameters for MySQL in the Options tab. The value to be specified is in milliseconds: for
a 2 minute timeout you would specify value 120000 ( 2 x 60 x 1000 ).

You can also review other options on the linked MySQL help page by clicking on the 'Show help text on
option usage' button found on the Options tab.
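
The timeout arithmetic above is easy to get wrong by a factor of 1000, so here is a small sketch of the conversion. The helper name minutes_to_millis is hypothetical; only the parameter names "connectTimeout" and "socketTimeout" come from the text above.

```python
def minutes_to_millis(minutes):
    """Convert a timeout in minutes to the milliseconds the driver expects."""
    return int(minutes * 60 * 1000)


timeout = minutes_to_millis(2)

# Name/value pairs as they would be entered in the Options tab.
mysql_options = {
    "connectTimeout": str(timeout),
    "socketTimeout": str(timeout),
}
print(mysql_options)
```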

5.2.4. Oracle

This tab allows you to specify the default data and index tablespaces which Kettle will use when generating
SQL for Oracle tables and indexes.

This version of Pentaho Data Integration ships with the Oracle JDBC driver version 10.2.0. It is in general
the most stable and recent driver we could find. However, if you do have issues with Oracle connectivity or
other strange problems, you might want to consider replacing the 10.2 JDBC driver to match your database
server. Replace files "ojdbc14.jar" and "orai18n.jar" in the directory libext

5.2.6. SQL Server

This tab allows you to configure the following properties specific to Microsoft SQL Server:

Feature
    Description

SQL Server instance name
    Sets the instance name property for the SQL Server connection.

Use .. to separate schema and table
    Enable when using dot notation to separate schema and table.

Other properties can be configured by adding connection parameters on the options tab of the Connection
information dialog. For example, you can enable single sign-on login by defining the domain option on the
Options tab as shown below:

From the jTDS FAQ on http:

5.2.7. SAP

5.2.9. Options

This tab allows you to set database specific options for the connection by adding parameters to the
generated URL. To add a parameter, select the next available row in the parameter table, choose your
database type, then enter a valid parameter name and its corresponding value. For more database specific
configuration help, click the 'Show help text on option usage' button and a new browser tab will appear in
Spoon with additional information about configuring the JDBC connection for the currently selected
database type:

5.2.10. SQL

This tab allows you to enter a number of SQL commands to be executed immediately after connecting to
the database. This is sometimes needed for various reasons like licensing, configuration, logging, tracing, etc.

5.2.11. Cluster

This tab allows you to enable clustering for the database connection and create connections to the data
partitions. To enable clustering for the connection, check the 'Use Clustering?' option.

To create a new data partition, enter a partition ID and the hostname, port, database, username and
password for connecting to the partition.

Display options help in a Spoon browser

5.2.12. Advanced

This tab allows you to configure the following properties for the connection:

Feature
    Description

Quote all identifiers in database
    Quotes all identifiers (such as table and field names) used in the database.

Force all identifiers to lower case
    Converts all identifiers to lower case.

Force all identifiers to upper case
    Converts all identifiers to upper case.

5.2.13. Test a connection

The 'Test' button in the Connection information dialog allows you to test the current connection. An OK
message will be displayed if Spoon is able to establish a connection with the target database.

5.2.14. Explore

The Database Explorer allows you to interactively browse the target database, preview data, generate DDL
and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found
on the Connection information dialog or right-click on the connection in the Main tree and select 'Explore'.
Please see Database Explorer for more information.

5.2.15. Feature List

Feature list: exposes the JDBC URL, class and various database settings for the connection such as the list
of reserved words.

5.3. Editing a connection

To edit an existing connection, double-click on the connection name in the main tree or right-click on the
connection name and select "Edit connection".

5.4. Duplicate a connection

To duplicate an existing connection, right-click on the connection name and select "Duplicate".

5.5. Copy to clipboard

Accessed by right-clicking on a connection name in the main tree, this option copies the XML describing the
connection to the clipboard.

Delete a connection

To delete an existing database connection, right-click on the connection name in the main tree and select
"Delete".

5.6. Execute SQL commands on a connection

To execute SQL commands against an existing connection, right-click on the connection name and select
"SQL Editor". See also SQL Editor for more information.

5.7. Clear DB Cache option

To speed up connections Spoon uses a database cache. When the information in the cache no longer
represents the layout of the database, right-click on the connection in the Main tree and select the 'Clear DB
Cache...' option. This is commonly used when database tables have been changed, created or deleted.

5.8. Quoting

We had more and more people complain about the handling of reserved words, field names with spaces in
them, field names with decimals (.) in them, and table names with dashes and other special characters in
them. We therefore implemented a database specific quoting system that allows you to use pretty much
any name or character that the database is comfortable with.

Pentaho Data Integration contains a list of reserved words for many (but not all) of the supported
databases. To correctly implement quoting, we had to go for a strict separation between the schema
(user

Database                Access Method  Server Name or IP Address  Database Name         Port # (default)  Username & Password

Oracle                  Native         Required                   Oracle database SID   Required (1521)   Required
                        ODBC                                      ODBC DSN name                           Required
                        OCI                                       Database TNS name                       Required
MySQL                   Native         Required                   MySQL database name   Optional (3306)   Optional
                        ODBC                                      ODBC DSN name                           Optional
AS

                        ODBC                                      ODBC DSN name                           Required
PostgreSQL              Native         Required                   Database name         Required (5432)   Required
                        ODBC                                      ODBC DSN name                           Required
Intersystems Caché      Native         Required                   Database name         Required (1972)   Required
                        ODBC                                      ODBC DSN name                           Required
Sybase                  Native         Required                   Database name         Required (5001)   Required
                        ODBC                                      ODBC DSN name                           Required
Gupta SQL Base          Native         Required                   Database Name         Required (2155)   Required
                        ODBC                                      ODBC DSN name                           Required
Dbase III, IV or 5.0    ODBC                                      ODBC DSN name                           Optional
Firebird SQL            Native         Required                   Database name         Required (3050)   Required
                        ODBC                                      ODBC DSN name                           Required
Hypersonic              Native         Required                   Database name         Required (9001)   Required
MaxDB (SAP DB)          Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
Ingres                  Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
Borland Interbase       Native         Required                   Database name         Required (3050)   Required
                        ODBC                                      ODBC DSN name                           Required
ExtenDB                 Native         Required                   Database name         Required (6453)   Required
                        ODBC                                      ODBC DSN name                           Required
Teradata                Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
Oracle RDB              Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
H2                      Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
Netezza                 Native         Required                   Database name         Required (5480)   Required
                        ODBC                                      ODBC DSN name                           Required
IBM Universe            Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
SQLite                  Native         Required                   Database name                           Required
                        ODBC                                      ODBC DSN name                           Required
Apache Derby            Native         Optional                   Database name         Optional (1527)   Optional
                        ODBC                                      ODBC DSN name                           Optional
Generic (*)             Native         Required                   Database name         Required (Any)    Required
                        ODBC                                      ODBC DSN name                           Optional

(*) The generic database connection also needs to specify the URL and Driver class in the Generic tab! We
now also allow these fields to be specified using a variable. That way you can access data from multiple
database types using the same transformations and jobs. Make sure to use clean ANSI SQL that works on
all used database types in that case.

Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content
of your application server data sources.

5.11. Unsupported databases

If you want to access a database type that is not yet supported, let us know and we will try to find a
solution. A few database types are not supported in this release because of the lack of sample database
and

6. SQL Editor

6.1. Description

The Simple SQL Editor is an easy-to-use tool when you need to execute standard SQL commands for tasks
like creating tables, dropping indexes and modifying fields. In several places throughout Spoon, the SQL
Editor is used to preview and execute DDL (Data Definition Language) generated by Spoon such as
"create

7. Database Explorer

7.1. Description

The Database Explorer provides the ability to explore configured database connections. It currently
supports tables, views and synonyms along with the catalog and

8. Hops

8.1. Description

A hop connects one transformation step or job entry with another. The direction of the data flow is
indicated with an arrow on the graphical view pane. A hop can be enabled or disabled (for testing purposes
for example).

8.1.1. Transformation Hops

When a hop is disabled in a transformation, the steps downstream of the disabled hop are cut off from any
data flowing upstream of the disabled hop. This may lead to unexpected results when editing the
downstream steps. For example, if a particular step type offers a "Get Fields" button, clicking the button
may not reveal any of the incoming fields as long as the hop is still disabled.

8.1.2. Job Hops

Besides the execution order, a job hop also specifies the condition on which the next job entry will be
executed. You can specify the evaluation mode by right clicking on the job hop:

"Unconditional" specifies that the next job entry will be executed regardless of the result of the
originating job entry.
"Follow when result is true" specifies that the next job entry will only be executed when the result
of the originating job entry was true, meaning successful execution, file found, table found, without
error, evaluation was true, ...
"Follow when result is false" specifies that the next job entry will only be executed when the result
of the originating job entry was false, meaning unsuccessful execution, file not found, table not
found, error(s) occurred, evaluation was false, ...
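
The three evaluation modes can be summarized as a small decision function. The sketch below is only an illustration of the rules listed above and is not Kettle code; the names Evaluation and should_follow are invented for this example.

```python
from enum import Enum


class Evaluation(Enum):
    UNCONDITIONAL = "Unconditional"
    FOLLOW_WHEN_TRUE = "Follow when result is true"
    FOLLOW_WHEN_FALSE = "Follow when result is false"


def should_follow(evaluation, previous_result):
    """Decide whether the next job entry runs.

    previous_result is True when the originating job entry succeeded
    and False when it failed.
    """
    if evaluation is Evaluation.UNCONDITIONAL:
        return True
    if evaluation is Evaluation.FOLLOW_WHEN_TRUE:
        return previous_result
    return not previous_result


for mode in Evaluation:
    print(mode.value, "| after success:", should_follow(mode, True),
          "| after failure:", should_follow(mode, False))
```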

8.2. Creating A Hop

You can easily create a new hop between 2 steps by one of the following options:

Dragging on the Graphical View between 2 steps while using the middle mouse button.
Dragging on the Graphical View between 2 steps while pressing the SHIFT key and using the left
mouse button.
Selecting two steps in the tree, clicking right and selecting "new hop"
Selecting two steps in the graphical view (CTRL + left mouse click), right clicking on a step and
selecting "new hop"

Editing a Job Hop
Editing a Transformation Hop

Splitting A Hop

You can easily insert a new step into a hop between two steps by dragging the step (in the Graphical
View) over a hop until the hop becomes drawn in bold. Release the left button and you will be asked if you
want to split the hop. This works only with steps that have not yet been connected to another step.

    8.3. Loops

    Loops are not allowed in transformations because Spoon depends heavily on the previous steps to
    determine the field values that are passed from one step to another. If loops were allowed in
    transformations, we would often get endless loops and undetermined results.

    Loops are allowed in jobs because Spoon executes job entries sequentially. Just make sure
    you don't build endless loops. This job entry can help you exit closed loops based on the number of times a
    job entry was executed.
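
    To make the idea of a loop concrete, the following Python sketch checks a set of hops for a loop.
    It is only an illustration and not Spoon's own code: the function name find_loop and the step names
    are hypothetical, and each hop is modeled as a plain (from step, to step) pair.

```python
from collections import defaultdict


def find_loop(hops):
    """Return a list of step names that form a loop, or None if there is none.

    `hops` is a list of (from_step, to_step) pairs; all names are hypothetical.
    """
    graph = defaultdict(list)
    for src, dst in hops:
        graph[src].append(dst)

    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = defaultdict(int)   # every step starts as UNVISITED
    path = []                  # steps on the path currently being explored

    def visit(step):
        state[step] = ON_PATH
        path.append(step)
        for nxt in graph[step]:
            if state[nxt] == ON_PATH:
                # nxt is already on the current path, so the hops close a loop
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[step] = DONE
        return None

    for step in list(graph):   # snapshot: visiting may add steps without outgoing hops
        if state[step] == UNVISITED:
            found = visit(step)
            if found:
                return found
    return None


no_loop = find_loop([("A", "B"), ("B", "C")])
loop = find_loop([("A", "B"), ("B", "C"), ("C", "A")])
```

    In this sketch a straight chain of hops yields no loop, while a hop leading from the last step back
    to the first one is reported together with the steps involved.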

    8.4. Mixing rows: trap detector

    Mixing rows with different layout is not allowed in a transformation. Mixing row layouts will cause steps to
    fail because fields can not be found where expected or the data type changes unexpectedly.

    The "trap detector" is in place to provide warnings at design time when a step is receiving mixed layouts:

    In this case, the full error report reads:

    We detected rows with varying number of fields, this is not allowed in a transformation. The first
    row contained 13 fields, another one contained 16 : [customer_tk=0, version=0, date_from=,
    date_to=, CUSTOMERNR=0, NAME=, FIRSTNAME=, LANGUAGE=, GENDER=, STREET=,
    HOUSNR=, BUSNR=, ZIPCODE=, LOCATION=, COUNTRY=, DATE_OF_BIRTH=]

    Note: this is only a warning and will not prevent you from performing the task you want to do.
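
    The kind of check described above can be sketched in a few lines of Python. This is an assumption
    about the general idea, not the trap detector's actual implementation: the function name
    check_row_layouts is hypothetical, and a row is modeled as a list of (field name, value) pairs.

```python
def check_row_layouts(rows):
    """Return a warning string if the rows do not share one layout, else None.

    Each row is a list of (field_name, value) pairs. Only the number of fields
    and the field names are compared in this simplified sketch.
    """
    first_names = None
    for index, row in enumerate(rows):
        names = [name for name, _ in row]
        if first_names is None:
            first_names = names
        elif len(names) != len(first_names):
            return (f"Rows with varying number of fields: the first row contained "
                    f"{len(first_names)} fields, row {index} contained {len(names)}")
        elif names != first_names:
            return f"Row {index} has different field names than the first row"
    return None


same = check_row_layouts([[("id", 1), ("name", "a")], [("id", 2), ("name", "b")]])
mixed = check_row_layouts([[("id", 1), ("name", "a")], [("id", 2)]])
```

    Like the warning in Spoon, the result is only informational: the caller decides whether to continue.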


    8.5. Transformation hop colors

    Transformation hops display in a variety of colors based on the properties and state of the hop. The
    following table describes the meaning behind a transformation hop's color:

    Hop Color                   Meaning
    Green                       Distribute rows: if multiple hops are leaving a step, rows of data will be
                                evenly distributed to all target steps.
    Red                         Copies rows: if multiple hops are leaving a step, all rows of data will be
                                copied to all target steps.
    Yellow                      Provides info for step, distributes rows
    Magenta                     Provides info for step, copies rows
    Gray                        The hop is disabled.
    Black                       The hop has a named target step.
    Blue                        Candidate hop using middle button + drag
    Orange (Dot line)           The hop is never used because no data will ever go there.
    Red (Bold Dot line)         The hop is used for carrying rows that caused errors in source step(s).


    9. Variables

    9.1. Variable usage

    Variables can be used throughout Pentaho Data Integration, including within transformation steps and job
    entries. Variables can be defined by setting them with the "Set Variable" step in a transformation or by
    setting them in the


    9.2.2. Kettle variables

    Because the scope of an environment variable (9.2.1. Environment variables) is too broad, Kettle variables
    were introduced to provide a way to define variables that are local to the job in which the variable is set.
    The "Set Variable" step in a transformation allows you to specify in which job you want to set the variable's
    scope (i.e. parent job, grandparent job or the root job).

    9.2.3. Internal variables

    The following variables are always defined:

    Variable Name               Sample value
    Internal.Kettle.Build.Date  2007


    10. Transformation Settings

    10.1. Description

    Transformation Settings are a collection of properties to describe the transformation and configure its
    behavior. Access Transformation Settings from the main menu under Transformation|Settings. The
    following sections provide a detailed description of the available settings.

    10.2. Transformation Tab

    The transformation tab allows you to specify general properties about the transformation including:

    Setting                     Description
    Transformation name         The name of the transformation.
                                Required information if you want to save to a repository
    Description                 Short description of the transformation, shown in the repository explorer
    Extended description        Long extended description of the transformation
    Status                      Draft or production status
    Version                     Version description
    Directory                   The directory in the repository where the transformation is stored
    Created by                  Displays the original creator of the transformation.
    Created at                  Displays the date and time when the transformation was created.
    Last modified by            Displays the user name of the last user that modified the transformation.
    Last modified at            Displays the date and time when the transformation was last modified.

    Figure: Transformation Settings

    10.3. Logging

    The Logging tab allows you to configure how and where logging information is captured. Settings include:

    Setting                     Description
    READ log step               Use the number of read lines from this step to write to the log table.
                                Read means: read from source steps.
    INPUT log step              Use the number of input lines from this step to write to the log table.
                                Input means: input from file or database.
    WRITE log step              Use the number of written lines from this step to write to the log table.
                                Written means: written to target steps.
    OUTPUT log step             Use the number of output lines from this step to write to the log table.
                                Output means: output to file or database.
    UPDATE log step             Use the number of updated lines from this step to write to the log table.
                                Update means: updated in a database.
    REJECTED log step           Use the number of rejected lines from this step to write to the log table.
                                Rejected means: error record.
    Log connection              The connection used to write to a log table.
    Log table                   Specifies the name of the log table (for example L_ETL)
    Use Batch-ID?               Enable this if you want to have a batch ID in the L_ETL file. Disable for
                                backward compatibility with Spoon [...]
    Use logfield to store       This option stores the logging text in a CLOB field in the logging table.
    logging in                  This allows you to have the logging text together with the run results in
                                the same table. Disable for backward compatibility with Spoon [...]

    [...]

    [...]                       Use this for example, if you find that the field DATE_LAST_UPD has a
                                maximum value of 2004-05-29 23:00:00, but you know that the values for
                                the last minute are not complete. In this case, simply set the offset
                                to -60.
    Maximum date difference     Sets the maximum date difference in the obtained date range. This will
                                allow you to limit job sizes.

    10.5. Dependencies

    The Dependencies tab allows you to enter all of the dependencies for the transformation. For example, if a
    dimension is depending on 3 lookup tables, we have to make sure that these lookup tables have not
    changed. If the values in these lookup tables have changed, we need to extend the date range to force a
    full refresh of the dimension. The dependencies allow you to look up whether a table has changed in case
    you have a "data last changed" column in the table.

    The "Get dependencies" button will try to automatically detect dependencies.

    10.6. Miscellaneous

    The Miscellaneous tab allows you to configure the following settings:

    Setting                     Description
    Number of rows in rowsets   This option allows you to change the size of the buffers between the
                                connected steps in a transformation. You will rarely


    11. Transformation Steps

    11.1. Description

    A step is one part of a transformation. Steps can provide you with a wide range of functionality ranging
    from reading text-files to implementing slowly changing dimensions. This chapter describes various step
    settings followed by a detailed description of available step types.

    11.2. Launching several copies of a step

    Sometimes it can be useful to launch the same step several times. For example, for performance reasons it
    can be useful to launch a database lookup step 3 times or more. That is because database connections
    usually have a certain latency. Launching the same step several times keeps the database busy on different
    connections, effectively lowering the latency. You can launch several copies of a step in a transformation
    simply by right-clicking on a step in the graphical view and then by selecting "change number of copies to
    start":

    Figure: The "Step copies" popup menu

    You will get this dialog:

    Figure: The step copies dialog

    If you enter 3 this will be shown:

    Figure: Multiple step copies example

    It is the technical equivalent of this:

    Figure: Multiple step copies equivalent

    11.3. Distribute or copy?

    In the example above, green lines are shown between the steps. This indicates that rows are distributed
    among the target steps. In this case, it means that the first row coming from step "A" goes to step
    "database lookup 1", the second to "database lookup 2", the third to "database lookup 3", the fourth back
    to "database lookup 1", etc.

    However, if we right click on step "A", and select "Copy data", you will get the hops drawn in red:

    "Copy data" means that all rows from step "A" are copied to all 3 target steps.

    In this case it means that step "B" gets 3 copies of all the rows that "A" has sent out.

    NOTE: Because all these steps run as different threads, the order in which the
    single rows arrive at step "B" is probably not going to be the same as the order in which they left step "A".
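
    The difference between the two modes can be shown with a small Python sketch. It only models which
    rows each target receives. The function names distribute_rows and copy_rows are hypothetical, and
    the threading that causes the ordering effect in the note above is deliberately left out.

```python
from itertools import cycle


def distribute_rows(rows, targets):
    """Round-robin: each row goes to exactly one target step, in turn."""
    received = {target: [] for target in targets}
    for row, target in zip(rows, cycle(targets)):
        received[target].append(row)
    return received


def copy_rows(rows, targets):
    """Copy: every target step receives its own copy of every row."""
    return {target: list(rows) for target in targets}


targets = ["database lookup 1", "database lookup 2", "database lookup 3"]
distributed = distribute_rows([1, 2, 3, 4], targets)
copied = copy_rows([1, 2, 3, 4], targets)
```

    With four rows and three targets, distribution hands rows 1 and 4 to the first target and one row
    each to the other two, while copying gives all four rows to every target.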


    11.4. Step error handling

    Step error handling allows you to configure a step such that, instead of halting a transformation when an
    error occurs, it passes the rows that caused an error to a different step. To configure error handling, right
    click on the step and select "Define Error handling...".

    In the example below, we artificially generate an error in the Script Values step when an ID is higher than
    [...].

    To configure the error handling, you can right click on the step involved and select the "Error handling..."
    menu item:

    Figure: Step error handling settings

    NOTE: this menu item only appears when clicking on steps that support the new error handling code.

    As you can see, you can add extra fields to the "error rows":

    This way, we can easily define new data flows in our transformations. The typical use-case for this is an
    alternative way of doing an Upsert (Insert/Update):

    This transformation performs an insert regardless of the content of the table. If you put a primary key on
    the ID (in this case the customer ID), the insert into the table causes an error for rows that already exist.
    Because of the error handling we can pass the rows in error to the update step. Preliminary tests have
    shown this strategy of doing upserts to be 3 times faster in certain situations (with a low updates to
    inserts ratio).
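
    The same insert-first strategy can be sketched outside of Spoon. The Python example below uses the
    standard sqlite3 module with an in-memory database. The table name customer, its columns and the
    function name upsert_customers are assumptions made for illustration; in a real transformation the
    two branches would be an insert step and an update step connected by an error handling hop.

```python
import sqlite3


def upsert_customers(conn, rows):
    """Insert each (customer_id, name) row; rows rejected by the primary key are updated.

    Returns a tuple (number of inserts, number of updates).
    """
    inserted = updated = 0
    for customer_id, name in rows:
        try:
            conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                         (customer_id, name))
            inserted += 1
        except sqlite3.IntegrityError:
            # The row caused a primary key error: route it to the update branch.
            conn.execute("UPDATE customer SET name = ? WHERE id = ?",
                         (name, customer_id))
            updated += 1
    conn.commit()
    return inserted, updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Old name')")
result = upsert_customers(conn, [(1, "New name"), (2, "Second customer")])
```

    The approach pays off when most rows are new, because only the rows that fail the insert cost a
    second statement.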


    11.5. Apache Virtual File System (VFS) support

    Kettle provides support for the Apache Virtual File System (VFS) as an additional way to reference source
    files, transformations and jobs from any location you like. For more information about VFS, visit Apache
    Commons Virtual File System.

    11.5.1. Example: Referencing remote job files

    Here is a simple example of using VFS to reference the location of a job file we want to execute using
    Kitchen:

    sh


    This allows us to reference the transformation as follows:

    Note: You will not be able to save the job back to the web server in this example. That is not because we
    do not support it, but because you don't have the permission to do so.

    For more information on the almost endless list of possibilities with VFS, please visit:

    http:


    11.6. Transformation Step Types

    11.6.1. Text File Input

    11.6.1.1. General description

    The Text File Input step is used to read data from a variety of different text-file types. The most
    commonly used formats include Comma Separated Values (CSV files generated by spreadsheets)
    and fixed width flat files.

    The Text File Input step provides the ability to specify a list of files to read, or a list of directories
    with wild cards in the form of regular expressions. In addition, you can accept filenames from a
    previous step, making filename handling even more generic.

    The following sections describe in detail the available options for configuring the Text file input
    step.

    11.6.1.2. File options

    The table below provides a detailed description of the features available on the File tab:

    Option                      Description
    File or directory           This field specifies the location and
    Show file content           Displays the content of the selected file.
    Show content from           Displays the content from the first data line only for the selected file.
    first data line

    11.6.1.2.1. Selecting Files to read data from

    The file tab (shown above) is where you identify the file or files from which you want to read data.

    To specify a file:

    1. Enter the location of the file in the 'File or directory' field or click the Browse button to
       browse the local file system.

    2. Click the 'Add' button to add a file to the list of 'selected files' like this:

    11.6.1.2.2. Selecting file using Regular Expressions

    You can also have this step search for files by specifying a wild card in the form of a regular
    expression. Regular expressions are more sophisticated than simply using '*' and '?' wild cards.

    Here are a few examples of regular expressions:

    Filename                    Regular Expression          Files selected
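
    To get a feel for how a regular expression selects files, you can try one out in a short Python
    sketch. The filenames, the pattern and the function name select_files are made up for illustration
    and are not taken from the table above. Spoon is a Java application, so details of its regular
    expression dialect may differ from Python's re module; for simple patterns like this one the two
    behave alike.

```python
import re


def select_files(filenames, pattern):
    """Return the filenames that match the regular expression as a whole."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


names = ["report_2006.txt", "report_2007.txt", "report_2007.csv", "notes.txt"]

# ".*" plays the role of the "*" wild card; "\." matches a literal dot.
selected = select_files(names, r"report_.*\.txt")
```

    Here the pattern selects the two report files ending in .txt and skips the others.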


    This option allows even more flexibility in combination with other steps like "Get Filenames". You
    can construct your filename and pass it to this step. This way the filename can come from any
    source: text file, database table, etc.

    Option                      Description
    Accept filenames from       This enables the option to get filenames from previous steps.
    previous steps
    Step to read filenames      The step to read the filenames from
    from
    Field in the input to use   Text File Input will look in this field to determine the filenames to use.
    as filename


    11.6.1.3. Content specification

    The content tab allows you to specify the format of the text files that are being read. Here is a list
    of the options on this tab:

    Option                      Description
    File type                   This can be either CSV or Fixed length. Based on this selection, Spoon
                                will launch a different helper GUI when you press the "get fields"
                                button in the last "fields" tab.
    Separator                   One or more characters that separate the fields in a single line of text.
                                Typically this is ; or a tab.
    Enclosure                   Some fields can be enclosed by a pair of strings to allow separator
                                characters in fields. The enclosure string is optional. A repeated
                                enclosure inside a field is read as a literal character: with ' as the
                                enclosure string, the text line 'Not the nine o''clock news.' gets
                                parsed as Not the nine o'clock news.
    Allow breaks in enclosed    This is an experimental feature which is currently disabled.
    fields?                     Note: This functionality is implemented and available in the CSV Input
                                Step.
    Escape                      Specify an escape character (or characters) if you have escaped
                                characters in your data. If you have \ as an escape character, the text
                                'Not the nine o\'clock news.' (with ' the enclosure) will get
                                parsed as Not the nine o'clock news.
    Header & number of header   Enable this option if your text file has a header row (first lines in
    lines                       the file). You can specify the number of header lines.
    Footer & number of footer   Enable this option if your text file has a footer row (last lines in
    lines                       the file). You can specify the number of footer lines.
    Wrapped lines & number of   Use this if you deal with data lines that have wrapped beyond a
    wraps                       certain page limit. Note that headers & footers are never considered
                                wrapped.
    Paged layout & page size &  You can use these options as a last resort when dealing with texts
    doc header                  meant for printing on a line printer. Use the number of document
                                header lines to skip introductory texts and the number of lines per
                                page to position the data lines.
    Compression                 Enable this option if your text file is placed in a Zip or GZip archive.
                                NOTE: At the moment, only the first file in the archive is read.
    No empty rows               Don't send empty rows to the next steps.
    Include filename in output  Enable this if you want the filename to be part of the output.
    Filename field name         The name of the field that contains the filename.
    Rownum in output?           Enable this if you want the row number to be part of the output.
    Row number field name       The name of the field that contains the row number.
    Rownum by file?             Allows the row number to be reset per file.
    Format                      This can be either DOS, UNIX or mixed. UNIX files have lines that are
                                terminated by line feeds. DOS files have lines separated by carriage
                                returns and line feeds. If you specify mixed, no verification is done.
    Encoding                    Specify the text file encoding to use. Leave blank to use the default
                                encoding on your system. To use Unicode specify UTF-8 or UTF-16.
                                On first use, Spoon will search your system for available encodings.
    Limit                       Sets the number of lines that is read from the file. 0 means: read all
                                lines.
    Be lenient when parsing     Disable this option if you want strict parsing of date fields. In case
    dates?                      lenient parsing is enabled, dates like Jan 32nd will become Feb 1st.
    The date format Locale      This locale is used to parse dates that have been written in full like
                                "February 2nd, 2006". Parsing this date on a system running in the
                                French (fr_FR) locale would not work because February would be
                                called Février in that locale.
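
    The Enclosure and Escape rows above are easiest to understand by trying them out. The Python sketch
    below reproduces both examples with the csv module from the standard library. It illustrates the
    parsing rules only and is not how the Text File Input step is implemented; the function name
    parse_line and the leading "1;" field are assumptions added to show the separator at work.

```python
import csv
import io


def parse_line(line, separator=";", enclosure="'", escape=None):
    """Split one line of text into fields, honoring enclosure and escape characters."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=separator,
        quotechar=enclosure,
        escapechar=escape,
        doublequote=True,   # a repeated enclosure inside a field is a literal character
    )
    return next(reader)


# Repeated enclosure: '' inside the enclosed field becomes a single '
repeated = parse_line("1;'Not the nine o''clock news.'")

# Escape character: \' inside the enclosed field becomes a single '
escaped = parse_line("1;'Not the nine o\\'clock news.'", escape="\\")

# A separator inside an enclosed field does not split the field
enclosed_separator = parse_line("'a;b';c")
```

    Both ways of writing the quote lead to the same field value, so which one you need depends only on
    how the file was produced.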


    11.6.1.4. Error handling

    The error handling tab was added to allow you to specify how this step should react when errors
    occur. The table below describes the options available for Error handling:

    Option                      Description
    Ignore errors?              Check this option if you want to ignore errors during parsing
    Skip error lines            Enable this option if you want to skip those lines that contain errors.
                                Note that you can generate an extra file that will contain the line
                                numbers on which the errors occurred. If lines with errors are not
                                skipped, the fields that did have parsing errors will be empty (null)
    Error count field name      Add a field to the output stream rows. This field will contain the
                                number of errors on the line.
    Error fields field name     Add a field to the output stream rows. This field will contain the
                                field names on which an error occurred.
    Error text field name       Add a field to the output stream rows. This field will contain the
                                descriptions of the parsing errors that have occurred.
    Warnings file directory     When warnings are generated, they will be put in this directory. The
                                name of that file will be <warning dir>


    11.6.1.5. Filters

    The filters tab provides the ability to specify the lines you want to skip in the text file.

    The table below describes the available options for defining filters:

    Option                      Description
    Filter string               The string to look for.
    Filter position             The position where the filter string has to be at in the line. 0 is the
                                first position in the line. If you specify a value below 0 here, the
                                filter string is searched for in the entire string.
    Stop on filter              Specify Y here if you want to stop processing the current text file when
                                the filter string is encountered.
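
    The three filter options above can be read as a small algorithm. The following Python sketch is one
    possible reading of them, not the step's actual code: the function names matches_filter and
    read_lines are hypothetical, and a single filter row is applied to a list of lines.

```python
def matches_filter(line, filter_string, position):
    """Return True when the line contains the filter string at the required position."""
    if position < 0:
        # A value below 0 means: search the entire line.
        return filter_string in line
    # Otherwise the filter string has to start exactly at `position` (0 = first character).
    return line.startswith(filter_string, position)


def read_lines(lines, filter_string, position, stop_on_filter=False):
    """Yield the lines that are not filtered out; optionally stop at the first match."""
    for line in lines:
        if matches_filter(line, filter_string, position):
            if stop_on_filter:
                break       # stop processing the current file
            continue        # skip only this line
        yield line


kept = list(read_lines(["#comment", "a", "b #x", "c"], "#", 0))
kept_anywhere = list(read_lines(["#comment", "a", "b #x", "c"], "#", -1))
kept_until_stop = list(read_lines(["a", "#end", "b"], "#", 0, stop_on_filter=True))
```

    With position 0 only lines starting with the filter string are skipped, with a negative position
    any line containing it is skipped, and with "Stop on filter" reading ends at the first match.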

    Figure: Specifying text file filters

    11.6.1.6. Fields

    The fields tab is where you specify the information about the name and format of the fields being
    read from the text file. Available options include:

    Option                      Description
    Name                        name of the field
    Type                        Type of the field can be either String, Date or Number
    Format                      See Number Formats for a complete description of format symbols.
    Length                      For Number: Total number of significant figures in a number;
                                For String: total length of string;
                                For Date: length of printed output of the string (e.g. 4 only gives back
                                the year).
    Precision                   For Number: Number of floating point digits;
                                For String, Date, Boolean: unused;
    Currency                    used to interpret numbers like $10,000.00 or E5.000,00
    Decimal                     A decimal point can be a "." (10,000.00) or "," (5.000,00)
    Grouping                    A grouping symbol can be a comma "," (10,000.00) or a dot "." (5.000,00)
    Null if                     treat this value as NULL
    Default                     The default value in case the field in the text file was not specified
                                (empty)
    Trim type                   trim this field (left, right, both) before processing
    Repeat                      Y [...]

    [...]

    '                           Used to quote special characters in a prefix or suffix,
                                for example, "'#'#" formats 123 to "#123". To
                                create a single quote itself, use two in a row: "#
                                o''clock".

    http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html

    Scientific Notation

    In a pattern, the exponent character immediately followed by one or more digit characters
    indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
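
    The Currency, Decimal and Grouping settings of the Fields tab work together when a number is read
    from text. The Python sketch below shows the idea with the two sample values from the table. It is
    a simplified illustration under stated assumptions, not the parser used by Spoon: the function name
    parse_number and its parameter names are hypothetical, and only a leading currency symbol is
    handled.

```python
from decimal import Decimal


def parse_number(text, decimal_symbol=".", grouping_symbol=",", currency=""):
    """Convert formatted text such as '$10,000.00' to a Decimal value."""
    cleaned = text.strip()
    if currency and cleaned.startswith(currency):
        cleaned = cleaned[len(currency):]               # drop the currency symbol
    cleaned = cleaned.replace(grouping_symbol, "")      # remove grouping symbols first
    cleaned = cleaned.replace(decimal_symbol, ".")      # then normalize the decimal symbol
    return Decimal(cleaned)


us_style = parse_number("$10,000.00", decimal_symbol=".", grouping_symbol=",", currency="$")
eu_style = parse_number("E5.000,00", decimal_symbol=",", grouping_symbol=".", currency="E")
```

    The order of the two replacements matters: the grouping symbol has to be removed before the decimal
    symbol is normalized, otherwise a value like 5.000,00 would end up with two decimal points.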

    11.6.1.6.2. Date formats

    The information on Date formats was taken from the Sun Java API documentation, to be found
    here: http:


    11.6.2. Table input

    11.6.2.1. General description

    This step is used to read information from a database, using a connection and SQL. Basic SQL
    statements are generated automatically.

    11.6.2.2. Options

    Option                      Description
    Step name                   Name of the step. This name has to be unique in a single transformation.
    Connection                  The database connection used to read data from.
    SQL                         The SQL statement used to read information from the database
                                connection. You can also click the 'Get SQL select statement...' button
                                to browse tables and automatically generate a basic select statement.
    Enable lazy conversion      Lazy conversion will avoid unnecessary data type conversio