TalendOpenStudio Components RG 32a En

  • Upload
    ninova

  • View
    1.071

  • Download
    49

Embed Size (px)

Citation preview

Talend Open Studio Components

3.X

Reference Guide

Version 3.2_aAdapted for Talend Open Studio v3.2.x. Supersedes previous Reference Guide releases.

CopyrightThis documentation is provided under the terms of the Creative Commons Public License (CCPL). For more information about what you can and cannot do with this documentation in accordance with the CCPL, please check: http://creativecommons.org/licenses/by-nc-sa/2.0/

ii

Talend Open Studio Components

Creative Commons License

Talend Open Studio Components Reference Guide ......................... i

tSPSSStructure properties ......................................31 Related scenarios ...................................................31

Business components .................................33tAlfrescoOutput .........................................................34 tAlfresco Properties ...............................................34 Installation procedure .........................................35 Prerequisites ...................................................36 Installing the Talend Alfresco module ...........36 Useful information for advanced use .............37 Dematerialization, tAlfrescoOutput, and Enterprise Content Management ...................................................38 Scenario: Creating documents on an Alfresco server 39 tCentricCRMInput ....................................................45 tCentricCRMInput Properties ................................45 Related Scenario ....................................................45 tCentricCRMOutput .................................................46 tCentricCRMOutput Properties .............................46 Related Scenario ....................................................46 tMicrosoftCRMInput ................................................47 tMicrosoftCRMInput Properties ............................47 Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows .............................................................................48 tMicrosoftCRMOutput .............................................55 tMicrosoftCRMOutput Properties .........................55 Related Scenario ....................................................56 tSalesforceGetDeleted ...............................................57 tSalesforceGetDeleted properties ...........................57 Scenario: Recovering deleted data from the Salesforce server ..................................................................58 tSalesforceGetServerTimestamp ..............................60 tSalesforceGetServerTimestamp properties ...........60 Related scenarios ...................................................60 tSalesforceGetUpdated ..............................................61 tSalesforceGetUpdated properties .........................61 Related scenarios ...................................................62 tSalesforceInput .........................................................63 tSalesforceInput Properties ....................................63 Related scenario .....................................................64 tSalesforceOutput ......................................................65 tSalesforceOutput Properties .................................65 Scenario: Deleting data from the Account object ..66 tSAPConnection .........................................................68 tSAPConnection properties ....................................68 Related scenarios ...................................................68 tSAPInput ...................................................................69 tSAPInput Properties .............................................69 Scenario 1: Retrieving metadata from the SAP system 70 Scenario 2: Reading data in the different schemas of iii

Preface ........................................................................ xv Purpose .................................................................. xv Audience ................................................................ xv Typographical conventions ................................... xv History of changes .................................................... xvi Feedback and Support ............................................. xvi

Business Intelligence components .............. 1tDB2SCD ...................................................................... 2 tDB2SCD properties ................................................ 2 Related scenarios ..................................................... 3 tIngresSCD .................................................................. 4 tIngresSCD Properties ............................................. 4 Related scenario ...................................................... 5 tMondrianInput .......................................................... 6 tMondrianInput Properties ...................................... 6 Scenario: Cross-join tables ...................................... 7 tMSSqlSCD ................................................................ 10 tMSSqlSCD Properties .......................................... 10 Related scenario .................................................... 11 tMysqlSCD ................................................................. 12 tMysqlSCD Properties ........................................... 12 SCD management methodologies ...................... 13 SCD keys ....................................................... 14 Combining SCD types ................................... 14 Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) ........................... 15 tOracleSCD ................................................................ 23 tOracleSCD Properties .......................................... 23 Related scenario .................................................... 24 tSybaseSCD ................................................................ 25 tSybaseSCD properties .......................................... 25 Related scenarios ................................................... 26 tSPSSInput ................................................................. 27 tSPSSInput properties ............................................ 27 Scenario: Displaying the content of an SPSS .sav file 27 tSPSSOutput .............................................................. 29 tSPSSOutput properties ......................................... 29 Related scenarios ................................................... 29 tSPSSProperties ........................................................ 30 tSPSSProperties properties .................................... 30 Related scenarios ................................................... 30 tSPSSStructure .......................................................... 31 Creative Commons License

Talend Open Studio Components

the RFC_READ_TABLE function ............................. 76 tSAPOutput ............................................................... 82 tSAPOutput Properties .......................................... 82 Related scenario .................................................... 82 tSugarCRMInput ...................................................... 83 tSugarCRMInput Properties .................................. 83 Scenario: Extracting account data from SugarCRM . 83 tSugarCRMOutput ................................................... 85 tSugarCRMOutput Properties ............................... 85 Related Scenario .................................................... 85 tVtigerCRMInput ..................................................... 86 tVtigerCRMInput Properties ................................. 86 Related Scenario .................................................... 86 tVtigerCRMOutput .................................................. 87 tVtigerCRMOutput Properties .............................. 87 Related Scenario .................................................... 87

Scenario: Validating dates against schema (java) 122 tUniqRow ..................................................................126 tUniqRow Properties ............................................126 Scenario: Unduplicating entries ...........................126

Database components ..............................129tAccessInput .............................................................130 tAccessInput properties ........................................130 Related scenarios .................................................131 tAccessOutput ..........................................................132 tAccessOutput properties .....................................132 Related scenarios .................................................134 tAccessRow ...............................................................135 tAccessRow properties .........................................135 Related scenarios .................................................136 tAS400Commit .........................................................137 tAS400Commit Properties ...................................137 Related scenario ...................................................137 tAS400Connection ...................................................138 tAS400Connection Properties ..............................138 Related scenario ...................................................138 tAS400Input .............................................................139 tAS400Input properties ........................................139 Related scenarios .................................................140 tAS400Output ..........................................................141 tAS400Output properties .....................................141 Related scenarios .................................................143 tAS400Row ...............................................................144 tAS400Row properties .........................................144 Related scenarios .................................................145 tCreateTable .............................................................146 tCreateTable Properties ........................................146 Scenario: Creating new table in a Mysql Database ... 148 tDB2BulkExec ..........................................................150 tDB2BulkExec properties ....................................150 Related scenarios .................................................151 tDB2Input .................................................................152 tDB2Input properties ...........................................152 Related scenarios .................................................153 tDB2Output ..............................................................154 tDB2Output properties .........................................154 Related scenarios .................................................156 tDB2Row ...................................................................157 tDB2Row properties ............................................157 Related scenarios .................................................158 tDB2SCD ..................................................................159 tDB2SP ......................................................................160 tDB2SP properties ...............................................160 Related scenarios .................................................161 tDBInput ..................................................................162 tDBInput properties .............................................162 Creative Commons License

Custom Code components ........................ 89tJava ........................................................................... 90 tJava Properties ...................................................... 90 Scenario: Printing out a variable content .............. 90 tJavaFlex .................................................................... 93 tJavaFlex properties ............................................... 93 Scenario1: Generating data flow ........................... 94 Scenario2: Processing rows of data with tJavaFlex .. 96 tPerl ............................................................................ 99 tPerl properties ...................................................... 99 Scenario: Displaying a number of processed lines 99

Data Quality components ....................... 103tAddCRCRow .......................................................... 104 tAddCRCRow properties .................................... 104 Scenario: Adding a surrogate key to a file .......... 104 tFuzzyMatch ............................................................ 107 tFuzzyMatch properties ....................................... 107 Scenario 1: Levenshtein distance of 0 in first names 108 Scenario 2: Levenshtein distance of 1 or 2 in first names ....................................................................... 110 Scenario 3: Metaphonic distance in first name ... 111 tIntervalMatch ........................................................ 112 tIntervalMatch properties .................................... 112 Scenario: identifying Ip country (Perl and Java) . 113 tReplaceList ............................................................. 117 tReplaceList Properties ........................................ 117 Scenario: Replacement from a reference file ...... 118 tSchemaComplianceCheck ..................................... 121 tSchemaComplianceCheck Properties ................ 121

iv

Talend Open Studio Components

Scenario 1: Displaying selected data from DB table . 163 Scenario 2: Using StoreSQLQuery variable ....... 164 tDBOutput ............................................................... 166 tDBOutput properties .......................................... 166 Scenario: Displaying DB output ......................... 168 tDBSQLRow ............................................................ 171 tDBSQLRow properties ...................................... 171 Scenario: Resetting a DB auto-increment ........... 172 tFirebirdInput ......................................................... 174 tFirebirdInput properties ...................................... 174 Related scenarios ................................................. 175 tFirebirdOutput ...................................................... 176 tFirebirdOutput properties ................................... 176 Related scenarios ................................................. 178 tFirebirdRow ........................................................... 179 tFirebirdRow properties ...................................... 179 Related scenarios ................................................. 180 tHSQLDbInput ........................................................ 181 tHSQLDbInput properties ................................... 181 Related scenarios ................................................. 182 tHSQLDbOutput ..................................................... 183 tHSQLDbOutput properties ................................ 183 Related scenarios ................................................. 185 tHSQLDbRow ......................................................... 186 tHSQLDbRow properties .................................... 186 Related scenarios ................................................. 187 tInformixInput ........................................................ 188 tInformixInput properties .................................... 188 Related scenarios ................................................. 189 tInformixOutput ...................................................... 190 tInformixOutput properties .................................. 190 Related scenarios ................................................. 192 tInformixRow .......................................................... 193 tInformixRow properties ..................................... 193 Related scenarios ................................................. 194 tIngresCommit ........................................................ 195 tIngresCommit Properties .................................... 195 Related scenario .................................................. 195 tIngresConnection ................................................... 196 tIngresConnection Properties .............................. 196 Related scenarios ................................................. 196 tIngresInput ............................................................. 197 tIngresInput properties ........................................ 197 Related scenarios ................................................. 198 tIngresOutput .......................................................... 199 tIngresOutput properties ...................................... 199 Related scenarios ................................................. 201 tIngresRollback ....................................................... 202 tIngresRollback properties .................................. 202 Related scenarios ................................................. 202 tIngresRow ............................................................... 203 tIngresRow properties ......................................... 203 Creative Commons License

Related scenarios .................................................204 tIngresSCD ...............................................................205 tInterbaseInput ........................................................206 tInterbaseInput properties ....................................206 Related scenarios .................................................207 tInterbaseOutput .....................................................208 tInterbaseOutput properties ..................................208 Related scenarios .................................................210 tInterbaseRow ..........................................................211 tInterbaseRow properties .....................................211 Related scenarios .................................................212 tJavaDBInput ...........................................................213 tJavaDBInput properties ......................................213 Related scenarios .................................................214 tJavaDBOutput ........................................................215 tJavaDBOutput properties ....................................215 Related scenarios .................................................217 tJavaDBRow .............................................................218 tJavaDBRow properties .......................................218 Related scenarios .................................................219 tJDBCColumnList ...................................................220 tJDBCColumnList Properties ..............................220 Related scenario ...................................................220 tJDBCCommit .........................................................221 tJDBCCommit Properties ....................................221 Related scenario ...................................................221 tJDBCConnection ....................................................222 tJDBCConnection Properties ...............................222 Related scenario ...................................................222 tJDBCInput ..............................................................223 tJDBCInput properties .........................................223 Related scenarios .................................................224 tJDBCOutput ...........................................................225 tJDBCOutput properties .......................................225 Related scenarios .................................................227 tJDBCRollback ........................................................228 tJDBCRollback properties ...................................228 Related scenario ...................................................228 tJDBCRow ................................................................229 tJDBCRow properties ..........................................229 Related scenarios .................................................230 tJDBCSP ...................................................................231 tJDBCSP Properties .............................................231 Related scenario ...................................................232 tJDBCTableList .......................................................233 tJDBCTableList Properties ..................................233 Related scenario ...................................................233 tLDAPInput .............................................................234 tLDAPInput Properties ........................................234 Scenario: Displaying LDAP directorys filtered content .............................................................................235 tLDAPOutput ...........................................................238 tLDAPOutput Properties ......................................238 v

Talend Open Studio Components

Scenario: Editing data in an LDAP directory ...... 239 tMSSqlBulkExec ..................................................... 242 tMSSqlBulkExec properties ................................ 242 Related scenarios ................................................. 244 tMSSqlColumnList ................................................. 245 tMSSqlColumnList Properties ............................ 245 Related scenario .................................................. 245 tMSSqlInput ............................................................ 246 tMSSqlInput properties ....................................... 246 Related scenarios ................................................. 247 tMSSqlOutput ......................................................... 248 tMSSqlOutput properties ..................................... 248 Related scenarios ................................................. 251 tMSSqlOutputBulk ................................................. 252 tMSSqlOutputBulk properties ............................. 252 Related scenarios ................................................. 253 tMSSqlOutputBulkExec ......................................... 254 tMSSqlOutputBulkExec properties ..................... 254 Related scenarios ................................................. 255 tMSSqlRow .............................................................. 256 tMSSqlRow properties ........................................ 256 Related scenarios ................................................. 257 tMSSqlSCD .............................................................. 258 tMSSqlSP ................................................................. 259 tMSSqlSP Properties ........................................... 259 Related scenario .................................................. 260 tMSSqlTableList ..................................................... 261 tMSSqlTableList Properties ................................ 261 Related scenario .................................................. 261 tMysqlBulkExec ...................................................... 262 tMysqlBulkExec properties ................................. 262 Related scenarios ................................................. 263 tMysqlColumnList .................................................. 264 tMysqlColumnList Properties ............................. 264 Scenario: Iterating on a DB table and listing its column names ................................................................ 264 tMysqlCommit ......................................................... 268 tMysqlCommit Properties ................................... 268 Related scenario .................................................. 268 tMysqlConnection ................................................... 269 tMysqlConnection Properties .............................. 269 Scenario: Inserting data in mother/daughter tables ... 269 tMysqlInput ............................................................. 274 tMysqlInput properties ........................................ 274 Related scenarios ................................................. 275 tMysqlLastInsertId ................................................. 276 tMysqlLastInsertId properties ............................. 276 Scenario: Get the ID for the last inserted record . 276 tMysqlOutput .......................................................... 281 tMysqlOutput properties ...................................... 281 Scenario 1: Adding a new column and altering data in a DB table .................................................................. 283 vi

Scenario 2: Updating data in a database table .....288 tMysqlOutputBulk ...................................................292 tMysqlOutputBulk properties ..............................292 Scenario: Inserting transformed data in MySQL database ............................................................................293 tMysqlOutputBulkExec ..........................................296 tMysqlOutputBulkExec properties ......................296 Scenario: Inserting data in MySQL database .......297 tMysqlRollback ........................................................299 tMysqlRollback properties ...................................299 Scenario: Rollback from inserting data in mother/daughter tables .......................................................299 tMysqlRow ...............................................................300 tMysqlRow properties ..........................................300 Scenario: Removing and regenerating a MySQL table index ..........................................................................301 tMysqlSCD ...............................................................303 tMysqlSCDELT .......................................................304 tMysqlSCDELT Properties ..................................304 Related Scenario ..................................................305 tMysqlSP ...................................................................306 tMysqlSP Properties .............................................306 Scenario: Finding a State Label using a stored procedure ............................................................................307 tMysqlTableList .......................................................310 tMysqlTableList Properties ..................................310 Scenario: Iterating on DB tables and deleting their content using a user-defined SQL template ...............310 Related scenario ...................................................314 tNetezzaBulkExec ....................................................315 tNetezzaBulkExec properties ...............................315 Related scenarios .................................................316 tNetezzaCommit .......................................................317 tNetezzaCommit Properties .................................317 Related scenario ...................................................317 tNetezzaConnection .................................................318 tNetezzaConnection Properties ............................318 Related scenarios .................................................318 tNetezzaInput ...........................................................319 tNetezzaInput properties ......................................319 Related scenarios .................................................320 tNetezzaOutput ........................................................321 tNetezzaOutput properties ...................................321 Related scenarios .................................................323 tNetezzaRollback .....................................................324 tNetezzaRollback properties ................................324 Related scenarios .................................................324 tNetezzaRow .............................................................325 tNetezzaRow properties .......................................325 Related scenarios .................................................326 tOracleBulkExec ......................................................327 tOracleBulkExec properties .................................327 Scenario: Truncating and inserting file data into OraCreative Commons License

Talend Open Studio Components

cle DB ....................................................................... 329 tOracleCommit ........................................................ 333 tOracleCommit Properties ................................... 333 Related scenario .................................................. 333 tOracleConnection .................................................. 334 tOracleConnection Properties .............................. 334 Related scenario .................................................. 334 tOracleInput ............................................................ 336 tOracleInput properties ........................................ 336 Related scenarios ................................................. 337 tOracleOutput ......................................................... 338 tOracleOutput properties ..................................... 338 Related scenarios ................................................. 340 tOracleOutputBulk ................................................. 341 tOracleOutputBulk properties ............................. 341 Related scenarios ................................................. 342 tOracleOutputBulkExec ......................................... 343 tOracleOutputBulkExec properties ..................... 343 Related scenarios ................................................. 345 tOracleRollback ...................................................... 346 tOracleRollback properties .................................. 346 Related scenario .................................................. 346 tOracleRow .............................................................. 347 tOracleRow properties ......................................... 347 Related scenarios ................................................. 348 tOracleSCD .............................................................. 349 tOracleSP ................................................................. 350 tOracleSP Properties ............................................ 350 Scenario: Checking number format using a stored procedure ................................................................... 351 tParseRecordSet ...................................................... 356 tParseRecordSet properties .................................. 356 Scenario ............................................................... 356 tPostgresqlBulkExec ............................................... 357 tPostgresqlBulkExec properties .......................... 357 Related scenarios ................................................. 358 tPostgresqlCommit .................................................. 359 tPostgresqlCommit Properties ............................. 359 Related scenario .................................................. 359 tPostgresqlConnection ............................................ 360 tPostgresqlConnection Properties ........................ 360 Related scenario .................................................. 360 tPostgresqlInput ...................................................... 361 tPostgresqlInput properties .................................. 361 Related scenarios ................................................. 362 tPostgresqlOutput ................................................... 363 tPostgresqlOutput properties ............................... 363 Related scenarios ................................................. 365 tPostgresqlOutputBulk ........................................... 366 tPostgresqlOutputBulk properties ....................... 366 Related scenarios ................................................. 367 tPostgresqlOutputBulkExec ................................... 368 tPostgresqlOutputBulkExec properties ............... 368 Creative Commons License

Related scenarios .................................................368 tPostgresqlRollback .................................................369 tPostgresqlRollback properties ............................369 Related scenario ...................................................369 tPostgresqlRow ........................................................370 tPostgresqlRow properties ...................................370 Related scenarios .................................................371 tSASInput .................................................................372 tSASInput properties ............................................372 Related scenarios .................................................373 tSASOutput ..............................................................374 tSASOutput properties .........................................374 Related scenarios .................................................376 tSQLiteConnection ..................................................377 SQLiteConnection properties ...............................377 Related scenarios ................................................377 tSQLiteInput ............................................................378 tSQLiteInput Properties .......................................378 Scenario: Filtering SQlite data .............................379 tSQLiteOutput .........................................................381 tSQLiteOutput Properties .....................................381 Related Scenario ..................................................383 tSQLiteRow ..............................................................384 tSQLiteRow Properties ........................................384 Scenario: Updating SQLite rows .........................385 tSybaseBulkExec ......................................................388 tSybaseBulkExec Properties ................................388 Related scenarios .................................................389 tSybaseCommit ........................................................390 tSybaseCommit Properties ...................................390 Related scenario ...................................................390 tSybaseConnection ...................................................391 tSybaseConnection Properties .............................391 Related scenarios .................................................391 tSybaseInput .............................................................392 tSybaseInput Properties .......................................392 Related scenarios .................................................393 tSybaseIQBulkExec .................................................394 tSybaseIQBulkExec Properties ............................394 Related scenarios .................................................395 tSybaseIQOutputBulkExec .....................................396 tSybaseIQOutputBulkExec properties .................396 Related scenarios .................................................397 tSybaseOutput ..........................................................398 tSybaseOutput Properties .....................................398 Related scenarios .................................................400 tSybaseOutputBulk .................................................401 tSybaseOutputBulk properties .............................401 Related scenarios .................................................402 tSybaseOutputBulkExec .........................................403 tSybaseOutputBulkExec properties .....................403 Related scenarios .................................................404 tSybaseRollback .......................................................405 vii

Talend Open Studio Components

tSybaseRollback properties ................................. 405 Related scenarios ................................................. 405 tSybaseRow .............................................................. 406 tSybaseRow Properties ........................................ 406 Related scenarios ................................................. 407 tSybaseSCD .............................................................. 408 tSybaseSCDELT ...................................................... 409 tSybaseSCDELT Properties ................................ 409 Related Scenario .................................................. 410 tSybaseSP ................................................................. 411 tSybaseSP properties ........................................... 411 Related scenarios ................................................. 412 tTeradataInput ........................................................ 413 tTeradataInput Properties .................................... 413 Related scenarios ................................................. 414 tTeradataOutput ..................................................... 415 tTeradataOutput Properties .................................. 415 Related scenarios ................................................. 417 tTeradataRow .......................................................... 418 tTeradataRow Properties ..................................... 418 Related scenarios ................................................. 419 tVerticaBulkExec .................................................... 420 tVerticaBulkExec Properties ............................... 420 Related scenarios ................................................ 421 tVerticaCommit ....................................................... 422 tVerticaCommit Properties ................................. 422 Related scenario ................................................. 422 tVerticaConnection ................................................. 423 tVerticaConnection Properties ........................... 423 Related scenario ................................................. 423 tVerticaInput ........................................................... 424 tVerticaInput Properties ..................................... 424 Related scenarios ................................................ 425 tVerticaOutput ........................................................ 426 tVerticaOutput Properties ................................... 426 Scnarios associs .............................................. 428 tVerticaOutputBulk ................................................ 429 tVerticaOutputBulk Properties ........................... 429 Related scenarios ................................................ 430 tVerticaOutputBulkExec ........................................ 431 tVerticaOutputBulkExec Properties ................... 431 Related scenarios ................................................ 432 tVerticaRollback ..................................................... 433 tVerticaRollback Properties ............................... 433 Related scenario ................................................. 433 tVerticaRow ............................................................. 434 tVerticaRow Properties ...................................... 434 Related scenario ................................................. 435

Scenario: Filtering and aggregating table columns directly on the DBMS ...................................................439 tELTFilterColumns .................................................445 tELTFilterColumns Properties .............................445 Related Scenario ..................................................446 tELTFilterRows .......................................................447 tELTFilterRows Properties ..................................447 Realted Scenario ..................................................448 tELTMysqlInput ......................................................449 tELTMysqlInput properties .................................449 Related scenarios .................................................449 tELTMysqlMap .......................................................450 tELTMysqlMap properties ...................................450 Connecting ELT components ...........................450 Mapping and joining tables ..............................451 Adding where clauses .......................................451 Generating the SQL statement ..........................451 Scenario1: Aggregating table columns and filtering . 452 Scenario 2: ELT using Alias table .......................455 tELTMysqlOutput ...................................................459 tELTMysqlOutput properties ...............................459 Related scenarios .................................................460 tELTOracleInput .....................................................461 tELTOracleInput properties .................................461 Related scenarios .................................................461 tELTOracleMap ......................................................462 tELTOracleMap properties ..................................462 Connecting ELT components ...........................462 Mapping and joining tables ..............................463 Adding where clauses .......................................463 Generating the SQL statement ..........................463 Scenario: Updating Oracle DB entries .................463 tELTOracleOutput ..................................................466 tELTOracleOutput properties ..............................466 Related scenarios .................................................467 tELTTeradataInput .................................................468 tELTTeradataInput properties ..............................468 Related scenarios .................................................468 tELTTeradataMap ..................................................469 tELTTeradataMap properties ...............................469 Connecting ELT components ...........................469 Mapping and joining tables ..............................470 Adding WHERE clauses ..................................470 Generating the SQL statement ..........................470 Related scenarios .................................................470 tELTTeradataOutput ..............................................471 tELTTeradataOutput properties ...........................471 Related scenarios .................................................472

ELT components ..................................... 437tELTAggregate ........................................................ 438 tELTAggregate properties ................................... 438 viii

File components .......................................473tAdvancedFileOutputXML .....................................474 Creative Commons License

Talend Open Studio Components

tApacheLogInput .................................................... 475 tApacheLogInput properties ................................ 475 Scenario: Reading an Apache access-log file ...... 475 tCreateTemporaryFile ........................................... 477 tCreateTempraryFile properties .......................... 477 Scenario: Creating a temporary file and writing data in it ............................................................................ 477 tFileCompare ........................................................... 482 tFileCompare properties ...................................... 482 Scenario: Comparing unzipped files ................... 482 tFileCopy .................................................................. 485 tFileCopy Properties ............................................ 485 Scenario: Restoring files from bin ....................... 485 tFileDelete ................................................................ 487 tFileDelete Properties .......................................... 487 Scenario: Deleting files ....................................... 487 tFileExist .................................................................. 489 tFileExist Properties ............................................ 489 Scenario: Checking for the presence of a file and creating it if it does not exist .......................................... 489 tFileFetch ................................................................. 493 tFileInputDelimited ............................................... 494 tFileInputDelimited properties ............................ 494 Scenario: Delimited file content display ............. 495 tFileInputMSDelimited .......................................... 497 tFileInputEBCDIC .................................................. 498 tFileInputEBCDIC properties .............................. 498 Scenario: Extracting data from an EBCDIC file and populating a database ................................................ 498 tFileInputExcel ....................................................... 504 tFileInputExcel properties .................................. 504 Related scenarios ................................................. 505 tFileInputFullRow .................................................. 506 tFileInputFull Row properties ............................. 506 Scenario: Reading full rows in a delimited file ... 506 tFileInputMail ......................................................... 509 tFileInputMail properties ..................................... 509 Scenario: Extracting key fields from email ......... 509 tFileInputMSPositional ......................................... 511 tFileInputPositional ............................................... 512 tFileInputPositional properties ............................ 512 Scenario: From Positional to XML file ............... 513 tFileInputProperties ................................................ 517 tFileInputProperties properties ............................ 517 Scenario: Reading and matching the keys and the values of different .properties files and outputting the results in a glossary .............................................................. 517 tFileInputRegex ....................................................... 521 tFileInputRegex properties .................................. 521 Scenario: Regex to Positional file ....................... 522 tFileInputXML ........................................................ 525 tFileInputMSXML .................................................. 526 tFileList ................................................................... 527 Creative Commons License

tFileList properties ...............................................527 Scenario: Iterating on a file directory ..................527 tFileOutputEBCDIC ...............................................531 tFileOutputEBCDIC properties ...........................531 Scenario: Creating an EBCDIC file using two delimited files .....................................................................531 tFileOutputExcel ......................................................534 tFileOutputExcel Properties .................................534 Related scenario ...................................................535 tFileOutputLDIF .....................................................536 tFileOutputLDIF Properties .................................536 Scenario: Writing DB data into an LDIF-type file .... 537 tFileOutputProperties .............................................539 tFileOutputProperties properties ..........................539 Related scenarios .................................................539 tFileOutputXML ......................................................540 tFileProperties ..........................................................541 tFileProperties Properties .....................................541 Scenario: Displaying the properties of a processed file 541 tFileUnarchive ..........................................................543 tFileUnarchive Properties ....................................543 Related scenario ...................................................543 tPivotOutputDelimited ............................................544 tPivotOutputDelimited Properties ........................544 Scenario: Using a pivot column to aggregate data ..... 544

Internet components ................................547tFileFetch ..................................................................548 tFileFetch properties ............................................548 Scenario: Fetching data through HTTP ...............548 tFTPConnection .......................................................550 tFTPConnection properties ..................................550 Related scenarios .................................................550 tFTPDelete ................................................................551 tFTPDelete properties ..........................................551 Related scenario ...................................................551 tFTPFileList ............................................................552 tFTPFileList properties .......................................552 Scenario: Iterating on a remote directory ............553 tFTPGet ....................................................................556 tFTPGet properties ...............................................556 Related scenario ...................................................556 tFTPPut ....................................................................557 tFTPPut properties ...............................................557 Scenario: Putting files on a remote FTP server ...557 tMomInput ...............................................................560 tMomInput Properties ..........................................560 Scenario: asynchronous communication via a MOM server .........................................................................560 ix

Talend Open Studio Components

tMomMessageIdList ............................................... 564 tMomMessageIdList Properties .......................... 564 Related scenario .................................................. 564 tMomOutput ............................................................ 565 tMomOutput Properties ....................................... 565 Related scenario .................................................. 565 tPOP ......................................................................... 566 tPOP properties .................................................... 566 Scenario: Retrieving a selection of email messages from an email server ................................................ 567 tRSSInput ................................................................ 570 tRSSInput Properties ........................................... 570 Scenario: Fetching frequently updated blog entries. . 570 tRSSOutput .............................................................. 573 tRSSOutput Properties ........................................ 573 Scenario: Creating an RSS flow. ......................... 573 tSCPConnection ...................................................... 578 tSCPConnection properties ................................. 578 Related scenarios ................................................. 578 tSCPDelete ............................................................... 579 tSCPDelete properties ......................................... 579 Related scenario .................................................. 579 tSCPFileExists ......................................................... 580 tSCPFileExists properties .................................... 580 Related scenario .................................................. 580 tSCPFileList ............................................................. 581 tSCPFileList properties ....................................... 581 Related scenario .................................................. 581 tSCPGet .................................................................... 582 tSCPGet properties .............................................. 582 Scenario: Getting files from a remote SCP server .... 582 tSCPPut .................................................................... 584 tSCPPut properties .............................................. 584 Related scenario .................................................. 584 tSCPRename ............................................................ 585 tSCPRename properties ....................................... 585 Related scenario .................................................. 585 tSCPTruncate .......................................................... 586 tSCPRename properties ....................................... 586 Related scenario .................................................. 586 tSendMail ................................................................. 587 tSendMail Properties ........................................... 587 Scenario: Email on error ...................................... 588 tSocketInput ............................................................ 590 tSocketInput properties ........................................ 590 Scenario: Passing on data to the listening port (Java) 591 tSocketOutput .......................................................... 595 tSocketOutput properties ..................................... 595 Related Scenario .................................................. 596 tWebServiceInput ................................................... 597 x

tWebServiceInput Properties ...............................597 Scenario 1: Extracting images through a Web service 598 Scenario 2: Reading the data published on a Web service using the tWebServiceInput advanced features (Java only) ...........................................................................600 tXMLRPC ................................................................605 tXMLRPC Properties ...........................................605 Scenario: Guessing the State name from an XMLRPC 605

Logs & Errors components .....................609tChronometerStart ..................................................610 tChronometerStart Properties ...............................610 Related scenario ...................................................610 tChronometerStop ...................................................611 tChronometerStop Properties ...............................611 Scenario: Measuring the processing time of a subjob and part of a subjob ...................................................611 tDie ............................................................................617 tDie properties ......................................................617 Related scenarios .................................................617 tFlowMeter ...............................................................618 tFlowMeter Properties .........................................618 Related scenario ...................................................618 tFlowMeterCatcher .................................................619 tFlowMeterCatcher Properties .............................619 Scenario: Catching flow metrics from a Job ........620 tLogCatcher .............................................................624 tLogCatcher properties ........................................624 Scenario1: warning & log on entries ...................624 Scenario 2: log & kill a Job ..................................626 tLogRow ..................................................................628 tLogRow properties .............................................628 Scenario: Delimited file content display ..............628 tStatCatcher .............................................................629 tStatCatcher Properties .........................................629 Scenario: Displaying job stats log ........................629 tWarn ........................................................................632 tWarn Properties ..................................................632 Related scenarios .................................................632

Misc group components ..........................633tAddLocationFromIP .............................................634 tAddLocationFromIP Properties ..........................634 Scenario: Identifying a real-world geographic location of an IP ...............................................................634 tBufferInput .............................................................637 tBufferInput properties .........................................637 Scenario: Retrieving bufferized data (Java) .........637

Talend Open Studio Components

Creative Commons License

tBufferOutput .......................................................... 640 tBufferOutput properties ..................................... 640 Scenario1: Buffering data (Java) ......................... 640 Scenario 2: Buffering output data on the webapp server .............................................................................. 643 Scenario 3: Calling a Job with context variables from a browser .................................................................. 646 Scenario 4: Calling a Job exported as Webservice in another Job ................................................................ 648 tContextDump ......................................................... 651 tContextDump properties .................................... 651 Related Scenario ................................................ 651 tContextLoad ........................................................... 652 tContextLoad properties ...................................... 652 Scenario: Dynamic context use in MySQL DB insert 652 tMsgBox .................................................................. 655 tMsgBox properties ............................................. 655 Scenario: Hello world! type test ....................... 655 tRowGenerator ....................................................... 657 tRowGenerator properties ................................... 657 Defining the schema ......................................... 657 Defining the function ....................................... 658 Scenario: Generating random java data ............... 658

Group element ..............................................679 Related scenario ...................................................680

Orchestration components ......................681tFileList ....................................................................682 tFlowToIterate .........................................................683 tFlowToIterate Properties ....................................683 Scenario: Transforming data flow to a list ...........683 tIterateToFlow .........................................................687 tIterateToFlow Properties ....................................687 Scenario: Transforming a list of files as data flow .... 687 tLoop .........................................................................690 tLoop Properties ...................................................690 Scenario: Job execution in a loop ........................690 tPostjob .....................................................................693 tPostjob Properties ...............................................693 Related scenario ...................................................693 tPrejob ......................................................................694 tPrejob Properties .................................................694 Related scenario ...................................................694 tReplicate ..................................................................695 tReplicate Properties ............................................695 Related scenario ...................................................695 tSleep .........................................................................696 tSleep Properties ..................................................696 Related scenarios .................................................696 tUnite .........................................................................697 tUnite Properties ..................................................697 Scenario: Iterate on files and merge the content ..697 tWaitForFile .............................................................700 tWaitForFile properties ........................................700 Scenario: Waiting for a file to be removed .........700 tWaitForSqlData .....................................................703 tWaitForSqlData properties .................................703 Scenario: Waiting for insertion of rows in a table .... 703

MultiSchema components ...................... 661tFileInputMSDelimited .......................................... 662 tFileInputMSDelimited properties ...................... 662 The Multi Schema Editor ................................. 662 Scenario: Reading a multi structure delimited file .... 663 tFileInputMSPositional ......................................... 669 tFileInputMSPositional properties ...................... 669 Related scenario .................................................. 669 tFileOutputMSDelimited ....................................... 670 tFileOutputMSDelimited properties ................... 670 Related scenarios ................................................. 671 tFileOutputMSPositional ....................................... 672 tFileOutputMSPositional properties .................... 672 Related scenario .................................................. 672 tFileInputMSXML .................................................. 673 tFileInputMSXML Properties ............................. 673 Scenario: Reading a multi structure XML file .... 673 tFileOutputMSXML ............................................... 676 tFileOutputMSXML Properties ........................... 676 Defining the MultiSchema XML tree .............. 676 Importing the XML tree .............................. 677 Creating manually the XML tree ................. 678 Mapping XML data from multiple schema sources 678 Defining the node status ................................... 679 Loop element ............................................... 679 Creative Commons License

Processing components ...........................709tAggregateRow ........................................................710 tAggregateRow properties ...................................710 Scenario: Aggregating values and sorting data ....711 tAggregateSortedRow .............................................714 tAggregateSortedRow properties .........................714 Related scenario ...................................................715 tConvertType ...........................................................716 tConvertType properties ......................................716 Scenario: Converting java types ..........................716 tDenormalize ............................................................721 tDenormalize Properties .......................................721 Scenario 1: Denormalizing on one column in Perl .... xi

Talend Open Studio Components

721 Scenario 2: Denormalizing on multiple columns in Java ............................................................................ 723 tDenormalizeSortedRow ........................................ 726 tDenormalizeSortedRow properties .................... 726 Scenario: Regrouping sorted rows ...................... 726 tEmptyToNull .......................................................... 730 tEmptyToNull properties ..................................... 730 Scenario: Replacing empty fields by NULL fields (fields of unknown value) ........................................ 730 tExternalSortRow ................................................... 734 tExternalSortRow properties ............................... 734 Related scenario .................................................. 735 tExtractDelimitedFields .......................................... 736 tExtractDelimitedFields properties ...................... 736 Scenario: Extracting fields from a comma-delimited file ............................................................................. 736 tExtractPositionalFields ......................................... 740 tExtractPositionalFields properties ...................... 740 Related scenario .................................................. 741 tExtractRegexFields ................................................ 742 tExtractRegexFields properties ........................... 742 Scenario: Extracting name, domain and TLD from e-mail addresses ........................................................ 742 tFilterColumn .......................................................... 746 tFilterColumn Properties ..................................... 746 Related Scenario .................................................. 746 tFilterRows .............................................................. 747 tFilterRows Properties ......................................... 747 Scenario: Filtering and searching a list of names 747 tMap ........................................................................ 750 tMap properties .................................................... 750 Scenario 1: Mapping with filter and simple explicit join (Perl) ................................................................. 750 Scenario 2: Mapping with Inner join rejection (Perl) 755 Scenario 3: Cascading join mapping ................... 759 Scenario 4: Advanced mapping with filters, explicit joins and Inner join rejection .................................... 760 Scenario 5: Advanced mapping with filters and a check of all rows ..................................................... 764 Scenario 6: Advanced mapping with lookup reload at each row (java) .......................................................... 767 tNormalize ................................................................ 773 tNormalize Properties .......................................... 773 Scenario: Normalizing data (in Perl) ................... 773 tPivotToRows .......................................................... 776 tPivotToRows properties ..................................... 776 Scenario: Concatenating a list of columns in a table by using the other table columns as pivot ................ 777 tReplace .................................................................... 780 tReplace Properties .............................................. 780 Scenario: multiple replacements and column filtering xii

781 tSampleRow .............................................................785 tSampleRow properties ........................................785 Scenario: Filtering rows and groups of rows .......785 tSortRow ...................................................................788 tSortRow properties .............................................788 Scenario: Sorting entries ......................................789

System components .................................791tRunJob ....................................................................792 tRunJob Properties ...............................................792 Scenario: Executing a child Job ...........................793 tSetEnv ......................................................................798 tSetEnv Properties ................................................798 Scenario: Modifying the Date variable during the execution of a Job .........................................................798 tSSH ..........................................................................802 tSSH Properties ....................................................802 Scenario: Remote system information display via SSH ...........................................................................802 tSystem ......................................................................804 tSystem Properties ...............................................804 Scenario: Echo Hello World! ............................805

XML components ....................................807tAdvancedFileOutputXML .....................................808 tAdvancedFileOutputXML properties .................808 Defining the XML tree .....................................809 Importing the XML tree ...............................810 Creating manually the XML tree .................811 Mapping XML data ..........................................811 Defining the node status ...................................812 Loop element ...............................................812 Group element ..............................................812 Scenario: Creating an XML file using a loop ......813 tDTDValidator .........................................................818 tDTDValidator Properties ....................................818 Scenario: Validating xml files .............................818 tExtractXMLField ...................................................821 tExtractXMLField properties ...............................821 Scenario: Extract XML data from a field in a database table ...........................................................................822 tFileInputXML .........................................................825 tFileInputXML Properties ....................................825 Scenario: XML street finder ................................826 tFileOutputXML ......................................................828 tFileOutputXML properties .................................828 Scenario: From Positional to XML file ...............829 tWriteXMLField ......................................................830 tWriteXMLField properties .................................830

Talend Open Studio Components

Creative Commons License

Scenario: Extracting the structure of an XML file and inserting it into the fields of a database table ............ 831 tXSDValidator ......................................................... 836 tDTDValidator Properties ................................... 836 Related scenario .................................................. 836

tXSLT .......................................................................837 tXSLT Properties .................................................837 Scenario: Transforming XML to html using an XSL stylesheet ...................................................................837

Creative Commons License

Talend Open Studio Components

xiii

xiv

Talend Open Studio Components

Creative Commons License

PrefacePurposeThis Reference Guide explains in detail the major components you can find in each of the different groups in the Palette of Talend Open Studio. Information presented in this document applies to Talend Open Studio releases beginning with 3.2.x.

AudienceThis guide is for users and administrators of Talend Open Studio.The layout of GUI screens provided in this document may vary slightly from your actual GUI.

Typographical conventionsThis guide uses the following typographical conventions: text in bold: window and dialog box buttons and fields, keyboard keys, menus, and menu options, text in [bold]: window, wizard, and dialog box titles, text in courier: system parameters typed in by the user, text in italics: file, schema, column, row, and variable names referred to in all use cases, and also names of the fields in the Basic and Advanced setting views referred to in the property table for each component, In the properties section for every component, the component is available in Java and/or in Perl. or icon indicates whether the

The icon indicates an item that provides additional information about an important point. It is also used to add comments related to a table or a figure, The icon indicates a message that gives information about the execution requirements or recommendation type. It is also used to refer to situations or information the end-user need to be aware of or pay special attention to.

Creative Commons License

Talend Open Studio Components

xv

History of changesThe below table lists the changes made in the 3.x release of the Talend Open Studio Reference Guide.Version v3.0_c Date 17/12/2008 History of Change Updates in Talend Open Studio Reference Guide include: -Updates in the DB input and output component properties regarding the Advanced settings view. -Updates in the DB SCD components regarding the SCD editor. Updates in Talend Open Studio Reference Guide include: -Updates in the *Row component properties regarding the Advanced settings view -Updates in the DB components properties to adjust them to the basic settings view -New components: tAlfrescoOutput, tMysqlTableList, tFileInputEBCDIC, tFileOutputEBCDIC, tFileInputMSDelimited, tFileInputMSXML, tFileInputPositional, tFileOutputPositional, tFileOutputMSXML. -Changes in tMap interface and new tMap scenario No updates in Talend Open Studio Reference Guide. Updates in Talend Open Studio Reference Guide include: -New components in the Business, File, Internet, and XML families. Updates in Talend Open Studio Reference Guide include: -New components in the File, Internet, Processing, Database, Business intelligence and Data quality chapters. -Modifications in the settings and scenarios of many components to match the modifications in the GUI. -Modifications in tSAPInput + new scenario. -Modifications in tRunJob and its scenario.

v3.1_a

30/04/2009

v3.1_b v3.1_c

13/05/2009 18/06/2009

v3.2_a

20/10/2009

Feedback and SupportYour feedback is valuable. Do not hesitate to give your input, make suggestions or requests regarding this documentation or product and find support from the Talend team, on Talends Forum website at: http://talendforge.org/forum

xvi

Talend Open Studio Components

Creative Commons License

Business Intelligence componentsThis chapter details the major components that you can find in Business Intelligence group of the Palette of Talend Open Studio. The BI family groups connectors that cover needs such as reading or writing multidimensional or OLAP databases, outputting Jasper reports, tracking DB changes in slow changing dimension tables and so on.

Business Intelligence componentstDB2SCD

tDB2SCDtDB2SCD propertiesComponent family Databases/DB2

Function Purpose Basic settings

tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table. tDB2SCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table Use an existing connection Property type Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The following fields are pre-filled in using fetched data. Host Port Database Table Schema Username and Password Table Schema type and Edit Schema Database server IP address. Listening port number of DB server. Name of the database. Name of the DB schema. DB user authentication data. Name of the table to be written. Note that only one table can be written at a time. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: Setting a Repository schema of Talend Open Studio User Guide. SCD Editor The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 13. Select this check box to maximize system performance.

Use memory saving Mode Usage

This component is used as Output component. It requires an Input component and Row main link as input.

2

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstDB2SCD

Related scenariosFor related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 15.

Creative Commons License

Talend Open Studio Components

3

Business Intelligence componentstIngresSCD

tIngresSCDtIngresSCD PropertiesComponent family Databases/Ingress

Function Purpose Basic settings

tIngresSCD reflects and tracks changes in a dedicated Ingres SCD table. tIngresSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table Property type Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The fields to follow are pre-filled in using fetched data. Server Port Database Username and Password Table Schema type and Edit Schema Database server IP address. Listening port number of DB server. Name of the database. DB user authentication data. Name of the table to be written. Note that only one table can be written at a time. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: Setting a Repository schema of Talend Open Studio User Guide. SCD Editor The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 13. Select this check box to maximize system performance.

Use memory saving Mode Usage

This component is used as Output component. It requires an Input component and Row main link as input.

4

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstIngresSCD

Related scenarioFor related scenarios, see tMysqlSCD Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 15.

Creative Commons License

Talend Open Studio Components

5

Business Intelligence componentstMondrianInput

tMondrianInputtMondrianInput PropertiesComponent family Business Intelligence/OLAP Cube tMondrianInput reads data from relational databases and produces multidimensional data sets based on an MDX query. tMondrianInput executes a multi-dimensional expression (MDX) query corresponding to the dataset structure and schema definition. Then it passes on the multidimensional dataset obtained to the next component via a Main row link. DB type Property type Select the relevant type of relational database Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data. Host Port Database Username and Password Schema type and Edit Schema Database server IP address. Listening port number of DB server. Name of the database. DB user authentication data. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: Setting a Repository schema of Talend Open Studio User Guide. Catalog MDX Query Path to the catalog (structure of the data warehouse). Type in the MDX query paying particularly attention to properly sequence the fields in order to match the schema definition and the data warehouse structure. Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Function Purpose

Basic settings

Encoding

Usage

This component covers MDX queries for multi-dimensional datasets.

6

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstMondrianInput

Scenario: Cross-join tablesThis Job extracts multi-dimensional datasets from relational database tables stored in a MySql base. The data are retrieved using a multidimensional expression (MDX query). Obviously you need to have to know the structure of your data, or at least have a structure description (catalog) as a reference for the dataset to be retrieved in the various dimensions.

Drop tMondrianInput and tLogRow from the Palette to the design workspace. Link the Mondrian connector to the output component using a Row Main connection. Select the tMondrianInput component and select the Component view.

In DB type field, select the relational database you are using with Mondrian. Select the relevant Repository entry as Property type, if you store your DB connection details centrally. In this example the properties are built-in. Fill out the details of connection to your DB: Host, Port, Database name, User Name and Password. Select the relevant Schema in the Repository if you store it centrally. In this example, the schema is to be set (built-in).

Creative Commons License

Talend Open Studio Components

7

Business Intelligence componentstMondrianInput

The relational database we want to query contains five columns: media, drink, unit_sales, store_cost and store_sales. The query aims at retrieving the unit_sales, store_cost and store_sales figures for various media / drink using an MDX query such as in the example below:

Back on the Basic settings tab of the tMondrianInput component, set the Catalog path to the data warehouse. This catalog describes the structure of the warehouse. Then type in the MDX query such as: "select {[Measures].[Unit Sales], [Measures].[Store Cost], [Measures].[Store Sales]} on columns, CrossJoin( { [Promotion Media].[All Media].[Radio], [Promotion Media].[All Media].[TV], [Promotion Media].[All Media].[Sunday Paper], [Promotion Media].[All Media].[Street Handout] }, [Product].[All Products].[Drink].children) on rows from Sales where ([Time].[1997])"

8

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstMondrianInput

Eventually, select the Encoding type on the list. Select the tLogRow component and select the Print header check box to display the column names on the console. Then press F6 to run the Job.

The console shows the result of the unit_sales, store_cost and store_sales for each type of Drink (Beverages, Dairy, Alcoholic beverages) crossed with each media (TV, Sunday Paper, Street handout) as shown previously in a table form.

Creative Commons License

Talend Open Studio Components

9

Business Intelligence componentstMSSqlSCD

tMSSqlSCDtMSSqlSCD PropertiesComponent family Databases/MSSQL Server tMSSqlSCD reflects and tracks changes in a dedicated MSSQL SCD table. tMSqlSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table Use an existing connection Property type Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data. Server Port Database Schema Username and Password Table Schema type and Edit Schema Database server IP address. Listening port number of DB server. Name of the database. Name of the DB schema. DB user authentication data. Name of the table to be written. Note that only one table can be written at a time. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: Setting a Repository schema of Talend Open Studio User Guide. SCD Editor The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 13. Select this check box to maximize system performance.

Function Purpose Basic settings

Use memory saving Mode Usage

This component is used as Output component. It requires an Input component and Row main link as input.

10

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstMSSqlSCD

Related scenarioFor related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 15.

Creative Commons License

Talend Open Studio Components

11

Business Intelligence componentstMysqlSCD

tMysqlSCDtMysqlSCD PropertiesComponent family Databases/MySQL

Function Purpose Basic settings

tMysqlSCD reflects and tracks changes in a dedicated MySQL SCD table. tMysqlSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table Use an existing connection Property type Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The following fields are pre-filled in using fetched data. Host Port Database Username and Password Table Schema type and Edit Schema Database server IP address. Listening port number of DB server. Name of the database. DB user authentication data. Name of the table to be written. Note that only one table can be written at a time. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: Setting a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: Setting a Repository schema of Talend Open Studio User Guide. SCD Editor The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 13. Select this check box to maximize system performance.

Use memory saving Mode Usage

This component is used as Output component. It requires an Input component and Row main link as input.

12

Talend Open Studio Components

Creative Commons License

Business Intelligence componentstMysqlSCD

SCD management methodologiesSlowly Changing Dimensions (SCD)s are dimensions that have data that slowly changes. The SCD editor offers the simplest method of building the data flow for the SCD outputs. In the SCD editor, you can map columns, select surrogate key columns, and set column change attributes through combining SCD types. The below figure illustrates an example of the SCD editor.

Creative Commons License

Talend Open Studio Components

13

Business Intelligence componentstMysqlSCD

SCD keys You must choose one or more source key columns from the incoming data to ensure its unicity. You must set one surrogate key column in the dimension table and map it to an input column in the source table. The value of the surrogate key links a record in the source to a record in the dimension table. The editor uses this mapping to locate the record in the dimension table and to determine whether a record is new or changing. The surrogate key is typically the primary key in the source, but it can be an alternate key as long as it uniquely identifies a record and its value does not change. Source keys: Drag one or more columns from the Unused panel to the Source keys panel to be used as the key (s) that ensure the unicity of the incoming data. Surrogate keys: Set the column where the generated surrogate key will be stored. A surrogate key can be generated based on a method selected on the Creation list. Creation: Select any of the below methods to be used for the key generation: Auto increment: auto-incremental key. Input field: key is provided in an input field. Routine: you can access the basic functions through Ctrl+ Space bar combination. Table max +1: the maximum value from the SCD table is incremented to create a surrogate key. Combining SCD types The Slowly Changing Dimensions support four types of changes: Type 0 through Type 3. You can apply any of the SCD types to any column in a source table by a simple drag-and-drop operation. Type 0: is not used frequently. Some dimension data may be overwritten and other may stay unchanged over time. This is most appropriate when no effort has been made to deal with the changing dimension issues. Type 1: no history is kept in the database. New data overwrites old data. Use this type if tracking changes is not necessary. this is most appropriate when correcting certain typos, for example the spelling of a name.

14

Talend Open Studio Components

Creative Commons License

Business