

Talend Open Studio Components

4.X

Reference Guide


Talend Open Studio Components

Version 4.2_a
Adapted for Talend Open Studio v4.2.x. Supersedes previous Reference Guide releases.

Copyleft
This documentation is provided under the terms of the Creative Commons Public License (CCPL). For more information about what you can and cannot do with this documentation in accordance with the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/


Talend Open Studio Components Reference Guide ......................... i

Preface ........................................ xxiii
  Purpose ...................................... xxiii
  Audience ..................................... xxiii
  Typographical conventions .................... xxiii
  History of changes ........................... xxiv
  Feedback and Support ......................... xxv

Business components .................................. 1


tAlfrescoOutput ........................................................... 2 tAlfresco Properties ................................................. 2 Installation procedure ........................................... 3 Prerequisites .................................................... 4 Installing the Talend Alfresco module ............ 4 Useful information for advanced use .............. 5 Dematerialization, tAlfrescoOutput, and Enterprise Content Management .................................................... 6 Scenario: Creating documents on an Alfresco server 7 tBonitaDeploy ............................................................ 13 tBonitaDeploy Properties ...................................... 13 Related Scenario .................................................... 14 tBonitaInstantiateProcess ......................................... 15 tBonitaInstantiateProcess Properties ..................... 15 Scenario: Executing a Bonita process via a Talend Job 16 tCentricCRMInput ................................................... 21 tCentricCRMInput Properties ................................ 21 Related Scenario .................................................... 21 tCentricCRMOutput ................................................ 22 tCentricCRMOutput Properties ............................. 22 Related Scenario .................................................... 22 tHL7Input .................................................................. 23 tHL7Input Properties ............................................. 23 Scenario: Retrieving information about patients and events from an HL7 file .............................................. 24 tHL7Output ............................................................... 27 tHL7Output Properties .......................................... 27 Related scenario .................................................... 27 tMarketoInput ........................................................... 
28 tMarketoInput Properties ....................................... 28 Related Scenario .................................................... 30 tMarketoOutput ........................................................ 31 tMarketoOutput Properties .................................... 31 Scenario: Data access between Marketo DB and an

external system ............................................................32 tMicrosoftCRMInput ................................................38 tMicrosoftCRMInput Properties ............................38 Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows .............................................................................39 tMicrosoftCRMOutput .............................................46 tMicrosoftCRMOutput Properties .........................46 Related Scenario ....................................................47 tMSAXInput ...............................................................48 tMSAXInput properties .........................................48 Related scenarios ...................................................48 tMSAXOutput ............................................................49 tMSAXOutput properties .......................................49 Scenario 1: Inserting data in a defined table in a MicrosoftAX server ..........................................................50 Scenario 2: Deleting data from a defined table in a MicrosoftAX server .....................................................53 tOpenbravoERPInput ...............................................56 tOpenbravoERPInput properties ...........................56 Related Scenario ...................................................57 tOpenbravoERPOutput ............................................58 tOpenbravoERPOutput properties ........................58 Related scenario ....................................................58 tSageX3Input .............................................................59 tSageX3Input Properties ........................................59 Scenario: Using query key to extract data from a given Sage X3 system .......................................................60 tSageX3Output ...........................................................64 tSageX3Output Properties 
.....................................64 Scenario: Using a Sage X3 Webservice to insert data into a given Sage X3 system ........................................65 tSalesforceBulkExec ..................................................69 tSalesforceBulkExec Properties .............................69 Related Scenario ..................................................70 tSalesforceConnection ...............................................71 tSalesforceConnection properties ..........................71 Related scenario .....................................................71 tSalesforceGetDeleted ...............................................72 tSalesforceGetDeleted properties ...........................72 Scenario: Recovering deleted data from the Salesforce server ..................................................................73 tSalesforceGetServerTimestamp ..............................76 tSalesforceGetServerTimestamp properties ...........76 Related scenarios ...................................................77 tSalesforceGetUpdated ..............................................78 tSalesforceGetUpdated properties .........................78 Related scenarios ...................................................79 tSalesforceInput .........................................................80 tSalesforceInput Properties ....................................80 Scenario: Using queries to extract data from a Salesforce database ..............................................................82 tSalesforceOutput ......................................................86


tSalesforceOutput Properties ................................. 86 Scenario: Deleting data from the Account object . 87 tSalesforceOutputBulk ............................................. 90 tSalesforceOutputBulk Properties ......................... 90 Scenario: Inserting transformed bulk data into your Salesforce.com ............................................................ 90 tSalesforceOutputBulkExec ..................................... 95 tSalesforceOutputBulkExec Properties ................. 95 Scenario: Inserting bulk data into your Salesforce.com ..................................................................... 96 tSAPCommit ............................................................ 100 tSAPCommit Properties ...................................... 100 Related scenario .................................................. 100 tSAPConnection ...................................................... 101 tSAPConnection properties ................................. 101 Related scenarios ................................................. 101 tSAPInput ................................................................ 102 tSAPInput Properties ........................................... 102 Scenario 1: Retrieving metadata from the SAP system 104 Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function ........................... 110 tSAPOutput ............................................................. 116 tSAPOutput Properties ........................................ 116 Related scenario .................................................. 117 tSAPRollback .......................................................... 118 tSAPRollback properties ..................................... 118 Related scenarios ................................................. 118 tSugarCRMInput .................................................... 119 tSugarCRMInput Properties ................................ 119 Scenario: Extracting account data from SugarCRM . 
119 tSugarCRMOutput ................................................. 122 tSugarCRMOutput Properties ............................. 122 Related Scenario .................................................. 122 tVtigerCRMInput ................................................... 123 tVtigerCRMInput Properties ............................... 123 Related Scenario .................................................. 124 tVtigerCRMOutput ................................................ 125 tVtigerCRMOutput Properties ............................ 125 Related Scenario .................................................. 126

Business Intelligence components .......... 127


tBarChart ................................................................. 128 tBarChart properties ............................................ 128 Scenario: Creating a bar chart from the input data .... 129 tDB2SCD .................................................................. 135 tDB2SCD properties ............................................ 135 Related scenarios ................................................. 136 tDB2SCDELT .......................................................... 137 tDB2SCDELT Properties .................................... 137

Related Scenario ..................................................139 tGreenplumSCD ......................................................140 tGreenplumSCD Properties .................................140 Related scenario ...................................................141 tInformixSCD ...........................................................142 tInformixSCD properties .....................................142 Related scenario ...................................................143 tIngresSCD ...............................................................144 tIngresSCD Properties .........................................144 Related scenario ...................................................145 tLineChart ................................................................146 tLineChart properties ...........................................146 Scenario: Creating a line chart to ease trend analysis 147 tMondrianInput .......................................................153 tMondrianInput Properties ...................................153 Scenario: Cross-join tables ..................................154 tMSSqlSCD ..............................................................157 tMSSqlSCD Properties ........................................157 Related scenario ...................................................158 tMysqlSCD ...............................................................159 tMysqlSCD Properties .........................................159 SCD management methodologies ....................160 SCD keys .....................................................162 Combining SCD types .................................162 Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) .........................163 tMysqlSCDELT .......................................................171 tMysqlSCDELT Properties ..................................171 Related Scenario ..................................................173 tOracleSCD 
..............................................................174 tOracleSCD Properties .........................................174 Related scenario ...................................................175 tOracleSCDELT ......................................................176 tOracleSCDELT Properties .................................176 Related Scenario ..................................................178 tPaloCheckElements ................................................179 tPaloCheckElements Properties ...........................179 Related scenario ...................................................180 tPaloConnection .......................................................181 tPaloConnection Properties ..................................181 Related scenario ...................................................181 tPaloCube .................................................................182 tPaloCube Properties ............................................182 Scenario: Creating a cube in an existing database ..... 183 tPaloCubeList ...........................................................186 tPaloCubeList Properties .....................................186 Discovering the read-only output schema of tPaloCubeList ...................................................................187 Scenario: Retrieving detailed cube information from a given database .........................................................188 tPaloDatabase ...........................................................190


tPaloDatabase Properties ..................................... 190 Scenario: Creating a database .............................. 191 tPaloDatabaseList ................................................... 193 tPaloDatabaseList Properties ............................... 193 Discovering the read-only output schema of tPaloDatabaseList .................................................................. 194 Scenario: Retrieving detailed database information from a given Palo server ........................................... 195 tPaloDimension ........................................................ 196 tPaloDimension Properties .................................. 196 Scenario: Creating a dimension with elements ... 199 tPaloDimensionList ................................................. 205 tPaloDimensionList Properties ............................ 205 Discovering the read-only output schema of tPaloDimensionList ............................................................... 206 Scenario: Retrieving detailed dimension information from a given database ............................................... 207 tPaloInputMulti ....................................................... 209 tPaloInputMulti Properties .................................. 209 Scenario: Retrieving dimension elements from a given cube ...................................................................... 211 tPaloOutput ............................................................. 215 tPaloOutput Properties ........................................ 215 Related scenario .................................................. 216 tPaloOutputMulti .................................................... 217 tPaloOutputMulti Properties ................................ 217 Scenario 1: Writing data into a given cube ......... 219 Scenario 2: Rejecting inflow data when the elements to be written do not exist in a given cube ................. 222 tPaloRule .................................................................. 
226 tPaloRule Properties ............................................ 226 Scenario: Creating a rule in a given cube ............ 227 tPaloRuleList ........................................................... 230 tPaloRuleList Properties ...................................... 230 Discovering the read-only output schema of tPaloRuleList ................................................................... 231 Scenario: Retrieving detailed rule information from a given cube ................................................................. 232 tParAccelSCD .......................................................... 234 tParAccelSCD Properties .................................... 234 Related scenario .................................................. 235 tPostgresPlusSCD .................................................... 236 tPostgresPlusSCD Properties .............................. 236 Related scenario .................................................. 237 tPostgresPlusSCDELT ............................................ 238 tPostgresPlusSCDELT Properties ....................... 238 Related Scenario .................................................. 240 tPostgresqlSCD ........................................................ 241 tPostgresqlSCD Properties .................................. 241 Related scenario .................................................. 242 tPostgresqlSCDELT ................................................ 243 tPostgresqlSCDELT Properties ........................... 243 Related Scenario .................................................. 245

tSPSSInput ...............................................................246 tSPSSInput properties ..........................................246 Scenario: Displaying the content of an SPSS .sav file 246 tSPSSOutput ............................................................249 tSPSSOutput properties .......................................249 Scenario: Writing data in an .sav file ...................249 tSPSSProperties .......................................................252 tSPSSProperties properties ..................................252 Related scenarios .................................................252 tSPSSStructure ........................................................253 tSPSSStructure properties ....................................253 Related scenarios .................................................253 tSybaseSCD ..............................................................254 tSybaseSCD properties ........................................254 Related scenarios .................................................255 tSybaseSCDELT ......................................................256 tSybaseSCDELT Properties .................................256 Related Scenario ..................................................258

Custom Code components ......................259


tGroovy ........................................ 260
  tGroovy Properties ........................... 260
  Related Scenarios ............................ 260
tGroovyFile .................................... 261
  tGroovyFile Properties ....................... 261
  Scenario: Calling a file which contains Groovy code ... 261
tJava .......................................... 263
  tJava Properties ............................. 263
  Scenario: Printing out a variable content .... 263
tJavaFlex ...................................... 267
  tJavaFlex properties ......................... 267
  Scenario 1: Generating data flow ............. 268
  Scenario 2: Processing rows of data with tJavaFlex ... 270
tJavaRow ....................................... 273
  tJavaRow Properties .......................... 273
  Related scenario ............................. 273
tLibraryLoad ................................... 274
  tLibraryLoad Properties ...................... 274
  Scenario: Checking the format of an e-mail address ... 274
tSetGlobalVar .................................. 277
  tSetGlobalVar Properties ..................... 277
  Scenario: Printing out the content of a global variable ... 277

Data Quality components ........................279


tAddCRCRow ..................................... 280
  tAddCRCRow properties ........................ 280
  Scenario: Adding a surrogate key to a file ... 280
tChangeFileEncoding ............................ 283
tExtractRegexFields ............................ 284
tFuzzyMatch .................................... 285
  tFuzzyMatch properties ....................... 285
  Scenario 1: Levenshtein distance of 0 in first names ... 286
  Scenario 2: Levenshtein distance of 1 or 2 in first names ... 288
  Scenario 3: Metaphonic distance in first name ... 289
tIntervalMatch ................................. 290
  tIntervalMatch properties .................... 290
  Scenario: Identifying Ip country (Perl and Java) ... 291
tParseAddress .................................. 295
  tParseAddress properties ..................... 295
  Related scenario ............................. 296
tParseName ..................................... 297
  tParseName Properties ........................ 297
  Related scenario ............................. 298
tReplaceList ................................... 299
  tReplaceList Properties ...................... 299
  Scenario: Replacement from a reference file .. 300
tSchemaComplianceCheck ......................... 304
  tSchemaComplianceCheck Properties ............ 304
  Scenario: Validating data against schema (java) ... 305
tUniqRow ....................................... 310
  tUniqRow Properties .......................... 310
  Scenario: Deduplicating entries .............. 311

Database components ............................. 315


tAccessBulkExec ...................................................... 316 tAccessBulkExec properties ................................ 316 Related scenarios ................................................. 317 tAccessCommit ........................................................ 318 tAccessCommit Properties .................................. 318 Related scenario .................................................. 318 tAccessConnection .................................................. 319 tAccessConnection Properties ............................. 319 Scenario: Inserting data in parent/child tables .... 319 tAccessInput ............................................................ 323 tAccessInput properties ....................................... 323 Related scenarios ................................................. 324 tAccessOutput .......................................................... 325 tAccessOutput properties .................................... 325 Related scenarios ................................................. 327 tAccessOutputBulk ................................................. 329 tAccessOutputBulk properties ............................. 329 Related scenarios ................................................. 330 tAccessOutputBulkExec ......................................... 331 tAccessOutputBulkExec properties ..................... 331 Related scenarios ................................................. 333 tAccessRollback ....................................................... 334

tAccessRollback properties ..................................334 Related scenarios .................................................334 tAccessRow ...............................................................335 tAccessRow properties .........................................335 Related scenarios .................................................337 tAS400Close .............................................................338 tAS400Close properties .......................................338 Related scenario ...................................................338 tAS400Commit .........................................................339 tAS400Commit Properties ...................................339 Related scenario ...................................................339 tAS400Connection ...................................................340 tAS400Connection Properties ..............................340 Related scenario ...................................................341 tAS400Input .............................................................342 tAS400Input properties ........................................342 Related scenarios .................................................343 tAS400LastInsertId .................................................344 tAS400LastInsertId properties .............................344 Related scenario ...................................................344 tAS400Output ..........................................................345 tAS400Output properties .....................................345 Related scenarios .................................................348 tAS400Rollback .......................................................349 tAS400Rollback properties ..................................349 Related scenarios .................................................349 tAS400Row ...............................................................350 tAS400Row properties .........................................350 Related scenarios 
.................................................352 tCreateTable .............................................................353 tCreateTable Properties ........................................353 Scenario: Creating new table in a Mysql Database ... 355 tDB2BulkExec ..........................................................358 tDB2BulkExec properties ....................................358 Related scenarios .................................................360 361 tDB2Close .................................................................362 tDB2Close properties ...........................................362 Related scenario ...................................................362 tDB2Commit ............................................................363 tDB2Commit Properties .......................................363 Related scenario ...................................................363 tDB2Connection .......................................................364 tDB2Connection properties .................................364 Related scenarios ................................................365 tDB2Input .................................................................366 tDB2Input properties ...........................................366 Related scenarios .................................................367 tDB2Output ..............................................................368 tDB2Output properties .........................................368 Related scenarios .................................................371 tDB2Rollback ...........................................................372


tDB2Rollback properties ..................................... 372 Related scenarios ................................................. 372 tDB2Row .................................................................. 373 tDB2Row properties ............................................ 373 Related scenarios ................................................. 374 tDB2SCD .................................................................. 376 tDB2SCDELT .......................................................... 377 tDB2SP ..................................................................... 378 tDB2SP properties ............................................... 378 Related scenarios ................................................. 379 tDBInput .................................................................. 380 tDBInput properties ............................................. 380 Scenario 1: Displaying selected data from DB table . 381 Scenario 2: Using StoreSQLQuery variable ....... 382 tDBOutput ............................................................... 384 tDBOutput properties .......................................... 384 Scenario: Displaying DB output ......................... 386 tDBSQLRow ............................................................ 389 tDBSQLRow properties ...................................... 389 Scenario: Resetting a DB auto-increment ........... 390 tEXAInput ............................................................... 392 tEXAInput properties .......................................... 392 Related scenarios ................................................. 393 tEXAOutput ............................................................ 394 tEXAOutput properties ........................................ 394 Related scenario .................................................. 396 tEXARow ................................................................. 397 tEXARow properties ........................................... 
397 Related scenarios ................................................. 398 tEXistConnection .................................................... 399 tEXistConnection properties ............................... 399 Related scenarios ................................................ 399 tEXistDelete ............................................................. 400 tEXistDelete properties ....................................... 400 Related scenario ................................................. 401 tEXistGet .................................................................. 402 tEXistGet properties ............................................ 402 Scenario: Retrieve resources from a remote eXist DB server ......................................................................... 403 tEXistList ................................................................. 406 tEXistList properties ............................................ 406 Related scenario ................................................. 407 tEXistPut .................................................................. 408 tEXistPut properties ............................................ 408 Related scenario ................................................. 409 tEXistXQuery .......................................................... 410 tEXistXQuery properties ..................................... 410 Related scenario ................................................. 411 tEXistXUpdate ........................................................ 412 tEXistXUpdate properties ................................... 412 Related scenario ................................................. 413 tFirebirdClose .......................................................... 414

tFirebirdClose properties .....................................414 Related scenario ...................................................414 tFirebirdCommit .....................................................415 tFirebirdCommit Properties .................................415 Related scenario ...................................................415 tFirebirdConnection ................................................416 tFirebirdConnection properties ............................416 Related scenarios ................................................416 tFirebirdInput ..........................................................418 tFirebirdInput properties ......................................418 Related scenarios .................................................419 tFirebirdOutput .......................................................420 tFirebirdOutput properties ...................................420 Related scenarios .................................................422 tFirebirdRollback ....................................................423 tFirebirdRollback properties ...............................423 Related scenario ...................................................423 tFirebirdRow ............................................................424 tFirebirdRow properties .......................................424 Related scenarios .................................................425 tGreenplumBulkExec ..............................................427 tGreenplumBulkExec Properties .........................427 Related scenarios .................................................429 tGreenplumClose .....................................................430 tGreenplumClose properties ................................430 Related scenario ...................................................430 tGreenplumCommit ................................................431 tGreenplumCommit Properties ............................431 Related scenario 
...................................................431 tGreenplumConnection ...........................................432 tGreenplumConnection properties .......................432 Related scenarios ................................................433 tGreenplumInput .....................................................434 tGreenplumInput properties .................................434 Related scenarios .................................................435 tGreenplumOutput ..................................................436 tGreenplumOutput Properties ..............................436 Related scenarios .................................................438 tGreenplumOutputBulk ..........................................439 tGreenplumOutputBulk properties .......................439 Related scenarios .................................................440 tGreenplumOutputBulkExec .................................441 tGreenplumOutputBulkExec properties ...............441 Related scenarios .................................................442 tGreenplumRollback ...............................................443 tGreenplumRollback properties ...........................443 Related scenarios .................................................443 tGreenplumRow .......................................................444 tGreenplumRow Properties ..................................444 Related scenarios .................................................446 tGreenplumSCD ......................................................447 tHiveClose .................................................................448 tHiveClose properties ...........................................448

Related scenario .................................................. 448 tHiveConnection ...................................................... 449 tHiveConnection properties ................................. 449 Related scenario .................................................. 449 tHiveRow .................................................................. 450 tHiveRow properties ............................................ 450 Related scenarios ................................................. 451 tHSQLDbInput ........................................................ 452 tHSQLDbInput properties ................................... 452 Related scenarios ................................................. 454 tHSQLDbOutput ..................................................... 455 tHSQLDbOutput properties ................................ 455 Related scenarios ................................................. 458 tHSQLDbRow ......................................................... 459 tHSQLDbRow properties .................................... 459 Related scenarios ................................................. 461 tInformixBulkExec .................................................. 462 tInformixBulkExec Properties ............................. 462 Related scenario .................................................. 464 tInformixClose ......................................................... 465 tInformixClose properties .................................... 465 Related scenario .................................................. 465 tInformixCommit .................................................... 466 tInformixCommit properties ................................ 466 Related Scenario .................................................. 466 tInformixConnection .............................................. 467 tInformixConnection properties .......................... 467 Related scenario .................................................. 
468 tInformixInput ........................................................ 469 tInformixInput properties .................................... 469 Related scenarios ................................................. 470 tInformixOutput ...................................................... 471 tInformixOutput properties .................................. 471 Related scenarios ................................................. 473 tInformixOutputBulk ............................................. 474 tInformixOutputBulk properties .......................... 474 Related scenario .................................................. 475 tInformixOutputBulkExec ..................................... 476 tInformixOutputBulkExec properties .................. 476 Related scenario .................................................. 478 tInformixRollback ................................................... 479 tInformixRollback properties .............................. 479 Related Scenario .................................................. 479 tInformixRow .......................................................... 480 tInformixRow properties ..................................... 480 Related scenarios ................................................. 482 tInformixSCD .......................................................... 483 tInformixSP ............................................................. 484 tInformixSP properties ........................................ 484 Related scenario .................................................. 485 tIngresClose ............................................................. 487 tIngresClose properties ........................................ 487 Related scenario .................................................. 487

tIngresCommit .........................................................488 tIngresCommit Properties ....................................488 Related scenario ...................................................488 tIngresConnection ...................................................489 tIngresConnection Properties ...............................489 Related scenarios .................................................489 tIngresInput .............................................................490 tIngresInput properties .........................................490 Related scenarios .................................................491 tIngresOutput ...........................................................492 tIngresOutput properties ......................................492 Related scenarios .................................................494 tIngresRollback ........................................................495 tIngresRollback properties ...................................495 Related scenarios .................................................495 tIngresRow ...............................................................496 tIngresRow properties ..........................................496 Related scenarios .................................................497 tIngresSCD ...............................................................498 tInterbaseClose ........................................................499 tInterbaseClose properties ....................................499 Related scenario ...................................................499 tInterbaseCommit ....................................................500 tInterbaseCommit Properties ...............................500 Related scenario ...................................................500 tInterbaseConnection ..............................................501 tInterbaseConnection properties ..........................501 Related scenarios ................................................501 
tInterbaseInput ........................................................503 tInterbaseInput properties ....................................503 Related scenarios .................................................504 tInterbaseOutput .....................................................505 tInterbaseOutput properties ..................................505 Related scenarios .................................................507 tInterbaseRollback ..................................................508 tInterbaseRollback properties ..............................508 Related scenarios .................................................508 tInterbaseRow ..........................................................509 tInterbaseRow properties .....................................509 Related scenarios .................................................510 tJavaDBInput ...........................................................512 tJavaDBInput properties ......................................512 Related scenarios .................................................513 tJavaDBOutput ........................................................514 tJavaDBOutput properties ....................................514 Related scenarios .................................................516 tJavaDBRow .............................................................517 tJavaDBRow properties .......................................517 Related scenarios .................................................518 tJDBCColumnList ...................................................519 tJDBCColumnList Properties ..............................519 Related scenario ...................................................519 tJDBCClose ..............................................................520

tJDBCClose properties ........................................ 520 Related scenario .................................................. 520 tJDBCCommit ......................................................... 521 tJDBCCommit Properties .................................... 521 Related scenario .................................................. 521 tJDBCConnection ................................................... 522 tJDBCConnection Properties .............................. 522 Related scenario .................................................. 523 tJDBCInput ............................................................. 524 tJDBCInput properties ......................................... 524 Related scenarios ................................................. 525 tJDBCOutput .......................................................... 526 tJDBCOutput properties ...................................... 526 Related scenarios ................................................. 528 tJDBCRollback ........................................................ 529 tJDBCRollback properties ................................... 529 Related scenario .................................................. 529 tJDBCRow ............................................................... 530 tJDBCRow properties .......................................... 530 Related scenarios ................................................. 531 tJDBCSP .................................................................. 533 tJDBCSP Properties ............................................ 533 Related scenario .................................................. 534 tJDBCTableList ...................................................... 535 tJDBCTableList Properties .................................. 535 Related scenario .................................................. 535 tLDAPAttributesInput ........................................... 536 tLDAPAttributesInput Properties ........................ 
536 Related scenario .................................................. 538 tLDAPInput ............................................................. 539 tLDAPInput Properties ........................................ 539 Scenario: Displaying LDAP directory's filtered content ............................................................................. 540 tLDAPOutput .......................................................... 543 tLDAPOutput Properties ..................................... 543 Scenario: Editing data in a LDAP directory ........ 544 tLDAPRenameEntry .............................................. 547 tLDAPRenameEntry properties .......................... 547 Related scenarios ................................................. 548 tMaxDBInput .......................................................... 549 tMaxDBInput properties ...................................... 549 Related scenario .................................................. 550 tMaxDBOutput ........................................................ 551 tMaxDBOutput properties ................................... 551 Related scenario .................................................. 553 tMaxDBRow ............................................................ 554 tMaxDBRow properties ...................................... 554 Related scenario .................................................. 555 tMSSqlBulkExec ..................................................... 556 tMSSqlBulkExec properties ................................ 556 Related scenarios ................................................. 558 tMSSqlColumnList ................................................. 560

tMSSqlColumnList Properties .............................560 Related scenario ...................................................560 tMSSqlClose .............................................................561 tMSSqlClose properties .......................................561 Related scenario ...................................................561 tMSSqlCommit ........................................................562 tMSSqlCommit properties ...................................562 Related scenarios .................................................562 tMSSqlConnection ...................................................563 tMSSqlConnection properties ..............................563 Related scenarios .................................................564 tMSSqlInput .............................................................565 tMSSqlInput properties ........................................565 Related scenarios .................................................566 tMSSqlLastInsertId .................................................567 tMSSqlLastInsertId properties .............................567 Related scenario ...................................................567 tMSSqlOutput ..........................................................568 tMSSqlOutput properties .....................................568 Related scenarios .................................................571 tMSSqlOutputBulk ..................................................572 tMSSqlOutputBulk properties .............................572 Related scenarios .................................................573 tMSSqlOutputBulkExec .........................................574 tMSSqlOutputBulkExec properties .....................574 Related scenarios .................................................576 tMSSqlRollback .......................................................577 tMSSqlRollback properties ..................................577 Related scenario ...................................................577 
tMSSqlRow ...............................................................578 tMSSqlRow properties .........................................578 Related scenarios .................................................580 tMSSqlSCD ..............................................................581 tMSSqlSP ..................................................................582 tMSSqlSP Properties ............................................582 Related scenario ...................................................583 tMSSqlTableList ......................................................584 tMSSqlTableList Properties .................................584 Related scenario ...................................................584 tMysqlBulkExec .......................................................585 tMysqlBulkExec properties .................................585 Related scenarios .................................................587 tMysqlClose ..............................................................588 tMysqlClose properties ........................................588 Related scenario ...................................................588 tMysqlColumnList ...................................................589 tMysqlColumnList Properties ..............................589 Scenario: Iterating on a DB table and listing its column names .................................................................589 tMysqlCommit .........................................................593 tMysqlCommit Properties ....................................593 Related scenario ...................................................593 tMysqlConnection ....................................................594

tMysqlConnection Properties .............................. 594 Scenario: Inserting data in mother/daughter tables ... 594 tMysqlInput ............................................................. 598 tMysqlInput properties ........................................ 598 Scenario: Writing dynamic columns from a MySQL database to an output file .......................................... 599 tMysqlLastInsertId ................................................. 604 tMysqlLastInsertId properties ............................. 604 Scenario: Get the ID for the last inserted record . 604 tMysqlOutput .......................................................... 609 tMysqlOutput properties ...................................... 609 Scenario 1: Adding a new column and altering data in a DB table .................................................................. 613 Scenario 2: Updating data in a database table ..... 618 Scenario 3: Retrieve data in error with a Reject link 621 tMysqlOutputBulk .................................................. 627 tMysqlOutputBulk properties .............................. 627 Scenario: Inserting transformed data in MySQL database ............................................................................ 628 tMysqlOutputBulkExec .......................................... 632 tMysqlOutputBulkExec properties ...................... 632 Scenario: Inserting data in MySQL database ...... 634 tMysqlRollback ....................................................... 636 tMysqlRollback properties .................................. 636 Scenario: Rollback from inserting data in mother/daughter tables ...................................................... 636 tMysqlRow ............................................................... 638 tMysqlRow properties ......................................... 638 Scenario 1: Removing and regenerating a MySQL table index .................................................................... 
640 Scenario 2: Using PreparedStatement objects to query data ............................................................................ 641 tMysqlSCD ............................................................... 647 tMysqlSCDELT ....................................................... 648 tMysqlSP .................................................................. 649 tMysqlSP Properties ............................................ 649 Scenario: Finding a State Label using a stored procedure ............................................................................ 650 tMysqlTableList ...................................................... 653 tMysqlTableList Properties ................................. 653 Scenario: Iterating on DB tables and deleting their content using a user-defined SQL template .............. 653 Related scenario .................................................. 657 tNetezzaBulkExec .................................................... 658 tNetezzaBulkExec properties .............................. 658 Related scenarios ................................................. 659 tNetezzaClose ........................................................... 660 tNetezzaClose properties ..................................... 660 Related scenario .................................................. 660 tNetezzaCommit ...................................................... 661 tNetezzaCommit Properties ................................. 661

Related scenario ...................................................661 tNetezzaConnection .................................................662 tNetezzaConnection Properties ............................662 Related scenarios .................................................662 tNetezzaInput ...........................................................663 tNetezzaInput properties ......................................663 Related scenarios .................................................664 tNetezzaNzLoad .......................................................665 tNetezzaNzLoad properties ..................................665 Loading DATE, TIME and TIMESTAMP columns 670 Related scenario ...................................................670 tNetezzaOutput ........................................................671 tNetezzaOutput properties ...................................671 Related scenarios .................................................674 tNetezzaRollback .....................................................675 tNetezzaRollback properties ................................675 Related scenarios .................................................675 tNetezzaRow .............................................................676 tNetezzaRow properties .......................................676 Related scenarios .................................................678 tOracleBulkExec ......................................................679 tOracleBulkExec properties .................................679 Scenario: Truncating and inserting file data into Oracle DB ........................................................................682 tOracleClose .............................................................685 tOracleClose properties ........................................685 Related scenario ...................................................685 tOracleCommit ........................................................686 tOracleCommit Properties 
...................................686 Related scenario ...................................................686 tOracleConnection ...................................................687 tOracleConnection Properties ..............................687 Related scenario ...................................................688 tOracleInput .............................................................689 tOracleInput properties ........................................689 Related scenarios .................................................690 tOracleOutput ..........................................................691 tOracleOutput properties ......................................691 Related scenarios .................................................694 tOracleOutputBulk ..................................................695 tOracleOutputBulk properties ..............................695 Related scenarios .................................................696 tOracleOutputBulkExec .........................................697 tOracleOutputBulkExec properties ......................697 Related scenarios .................................................699 tOracleRollback .......................................................701 tOracleRollback properties ..................................701 Related scenario ...................................................701 tOracleRow ...............................................................702 tOracleRow properties .........................................702 Related scenarios .................................................704 tOracleSCD ..............................................................705

tOracleSCDELT ...................................................... 706 tOracleSP ................................................................. 707 tOracleSP Properties ............................................ 707 Scenario: Checking number format using a stored procedure ................................................................... 709 tOracleTableList ..................................................... 713 tOracleTableList properties ................................. 713 Related scenarios ................................................. 713 tParAccelBulkExec ................................................. 714 tParAccelBulkExec Properties ............................ 714 Related scenarios ................................................. 716 tParAccelClose ........................................................ 717 tParAccelClose properties ................................... 717 Related scenario .................................................. 717 tParAccelCommit .................................................... 718 tParAccelCommit Properties ............................... 718 Related scenario .................................................. 718 tParAccelConnection .............................................. 719 tParAccelConnection Properties .......................... 719 Related scenario .................................................. 720 tParAccelInput ........................................................ 721 tParAccelInput properties .................................... 721 Related scenarios ................................................. 722 tParAccelOutput ..................................................... 723 tParAccelOutput Properties ................................. 723 Related scenarios ................................................. 725 tParAccelOutputBulk ............................................. 726 tParAccelOutputBulk properties ......................... 
726 Related scenarios ................................................. 727 tParAccelOutputBulkExec ..................................... 728 tParAccelOutputBulkExec Properties ................. 728 Related scenarios ................................................. 729 tParAccelRollback .................................................. 730 tParAccelRollback properties .............................. 730 Related scenario .................................................. 730 tParAccelRow .......................................................... 731 tParAccelRow Properties .................................... 731 Related scenarios ................................................. 733 tParAccelSCD .......................................................... 734 tParseRecordSet ...................................................... 735 tParseRecordSet properties .................................. 735 Related Scenario .................................................. 735 tPostgresPlusBulkExec ........................................... 736 tPostgresPlusBulkExec properties ....................... 736 Related scenarios ................................................. 737 tPostgresPlusClose .................................................. 738 tPostgresPlusClose properties ............................. 738 Related scenario .................................................. 738 tPostgresPlusCommit .............................................. 739 tPostgresPlusCommit Properties ......................... 739 Related scenario .................................................. 739 tPostgresPlusConnection ........................................ 740 tPostgresPlusConnection Properties .................... 740

Related scenario ...................................................741 tPostgresPlusInput ...................................................742 tPostgresPlusInput properties ...............................742 Related scenarios .................................................743 tPostgresPlusOutput ................................................744 tPostgresPlusOutput properties ............................744 Related scenarios .................................................747 tPostgresPlusOutputBulk .......................................748 tPostgresPlusOutputBulk properties ....................748 Related scenarios .................................................749 tPostgresPlusOutputBulkExec ...............................750 tPostgresPlusOutputBulkExec properties ............750 Related scenarios .................................................751 tPostgresPlusRollback .............................................752 tPostgresPlusRollback properties .........................752 Related scenarios .................................................752 tPostgresPlusRow ....................................................753 tPostgresPlusRow properties ...............................753 Related scenarios .................................................755 tPostgresPlusSCD ....................................................756 tPostgresPlusSCDELT ............................................757 tPostgresqlBulkExec ................................................758 tPostgresqlBulkExec properties ...........................758 Related scenarios .................................................760 tPostgresqlCommit ..................................................761 tPostgresqlCommit Properties .............................761 Related scenario ...................................................761 tPostgresqlClose .......................................................762 tPostgresqlClose properties ..................................762 Related scenario 
...................................................762 tPostgresqlConnection .............................................763 tPostgresqlConnection Properties ........................763 Related scenario ...................................................763 tPostgresqlInput .......................................................764 tPostgresqlInput properties ..................................764 Related scenarios .................................................765 tPostgresqlOutput ....................................................766 tPostgresqlOutput properties ................................766 Related scenarios .................................................768 tPostgresqlOutputBulk ...........................................769 tPostgresqlOutputBulk properties ........................769 Related scenarios .................................................770 tPostgresqlOutputBulkExec ...................................771 tPostgresqlOutputBulkExec properties ................771 Related scenarios .................................................773 tPostgresqlRollback .................................................774 tPostgresqlRollback properties ............................774 Related scenario ...................................................774 tPostgresqlRow ........................................................775 tPostgresqlRow properties ...................................775 Related scenarios .................................................777 tPostgresqlSCD ........................................................778 tPostgresqlSCDELT ................................................779

tSASInput ................................................................ 780 tSASInput properties ........................................... 780 Related scenarios ................................................. 781 tSASOutput .............................................................. 782 tSASOutput properties ........................................ 782 Related scenarios ................................................. 784 tSQLiteClose ............................................................ 785 tSQLiteClose properties ...................................... 785 Related scenario .................................................. 785 tSQLiteCommit ....................................................... 786 tSQLiteCommit Properties .................................. 786 Related scenario .................................................. 786 tSQLiteConnection .................................................. 787 tSQLiteConnection properties ............................. 787 Related scenarios ................................................ 787 tSQLiteInput ............................................................ 788 tSQLiteInput Properties ....................................... 788 Scenario: Filtering SQLite data ............................ 789 tSQLiteOutput ......................................................... 792 tSQLiteOutput Properties .................................... 792 Related Scenario .................................................. 794 tSQLiteRollback ...................................................... 795 tSQLiteRollback properties ................................. 795 Related scenarios ................................................. 795 tSQLiteRow ............................................................. 796 tSQLiteRow Properties ........................................ 796 Scenario: Updating SQLite rows ......................... 797 tSybaseBulkExec ..................................................... 
800 tSybaseBulkExec Properties ................................ 800 Related scenarios ................................................. 802 tSybaseClose ............................................................ 803 tSybaseClose properties ...................................... 803 Related scenario .................................................. 803 tSybaseCommit ........................................................ 804 tSybaseCommit Properties .................................. 804 Related scenario .................................................. 804 tSybaseConnection .................................................. 805 tSybaseConnection Properties ............................. 805 Related scenarios ................................................. 805 tSybaseInput ............................................................ 806 tSybaseInput Properties ....................................... 806 Related scenarios ................................................. 807 tSybaseIQBulkExec ................................................ 808 tSybaseIQBulkExec Properties ........................... 808 Related scenarios ................................................. 809 tSybaseIQOutputBulkExec .................................... 810 tSybaseIQOutputBulkExec properties ................ 810 Related scenarios ................................................. 811 tSybaseOutput ......................................................... 813 tSybaseOutput Properties .................................... 813 Related scenarios ................................................. 816 tSybaseOutputBulk ................................................. 817 tSybaseOutputBulk properties ............................. 817

Related scenarios .................................................818 tSybaseOutputBulkExec .........................................819 tSybaseOutputBulkExec properties .....................819 Related scenarios .................................................821 tSybaseRollback .......................................................822 tSybaseRollback properties ..................................822 Related scenarios .................................................822 tSybaseRow ..............................................................823 tSybaseRow Properties ........................................823 Related scenarios .................................................825 tSybaseSCD ..............................................................826 tSybaseSCDELT ......................................................827 tSybaseSP .................................................................828 tSybaseSP properties ............................................828 Related scenarios .................................................829 tTeradataClose .........................................................830 tTeradataClose properties ....................................830 Related scenario ...................................................830 tTeradataCommit ....................................................831 tTeradataCommit Properties ...............................831 Related scenario ..................................................831 tTeradataConnection ...............................................832 tTeradataConnection Properties ..........................832 Related scenario ..................................................833 tTeradataFastExport ...............................................834 tTeradataFastExport Properties ............................834 Related scenario ...................................................835 tTeradataFastLoad ..................................................836 tTeradataFastLoad Properties 
..............................836 Related scenario ...................................................837 tTeradataFastLoadUtility .......................................838 tTeradataFastLoadUtility Properties ....................838 Related scenario ...................................................839 tTeradataInput .........................................................840 tTeradataInput Properties .....................................840 Related scenarios .................................................841 tTeradataMultiLoad ................................................842 tTeradataMultiLoad Properties ............................842 Related scenario ...................................................843 tTeradataOutput ......................................................844 tTeradataOutput Properties ..................................844 Related scenarios .................................................847 tTeradataRollback ...................................................848 tTeradataRollback Properties ..............................848 Related scenario ..................................................848 tTeradataRow ..........................................................849 tTeradataRow Properties ......................................849 Related scenarios .................................................851 tTeradataTPump .....................................................852 tTeradataTPump Properties .................................852 Scenario: Inserting data into a Teradata database table 854 tVectorWiseCommit ................................................858


tVectorWiseCommit Properties .......................... 858 Related scenario .................................................. 858 tVectorWiseConnection .......................................... 859 tVectorWiseConnection Properties ..................... 859 Related scenario .................................................. 859 tVectorWiseInput .................................................... 861 tVectorWiseInput Properties ............................... 861 Related scenario .................................................. 862 tVectorWiseOutput ................................................. 863 tVectorWiseOutput Properties ............................ 863 Related scenario .................................................. 865 tVectorWiseRollback .............................................. 866 tVectorWiseRollback Properties ......................... 866 Related scenario ................................................. 866 tVectorWiseRow ...................................................... 867 tVectorWiseRow Properties ................................ 867 Related scenario .................................................. 869 tVerticaBulkExec .................................................... 870 tVerticaBulkExec Properties ............................... 870 Related scenarios ................................................ 871 tVerticaClose ........................................................... 873 tVerticaClose properties ...................................... 873 Related scenario .................................................. 873 tVerticaCommit ....................................................... 874 tVerticaCommit Properties ................................. 874 Related scenario ................................................. 874 tVerticaConnection ................................................. 875 tVerticaConnection Properties ........................... 875 Related scenario ................................................. 
875 tVerticaInput ........................................................... 877 tVerticaInput Properties ..................................... 877 Related scenarios ................................................ 878 tVerticaOutput ........................................................ 879 tVerticaOutput Properties ................................... 879 Related scenarios ................................................. 882 tVerticaOutputBulk ................................................ 883 tVerticaOutputBulk Properties ........................... 883 Related scenarios ................................................ 884 tVerticaOutputBulkExec ........................................ 885 tVerticaOutputBulkExec Properties ................... 885 Related scenarios ................................................ 886 tVerticaRollback ..................................................... 887 tVerticaRollback Properties ............................... 887 Related scenario ................................................. 887 tVerticaRow ............................................................. 888 tVerticaRow Properties ...................................... 888 Related scenario ................................................. 890

ELT components ..................................... 891


tELTJDBCInput ..................................................... 892 tELTJDBCInput properties ................................. 892 Related scenarios ................................................. 892

tELTJDBCMap .......................................................894 tELTJDBCMap properties ...................................894 Related scenario: ..................................................895 tELTJDBCOutput ...................................................896 tELTJDBCOutput properties ...............................896 Related scenarios .................................................897 tELTMSSqlInput .....................................................898 tELTMSSqlInput properties .................................898 Related scenarios .................................................898 tELTMSSqlMap ......................................................900 tELTMSSqlMap properties ..................................900 Related scenario: ..................................................901 tELTMSSqlOutput ..................................................902 tELTMSSqlOutput properties ..............................902 Related scenarios .................................................903 tELTMysqlInput ......................................................904 tELTMysqlInput properties .................................904 Related scenarios .................................................904 tELTMysqlMap .......................................................905 tELTMysqlMap properties ...................................905 Connecting ELT components ...........................906 Mapping and joining tables ..............................906 Adding where clauses .......................................907 Generating the SQL statement ..........................907 Scenario 1: Aggregating table columns and filtering 907 Scenario 2: ELT using an Alias table ..................911 tELTMysqlOutput ...................................................916 tELTMysqlOutput properties ...............................916 Related scenarios .................................................917 tELTOracleInput .....................................................918 
tELTOracleInput properties .................................918 Related scenarios .................................................919 tELTOracleMap ......................................................920 tELTOracleMap properties ..................................920 Connecting ELT components ...........................921 Mapping and joining tables ..............................921 Adding where clauses .......................................922 Generating the SQL statement ..........................922 Scenario: Updating Oracle DB entries .................922 tELTOracleOutput ..................................................925 tELTOracleOutput properties ..............................925 Scenario: Using the Oracle MERGE function to update and add data simultaneously ..............................927 tELTPostgresqlInput ...............................................931 tELTPostgresqlInput properties ...........................931 Related scenarios .................................................931 tELTPostgresqlMap ................................................933 tELTPostgresqlMap properties ............................933 Related scenario: ..................................................934 tELTPostgresqlOutput ............................................935 tELTPostgresqlOutput properties ........................935 Related scenarios .................................................936


tELTSybaseInput .................................................... 937 tELTSybaseInput properties ................................ 937 Related scenarios ................................................. 937 tELTSybaseMap ...................................................... 939 tELTSybaseMap properties ................................. 939 Related scenarios ................................................. 940 tELTSybaseOutput ................................................. 941 tELTSybaseOutput properties ............................. 941 Related scenarios ................................................. 942 tELTTeradataInput ................................................ 943 tELTTeradataInput properties ............................. 943 Related scenarios ................................................. 943 tELTTeradataMap .................................................. 944 tELTTeradataMap properties .............................. 944 Connecting ELT components ........................... 945 Mapping and joining tables .............................. 945 Adding WHERE clauses .................................. 945 Generating the SQL statement ......................... 945 Related scenarios ................................................. 945 tELTTeradataOutput ............................................. 946 tELTTeradataOutput properties .......................... 946 Related scenarios ................................................. 947 tSQLTemplateAggregate ........................................ 948 tSQLTemplateAggregate properties .................... 948 Scenario: Filtering and aggregating table columns directly on the DBMS .................................................. 949 tSQLTemplateCommit ........................................... 955 tSQLTemplateCommit properties ....................... 955 Related scenario .................................................. 956 tSQLTemplateFilterColumns ................................ 
957 tSQLTemplateFilterColumns Properties ............. 957 Related Scenario .................................................. 958 tSQLTemplateFilterRows ...................................... 959 tSQLTemplateFilterRows Properties .................. 959 Related Scenario .................................................. 960 tSQLTemplateMerge .............................................. 961 tSQLTemplateMerge properties .......................... 961 Scenario: Merging data directly on the DBMS ... 963 tSQLTemplateRollback .......................................... 970 tSQLTemplateRollback properties ..................... 970 Related scenarios ................................................. 971

tESBProviderRequest .............................................995 tESBProviderRequest properties .........................995 Scenario: Service sends a message without expecting a response ..................................................................995 Setting up a Provider Job ..................................995 Setting up a Consumer Job .............................1000 Run the Scenario .............................................1006 tESBProviderResponse .........................................1008 tESBProviderResponse properties .....................1008 Scenario: Return Hello world response .............1008 Setting up a Provider Job ................................1008 Setting up a Consumer Job .............................1013 Run the Scenario .............................................1020

File components .....................................1021


tAdvancedFileOutputXML ...................................1022 tApacheLogInput ...................................................1023 tApacheLogInput properties ..............................1023 Scenario: Reading an Apache access-log file ....1024 tCreateTemporaryFile ..........................................1026 tCreateTemporaryFile properties .......................1026 Scenario: Creating a temporary file and writing data in it ...........................................................................1027 tChangeFileEncoding ............................................1031 tChangeFileEncoding Properties .......................1031 Scenario: Transforming the character encoding of a file. ...........................................................................1031 tFileArchive ............................................................1033 tFileArchive properties ......................................1033 Scenario: Zip files using a tFileArchive ............1034 tFileCompare .........................................................1036 tFileCompare properties .....................................1036 Scenario: Comparing unzipped files ..................1037 tFileCopy ................................................................1039 tFileCopy Properties ..........................................1039 Scenario: Restoring files from bin .....................1040 tFileDelete ...............................................................1042 tFileDelete Properties .........................................1042 Scenario: Deleting files ......................................1042 tFileExist .................................................................1045 tFileExist Properties ...........................................1045 Scenario: Checking for the presence of a file and creating it if it does not exist ........................................1046 tFileInputARFF .....................................................1050 tFileInputARFF properties 
.................................1050 Scenario: Display the content of an ARFF file ....1051 tFileInputDelimited .............................................1054 tFileInputDelimited properties ...........................1054 Scenario: Delimited file content display ............1055 Scenario 2: Reading data from a remote file in streaming mode ..................................................................1057 tFileInputEBCDIC ................................................1060

ESB components ..................................... 973


tESBConsumer ........................................................ 974 tESBConsumer Properties ................................... 974 Scenario: Return valid email .............................. 976 tESBProviderFault .................................................. 983 tESBProviderFault properties .............................. 983 Scenario: Return Fault message .......................... 983 Setting up a Provider Job ................................. 983 Setting up a Consumer Job .............................. 988 Run the Scenario .............................................. 994


tFileInputEBCDIC properties ............................ 1060 Scenario: Extracting data from an EBCDIC file and populating a database .............................................. 1060 tFileInputExcel ...................................................... 1066 tFileInputExcel properties ................................ 1066 Related scenarios ............................................... 1068 tFileInputFullRow ................................................. 1069 tFileInputFullRow properties ............................ 1069 Scenario: Reading full rows in a delimited file . 1070 tFileInputJSON ..................................................... 1072 tFileInputJSON properties ................................ 1072 Scenario: Extracting data from the fields of a JSON format file ................................................................ 1073 tFileInputLDIF ...................................................... 1076 tFileInputLDIF Properties ................................. 1076 Related scenario ................................................ 1077 tFileInputMail ....................................................... 1078 tFileInputMail properties ................................... 1078 Scenario: Extracting key fields from an email .. 1079 tFileInputMSDelimited ......................................... 1081 tFileInputMSDelimited properties .................... 1081 The Multi Schema Editor ............................... 1081 Scenario: Reading a multi structure delimited file .... 1083 tFileInputMSPositional ........................................ 1088 tFileInputMSPositional properties .................... 1088 Related scenario ................................................ 1089 tFileInputMSXML ................................................ 1090 tFileInputMSXML Properties ........................... 1090 Scenario: Reading a multi structure XML file .. 1091 tFileInputPositional .............................................. 
1094 tFileInputPositional properties .......................... 1094 Scenario: From Positional to XML file ............. 1095 tFileInputProperties .............................................. 1099 tFileInputProperties properties .......................... 1099 Scenario: Reading and matching the keys and the values of different .properties files and outputting the results in a glossary ............................................................ 1099 tFileInputRegex ..................................................... 1103 tFileInputRegex properties ................................ 1103 Scenario: Regex to Positional file ..................... 1104 tFileInputXML ...................................................... 1107 tFileList .................................................................. 1108 tFileList properties ............................................ 1108 Scenario: Iterating on a file directory ................ 1110 tFileOutputARFF .................................................. 1113 tFileOutputARFF properties .............................. 1113 Related scenarios ............................................... 1114 tFileOutputDelimited ............................................ 1115 tFileOutputDelimited properties ........................ 1115 Scenario: Writing data in a delimited file ......... 1116 tFileOutputEBCDIC ............................................. 1121 tFileOutputEBCDIC properties ......................... 1121

Scenario: Creating an EBCDIC file using two delimited files ...................................................................1121 tFileOutputExcel ....................................................1124 tFileOutputExcel Properties ...............................1124 Related scenario .................................................1125 tFileOutputJSON ..................................................1126 tFileOutputJSON properties ..............................1126 Scenario: Writing a JSON structured file ..........1126 tFileOutputLDIF ...................................................1130 tFileOutputLDIF Properties ...............................1130 Scenario: Writing DB data into an LDIF-type file .... 1131 tFileOutputMSDelimited ......................................1134 tFileOutputMSDelimited properties .................1134 Related scenarios ...............................................1135 tFileOutputMSPositional ......................................1136 tFileOutputMSPositional properties ..................1136 Related scenario .................................................1136 tFileOutputMSXML ..............................................1137 tFileOutputMSXML Properties .........................1137 Defining the MultiSchema XML tree .............1137 Importing the XML tree .............................1138 Creating manually the XML tree ...............1140 Mapping XML data from multiple schema sources 1140 Defining the node status .................................1141 Loop element .............................................1141 Group element ............................................1141 Related scenario .................................................1142 tFileOutputPositional ............................................1143 tFileOutputPositional Properties ........................1143 Related scenario ................................................1144 tFileOutputProperties ...........................................1145 
tFileOutputProperties properties ........................1145 Related scenarios ...............................................1145 tFileOutputXML ....................................................1146 tFileProperties ........................................................1147 tFileProperties Properties ...................................1147 Scenario: Displaying the properties of a processed file 1148 tFileRowCount .......................................................1150 tFileRowCount properties ..................................1150 Related scenario .................................................1151 tFileTouch ...............................................................1152 tFileTouch properties .........................................1152 Related scenario ................................................1152 tFileUnarchive ........................................................1153 tFileUnarchive Properties ..................................1153 Related scenario .................................................1154 tGPGDecrypt .........................................................1155 tGPGDecrypt Properties ....................................1155 Scenario: Decrypt a GnuPG-encrypted file and display its content .........................................................1155


tPivotToColumnsDelimited .................................. 1159 tPivotToColumnsDelimited Properties ............. 1159 Scenario: Using a pivot column to aggregate data .... 1159

Internet components ............................. 1163


tFileFetch ............................................................... 1164 tFileFetch properties .......................................... 1164 Scenario 1: Fetching data through HTTP .......... 1165 Scenario 2: Reusing stored cookie to fetch files through HTTP ......................................................... 1166 Related scenario: Reading the data from a remote file in streaming mode ................................................... 1169 tFileInputJSON ..................................................... 1170 tFTPConnection .................................................... 1171 tFTPConnection properties ................................ 1171 Related scenarios ............................................... 1172 tFTPDelete ............................................................. 1173 tFTPDelete properties ........................................ 1173 Related scenario ................................................ 1174 tFTPFileExist ........................................................ 1175 tFTPFileExist properties ................................... 1175 Related scenario ................................................ 1176 tFTPFileList .......................................................... 1177 tFTPFileList properties ..................................... 1177 Scenario: Iterating on a remote directory ......... 1178 tFTPFileProperties ................................................ 1182 tFTPFileProperties Properties ........................... 1182 Related scenario ................................................ 1183 tFTPGet .................................................................. 1184 tFTPGet properties ............................................ 1184 Related scenario ................................................ 1185 tFTPPut .................................................................. 1186 tFTPPut properties ............................................. 1186 Scenario: Putting files on a remote FTP server . 
1187 tFTPRename .......................................................... 1190 tFTPRename Properties ..................................... 1190 Related scenario ................................................ 1191 tFTPTruncate ....................................................... 1192 tFTPTruncate properties ................................... 1192 Related scenario ................................................ 1193 tHttpRequest .......................................................... 1194 tHttpRequest properties ..................................... 1194 Scenario: Sending a HTTP request to the server and saving the response information to a local file ....... 1195 tJMSInput .............................................................. 1197 tJMSInput properties ......................................... 1197 Related scenario ............................................... 1198 tJMSOutput ........................................................... 1199 tJMSOutput properties ...................................... 1199 Related scenario ............................................... 1200 tMicrosoftMQInput .............................................. 1201

tMicrosoftMQInput Properties ...........................1201 Scenario: Writing and fetching queuing messages from Microsoft message queue ................................1202 tMicrosoftMQOutput ............................................1205 tMicrosoftMQOutput Properties ........................1205 Related scenario .................................................1205 tMomCommit .........................................................1206 tMomCommit Properties ...................................1206 Related scenario .................................................1206 tMomInput .............................................................1207 tMomInput Properties ........................................1207 Scenario: asynchronous communication via a MOM server .......................................................................1208 tMomMessageIdList ..............................................1212 tMomMessageIdList Properties .........................1212 Related scenario .................................................1212 tMomOutput ..........................................................1213 tMomOutput Properties .....................................1213 Related scenario .................................................1214 tMomRollback .......................................................1215 tMomRollback properties ...................................1215 Related scenario .................................................1215 tPOP ........................................................................1216 tPOP properties ..................................................1216 Scenario: Retrieving a selection of email messages from an email server ...............................................1217 tREST .....................................................................1220 tREST properties ................................................1220 Scenario: Creating and retrieving data by invoking REST Web service 
..................................................1221 tRSSInput ...............................................................1224 tRSSInput Properties ..........................................1224 Scenario: Fetching frequently updated blog entries. .. 1224 tRSSOutput ............................................................1227 tRSSOutput Properties .......................................1227 Scenario 1: Creating an RSS flow and storing files on an FTP server ...........................................................1228 Scenario 2: Creating an RSS flow that contains metadata ...........................................................................1232 Scenario 3: Creating an ATOM feed XML file 1234 tSCPClose ...............................................................1238 tSCPClose Properties .........................................1238 Related scenario .................................................1238 tSCPConnection .....................................................1239 tSCPConnection properties ................................1239 Related scenarios ...............................................1239 tSCPDelete ..............................................................1240 tSCPDelete properties ........................................1240 Related scenario .................................................1240 tSCPFileExists ........................................................1241 tSCPFileExists properties ..................................1241 Related scenario .................................................1241


tSCPFileList ........................................................... 1242 tSCPFileList properties ..................................... 1242 Related scenario ................................................ 1242 tSCPGet .................................................................. 1243 tSCPGet properties ............................................ 1243 Scenario: Getting files from a remote SCP server .... 1243 tSCPPut .................................................................. 1245 tSCPPut properties ............................................ 1245 Related scenario ................................................ 1245 tSCPRename .......................................................... 1246 tSCPRename properties ..................................... 1246 Related scenario ................................................ 1246 tSCPTruncate ........................................................ 1247 tSCPTruncate properties ................................... 1247 Related scenario ................................................ 1247 tSendMail ............................................................... 1248 tSendMail Properties ......................................... 1248 Scenario: Email on error .................................... 1249 tSetKeystore ........................................................... 1251 tSetKeystore properties ..................................... 1251 Scenario: Extracting customer information from a private WSDL file ................................................... 1251 tSocketInput .......................................................... 1257 tSocketInput properties ...................................... 1257 Scenario: Passing on data to the listening port (Java) 1259 tSocketOutput ........................................................ 1262 tSocketOutput properties ................................... 1262 Related Scenario ................................................ 
1263 tSOAP ..................................................................... 1264 tSOAP properties ............................................... 1264 Scenario: Extracting the weather information using a Web service ............................................................. 1265 tWebServiceInput ................................................. 1268 tWebServiceInput Properties ............................. 1268 Scenario 1: Extracting images through a Web service 1270 Scenario 2: Reading the data published on a Web service using the tWebServiceInput advanced features (Java only) ........................................................................ 1271 tXMLRPCInput .................................................... 1276 tXMLRPCInput Properties ................................ 1276 Scenario: Guessing the State name from an XMLRPC 1276

tAssertCatcher Properties ...................................1286 Related scenarios ...............................................1287 tChronometerStart ................................................1288 tChronometerStart Properties .............................1288 Related scenario .................................................1288 tChronometerStop .................................................1289 tChronometerStop Properties .............................1289 Scenario: Measuring the processing time of a subjob and part of a subjob .................................................1289 tDie ..........................................................................1294 tDie properties ....................................................1294 Related scenarios ...............................................1294 tFlowMeter .............................................................1295 tFlowMeter Properties .......................................1295 Related scenario .................................................1295 tFlowMeterCatcher ...............................................1296 tFlowMeterCatcher Properties ...........................1296 Scenario: Catching flow metrics from a Job ......1297 tLogCatcher ...........................................................1301 tLogCatcher properties ......................................1301 Scenario 1: warning & log on entries ................1301 Scenario 2: Log & kill a Job ..............................1303 tLogRow .................................................................1305 tLogRow properties ...........................................1305 Scenario: Delimited file content display ............1305 tStatCatcher ...........................................................1306 tStatCatcher Properties .......................................1306 Scenario: Displaying job stats log ......................1306 tWarn ......................................................................1309 tWarn Properties 
................................................1309 Related scenarios ...............................................1309

Misc group components ........................1311


tAddLocationFromIP ...........................................1312 tAddLocationFromIP Properties ........................1312 Scenario: Identifying a real-world geographic location of an IP .............................................................1313 tBufferInput ...........................................................1315 tBufferInput properties .......................................1315 Scenario: Retrieving bufferized data (Java) .......1315 tBufferOutput ........................................................1318 tBufferOutput properties ....................................1318 Scenario 1: Buffering data (Java) ......................1318 Scenario 2: Buffering output data on the webapp server .............................................................................1321 Scenario 3: Calling a Job with context variables from a browser .................................................................1324 Scenario 4: Calling a Job exported as Webservice in another Job ...............................................................1326 tContextDump ........................................................1329 tContextDump properties ...................................1329 Related Scenario ..............................................1329

Logs & Errors components .................. 1279


tAssert .................................................................... 1280 tAssert Properties .............................................. 1280 Scenario: Setting up the assertive condition for a Job execution ................................................................. 1280 tAssertCatcher ....................................................... 1286


tContextLoad ......................................................... 1330 tContextLoad properties .................................... 1330 Scenario: Dynamic context use in MySQL DB insert 1331 tFixedFlowInput .................................................... 1334 tFixedFlowInput properties ............................... 1334 Related scenarios ............................................... 1334 tMemorizeRows .................................................... 1336 tMemorizeRows properties ............................... 1336 Scenario: Counting the occurrences of different ages 1337 tMsgBox ................................................................ 1342 tMsgBox properties ........................................... 1342 Scenario: Hello world! type test ..................... 1342 tRowGenerator ..................................................... 1344 tRowGenerator properties ................................. 1344 Defining the schema ....................................... 1344 Defining the function ..................................... 1345 Scenario: Generating random java data ............. 1346

tUnite .......................................................................1370 tUnite Properties ................................................1370 Scenario: Iterate on files and merge the content 1371 tWaitForFile ...........................................................1374 tWaitForFile properties ......................................1374 Scenario: Waiting for a file to be removed .......1376 tWaitForSocket ......................................................1379 tWaitForSocket properties .................................1379 Related scenario ................................................1380 tWaitForSqlData ...................................................1381 tWaitForSqlData properties ...............................1381 Scenario: Waiting for insertion of rows in a table .... 1382

Processing components .........................1385


tAggregateRow .......................................................1386 tAggregateRow properties .................................1386 Scenario: Aggregating values and sorting data ..1387 tAggregateSortedRow ...........................................1391 tAggregateSortedRow properties .......................1391 Related scenario .................................................1392 tConvertType .........................................................1393 tConvertType properties ....................................1393 Scenario: Converting java types ........................1394 tDenormalize ..........................................................1398 tDenormalize Properties .....................................1398 Scenario 1: Denormalizing on one column in Perl .... 1398 Scenario 2: Denormalizing on multiple columns ....... 1400 tDenormalizeSortedRow .......................................1403 tDenormalizeSortedRow properties ...................1403 Scenario: Regrouping sorted rows .....................1403 tEmptyToNull ........................................................1407 tEmptyToNull properties ...................................1407 Scenario: Replacing empty fields by NULL fields (fields of unknown value) .......................................1407 tExternalSortRow ..................................................1411 tExternalSortRow properties ..............................1411 Related scenario .................................................1412 tExtractDelimitedFields ........................................1413 tExtractDelimitedFields properties ....................1413 Scenario: Extracting fields from a comma-delimited file ............................................................................1414 tExtractPositionalFields ........................................1418 tExtractPositionalFields properties ....................1418 Related scenario .................................................1419 tExtractRegexFields 
..............................................1420 tExtractRegexFields properties ..........................1420 Scenario: Extracting name, domain and TLD from e-mail addresses .......................................................1421 tExtractXMLFields ..............................................1424

Orchestration components ................... 1349


tFileList ................................................................. 1350 tFlowToIterate ....................................................... 1351 tFlowToIterate Properties .................................. 1351 Scenario: Transforming data flow to a list ........ 1351 tForeach ................................................................. 1355 tForeach Properties ........................................... 1355 Scenario: Iterating on a list and retrieving the values 1355 tInfiniteLoop .......................................................... 1358 tInfiniteLoop Properties ................................... 1358 Related scenario ................................................ 1358 tIterateToFlow ....................................................... 1359 tIterateToFlow Properties .................................. 1359 Scenario: Transforming a list of files as data flow .... 1360 tLoop ...................................................................... 1362 tLoop Properties ................................................ 1362 Scenario: Job execution in a loop ...................... 1363 tPostjob .................................................................. 1365 tPostjob Properties ............................................. 1365 Related scenario ................................................ 1365 tPrejob .................................................................... 1366 tPrejob Properties .............................................. 1366 Related scenario ................................................ 1366 tReplicate ............................................................... 1367 tReplicate Properties .......................................... 1367 Related scenario ................................................ 1367 tRunJob ................................................................. 1368 tSleep ...................................................................... 
1369 tSleep Properties ................................................ 1369 Related scenarios ............................................... 1369


tFilterColumns ...................................................... 1425 tFilterColumns Properties .................................. 1425 Related Scenario ................................................ 1425 tFilterRow .............................................................. 1426 tFilterRow Properties ........................................ 1426 Scenario: Filtering and searching a list of names ...... 1427 tJoin ........................................................................ 1430 tJoin properties .................................................. 1430 Scenario: Doing an exact match on two columns and outputting the main and rejected data ..................... 1430 tMap ....................................................................... 1436 tMap properties .................................................. 1436 Scenario 1: Mapping data using a filter and a simple explicit join ............................................................ 1436 Scenario 2: Mapping data using inner join rejections 1440 Scenario 3: Cascading join mapping ................. 1445 Scenario 4: Advanced mapping using filters, explicit joins and rejections .................................................. 1445 Scenario 5: Advanced mapping with filters and different rejections ........................................................... 1450 Scenario 6: Advanced mapping with lookup reload at each row (Java) ....................................................... 1454 Scenario 7: Mapping with join output tables ..... 1461 tNormalize .............................................................. 1465 tNormalize Properties ........................................ 1465 Scenario: Normalizing data (in Perl) ................. 1465 tPerl ........................................................................ 1468 tPerl properties .................................................. 1468 Scenario: Displaying a number of processed lines .... 
1468 tPivotToRows ........................................................ 1471 tPivotToRows properties ................................... 1471 Scenario: Concatenating a list of columns in a table by using the other table columns as pivot .............. 1472 tReplace .................................................................. 1475 tReplace Properties ............................................ 1475 Scenario: multiple replacements and column filtering 1476 tSampleRow ........................................................... 1480 tSampleRow properties ..................................... 1480 Scenario: Filtering rows and groups of rows ..... 1480 tSortRow ................................................................ 1483 tSortRow properties ........................................... 1483 Scenario: Sorting entries ................................... 1484 tXMLMap .............................................................. 1487 tXMLMap properties ......................................... 1487 Scenario: Mapping and transforming XML data ..... 1487

tRunJob ..................................................................1494 tRunJob Properties .............................................1494 Scenario: Executing a child Job .........................1496 tSetEnv ....................................................................1501 tSetEnv Properties ..............................................1501 Scenario: Modifying the Date variable during the execution of a Job .......................................................1501 tSSH ........................................................................1505 tSSH Properties ..................................................1505 Scenario: Remote system information display via SSH .........................................................................1507 tSystem ....................................................................1509 tSystem Properties .............................................1509 Scenario: Echo Hello World! ..........................1510

Talend MDM components ....................1513


tMDMBulkLoad ....................................................1514 tMDMBulkLoad properties ...............................1514 Enhancing the MDM bulk data load ..............1515 Scenario: Loading records into a business entity ....... 1518 tMDMDelete ...........................................................1523 tMDMDelete properties .....................................1523 Scenario: Deleting master data from an MDM hub ... 1524 tMDMInput ............................................................1528 tMDMInput properties .......................................1528 Scenario: Reading master data in an MDM hub 1529 tMDMOutput .........................................................1533 tMDMOutput properties ....................................1533 Scenario: Writing master data in an MDM hub .1536 tMDMReceive ........................................................1542 tMDMReceive properties ...................................1542 Related scenario .................................................1543 tMDMRouteRecord ...............................................1544 tMDMRouteRecord properties ..........................1544 Scenario: Routing a record to Event Manager ...1545 Scenario prerequisites .....................................1545 Routing a record to trigger the corresponding process .........................................................................1546 tMDMSP .................................................................1554 tMDMSP Properties ...........................................1554 Scenario: Executing a stored procedure in the MDM hub ...........................................................................1555 tMDMViewSearch .................................................1561 tMDMViewSearch properties ............................1561 Scenario: Retrieving records from an MDM hub via an existing view .......................................................1563

XML components ..................................1567 System components ............................... 1493


tAdvancedFileOutputXML ...................................1568


tAdvancedFileOutputXML properties .............. 1568 Defining the XML tree ................................... 1569 Importing the XML tree ............................ 1570 Creating the XML tree manually ............... 1571 Mapping XML data ........................................ 1571 Defining the node status ................................. 1572 Loop element ............................................. 1572 Group element ........................................... 1573 Scenario: Creating an XML file using a loop .... 1574 tDTDValidator ...................................................... 1579 tDTDValidator Properties ................................. 1579 Scenario: Validating XML files ........................ 1579 tEDIFACTtoXML ................................................. 1582 tEDIFACTtoXML Properties ............................ 1582 Scenario: From EDIFACT to XML .................. 1582 tExtractXMLField ................................................ 1585 tExtractXMLField properties ............................ 1585 Scenario 1: Extracting XML data from a field in a database table .............................................................. 1586 Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file ..................... 1588

tFileInputXML .......................................................1592 tFileInputXML Properties ..................................1592 Scenario 1: Reading and extracting data from an XML structure .........................................................1594 Scenario 2: Extracting erroneous XML data via a reject flow ...................................................................1595 tFileOutputXML ....................................................1599 tFileOutputXML properties ...............................1599 Scenario: From Positional to XML file .............1600 tWriteXMLField ....................................................1601 tWriteXMLField properties ...............................1601 Scenario: Extracting the structure of an XML file and inserting it into the fields of a database table ..........1602 tXSDValidator .......................................................1607 tXSDValidator Properties ..................................1607 Scenario: Validating data flows against an XSD file . 1607 tXSLT .....................................................................1611 tXSLT Properties ...............................................1611 Scenario: Transforming XML to html using an XSL stylesheet .................................................................1611


Preface
Purpose
This Reference Guide explains in detail the major components you can find in each of the different groups in the Palette of Talend Open Studio. Information presented in this document applies to Talend Open Studio releases beginning with 4.2.x.

Audience
This guide is for users and administrators of Talend Open Studio.
The layout of GUI screens provided in this document may vary slightly from your actual GUI.

Typographical conventions
This guide uses the following typographical conventions:
- text in bold: window and dialog box buttons and fields, keyboard keys, menus, and menu options,
- text in [bold]: window, wizard, and dialog box titles,
- text in courier: system parameters typed in by the user,
- text in italics: file, schema, column, row, and variable names referred to in all use cases, and also names of the fields in the Basic and Advanced setting views referred to in the property table for each component,
- In the properties section for every component, the Java or Perl icon indicates whether the component is available in Java and/or in Perl.
- The information icon indicates an item that provides additional information about an important point. It is also used to add comments related to a table or a figure.
- The warning icon indicates a message that gives information about the execution requirements or recommendation type. It is also used to refer to situations or information the end-user needs to be aware of or pay special attention to.


History of changes
The below table lists the changes made in the 4.x release of the Talend Open Studio Reference Guide.
Version: v4.0_a
Date: 06/04/2010
History of Change: Updates in Talend Open Studio Reference Guide include:
- New components in the File, Database, Business and Data quality chapters.
- Modifications in the settings and scenarios of many components to match the modifications in the GUI.
- Modifications in tMap + a new scenario.
- Deleted the Multischema chapter from the book and placed all the multischema components in the File chapter.

Version: v4.0_b
Date: 27/05/2010
History of Change: Updates in Talend Open Studio Reference Guide include:
- New components in the File, Database, Business, Internet and MDM chapters.
- EXist components have been added to the Databases chapter.
- Modifications in the settings and scenarios of many components to match the modifications in the GUI.

Version: v4.1_a
Date: 05/10/2010
History of Change: Updates in Talend Open Studio Reference Guide include:
- New components in the File, Database, Business, Internet and MDM chapters.
- SAPIDoc components have been added to the Business chapter.
- Bonita components have been added to the Business chapter.
- Global variables have been added to the Orchestration chapter.
- Modifications in the settings and scenarios of many components to match the modifications in the GUI.

Version: v4.1_b
Date: 13/12/2010
History of Change: Updates in Talend Open Studio Reference Guide include:
- Translated the tFireBirdRollback component and added it to the database family.
- New components in the Business Intelligence, File, Database, FileScale, Palo and Internet chapters.
- Added Microsoft components to the Internet chapter.
- Modifications in the settings and scenarios of many components to match the modifications in the GUI.

Version: v4.2_a
Date: 28/04/2011
History of Change: Updates in Talend Open Studio Reference Guide include:
- Added ESB family components.
- New components in the Data Quality, Processing, XML and MDM chapters include: tStandardizeRow, tPigXX components, Edifact components...
- Added Validation Rules and Dynamic Schema information in relevant components in Business Intelligence and Processing chapters.
- Modifications in the settings and scenarios of many components to match the modifications in the GUI.


Feedback and Support


Your feedback is valuable. Do not hesitate to give your input or to make suggestions or requests regarding this documentation or the product. You can also find support from the Talend team on Talend's Forum website at: http://talendforge.org/forum


Business components
This chapter details the major components that you can find in the Business group of the Palette of Talend Open Studio. The Business component family groups connectors that cover specific Business needs, such as reading from and writing to CRM or ERP types of database, and reading from or writing to an SAP system.


tAlfrescoOutput
tAlfresco Properties
Component family: Business

Function: Creates dematerialized documents in an Alfresco server where they are indexed under meaningful models.
Purpose: Allows you to create and manage documents in an Alfresco server.

Basic settings

URL: Type in the URL to connect to the Alfresco Web application.
Login and Password: Type in the user authentication data to the Alfresco server.
Base: Type in the base path where to put the document, or select the Map... check box and then, in the Column list, select the target location column. Note: When you type in the base name, make sure to use the double backslash (\\) escape character.
Document Mode: Select in the list the mode you want to use for the created document.
  Create only: creates a document if it does not exist. Note that an error message will display if you try to create a document that already exists.
  Create or update: creates a document if it does not exist or updates the document if it exists.
Container Mode: Select in the list the mode you want to use for the destination folder in Alfresco.
  Update only: updates a destination folder if the folder exists. Note that an error message will display if you try to update a document that does not exist.
  Create or update: creates a destination folder if it does not exist or updates the destination folder if it exists.
Define Document Type: Click the three-dot button to display the tAlfrescoOutput editor. This editor enables you to:
- select the file where you defined the metadata according to which you want to save the document in Alfresco,
- define the type of the document,
- select any of the aspects in the available aspects list of the model file and click the plus button to add it in the list to the left.
Property Mapping: Displays the parameters you set in the tAlfrescoOutput editor and according to which the document will be created in the Alfresco server. Note that in the Property Mapping area, you can modify any of the input schemas.


Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Result Log File Name: Browse to the file where you want to save any logs related to the job execution.

Advanced settings

Configure Target Location Container: Allows you to configure the default type of containers (folders). Select this check box to display new fields where you can modify the container type to use your own created types based on the father/child model.
Configure Permissions: When selected, allows you to manually configure access rights to containers and documents. Select the Inherit Permissions check box to synchronize access rights between containers and documents. Click the Plus button to add new lines to the Permissions list, then you can assign roles to user or group columns.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory.
Association Target Mapping: Allows you to create new documents in Alfresco with associated links towards other documents already existing in Alfresco, to facilitate the navigation process for example. To create associations:
- Open the tAlfrescoOutput editor.
- Click the Add button and select a model where you have already defined aspects that contain associations.
- Click the drop-down arrow at the top of the editor and select the corresponding document type.
- Click OK to close the editor and display the created association in the Association Target Mapping list.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as an output component. An input component is required.
Limitation/Prerequisites: To be able to use the tAlfrescoOutput component, a few relevant resources need to be installed: check the Installation procedure subsection for more information.
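The note on the Base setting asks for a doubled backslash because, in a Java Job, the value is typed as a Java string expression. The following is a minimal sketch of that escaping rule only; the class name and the folder names are hypothetical and are not taken from an actual Alfresco repository.

```java
final class BasePathEscape {

    // Hypothetical base path, written the way it would be typed as a Java string expression:
    // each backslash is doubled in the source text.
    static final String BASE = "\\Sample Folder\\Invoices";

    /** Counts the backslash characters actually contained in the string value. */
    static int countBackslashes(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The four backslashes typed above yield two backslash characters at run time.
        System.out.println(BASE);
        System.out.println(countBackslashes(BASE));
    }
}
```

The doubling is a property of Java string literals: a value coming from an input column is not a literal and therefore needs no doubling.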

Installation procedure
To be able to use tAlfrescoOutput in Talend Open Studio, you first need to install the Alfresco server along with a few relevant resources. The subsections below detail the prerequisites and the installation procedure.


Prerequisites
Start with the below operations:
- Download the file alfresco-community-tomcat-2.1.0.zip
- Unzip the file in an installation folder, for example: C:\alfresco
- Install JDK 1.5.0+
- Update the environment variable JAVA_HOME (for example, JAVA_HOME=C:\Program Files\Java\jdk1.5.0_16)
- From the installation folder (C:\alfresco), launch the Alfresco server using the script alf_start.bat
Make sure that the Alfresco server is launched correctly before you start using the tAlfrescoOutput component.
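One way to confirm that the server is up before running a Job is to send it a plain HTTP request. The sketch below is only an illustration under assumptions: the class name is hypothetical, and the default address http://localhost:8080/alfresco assumes a local Tomcat bundle listening on port 8080, which may differ in your installation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class AlfrescoServerCheck {

    /** Returns true if an HTTP GET on the given URL gets a response below 500 within the timeout. */
    static boolean isReachable(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            conn.disconnect();
            return status < 500;
        } catch (IOException e) {
            // Connection refused, timeout, or malformed URL: treat the server as not available.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default address; pass the real URL as the first argument if it differs.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/alfresco";
        System.out.println(url + " reachable: " + isReachable(url, 3000));
    }
}
```

A response only shows that the web application answers. It does not prove that the talendalfresco module described below is installed.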

Installing the Talend Alfresco module
Note that the talendalfresco_20081014.zip is provided with the tAlfrescoOutput component in Talend Open Studio. To install the talendalfresco module:
- From talendalfresco_20081014.zip, in the talendalfresco_20081014\alfresco folder, look for the following jars: stax-api-1.0.1.jar, wstx-lgpl-3.2.7.jar, talendalfresco-client_1.0.jar, and talendalfresco-alfresco_1.0.jar, and move them to C:\alfresco\tomcat\webapps\alfresco\WEB-INF\lib
- Add the authentication filter of the commands to the web.xml file located in C:\alfresco\tomcat\webapps\alfresco\WEB-INF, following the model of the example provided in the talendalfresco_20081014/alfresco folder of the zipped file talendalfresco_20081014.zip
The below figures show the portion of lines (in blue) to add in the Alfresco web.xml file.


Useful information for advanced use

Installing new types for Alfresco: From package_jeu_test.zip, in the package_jeu_test/fichiers_conf_alfresco2.1 folder, look for the following files: H76ModelCustom.xml (description of the model), web-client-config-custom.xml (web interface of the model), and custom-model-context.xml (registration of the new model), and paste them in the following folder: C:/alfresco/tomcat/shared/classes/alfresco/extension

Dates: Dates must be of the Talend date type java.util.Date. Columns without either a mapping or a default value, for example columns of the type Date, are written as empty strings. Solution: delete all columns without a mapping or a default value. Note that any modification of the Alfresco type will put them back.

Content: Do not confuse the path of the file whose content you want to create in Alfresco with its target location in Alfresco. Provide a URL. It can target various protocols, among which are file, HTTP and so on. For URLs referring to files on the file system, prefix them with "file:" for Windows used locally, and with "file://" for Windows on a network (which also accepts "file:\\") or for Linux. Do not double the backslash in the target base path (it is escaped automatically), unless you type in the path in the basic settings of the tAlfrescoOutput component, or build it by concatenation in the tMap editor, for example.

Multiple properties or associations: It is possible to create only one association per document if it is mapped to a string value, or one or more associations per document if it is mapped to a list value (object). You can empty an association by mapping it to an empty list, which you can create, for example, by using new java.util.ArrayList() in the tMap component.


However, it is impossible to delete an association.

Building List(object) with tAggregate:
- Define the table of the n-n relation in a file, containing a name line for example (included in the input rows), and a category line (that can be defined with its mapping in a third file).
- Group by: input name, output name.
- Operation: output categoryList, function list(object), input category.
Caution: use the list(object) function, not the simple list function.

References (documents and folders): References are created by mapping one or more existing reference nodes (xpath or namepath) using the String type or List(object). An error in an association or in a property of the reference type does not prevent the creation of the node that holds the reference. Properties of the reference type are created in the Basic settings view. Associations are created in the Advanced settings view.
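The Content and association notes above can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, the sample paths and category values are invented, and the grouping method merely imitates what the list(object) function produces rather than reproducing the tAggregate implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names come from Talend or Alfresco.
final class AlfrescoMappingSketch {

    /** Prefixes a path with "file:", as described for Windows used locally. */
    static String localWindowsUrl(String path) {
        return "file:" + path;
    }

    /** Prefixes a path with "file://", as described for Windows on a network or Linux. */
    static String networkOrLinuxUrl(String path) {
        return "file://" + path;
    }

    /** Returns an empty list; mapping an association to it empties the association. */
    static List<Object> emptyAssociation() {
        return new ArrayList<Object>();
    }

    /** Groups the category (column 1) of each row under its name (column 0). */
    static Map<String, List<Object>> groupCategories(String[][] rows) {
        Map<String, List<Object>> byName = new LinkedHashMap<String, List<Object>>();
        for (String[] row : rows) {
            List<Object> categories = byName.get(row[0]);
            if (categories == null) {
                categories = new ArrayList<Object>();
                byName.put(row[0], categories);
            }
            categories.add(row[1]);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Invented sample values, for illustration only.
        System.out.println(localWindowsUrl("C:\\docs\\report.pdf"));
        System.out.println(networkOrLinuxUrl("/home/user/docs/report.pdf"));
        System.out.println(emptyAssociation());
        String[][] rows = { {"doc1", "legal"}, {"doc1", "finance"}, {"doc2", "hr"} };
        System.out.println(groupCategories(rows));
    }
}
```

Each name ends up with one list of categories, which is the shape a List(object) mapping expects for one or more associations per document.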

Dematerialization, tAlfrescoOutput, and Enterprise Content Management


Dematerialization is the process that converts documents held in physical form into electronic form, and thus helps to move away from the use of physical documentation to the use of electronic Enterprise Content Management (ECM) systems. The range of documents that can be managed with an Enterprise Content Management system includes just about everything, from basic documents to stock certificates, for example.

Enterprises dematerialize their content either through manual document handling, done by people, or through automatic, machine-based document handling. Given the varied nature of the content to be dematerialized, enterprises have to use varied technologies to do it. Examples of the available technologies include scanning paper documents, creating interfaces to capture electronic documents from other applications, and converting document images into machine-readable, editable text documents.

Furthermore, scanned documents and digital faxes are not readable text. To convert them into machine-readable characters, different character recognition technologies are used. Handwritten Character Recognition (HCR) and Optical Mark Recognition (OMR) are two examples of such technologies.

Equally important as the content captured in various formats from numerous sources during dematerialization is the supporting metadata, which allows efficient identification of the content via specific queries.

How can this document content, along with the related metadata, be aggregated and indexed in an Enterprise Content Management system so that it can be retrieved and managed in meaningful ways? Talend provides the answer through the tAlfrescoOutput component, which allows you to store and manage your electronic documents and the related metadata on the Alfresco server, the leading open source enterprise content management system.

The figure below illustrates Talend's role between the dematerialization process and the Enterprise Content Management system (Alfresco).

Scenario: Creating documents on an Alfresco server


This Java scenario describes a two-component Job that creates two document files, with the related metadata, in an Alfresco server, the Java-based Enterprise Content Management system.
- Drop the tFileInputDelimited and tAlfrescoOutput components from the Palette onto the design workspace.
- Connect the two components together using a Row Main link.

- In the design workspace, double-click tFileInputDelimited to display its basic settings.
- Set the File Name path and all related properties.
Note that if you have already stored your input schemas locally in the Repository, you can simply drop the relevant file item from the Metadata folder onto the design workspace, and the delimited file settings will automatically display in the relevant fields of the component Basic settings view. For more information about metadata, see Setting up a File Delimited schema in the Talend Open Studio User Guide.


In this scenario, the delimited file provides the metadata and path of the two documents we want to create in the Alfresco server. The input schema for the documents consists of four columns: file_name, destination_folder_name, source_path, and author.

The input schema of the delimited file therefore contains these four columns.
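As a rough illustration of this four-column schema, the sketch below parses one line of such a delimited file in plain Java. The semicolon separator, the class name, and the sample row are all assumptions made for the example; they are not taken from the scenario's actual input file.

```java
// Hypothetical row holder mirroring the four schema columns named in the scenario.
final class InputRowSketch {
    final String fileName;
    final String destinationFolderName;
    final String sourcePath;
    final String author;

    InputRowSketch(String fileName, String destinationFolderName, String sourcePath, String author) {
        this.fileName = fileName;
        this.destinationFolderName = destinationFolderName;
        this.sourcePath = sourcePath;
        this.author = author;
    }

    /** Splits one delimited line into the four columns (separator ";" is an assumption). */
    static InputRowSketch parse(String line) {
        String[] fields = line.split(";", -1); // -1 keeps trailing empty fields
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but found " + fields.length);
        }
        return new InputRowSketch(fields[0], fields[1], fields[2], fields[3]);
    }

    public static void main(String[] args) {
        // Invented sample row, for illustration only.
        InputRowSketch row = parse("doc1.txt;folderA;file:C:/input/doc1.txt;J. Smith");
        System.out.println(row.fileName + " -> " + row.destinationFolderName
                + " (source: " + row.sourcePath + ", author: " + row.author + ")");
    }
}
```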

In the design workspace, double-click tAlfrescoOutput to display its basic settings.


In the Alfresco Server area, enter the Alfresco server URL and the user authentication information in the corresponding fields. In the TargetLocation area, either type in the base name of the location where the document is to be put on the server, or select the Map... check box and then, in the Column list, select the target location column, destination_folder_name in this scenario.
When you type in the base name, make sure to use the double backslash (\\) escape character.

In the Document Mode list, select the mode you want to use for the created documents. In the Container Mode list, select the mode you want to use for the destination folder in Alfresco. Click the Define Document Type three-dot button to open the tAlfrescoOutput editor.


Click the Add button to browse and select the xml file that holds the metadata according to which you want to save the documents in Alfresco. All available aspects in the selected model file display in the Available Aspects list.
You can browse for this model folder locally or on the network. After defining the aspects to use for the document to be created in Alfresco, this model folder is not needed any more.

- If needed, select in the Available Aspects list the aspect(s) to be included in the metadata to write in the Alfresco server. In this scenario we want the author name to be part of the metadata registered in Alfresco.
- Click the drop-down arrow at the top of the editor to select from the list the type to give to the created document in Alfresco, Content in this scenario.
All the defined aspects used to select the metadata to write in the Alfresco server display in the Property Mapping list in the Basic settings view of tAlfrescoOutput: three aspects in this scenario, two basic ones for the Content type (content and name) and an additional one (author).
- Click Sync columns to automatically propagate all the columns of the delimited file.
- If needed, click Edit schema to view the output data structure of tAlfrescoOutput.


Click the three-dot button next to the Result Log File Name field and browse to the file where you want to save any logs after job execution. Save your Job, and press F6 to execute it.

The two documents are created in Alfresco using the metadata provided in the input schemas.


tBonitaDeploy
tBonitaDeploy Properties
Component family: Business/Bonita
Function: This component configures any Bonita Runtime engine and deploys a specific Bonita process (a .bar file exported from the Bonita solution) to this engine.
Purpose: This component deploys a specific Bonita process to a Bonita Runtime.

Basic settings
Bonita Runtime Environment File: Browse to, or enter the path to, the Bonita Runtime environment file.
Bonita Runtime Jaas File: Browse to, or enter the path to, the Bonita Runtime jaas file.
Bonita Runtime Logging File: Browse to, or enter the path to, the Bonita Runtime logging file.
Business Archive: Browse to, or enter the path to, the Bonita process .bar file you want to use.
User name: Type in the user name you use to log in to Bonita Studio.
Password: Type in the password you use to log in to Bonita Studio.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process is completed for error-free rows.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Usually used as a stand-alone component.
Connections:
Outgoing links (from one component to another): Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
Incoming links (from one component to another): Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Global Variables: Process Definition UUID: Indicates the identifier number of the process being deployed. This is available as a Flow variable. Returns a string. For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Limitation: The Bonita Runtime environment file, the Bonita Runtime jaas file and the Bonita Runtime logging file must all be stored on the execution server of the Job using this component.

Related Scenario
For a related topic, see Scenario: Executing a Bonita process via a Talend Job.


tBonitaInstantiateProcess
tBonitaInstantiateProcess Properties
Component family: Business/Bonita
Function: This component instantiates a process already deployed in a Bonita Runtime engine.
Purpose: This component starts an instance for a specific process deployed in a Bonita Runtime engine.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.
Bonita Runtime Environment File: Browse to, or enter the path to, the Bonita Runtime environment file.
Bonita Runtime Jaas File: Browse to, or enter the path to, the Bonita Runtime jaas file.
Bonita Runtime Logging File: Browse to, or enter the path to, the Bonita Runtime logging file.
Use Process ID: This check box is cleared by default, which activates the Process name and the Process version fields, where you enter the information of the specific process you want to instantiate. This information is used to automatically generate the ID of this process. Once the check box is selected, the Process definition ID field is activated, in which you can enter the required definition ID of this process. The process definition ID is created when the process is deployed into the Bonita Runtime engine.
User name: Type in the user name used to instantiate this process.
Password: Type in the password used to instantiate this process.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process is completed for error-free rows.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Usually used as a stand-alone component or as an output component.
Connections:
Outgoing links (from one component to another): Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
Incoming links (from one component to another): Row: Main (providing the input parameters to this process); Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Global Variables: Process Instance UUID: Indicates the identifier number of the process instance being created. This is available as a Flow variable. Returns a string. For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Limitation: n/a

Scenario: Executing a Bonita process via a Talend Job


This scenario describes a Job that deploys a Bonita process into the Bonita Runtime and executes this process, in which a personnel request is handled. The Job in this scenario uses three components:

- tBonitaDeploy: this component deploys a Bonita process into the Bonita Runtime.
- tFixedFlowInput: this component generates the schema used as execution parameters of this deployed process.
- tBonitaInstantiateProcess: this component executes this deployed process.


When generating the schema using tFixedFlowInput, make sure that the column names of the schema are identical to the names of the Bonita parameters used to execute the same process in Bonita.

Before beginning to replicate this scenario, prepare your Bonita .bar file, that is, the process exported from the Bonita system that will be deployed into the Bonita Runtime engine. In this scenario, this file is TEST--4.0.bar. This process can be checked via the Bonita interface.

To replicate this scenario, proceed as follows:
- Drop tBonitaDeploy, tFixedFlowInput and tBonitaInstantiateProcess onto the design workspace.
- Right-click tBonitaDeploy to open its contextual menu.
- Select Trigger > On Subjob Ok to connect tBonitaDeploy to tFixedFlowInput.
- Right-click tFixedFlowInput to open its contextual menu and select Row > Main to connect this component to tBonitaInstantiateProcess using a Main link.
- Double-click tBonitaDeploy to open its Basic settings view.

- In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files: in the Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File field, browse to the logging.properties file.
- In the Business Archive field, browse to the Bonita .bar file, that is, the process exported from your Bonita system that will be deployed into the Bonita Runtime engine.

- In the Username and the Password fields, type in your authentication information to connect to your Bonita system.
- Double-click tFixedFlowInput to open its Basic settings view.

- Click the three-dot button next to Edit schema to open the schema editor.

- In the schema editor, click the plus button to add one row.
- Click the new row and type in the new name: Name.
- Click OK.
- In the Mode area of the Basic settings view, select the Use inline table option.
- Under the inline table, click the plus button to add one row in the table.
- In the inline table, click the added row and type in, between the quotation marks, the name of the person from your personnel whose request will be handled by this deployed process: ychen.
- Double-click tBonitaInstantiateProcess to open its Basic settings view.


- In the Basic settings view, click Sync columns to retrieve the schema from the preceding component.
- In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files: in the Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File field, browse to the logging.properties file.
- Select the Use Process ID check box to activate the Process Definition Id field.
- In the Process Definition Id field, click between the quotation marks and press Ctrl+space to open the auto-completion drop-down list containing the available global variables for this Job.
- Double-click the variable you need to add it between the quotation marks. In this scenario, double-click tBonitaDeploy_1_ProcessDefinitionUUID, which retrieves the process definition ID of the process being deployed by tBonitaDeploy.
If the process of interest has already been deployed, and tBonitaDeploy is therefore not used, clear the Use Process ID check box to activate the Process name and the Process version fields, and fill in the corresponding information in these two fields. tBonitaInstantiateProcess concatenates the process name and the process version you type in to construct the process definition ID.
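The two ways of obtaining the process definition ID can be sketched as follows. This is a stand-alone illustration under stated assumptions: the map below only stands in for the map of global variables that a Talend Java Job exposes, the class and method names are hypothetical, and the "--" separator is a guess inferred from the TEST--4.0.bar file name used in this scenario, not a documented rule.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Talend or Bonita.
final class BonitaProcessIdSketch {

    /** Reads the ID published by tBonitaDeploy from a map of global variables. */
    static String fromGlobalMap(Map<String, Object> globalMap) {
        return (String) globalMap.get("tBonitaDeploy_1_ProcessDefinitionUUID");
    }

    /** Builds an ID from name and version; the "--" separator is an assumption. */
    static String fromNameAndVersion(String processName, String processVersion) {
        return processName + "--" + processVersion;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's global variables, filled by hand for the example.
        Map<String, Object> globalMap = new HashMap<String, Object>();
        globalMap.put("tBonitaDeploy_1_ProcessDefinitionUUID", "TEST--4.0");

        System.out.println(fromGlobalMap(globalMap));
        System.out.println(fromNameAndVersion("TEST", "4.0"));
    }
}
```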

In the Username and Password fields, enter your login and password to connect to your Bonita. Press F6 to run the Job.


This process is deployed into the Bonita Runtime and an instance is created for the personnel requests.


tCentricCRMInput
tCentricCRMInput Properties
Component family: Business/CentricCRM
Function: Connects to a module of a Centric CRM database via the relevant webservice.
Purpose: Extracts data from a Centric CRM DB based on a query.

Basic settings
CentricCRM URL: Type in the webservice URL to connect to the CentricCRM DB.
Module: Select the relevant module in the list.
Server: Type in the IP address of the DB server.
UserID and Password: Type in the webservice user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.
Query condition: Type in the query to select the data to be extracted.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.
Limitation: n/a

Related Scenario
No scenario is available for this component yet.


tCentricCRMOutput
tCentricCRMOutput Properties
Component family: Business/CentricCRM
Function: Writes data in a module of a CentricCRM database via the relevant webservice.
Purpose: Writes data into a CentricCRM DB.

Basic settings
CentricCRM URL: Type in the webservice URL to connect to the CentricCRM DB.
Module: Select the relevant module in the list.
Server: Type in the IP address of the DB server.
UserID and Password: Type in the webservice user authentication data.
Action: Insert, Update or Delete the data in the CentricCRM module.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Used as an output component. An input component is required.
Limitation: n/a

Related Scenario
No scenario is available for this component yet.


tHL7Input
tHL7Input Properties
Component family: Business
Function: tHL7Input reads an HL7 structured file and extracts data row by row.
Purpose: Opens an HL7 structured file and reads it row by row, splits the rows up into fields, then sends the fields as defined in the Schema to the next component, via a Row link.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where the properties are stored. The fields that follow are completed automatically using fetched data. Click this icon to open a connection wizard and store the file connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema in the Talend Open Studio User Guide.
Multi Schemas Editor: The [Multi Schema Editor] helps you build and configure the data flow in a multi-structured delimited file to associate one schema per output.
Segment Lists: Connection: The columns are automatically retrieved from the input file. The column name is the segment name. Column Mapping: The mapping in this array is retrieved from the mapping you have done in the editor.
Not Validate HL7 Message: Select this check box if you do not want to validate HL7 messages.

Advanced settings
Advanced separator (for numbers): Select this check box to modify the separators to be used for numbers: either the Thousands separator or the Decimal separator.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.
Limitation: n/a


Scenario: Retrieving information about patients and events from an HL7 file
This scenario describes a four-component Job which retrieves information about patients and events from an HL7 file.

- From the Palette, drop a tHL7Input and three tLogRow components onto the design workspace.
- Double-click tHL7Input to open its editor.


- In the File path field, click [Browse...] and browse the directory to select your HL7 file.
- In the File Setting area, type in your segment Start character and your segment End character.
- Under Segment(As Schema), in the Schema view area, select MSH.
- Drop the MSH-3(1)[HD] and MSH-7(1)[TS] segments from the Message View onto the Schema View.

- Under Segment(As Schema), in the Schema view area, select EVN.
- Drop the EVN-1(1)-1-1[ID] and EVN-2(1)-1-1[ST] segments from the Message View onto the Schema View.

- Under Segment(As Schema), in the Schema view area, select PID.
- Drag and drop the following segments from the Message View onto the Schema View: PID-1(1)-1-1[SI], PID-5(1)-1-1[ST], PID-5(1)-2-1[ST], PID-5(1)-3-1[ST], PID-5(1)-4-1[ST], PID-5(1)-5-1[ST], PID-5(1)-7-1[ID].
If available, use the Auto map! button, located at the bottom left of the interface, to carry out the mapping operation automatically.

- Click OK to close the editor.
- Link tHL7Input to the three tLogRow components, using MSH, EVN and PID links respectively.
- Save your Job and press F6 to execute it.


The console displays the three tLogRow tables, which return different types of information. The first one gives the message header label and its date. The second table shows the information about the patient. The third one displays the event ID and its date.
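To make the MSH, EVN and PID mappings of this scenario more concrete, here is a minimal plain-Java sketch that pulls the same positions out of an HL7 v2 message. It is not how tHL7Input is implemented: the class and method names are hypothetical, the sample message is invented, and the sketch assumes the usual HL7 v2 delimiters (carriage return between segments, | between fields, ^ between components).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not the tHL7Input implementation.
final class Hl7ScenarioSketch {

    // Invented sample message; segments are separated by carriage returns.
    static final String SAMPLE =
        "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECV_APP|RECV_FAC|20110101120000||ADT^A01|MSG00001|P|2.3\r"
        + "EVN|A01|20110101120000\r"
        + "PID|1||12345||DOE^JOHN^A^JR^MR||19700101|M";

    /** Indexes segments by their three-letter name (a repeated name keeps the last one). */
    static Map<String, String> segmentsByName(String message) {
        Map<String, String> segments = new LinkedHashMap<String, String>();
        for (String segment : message.split("\r")) {
            if (segment.length() >= 3) {
                segments.put(segment.substring(0, 3), segment);
            }
        }
        return segments;
    }

    /** Returns field n of a segment using HL7 numbering, or "" when absent. */
    static String field(String segment, int n) {
        String[] parts = segment.split("\\|", -1);
        boolean isMsh = segment.startsWith("MSH");
        if (isMsh && n == 1) {
            return "|"; // MSH-1 is the field separator itself
        }
        // Because of MSH-1, field MSH-n sits at index n - 1 after splitting.
        int index = isMsh ? n - 1 : n;
        return index >= 1 && index < parts.length ? parts[index] : "";
    }

    /** Returns component n (1-based) of a field, or "" when absent. */
    static String component(String field, int n) {
        String[] parts = field.split("\\^", -1);
        return n >= 1 && n <= parts.length ? parts[n - 1] : "";
    }

    public static void main(String[] args) {
        Map<String, String> segments = segmentsByName(SAMPLE);
        String msh = segments.get("MSH");
        String evn = segments.get("EVN");
        String pid = segments.get("PID");
        System.out.println("MSH-3=" + field(msh, 3) + " MSH-7=" + field(msh, 7));
        System.out.println("EVN-1=" + field(evn, 1) + " EVN-2=" + field(evn, 2));
        System.out.println("PID-1=" + field(pid, 1)
                + " PID-5.1=" + component(field(pid, 5), 1)
                + " PID-5.2=" + component(field(pid, 5), 2));
    }
}
```

The special case for MSH reflects the HL7 v2 numbering rule that the field separator counts as the first MSH field, which is why MSH-3 and MSH-7 are read one position earlier than the same field numbers in EVN or PID.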


tHL7Output
tHL7Output Properties
Component family: Business
Function: Writes an HL7 structured file and inserts the data row by row.
Purpose: This component writes an HL7 structured file according to the HL7 standards.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where the properties are stored. The fields that follow are completed automatically using fetched data.
Schema(s): Schema: Enter the node on which the data from the parent row is to be stored. Parent row: The data flow source.
File Name/Output Stream: Browse to where you want to store the generated file.
Configure HL7 Tree: Opens the interface in which you can set up the HL7 mapping.
HL7 version: Select your HL7 version from the list.

Advanced settings
Create directory only if not exists: This check box is selected by default. This creates a folder for the output file if there isn't one already.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: Used as an output component. An input component is required.
Limitation: n/a

Related scenario
For a related use case, see Scenario: Retrieving information about patients and events from an HL7 file.


tMarketoInput
tMarketoInput Properties
Component family: Business
Function: The tMarketoInput component retrieves data from a Marketo Web server.
Purpose: The tMarketoInput component allows you to retrieve data from a Marketo DB on a Web server.

Basic settings
Endpoint address: The URL of the Marketo Web server to which the SOAP API calls are sent.
Secret key: Encrypted authentication code assigned by Marketo. Contact Marketo Support via support@marketo.com to get this information.
Client Access ID: A user ID for access to the Marketo Web service. Contact Marketo Support via support@marketo.com to get this information.
Operation: Options in this list allow you to retrieve lead data from Marketo to external systems.
- getLead: This operation retrieves basic information of leads and lead activities in Marketo DB.
- getMultipleLeads: This operation retrieves lead records in batch.
- getLeadActivities: This operation retrieves the history of activity records for a single lead identified by the provided key.
- getLeadChanges: This operation checks the changes on Lead data in Marketo DB.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. Built-in: No property data is stored centrally. Repository: Select the Repository file where Properties are stored.
Columns Mapping: You can set the mapping conditions by making changes in Edit Schema. By default, column names in the Column fields are the same as in the schema. Because some column names in the Marketo database may contain blank spaces, which are not allowed in the component schema, you need to specify the corresponding column fields in the Columns in Marketo field. If the column names defined in the schema are the same as the column names in the Marketo database, it is not necessary to set the columns mapping.
LeadKey type: The data types of LeadKey supported by Marketo DB.
LeadKey value: The value of LeadKey.
Set Include Types: Select this check box to include the types of LeadActivity content to be retrieved. Click the plus button under the Include Types area to select in the list the types to add. This field is displayed only when you select getLeadActivity or getLeadChanges from the Operation list.
Set Exclude Types: Select this check box to exclude the types of LeadActivity content to be retrieved. Click the plus button under the Exclude Types area to select in the list the types to add. This field is displayed only when you select getLeadActivity or getLeadChanges from the Operation list.
Last Updated At: Type in the time of the last update to retrieve only the data updated since that time. The time format is YYYY-MM-DD HH:MM:SS. This field is displayed only when you select getMultipleLeads from the Operation list.
Batch Size: The maximum batch size when retrieving lead data in batch. This field is displayed only when you select getLeadActivity or getLeadChanges from the Operation list.
Timeout (milliseconds): Type in the query timeout (in milliseconds) on the Marketo Web service. The Job stops when a timeout exception occurs.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject connection.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at Job level as well as at each component level.

Usage: This component is used as an input component. It requires an output component.
Limitation: n/a
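Two of the settings above lend themselves to a short plain-Java sketch: producing a Last Updated At value in the YYYY-MM-DD HH:MM:SS format, and resolving a schema column to its Marketo column name when the latter contains blank spaces. The class and method names are hypothetical, the column names are invented examples, and the choice of time zone is an assumption, since the text does not say which time zone the value is interpreted in.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper; not part of the tMarketoInput component.
final class MarketoInputSketch {

    /** Formats a date as YYYY-MM-DD HH:MM:SS in the given time zone. */
    static String formatLastUpdatedAt(Date date, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        format.setTimeZone(zone);
        return format.format(date);
    }

    /** Returns the mapped Marketo column, or the schema column when no mapping is set. */
    static String marketoColumn(Map<String, String> mapping, String schemaColumn) {
        String mapped = mapping.get(schemaColumn);
        return mapped != null ? mapped : schemaColumn;
    }

    public static void main(String[] args) {
        System.out.println(formatLastUpdatedAt(new Date(), TimeZone.getTimeZone("UTC")));

        // Invented column names: schema columns cannot contain blank spaces.
        Map<String, String> mapping = new LinkedHashMap<String, String>();
        mapping.put("FirstName", "First Name");
        mapping.put("LastName", "Last Name");
        System.out.println(marketoColumn(mapping, "FirstName"));
        System.out.println(marketoColumn(mapping, "Email"));
    }
}
```

Falling back to the schema column name when no mapping exists mirrors the note that the columns mapping is unnecessary when both names are identical.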

Related Scenario
For a related use case, see Scenario: Data access between Marketo DB and an external system.


tMarketoOutput
tMarketoOutput Properties
Component family: Business

Function: The tMarketoOutput component outputs data to a Marketo Web server.

Purpose: The tMarketoOutput component allows you to write data into a Marketo DB on a Web server.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. Built-in: No property data is stored centrally. Repository: Select the Repository file where Properties are stored.

Endpoint address: The URL of the Marketo Web server to which the SOAP API calls are sent.

Secret key: Encrypted authentication code assigned by Marketo. Contact Marketo Support via support@marketo.com to get this information.

Client Access ID: A user ID for access to the Marketo Web service. Contact Marketo Support via support@marketo.com to get this information.

Operation: Options in this list allow you to synchronize lead data between Marketo and another external system. syncLead: This operation requests an insert or update operation for a lead record. syncMultipleLeads: This operation requests an insert or update operation for lead records in batch.

Columns Mapping: You can set the mapping conditions by making changes in Edit Schema. By default, column names in the Column fields are the same as in the schema. Because some column names in the Marketo database may contain blank spaces, which are not allowed in the component schema, you need to specify the corresponding column names in the Columns in Marketo field. If the column names defined in the schema are the same as the column names in the Marketo database, it is not necessary to set the columns mapping.

De-duplicate lead record on email address: Select this check box to de-duplicate and update lead records using the email address. Clear this check box to create another lead that contains the same email address. This check box is displayed only when you select syncMultipleLeads from the Operation list.

Batch Size: The maximum batch size when synchronizing lead data in batch. This field is displayed only when you select syncMultipleLeads from the Operation list.

Timeout (milliseconds): Type in the query timeout (in milliseconds) on the Marketo Web service. The Job stops when a Timeout exception occurs.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject connection.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is used as an output component; it requires an input component.

Limitation: n/a
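The Columns Mapping rule described above can be illustrated with a minimal Java sketch. It is not the component's implementation: the class name ColumnsMappingSketch, the helper marketoName and the sample Marketo column "Lead Source" are hypothetical and only show the default behavior (same name) versus an explicit mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the Columns Mapping table; not Talend or Marketo code.
class ColumnsMappingSketch {

    // Returns the Marketo column name for a schema column.
    // When no explicit mapping is set, the schema name is used unchanged.
    static String marketoName(Map<String, String> mapping, String schemaColumn) {
        String mapped = mapping.get(schemaColumn);
        return (mapped == null || mapped.isEmpty()) ? schemaColumn : mapped;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        // Assumed example of a Marketo column name containing a blank space.
        mapping.put("LeadSource", "Lead Source");

        System.out.println(marketoName(mapping, "LeadSource")); // explicit mapping
        System.out.println(marketoName(mapping, "Email"));      // default: same name
    }
}
```

Only the columns whose Marketo names differ from the schema names need an entry in the map, which mirrors the remark that the mapping is optional when both names are identical.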

Scenario: Data access between Marketo DB and an external system


The following scenario describes a five-component Job that inserts lead records into a Marketo database and then retrieves these records from the Marketo database into a local file. Once the data operations are complete, the Job displays the number of API calls made on the Run console.


Drop tMarketoOutput, tMarketoInput, tFileInputDelimited, tFileOutputDelimited and tJava from the Palette onto the design workspace.
Connect tFileInputDelimited to tMarketoOutput using a Row > Main connection.
Connect tMarketoInput to tFileOutputDelimited using a Row > Main connection.
Connect tFileInputDelimited to tMarketoInput using a Trigger > OnSubjobOk connection.
Connect tMarketoInput to tJava using a Trigger > OnSubjobOk connection.
Double-click tFileInputDelimited to define the component properties in its Basic settings view.

Click the three-dot button next to the File name/Stream field to select the source file for data insertion. In this example, it is D:/SendData.csv. Click the three-dot button next to Edit schema to set the schema manually.


Click the plus button to add four columns: Id, Email, ForeignSysPersonId and ForeignSysType.
Set the Type of Id to Integer and keep the rest as default. Then click OK to save the settings.
Type in 1 in the Header field and keep the other settings as default.
Double-click tMarketoOutput to define the component properties in its Basic settings view.

Click the Sync columns button to retrieve the schema defined in tFileInputDelimited.


Fill the Endpoint address field with the URL of the Marketo Web server. In this example, it is https://na-c.marketo.com/soap/demo/demo1. Note that the URL used in this scenario is for demonstration purposes only.
Fill the Secret key field with the encrypted authentication code assigned by Marketo. In this example, it is 1234567894DEMOONLY987654321.
Fill the Client Access ID field with the user ID. In this example, it is mktodemo1_1234567894DEMOONLY987654321.
From the Operation list, select syncMultipleLeads.
Type in the query timeout limit in the Timeout field. In this example, use the default number: 600000.
Double-click tMarketoInput to define the component properties in its Basic settings view.

From the Operation list, select getLead.
In the Columns Mapping area, type in test@talend.com in the Columns in Marketo column to set the Email column. Note that all the data used in this scenario is for demonstration purposes only.
From the LeadKey type list, select EMAIL and fill the LeadKey value field with test@talend.com.
Keep the rest of the settings the same as the corresponding settings in tMarketoOutput.
Double-click tFileOutputDelimited to define the component properties in its Basic settings view.


Click the three-dot button next to the File name field to select the local file to which the data is synchronized. In this example, it is D:/ReceiveData.csv.
Click the Sync columns button and keep the rest of the settings as default.
Double-click tJava to add code in its Basic settings view.

In the Code field, type in the following code to count the number of API calls throughout the data operations:

    System.out.println("The Number of API calls for inserting data to Marketo DB is:");
    System.out.println((Integer)globalMap.get("tMarketoOutput_1_NB_CALL"));
    System.out.println("The Number of API calls for data synchronization from Marketo DB is:");
    System.out.println((Integer)globalMap.get("tMarketoInput_1_NB_CALL"));

Save your Job and press F6 to execute it.

The inserted lead records in the Marketo DB are synchronized to D:/ReceiveData.csv.



The number of API calls throughout each data operation is displayed on the Run console.
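The counter lookup used in the tJava code can be tried outside the Studio with a minimal sketch. It assumes that globalMap behaves like a java.util.Map<String, Object>; the HashMap, the class name NbCallSketch, the helper readCounter and the value 1 are stand-ins for illustration and are not part of the generated Job code.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone simulation; in a real Job, globalMap is supplied by the generated code.
class NbCallSketch {

    // Reads an Integer counter from the map, returning 0 when the key is absent
    // or does not hold an Integer.
    static int readCounter(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        return (value instanceof Integer) ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Assumed value, for illustration only.
        globalMap.put("tMarketoOutput_1_NB_CALL", 1);

        System.out.println("API calls for inserting data: "
                + readCounter(globalMap, "tMarketoOutput_1_NB_CALL"));
        System.out.println("API calls for retrieving data: "
                + readCounter(globalMap, "tMarketoInput_1_NB_CALL"));
    }
}
```

Reading the counter through a helper avoids printing null when a component has not run yet, which can be convenient while the Job is still being designed.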


Business components
tMicrosoftCRMInput

tMicrosoftCRMInput
tMicrosoftCRMInput Properties
Component family: Business

Function: Connects to an entity of a Microsoft CRM database via the relevant webservice.

Purpose: Allows you to extract data from a MicrosoftCRM DB based on conditions set on specific columns.

Basic settings

Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The fields that come after are pre-filled in using the fetched data.

Microsoft Webservice URL: Type in the webservice URL to connect to the MicrosoftCRM DB.

Organizename: Enter the name of the user or organization, set by an administrator, that needs to access the MicrosoftCRM database.

Username and Password: Type in the Webservice user authentication data.

Domain: Type in the domain name of the server on which MicrosoftCRM is hosted.

Host: Type in the IP address of the Microsoft CRM database server.

Port: Listening port number of the Microsoft CRM database server.

Time out (seconds): Number of seconds for the port to listen before closing.

Entity: Select the relevant entity in the list.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. If you make changes, the schema automatically becomes built-in. In this component the schema is related to the selected entity.

Logical operators used to combine conditions: If you want to combine the conditions you set on columns, select the combine mode you want to use.

Conditions: Click the plus button to add as many conditions as needed. The conditions are performed one after the other for each row. Input column: Click in the cell and select the column of the input schema the condition is to be set on. Operator: Click in the cell and select the operator to bind the input column with the value. Value: Type in the column value, between quotes if need be.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.

Limitation: n/a

Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows
This scenario describes a four-component Job which aims at writing the data included in a delimited input file in a custom entity in a MicrosoftCRM database. It then extracts specified rows to an output file using the conditions set on certain input columns.
If you want to write in a CustomEntity in Microsoft CRM database, make sure to name the columns in accordance with the naming rule set by Microsoft, that is name_columnname all in lower case.
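The naming rule in the note above can be checked before a Job is run. The following Java sketch is only an approximation: the guide states the rule as name_columnname in lower case, and the regular expression, the class name and the helper followsRule are assumptions made for illustration, not a rule published by Microsoft.

```java
import java.util.regex.Pattern;

// Hypothetical check of the name_columnname convention; the pattern is an assumption.
class CustomEntityColumnNameSketch {

    // Assumed pattern: lowercase prefix, one underscore, then a lowercase column name.
    private static final Pattern RULE = Pattern.compile("[a-z][a-z0-9]*_[a-z0-9_]+");

    static boolean followsRule(String columnName) {
        return RULE.matcher(columnName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"new_id", "new_city", "new-city", "New_City"}) {
            System.out.println(name + " -> " + followsRule(name));
        }
    }
}
```

With this pattern, new_id and new_city are accepted, while new-city (hyphen instead of underscore) and New_City (upper case) are rejected.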

Drop the following components from the Palette to the design workspace: tFileInputDelimited, tFileOutputDelimited, tMicrosoftCRMInput, and tMicrosoftCRMOutput.

Connect tFileInputDelimited to tMicrosoftCRMOutput using a Row > Main connection.
Connect tMicrosoftCRMInput to tFileOutputDelimited using a Row > Main connection.
Connect tFileInputDelimited to tMicrosoftCRMInput using a Trigger > OnSubjobOk connection.

Double-click tFileInputDelimited to display its Basic settings view and define its properties.

Set the Property Type to Repository if you have stored the input file properties centrally in the Metadata node in the Repository tree view. Otherwise, select Built-In and fill the fields that follow manually. In this example, property is set to Built-In.
Click the three-dot button next to the File Name/Input Stream field and browse to the delimited file that holds the input data. The input file in this example contains the following columns: new_id, new_status, new_firstname, new_email, new_city, new_initial and new_zipcode.

In the Basic settings view, define the Row Separator that identifies the end of a row. Then define the Field Separator used to delimit fields in a row.
If needed, define the header, footer and limit number of processed rows in the corresponding fields. In this example, the header, footer and limits are not set.
Click Edit schema to open a dialog box where you can define the input schema you want to write in the Microsoft CRM database.


Click OK to close the dialog box. Double-click tMicrosoftCRMOutput to display the component Basic settings view and define its properties.

Enter the Microsoft Web Service URL as well as the user name and password in the corresponding fields. In the OrganizeName field, enter the name that is given the right to access the Microsoft CRM database. In the Domain field, enter the domain name of the server on which Microsoft CRM is hosted, and then enter the host IP address and the listening port number in the corresponding fields.

In the Action list, select the operation you want to carry out. In this example, we want to insert data in a custom entity in Microsoft CRM.
In the Time out field, set the amount of time (in seconds) after which the Job will time out.
In the Entity list, select one of the entities offered. In this example, CustomEntity is selected. If CustomEntity is selected, a Custom Entity Name field displays where you need to enter a name for the custom entity.
The Schema is then automatically set according to the entity selected. If needed, click Edit schema to display a dialog box where you can modify this schema and remove the columns that you do not need in the output.
Click Sync columns to retrieve the schema from the preceding component.

Double-click tMicrosoftCRMInput to display the component Basic settings view and define its properties.


Set the Property Type to Repository if you have stored the connection properties centrally in the Metadata node in the Repository tree view. Otherwise, select Built-In and fill the fields that follow manually. In this example, property is set to Built-In.
Enter the Microsoft Web Service URL as well as the user name and password in the corresponding fields.
In the OrganizeName field, enter the name that is given the right to access the Microsoft CRM database.
In the Domain field, enter the domain name of the server on which Microsoft CRM is hosted, and then enter the host IP address and the listening port number in the corresponding fields.
In the Time out field, set the amount of time (in seconds) after which the Job will time out.
In the Entity list, select the entity you want to connect to. In this example, CustomEntity is selected.


The Schema is then automatically set according to the entity selected. But you can modify it according to your needs. In this example, you should set the schema manually since you want to access a custom entity. Copy the seven-column schema from tMicrosoftCRMOutput and paste it in the schema dialog box in tMicrosoftCRMInput.

Click OK to close the dialog box. You will be prompted to propagate changes. Click Yes in the popup message.
In the Basic settings view, select And or Or as the logical operator you want to use to combine the conditions you set on the input columns. In this example, we want to set two conditions on two different input columns and we use And as the logical operator.
In the Condition area, click the plus button to add as many lines as needed and then click in each line in the Input column list and select the column you want to set a condition on. In this example, we want to set conditions on two columns, new_city and new_id. We want to extract all customer rows whose city is equal to New York and whose id is greater than 2.
Click in each line in the Operator list and select the operator to bind the input column with its value. In this example, Equal is selected for new_city and Greater Than for new_id.
Click in each line in the Value list and set the column value, New York for new_city and 2 for new_id in this example. You can use a fixed or a context value in this field.
Double-click tFileOutputDelimited to display the component Basic settings view and define its properties.


Set Property Type to Built-In and then click the three-dot button next to the File Name field and browse to the output file.
Set row and field separators in the corresponding fields.
Select the Append check box if you want to add the new rows at the end of the records.
Select the Include Header check box if the output file includes a header.
Click Sync columns to retrieve the schema from the preceding component.
Save the Job and press F6 to execute it.

Only customers who live in New York and whose id is greater than 2 are listed in the output file you stored locally.
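The effect of combining the two conditions with And can be sketched in plain Java. This is an illustration only, not the code generated for tMicrosoftCRMInput: the Row class, the sample rows and the filter helper are hypothetical, and only the two columns used in the conditions are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the And combination of two conditions.
class CrmConditionSketch {

    // Minimal row type holding only the two columns used in the conditions.
    static final class Row {
        final int newId;
        final String newCity;

        Row(int newId, String newCity) {
            this.newId = newId;
            this.newCity = newCity;
        }
    }

    // Keeps rows where new_city Equal "New York" AND new_id Greater Than 2.
    static List<Row> filter(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            boolean cityMatches = "New York".equals(row.newCity);
            boolean idMatches = row.newId > 2;
            if (cityMatches && idMatches) { // use || here to model the Or operator
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, "New York")); // city matches, id does not
        rows.add(new Row(3, "New York")); // both conditions match
        rows.add(new Row(4, "Boston"));   // id matches, city does not
        for (Row row : filter(rows)) {
            System.out.println(row.newId + ";" + row.newCity);
        }
    }
}
```

With And, a row must satisfy both conditions to be extracted; with Or, the first and third sample rows would be extracted as well.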


Business components
tMicrosoftCRMOutput

tMicrosoftCRMOutput
tMicrosoftCRMOutput Properties
Component family: Business

Function: Writes in an entity of a Microsoft CRM database via the relevant webservice.

Purpose: Allows you to write data into a MicrosoftCRM DB.

Basic settings

Microsoft Webservice URL: Type in the webservice URL to connect to the Microsoft CRM DB.

Organizename: Enter the name of the organization that needs to access the MicrosoftCRM database.

Username and Password: Type in the Webservice user authentication data.

Domain: Type in the domain name of the server on which MicrosoftCRM is installed.

Host: Type in the IP address of the Microsoft CRM database server.

Port: Listening port number of the Microsoft CRM database server.

Action: Select in the list the action you want to perform on the CRM data. Available actions are: insert, update, and delete.

Time out (seconds): Number of seconds for the port to listen before closing.

Entity: Select the relevant entity in the list.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: Used as an output component. An input component is required.

Limitation: n/a


Related Scenario
For a related use case, see Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows on page 39.


Business components
tMSAXInput

tMSAXInput
tMSAXInput properties
Component family: Business/Microsoft AX

Function: tMSAXInput connects to a MicrosoftAX server.

Purpose: This component allows you to extract data from a MicrosoftAX server based on a query.

Basic settings

Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The fields that come after are pre-filled in using the fetched data.

Host: Type in the IP address of the MicrosoftAX server.

Domain: Type in the domain name on which the MicrosoftAX server is hosted.

User and Password: Type in user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. If you make changes, the schema automatically becomes built-in.

Table Name: Name of the table to read.

Query: Enter your SQL query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is usually used as a start component. An output component is required.

Limitation: n/a

Related scenarios
No scenario is available for this component yet.


Business components
tMSAXOutput

tMSAXOutput
tMSAXOutput properties
Component family: Business/Microsoft AX

Function: tMSAXOutput connects to a MicrosoftAX server.

Purpose: This component allows you to write data in a MicrosoftAX server.

Basic settings

Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where properties are stored. The fields that come after are pre-filled in using the fetched data.

Host: Type in the IP address of the MicrosoftAX server.

Domain: Type in the domain name on which the MicrosoftAX server is hosted.

Username and Password: Type in user authentication data.

Table Name: Name of the table you want to connect to and write/modify data in.

Action on data: You can do any of the following operations on the data in a MicrosoftAX server: Insert: insert data. Update: update data. Insert or update: add data or update existing data. Update or insert: update existing data or create it if it does not exist. Delete: delete data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. If you make changes, the schema automatically becomes built-in.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Additional Columns: This option allows you to use Local expressions to perform actions on columns. For example, you can alter values in columns of the defined table. When you update or delete data in a column, this option provides you with other possibilities on WHERE statements through using different operators from the Operator column. Name: name of the schema column to be altered or inserted as a new column. Operator: select in the list the operator you want to use with the WHERE statement. This column is not available when you use Insert as action on data. Data type: type of data. Local expression: Type in the Local statement to be executed in order to alter or insert the relevant column data, for example row1.[row name]. Or, press Ctrl + Space and select any of the context variables available in the list. Position: select in the list Before, After or Replace following the action you want to perform on the reference column. Reference column: type in a column of reference that the component can use to place/replace the new/altered column.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is used as an output component. An input component is required.

Limitation: n/a
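The behavior of the Die on error check box can be pictured with a short Java sketch. It is a simplified model, not the component's code: the write method, which fails on empty rows, and the list of rejects are hypothetical stand-ins for the write operation and the Row > Rejects link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Die on error option and the reject flow.
class DieOnErrorSketch {

    // Simulated write step: fails for empty values to stand in for a row error.
    static void write(String row) {
        if (row.isEmpty()) {
            throw new IllegalArgumentException("empty row");
        }
    }

    // Returns the rejected rows; rethrows the first error when dieOnError is true.
    static List<String> process(List<String> rows, boolean dieOnError) {
        List<String> rejects = new ArrayList<>();
        for (String row : rows) {
            try {
                write(row);
            } catch (IllegalArgumentException e) {
                if (dieOnError) {
                    throw e;      // processing stops at the first row on error
                }
                rejects.add(row); // row is skipped and kept for the reject flow
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "", "b");
        System.out.println("Rejected rows: " + process(rows, false).size());
    }
}
```

With the option cleared (false), the error-free rows a and b are processed and the empty row is collected; with the option selected (true), the same input stops at the second row.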

Scenario 1: Inserting data in a defined table in a MicrosoftAX server


Before being able to use this component, make sure that you install and launch the MicrosoftAX server correctly.

This Java scenario describes a two-component Job that uses tMSAXOutput to insert four columns in a defined table in a MicrosoftAX server after it alters values in one of the inserted columns.


Drop tFixedFlowInput and tMSAXOutput from the Palette to the design workspace.
Connect the two components together using a Row > Main link.
Double-click tFixedFlowInput to display its Basic settings view and define the component properties.

Set Schema type to Built-in and click the three-dot button next to Edit schema to display a dialog box where you can define the input schema.
Click the plus button and add the input schema columns, three in this example: name, city and street.
Click OK to close the dialog box and accept propagating the changes when prompted by the system. The three schema columns display automatically in the Values list.
Click in the Value column and enter a value for each of the input columns.
Double-click tMSAXOutput to open its Basic settings view and define the component properties.


Set Property type to Built-in.
In the Host field, type in the IP address of the MicrosoftAX server.
In the Domain field, type in the domain name on which the MicrosoftAX server is hosted.
Enter your username and password for the server in the corresponding fields.
In the Table Name field, enter the name of the table you want to write data in, ADDRESS in this example.
In the Action on data list, select the action you want to carry out, Insert in this example.
Click Sync columns to retrieve the schema from the preceding component. In this example, we want to retrieve the three input columns: name, city and street and write the data included in the three input columns in the MicrosoftAX server without any changes.
If needed, click the three-dot button next to Edit Schema to verify the retrieved schema.
In the Additional columns list, click the plus button to add one line where you can define parameters for the new column to add to the row you want to write in the ADDRESS table.
Set a name, a data type, a position and a reference column in the corresponding columns for the line you added. In this example, we want to add a new column we call address after the street column.
Click in the Local expression column and press Ctrl + Space on your keyboard to open the context variable list and select: StringHandling.UPCASE(row2.city)+"-"+row2.street. This expression writes the city name converted to upper case, followed by a hyphen and the street name, to form the address of Bryant Park. Thus, for the city New York and the street Midtown Manhattan, the address column will contain the string: NEW YORK-Midtown Manhattan.
Save your Job and press F6 to execute it.

tMSAXOutput inserts in the ADDRESS table in the MicrosoftAX server a row that holds the three input columns, name, city and street in addition to the new address column that combines the city name and the street name.
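The Local expression used in this scenario can be tried outside the Studio with a small Java sketch. The upcase helper below is a stand-in for Talend's StringHandling.UPCASE and assumes that the routine converts every letter to upper case; the class name, the helper names and the sample values New York and Midtown Manhattan are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the Local expression from the scenario.
class LocalExpressionSketch {

    // Stand-in for StringHandling.UPCASE, assumed to upper-case the whole string.
    static String upcase(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // Mirrors the expression StringHandling.UPCASE(row2.city)+"-"+row2.street.
    static String address(String city, String street) {
        return upcase(city) + "-" + street;
    }

    public static void main(String[] args) {
        System.out.println(address("New York", "Midtown Manhattan"));
    }
}
```

Under that assumption, only the city part of the address is upper-cased, because the routine is applied to row2.city alone and the street is concatenated unchanged.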

Scenario 2: Deleting data from a defined table in a MicrosoftAX server


Before being able to use this component, make sure that you install and launch the MicrosoftAX server correctly.

This Java scenario describes a two-component Job that uses tMSAXOutput to delete from a defined table in a MicrosoftAX server all rows that do not match the data included in a key column. In this example, the input schema we use is an address column that holds the following data: New York-Midtown Manhattan. We want to delete from the MicrosoftAX server all addresses that are not identical with this one.

Drop tFixedFlowInput and tMSAXOutput from the Palette to the design workspace.
Connect the two components together using a Row > Main link.
Double-click tFixedFlowInput to display its Basic settings view and define the component properties.

Set Schema type to Built-in and click the three-dot button next to Edit schema to display a dialog box where you can define the input schema.
Click the plus button and add the input schema column, address in this example.
Click OK to close the dialog box. The schema column displays automatically in the Values list.
Click in the Value column and enter a value for the input column.


Double-click tMSAXOutput to open its Basic settings view and define the component properties.

Set Property type to Built-in.
In the Host field, type in the IP address of the MicrosoftAX server.
In the Domain field, type in the domain name on which the MicrosoftAX server is hosted.
Enter your username and password for the server in the corresponding fields.
In the Table Name field, enter the name of the table you want to delete data from, ADDRESS in this example.
In the Action on data list, select the action you want to carry out, Delete in this example.
Click Sync columns to retrieve the schema from the preceding component. In this example, we want to retrieve the input column: address.
Click the three-dot button next to Edit Schema to open a dialog box where you can verify the retrieved schema.


In the output schema, select the Key check box next to the column name you want to define as a key column.
When you select Delete as an action on data, you must always define the Reference column as a key column in order for tMSAXOutput to delete rows based on this key column.

Click OK to validate your changes and close the dialog box. In the Additional columns list, click the plus button to add one line and define the parameters the component will use as basis for the delete operation. Set a name, an operator, a data type, a local expression, a position and a reference column in the corresponding columns for the line you added. In this example, we want to delete from the ADDRESS table in the MicrosoftAX server all rows in which the address column is not equal to the address in the key address column and that reads as the following: New York-Midtown Manhattan.
When you select Delete as an action on data, you must always set Position to Replace. Otherwise, all settings in the Additional columns will not be taken into account when executing your Job.

Save your Job and press F6 to execute it. tMSAXOutput deletes from the ADDRESS table in the MicrosoftAX server all rows where the address string is not equal to the address in the key column.
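The outcome of this delete operation can be modeled with a short Java sketch. It does not reproduce how tMSAXOutput talks to the MicrosoftAX server: the in-memory list stands in for the ADDRESS table, and the class name, the helper deleteNotEqual and the sample address Paris-Montmartre are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of the "delete rows not equal to the key" operation.
class DeleteByKeySketch {

    // Removes every address that differs from the key value and
    // returns the number of deleted rows.
    static int deleteNotEqual(List<String> table, String keyAddress) {
        int before = table.size();
        table.removeIf(address -> !address.equals(keyAddress));
        return before - table.size();
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(Arrays.asList(
                "New York-Midtown Manhattan",
                "Paris-Montmartre",              // assumed sample row
                "New York-Midtown Manhattan"));
        int deleted = deleteNotEqual(table, "New York-Midtown Manhattan");
        System.out.println("Deleted rows: " + deleted + ", remaining: " + table.size());
    }
}
```

Only the rows whose address differs from the key value are removed; rows that match the key are left untouched.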


Business components
tOpenbravoERPInput

tOpenbravoERPInput
tOpenbravoERPInput properties
Component family: Business

Function: tOpenbravoERPInput connects to an OpenbravoERP database entity via the appropriate webservice.

Purpose: This component allows you to extract data from an OpenbravoERP database according to the conditions defined in specific columns.

Basic settings

Openbravo REST WebService URL: Enter the URL of the Web service that allows you to connect to the OpenbravoERP database.

Username and Password: User authentication information.

Entity: Select the appropriate entity from the drop-down list.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. If you make any changes, the schema will automatically become built-in. For this component, the schema corresponds to the selected entity.

WHERE Clause: Enter your WHERE clause.

Order by: Select this check box to define how to order the results (the elements in the drop-down list depend on the entity selected). Sort: Choose whether to organise the results in either Ascending or Descending order.

First result: Enter the row number you want to retrieve first.

Max result: Enter the maximum number of results you want to retrieve.

Advanced settings

Advanced separator (for numbers): Select this check box to modify the separators to be used for numbers: either the Thousands separator or the Decimal separator.

tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component is generally used as an input component. An output component is required.

Limitation: n/a
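The combined effect of the Order by, First result and Max result settings can be sketched in Java as a sort followed by a window over the sorted rows. This is a conceptual model only: it assumes that First result is zero-based, which the guide does not state, and the class name, the page helper and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of Order by + First result + Max result.
class OpenbravoPagingSketch {

    // Returns at most maxResult items starting at firstResult (assumed zero-based).
    static List<String> page(List<String> sorted, int firstResult, int maxResult) {
        int from = Math.min(Math.max(firstResult, 0), sorted.size());
        int to = (int) Math.min((long) from + Math.max(maxResult, 0), sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("delta", "alpha", "charlie", "bravo"));
        Collections.sort(names); // models Sort: Ascending
        System.out.println(page(names, 1, 2));
    }
}
```

In this model, a First result beyond the last row yields an empty result, and a Max result larger than the remaining rows simply returns what is left.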


Related Scenario
For a scenario in which tOpenbravoERPInput might be used, see Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows on page 39


Business components
tOpenbravoERPOutput

tOpenbravoERPOutput
tOpenbravoERPOutput properties
Component family: Business

Function: tOpenbravoERPOutput writes an object in an OpenbravoERP database via the appropriate Web service.

Purpose: This component writes data in an OpenbravoERP database.

Basic settings

Openbravo REST Webservice URL: Enter the URL of the Web service that allows you to connect to the OpenbravoERP database.

Username and Password: User authentication information.

Action on data: From the list, select one of the following actions: Update/Create or Remove.

Use existing data file: Select this check box if desired and then select the file by browsing your directory.

Entity: Select the appropriate entity from the drop-down list.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you modify the schema, it automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component.

Advanced settings

tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component is used as an output component. It requires an input component.

Limitation: n/a

Related scenario
For a scenario in which tOpenbravoERPOutput may be used, see Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows on page 39


Business components
tSageX3Input

tSageX3Input
tSageX3Input Properties
Component family: Business/Sage X3

Function: This component uses the Webservice provided by a given Sage X3 Web server to extract data from the Sage X3 system (the X3 server).

Purpose: This component extracts data from a given Sage X3 system.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are pre-filled using the fetched data.

Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. If you make any changes, the schema automatically becomes built-in.

Endpoint address: Type in the address of the Webservice provided by the given Sage X3 Web server.

Username and Password: Type in the Webservice user authentication data that you defined when configuring the Sage X3 Web server.

Language: Type in the X3 language code used to start a connection group.

Pool alias: Type in the name of the connection pool that distributes the received requests to the available connections. This name is given in the Sage X3 configuration console.

Request config: Type in the configuration string if you want to retrieve debug or trace information. For example, the string could be: RequestConfigDebug=adxwss.trace.on=on; If you need to use several strings, separate them with an &, for example: RequestConfigDebug=adxwss.trace.on=on&adxwss.trace.size=16384; A third-party tool is needed to retrieve this kind of information.

Publication name: Type in the publication name of the published object, list or sub-program you want your Studio to access.

Mapping: Complete this table to map the variable elements of the object, the sub-program or the list set in the given Sage X3 Web server. The columns to be completed include:
Column: the columns defined in the schema editor for this component.
Group ID: the identifier of each variable element group. For example, a variable element group could represent one of the attributes of an object.
Field name: the field name of each variable element.

Query condition: Select this check box to set up the query condition(s). The columns to be completed include:
Key: the names of the variable elements used as the key for data extraction.
Value: the value of the given key field used to extract the corresponding data.

Limit: Type in a number to indicate the maximum number of rows to be extracted.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.

Limitation: n/a
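The Request config format described above (settings joined with & and closed with a semicolon) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class RequestConfigSketch and its build method are hypothetical helpers, not part of Talend or Sage X3, and the two settings are the ones quoted in the Request config description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Talend or Sage X3.
class RequestConfigSketch {

    // Joins "name=value" settings with '&', prefixes the result with
    // "RequestConfigDebug=" and terminates it with ';'.
    static String build(Map<String, String> settings) {
        StringJoiner joined = new StringJoiner("&", "RequestConfigDebug=", ";");
        for (Map.Entry<String, String> setting : settings.entrySet()) {
            joined.add(setting.getKey() + "=" + setting.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in insertion order.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("adxwss.trace.on", "on");
        settings.put("adxwss.trace.size", "16384");
        System.out.println(build(settings));
    }
}
```

The resulting string is what you would type in the Request config field. Whether the Sage X3 Web server accepts other settings is not covered here.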

Scenario: Using query key to extract data from a given Sage X3 system
This scenario describes a two-component Job used to extract one row of data from a given Sage X3 system. The object method is called, which means that the variable elements of this object are attributes. The data used in this scenario can be found in the example provided by Sage X3.

Drop the tSageX3Input and tLogRow components from the Palette onto the workspace.
Connect the tSageX3Input component to the tLogRow component using a Row > Main link.
Double-click tSageX3Input to set its Basic settings in the Component view.
In the Schema field, select Built-In.
Click the three-dot button next to Edit schema to open the schema editor.


In this editor, click the plus button beneath the schema table 12 times to add 12 rows to the table.
Type in the names you want to use for each row. In this example, the rows are named after the publication names of the object attributes set in the Sage X3 Web server. These columns are used to map the corresponding attribute fields in the Sage X3 system.
In the Type column, click the IMG row to display its drop-down list.
From the drop-down list, select List, as this attribute appears two or more times.
For the same reason, switch the types of the TIT2NBLIG, ITMLNK and ZITMLNK rows to List as well.
Click OK to validate this change and accept the propagation prompted by the pop-up dialog box.
In the Endpoint address field, type in the URL of the Webservice provided by the Sage X3 Web server. In this example, it is http://10.42.20.168:28880/adxwsvc/services/CAdxWebServiceXmlCC
In the User field, type in the user name for the given Sage X3 system. In this example, it is ERP.
In the Language field, type in the X3 language code used to start a connection group. In this example, it is FRA.
In the Pool alias field, type in the name of the connection pool to be used. In this example, the connection pool is called TALEND.
In the Publication name field, type in the publication name of the object to be called. In this scenario, the publication name is ITMDET.
In the Group ID column and the Field name column of the Mapping table, type in the values corresponding to the attribute group IDs and the attribute publication names defined in the Sage X3 Web server. In this example, the values are presented in the figure below.


In the Mapping table, the Column column has been filled automatically with the columns you created in the schema editor.

Select the Query condition check box to activate the Conditions table. Under the Conditions table, click the plus button to add one row into the table. In the Key column, type in the publication name associated with the object attribute you need to extract data from. In the Value column, type in the value of the attribute you have selected as the key of the data extraction. In this scenario, it is CONTS00059, one of the product references. Select Built-In as the Schema and click [...] next to Edit schema to open the schema editor.

Press F6 to run the Job. The results are displayed in the Run view.
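To make the roles of the Key, Value and Limit settings in this scenario more concrete, the following Java sketch applies the same idea to rows held in memory. It is an analogy only and does not show how the Sage X3 Webservice evaluates a query. The class QueryConditionSketch, the field name PRODUCT_REF and the references other than CONTS00059 are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory analogy for illustration; not the Sage X3 Webservice logic.
class QueryConditionSketch {

    // Returns at most `limit` rows whose `key` field equals `value`.
    static List<Map<String, String>> select(List<Map<String, String>> rows,
                                            String key, String value, int limit) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (matches.size() >= limit) {
                break;
            }
            if (value.equals(row.get(key))) {
                matches.add(row);
            }
        }
        return matches;
    }

    // Builds a one-field row; PRODUCT_REF is a hypothetical field name.
    static Map<String, String> row(String productRef) {
        Map<String, String> row = new HashMap<>();
        row.put("PRODUCT_REF", productRef);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("CONTS00058")); // invented reference
        rows.add(row("CONTS00059")); // the reference used in the scenario
        rows.add(row("CONTS00060")); // invented reference
        // Key = PRODUCT_REF, Value = CONTS00059, Limit = 1
        System.out.println(select(rows, "PRODUCT_REF", "CONTS00059", 1));
    }
}
```

In the scenario itself, the key is the publication name of the attribute you typed in the Key column, and the single row returned is the one whose value is CONTS00059.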


tSageX3Output
tSageX3Output Properties
Component family: Business/Sage X3

Function: This component connects to the Webservice provided by a given Sage X3 Web server and from there inserts, updates or deletes data in the Sage X3 system (the X3 server).

Purpose: This component writes data into a given Sage X3 system.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are pre-filled using the fetched data.

Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Click Sync columns to retrieve the schema from the previous component connected in the Job. If you make any changes, the schema automatically becomes built-in.

Endpoint address: Type in the address of the Webservice provided by the given Sage X3 Web server.

Username and Password: Type in the Webservice user authentication data that you defined when configuring the Sage X3 Web server.

Language: Type in the X3 language code used to start a connection group.

Pool alias: Type in the name of the connection pool that distributes the received requests to the available connections. This name is given in the Sage X3 configuration console.

Request config: Type in the configuration string if you want to retrieve debug or trace information. For example, the string could be: RequestConfigDebug=adxwss.trace.on=on; If you need to use several strings, separate them with an &, for example: RequestConfigDebug=adxwss.trace.on=on&adxwss.trace.size=16384; A third-party tool is needed to retrieve this kind of information.

Publication name: Type in the publication name of the published object, list or sub-program you want your Studio to access.

Action: You can do any of the following operations on the data in a Sage X3 system:
Insert: insert data
Update: update data
Delete: delete data

Mapping: Complete this table to map the variable elements of the object, the list or the sub-program your Studio accesses. Select and type in only the elements on which you need to perform the data action. The columns to be completed include:
Column: the columns defined in the schema editor for this component.
Key: the variable element used as the key for data insertion, update or deletion. Select the corresponding check box if a variable element is the key.
Group ID: the identifier of each variable element group. For example, a variable element group could represent one of the attributes of an object.
Field name: the field name of each selected variable element.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as an output component. An input component is required.

Limitation: n/a

Scenario: Using a Sage X3 Webservice to insert data into a given Sage X3 system
This scenario describes a two-component Job used to generate one row of data and insert the data into a given Sage X3 system. You can find the data used in this scenario in the example provided by Sage X3. The Sage X3 Webservice is used to access an object.

Drop the tFixedFlowInput and tSageX3Output components from the Palette onto the workspace.
Connect the tFixedFlowInput component to the tSageX3Output component using a Row > Main link.
Double-click the tFixedFlowInput component to set its Basic settings in the Component view.
Click the three-dot button next to Edit schema to open the schema editor.

In the schema editor, click the plus button beneath the schema table four times to add four rows.
Click OK to validate these changes and then accept the propagation prompted by the pop-up dialog box. The four rows appear automatically in the Values table of the Component view.
In the Values table within the Mode area, type in the value for each of the four rows in the Value column. In this scenario, the values from top to bottom are: CONTS00059, Screen 24\" standard 16/10, Screen 24\" standard 28/10, 2.
The values in the Value column must be enclosed in quotation marks.
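The quoting rule can be illustrated with plain Java string literals. The sketch below assumes a Java project in which each Value cell is read as a Java string expression, which is why the inch mark inside the product label is escaped as \". The class name FixedFlowValuesSketch is hypothetical and only serves the example.

```java
// Illustration only: the four scenario values written as Java string literals.
class FixedFlowValuesSketch {

    static final String[] VALUES = {
        "CONTS00059",
        "Screen 24\" standard 16/10", // \" is one double-quote character
        "Screen 24\" standard 28/10",
        "2"                            // typed as text, hence the quotation marks
    };

    public static void main(String[] args) {
        for (String value : VALUES) {
            System.out.println(value);
        }
    }
}
```

Without the backslash, the quotation mark after 24 would end the string early and the expression would no longer compile.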

Double-click tSageX3Output to set its Basic settings in the Component view.
In the Endpoint address field, type in the URL of the Webservice provided by the Sage X3 Web server. In this example, it is http://10.42.20.168:28880/adxwsvc/services/CAdxWebServiceXmlCC
In the User field, type in the user name for the given Sage X3 system. In this example, it is ERP.
In the Language field, type in the X3 language code used to start a connection group. In this example, it is FRA.
In the Pool alias field, type in the name of the connection pool to be used. In this example, the connection pool is called TALEND.
In the Publication name field, type in the publication name of the object to be called. In this scenario, the publication name is ITMDET.
In the Action field, select Insert from the drop-down list.
In the Field name column of the Mapping table, type in the field names of the attributes on which the selected data action is performed.
In the Group ID column of the Mapping table, type in the values corresponding to the group IDs of the selected attributes. These IDs are defined in the Sage X3 Web server.


In the Mapping table, the Column column has been filled automatically with the columns retrieved from the schema of the preceding component.

Press F6 to run the Job. To verify the data that you inserted in this scenario, you can use the tSageX3Input component to read the concerned data from the Sage X3 server. For further information about how to use the tSageX3Input component to read data, see Scenario: Using query key to extract data from a given Sage X3 system on page 60.


tSalesforceBulkExec
tSalesforceBulkExec Properties
The tSalesforceOutputBulk and tSalesforceBulkExec components are used together to output the needed file and then execute the intended actions on that file in Salesforce.com. These two steps are combined in the tSalesforceOutputBulkExec component, detailed in a separate section. The advantage of having two separate components is that transformations can be carried out before the data is loaded.
Component family: Business

Function: tSalesforceBulkExec executes the intended actions on the prepared bulk data.

Purpose: As a dedicated component, tSalesforceBulkExec gains performance while carrying out the intended data operations in Salesforce.com.

Basic settings

Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.

Salesforce Webservice URL: Type in the Webservice URL to connect to the Salesforce DB.

Username and Password: Type in the Webservice user authentication data.

Salesforce Version: Type in the version of Salesforce you are using.

Bulk file path: Directory where the bulk data you need to process is stored.

Action: You can do any of the following operations on the data of the Salesforce object:
Insert: insert data.
Update: update data.
Upsert: update and insert data.

Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings

Rows to commit: Specify the number of lines per data batch to be processed.

Bytes to commit: Specify the number of bytes per data batch to be processed.

Use Socks Proxy: Select this check box if you want to use a proxy server. Once selected, you need to provide the connection parameters: host, port, username and password.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Used as an output component. An input component is required.

Limitation: The bulk data to be processed should be in .csv format.
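The effect of a Rows to commit value can be pictured as splitting the bulk rows into batches of at most that many lines. The Java sketch below illustrates the idea only. It is not the component's implementation, the class RowsToCommitSketch is hypothetical, and the Bytes to commit limit is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of row-count batching; not taken from the component's code.
class RowsToCommitSketch {

    // Splits rows into consecutive batches of at most rowsToCommit elements.
    static <T> List<List<T>> batches(List<T> rows, int rowsToCommit) {
        if (rowsToCommit < 1) {
            throw new IllegalArgumentException("rowsToCommit must be at least 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += rowsToCommit) {
            int to = Math.min(from + rowsToCommit, rows.size());
            result.add(new ArrayList<>(rows.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("line " + i); // placeholder .csv lines
        }
        // With 10 rows and a batch size of 4, the last batch is shorter.
        for (List<String> batch : batches(rows, 4)) {
            System.out.println(batch.size() + " rows: " + batch);
        }
    }
}
```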

Related Scenario:
For a related scenario, see Scenario: Inserting transformed bulk data into your Salesforce.com on page 90.


tSalesforceConnection
tSalesforceConnection properties
Component family: Business

Function: tSalesforceConnection opens a connection to a Salesforce system in order to carry out a transaction.

Purpose: The component enables connection to a Salesforce system.

Basic settings

Salesforce Webservice URL: Enter the Webservice URL required to connect to the Salesforce database.

Username and Password: Enter your Web service authentication details.

Timeout (milliseconds): Type in the query timeout to be applied in Salesforce.com.

For salesforce bulk component: Select this check box if you use bulk data processing components from the Salesforce family. Once selected, the Salesforce Version field appears, in which you need to enter the Salesforce version you are using. For more information on these bulk data processing components, see tSalesforceOutputBulk on page 90, tSalesforceBulkExec on page 69 and tSalesforceOutputBulkExec on page 95.

Use Soap Compression: Select this check box if you want to activate SOAP compression. The compression of SOAP messages results in increased performance levels.

Use Socks Proxy: Select this check box if you want to use a proxy. Once selected, you need to type in the connection parameters in the fields that appear. These parameters are the host, the port, the username and the password of the proxy you need to use.

Advanced settings

tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component is normally used with Salesforce components.

Limitation: n/a

Related scenario
For further information regarding the usage of tSalesforceConnection, see tMysqlConnection on page 594.


tSalesforceGetDeleted
tSalesforceGetDeleted properties
Component family: Business

Function: tSalesforceGetDeleted recovers deleted data from a Salesforce object over a given period of time.

Purpose: This component can collect the deleted data from a Salesforce object during a specific period of time.

Basic settings

Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.

Salesforce Webservice URL: Type in the Webservice URL to connect to the Salesforce DB.

Username and Password: Type in the Webservice user authentication data.

Timeout (milliseconds): Type in the query timeout to be applied in Salesforce.com.

Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Start Date: Type in, between double quotes, the date at which you want to start the search. Use the following date format: yyyy-MM-dd HH:mm:ss. You can search only over the past 30 days.

End Date: Type in, between double quotes, the date at which you want to end the search. Use the following date format: yyyy-MM-dd HH:mm:ss.

Advanced settings

Use Soap Compression: Select this check box to activate SOAP compression. The compression of SOAP messages optimizes system performance.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: You can use this component as an output component. tSalesforceGetDeleted requires an input component.

Limitation: n/a

Scenario: Recovering deleted data from the Salesforce server


This scenario describes a two-component Job that collects the deleted data over the past 5 days from the Salesforce server.

Drop tSalesforceGetDeleted and tLogRow from the Palette onto the design workspace. Connect the two components together using a Row > Main link. Double-click tSalesforceGetDeleted to display its Basic settings view and define the component properties.


In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access.
In the Username and Password fields, enter your login and password for the Web service.
From the Module list, select the object you want to access, Account in this example.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually.
In the Start Date and End Date fields, enter respectively the start and end dates for collecting the deleted data using the following date format: yyyy-MM-dd HH:mm:ss. You can collect deleted data over the past 30 days. In this example, we want to recover deleted data over the past 5 days.
Double-click tLogRow to display its Basic settings view and define the component properties.
Click Sync columns to retrieve the schema from the preceding component.
In the Mode area, select Vertical to display the results in a tabular form on the console.
Save your Job and press F6 to execute it.


Deleted data collected by the tSalesforceGetDeleted component is displayed in a tabular form on the console.
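The Start Date and End Date values used in this scenario (a window covering the past 5 days, in the yyyy-MM-dd HH:mm:ss format) can be computed rather than typed by hand. The following Java sketch shows one way to do so. It assumes a Java 8 or later runtime for java.time, and the class DeletedWindowSketch and its window method are hypothetical names introduced for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustration only: builds the two date strings for a look-back window.
class DeletedWindowSketch {

    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns {start, end}: end is `now`, start is `daysBack` days earlier.
    static String[] window(LocalDateTime now, int daysBack) {
        LocalDateTime start = now.minusDays(daysBack);
        return new String[] { FORMAT.format(start), FORMAT.format(now) };
    }

    public static void main(String[] args) {
        String[] dates = window(LocalDateTime.now(), 5);
        // The fields expect the dates between double quotes.
        System.out.println("Start Date: \"" + dates[0] + "\"");
        System.out.println("End Date:   \"" + dates[1] + "\"");
    }
}
```

On an older runtime, java.text.SimpleDateFormat accepts the same pattern string.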


tSalesforceGetServerTimestamp
tSalesforceGetServerTimestamp properties
Component family: Business

Function: tSalesforceGetServerTimestamp retrieves the current date of the Salesforce server.

Purpose: This component retrieves the current date of the Salesforce server presented in a timestamp format.

Basic settings

Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.

Salesforce Webservice URL: Type in the Webservice URL to connect to the Salesforce DB.

Username and Password: Type in the Webservice user authentication data.

Timeout (milliseconds): Type in the query timeout to be applied in Salesforce.com.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings

Use Socks Proxy: Select this check box if you want to use a proxy server. Once selected, you need to enter the connection parameters: the host, the port, the username and the password of the proxy you need to use.

Use Soap Compression: Select this check box to activate SOAP compression. The compression of SOAP messages optimizes system performance.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: You can use this component as an output component. tSalesforceGetServerTimestamp requires an input component.

Limitation: n/a

Related scenarios
No scenario is available for this component yet.


tSalesforceGetUpdated
tSalesforceGetUpdated properties
Component family: Business

Function: tSalesforceGetUpdated recovers updated data from a Salesforce object over a given period of time.

Purpose: This component can collect all updated data from a given Salesforce object during a specific period of time.

Basic settings

Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.

Salesforce Webservice URL: Type in the Webservice URL to connect to the Salesforce DB.

Username and Password: Type in the Webservice user authentication data.

Timeout (milliseconds): Type in the query timeout to be applied in Salesforce.com.

Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Start Date: Type in, between double quotes, the date at which you want to start the search. Use the following date format: yyyy-MM-dd HH:mm:ss. You can search only over the past 30 days.

End Date: Type in, between double quotes, the date at which you want to end the search. Use the following date format: yyyy-MM-dd HH:mm:ss.

Advanced settings

Use Soap Compression: Select this check box to activate SOAP compression. The compression of SOAP messages optimizes system performance.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: You can use this component as an output component. tSalesforceGetUpdated requires an input component.

Limitation: n/a
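Because the Start Date must follow the expected date format and cannot reach back more than 30 days, it can be useful to check a candidate value before running the Job. The Java sketch below shows such a check under stated assumptions: it relies on java.time (Java 8 or later), the class SearchDateCheckSketch is hypothetical, and the test for a start date that is not in the future is an extra assumption of the sketch rather than a documented rule.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Illustration only: pre-checks a Start Date string before it is used.
class SearchDateCheckSketch {

    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final int MAX_DAYS_BACK = 30;

    // True if text matches the format and lies within the last 30 days of now.
    static boolean isUsableStartDate(String text, LocalDateTime now) {
        LocalDateTime start;
        try {
            start = LocalDateTime.parse(text, FORMAT);
        } catch (DateTimeParseException e) {
            return false; // wrong format
        }
        // Assumption of this sketch: a start date in the future is rejected.
        return !start.isBefore(now.minusDays(MAX_DAYS_BACK)) && !start.isAfter(now);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        String candidate = FORMAT.format(now.minusDays(7));
        System.out.println(candidate + " usable: " + isUsableStartDate(candidate, now));
    }
}
```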

Related scenarios
No scenario is available for this component yet.


tSalesforceInput
tSalesforceInput Properties
Component family: Business

Function: tSalesforceInput connects to an object of a Salesforce database via the relevant Webservice.

Purpose: Allows you to extract data from a Salesforce DB based on a query.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are pre-filled using the fetched data.
Click this icon to open a connection wizard and store the connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema of Talend Open Studio User Guide.

Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.

Salesforce Webservice URL: Type in the Webservice URL to connect to the Salesforce DB.

Username and Password: Type in the Webservice user authentication data.

Timeout (milliseconds): Type in the query timeout to be applied in Salesforce.com.

Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
In this component, the schema is related to the Module selected. To retrieve a column from a linked module, it is necessary to define the column in a particular manner in the Edit schema view, otherwise the relationship query will not work. The correct syntax is: NameofCurrentModule_NameofLinkedModule_NameofColumnofInterest.

Query condition: Type in the query to select the data to be extracted. Example: account_name= Talend

Manual input of SOQL query: Select this check box to display the Query field where you can manually enter the desired query.

Query all records (include deleted records): Select this check box to query all the records, including the deleted ones.

Advanced settings

Batch Size: Number of records in each processed batch.

Use Socks Proxy: Select this check box if you want to use a proxy server. Once selected, you need to enter the connection parameters: the host, the port, the username and the password of the proxy you need to use.

Normalize delimited (for child relationship): Characters, strings or regular expressions used to normalize the data that is collected by queries set on different hierarchical Salesforce objects.

Column name delimiter (for child relationship): Characters, strings or regular expressions used to separate the name of the parent object from the name of the child object when you use a query on the hierarchical relations among the different Salesforce objects.

Use Soap Compression: Select this check box to activate SOAP compression. The compression of SOAP messages optimizes system performance, in particular for batch operations.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.

Limitation: n/a


Scenario: Using queries to extract data from a Salesforce database


This scenario describes a four-component Job used to extract specific sets of data from parent and child objects in a Salesforce database.

Drop two tSalesforceInput components and two tLogRow components onto the workspace.
Connect each tSalesforceInput component to a tLogRow component using a Row > Main link for each connection.
Connect tSalesforceInput_1 to tSalesforceInput_2 using an OnSubjobOk link.
Double-click tSalesforceInput_1 to set its Basic settings in the Component tab.

As the Property Type, select Built-In.
Enter the Salesforce WebService URL of the database you want to connect to in the corresponding field.
Enter your authentication information in the corresponding Username and Password fields.
Enter the desired query Timeout (milliseconds) limit.
Select the Module (Salesforce object) you want to query.


Select the Manual input of SOQL Query check box to enter your Query in the corresponding field. Enter your query or relationship query, respecting the SOQL syntax required. In this example, the IsWon and FiscalYear columns in the query are located in the Opportunity module specified. The Name column is in a linked module called Account. To return a column from a linked module the correct syntax is to enter the name of the linked module, followed by the period character, then the name of the column of interest. Hence, the query required in this example is: SELECT IsWon, FiscalYear, Account.Name FROM Opportunity.
To retrieve a column from a linked module, you must name the column in a particular manner in the Edit schema view. The required syntax is: NameofCurrentModule_NameofLinkedModule_NameofColumnofInterest. Hence, in this example, the column must be named Opportunity_Account_Name. If this syntax is not respected, the data from the linked module will not be returned.
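The naming rule above can be sketched in Java as follows. This is only an illustration: the class LinkedColumnNames and its two helper methods are hypothetical and are not part of Talend Open Studio or of any Salesforce API. They simply assemble the two spellings used in this scenario, the dotted form for the SOQL query and the underscore form for the schema column.

```java
// Illustrative sketch only: these helpers are hypothetical, not a Talend or Salesforce API.
final class LinkedColumnNames {

    /** Field reference as written inside the SOQL query, e.g. Account.Name. */
    static String soqlField(String linkedModule, String column) {
        return linkedModule + "." + column;
    }

    /** Column name as expected in the Edit schema view, e.g. Opportunity_Account_Name. */
    static String schemaColumn(String currentModule, String linkedModule, String column) {
        return String.join("_", currentModule, linkedModule, column);
    }

    public static void main(String[] args) {
        // Query text from the scenario, rebuilt from its parts.
        String query = "SELECT IsWon, FiscalYear, " + soqlField("Account", "Name")
                + " FROM Opportunity";
        System.out.println(query);
        // Matching schema column name for the linked Name column.
        System.out.println(schemaColumn("Opportunity", "Account", "Name"));
    }
}
```

The same helper applied to the Case module would give Case_Account_Name for a query that selects Account.Name from Case.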

Select Built-In as the Schema and click [...] next to Edit schema to open the schema editor.

Edit the schema as required using the [+] and [x] buttons.
Add a new column for the fields taken from the Name column in the Account module.
Name this column Opportunity_Account_Name.
Click OK to save the changes and close the schema editor.
Double-click tSalesforceInput_2 to set its Basic settings in the Component tab.


As the Property Type, select Built-In.
Enter the Salesforce WebService URL of the database you want to connect to in the corresponding field.
Enter your authentication information in the corresponding Username and Password fields.
Enter the desired query Timeout (milliseconds) limit.
Select the Module (salesforce object) you want to query.
Select the Manual input of SOQL Query check box to enter your Query in the corresponding field.
Enter your query or relationship query, respecting the required SOQL syntax. In this example, we want to extract the Id and CaseNumber fields from the Case module as well as the Name field from the Account module. The query is therefore:
SELECT Id, CaseNumber, Account.Name FROM Case
Select Built-In as the Schema and click [...] next to Edit schema to open the schema editor.

Edit the schema as required using the [+] and [x] buttons.
Add a new column for the fields taken from the Name column in the Account module.


Name this column Case_Account_Name.
Click OK to save the changes and close the schema editor.
Click each tLogRow component and set its Basic settings as desired. In this example, there is no need to modify the tLogRow settings.
Press F6 to run the Job. The results are displayed in the Run tab.


tSalesforceOutput
tSalesforceOutput Properties
Component family: Business
Function: tSalesforceOutput writes in an object of a Salesforce database via the relevant web service.
Purpose: Allows you to write data into a Salesforce DB.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data is stored centrally. Repository: Select the Repository file where Properties are stored. The fields that follow are pre-filled using fetched data.
Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71. When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.
Salesforce Webservice URL: Type in the web service URL to connect to the Salesforce DB.
Username and Password: Type in the web service user authentication data.
Timeout (milliseconds): Type in the intended query timeout in Salesforce.com.
Action: You can do any of the following operations on the data of the Salesforce object: Insert: insert data. Update: update data. Delete: delete data. Upsert: update and insert data.
Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings
Extended Output: This check box is selected by default. It allows output data to be transferred in batches. You can specify the number of lines per batch in the Rows to commit field.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
Error logging file: If you want to create a file that holds all error logs, click the three-dot button next to this field and browse to the specified file to set its access path and its name.
Use Socks Proxy: Select this check box if you want to use a proxy server. Once selected, you need to enter the connection parameters, that is, the host, the port, the username and the password of the proxy you need to use.
Use Soap Compression: Select this check box to activate the SOAP compression. The compression of SOAP messages optimizes system performance.
Retrieve inserted ID: Select this check box to allow Salesforce.com to return the Salesforce ID produced for a new row that is to be inserted. The ID column is added to the processed data schema in Salesforce.com. This option is available only when you have chosen the Insert action and are not in batch mode, i.e. when the Extended Output option is not selected.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: Used as an output component. An Input component is required.
Limitation: n/a
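
As a rough illustration of what transferring output data in batches means for the Extended Output option, the following Java sketch splits a list of rows into groups of at most a given size, the way the Rows to commit value bounds each batch. The class BatchSketch and its method are hypothetical; this is not the component's actual implementation and no Salesforce call is made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a hypothetical helper, not Talend's implementation.
final class BatchSketch {

    /** Splits rows into consecutive batches holding at most rowsToCommit rows each. */
    static <T> List<List<T>> toBatches(List<T> rows, int rowsToCommit) {
        if (rowsToCommit <= 0) {
            throw new IllegalArgumentException("rowsToCommit must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsToCommit) {
            int end = Math.min(start + rowsToCommit, rows.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Five made-up rows sent in batches of two: batch sizes are 2, 2 and 1.
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        for (List<String> batch : toBatches(rows, 2)) {
            System.out.println(batch.size() + " row(s): " + batch);
        }
    }
}
```

Fewer, larger batches mean fewer round trips to the web service, which is the usual motivation for batch mode.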

Scenario: Deleting data from the Account object


This scenario describes a two-component Job that removes an entry from the Account object.


Drop tSalesforceInput and tSalesforceOutput from the Palette onto the design workspace.
Connect the two components together using a Row > Main link.
Double-click tSalesforceInput to display its Basic settings view and define the component properties.

From the Property Type list, select Repository if you have already stored the connection to the Salesforce server in the Metadata node of the Repository tree view. The property fields that follow are automatically filled in. If you have not defined the server connection locally in the Repository, fill in the details manually after selecting Built-in from the Property Type list. For more information about metadata, see Managing Metadata.
In the Salesforce WebService URL field, use the default URL of the Salesforce Web service, enter the URL you want to access, or select the Use an existing connection check box to use an established connection.
In the Username and Password fields, enter your login and password for the Web service.
Type in your intended query timeout in the Timeout (milliseconds) field. In this example, use the default number.
From the Module list, select the object you want to access, Account in this example.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually.


In the Query Condition field, enter the query you want to apply. In this example, we want to retrieve the clients whose names are sForce. To do this, we use the query: name=sForce.
For a more advanced query, select the Manual input of SOQL query check box and enter the query manually.
Double-click tSalesforceOutput to display its Basic settings view and define the component properties.

In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access.
In the Username and Password fields, enter your login and password for the Web service.
Type in your intended query timeout in the Timeout (milliseconds) field. In this example, use the default number.
From the Action list, select the operation you want to carry out. In this example, we select Delete to delete the sForce account selected in the previous component.
From the Module list, select the object you want to access, Account in this example.
Click Sync columns to retrieve the schema of the preceding component.
Save your Job and press F6 to execute it.
Check the content of the Account object and verify that the sForce accounts have been deleted from the server.


tSalesforceOutputBulk
tSalesforceOutputBulk Properties
The tSalesforceOutputBulk and tSalesforceBulkExec components are used together to output the needed file and then execute the intended actions on that file in Salesforce.com. These two steps make up the tSalesforceOutputBulkExec component, detailed in a separate section. The advantage of having two separate components is that transformations can be carried out before the data is loaded.
Component family: Business
Function: tSalesforceOutputBulk generates files in a format suitable for bulk processing.
Purpose: Prepares the file to be processed by tSalesforceBulkExec for execution in Salesforce.com.

Basic settings
File Name: Type in the directory where you store the generated file.
Append: Select the check box to write new data at the end of the existing data. Otherwise, the existing data will be overwritten.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is intended for use along with the tSalesforceBulkExec component. Used together, they gain performance while feeding or modifying information in Salesforce.com.
Limitation: n/a

Scenario: Inserting transformed bulk data into your Salesforce.com


This scenario describes a six-component Job that transforms .csv data into a form suitable for bulk processing, loads it into Salesforce.com and then displays the Job execution results in the console.


This Job is composed of two steps: preparing data by transformation and processing the transformed data.
Before starting this scenario, you need to prepare the input file that provides the data to be processed by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
Then, to create and execute this Job, proceed as follows:
Drop tFileInputDelimited, tMap, tSalesforceOutputBulk, tSalesforceBulkExec and two tLogRow components from the Palette onto the workspace of your studio.
Use a Row > Main link to connect tFileInputDelimited to tMap, and a Row > out1 link from tMap to tSalesforceOutputBulk.
Use Row > Main and Row > Reject links to connect tSalesforceBulkExec to the two tLogRow components respectively.
Use a Trigger > OnSubjobOk link to connect tFileInputDelimited and tSalesforceBulkExec.
Double-click tFileInputDelimited to display its Basic settings view and define the component properties.


From the Property Type list, select Repository if you have already stored the connection to the salesforce server in the Metadata node of the Repository tree view. The property fields that follow are automatically filled in. If you have not defined the server connection locally in the Repository, fill in the details manually after selecting Built-in from the Property Type list. For more information about metadata, see Managing Metadata.
Next to the File name/Stream field, click the button to browse to the input file you prepared for the scenario, for example, sforcebulk.txt.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this scenario, the schema is made of four columns: Name, ParentId, Phone and Fax.

Set the other fields, such as Row Separator and Field Separator, according to the input file used by the Job.
Double-click the tMap component to open its editor and set the transformation.
Drop all columns from the input table to the output table.

Append .toUpperCase() to the expression of the Name column in the output table.
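
The effect of this tMap setting can be sketched in plain Java. The sketch below is only an approximation under stated assumptions: the Row class is a hypothetical stand-in for the four-column schema, row1 is an assumed name for the tMap input connection, and the sample values are made up. It is not the code that the studio generates.

```java
// Illustrative sketch only: Row is a hypothetical stand-in for the Job schema.
final class TMapUpperCaseSketch {

    /** One record of the four-column schema: Name, ParentId, Phone and Fax. */
    static final class Row {
        final String name, parentId, phone, fax;

        Row(String name, String parentId, String phone, String fax) {
            this.name = name;
            this.parentId = parentId;
            this.phone = phone;
            this.fax = fax;
        }
    }

    /** Copies every column unchanged except Name, which is converted to upper case. */
    static Row toOutput(Row row1) {
        // Comparable to an output expression such as row1.Name.toUpperCase() in tMap.
        // toUpperCase() throws a NullPointerException if the name is null.
        return new Row(row1.name.toUpperCase(), row1.parentId, row1.phone, row1.fax);
    }

    public static void main(String[] args) {
        Row in = new Row("Acme Ltd", "", "555-0100", "555-0101"); // made-up sample values
        Row out = toOutput(in);
        System.out.println(out.name + ";" + out.parentId + ";" + out.phone + ";" + out.fax);
    }
}
```

If the Name column can be empty in the input file, a null check around the expression would be a reasonable precaution.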



Click OK to validate the transformation.
Double-click tSalesforceOutputBulk to display its Basic settings view and define the component properties.

In the File Name field, type in or browse to the directory where you want to store the generated .csv data for bulk processing.
Click Sync columns to import the schema from the preceding component.
Double-click tSalesforceBulkExec to display its Basic settings view and define the component properties.

Use the default URL of the Salesforce Web service or enter the URL you want to access.
In the Username and Password fields, enter your login and password for the Web service.
In the Bulk file path field, browse to the directory where the .csv file generated by tSalesforceOutputBulk is stored.
From the Action list, select the action you want to carry out on the prepared bulk data. In this use case, select insert.
From the Module list, select the object you want to access, Account in this example.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this example, edit it so that it conforms to the schema defined previously.


Double-click tLogRow_1 to display its Basic settings view and define the component properties.

Click Sync columns to retrieve the schema from the preceding component.
Select Table mode to display the execution result.
Do the same with tLogRow_2.
Save your Job and press F6 to execute it.
On the console of the Run view, you can check the execution result.

The tLogRow_1 table shows the data inserted into Salesforce.com. The tLogRow_2 table shows the data rejected because of incompatibility with the Account object you accessed. All the customer names are written in upper case.


tSalesforceOutputBulkExec
tSalesforceOutputBulkExec Properties
The tSalesforceOutputBulk and tSalesforceBulkExec components are used together to output the needed file and then execute the intended actions on that file in Salesforce.com. These two steps make up the tSalesforceOutputBulkExec component, detailed in a separate section. The advantage of having two separate components is that transformations can be carried out before the data is loaded.
Component family: Business
Function: tSalesforceOutputBulkExec executes the intended actions on the .csv bulk data for Salesforce.com.
Purpose: As a dedicated component, tSalesforceOutputBulkExec gains performance while carrying out the intended data operations in Salesforce.com.

Basic settings
Use an existing connection: Select this check box to use an established connection from tSalesforceConnection. Once you select it, the Component list field appears, allowing you to choose the tSalesforceConnection component to be used. For more information on tSalesforceConnection, see section tSalesforceConnection on page 71. When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.
Salesforce Webservice URL: Type in the web service URL to connect to the Salesforce DB.
Username and Password: Type in the web service user authentication data.
Salesforce Version: Type in the version of Salesforce you are using.
Bulk file path: Directory where the bulk data you need to process is stored.
Action: You can do any of the following operations on the data of the Salesforce object: Insert: insert data. Update: update data. Upsert: update and insert data.
Module: Select the relevant module in the list. If you select the Use Custom module option, the Custom Module Name field is displayed, where you can enter the name of the module you want to connect to.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings
Rows to commit: Specify the number of lines per data batch to be processed.
Bytes to commit: Specify the number of bytes per data batch to be processed.
Use Socks Proxy: Select this check box if you want to use a proxy server. In this case, fill in the proxy parameters in the Proxy host, Proxy port, Proxy username and Proxy password fields which appear beneath.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded into Salesforce.com.
Limitation: The bulk data to be processed in Salesforce.com should be in .csv format.

Scenario: Inserting bulk data into your Salesforce.com


This scenario describes a four-component Job that submits bulk data to Salesforce.com, executes your intended actions on the data, and finally displays the Job execution results for your reference.


Before starting this scenario, you need to prepare the input file that provides the data to be processed by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
Then, to create and execute this Job, proceed as follows:
Drop tFileInputDelimited, tSalesforceOutputBulkExec and two tLogRow components from the Palette onto the workspace of your studio.
Use a Row > Main link to connect tFileInputDelimited to tSalesforceOutputBulkExec.
Use Row > Main and Row > Reject links to connect tSalesforceOutputBulkExec to the two tLogRow components respectively.
Double-click tFileInputDelimited to display its Basic settings view and define the component properties.

From the Property Type list, select Repository if you have already stored the connection to the salesforce server in the Metadata node of the Repository tree view. The property fields that follow are automatically filled in. If you have not defined the server connection locally in the Repository, fill in the details manually after selecting Built-in from the Property Type list. For more information about metadata, see Managing Metadata.
Next to the File name/Stream field, click the button to browse to the input file you prepared for the scenario, for example, sforcebulk.txt.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this scenario, the schema is made of four columns: Name, ParentId, Phone and Fax.


Set the other fields, such as Row Separator and Field Separator, according to the input file used by the Job.
Double-click tSalesforceOutputBulkExec to display its Basic settings view and define the component properties.

In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access.
In the Username and Password fields, enter your login and password for the Web service.
In the Bulk file path field, browse to the directory where you store the bulk .csv data to be processed.
The bulk file to be processed here must be in .csv format.

From the Action list, select the action you want to carry out on the prepared bulk data. In this use case, select insert.
From the Module list, select the object you want to access, Account in this example.
From the Schema list, select Repository and then click the three-dot button to open a dialog box where you can select the repository schema you want to use for this component. If you have not defined your schema locally in the metadata, select Built-in from the Schema list and then click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this example, edit it so that it conforms to the schema defined previously.
Double-click tLogRow_1 to display its Basic settings view and define the component properties.


Click Sync columns to retrieve the schema from the preceding component.
Select Table mode to display the execution result.
Do the same with tLogRow_2.
Save your Job and press F6 to execute it.
On the console of the Run view, you can check the execution result.

The tLogRow_1 table shows the data inserted into Salesforce.com. The tLogRow_2 table shows the data rejected because of incompatibility with the Account object you accessed.
If you want to transform the input data before submitting it, use tSalesforceOutputBulk and tSalesforceBulkExec together. For further information on the use of the two components, see Scenario: Inserting transformed bulk data into your Salesforce.com on page 90.


tSAPCommit
tSAPCommit Properties
This component is closely related to tSAPConnection and tSAPRollback. It usually doesn't make much sense to use these components separately in a transaction.
Component family: Business/SAP
Function: Validates the data processed through the Job into the connected server.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings
SAPConnection Component list: Select the tSAPConnection component in the list if more than one connection is planned for the current Job.
Release Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tSAPCommit to your Job, your data will be committed row by row. In this case, do not select the Release connection check box or your connection will be closed before the end of your first row commit.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with SAP components, especially with the tSAPConnection and tSAPRollback components.
Limitation: n/a
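
The difference between committing on every row and committing a global transaction in one go can be sketched as follows. This is a hedged illustration, not SAP or Talend code: FakeConnection is a hypothetical stand-in that only counts calls, and no SAP library is involved.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: FakeConnection is hypothetical and merely counts calls.
final class CommitOnceSketch {

    /** Stand-in for a transactional connection. */
    static final class FakeConnection {
        int writes = 0;
        int commits = 0;

        void write(String row) { writes++; }

        void commit() { commits++; }
    }

    /** Commits after every row and returns the number of commits issued. */
    static int commitPerRow(List<String> rows) {
        FakeConnection connection = new FakeConnection();
        for (String row : rows) {
            connection.write(row);
            connection.commit();
        }
        return connection.commits;
    }

    /** Writes all rows first, then commits once, and returns the number of commits issued. */
    static int commitOnce(List<String> rows) {
        FakeConnection connection = new FakeConnection();
        for (String row : rows) {
            connection.write(row);
        }
        connection.commit();
        return connection.commits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3"); // made-up rows
        System.out.println("Commit per row: " + commitPerRow(rows) + " commits");
        System.out.println("Commit once:    " + commitOnce(rows) + " commit");
    }
}
```

With three rows, the first approach issues three commits and the second issues one, which is the kind of saving the Purpose entry refers to.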

Related scenario
This component is closely related to tSAPConnection and tSAPRollback. It usually doesn't make much sense to use one of these without using a tSAPConnection component to open a connection for the current transaction. For a tSAPCommit related scenario, see Scenario: Inserting data in mother/daughter tables on page 594.


tSAPConnection
tSAPConnection properties
Component family: Business
Function: tSAPConnection opens a connection to the SAP system for the current transaction.
Purpose: tSAPConnection allows you to commit all of a Job's data in one go to the SAP system as one transaction.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data is stored centrally. Repository: Select the Repository file where Properties are stored. The fields that follow are pre-filled using fetched data.
Connection configuration: Client type: enter your usual SAP connection. Userid: enter the user login. Password: enter the password. Language: specify the language. Host name: enter the IP address of the SAP system. System number: enter the system number.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with other SAP components.
Limitation: n/a

Related scenarios
For related scenarios, see Scenario 1: Retrieving metadata from the SAP system on page 104 and Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function on page 110.


tSAPInput
tSAPInput Properties
Component family: Business
Function: tSAPInput connects to the SAP system using the system IP address.
Purpose: tSAPInput allows you to extract data from an SAP system at any level by calling RFC or BAPI functions.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data is stored centrally. Repository: Select the Repository file where Properties are stored. The fields that follow are pre-filled using the fetched data. Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema of Talend Open Studio User Guide.
Use an existing connection: Select this check box and click the relevant connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.
Connection configuration: Client type: Enter your usual SAP connection code. Userid: Enter the user connection Id. Password: Enter the password. Language: Specify a language. Host name: Enter the SAP system IP address. System number: Enter the system number.
FunName: Enter the name of the function you want to use to retrieve data.
Initialize input: Set input parameters. Parameter Value: Enter between inverted commas the value that corresponds to the parameter you set in the Parameter Name column. Type: Select the type of the input entity to retrieve. Table Name (Structure Name): Enter between inverted commas the table name. Parameter Name: Enter between inverted commas the name of the field that corresponds to the table set in the Table Name column. When you need different parameter values for the same parameter name, enter these values in one row and delimit them with commas.
Outputs: Configure the parameters of the output schema to select the data to be extracted: Schema: Enter the output schema name. Type (for iterate): Select the type of the output entity you want to have. Table Name (Structure Name): Enter between inverted commas the table name. Mapping: Enter between inverted commas the name of the field you want to retrieve data from. You can set as many outgoing Main links used to output data as schemas you added to this Outputs table. This way, data can be grouped into different files.
Connections: Outgoing links (from one component to another): Row: Main, Iterate. Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error. Incoming links (from one component to another): Row: Iterate. Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error. For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Advanced settings
Release Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.
Limitation: n/a


Scenario 1: Retrieving metadata from the SAP system


Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a library validated and provided by SAP (JCO) that allows the user to call functions and retrieve data from the SAP system at the Table, RFC or BAPI levels.
This scenario uses the SAP wizard, which leads the user through dialog steps to create an SAP connection and call RFC and BAPI functions. This SAP wizard is available only for Talend Integration Suite users. If you are a user of Talend Open Studio or Talend On Demand, you need to set the basic settings for the tSAPInput component manually.

This Java scenario uses the SAP wizard to first create a connection to the SAP system, and then call a BAPI function to retrieve the details of a company from the SAP system. It finally displays in Talend Open Studio the company details stored in the SAP system. The figure below shows the company detail parameters that are stored in the SAP system and that we want to read in Talend Open Studio using the tSAPInput component.


Create a connection to the SAP system using the SAP connection wizard. In this scenario, the SAP connection is called sap and is saved in the Metadata node.
Call the BAPI function BAPI_COMPANY_GETDETAIL using the SAP wizard to access the BAPI HTML document stored in the SAP system and see the company details.
In the Name filter field, type in BAPI* and click the Search button to display all available BAPI functions.
Select BAPI_COMPANY_GETDETAIL to display the schema that describes the company details.
The three-tab view to the right of the wizard displays the metadata of the BAPI_COMPANY_GETDETAIL function and allows you to set the necessary parameters.
The Document view displays the SAP HTML document about the BAPI_COMPANY_GETDETAIL function.
The Parameter view provides information about the input and output parameters required by the BAPI_COMPANY_GETDETAIL function to return values.
In the Parameter view, click the Input tab to list the input parameters. In this scenario, BAPI_COMPANY_GETDETAIL requires only one input parameter, called COMPANYID.

In the Parameter view, click the Output tab to list the output parameters returned by BAPI_COMPANY_GETDETAIL. In this scenario, there are two output parameters: COMPANY_DETAIL and RETURN.


Each of these two structure parameters consists of numerous single parameters. The Test it view allows you to add or delete input parameters according to the called function. In this scenario, we want to retrieve the metadata of the COMPANY_DETAIL structure parameter that consists of 14 single parameters.


• In the Value column of the COMPANYID line in the first table, enter 000001 to send back company data corresponding to the value 000001.
• In the Output type list at the bottom of the wizard, select output.table.
• Click Launch at the bottom of the view to display the value of each single parameter returned by the BAPI_COMPANY_GETDETAIL function.
• Click Finish to close the wizard and create the connection.

The sap connection and the new schema BAPI_COMPANY_GETDETAIL display under the SAP Connections node in the Repository tree view.

To retrieve the different schemas of the BAPI_COMPANY_GETDETAIL function, do the following:
• Right-click BAPI_COMPANY_GETDETAIL in the Repository tree view and select Retrieve schema in the contextual menu.
• In the dialog box that opens, select the schemas you want to retrieve, COMPANY_DETAIL and RETURN in this scenario.
• Click Next to display the two selected schemas and then Finish to close the dialog box.

The two schemas display under the BAPI_COMPANY_GETDETAIL function in the Repository tree view.


To retrieve the company metadata that corresponds to the 000001 value and display it in Talend Open Studio, do the following:
• In the Repository tree view, drop the SAP connection you already created onto the design workspace. In the dialog box that opens, select tSAPConnection from the component list and click OK to close the dialog box. The tSAPConnection component holding the SAP connection, sap in this example, displays on the design workspace.
• Double-click tSAPConnection to display the Basic settings view and define the component properties.

If you store connection details in the Metadata node in the Repository tree view, the Repository mode is selected in the Property Type list and the fields that follow are pre-filled. If not, you need to select Built-in as property type and fill in the connection details manually.

• In the Repository tree view, expand Metadata and sap in succession and drop BAPI_COMPANY_GETDETAIL onto the design workspace to open a component list.
• Select tSAPInput from the component list and click OK.
• Drop tFilterColumns and tLogRow from the Palette onto the design workspace.
• Connect tSAPConnection and tSAPInput using a Trigger > OnSubJobOk link.
• To connect tSAPInput and tLogRow, right-click tSAPInput, select Row > row_COMPANY_DETAIL_1 and then click tLogRow.


In the design workspace, double-click tSAPInput to display its Basic settings view and define the component properties. The basic setting parameters for the tSAPInput component display automatically since the schema is stored in the Metadata node and the component is initialized by the SAP wizard.

• Select the Use an existing connection check box and then, in the Component List, select the relevant tSAPConnection component, sap in this scenario.

In the Initialize input area, we can see the input parameter needed by the BAPI_COMPANY_GETDETAIL function. In the Outputs area, we can see all the different schemas of the BAPI_COMPANY_GETDETAIL function, in particular COMPANY_DETAIL, which we want to output.

• In the design workspace, double-click tLogRow to display the Basic settings view and define the component properties. For more information about this component, see tLogRow on page 1305.
• Save your Job and press F6 to execute it.


tSAPInput retrieved from the SAP system the metadata of the COMPANY_DETAIL structure parameter and tLogRow displayed the information on the console.

Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function


Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a library validated and provided by SAP (JCO) that allows the user to call functions and retrieve data from the SAP system at the Table, RFC or BAPI level.
This scenario uses the SAP wizard, which leads a user through dialog steps to create a SAP connection and call RFC and BAPI functions. This SAP wizard is available only for Talend Integration Suite users. If you are a user of Talend Open Studio or Talend On Demand, you need to set the basic settings for the tSAPInput component manually.

This Java scenario uses the SAP wizard to first create a connection to the SAP system, and then call an RFC function to read a table called SFLIGHT directly from the SAP system. It finally displays in Talend Open Studio the structure of the SFLIGHT table stored in the SAP system.

• Create a connection to the SAP system using the SAP connection wizard. In this scenario, the SAP connection is called sap.
• Call the RFC_READ_TABLE RFC function using the SAP wizard to access the table in the SAP system and see its structure.
• In the Name filter field, type in RFC* and click the Search button to display all available RFC functions.


• Select RFC_READ_TABLE to display the schema that describes the table structure.

The three-tab view to the right of the wizard displays the metadata of the RFC_READ_TABLE function and allows you to set the necessary parameters. The Document view displays the SAP HTML document about the RFC_READ_TABLE function. The Parameter view provides information about the parameters required by the RFC_READ_TABLE function to return parameter values.

• In the Parameter view, click the Table tab to show a description of the structure of the different tables of the RFC_READ_TABLE function.


The Test it view allows you to add or delete input parameters according to the called function. In this example, we want to retrieve the structure of the SFLIGHT table and not any data.

• In the Value column of the DELIMITER line, enter ; as field separator.
• In the Value column of the QUERY_TABLE line, enter SFLIGHT as the table to query.
• In the Output type list at the bottom of the view, select output.table.
• In the Constructure|Table list, select DATA.
• Click Launch at the bottom of the view to display the parameter values returned by the RFC_READ_TABLE function. In this example, the delimiter is ; and the table to read is SFLIGHT.

• Click Finish to close the wizard and create the connection.

The sap connection and the RFC_READ_TABLE function display under the SAP Connections node in the Repository tree view.

To retrieve the different schemas of the RFC_READ_TABLE function, do the following:
• In the Repository tree view, right-click RFC_READ_TABLE and select Retrieve schema in the contextual menu. A dialog box displays.
• In the list, select the schemas you want to retrieve: DATA, FIELDS and OPTIONS in this example.
• Click Next to open a new view in the dialog box and display these different schemas.
• Click Finish to validate your operation and close the dialog box.

The three schemas display under the RFC_READ_TABLE function in the Repository tree view.

In this example, we want to retrieve the data and column names of the SFLIGHT table and display them in Talend Open Studio. To do that, proceed as follows:
• In the Repository tree view, drop the RFC_READ_TABLE function of the sap connection onto the design workspace. In the dialog box that opens, select tSAPInput from the component list and click OK to close the dialog box. The tSAPInput component displays on the design workspace.
• Drop two tLogRow components from the Palette onto the design workspace.
• To connect the components together, right-click tSAPInput, select Row > row_DATA_1 and click the first tLogRow component. Then right-click tSAPInput, select Row > row_FIELDS_1 and click the second tLogRow component.

In this example, we want to retrieve the FIELDS and DATA schemas and put them in two different output flows.


In the design workspace, double-click tSAPInput to open the Basic settings view and display the component properties.

The basic setting parameters for the tSAPInput component display automatically since the schema is stored in the Metadata node and the component is initialized by the SAP wizard. In the Initialize input area, we can see the input parameters necessary for the RFC_READ_TABLE function, the field delimiter ; and the table name SFLIGHT. In the Outputs area, we can see the different schemas of the SFLIGHT table.


In the design workspace, double-click each of the two tLogRow components to display the Basic settings view and define the component properties. For more information on the properties of tLogRow, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The tSAPInput component retrieves from the SAP system the column names of the SFLIGHT table as well as the corresponding data. The tLogRow components display the information in a tabular form in the Console.
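
To illustrate how the two output flows relate to each other, the following minimal Java sketch pairs the column names of a FIELDS-style list with the values of one delimited DATA-style row, using the ; delimiter set in the wizard. It is not the code generated by Talend Open Studio: the class name, the method name and the sample values (including the three column names) are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimitedRowSketch {

    /**
     * Splits one delimited data row and pairs each value with the column
     * name found at the same position in the list of field names.
     */
    public static Map<String, String> toRecord(List<String> fieldNames, String dataRow, String delimiter) {
        // Pattern.quote treats the delimiter literally; -1 keeps trailing empty values.
        String[] values = dataRow.split(Pattern.quote(delimiter), -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            // Missing trailing values are stored as empty strings.
            String value = i < values.length ? values[i].trim() : "";
            record.put(fieldNames.get(i), value);
        }
        return record;
    }

    public static void main(String[] args) {
        // Hypothetical sample content; not read from a real SAP system.
        List<String> fields = Arrays.asList("CARRID", "CONNID", "FLDATE");
        List<String> data = Arrays.asList("AA ;0017;20110105", "LH ;0400;20110212");
        for (String row : data) {
            System.out.println(toRecord(fields, row, ";"));
        }
    }
}
```

Keeping the column names and the row values in separate structures mirrors the two separate flows (row_FIELDS_1 and row_DATA_1) used in the Job.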


tSAPOutput
tSAPOutput Properties
Component family: Business
Function: Writes to an SAP system.
Purpose: Allows you to write data into an SAP system.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file where Properties are stored. The fields that come after are pre-filled in using the fetched data.
    Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema of Talend Open Studio User Guide.
  Use an existing connection: Select this check box and click the relevant connection component on the Component list to reuse the connection details you already defined.
    When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive. For more information about Dynamic settings, see your studio user guide.
  Connection configuration:
    Client type: Enter your SAP usual connection code.
    Userid: Enter the user connection Id.
    Password: Enter the password.
    Language: Specify a language.
    Host name: Enter the SAP system IP address.
    System number: Enter the system number.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
  FunName: Enter the name of the function you want to use to write data.
  Mapping: Set the parameters to select the data to write to the SAP system.

Advanced settings
  Release Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
  tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as an output component. An input component is required.
Limitation: n/a
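
As a rough illustration of the Connection configuration fields, the sketch below collects the six values into a java.util.Properties object. The property keys follow the naming commonly used with the SAP JCo library and are an assumption here, not something stated in this guide; the class name, method name and all values are placeholders. The sketch does not open a connection and does not require the JCo library.

```java
import java.util.Properties;

public class SapConnectionSketch {

    /**
     * Builds a Properties object from the six Connection configuration fields.
     * The key names are an assumption based on SAP JCo naming conventions.
     */
    public static Properties connectionProperties(String client, String userId, String password,
                                                  String language, String host, String systemNumber) {
        Properties p = new Properties();
        p.setProperty("jco.client.client", client);      // Client type
        p.setProperty("jco.client.user", userId);        // Userid
        p.setProperty("jco.client.passwd", password);    // Password
        p.setProperty("jco.client.lang", language);      // Language
        p.setProperty("jco.client.ashost", host);        // Host name
        p.setProperty("jco.client.sysnr", systemNumber); // System number
        return p;
    }

    public static void main(String[] args) {
        // Placeholder values only; replace with the details of your own SAP system.
        Properties p = connectionProperties("000", "demo_user", "secret", "EN", "192.0.2.10", "00");
        System.out.println(p.getProperty("jco.client.ashost"));
    }
}
```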

Related scenario
For related scenarios, see Scenario 1: Retrieving metadata from the SAP system on page 104 and Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function on page 110.


tSAPRollback
This component is not available in the Palette of the Talend Integration Express Studio.

tSAPRollback properties
This component is closely related to tSAPCommit and tSAPConnection. It usually does not make much sense to use these components separately in a transaction.
Component family: Business/SAP
Function: tSAPRollback cancels the transaction commit in the connected SAP.
Purpose: tSAPRollback avoids committing only a fragment of a transaction.

Basic settings
  SAPConnection Component list: Select the tSAPConnection component in the list if more than one connection is planned for the current Job.
  Release Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is intended to be used along with SAP components, especially with tSAPConnection and tSAPCommit.
Limitation: n/a

Related scenarios
For a scenario related to tSAPRollback, see Scenario: Rollback from inserting data in mother/daughter tables on page 636 in the tMysqlRollback section.


tSugarCRMInput
tSugarCRMInput Properties
Component family: Business
Function: Connects to a Sugar CRM database module via the relevant webservice.
Purpose: Allows you to extract data from a SugarCRM DB based on a query.

Basic settings
  SugarCRM Webservice URL: Type in the webservice URL to connect to the SugarCRM DB.
  Username and Password: Type in the Webservice user authentication data.
  Module: Select the relevant module from the list.
    To use customized tables, select Use custom module from the list. The Custom module package name and Custom module name fields which appear are automatically filled in with the relevant names.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.
  Query condition: Type in the query to select the data to be extracted. Example: account_name= Talend

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: Usually used as a Start component. An output component is required.
Limitation: n/a

Scenario: Extracting account data from SugarCRM


This scenario describes a two-component Job which extracts account information from a SugarCRM database and writes it to an Excel output file.

Drop a tSugarCRMInput and a tFileOutputExcel component onto the workspace.


Connect the input component to the output component using a main row link. On the tSugarCRMInput Component view, fill in the connection information in the SugarCRM Web Service URL as well as the Username and Password fields. Then select the Module from the list of modules offered. In this example, Accounts is selected.

The Schema is then automatically set according to the module selected, but you can change it and remove the columns that you don't require in the output. In the Query Condition field, type in the query you want to extract from the CRM. In this example: billing_address_city=Sunnyvale. Then select the tFileOutputExcel component.

Set the destination file name as well as the Sheet name and select the Include header check box. Save the Job and press F6 to run it.


The filtered data is output in the defined spreadsheet of the specified Excel type file.


tSugarCRMOutput
tSugarCRMOutput Properties
Component family: Business
Function: Writes in a Sugar CRM database module via the relevant webservice.
Purpose: Allows you to write data into a SugarCRM DB.

Basic settings
  SugarCRM WebService URL: Type in the webservice URL to connect to the SugarCRM DB.
  Username and Password: Type in the Webservice user authentication data.
  Module: Select the relevant module from the list.
    To use customized tables, select Use custom module from the list. The Custom module package name and Custom module name fields which appear are automatically filled in with the relevant names.
  Action: Insert or Update the data in the SugarCRM module.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: Used as an output component. An Input component is required.
Limitation: n/a

Related Scenario
No scenario is available for this component yet.


tVtigerCRMInput
tVtigerCRMInput Properties
Component family: Business/VtigerCRM
Function: Connects to a module of a VtigerCRM database.
Purpose: Allows you to extract data from a VtigerCRM DB.

Basic settings
  Vtiger Version: Select the version of the Vtiger Web Services you want to use (either Vtiger 5.0 or Vtiger 5.1).

  Vtiger 5.0
    Server Address: Type in the IP address of the VtigerCRM server.
    Port: Type in the Port number to access the server.
    Vtiger Path: Type in the path to access the VtigerCRM server.
    Username and Password: Type in the user authentication data.
    Version: Type in the version of VtigerCRM you are using.
    Module: Select the relevant module in the list.
    Method: Select the relevant method in the list. The method specifies the action you can carry out on the VtigerCRM module selected.
    Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.

  Vtiger 5.1
    Endpoint: Type in the URL address of the invoked Web server.
    Username: Type in the user name to log in to the vTigerCRM.
    Access key: Type in the access key for the user name.
    Query condition: Type in the query to select the data to be extracted.
    Manual input of SQL query: Manually type in your query in the corresponding field.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Usually used as a Start component. An output component is required.
Limitation: n/a


Related Scenario
No scenario is available for this component yet.


tVtigerCRMOutput
tVtigerCRMOutput Properties
Component family: Business/VtigerCRM
Function: Writes data into a module of a VtigerCRM database.
Purpose: Allows you to write data into a VtigerCRM DB.

Basic settings
  Vtiger Version: Select the version of the Vtiger Web Services you want to use (either Vtiger 5.0 or Vtiger 5.1).

  Vtiger 5.0
    Server Address: Type in the IP address of the VtigerCRM server.
    Port: Type in the Port number to access the server.
    Vtiger Path: Type in the path to access the server.
    Username and Password: Type in the user authentication data.
    Version: Type in the version of VtigerCRM you are using.
    Module: Select the relevant module in the list.
    Method: Select the relevant method in the list. The method specifies the action you can carry out on the VtigerCRM module selected.
    Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.

  Vtiger 5.1
    Endpoint: Type in the URL address of the invoked Web server.
    Username: Type in the user name to log in to the vTigerCRM.
    Access key: Type in the access key for the user name.
    Action: Insert or Update the data in the VtigerCRM module.
    Module: Select the relevant module in the list.
    Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. In this component the schema is related to the Module selected.

  Die on error: This check box is cleared by default, meaning that rows in error are skipped and the process is completed for error-free rows.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Used as an output component. An Input component is required.
Limitation: n/a
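
The behaviour described for the Die on error check box can be pictured with the following minimal Java sketch. It is only an illustration of the skip-or-stop logic, not the code of the component: the class, the method names and the rule that blank rows are rejected are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DieOnErrorSketch {

    /** Hypothetical stand-in for writing one row; blank rows are rejected. */
    static void writeRow(String row) {
        if (row == null || row.trim().isEmpty()) {
            throw new IllegalArgumentException("empty row");
        }
    }

    /**
     * Returns the rows that were written. When dieOnError is false, rows in
     * error are skipped; when it is true, the first error stops the run.
     */
    public static List<String> process(List<String> rows, boolean dieOnError) {
        List<String> written = new ArrayList<>();
        for (String row : rows) {
            try {
                writeRow(row);
                written.add(row);
            } catch (IllegalArgumentException e) {
                if (dieOnError) {
                    throw e; // stop the whole run on the first row in error
                }
                // check box cleared: skip this row and carry on with the next one
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row 1", "", "row 3");
        System.out.println(process(rows, false));
    }
}
```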

Related Scenario
No scenario is available for this component yet.


Business Intelligence components


This chapter details the main components which belong to the Business Intelligence family in the Talend Open Studio Palette. The BI family groups connectors that cover needs such as reading or writing multidimensional or OLAP databases, outputting Jasper reports, tracking DB changes in slowly changing dimension tables and so on.


tBarChart
tBarChart properties
Component family: Business Intelligence/Charts
Function: tBarChart reads data from an input flow and transforms the data into a bar chart in a PNG image file.
Purpose: tBarChart generates a bar chart from the input data to ease technical analysis.

Basic settings
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    The schema of tBarChart contains three read-only columns named series (string), category (string), and value (integer) respectively, in a fixed order. The data in any extra columns will only be passed to the next component, if any, without being presented in the bar chart.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the output component.
  Generated image path: Name and path of the output image file.
  Chart title: Enter the title of the bar chart to be generated.
  Include legend: Select this check box if you want the bar chart to include a legend, indicating all series in different colors.
  3Dimensions: Select this check box to create an image with 3D effect. By default, this check box is selected and the bars representing the series of each category will be stacked one over another. If this check box is cleared, a 2D image will be created, with the bars displayed one beside another along the category axis.
  Image width and Image height: Enter the width and height of the image file, in pixels.
  Category axis name and Value axis name: Enter the category axis name and value axis name.
  Foreground alpha: Enter an integer in the range of 0 to 100 to define the transparency of the image. The smaller the number you enter, the more transparent the image will be.
  Plot orientation: Select the plot orientation of the bar chart: VERTICAL or HORIZONTAL.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used as Output component. It requires an Input component and Row main link as input.

Scenario: Creating a bar chart from the input data


This scenario describes a simple Job that reads data from a CSV file and transforms the data into a bar chart. The input file is shown below:

Because the input file has a different structure than the one required by the tBarChart component, this use case uses the tMap component to map the data to a three-column CSV file before using the tBarChart component to generate a bar chart file.
You will usually use the tMap component to adjust the input schema in accordance with the schema structure of the tBarChart component. For more information about how to use the tMap component, see Mapping data flows of Talend Open Studio User Guide and the section related to the tMap on page 1436.
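
The restructuring that this use case delegates to the tMap component can be sketched in plain Java as follows. This is only an illustration of the idea, not the code generated by the Studio: the class and method names are hypothetical, the figures are made up, and using the names Population, Area and Density as series names is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BarChartMappingSketch {

    /** One row in the three-column layout: series, category, value. */
    public static final class ChartRow {
        public final String series;
        public final String category;
        public final int value;

        public ChartRow(String series, String category, int value) {
            this.series = series;
            this.category = category;
            this.value = value;
        }

        @Override
        public String toString() {
            return series + ";" + category + ";" + value;
        }
    }

    /** Turns one (City, Population, Area, Density) record into three chart rows. */
    public static List<ChartRow> toChartRows(String city, int population, int area, int density) {
        List<ChartRow> rows = new ArrayList<>();
        // The city becomes the category of every row; each measure becomes its own series.
        rows.add(new ChartRow("Population", city, population));
        rows.add(new ChartRow("Area", city, area));
        rows.add(new ChartRow("Density", city, density));
        return rows;
    }

    public static void main(String[] args) {
        // Made-up figures for illustration only.
        for (ChartRow row : toChartRows("CityA", 1200, 300, 4)) {
            System.out.println(row);
        }
    }
}
```

Each input record therefore produces one row per measure, which is why the four-column input ends up as a longer three-column file.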


• Drop the following components from the Palette onto the design workspace: two tFileInputDelimited components, a tMap, three tFileOutputDelimited components, and a tBarChart. Relabel the components to best describe their functionality.
• Double-click the first tFileInputDelimited component to display its Basic settings view.

• Fill in the File name field by browsing to the input file.
• Specify the number of header rows. In this use case, you have only one header row. Leave the other parameters as they are.
• Click Edit schema to describe the data structure of the input file. In this use case, the input schema is made of four columns: City, Population, Area, and Density.
• Once you have defined the column names and data types, click OK to close the schema dialog box.

• Connect the tFileInputDelimited to the tMap using a Row > Main connection.
• Double-click the tMap to open the Map Editor.


• Click the green plus button on top of the output panel to add three output tables: Population, Area, and Density. These output table names will appear as the labels of the connections linking the tMap to the output components on the design workspace.
• Use the Schema editor to add three columns to each output table: series (string), category (string), and value (integer).
• In the relevant Expression field of the output tables, enter the series names, as shown above. These series names will appear in the legend of the bar chart.
• Drop the City column of the input table onto the category column of each output table.
• Drop the Population column of the input table onto the value column of the Population table.
• Drop the Area column of the input table onto the value column of the Area table.
• Drop the Density column of the input table onto the value column of the Density table.
• Click OK to save the mappings and close the Map Editor.
• Right-click the tMap component and select Row > Population to connect it to the first tFileOutputDelimited component.
• Connect the tMap to the other tFileOutputDelimited components in the same way, selecting Area and Density respectively.
• Double-click the first tFileOutputDelimited component to display its Basic settings view.


• In the File Name field, define a CSV file to send the mapped data flows to. In this use case, we name the output file to be created LargeCities_mapped.csv. This file will be used as the input to the tBarChart component.
• If an existing file name is specified, make sure that the Append check box is cleared. Leave the other parameters as they are.
• For the other two tFileOutputDelimited components, use the same file path as defined for the first tFileOutputDelimited component, and select the Append check box so that all the mapped data flows go to the same file without overwriting the existing data.
• Connect the first tFileInputDelimited component to the second tFileInputDelimited component using a Trigger > OnSubjobOK connection.
• Connect the second tFileInputDelimited component to the tBarChart using a Row > Main connection.
• Double-click the second tFileInputDelimited component to display its Basic settings view.

• Fill in the File name field with the file path and name defined in the Basic settings view of each of the tFileOutputDelimited components. In this use case, the input file to the tBarChart is LargeCities_mapped.csv. Leave the other parameters as they are. As the input schema needs to have the structure required by the tBarChart component, we will copy the structure from the schema of the tBarChart component.
• Double-click the tBarChart component to display its Basic settings view.


• In the Generated image path field, define the file path of the image file to be generated.
• In the Chart title field, define a title for the bar chart.
• Define the category and value axis names.
• Define the size and transparency degree of the image if needed. In this use case, we simply use the default settings.
• Click Edit schema to open the schema dialog box.

• Copy all the columns from the output schema to the input schema by clicking the left-pointing double arrow button. Then click OK to close the schema dialog box.
• Save your Job and press F6 to launch it.

A bar chart is generated as defined.
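
The way the three mapped flows end up in one file in this use case (first output with the Append check box cleared, the two others with it selected) can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not the generated Job code: the class and method names are hypothetical, the rows are made up, and a temporary file stands in for LargeCities_mapped.csv.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class AppendFlowsSketch {

    /** Writes the first flow over any existing file, then appends the remaining flows. */
    public static void writeFlows(Path target, List<List<String>> flows) {
        try {
            for (int i = 0; i < flows.size(); i++) {
                if (i == 0) {
                    // Append cleared: existing content is replaced.
                    Files.write(target, flows.get(i), StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
                } else {
                    // Append selected: rows are added after the existing content.
                    Files.write(target, flows.get(i), StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the write twice on a temporary file and returns the resulting lines. */
    public static List<String> demo() {
        try {
            Path target = Files.createTempFile("LargeCities_mapped", ".csv");
            List<List<String>> flows = Arrays.asList(
                    Arrays.asList("Population;CityA;1200"),
                    Arrays.asList("Area;CityA;300"),
                    Arrays.asList("Density;CityA;4"));
            writeFlows(target, flows);
            writeFlows(target, flows); // a second run must not duplicate rows
            List<String> lines = Files.readAllLines(target, StandardCharsets.UTF_8);
            Files.delete(target);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Clearing Append on the first output only is what keeps repeated executions of the Job from accumulating duplicate rows in the file.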


tDB2SCD
tDB2SCD properties
Component family: Databases/DB2

Function: tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table.

Purpose: tDB2SCD addresses Slowly Changing Dimension needs by regularly reading a data source and logging the changes into a dedicated SCD table.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio User Guide.

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Table Schema: Name of the DB schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, so that a row in error is skipped and the process completes for the error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step while entries are processed in the database.

Usage

This component is used as an output component. It requires an input component and a Row > Main link as input.
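The Die on error behavior described in the properties above can be pictured with a small sketch. This is a hypothetical illustration in plain Java, not code generated by the studio: the class name, the writeRows method and the Consumer standing in for the database write are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not Talend-generated code.
class DieOnErrorSketch {

    /**
     * Passes every row to the writer and returns the rows that were written.
     * With dieOnError cleared, a failing row is skipped and processing goes on;
     * with dieOnError selected, the first failure stops the whole run.
     */
    static List<String> writeRows(List<String> rows, Consumer<String> writer, boolean dieOnError) {
        List<String> written = new ArrayList<>();
        for (String row : rows) {
            try {
                writer.accept(row);
                written.add(row);
            } catch (RuntimeException e) {
                if (dieOnError) {
                    throw e; // stop the process on the first row in error
                }
                // otherwise skip this row and continue with the next one
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // Stand-in for the database write: rejects empty rows.
        Consumer<String> writer = row -> {
            if (row.isEmpty()) {
                throw new IllegalArgumentException("empty row");
            }
        };
        List<String> rows = Arrays.asList("a", "", "b");
        System.out.println(writeRows(rows, writer, false)); // prints [a, b]
    }
}
```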

Related scenarios
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tDB2SCDELT
tDB2SCDELT Properties
Component family: Databases/DB2

Function: tDB2SCDELT reflects and tracks changes in a dedicated DB2 SCD table.

Purpose: tDB2SCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated DB2 SCD table.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally. Enter properties manually.
    Repository: Select the Repository file where properties are stored. The fields that follow are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio User Guide.

Host: The IP address of the database server.

Port: Listening port number of the database server.

Database: Name of the database.

Username and Password: User authentication data for a dedicated database.

Source table: Name of the input DB2 SCD table.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: Select one of the following operations to perform on the table defined:
    None: No action is carried out on the table.
    Drop and create table: The table is removed and created again.
    Create table: A new table is created.
    Create table if not exists: A table is created if it does not exist.
    Clear table: The table content is deleted. You can roll back the operation.
    Truncate table: The table content is deleted. You cannot roll back the operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Surrogate Key: Select the surrogate key column from the list.

Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.

Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.

Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.

Use SCD Type 2 fields: Use type 2 if changes need to be tracked. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
    Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
    End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
    Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
    Log versions: Adds a column to your SCD schema to hold the version number of the record.

Advanced settings

Debug mode: Select this check box to display each step while entries are processed in the database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is used as an output component. It requires an input component and a Row > Main link as input.
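The difference between SCD Type 1 and Type 2 fields, and the role of the Start date, End Date, Log Active Status and Log versions columns, can be sketched in memory as follows. This is a hypothetical illustration only: the component itself works through SQL on the database server, and the class, the field names (name as a Type 1 field, city as a Type 2 field) and the sample values are assumptions made for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch; the component itself runs SQL on the server.
class ScdType2Sketch {

    /** One row of the dimension table; all field names are illustrative. */
    static class DimRow {
        final String sourceKey;    // source key identifying the business entity
        String name;               // assumed SCD Type 1 field: overwritten in place
        final String city;         // assumed SCD Type 2 field: a change adds a version
        final LocalDate startDate; // Start date column
        LocalDate endDate;         // End Date column, null while the record is active
        boolean active;            // Log Active Status column
        final int version;         // Log versions column

        DimRow(String sourceKey, String name, String city, LocalDate startDate, int version) {
            this.sourceKey = sourceKey;
            this.name = name;
            this.city = city;
            this.startDate = startDate;
            this.endDate = null;
            this.active = true;
            this.version = version;
        }
    }

    /** Applies one incoming source record to the dimension rows. */
    static void apply(List<DimRow> dim, String key, String name, String city, LocalDate loadDate) {
        DimRow current = null;
        for (DimRow row : dim) {
            if (row.active && row.sourceKey.equals(key)) {
                current = row;
            }
        }
        if (current == null) {
            dim.add(new DimRow(key, name, city, loadDate, 1)); // first version of a new key
        } else if (!current.city.equals(city)) {
            // Type 2 change: close the active record and add a new version.
            current.endDate = loadDate;
            current.active = false;
            dim.add(new DimRow(key, name, city, loadDate, current.version + 1));
        } else if (!current.name.equals(name)) {
            current.name = name; // Type 1 change: overwrite, no history kept
        }
    }

    public static void main(String[] args) {
        List<DimRow> dim = new ArrayList<>();
        apply(dim, "C1", "Jon", "Paris", LocalDate.of(2011, 1, 1));
        apply(dim, "C1", "John", "Paris", LocalDate.of(2011, 2, 1)); // typo fix: Type 1
        apply(dim, "C1", "John", "Lyon", LocalDate.of(2011, 3, 1));  // real change: Type 2
        for (DimRow r : dim) {
            System.out.println(r.sourceKey + " v" + r.version + " " + r.name + " " + r.city
                    + " " + r.startDate + " -> " + r.endDate + " active=" + r.active);
        }
    }
}
```

In this sketch the typo correction leaves a single row, while the change of city closes the first row (End Date set, active status false) and adds a second row with version 2.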

Related Scenario
For related topics, see tDB2SCD on page 135 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tGreenplumSCD
tGreenplumSCD Properties
Component family: Databases/Greenplum

Function: tGreenplumSCD reflects and tracks changes in a dedicated Greenplum SCD table.

Purpose: tGreenplumSCD addresses Slowly Changing Dimension needs by regularly reading a data source and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio User Guide.

Connection type: Select the relevant driver on the list.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Schema: Name of the DB schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, so that a row in error is skipped and the process completes for the error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step while entries are processed in the database.

Usage

This component is used as an output component. It requires an input component and a Row > Main link as input.

Related scenario
For related scenarios, see tMysqlSCD Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tInformixSCD
tInformixSCD properties
Component family: Databases/Business Intelligence/Informix

Function: tInformixSCD tracks and shows changes which have been made to Informix SCD dedicated tables.

Purpose: tInformixSCD addresses Slowly Changing Dimension transformation needs by regularly reading a data source and listing the modifications in an SCD dedicated table.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio User Guide.

Host: Database server IP address.

Port: DB server listening port.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: User authentication information.

Instance: Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.

Table: Name of the table to be created.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to improve system performance.

Use Transaction: Select this check box when the database is configured in NO_LOG mode.

Die on error: This check box is cleared by default, so that a row in error is skipped and the process completes for the error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step of the process by which data is written in the database.

Usage

This component is an output component. Consequently, it requires an input component and a connection of the Row > Main type.

Related scenario
For a scenario in which tInformixSCD might be used, see the tMysqlSCD Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tIngresSCD
tIngresSCD Properties
Component family: Databases/Ingres

Function: tIngresSCD reflects and tracks changes in a dedicated Ingres SCD table.

Purpose: tIngresSCD addresses Slowly Changing Dimension needs by regularly reading a data source and logging the changes into a dedicated SCD table.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio User Guide.

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where properties are stored. The fields that follow are pre-filled using the fetched data.

Server: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, so that a row in error is skipped and the process completes for the error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step while entries are processed in the database.

Usage

This component is used as an output component. It requires an input component and a Row > Main link as input.

Related scenario
For related scenarios, see tMysqlSCD Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tLineChart
tLineChart properties
Component family: Business Intelligence/Charts

Function: tLineChart reads data from an input flow and transforms the data into a line chart in a PNG image file.

Purpose: tLineChart generates a line chart from the input data to ease technical analysis.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. The schema of tLineChart contains three read-only columns named series (string), x (integer), and y (integer) respectively, in a fixed order. The data in any extra columns will only be passed to the next component, if any, without being presented in the generated line chart.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the output component.

Generated image path: Name and path of the output image file.

Chart title: Enter the title of the line chart to be generated.

Domain axis label and Range axis label: Enter the domain axis (X axis) and range axis (Y axis) labels.

Plot orientation: Select the plot orientation of the range axis: Vertical or Horizontal.

Include legend: Select this check box if you want your line chart to include a legend, indicating the lines of different series in different colors.

Image width and Image height: Enter the width and height of the image, in pixels.

Moving average: Select this check box to show a moving average for each series on your line chart. With this check box selected, the Period field appears, letting you define the period over which the moving average is calculated.

Lower bound and Upper bound: Define the lowest and highest values to be displayed on the range axis.

Chart background and Plot background: Select the chart background color and the plot area background color.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is mainly used as an output component. It requires an input component and a Row > Main link as input.
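To give an idea of what the Moving average option plots, here is a minimal sketch of a trailing moving average over the y values of one series. It assumes that the Period counts data points; the component may define the period differently (for example, as a span on the X axis), so treat the class and method names and the windowing rule as assumptions rather than the component's actual computation.

```java
import java.util.Arrays;

// Hypothetical illustration; the windowing rule is an assumption.
class MovingAverageSketch {

    /**
     * Returns, for each point, the mean of that point and the points before it,
     * using at most 'period' points. Early points use the shorter window available.
     */
    static double[] trailingAverage(double[] y, int period) {
        if (period <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        double[] out = new double[y.length];
        double sum = 0;
        for (int i = 0; i < y.length; i++) {
            sum += y[i];
            if (i >= period) {
                sum -= y[i - period]; // drop the value that left the window
            }
            int count = Math.min(i + 1, period);
            out[i] = sum / count;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = {2, 4, 6, 8};
        System.out.println(Arrays.toString(trailingAverage(y, 2))); // prints [2.0, 3.0, 5.0, 7.0]
    }
}
```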

Scenario: Creating a line chart to ease trend analysis


This scenario describes a simple Job that reads data from a CSV file and transforms the data into a line chart to facilitate trend analysis. The input file records how long (in minutes) per week a person watches different TV channels over ten weeks, as shown below:

Because the input file has a different structure than required by the tLineChart component, this use case uses the tMap component to map the data to a CSV file that meets the structure requirement before using the tLineChart component to generate a line chart file.
You will usually use the tMap component to adjust the input schema in accordance with the schema structure of the tLineChart component. For more information about how to use the tMap component, see Mapping data flows of Talend Open Studio User Guide and the section related to the tMap on page 1436.
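The reshaping that the tMap performs in this scenario, from one row per week with a column per channel to one row per series point (series, x, y), can be sketched outside the studio as follows. This is a hypothetical illustration: the class and method names, the semicolon separator, the use of the column names as series names, and the sample values are assumptions made for the example, not data from the scenario's input file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the wide-to-long mapping done by the tMap.
class WideToSeriesSketch {

    /**
     * Turns rows such as "Week;Mins_TVA;Mins_TVB;Mins_TVC" into "series;x;y" rows.
     * The first column supplies x; every other column becomes one series.
     */
    static List<String> toSeriesRows(String headerLine, List<String> dataLines) {
        String[] header = headerLine.split(";");
        List<String> out = new ArrayList<>();
        for (int col = 1; col < header.length; col++) {
            for (String line : dataLines) {
                String[] fields = line.split(";");
                int x = Integer.parseInt(fields[0].trim());   // Week -> x
                int y = Integer.parseInt(fields[col].trim()); // minutes -> y
                out.add(header[col] + ";" + x + ";" + y);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String header = "Week;Mins_TVA;Mins_TVB;Mins_TVC";
        // Made-up sample values, for illustration only.
        List<String> data = Arrays.asList("1;250;300;220", "2;260;290;230");
        for (String row : toSeriesRows(header, data)) {
            System.out.println(row);
        }
    }
}
```

Each input row yields three output rows, one per channel, which is the structure the tLineChart schema expects.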


Drop the following components from the Palette to the design workspace: two tFileInputDelimited components, a tMap, three tFileOutputDelimited components, and a tLineChart. Relabel the components to best describe their functionality. Double-click the first tFileInputDelimited component to display its Basic settings view.

Fill in the File name field by browsing to the input file. Specify the header row. In this use case, the first row of the input file is the header row. Leave the other parameters as they are. Click Edit schema to describe the data structure of the input file. In this use case, the input schema is made of four columns: Week, Mins_TVA, Mins_TVB, and Mins_TVC. Upon defining the column names and data type, click OK to close the schema dialog box.


Connect the tFileInputDelimited to the tMap using a Row > Main connection. Double-click the tMap to open the Map Editor.

Click the green plus button on top of the output panel to add three output tables: TV_A, TV_B, and TV_C. These output table names will appear as the labels of the connections linking the tMap to the output components on the design workspace. Use the Schema editor to add three columns to each output table: series (string), x (integer), and y (integer). In the relevant Expression field of the output tables, enter the series names, as shown above. These series names will appear in the legend of your line chart. Drop the Week column of the input table onto the x column of each output table. Drop the Mins_TVA column of the input table onto the y column of the TV_A table. Drop the Mins_TVB column of the input table onto the y column of the TV_B table.


Drop the Mins_TVC column of the input table onto the y column of the TV_C table. Click OK to save the mappings and close the Map Editor. Right-click the tMap component and select Row > TV_A to connect it to the first tFileOutputDelimited component. Connect the tMap to the other tFileOutputDelimited components in the same way, but by selecting Row > TV_B and Row > TV_C respectively. Double-click the first tFileOutputDelimited component to display its Basic settings view.

In the File Name field, define a CSV file to send the mapped data flows to. In this use case, we name the output file to be created InputTV.csv. This file will be used as the input to the tLineChart component. If an existing file name is specified, make sure that the Append check box is cleared. Leave the other parameters as they are. For the other tFileOutputDelimited components, use the same file path as defined for the first tFileOutputDelimited component, and select the Append check box.
Make sure that the Append check box is selected so that the mapped data flows will go to the same file without overwriting the existing data.

Connect the first tFileInputDelimited component to the second tFileInputDelimited component using a Trigger > OnSubjobOK connection. Connect the second tFileInputDelimited component to the tLineChart using a Row > Main connection. Double-click the second tFileInputDelimited component to display its Basic settings view.


Fill in the File name field with the file path and name defined in the Basic settings view of each of the tFileOutputDelimited components. In this use case, the input file to the tLineChart is InputTV.csv. Leave the other parameters as they are. As the input schema needs to have a structure required by the tLineChart component, we will copy the structure from the schema of the tLineChart component. Double-click the tLineChart component to display its Basic settings view.

Click Edit schema to open the schema dialog box.

Copy all the columns from the output schema to the input schema by clicking the left-pointing double arrow button. Then, click OK to close the schema dialog box. In the Generated image path field, define the path of the image file to be generated. In the Chart title field, define a title for the line chart. In this use case, the chart title is Average Weekly Viewing (per person). Define the domain (X) and range (Y) axis labels. In this use case, the axis labels are Week and Minutes respectively.

Define the image size, the moving average period, the lower and upper bounds, the chart background color, and the background color of the plot area, as you prefer. In this use case, we set the image size to 450 by 450, set the lower and upper bounds to 210 and 340 respectively, select light gray as the chart background color, and keep the remaining settings as they are. Save your Job and press F6 to launch it. A line chart is generated as defined, showing a comparison of the average weekly viewing time and the viewing trends of different TV channels over the past ten weeks.
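The way the three tFileOutputDelimited components share one file in this scenario (the first with Append cleared, the other two with Append selected) corresponds to the following file-writing pattern. This is a hypothetical sketch using the standard java.nio.file API, not the code the studio generates; the class and method names, the temporary file and the sample rows are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of overwrite versus append on a shared output file.
class AppendSketch {

    /** Writes the lines to the file, either replacing its content or adding to it. */
    static void write(Path file, List<String> lines, boolean append) throws IOException {
        if (append) {
            // Append selected: keep what is already in the file.
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // Append cleared: start from an empty file.
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("InputTV", ".csv");
        write(file, Arrays.asList("TV_A;1;250"), false); // first output: Append cleared
        write(file, Arrays.asList("TV_B;1;300"), true);  // second output: Append selected
        write(file, Arrays.asList("TV_C;1;220"), true);  // third output: Append selected
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8).size()); // prints 3
        Files.delete(file);
    }
}
```

If the second or third write also cleared Append, only the last series would remain in the file, which is why the scenario insists on selecting the check box for those components.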


tMondrianInput
tMondrianInput Properties
Component family: Business Intelligence/OLAP Cube

Function: tMondrianInput reads data from relational databases and produces multidimensional data sets based on an MDX query.

Purpose: tMondrianInput executes a multi-dimensional expression (MDX) query corresponding to the dataset structure and schema definition. Then it passes on the multidimensional dataset obtained to the next component via a Main row link.

Basic settings

Mondrian Version: Select the Mondrian version you are using.

DB type: Select the relevant type of relational database.

Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Datasource: Name and path of the file containing the data.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Catalog: Path to the catalog (structure of the data warehouse).

MDX Query: Type in the MDX query, paying particular attention to sequencing the fields properly so that they match the schema definition and the data warehouse structure.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component covers MDX queries for multi-dimensional datasets.


Scenario: Cross-join tables


This Job extracts multi-dimensional datasets from relational database tables stored in a MySQL database. The data are retrieved using a multidimensional expression (MDX) query. You need to know the structure of your data, or at least have a structure description (catalog) as a reference for the dataset to be retrieved in the various dimensions.

Drop tMondrianInput and tLogRow from the Palette to the design workspace. Link the Mondrian connector to the output component using a Row Main connection. Select the tMondrianInput component and select the Component view.

In the DB type field, select the relational database you are using with Mondrian. Select the relevant Repository entry as Property type, if you store your DB connection details centrally. In this example, the properties are built-in. Fill in the connection details for your DB: Host, Port, Database name, User Name and Password. Select the relevant Schema in the Repository if you store it centrally. In this example, the schema is to be set (built-in).


The relational database we want to query contains five columns: media, drink, unit_sales, store_cost and store_sales. The query aims at retrieving the unit_sales, store_cost and store_sales figures for various media / drink using an MDX query such as in the example below:

Back on the Basic settings tab of the tMondrianInput component, set the Catalog path to the data warehouse. This catalog describes the structure of the warehouse. Then type in the MDX query such as: "select {[Measures].[Unit Sales], [Measures].[Store Cost], [Measures].[Store Sales]} on columns, CrossJoin( { [Promotion Media].[All Media].[Radio], [Promotion Media].[All Media].[TV], [Promotion Media].[All Media].[Sunday Paper], [Promotion Media].[All Media].[Street Handout] }, [Product].[All Products].[Drink].children) on rows from Sales where ([Time].[1997])"
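The CrossJoin in the query above pairs every member of the first set (the four promotion media) with every member of the second set (the children of Drink) to form the rows of the result. A minimal sketch of that pairing in plain Java follows. It is an illustration of the set operation only, not a call to the Mondrian API; the class and method names are assumptions, and the three Drink children are the ones named in the description of this scenario's result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the pairing produced by an MDX CrossJoin.
class CrossJoinSketch {

    /** Returns every (left, right) pair, with the left member varying slowest. */
    static List<List<String>> crossJoin(List<String> left, List<String> right) {
        List<List<String>> rows = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                rows.add(Arrays.asList(l, r));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> media = Arrays.asList("Radio", "TV", "Sunday Paper", "Street Handout");
        // Children of [Drink] as named in the result description of this scenario.
        List<String> drinks = Arrays.asList("Beverages", "Dairy", "Alcoholic Beverages");
        List<List<String>> rows = crossJoin(media, drinks);
        System.out.println(rows.size()); // prints 12
        for (List<String> row : rows) {
            System.out.println(row);
        }
    }
}
```

For each of these row pairs, the query returns the three measures placed on the columns axis: Unit Sales, Store Cost and Store Sales.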


Finally, select the Encoding type from the list. Select the tLogRow component and select the Print header check box to display the column names on the console. Then press F6 to run the Job.

The console shows the result of the unit_sales, store_cost and store_sales for each type of Drink (Beverages, Dairy, Alcoholic beverages) crossed with each media (TV, Sunday Paper, Street handout) as shown previously in a table form.


tMSSqlSCD
tMSSqlSCD Properties
Component family: Databases/MSSQL Server

Function: tMSSqlSCD reflects and tracks changes in a dedicated MSSQL SCD table.

Purpose: tMSSqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Server: Database server IP address.
Port: Listening port number of the DB server.
Schema: Name of the DB schema.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, meaning that a row on error is skipped and the process is completed for error-free rows.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tMysqlSCD
tMysqlSCD Properties
Component family: Databases/MySQL

Function: tMysqlSCD reflects and tracks changes in a dedicated MySQL SCD table.

Purpose: tMysqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the MySQL version you are using.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
- None: No operation is carried out.
- Create a table: The table does not exist and gets created.
- Create a table if not exists: The table is created if it does not exist.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, meaning that a row on error is skipped and the process is completed for error-free rows.

Advanced settings

Additional JDBC Parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

SCD management methodologies


Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly over time. The SCD editor offers the simplest method of building the data flow for the SCD outputs. In the SCD editor, you can map columns, select surrogate key columns, and set column change attributes by combining SCD types. The figure below illustrates an example of the SCD editor.


SCD keys

You must choose one or more source key columns from the incoming data to ensure its uniqueness. You must set one surrogate key column in the dimension table and map it to an input column in the source table. The value of the surrogate key links a record in the source to a record in the dimension table. The editor uses this mapping to locate the record in the dimension table and to determine whether a record is new or changing. The surrogate key is typically the primary key in the source, but it can be an alternate key as long as it uniquely identifies a record and its value does not change.

Source keys: Drag one or more columns from the Unused panel to the Source keys panel to be used as the key(s) that ensure the uniqueness of the incoming data.

Surrogate keys: Set the column where the generated surrogate key will be stored. A surrogate key can be generated based on a method selected on the Creation list.

Creation: Select any of the methods below to be used for the key generation:
- Auto increment: auto-incremental key.
- Input field: the key is provided in an input field. When selected, you can drag the appropriate field from the Unused panel to the complement field.
- Routine: from the complement field, you can press Ctrl+Space to display the autocompletion list and select the appropriate routine.
- Table max +1: the maximum value from the SCD table is incremented to create a surrogate key.
- DB Sequence: the database sequence enables you to generate a number sequence which is used to autogenerate a unique identifier for a field. From the complement field, you can indicate the name of the database sequence available. This option is only available through the SCD Editor of the tOracleSCD component.
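As an illustration of the Table max +1 method, the following minimal Python sketch computes the next surrogate key from the keys already present in an SCD table. It is not the code generated by the component: the function name next_surrogate_key, the in-memory row layout, and the choice of 1 as the first key for an empty table are assumptions made for this example.

```python
from typing import Iterable, Mapping


def next_surrogate_key(scd_rows: Iterable[Mapping[str, int]], key_column: str = "SK1") -> int:
    """Return the highest existing surrogate key plus one (1 for an empty table)."""
    # Assumption: an empty SCD table starts numbering at 1.
    current_max = max((row[key_column] for row in scd_rows), default=0)
    return current_max + 1


# Hypothetical SCD table content: three records already keyed 1 to 3.
existing_rows = [{"SK1": 1}, {"SK1": 2}, {"SK1": 3}]
new_key = next_surrogate_key(existing_rows)
```

Unlike an auto-incremented column, this approach derives the key from the current table content, so the key depends on what is stored in the SCD table at the time the record is written.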

Combining SCD types

The Slowly Changing Dimensions support four types of changes: Type 0 through Type 3. You can apply any of the SCD types to any column in a source table by a simple drag-and-drop operation.

Type 0: is not used frequently. Some dimension data may be overwritten and other data may stay unchanged over time. This is most appropriate when no effort has been made to deal with the changing dimension issues.

Type 1: no history is kept in the database. New data overwrites old data. Use this type if tracking changes is not necessary. This is most appropriate when correcting certain typos, for example the spelling of a name.

Type 2: the whole history is stored in the database. This type tracks historical data by inserting a new record in the dimensional table with a separate key each time a change is made. This is most appropriate to track updates, for example. The principle of SCD Type 2 is that a new record is added to the SCD table when changes are detected on the columns defined. Note that although several changes may be made to the same record on various columns defined as SCD Type 2, only one additional line tracks these changes in the SCD table. The SCD schema in this type should include SCD-specific extra columns that hold standard log information such as:
- start: adds a column to your SCD schema to hold the start date. You can select one of the input schema columns as a start date in the SCD table.
- end: adds a column to your SCD schema to hold the end date value for a record. When the record is currently active, the end date is NULL, or you can select Fixed Year Value and fill in a fictive year to avoid having a null value in the end date field.
- version: adds a column to your SCD schema to hold the version number of the record.
- active: adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.

Type 3: only the information about a previous value of a dimension is written into the database. This type tracks changes using separate columns. This is most appropriate to track only the previous value of a changing column.
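The difference between Type 1, Type 2, and Type 3 can be sketched in a few lines of Python. This is an illustrative in-memory model only, not the logic generated by the SCD components: the function apply_scd, the column names (sk, previous_city, scd_start, scd_end, scd_version, scd_active), and the sample values are all hypothetical. The sketch also assumes that a Type 1 overwrite touches only the active record.

```python
from datetime import date
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_scd(table: List[Row], incoming: Row, today: date) -> None:
    """Apply one incoming source row to an in-memory SCD table.

    Hypothetical column roles: id = source key, status = Type 1,
    company = Type 2, city = Type 3.
    """
    active = next(
        (r for r in table if r["id"] == incoming["id"] and r["scd_active"]),
        None,
    )
    if active is None:
        # Unknown source key: insert the first version of the record.
        table.append({
            "sk": len(table) + 1,  # stand-in for a generated surrogate key
            **incoming,
            "previous_city": None,
            "scd_start": today,
            "scd_end": None,
            "scd_version": 1,
            "scd_active": True,
        })
        return

    if incoming["company"] != active["company"]:
        # Type 2: close the active record, then add a new record with a new key.
        active["scd_end"] = today
        active["scd_active"] = False
        active = {
            **active,
            "sk": len(table) + 1,
            "company": incoming["company"],
            "scd_start": today,
            "scd_end": None,
            "scd_version": active["scd_version"] + 1,
            "scd_active": True,
        }
        table.append(active)

    if incoming["city"] != active["city"]:
        # Type 3: keep only the previous value, in a separate column.
        active["previous_city"] = active["city"]
        active["city"] = incoming["city"]

    # Type 1: overwrite the old value, no history is kept.
    active["status"] = incoming["status"]


# Hypothetical data: one person changes status, company, and city at once.
scd_table: List[Row] = []
apply_scd(scd_table, {"id": 1, "status": "single", "company": "A", "city": "X"}, date(2011, 1, 1))
apply_scd(scd_table, {"id": 1, "status": "married", "company": "B", "city": "Y"}, date(2011, 6, 1))
```

After the second call, the table holds two records for the same source key: the closed first version and a new active version whose previous_city column keeps the former city.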

Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3)
This five-component Java scenario describes a Job that tracks changes in four of the columns in a source delimited file, writes changes and the history of changes in an SCD table, and displays error information on the Run console. The source delimited file contains various personal details including firstname, lastname, address, city, company, age, and status. An id column helps ensure the uniqueness of the data.

We want any change in the marital status to overwrite the existing old status record. This type of change is equivalent to an SCD Type 1.

We want to insert a new record in the dimensional table with a separate key each time a person changes his/her company. This type of change is equivalent to an SCD Type 2.

We want to track only the previous city and previous address of a person. This type of change is equivalent to an SCD Type 3.

To build this scenario, it is best to divide it into three main steps: defining the main flow of the Job, setting up the SCD editor, and finally creating the relevant SCD table in the database.

Step 1: Defining the main flow of the Job

- Drop the following components from the Palette onto the design workspace: a tMysqlConnection, a tFileInputDelimited, a tMysqlSCD, a tMysqlCommit, and two tLogRow components.
- Connect the tFileInputDelimited, the first tLogRow, and the tMysqlSCD using the Row Main link. This is the main flow of your Job.
- Connect the tMysqlConnection to the tFileInputDelimited and the tMysqlSCD to the tMysqlCommit using the OnComponentOk trigger.
- Connect the tMysqlSCD to the second tLogRow using the Row Rejects link. Two columns, errorCode and errorMessage, are added to the schema. This connection collects error information.

- In the design workspace, double-click tMysqlConnection to display its Basic settings view and set the database connection details manually. Using the tMysqlConnection component avoids defining the same DB connection several times when multiple DB components are used.

If you have already stored the connection details locally in the Repository, drop the needed metadata item to the design workspace and the database connection details will automatically display in the relevant fields. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide.

In this scenario, we want to connect to the SCD table where changes in the source delimited file will be tracked.

- In the design workspace, double-click tFileInputDelimited to display its Basic settings view.
- Click the three-dot button next to the File Name field to select the path to the source delimited file, dataset.csv in this scenario, that contains the personal details.
- Define the row and field separators used in the source file.

The File Name, Row separator, and Field separators are mandatory.

- If needed, set Header, Footer, and Limit. In this scenario, set Header to 1. Footer and the limit for the number of processed rows are not set.
- Click Edit schema to describe the data structure of the source delimited file. In this scenario, the source schema is made of eight columns: id, firstName, lastName, address, city, company, age, and status.
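To make the effect of these settings concrete, the following Python sketch reads a delimited text with one header line and maps each row onto the eight schema columns. It only mimics the settings described above and is not how tFileInputDelimited is implemented. The semicolon field separator and the sample row are assumptions, since the actual content of dataset.csv is not reproduced here.

```python
import csv
import io

# Schema columns as listed in the scenario.
COLUMNS = ["id", "firstName", "lastName", "address", "city", "company", "age", "status"]

# Hypothetical file content: separator and values are assumptions for this example.
sample_file = io.StringIO(
    "id;firstName;lastName;address;city;company;age;status\n"
    "1;Jane;Doe;1 Example St;Sampletown;Acme;30;single\n"
)

reader = csv.reader(sample_file, delimiter=";")  # Field separator
next(reader)  # Header = 1: skip the first line
rows = [dict(zip(COLUMNS, fields)) for fields in reader]
```

Each entry of rows is then a dictionary keyed by schema column name, which is the shape of data the SCD component would receive on its Row main link.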

- Define the basic settings for the first tLogRow in order to view the content of the source file with varying attributes in cells of a table on the console before being processed through the SCD component.
- In the design workspace, click the tMysqlSCD and select the Component tab to define its basic settings.
- In the Basic settings view, select the Use an existing connection check box to reuse the connection details defined on the tMysqlConnection properties.
- In the Table field, enter the table name to be used to track changes.
- If needed, click Sync columns to retrieve the output data structure from the tFileInputDelimited.
- In the design workspace, double-click tMysqlCommit to define its basic settings.
- Select the relevant connection on the Component list if more than one connection exists.
- Define the basic settings of the second tLogRow in order to view reject information in cells of a table.

Step 2: Setting up the SCD editor

- Double-click the tMysqlSCD component in the design workspace, or click the three-dot button next to SCD Editor in the component's Basic settings view, to open the SCD editor and build the data flow for the SCD outputs.

All the columns from the preceding component are displayed in the Unused panel of the SCD editor. All the other panels in the SCD editor are empty.

- From the Unused list, drop the id column to the Source keys panel to use it as the key that ensures the uniqueness of the incoming data.
- In the Surrogate keys panel, enter a name for the surrogate key in the Name field, SK1 in this scenario.
- From the Creation list, select the method to be used for the surrogate key generation, Auto-increment in this scenario.
- From the Unused list, drop the firstname and lastname columns to the Type 0 panel; changes in these two columns do not interest us.
- Drop the status column to the Type 1 panel. The new value will overwrite the old value.
- Drop the company column to the Type 2 panel. Each time a person changes his/her company, a new record will be inserted in the dimensional table with a separate key.
- In the Versioning area:
  - Define the start and end columns of your SCD table that will hold the start and end date values. The end date is null for current records until a change is detected. Then the end date gets filled in and a new record is added with no end date. In this scenario, we select Fixed Year Value for the end column and fill in a fictive year to avoid having a null value in the end date field.
  - Select the version check box to hold the version number of the record.
  - Select the active check box to add the column that will hold the True or False status: True for the current active record and False for the modified record.
- Drop the address and city columns to the Type 3 panel to track only the information about the previous value of the address and city.

For more information about SCD types, see SCD management methodologies on page 160.

- Click OK to validate your configuration and close the SCD editor.
- Click Edit schema to view the input and output data structures. The SCD output schema should include the SCD-specific columns defined in the SCD editor to hold standard log information.

If you adjust any of the input schema definitions, you need to check, and reconfigure if necessary, the output flow definitions in the SCD editor to ensure that the output data structure is properly updated.

Step 3: Creating the SCD table

- In the Basic settings view of the tMysqlSCD component, select Create table if not exists from the Action on table list to avoid creating and defining the SCD table manually.
- Save your Job and press F6 to execute it.

The console shows the content of the input delimited file, and your SCD table is created in your database, containing the initial dataset.

- Janet gets divorced and moves to Adelanto at 355 Golf Rd. She works at Greenwood.
- Adam gets married and moves to Belmont at 2505 Alisson ct. He works at Scoop.

- Martin gets a new job at Phillips and Brothers.

- Update the delimited file with the above information and press F6 to run your Job.

The console shows the updated personal information and the rejected data, and the SCD table shows the history of valid changes made to the input file along with the status and version number. Because the name of Martin's new company exceeds the length of the company column defined in the schema, this change is directed to the reject flow instead of being logged in the SCD table.
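The routing of an over-long value to the reject flow can be pictured with the following Python sketch. It is a simplified model under stated assumptions: the maximum length of 20 characters, the error code TOO_LONG, the id values, and the function split_rejects are all hypothetical, and the sketch checks the length up front, whereas the way the component itself detects the error is not described here.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

COMPANY_MAX_LENGTH = 20  # hypothetical; use the length defined in your schema


def split_rejects(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows that fit the schema from rows routed to the reject flow."""
    main: List[Row] = []
    rejects: List[Row] = []
    for row in rows:
        if len(row["company"]) > COMPANY_MAX_LENGTH:
            # The reject flow carries the two extra columns named in the scenario.
            rejects.append({
                **row,
                "errorCode": "TOO_LONG",  # hypothetical code
                "errorMessage": f"company exceeds {COMPANY_MAX_LENGTH} characters",
            })
        else:
            main.append(row)
    return main, rejects


# Hypothetical ids; the company names come from the scenario.
main_rows, reject_rows = split_rejects([
    {"id": 1, "company": "Greenwood"},
    {"id": 2, "company": "Phillips and Brothers"},
])
```

Rows in main_rows continue to the SCD table, while rows in reject_rows correspond to what the second tLogRow displays through the Row Rejects link.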


tMysqlSCDELT
tMysqlSCDELT Properties
Component family: Databases/MySQL

Function: tMysqlSCDELT reflects and tracks changes in a dedicated MySQL SCD table.

Purpose: tMysqlSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated MySQL SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally. Enter properties manually.
Repository: Select the repository file where properties are stored. The fields that follow are pre-filled using the fetched data.

DB Version: Select the MySQL version you are using.

Use an existing connection: Select this check box and click the relevant tMysqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: The IP address of the database server.
Port: Listening port number of the database server.
Database: Name of the database.
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input MySQL SCD table.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: Select to perform one of the following operations on the table defined:
- None: No action is carried out on the table.
- Drop and create the table: The table is removed and created again.
- Create a table: A new table gets created.
- Create a table if not exists: A table gets created if it does not exist.
- Clear a table: The table content is deleted. You can roll back the operation.
- Truncate a table: The table content is deleted. You cannot roll back the operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Surrogate Key: Select the surrogate key column from the list.

Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.

Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.

Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.

Use SCD Type 2 fields: Use type 2 if changes need to be tracked. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
- Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
- End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
- Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
- Log versions: Adds a column to your SCD schema to hold the version number of the record.

Advanced settings

Debug mode: Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.
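Server-side processing means that the changes are applied by SQL statements executed inside the database rather than row by row in the Job. The following Python sketch illustrates the idea for a Type 1 column, using the standard sqlite3 module as a stand-in for a MySQL server. The table names source and dim, their columns, and the two statements are assumptions made for this example; the SQL actually generated by tMysqlSCDELT is not shown in this guide and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE dim (
        sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        id INTEGER,                            -- source key
        status TEXT                            -- Type 1 column
    );
    INSERT INTO source VALUES (1, 'married'), (2, 'single');
    INSERT INTO dim (id, status) VALUES (1, 'single');
""")

# Type 1: overwrite changed values in place, entirely inside the database.
conn.execute("""
    UPDATE dim
    SET status = (SELECT s.status FROM source s WHERE s.id = dim.id)
    WHERE EXISTS (
        SELECT 1 FROM source s
        WHERE s.id = dim.id AND s.status <> dim.status
    )
""")

# Insert the source keys that are not yet present in the dimension table.
conn.execute("""
    INSERT INTO dim (id, status)
    SELECT s.id, s.status FROM source s
    WHERE NOT EXISTS (SELECT 1 FROM dim d WHERE d.id = s.id)
""")
conn.commit()

result = conn.execute("SELECT sk, id, status FROM dim ORDER BY sk").fetchall()
```

Both statements operate on whole sets of rows, which is why no data has to travel through the Job itself in this mode.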

Related Scenario
For related topics, see tMysqlSCD on page 159, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tOracleSCD
tOracleSCD Properties
Component family: Databases/Oracle

Function: tOracleSCD reflects and tracks changes in a dedicated Oracle SCD table.

Purpose: tOracleSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where properties are stored. The following fields are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: Select the relevant driver on the list.
DB Version: Select the Oracle version you are using.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Schema: Name of the DB schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: Select to perform one of the following operations on the table defined:
- None: No action is carried out on the table.
- Create table: A new table is created.
- Create table if not exists: A table is created if it does not exist.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.

Use memory saving Mode: Select this check box to maximize system performance.

Die on error: This check box is cleared by default, meaning that a row on error is skipped and the process is completed for error-free rows.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

Related scenario
For related scenarios, see tMysqlSCD, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tOracleSCDELT
tOracleSCDELT Properties
Component family: Databases/Oracle

Function: tOracleSCDELT reflects and tracks changes in a dedicated Oracle SCD table.

Purpose: tOracleSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Oracle SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally. Enter properties manually.
Repository: Select the repository file where properties are stored. The fields that follow are pre-filled using the fetched data.

Use an existing connection: Select this check box and click the relevant tOracleConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: Select the relevant driver on the list.
DB Version: Select the Oracle version you are using.
Host: The IP address of the database server.
Port: Listening port number of the database server.
Database: Name of the database.
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input Oracle SCD table.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: Select to perform one of the following operations on the table defined:
- None: No action is carried out on the table.
- Drop and create table: The table is removed and created again.
- Create table: A new table gets created.
- Create table if not exists: A table gets created if it does not exist.
- Clear table: The table content is deleted. You can roll back the operation.
- Truncate table: The table content is deleted. You cannot roll back the operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Surrogate Key: Select the surrogate key column from the list.

Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.

Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.

Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.

Use SCD Type 2 fields: Use type 2 if changes need to be tracked. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
- Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
- End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
- Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
- Log versions: Adds a column to your SCD schema to hold the version number of the record.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Debug mode: Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

Related Scenario
For related topics, see tOracleSCD on page 174 and Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.
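The Type 2 options described in the properties above (Start date, End Date, Log Active Status, Log versions) can be pictured with a small in-memory sketch. This is not the code the component generates, and it does not touch a database: the class Scd2Sketch, the DimRow type and the tracked field city are hypothetical names chosen for illustration only.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class Scd2Sketch {

    /** One row of a hypothetical SCD table carrying the four optional Type 2 columns. */
    public static final class DimRow {
        public final String sourceKey;
        public final String city;        // hypothetical tracked Type 2 field
        public final LocalDate startDate;
        public LocalDate endDate;        // null while the record is active
        public boolean active;
        public final int version;

        public DimRow(String sourceKey, String city, LocalDate startDate, int version) {
            this.sourceKey = sourceKey;
            this.city = city;
            this.startDate = startDate;
            this.endDate = null;
            this.active = true;
            this.version = version;
        }
    }

    /** Applies one incoming record: closes the active row if the tracked field changed, then adds a new version. */
    public static void applyType2(List<DimRow> dim, String sourceKey, String city, LocalDate loadDate) {
        DimRow current = null;
        for (DimRow row : dim) {
            if (row.active && row.sourceKey.equals(sourceKey)) {
                current = row;
                break;
            }
        }
        if (current != null && current.city.equals(city)) {
            return; // tracked field unchanged: nothing to record
        }
        int version = 1;
        if (current != null) {
            current.endDate = loadDate;  // End Date is filled in when the row stops being active
            current.active = false;      // Log Active Status
            version = current.version + 1;
        }
        dim.add(new DimRow(sourceKey, city, loadDate, version));
    }

    public static void main(String[] args) {
        List<DimRow> dim = new ArrayList<>();
        applyType2(dim, "C1", "Paris", LocalDate.of(2011, 1, 1));
        applyType2(dim, "C1", "Lyon", LocalDate.of(2011, 3, 1));
        for (DimRow row : dim) {
            System.out.println(row.sourceKey + " " + row.city + " " + row.startDate
                    + " " + row.endDate + " " + row.active + " v" + row.version);
        }
    }
}
```

In this sketch a null End Date marks the active record; the Fixed Year option mentioned above would store a far-future date instead.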


tPaloCheckElements
tPaloCheckElements Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component checks whether the elements present in an incoming data flow exist in a given cube.

Purpose: This component can be used along with tPaloOutputMulti. It checks whether the elements from the input stream exist in the given cube before writing them. It can also define a default value to be used for nonexistent elements.

Basic settings

Use an existing connection: Select this check box and choose the relevant DB connection component from the Connection configuration list to use predefined connection details. When a Job contains a parent Job and a child Job, Connection configuration only lists the connection components on the same Job level, so if you need to use an existing connection from another level, ensure that the connection components available are sharing the connection required. For further information about sharing DB connections across Job levels, refer to Use or register a shared DB connection in Database components on page 315, in the properties table of the relevant connection component. Otherwise, you can deactivate the connection components and use the component's Dynamic settings to define the connection manually. In this case, ensure that the connection name is not used elsewhere in the Job, on any level. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.
- Database: Type in the name of the database in which the data is to be written.
- Cube: Type in the name of the cube in which the data should be written.


On element error: Select what should happen if an element does not exist:
- Reject row: the corresponding row is rejected and placed in the reject flow.
- Use default: the defined Default value is used.
- Stop: the entire process is interrupted.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Define the elements to be checked in the table provided.
- Column: shows the column(s) from the input schema. It is completed automatically once a schema is retrieved or created.
- Element type: select the element type for the input column. Only one column can be defined as Measure.
- Default: type in the default value to be used if you have selected the Use default option in the On element error field.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component requires an input component.

Connections:
Outgoing links (from one component to another):
- Row: Main; Rejects.
- Trigger: Run if; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Main; Rejects.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

This component only works on Normal Palo cubes.
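The three On element error options described above can be sketched as a small routing function. This is an illustration under stated assumptions, not the component's implementation: the cube is stood in for by a plain set of element names, and ElementCheckSketch, OnElementError and Result are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ElementCheckSketch {

    /** Mirrors the three choices of the On element error field. */
    public enum OnElementError { REJECT_ROW, USE_DEFAULT, STOP }

    /** Holds the two output flows: main and rejects. */
    public static final class Result {
        public final List<String> main = new ArrayList<>();
        public final List<String> rejects = new ArrayList<>();
    }

    /** Routes each incoming element according to the chosen policy. */
    public static Result check(List<String> incoming, Set<String> cubeElements,
                               OnElementError policy, String defaultValue) {
        Result result = new Result();
        for (String element : incoming) {
            if (cubeElements.contains(element)) {
                result.main.add(element);
                continue;
            }
            switch (policy) {
                case REJECT_ROW:
                    result.rejects.add(element);   // row goes to the reject flow
                    break;
                case USE_DEFAULT:
                    result.main.add(defaultValue); // missing element replaced by the default
                    break;
                case STOP:
                    throw new IllegalStateException("Element not found in cube: " + element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> cube = new HashSet<>(Arrays.asList("Months", "Products"));
        List<String> incoming = Arrays.asList("Months", "Regions");
        Result r = check(incoming, cube, OnElementError.REJECT_ROW, "n/a");
        System.out.println("main=" + r.main + " rejects=" + r.rejects);
    }
}
```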

Related scenario
For a related scenario, see Scenario 2: Rejecting inflow data when the elements to be written do not exist in a given cube on page 222.


tPaloConnection
tPaloConnection Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component opens a connection to a Palo Server and keeps it open throughout the duration of the process it is required for. Every other Palo component used in the process is able to use this connection.

Purpose: This component allows other components involved in a process to share its connection to a Palo server for the duration of the process.

Basic settings
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used along with Palo components to offer a shared connection to a Palo server.

Connections:
Outgoing links (from one component to another):
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Related scenario
For related scenarios, see Scenario: Creating a dimension with elements on page 199.


tPaloCube
tPaloCube Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component creates, deletes or clears Palo cubes from existing dimensions in a Palo database.

Purpose: This component performs operations on a given Palo cube.

Basic settings

Use an existing connection: Select this check box and choose the relevant DB connection component from the Connection configuration list to reuse predefined connection details. When a Job contains a parent Job and a child Job, Connection configuration only lists the connection components on the same Job level, so if you need to use an existing connection from another level, ensure that the connection components available are sharing the connection required. For further information about sharing DB connections across Job levels, refer to Use or register a shared DB connection in Database components on page 315, in the properties table of the relevant connection component. Otherwise, you can deactivate the connection components and use the component's Dynamic settings to define the connection manually. In this case, ensure that the connection name is not used elsewhere in the Job, on any level. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.
- Database: Type in the name of the database in which the operation is to take place.
- Cube: Type in the name of the cube where the operation is to take place.


Cube type: From the drop-down list, select the type of cube on which the operation is to be carried out:
- Normal: this is the normal and default type of cube.
- Attribut: an Attribute cube is created with a normal cube.
- User Info: User Info cubes can be created/modified with this component.

Action on cube: Select the operation you want to carry out on the cube defined:
- Create cube: the cube does not exist and will be created.
- Create cube if not exists: the cube is created if it does not exist.
- Delete cube if exists and create: the cube is deleted if it already exists and a new one will be created.
- Delete cube: the cube is deleted from the database.
- Clear cube: the data is cleared from the cube.

Dimension list: Add rows and enter the names of existing database dimensions to be used in the cube. The order of the dimensions in the list determines the order of the dimensions created.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: Can be used as a standalone component for dynamic cube creation with a defined dimension list.

Global Variables:
- Cubename: Indicates the name of the cube processed. This is available as an After variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

The cube creation process does not create dimensions from scratch, so the dimensions to be used in the cube must be created beforehand.
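The difference between the five Action on cube options can be illustrated with a sketch in which a Palo database is stood in for by a plain map from cube name to cell values. The names CubeActionSketch and Action are hypothetical. The assumptions that Create cube fails when the cube already exists, and that Delete cube and Clear cube fail when it does not, are illustrative readings of the descriptions above, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CubeActionSketch {

    /** Mirrors the five choices of the Action on cube field. */
    public enum Action { CREATE, CREATE_IF_NOT_EXISTS, DELETE_IF_EXISTS_AND_CREATE, DELETE, CLEAR }

    /** Applies an action to an in-memory stand-in for a Palo database. */
    public static void apply(Map<String, List<Double>> db, String cube, Action action) {
        switch (action) {
            case CREATE:
                // Assumption: creating an existing cube is treated as an error.
                if (db.containsKey(cube)) {
                    throw new IllegalStateException("Cube already exists: " + cube);
                }
                db.put(cube, new ArrayList<Double>());
                break;
            case CREATE_IF_NOT_EXISTS:
                db.putIfAbsent(cube, new ArrayList<Double>());
                break;
            case DELETE_IF_EXISTS_AND_CREATE:
                db.remove(cube);
                db.put(cube, new ArrayList<Double>());
                break;
            case DELETE:
                // Assumption: deleting a missing cube is treated as an error.
                if (db.remove(cube) == null) {
                    throw new IllegalStateException("Cube does not exist: " + cube);
                }
                break;
            case CLEAR: {
                List<Double> cells = db.get(cube);
                if (cells == null) {
                    throw new IllegalStateException("Cube does not exist: " + cube);
                }
                cells.clear(); // the cube stays, only its data is removed
                break;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Double>> biker = new LinkedHashMap<>();
        apply(biker, "bikerTalend", Action.CREATE);
        biker.get("bikerTalend").add(42.0);
        apply(biker, "bikerTalend", Action.CLEAR);
        System.out.println(biker);
    }
}
```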

Scenario: Creating a cube in an existing database


The Job in this scenario creates a new two-dimensional cube in the Palo demo database Biker.


To replicate this scenario, proceed as follows:
- Drop tPaloCube from the Palette onto the design workspace.
- Double-click tPaloCube to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database in which you want to create the cube, Biker in this example.
- In the Cube field, type in the name you want to use for the cube to be created, for example, bikerTalend.

- In the Cube type field, select the Normal type from the drop-down list for the cube to be created, meaning this cube will be of the normal, default type.
- In the Action on cube field, select the action to be performed. In this scenario, select Create cube.
- Under the Dimension list table, click the plus button twice to add two rows to the table.
- In the Dimension list table, type in the name for each newly added row to replace the default row name. In this scenario, type in Months for the first row and Products for the second. These two dimensions already exist in the Biker database where the new cube will be created.
- Press F6 to run the Job.

A new cube has been created in the Biker database and the two dimensions are added to this cube.


tPaloCubeList
tPaloCubeList Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component retrieves a list of cube details from the given Palo database.

Purpose: This component lists the cube names, cube types, number of assigned dimensions and number of filled cells from the given database.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration lists only the connection components on the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.
- Database: Type in the name of the database whose cube details you want to retrieve.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used as a start component. It requires an output component.


Global Variables:
- Number of cubes: indicates the number of cubes processed from the given database. This is available as an After variable. Returns an Integer.
- Cube_ID: indicates the ID of the cube being processed from the given database. This is available as a Flow variable. Returns an Integer.
- Cubename: indicates the name of the cube being processed from the given database. This is available as a Flow variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
- Row: Main; Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

The output schema is fixed and read-only.

Discovering the read-only output schema of tPaloCubeList


The table below presents information related to the read-only schema of the tPaloCubeList component.

Column (Type): Description
- Cube_id (int): Internal id of the cube.
- Cube_name (string): Name of the cube.
- Cube_dimensions (int): Number of dimensions inside the cube.
- Cube_cells (long): Number of calculated cells inside the cube.
- Cube_filled_cells (long): Number of filled cells inside the cube.
- Cube_status (int): Status of the cube. It may be: 0: unloaded; 1: loaded; 2: changed.
- Cube_type (int): Type of the cube. It may be: 0: normal; 1: system; 2: attribute; 3: user info; 4: gpu type.
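Because Cube_status and Cube_type are returned as integer codes, a downstream step may want to translate them into readable labels. The following is a minimal sketch of such a lookup, using only the code values listed in the schema above; the class and method names (PaloCubeCodes, statusLabel, typeLabel) are hypothetical and are not part of any Talend or Palo API.

```java
public class PaloCubeCodes {

    /** Translates a Cube_status code into the label given in the schema table. */
    public static String statusLabel(int code) {
        switch (code) {
            case 0: return "unloaded";
            case 1: return "loaded";
            case 2: return "changed";
            default: return "unknown (" + code + ")";
        }
    }

    /** Translates a Cube_type code into the label given in the schema table. */
    public static String typeLabel(int code) {
        switch (code) {
            case 0: return "normal";
            case 1: return "system";
            case 2: return "attribute";
            case 3: return "user info";
            case 4: return "gpu type";
            default: return "unknown (" + code + ")";
        }
    }

    public static void main(String[] args) {
        // Example values only; real values come from the tPaloCubeList output flow.
        System.out.println(statusLabel(1) + " / " + typeLabel(0));
    }
}
```

Such a lookup could, for instance, be placed in a routine and called from a tMap expression, so that the console output shows labels instead of raw codes.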


Scenario: Retrieving detailed cube information from a given database


The Job in this scenario retrieves detailed information about the cubes in the demo Palo database, Biker.

To replicate this scenario, proceed as follows:
- Drop tPaloCubeList and tLogRow from the component Palette onto the design workspace.
- Right-click tPaloCubeList to open the contextual menu.
- From this menu, select Row > Main to link the two components.
- Double-click the tPaloCubeList component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database whose cube details you want to retrieve, Biker in this example.
- Press F6 to run the Job.

The cube details are retrieved from the Biker database and are listed in the console of the Run view.


For further information about how to interpret the cube details listed in the console, see Discovering the read-only output schema of tPaloCubeList on page 187.


tPaloDatabase
tPaloDatabase Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component creates, drops or recreates databases in a given Palo server.

Purpose: This component manages the databases inside a Palo server.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration lists only the connection components on the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.
- Database: Type in the name of the database on which the given operation should take place.


Action on database: Select the operation you want to perform on the database of interest:
- Create database: the database does not exist and will be created.
- Create database if not exists: the database is created if it does not exist.
- Delete database if exists and create: the database is deleted if it exists and a new one is then created.
- Delete database: the database is removed from the server.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used as a standalone component for database management in a Palo server.

Global Variables:
- Databasename: Indicates the name of the database being processed. This is available as an After variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

n/a

Scenario: Creating a database


The Job in this scenario creates a new database on a given Palo server.

To replicate this scenario, proceed as follows:
- Drop tPaloDatabase from the component Palette onto the design workspace.


- Double-click the tPaloDatabase component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database you want to create, talenddatabase in this example.
- In the Action on database field, select the action to be performed. In this scenario, select Create database, as the database to be created does not exist.
- Press F6 to run the Job.

A new database is created on the given Palo server.


tPaloDatabaseList
tPaloDatabaseList Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component retrieves a list of database details from the given Palo server.

Purpose: This component lists database names, database types, number of cubes, number of dimensions, database status and database id from a given Palo server.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration lists only the connection components on the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used as a start component. It requires an output component.


Global Variables:
- Number of databases: Indicates the number of databases processed. This is available as an After variable. Returns an Integer.
- Database_id: Indicates the id of the database being processed. This is available as a Flow variable. Returns a Long.
- Databasename: Indicates the name of the database processed. This is available as an After variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
- Row: Main; Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
- Row: Iterate.
- Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

The output schema is fixed and read-only.

Discovering the read-only output schema of tPaloDatabaseList


The table below presents information related to the read-only output schema of the tPaloDatabaseList component.

Database (Type): Description
- Database_id (long): Internal ID of the database.
- Database_name (string): Name of the database.
- Database_dimensions (int): Number of dimensions inside the database.
- Database_cubes (int): Number of cubes inside the database.
- Database_status (int): Status of the database. 0 = unloaded; 1 = loaded; 2 = changed.
- Database_types (int): Type of the database. 0 = normal; 1 = system; 3 = user info.


Scenario: Retrieving detailed database information from a given Palo server


The Job in this scenario retrieves details of all of the databases from a given Palo server.

To replicate this scenario, proceed as follows:
- Drop tPaloDatabaseList and tLogRow from the component Palette onto the design workspace.
- Right-click tPaloDatabaseList to open the contextual menu.
- From this menu, select Row > Main to link the two components.
- Double-click the tPaloDatabaseList component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- Press F6 to run the Job.

Details of all of the databases in the Palo server are retrieved and listed in the console of the Run view.

For further information about the output schema, see section Discovering the read-only output schema of tPaloDatabaseList on page 194.

tPaloDimension
tPaloDimension Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component creates, drops or recreates dimensions, with or without dimension elements, inside a Palo database.

Purpose: This component manages Palo dimensions, and even elements, inside a database.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration lists only the connection components on the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.
- Host Name: Enter the host name or the IP address of the host server.
- Server Port: Type in the listening port number of the Palo server.
- Username and Password: Enter the Palo user authentication data.
- Database: Type in the name of the database in which the dimensions are managed.
- Dimension: Type in the name of the dimension on which the given operation should take place.


Action on dimension: Select the operation you want to perform on the dimension of interest:
- None: no action is taken on this dimension.
- Create dimension: the dimension does not exist and will be created.
- Create dimension if not exists: this dimension is created only when it does not exist.
- Delete dimension if exists and create: this dimension is deleted if it exists and then a new one is created.
- Delete dimension: this dimension is removed from the database.

Create dimension elements: Select this check box to activate the dimension management fields and create dimension elements along with the creation of this dimension.

The fields below are available only when the Create dimension elements check box is selected.

Dimension type: Available only when the action on dimension is None. Select the type of the dimension to be created. The type may be:
- Normal
- User info
- System
- Attribute

Commit size: Type in the number of elements which will be created before saving them inside the dimension.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Consolidation type None: Select this check box to move the incoming elements directly into the given dimension. With this option, you will not define any consolidations or hierarchy. With this option, you activate the corresponding parameter fields to be completed:
- Input Column: select a column from the drop-down list. The columns in the drop-down list are those you defined for the schema. The values from the selected column are used to process dimension elements.
- Element type: select the type of elements. It may be Numeric or Text.
- Creation mode: select the creation mode for the elements to be processed. This mode may be:
  - Add: simply adds an element to the dimension.
  - Force add: forces the creation of this element. If the element exists, it will be recreated.
  - Update: updates this element if it exists.
  - Add or Update: if this element does not exist, it will be created, otherwise it will be updated. This is the default option.
  - Delete: deletes this element from the dimension.

Consolidation type Normal: Select this check box to create elements and consolidate them inside the given dimension. This consolidation structures the created elements in different levels. With this option, you activate the corresponding parameter fields to be completed:
- Input Column: select a column from the drop-down list. The columns in the drop-down list are those you defined for the schema. The values from the selected column are used to process dimension elements.
- Element type: select the type of elements. It may be Numeric or Text.
- Creation mode: select the creation mode for the elements to be created. This mode may be:
  - Add: simply adds an element to the dimension.
  - Force add: forces the creation of this element. If the element exists, it will be recreated.
  - Update: updates this element if it exists.
  - Add or Update: if this element does not exist, it will be created, otherwise it will be updated. This is the default option.

Consolidation type Self-referenced: Select this check box to create elements and structure them based on a parent-child relationship. The input stream is responsible for the grouping of the consolidation. With this option, you activate the corresponding parameter fields to be completed:
- Elements type: select the type of elements. It may be Numeric or Text.
- Creation mode: select the creation mode for the elements to be created. This mode may be:
  - Add: simply adds an element to the dimension.
  - Force add: forces the creation of this element. If the element exists, it will be recreated.
  - Update: updates this element if it exists.
  - Add or Update: if this element does not exist, it will be created, otherwise it will be updated. This is the default option.
- Input Column: select a column from the drop-down list. The columns in the drop-down list are those you defined for the schema. The values from the selected column are used to process dimension elements.
- Hierarchy Element: select the type and the relationship of this input column in the consolidation.
  - Parent: sets the input value as parent element.
  - Child: relates the input value to the parent value and builds the consolidation.
  - Factor: defines the factor for this consolidation.

Advanced settings Usage Global Variables

tStatCatcher Statistics Select this check box to collect log data at the component level. This component can be used in standalone or as end component of a process. Dimensionname: Indicates the name of the dimension processed. This is available as an After variable. Returns a String. For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections

Outgoing links (from one component to another): Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error. Incoming links (from one component to another): Row: Main; Iterate Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error. For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation

Deletion of dimension elements is only possible with the consolidation type None. Only consolidation type Self-Referenced allows the placing of an factor on this consolidation.
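The Self-referenced consolidation type can be pictured as a stream of rows that each carry a parent, a child, and a factor. The Java sketch below only illustrates that idea and is not the code of the component: the class and method names are hypothetical, and the rule that a parent's value is the factor-weighted sum of its children is a common OLAP convention assumed here, not something this guide states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a self-referenced input stream: each row links a child to a parent.
final class SelfReferencedSketch {

    // One input row: parent element, child element, consolidation factor.
    static final class Link {
        final String parent;
        final String child;
        final double factor;

        Link(String parent, String child, double factor) {
            this.parent = parent;
            this.child = child;
            this.factor = factor;
        }
    }

    private final Map<String, List<Link>> childrenByParent = new LinkedHashMap<>();

    // Registers one row of the input stream.
    void add(String parent, String child, double factor) {
        childrenByParent.computeIfAbsent(parent, key -> new ArrayList<>())
                        .add(new Link(parent, child, factor));
    }

    // Assumed convention: a parent's value is the sum of (child value * factor).
    // Elements without children are read from leafValues. Cycles are not handled.
    double valueOf(String element, Map<String, Double> leafValues) {
        List<Link> links = childrenByParent.get(element);
        if (links == null) {
            return leafValues.getOrDefault(element, 0.0);
        }
        double total = 0.0;
        for (Link link : links) {
            total += link.factor * valueOf(link.child, leafValues);
        }
        return total;
    }
}
```

With the illustrative rows (Profit, Revenue, 1.0) and (Profit, Costs, -1.0), the factor lets one child count positively and the other negatively in the same consolidation.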

Scenario: Creating a dimension with elements


The Job in this scenario creates a date dimension with a simple element hierarchy composed of three levels: Year, Month, and Date.

To replicate this scenario, proceed as follows:
- Drop tPaloConnection, tRowGenerator, tMap, and tPaloDimension from the component Palette onto the design workspace.
- Right-click tPaloConnection to open the contextual menu.
- From the menu, select Trigger > On Subjob Ok to link it to tRowGenerator.
- Right-click tRowGenerator to open the contextual menu.
- From the menu, select Row > Main to link it to tMap.

Note: tRowGenerator is used here to generate random rows in order to simplify the process. In a real case, you can use one of the other input components to load your actual data.

- Right-click tMap to open the contextual menu.
- From the menu, select Row > New output to link it to tPaloDimension, and name the link out1 in the dialog box that pops up.
- Double-click the tPaloConnection component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- Double-click tRowGenerator to open its editor.
- In the upper part of the editor, click the plus button to add one column and rename it random_date in the Column column.
- In the newly added row, select Date in the Type column and getRandomDate in the Functions column.
- In the Function parameters view in the lower part of the editor, type in the new minimum and maximum date values in the Value column. In this example, the minimum is 2010-01-01 and the maximum is 2010-12-31.
- Click OK to validate your modifications and close the editor.
- In the dialog box that pops up, click OK to propagate your changes.
- Double-click tMap to open its editor.


- In the Schema editor view in the lower part of the tMap editor, under the out1 table, click the plus button to add three rows.
- In the Column column of the out1 table, type in the names of the three newly added rows: Year, Month, and Date. These rows are then added automatically to the out1 table in the upper part of the tMap editor.
- In the out1 table in the upper part of the tMap editor, click the Expression column in the Year row to place the cursor.
- Press Ctrl+Space to open the drop-down variable list.
- Double-click TalendDate.formatDate to select it from the list. The default expression, TalendDate.formatDate("yyyy-MM-dd HH:mm:ss",myDate), displays in the Expression column of the Year row.
- Replace the default expression with TalendDate.formatDate("yyyy",row1.random_date).
- Do the same for the Month row and the Date row: add the default expression, then replace it with TalendDate.formatDate("MM",row1.random_date) for the Month row and with TalendDate.formatDate("dd-MM-yyyy",row1.random_date) for the Date row.
- Click OK to validate this modification and accept the propagation by clicking OK in the dialog box that pops up.
- On the workspace, double-click tPaloDimension to open its Component view.
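The three tMap expressions above can be tried outside the studio to see what each level looks like for one date. The sketch below is a stand-in, not the Talend routine: formatDate is a hypothetical helper that assumes TalendDate.formatDate behaves like java.text.SimpleDateFormat with the same pattern, and Locale.ROOT is used here only to keep the digits predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Stand-in for the three tMap expressions of this scenario.
final class DateLevelsSketch {

    // Hypothetical replacement for TalendDate.formatDate(pattern, date).
    static String formatDate(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(date);
    }

    // Returns the Year, Month, and Date values derived from one random_date value.
    static String[] levels(Date randomDate) {
        return new String[] {
            formatDate("yyyy", randomDate),       // Year row
            formatDate("MM", randomDate),         // Month row
            formatDate("dd-MM-yyyy", randomDate)  // Date row
        };
    }
}
```

Note that the Month pattern yields only the two-digit month, so the same Month value appears for every year in the input.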


- Select the Use an existing connection check box. tPaloConnection_1 then displays automatically in the Connection configuration field.
- In the Database field, type in the name of the database in which the new dimension is created, talendDatabase for this scenario.
- In the Dimension field, type in the name you want to use for the dimension to be created, for example, Date.
- In the Action on dimension field, select the action to be performed. In this scenario, select Create dimension if not exist.
- Select the Create dimension elements check box.
- In the Consolidation Type area, select the Normal check box.
- Under the element hierarchy table in the Consolidation Type area, click the plus button to add three rows to the table.
- In the Input column column of the element hierarchy table, select Year from the drop-down list for the first row, Month for the second, and Date for the third. This determines the levels of the elements taken from the different columns of the input schema.
- Press F6 to run the Job.

A new dimension is then created in your Palo database talendDatabase.
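The order of the rows in the element hierarchy table (Year, then Month, then Date) is what defines the levels. The following sketch gives a rough in-memory picture of that ordering only; it makes no claim about how Palo stores the dimension, and the class and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory picture of the three levels selected in the element hierarchy table.
final class DateHierarchySketch {

    // Year -> Month -> Dates, in the same order as the rows of the hierarchy table.
    private final Map<String, Map<String, Set<String>>> years = new TreeMap<>();

    // Adds one input row; repeated values are stored only once.
    void addRow(String year, String month, String date) {
        years.computeIfAbsent(year, y -> new TreeMap<>())
             .computeIfAbsent(month, m -> new TreeSet<>())
             .add(date);
    }

    // Second-level elements found under a first-level element.
    Set<String> monthsOf(String year) {
        Map<String, Set<String>> months = years.get(year);
        return months == null ? Collections.<String>emptySet() : months.keySet();
    }

    // Third-level elements found under a given year and month.
    Set<String> datesOf(String year, String month) {
        Map<String, Set<String>> months = years.get(year);
        if (months == null || !months.containsKey(month)) {
            return Collections.emptySet();
        }
        return months.get(month);
    }
}
```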


tPaloDimensionList
tPaloDimensionList Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component retrieves a list of dimension details from the given Palo database.

Purpose: This component lists dimension names, dimension types, number of dimension elements, maximum dimension indent, maximum dimension depth, maximum dimension level, and dimension ID from a given Palo server.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration field to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components of the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.

Host Name: Enter the host name or the IP address of the host server.

Server Port: Type in the listening port number of the Palo server.

Username and Password: Enter the Palo user authentication data.

Database: Type in the name of the database where the dimensions of interest reside.

Retrieve cube dimensions: Select this check box to retrieve dimension information from an existing cube.

Cube: Type in the name of the cube from which dimension information is retrieved. Available when you select the Retrieve cube dimensions check box.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used as a standalone component or as the start component of a process.

Global Variables

Dimension name: Indicates the name of the dimension being processed. This is available as a Flow variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections

Outgoing links (from one component to another):
Row: Main; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: The output schema is fixed and read-only.

Discovering the read-only output schema of tPaloDimensionList


The table below presents information related to the read-only output schema of the tPaloDimensionList component.

Column                     Type     Description
Dimension_id               long     Internal ID of the dimension.
Dimension_name             string   Name of the dimension.
Dimension_attribute_cube   string   Name of the cube of attributes.
Dimension_rights_cube      string   Name of the cube of rights.
Dimension_elements         int      Number of dimension elements.
Dimension_max_level        int      Maximum level of the dimension.
Dimension_max_indent       int      Maximum indent of the dimension.
Dimension_max_depth        int      Maximum depth of the dimension.
Dimension_type             int      Type of the dimension: 0 = normal, 1 = system, 2 = attribute, 3 = user info.
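Because Dimension_type is delivered as an int, a downstream component may want to turn it into a readable label. A minimal sketch of that mapping follows; the codes and labels are those listed for Dimension_type, while the class and method names are hypothetical.

```java
// Maps the Dimension_type codes of the output schema to their labels.
final class DimensionTypes {

    static String describe(int dimensionType) {
        switch (dimensionType) {
            case 0:
                return "normal";
            case 1:
                return "system";
            case 2:
                return "attribute";
            case 3:
                return "user info";
            default:
                // Any other code is not documented for this schema.
                throw new IllegalArgumentException("Unknown dimension type: " + dimensionType);
        }
    }
}
```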

Scenario: Retrieving detailed dimension information from a given database


The Job in this scenario retrieves details of all of the dimensions from a given database.

To replicate this scenario, proceed as follows:
- Drop tPaloDimensionList and tLogRow from the component Palette onto the design workspace.
- Right-click tPaloDimensionList to open the contextual menu.
- From this menu, select Row > Main to link the two components.
- Double-click the tPaloDimensionList component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database where the dimensions of interest reside, Biker in this example.
- Press F6 to run the Job.

Details of all the dimensions in the Biker database are retrieved and listed in the console of the Run view.

For further information about the output schema, see section Discovering the read-only output schema of tPaloDimensionList on page 206.


tPaloInputMulti
tPaloInputMulti Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component retrieves data (elements as well as values) from a Palo cube.

Purpose: This component retrieves the stored or calculated values in combination with the element records out of a cube.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration field to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components of the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.

Host Name: Enter the host name or the IP address of the host server.

Server Port: Type in the listening port number of the Palo server.

Username and Password: Enter the Palo user authentication data.

Database: Type in the name of the database where the elements of interest reside.

Cube: Type in the name of the cube where the dimension elements to be retrieved are stored.

Cube type: Select the cube type from the drop-down list for the cube of concern. This type may be:
- Normal
- Attribut
- System
- User Info

Commit size: Type in the row count of each batch to be retrieved.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. The MEASURE column and the TEXT column are read-only, but you can add other columns alongside them.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Cube Query: Complete this table to specify the data you want to retrieve. The columns to be filled are:
- Column: the schema columns are added automatically to this column once defined in the schema editor. The schema columns are used to store the retrieved dimension elements.
- Dimensions: type in each of the dimension names of the cube from which you want to retrieve dimension elements. The dimension order listed in this column must be consistent with the order given in the cube that stores these dimensions.
- Elements: type in the dimension elements from which data is retrieved. If several elements are needed from one single dimension, separate them with a comma.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component requires an output component.

Connections

Outgoing links (from one component to another):
Row: Main
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: According to the architecture of OLAP systems, only a single value (text or numeric) can be retrieved from the cube. The MEASURE column and the TEXT column are fixed and read-only.
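Two rules of the Cube Query table lend themselves to a quick check before a Job is run: several elements of one dimension share a single cell separated by commas, and the Dimensions column must follow the order of the cube. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, and trimming the spaces around commas is a choice made for this example, not documented behavior of the component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-checks for the content of a Cube Query table.
final class CubeQuerySketch {

    // Splits one Elements cell into element names (assumption: spaces around commas are ignored).
    static List<String> parseElements(String cell) {
        List<String> elements = new ArrayList<>();
        for (String part : cell.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                elements.add(name);
            }
        }
        return elements;
    }

    // True when the query lists the dimensions in exactly the order used by the cube.
    static boolean sameOrder(List<String> cubeDimensions, List<String> queryDimensions) {
        return cubeDimensions.equals(queryDimensions);
    }
}
```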

Scenario: Retrieving dimension elements from a given cube


The Job in this scenario retrieves several dimension elements from the demo Palo cube Sales.

To replicate this scenario, proceed as follows:
- Drop tPaloInputMulti and tLogRow from the component Palette onto the design workspace.
- Right-click tPaloInputMulti to open its contextual menu.
- In the menu, select Row > Main to connect tPaloInputMulti to tLogRow with a row link.
- Double-click the tPaloInputMulti component to open its Component view.
- In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database in which the cube to be used is stored.
- In the Cube field, type in the name of the cube in which the dimensions of interest are stored. In this scenario, it is Sales, one of the demo cubes.
- In the Cube type field, select the Normal type from the drop-down list, meaning that the cube to be read is a normal, default cube.
- Next to the Edit schema field, click the three-dot button to open the schema editor.
- In the schema editor, click the plus button to add the rows of the schema to be edited. In this example, add rows corresponding to all of the dimensions stored in the Sales cube: Products, Regions, Months, Years, Datatypes, Measures. Type them in the order given in this cube.
- Click OK to validate this editing and accept the propagation of this change to the next component. These columns are then added automatically to the Column column of the Cube query table in the Component view. If the order is not consistent with the one in the Sales cube, adapt it using the up and down arrows under the schema table.
- In the Dimensions column of the Cube query table, type in each of the dimension names stored in the Sales cube, corresponding to each row in the Column column. In the Sales cube, the dimension names are: Products, Regions, Months, Years, Datatypes, Measures.
- In the Elements column of the Cube query table, type in the dimension elements you want to retrieve, corresponding to the dimensions they belong to. In this example, the elements to be retrieved are All Products, Germany and Austria, Jan, 2009, Actual, Turnover. Germany and Austria belong to the same dimension, Regions, so they are entered in the same row and separated with a comma.
- Click tLogRow to open its Component view.
- In the Mode area, select the Table (print values in cells of a table) check box to display the execution result in a table.
- Press F6 to run the Job.

The dimension elements and the corresponding Measure values display in the Run console.


tPaloOutput
tPaloOutput Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component writes one row of data (elements as well as values) into a Palo cube.

Purpose: This component takes the input stream and writes it to a given Palo cube.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration field to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components of the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.

Host Name: Enter the host name or the IP address of the host server.

Server Port: Type in the listening port number of the Palo server.

Username and Password: Enter the Palo user authentication data.

Database: Type in the name of the database where the cube of interest resides.

Cube: Type in the name of the cube in which the incoming data is written.

Commit size: Type in the row count of each batch to be written into the cube.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Column as Measure: Select the column from the input stream which holds the Measure or Text values.

Create element if not exist: Select this check box to create the element being processed if it does not already exist.

Save cube at process end: Select this check box to save the cube you have written the data in at the end of this process.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component requires an input component.

Global variable

Number of lines: Indicates the number of lines processed. This is available as an After variable. Returns an Integer.

Connections

Outgoing links (from one component to another):
Row: Iterate
Trigger: Run if
Incoming links (from one component to another):
Row: Main; Reject
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: This component is able to write only one row of data into a cube.
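An After variable such as Number of lines is usually read from the Job's global map once the component has finished. The sketch below simulates that map with a plain java.util.Map so it can run on its own. The key tPaloOutput_1_NB_LINE is an assumption based on the common componentName_VARIABLE naming pattern and is not confirmed by this guide; check the variable name offered by the studio for your component before relying on it.

```java
import java.util.Map;

// Simulated read of an Integer After variable from a Job-style global map.
final class AfterVariableSketch {

    // Assumed key; the actual name must be taken from the studio.
    static final String ASSUMED_KEY = "tPaloOutput_1_NB_LINE";

    // Returns the stored count, or 0 when the variable has not been set yet.
    static int readCount(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        return value instanceof Integer ? (Integer) value : 0;
    }
}
```

Returning 0 for a missing value is a choice made for this example; an After variable is only meaningful after the component has run.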

Related scenario
For related topic, see Scenario 1: Writing data into a given cube on page 219.


tPaloOutputMulti
tPaloOutputMulti Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component writes data (elements as well as values) into a Palo cube.

Purpose: This component takes the input stream and writes it to a given Palo cube.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component in the Connection configuration field to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components of the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For further information about Dynamic settings, see your studio user guide.

Connection configuration: Unavailable when using an existing connection.

Host Name: Enter the host name or the IP address of the host server.

Server Port: Type in the listening port number of the Palo server.

Username and Password: Enter the Palo user authentication data.

Database: Type in the name of the database where the cube of interest resides.

Cube: Type in the name of the cube in which the incoming data is written.

Cube type: Select the cube type from the drop-down list for the cube of concern. This type may be:
- Normal
- Attribut
- System
- User Info

Commit size: Type in the row count of each batch to be written into the cube.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Measure value: Select the column from the input stream which holds the Measure or Text values.

Splash mode: Select the splash mode used to write data into a consolidated element. The mode may be:
- Add: writes values to the underlying elements.
- Default: uses the default splash mode.
- Set: simply sets or replaces the current value and makes the distribution based on the other values.
- Disable: applies no splashing.
For further information about the Palo splash modes, see Palo's user guide.

Add values: Select this check box to add new values to the current values for a sum. Otherwise these new values overwrite the current ones.

Use eventprocessor: Select this check box to call the supervision server.

Die on error: This check box is cleared by default, meaning that a row on error is skipped and the process is completed for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component requires an input component.
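The effect of the Add values check box described above can be pictured with a small in-memory model: when it is selected the incoming value is summed with the current one, otherwise it replaces it. This is a sketch of the documented behavior only, not of how Palo or the component stores cells; the class, the cell path string, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory cube cells, keyed by an illustrative cell path.
final class AddValuesSketch {

    private final Map<String, Double> cells = new HashMap<>();

    // addValues == true: sum with the current value; false: overwrite it.
    void write(String cellPath, double value, boolean addValues) {
        if (addValues) {
            cells.merge(cellPath, value, Double::sum);
        } else {
            cells.put(cellPath, value);
        }
    }

    // Returns the current value of a cell, or 0.0 when nothing has been written.
    double read(String cellPath) {
        return cells.getOrDefault(cellPath, 0.0);
    }
}
```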

Connections

Outgoing links (from one component to another):
Row: Main
Trigger: Run if; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Main; Reject
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: Numeric measures are only accepted as Double or String type. When the String type is used, write the value to be processed between quotation marks.
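When an upstream column holds measures in another numeric type, it can be converted to Double before it reaches this component, for instance in a tMap expression or a custom routine. The helper below is one hypothetical way to do so and is not part of Talend; it assumes that text values use a dot as the decimal separator.

```java
// Hypothetical conversion of an incoming measure to the Double type accepted by the component.
final class MeasureConversionSketch {

    // Numbers are widened to Double; text such as "1234.56" is parsed; null stays null.
    static Double toDouble(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof Number) {
            return ((Number) raw).doubleValue();
        }
        // Throws NumberFormatException when the text is not a valid number.
        return Double.valueOf(raw.toString().trim());
    }
}
```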

Scenario 1: Writing data into a given cube


The Job in this scenario writes new values into Sales, a demo cube of the Demo database installed with Palo.

To replicate this scenario, proceed as follows:
- Drop tFixedFlowInput and tPaloOutputMulti from the component Palette onto the design workspace.
- Right-click tFixedFlowInput to open its contextual menu.
- In this menu, select Row > Main to connect this component to tPaloOutputMulti.
- Double-click the tFixedFlowInput component to open its Component view.
- Click the three-dot button to open the schema editor.
- In the schema editor, click the plus button to add 7 rows and rename them respectively Products, Regions, Months, Years, Datatypes, Measures, and Value. The order of these rows must be consistent with that of the corresponding dimensions in the Sales cube, and the type of the Value column, where the measure value resides, must be set to double/Double.
- Click OK to validate the editing and accept the propagation prompted by the dialog box that pops up. The schema column labels then display automatically in the Value table under the Use single table check box, in the Mode area.
- In the Value table, type in a value for each row in the Value column. In this example, these values are: Desktop L, Germany, Jan, 2009, Actual, Turnover, 1234.56.
- Double-click tPaloOutputMulti to open its Component view.
- In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
- In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
- In the Database field, type in the name of the database that holds the cube, Demo in this example.
- In the Cube field, type in the name of the cube you want to write data in, for example, Sales.
- In the Cube type field, select the Normal type from the drop-down list, meaning that the cube to be written to is a normal, default cube.
- In the Measure Value field, select the Measure element. In this scenario, select Value.
- Press F6 to run the Job.

The inflow data is written into the Sales cube.


Scenario 2: Rejecting inflow data when the elements to be written do not exist in a given cube
The job in this scenario tries to write data into the Sales cube but as the elements of interest do not exist in this cube, the inflow data is rejected.

To replicate this scenario, proceed as follows: Drop tFixedFlowInput, tPaloCheckElements, tPaloOutputMulti and tLogRow from the component Palette onto the design workspace. Right-click tFixedFlowInput to open its contextual menu. In this menu, select Row > Main to connect this component to tPaloCheckElements. Do the same to connect tPaloOutputMulti using row link. Right-click tPaloCheckElements to open its contextual menu. In this menu, select Row > Reject to connect this component to tLogRow. Double-click the tFixedFlowInput component to open its Component view.

Click the three-dot button to open the schema editor.


222 Talend Open Studio Components

Business Intelligence components


tPaloOutputMulti

In the schema editor, click the plus button to add 7 rows and rename them respectively as Products, Regions, Months, Years, Datatypes, Measures and Values. The order of these rows must be consistent with that of the corresponding dimensions in the Sales cube and the type of the Value column where the measure value resides is set to double/Double. Click OK to validate the editing and accept the propagation prompted by the dialog box that pops up. Then the schema column labels display automatically in the Value table under the Use single table check box, in the Mode area. In the Value table, type in values for each row in the Value column. In this example, these values are: Smart Products, Germany, Jan, 2009, Actual, Turnover, 1234.56. The Smart Products element does not exist in the Sales cube. Double-click tPaloCheckElements to open its Component view.

Talend Open Studio Components

223

Business Intelligence components


tPaloOutputMulti

In the Host name field, type in localhost. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin. In the Database field, type in the database name in which you want to create the cube, Demo in this example. In the Cube field, type in the name of the cube you want to write data in, for example, Sales. In the On Element error field, select Reject row from the drop-down list. In the element table at the bottom of the Basic settings view, click the Element type column in the Value row and select Measure from the drop down list. Double-click tPaloOutputMulti to open its Component view.


In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
In the Database field, type in the database name in which you want to create the cube, Demo in this example.
In the Cube field, type in the name of the cube you want to write data in, for example, Sales.
In the Cube type field, select the Normal type from the drop-down list, so that the cube to be created is a normal (default) cube.
In the Measure Value field, select the Measure element. In this scenario, select Value.
Press F6 to run the Job.
The data to be written is rejected and displayed in the console of the Run view. The error message reads Smart Products, the element that does not exist in the Sales cube.
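The reject behaviour in this scenario can be pictured with a small sketch. It is not the component's implementation: the element sets, the function names and the error_message key are all assumptions made for illustration, and a real Job checks elements against the Palo server rather than a local dictionary.

```python
# Hypothetical snapshot of the elements that exist in each dimension of the cube.
# A real Job would look these up on the Palo server; the names are assumptions.
CUBE_ELEMENTS = {
    "Products": {"Desktop", "Notebook"},
    "Regions": {"Germany", "France"},
    "Months": {"Jan", "Feb"},
    "Years": {"2009", "2010"},
    "Datatypes": {"Actual", "Budget"},
    "Measures": {"Turnover", "Units"},
}


def find_missing_element(row, cube_elements):
    """Return the first coordinate of `row` that is not a known element, else None."""
    for dimension, elements in cube_elements.items():
        if row[dimension] not in elements:
            return row[dimension]
    return None


def split_rows(rows, cube_elements):
    """Route each row to a main flow or a reject flow (mimics the Reject row option)."""
    main, rejects = [], []
    for row in rows:
        missing = find_missing_element(row, cube_elements)
        if missing is None:
            main.append(row)
        else:
            # The reject row carries the name of the missing element as its message.
            rejects.append({**row, "error_message": missing})
    return main, rejects


row = {
    "Products": "Smart Products",
    "Regions": "Germany",
    "Months": "Jan",
    "Years": "2009",
    "Datatypes": "Actual",
    "Measures": "Turnover",
    "Value": 1234.56,
}
main, rejects = split_rows([row], CUBE_ELEMENTS)
print(rejects[0]["error_message"])  # prints "Smart Products"
```

Because Smart Products is not among the assumed Products elements, the row never reaches the main flow, which mirrors what the Run view shows in the scenario.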


tPaloRule
tPaloRule Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component creates or modifies rules in a given cube.

Purpose: This component allows you to manage rules in a given cube.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component on the Connection configuration to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.
Connection configuration: Unavailable when using an existing connection.
Host Name: Enter the host name or the IP address of the host server.
Server Port: Type in the listening port number of the Palo server.
Username and Password: Enter the Palo user authentication data.
Database: Type in the name of the database where the dimensions applying the rules of interest reside.
Cube: Type in the name of the cube whose dimension information is retrieved.
Cube rules: Complete this table to perform various actions on specific rules.
Definition: type in the rule to be applied.
External Id: type in the user-defined external ID.
Comment: type in a comment for this rule.
Activated: select this check box to activate this rule.
Action: select the action to be performed from the drop-down list.
- Create: create this rule.
- Delete: delete this rule.
- Update: update this rule.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used standalone for rule creation, deletion or update.

Connections
Outgoing links (from one component to another):
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: Update or deletion of a rule is available only when this rule has been created with an external ID.

Scenario: Creating a rule in a given cube


The job in this scenario creates a rule applied on dimensions of a given cube.

To replicate this scenario, proceed as follows: Drop tPaloRule from the component Palette onto the design workspace. Double-click the tPaloRule component to open its Component view.


In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
In the Database field, type in the database name in which the dimensions applying the created rules reside, Biker in this example.
In the Cube field, type in the name of the cube which the dimensions applying the created rules belong to, for example, Orders.
Under the Cube rules table, click the plus button to add a new row.
In the Cube rules table, type in ['2009'] = 123 in the Definition column, OrderRule1 in the External Id column and Palo Demo Rules in the Comment column.
In the Activated column, select the check box.
In the Action column, select Create from the drop-down list.
Press F6 to run the Job.
The new rule has been created and the value of every 2009 element is 123.
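To make the visible effect of the rule ['2009'] = 123 concrete, here is a minimal sketch. Palo evaluates rules on the server, so this only mimics the outcome; the cell coordinates, the stored values and the function name apply_constant_rule are hypothetical.

```python
def apply_constant_rule(cells, element, value):
    """Return a copy of `cells` where every cell addressed by `element` holds `value`.

    `cells` maps coordinate tuples (one element per dimension) to stored values.
    This imitates the result of a rule such as ['2009'] = 123; it is not how
    the Palo server evaluates rules.
    """
    return {
        coords: (value if element in coords else stored)
        for coords, stored in cells.items()
    }


# Hypothetical cells of a two-dimension cube (Years, Measures).
cells = {
    ("2008", "Units"): 10.0,
    ("2009", "Units"): 20.0,
    ("2009", "Turnover"): 300.0,
}
ruled = apply_constant_rule(cells, "2009", 123)
print(ruled[("2009", "Units")], ruled[("2008", "Units")])  # prints "123 10.0"
```

Only cells whose coordinates include 2009 change; the stored 2008 value is left untouched, which matches the result stated at the end of the scenario.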


tPaloRuleList
tPaloRuleList Properties
Component family: Business Intelligence/Cube OLAP/Palo

Function: This component retrieves a list of rule details from the given Palo database.

Purpose: This component lists all rules, formulas, comments, activation status and external IDs from a given cube.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component on the Connection configuration to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Connection configuration presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For further information about Dynamic settings, see your studio user guide.
Connection configuration: Unavailable when using an existing connection.
Host Name: Enter the host name or the IP address of the host server.
Server Port: Type in the listening port number of the Palo server.
Username and Password: Enter the Palo user authentication data.
Database: Type in the name of the database where the cube of interest resides.
Cube: Type in the name of the cube from which you want to retrieve the rule information.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component can be used standalone or as the start component of a process.

Global Variables
Number of rules: Indicates the number of rules processed. This is available as an After variable. Returns an Integer.
External ruleID: Indicates the external IDs of the rules being processed. This is available as a Flow variable. Returns a String.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another):
Row: Main; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: The output schema is fixed and read-only.

Discovering the read-only output schema of tPaloRuleList


The table below presents information related to the read-only output schema of the tPaloRuleList component.

Column            Type      Description
rule_identifier   long      The internal identifier of this rule.
rule_definition   string    The formula of this rule. For further information about this formula, see the Palo user guide.
rule_extern_id    string    The user-defined external ID.
rule_comment      string    The user-edited comment on this rule.
rule_activated    boolean   Indicates whether this rule has been activated.
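For readers who post-process this output outside the studio, the fixed schema can be mirrored as a typed record. The sketch below is only an illustration: the class name PaloRuleRow and the sample rows are made up, with the first row borrowing the values used in the tPaloRule scenario.

```python
from typing import NamedTuple


class PaloRuleRow(NamedTuple):
    """One row of the read-only tPaloRuleList output schema (field names from the table)."""
    rule_identifier: int
    rule_definition: str
    rule_extern_id: str
    rule_comment: str
    rule_activated: bool


# Hypothetical sample rows; identifiers and the second rule are invented.
rows = [
    PaloRuleRow(0, "['2009'] = 123", "OrderRule1", "Palo Demo Rules", True),
    PaloRuleRow(1, "['2010'] = 0", "", "", False),
]

# Keep the external IDs of activated rules only.
active_ids = [r.rule_extern_id for r in rows if r.rule_activated]
print(active_ids)  # prints "['OrderRule1']"
```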

Scenario: Retrieving detailed rule information from a given cube


The job in this scenario retrieves rule details applied on the dimensions of a given cube.

To replicate this scenario, proceed as follows:
Drop tPaloRuleList and tLogRow from the component Palette onto the design workspace.
Right-click tPaloRuleList to open the contextual menu. From this menu, select Row > Main to link the two components.
Double-click the tPaloRuleList component to open its Component view.

In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
In the Username and Password fields, type in the authentication information. In this example, both of them are admin.
In the Database field, type in the database name where the dimensions applying the rules of interest reside, Biker in this example.
In the Cube field, type in the name of the cube which the rules of interest belong to, Orders in this example.


Press F6 to run the Job. Details of all of the rules in the Orders cube are retrieved and listed in the console of the Run view.

For further information about the output schema, see section Discovering the read-only output schema of tPaloRuleList on page 231.


tParAccelSCD
tParAccelSCD Properties
Component family: Databases/ParAccel

Function: tParAccelSCD reflects and tracks changes in a dedicated ParAccel SCD table.

Purpose: tParAccelSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where properties are stored. The following fields are pre-filled in using fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Connection type: Select the relevant driver on the list.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Schema: Name of the DB schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.
Use memory saving Mode: Select this check box to maximize system performance.
Die on error: This check box is cleared by default, meaning that rows in error are skipped and the process is completed for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an Output component. It requires an Input component and a Row main link as input.

Related scenario
For related scenarios, see tMysqlSCD Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tPostgresPlusSCD
tPostgresPlusSCD Properties
Component family: Databases/PostgresPlus Server

Function: tPostgresPlusSCD reflects and tracks changes in a dedicated PostgresPlus SCD table.

Purpose: tPostgresPlusSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table.

Basic settings

Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.
Server: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Schema: Name of the DB schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.
Use memory saving Mode: Select this check box to maximize system performance.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an Output component. It requires an Input component and a Row main link as input.

Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tPostgresPlusSCDELT
tPostgresPlusSCDELT Properties
Component family: Databases/Postgresql

Function: tPostgresPlusSCDELT reflects and tracks changes in a dedicated PostgresPlus SCD table.

Purpose: tPostgresPlusSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated PostgresPlus SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally. Enter properties manually.
Repository: Select the repository file where Properties are stored. The fields that come after are pre-filled in using the fetched data.
Use an existing connection: Select this check box and click the relevant tPostgresPlusConnection component on the Component List to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: The IP address of the database server.
Port: Listening port number of database server.
Database: Name of the database.
Schema: Exact name of the schema.
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input PostgresPlus SCD table.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: Select to perform one of the following operations on the table defined:
None: No action carried out on the table.
Drop and create table: The table is removed and created again.
Create table: A new table gets created.
Create table if not exists: A table gets created if it does not exist.
Clear table: The table content is deleted. You have the possibility to roll back the operation.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Surrogate Key: Select the surrogate key column from the list.
Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.
Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictitious year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.

Advanced settings

Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

Related Scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.
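The difference between the SCD Type 1 and Type 2 fields described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SQL the component generates: the column names start_date, end_date, active and version, the function apply_scd and the sample customer data are all hypothetical, and a Type 1 change here overwrites only the active record.

```python
from datetime import date


def apply_scd(history, incoming, key, type1_cols, type2_cols, today):
    """Apply one incoming row to an SCD history (a list of dicts), in place."""
    current = next(
        (r for r in history if r[key] == incoming[key] and r["active"]), None
    )
    if current is None:
        # First time this source key is seen: open version 1.
        history.append({**incoming, "start_date": today, "end_date": None,
                        "active": True, "version": 1})
        return
    if any(current[c] != incoming[c] for c in type2_cols):
        # Type 2: close the active record and append a new version.
        current["end_date"] = today
        current["active"] = False
        history.append({**incoming, "start_date": today, "end_date": None,
                        "active": True, "version": current["version"] + 1})
        return
    # Type 1: overwrite in place, no history kept (e.g. a typo correction).
    for c in type1_cols:
        current[c] = incoming[c]


history = []
args = ("id", ["name"], ["city"])
apply_scd(history, {"id": 1, "name": "Jon", "city": "Paris"}, *args, date(2009, 1, 1))
apply_scd(history, {"id": 1, "name": "John", "city": "Paris"}, *args, date(2009, 2, 1))
apply_scd(history, {"id": 1, "name": "John", "city": "Lyon"}, *args, date(2009, 3, 1))

for record in history:
    print(record["version"], record["name"], record["city"], record["active"])
```

The name correction leaves a single record, while the city change closes version 1 with an end date and opens version 2 as the active record, which is the role of the Start date, End Date, Log Active Status and Log versions columns.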


tPostgresqlSCD
tPostgresqlSCD Properties
Component family: Databases/Postgresql Server

Function: tPostgresqlSCD reflects and tracks changes in a dedicated Postgresql SCD table.

Purpose: tPostgresqlSCD addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the changes into a dedicated SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Schema: Name of the DB schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.
Use memory saving Mode: Select this check box to maximize system performance.
Die on error: This check box is cleared by default, meaning that rows in error are skipped and the process is completed for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an Output component. It requires an Input component and a Row main link as input.

Related scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tPostgresqlSCDELT
tPostgresqlSCDELT Properties
Component family: Databases/Postgresql

Function: tPostgresqlSCDELT reflects and tracks changes in a dedicated Postgresql SCD table.

Purpose: tPostgresqlSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Postgresql SCD table.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally. Enter properties manually.
Repository: Select the repository file where Properties are stored. The fields that come after are pre-filled in using the fetched data.
Use an existing connection: Select this check box and click the relevant tPostgresqlConnection component on the Component List to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: The IP address of the database server.
Port: Listening port number of database server.
Database: Name of the database.
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input Postgresql SCD table.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: Select to perform one of the following operations on the table defined:
None: No action carried out on the table.
Drop and create table: The table is removed and created again.
Create table: A new table gets created.
Create table if not exists: A table gets created if it does not exist.
Clear table: The table content is deleted. You have the possibility to roll back the operation.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Surrogate Key: Select the surrogate key column from the list.
Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.
Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictitious year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.

Advanced settings

Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an output component. It requires an input component and a Row main link as input.

Related Scenario
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.
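Two of the Action on table options described above, Create table if not exists and Clear table, can be tried out with a small stand-in. The sketch below uses Python's built-in sqlite3 module and an in-memory SQLite database purely for convenience; it is not PostgreSQL, the table name scd_dim is hypothetical, and the statements the component actually issues may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Create table if not exists": running the statement twice raises no error.
conn.execute("CREATE TABLE IF NOT EXISTS scd_dim (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS scd_dim (id INTEGER, name TEXT)")

conn.executemany("INSERT INTO scd_dim VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()


def row_count():
    """Return the current number of rows in the hypothetical scd_dim table."""
    return conn.execute("SELECT COUNT(*) FROM scd_dim").fetchone()[0]


# "Clear table": the content is deleted inside a transaction...
conn.execute("DELETE FROM scd_dim")
count_after_clear = row_count()

# ...so the deletion can still be rolled back.
conn.rollback()
count_after_rollback = row_count()

print(count_after_clear, count_after_rollback)  # prints "0 2"
```

The rows reappear after the rollback, which is the possibility the Clear table option refers to.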


tSPSSInput
Before using the SPSS components, make sure to do one of the following:
- If you have already installed SPSS, add the path to the SPSS directory as follows: SET PATH=%PATH%;<DR>:\program\SPSS
- If you have not installed SPSS, copy the SPSS IO library spssio32.dll from the SPSS installation CD and paste it in the Talend root directory.

tSPSSInput properties
Component family: Business Intelligence
Function: tSPSSInput reads data from an SPSS .sav file.
Purpose: tSPSSInput addresses SPSS .sav data to write it for example in another file.

Basic settings
Sync schema: Click this button to synchronize with the columns of the input SPSS .sav file.
Schema and Edit Schema: The schema metadata in this component is retrieved directly from the input SPSS .sav file and thus is read-only. You can click Edit schema to view the retrieved metadata.
Filename: Name or path of the SPSS .sav file to be read.
Translate labels: Select this check box to translate the labels of the stored values. If you select this check box, you need to retrieve the metadata again.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as a start component. It requires an output flow.

Scenario: Displaying the content of an SPSS .sav file


The following scenario creates a two-component Job, which aims at reading each row of a .sav file and displaying the output on the log console. Drop a tSPSSInput component and a tLogRow component from the Palette onto the design workspace.

Right-click tSPSSInput and connect it to tLogRow using a Main Row link.



Click tSPSSInput to display its Basic settings view and define the component properties.

Click the three-dot button next to the Filename field and browse to the SPSS .sav file you want to read. Click the three-dot button next to Sync schema. A message opens up prompting you to accept retrieving the schema from the defined SPSS file.

Click Yes to close the message and proceed to the next step. If required, click the three-dot button next to Edit schema to view the pre-defined data structure of the source SPSS file.

Click OK to close the dialog box. Save the Job and press F6 to execute it. The SPSS file is read row by row and the extracted fields are displayed on the log console.


To carry out translation on the stored values, complete the following:
In the Basic settings view, select the Translate labels check box.
Click Sync schema a second time to retrieve the schema after translation. A message opens up prompting you to accept retrieving the schema from the defined SPSS file.
Click Yes to close the message and proceed to the next step. A second message opens up prompting you to accept propagating the changes.
Click Yes to close the message and proceed to the next step.
Save the Job and press F6 to execute it.
The SPSS file is read row by row and the extracted fields are displayed on the log console after translating the stored values.


tSPSSOutput
Before you can use the SPSS components, do one of the following:
- If you have already installed SPSS, add the path to the SPSS directory to your PATH variable as follows: SET PATH=%PATH%;<DR>:\program\SPSS
- If you have not installed SPSS, copy the SPSS IO library spssio32.dll from the SPSS installation CD and paste it in the Talend root directory.

tSPSSOutput properties
Component family: Business Intelligence
Function: tSPSSOutput writes data entries in an .sav file.
Purpose: tSPSSOutput writes or appends data to an SPSS .sav file. It creates SPSS files on the fly and overwrites existing ones.

Basic settings
Sync schema: Click this button to synchronize with the columns of the SPSS .sav file.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Filename: Name or path of the SPSS .sav file to be written.
Write Type: Select an operation from the list:
  Write: simply writes the new data.
  Append: writes the new data at the end of the existing data.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component cannot be used as a start component. It requires an input flow.

Scenario: Writing data in an .sav file


This Java scenario describes a very simple Job that writes data entries in an .sav file. Drop a tRowGenerator component and a tSPSSOutput component from the Palette onto the design workspace. Right-click on tRowGenerator and connect it to tSPSSOutput using a Main Row link.


In the design workspace, double click tRowGenerator to display its Basic Settings view and open its editor. Here you can define your schema.

Click the plus button to add the columns you want to write in the .sav file. Define the schema and set the parameters to the columns.
Make sure to define the length of your columns. Otherwise, an error message will display when building your Job.

Click OK to validate your schema and close the editor. Click tSPSSOutput to display its Basic settings view and define the component properties.

Click the three-dot button next to the Filename field and browse to the SPSS .sav file in which you want to write data.


Click the three-dot button next to Sync columns to synchronize columns with the previous component. In this example, the schema to be inserted in the .sav file consists of the two columns: id and country. If required, click Edit schema to view/edit the defined schema. From the Write Type list, select Write or Append to simply write the input data in the .sav file or add it to the end of the .sav file. Save the Job and press F6 to execute it. The data generated by the tRowGenerator component is written in the defined .sav file.


tSPSSProperties
Before you can use the SPSS components, do one of the following:
- If you have already installed SPSS, add the path to the SPSS directory to your PATH variable as follows: SET PATH=%PATH%;<DR>:\program\SPSS
- If you have not installed SPSS, copy the SPSS IO library spssio32.dll from the SPSS installation CD and paste it in the Talend root directory.

tSPSSProperties properties
Component family: Business Intelligence
Function: tSPSSProperties describes the properties of a defined SPSS .sav file.
Purpose: tSPSSProperties allows you to obtain information about the main properties of a defined SPSS .sav file.

Basic settings
Schema and Edit Schema: The schema metadata in this component is predefined and thus read-only. You can click Edit schema to view the predefined metadata. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Filename: Name or path of the .sav file to be processed.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: Use this component as a start component. It needs an output flow.

Related scenarios
For related topics, see: Scenario: Displaying the content of an SPSS .sav file on page 246. Scenario: Writing data in an .sav file on page 249.


tSPSSStructure
Before you can use the SPSS components, do one of the following:
- If you have already installed SPSS, add the path to the SPSS directory to your PATH variable as follows: SET PATH=%PATH%;<DR>:\program\SPSS
- If you have not installed SPSS, copy the SPSS IO library spssio32.dll from the SPSS installation CD and paste it in the Talend root directory.

tSPSSStructure properties
Component family: Business Intelligence
Function: tSPSSStructure retrieves information about the variables inside .sav files.
Purpose: tSPSSStructure addresses variables inside .sav files. You can use this component in combination with tFileList to gather information about existing *.sav files to further analyze or check the findings.

Basic settings
Schema and Edit Schema: The schema metadata in this component is predefined and thus read-only. It is based on the internal SPSS convention. You can click Edit schema to view the predefined metadata. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Filename: Name or path of the .sav file to be processed.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: Use this component as a start component. It needs an output flow.

Related scenarios
For related topics, see: Scenario: Displaying the content of an SPSS .sav file on page 246. Scenario: Writing data in an .sav file on page 249.


tSybaseSCD
tSybaseSCD properties
Component family: Databases/Sybase
Function: tSybaseSCD reflects and tracks changes in a dedicated Sybase SCD table.
Purpose: tSybaseSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the Repository file where Properties are stored. The following fields are pre-filled using the fetched data.
Use an existing connection: Select this check box and click the relevant DB connection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.


  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see SCD management methodologies on page 160.
Use memory saving Mode: Select this check box to maximize system performance.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.

Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.

Usage: This component is used as an output component. It requires an input component and a Row Main link as input.

Related scenarios
For related topics, see Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.


tSybaseSCDELT
tSybaseSCDELT Properties
Component family: Databases/Sybase
Function: tSybaseSCDELT reflects and tracks changes in a dedicated Sybase SCD table.
Purpose: tSybaseSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Sybase SCD table.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally. Enter properties manually.
  Repository: Select the Repository file where Properties are stored. The fields that follow are pre-filled using the fetched data.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: The IP address of the database server.
Port: Listening port number of the database server.
Database: Name of the database.
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input Sybase SCD table.
Table: Name of the table to be written. Note that only one table can be written at a time.


Action on table: Select one of the following operations to perform on the defined table:
  None: No action is carried out on the table.
  Drop and create table: The table is removed and created again.
  Create table: A new table is created.
  Create table if not exists: A table is created if it does not exist.
  Clear table: The table content is deleted. You can roll back the operation.
  Truncate table: The table content is deleted. You cannot roll back the operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Surrogate Key: Select the surrogate key column from the list.
Creation: Select the method to be used for the surrogate key generation. For more information regarding the creation methods, see SCD keys on page 162.
Source Keys: Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.
Use SCD Type 1 fields: Use Type 1 if tracking changes is not necessary, for example to correct typos. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use Type 2 if changes need to be tracked, for example to trace updates. Select the columns of the schema that will be checked for changes.
  Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
  End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
  Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
  Log versions: Adds a column to your SCD schema to hold the version number of the record.


Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an output component. It requires an input component and a Row Main link as input.

Related Scenario
For related topics, see tMysqlSCD on page 159, Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3) on page 163.
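
The difference between the SCD Type 1 and Type 2 fields described in the properties above can be illustrated with a small in-memory sketch. This is not the SQL that the component generates. The class ScdSketch, the DimRow structure and the column names (customerId, email, city) are hypothetical, and the sketch assumes that a Type 1 change only overwrites the active row.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an in-memory stand-in for an SCD table, not Talend-generated code.
class ScdSketch {

    /** One row of the hypothetical SCD table. */
    static final class DimRow {
        final int surrogateKey;     // Surrogate Key
        final String customerId;    // Source Key
        String email;               // SCD Type 1 field: overwritten in place
        final String city;          // SCD Type 2 field: a change creates a new version
        final LocalDate startDate;  // Start date
        LocalDate endDate;          // End Date, null while the record is active
        boolean active;             // Log Active Status
        final int version;          // Log versions

        DimRow(int surrogateKey, String customerId, String email, String city,
               LocalDate startDate, int version) {
            this.surrogateKey = surrogateKey;
            this.customerId = customerId;
            this.email = email;
            this.city = city;
            this.startDate = startDate;
            this.endDate = null;
            this.active = true;
            this.version = version;
        }
    }

    final List<DimRow> table = new ArrayList<>();
    private int nextKey = 1; // assumption: auto-increment style surrogate key

    /** Applies one incoming source row to the SCD table. */
    void apply(String customerId, String email, String city, LocalDate loadDate) {
        DimRow current = null;
        for (DimRow row : table) {
            if (row.active && row.customerId.equals(customerId)) {
                current = row;
            }
        }
        if (current == null) {
            // Unknown source key: insert the first version.
            table.add(new DimRow(nextKey++, customerId, email, city, loadDate, 1));
            return;
        }
        if (!current.city.equals(city)) {
            // Type 2 change: close the active version and add a new one.
            current.endDate = loadDate;
            current.active = false;
            table.add(new DimRow(nextKey++, customerId, email, city, loadDate,
                    current.version + 1));
        } else {
            // Type 1 change (or no change): overwrite the value, keep no history.
            current.email = email;
        }
    }

    public static void main(String[] args) {
        ScdSketch scd = new ScdSketch();
        scd.apply("C1", "a@example.org", "Paris", LocalDate.of(2011, 1, 1));
        scd.apply("C1", "b@example.org", "Paris", LocalDate.of(2011, 2, 1)); // Type 1
        scd.apply("C1", "b@example.org", "Lyon", LocalDate.of(2011, 3, 1));  // Type 2
        for (DimRow r : scd.table) {
            System.out.println(r.surrogateKey + " " + r.customerId + " " + r.email + " "
                    + r.city + " " + r.startDate + " " + r.endDate + " "
                    + r.active + " v" + r.version);
        }
    }
}
```

In this sketch the e-mail correction leaves a single row in the table, whereas the change of city closes the first row (End Date set, active status false) and adds a second row with version 2.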


Custom Code components


This chapter details the major components which belong to the Custom Code family in the Talend Open Studio Palette. The Custom Code components enable you to write code for specific needs, quickly and efficiently.


tGroovy
tGroovy Properties
Component family: Custom Code
Function: tGroovy allows you to enter customized code which you can integrate in the Talend program. The code is run only once.
Purpose: tGroovy broadens the functionality of the Talend Job, using the Groovy language, which is a simplified Java syntax.

Basic settings
Groovy Script: Enter the Groovy code you want to run.
Variables: This table has two columns.
  Name: Name of the variable called in the code.
  Value: Value associated with the variable.

Advanced settings
tStatCatcher Statistics: Select this check box to collect the log data at component level.

Usage: This component can be used alone or as a subjob along with one other component.
Limitation: Knowledge of the Groovy language is required.

Related Scenarios
For a scenario using Groovy code, see Scenario: Calling a file which contains Groovy code on page 261. For a functional example, see Scenario: Printing out a variable content on page 263.


tGroovyFile
tGroovyFile Properties
Component family: Custom Code
Function: tGroovyFile allows you to call an existing Groovy script.
Purpose: tGroovyFile broadens the functionality of Talend Jobs using the Groovy language, which is a simplified Java syntax.

Basic settings
Groovy File: Name and path of the file containing the Groovy code.
Variables: This table contains two columns.
  Name: Name of the variable called in the code.
  Value: Value associated with this variable.

Advanced settings
tStatCatcher Statistics: Select this check box to collect the log data at component level.

Usage: This component can be used alone or as a subjob along with another component.
Limitation: Knowledge of the Groovy language is required.

Scenario: Calling a file which contains Groovy code


This scenario uses tGroovyFile on its own. The Job calls a file containing Groovy code in order to display the file information in the Console. Below is an example of the information displayed:

Open the Custom_Code folder in the Palette and drop a tGroovyFile component onto the workspace. Double-click the component to display the Component view. In the Groovy File field, enter the path to the file containing the Groovy code, or browse to the file in your directory. In the Variables table, add a line by clicking the [+] button.

In the Name column, enter age, then in the Value column, enter 50, as in the screenshot.

Press F6 to save and run the Job. The Console displays the information contained in the input file, to which the variable result is added.


tJava
tJava Properties
Component family: Custom Code
Function: tJava enables you to enter personalized code in order to integrate it in a Talend program. You can execute this code only once.
Purpose: tJava makes it possible to extend the functionalities of a Talend Job using Java commands.

Basic settings
Code: Type in the Java code you want to execute according to the task you need to perform. For further information about Java functions syntax specific to Talend, see Talend Open Studio Online Help (Help Contents > Developer Guide > API Reference). For a complete Java reference, check http://java.sun.com/javaee/6/docs/api/

Advanced settings
Import: Enter the Java code that helps to import, if necessary, external libraries used in the Main code box of the Basic settings view.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is generally used as a one-component subjob.
Limitation: You should know the Java language.

Scenario: Printing out a variable content


The following scenario is a simple demo of the extended application of the tJava component. The Job aims at printing out the number of lines being processed using a Java command and the global variable provided in Talend Open Studio.


Select and drop the following components from the Palette onto the design workspace: tFileInputDelimited, tFileOutputExcel, tJava. Connect the tFileInputDelimited to the tFileOutputExcel using a Row Main connection. The content from a delimited txt file will be passed on through the connection to an xls-type of file without further transformation. Then connect the tFileInputDelimited component to the tJava component using a Then Run link. This link sets a sequence ordering tJava to be executed at the end of the main process. Set the Basic settings of the tFileInputDelimited component. The input file used in this example is a simple text file made of two columns: Names and their respective Emails.

The schema has not been stored in the Repository for this use case, so you need to set the two-column schema manually. Click the Edit Schema button.

When prompted, click OK to accept the propagation, so that the tFileOutputExcel component is automatically set with the input schema. You do not need to set the schema again. Set the output file to receive the input content without changes. If the file does not already exist, it is created.


In this example, the Sheet name is Email and the Include Header box is selected. Then select the tJava component to set the Java command to execute.

In the Code area, type in the following command:
String var = "Nb of line processed: ";
var = var + globalMap.get("tFileInputDelimited_1_NB_LINE");
System.out.println(var);
In this use case, we use the NB_LINE variable. To access the global variable list, press Ctrl + Space bar on your keyboard and select the relevant global parameter.
Save your Job and press F6 to execute it.


The content is passed on to the defined Excel file, and the number of lines processed is displayed on the Run console.
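
To try the command from this scenario outside the Studio, the following minimal sketch replaces globalMap with a plain java.util.HashMap. The class name NbLineSketch, the method buildMessage and the stored value 3 are assumptions made for illustration. In a real Job, globalMap is supplied by the generated code and the NB_LINE entry is filled in by tFileInputDelimited.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a HashMap stands in for the globalMap provided by the Studio.
class NbLineSketch {

    /** Builds the same message as the tJava command in the scenario. */
    static String buildMessage(Map<String, Object> globalMap) {
        String message = "Nb of line processed: ";
        message = message + globalMap.get("tFileInputDelimited_1_NB_LINE");
        return message;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical value; the input component sets it in a real Job.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 3);
        System.out.println(buildMessage(globalMap));
    }
}
```

The key follows the pattern used in the scenario: the unique name of the component, followed by an underscore and the name of the variable.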


tJavaFlex
tJavaFlex properties
Component family: Custom Code
Function: tJavaFlex enables you to enter personalized code in order to integrate it in a Talend program. With tJavaFlex, you can enter the three Java code parts (start, main and end) that constitute a kind of component dedicated to a desired operation.
Objective: tJavaFlex makes it possible to extend the functionalities of a Talend Job using Java commands.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Data Auto Propagate: Select this check box to automatically propagate the data to the component that follows. When you select this check box, you cannot later do any transformation on the retrieved data by setting Java commands in the Main code field.
Start code: Enter the Java code that will be called during the initialization phase.
Main code: Enter the Java code to be applied for each line in the data flow.
End code: Enter the Java code that will be called during the closing phase.

Advanced settings
Import: Enter the Java code that helps to import, if necessary, external libraries used in the Main code box of the Basic settings view.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: You can use this component as a start, intermediate or output component. You can also use it as a one-component subjob.


Limitation: You should know the Java language.

Scenario 1: Generating data flow


This scenario describes a two-component Job that generates a three-line data flow describing different personal titles (Miss, Mrs, and Mr) and displays them on the console.

Drop tJavaFlex and tLogRow from the Palette onto the design workspace. Connect the components together using a Row Main link. Double-click tJavaFlex to display its Basic settings view and define its properties.

From the Schema Type list, select Built-in and then click the three-dot button next to Edit schema to open the corresponding dialog box where you can define the data structure to pass to the component that follows.


Click the plus button to add two columns, key and value, and then set their types to Integer and String respectively.
Click OK to validate your changes and close the dialog box.
In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to automatically propagate data to the component that follows. In this example, we do not want to do any transformation on the retrieved data.
In the Start code field, enter the code to be executed in the initialization phase. In this example, the code indicates the initialization of tJavaFlex by displaying the START message and sets up the loop and the variables to be used afterwards in the Java code:
System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};
for (int i=0;i<valueArray.length;i++) {

In the Main code field, enter the code you want to apply on each of the data rows. In this example, we want to display each key with its value:
row1.key = i;
row1.value = valueArray[i];
In the Main code, row1 corresponds to the name of the link that comes out of tJavaFlex. If you rename this link, you have to modify the code of this field accordingly.

In the End code field, enter the code that will be executed in the closing phase. In this example, the brace (curly bracket) closes the loop and the code indicates the end of the execution of tJavaFlex by displaying the END message:
}
System.out.println("#\n## END");

If needed, double-click tLogRow and in its Basic settings view, click the button next to Edit schema to make sure that the schema has been correctly propagated. Save your Job and press F6 to execute it.


The three personal titles are displayed on the console along with their corresponding keys.
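
The way the Start, Main and End code of this scenario fit together can be sketched as a single loop. The following is only an approximation of what runs in the Job, not the code that the Studio generates. The class JavaFlexGenerateSketch, the Row structure and the list that stands in for tLogRow are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how the three code fields combine into one loop.
class JavaFlexGenerateSketch {

    /** Hypothetical stand-in for the row1 structure (columns key and value). */
    static final class Row {
        Integer key;
        String value;
    }

    /** Runs the assembled code and returns what a tLogRow-like consumer would receive. */
    static List<String> run() {
        List<String> received = new ArrayList<>();

        // ---- Start code ----
        System.out.println("## START\n#");
        String[] valueArray = {"Miss", "Mrs", "Mr"};
        for (int i = 0; i < valueArray.length; i++) {
            Row row1 = new Row(); // assumption: one row object per iteration

            // ---- Main code ----
            row1.key = i;
            row1.value = valueArray[i];

            // Stand-in for the row being passed on to tLogRow.
            received.add(row1.key + "|" + row1.value);
            System.out.println(row1.key + "|" + row1.value);

        // ---- End code ----
        }
        System.out.println("#\n## END");
        return received;
    }

    public static void main(String[] args) {
        run();
    }
}
```

The sketch makes visible why the closing brace belongs in the End code field: the loop opened in the Start code wraps the Main code, so each iteration produces one row.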

Scenario 2: Processing rows of data with tJavaFlex


This scenario describes a two-component Job that generates random data and then collects that data and does some transformation on it line by line using Java code through the tJavaFlex component.

Drop tRowGenerator and tJavaFlex from the Palette onto the design workspace. Connect the components together using a Row Main link. Double-click tRowGenerator to display its Basic settings view and the [RowGenerator Editor] dialog box where you can define the component properties.

Click the plus button to add four columns: number, txt, date and flag. Define the schema and set the parameters to the four columns according to the above capture. In the Functions column, select the three-dot function [...] for each of the defined columns.


In the Parameters column, enter 10 different parameters for each of the defined columns. These 10 parameters correspond to the data that will be randomly generated when executing tRowGenerator. Click OK to validate your changes and close the editor. Double-click tJavaFlex to display its Basic settings view and define the component properties.

Click Sync columns to retrieve the schema from the preceding component.
In the Start code field, enter the code to be executed in the initialization phase. In this example, the code indicates the initialization of the tJavaFlex component by displaying the START message and defining the variable to be used afterwards in the Java code:
System.out.println("## START\n#");
int i = 0;
In the Main code field, enter the code to be applied on each line of data. In this example, we want to show the number of each line starting from 0, then the number and the random text transformed to upper case, and finally the random date set in the editor of tRowGenerator. Then, we create a condition to show if the status is true or false and we increment the number of the line:
System.out.print(" row" + i + ":");
System.out.print("# number:" + row1.number);
System.out.print (" | txt:" + row1.txt.toUpperCase());
System.out.print(" | date:" + row1.date);
if(row1.flag) System.out.println(" | flag: true");
else System.out.println(" | flag: false");
i++;

In the Main code field, row1 corresponds to the name of the link that connects tRowGenerator to tJavaFlex. If you rename this link, you have to modify the code.

In the End code field, enter the code that will be executed in the closing phase. In this example, the code indicates the end of the execution of tJavaFlex by displaying the END message: System.out.println("#\n## END");

Save your Job and press F6 to execute it.

The console displays the randomly generated data that was modified by the java command set through tJavaFlex.
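The way the Start code, Main code and End code fields fit together can be sketched as a standalone Java program. This is only an illustration: the Row class, the sample values and the explicit for loop are assumptions standing in for the code that the Studio generates around the three fields, and the dates are plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class JavaFlexSketch {

    // Hypothetical stand-in for the row1 structure produced by tRowGenerator.
    static class Row {
        final int number;
        final String txt;
        final String date;
        final boolean flag;

        Row(int number, String txt, String date, boolean flag) {
            this.number = number;
            this.txt = txt;
            this.date = date;
            this.flag = flag;
        }
    }

    // Builds the line that the Main code of the scenario prints for one row.
    static String formatRow(int i, Row row1) {
        StringBuilder sb = new StringBuilder();
        sb.append(" row").append(i).append(":");
        sb.append("# number:").append(row1.number);
        sb.append(" | txt:").append(row1.txt.toUpperCase());
        sb.append(" | date:").append(row1.date);
        sb.append(row1.flag ? " | flag: true" : " | flag: false");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample rows (assumed values, not the output of a real Job).
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(7, "abc", "2011-01-01", true));
        rows.add(new Row(3, "xyz", "2011-02-15", false));

        // Start code: executed once, before the first row.
        System.out.println("## START\n#");
        int i = 0;

        for (Row row1 : rows) {
            // Main code: executed once per row.
            System.out.println(formatRow(i, row1));
            i++;
        }

        // End code: executed once, after the last row.
        System.out.println("#\n## END");
    }
}
```

The counter i is declared in the Start code precisely because it has to keep its value from one row to the next; a variable declared in the Main code would be reset on every row.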

tJavaRow

tJavaRow Properties

Component family: Custom Code

Function: tJavaRow allows you to enter customized code which you can integrate in a Talend program. With tJavaRow, you can enter the Java code to be applied to each row of the flow.

Purpose: tJavaRow allows you to broaden the functionality of Talend Jobs, using the Java language.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Code: Enter the Java code to be applied to each line of the data flow.

Advanced settings
Import: Enter the Java code required to import, if required, the external library used in the Main code field of the Basic settings tab.
tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component is used as an intermediary between two other components. It must be linked to both an input and an output component.

Limitation: Knowledge of Java language is necessary.

Related scenario
No scenario is available for this component yet.
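
Although no scenario is provided, the kind of per-row code entered in the Code field can be sketched as follows. The sketch assumes that the incoming and outgoing rows are exposed as input_row and output_row; the InRow and OutRow classes, the column names and the process method are hypothetical scaffolding added only so that the example runs on its own. In a real Job, only the two assignment lines would be typed into the component.

```java
public class JavaRowSketch {

    // Hypothetical row structures; in a Job they are derived from the schemas.
    static class InRow {
        String name;
        int qty;
    }

    static class OutRow {
        String name;
        int qty;
    }

    // The body of this method is what would be applied to each row of the flow.
    static void process(InRow input_row, OutRow output_row) {
        output_row.name = input_row.name == null ? null : input_row.name.trim().toUpperCase();
        output_row.qty = input_row.qty * 2;
    }

    public static void main(String[] args) {
        InRow in = new InRow();
        in.name = " widget ";
        in.qty = 21;

        OutRow out = new OutRow();
        process(in, out);

        System.out.println(out.name + ";" + out.qty);
    }
}
```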

tLibraryLoad

tLibraryLoad Properties

Component family: Custom Code

Function: tLibraryLoad allows you to import a library.

Purpose: tLibraryLoad allows you to load usable Java libraries in a Job.

Basic settings
Library: Select the library you want to import from the list, or click on the [...] button to browse to the library in your directory.

Advanced settings
Dynamic Libs: Lib Paths: Enter the access path to your library, between double quotation marks.
Import: Enter the Java code required to import, if required, the external library used in the Main code field of the Basic settings tab.
tStatCatcher Statistics: Select this check box to collect the log data at component level.

Usage: This component may be used alone, although it is more logical to use it as part of a Job.

Limitation: n/a

Scenario: Checking the format of an e-mail address


This scenario uses two components, a tLibraryLoad and a tJava. The goal of this scenario is to check the format of an e-mail address and verify whether the format is valid or not.

In the Palette, open the Custom_Code folder, and slide a tLibraryLoad and tJava component onto the workspace. Connect tLibraryLoad to tJava using a Trigger > OnSubjobOk link.
Double click on tLibraryLoad to display its Basic settings. From the Library list, select jakarta-oro-2.0.8.jar. In the Import field of the Advanced settings tab, type import org.apache.oro.text.regex.*;

Double-click tJava to display its Component view. In the Basic settings tab, enter your code, as in the screenshot below. The code checks whether a character string has the format of an e-mail address, based on the regular expression: "^[\\w_.-]+@[\\w_.-]+\\.[\\w]+$".

Press F6 to save and run the Job.

The Console displays the boolean false. Hence, the e-mail address is not valid as the format is incorrect.
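
The check performed in this scenario can be approximated with the sketch below. It is an assumption-laden illustration rather than the code of the Job: it relies on the standard java.util.regex package instead of the Jakarta ORO classes loaded by tLibraryLoad, and the two sample addresses are hypothetical. Only the regular expression is taken from the scenario.

```java
import java.util.regex.Pattern;

public class EmailFormatCheck {

    // Regular expression quoted in the scenario.
    static final Pattern EMAIL = Pattern.compile("^[\\w_.-]+@[\\w_.-]+\\.[\\w]+$");

    // Returns true when the whole string has the expected e-mail format.
    static boolean isValidFormat(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(isValidFormat("jdoe@example.com"));
        System.out.println(isValidFormat("jdoe@example"));
    }
}
```

The second address has no dot after the @ sign, so it does not satisfy the final part of the expression and is reported as invalid.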

tSetGlobalVar
tSetGlobalVar Properties

Component family: Custom Code

Function: tSetGlobalVar allows you to define and set global variables in GUI.

Purpose: tSetGlobalVar facilitates the process of defining global variables.

Basic settings
Variables: This table contains two columns.
Key: Name of the variable to be called in the code.
Value: Value assigned to this variable.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is generally used as a one-component subjob.

Limitation: Knowledge of Java language is required.

Scenario: Printing out the content of a global variable


This scenario is a simple Job that prints out the value of a global variable defined in the tSetGlobalVar component.

Drop the following components from the Palette onto the design workspace: tSetGlobalVar and tJava. Connect the tSetGlobalVar component to the tJava component using a Trigger > OnSubjobOk connection. Double-click the tSetGlobalVar component to display its Basic settings view.

Click the plus button to add a line in the Variables table, and fill the Key and Value fields with K1 and 20 respectively. Then double-click the tJava component to display its Basic settings view.

In the Code area, type in the following lines:

    String foo = "bar";
    String K1;
    String Result = "The value is:";
    Result = Result + globalMap.get("K1");
    System.out.println(Result);

In this use case, we use the Result variable. To access the global variable list, press Ctrl + Space bar on your keyboard and select the relevant global parameter. Save your Job and press F6 to execute it.

The content of global variable K1 is displayed on the console.
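
Outside the Studio, the same read access can be reproduced with an ordinary map. The sketch below assumes that globalMap behaves like a java.util.Map keyed by variable name; the describe method and the way the map is filled are hypothetical and only mimic what tSetGlobalVar does with the Key K1 and the Value 20.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarSketch {

    // Same logic as the tJava code of the scenario, wrapped in a method.
    static String describe(Map<String, Object> globalMap) {
        String result = "The value is:";
        result = result + globalMap.get("K1");
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the map that tSetGlobalVar fills before tJava runs.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("K1", 20);

        System.out.println(describe(globalMap));
    }
}
```

If the key has not been set, the map returns null and the text "null" is appended to the message, which is a quick way to spot a misspelled key.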


Data Quality components


This chapter details the main components that you can find in the Data Quality family of the Talend Open Studio Palette. The Data Quality family comprises dedicated components that help you improve the quality of your data. These components cover various needs such as filtering unique rows, calculating CRC, finding data based on fuzzy matching, and so on.

tAddCRCRow
tAddCRCRow properties

Component family: Data Quality

Function: tAddCRCRow calculates a surrogate key based on one or several columns and adds it to the defined schema.

Purpose: Providing a unique ID helps improve the quality of processed data.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In this component, a new CRC column is automatically added.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Implication: Select the check box facing the relevant columns to be used for the surrogate key checksum.

Advanced settings
CRC type: Select a CRC type in the list. The longer the CRC, the less overlap you will have.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is an intermediary step. It requires an input flow as well as an output.

Limitation: n/a

Scenario: Adding a surrogate key to a file


This scenario describes a Job adding a surrogate key to a delimited file schema.

Drop the following components: tFileInputDelimited, tAddCRCRow and tLogRow. Connect them using a Main row connection.

In the tFileInputDelimited Component view, set the File Name path and all related properties in case these are not stored in the Repository.

Create the schema through the Edit Schema button, in case the schema is not already stored in the Repository. In Java, pay attention to the data type column; if a Date pattern has to be filled in, see http://java.sun.com/j2se/1.5.0/docs/api/index.html. In the tAddCRCRow Component view, select the check boxes of the input flow columns to be used to calculate the CRC.

Notice that a CRC column (read-only) has been added at the end of the schema. Select CRC32 as CRC Type to get a longer surrogate key.

In the Basic settings view of tLogRow, select the Print values in cells of a table option to display the output data in a table on the Console. Then save your Job and press F6 to execute it.

An additional CRC column, calculated on all previously selected columns (in this case all columns of the schema), has been added to the schema.
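
The idea of deriving a key from a checksum over the selected columns can be sketched with the standard java.util.zip.CRC32 class. How tAddCRCRow actually serializes and combines the column values is not described in this guide, so the plain concatenation of UTF-8 strings used here, as well as the crcOf method name and the sample values, are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcRowSketch {

    // Computes a CRC32 over the selected column values, taken in order.
    static long crcOf(String... columns) {
        CRC32 crc = new CRC32();
        for (String column : columns) {
            // String.valueOf turns a null column into the text "null".
            crc.update(String.valueOf(column).getBytes(StandardCharsets.UTF_8));
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical row with three selected columns.
        long key = crcOf("1", "Smith", "Boston");
        System.out.println(key);
    }
}
```

Because the values are simply concatenated, the rows ("ab", "c") and ("a", "bc") would receive the same checksum. Selecting more columns, or inserting a separator between them, reduces such overlaps but cannot rule them out entirely.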

tChangeFileEncoding
tChangeFileEncoding component belongs to two component families: Data Quality and File. For more information about tChangeFileEncoding, see tChangeFileEncoding on page 1031.

tExtractRegexFields
tExtractRegexFields belongs to two component families: Data Quality and Processing. For more information on tExtractRegexFields, see tExtractRegexFields on page 1420.

tFuzzyMatch
tFuzzyMatch properties

Component family: Data Quality

Function: Compares a column from the main flow with a reference column from the lookup flow and outputs the main flow data displaying the distance.

Purpose: Helps ensure the data quality of any source data against a reference data source.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Two read-only columns, Value and Match, are added to the output schema automatically.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Matching type: Select the relevant matching algorithm among:
  Levenshtein: Based on the edit distance theory. It calculates the number of insertions, deletions or substitutions required for an entry to match the reference entry.
  Metaphone: Based on a phonetic algorithm for indexing entries by their pronunciation. It first loads the phonetics of all entries of the lookup reference and checks all entries of the main flow against the entries of the reference flow.
  Double Metaphone: A new version of the Metaphone phonetic algorithm that produces more accurate results than the original algorithm. It can return both a primary and a secondary code for a string. This accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry.
Min distance: (Levenshtein only) Set the minimum number of changes allowed to match the reference. If set to 0, only perfect matches are returned.
Max distance: (Levenshtein only) Set the maximum number of changes allowed to match the reference.
Matching column: Select the column of the main flow that needs to be checked against the reference (lookup) key column.
Unique matching: Select this check box if you want to get the best match possible, in case several matches are available.
Matching item separator: In case several matches are available, all of them are displayed unless the unique match box is selected. Define the delimiter between all matches.

Usage: This component is not startable (green background) and it requires two input components and an output component.

Limitation/prerequisite: Perl users: Make sure the relevant packages are installed. Check the Module view for modules to be installed.

Scenario 1: Levenshtein distance of 0 in first names


This scenario describes a four-component Job that checks the edit distance between the First Name column of an input file and the data of a reference input file. The output of this Levenshtein type check is displayed along with the content of the main flow in a table.

Drag and drop the following components from the Palette to the design workspace: tFileInputDelimited (x2), tFuzzyMatch, tFileOutputDelimited. Define the first tFileInputDelimited Basic settings. Browse the system to the input file to be analyzed and most importantly set the schema to be used for the flow to be checked. In the schema, set the Type of data in the Java version, especially if you are in Built-in mode. Link the defined input to the tFuzzyMatch using a Main row link. Define the second tFileInputDelimited component the same way.
Make sure the reference column is set as key column in the schema of the lookup flow.

Then connect the second input component to the tFuzzyMatch using a main row (which displays as a Lookup row on the design workspace). Select the tFuzzyMatch Basic settings. The Schema should match the Main input flow schema in order for the main flow to be checked against the reference.

Note that two columns, Value and Matching, are added to the output schema. These are standard matching information and are read-only. Select the method to be used to check the incoming data. In this scenario, Levenshtein is the Matching type to be used. Then set the distance. In this method, the distance is the number of character changes (insertions, deletions or substitutions) that need to be carried out in order for the entry to fully match the reference.
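
The edit distance described above can be computed with the classic dynamic-programming algorithm. The sketch below is a generic textbook implementation, not the code used by tFuzzyMatch; the withinRange helper and its min and max parameters are hypothetical names that mirror the Min distance and Max distance settings.

```java
public class LevenshteinSketch {

    // Number of insertions, deletions or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the distance lies between min and max, both inclusive.
    static boolean withinRange(String entry, String reference, int min, int max) {
        int d = distance(entry, reference);
        return d >= min && d <= max;
    }

    public static void main(String[] args) {
        System.out.println(distance("Jon", "John"));
        System.out.println(withinRange("John", "John", 0, 0));
        System.out.println(withinRange("Jon", "John", 1, 2));
    }
}
```

With both bounds at 0, only identical strings pass the test; raising the minimum to 1 excludes them and keeps only the near matches.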

In this use case, set both the min and the max distance to 0, so that only exact matches are output. Also, clear the Case sensitive check box. Then select the column of the main flow schema that will be checked, in this example the first name. There is no need to select the Unique matching check box, nor therefore to define the separator. Link the tFuzzyMatch to the standard output tLogRow. No parameter other than the display delimiter needs to be set for this scenario.
Save the Job and press F6 to execute the Job.

As the edit distance has been set to 0 (min and max), the output shows the result of a regular join between the main flow and the lookup (reference) flow, hence only full matches with Value of 0 are displayed. A more obvious example is with a minimum distance of 1 and a max. distance of 2, see Scenario 2: Levenshtein distance of 1 or 2 in first names on page 288.

Scenario 2: Levenshtein distance of 1 or 2 in first names


This scenario is based on the scenario 1 described above. Only the min and max distance settings in tFuzzyMatch component get modified, which will change the output displayed. In the Component view of the tFuzzyMatch, change the min distance from 0 to 1. This excludes straight away the exact matches (which would show a distance of 0). Change also the max distance to 2 as the max distance cannot be lower than the min distance. The output will provide all matching entries showing a discrepancy of 2 characters at most.

No other change of the setting is required. Make sure the Matching item separator is defined, as several references might be matching the main flow entry. Save the new Job and press F6 to run it.

As the max edit distance has been set to 2, some entries of the main flow match several reference entries. You can also use another method, Metaphone, to assess the distance between the main flow and the reference.

Scenario 3: Metaphonic distance in first name


This scenario is based on the scenario 1 described above.

Change the Matching type to Metaphone. There is no min nor max distance to set as the matching method is based on the discrepancies with the phonetics of the reference. Save the Job and press F6. The phonetics value is displayed along with the possible matches.

tIntervalMatch
tIntervalMatch properties

Component family: Data Quality

Function: tIntervalMatch receives a main flow and aggregates it based on a join to a lookup flow (Java) or a given lookup file (Perl). Then it matches a specified value to a range of values and returns related information.

Purpose: Helps to return a value based on a Join relation.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Java only
Search column: Select the main flow column containing the values to be matched to a range of values.
Column (LOOKUP): Select the lookup flow column containing the values to be returned when the Join is ok.
Lookup Column min / bounds strictly (min): Select the column containing the min value of the range. Select the check box if the boundary is strict.
Lookup Column max / bounds strictly (max): Select the column containing the max value of the range. Select the check box if the boundary is strict.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component handles a flow of data, therefore it requires an input and an output, and hence is defined as an intermediary step.

Limitation: n/a

The Perl properties being quite different from the Java properties, they are described in a separate table below.

Perl basic settings

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Perl only
File Name: Enter the file that contains the range of values. It functions as a lookup flow.
Field separator: Character, string or regular expression to separate fields in the lookup file.
Row separator: String (ex: \n on Unix) to distinguish rows in the lookup file.
Lookup index Column: Position of the min column in the lookup file: 0 for first col, 1 for second col, etc. Make sure the interval min and max columns are adjacent.
Search column: Select the main flow column containing the values to be matched to a range of values.

Usage: This component handles a flow of data, therefore it requires an input and an output, and hence is defined as an intermediary step.

Limitation: For the time being, the Perl version of tIntervalMatch does not accept a real Lookup flow (but only a reference file in the actual component settings).

Scenario: Identifying IP country (Perl and Java)


The following scenario describes a Job designed in parallel in both languages, Perl and Java. In this Job, an incoming main flow provides two columns, Document and IP, filled with dummy values. A second file, used as a lookup flow in Java and as a reference range file in Perl, contains a list of sorted IP ranges and their corresponding countries. This Job aims at retrieving each document's country from its IP value, in other words, creating a Join between the main flow and the lookup flow.

In Perl, the Job requires one tFileInputDelimited, a tIntervalMatch and a tLogRow.


In Java, the Job requires one extra tFileInputDelimited, a tIntervalMatch and a tLogRow. Drop the components onto the design workspace. Set the basic settings of the tFileInputDelimited component.

The schema is made of two columns, Document and IP. (Java only) Set the Type column to String for the Document column and Integer for the IP column. (Java only) Now set the properties of the second tFileInputDelimited.

(Java only) Don't forget to define the Type of data. (Both Java and Perl) Propagate the schema from the incoming main flow to the tIntervalMatch component.

(Both Java and Perl) Note that the output schema of the tIntervalMatch component is read-only and is made of the input schema plus an extra Lookup column which will output the requested lookup data. Set the other properties of the tIntervalMatch component. (Perl only) The lookup file is defined directly in the settings of the tIntervalMatch.

(Perl only) In the File Name field, set the path to the lookup file. Set the Row and Field separators of the lookup file. (Perl only) In the Lookup column index field, set the lower bound of the data range. This corresponds to the position of the column containing the min value of the range (0 for the first column). (Perl only) There is no need to set the lookup values to be returned, as all values from the lookup will be output. (Java only) Set the other properties of tIntervalMatch, such as the min and max columns corresponding to the range bounds.

(Java only) In the Column Lookup field, select the column containing the values to be returned. (Both Java and Perl) In the Search column field, select the main flow column containing the values to be matched to the range values.

(Both Java and Perl) The tLogRow component does not require any specific setting for this Job. Both Perl and Java Jobs output the same result with slight differences in the way they display.

The Perl results include the range values whereas the Java output only includes the requested return value (country).
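
The matching of a value against a list of ranges, with optionally strict bounds, can be sketched as follows. This is a simplified illustration under stated assumptions: the Range class, the lookup method, the linear scan and the sample figures are all hypothetical and do not reproduce the component's generated code.

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalMatchSketch {

    // Hypothetical lookup row: a range of values and the data to return.
    static class Range {
        final int min;
        final int max;
        final String country;

        Range(int min, int max, String country) {
            this.min = min;
            this.max = max;
            this.country = country;
        }
    }

    // Returns the country of the first range containing the value, or null.
    static String lookup(int value, List<Range> ranges, boolean strictMin, boolean strictMax) {
        for (Range r : ranges) {
            boolean aboveMin = strictMin ? value > r.min : value >= r.min;
            boolean belowMax = strictMax ? value < r.max : value <= r.max;
            if (aboveMin && belowMax) {
                return r.country;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Dummy ranges, in the spirit of the dummy IP values of the scenario.
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(100, 199, "France"));
        ranges.add(new Range(200, 299, "Germany"));

        System.out.println(lookup(150, ranges, false, false));
        System.out.println(lookup(200, ranges, true, false));
    }
}
```

With a strict lower bound, the value 200 no longer belongs to the range starting at 200, so the second call finds no match and returns null.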

tParseAddress
tParseAddress properties

Component family: Data Quality

Function: This component analyses addresses in a defined schema column and parses them into different field types.

Purpose: Cut data in several columns to sort the different parts, in order to improve your data quality.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Column to parse: Select the column in which you want to analyze and parse data.
Country: Select the customer's country.

Advanced settings
Correct case: Select this check box in order to correct the case, if required. For example, to put a capital letter at the beginning of a name.
Auto clean: Select this check box to delete the non-alphanumerical characters before trying to parse addresses.
Abbreviate subcountry: Select this check box to automatically abbreviate region names. If the name is already abbreviated, it will not be modified.
Allow only abbreviated subcountry: Select this check box to authorize only abbreviated region names. Data processing will be faster and will check that the address matches postal standards.
tStatCatcher Statistics: Select this check box to collect log data for the job as a whole, as well as for each component.

Usage: This component acts as an intermediary. It requires an input and an output flow.

Limitation: n/a

Related scenario
No scenario is available for this component yet.

tParseName
tParseName Properties

Component family: Data Quality

Function: This component retrieves names presented in different forms and extracts each item according to its type.

Purpose: The tParseName analyses a file containing names and extracts items according to their type, in order to improve the quality of the processed data. It also rejects the incorrect items.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Column to parse: Select the column from which you want to extract the different items.

Advanced settings
Ignore joint names (Mr John Smith and Ms Mary Jones -> Mr John Smith): Select this check box to ignore data containing joint names. This will make the data processing faster. Deselect it in order to return and cut this data by type.
Allow reversed (Smith, Mr AB -> Mr AB Smith): Select this check box so that names will be processed regardless of the form they are presented in. The component puts the names in order and performs the normal analysis. Note that if the name can be analyzed, its original order is not saved as a property.
Extend titles: Select this check box to use all possible titles. If you deselect it, you can only use standard titles.
Correct case: Select this check box in order to correct the case, that is to say, if needed, to put a capital letter at the beginning of a name.
Auto clean: Select this check box to delete the non-alphanumerical characters before trying to parse addresses.
tStatCatcher Statistics: Select this check box to collect log data for the job as a whole, as well as for each component.

Usage: This component acts as an intermediary. It requires an input and an output flow.

Limitation: n/a

Related scenario
No scenario is available for this component yet.

tReplaceList
tReplaceList Properties

Component family: Data Quality

Function: Carries out a Search and Replace operation in the input columns defined based on an external lookup.

Purpose: Helps to cleanse all files before further processing.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Two read-only columns, Value and Match, are added to the output schema automatically.
The data Type defined in the schemas must be consistent, i.e., an integer can only be replaced by another integer using an integer as a lookup field. Values of one type cannot be replaced by values of another type.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Lookup search column: Type in the position number of the column to be searched in the lookup schema.
  0: first column read
  1: second column read
  n: position number of the column in the schema read.
In order to ensure the uniqueness of values being searched, make sure this column is marked as Key in your lookup schema.
Lookup replacement column: Type in the position number of the column where the replacement values are stored.
  0: first column read
  1: second column read
  n: position number of the column in the schema read.
Column options: Select the columns of the main flow where the replacement is to be carried out.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: tReplaceList is an intermediary component. It requires an input flow and an output component.

Scenario: Replacement from a reference file


The following Job searches and replaces a list of countries with their corresponding codes. The relevant codes are taken from a reference file placed as lookup flow in the Job. The main flow is replicated and both outputs are displayed on the console, in order to show the main flow before and after replacement.

Drop the following components from the Palette to the design workspace: tMysqlInput, tFileInputDelimited, tReplicate, tReplaceList and tLogRow (x2). Note that if your input schemas are stored in the Repository, you can simply drag and drop the relevant node from the Repository's Metadata Manager onto the design workspace to automatically retrieve the input component settings. For more information, see How to drop components from the Metadata node in the Talend Open Studio User Guide. Connect the components using Main Row connections via a right-click on each component. Notice that the main row coming from the reference flow (tFileInputDelimited) is called a lookup row. Select the tMysqlInput component and set the input flow parameters.

The input schema is made of two columns: Names and States. The States column contains the names of states of the United States of America, which are to be replaced by their respective codes. In the Query field, make sure the States column is included in the Select statement. In this use case, all columns are selected. Check the tReplicate component settings. The schema is simply duplicated into two identical flows, and no change can be made to it. Then double-click the tFileInputDelimited component to set the reference file.

The file includes two columns, Postal and State, where Postal provides the postal code corresponding to the name given in the same row of the State column. The fields are delimited by semicolons and rows are separated by carriage returns. Edit the lookup flow schema.


Make sure the lookup search column (in this use case: State) is a key, in order to ensure the uniqueness of the values being searched. Select the tReplaceList and set the operation to carry out. The schema is retrieved from the previous component of the main flow.

In the Lookup search index field, type in the position index of the column being searched. Positions are counted from 0: in this use case, State is the second column of the lookup input file, therefore type in 1 in this field. In the Lookup replacement index field, type in the position index of the column containing the replacement values. In this example, this is Postal, the first column of the lookup file, which holds the state codes, therefore type in 0. In the Column options table, select the States column since, in this use case, the state names are to be replaced with their corresponding codes. In both tLogRow components, select the Print values in table cells check box for better readability of the outputs. Save the Job and press F6 to execute it.


The first flow output shows the States column with full state names as it comes from the main input flow. The second flow output shows the States column after the state names have been replaced with their respective codes.
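The replacement performed in this scenario can be sketched in plain Java. This is only an illustration under stated assumptions, not the code that Talend Open Studio generates: the class name ReplaceListSketch, the method names and the sample rows are hypothetical, and the sketch assumes that a field is replaced only when its whole value matches a lookup entry. Only the lookup layout (Postal;State, search index 1, replacement index 0) comes from the scenario.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplaceListSketch {

    // Builds a "search value -> replacement value" map from semicolon-delimited lookup lines.
    static Map<String, String> buildLookup(List<String> lines, int searchIndex, int replacementIndex) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(";");
            lookup.put(fields[searchIndex], fields[replacementIndex]);
        }
        return lookup;
    }

    // Returns copies of the rows in which the given column is replaced when a lookup match exists.
    // Rows without a match are passed through unchanged (assumption of this sketch).
    static List<String[]> replaceColumn(List<String[]> rows, int column, Map<String, String> lookup) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            String[] copy = row.clone();
            copy[column] = lookup.getOrDefault(copy[column], copy[column]);
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical lookup content: Postal;State
        Map<String, String> lookup = buildLookup(Arrays.asList("AL;Alabama", "TX;Texas"), 1, 0);
        // Hypothetical main flow: Names, States
        List<String[]> rows = Arrays.asList(
                new String[] {"Smith", "Texas"},
                new String[] {"Jones", "Ohio"});
        for (String[] row : replaceColumn(rows, 1, lookup)) {
            System.out.println(String.join("|", row));
        }
    }
}
```

Making the search column a key, as recommended above, corresponds here to the map holding exactly one replacement per searched value.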


tSchemaComplianceCheck
tSchemaComplianceCheck Properties

Component family: Data Quality

Function: Validates all input rows against a reference schema or checks type, nullability, length of rows against reference values. The validation can be carried out in full or partly.

Purpose: Helps to ensure the data quality of any source data against a reference data source.

Basic settings

Base Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Describe the structure and nature of your data to be processed as it is.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Date language / Date format (Perl only): For the validation of date formats containing string such as 25 Dec 2007, use the Date Language field and to distinguish the way months and days are ordered, use the Date format field.

Check all columns from schema: Select this option to carry out all checks on all columns against the base schema. In Perl, this is a check box -- when selected, the Columns to check table is hidden.

Custom defined (Java only): Select this option to carry out particular checks on particular columns. When this option is selected, the Checked Columns table and the Trim the excess content of column when length checking chosen and the length is greater than defined length check box show.

Checked Columns (in Perl: Columns to check): In this table, define what checks are to be carried out on which columns.
Column: Displays the columns names.
Type: In Perl, select the check box in an individual column to verify the data type of the column against the base schema definition. To carry out this verification on all the columns, select the check box in the table header. In Java, select the type of data each column is supposed to contain. This validation is mandatory for all columns.
Date pattern (Java only): Define the expected date format for each column with the data type of Date.
Nullable (Java only): Select the check box in an individual column to define the column to be nullable, that is, to allow rows with this column empty to go to the output flow regardless of the base schema definition. To define all columns to be nullable, select the check box in the table header.
Undefined or empty (Perl only): Select the check box in an individual column to reject rows with this column empty while the column is not nullable in the base schema definition. To carry out this verification on all the columns, select the check box in the table header.
Max length: Select the check box in an individual column to verify the data length of the column against the length definition of the base schema. To carry out this verification on all the columns, select the check box in the table header.

Trim the excess content of column when length checking chosen and the length is greater than defined length (Java only): Select this check box to remove the part in excess of the defined length from the valid output flow instead of rejecting the row if the length check option is selected.

Use another schema for compliance check (Java only): Define a reference schema as you expect the data to be, in order to reject the non-compliant data. It can be restrictive on data type, null values, and/or length.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Use Fastest Date Check (Java only): Select this check box to perform a fast date format check using the TalendDate.isDate() method of the TalendDate system routine if Date pattern is not defined. For more information about routines, see Managing routines in Talend Open Studio User Guide.

Treat all empty string as NULL (Java only): Select this check box to treat any empty fields in any columns as null values, instead of empty strings. By default, this check box is selected. When it is cleared, the Choose Column(s) table shows to let you select individual columns.

Usage: This component is an intermediary step in the flow allowing to exclude from the main flow the non-compliant data. This component cannot be a start component as it requires an input flow. It also requires at least one output component to gather the validated flow, and possibly a second output component for rejected data using Rejects link. For more information, see Row connection in Talend Open Studio User Guide.

Scenario: Validating data against schema (Java)


This very basic scenario shows how to check the type, nullability and length of an incoming flow against a defined reference schema. The incoming flow comes from a simple CSV file that contains heterogeneous data including wrong data type, data exceeding the maximum length, wrong ID and null values in non-nullable columns, as shown below:


Upon validation, the valid rows and the rejected rows are displayed respectively in two tables on the Run console.

Drop the following components: a tFileInputDelimited, a tSchemaComplianceCheck, and two tLogRow components from the Palette to the design workspace. Connect the tFileInputDelimited component to the tSchemaComplianceCheck component using a Row > Main connection. Connect the tSchemaComplianceCheck component to the first tLogRow component using a Row > Main connection. This output flow will gather the valid data. Connect the tSchemaComplianceCheck component to the second tLogRow component using a Row > Rejects connection. This second output flow will gather the non-compliant data. Select the Rejects connection, and notice that the schema passed to the second tLogRow contains two more columns: ErrorCode and ErrorMessage. These two read-only columns provide information about the rejected data to ease error handling and troubleshooting if needed. Double-click the tFileInputDelimited component to display its Basic settings view.


Fill in the File name field by browsing to the input file. Specify the header row. In this use case, the first row of the input file is the header row. Leave the other parameters as they are. Click Edit schema to describe the data structure of the input file. In this use case, the schema is made of five columns: ID, Name, BirthDate, State, and City.

Leave the Type field as permissive as possible (especially in Java). You will define the actual type of the data in the tSchemaComplianceCheck component. Fill the Length field for the Name, State and City columns with 7, 10 and 10 respectively. Click OK to propagate the schema and close the schema dialog box. Double-click the tSchemaComplianceCheck component to display its Basic settings view, wherein you will define most of the validation parameters.


Select the Custom defined option in the Mode area to perform custom defined checks. In this example, we use the Checked columns table to set the validation parameters. However, you can also select the Check all columns from schema check box if you want to perform all the checks (type, nullability and length) on all the columns against the base schema, or select the Use another schema for compliance check option and define a new schema as the expected structure of the data.

In the Checked Columns table, define the checks to be performed. In this use case:
- The type of the ID column should be Int.
- The length of the Name, State and City columns should be checked.
- The type of the BirthDate column should be Date, and the expected date pattern is dd-MM-yyyy.
- All the columns should be checked for null values, so clear the Nullable check box for all the columns.

To send rows containing fields exceeding the defined maximum length to the reject flow, make sure that the Trim the excess content of column when length checking chosen and the length is greater than defined length check box is cleared.

In the Advanced settings view of the tSchemaComplianceCheck component, select the Treat all empty string as NULL option to send any rows containing empty fields to the reject flow. To view the validation result in tables on the Run console, double-click each tLogRow component and select the Table option in the Basic settings view. Save your Job and press F6 to launch it. Two tables are displayed on the console, showing the valid data and rejected data respectively.
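The checks configured in this scenario can be approximated in plain Java as follows. This is a minimal sketch under stated assumptions, not the component's generated code: the class name ComplianceCheckSketch, the method check and the returned messages are hypothetical (they do not reproduce the actual ErrorCode and ErrorMessage values), and the sample rows are invented. The column names, the lengths 7, 10 and 10, the Int type of ID and the dd-MM-yyyy pattern come from the scenario. As in the scenario, over-long values are rejected rather than trimmed, and empty strings are treated as null.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

class ComplianceCheckSketch {

    private static final String[] NAMES = {"ID", "Name", "BirthDate", "State", "City"};
    // Maximum lengths from the scenario; -1 means that no length check applies.
    private static final int[] MAX_LENGTH = {-1, 7, -1, 10, 10};

    // Returns null when the row passes all checks, otherwise a short reason for the rejection.
    static String check(String[] row) {
        if (row.length != NAMES.length) {
            return "wrong number of fields";
        }
        for (int i = 0; i < row.length; i++) {
            // Empty strings are treated as null, and no column is nullable.
            if (row[i] == null || row[i].isEmpty()) {
                return NAMES[i] + ": empty or null";
            }
            if (MAX_LENGTH[i] >= 0 && row[i].length() > MAX_LENGTH[i]) {
                return NAMES[i] + ": exceeds max length";
            }
        }
        try {
            Integer.parseInt(row[0]);
        } catch (NumberFormatException e) {
            return "ID: wrong type";
        }
        SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy");
        format.setLenient(false); // reject impossible dates instead of rolling them over
        try {
            format.parse(row[2]);
        } catch (ParseException e) {
            return "BirthDate: wrong type";
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "John", "25-12-1980", "Texas", "Austin"},
            {"x2", "Ann", "01-02-1975", "Ohio", "Columbus"},
            {"3", "Christopher", "01-02-1975", "Ohio", "Columbus"}
        };
        for (String[] row : rows) {
            String reason = check(row);
            System.out.println(String.join(";", row) + " -> " + (reason == null ? "valid" : "rejected, " + reason));
        }
    }
}
```

Returning the reason alongside the rejected row mirrors the idea of the two extra columns on the Rejects connection: the row itself is kept intact and the diagnostic travels with it.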


tUniqRow
tUniqRow Properties
Component family: Data Quality

Function: Compares entries and sorts out duplicate entries from the input flow.

Purpose: Ensures data quality of input or output flow in a Job.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Unique key: In this area, select one or more columns to carry out deduplication on the particular column(s).
- Select the Key attribute check box to carry out deduplication on all the columns
- Select the Case sensitive check box to differentiate upper case and lower case

Advanced settings

Only once each duplicated key: Select this check box if you want to have only the first duplicated entry in the column(s) defined as key(s) sent to the output flow for duplicates.

Use of disk (suitable for processing large row set): Select this check box to enable generating temporary files on the hard disk when processing a large amount of data. This helps to prevent Job execution failure caused by memory overflow. With this check box selected, you need also to define:
- Buffer size in memory: Select the number of rows that can be buffered in the memory before a temporary file is to be generated on the hard disk.
- Directory for temp files: Set the location where the temporary files should be stored. Make sure that you specify an existing directory for temporary files; otherwise your Job execution will fail.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component handles flow of data therefore it requires input and output, hence is defined as an intermediary step.

Limitation: n/a

Scenario: Deduplicating entries


In this five-component Java Job, we will sort entries on an input name list, find out duplicated names, and display the unique names and the duplicated names on the Run console.

Drop a tFileInputDelimited, a tSortRow, a tUniqRow, and two tLogRow components from the Palette to the design workspace, and name the components as shown above. Connect the tFileInputDelimited component, the tSortRow component, and the tUniqRow component using Row > Main connections. Connect the tUniqRow component and the first tLogRow component using a Main > Uniques connection, and then connect the tUniqRow component and the second tLogRow component using a Main > Duplicates connection. Double-click the tFileInputDelimited component to display its Basic settings view.

Select Built-In from the Property Type list. Click the [...] button next to the File Name field to browse to your input file.


Define the header and footer rows. In this use case, the first row of the input file is the header row. Click Edit schema to define the schema for this component. In this use case, the input file has five columns: Id, FirstName, LastName, Age, and City. Then click OK to propagate the schema and close the schema editor. Double-click the tSortRow component to display its Basic settings view.

To rearrange the entries in the alphabetic order of the names, add two rows in the Criteria table by clicking the plus button, select the FirstName and LastName columns under Schema column, select alpha as the sorting type, and select the sorting order. Double-click the tUniqRow component to display its Basic settings view.

In the Unique key area, select the columns on which you want deduplication to be carried out. In this use case, you will sort out duplicated names. In the Basic settings view of each of the tLogRow components, select the Table option to view the Job execution result in table mode. Save your Job and press F6 to run it. The unique names and duplicated names are displayed in different tables on the Run console.
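The split between the Uniques and Duplicates flows can be sketched in plain Java. This is an illustration under stated assumptions, not the code generated by the Studio: the class UniqRowSketch, its fields and the sample rows are hypothetical. The first row seen for a given key goes to the uniques list, every later row with the same key goes to the duplicates list, and the caseSensitive flag plays the role of the Case sensitive check box. The sketch does not rely on the input being sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class UniqRowSketch {

    final List<String[]> uniques = new ArrayList<>();
    final List<String[]> duplicates = new ArrayList<>();

    // Splits rows into first occurrences and repeats, comparing only the key columns.
    static UniqRowSketch split(List<String[]> rows, int[] keyColumns, boolean caseSensitive) {
        UniqRowSketch result = new UniqRowSketch();
        Set<List<String>> seenKeys = new HashSet<>();
        for (String[] row : rows) {
            List<String> key = new ArrayList<>();
            for (int column : keyColumns) {
                String value = row[column];
                key.add(caseSensitive ? value : value.toLowerCase(Locale.ROOT));
            }
            // Set.add returns false when the key was already present.
            if (seenKeys.add(key)) {
                result.uniques.add(row);
            } else {
                result.duplicates.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows: Id, FirstName, LastName
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "John", "Smith"},
                new String[] {"2", "john", "smith"},
                new String[] {"3", "Anna", "Lee"});
        UniqRowSketch result = split(rows, new int[] {1, 2}, false);
        System.out.println("uniques: " + result.uniques.size()
                + ", duplicates: " + result.duplicates.size());
    }
}
```

Keeping every key in memory is what makes this approach costly on large inputs, which is the situation the Use of disk option is meant to address.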


Database components
This chapter details the main components which you can find in the Databases family of the Talend Open Studio Palette. The Databases family comprises the most popular database connectors. These connectors cover various needs, including: opening connections, reading and writing tables, committing transactions as a whole, as well as performing rollback for error handling. More than 40 RDBMS are supported.


tAccessBulkExec
tAccessBulkExec properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together in a two-step process: first to output data to a delimited file, and then to perform various actions on the file in an Access database. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a separate section. The advantage of using a two-step process is that it makes it possible to carry out transformations on the data before loading it in the database.
Component family: Databases/Access

Function: This component executes an Insert action on the data provided.

Purpose: As a dedicated component, tAccessBulkExec offers gains in performance when carrying out Insert operations in an Access database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and select the appropriate tAccessConnection component from the Component list if you want to re-use connection parameters that you have already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can as well deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive all over through the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB version: Select the version of your database.

Database: Type in the directory where your database is stored.

Username and Password: DB user authentication data.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist already for the insert operation to succeed.

Local filename: Browse to the delimited file to be loaded into your database.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Include header: Select this check box to include the column header.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with tAccessOutputBulk component. Used together, they can offer gains in performance while feeding an Access database.

Related scenarios
For use cases in relation with tAccessBulkExec, see the following scenarios:
- tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
- tMysqlOutputBulkExec Scenario: Inserting data in MySQL database


tAccessCommit
tAccessCommit Properties
This component is closely related to tAccessConnection and tAccessRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family: Databases/Access

Function: Validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tAccessConnection component in the list if more than one connection is planned for the current Job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task. If you want to use a Row > Main connection to link tAccessCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Access components, especially with tAccessConnection and tAccessRollback components.

Limitation: n/a

Related scenario
This component is closely related to tAccessConnection and tAccessRollback. It usually doesn't make much sense to use one of these without using a tAccessConnection component to open a connection for the current transaction. For a tAccessCommit related scenario, see tMysqlConnection on page 594.
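The idea of committing a global transaction in one go can be illustrated with the standard JDBC API. The sketch below is an assumption-laden illustration, not the code behind tAccessConnection, tAccessCommit or tAccessRollback: the class SingleCommitSketch, the method insertAll, and the table and column names in the SQL statement are hypothetical, and the caller is expected to supply an open java.sql.Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class SingleCommitSketch {

    // Inserts all values inside one transaction: a single commit at the end,
    // or a rollback of every insert if any statement fails.
    static void insertAll(Connection connection, List<String> names) throws SQLException {
        connection.setAutoCommit(false); // do not commit after each statement
        // Hypothetical table and column names, for illustration only.
        String sql = "INSERT INTO Name (Name) VALUES (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (String name : names) {
                statement.setString(1, name);
                statement.executeUpdate();
            }
            connection.commit(); // one commit for the whole set of rows
        } catch (SQLException e) {
            connection.rollback(); // undo all inserts of this transaction
            throw e;
        }
    }
}
```

The connection is deliberately left open by insertAll; closing it is a separate decision, comparable to the choice offered by the Close Connection check box.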


tAccessConnection
tAccessConnection Properties
This component is closely related to tAccessCommit, tAccessInput and tAccessOutput. It usually does not make much sense to use one of these without using a tAccessConnection component to open a connection for the current transaction.
Component family: Databases/Access

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version: Access 2003 or later versions.

Database: Name of the database.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: set or type in the shared connection name.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.

Usage: This component is to be used along with Access components, especially with tAccessCommit and tAccessOutput components.

Limitation: n/a

Scenario: Inserting data in parent/child tables


The following Job is dedicated to advanced database users, who want to carry out multiple table insertions using a parent table Table1 to generate two child tables: Name and Birthday. In Access 2007, create an Access database named Database1. Once the Access database is created, create a table named Table1 with two column headings: Name and Birthday.


Back in Talend Open Studio, the Job requires twelve components including tAccessConnection, tAccessCommit, tAccessInput, tAccessOutput and tAccessClose.

Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited, tMap, tAccessOutput (x2), tAccessInput (x2), tAccessCommit, tAccessClose and tLogRow (x2). Connect the tFileList component to the input file component using an Iterate link. Thus, the name of the file to be processed will be dynamically filled in from the tFileList directory using a global variable. Connect the tFileInputDelimited component to the tMap component and dispatch the flow between the two output Access components. Use a Row link for each of these connections representing the main data flow. Set the tFileList component properties, such as the directory where files will be fetched from. Add a tAccessConnection component and connect it to the starter component of this Job. In this example, the tFileList component uses an OnComponentOk link to define the execution order. In the tAccessConnection Component view, set the connection details manually or fetch them from the Repository if you centrally store them as a Metadata DB connection entry. For more information about Metadata, see How to centralize the Metadata items in Talend Open Studio User Guide.


In the tFileInputDelimited component's Basic settings view, press Ctrl+Space to access the variable list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH. For more information about using variables, see How to use variables in a Job in Talend Open Studio User Guide.
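The mechanism behind this global variable can be pictured as a shared map that the iterating component fills and the file input component reads. The following Java sketch only imitates that mechanism with a plain HashMap; it is not Studio-generated code. The class CurrentFilePathSketch, the method iterate and the sample file names are hypothetical, and the key spelling tFileList_1_CURRENT_FILEPATH is an assumption about how the variable shown above is stored in a Java Job.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CurrentFilePathSketch {

    // For each file, publishes its path under a shared key, then reads it back
    // the way a File Name expression would on each iteration.
    static List<String> iterate(List<File> files) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> processed = new ArrayList<>();
        for (File file : files) {
            // Role of the iterating component: expose the current file path.
            globalMap.put("tFileList_1_CURRENT_FILEPATH", file.getPath());
            // Role of the file input component: resolve its File Name from the variable.
            String fileName = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
            processed.add(fileName);
        }
        return processed;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        List<File> files = Arrays.asList(new File("people_1.csv"), new File("people_2.csv"));
        for (String path : iterate(files)) {
            System.out.println("reading " + path);
        }
    }
}
```

Because the value is overwritten on every iteration, the input component always reads the file of the current iteration, which is why an Iterate link rather than a Row link connects the two components.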

Set the rest of the fields as usual, defining the row and field separators according to your file structure. Then set the schema manually through the Edit schema dialog box or select the schema from the Repository. Make sure the data type is correctly set, in accordance with the nature of the data processed. In the tMap Output area, add two output tables, one called Name for the Name table, the second called Birthday, for the Birthday table. For more information about the tMap component, see tMap operation in Talend Open Studio User Guide. Drag the Name column from the Input area, and drop it to the Name table. Drag the Birthday column from the Input area, and drop it to the Birthday table.
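The dispatching done by the two tMap output tables amounts to projecting each input row onto two separate outputs. The Java sketch below illustrates this under stated assumptions: the class MapDispatchSketch, its fields and the sample values are hypothetical, and each input row is assumed to hold the Name column first and the Birthday column second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapDispatchSketch {

    final List<String> nameTable = new ArrayList<>();
    final List<String> birthdayTable = new ArrayList<>();

    // Sends the first field of each row to the Name output and the second to the Birthday output.
    static MapDispatchSketch dispatch(List<String[]> inputRows) {
        MapDispatchSketch outputs = new MapDispatchSketch();
        for (String[] row : inputRows) {
            outputs.nameTable.add(row[0]);
            outputs.birthdayTable.add(row[1]);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Hypothetical input rows: Name, Birthday
        List<String[]> rows = Arrays.asList(
                new String[] {"Ann", "12-03-1980"},
                new String[] {"Bob", "05-11-1975"});
        MapDispatchSketch outputs = dispatch(rows);
        System.out.println("Name: " + outputs.nameTable);
        System.out.println("Birthday: " + outputs.birthdayTable);
    }
}
```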

Then connect the output row links to distribute the flow correctly to the relevant DB output components. In the Basic settings view of each tAccessOutput component, select the Use an existing connection check box to retrieve the tAccessConnection details. In the Perl version, the Commit every field is no longer shown, as you are supposed to use tAccessCommit instead to manage the global transaction commit. In the Java version, ignore the field as this setting will be overridden by tAccessCommit.


Set the Table name, making sure it corresponds to the correct table, in this example either Name or Birthday. No action on the table is needed as the tables are already created. Select Insert as Action on data for both output components. Click Sync columns to retrieve the schema set in the tMap. Then connect the first tAccessOutput component to the first tAccessInput component using an OnComponentOk link. In the Basic settings view of each tAccessInput component, select the Use an existing connection check box to retrieve the distributed data flow. Then set the schema manually through the Edit schema dialog box. Then set the Table Name accordingly. In tAccessInput_1, this will be Name. Click Guess Query. Connect each tAccessInput component to a tLogRow component with a Row > Main link. In the Basic settings view of each tLogRow component, select Table in the Mode field. Add the tAccessCommit component below the tFileList component in the design workspace and connect them together using an OnComponentOk link in order to terminate the Job with the transaction commit. In the Basic settings view of the tAccessCommit component, select the connection to be used from the Component list, tAccessConnection_1 in this scenario. Save your Job and press F6 to execute it.

The parent table Table1 is reused to generate the Name table and Birthday table.


tAccessInput
tAccessInput properties
Component family: Databases/Access

Function: tAccessInput reads a database and extracts fields based on a query.

Purpose: tAccessInput executes a DB query with a strictly defined statement which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box and select the appropriate tAccessConnection component from the Component list if you want to re-use connection parameters that you have already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can as well deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive all over through the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the version of Access that you are using.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

Usage: This component offers the flexibility benefit of the DB query and covers all possible SQL queries.

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
- Scenario 1: Displaying selected data from DB table.
- Scenario 2: Using StoreSQLQuery variable.
- Scenario: Writing dynamic columns from a MySQL database to an output file.
Related topic in description of tContextLoad.


tAccessOutput
tAccessOutput properties
Component family: Databases/Access

Function: tAccessOutput writes, updates, makes changes or deletes entries in a database.

Purpose: tAccessOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box and select the appropriate tAccessConnection component from the Component list to reuse connection parameters that you have already defined.
  When a Job contains both a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the version of Access that you are using.

Database: Name of the database.

Username and Password: DB user authentication data.


Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Drop a table if exists and create: The table is removed if it already exists and created again.
  Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can define different primary keys for the update and delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
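The difference between the Insert or update and Update or insert actions can be illustrated with a short sketch. This is one plausible reading of the two option names, not the code the component generates; it assumes Python's built-in sqlite3 module as a stand-in for Access, and the person table and both helper functions are hypothetical.

```python
import sqlite3


def insert_or_update(conn, row):
    """Try the insert first; on a duplicate key, fall back to an update."""
    try:
        conn.execute(
            "INSERT INTO person (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE person SET name = ? WHERE id = ?",
            (row["name"], row["id"]),
        )


def update_or_insert(conn, row):
    """Try the update first; if no entry matched the key, insert instead."""
    cursor = conn.execute(
        "UPDATE person SET name = ? WHERE id = ?",
        (row["name"], row["id"]),
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO person (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )


# sqlite3 in-memory database used as a stand-in; id plays the role of the key column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
insert_or_update(conn, {"id": 1, "name": "Ada"})     # no entry yet: inserted
insert_or_update(conn, {"id": 1, "name": "Ada L."})  # key exists: updated
update_or_insert(conn, {"id": 2, "name": "Grace"})   # nothing to update: inserted
conn.commit()
result = conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
```

Both helpers rely on the key column to decide which entry an incoming row refers to, which is why at least one key column has to be defined for these actions.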


Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. You can press Ctrl+Space to access a list of predefined global variables.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
  Reference column: Type in a reference column that the component can use to place or replace the new or altered column.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

Support null in SQL WHERE statement: Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema.

Usage

This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Access database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
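How Commit every and Die on error interact can be sketched as follows. This is a simplified illustration under stated assumptions, not the component's generated code: Python's built-in sqlite3 module stands in for Access, and the item table, the write_rows function and its parameters are hypothetical.

```python
import sqlite3


def write_rows(conn, rows, commit_every=2, die_on_error=True):
    """Insert rows, committing after every `commit_every` successful rows.

    When die_on_error is False, rows in error are skipped and returned,
    much like rows sent down a Row > Rejects link.
    """
    rejects = []
    pending = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO item (id, label) VALUES (?, ?)", row)
        except sqlite3.IntegrityError as exc:
            if die_on_error:
                raise
            rejects.append((row, str(exc)))
            continue
        pending += 1
        if pending == commit_every:
            conn.commit()
            pending = 0
    if pending:
        # Commit the last, incomplete batch.
        conn.commit()
    return rejects


# sqlite3 in-memory database used as a stand-in, not an Access database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT)")
rows = [(1, "a"), (2, "b"), (2, "duplicate"), (3, "c")]
rejects = write_rows(conn, rows, commit_every=2, die_on_error=False)
stored = conn.execute("SELECT id FROM item ORDER BY id").fetchall()
```

Batches that were already committed stay in the table even if a later row fails, which matches the remark that the option does not provide rollback.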

Related scenarios
For related topics, see:
tDBOutput Scenario: Displaying DB output
tMysqlOutput Scenario 1: Adding a new column and altering data in a DB table.


tAccessOutputBulk
tAccessOutputBulk properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data to a delimited file and then to perform various actions on the file in an Access database, in a two step process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a separate section. The advantage of using a two step process is that it makes it possible to carry out transformations on the data before loading it in the database.
Component family: Databases/Access

Function: tAccessOutputBulk writes a delimited file.

Purpose: tAccessOutputBulk prepares the file which contains the data used to feed the Access database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of the Talend Open Studio User Guide.

Create directory if not exists: Select this check box to create the directory specified in the File name field if it does not exist yet.

Append: Select this check box to add any new rows to the end of the file.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Include header: Select this check box to include the column header in the file.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.


tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is to be used along with the tAccessBulkExec component. Used together, they offer gains in performance while feeding an Access database.
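The file-preparation step can be pictured with a small sketch of a delimited-file writer that honours options similar to Create directory if not exists, Append, Include header and Encoding. It is an illustration only, assuming Python's standard csv module; the write_bulk_file function, its parameters, the semicolon delimiter and the file path are hypothetical and do not come from the component.

```python
import csv
import os
import tempfile


def write_bulk_file(path, columns, rows, append=False, include_header=False,
                    encoding="utf-8", create_dir=True, delimiter=";"):
    """Write rows to a delimited file that a later bulk-load step could read."""
    directory = os.path.dirname(path)
    if create_dir and directory:
        # Create the target directory if it does not exist yet.
        os.makedirs(directory, exist_ok=True)
    mode = "a" if append else "w"
    with open(path, mode, newline="", encoding=encoding) as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        if include_header:
            writer.writerow(columns)
        writer.writerows(rows)


# Hypothetical target path inside a temporary directory.
target = os.path.join(tempfile.mkdtemp(), "bulk", "customers.csv")
write_bulk_file(target, ["id", "name"], [(1, "Ada")], include_header=True)
write_bulk_file(target, ["id", "name"], [(2, "Grace")], append=True)

with open(target, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

Writing the file first and loading it afterwards is what leaves room for transformations between the two steps.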

Related scenarios
For use cases in relation with tAccessOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database


tAccessOutputBulkExec
tAccessOutputBulkExec properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data to a delimited file and then to perform various actions on the file in an Access database, in a two step process. These two steps are fused together in tAccessOutputBulkExec.
Component family: Databases/Access

Function: The tAccessOutputBulkExec component executes an Insert action on the data provided.

Purpose: As a dedicated component, it improves performance during Insert operations in an Access database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and select the appropriate tAccessConnection component from the Component list to reuse connection parameters that you have already defined.
  When a Job contains both a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the version of Access that you are using.

DB name: Name of the database.

Username and Password: DB user authentication data.


Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if doesn't exist: The table is created if it does not already exist.
  Clear a table: The table content is deleted.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must already exist for the insert operation to succeed.

FileName: Name of the file to be processed. Related topic: How to define variables from the Component view of the Talend Open Studio User Guide.

Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Create directory if not exists: Select this check box to create the directory specified in the File name field if it does not exist yet.

Append: Select this check box to append new rows to the end of the file.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. You can press Ctrl+Space to access a list of predefined global variables.

Include header: Select this check box to include the column header in the file.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is mainly used when no particular transformation is required on the data to be loaded in the database.

Limitation: n/a


Related scenarios
For use cases in relation with tAccessOutputBulkExec, see the following scenarios:
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database


tAccessRollback
tAccessRollback properties
This component is closely related to tAccessConnection and tAccessCommit components. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Access

Function: tAccessRollback cancels the current, not yet committed transaction in the connected DB.

Purpose: Avoids involuntary commitment of part of a transaction.

Basic settings

Component list: Select the tAccessConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is to be used along with Access components, especially with tAccessConnection and tAccessCommit.

Limitation: n/a
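The role of a rollback in avoiding a partly committed transaction can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for an Access connection; the mother and daughter tables and the load function are hypothetical names chosen for illustration.

```python
import sqlite3

# sqlite3 in-memory database used as a stand-in, not an Access database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)"
)
conn.commit()


def load(conn, mother_rows, daughter_rows):
    """Insert into both tables in one transaction; undo everything on error."""
    try:
        conn.executemany("INSERT INTO mother VALUES (?, ?)", mother_rows)
        conn.executemany("INSERT INTO daughter VALUES (?, ?)", daughter_rows)
        conn.commit()
        return True
    except sqlite3.Error:
        # Cancel the whole transaction so no partial data is committed.
        conn.rollback()
        return False


# The second daughter row violates the NOT NULL constraint on mother_id.
ok = load(conn, [(1, "Ada")], [(10, 1), (11, None)])
mothers_left = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
```

In this sketch the mother row is inserted successfully but is still undone, because the rollback applies to everything done on the connection since the last commit.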

Related scenarios
For a scenario related to tAccessRollback, see Scenario: Rollback from inserting data in mother/daughter tables of tMysqlRollback.


tAccessRow
tAccessRow properties
Component family: Databases/Access

Function: tAccessRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tAccessRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and select the appropriate tAccessConnection component from the Component list to reuse connection parameters that you have already defined.
  When a Job contains both a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the Access database version that you are using.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the source table where changes made to data should be captured.

Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to sequencing the fields so that they match the schema definition.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, as it improves performance.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component offers the flexibility of the DB query and covers all possible SQL queries.
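The idea behind Use PreparedStatement, one SQL instruction with ? placeholders executed several times with different values, can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for Access; the orders table and the run_prepared helper are hypothetical, and the parameter index is assumed to be 1-based, as it is in JDBC.

```python
import sqlite3

# sqlite3 in-memory database used as a stand-in, not an Access database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 10.0), (2, "Grace", 25.0), (3, "Ada", 40.0)],
)

# Each ? is a parameter; its position is what Parameter Index refers to.
QUERY = "SELECT id FROM orders WHERE customer = ? AND amount > ?"


def run_prepared(conn, query, parameters):
    """Execute the query with values given as {parameter index: value}.

    The dictionary mirrors the rows of a parameter table; indexes are
    assumed to start at 1.
    """
    ordered = [parameters[index] for index in sorted(parameters)]
    return conn.execute(query, ordered).fetchall()


# The same SQL instruction is reused with different parameter values.
ada = run_prepared(conn, QUERY, {1: "Ada", 2: 20.0})
grace = run_prepared(conn, QUERY, {1: "Grace", 2: 20.0})
```

Keeping the values out of the SQL text is what lets the same statement be reused across executions.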


Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment
tMysqlRow Scenario 1: Removing and regenerating a MySQL table index.


tAS400Close
tAS400Close properties
Component family: Databases/AS400

Function: tAS400Close closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tAS400Connection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is to be used along with AS400 components, especially with tAS400Connection and tAS400Commit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tAS400Commit
tAS400Commit Properties
This component is closely related to tAS400Connection and tAS400Rollback. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases/AS400

Function: Validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tAS400Connection component in the list if more than one connection is planned for the current Job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tAS400Commit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is to be used along with AS400 components, especially with tAS400Connection and tAS400Rollback components.

Limitation: n/a

Related scenario
This component is closely related to tAS400Connection and tAS400Rollback. It usually doesn't make much sense to use one of these without using a tAS400Connection component to open a connection for the current transaction. For a scenario related to tAS400Commit, see tMysqlConnection on page 594.


tAS400Connection
tAS400Connection Properties
This component is closely related to tAS400Commit and tAS400Rollback. It usually doesn't make much sense to use one of these components without using a tAS400Connection component to open a connection for the current transaction.

Component family: Databases/AS400

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version: Select the AS400 version in use.

Host: Database server IP address.

Database: Name of the database.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: Set or type in the shared connection name.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Auto commit: Select this check box to automatically commit a transaction when it is completed.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

This component is to be used along with AS400 components, especially with tAS400Commit and tAS400Rollback components.

Limitation: n/a


Related scenario
This component is closely related to tAS400Commit and tAS400Rollback. It usually doesn't make much sense to use one of these without using a tAS400Connection component to open a connection for the current transaction. For a scenario related to tAS400Connection, see tMysqlConnection on page 594.


tAS400Input
tAS400Input properties
Component family: Databases/AS400

Function: tAS400Input reads a database and extracts fields based on a query.

Purpose: tAS400Input executes a DB query with a strictly defined statement which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Use an existing connection: Select this check box and click the relevant tAS400Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains both a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

DB Version: Select the AS400 version in use.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.


Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields so that they match the schema definition.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
Related topic in tContextLoad Scenario: Dynamic context use in MySQL DB insert.


tAS400LastInsertId
tAS400LastInsertId properties
Component family: Databases

Function: tAS400LastInsertId fetches the last inserted ID from a selected AS400 connection.

Purpose: tAS400LastInsertId obtains the primary key value of the record that was last inserted in an AS400 table by a user.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job flow charts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Component list: Select the relevant tAS400Connection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage

This component is to be used as an intermediary component.

Limitation: n/a
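Fetching the key of the record that was just inserted on a given connection can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for an AS400 connection; the invoice table and the insert_and_get_id helper are hypothetical and do not reflect how the component itself retrieves the value.

```python
import sqlite3

# sqlite3 in-memory database used as a stand-in, not an AS400 database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT)"
)


def insert_and_get_id(conn, label):
    """Insert one record and return the key generated for it."""
    cursor = conn.execute("INSERT INTO invoice (label) VALUES (?)", (label,))
    # lastrowid holds the key of the row inserted by this cursor.
    return cursor.lastrowid


first_id = insert_and_get_id(conn, "first")
second_id = insert_and_get_id(conn, "second")
```

The generated key is typically what a following step needs in order to insert related records that reference the new row.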

Related scenario
For a related scenario, see Scenario: Get the ID for the last inserted record of the tMysqlLastInsertId component.


tAS400Output
tAS400Output properties
Component family: Databases/DB2

Function: tAS400Output writes, updates, makes changes or deletes entries in a database.

Purpose: tAS400Output executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

DB Version: Select the AS400 version in use.

Use an existing connection: Select this check box and click the relevant tAS400Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains both a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.


Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Drop a table if exists and create: The table is removed if it already exists and created again.
  Clear a table: The table content is deleted.
Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  You must specify at least one column as a primary key on which the Update and Delete operations are based. To do so, click Edit Schema and select the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can define primary keys for the Update and Delete operations at the same time: select the Use field options check box, then in the Key in update column select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.


  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Use commit control: Select this check box to have access to the Commit every field, where you can define the commit operation.
  Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
  You can press Ctrl+Space to access a list of predefined global variables.
Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, according to the action to be performed on the reference column.
  Reference column: Type in a reference column that the tDBOutput can use to place or replace the new or altered column.
Use field options: Select this check box to customize a request, especially when there is a double action on data.
Enable debug mode: Select this check box to display each step while entries are processed in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. It must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an AS400 database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
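The difference between the Insert or update and Update or insert actions is the order in which the two statements are attempted. The sketch below illustrates one plausible reading of that order. It is not the code Talend generates: it uses Python's built-in sqlite3 module as a stand-in for an AS400 database, and the customer table, its columns and both helper functions are hypothetical.

```python
import sqlite3


def insert_or_update(conn, row):
    """Try the INSERT first; fall back to UPDATE when the key already exists."""
    try:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # The primary key is already present, so update the existing entry.
        conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?",
            (row["name"], row["id"]),
        )
        return "updated"


def update_or_insert(conn, row):
    """Try the UPDATE first; INSERT only when no entry matched the key."""
    cur = conn.execute(
        "UPDATE customer SET name = ? WHERE id = ?",
        (row["name"], row["id"]),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )
        return "inserted"
    return "updated"


# Hypothetical table: id plays the role of the key column set in Edit Schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

results = [
    insert_or_update(conn, {"id": 1, "name": "Ann"}),
    insert_or_update(conn, {"id": 1, "name": "Anna"}),
    update_or_insert(conn, {"id": 2, "name": "Bob"}),
    update_or_insert(conn, {"id": 2, "name": "Rob"}),
]
rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

Both actions end with the same table content. Under this reading, the choice mostly matters for performance: prefer the action whose first statement is expected to succeed for most incoming rows.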

Related scenarios
For related topics, see:
  tDBOutput Scenario: Displaying DB output.
  tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
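The Commit every setting described in the properties can be pictured as a counter that triggers a commit each time a batch of rows is complete. The following is a minimal sketch under stated assumptions: sqlite3 stands in for the AS400 database, and the table t and the load_rows helper are hypothetical names, not part of Talend.

```python
import sqlite3


def load_rows(conn, rows, commit_every):
    """Insert rows and commit once per batch of `commit_every` rows.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO t (id, label) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the last, incomplete batch.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, label TEXT)")

rows = [(i, f"row-{i}") for i in range(1, 11)]
commit_count = load_rows(conn, rows, commit_every=4)
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Batches that are already committed stay in the table even if a later row fails, which is why the option improves performance but does not provide a rollback of the whole load.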


tAS400Rollback
tAS400Rollback properties
This component is closely related to tAS400Commit and tAS400Connection. It usually does not make much sense to use these components independently in a transaction.
Component family Databases/AS400

Function: tAS400Rollback cancels the transaction committed in the connected DB.
Purpose: Avoids committing part of a transaction involuntarily.

Basic settings

Component list: Select the tAS400Connection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with AS400 components, especially with tAS400Connection and tAS400Commit.
Limitation: n/a

Related scenarios
For a tAS400Rollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of tMysqlRollback.
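The connection, commit and rollback components together behave like a single transaction around several writes. As a rough illustration only, the sketch below uses sqlite3 in place of an AS400 database; the mother and daughter tables and the load_parent_and_children helper are hypothetical.

```python
import sqlite3


def load_parent_and_children(conn, parent, children):
    """Write a parent row and its child rows as one transaction.

    Returns True when everything was committed and False when the
    transaction was rolled back.
    """
    try:
        conn.execute("INSERT INTO mother (id, name) VALUES (?, ?)", parent)
        conn.executemany(
            "INSERT INTO daughter (id, mother_id, name) VALUES (?, ?, ?)",
            children,
        )
        conn.commit()
        return True
    except sqlite3.Error:
        # Cancel every write made since the last commit.
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER, "
    "name TEXT NOT NULL)"
)
conn.commit()

# The second child row violates NOT NULL, so nothing should be kept.
ok = load_parent_and_children(conn, (1, "Eva"), [(1, 1, "Lea"), (2, 1, None)])
mother_count = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
daughter_count = conn.execute("SELECT COUNT(*) FROM daughter").fetchone()[0]
```

Because the parent insert and the child inserts share one connection, the rollback removes the parent row as well, which avoids committing only part of the transaction.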


tAS400Row
tAS400Row properties
Component family Databases/AS400

Function: tAS400Row is the specific component for this database query. It executes the stated SQL query on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose: Depending on the nature of the query and the database, tAS400Row acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Use an existing connection: Select this check box and click the relevant tAS400Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
DB Version: Select the AS400 version in use.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC Parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, because it improves performance.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
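The Set PreparedStatement Parameter table pairs each ? placeholder in the query with an index, a type and a value. The sketch below mimics that pairing with Python's sqlite3 module, which also uses ? placeholders. It is an illustration under assumptions: the person table, the parameters list, the CONVERTERS mapping and the bind helper are all hypothetical, and sqlite3 stands in for the AS400 database.

```python
import sqlite3

# Hypothetical stand-in for the parameter table: (index, type, value).
# Entries are deliberately listed out of order to show that the index,
# not the position in the list, decides which placeholder is filled.
parameters = [
    (2, "String", "Paris"),
    (1, "Int", 30),
]

# Hypothetical mapping from a parameter type name to a Python converter.
CONVERTERS = {"Int": int, "String": str}


def bind(parameters):
    """Return the values ordered by parameter index and converted by type."""
    ordered = sorted(parameters, key=lambda p: p[0])
    return tuple(CONVERTERS[ptype](value) for _, ptype, value in ordered)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Ann", 34, "Paris"), ("Bob", 25, "Paris"), ("Cy", 41, "Lyon")],
)

# Placeholder 1 is the age threshold, placeholder 2 is the city.
query = "SELECT name FROM person WHERE age > ? AND city = ? ORDER BY name"
bound = bind(parameters)
names = [record[0] for record in conn.execute(query, bound)]
```

The query text stays constant while only the bound values change, which is what lets a database reuse the prepared statement when the same query is executed several times.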

Related scenarios
For related topics, see:
  tDBSQLRow Scenario: Resetting a DB auto-increment.
  tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tCreateTable
You can find this component at the root of the Databases group of the Palette of Talend Open Studio. tCreateTable covers needs related indirectly to the use of any database.

tCreateTable Properties
Component family Databases

Function: tCreateTable creates, drops and creates, or clears the specified table.
Purpose: This Java-specific component helps create or drop any database table.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Database Type: Select the DBMS type from the list. The component properties may differ slightly according to the database type selected from the list.
Table Action: Select the action to be carried out on the database among:
  Create table: when you already know that the table doesn't exist.
  Create table if not exists: when you don't know whether the table has already been created.
  Drop table if exists and create: when you know that the table already exists and needs to be replaced.
Temporary table (MySQL): Select this check box if you want to save the created table temporarily.


Use an existing connection (MSSQLServer, MySQL, Oracle, PostgresPlus, Postgresql): Select this check box if you use a database connection component, for example tMysqlConnection or tOracleConnection.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Connection Type (Oracle): Drop-down list of available drivers.
Access File (Access): Name and path of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Firebird File (Firebird): Name and path of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Interbase File (Interbase): Name and path of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
SQLite File (SQLite): Name and path of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Framework Type (JavaDb): Select a framework for your database from the list.
Running Mode (HSQLDb): Select from the list the Server Mode that corresponds to your DB setup.
Use TLS/SSL Sockets (HSQLDb): Select this check box to enable the secured mode, if required.
DB Version (AS400/Oracle): Select the database version in use.
Create (Teradata): Select the table type from the drop-down list. The type may be:
  SET TABLE: tables which do not allow duplicate rows.
  MULTI SET TABLE: tables allowing duplicate rows.


Host (all database types except Access, JavaDb, SQLite and ODBC): Database server IP address.
Database name (all database types except Access, Firebird, HSQLDb, SQLite and ODBC): Name of the database.
DB Root Path (JavaDb): Browse to your database root.
Port (all database types except Access, AS400, Firebird, Interbase, JavaDb, SQLite and ODBC): Listening port number of the DB server.
DB Alias (HSQLDb): Name of the database.
DB Server (Informix): Name of the database server.
ODBC Name (ODBC): Name of the database.
Username and Password: DB user authentication data.
Table name: Type in between quotes a name for the newly created table.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various Jobs and projects. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Additional JDBC Parameters (AS400/MSSQL Server): Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. More scenarios are available for specific DB Input components.

Scenario: Creating new table in a Mysql Database


The Job described below aims at creating a table in a database, made of a dummy schema taken from a delimited file schema stored in the Repository. This Job is composed of a single component.


Drop a tCreateTable component from the Databases family in the Palette to the design workspace.
In the Basic settings view, from the Database Type list, select Mysql for this scenario.
From the Table Action list, select Create table.
Select the Use Existing Connection check box only if you are using a dedicated DB connection component (see tMysqlConnection on page 594). In this example, we won't use this option.
In the Property type field, select Repository so that the connection fields that follow are automatically filled in. If you have not defined your DB connection metadata in the DB connection directory under the Metadata node, fill in the details manually as Built-in.
In the Table Name field, fill in a name for the table to be created.
If you want to retrieve the Schema from the Metadata (it doesn't need to be a DB connection Schema metadata), select Repository, then the relevant entry.
In either case (Built-in or Repository), click Edit Schema to check the data type mapping.


Click the Reset DB Types button if the DB type column is empty or shows discrepancies (marked in orange). This allows you to map any data type to the relevant DB data type.
Click OK to validate your changes and close the dialog box.
Save your Job and press F6 to execute it.
The table is created empty but with all columns defined in the Schema.
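The three Table Action choices differ only in what happens when the table already exists. The following sketch shows that difference with plain SQL statements built from a schema. It is an illustration, not the statements tCreateTable emits: sqlite3 replaces MySQL, and the customers table, the schema list and the apply_table_action helper are hypothetical.

```python
import sqlite3


def apply_table_action(conn, table, columns, action):
    """Run the DDL matching one of the three table actions.

    `columns` is a list of (column name, DB type) pairs, similar to what
    the Edit Schema dialog shows after the data type mapping.
    """
    cols = ", ".join(f"{name} {dbtype}" for name, dbtype in columns)
    if action == "Create table":
        conn.execute(f"CREATE TABLE {table} ({cols})")
    elif action == "Create table if not exists":
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    elif action == "Drop table if exists and create":
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({cols})")
    else:
        raise ValueError(f"unknown table action: {action}")


schema = [("id", "INTEGER"), ("name", "TEXT"), ("created", "TEXT")]
conn = sqlite3.connect(":memory:")

apply_table_action(conn, "customers", schema, "Create table")

# A plain "Create table" fails when the table is already there.
try:
    apply_table_action(conn, "customers", schema, "Create table")
    second_create_failed = False
except sqlite3.OperationalError:
    second_create_failed = True

# These two actions are safe to run against an existing table.
apply_table_action(conn, "customers", schema, "Create table if not exists")
apply_table_action(conn, "customers", schema[:2], "Drop table if exists and create")

column_names = [info[1] for info in conn.execute("PRAGMA table_info(customers)")]
```

After the last call the table has been replaced by a version with only two columns, which shows that Drop table if exists and create discards the previous structure and content.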


tDB2BulkExec
tDB2BulkExec properties
Component family Databases/DB2

Function: tDB2BulkExec executes the Insert action on the data provided.
Purpose: As a dedicated component, tDB2BulkExec improves performance during Insert operations to a DB2 database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Table Schema: Name of the DB schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.


Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create table: The table is removed and created again.
  Create table: The table does not exist and gets created.
  Create table if not exists: The table is created if it does not exist.
  Drop table if exists and create: The table is removed if it already exists and created again.
  Clear table: The table content is deleted.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository, hence you can reuse it. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Data file: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.

Advanced settings

Field terminated by: Character, string or regular expression to separate fields.
Date Format: Use this field to define the way months and days are ordered.
Time Format: Use this field to define the way hours, minutes and seconds are ordered.
Timestamp Format: Use this field to define the way date and time are ordered.
Remove load pending: When this check box is selected, tables blocked in "pending" status following a bulk load are unblocked.
Load options: Click + to add data loading options:
  Parameter: Select a loading parameter from the list.
  Value: Enter a value for the parameter selected.


tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This dedicated component offers performance and flexibility of DB2 query handling.

Related scenarios
For tDB2BulkExec related topics, see:
  tMysqlOutputBulkExec Scenario: Inserting transformed data in MySQL database.
  tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.
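The idea behind a bulk component is to read a delimited data file and write all of its rows in one batch rather than row by row. The sketch below only illustrates that idea and the role of the Field terminated by setting. It does not use the DB2 load mechanism: sqlite3 is a stand-in database, the data file is simulated in memory, and the sales table and bulk_insert helper are hypothetical.

```python
import csv
import io
import sqlite3


def bulk_insert(conn, data_file, field_terminator):
    """Parse a delimited file and insert all of its rows in a single batch.

    Returns the number of rows loaded.
    """
    reader = csv.reader(data_file, delimiter=field_terminator)
    rows = [tuple(record) for record in reader if record]  # skip empty lines
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")

# In-memory stand-in for the Data file, with ";" as the field terminator.
data = io.StringIO("north;120\nsouth;80\neast;45\n")
loaded = bulk_insert(conn, data, ";")
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The field order in the file has to match the column order of the target table, in the same way that the component schema has to match the data file.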


tDB2Close
tDB2Close properties
Component family Databases/DB2

Function: tDB2Close closes the transaction committed in the connected DB.
Purpose: Close a transaction.

Basic settings

Component list: Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with DB2 components, especially with tDB2Connection and tDB2Commit.
Limitation: n/a

Related scenario
No scenario is available for this component yet.


tDB2Commit
tDB2Commit Properties
This component is closely related to tDB2Connection and tDB2Rollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family Databases/DB2

Function: Validates the data processed through the Job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus improves performance.

Basic settings

Component list: Select the tDB2Connection component in the list if more than one connection is planned for the current Job.
Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tDB2Commit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box, or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with DB2 components, especially with tDB2Connection and tDB2Rollback components.
Limitation: n/a

Related scenario
This component is closely related to tDB2Connection and tDB2Rollback. It usually doesn't make much sense to use one of these without using a tDB2Connection component to open a connection for the current transaction. For a tDB2Commit related scenario, see tMysqlConnection on page 594.
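The warning about combining a row-by-row commit with the Close connection option can be reproduced in a few lines. The sketch below is only an analogy: sqlite3 stands in for DB2, and commit_step and run_row_by_row are hypothetical helpers that imitate a commit step placed on a Row > Main link.

```python
import sqlite3


def commit_step(conn, close_connection):
    """Imitate the commit step: commit, then optionally close the connection."""
    conn.commit()
    if close_connection:
        conn.close()


def run_row_by_row(rows, close_connection):
    """Insert and commit each row in turn; return how many rows were written."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    written = 0
    try:
        for row in rows:
            conn.execute("INSERT INTO t VALUES (?)", row)
            commit_step(conn, close_connection)
            written += 1
    except sqlite3.ProgrammingError:
        # The connection was closed after the first commit, so the next
        # row can no longer be written.
        pass
    return written


rows = [(1,), (2,), (3,)]
written_when_closing = run_row_by_row(rows, close_connection=True)
written_when_kept_open = run_row_by_row(rows, close_connection=False)
```

With the connection closed after the first commit, only one row is written. Keeping the connection open lets every row go through, which matches the advice to clear the Close connection check box in that configuration.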


tDB2Connection
tDB2Connection properties
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make much sense to use one of these without using a tDB2Connection to open a connection for the current transaction.
Component family Databases/DB2

Function: tDB2Connection opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host name: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Table Schema: Name of the schema.
Username and Password: DB user authentication data.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: Set or type in the shared connection name.

Advanced settings

Auto commit: Select this check box to automatically commit a transaction when it is completed.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is to be used along with DB2 components, especially with tDB2Commit and tDB2Rollback.
Limitation: n/a


Related scenarios
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make much sense to use one of these without using a tDB2Connection component to open a connection for the current transaction. For tDB2Connection related scenario, see tMysqlConnection on page 594.
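A shared DB connection can be thought of as a registry that hands out the same connection object to every Job level that asks for the same name. The sketch below models that idea only. It is not Talend's mechanism: the registry, the shared name shared_db2, the audit table and both job functions are hypothetical, and sqlite3 stands in for DB2.

```python
import sqlite3

# Hypothetical registry: shared connection name -> open connection.
_shared_connections = {}


def get_shared_connection(name, database=":memory:"):
    """Register the connection on first use, then return the same object."""
    if name not in _shared_connections:
        _shared_connections[name] = sqlite3.connect(database)
    return _shared_connections[name]


def child_job():
    """Fetch the connection registered by the parent and write through it."""
    conn = get_shared_connection("shared_db2")
    conn.execute("INSERT INTO audit VALUES ('child')")


def parent_job():
    """Open the shared connection, call the child, then commit once."""
    conn = get_shared_connection("shared_db2")
    conn.execute("CREATE TABLE IF NOT EXISTS audit (msg TEXT)")
    conn.execute("INSERT INTO audit VALUES ('parent')")
    child_job()
    conn.commit()  # one commit covers the writes of both Job levels
    return conn.execute("SELECT msg FROM audit ORDER BY rowid").fetchall()


result = parent_job()
same_object = (
    get_shared_connection("shared_db2") is get_shared_connection("shared_db2")
)
```

Since both levels write through one connection, a single commit or rollback applies to the work of the parent and the child together. This is also why the shared connection name has to be unique across the Job levels.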


tDB2Input
tDB2Input properties
Component family Databases/DB2

Function: tDB2Input reads a database and extracts fields based on a query.
Purpose: tDB2Input executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Schema: Name of the schema.
Username and Password: DB user authentication data.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Table name: Select the source table from which to capture any changes made on data.
Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column: Remove leading and trailing whitespace from defined columns.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for DB2 databases.

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
  Scenario 1: Displaying selected data from DB table.
  Scenario 2: Using StoreSQLQuery variable.
  Scenario: Writing dynamic columns from a MySQL database to an output file.
See also the related topic in tContextLoad Scenario: Dynamic context use in MySQL DB insert.
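The requirement that the query fields follow the schema order can be seen by mapping each result record onto the schema by position. The following sketch assumes a hypothetical person table and schema and uses sqlite3 in place of DB2. The read_rows helper and its trim_all flag are illustrative names that loosely mirror the Trim all the String/Char columns option.

```python
import sqlite3

# Hypothetical component schema: column names in the order they are passed on.
schema = ["id", "name", "city"]


def read_rows(conn, query, schema, trim_all=False):
    """Run the query and map each record onto the schema by position."""
    result = []
    for record in conn.execute(query):
        values = [
            value.strip() if trim_all and isinstance(value, str) else value
            for value in record
        ]
        result.append(dict(zip(schema, values)))
    return result


conn = sqlite3.connect(":memory:")
# The physical column order differs from the schema order on purpose.
conn.execute("CREATE TABLE person (city TEXT, id INTEGER, name TEXT)")
conn.execute("INSERT INTO person VALUES ('  Lyon ', 1, ' Ann')")

# Fields listed in schema order: each value lands in the right column.
good = read_rows(conn, "SELECT id, name, city FROM person", schema, trim_all=True)

# SELECT * follows the table order, so values land in the wrong columns.
bad = read_rows(conn, "SELECT * FROM person", schema)
```

Listing the fields explicitly in the query, in schema order, avoids the silent misalignment shown by the second call.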


tDB2Output
tDB2Output properties
Component family Databases/DB2

Function: tDB2Output writes, updates, modifies or deletes entries in a database.
Purpose: tDB2Output executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Table schema: Name of the DB schema.

368

Talend Open Studio Components

Database components
tDB2Output

Username and Password Table Action on table

DB user authentication data. Name of the table to be written. Note that only one table can be written at a time On the table defined, you can perform one of the following operations: Default: No operation is carried out. Drop and create a table: The table is removed and created again. Create a table: The table does not exist and gets created. Create a table if not exists: The table is created if it does not exist. Drop a table if exists and create: The table is removed if it already exists and created again. Clear a table: The table content is deleted. On the data of the table defined, you can perform: Insert: Add new entries to the table. If duplicates are found, Job stops. Update: Make changes to existing entries Insert or update: Add entries or update existing ones. Update or insert: Update existing entries or create it if non existing Delete: Remove entries corresponding to the input flow. You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the update and delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column name on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation

Action on data

Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.

Talend Open Studio Components

369

Database components
tDB2Output

Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide. Die on error This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link. Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution. This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, which are not insert, nor update or delete actions, or action that require particular preprocessing. Name: Type in the name of the schema column to be altered or inserted as new column SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data. Position: Select Before, Replace or After following the action to be performed on the reference column. Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column. Use field options Convert columns and table names to uppercase Enable debug mode Support null in SQL WHERE statement Select this check box to customize a request, especially when there is double action on data. Select this check box to uppercase the names of the columns and the name of the table. Select this check box to display each step during processing entries in a database. Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema. Select this check box to activate the batch mode for data processing. 
In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Insert, the Update or the Delete option in the Action on data field.

Advanced settings

Commit every

Additional Columns

Use batch size

tStatCatcher Statistics Select this check box to collect log data at the component level.

370

Talend Open Studio Components

Database components
tDB2Output

Usage

This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a DB2 database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link
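The Action on data options can be pictured with a small in-memory model. The following Java sketch is an illustration only, not the code generated by the component: the class name, the enum and the use of a map keyed on the primary key are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory sketch of the "Action on data" options, keyed on a primary key (hypothetical model). */
class ActionOnDataSketch {

    enum Action { INSERT, UPDATE, INSERT_OR_UPDATE, UPDATE_OR_INSERT, DELETE }

    /** Applies one incoming row (key, value) to the table according to the selected action. */
    static void apply(Map<Integer, String> table, Action action, int key, String value) {
        switch (action) {
            case INSERT:
                // A duplicate key is an error; with Die on error selected, the Job stops.
                if (table.containsKey(key)) {
                    throw new IllegalStateException("Duplicate key: " + key);
                }
                table.put(key, value);
                break;
            case UPDATE:
                // Only existing entries are changed; unknown keys are left alone.
                if (table.containsKey(key)) {
                    table.put(key, value);
                }
                break;
            case INSERT_OR_UPDATE:
            case UPDATE_OR_INSERT:
                // Both options end in the same state; in this reading they differ
                // only in which statement is attempted first.
                table.put(key, value);
                break;
            case DELETE:
                // Removes the entry matching the key of the incoming row.
                table.remove(key);
                break;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new LinkedHashMap<>();
        apply(table, Action.INSERT, 1, "red");
        apply(table, Action.UPDATE_OR_INSERT, 1, "blue");
        apply(table, Action.UPDATE_OR_INSERT, 2, "green");
        apply(table, Action.DELETE, 1, null);
        System.out.println(table); // prints {2=green}
    }
}
```

The sketch also shows why a key column is required for Update and Delete: without it there is nothing to match an incoming row against an existing entry.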

Related scenarios
For tDB2Output related topics, see:
tDBOutput Scenario: Displaying DB output
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tDB2Rollback
tDB2Rollback properties
This component is closely related to tDB2Commit and tDB2Connection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/DB2

Function: tDB2Rollback cancels the pending (not yet committed) transaction in the connected DB.

Purpose: Avoids committing part of a transaction involuntarily.

Basic settings

Component list: Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with DB2 components, especially with tDB2Connection and tDB2Commit.

Limitation: n/a
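The relationship between the connection, commit and rollback components mirrors the usual JDBC transaction pattern. The sketch below is an assumption about how that pattern looks in plain Java, not the code Talend generates; the class name and the Work callback are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Sketch of a JDBC transaction with commit on success and rollback on failure. */
class RollbackSketch {

    /** Hypothetical callback standing in for the statements executed by the Job. */
    interface Work {
        void run(Connection connection) throws SQLException;
    }

    /** Runs the work in one transaction; returns true if committed, false if rolled back. */
    static boolean runInTransaction(Connection connection, Work work) throws SQLException {
        connection.setAutoCommit(false);   // statements are no longer committed one by one
        try {
            work.run(connection);
            connection.commit();           // the role played by a commit component
            return true;
        } catch (SQLException e) {
            connection.rollback();         // the role played by a rollback component
            return false;
        }
    }
}
```

Because all statements share one connection with auto-commit disabled, a failure anywhere in the work leaves no partial changes behind, which is the purpose stated for tDB2Rollback.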

Related scenarios
For a tDB2Rollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of the tMysqlRollback component.


tDB2Row
tDB2Row properties
Component family: Databases/DB2

Function: tDB2Row is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tDB2Row acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, as it improves performance.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
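The Parameter Index, Parameter Type and Parameter Value columns correspond to the way a JDBC PreparedStatement binds values to ? placeholders. The following sketch illustrates that mechanism under stated assumptions: the class and method names are hypothetical, the placeholder counter is deliberately naive (it ignores ? characters inside string literals), and setObject is used instead of a typed setter for brevity.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch of binding parameters by index to the ? placeholders of a query. */
class PreparedStatementSketch {

    /** Counts the ? placeholders in a query (naive: does not skip quoted text). */
    static int countPlaceholders(String sql) {
        int count = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    /** Binds each value to its 1-based parameter index, then executes the statement. */
    static int execute(Connection connection, String sql, Object... values) throws SQLException {
        int expected = countPlaceholders(sql);
        if (values.length != expected) {
            throw new IllegalArgumentException("Expected " + expected + " parameters");
        }
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (int i = 0; i < values.length; i++) {
                statement.setObject(i + 1, values[i]);  // JDBC parameter indexes start at 1
            }
            return statement.executeUpdate();
        }
    }
}
```

For example, a query such as UPDATE shops SET sales = ? WHERE shop_code = ? (table and column names are made up for the illustration) takes two parameters, with indexes 1 and 2.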

Related scenarios
For tDB2Row related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tDB2SCD
tDB2SCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tDB2SCD.


tDB2SCDELT
tDB2SCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tDB2SCDELT.


tDB2SP
tDB2SP properties
Component family: Databases/DB2

Function: tDB2SP calls the database stored procedure.

Purpose: tDB2SP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings

Use an existing connection: Select this check box and click the relevant tDB2Connection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SP Name: Type in the exact name of the stored procedure.

Is Function / Return result in: Select this check box if only a value is to be returned. Select from the list the schema column on which the value to be returned is based.

Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedure. Note that the SP schema can hold more columns than there are parameters used in the procedure.
  Select the Type of parameter:
  IN: Input parameter.
  OUT: Output parameter/return value.
  IN OUT: Input parameter that is to be returned as a value, likely after modification through the procedure (function).
  RECORDSET: Input parameter that is to be returned as a set of values, rather than a single value.
  Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.

Limitation: The stored procedure syntax should match the database syntax.
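The IN and OUT parameter types and the Is Function option map naturally onto the JDBC call escape syntax. The sketch below is one possible illustration, assuming a procedure with one integer IN parameter and one string OUT parameter; the class name, method names and procedure name are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Collections;

/** Sketch of calling a stored procedure or function through JDBC. */
class StoredProcedureCallSketch {

    /** Builds the JDBC escape string, e.g. {call name(?, ?)} or {? = call name(?)}. */
    static String buildCall(String name, int parameterCount, boolean isFunction) {
        String placeholders = String.join(", ", Collections.nCopies(parameterCount, "?"));
        String call = "call " + name + "(" + placeholders + ")";
        return isFunction ? "{? = " + call + "}" : "{" + call + "}";
    }

    /** Calls a procedure with one IN parameter (index 1) and one OUT parameter (index 2). */
    static String callWithOutParameter(Connection connection, String name, int input)
            throws SQLException {
        try (CallableStatement statement = connection.prepareCall(buildCall(name, 2, false))) {
            statement.setInt(1, input);                        // IN: value passed to the procedure
            statement.registerOutParameter(2, Types.VARCHAR);  // OUT: value returned by the procedure
            statement.execute();
            return statement.getString(2);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCall("get_state_label", 2, false)); // prints {call get_state_label(?, ?)}
        System.out.println(buildCall("state_label", 1, true));      // prints {? = call state_label(?)}
    }
}
```

An IN OUT parameter would be both set with a setter and registered with registerOutParameter on the same index.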

Related scenarios
For a related topic, see tMysqlSP Scenario: Finding a State Label using a stored procedure. See also the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.


tDBInput
tDBInput properties
Component family: Databases/DB Generic

Function: tDBInput reads a database and extracts fields based on a query.

Purpose: tDBInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.
  For performance reasons, a specific Input component (e.g., tMySQLInput for a MySQL database) should always be preferred to the generic component.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Connection type: Drop-down list of available DBMS drivers.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the source table where changes made to data should be captured.

Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries using a generic ODBC connection.

Scenario 1: Displaying selected data from DB table


The following scenario creates a two-component Job, reading data from a database using a DB query and outputting delimited data into the standard output (console).

Drop a tDBInput and a tLogRow component from the Palette to the design workspace.
Right-click the tDBInput component and select Row > Main. Drag this main row link onto the tLogRow component and release when the plug symbol displays.
Double-click tDBInput so that the Component view shows up, and define the properties:
The component property data are Built-In for this scenario.
Fill in the database name, the username and the password in the corresponding fields.
The schema is Built-In. This means that it is available for this Job and on this station only.
Click Edit Schema and create a 2-column description including shop code and sales.
Enter the table name in the corresponding field.
Type in the query, making sure it includes all columns in the same order as defined in the schema. In this case, as all the columns of the schema are selected, the asterisk symbol can be used.
Click the second component to define it.
Enter the field separator. In this case, a pipe separator.
Now go to the Run tab, and click Run to execute the Job.

The DB is parsed, and the queried data is extracted from the specified table and passed on to the Job log console. You can view the output directly on the console.
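The requirement that the query lists its columns in the same order as the schema can be made concrete by deriving the query from the schema. The sketch below does this for the two-column schema of the scenario; the class name, the column names shop_code and sales, and the table name are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: build a SELECT whose column order follows the schema definition. */
class QueryOrderSketch {

    /** Lists the schema columns explicitly, in schema order, instead of relying on *. */
    static String buildSelect(String table, List<String> schemaColumns) {
        return "SELECT " + String.join(", ", schemaColumns) + " FROM " + table;
    }

    public static void main(String[] args) {
        // Hypothetical names for the scenario's shop code and sales columns.
        List<String> schema = Arrays.asList("shop_code", "sales");
        System.out.println(buildSelect("shop_sales", schema));
        // prints SELECT shop_code, sales FROM shop_sales
    }
}
```

Listing the columns explicitly keeps the query aligned with the schema even if the table later gains columns or changes its physical column order, which an asterisk would not guarantee.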

Scenario 2: Using StoreSQLQuery variable


StoreSQLQuery is a variable that can be used to debug a tDBInput scenario which does not operate correctly. It allows you to dynamically feed the SQL query set in your tDBInput component.

Use the same scenario as scenario 1 above and add a third component, tJava.
Connect tDBInput to tJava using a trigger connection of the OnComponentOk type. In this case, we want the tDBInput to run before the tJava component.
Set both the tDBInput and tLogRow components as in tDBInput scenario 1.
Click anywhere on the design workspace to display the Contexts property panel.
Create a new parameter explicitly called StoreSQLQuery. Enter a default value of 1. This value of 1 means that StoreSQLQuery is true for use in the QUERY global variable.
Click the tJava component to display the Component view. Enter the System.out.println() command to display the query content: press Ctrl+Space to access the variable list and select the global variable QUERY.
Go to the Run tab and execute the Job.

The query entered in the tDBInput component is displayed at the end of the Job results, in the log.
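As a rough illustration of what the tJava code does, the sketch below reads the QUERY variable from a map that stands in for the Job's globalMap. The key format (component name followed by _QUERY, here tDBInput_1_QUERY) and the component name tDBInput_1 are assumptions; in the studio, selecting the variable with Ctrl+Space inserts the exact expression for your Job.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of reading the stored query of a component from a globalMap-like map. */
class StoreSqlQuerySketch {

    /** Looks up the query stored for a component, assuming a "<component>_QUERY" key. */
    static String storedQuery(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(componentName + "_QUERY");
    }

    public static void main(String[] args) {
        // This map only stands in for the Job's globalMap, which the Job fills at run time.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tDBInput_1_QUERY", "SELECT * FROM shop_sales");

        // Inside a tJava component, the equivalent would be a single println of the lookup.
        System.out.println(storedQuery(globalMap, "tDBInput_1"));
        // prints SELECT * FROM shop_sales
    }
}
```

Printing the stored query after tDBInput has run shows the statement that was actually sent to the database, which helps when the query is assembled from context variables.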


tDBOutput
tDBOutput properties
Component family: Databases

Function: tDBOutput writes, updates, modifies or deletes entries in a database.

Purpose: tDBOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.
  A specific Output component should always be preferred to the generic component.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Connection type: List of available drivers.

Database: Name of the database.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, open the Advanced settings view, where you can define primary keys for the Update and Delete operations simultaneously. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Clear data in table: Select this check box to delete data in the selected table before any operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is a double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
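The effect of the Commit every setting can be sketched by computing at which rows a commit is issued. This is a simplified model, not the generated code; the class and method names are hypothetical, and the final commit for a partial last batch is an assumption about the behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of where commits fall when rows are committed in batches. */
class CommitEverySketch {

    /** Returns the row counts at which a commit is issued for the given settings. */
    static List<Integer> commitPoints(int totalRows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        List<Integer> points = new ArrayList<>();
        int pending = 0;
        for (int row = 1; row <= totalRows; row++) {
            pending++;                      // row written but not yet committed
            if (pending == commitEvery) {
                points.add(row);            // commit the full batch
                pending = 0;
            }
        }
        if (pending > 0) {
            points.add(totalRows);          // assumed final commit for the remaining rows
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(25, 10)); // prints [10, 20, 25]
    }
}
```

In this model, rows committed in an earlier batch stay in the database if a later row fails, which is one way to read the remark that the option does not provide rollback.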

Scenario: Displaying DB output


The following scenario is a three-component Job that creates a new table in the defined database and fills it with data. The tFileInputDelimited component passes the input flow to the tDBOutput component. As the content of a DB is not viewable as such, a tLogRow component is used to display the main flow on the Run console.

Drop the three components required for this Job from the Palette to the design workspace.
On the Basic settings tab of tFileInputDelimited, define the input flow parameters. In this use case, the file contains car owner IDs, makes, colors and registration references, organized as follows: semicolon as field separator, carriage return as row separator.
The input file contains a header row to be considered in the schema. If this file is already described in your metadata, you can retrieve the properties by selecting the relevant repository entry in the list.
Likewise, if your schema is already loaded in the Repository, select Repository as Property Type and choose the relevant metadata entry in the list. If you haven't defined the schema yet, define the data structure in the built-in schema you edit.
Then define the tDBOutput component to configure the output flow. Select the database to connect to. Note that you can store all the database connection details in different context variables. For more information about how to create and use context variables, see How to centralize contexts and variables in Talend Open Studio User Guide.
Fill in the table name in the Table field. Then select the operations to be performed:
As Action on table, select Drop and create table in the list. This allows you to overwrite any existing table with the new selected data. Alternatively, you can insert only extra rows into an existing table, but note that duplicate management is not supported natively. See tUniqRow Properties for further information.
As Action on data, select Insert. The data flow incoming as input is thus added to the selected table.
To view the output flow easily, connect the tDBOutput component to a tLogRow component. Define the field separator as a pipe symbol.
Press F6 to execute the Job.

As the processing can take some time to reach the tLogRow component, we recommend that you enable the Statistics functionality on the Run console.
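The data path of this scenario (semicolon-delimited input with a header row, pipe-separated display) can be mimicked in a few lines of Java. This sketch only illustrates the parsing and display steps, without any database access; the class name and the sample values are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of reading delimited rows with a header and displaying them pipe-separated. */
class DelimitedFlowSketch {

    /** Splits each line on the separator, skipping the given number of header rows. */
    static List<String[]> parse(List<String> lines, String separator, int headerRows) {
        List<String[]> rows = new ArrayList<>();
        for (int i = headerRows; i < lines.size(); i++) {
            // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
            rows.add(lines.get(i).split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    /** Formats one row the way the scenario's log display is configured: with pipes. */
    static String toLogRow(String[] row) {
        return String.join("|", row);
    }

    public static void main(String[] args) {
        // Made-up sample content: owner id, make, color, registration.
        List<String> lines = Arrays.asList(
                "id;make;color;registration",
                "1;Renault;red;AB-123");
        for (String[] row : parse(lines, ";", 1)) {
            System.out.println(toLogRow(row)); // prints 1|Renault|red|AB-123
        }
    }
}
```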


Related topic: tMysqlOutput properties


tDBSQLRow
tDBSQLRow properties
Component family: Databases/DB Generic

Function: tDBSQLRow is the generic component for database queries. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
  For performance reasons, a specific DB component should always be preferred to the generic component.

Purpose: Depending on the nature of the query and the database, tDBSQLRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Table Name: Name of the source table where changes made to data should be captured.

Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
  Note: Use the relevant DBRow component according to the DB type you use. Most databases have their specific DBRow components.

Scenario: Resetting a DB auto-increment


This scenario describes a single component Job which aims at re-initializing the DB auto-increment to 1. This job has no output and is generally to be used before running a script.

Drag and drop a tDBSQLRow component from the Palette to the design workspace.
On the Basic settings panel, fill in the DB connection properties.
The general connection information to the database is stored in the Repository.
The Database Driver is a generic ODBC driver.
The Schema is built-in for this Job and describes the Talend database structure. The schema doesn't really matter for this particular Job, as the action is made on the table auto-increment and not on data.
The Query type is also built-in. Click the three-dot button to launch the SQLBuilder editor, or type directly in the Query area:
Alter table <TableName> auto_increment = 1
Click OK to validate the Basic settings.
Press F6 to run the Job.

The database auto-increment is reset to 1.

Related topics: tMysqlRow properties.
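Outside the studio, the same statement could be issued through plain JDBC, as in the sketch below. The class and method names are hypothetical, the AUTO_INCREMENT table option is MySQL-style syntax and may not exist in other databases, and the table name check is a simple precaution added for the example because a table name cannot be bound as a statement parameter.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch of resetting a table's auto-increment counter with a plain SQL statement. */
class ResetAutoIncrementSketch {

    /** Builds the statement after checking that the table name is a simple identifier. */
    static String buildStatement(String tableName) {
        if (!tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + tableName);
        }
        return "ALTER TABLE " + tableName + " AUTO_INCREMENT = 1";
    }

    /** Executes the statement; it returns no rows, matching a Job without output. */
    static void reset(Connection connection, String tableName) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(buildStatement(tableName));
        }
    }

    public static void main(String[] args) {
        System.out.println(buildStatement("customers"));
        // prints ALTER TABLE customers AUTO_INCREMENT = 1
    }
}
```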


tEXAInput
tEXAInput properties
Component family: Databases/EXA

Function: tEXAInput reads databases and extracts fields using queries.

Purpose: tEXAInput executes queries in databases according to a strict order which must correspond exactly to that defined in the schema. The list of fields retrieved is then transmitted to the following component via a Main row link.

Basic settings

Property type: Built-in or Repository.
Built-in: No properties stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Host name: Database server IP address.
Port: Listening port number of the DB server.
Schema name: Enter the schema name.
Username and Password: User authentication information.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Table Name: Enter the table name.
Query type and Query: Enter your database query, taking care to ensure that the order of the fields corresponds exactly to that defined in the schema.
Guess Query: Click this button to generate a query that corresponds to your table schema in the Query field.
Guess schema: Click this button to retrieve the schema from the table.


Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Trim all the String/Char columns: Select this check box to delete the spaces at the start and end of fields in all of the columns containing strings.
Trim column: Deletes the spaces from the start and end of fields in the selected columns.
tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component covers all possible SQL queries for EXA databases.
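The requirement that the query field order match the schema can be made concrete with a short sketch. It only approximates what a button such as Guess Query might produce and is not the Studio's actual implementation; guess_query, the customers table and the column names are hypothetical, and an in-memory SQLite database is assumed as a stand-in.

```python
import sqlite3


def guess_query(table, schema_columns):
    """Build a SELECT that lists the columns in schema order (hypothetical helper)."""
    return "SELECT {} FROM {}".format(", ".join(schema_columns), table)


# Schema order as it would be defined in the component: id, name, city.
schema = ["id", "name", "city"]
query = guess_query("customers", schema)

# Assumption: the physical table stores its columns in a different order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT, name TEXT, id INTEGER)")
conn.execute("INSERT INTO customers VALUES ('Paris', 'Ada', 1)")

# Because the query names the columns in schema order, each value lines up
# with the schema field at the same position.
first_row = conn.execute(query).fetchone()
record = dict(zip(schema, first_row))
```

A query such as SELECT * would return the columns in table order instead, and the values would then be assigned to the wrong schema fields.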

Related scenarios
For scenarios in which tEXAInput might be used, see the following tBIInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table
Scenario 2: Using StoreSQLQuery variable
Scenario: Writing dynamic columns from a MySQL database to an output file


tEXAOutput
tEXAOutput properties
Component family: Databases/EXA

Function: tEXAOutput writes, updates, modifies or deletes data from databases.

Purpose: tEXAOutput executes the action defined on the table and/or on the table data, depending on the function of the input flow, from the preceding component.

Basic settings

Property type: Built-in or Repository.
Built-in: No properties stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of User Guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Schema name: Enter the schema name.
Username and Password: User authentication data.
Table: Name of the table to be created. You can only create one table at a time.
Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.


Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s).
For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the update and delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Use commit control: Select this check box to display the Commit every field, in which you can define the number of rows to be processed before committing.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.


Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Enter the name of the column to be modified or inserted.
SQL expression: Enter the SQL expression to be executed to modify or insert data in the corresponding columns.
Position: Select Before, Replace or After, depending on the action to be carried out on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, particularly when there are several actions to be carried out on the data.
Enable debug mode: Select this check box to display each step of the process by which the data is written in the database.
tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an EXA database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For a user scenario, see the tMySqlOutput example, Scenario 3: Retrieve data in error with a Reject link.
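The Update or insert action on data, which relies on a key column, can be sketched as follows. This is only an illustration under stated assumptions, not the code generated by the component: it uses an in-memory SQLite database, and the stock table, its sku key column and the update_or_insert function are hypothetical.

```python
import sqlite3


def update_or_insert(conn, row):
    """Try an UPDATE on the key column first; INSERT only if no row matched."""
    cursor = conn.execute(
        "UPDATE stock SET qty = ? WHERE sku = ?", (row["qty"], row["sku"])
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO stock (sku, qty) VALUES (?, ?)", (row["sku"], row["qty"])
        )
        return "inserted"
    return "updated"


# Assumption: sku is the column flagged as key in the schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('A1', 5)")

incoming = [{"sku": "A1", "qty": 9}, {"sku": "B2", "qty": 3}]
actions = [update_or_insert(conn, row) for row in incoming]
conn.commit()
final_rows = conn.execute("SELECT sku, qty FROM stock ORDER BY sku").fetchall()
```

Insert or update would try the two statements in the opposite order. Which of the two is cheaper depends on whether most incoming rows already exist in the table.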

Related scenario
For a scenario in which tEXAOutput might be used, see:
the tDBOutput Scenario: Displaying DB output
the tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table


tEXARow
tEXARow properties
Component family: Databases/EXA

Function: The tEXARow component is specific to this type of database. It executes SQL queries on specified databases. The Row suffix indicates that it is used to channel a flow in a Job although it doesn't produce any output data.

Purpose: Depending on the nature of the query and the database, tEXARow acts on the actual structure of the database, or indeed the data, although without modifying them.

Basic settings

Property type: Built-in or Repository.
Built-in: No properties stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Port: Listening port number of the DB server.
Schema name: Enter the schema name.
Username and Password: User authentication information.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Table Name: Name of the table to be processed.
Query type: Built-in or Repository.
Built-in: Enter the query manually or with the help of the SQLBuilder.
Repository: Select the appropriate query from the Repository. The Query field is then completed automatically.
Guess Query: Click the Guess Query button to generate the query that corresponds to the table schema in the Query field.


Query: Enter your query, taking care to ensure that the field order matches that defined in the schema.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset: Select this check box to insert the query results in one of the flow columns. Select the particular column from the use column list.
Commit every: Number of rows to be included in the batch before the data is written. This option guarantees the quality of the transaction (although there is no rollback option) and improves performance.
tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This component offers query flexibility as it covers all possible SQL query requirements.

Related scenarios
For a scenario in which tEXARow might be used, see:
the tDBSQLRow Scenario: Resetting a DB auto-increment
the tMySQLRow Scenario 1: Removing and regenerating a MySQL table index


tEXistConnection
tEXistConnection properties
This component is closely related to tEXistGet and tEXistPut. Once you have set the connection properties in this component, you have the option of reusing the connection without having to set the properties again for each tEXist component used in the Job.
Component family: Databases/eXist

Function: tEXistConnection opens a connection to an eXist database in order that a transaction may be carried out.

Purpose: Opens a connection to an eXist database in order that a transaction may be carried out.

Basic settings

URI: URI of the database you want to connect to.
Collection: Enter the path to the collection of interest on the database server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: User authentication information.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is to be used along with the other tEXist components such as tEXistGet and tEXistPut.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenarios
This component is closely related to tEXistGet and tEXistPut. It usually does not make much sense to use one of these without using a tEXistConnection component to open a connection for the current transaction. For a tEXistConnection related scenario, see tMysqlConnection on page 594.


tEXistDelete
tEXistDelete properties
Component family: Databases/eXist

Function: This component deletes resources from an eXist database.

Purpose: tEXistDelete deletes specified resources from remote eXist databases.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter the path to the collection of interest on the database server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: User authentication information.
Target Type: Either Resource, Collection, or All.
Files: Click the plus button to add the lines you want to use as filters:
Filemask: Enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.


Usage: This component is typically used as a single-component subjob but can also be used as an output or end object.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tEXistGet
tEXistGet properties
Component family: Databases/eXist

Function: This component retrieves resources from a remote eXist DB server.

Purpose: tEXistGet downloads selected resources from a remote DB server to a defined local directory.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter the path to the collection of interest on the database server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: User authentication information.
Local directory: Path to the files destination location.
Files: Click the plus button to add the lines you want to use as filters:
Filemask: Enter the filename or filemask using wildcard characters (*) or regular expressions.


Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is typically used as a single-component subjob but can also be used as an output or end object.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Scenario: Retrieve resources from a remote eXist DB server


This is a single-component Job that retrieves data from a remote eXist DB server and downloads the data to a defined local directory. This simple Job requires one component: tEXistGet.

Drop the tEXistGet component from the Palette into the design workspace. Double-click the tEXistGet component to open the Component view and define the properties in its Basic settings view.


Fill in the URI field with the URI of the eXist database you want to connect to. In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in this use case is for demonstration purposes only and is not an active address.
Fill in the Collection field with the path to the collection of interest on the database server, /db/talend in this scenario.
Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this scenario.
Fill in the Username and Password fields by typing in admin and talend respectively in this scenario.
Click the three-dot button next to the Local directory field to set a path for saving the XML file downloaded from the remote database server. In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Desktop/ExistGet.
In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a complete file name to retrieve data from a particular file on the server, or a filemask to retrieve data from a set of files. In this scenario, fill in dictionary_en.xml.
Save your Job and press F6 to execute it.


The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.
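The difference between a complete file name and a filemask in the Files table can be illustrated with a short sketch. It assumes that the * wildcard behaves like shell-style pattern matching, which is an assumption about the component and not a documented guarantee; select_resources and the resource names other than dictionary_en.xml are hypothetical.

```python
from fnmatch import fnmatchcase


def select_resources(resource_names, filemasks):
    """Keep the resources whose name matches at least one filemask."""
    return [
        name
        for name in resource_names
        if any(fnmatchcase(name, mask) for mask in filemasks)
    ]


# Hypothetical content of the collection on the server.
names = ["dictionary_en.xml", "dictionary_fr.xml", "readme.txt"]

# A complete file name selects a single resource, as in the scenario.
exact = select_resources(names, ["dictionary_en.xml"])

# A filemask with the * wildcard selects a set of resources.
masked = select_resources(names, ["dictionary_*.xml"])
```

Several lines in the Files table would correspond to several masks in the list, with a resource being kept as soon as one of them matches.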


tEXistList
tEXistList properties
Component family: Databases/eXist

Function: This component lists the resources stored on a remote DB server.

Purpose: tEXistList lists the resources stored on a remote database server.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter the path to the collection of interest on the database server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: Server authentication information.
Files: Click the plus button to add the lines you want to use as filters:
Filemask: Enter the filename or filemask using wildcard characters (*) or regular expressions.
Target Type: Either Resource, Collection or All contents.


Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is typically used along with a tEXistGet component to retrieve the files listed, for example.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tEXistPut
tEXistPut properties
Component family: Databases/eXist

Function: This component uploads resources to a DB server.

Purpose: tEXistPut uploads specified files from a defined local directory to a remote DB server.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter a path to indicate where the resource is to be saved on the server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: User authentication information.
Local directory: Path to the source location of the file(s).
Files: Click the plus button to add the lines you want to use as filters:
Filemask: Enter the filename or filemask using wildcard characters (*) or regular expressions.


Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is typically used as a single-component subjob but can also be used as an output or end object.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tEXistXQuery
tEXistXQuery properties
Component family: Databases/eXist

Function: This component uses local files containing XPath queries to query XML files stored on remote databases.

Purpose: tEXistXQuery queries XML files located on remote databases and outputs the results to an XML file stored locally.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter the path to the XML file location on the database.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: DB server authentication information.
XQuery Input File: Browse to the local file containing the query to be executed.
Local Output: Browse to the directory in which the query results should be saved.


Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is typically used as a single-component Job but can also be used as part of a more complex Job.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tEXistXUpdate
tEXistXUpdate properties
Component family: Databases/eXist

Function: This component processes XML file records and updates the records on the DB server.

Purpose: tEXistXUpdate processes XML file records and updates the existing records on the DB server.

Basic settings

Use an existing connection/Component List: Select this check box and click the relevant tEXistConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
URI: URI of the database you want to connect to.
Collection: Enter the path to the collection and file of interest on the database server.
Driver: This field is automatically populated with the standard driver. Users can enter a different driver, depending on their needs.
Username and Password: DB server authentication information.
Update File: Browse to the local file in the local directory to be used to update the records on the database.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.


Usage: This component is typically used as a single-component subjob but can also be used as part of a more complex Job.
eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see http://exist.sourceforge.net/xquery.html.
For further information about the XQuery update extension, see http://exist.sourceforge.net/update_ext.html.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tFirebirdClose
tFirebirdClose properties
Component family: Databases/Firebird

Function: tFirebirdClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tFirebirdConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Firebird components, especially with tFirebirdConnection and tFirebirdCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tFirebirdCommit
tFirebirdCommit Properties
This component is closely related to tFirebirdConnection and tFirebirdRollback. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Firebird
Function: Validates the data processed through the Job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.
Basic settings
  Component list: Select the tFirebirdConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
    If you want to use a Row > Main connection to link tFirebirdCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.
Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Firebird components, especially with tFirebirdConnection and tFirebirdRollback components.
Limitation: n/a
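The idea of committing one global transaction rather than committing on every row can be sketched outside the Studio. The following is a minimal illustration only, not the code a Job generates: it uses Python's built-in sqlite3 module as a stand-in for a Firebird connection, and the table name person and the function load_rows are hypothetical.

```python
import sqlite3

def load_rows(conn, rows):
    """Insert every row, then commit once for the whole set."""
    cur = conn.cursor()
    for row in rows:
        cur.execute("INSERT INTO person (id, name) VALUES (?, ?)", row)
    conn.commit()  # one global commit instead of one commit per row

# In-memory SQLite database: an assumption standing in for the shared connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
load_rows(conn, [(1, "Ada"), (2, "Grace"), (3, "Linus")])
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

All three rows become visible to other sessions at the same moment, when the single commit runs.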

Related scenario
This component is closely related to tFirebirdConnection and tFirebirdRollback. It usually does not make much sense to use one of these without using a tFirebirdConnection component to open a connection for the current transaction. For tFirebirdCommit related scenario, see tMysqlConnection on page 594.


tFirebirdConnection
tFirebirdConnection properties
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not make much sense to use one of these without using a tFirebirdConnection to open a connection for the current transaction.
Component family: Databases/Firebird
Function: tFirebirdConnection opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Host name: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
    Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
  Auto commit: Select this check box to automatically commit a transaction when it is completed.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage: This component is to be used along with Firebird components, especially with tFirebirdCommit and tFirebirdRollback.
Limitation: n/a

Related scenarios
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not make much sense to use one of these without using a tFirebirdConnection component to open a connection for the current transaction.


For tFirebirdConnection related scenario, see tMysqlConnection on page 594.


tFirebirdInput
tFirebirdInput properties
Component family: Databases/FireBird
Function: tFirebirdInput reads a database and extracts fields based on a query.
Purpose: tFirebirdInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Host: Database server IP address.
  Port: Listening port number of the DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
Advanced settings
  Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
  Trim column: Remove leading and trailing whitespace from defined columns.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component covers all possible SQL queries for FireBird databases.
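The requirement that the query's field order match the schema definition can be illustrated with a short sketch. It is an assumption-laden illustration rather than the component's own logic: sqlite3 stands in for a Firebird database, and the schema list SCHEMA and the table customer are hypothetical. Values are paired with schema columns purely by position, so a query that lists fields in a different order produces mislabelled records.

```python
import sqlite3

SCHEMA = ["id", "name", "city"]  # hypothetical schema, in column order

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
# The physical table stores its columns in a different order than the schema
conn.execute("CREATE TABLE customer (city TEXT, id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES ('Paris', 1, 'Ada')")

# Fields listed in schema order: each value lands under the right column name
query = "SELECT id, name, city FROM customer"
records = [dict(zip(SCHEMA, row)) for row in conn.execute(query)]

# Fields in table order: values are still paired by position, so they are mislabelled
mislabelled = [dict(zip(SCHEMA, row)) for row in conn.execute("SELECT * FROM customer")]
```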


Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios: Scenario 1: Displaying selected data from DB table. Scenario 2: Using StoreSQLQuery variable. Scenario: Writing dynamic columns from a MySQL database to an output file. See also related topic in tContextLoad Scenario: Dynamic context use in MySQL DB insert.


tFirebirdOutput
tFirebirdOutput properties
Component family: Databases/FireBird
Function: tFirebirdOutput writes, updates, makes changes or deletes entries in a database.
Purpose: tFirebirdOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Host: Database server IP address.
  Port: Listening port number of the DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time.
  Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Drop a table if exists and create: The table is removed if it already exists and created again.
    Clear a table: The table content is deleted.
  Action on data: On the data of the table defined, you can perform:
    Insert: Add new entries to the table. If duplicates are found, the Job stops.
    Update: Make changes to existing entries.
    Insert or update: Add entries or update existing ones.
    Update or insert: Update existing entries or create them if they do not exist.
    Delete: Remove entries corresponding to the input flow.
    You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the update and delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column name on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
  Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
    Name: Type in the name of the schema column to be altered or inserted as a new column.
    SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
    Position: Select Before, Replace or After following the action to be performed on the reference column.
    Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.
  Use field options: Select this check box to customize a request, especially when there is double action on data.
  Enable debug mode: Select this check box to display each step during processing entries in a database.
  Support null in SQL WHERE statement: Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Firebird database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
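The behavior described for Commit every can be sketched as follows. This is a minimal sketch under stated assumptions, not the code generated by the Studio: sqlite3 stands in for the Firebird database, and the function write_with_commit_every and the table item are hypothetical names. It also shows why the option does not provide rollback: rows in batches that were already committed stay in the table even if a later row fails.

```python
import sqlite3

def write_with_commit_every(conn, rows, commit_every):
    """Insert rows, committing after each full batch and once at the end."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO item (id, label) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()  # this batch can no longer be rolled back
            commits += 1
            pending = 0
    if pending:  # commit the last, partial batch
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT)")
rows = [(i, f"item-{i}") for i in range(1, 8)]  # seven rows
commits = write_with_commit_every(conn, rows, commit_every=3)
stored = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

With seven rows and a batch size of three, the sketch commits after rows 3 and 6 and once more for the remaining row.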

Related scenarios
For related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tFirebirdRollback
tFirebirdRollback properties
This component is closely related to tFirebirdCommit and tFirebirdConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Firebird
Function: tFirebirdRollback cancels the transaction committed in the connected database.
Purpose: This component prevents part of a transaction from being committed involuntarily.
Basic settings
  Component list: Select the tFirebirdConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Firebird components, especially with tFirebirdConnection and tFirebirdCommit.
Limitation: n/a
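How a rollback keeps part of a transaction from being committed can be sketched in a few lines. This is an illustration only, with sqlite3 standing in for a Firebird connection; the function insert_all_or_nothing and the table account are hypothetical. The third row violates the primary key, so the two rows inserted before it are discarded as well.

```python
import sqlite3

def insert_all_or_nothing(conn, rows):
    """Insert all rows in one transaction; roll back if any insert fails."""
    try:
        for row in rows:
            conn.execute("INSERT INTO account (id, owner) VALUES (?, ?)", row)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # discard every insert made since the last commit
        return False

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT)")
ok = insert_all_or_nothing(conn, [(1, "Ada"), (2, "Grace"), (1, "duplicate")])
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```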

Related scenario
For tFirebirdRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of tMysqlRollback component.


tFirebirdRow
tFirebirdRow properties
Component family: Databases/FireBird
Function: tFirebirdRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose: Depending on the nature of the query and the database, tFirebirdRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Use an existing connection: Select this check box and click the relevant tFirebirdConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Host: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type: Either Built-in or Repository.
    Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
    Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
  Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
  Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
  Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
    Parameter Index: Enter the parameter position in the SQL instruction.
    Parameter Type: Enter the parameter type.
    Parameter Value: Enter the parameter value.
    This option is very useful if you need to execute the same query several times, as it improves performance.
  Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
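The role of the ? placeholders and of the parameter index can be illustrated with a parameterized query. The sketch below is not the Java PreparedStatement code that a Job uses; it relies on Python's sqlite3 module, which accepts the same ? placeholder style, and the table orders, the constant QUERY and the function run are hypothetical. The same statement text is executed twice with different values bound by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", 10.0), (2, "open", 250.0), (3, "closed", 300.0)],
)

# Two placeholders: the first is the status, the second the minimum amount
QUERY = "SELECT id FROM orders WHERE status = ? AND amount > ?"

def run(status, min_amount):
    # Values are bound by position, in the order the ? marks appear in QUERY
    return [r[0] for r in conn.execute(QUERY, (status, min_amount))]

open_large = run("open", 100)
closed_any = run("closed", 0)
```

Because the statement text never changes, only the bound values differ between executions, which is the situation in which the option is described as most useful.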

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tGreenplumBulkExec

tGreenplumBulkExec Properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in a separate section. The advantage of using a two-step process is that it makes it possible to transform data before it is loaded into the database.
Component family: Databases/Greenplum
Function: tGreenplumBulkExec performs an Insert action on the data.
Purpose: tGreenplumBulkExec is a component which is specifically designed to improve performance when loading data into a Greenplum database.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Use an existing connection: Select this check box if you use a configured tGreenplumConnection.
    When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Host: Database server IP address.
  Port: Listening port number of the DB server.
  Database: Name of the database.
  Schema: Exact name of the schema.
  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time.
  Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Drop a table if exists and create: The table is removed if it already exists and created again.
    Clear a table: The table content is deleted.
  Filename: Path and name of the file to be processed.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Advanced settings
  Action on data: Select the operation you want to perform: Bulk insert or Bulk update. The details asked will be different according to the action chosen.
  Copy the OID for each row: Retrieve the ID item for each row.
  Contains a header line with the names of each column in the file: Specify that the table contains a header.
  File type: Select the file type to process.
  Null string: String displayed to indicate that the value is null.
  Fields terminated by: Character, string or regular expression to separate fields.
  Escape char: Character of the row to be escaped.
  Text enclosure: Character used to enclose text.
  Force not null for columns: Define the columns' nullability.
    Force not null: Select the check box next to the column you want to define as not null.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is generally used with a tGreenplumOutputBulk component. Used together, they offer gains in performance while feeding a Greenplum database.
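The two-step process (generate a file, then load that file into the database) can be sketched as follows. The sketch makes several assumptions: sqlite3 stands in for a Greenplum database, a batched INSERT stands in for the database's native bulk load, and the functions write_bulk_file and bulk_load, the table customer and the ; delimiter are all hypothetical. The point it illustrates is that the transformation happens in step 1, before anything reaches the database.

```python
import csv
import os
import sqlite3
import tempfile

def write_bulk_file(path, rows):
    """Step 1: transform the rows and write them to a delimited file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for ident, name in rows:
            writer.writerow([ident, name.strip().upper()])

def bulk_load(conn, path):
    """Step 2: load the whole file into the table, then commit once."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=";")
        conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", reader)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
path = os.path.join(tempfile.mkdtemp(), "customers.csv")
write_bulk_file(path, [(1, " ada "), (2, "grace")])
bulk_load(conn, path)
loaded = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```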

Related scenarios
For more information about tGreenplumBulkExec, see: Scenario: Inserting transformed data in MySQL database from tMysqlOutputBulk. Scenario: Inserting data in MySQL database from tMysqlOutputBulkExec. Scenario: Truncating and inserting file data into Oracle DB from tOracleBulkExec.


tGreenplumClose
tGreenplumClose properties
Component family: Databases/Greenplum
Function: tGreenplumClose closes the transaction committed in the connected DB.
Purpose: Close a transaction.
Basic settings
  Component list: Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.
Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Greenplum components, especially with tGreenplumConnection and tGreenplumCommit.
Limitation: n/a

Related scenario
No scenario is available for this component yet.


tGreenplumCommit
tGreenplumCommit Properties
This component is closely related to tGreenplumConnection and tGreenplumRollback. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Greenplum
Function: Validates the data processed through the Job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.
Basic settings
  Component list: Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
    If you want to use a Row > Main connection to link tGreenplumCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.
Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Greenplum components, especially with tGreenplumConnection and tGreenplumRollback components.
Limitation: n/a

Related scenario
This component is closely related to tGreenplumConnection and tGreenplumRollback. It usually does not make much sense to use one of these without using a tGreenplumConnection component to open a connection for the current transaction. For tGreenplumCommit related scenario, see tMysqlConnection on page 594.


tGreenplumConnection
tGreenplumConnection properties
This component is closely related to tGreenplumCommit and tGreenplumRollback. It usually does not make much sense to use one of these without using a tGreenplumConnection to open a connection for the current transaction.
Component family: Databases/Greenplum
Function: tGreenplumConnection opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Host: Database server IP address.
  Port: Listening port number of the DB server.
  Database: Name of the database.
  Schema: Exact name of the schema.
  Username and Password: DB user authentication data.
  Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
    Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
  Auto commit: Select this check box to automatically commit a transaction when it is completed.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage: This component is to be used along with Greenplum components, especially with tGreenplumCommit and tGreenplumRollback.
Limitation: n/a


Related scenarios
This component is closely related to tGreenplumCommit and tGreenplumRollback. It usually does not make much sense to use one of these without using a tGreenplumConnection component to open a connection for the current transaction. For tGreenplumConnection related scenario, see tMysqlConnection on page 594.


tGreenplumInput
tGreenplumInput properties
Component family: Databases/Greenplum
Function: tGreenplumInput reads a database and extracts fields based on a query.
Purpose: tGreenplumInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  Host: Database server IP address.
  Port: Listening port number of the DB server.
  Database: Name of the database.
  Schema: Exact name of the schema.
  Username and Password: DB user authentication data.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
  Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.
  Guess schema: Click the Guess schema button to retrieve the table schema.
Advanced settings
  Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.
  Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
  Trim column: Remove leading and trailing whitespace from defined columns.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component covers all possible SQL queries for Greenplum databases.
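Working with a limited row set at a time, as the Use cursor option describes, can be sketched with a cursor that fetches results in chunks. This is an illustration under assumptions, not the component's implementation: sqlite3 stands in for a Greenplum database, and the function read_in_chunks and the table measure are hypothetical. Only one chunk of rows is held in memory at any moment.

```python
import sqlite3

def read_in_chunks(conn, query, chunk_size):
    """Yield result rows chunk by chunk instead of loading them all at once."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        chunk = cur.fetchmany(chunk_size)  # at most chunk_size rows
        if not chunk:
            break
        yield chunk

conn = sqlite3.connect(":memory:")  # stand-in database (assumption)
conn.execute("CREATE TABLE measure (id INTEGER)")
conn.executemany("INSERT INTO measure VALUES (?)", [(i,) for i in range(10)])
chunk_sizes = [len(c) for c in read_in_chunks(conn, "SELECT id FROM measure", 4)]
```

Ten rows read four at a time arrive as chunks of 4, 4 and 2 rows.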

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios: Scenario 1: Displaying selected data from DB table. Scenario 2: Using StoreSQLQuery variable. Scenario: Writing dynamic columns from a MySQL database to an output file. See also related topic in tContextLoad Scenario: Dynamic context use in MySQL DB insert.


tGreenplumOutput

tGreenplumOutput Properties
Component Family Databases/Greenplum

Function Purpose Basic settings

tGreenplumOutput writes, updates, modifies or deletes the data in a database. tGreenplumOutput executes the action defined on the table and/or on the data of a table, according to the input flow form the previous component. Property type Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved. Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide. Use an existing connection Select this check box if you use a configured tGreenplumConnection. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can as well deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive all over through the two Job levels. For more information about Dynamic settings, see your studio user guide. Host Port Database Username and Password Database server IP address. Listening port number of DB server. Name of the database DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. To do so, click Edit Schema and select the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, open the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations: select the Use field options check box, then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that tGreenplumOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Greenplum databases. It allows you to carry out actions on a table or on the data of a table in a Greenplum database. It enables you to create a reject flow, with a Row > Rejects link filtering the data in error. For a usage example, see Scenario 3: Retrieve data in error with a Reject link from component tMysqlOutput.
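The difference between the Insert or update and Update or insert actions is the order in which the two statements are attempted for each incoming row. The following is a minimal sketch of that idea only, not the code the Studio generates. It uses Python with an in-memory SQLite database as a stand-in for Greenplum, and the customers table, its id key column and the function names are hypothetical.

```python
import sqlite3

# Assumption: "id" plays the role of the column flagged as key in Edit Schema.

def insert_or_update(conn, row):
    """Insert or update: look the key up first, insert if absent, else update."""
    found = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE id = ?", (row["id"],)
    ).fetchone()[0]
    if found:
        conn.execute("UPDATE customers SET name = ? WHERE id = ?",
                     (row["name"], row["id"]))
        return "updated"
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)",
                 (row["id"], row["name"]))
    return "inserted"

def update_or_insert(conn, row):
    """Update or insert: try the update first, insert only if no row matched."""
    cur = conn.execute("UPDATE customers SET name = ? WHERE id = ?",
                       (row["name"], row["id"]))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)",
                     (row["id"], row["name"]))
        return "inserted"
    return "updated"

def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for a Greenplum connection
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    flow = [{"id": 1, "name": "Ada"},
            {"id": 1, "name": "Ada L."},
            {"id": 2, "name": "Bob"}]
    actions = [insert_or_update(conn, r) for r in flow]
    actions.append(update_or_insert(conn, {"id": 3, "name": "Cy"}))
    conn.commit()
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    conn.close()
    return actions, rows
```

Both functions leave the table in the same state. Which one is cheaper depends on whether most incoming rows are new (favouring Insert or update) or already present (favouring Update or insert).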

Related scenarios
For related scenarios, see:
Scenario: Displaying DB output from tDBOutput.
Scenario 1: Adding a new column and altering data in a DB table from tMysqlOutput.

tGreenplumOutputBulk
tGreenplumOutputBulk properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds the database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in a separate section. The advantage of the two-step process is that the data can be transformed before it is loaded into the database.
Component family: Databases/Greenplum

Function: Writes a file with columns based on the defined delimiter and the Greenplum standards.

Purpose: Prepares the file to be used as a parameter in the INSERT query that feeds the Greenplum database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append: Select this check box to add the new rows at the end of the records.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Row separator: String (for example, "\n" on Unix) to distinguish rows.

Field separator: Character, string or regular expression to separate fields.

Include header: Select this check box to include the column header.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with the tGreenplumBulkExec component. Used together, they offer gains in performance while feeding a Greenplum database.
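To make the File Name, Append, Row separator, Field separator, Include header and Encoding settings concrete, here is a minimal Python sketch of what writing such a delimited file involves. It is an illustration under stated assumptions, not the component's implementation: the function name, its parameters and the rule of writing the header only when the file is new or empty are hypothetical choices, and values are written without any quoting or escaping.

```python
import os
import tempfile

def write_bulk_file(path, columns, rows, field_sep=";", row_sep="\n",
                    include_header=False, append=False, encoding="utf-8"):
    """Write rows (dicts keyed by column name) as a delimited text file."""
    # Assumption: when appending to a non-empty file, the header is not repeated.
    has_content = os.path.exists(path) and os.path.getsize(path) > 0
    write_header = include_header and not (append and has_content)
    mode = "a" if append else "w"
    # newline="" keeps Python from translating the chosen row separator.
    with open(path, mode, encoding=encoding, newline="") as out:
        if write_header:
            out.write(field_sep.join(columns) + row_sep)
        for row in rows:
            out.write(field_sep.join(str(row[c]) for c in columns) + row_sep)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "customers.csv")
        cols = ["id", "name"]
        write_bulk_file(path, cols, [{"id": 1, "name": "Ada"}],
                        include_header=True)
        write_bulk_file(path, cols, [{"id": 2, "name": "Bob"}],
                        include_header=True, append=True)
        with open(path, encoding="utf-8", newline="") as f:
            return f.read()
```

The second call shows the effect of Append: the new row is added after the existing record instead of replacing the file.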

Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database

tGreenplumOutputBulkExec
tGreenplumOutputBulkExec properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds the database. These two steps are fused together in the tGreenplumOutputBulkExec component.
Component family: Databases/Greenplum

Function: Executes the action on the data provided.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to a Greenplum database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database name: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Clear a table: The table content is deleted. You have the possibility to roll back the operation.

File Name: Name of the file to be processed.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Action on data: Select the operation you want to perform: Bulk insert or Bulk update. The details requested differ according to the action chosen.

Copy the OID for each row: Retrieve the ID item for each row.

Contains a header line with the names of each column in the file: Specify that the table contains a header.

File type: Select the file type to process.

Null string: String displayed to indicate that the value is null.

Row separator: String (for example, "\n" on Unix) to distinguish rows.

Fields terminated by: Character, string or regular expression to separate fields.

Escape char: Character of the row to be escaped.

Text enclosure: Character used to enclose text.

Force not null for columns: Define the nullability of the columns. Force not null: Select the check box next to the column you want to define as not null.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation: n/a
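Several of the Advanced settings above (OID copy, header line, null string, field terminator, text enclosure, force not null) resemble the options of the COPY statement found in PostgreSQL-family databases such as Greenplum. The sketch below assumes, without confirmation from this guide, that a bulk load can be expressed as such a statement and shows how the settings could map onto it. The function and parameter names are hypothetical, the Escape char option is deliberately left out, and the exact statement issued by the component may differ.

```python
def quote_literal(value):
    """Quote a string as an SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_copy_statement(table, file_name, field_sep=";", null_string="",
                         header=False, text_enclosure='"', with_oids=False,
                         force_not_null=()):
    """Assemble an old-style (pre-9.0 syntax) COPY ... FROM statement."""
    options = ["CSV"]                       # File type: CSV (assumed)
    if with_oids:                           # Copy the OID for each row
        options.append("OIDS")
    if header:                              # Contains a header line
        options.append("HEADER")
    options.append("DELIMITER AS " + quote_literal(field_sep))   # Fields terminated by
    options.append("NULL AS " + quote_literal(null_string))      # Null string
    options.append("QUOTE AS " + quote_literal(text_enclosure))  # Text enclosure
    if force_not_null:                      # Force not null for columns
        options.append("FORCE NOT NULL " + ", ".join(force_not_null))
    return "COPY {} FROM {} WITH {}".format(
        table, quote_literal(file_name), " ".join(options))

def demo():
    # Hypothetical table and file names, for illustration only.
    return build_copy_statement("public.customers", "/tmp/customers.csv",
                                header=True, force_not_null=["name"])
```

The sketch only builds the statement text. Executing it requires a database connection and a file that the database server can read.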

Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database

tGreenplumRollback
tGreenplumRollback properties
This component is closely related to tGreenplumCommit and tGreenplumConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Greenplum

Function: tGreenplumRollback cancels the transaction committed in the connected DB.

Purpose: Prevents part of a transaction from being committed involuntarily.

Basic settings

Component list: Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Greenplum components, especially with tGreenplumConnection and tGreenplumCommit.

Limitation: n/a
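The way the connection, commit and rollback components cooperate can be pictured with a short transaction sketch. The example below is written in Python against an in-memory SQLite database, which stands in for Greenplum. The mother and daughter tables and the function names are hypothetical, and the comments only indicate which component plays a comparable role in a Job.

```python
import sqlite3

def load_with_rollback(conn, mothers, daughters):
    """Insert both row sets in one transaction; undo everything on failure."""
    try:
        conn.executemany("INSERT INTO mother (id, name) VALUES (?, ?)", mothers)
        conn.executemany(
            "INSERT INTO daughter (id, mother_id) VALUES (?, ?)", daughters)
    except sqlite3.Error:
        conn.rollback()   # comparable to tGreenplumRollback
        return "rolled back"
    conn.commit()         # comparable to tGreenplumCommit
    return "committed"

def demo():
    conn = sqlite3.connect(":memory:")  # comparable to tGreenplumConnection
    conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE daughter "
                 "(id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)")
    conn.commit()
    # The second daughter row violates NOT NULL, so nothing must be kept.
    outcome = load_with_rollback(conn,
                                 [(1, "A"), (2, "B")],
                                 [(10, 1), (11, None)])
    remaining = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
    conn.close()
    return outcome, remaining
```

Because all statements share one connection and nothing is committed until the end, the failed daughter insert also discards the mother rows that had already been written.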

Related scenarios
For a scenario related to tGreenplumRollback, see Scenario: Rollback from inserting data in mother/daughter tables.

tGreenplumRow
tGreenplumRow Properties
Component family: Databases/Greenplum

Function: tGreenplumRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tGreenplumRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tGreenplumConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the table to be read.

Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query: Enter your DB query, paying particular attention to the sequence of the fields so that it matches the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
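The Use PreparedStatement option relies on positional parameters: each "?" in the query is filled from a table of index, type and value entries. The sketch below illustrates that mapping in Python with an in-memory SQLite database, whose driver also uses "?" placeholders. It is an analogy rather than the generated Java code, and the orders table, the dictionary layout and the bind_parameters helper are hypothetical.

```python
import sqlite3

def bind_parameters(parameter_table):
    """Turn (index, type, value) entries into a tuple ordered by 1-based index."""
    ordered = sorted(parameter_table, key=lambda p: p["index"])
    expected = list(range(1, len(ordered) + 1))
    if [p["index"] for p in ordered] != expected:
        raise ValueError("parameter indexes must run from 1 to N without gaps")
    # Convert each value with its declared type before binding it.
    return tuple(p["type"](p["value"]) for p in ordered)

def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for a Greenplum connection
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "open", 10.0), (2, "open", 99.5), (3, "closed", 50.0)])
    query = "SELECT id FROM orders WHERE status = ? AND amount > ?"
    # Entries may be listed in any order; the index decides the position.
    parameter_table = [
        {"index": 2, "type": float, "value": "20"},
        {"index": 1, "type": str, "value": "open"},
    ]
    ids = [row[0] for row in conn.execute(query, bind_parameters(parameter_table))]
    conn.close()
    return ids
```

Keeping the query text constant and changing only the bound values is what allows the database to reuse the prepared statement when the same query is executed several times.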

Related scenarios
For related scenarios, see:
Scenario: Resetting a DB auto-increment from tDBSQLRow.
Scenario 1: Removing and regenerating a MySQL table index from tMysqlRow.

tGreenplumSCD
tGreenplumSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tGreenplumSCD.

tHiveClose
tHiveClose properties
Component family: Databases/Hive

Function: tHiveClose closes an active connection to a database.

Purpose: This component closes a connection to a Hive database.

Basic settings

Component list: If there is more than one connection used in the Job, select tHiveConnection from the list.

Advanced settings

tStatCatcher Statistics: Select this check box to collect the log data at the component level.

Usage: This component is generally used as an input component. It requires an output component.

Limitation: n/a

Related scenario
This component is for use with tHiveConnection. It is generally used along with tHiveConnection as the latter allows you to open a connection for the transaction which is underway. For a scenario in which tHiveClose might be used, see tMysqlConnection.

tHiveConnection
tHiveConnection properties
Component family: Databases/Hive

Function: tHiveConnection opens a connection to a database in order that a transaction may be made.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: DB server listening port.

Database: Name of the database.

Username and Password: DB user authentication data.

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: Set or type in the shared connection name.

Advanced settings

tStatCatcher Statistics: Select this check box to collect the log data at the component level.

Usage: This component is generally used with other Hive components, particularly tHiveClose.

Limitation: n/a

Related scenario
For a scenario in which tHiveConnection might be used, see Scenario: Inserting data in mother/daughter tables.

tHiveRow
tHiveRow properties
Component family: Databases/Hive

Function: tHiveRow is the dedicated component for this database. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tHiveRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tHiveConnection component from the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the table to be processed.

Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query: Enter your DB query, paying particular attention to the sequence of the fields so that it matches the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the benefit of flexible DB queries and covers all possible Hive QL queries.

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMysqlRow Scenario 1: Removing and regenerating a MySQL table index.

tHSQLDbInput
tHSQLDbInput properties
Component family: Databases/HSQLDb

Function: tHSQLDbInput reads a database and extracts fields based on a query.

Purpose: tHSQLDbInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click the database connection wizard icon to open the wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Running Mode: Select from the list the Server Mode corresponding to your DB setup among the four options: HSQLDb Server, HSQLDb WebServer, HSQLDb In Process Persistent, HSQLDb In Memory.

Use TLS/SSL sockets: Select this check box to enable the secured mode if required.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database Alias: Alias name of the database.

Username and Password: DB user authentication data.

DB path: Specify the directory of the database you want to connect to. This field is available only in the HSQLDb In Process Persistent running mode.
By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameter field in the Advanced settings view.

Db name: Enter the name of the database that you want to connect to. This field is available only in the HSQLDb In Process Persistent running mode and the HSQLDb In Memory running mode.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type and Query: Enter your DB query, paying particular attention to the sequence of the fields so that it matches the schema definition.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. When the running mode is HSQLDb In Process Persistent, this additional property is set as ifexists=true by default, meaning that the database will be automatically created when needed.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for HSQLDb databases.

Global Variables:
Number of Lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer.
Query: Indicates the query to be processed. This is available as a Flow variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
Incoming links (from one component to another):
Row: Iterate
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
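The four running modes correspond to the four kinds of HSQLDB JDBC connection URL (hsql, http, file and mem). The sketch below shows, in Python, how the Basic settings could be combined into such a URL. The URL prefixes are standard HSQLDB ones, but the function, its parameter names and the exact way the Studio assembles the URL are assumptions made for illustration.

```python
def hsqldb_url(running_mode, host="localhost", port=None, alias="",
               db_path="", db_name="", use_tls=False, additional_params=""):
    """Build an HSQLDB JDBC URL from settings named after the fields above."""
    authority = host + (":" + str(port) if port else "")
    if running_mode == "HSQLDb Server":
        # hsqls is the TLS/SSL variant of the hsql protocol.
        target = ("hsqls" if use_tls else "hsql") + "://" + authority + "/" + alias
    elif running_mode == "HSQLDb WebServer":
        target = ("https" if use_tls else "http") + "://" + authority + "/" + alias
    elif running_mode == "HSQLDb In Process Persistent":
        # DB path is the directory, Db name the database inside it.
        target = "file:" + db_path.rstrip("/") + "/" + db_name
    elif running_mode == "HSQLDb In Memory":
        target = "mem:" + db_name
    else:
        raise ValueError("unknown running mode: " + running_mode)
    url = "jdbc:hsqldb:" + target
    if additional_params:
        # Additional JDBC parameters are appended as ;key=value pairs.
        url += ";" + additional_params
    return url

def demo():
    # Hypothetical host, path and database names.
    return [
        hsqldb_url("HSQLDb Server", host="db1", port=9001, alias="xdb"),
        hsqldb_url("HSQLDb WebServer", host="db1", alias="xdb", use_tls=True),
        hsqldb_url("HSQLDb In Process Persistent", db_path="/opt/db",
                   db_name="testdb", additional_params="ifexists=true"),
        hsqldb_url("HSQLDb In Memory", db_name="mymemdb"),
    ]
```

The sketch also shows why Host, Port and Database Alias only matter in the two server modes, while DB path and Db name only matter in the in-process modes.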

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.

tHSQLDbOutput
tHSQLDbOutput properties
Component family: Databases/HSQLDb

Function: tHSQLDbOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tHSQLDbOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click the database connection wizard icon to open the wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Running Mode: Select from the list the Server Mode corresponding to your DB setup among the four options: HSQLDb Server, HSQLDb WebServer, HSQLDb In Process Persistent, HSQLDb In Memory.

Use TLS/SSL sockets: Select this check box to enable the secured mode if required.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

DB path: Specify the directory of the database you want to connect to. This field is available only in the HSQLDb In Process Persistent running mode.
By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameter field in the Advanced settings view.

Db name: Enter the name of the database that you want to connect to. This field is available only in the HSQLDb In Process Persistent running mode and the HSQLDb In Memory running mode.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. To do so, click Edit Schema and select the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, open the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations: select the Use field options check box, then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. When the running mode is HSQLDb In Process Persistent, this additional property is set as ifexists=true by default, meaning that the database will be automatically created when needed. You can press Ctrl+Space to access a list of predefined global variables.

Commit every
Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After according to the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options
Select this check box to customize a request, especially when there is double action on data.

Enable debug mode
Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics
Select this check box to collect log data at the component level.
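The effect of the Additional Columns option (Name, SQL expression, Position, Reference column) can be pictured with a small sketch. This is only an illustration of the idea under stated assumptions: the class and method names are hypothetical, the table and column names are invented, and the code is not the statement generator used by the Studio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AdditionalColumnSketch {

    enum Position { BEFORE, REPLACE, AFTER }

    /**
     * Builds an INSERT statement in which one additional column, fed by a
     * SQL expression, is placed relative to a reference column.
     * Hypothetical helper; ordinary schema columns are bound as "?".
     */
    static String buildInsert(String table, List<String> schemaColumns,
                              String name, String sqlExpression,
                              Position position, String referenceColumn) {
        List<String> columns = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String column : schemaColumns) {
            boolean isReference = column.equals(referenceColumn);
            if (isReference && position == Position.BEFORE) {
                columns.add(name);
                values.add(sqlExpression);
            }
            if (isReference && position == Position.REPLACE) {
                // The additional column takes the place of the reference column.
                columns.add(name);
                values.add(sqlExpression);
            } else {
                columns.add(column);
                values.add("?");
            }
            if (isReference && position == Position.AFTER) {
                columns.add(name);
                values.add(sqlExpression);
            }
        }
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + String.join(", ", values) + ")";
    }

    public static void main(String[] args) {
        // Invented example: add a load timestamp after the "name" column.
        System.out.println(buildInsert("customers", Arrays.asList("id", "name"),
                "loaded_at", "CURRENT_TIMESTAMP", Position.AFTER, "name"));
    }
}
```

With Position set to Replace, the reference column itself is filled by the SQL expression instead of by the incoming flow.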


Usage
This component offers the flexibility of the DB query and covers all possible SQL queries. It must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an HSQLDb database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.

Global Variables
Number of Lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer.
NB line Updated: Indicates the number of lines updated. This is available as an After variable. Returns an integer.
NB line Inserted: Indicates the number of lines inserted. This is available as an After variable. Returns an integer.
NB line Deleted: Indicates the number of lines deleted. This is available as an After variable. Returns an integer.
NB line Rejected: Indicates the number of lines rejected. This is available as an After variable. Returns an integer.
Query: Indicates the query to be processed. This is available as an After variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another):
Row: Main; Reject
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
Incoming links (from one component to another):
Row: Main
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
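As a rough sketch of how the After variables listed under Global Variables might be read from Java code in a later subjob: the example below uses an ordinary map as a stand-in for the Job's global variable map. The component label tHSQLDbOutput_1 and the key names (NB_LINE, NB_LINE_REJECTED) are assumptions based on the usual component-name-plus-variable pattern; press Ctrl+Space in the Studio to see the names actually available.

```java
import java.util.HashMap;
import java.util.Map;

final class GlobalVariableSketch {

    /** Reads an integer After variable, returning 0 when it has not been set yet. */
    static int readCount(Map<String, Object> globalMap, String componentName, String variable) {
        Object value = globalMap.get(componentName + "_" + variable);
        return value instanceof Integer ? (Integer) value : 0;
    }

    public static void main(String[] args) {
        // Stand-in for the map a Job maintains at run time (assumption).
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tHSQLDbOutput_1_NB_LINE", 120);
        globalMap.put("tHSQLDbOutput_1_NB_LINE_REJECTED", 3);

        int processed = readCount(globalMap, "tHSQLDbOutput_1", "NB_LINE");
        int rejected = readCount(globalMap, "tHSQLDbOutput_1", "NB_LINE_REJECTED");
        System.out.println(processed + " lines processed, " + rejected + " rejected");
    }
}
```

Because these are After variables, they are only meaningful once the output component has finished, for example in a subjob linked with On Subjob Ok.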

Related scenarios
For related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tHSQLDbRow
tHSQLDbRow properties
Component family Databases/HSQLDb

Function
tHSQLDbRow is the specific component for this database query. It executes the stated SQL query on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.

Purpose
Depending on the nature of the query and the database, tHSQLDbRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Running Mode
Select from the list the server mode corresponding to your DB setup, among the four options: HSQLDb Server, HSQLDb WebServer, HSQLDb In Process Persistent, HSQLDb In Memory.

Use TLS/SSL sockets
Select this check box to enable the secured mode if required.

Host
Database server IP address.

Port
Listening port number of the DB server.

Database Alias
Name of the database.

Username and Password
DB user authentication data.

DB path
Specify the directory of the database you want to connect to. This field is available only in the HSQLDb In Process Persistent running mode. By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameters field in the Advanced settings view.

Database
Enter the name of the database that you want to connect to. This field is available only in the HSQLDb In Process Persistent and HSQLDb In Memory running modes.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.


Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Query type
Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query
Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. When the running mode is HSQLDb In Process Persistent, this additional property is set as ifexists=true by default, meaning that the database will be automatically created when needed.

Propagate QUERY's recordset
Select this check box to insert the result of the query into a column of the current flow. Select this column from the use column list.

Commit every
Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.

Global Variables
Query: Indicates the query to be processed. This is available as a Flow variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
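The four running modes and the Use TLS/SSL sockets option listed under Basic settings line up with the connection URL schemes that HSQLDB itself documents (hsql, hsqls, http, https, file and mem). The sketch below shows one plausible way the settings could be combined into a JDBC URL. How the Studio assembles the URL internally is not stated in this guide, so the class, method and parameter names here are hypothetical.

```java
final class HsqldbUrlSketch {

    enum RunningMode { SERVER, WEB_SERVER, IN_PROCESS_PERSISTENT, IN_MEMORY }

    /**
     * Combines the Basic settings fields into an HSQLDB JDBC URL.
     * additionalParams stands for the Additional JDBC parameters field,
     * for example "ifexists=true" (may be null or empty).
     */
    static String buildUrl(RunningMode mode, boolean useTls, String host, int port,
                           String alias, String dbPath, String database,
                           String additionalParams) {
        String url;
        switch (mode) {
            case SERVER:
                url = (useTls ? "jdbc:hsqldb:hsqls://" : "jdbc:hsqldb:hsql://")
                        + host + ":" + port + "/" + alias;
                break;
            case WEB_SERVER:
                url = (useTls ? "jdbc:hsqldb:https://" : "jdbc:hsqldb:http://")
                        + host + ":" + port + "/" + alias;
                break;
            case IN_PROCESS_PERSISTENT:
                // DB path is the directory, Database is the database name.
                url = "jdbc:hsqldb:file:" + dbPath + "/" + database;
                break;
            case IN_MEMORY:
                url = "jdbc:hsqldb:mem:" + database;
                break;
            default:
                throw new IllegalArgumentException("Unknown running mode: " + mode);
        }
        if (additionalParams != null && !additionalParams.isEmpty()) {
            url += ";" + additionalParams; // HSQLDB appends properties after a semicolon
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(RunningMode.IN_PROCESS_PERSISTENT, false,
                null, 0, null, "/data/hsql", "sales", "ifexists=true"));
    }
}
```

The host, port and alias are only relevant in the two server modes, which is consistent with DB path and Database being offered only for the in-process and in-memory modes.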


Connections
Outgoing links (from one component to another):
Row: Main; Reject; Iterate
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
Incoming links (from one component to another):
Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tInformixBulkExec
tInformixBulkExec Properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tInformixOutputBulkExec component, detailed in another section. The advantage of using two components is that data can be transformed before it is loaded in the database.
Component Family
Databases/Informix

Function
tInformixBulkExec executes Insert operations on the data supplied.

Purpose
tInformixBulkExec is a dedicated component which improves performance during Insert operations in Informix databases.

Basic settings

Property type
Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Execution Platform
Select the operating system you are using.

Use an existing connection
Select this check box and click the relevant tInformixConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
DB server listening port.

Database
Name of the database.

Schema
Name of the schema.


Username and Password
DB user authentication data.

Instance
Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.

Table
Name of the table to be written. Note that only one table can be written at a time.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Informix Directory
Indicate the access path to your Informix directory.

Data file
Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.


Action on data
On the data of the table defined, you can perform the following operations:
Insert: Add new data to the table. If duplicates are found, the Job stops.
Update: Update the existing table data.
Insert or update: Add data or update the existing data.
Update or insert: Update the existing entries or create them if they do not already exist.
Delete: Delete the entry data which corresponds to the input flow.
You must specify at least one key upon which the Update and Delete operations are to be based. It is possible to define the columns which should be used as the key from the schema, from both the Basic settings and the Advanced settings, to optimise these operations.

Advanced settings

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Field terminated by
Character, string or regular expression which separates the fields.

Set DBMONEY
Select this check box to define the decimal separator in the Decimal separator field.

Set DBDATE
Select the date format that you want to apply.

Rows Before Commit
Enter the number of rows to be processed before the commit.

Bad Rows Before Abort
Enter the number of rows in error at which point the Job should stop.

tStatCatcher Statistics
Select this check box to collect the log data at component level.

Output
Where the output should go.

Usage
This component offers database query flexibility and covers all possible Informix queries which may be required.
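The interplay between Rows Before Commit and Bad Rows Before Abort can be pictured with a small simulation. This is a hedged sketch only: it does not call Informix, the class and field names are hypothetical, the field terminator is treated as a literal string, and a row is counted as "bad" simply when it has the wrong number of fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class BulkLoadSketch {

    /** Outcome of one simulated load. */
    static final class Result {
        final int committedRows;
        final int commits;
        final boolean aborted;

        Result(int committedRows, int commits, boolean aborted) {
            this.committedRows = committedRows;
            this.commits = commits;
            this.aborted = aborted;
        }
    }

    static Result load(List<String> rows, int expectedFields, String fieldTerminator,
                       int rowsBeforeCommit, int badRowsBeforeAbort) {
        int committed = 0, pending = 0, commits = 0, bad = 0;
        for (String row : rows) {
            int fields = row.split(Pattern.quote(fieldTerminator), -1).length;
            if (fields != expectedFields) {
                bad++;
                if (bad >= badRowsBeforeAbort) {
                    // Stop the load; rows still pending are never committed.
                    return new Result(committed, commits, true);
                }
                continue;
            }
            pending++;
            if (pending == rowsBeforeCommit) {
                committed += pending;
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // final partial batch
            committed += pending;
            commits++;
        }
        return new Result(committed, commits, false);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1|a", "2|b", "bad", "3|c", "4|d", "5|e");
        Result r = load(rows, 2, "|", 2, 2);
        System.out.println(r.committedRows + " rows in " + r.commits
                + " commits, aborted=" + r.aborted);
    }
}
```

A small Rows Before Commit value limits how much work is lost on an abort, at the cost of more commits.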

Related scenario
For a scenario in which tInformixBulkExec might be used, see:
the tMysqlOutputBulkExec Scenario: Inserting transformed data in MySQL database.
the tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tInformixClose
tInformixClose properties
Component Family Databases/Informix

Function
tInformixClose closes an active connection to a database.

Purpose
This component closes connections to Informix databases.

Basic settings

Component list
If there is more than one connection used in the Job, select tInformixConnection from the list.

Advanced settings

tStatCatcher Statistics
Select this check box to collect the log data at a component level.

Usage
This component is generally used as an input component. It requires an output component.

Limitation
n/a

Related scenario
This component is for use with tInformixConnection and tInformixRollback. These components are generally used along with tInformixConnection, as the latter opens the connection for the transaction which is under way. To see a scenario in which tInformixClose might be used, see tMysqlConnection.


tInformixCommit
tInformixCommit properties
This component is closely related to tInformixConnection and tInformixRollback. They are generally used to execute transactions together.
Component Family Databases/Informix

Function
tInformixCommit validates data processed in a Job from a connected database.

Purpose
Using a single connection, make a global commit just once instead of committing every row or batch of rows separately. This improves performance.

Basic settings

Component list
If there is more than one connection in the Job, select tInformixConnection from the list.

Close connection
This check box is selected by default. It means that the database connection will be closed once the commit has been made. Clear the check box to continue using the connection once the component has completed its task.
If you are using a Row > Main type connection to link tInformixCommit to your Job, your data will be committed row by row. If this is the case, do not select this check box, otherwise the connection will be closed before the commit of your first row is finalized.

Advanced settings

tStatCatcher Statistics
Select this check box to collect the log data at a component level.

Usage
This component is generally used along with Informix components, particularly tInformixConnection and tInformixRollback.

Limitation
n/a
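The division of labour between tInformixConnection, tInformixCommit and tInformixRollback resembles the usual JDBC transaction pattern: open one connection, do all the work, then commit once or roll back. The sketch below illustrates that pattern with the standard java.sql API only. The class, interface and parameter names are hypothetical, and the closeConnection flag merely mirrors the idea of the Close connection check box.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class TransactionSketch {

    /** Work to run inside the transaction (hypothetical callback type). */
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /**
     * Runs the work in a single transaction.
     * Returns true if the work was committed, false if it was rolled back.
     */
    static boolean runInTransaction(Connection conn, SqlWork work, boolean closeConnection) {
        try {
            conn.setAutoCommit(false);      // one global commit instead of one per row
            try {
                work.run(conn);
                conn.commit();
                return true;
            } catch (SQLException e) {
                conn.rollback();            // undo everything done in this transaction
                return false;
            } finally {
                if (closeConnection) {
                    conn.close();           // analogous to leaving Close connection selected
                }
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Connection handling failed", e);
        }
    }
}
```

Passing false for closeConnection keeps the connection open for later work, which corresponds to clearing the Close connection check box.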

Related Scenario
This component is for use with tInformixConnection and tInformixRollback. These components are generally used along with tInformixConnection, as the latter opens the connection for the transaction which is under way. To see a scenario in which tInformixCommit might be used, see tMysqlConnection.


tInformixConnection
tInformixConnection properties
This component is closely related to tInformixCommit and tInformixRollback. They are generally used along with tInformixConnection, with tInformixConnection opening the connection for the transaction.
Component Family
Databases/Informix

Function
tInformixConnection opens a connection to a database so that a transaction can be carried out.

Purpose
This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type
Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host
Database server IP address.

Port
DB server listening port.

Database
Name of the database.

Schema
Name of the schema.

Username and Password
DB user authentication data.

Instance
Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: Set or type in the shared connection name.

Advanced settings

Use Transaction
Clear this check box when the database is configured in NO_LOG mode. If the check box is selected, you can choose whether to activate the Auto Commit option.

tStatCatcher Statistics
Select this check box to collect the log data at a component level.


Usage
This component is generally used with other Informix components, particularly tInformixCommit and tInformixRollback.

Limitation
n/a

Related scenario
For a scenario in which tInformixConnection might be used, see Scenario: Inserting data in mother/daughter tables.


tInformixInput
tInformixInput properties
Component family Databases/Informix

Function
tInformixInput reads a database and extracts fields based on a query.

Purpose
tInformixInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Host
Database server IP address.

Port
Listening port number of the DB server.

Database
Name of the database.

DB server
Name of the database server.

Username and Password
DB user authentication data.

Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Query type and Query
Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Usage
This component covers all possible SQL queries for Informix databases.
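The requirement that the query fields follow the order of the schema can be illustrated with a positional mapping. The sketch below assumes, for illustration only, that each value returned by the query is assigned to the schema column at the same position; the class and method names are hypothetical and the column names are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaOrderSketch {

    /** Pairs the i-th query value with the i-th schema column. */
    static Map<String, Object> mapRow(List<String> schemaColumns, List<?> queryValues) {
        if (schemaColumns.size() != queryValues.size()) {
            throw new IllegalArgumentException("Query returns " + queryValues.size()
                    + " fields but the schema defines " + schemaColumns.size());
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < schemaColumns.size(); i++) {
            row.put(schemaColumns.get(i), queryValues.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // Schema order: id, name. A query written as "SELECT name, id ..."
        // would put the name into the id column under this positional mapping.
        System.out.println(mapRow(Arrays.asList("id", "name"), Arrays.asList(7, "Ann")));
    }
}
```

Listing the columns explicitly in the SELECT clause, in schema order, avoids depending on the table's physical column order.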


Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
See also the tContextLoad Scenario: Dynamic context use in MySQL DB insert.


tInformixOutput
tInformixOutput properties
Component family Databases/Informix

Function
tInformixOutput writes, updates, makes changes or suppresses entries in a database.

Purpose
tInformixOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Host
Database server IP address.

Port
Listening port number of the DB server.

Database
Name of the database.

DB server
Name of the database server.

Username and Password
DB user authentication data.

Table
Name of the table to be written. Note that only one table can be written at a time.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.


Action on data
On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.
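The difference between leaving Die on error selected and clearing it can be sketched as follows. This is a simplified model, not the component's implementation: the writer callback stands in for the database write, and the returned list plays the role of the rows that would travel along a Row > Rejects link. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class DieOnErrorSketch {

    /**
     * Writes every row. With dieOnError the first failure stops processing;
     * without it, failing rows are collected and the remaining rows are still written.
     */
    static <T> List<T> writeAll(List<T> rows, Consumer<T> writer, boolean dieOnError) {
        List<T> rejects = new ArrayList<>();
        for (T row : rows) {
            try {
                writer.accept(row);
            } catch (RuntimeException e) {
                if (dieOnError) {
                    throw e;          // the Job stops on the first row in error
                }
                rejects.add(row);     // row skipped and kept for the reject flow
            }
        }
        return rejects;
    }

    public static void main(String[] args) {
        // Invented rule: negative values cannot be written.
        List<Integer> rejects = writeAll(Arrays.asList(1, -2, 3),
                n -> { if (n < 0) throw new IllegalArgumentException("negative"); },
                false);
        System.out.println("Rejected rows: " + rejects);
    }
}
```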


Commit every
Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After according to the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options
Select this check box to customize a request, especially when there is double action on data.

Enable debug mode
Select this check box to display each step during processing entries in a database.

Use Batch Size
When selected, enables you to define the number of lines in each processed batch.

Optimize the batch insertion
Ensure the check box is selected to optimize the insertion of batches of data.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component offers the flexibility of the DB query and covers all possible SQL queries. It must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Informix database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
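The Commit every setting amounts to grouping rows into batches and committing once per batch. The sketch below shows the idea with the standard java.sql API; it is an illustration under stated assumptions rather than the code generated by the Studio. The table customers and its name column are invented, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class CommitEverySketch {

    /** Number of commits issued for totalRows when committing every commitEvery rows. */
    static int commitsNeeded(int totalRows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        return (totalRows + commitEvery - 1) / commitEvery; // ceiling division
    }

    /** Inserts the names, committing after every commitEvery rows. */
    static void insertNames(Connection conn, List<String> names, int commitEvery)
            throws SQLException {
        conn.setAutoCommit(false);
        String sql = "INSERT INTO customers (name) VALUES (?)"; // hypothetical table
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                stmt.setString(1, name);
                stmt.addBatch();
                if (++pending == commitEvery) {
                    stmt.executeBatch();
                    conn.commit();      // rows committed here are not undone by a later failure
                    pending = 0;
                }
            }
            if (pending > 0) {          // final partial batch
                stmt.executeBatch();
                conn.commit();
            }
        }
    }
}
```

For instance, 2500 rows with a commit interval of 1000 lead to three commits. Batches already committed stay in the table if a later batch fails, which matches the remark that the option does not provide rollback.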

Related scenarios
For tInformixOutput related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tInformixOutputBulk
tInformixOutputBulk properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tInformixOutputBulkExec component, detailed in another section. The advantage of using two components is that data can be transformed before it is loaded in the database.
Component family Databases/Informix

Function
Writes a file composed of columns, based on a defined delimiter and on Informix standards.

Purpose
Prepares the file to be used as a parameter in the INSERT query used to feed Informix databases.

Basic settings

Property type
Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name
Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Append
Select this check box to append new rows to the end of the file.

Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings

Row separator
String (ex: "\n" on Unix) to distinguish rows.

Field separator
Character, string or regular expression used to separate fields.

Set DBMONEY
Select this check box if you want to define the decimal separator in the corresponding field.

Set DBDATE
Select the date format that you want to apply.


Create directory if not exists
This check box is selected automatically. The option allows you to create a folder for the output file if it does not already exist.

Custom the flush buffer size
Select this check box in order to customize the memory size used to store the data temporarily. In the Row number field, enter the number of rows at which point the memory should be freed.

Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component is generally used along with tInformixBulkExec. Together, they improve performance levels when adding data to an Informix database.
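The first step of the two-step process, writing a delimited file, can be sketched with the Java standard library. The example ties together the File Name, Append, Row separator, Field separator, Create directory if not exists and Encoding settings. It is a minimal illustration, not the component's implementation: the class and method names are hypothetical, and no quoting or escaping of separator characters inside values is attempted.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class BulkFileSketch {

    /** Joins each row with the field separator and ends it with the row separator. */
    static String format(List<List<String>> rows, String fieldSeparator, String rowSeparator) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(fieldSeparator, row)).append(rowSeparator);
        }
        return out.toString();
    }

    /** Writes the formatted rows, creating the parent directory when it is missing. */
    static void write(Path file, List<List<String>> rows, String fieldSeparator,
                      String rowSeparator, Charset encoding, boolean append)
            throws IOException {
        if (file.getParent() != null) {
            Files.createDirectories(file.getParent());
        }
        byte[] bytes = format(rows, fieldSeparator, rowSeparator).getBytes(encoding);
        Files.write(file, bytes,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                append ? StandardOpenOption.APPEND : StandardOpenOption.TRUNCATE_EXISTING);
    }
}
```

The same separators and encoding then have to be given to the component that loads the file, otherwise the rows will not be split as intended.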

Related scenario
For a scenario in which tInformixOutputBulk might be used, see:
the tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
the tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.


tInformixOutputBulkExec
tInformixOutputBulkExec properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
Component Family Databases/Informix

Function
tInformixOutputBulkExec carries out Insert operations using the data provided.

Purpose
tInformixOutputBulkExec is a dedicated component which improves performance during Insert operations in Informix databases.

Basic settings

Property Type
Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Execution platform
Select the operating system you are using.

Use an existing connection
Select the check box and choose the appropriate tInformixConnection component from the list to use pre-defined connection parameters.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
DB server listening port.

Database
Name of the database.

Schema
Name of the schema.

Username and Password
DB user authentication data.


Instance
Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.

Table
Name of the table to be written. Note that only one table can be written at a time and the table must already exist for the insert operation to be authorised.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Informix Directory
Indicate the access path to your Informix directory.

Data file
Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Append
Select this check box to add rows to the end of the file.

Action on data
Select the operation you want to perform:
Bulk insert
Bulk update
The details requested differ according to the action chosen.

Advanced settings

Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Row separator
String (ex: "\n" on Unix) to distinguish rows.


Fields terminated by: Character, string or regular expression used to separate the fields.
Set DBMONEY: Select this check box to define the decimal separator used in the corresponding field.
Set DBDATE: Select the date format you want to apply.
Rows Before Commit: Enter the number of rows to be processed before the commit.
Bad Rows Before Abort: Enter the number of rows in error at which point the Job should stop.
Create directory if not exists: This check box is selected by default. It creates a directory to hold the output table if required.
Custom the flush buffer size: Select this check box to customize the memory size used to store the data temporarily. In the Row number field, enter the number of rows at which point the memory should be freed.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to collect the log data at a component level.
Output: Where the output should go.
Usage: This component is generally used when no particular transformation is required on the data to be inserted in the database.
Limitation: n/a
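As a rough illustration of how the Fields terminated by and Row separator settings shape a delimited data file, the following Java sketch joins the fields of each row with a field separator and ends each row with a row separator. The class name, the method name, the semicolon separator and the sample rows are assumptions made for this example; they are not taken from the code that Talend Open Studio generates for this component.

```java
import java.util.Arrays;
import java.util.List;

class BulkFileSketch {
    // Joins the fields of each row with fieldSeparator and ends each row with rowSeparator.
    static String format(List<List<String>> rows, String fieldSeparator, String rowSeparator) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(fieldSeparator, row)).append(rowSeparator);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample rows invented for the illustration.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "Smith", "2011-03-01"),
                Arrays.asList("2", "Jones", "2011-03-02"));
        System.out.print(format(rows, ";", "\n"));
    }
}
```

Whatever separators you choose, they have to match the ones expected when the file is loaded, otherwise the rows are likely to be counted as bad rows.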

Related scenario
For a scenario in which tInformixOutputBulkExec might be used, see:
the tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
the tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.


tInformixRollback
tInformixRollback properties
This component is closely related to tInformixCommit and tInformixConnection. They are generally used together to execute transactions.
Component family Databases/Informix

Function: tInformixRollback cancels transactions in connected databases.
Purpose: This component prevents involuntary transaction commits.
Basic settings
Component list: Select the tInformixConnection component from the list if you plan to add more than one connection to the Job.
Close Connection: Clear this check box if you want to continue to use the connection once the component has completed its task.
Advanced settings
tStatCatcher Statistics: Select this check box to collect the log data at a component level.
Usage: This component must be used with other Informix components, particularly tInformixConnection and tInformixCommit.
Limitation: n/a
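The connection, commit and rollback components together follow the usual JDBC transaction pattern. The sketch below shows that pattern with the standard java.sql.Connection methods setAutoCommit, commit and rollback. The Work interface, the runInTransaction helper and the recording stand-in connection are hypothetical names introduced so that the example runs without a database; they do not reflect the code generated by Talend Open Studio.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class RollbackSketch {
    // Hypothetical callback standing for the inserts or updates done by the Job.
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    // Commits if the work succeeds; rolls back if it throws an SQLException.
    static boolean runInTransaction(Connection conn, Work work) throws SQLException {
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            return false;
        }
    }

    // Stand-in connection that only records the names of the methods called on it.
    static Connection recordingConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, methodArgs) -> {
                    calls.add(method.getName());
                    return null; // sufficient for the void methods used in this sketch
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        boolean committed = runInTransaction(recordingConnection(calls), conn -> {
            throw new SQLException("simulated insert failure");
        });
        System.out.println(committed + " " + calls);
    }
}
```

With a real driver, the stand-in would be replaced by a connection obtained from the database, and the connection would be closed or kept open afterwards, which is the choice the Close Connection check box expresses.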

Related Scenario
For a scenario in which tInformixRollback might be used, see the tMysqlRollback Scenario: Rollback from inserting data in mother/daughter tables.


tInformixRow
tInformixRow properties
Component family Databases/Informix

Function: tInformixRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose: Depending on the nature of the query and the database, tInformixRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tInformixConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times. Performance levels are increased.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
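The Use PreparedStatement option described above maps naturally onto the standard java.sql.PreparedStatement API, where each "?" is bound by position. The following sketch models one line of the parameter table (index, type, value) and binds it with setInt, setString or setObject. The Param class, the type names "Int" and "String", and the recording stand-in statement are assumptions made so that the example runs without a database; the code actually generated by the studio may look different.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreparedParamsSketch {
    // One line of a hypothetical parameter table: position of the "?", type and value.
    static final class Param {
        final int index;
        final String type;
        final Object value;

        Param(int index, String type, Object value) {
            this.index = index;
            this.type = type;
            this.value = value;
        }
    }

    // Binds each parameter to its "?" placeholder; positions start at 1 in JDBC.
    static void bind(PreparedStatement stmt, List<Param> params) throws SQLException {
        for (Param p : params) {
            switch (p.type) {
                case "Int":
                    stmt.setInt(p.index, (Integer) p.value);
                    break;
                case "String":
                    stmt.setString(p.index, (String) p.value);
                    break;
                default:
                    stmt.setObject(p.index, p.value);
            }
        }
    }

    // Stand-in statement that records each call as "methodName[arguments]".
    static PreparedStatement recordingStatement(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    calls.add(method.getName() + Arrays.toString(methodArgs));
                    return null; // sufficient for the void setters used in this sketch
                });
    }

    public static void main(String[] args) throws SQLException {
        // Parameters for a query such as: UPDATE customers SET city = ? WHERE id = ?
        // (the table and column names are invented for the illustration).
        List<String> calls = new ArrayList<>();
        bind(recordingStatement(calls), Arrays.asList(
                new Param(1, "String", "Paris"),
                new Param(2, "Int", 42)));
        System.out.println(calls);
    }
}
```

Because the statement is prepared once and only the bound values change, the same query can be executed repeatedly without being parsed again, which is the performance benefit the option refers to.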

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tInformixSCD
The tInformixSCD component belongs to two different families: Business Intelligence and Databases. For further information, see tInformixSCD.


tInformixSP
tInformixSP properties
Component Family Databases/Informix

Function: tInformixSP calls procedures stored in a database.
Purpose: tInformixSP allows you to centralise multiple and complex queries in a database and enables you to call them more easily.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No properties stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select the check box and choose the appropriate tInformixConnection component from the list to use pre-defined connection parameters.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Schema: Name of the schema.
Username and Password: User authentication information.
Instance: Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SP Name: Enter the exact name of the stored procedure (SP).
Is Function / Return result in: Select this check box if only one value must be returned. From the list, select the schema column upon which the value to be obtained is based.
Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter that is to be returned as a value, likely after modification through the procedure (function).
RECORDSET: Input parameter that is to be returned as a set of values, rather than a single value.
Check the tPostgresqlCommit component if you want to analyze a set of records from a database table or DB query and return single records.
Use Transaction: Clear this check box if the database is configured in the NO_LOG mode.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at a component level.
Usage: This is an intermediary component. It can also be used as an entry component. In this case, only the entry parameters are authorized.
Limitation: The stored procedure syntax must correspond to that of the database.
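To show how the Host, Port, Database, Instance and Additional JDBC parameters settings could fit together, the sketch below assembles a connection URL in the format documented for the IBM Informix JDBC driver (jdbc:informix-sqli://host:port/database:INFORMIXSERVER=instance, followed by optional name=value pairs separated by semicolons). The class name, the method name and the sample values are assumptions made for this example, and the URL that the studio itself generates may differ in detail.

```java
class InformixUrlSketch {
    // Builds an Informix JDBC URL from the connection settings of the component.
    static String buildUrl(String host, int port, String database,
                           String instance, String additionalParams) {
        StringBuilder url = new StringBuilder("jdbc:informix-sqli://")
                .append(host).append(':').append(port)
                .append('/').append(database)
                .append(":INFORMIXSERVER=").append(instance);
        // Additional parameters are appended as name=value pairs separated by ";".
        if (additionalParams != null && !additionalParams.isEmpty()) {
            url.append(';').append(additionalParams);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Host, port, database and instance names are sample values only.
        System.out.println(buildUrl("192.168.0.10", 1526, "sales", "ol_informix", "DELIMIDENT=y"));
    }
}
```

The Instance value ends up in the INFORMIXSERVER property, which is why it has to match the server name declared in the SQL hosts file.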

Related scenario
For a scenario in which tInformixSP may be used, see:
the tMysqlSP Scenario: Finding a State Label using a stored procedure.
the tOracleSP Scenario: Checking number format using a stored procedure.
Also, see the tPostgresqlCommit if you want to analyse a set of records in a table or SQL query.


tIngresClose
tIngresClose properties
Component family Databases/Ingres

Function: tIngresClose closes the transaction committed in the connected DB.
Purpose: Close a transaction.
Basic settings
Component list: Select the tIngresConnection component in the list if more than one connection is planned for the current Job.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Ingres components, especially with tIngresConnection and tIngresCommit.
Limitation: n/a

Related scenario
No scenario is available for this component yet.


tIngresCommit
tIngresCommit Properties
This component is closely related to tIngresConnection and tIngresRollback. It usually does not make much sense to use these components independently in a transaction.
Component family Databases/Ingres

Function: Validates the data processed through the Job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus improves performance.
Basic settings
Component list: Select the tIngresConnection component in the list if more than one connection is planned for the current Job.
Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tIngresCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Ingres components, especially with tIngresConnection and tIngresRollback.
Limitation: n/a

Related scenario
For a scenario related to tIngresCommit, see Scenario: Inserting data in mother/daughter tables.


tIngresConnection
tIngresConnection Properties
This component is closely related to tIngresCommit and tIngresRollback. It usually does not make much sense to use one of these without using a tIngresConnection component to open a connection for the current transaction.
Component family Databases/Ingres

Function: Opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Server: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: Set or type in the shared connection name.
Usage: This component is to be used along with Ingres components, especially with tIngresCommit and tIngresRollback.
Limitation: n/a

Related scenarios
For a scenario related to tIngresConnection, see Scenario: Inserting data in mother/daughter tables.


tIngresInput
tIngresInput properties
Component family Databases/Ingres

Function: tIngresInput reads a database and extracts fields based on a query.
Purpose: tIngresInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Server: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Query type and Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Advanced settings
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column: Remove leading and trailing whitespace from defined columns.


tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component covers all possible SQL queries for Ingres databases.

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
See also the tContextLoad Scenario: Dynamic context use in MySQL DB insert.


tIngresOutput
tIngresOutput properties
Component family Databases/Ingres

Function: tIngresOutput writes, updates, makes changes or suppresses entries in a database.
Purpose: tIngresOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.


Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, following the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.
Use field options: Select this check box to customize a request, especially when there is double action on data.
Enable debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Ingres database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
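The differences between the Action on data options can be pictured with a small in-memory model in which a map keyed by the primary key plays the role of the table. This is only a sketch of the behaviour described above: the class, the enum and the sample rows are hypothetical, and no SQL is actually issued.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ActionOnDataSketch {
    enum Action { INSERT, UPDATE, INSERT_OR_UPDATE, UPDATE_OR_INSERT, DELETE }

    // Applies one incoming row (key, value) to the in-memory "table".
    static void apply(Map<Integer, String> table, Action action, int key, String value) {
        boolean exists = table.containsKey(key);
        switch (action) {
            case INSERT:
                // A duplicate key stops the processing, as the Job would stop.
                if (exists) {
                    throw new IllegalStateException("duplicate key " + key);
                }
                table.put(key, value);
                break;
            case UPDATE:
                // Rows without a matching key are left alone.
                if (exists) {
                    table.put(key, value);
                }
                break;
            case INSERT_OR_UPDATE:
            case UPDATE_OR_INSERT:
                // Same end state; only the order in which the two operations are tried differs.
                table.put(key, value);
                break;
            case DELETE:
                table.remove(key);
                break;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new LinkedHashMap<>();
        apply(table, Action.INSERT, 1, "Smith");
        apply(table, Action.UPDATE_OR_INSERT, 2, "Jones");
        apply(table, Action.UPDATE, 1, "Smythe");
        apply(table, Action.DELETE, 2, null);
        System.out.println(table);
    }
}
```

In this model the key column is what the primary key check boxes in Edit Schema (or the Key in update and Key in delete columns) designate in the real component.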

Related scenarios
For related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tIngresRollback
tIngresRollback properties
This component is closely related to tIngresCommit and tIngresConnection. It usually does not make much sense to use these components independently in a transaction.
Component family Databases/Ingres

Function: tIngresRollback cancels the transaction committed in the connected DB.
Purpose: Avoids committing part of a transaction involuntarily.
Basic settings
Component list: Select the tIngresConnection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Ingres components, especially with tIngresConnection and tIngresCommit.
Limitation: n/a

Related scenarios
For a scenario related to tIngresRollback, see Scenario: Rollback from inserting data in mother/daughter tables.


tIngresRow
tIngresRow properties
Component family Databases/Ingres

Function: tIngresRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose: Depending on the nature of the query and the database, tIngresRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.


Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times. Performance levels are increased.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
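The effect of the Commit every setting can be illustrated by computing the row numbers after which a commit would be issued. The sketch below assumes that a commit happens each time the given number of rows has been processed and that a final commit covers any remaining rows; this last point is an assumption of the example, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CommitEverySketch {
    // Returns the 1-based row numbers after which a commit would be issued.
    static List<Integer> commitPoints(int totalRows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        List<Integer> points = new ArrayList<>();
        for (int row = 1; row <= totalRows; row++) {
            if (row % commitEvery == 0) {
                points.add(row); // a full batch has been processed
            }
        }
        if (totalRows % commitEvery != 0) {
            points.add(totalRows); // assumed final commit for the remaining rows
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(10, 4));
    }
}
```

A larger value means fewer commits and usually faster execution, at the cost of more uncommitted rows at any given moment.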

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tIngresSCD
tIngresSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tIngresSCD.


tInterbaseClose
tInterbaseClose properties
Component family Databases/Interbase

Function: tInterbaseClose closes the transaction committed in the connected DB.
Purpose: Close a transaction.
Basic settings
Component list: Select the tInterbaseConnection component in the list if more than one connection is planned for the current Job.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Interbase components, especially with tInterbaseConnection and tInterbaseCommit.
Limitation: n/a

Related scenario
No scenario is available for this component yet.


tInterbaseCommit
tInterbaseCommit Properties
This component is closely related to tInterbaseConnection and tInterbaseRollback. It usually does not make much sense to use these components independently in a transaction.
Component family Databases/Interbase

Function: Validates the data processed through the Job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus improves performance.
Basic settings
Component list: Select the tInterbaseConnection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tInterbaseCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Interbase components, especially with the tInterbaseConnection and tInterbaseRollback components.
Limitation: n/a

Related scenario
This component is closely related to tInterbaseConnection and tInterbaseRollback. It usually does not make much sense to use these components without using the tInterbaseConnection component to open a connection for the current transaction. For a scenario related to tInterbaseCommit, see tMysqlConnection on page 594.


tInterbaseConnection
tInterbaseConnection properties
This component is closely related to tInterbaseCommit and tInterbaseRollback. It usually does not make much sense to use one of these without using a tInterbaseConnection to open a connection for the current transaction.
Component family: Databases/Interbase

Function: tInterbaseConnection opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Host name: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
    Shared DB Connection Name: set or type in the shared connection name.

Advanced settings
  Auto commit: Select this check box to automatically commit a transaction when it is completed.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: This component is to be used along with Interbase components, especially with tInterbaseCommit and tInterbaseRollback.

Limitation: n/a

Related scenarios
This component is closely related to tInterbaseCommit and tInterbaseRollback. It usually does not make much sense to use one of these without using a tInterbaseConnection component to open a connection for the current transaction.

For tInterbaseConnection related scenario, see tMysqlConnection on page 594.

tInterbaseInput
tInterbaseInput properties
Component family: Databases/Interbase

Function: tInterbaseInput reads a database and extracts fields based on a query.

Purpose: tInterbaseInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  Host: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings
  Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
  Trim column: Remove leading and trailing whitespace from defined columns.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Interbase databases.
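
The effect of the two trim options can be illustrated with a small sketch. It is not the component's actual implementation: the class and method names are hypothetical, a row is modelled as a simple column-name-to-value map, and trimming is assumed to behave like Java's String.trim().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "Trim all the String/Char columns" and "Trim column".
class TrimColumnsSketch {

    /** Returns a copy of the row with every String value trimmed; other types are left untouched. */
    static Map<String, Object> trimAllStringColumns(Map<String, Object> row) {
        return trimColumns(row, row.keySet());
    }

    /** Returns a copy of the row in which only the listed columns are trimmed. */
    static Map<String, Object> trimColumns(Map<String, Object> row, Set<String> columns) {
        Map<String, Object> trimmed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object v = e.getValue();
            boolean trim = columns.contains(e.getKey()) && v instanceof String;
            trimmed.put(e.getKey(), trim ? ((String) v).trim() : v);
        }
        return trimmed;
    }
}
```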

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
- Scenario 1: Displaying selected data from DB table.
- Scenario 2: Using StoreSQLQuery variable.
- Scenario: Writing dynamic columns from a MySQL database to an output file.
See also the related topic in tContextLoad: Scenario: Dynamic context use in MySQL DB insert.

tInterbaseOutput
tInterbaseOutput properties
Component family: Databases/Interbase

Function: tInterbaseOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tInterbaseOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  Host: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time.
  Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Drop table if exists and create: The table is removed if it already exists and created again.
    Clear a table: The table content is deleted.
  Action on data: On the data of the table defined, you can perform:
    Insert: Add new entries to the table. If duplicates are found, the Job stops.
    Update: Make changes to existing entries.
    Insert or update: Add entries or update existing ones.
    Update or insert: Update existing entries or create them if they do not exist.
    Delete: Remove entries corresponding to the input flow.
    It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
  Clear data in table: Wipes out data from the selected table before action.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
  Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
    Name: Type in the name of the schema column to be altered or inserted as a new column.
    SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
    Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
    Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
  Use field options: Select this check box to customize a request, especially when there is double action on data.
  Enable debug mode: Select this check box to display each step during processing entries in a database.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Interbase database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
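
The batching idea behind the Commit every option can be sketched as follows. This is an illustration under stated assumptions rather than the code the component generates: the class and method names are hypothetical, and the writeRow and commit callbacks stand in for what would be JDBC calls in a real Job (for instance a statement execution and Connection.commit()).

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "Commit every": commit after each full batch, then once for the remainder.
class CommitEverySketch {

    /** Writes all rows, committing every commitEvery rows; returns the number of commits issued. */
    static <T> int writeWithCommitEvery(List<T> rows, int commitEvery,
                                        Consumer<T> writeRow, Runnable commit) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (T row : rows) {
            writeRow.accept(row);
            pending++;
            if (pending == commitEvery) {   // a full batch is ready
                commit.run();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {                  // commit the last, incomplete batch
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

Because each batch is committed separately, rows from earlier batches stay in the table if a later batch fails, which is consistent with the remark that the option does not provide rollback.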

Related scenarios
For related topics, see:
- tDBOutput Scenario: Displaying DB output
- tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.

tInterbaseRollback
tInterbaseRollback properties
This component is closely related to tInterbaseCommit and tInterbaseConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Interbase

Function: tInterbaseRollback cancels the transaction commit in the connected DB.

Purpose: Avoids committing part of a transaction involuntarily.

Basic settings
  Component list: Select the tInterbaseConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Interbase components, especially with tInterbaseConnection and tInterbaseCommit.

Limitation: n/a

Related scenarios
For a tInterbaseRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables in the tMysqlRollback section.

tInterbaseRow
tInterbaseRow properties
Component family: Databases/Interbase

Function: tInterbaseRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tInterbaseRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Use an existing connection: Select this check box and click the relevant tInterbaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Host: Database server IP address.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type: Either Built-in or Repository.
    Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
    Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
  Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
  Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
  Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
    Parameter Index: Enter the parameter position in the SQL instruction.
    Parameter Type: Enter the parameter type.
    Parameter Value: Enter the parameter value.
    This option is very useful if you need to execute the same query several times. Performance levels are increased.
  Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
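
The Set PreparedStatement Parameter table maps naturally onto the standard JDBC PreparedStatement API. The sketch below is an assumption about how such a table could be applied, not the code the Studio generates: the class, the Param type and the bind helper are hypothetical, while PreparedStatement.setObject(int, Object) is a standard JDBC call whose index starts at 1.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of the "Set PreparedStatement Parameter" table.
class PreparedStatementParamsSketch {

    /** One line of the table: Parameter Index (1-based, as in JDBC) and Parameter Value. */
    static final class Param {
        final int index;
        final Object value;

        Param(int index, Object value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Binds each parameter to the matching ? placeholder of an already prepared statement. */
    static void bind(PreparedStatement stmt, List<Param> params) throws SQLException {
        for (Param p : params) {
            // A Parameter Type column would normally select a typed setter (setInt, setString, ...);
            // setObject is used here to keep the sketch short.
            stmt.setObject(p.index, p.value);
        }
    }
}
```

With a statement obtained once from Connection.prepareStatement, the same query can be bound and executed repeatedly with different values, which is where the performance benefit mentioned above comes from.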

Related scenarios
For related topics, see:
- tDBSQLRow Scenario: Resetting a DB auto-increment
- tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.

tJavaDBInput
tJavaDBInput properties
Component family: Databases/JavaDB

Function: tJavaDBInput reads a database and extracts fields based on a query.

Purpose: tJavaDBInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  Framework: Select your Java database framework from the list.
  Database: Name of the database.
  DB root path: Browse to your database root.
  Username and Password: DB user authentication data.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings
  Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
  Trim column: Remove leading and trailing whitespace from defined columns.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL database queries.

Related scenarios
For related topics, see the tDBInput and tMysqlInput scenarios:
- Scenario 1: Displaying selected data from DB table.
- Scenario 2: Using StoreSQLQuery variable.
- Scenario: Writing dynamic columns from a MySQL database to an output file.
See also the related topic in tContextLoad: Scenario: Dynamic context use in MySQL DB insert.

tJavaDBOutput
tJavaDBOutput properties
Component family: Databases/JavaDB

Function: tJavaDBOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tJavaDBOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  Framework: Select your Java database framework from the list.
  Database: Name of the database.
  DB root path: Browse to your database root.
  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time.
  Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Drop table if exists and create: The table is removed if it already exists and created again.
    Clear a table: The table content is deleted.
  Action on data: On the data of the table defined, you can perform:
    Insert: Add new entries to the table. If duplicates are found, the Job stops.
    Update: Make changes to existing entries.
    Insert or update: Add entries or update existing ones.
    Update or insert: Update existing entries or create them if they do not exist.
    Delete: Remove entries corresponding to the input flow.
    It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
  Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
    Name: Type in the name of the schema column to be altered or inserted as a new column.
    SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
    Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
    Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
  Use field options: Select this check box to customize a request, especially when there is double action on data.
  Enable debug mode: Select this check box to display each step during processing entries in a database.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Java database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.

Related scenarios
For related topics, see:
- tDBOutput Scenario: Displaying DB output
- tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.

tJavaDBRow
tJavaDBRow properties
Component family: Databases/JavaDB

Function: tJavaDBRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tJavaDBRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Framework: Select your Java database framework from the list.
  Database: Name of the database.
  DB root path: Browse to your database root.
  Username and Password: DB user authentication data.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Query type: Either Built-in or Repository.
    Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
    Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
  Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
  Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
  Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
    Parameter Index: Enter the parameter position in the SQL instruction.
    Parameter Type: Enter the parameter type.
    Parameter Value: Enter the parameter value.
    This option is very useful if you need to execute the same query several times. Performance levels are increased.
  Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenarios
For related topics, see:
- tDBSQLRow Scenario: Resetting a DB auto-increment
- tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.

tJDBCColumnList
tJDBCColumnList Properties
Component family: Databases/JDBC

Function: Iterates on all columns of a given table through a defined JDBC connection.

Purpose: Lists all column names of a given JDBC table.

Basic settings
  Component list: Select the tJDBCConnection component in the list if more than one connection is planned for the current Job.
  Table name: Enter the name of the table.

Usage: This component is to be used along with JDBC components, especially with tJDBCConnection.

Limitation: n/a
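
Listing the column names of a table over an open JDBC connection can be sketched with the standard DatabaseMetaData API. This is an illustration only, not the component's actual implementation: the class and method names are hypothetical, whereas Connection.getMetaData(), DatabaseMetaData.getColumns(...) and the COLUMN_NAME result column are standard JDBC.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: iterate on the columns of one table through an existing connection.
class ColumnListSketch {

    /** Returns the column names of the given table, in the order the driver reports them. */
    static List<String> columnNames(Connection conn, String tableName) throws SQLException {
        List<String> names = new ArrayList<>();
        // null catalog, schema pattern and column pattern mean "do not filter on these"
        try (ResultSet rs = conn.getMetaData().getColumns(null, null, tableName, null)) {
            while (rs.next()) {
                names.add(rs.getString("COLUMN_NAME"));
            }
        }
        return names;
    }
}
```

Depending on the database, the table name may need to match the case in which the database stores identifiers.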

Related scenario
For tJDBCColumnList related scenario, see Scenario: Iterating on a DB table and listing its column names.

tJDBCClose
tJDBCClose properties
Component family: Databases/JDBC

Function: tJDBCClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings
  Component list: Select the tJDBCConnection component in the list if more than one connection is planned for the current Job.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with JDBC components, especially with tJDBCConnection and tJDBCCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.

tJDBCCommit
tJDBCCommit Properties
This component is closely related to tJDBCConnection and tJDBCRollback. It usually doesn't make much sense to use JDBC components independently in a transaction.

Component family: Databases/JDBC

Function: Validates the data processed through the Job into the connected DB.

Purpose: Using a single connection, this component commits a global transaction in one go instead of committing on every row or every batch, which improves performance.

Basic settings
  Component list: Select the tJDBCConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
    If you use a Row > Main connection to link tJDBCCommit to your Job, your data will be committed row by row. In this case, do not select the Close Connection check box, or your connection will be closed before the end of your first row commit.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with JDBC components, especially with the tJDBCConnection and tJDBCRollback components.

Limitation: n/a

Related scenario
This component is closely related to tJDBCConnection and tJDBCRollback. It usually doesn't make much sense to use JDBC components without using the tJDBCConnection component to open a connection for the current transaction. For a tJDBCCommit related scenario, see tMysqlConnection on page 594.

tJDBCConnection
tJDBCConnection Properties
This component is closely related to tJDBCCommit and tJDBCRollback. It usually doesnt make much sense to use one of JDBC components without using the tJDBCConnection component to open a connection for the current transaction.
Component family: Databases/JDBC

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings
JDBC URL: Enter the JDBC URL to connect to the desired DB. For example, enter jdbc:mysql://IP address/database name to connect to a MySQL database.
Driver JAR: Click the plus button under the table to add as many lines as you need in order to load several JARs. Then on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR for that line.
Driver Class: Enter the driver class related to your connection. For example, enter com.mysql.jdbc.Driver as a driver class to connect to a MySQL database.
Username and Password: Enter your DB authentication data.
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
- Shared DB Connection Name: set or type in the shared connection name.

Advanced settings
Use Auto-Commit: Select this check box to display the Auto Commit check box. Select it to activate auto-commit mode. Once you clear the Use Auto-Commit check box, the auto-commit statement is removed from the generated code.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with JDBC components, especially with the tJDBCCommit and tJDBCRollback components.

Limitation: n/a
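In plain JDBC terms, the URL, driver class, credentials and auto-commit settings described above correspond roughly to the calls below. This is a minimal sketch and not the code the studio generates; the class name JdbcConnectionSketch, its helper methods and the host and database values are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class JdbcConnectionSketch {

    // Builds a URL of the form given for the JDBC URL field:
    // jdbc:mysql://IP address/database name
    static String mysqlUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // Opens a connection and applies the auto-commit choice.
    // The driver JAR must be on the classpath for this to succeed.
    static Connection open(String url, String driverClass, String user,
                           String password, boolean autoCommit)
            throws SQLException, ClassNotFoundException {
        Class.forName(driverClass);                  // "Driver Class" setting
        Connection conn = DriverManager.getConnection(url, user, password);
        conn.setAutoCommit(autoCommit);              // "Use Auto-Commit" setting
        return conn;
    }

    public static void main(String[] args) {
        // Hypothetical host and database name, for illustration only.
        System.out.println(mysqlUrl("192.168.0.10", "sales"));
    }
}
```

With auto-commit disabled, nothing is made permanent until commit() is called on the connection, which is the role tJDBCCommit plays in a Job.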


Related scenario
This component is closely related to tJDBCCommit and tJDBCRollback. It usually does not make much sense to use one of the JDBC components without using the tJDBCConnection component to open a connection for the current transaction. For a tJDBCConnection related scenario, see tMysqlConnection on page 594.


tJDBCInput
tJDBCInput properties
Component family: Databases/JDBC

Function: tJDBCInput reads any database using a JDBC API connection and extracts fields based on a query.

Purpose: tJDBCInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tJDBCConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
(icon): Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
JDBC URL: Type in the database location path.
Driver JAR: Click the plus button under the table to add as many lines as you need in order to load several JARs. Then on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR for that line.
Class Name: Type in the Class name to be pointed to in the driver.
Username and Password: DB user authentication data.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Table Name: Type in the name of the table.
Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings
Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column: Remove leading and trailing whitespace from defined columns.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for any database using a JDBC connection.
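The requirement that the query fields follow the same order as the schema, and the effect of trimming String columns, can be illustrated with a small sketch. It works on plain arrays rather than a live result set so that it runs without a database; the class PositionalSchemaSketch and the column names are hypothetical, and this is not the component's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PositionalSchemaSketch {

    // Maps one query result row onto the schema columns strictly by position,
    // optionally trimming String values the way a trim option would.
    static Map<String, Object> mapRow(List<String> schemaColumns,
                                      Object[] row, boolean trimStrings) {
        if (row.length != schemaColumns.size()) {
            throw new IllegalArgumentException("Query returns " + row.length
                    + " fields but the schema defines " + schemaColumns.size());
        }
        Map<String, Object> mapped = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            Object value = row[i];
            if (trimStrings && value instanceof String) {
                value = ((String) value).trim();   // strip leading/trailing whitespace
            }
            mapped.put(schemaColumns.get(i), value); // field i goes to schema column i
        }
        return mapped;
    }
}
```

Because the mapping is positional, a query such as SELECT name, id against a schema declared as (id, name) would silently put names in the id column, which is why the field order matters.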

Related scenarios
Related topics in tDBInput and tMysqlInput scenarios:
- Scenario 1: Displaying selected data from DB table.
- Scenario 2: Using StoreSQLQuery variable.
- Scenario: Writing dynamic columns from a MySQL database to an output file.
Related topic in tContextLoad:
- Scenario: Dynamic context use in MySQL DB insert.


tJDBCOutput
tJDBCOutput properties
Component family: Databases/JDBC

Function: tJDBCOutput writes, updates, makes changes or suppresses entries in any type of database connected to a JDBC API.

Purpose: tJDBCOutput executes the action defined on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tJDBCConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
(icon): Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
JDBC URL: Type in the database location path.
Driver JAR: Click the plus button under the table to add as many lines as you need in order to load several JARs. Then on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR for that line.
Class Name: Type in the Class name to be pointed to in the driver.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on data: On the data of the table defined, you can perform:
- Insert: Add new entries to the table. If duplicates are found, the Job stops.
- Update: Make changes to existing entries.
- Insert or update: Add entries or update existing ones.
- Update or insert: Update existing entries or create them if they do not exist.
- Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
- Name: Type in the name of the schema column to be altered or inserted as a new column.
- SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
- Position: Select Before, Replace or After, following the action to be performed on the reference column.
- Reference column: Type in a column of reference that tJDBCOutput can use to place or replace the new or altered column.
Use field options: Select this check box to customize a request, especially when there is double action on data.
Enable debug mode: Select this check box to display each step during processing entries in a database.
Use Batch Size: When selected, enables you to define the number of lines in each processed batch.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a JDBC database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
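The Commit every behaviour can be pictured as a loop that commits once per batch of rows, plus once more for any remainder. The sketch below is an illustration under assumptions, not the generated code: the class CommitEverySketch is hypothetical, and the insertRow and commit callbacks stand in for what would be an executeUpdate() call and Connection.commit() on a connection with auto-commit disabled.

```java
import java.util.List;
import java.util.function.Consumer;

final class CommitEverySketch {

    // Writes rows one by one and commits after every `commitEvery` rows,
    // then once more if rows remain uncommitted. Returns the number of commits.
    static int write(List<String> rows, int commitEvery,
                     Consumer<String> insertRow, Runnable commit) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int pending = 0;   // rows written since the last commit
        int commits = 0;
        for (String row : rows) {
            insertRow.accept(row);
            pending++;
            if (pending == commitEvery) {
                commit.run();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // commit the final, partial batch
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

A consequence of this pattern is that batches committed before a failure stay in the database, which matches the note that the option does not provide rollback of the whole load.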

Related scenarios
For tJDBCOutput related topics, see:
- tDBOutput Scenario: Displaying DB output.
- tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tJDBCRollback
tJDBCRollback properties
This component is closely related to tJDBCCommit and tJDBCConnection. It usually does not make much sense to use JDBC components independently in a transaction.
Component family: Databases/JDBC

Function: Cancels the uncommitted transaction in the connected DB.

Purpose: Avoids committing part of a transaction accidentally.

Basic settings
Component list: Select the tJDBCConnection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with JDBC components, especially with the tJDBCConnection and tJDBCCommit components.

Limitation: n/a

Related scenario
This component is closely related to tJDBCConnection and tJDBCCommit. It usually does not make much sense to use JDBC components without using the tJDBCConnection component to open a connection for the current transaction. For a tJDBCRollback related scenario, see tMysqlRollback on page 636.


tJDBCRow
tJDBCRow properties
Component family: Databases/JDBC

Function: tJDBCRow is the component for any type of database using a JDBC API. It executes the stated SQL query on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.

Purpose: Depending on the nature of the query and the database, tJDBCRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings
Use an existing connection: Select this check box and click the relevant tJDBCConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
JDBC URL: Type in the database location path.
Driver JAR: Click the plus button under the table to add as many lines as you need in order to load several JARs. Then on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR for that line.
Class Name: Type in the Class name to be pointed to in the driver.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
- Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
- Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
- Parameter Index: Enter the parameter position in the SQL instruction.
- Parameter Type: Enter the parameter type.
- Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query for any database using a JDBC connection and covers all possible SQL queries.
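The Set PreparedStatement Parameter table maps naturally onto the standard java.sql.PreparedStatement API, where each ? marker is addressed by a 1-based index. The following is a minimal sketch under assumptions: the class PreparedStatementSketch, the Param holder and the countMarkers helper are hypothetical, and setObject is used so that the driver infers the SQL type instead of an explicit Parameter Type.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class PreparedStatementSketch {

    // One line of a parameter table: position of the ? marker and its value.
    static final class Param {
        final int index;     // 1-based, as in JDBC
        final Object value;
        Param(int index, Object value) {
            this.index = index;
            this.value = value;
        }
    }

    // Naive count of ? markers; it does not skip markers that appear
    // inside quoted string literals or comments.
    static int countMarkers(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    // Binds every parameter by index and runs the statement.
    static int execute(Connection conn, String sql, Param... params)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Param p : params) {
                ps.setObject(p.index, p.value);
            }
            return ps.executeUpdate(); // number of affected rows
        }
    }
}
```

For a statement such as UPDATE customers SET city = ? WHERE id = ? (a hypothetical table), the city value would be bound at index 1 and the id at index 2.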

Related scenarios
For related topics, see:
- tDBSQLRow Scenario: Resetting a DB auto-increment.
- tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tJDBCSP
tJDBCSP Properties
Component family: Databases/JDBC

Function: tJDBCSP calls the specified database stored procedure.

Purpose: tJDBCSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings
JDBC URL: Type in the database location path.
Driver JAR: Click the plus button under the table to add as many lines as you need in order to load several JARs. Then on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR for that line.
Class Name: Type in the Class name to be pointed to in the driver.
Username and Password: DB user authentication data.
Schema and Edit Schema: In SP principle, the schema is an input parameter. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes Built-in.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SP Name: Type in the exact name of the Stored Procedure.
Is Function / Return result in: Select this check box if only a value is to be returned. Select on the list the schema column on which the value to be returned is based.
Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure. Select the Type of parameter:
- IN: Input parameter.
- OUT: Output parameter/return value.
- IN OUT: Input parameter that is to be returned as a value, likely after modification through the procedure (function).
- RECORDSET: Input parameter that is to be returned as a set of values, rather than a single value.
Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

Usage: This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.

Limitation: The Stored Procedure syntax should match the Database syntax.
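At the JDBC level, stored procedures and functions are usually invoked through the standard escape syntax, {call name(?, ?)} for a procedure and {? = call name(?)} for a function that returns a value. The sketch below builds such call strings and shows one way to read a returned value; the class StoredProcedureCallSketch and the procedure names used with it are hypothetical, and this is not presented as the code the component generates.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

final class StoredProcedureCallSketch {

    // Builds the JDBC escape-syntax call string for a procedure or function.
    static String callString(String spName, int paramCount, boolean isFunction) {
        StringBuilder sb = new StringBuilder("{");
        if (isFunction) {
            sb.append("? = ");                 // placeholder for the return value
        }
        sb.append("call ").append(spName).append("(");
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("?");
        }
        return sb.append(")}").toString();
    }

    // Calls a one-argument function that returns a string.
    static String callStringFunction(Connection conn, String functionName,
                                     String input) throws SQLException {
        try (CallableStatement cs =
                     conn.prepareCall(callString(functionName, 1, true))) {
            cs.registerOutParameter(1, Types.VARCHAR); // return value
            cs.setString(2, input);                    // IN parameter
            cs.execute();
            return cs.getString(1);
        }
    }
}
```

OUT and IN OUT parameters follow the same pattern: they are registered with registerOutParameter before execution and read back by index afterwards.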

Related scenario
For related scenarios, see:
- tMysqlSP Scenario: Finding a State Label using a stored procedure.
- tOracleSP Scenario: Checking number format using a stored procedure.
Check as well the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.


tJDBCTableList
tJDBCTableList Properties
Component family: Databases/JDBC

Function: Iterates on a set of table names through a defined JDBC connection.

Purpose: Lists the names of a given set of JDBC tables using a select statement based on a Where clause.

Basic settings
Component list: Select the tJDBCConnection component in the list if more than one connection is planned for the current Job.
Where clause for table name selection: Enter the Where clause to identify the tables to iterate on.

Usage: This component is to be used along with JDBC components, especially with tJDBCConnection.

Limitation: n/a

Related scenario
For tJDBCTableList related scenario, see Scenario: Iterating on a DB table and listing its column names.


tLDAPAttributesInput
tLDAPAttributesInput Properties
Component family: Databases/LDAP

Function: tLDAPAttributesInput analyses each object found via the LDAP query and lists a collection of attributes associated with the object.

Purpose: tLDAPAttributesInput executes an LDAP query based on the given filter and corresponding to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: LDAP Directory server IP address.
Port: Listening port number of server.
Base DN: Path to the user's authorised tree leaf.
Protocol: Select the protocol type on the list.
- LDAP: no encryption is used.
- LDAPS: secured LDAP. When this option is chosen, the Advanced CA check box appears. Once selected, the advanced mode allows you to specify the directory and the keystore password of the certificate file for storing a specific CA. However, you can still deactivate this certificate validation by selecting the Trust all certs check box.
- TLS: certificate is used. When this option is chosen, the Advanced CA check box appears and is used the same way as that of the LDAPS type.
Authentication User and Password: Select the Authentication check box if LDAP login is required. Note that the login must match the LDAP syntax requirement to be valid, e.g.: cn=Directory Manager.
Filter: Type in the filter as expected by the LDAP directory db.
Multi valued field separator: Type in the value separator in multi-value fields.
Alias dereferencing: Select the option on the list. Never improves search performance if you are sure that no alias is to be dereferenced. By default, Always is to be used:
- Always: Always dereferences aliases.
- Never: Never dereferences aliases.
- Searching: Dereferences aliases only after name resolution.
- Finding: Dereferences aliases only during name resolution.
Referral handling: Select the option on the list:
- Ignore: does not handle request redirections.
- Follow: does handle request redirections.
Limit: Fill in a limit number of records to be read, if needed.
Time Limit: Fill in a timeout period for the directory access.
Paging: Specify the number of entries returned at a time by the LDAP server.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
As this component is intended to list the attributes associated with an LDAP object, its schema is pre-defined. You should retain these established columns, even though you may need to add some new columns. Hence you should use the Built-in mode. The pre-defined schema lists:
- objectclass: list of object classes
- mandatoryattributes: list of mandatory attributes to these classes
- optionalattributes: list of optional attributes to these classes
- objectattributes: list of attributes that are essential for the analysed object.

Advanced settings
Class Definition Root: Specify the root of the object class definition namespace.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component covers all possible LDAP queries. Note: Press Ctrl + Space bar to access the global variable list, including the GetResultName variable to retrieve automatically the relevant Base.


Related scenario
The tLDAPAttributesInput component is used in a way similar to tLDAPInput. Hence, for a tLDAPInput related scenario, see Scenario: Displaying LDAP directory's filtered content.


tLDAPInput
tLDAPInput Properties
Component family: Databases/LDAP

Function: tLDAPInput reads a directory and extracts data based on the defined filter.

Purpose: tLDAPInput executes an LDAP query based on the given filter and corresponding to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
(icon): Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Host: LDAP Directory server IP address.
Port: Listening port number of server.
Base DN: Path to the user's authorised tree leaf. To retrieve the full DN information, enter a field named DN in the schema, in either upper case or lower case.
Protocol: Select the protocol type on the list.
- LDAP: no encryption is used.
- LDAPS: secured LDAP. When this option is chosen, the Advanced CA check box appears. Once selected, the advanced mode allows you to specify the directory and the keystore password of the certificate file for storing a specific CA. However, you can still deactivate this certificate validation by selecting the Trust all certs check box.
- TLS: certificate is used. When this option is chosen, the Advanced CA check box appears and is used the same way as that of the LDAPS type.
Authentication User and Password: Select the Authentication check box if LDAP login is required. Note that the login must match the LDAP syntax requirement to be valid, e.g.: cn=Directory Manager.
Filter: Type in the filter as expected by the LDAP directory db.
Multi valued field separator: Type in the value separator in multi-value fields.
Alias dereferencing: Select the option on the list. Never improves search performance if you are sure that no alias is to be dereferenced. By default, Always is to be used:
- Always: Always dereferences aliases.
- Never: Never dereferences aliases.
- Searching: Dereferences aliases only after name resolution.
- Finding: Dereferences aliases only during name resolution.
Referral handling: Select the option on the list:
- Ignore: does not handle request redirections.
- Follow: does handle request redirections.
Limit: Fill in a limit number of records to be read, if needed.
Time Limit: Fill in a timeout period for the directory access.
Paging: Specify the number of entries returned at a time by the LDAP server.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Usage: This component covers all possible LDAP queries. Note: Press Ctrl + Space bar to access the global variable list, including the GetResultName variable to retrieve automatically the relevant Base.
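For readers familiar with the Java JNDI API, the host, port, protocol, alias dereferencing, referral handling and limit settings have close counterparts among the standard JNDI environment properties and SearchControls. The sketch below only builds these objects and does not connect to a server; the class LdapSettingsSketch, the example host and base DN, and the choice of a subtree search scope are assumptions, and the component's own implementation may differ.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.SearchControls;

final class LdapSettingsSketch {

    // Builds the JNDI environment for an LDAP connection.
    // A base DN containing spaces or special characters would need URL encoding.
    static Hashtable<String, String> environment(String host, int port,
            String baseDn, boolean ldaps, String derefAliases, String referral) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL,
                (ldaps ? "ldaps" : "ldap") + "://" + host + ":" + port + "/" + baseDn);
        // Accepted values: always, never, finding, searching
        env.put("java.naming.ldap.derefAliases", derefAliases);
        // Accepted values include: ignore, follow
        env.put(Context.REFERRAL, referral);
        return env;
    }

    // Builds search controls with a record limit and a time limit.
    static SearchControls controls(long limit, int timeLimitMillis) {
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE); // assumed scope
        sc.setCountLimit(limit);            // maximum number of entries returned
        sc.setTimeLimit(timeLimitMillis);   // milliseconds; 0 means no limit
        return sc;
    }
}
```

Passing the environment to new InitialDirContext(env) and the controls to its search method would then run the query against a real directory.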

Scenario: Displaying LDAP directory's filtered content


The Job described below simply filters the LDAP directory and displays the result on the console.

Drop the tLDAPInput component along with a tLogRow from the Palette to the design workspace.


Set the tLDAPInput properties:
- Set the Property type to Repository if you stored the LDAP connection details in the Metadata Manager in the Repository. Then select the relevant entry on the list.
- In Built-In mode, fill in the Host and Port information manually. Host can be the IP address of the LDAP directory server or its DNS name.
- No particular Base DN is to be set.

- Then select the relevant Protocol on the list. In this example, a simple LDAP protocol is used.
- Select the Authentication check box and fill in the login information if required to read the directory. In this use case, no authentication is needed.
- In the Filter area, type in the filter on which the data selection is based. In this example, the filter is: (&(objectClass=inetorgperson)(uid=PIERRE DUPONT)).
- Fill in Multi-valued field separator with a comma, as some fields may hold more than one value, separated by a comma.
- As we do not know whether aliases are used in the LDAP directory, select Always on the list.
- Set Ignore as Referral handling.
- Set the limit to 100 for this use case.


Set the Schema as required by your LDAP directory. In this example, the schema is made of 6 columns including the objectClass and uid columns which get filtered on. In the tLogRow component, no particular setting is required.

Only one entry of the directory corresponds to the filter criteria given in the tLDAPInput component.
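A filter that combines two conditions, as in this scenario, follows the standard LDAP string syntax in which an AND filter has the form (&(attr1=value1)(attr2=value2)). The sketch below composes such a filter and escapes the characters that are special inside filter values; the class LdapFilterSketch and its method names are hypothetical and serve only to illustrate the syntax.

```java
final class LdapFilterSketch {

    // Escapes characters that have a special meaning inside a filter value.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds an equality condition such as (uid=PIERRE DUPONT).
    static String equal(String attribute, String value) {
        return "(" + attribute + "=" + escape(value) + ")";
    }

    // Combines conditions so that all of them must match.
    static String and(String... filters) {
        return "(&" + String.join("", filters) + ")";
    }

    public static void main(String[] args) {
        System.out.println(and(equal("objectClass", "inetorgperson"),
                               equal("uid", "PIERRE DUPONT")));
    }
}
```

Escaping matters mainly when the value comes from user input or from a previous component, since an unescaped parenthesis or asterisk would change the meaning of the filter.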


tLDAPOutput
tLDAPOutput Properties
Component family: Databases/LDAP

Function: tLDAPOutput writes into an LDAP directory.

Purpose: tLDAPOutput executes an LDAP query based on the given filter and corresponding to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Host: LDAP directory server IP address.

Port: Listening port number of the server.

Base DN: Path to the user's authorized tree leaf.

Protocol: Select the protocol type on the list.
  LDAP: no encryption is used.
  LDAPS: secured LDAP. When this option is chosen, the Advanced CA check box appears. Once selected, the advanced mode allows you to specify the directory and the keystore password of the certificate file for storing a specific CA. However, you can still deactivate this certificate validation by selecting the Trust all certs check box.
  TLS: certificate is used. When this option is chosen, the Advanced CA check box appears and is used the same way as that of the LDAPS type.

User and Password: Fill in the User and Password as required by the directory. Note that the login must match the LDAP syntax requirement to be valid, e.g.: cn=Directory Manager.

Multi valued field separator: Character, string or regular expression to separate data in a multi-value field.

Alias dereferencing: Select the option on the list. Never improves search performance if you are sure that no alias is to be dereferenced. By default, Always is to be used.
  Always: Always dereferences aliases.
  Never: Never dereferences aliases.
  Searching: Dereferences aliases only after name resolution.
  Finding: Dereferences aliases only during name resolution.

Referral handling: Select the option on the list.
  Ignore: does not handle request redirections.
  Follow: does handle request redirections.

Insert mode: Select the editing mode on the list.
  Add: adds a value in a multi-value attribute.
  Insert: inserts new data.
  Update: updates the existing data.
  Delete: removes the selected data from the directory.
  Insert or Update: inserts new data or updates existing data.

DN Column Name: Select in the list the type of the LDAP input entity used.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component covers all possible LDAP queries. Note: Press Ctrl + Space bar to access the global variable list, including the GetResultName variable to retrieve the relevant DN Base automatically. This component allows you to carry out actions on a table or on the data of a table in a database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
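The Multi valued field separator and Insert mode options can be pictured with a small in-memory model. The Python sketch below is not Talend-generated code and does not contact an LDAP server. The function name apply_insert_mode, the dictionary layout and the error behaviour (for instance, that Insert fails when the entry already exists) are assumptions made for illustration only.

```python
MODES = {"Add", "Insert", "Update", "Delete", "Insert or Update"}


def apply_insert_mode(directory, mode, dn, fields, separator=","):
    """Apply one row to an in-memory directory: {dn: {attribute: [values]}}.

    Hypothetical model of the Insert mode options; the error handling
    is an assumption, not documented component behaviour.
    """
    if mode not in MODES:
        raise ValueError(f"unknown insert mode: {mode}")

    # Split every field on the multi-valued field separator.
    values = {name: [v for v in str(raw).split(separator) if v]
              for name, raw in fields.items()}
    exists = dn in directory

    if mode == "Insert or Update":
        mode = "Update" if exists else "Insert"

    if mode == "Insert":
        if exists:
            raise ValueError(f"entry already exists: {dn}")
        directory[dn] = values
        return directory

    # Add, Update and Delete all need an existing entry.
    if not exists:
        raise ValueError(f"no such entry: {dn}")

    if mode == "Update":
        directory[dn].update(values)  # replace the listed attributes
    elif mode == "Add":
        for name, new_values in values.items():
            current = directory[dn].setdefault(name, [])
            for value in new_values:
                if value not in current:
                    current.append(value)  # extend a multi-value attribute
    else:  # Delete
        del directory[dn]
    return directory
```

With an empty dictionary, calling the function with Insert or Update creates the entry, Update then replaces its mail value, and Add appends further values to the same attribute.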

Scenario: Editing data in a LDAP directory


The following scenario describes a Job that reads an LDAP directory, updates the email of a selected entry and displays the output before writing the LDAP directory. To keep it simple, neither alias dereferencing nor referral handling is performed. This scenario is based on the tLDAPInput Scenario: Displaying LDAP directory's filtered content. The result returned was a single entry, related to an organisational person, whose email is to be updated.

Drop the tLDAPInput, tLDAPOutput, tMap and tLogRow components from the Palette to the design workspace.
Connect the input component to the tMap, then to the tLogRow and to the output component.
In the tLDAPInput Component view, set the connection details to the LDAP directory server as well as the filter, as described in Scenario: Displaying LDAP directory's filtered content.
Change the schema to make it simpler by removing the unused fields: dc, ou, objectclass.

Then open the mapper to set the edit to be carried out. Drag & drop the uid column from the input table to the output as no change is required on this column.

In the Expression field of the dn column (output), fill in the exact expression expected by the LDAP server to reach the target tree leaf and allow directory writing, provided that you have not already set it in the Base DN field of the tLDAPOutput component.
In this use case, the GetResultName global variable is used to retrieve this path automatically. Press Ctrl+Space bar to access the variable list and select tLDAPInput_1_RESULT_NAME.
In the mail column's expression field, type in the new email that will overwrite the current data in the LDAP directory. In this example, we change it to Pierre.Dupont@talend.com.
Click OK to validate the changes.
The tLogRow component does not need any particular setting.


Then select the tLDAPOutput component to set the directory writing properties.

Set the Port and Host details manually if they are not stored in the Repository.
In the Base DN field, set the highest tree leaf you have the rights to access. If you have not previously set the exact and full path of the target DN you want to access, fill it in here. In this use case, the full DN is provided by the dn output from the tMap component, therefore only the highest accessible leaf is given: o=directoryRoot.
Select the relevant protocol to be used: LDAP for this example.
Then fill in the User and Password as expected by the LDAP directory.
Use the default settings of the Alias Dereferencing and Referral Handling fields, respectively Always and Ignore.
The Insert mode for this use case is Update (the email address).
The schema was provided by the previous component through the propagation operation.
Save the Job and execute it.

The output shows the following fields: dn, uid and mail as defined in the Job.

tLDAPRenameEntry
tLDAPRenameEntry properties
Component family: Databases/LDAP

Function: tLDAPRenameEntry renames entries in an LDAP directory.

Purpose: The tLDAPRenameEntry component renames one or more entries in a specific LDAP directory.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: LDAP directory server IP address.

Port: Number of the listening port of the server.

Base DN: Path to the user's authorized tree leaf.

Protocol: Select the protocol type on the list.
  LDAP: no encryption is used.
  LDAPS: secured LDAP.
  TLS: certificate is used.

User and Password: Fill in user authentication information. Note that the login must match the LDAP syntax requirement to be valid, e.g.: cn=Directory Manager.

Alias dereferencing: Select the option on the list. Never improves search performance if you are sure that no alias is to be dereferenced. By default, Always is to be used.
  Always: Always dereferences aliases.
  Never: Never dereferences aliases.
  Searching: Dereferences aliases only after name resolution.
  Finding: Dereferences aliases only during name resolution.

Referrals handling: Select the option on the list.
  Ignore: does not handle request redirections.
  Follow: does handle request redirections.

Previous DN and New DN: Select from the list the schema column that holds the old DN (Previous DN) and the column that holds the new DN (New DN).

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible LDAP queries. It is usually used as a one-component subjob but you can use it with other components as well. Note: Press Ctrl + Space bar to access the global variable list, including the GetResultName variable to retrieve the relevant DN Base automatically.
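The interplay between the Previous DN and New DN columns and the Die on error check box can be sketched as follows. This is an in-memory Python illustration, not the component's implementation. The function rename_entries, the row keys previous_dn and new_dn, and the reject format are hypothetical names chosen for the example.

```python
def rename_entries(directory, rows, die_on_error=True):
    """Rename entries of an in-memory directory {dn: attributes}.

    Each row carries the old DN and the new DN. With die_on_error
    cleared, failing rows are collected as rejects instead of
    stopping the run (a stand-in for a Row > Reject link).
    """
    rejects = []
    for row in rows:
        old_dn, new_dn = row["previous_dn"], row["new_dn"]
        try:
            if old_dn not in directory:
                raise ValueError(f"no such entry: {old_dn}")
            if new_dn in directory:
                raise ValueError(f"entry already exists: {new_dn}")
            # Move the attributes under the new distinguished name.
            directory[new_dn] = directory.pop(old_dn)
        except ValueError as err:
            if die_on_error:
                raise
            rejects.append({**row, "error": str(err)})
    return rejects
```

With die_on_error set to False, the error-free rows are still renamed and the returned list holds the rows that failed, each with its error message.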

Related scenarios
For use cases related to tLDAPRenameEntry, see the following scenarios:
tLDAPInput Scenario: Displaying LDAP directory's filtered content.
tLDAPOutput Scenario: Editing data in a LDAP directory.

tMaxDBInput
tMaxDBInput properties
Component family: Databases/MaxDB

Function: tMaxDBInput reads a database and extracts fields based on a query.

Purpose: tMaxDBInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Host name: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Type in the table name.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Guess schema: Click the Guess schema button to retrieve the table schema.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
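The difference between the two trimming options can be shown on a single row. The Python sketch below is an illustration only. The function trim_row and its parameters are hypothetical and do not correspond to any Talend API.

```python
def trim_row(row, trim_all=False, trim_columns=()):
    """Return a copy of the row with whitespace trimmed from string values.

    trim_all mimics "Trim all the String/Char columns";
    trim_columns mimics selecting individual columns under "Trim column".
    Non-string values are left untouched.
    """
    trimmed = {}
    for name, value in row.items():
        if isinstance(value, str) and (trim_all or name in trim_columns):
            value = value.strip()  # remove leading and trailing whitespace
        trimmed[name] = value
    return trimmed
```

Passing trim_all=True trims every string column, whereas trim_columns=("name",) trims only the name column and leaves the others as read.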

Related scenario
For a related scenario, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.

tMaxDBOutput
tMaxDBOutput properties
Component family: Databases/MaxDB

Function: tMaxDBOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tMaxDBOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations.
  None: No operation is carried out.
  Drop and create table: The table is removed and created again.
  Create table: The table does not exist and gets created.
  Create table if not exists: The table is created if it does not exist.
  Clear table: The table content is deleted.
  Truncate table: The table content is deleted. You cannot roll back the operation.

Action on data: On the data of the table defined, you can perform one of the following operations.
  Insert: Add new entries to the table. If duplicates are found, the job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
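The Insert or update and Update or insert actions reach the same final table content but attempt the two statements in a different order. The Python sketch below shows one plausible reading of that order, using an in-memory SQLite database as a stand-in for MaxDB. It is not the code the component generates. The person table and both function names are hypothetical.

```python
import sqlite3


def insert_or_update(conn, row):
    """Check whether the key exists, then insert or update accordingly."""
    found = conn.execute(
        "SELECT COUNT(1) FROM person WHERE id = ?", (row["id"],)
    ).fetchone()[0]
    if found:
        conn.execute("UPDATE person SET name = ? WHERE id = ?",
                     (row["name"], row["id"]))
        return "update"
    conn.execute("INSERT INTO person (id, name) VALUES (?, ?)",
                 (row["id"], row["name"]))
    return "insert"


def update_or_insert(conn, row):
    """Try the update first; insert only when no row was changed."""
    cursor = conn.execute("UPDATE person SET name = ? WHERE id = ?",
                          (row["name"], row["id"]))
    if cursor.rowcount == 0:
        conn.execute("INSERT INTO person (id, name) VALUES (?, ?)",
                     (row["id"], row["name"]))
        return "insert"
    return "update"
```

Under this reading, the first form suits flows that mostly contain new keys and the second suits flows that mostly modify existing rows, because each one tries the likelier statement first. Both rely on the key column, which mirrors the requirement that at least one schema column be set as a primary key.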

Related scenario
For a related scenario, see:
Scenario: Displaying DB output from tDBOutput.
Scenario 1: Adding a new column and altering data in a DB table from tMySQLOutput.

tMaxDBRow
tMaxDBRow properties
Component family: Databases/MaxDB

Function: tMaxDBRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it does not provide output.

Purpose: Depending on the nature of the query and the database, tMaxDBRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Type in the table name.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
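The Use PreparedStatement and Commit every options describe two general database techniques: binding values to ? placeholders by position, and committing rows in batches rather than one by one. The Python sketch below combines both against an in-memory SQLite database used as a stand-in. The function run_prepared and its parameters are hypothetical, and the code is not what the component generates.

```python
import sqlite3


def run_prepared(conn, sql, parameter_rows, commit_every=100):
    """Execute one parameterized statement per row, committing in batches.

    Each tuple in parameter_rows is bound to the '?' placeholders of sql
    by position. Returns the number of commits that were issued.
    """
    commits = 0
    pending = 0
    for params in parameter_rows:
        conn.execute(sql, params)
        pending += 1
        if pending == commit_every:
            conn.commit()  # flush a full batch
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the last, partial batch
        commits += 1
    return commits
```

For 250 rows and a batch size of 100, the sketch issues three commits: two full batches and one partial batch of 50 rows.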

Related scenario
For a related scenario, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.

tMSSqlBulkExec
tMSSqlBulkExec properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tMSSqlOutputBulkExec component, detailed in a separate section. The advantage of using a two step process is that the data can be transformed before it is loaded in the database.
Component family: Databases/MSSql

Function: Executes the Insert action on the provided data.

Purpose: As a dedicated component, tMSSqlBulkExec offers gains in performance while carrying out the Insert operations to a MSSql database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations.
  None: No operation is carried out.
  Drop and create table: The table is removed and created again.
  Create table: The table does not exist and gets created.
  Create table if not exists: The table is created if it does not exist.
  Clear table: The table content is deleted.
  Truncate table: The table content is deleted. You cannot roll back the operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Remote File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Advanced settings

Action: Select the action to be carried out: Bulk insert, Bulk update or Bcp query out. Depending on the action selected, the required information varies.

Bulk insert & Bulk update

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Fields terminated: Character, string or regular expression to separate fields.

Rows terminated: Character, string or regular expression to separate rows.

First row: Type in the number of the row where the action should start.

Code page: This value can be any of the following: OEM (default value), ACP, RAW, User-defined.

Data file type: Select the type of data being handled.

Output: Select the type of output for the standard output of the MSSql database: to console, to global variable.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Bcp query out

Fields terminated: Character, string or regular expression to separate fields.

Rows terminated: Character, string or regular expression to separate rows.

Data file type: Select the type of data being handled.

Output: Select the type of output to pass the processed data onto.
  to console: data is viewed in the Log view.
  to global variable: data is put in an output variable linked to a tSystem component.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with the tMSSqlOutputBulk component. Used together, they can offer gains in performance while feeding a MSSql database.
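The two-step process (generate a delimited file, then load it in one operation) can be sketched in Python with the standard csv module and an in-memory SQLite table standing in for the MSSql database. This illustrates the general technique only and does not use SQL Server's bulk loader. The function names, the person table and the ; separator are assumptions made for the example.

```python
import csv
import os
import sqlite3
import tempfile


def write_bulk_file(path, rows, field_sep=";"):
    """Step 1: write the (possibly transformed) rows to a delimited file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter=field_sep, lineterminator="\n").writerows(rows)


def bulk_insert(conn, path, table, column_count, field_sep=";", first_row=1):
    """Step 2: load the file into the table, starting at row first_row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=field_sep)
        rows = [row for number, row in enumerate(reader, start=1)
                if number >= first_row]
    placeholders = ", ".join("?" * column_count)
    # The table name is interpolated, so it must come from trusted input.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


def demo():
    """Run both steps on a temporary file and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "person.csv")
        write_bulk_file(path, [("id", "name"), (1, "Ann"), (2, "Bob")])
        bulk_insert(conn, path, "person", column_count=2, first_row=2)
    return conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()
```

Setting first_row to 2 skips the header line, which plays the role of the First row option. The field separator must be the same in both steps, which is why Fields terminated has to match the file that was generated.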

Related scenarios
For use cases related to tMSSqlBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.

tMSSqlColumnList
tMSSqlColumnList Properties
Component family: Databases/MS SQL

Function: Iterates on all columns of a given table through a defined MS SQL connection.

Purpose: Lists all column names of a given MSSql table.

Basic settings

Component list: Select the tMSSqlConnection component in the list if more than one connection is planned for the current job.

Table name: Enter the name of the table.

Usage: This component is to be used along with MSSql components, especially with tMSSqlConnection.

Limitation: n/a
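Listing the column names of a table through an open connection can be sketched with the cursor metadata that Python's DB-API exposes. The example uses an in-memory SQLite database as a stand-in for an MS SQL connection. The function list_columns and the employee table are hypothetical, and the component itself may obtain the names differently.

```python
import sqlite3


def list_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    if not table.isidentifier():
        # The name is interpolated into SQL, so reject anything unusual.
        raise ValueError(f"unexpected table name: {table!r}")
    # A query that returns no rows still describes its result columns.
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    return [column[0] for column in cursor.description]
```

A loop over the returned list corresponds to the iteration the component performs, with one pass per column name.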

Related scenario
For tMSSqlColumnList related scenario, see Scenario: Iterating on a DB table and listing its column names.

tMSSqlClose
tMSSqlClose properties
Component family: Databases/MSSql

Function: tMSSqlClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tMSSqlConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with tMSSql components, especially with tMSSqlConnection and tMSSqlCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.

tMSSqlCommit

tMSSqlCommit properties
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/MSSql

Function: tMSSqlCommit validates the data processed through the job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tMSSqlConnection component in the list if more than one connection is planned for the current Job.

Close connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tMSSqlCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is to be used along with MSSql components, especially with tMSSqlConnection and tMSSqlRollback components.

Limitation: n/a

Related scenarios
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not make much sense to use one of these without using a tMSSqlConnection component to open a connection for the current transaction. For a tMSSqlCommit related scenario, see Scenario: Inserting data in mother/daughter tables.

tMSSqlConnection

tMSSqlConnection properties
This component is closely related to tMSSqlCommit and tMSSqlRollback. Both components are usually used with a tMSSqlConnection component to open a connection for the current transaction.
Component family: Databases/MSSQL

Function: tMSSqlConnection opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Schema: Schema name.

Database: Name of the database.

Username and Password: DB user authentication data.

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: set or type in the shared connection name.

Advanced settings

Auto commit: Select this check box to automatically commit a transaction when it is completed.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a Job level as well as at each component level.

Usage: This component is to be used along with MSSql components, especially with tMSSqlCommit and tMSSqlRollback.

Limitation: n/a
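The pattern behind the connection, commit and rollback components is a single transaction shared by several writes: either every row is committed in one go, or nothing is. The Python sketch below shows that pattern with an in-memory SQLite database as a stand-in for MSSql. The mother and daughter tables and the function name are hypothetical, and the code is not generated by Talend.

```python
import sqlite3


def load_in_one_transaction(conn, mothers, daughters):
    """Insert into both tables and commit once, or roll everything back.

    Returns True when the transaction was committed and False when an
    error caused a rollback.
    """
    try:
        conn.executemany(
            "INSERT INTO mother (id, name) VALUES (?, ?)", mothers)
        conn.executemany(
            "INSERT INTO daughter (id, mother_id, name) VALUES (?, ?, ?)",
            daughters)
    except sqlite3.Error:
        conn.rollback()  # undo the rows written so far on this connection
        return False
    conn.commit()  # one commit for the whole load
    return True
```

If a daughter row violates the primary key, the mother rows written earlier on the same connection are rolled back as well, which is the behaviour a shared connection is meant to provide.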

Related scenarios
This component is closely related to tMSSqlCommit and tMSSqlRollback. It usually does not make much sense to use one of these without using a tMSSqlConnection component to open a connection for the current transaction. For a tMSSqlConnection related scenario, see Scenario: Inserting data in mother/daughter tables.

tMSSqlInput
tMSSqlInput properties
Component family: Databases/MS SQL Server

Function: tMSSqlInput reads a database and extracts fields based on a query.

Purpose: tMSSqlInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for MS SQL server databases.
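The requirement that the query fields follow the schema order can be illustrated with a small Java sketch. It is not code generated by the Studio; the class name, method name, and the sample schema, table and column names are hypothetical. It only shows one way to derive a SELECT statement whose column list is sequenced exactly like the schema definition.

```java
import java.util.List;

public class SchemaOrderedQuery {
    // Builds a SELECT statement whose columns appear in the same order
    // as the schema columns passed in (hypothetical helper, not a Talend API).
    public static String buildSelect(String schema, String table, List<String> schemaColumns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", schemaColumns));
        sql.append(" FROM ").append(schema).append('.').append(table);
        return sql.toString();
    }
}
```

If the schema of the component later changes its column order, the query text has to be regenerated in the same way so that both stay aligned.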

Related scenarios
Related topics in tDBInput and tMysqlInput scenarios:
Scenario 1: Displaying selected data from DB table
Scenario 2: Using StoreSQLQuery variable
Scenario: Writing dynamic columns from a MySQL database to an output file
Related topic in tContextLoad: Scenario: Dynamic context use in MySQL DB insert.

tMSSqlLastInsertId
tMSSqlLastInsertId properties
Component family: Databases/MS SQL server

Function: tMSSqlLastInsertId displays the last IDs added to a table from a specified MSSql connection.

Purpose: tMSSqlLastInsertId enables you to retrieve the last primary keys added by a user to a MSSql table.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Component list: If there is more than one component in this list, select the tMSSqlConnection component to reuse the connection details you already defined.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenario
For a related scenario, see Scenario: Get the ID for the last inserted record on page 604.

tMSSqlOutput
tMSSqlOutput properties
Component family: Databases/MS SQL server

Function: tMSSqlOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tMSSqlOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Schema: Name of the schema.

Database: Name of the database.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
  Default: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Drop a table if exists and create: The table is removed if it already exists and created again.
  Clear a table: The table content is deleted.
  Truncate table: The table content is deleted. You cannot roll back the operation.

Turn on identity insert: Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).

Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the job stops.
  Single Insert Query: Add entries to the table in a batch.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  Insert if not exist: Add new entries to the table if they do not exist.
  It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Specify identity field: Select this check box to specify the identity field, which is made up of an automatically incrementing identification number. When this check box is selected, three other fields display:
  Identity field: Select the column you want to define as the identity field from the list.
  Start value: Type in a start value, used for the very first row loaded into the table.
  Step: Type in an incremental value, added to the value of the previous row that was loaded.
  You can also specify the identity field from the schema of the component. To do so, set the DB Type of the relevant column to INT IDENTITY. When the Specify identity field check box is selected, the INT IDENTITY DB Type in the schema is ignored.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

Support null in "SQL WHERE" statement: Select this check box if you want to deal with the Null values contained in a DB table. Make sure that the Nullable check box is selected for the corresponding columns in the schema.

Use batch size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Insert, the Update, the Single Insert Query or the Delete option in the Action on data field. If you are using MS SQL Server 2008, make sure that the Batch Size is less than or equal to 2000 parameter markers divided by the number of columns in the schema.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. It must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a MSSql database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
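Two of the numeric rules in the table above lend themselves to a short worked example: the Batch Size ceiling for MS SQL Server 2008 (2000 parameter markers divided by the number of schema columns) and the Start value and Step of an identity field. The following Java sketch is only an illustration under those stated rules. The class and method names are hypothetical and are not part of the component.

```java
public class OutputSettingsSketch {
    // Marker budget quoted in the Use batch size description.
    private static final int PARAMETER_MARKER_BUDGET = 2000;

    // Largest batch size that respects the rule: 2000 markers / number of columns.
    // Integer division rounds down, which keeps the result within the limit.
    public static int maxBatchSize(int schemaColumnCount) {
        if (schemaColumnCount <= 0) {
            throw new IllegalArgumentException("schema must have at least one column");
        }
        return PARAMETER_MARKER_BUDGET / schemaColumnCount;
    }

    // Identity value of the n-th row loaded (1-based): the first row gets the
    // start value, each following row adds the step to the previous value.
    public static long identityValue(long startValue, long step, long rowNumber) {
        if (rowNumber < 1) {
            throw new IllegalArgumentException("rowNumber is 1-based");
        }
        return startValue + (rowNumber - 1) * step;
    }
}
```

For example, a schema of 8 columns allows a Batch Size of at most 250, and an identity field with a start value of 100 and a step of 5 gives 100, 105, 110 for the first three rows.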

Related scenarios
For tMSSqlOutput related topics, see:
tDBOutput Scenario: Displaying DB output
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.

tMSSqlOutputBulk
tMSSqlOutputBulk properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tMSSqlOutputBulkExec component, detailed in a separate section. The advantage of using a two step process is that the data can be transformed before it is loaded in the database.
Component family: Databases/MSSql

Function: Writes a file with columns based on the defined delimiter and the MSSql standards.

Purpose: Prepares the file to be used as parameter in the INSERT query to feed the MSSql database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append: Select this check box to add the new rows at the end of the records.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Row separator: String (ex: "\n" on Unix) to distinguish rows.

Field separator: Character, string or regular expression to separate fields.

Include header: Select this check box to include the column header.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with the tMSSqlBulkExec component. Used together, they offer gains in performance while feeding a MSSql database.
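To make the roles of the Field separator, Row separator, Include header, Append and Encoding settings concrete, here is a minimal Java sketch of a delimited file writer. It is an illustration only, not the code the component generates. The class and method names are hypothetical, and the sketch assumes that field values never contain the separators (no quoting or escaping is done).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class BulkFileSketch {
    // Joins each row with the field separator and ends it with the row separator.
    // The header line is written first only when includeHeader is true.
    public static String format(List<String> header, List<List<String>> rows,
                                String fieldSeparator, String rowSeparator,
                                boolean includeHeader) {
        StringBuilder out = new StringBuilder();
        if (includeHeader) {
            out.append(String.join(fieldSeparator, header)).append(rowSeparator);
        }
        for (List<String> row : rows) {
            out.append(String.join(fieldSeparator, row)).append(rowSeparator);
        }
        return out.toString();
    }

    // Writes the content with the chosen encoding. With append = true the new
    // rows are added at the end of the file, otherwise the file is overwritten.
    public static void write(Path file, String content, Charset encoding,
                             boolean append) throws IOException {
        byte[] bytes = content.getBytes(encoding);
        if (append) {
            Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            Files.write(file, bytes, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }
}
```

When appending to an existing file, the header would normally be left out so that it does not reappear in the middle of the data.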

Related scenarios
For use cases in relation with tMSSqlOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database

tMSSqlOutputBulkExec
tMSSqlOutputBulkExec properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tMSSqlOutputBulkExec component.
Component family: Databases/MSSql

Function: Executes actions on the data provided.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to a MSSql database.

Basic settings

Action: Select the action to be carried out:
  Bulk insert
  Bulk update

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

DB name: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Truncate table: The table content is deleted. You cannot roll back the operation.
  Clear a table: The table content is deleted. You can roll back the operation.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append: Select this check box to add the new rows at the end of the records.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Field separator: Character, string or regular expression to separate fields.

Row separator: String (ex: "\n" on Unix) to distinguish rows.

First row: Type in the number of the row where the action should start.

Include header: Select this check box to include the column header.

Code page: OEM code pages used to map a specific set of characters to numerical code point values.

Data file type: Select the type of data being handled.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation: n/a

Related scenarios
For use cases in relation with tMSSqlOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database

tMSSqlRollback
tMSSqlRollback properties
This component is closely related to tMSSqlCommit and tMSSqlConnection. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases

Function: Cancels the transaction commit in the connected DB.

Purpose: Avoids committing part of a transaction involuntarily.

Basic settings

Component list: Select the tMSSqlConnection component in the list if more than one connection is planned for the current job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with MSSql components, especially with tMSSqlConnection and tMSSqlCommit components.

Limitation: n/a

Related scenario
For tMSSqlRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of the tMysqlRollback.

tMSSqlRow
tMSSqlRow properties
Component family: Databases/MS SQL Server

Function: tMSSqlRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tMSSqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Name of the table to be used.

Turn on identity insert: Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).

Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic Settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, because it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
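The Use PreparedStatement option maps onto the standard JDBC java.sql.PreparedStatement interface. The following Java sketch shows the general idea of binding positional parameters and reusing one statement for several executions. It is not the code generated by the Studio; the class and method names are hypothetical, and the Connection is assumed to be opened elsewhere. Note that JDBC parameter indexes start at 1, which is what the Parameter Index column refers to.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PreparedStatementSketch {
    // Counts the "?" markers in the SQL text, ignoring those inside
    // single-quoted string literals.
    public static int countMarkers(String sql) {
        int count = 0;
        boolean inLiteral = false;
        for (char c : sql.toCharArray()) {
            if (c == '\'') {
                inLiteral = !inLiteral;
            } else if (c == '?' && !inLiteral) {
                count++;
            }
        }
        return count;
    }

    // Prepares the statement once, then executes it for every parameter row.
    // Returns the update count of each execution.
    public static int[] executeForEach(Connection conn, String sql,
                                       List<Object[]> parameterRows) throws SQLException {
        int expected = countMarkers(sql);
        int[] updateCounts = new int[parameterRows.size()];
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int r = 0; r < parameterRows.size(); r++) {
                Object[] params = parameterRows.get(r);
                if (params.length != expected) {
                    throw new IllegalArgumentException(
                            "expected " + expected + " parameters, got " + params.length);
                }
                for (int i = 0; i < params.length; i++) {
                    ps.setObject(i + 1, params[i]); // JDBC indexes are 1-based
                }
                updateCounts[r] = ps.executeUpdate();
            }
        }
        return updateCounts;
    }
}
```

Preparing the statement once and only rebinding the values is what allows the same query to be executed several times at lower cost.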

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.

tMSSqlSCD
tMSSqlSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tMSSqlSCD.

tMSSqlSP
tMSSqlSP Properties
Component family: Databases/MSSql

Function: tMSSqlSP calls the database stored procedure.

Purpose: tMSSqlSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tMSSqlConnection component on the Component list to reuse the connection details you already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: In SP principle, the schema is an input parameter. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SP Name: Type in the exact name of the Stored Procedure.

Is Function / Return result in: Select this check box if only a value is to be returned. Select from the list the schema column on which the value to be returned is based.

Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure. Select the Type of parameter:
  IN: Input parameter.
  OUT: Output parameter/return value.
  IN OUT: Input parameter is to be returned as a value, likely after modification through the procedure (function).
  RECORDSET: Input parameter is to be returned as a set of values, rather than a single value.
  Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.

Limitation: The Stored Procedures syntax should match the Database syntax.
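In plain JDBC, stored procedures are usually invoked through the standard escape syntax accepted by Connection.prepareCall, with one "?" marker per parameter and a leading "? =" when a return value is expected. The Java sketch below builds such a call string from a procedure name, a parameter count and an Is Function style flag. It is an illustration only: the class and method names are hypothetical, and the call string actually produced by the component may differ.

```java
public class StoredProcedureCallSketch {
    // Builds a JDBC escape-syntax call string, for example
    // "{call myProc(?, ?)}" or "{? = call myFunc(?)}" when a value is returned.
    public static String callSyntax(String spName, int parameterCount, boolean isFunction) {
        if (parameterCount < 0) {
            throw new IllegalArgumentException("parameterCount must not be negative");
        }
        StringBuilder sb = new StringBuilder(isFunction ? "{? = call " : "{call ");
        sb.append(spName).append('(');
        for (int i = 0; i < parameterCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?'); // one marker per IN, OUT or IN OUT parameter
        }
        return sb.append(")}").toString();
    }
}
```

With a java.sql.CallableStatement created from such a string, IN values are bound with the setter methods and OUT values are declared with registerOutParameter before execution.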

Related scenario
For related scenarios, see:
tMysqlSP Scenario: Finding a State Label using a stored procedure.
tOracleSP Scenario: Checking number format using a stored procedure.
Check as well the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

tMSSqlTableList
tMSSqlTableList Properties
Component family: Databases/MS SQL

Function: Iterates on a set of table names through a defined MS SQL connection.

Purpose: Lists the names of a given set of MSSql tables using a select statement based on a Where clause.

Basic settings

Component list: Select the tMSSqlConnection component in the list if more than one connection is planned for the current job.

Where clause for table name selection: Enter the Where clause to identify the tables to iterate on.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with MSSql components, especially with tMSSqlConnection.

Limitation: n/a

Related scenario
For tMSSqlTableList related scenario, see Scenario: Iterating on a DB table and listing its column names.

tMysqlBulkExec
tMysqlBulkExec properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement used to feed a database. These two steps are fused together in the tMysqlOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family: Databases/Mysql

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, tMysqlBulkExec offers gains in performance while carrying out the Insert operations to a Mysql database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version: Select the version of MySQL that you are using.

Use an existing connection: Select this check box when using a configured tMysqlConnection component. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create table: The table is removed and created again.
  Create table: The table does not exist and gets created.
  Create table if not exists: The table is created if it does not exist.
  Truncate table: The table content is deleted. You cannot roll back the operation.
  Clear table: The table content is deleted. You can roll back the operation.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Local file Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. Character or sequence of characters used to separate lines. Character, string or regular expression to separate fields. Character used to enclose text. On the data of the table defined, you can perform: Insert records in table: Add new records to the table. Update records in table: Make changes to existing records. Replace records in table: Replace existing records with new ones. Ignore records in table: Ignore the existing records, or insert the new ones.

Lines terminated by Fields terminated by Enclosed by Action on data

586

Talend Open Studio Components

Database components
tMysqlBulkExec

Records contain NULL value

Check this box if you want to retrieve the null values from the input data flow. If you do not check this box, the null values from the input data flow will be considered as empty fields in the output data flow. Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Encoding

tStatCatcher Statistics Select this check box to collect log data at the component level. Usage This component is to be used along with tMysqlOutputBulk component. Used together, they can offer gains in performance while feeding a Mysql database. n/a

Limitation

Related scenarios
For use cases in relation with tMysqlBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB
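The tMysqlBulkExec settings Lines terminated by, Fields terminated by, Enclosed by and Action on data line up closely with the clauses of MySQL's LOAD DATA statement. The Java sketch below assembles such a statement from those settings to show how they fit together. It is an illustration under assumptions, not the code generated by the Studio: the class BulkLoadSketch, the method buildLoadStatement, the file path and the table name are hypothetical, and separators are expected in MySQL escape notation (for example \n).

```java
final class BulkLoadSketch {

    /** Wraps a value in single quotes, escaping embedded single quotes only. */
    static String quote(String value) {
        return "'" + value.replace("'", "\\'") + "'";
    }

    /**
     * Builds a LOAD DATA statement from bulk-load style settings.
     * duplicateHandling is "REPLACE", "IGNORE" or "" for a plain insert.
     */
    static String buildLoadStatement(String file, String table, String duplicateHandling,
                                     String fieldsTerminatedBy, String enclosedBy,
                                     String linesTerminatedBy) {
        StringBuilder sql = new StringBuilder("LOAD DATA LOCAL INFILE ").append(quote(file));
        if (!duplicateHandling.isEmpty()) {
            sql.append(' ').append(duplicateHandling);
        }
        sql.append(" INTO TABLE ").append(table)
           .append(" FIELDS TERMINATED BY ").append(quote(fieldsTerminatedBy))
           .append(" ENCLOSED BY ").append(quote(enclosedBy))
           .append(" LINES TERMINATED BY ").append(quote(linesTerminatedBy));
        return sql.toString();
    }

    public static void main(String[] args) {
        // Hypothetical file and table; ';' separated fields, '"' enclosed text.
        System.out.println(buildLoadStatement(
            "/tmp/customers.csv", "customer", "REPLACE", ";", "\"", "\\n"));
    }
}
```

Passing "REPLACE" or "IGNORE" corresponds loosely to the Replace records in table and Ignore records in table actions; an empty string gives a plain insert.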


tMysqlClose
tMysqlClose properties
Component family: Databases/Mysql

Function: tMysqlClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tMysqlConnection component in the list if more than one connection are planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Mysql components, especially with tMysqlConnection and tMysqlCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tMysqlColumnList
tMysqlColumnList Properties
Component family: Databases/MySQL

Function: Iterates on all columns of a given table through a defined Mysql connection.

Purpose: Lists all column names of a given Mysql table.

Basic settings

Component list: Select the tMysqlConnection component in the list if more than one connection are planned for the current job.

Table name: Enter the name of the table.

Usage: This component is to be used along with Mysql components, especially with tMysqlConnection.

Limitation: n/a

Scenario: Iterating on a DB table and listing its column names


The following Java scenario creates a five-component job that iterates on a given table name from a Mysql database using a Where clause and lists all column names present in the table. Drop the following components from the Palette onto the design workspace: tMysqlConnection, tMysqlTableList, tMysqlColumnList, tFixedFlowInput, and tLogRow. Connect tMysqlConnection to tMysqlTableList using an OnSubjobOk link. Connect tMysqlTableList, tMysqlColumnList, and tFixedFlowInput using Iterate links. Connect tFixedFlowInput to tLogRow using a Row Main link.


In the design workspace, select tMysqlConnection and click the Component tab to define its basic settings. In the Basic settings view, set the database connection details manually or select them from the context variable list, through a Ctrl+Space click in the corresponding field if you have stored them locally as Metadata DB connection entries. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide.

In this example, we want to connect to a Mysql database called customers. In the design workspace, select tMysqlTableList and click the Component tab to define its basic settings.

On the Component list, select the relevant Mysql connection component if more than one connection is used. Enter a Where clause using the right syntax in the corresponding field to iterate on the table name(s) you want to list on the console. In this scenario, the table we want to iterate on is called customer. In the design workspace, select tMysqlColumnList and click the Component tab to define its basic settings.


On the Component list, select the relevant Mysql connection component if more than one connection is used. In the Table name field, enter the name of the DB table whose column names you want to list. In this scenario, we want to list the columns present in the DB table called customer. In the design workspace, select tFixedFlowInput and click the Component tab to define its basic settings. Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to define the data you want to use as input. In this scenario, the schema is made of two columns, the first for the table name and the second for the column name.

Click OK to close the dialog box, and accept propagating the changes when prompted by the system. The defined columns display in the Values panel of the Basic settings view. Click in the Value cell for each of the two defined columns and press Ctrl+Space to access the global variable list. From the global variable list, select ((String)globalMap.get("tMysqlTableList_1_CURRENT_TABLE")) and ((String)globalMap.get("tMysqlColumnList_1_COLUMN_NAME")) for the TableName and ColumnName respectively.

In the design workspace, select tLogRow. Click the Component tab and define the basic settings for tLogRow as needed. Save your job and press F6 to execute it.


The name of the DB table is displayed on the console along with all its column names.
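The two expressions chosen in the Values panel read from globalMap, the job-wide map in which components publish their current values. The sketch below imitates that mechanism with a plain HashMap so that the casts can be tried outside the Studio. The nested loops stand in for the Iterate links; the class GlobalMapSketch, its methods and the sample column names are assumptions made for illustration, and only the two key strings come from the scenario.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GlobalMapSketch {

    /** Stand-in for the job-wide map that components write their values to. */
    static final Map<String, Object> globalMap = new HashMap<>();

    /** Plays the role of tFixedFlowInput: one row from the current iteration values. */
    static String currentRow() {
        String tableName = ((String) globalMap.get("tMysqlTableList_1_CURRENT_TABLE"));
        String columnName = ((String) globalMap.get("tMysqlColumnList_1_COLUMN_NAME"));
        return tableName + "|" + columnName;
    }

    /** Outer loop plays tMysqlTableList, inner loop plays tMysqlColumnList. */
    static List<String> iterate(Map<String, List<String>> columnsByTable) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> table : columnsByTable.entrySet()) {
            globalMap.put("tMysqlTableList_1_CURRENT_TABLE", table.getKey());
            for (String column : table.getValue()) {
                globalMap.put("tMysqlColumnList_1_COLUMN_NAME", column);
                rows.add(currentRow());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sample = new HashMap<>();
        sample.put("customer", Arrays.asList("id", "name", "city")); // hypothetical columns
        for (String row : iterate(sample)) {
            System.out.println(row);
        }
    }
}
```

Because each inner iteration overwrites the same key, a value read from globalMap is only meaningful while the corresponding iteration is running.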


tMysqlCommit
tMysqlCommit Properties
This component is closely related to tMysqlConnection and tMysqlRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family: Databases/MySQL

Function: Validates the data processed through the job into the connected DB.

Purpose: Using a unique connection, this component commits in one go a global transaction instead of doing that on every row or every batch and thus provides gain in performance.

Basic settings

Component list: Select the tMysqlConnection component in the list if more than one connection are planned for the current job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tMysqlCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Mysql components, especially with tMysqlConnection and tMysqlRollback components.

Limitation: n/a

Related scenario
This component is closely related to tMysqlConnection and tMysqlRollback. It usually doesn't make much sense to use one of these without using a tMysqlConnection component to open a connection for the current transaction. For a tMysqlCommit related scenario, see Scenario: Inserting data in mother/daughter tables.
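The idea of committing a global transaction in one go, rather than on every row, can be sketched with plain JDBC. This is not the code generated by the Studio; it is a minimal illustration in which the class SingleCommitSketch, its methods and the customer table are hypothetical. The recorder helper stands in for a real connection so that the order of JDBC calls can be inspected without a MySQL server, and the closeConnection flag mirrors the Close Connection check box.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SingleCommitSketch {

    /** Inserts every name, then commits once; rolls back if any insert fails. */
    static void insertAll(Connection conn, List<String> names, boolean closeConnection)
            throws SQLException {
        conn.setAutoCommit(false); // rows are no longer committed one by one
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO customer (name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.executeUpdate();
            }
            conn.commit(); // a single commit for the whole set of rows
        } catch (SQLException e) {
            conn.rollback(); // nothing is kept if any row fails
            throw e;
        } finally {
            if (closeConnection) {
                conn.close();
            }
        }
    }

    /** Dry-run stand-in: records JDBC method names instead of contacting a server. */
    @SuppressWarnings("unchecked")
    static <T> T recorder(Class<T> type, List<String> calls) {
        return (T) Proxy.newProxyInstance(
            SingleCommitSketch.class.getClassLoader(),
            new Class<?>[] {type},
            (proxy, method, args) -> {
                calls.add(method.getName());
                Class<?> result = method.getReturnType();
                if (result == PreparedStatement.class) {
                    return recorder(PreparedStatement.class, calls);
                }
                if (result == int.class) {
                    return 1;
                }
                if (result == boolean.class) {
                    return false;
                }
                return null;
            });
    }

    /** Runs insertAll against the recorder and returns the method names it saw. */
    static List<String> dryRun(List<String> names, boolean closeConnection) {
        List<String> calls = new ArrayList<>();
        try {
            insertAll(recorder(Connection.class, calls), names, closeConnection);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(dryRun(Arrays.asList("Ann", "Bob"), true));
    }
}
```

With a real connection, the same insertAll method would issue one commit regardless of how many rows are inserted.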


tMysqlConnection
tMysqlConnection Properties
This component is closely related to tMysqlCommit and tMysqlRollback. It usually doesn't make much sense to use one of these without using a tMysqlConnection component to open a connection for the current transaction.
Component family: Databases/MySQL

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: set or type in the shared connection name.

Usage: This component is to be used along with Mysql components, especially with tMysqlCommit and tMysqlRollback components.

Limitation: n/a

Scenario: Inserting data in mother/daughter tables


The following Job is dedicated to advanced database users, who want to carry out multiple table insertions using a parent table id to feed a child table. As a prerequisite to this Job, follow the steps described below to create the relevant tables using an engine such as innodb. In a command line editor, connect to your Mysql server.


Once connected to the relevant database, type in the following command to create the parent table:
create table f1090_mum(id int not null auto_increment, name varchar(10), primary key(id)) engine=innodb;
Then create the second table:
create table f1090_baby (id_baby int not null, years int) engine=innodb;
Back in Talend Open Studio, the Job requires seven components including tMysqlConnection and tMysqlCommit.

Drag and drop the following components from the Palette: tFileList, tFileInputDelimited, tMap, tMysqlOutput (x2). Connect the tFileList component to the input file component using an Iterate link, as the name of the file to be processed will be dynamically filled in from the tFileList directory using a global variable. Connect the tFileInputDelimited component to the tMap and dispatch the flow between the two output Mysql DB components. Use a Row link for each of these connections representing the main data flow. Set the tFileList component properties, such as the name of the directory from which files will be fetched. Add a tMysqlConnection component and connect it to the starter component of this job (in this example, the tFileList component) using an OnComponentOk link to define the execution order. In the tMysqlConnection Component view, set the connection details manually or fetch them from the Repository if you centrally stored them as a Metadata DB connection entry. For more information about Metadata, see How to centralize the Metadata items in Talend Open Studio User Guide. On the tFileInputDelimited component's Basic settings panel, press Ctrl+Space bar to access the variable list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH


Set the rest of the fields as usual, defining the row and field separators according to your file structure. Then set the schema manually through the Edit schema feature or select the schema from the Repository. In Java version, make sure the data type is correctly set, in accordance with the nature of the data processed. In the tMap Output area, add two output tables, one called mum for the parent table, the second called baby, for the child table. Drag the Name column from the Input area, and drop it to the mum table. Drag the Years column from the Input area and drop it to the baby table.

Make sure the mum table is placed above the baby table, as the order of the output tables determines the flow sequence and therefore whether the DB inserts are performed correctly. Then connect the output row links so that each flow goes to the relevant DB output component. In each tMysqlOutput component's Basic settings panel, select the Use an existing connection check box to retrieve the tMysqlConnection details. Notice (in the Perl version) that the Commit every field no longer shows, as you are supposed to use tMysqlCommit instead to manage the global transaction commit. In the Java version, ignore the field, as this setting is overridden by tMysqlCommit.


Set the Table name, making sure it corresponds to the correct table, in this example either f1090_mum or f1090_baby. No action on the tables is needed, as they are already created. Select Insert as Action on data for both output components. Click on Sync columns to retrieve the schema set in the tMap. In the Additional columns area of the DB output component corresponding to the child table (f1090_baby), set the id_baby column so that it reuses the id from the parent table. In the SQL expression field, type in: '(Select Last_Insert_id())' The position is Before and the Reference column is years. Add the tMysqlCommit component to the design workspace and connect it from the tFileList component using an OnComponentOk connection in order for the Job to terminate with the transaction commit. On the tMysqlCommit Component view, select in the list the connection to be used. Save your Job and press F6 to execute it.

The parent table id has been reused to feed the id_baby column.
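The Additional columns setting in this scenario places an SQL expression in the statement where a bound parameter would otherwise go. The sketch below shows the kind of INSERT statements this configuration amounts to; the exact text generated by the Studio may differ. The class MotherDaughterSketch and its method insertWithExpression are hypothetical helpers, while the table and column names come from the scenario.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MotherDaughterSketch {

    /**
     * Builds an INSERT in which one extra column, placed before the reference
     * column, is filled by an SQL expression and all other columns by '?'.
     */
    static String insertWithExpression(String table, List<String> columns,
                                       String extraColumn, String expression,
                                       String referenceColumn) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String column : columns) {
            if (column.equals(referenceColumn)) { // position "Before" the reference column
                names.add(extraColumn);
                values.add(expression);
            }
            names.add(column);
            values.add("?");
        }
        return "INSERT INTO " + table + " (" + String.join(", ", names)
            + ") VALUES (" + String.join(", ", values) + ")";
    }

    public static void main(String[] args) {
        // Parent row first: its id is generated by auto_increment.
        System.out.println("INSERT INTO f1090_mum (name) VALUES (?)");
        // Child row next: id_baby reuses the id generated on the same connection.
        System.out.println(insertWithExpression("f1090_baby",
            Collections.singletonList("years"),
            "id_baby", "(SELECT LAST_INSERT_ID())", "years"));
    }
}
```

LAST_INSERT_ID() is tracked per connection in MySQL, which is why both output components need to share the connection opened by tMysqlConnection and why the parent insert has to run before the child insert.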

tMysqlInput
tMysqlInput properties
Component family: Databases/MySQL

Function: tMysqlInput reads a database and extracts fields based on a query.

Purpose: tMysqlInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tMysqlConnection component.
  When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Otherwise, you can as well deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive all over through the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Table Name: Name of the table to be read.

Query type and Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
  When you need to handle data of the time-stamp type 0000-00-00 00:00:00 using this component, set the parameter as: noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.

Enable stream: When selected, helps to decide the row set to work with at a time and thus optimize performance.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Mysql databases.
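The Additional JDBC parameters value is a list of name=value pairs joined with &, which MySQL Connector/J accepts as the query part of the connection URL. The sketch below shows how such a URL could be put together. The class JdbcUrlSketch, its method and the sample host, port and database are hypothetical, and the way the Studio builds its own URL may differ.

```java
final class JdbcUrlSketch {

    /** Appends the additional parameters, if any, after a '?' separator. */
    static String mysqlUrl(String host, int port, String database, String additionalParams) {
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (additionalParams != null && !additionalParams.trim().isEmpty()) {
            url += "?" + additionalParams.trim();
        }
        return url;
    }

    public static void main(String[] args) {
        // Parameters taken from the zero-timestamp note in the Advanced settings.
        System.out.println(mysqlUrl("localhost", 3306, "customers",
            "noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull"));
    }
}
```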

Scenario: Writing dynamic columns from a MySQL database to an output file


The Dynamic Schema feature is only functional in Talend Integration Suite Studio. You can reproduce this scenario only if you are using any edition of Talend Integration Suite Studio.

In this scenario we will read dynamic columns from a MySQL database, map them and then write them to a table in a local output file. By defining a dynamic column alongside known column names, we can retrieve all of the columns from the database table, including the unknown columns. Drop a tMysqlInput, a tMap and a tFileOutputDelimited component onto the workspace.


Link tMysqlInput to tMap using a Row > Main connection. Link tMap to tFileOutputDelimited using a Row > *New Output* (Main) connection. Double-click tMysqlInput to open its Basic Settings view in the Component tab.

The dynamic schema feature is only supported in Built-In mode.

Select Built-in as the Property Type. Select the DB Version from the corresponding list. Next to Host, enter the database server IP address. Next to Port, enter the listening port number of the database server. Enter your authentication data in the Username and Password fields. Set the Schema type as Built-in and click Edit schema to define the dynamic schema. The schema editor opens:


Click the [+] button to add a row to the schema.

Under Column and Db Column, click in the fields to enter the corresponding column names. Click the field under Type to define the type of data. Click the arrow and select Dynamic from the list.
Under Type, the dynamic column type must be set as Dynamic.

Click OK to close the schema editor. Next to the Table Name field, click the [...] button to select the database table of interest. A dialog box displays a tree diagram of all the tables in the selected database:

Click the table of interest and then click OK to close the dialog box. Set the Query Type as Built-In. In the Query box, enter the query required to retrieve all of the columns from the table.
In the SELECT statement it is necessary to use the * wildcard character, to retrieve all of the columns from the selected table.

Click tMap to open its Basic Settings view in the Component tab. Click [...] next to Map Editor to map the column from the source file.


Drop the column defined as dynamic from the input schema on the left onto the output schema on the right. The column dropped on the output schema retains its original values.
The dynamic column must be mapped on a one to one basis and cannot undergo any transformations. It cannot be used in a filter expression or in a variables section. It cannot be renamed in the output table and cannot be used as a join condition.

Click OK to close the Map Editor. Double-click tFileOutputDelimited to set its Basic Settings in the Component tab.


Next to the File Name field, click the [...] button to browse to the directory where you want to save the output file, then enter a name for the file. Select the Include Header check box to retrieve the column names as well as the data. Save the Job and press F6 to run it. The output file is written with all the column names and corresponding data, retrieved from the database via the dynamic schema:

The Job can also be run in the Traces Debug mode, which allows you to view the rows as they are written to the output file, in the workspace.

For further information about defining and mapping dynamic schemas, see Dynamic schema in the Talend Open Studio User Guide. For related scenarios, see: Scenario: Dynamic context use in MySQL DB insert. Scenario 1: Displaying selected data from DB table. Scenario 2: Using StoreSQLQuery variable.


tMysqlLastInsertId
tMysqlLastInsertId properties
Component family: Databases

Function: tMysqlLastInsertId fetches the last inserted ID from a selected MySQL Connection.

Purpose: tMysqlLastInsertId obtains the primary key value of the record that was last inserted in a Mysql table by a user.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
  Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flow charts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Component list: Select the relevant tMysqlConnection component in the list if more than one connection is planned for the current job.

Usage: This component is to be used as an intermediary component.
  If you use this component with tMysqlOutput, verify that the Extend Insert check box in the Advanced Settings tab is not selected. Extend Insert allows you to make a batch insertion, however, if the check box is selected, only the ID of the last line in the last batch will be returned.

Limitation: n/a

Scenario: Get the ID for the last inserted record


The following Java scenario creates a job that opens a connection to a Mysql database, writes the defined data into the database, and finally fetches the last inserted ID on the existing connection. Drop the following components from the Palette onto the design workspace: tMysqlConnection, tMysqlCommit, tFileInputDelimited, tMysqlOutput, tMysqlLastInsertId, and tLogRow. Connect tMysqlConnection to tFileInputDelimited using an OnSubjobOk link. Connect tFileInputDelimited to tMysqlCommit using an OnSubjobOk link. Connect tFileInputDelimited to the three other components using Row Main links.

In the design workspace, select tMysqlConnection. Click the Component tab to define the basic settings for tMysqlConnection. In the Basic settings view, set the connection details manually or select them from the context variable list, through a Ctrl+Space click in the corresponding field if you stored them locally as Metadata DB connection entries. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide.

In the design workspace, select tMysqlCommit and click the Component tab to define its basic settings. On the Component List, select the relevant tMysqlConnection if more than one connection is used. In the design workspace, select tFileInputDelimited. Click the Component tab to define the basic settings of tFileInputDelimited.


Set Property Type to Built-In. Fill in a path to the processed file in the File Name field. The file used in this example is Customers. Define the Row separator that identifies the end of a row. Then define the Field separator used to delimit fields in a row. Set the header, the footer and the number of processed rows as necessary. In this scenario, we have one header. Set Schema to Built-In and click the three-dot button next to Edit Schema to define the data to pass on to the next component. Related topics: How to set a built-in schema and How to set a repository schema of Talend Open Studio User Guide.

In this scenario, the schema consists of two columns, name and age. The first holds the names of three employees and the second holds the corresponding age for each. In the design workspace, select tMysqlOutput. Click the Component tab to define the basic settings of tMysqlOutput.


Select the Use an existing connection check box. In the Table field, enter the name of the table to which the employee list is to be written, in this example: employee. Select relevant actions on the Action on table and Action on data lists. In this example, no action is carried out on table, and the action carried out on data is Insert. Set Schema to Built-In and click Sync columns to synchronize columns with the previous component. In this example, the schema to be inserted into the MySQL database table consists of the two columns name and age.

In the design workspace, select tMysqlLastInsertId. Click the Component tab to define the basic settings of tMysqlLastInsertId.

On the Component List, select the relevant tMysqlConnection, if more than one connection is used.


Set Schema to Built-In and click Sync columns to synchronize columns with the previous component. In the output schema of tMysqlLastInsertId, you can see the read-only column last_insert_id that will fetch the last inserted ID on the existing connection.

In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow. Save your job and press F6 to execute it.

tMysqlLastInsertId fetched the last inserted ID for each line on the existing connection.


tMysqlOutput
tMysqlOutput properties
Component family Databases/MySQL

Function Purpose

tMysqlOutput writes, updates, makes changes or suppresses entries in a database. tMysqlOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job. Property type Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved. DB Version Select the MySQL version you are using. Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide. Use an existing connection Select this check box when using a configured tMysqlConnection component. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can as well deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive all over through the two Job levels. For more information about Dynamic settings, see your studio user guide. Host Port Database Database server IP address. Listening port number of DB server. Name of the database.

Basic settings

Talend Open Studio Components

609

Database components
tMysqlOutput

Username and Password

DB user authentication data.

Table

Name of the table to be written. Note that only one table can be written at a time.

Action on table

On the table defined, you can perform one of the following operations:
- Default: No operation is carried out.
- Drop and create a table: The table is removed and created again.
- Create a table: The table does not exist and gets created.
- Create a table if not exists: The table is created if it does not exist.
- Drop a table if exists and create: The table is removed if it already exists and created again.
- Clear a table: The table content is deleted.
- Truncate table: The table content is quickly deleted. However, you will not be able to roll back the operation.

Action on data

On the data of the table defined, you can perform:
- Insert: Add new entries to the table. If duplicates are found, the job stops.
- Update: Make changes to existing entries.
- Insert or update: Add entries or update existing ones.
- Update or insert: Update existing entries or create them if they do not exist.
- Delete: Remove entries corresponding to the input flow.
- Replace: Add new entries to the table. If an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.
- Insert or update on duplicate key or unique index: Add entries if the inserted value does not exist or update entries if the inserted value already exists and there is a risk of violating a unique index or primary key.
- Insert Ignore: Add only new rows to prevent duplicate key errors.
You must specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can define separate keys for the update and delete operations. To do that: select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.


Schema and Edit schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Die on error

This check box is selected by default. Clear the check box to skip the row in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Extend Insert

Select this check box to carry out a bulk insert of a defined set of lines instead of inserting lines one by one. The gain in system performance is considerable.
Number of rows per insert: enter the number of rows to be inserted per operation. Note that the higher the value specified, the more performance may degrade because of the increased memory demands.
This option is not compatible with the Reject link. You should therefore clear the check box if you are using a Row > Rejects link with this component.
If you are using this component with tMysqlLastInsertID, ensure that the Extend Insert check box in Advanced Settings is not selected. Extend Insert allows for batch loading; however, if the check box is selected, only the ID of the last line of the last batch will be returned.

Use batch size

Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Update or the Delete option in the Action on data field.


Commit every

Number of rows to be included in the batch before it is committed to the DB. This option ensures transaction quality (but not rollback) and, above all, a higher performance level.

Additional Columns

This option is not available if you have just created the DB table (even if you delete it beforehand). This option allows you to call SQL functions to perform actions on columns, provided that these are not insert, update or delete actions, or actions that require pre-processing.
- Name: Type in the name of the schema column to be altered or inserted.
- SQL expression: Type in the SQL statement to be executed in order to alter or insert the data in the corresponding column.
- Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
- Reference column: Type in a reference column that tMysqlOutput can use to locate or replace the new column, or the column to be modified.

Use field options

Select this check box to customize a request, particularly if multiple actions are being carried out on the data.

Use Hint Options

Select this check box to activate the hint configuration area, which helps you optimize a query's execution. In this area, the parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.

Enable debug mode

Select this check box to display each step involved in the process of writing data in the database.

Use duplicate key update mode insert

Updates the values of the columns specified, in the event of duplicate primary keys.
- Column: Between double quotation marks, enter the name of the column to be updated.
- Value: Enter the action you want to carry out on the column.
To use this option you must first of all select the Insert mode in the Action on data list found in the Basic Settings view.

tStatCatcher Statistics

Select this check box to collect log data at the component level.


Usage

This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a MySQL database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
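Several of the Action on data options described in the properties correspond closely to MySQL statement forms. The sketch below, written in Python purely for illustration, builds the statement text that each option roughly maps to. The function name build_statement and the action keys are hypothetical, and the SQL actually generated by the component may differ.

```python
def build_statement(action, table, columns, key_columns):
    """Return a parameterized MySQL statement for one 'Action on data' choice.

    The action keys are hypothetical labels chosen for this sketch.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    values = f"({column_list}) VALUES ({placeholders})"
    if action == "insert":
        # A duplicate key makes this statement fail.
        return f"INSERT INTO {table} {values}"
    if action == "insert_ignore":
        # Rows that would duplicate a key are skipped.
        return f"INSERT IGNORE INTO {table} {values}"
    if action == "replace":
        # An old row with the same PRIMARY KEY or UNIQUE value is deleted first.
        return f"REPLACE INTO {table} {values}"
    if action == "insert_or_update_on_duplicate_key":
        # Non-key columns are overwritten when the key already exists.
        updates = ", ".join(
            f"{c} = VALUES({c})" for c in columns if c not in key_columns
        )
        return f"INSERT INTO {table} {values} ON DUPLICATE KEY UPDATE {updates}"
    raise ValueError(f"action not covered by this sketch: {action}")


columns = ["id", "CustomerName", "CustomerAddress", "idState"]
for action in ("insert", "insert_ignore", "replace",
               "insert_or_update_on_duplicate_key"):
    print(build_statement(action, "customers", columns, ["id"]))
```

Update and Delete are left out here because they depend on the key columns defined in the schema.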

Scenario 1: Adding a new column and altering data in a DB table


This Java scenario is a three-component job that creates random data using a tRowGenerator, duplicates a column to be altered using the tMap component, and finally alters the data to be inserted based on an SQL expression using the tMysqlOutput component.
- Drop the following components from the Palette onto the design workspace: tRowGenerator, tMap and tMysqlOutput.
- Connect tRowGenerator, tMap, and tMysqlOutput using the Row Main link.

In the design workspace, select tRowGenerator to display its Basic settings view.

Set the Schema to Built-In. Click the Edit schema three-dot button to define the data to pass on to the tMap component, two columns in this scenario, name and random_date.


Click OK to close the dialog box. Click the RowGenerator Editor three-dot button to open the editor and define the data to be generated.

Click in the corresponding Functions fields and select a function for each of the two columns, getFirstName for the first column and getRandomDate for the second column. In the Number of Rows for RowGenerator field, enter 10 to generate ten first name rows and click OK to close the editor. Double-click the tMap component to open the Map editor. The Map editor opens displaying the input metadata of the tRowGenerator component.


In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and define the first as random_date and the second as random_date1.

In this scenario, we want to duplicate the random_date column and adapt the schema in order to alter the data in the output component. In the Map editor, drag the random_date row from the input table to the random_date and random_date1 rows in the output table.


Click OK to close the editor. In the design workspace, double-click the tMysqlOutput component to display its Basic settings view and set its parameters.

Set Property Type to Repository and then click the three-dot button to open the [Repository content] dialog box and select the correct DB connection. The connection details display automatically in the corresponding fields.
If you have not stored the DB connection details in the Metadata entry in the Repository, select Built-in on the property type list and set the connection detail manually.

Click the three-dot button next to the Table field and select the table to be altered, Dates in this scenario.


On the Action on table list, select Drop table if exists and create, and on the Action on data list, select Insert. If needed, click Sync columns to synchronize with the columns coming from the tMap component. Click the Advanced settings tab to display the corresponding view and set the advanced parameters.

In the Additional Columns area, set the alteration to be performed on columns. In this scenario, the One_Month_Later column replaces random_date1. Also, the data itself gets altered using an SQL expression that adds one month to the randomly generated date of the random_date1 column. ex: 2007-08-12 becomes 2007-09-12.
- Enter One_Month_Later in the Name cell.
- In the SQL expression cell, enter the relevant addition script to be performed, adddate(random_date, interval 1 month) in this scenario.
- Select Replace on the Position list.
- Enter random_date1 on the Reference column list.
For this job we duplicated the random_date column in the DB table before replacing one instance of it with the One_Month_Later column. The aim of this workaround was to be able to view upfront the modification performed.

Save your job and press F6 to execute it. The new One_Month_Later column replaces the random_date1 column in the DB table and holds each randomly generated date with one month added.
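The effect of the Additional Columns entry on the insert statement can be sketched as follows. This is an illustration in Python under stated assumptions, not the code generated by the Studio: the helper name apply_additional_column is hypothetical, schema columns are assumed to be bound as ? placeholders, and the additional column is assumed to be filled by its SQL expression.

```python
def apply_additional_column(columns, name, sql_expression, position, reference):
    """Return (column names, value expressions) after one Additional Columns entry."""
    if position not in ("Before", "Replace", "After"):
        raise ValueError(f"unknown position: {position}")
    if reference not in columns:
        raise ValueError(f"unknown reference column: {reference}")
    names, values = [], []
    for column in columns:
        if column != reference:
            names.append(column)
            values.append("?")
        elif position == "Before":
            names += [name, column]
            values += [sql_expression, "?"]
        elif position == "After":
            names += [column, name]
            values += ["?", sql_expression]
        else:  # Replace: the reference column gives way to the new column
            names.append(name)
            values.append(sql_expression)
    return names, values


names, values = apply_additional_column(
    ["random_date", "random_date1"],
    "One_Month_Later",
    "adddate(random_date, interval 1 month)",
    "Replace",
    "random_date1",
)
statement = f"INSERT INTO Dates ({', '.join(names)}) VALUES ({', '.join(values)})"
print(statement)
```

With the Replace position, random_date1 disappears from the column list and One_Month_Later takes its place, computed from the random_date value of the same row.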

Related topic: tDBOutput properties.
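The date arithmetic used in Scenario 1 (2007-08-12 becomes 2007-09-12) can be checked with a small Python sketch. The function add_one_month is a hypothetical stand-in for the adddate(..., interval 1 month) expression; it assumes the MySQL behavior of adjusting the day to the last day of the target month when that month is shorter.

```python
import calendar
from datetime import date


def add_one_month(d):
    """Add one calendar month, clamping the day to the end of the new month."""
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(add_one_month(date(2007, 8, 12)))   # 2007-09-12
print(add_one_month(date(2007, 1, 31)))   # 2007-02-28
```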



Scenario 2: Updating data in a database table


This Java scenario describes a two-component Job that updates data in a MySQL table according to that in a delimited file. Drop tFileInputDelimited and tMysqlOutput from the Palette onto the design workspace. Connect the two components together using a Row Main link.

Double-click tFileInputDelimited to display its Basic settings view and define the component properties. From the Property Type list, select Repository if you have already stored the metadata of the delimited file in the Metadata node in the Repository tree view. Otherwise, select Built-In to define the metadata of the delimited file manually. For more information about storing metadata, see Setting up a File Delimited schema in the Talend Open Studio User Guide.

In the File Name field, click the three-dot button and browse to the source delimited file that contains the modifications to propagate in the MySQL table. In this example, we use the customer_update file that holds four columns: id, CustomerName, CustomerAddress and idState. Some of the data in these four columns is different from that in the MySQL table.


Define the row and field separators used in the source file in the corresponding fields. If needed, set Header, Footer and Limit. In this example, Header is set to 1 because the first row holds the column names and should therefore be ignored. Also, the number of processed lines is limited to 2000. Select Built-in from the Schema list, then click the three-dot button next to Edit Schema to open a dialog box where you can describe the data structure of the source delimited file that you want to pass to the component that follows.

Select the Key check box(es) next to the column name(s) you want to define as key column(s).
It is necessary to define at least one column as a key column for the Job to be executed correctly. Otherwise, the Job is automatically interrupted and an error message displays on the console.

In the design workspace, double-click tMysqlOutput to open its Basic settings view where you can define its properties.


- Click Sync columns to retrieve the schema of the preceding component. If needed, click the three-dot button next to Edit schema to open a dialog box where you can check the retrieved schema.
- From the Property Type list, select Repository if you have already stored the connection metadata in the Metadata node in the Repository tree view. Otherwise, select Built-In to define the connection information manually. For more information about storing metadata, see Setting up a DB connection in the Talend Open Studio User Guide.
- Fill in the database connection information in the corresponding fields.
- In the Table field, enter the name of the table to update.
- From the Action on table list, select the operation you want to perform, None in this example since the table already exists.
- From the Action on data list, select the operation you want to perform on the data, Update in this example.
- Save your Job and press F6 to execute it.


Using your DB browser, you can verify that the MySQL table, customers, has been modified according to the delimited file. In the above example, the database table still has the four columns id, CustomerName, CustomerAddress and idState, but certain fields have been modified according to the data in the delimited file used.
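The role of the key column in this scenario can be sketched in Python. The function build_update is hypothetical and only illustrates the idea: the key columns of the schema end up in the WHERE clause, the other columns in the SET clause, and an update without any key column is refused. The statement produced by the component itself may be worded differently.

```python
def build_update(table, schema):
    """Build an UPDATE statement from (column name, is_key) pairs in schema order."""
    keys = [name for name, is_key in schema if is_key]
    others = [name for name, is_key in schema if not is_key]
    if not keys:
        # Mirrors the rule that at least one key column must be defined.
        raise ValueError("at least one key column is required for Update")
    assignments = ", ".join(f"{name} = ?" for name in others)
    condition = " AND ".join(f"{name} = ?" for name in keys)
    return f"UPDATE {table} SET {assignments} WHERE {condition}"


schema = [
    ("id", True),
    ("CustomerName", False),
    ("CustomerAddress", False),
    ("idState", False),
]
print(build_update("customers", schema))
```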

Scenario 3: Retrieve data in error with a Reject link


This scenario describes a four-component Job that carries out migration from a customer file to a MySQL database table and redirects data in error towards a CSV file using a Reject link.

In the Repository, select the customer file metadata that you want to migrate and drop it onto the workspace. In the [Components] dialog box, select tFileInputDelimited and click OK. The component properties will be filled in automatically.


If you have not stored the information about your customer file under the Metadata node in the Repository, drop a tFileInputDelimited component from the File > Input family in the Palette, and fill in its properties manually in the Component tab.
- From the Palette, drop a tMap from the Processing family onto the workspace.
- In the Repository, expand the Metadata node, followed by the Db Connections node and select the connection required to migrate your data to the appropriate database. Drop it onto the workspace. In the [Components] dialog box, select tMysqlOutput and click OK. The database connection properties will be automatically filled in.
If you have not stored the database connection details under the Db Connections node in the Repository, drop a tMysqlOutput from the Databases family in the Palette and fill in its properties manually in the Component tab. For more information, see How to set a built-in schema and How to set a repository schema in the Talend Open Studio User Guide.
- From the Palette, select a tFileOutputDelimited from the File > Output family, and drop it onto the workspace.
- Link the customers component to the tMap component, and the tMap component to the Localhost component, using Row Main links. Name this second link out.
- Link the Localhost component to the tFileOutputDelimited component using a Row > Reject link.
- Double-click the customers component to display the Component view.

In the Property Type list, select Repository and click the [...] button in order to select the metadata containing the connection to your file. You can also select the Built-in mode and fill in the fields manually. Click the [...] button next to the File Name field, and fill in the path and the name of the file you want to use. In the Row and Field Separator fields, type in between inverted commas the row and field separator used in the file. In the Header, Footer and Limit fields, type in the number of headers and footers to ignore, and the number of rows to which processing should be limited. In the Schema list, select Repository and click the [...] button in order to select the schema of your file, if it is stored under the Metadata node in the Repository. You can also click the [...] button next to the Edit schema field, and set the schema manually.


The schema is as follows:

Double-click the tMap component to open its editor.

Select the id, CustomerName, CustomerAddress, idState, id2, RegTime and RegisterTime columns in the table on the left and drop them on the out table, on the right.


In the Schema editor area, at the bottom of the tMap editor, in the right table, change the length of the CustomerName column to 28 to create an error. Thus, any data for which the length is greater than 28 will create errors, retrieved with the Reject link. Click OK. In the workspace, double-click the output Localhost component to display its Component view.

In the Property Type list, select Repository and click the [...] button to select the connection to the database metadata. The connection details will be automatically filled in. You can also select the Built-in mode and set the fields manually. In the Table field, type in the name of the table to be created. In this scenario, we call it customers_data. In the Action on table list, select the Create table option. Click the Sync columns button to retrieve the schema from the previous component.


Make sure the Die on error check box is not selected, so that the Job can be executed despite the error you just created. Click the Advanced settings tab of the Component view to set the advanced parameters of the component.

Clear the Extend Insert check box, which enables you to insert rows in batch, because this option is not compatible with the Reject link. Double-click the tFileOutputDelimited component to set its properties in the Component view.

Click the [...] button next to the File Name field to fill in the path and name of the output file. Click the Sync columns button to retrieve the schema of the previous component. Save your Job and press F6 to execute it.


The rows in error are sent to the delimited file, together with the type of error encountered. Here, the error is: Data truncation.
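The routing performed in this scenario can be pictured with a short Python sketch. It is only an illustration: split_rejects and the errorMessage field are hypothetical names, the length check is done in Python rather than by the database, and the sample rows are invented for the example.

```python
def split_rejects(rows, column, max_length):
    """Separate rows that fit the column length from rows that would be rejected."""
    accepted, rejected = [], []
    for row in rows:
        if len(row[column]) > max_length:
            # The rejected row keeps its data and gains an error description.
            rejected.append(dict(row, errorMessage="Data truncation"))
        else:
            accepted.append(row)
    return accepted, rejected


rows = [
    {"id": 1, "CustomerName": "Griffith Paving"},
    {"id": 2, "CustomerName": "Bill's Dive Shop and Sealcoating Co."},
]
accepted, rejected = split_rejects(rows, "CustomerName", 28)
print(len(accepted), "row(s) written,", len(rejected), "row(s) rejected")
```

Rows in the accepted list correspond to what reaches the customers_data table, and rows in the rejected list correspond to what the Reject link sends to the delimited file.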


tMysqlOutputBulk
tMysqlOutputBulk properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement used to feed a database. These two steps are fused together in the tMysqlOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.

Component family

Databases/MySQL

Function

Writes a file with columns based on the defined delimiter and the MySQL standards.

Purpose

Prepares the file to be used as a parameter in the INSERT query to feed the MySQL database.

Basic settings

Property type

Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name

Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Field separator

Character, string or regular expression to separate fields.

Row separator

String (ex: "\n" on Unix) to distinguish rows.

Append

Select this check box to add the new rows at the end of the file.

Include header

Select this check box to include the column header in the file.

Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.


Encoding

Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Usage

This component is to be used along with the tMysqlBulkExec component. Used together, they offer gains in performance while feeding a MySQL database.

Scenario: Inserting transformed data in MySQL database


This scenario describes a four-component job that feeds a database with data contained in a file, including transformed data. Two steps are required in this job: the first step creates the file, which is then used in the second step. The first step includes a transformation phase of the data included in the file.

Drag and drop a tRowGenerator, a tMap, a tMysqlOutputBulk as well as a tMysqlBulkExec component. Connect the main flow using Row Main links, and connect the start component (tRowGenerator in this example) to the tMysqlBulkExec using a trigger connection of type OnComponentOk. A tRowGenerator is used to generate random data. Double-click the tRowGenerator component to launch the editor. Define the schema of the rows to be generated and the nature of data to generate. In this example, the clients file to be produced will contain the following columns: ID, First Name, Last Name, Address, City, which are all defined as string data except the ID, which is of integer type.


Some schema information does not necessarily need to be displayed. To hide it, click the Columns list button next to the toolbar, and clear the relevant entries, such as Precision or Parameters. Use the plus button to add as many columns as needed to your schema definition. Click the Refresh button to preview the first generated row of your output. Then select the tMap component to set the transformation. Drag and drop all columns from the input table to the output table.

Apply the transformation on the LastName column by adding .toUpperCase() in its expression field. Click OK to validate the transformation. Then double-click on the tMysqlOutputBulk component. Define the name of the file to be produced in File Name field. If the delimited file information is stored in the Repository, select it in Property Type field, to retrieve relevant data. In this use case the file name is clients.txt.


The schema is propagated from the tMap component, if you accepted it when prompted. In this example, do not include the header information as the table should already contain it. The encoding is the default one for this use case. Click OK to validate the output. Then double-click the tMysqlBulkExec component to set the INSERT query to be executed. Define the database connection details. We recommend that you store this type of information in the Repository, so that you can retrieve it at any time for any job.

Set the table to be filled in with the collected data, in the Table field. Fill in the column delimiters in the Field terminated by area. Make sure the encoding corresponds to the data encoding. Then press F6 to run the job.

The clients database table is filled with data from the file, including the upper-case last names as transformed in the job.
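The two steps of this scenario can be sketched in Python as follows. This is a hedged illustration, not the components' implementation: the helper names are hypothetical, the semicolon separator is an assumption, and the second step assumes that the bulk load corresponds to a MySQL LOAD DATA LOCAL INFILE statement, which is only built as text here and not executed.

```python
def transform(row):
    """Upper-case the last name (third column), as done in the tMap expression."""
    row = list(row)
    row[2] = row[2].upper()
    return row


def write_bulk_file(path, rows, field_separator=";"):
    """Step 1: write one delimited line per row, without a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        for row in rows:
            # Values containing the separator would need a text enclosure.
            line = field_separator.join(str(value) for value in transform(row))
            handle.write(line + "\n")


def build_load_statement(path, table, field_separator=";"):
    """Step 2: build the statement that loads the file into the table."""
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        f"FIELDS TERMINATED BY '{field_separator}' LINES TERMINATED BY '\\n'"
    )


print(build_load_statement("clients.txt", "clients"))
```

Keeping the two steps apart is what leaves room for the transformation between generating the rows and loading them.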


For simple Insert operations that do not include any transformations, the use of tMysqlOutputBulkExec allows you to skip a step in the process and thus improves performance. Related topic: tMysqlOutputBulkExec properties


tMysqlOutputBulkExec
tMysqlOutputBulkExec properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement used to feed a database. These two steps are fused together in the tMysqlOutputBulkExec component.

Component family

Databases/MySQL

Function

Executes the Insert action on the data provided.

Purpose

As a dedicated component, it improves performance during Insert operations to a MySQL database.

Basic settings

Property type

Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version

Select the version of MySQL that you are using.

Host

Database server IP address.

Port

Listening port number of DB server.

Database

Name of the database.

Username and Password

DB user authentication data.

Action on table

On the table defined, you can perform one of the following operations:
- None: No operation is carried out.
- Drop and create a table: The table is removed and created again.
- Create a table: The table does not exist and gets created.
- Create a table if doesn't exist: The table is created if it does not already exist.
- Clear a table: The table content is deleted.
In Java, use tCreateTable as substitute for this function.

Table

Name of the table to be written. Note that only one table can be written at a time and that the table must already exist for the insert operation to succeed.

Local FileName

Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Append

Select the check box for this option to append new rows to the end of the file.


Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Advanced settings

Additional JDBC parameters

Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Row separator

String (ex: "\n" on Unix) to distinguish rows.

Field separator

Character, string or regular expression to separate fields.

Escape char

Character of the row to be escaped.

Text enclosure

Character used to enclose the text.

Create directory if does not exist

This check box is selected by default. It creates a directory to hold the output table if required.

Custom the flush buffer size

Customize the amount of memory used to temporarily store output data. In the Row number field, enter the number of rows after which the memory is to be freed again.

Action on data

On the data of the table defined, you can carry out the following operations:
- Insert records in table: Add new records to the table.
- Update records in table: Make changes to existing records.
- Replace records in table: Replace existing records with new ones.
- Ignore records in table: Ignore existing records or insert the new ones.

Records contain NULL value

This check box is selected by default. It allows you to take account of NULL value fields. If you clear the check box, the NULL values will automatically be replaced with empty values.

Encoding

Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics

Select this check box to collect the log data at the component level.

Usage

This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation

n/a


Scenario: Inserting data in MySQL database


This scenario describes a two-component job which carries out the same operation as the one described for tMysqlOutputBulk properties and tMysqlBulkExec properties, although no data is transformed.

Drop a tRowGenerator and a tMysqlOutputBulkExec component from the Palette to the design workspace. Connect the components using a link such as Row > Main. Set the tRowGenerator parameters the same way as in the Scenario: Inserting transformed data in MySQL database. The schema is made of five columns: ID, First Name, Last Name, Address and City. In the workspace, double-click the tMysqlOutputBulkExec to display the Component view and set the properties.

Define the database connection details, if necessary. Consult the recommendations detailed in the Scenario: Inserting transformed data in MySQL database, concerning the conservation of connection details in the Repository, under the Metadata node. In the component view, select Repository in the Property Type field and then select the appropriate connection in the adjacent field. The following fields will be filled in automatically. For further information, see How to set a built-in schema and How to set a repository schema in the Talend Open Studio User Guide. In the Action on table field, select the None option as you want to insert the data into a table which already exists. In the Table field, enter the name of the table you want to populate, the name being clients in this example. In the Local filename field, indicate the access path and the name of the file which contains the data to be added to the table. In this example, the file is clients.txt.


Click the Advanced settings tab to define the component's advanced parameters.

In the Action on data list, select Insert records in table to insert the new data in the table. Press F6 to run the job. The result should be similar to that of Scenario: Inserting transformed data in MySQL database, but the data will differ because it is regenerated randomly every time the job is run.
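How the Action on data choice could influence the bulk load can be sketched in Python. This rests on an assumption that is not confirmed by this guide: that the Insert, Replace and Ignore options map to the optional REPLACE and IGNORE keywords of the MySQL LOAD DATA statement. The names below are hypothetical, and the Update option is deliberately left out of the sketch.

```python
# Assumed mapping from the option label to the LOAD DATA keyword.
MODIFIERS = {
    "Insert records in table": "",
    "Replace records in table": "REPLACE ",
    "Ignore records in table": "IGNORE ",
}


def build_bulk_exec_statement(action, path, table):
    """Build the load statement text for one Action on data option."""
    if action not in MODIFIERS:
        raise ValueError(f"action not covered by this sketch: {action}")
    return f"LOAD DATA LOCAL INFILE '{path}' {MODIFIERS[action]}INTO TABLE {table}"


for action in MODIFIERS:
    print(build_bulk_exec_statement(action, "clients.txt", "clients"))
```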


tMysqlRollback
tMysqlRollback properties
This component is closely related to tMysqlCommit and tMysqlConnection. It usually does not make much sense to use these components independently in a transaction.

Component family

Databases

Function

Cancels the transaction commit in the connected DB.

Purpose

Prevents part of a transaction from being committed involuntarily.

Basic settings

Component list

Select the tMysqlConnection component in the list if more than one connection is planned for the current job.

Close Connection

Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Usage

This component is to be used along with Mysql components, especially with tMysqlConnection and tMysqlCommit components.

Limitation

n/a

Scenario: Rollback from inserting data in mother/daughter tables


Based on the tMysqlConnection Scenario: Inserting data in mother/daughter tables, insert a rollback function in order to prevent unwanted commit.

Drag and drop a tMysqlRollback to the design workspace and connect it to the Start component.


Set the Rollback unique field on the relevant DB connection. This complementary element to the job ensures that the transaction will not be partly committed.
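The guarantee described above can be demonstrated with a small Python sketch. It uses the standard sqlite3 module as a stand-in for a MySQL connection so that it runs without a server; the mother and daughter table definitions and the insert_family helper are hypothetical and only mirror the idea of the scenario.

```python
import sqlite3


def insert_family(conn, mother_name, daughter_names):
    """Insert a mother row and its daughter rows as one transaction."""
    try:
        cursor = conn.execute(
            "INSERT INTO mother (name) VALUES (?)", (mother_name,)
        )
        for name in daughter_names:
            conn.execute(
                "INSERT INTO daughter (mother_id, name) VALUES (?, ?)",
                (cursor.lastrowid, name),
            )
        conn.commit()
        return True
    except sqlite3.Error:
        # Undo every insert of this transaction, including the mother row.
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, "
    "mother_id INTEGER NOT NULL, name TEXT NOT NULL)"
)
conn.commit()

# The second daughter violates NOT NULL, so nothing is kept.
failed = insert_family(conn, "Anna", ["Bea", None])
mothers_after_failure = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
print(failed, mothers_after_failure)
```

After the failed call, neither the mother row nor the first daughter row remains, which is the partial commit that the rollback is meant to prevent.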


tMysqlRow
tMysqlRow properties
Component family

Databases/MySQL

Function

tMysqlRow is the specific component for this database query. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the job design although it does not provide output.

Purpose

Depending on the nature of the query and the database, tMysqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type

Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version

Select the MySQL version that you are using.

Use an existing connection

Select this check box and click the relevant tMysqlConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host

Database server IP address.

Port

Listening port number of DB server.

Database

Name of the database.

Username and Password

DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Table Name: Name of the table to be processed.
Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.
Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset: Select this check box to insert the result of the query in a column of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, because it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
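
The Commit every setting can be pictured as a counter that triggers a commit each time a batch of rows fills up. The Java sketch below is only an illustration and not the code generated by the Studio: the class and method names are hypothetical, and the final commit of a partial batch is an assumption about how leftover rows are flushed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports after which row numbers a commit would be issued.
class CommitEverySketch {

    static List<Integer> commitPoints(int totalRows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        List<Integer> points = new ArrayList<>();
        int pending = 0;                       // rows executed since the last commit
        for (int row = 1; row <= totalRows; row++) {
            pending++;                         // the statement for this row has run
            if (pending == commitEvery) {      // batch is full: commit it
                points.add(row);
                pending = 0;
            }
        }
        if (pending > 0) {                     // assumption: leftover rows are committed at the end
            points.add(totalRows);
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(25, 10)); // prints [10, 20, 25]
    }
}
```

With 25 rows and a batch size of 10, the sketch commits three times instead of 25, which is where the performance gain comes from.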

Scenario 1: Removing and regenerating a MySQL table index


This scenario describes a four-component job that removes a table index, applies a select insert action onto a table then regenerates the index.

Select and drop the following components onto the design workspace: tMysqlRow (x2), tRowGenerator, and tMysqlOutput.
Connect tRowGenerator to tMysqlOutput.
Using OnComponentOk connections, link the first tMysqlRow to tRowGenerator and tRowGenerator to the second tMysqlRow.
Select the first tMysqlRow to fill in the DB Basic settings.
In Property type as well as in Schema, select the relevant DB entry in the list. The DB connection details and the table schema are filled in accordingly.
Propagate the properties and schema details onto the other components of the Job.
If the query is stored in the Metadata area of the Repository, you can also select Repository in the Query type field and then the relevant query entry.
If you didn't store your query in the Repository, type in the following SQL statement to alter the database entries: drop index <index_name> on <table_name>
Select the second tMysqlRow component, check the DB properties and schema.

Type in the SQL statement to recreate an index on the table using the following statement: create index <index_name> on <table_name> (<column_name>)
The tRowGenerator component is used to automatically generate the columns to be added to the DB output table defined.
Select the tMysqlOutput component and fill in the DB connection properties, either from the Repository or manually if the DB connection details are specific to this use only. The table to be fed is named: comprehensive.
The schema should be automatically inherited from the data flow coming from the tRowGenerator. Edit the schema to check its structure and check that it corresponds to the schema expected on the DB table specified.
The Action on table is None and the Action on data is Insert.
No additional columns are required for this Job.
Press F6 to run the Job.
If you watch the action on the DB data, you can notice that the index is dropped at the start of the Job and recreated at the end of the insert action.
Related topics: tDBSQLRow properties.
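
The order of operations in this scenario (drop the index, insert the rows, recreate the index) can be sketched in plain JDBC as follows. This is a minimal illustration under stated assumptions, not what the Studio generates: the class, the method names and the Runnable used to stand in for the insert step are hypothetical, and the identifiers are concatenated as-is, so they must come from a trusted source.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the drop index / insert / create index sequence.
class IndexRebuildSketch {

    // Mirrors: drop index <index_name> on <table_name>
    static String dropIndexSql(String index, String table) {
        return "DROP INDEX " + index + " ON " + table;
    }

    // Mirrors: create index <index_name> on <table_name> (<column_name>)
    static String createIndexSql(String index, String table, String column) {
        return "CREATE INDEX " + index + " ON " + table + " (" + column + ")";
    }

    // Runs the three steps in order on an already opened connection.
    static void rebuildAround(Connection conn, String index, String table,
                              String column, Runnable insertRows) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(dropIndexSql(index, table));           // first tMysqlRow
            insertRows.run();                                       // tRowGenerator -> tMysqlOutput
            st.executeUpdate(createIndexSql(index, table, column)); // second tMysqlRow
        }
    }
}
```

Dropping the index before a large insert avoids updating it for every row; it is rebuilt once when the data is in place.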

Scenario 2: Using PreparedStatement objects to query data


This scenario describes a four-component Job which allows you to link a table column with a client file. The MySQL table contains a list of all the American States along with the State ID, while the file contains the customer information including the ID of the State in which they live. We want to retrieve the name of the State for each client, using an SQL query. In order to process a large volume of data quickly, we use a PreparedStatement object, which means that the query is prepared only once rather than being rebuilt for each row. Each row is then sent as a parameter.
For this scenario, we use a file and a database for which we have already stored the connection and properties in the Repository metadata. For further information concerning the creation of metadata in delimited files, consult Setting up a File Delimited schema. For further information concerning the creation of database connection metadata, see Setting up a DB connection, and for further information as to the usage of metadata, see How to set a repository schema. These sections are all in the Talend Open Studio User Guide.

In the Repository, expand the Metadata and File delimited nodes.
Select the metadata which corresponds to the client file you want to use in the Job. Here, we are using the customers metadata.
Drag the metadata onto the workspace and double-click tFileInputDelimited in the Components dialog box so that the tFileInputDelimited component is created with the parameters already set.

In the Schema list, select Built-in so that you can modify the component's schema. Then click on [...] next to the Edit schema field to add a column into which the name of the State will be inserted.

Click on the [+] button to add a column to the schema.
Rename this column LabelStateRecordSet and select Object from the Type list. Click OK to save your modifications.
From the Palette, select the tMysqlRow, tParseRecordSet and tFileOutputDelimited components and drop them onto the workspace.
Connect the four components using Row > Main type links.
Double-click tMysqlRow to set its properties in the Basic settings tab of the Component view.

In the Property Type list, select Repository and click on the [...] button to select a database connection from the metadata in the Repository. The DB Version, Host, Port, Database, Username and Password fields are completed automatically. If you are using the Built-in mode, complete these fields manually.
From the Schema list, select Built-in to set the schema properties manually and add the LabelStateRecordSet column, or click directly on the Sync columns button to retrieve the schema from the preceding component.
In the Query field, enter the SQL query you want to use. Here, we want to retrieve the names of the American States from the LabelState column of the MySQL table, us_state: "SELECT LabelState FROM us_state WHERE idState=?". The question mark, ?, represents the parameter to be set in the Advanced settings tab.
Click Advanced settings to set the component's advanced properties.

Select the Propagate QUERY's recordset check box and select the LabelStateRecordSet column from the use column list to insert the query results in that column.
Select the Use PreparedStatement check box and define the parameter used in the query in the Set PreparedStatement Parameters table.
Click on the [+] button to add a parameter.
In the Parameter Index cell, enter the parameter position in the SQL instruction. Enter 1 as we are only using one parameter in this example.
In the Parameter Type cell, enter the type of parameter. Here, the parameter is a whole number, hence, select Int from the list.
In the Parameter Value cell, enter the parameter value. Here, we want to retrieve the name of the State based on the State ID for every client in the input file. Hence, enter row1.idState.
Double-click tParseRecordSet to set its properties in the Basic settings tab of the Component view.

From the Prev. Comp. Column list, select the preceding component's column for analysis. In this example, select LabelStateRecordSet.
Click on the Sync columns button to retrieve the schema from the preceding component. The Attribute table is automatically completed with the schema columns.
In the Attribute table, in the Value field which corresponds to LabelStateRecordSet, enter the name of the column containing the State names to be retrieved and matched with each client, within double quotation marks. In this example, enter "LabelState".
Double-click tFileOutputDelimited to set its properties in the Basic settings tab of the Component view.

In the File Name field, enter the access path and name of the output file. Click Sync columns to retrieve the schema from the preceding component. Save your Job and press F6 to run it.

A column containing the name of the American State corresponding to each client is added to the file.
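
The settings of this scenario correspond closely to the standard JDBC PreparedStatement pattern. The following Java sketch is an illustration only and is not the code the Studio generates: the class and method names are hypothetical, and the connection is assumed to be opened elsewhere. The query text, the parameter index (1) and the parameter type (Int) are those used in the scenario.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one prepared query, executed once per input row.
class StateLookupSketch {

    static final String QUERY = "SELECT LabelState FROM us_state WHERE idState=?";

    // Returns one State label per State ID (null when no row matches).
    static List<String> lookupStates(Connection conn, List<Integer> stateIds) throws SQLException {
        List<String> labels = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) { // prepared once
            for (int idState : stateIds) {
                ps.setInt(1, idState);                               // Parameter Index 1, Type Int
                try (ResultSet rs = ps.executeQuery()) {             // executed for each row
                    labels.add(rs.next() ? rs.getString("LabelState") : null);
                }
            }
        }
        return labels;
    }
}
```

In the Job, the value bound to the parameter is row1.idState, and the record set is handed to tParseRecordSet instead of being read directly as in this sketch.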

tMysqlSCD
tMysqlSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tMysqlSCD.

tMysqlSCDELT
tMysqlSCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tMysqlSCDELT.

tMysqlSP
tMysqlSP Properties
Component family: Databases/MySQL

Function: tMysqlSP calls the database stored procedure.

Purpose: tMysqlSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
SP Name: Type in the exact name of the stored procedure.
Is Function / Return result in: Select this check box if only a value is to be returned. Select from the list the schema column on which the value to be returned is based.

Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure.
  Select the Type of parameter:
  IN: Input parameter.
  OUT: Output parameter/return value.
  IN OUT: Input parameter that is to be returned as a value, likely after modification through the procedure (function).
  RECORDSET: Input parameter that is to be returned as a set of values, rather than a single value.
  Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

Usage: This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.

Limitation: The stored procedure syntax should match the database syntax.

Scenario: Finding a State Label using a stored procedure


The following Job aims at finding the State labels matching the odd State IDs in a MySQL two-column table. A stored procedure is used to carry out this operation.

Drag and drop the following components used in this example: tRowGenerator, tMysqlSP, tLogRow.
Connect the components using the Row Main link.
The tRowGenerator is used to generate the odd ID numbers. Double-click on the component to launch the editor.

Click on the Plus button to add a column to the schema to generate.
Select the Key check box and set the Type to Int.
Set the Length to 2 digits maximum.

Use the preset function called sequence but customize the Parameters in the lower part of the window.

Change the Value of step from 1 to 2 for this example, still starting from 1.
Set the Number of generated rows to 25 so that all the odd State IDs (of 50 States) are generated.
Click OK to validate the configuration.
Then select the tMysqlSP component and define its properties.
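
The effect of these sequence settings can be checked with a few lines of Java. This is a stand-alone sketch with hypothetical names, not the routine used by tRowGenerator itself: starting at 1 with a step of 2, 25 rows yield the odd numbers from 1 to 49.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "sequence" function configured in tRowGenerator.
class OddIdSequenceSketch {

    static List<Integer> sequence(int start, int step, int rows) {
        List<Integer> ids = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            ids.add(start + i * step);   // i-th generated value
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Integer> ids = sequence(1, 2, 25);
        System.out.println(ids.get(0) + " .. " + ids.get(ids.size() - 1)); // prints 1 .. 49
    }
}
```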

Set the Property type field to Repository and select the relevant entry on the list. The connection details are filled in automatically.
Otherwise, set the connection information manually.
Click Sync columns to retrieve the generated schema from the preceding component.
Then click Edit Schema and add an extra column to hold the State label to be output, in addition to the ID.
Type in the name of the procedure in the SP Name field as it is called in the database. In this example, getstate. The procedure to be executed reads as follows:

    DROP PROCEDURE IF EXISTS `talend`.`getstate` $$
    CREATE DEFINER=`root`@`localhost` PROCEDURE `getstate`(IN pid INT, OUT pstate VARCHAR(50))
    BEGIN
        SELECT LabelState INTO pstate FROM us_states WHERE idState = pid;
    END $$

In the Parameters area, click the plus button to add a line to the table.
Set the Column field to ID, and the Type field to IN as it will be given as input parameter to the procedure.
Add a second line and set the Column field to State and the Type to OUT as this is the output parameter to be returned.
Finally, set the tLogRow component properties.

Synchronize the schema with the preceding component. And select the Print values in cells of a table check box for reading convenience. Then save your Job and execute it.

The output shows the State labels corresponding to the odd State IDs as defined in the procedure. Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.
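
For readers who want to see what calling the getstate procedure looks like outside the Studio, here is a minimal JDBC sketch. It is an illustration under stated assumptions rather than the code produced by tMysqlSP: the class and method names are hypothetical and the connection is assumed to be already open. The first parameter plays the role of the IN column (ID) and the second the OUT column (State).

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical sketch: calling getstate(IN pid INT, OUT pstate VARCHAR(50)).
class GetStateCallSketch {

    static final String CALL = "{call getstate(?, ?)}";

    static String getState(Connection conn, int stateId) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            cs.setInt(1, stateId);                      // IN parameter: pid
            cs.registerOutParameter(2, Types.VARCHAR);  // OUT parameter: pstate
            cs.execute();
            return cs.getString(2);                     // State label, or null if none was found
        }
    }
}
```

Calling getState once for each odd ID from 1 to 49 would reproduce the rows displayed by tLogRow in this scenario.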

tMysqlTableList
tMysqlTableList Properties
Component family: Databases/MySQL

Function: Iterates on a set of table names through a defined MySQL connection.

Purpose: Lists the names of a given set of MySQL tables using a select statement based on a Where clause.

Basic settings

Component list: Select the tMysqlConnection component in the list if more than one connection is planned for the current Job.
Where clause for table name selection: Enter the Where clause to identify the tables to iterate on.

Usage: This component is to be used along with MySQL components, especially with tMysqlConnection.

Limitation: n/a

Scenario: Iterating on DB tables and deleting their content using a user-defined SQL template
The following Java scenario creates a three-component Job that iterates on given table names from a MySQL database using a WHERE clause. It then deletes the content of the tables directly on the DBMS using a user-defined SQL template.
For advanced use, start by creating a connection to the database that contains the tables you want to empty.
In the Repository tree view, expand Metadata and right-click DB Connections to create a connection to the relevant database and to store the connection information locally. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide. Otherwise, drop a tMysqlConnection component in the design workspace and fill in the connection details manually.
Drop the database connection you created from the Repository onto the design workspace. The [Components] dialog box displays.
Select tMysqlConnection and click OK. The tMysqlConnection component displays on the design workspace with all connection details automatically filled in its Basic settings view.
Drop the following two components from the Palette onto the design workspace: tMysqlTableList and tELT.
Connect tMysqlConnection to tMysqlTableList using an OnSubjobOk link.

Connect tMysqlTableList to tELT using an Iterate link. If needed, double-click tMysqlConnection to display its Basic settings view and verify the connection details.

In this example, we want to connect to a MySQL database called examples. In the design workspace, double-click tMysqlTableList to display its Basic settings view and define its settings.

On the Component list, select the relevant MySQL connection component if more than one connection is used.
Enter a WHERE clause using the right syntax in the corresponding field to iterate on the table name(s) whose content you want to delete. In this scenario, we want the Job to iterate on all the tables whose names start with ex.
In the design workspace, double-click tELT to display its Basic settings view and define its settings.

In Database Name, enter the name of the database containing the tables you want to process.
On the Component list, select the relevant MySQL connection component if more than one connection is used.
Click in the Table name field and press Ctrl+Space to access the global variable list.
From the global variable list, select ((String)globalMap.get("tMysqlTableList_1_CURRENT_TABLE")).

To create the user-defined SQL template:
In the Repository tree view, expand SQL Templates and MySQL in succession.

Right-click UserDefined and select Create SQLTemplate from the drop-down list. The New SQLTemplate wizard opens.

Enter a name for the new SQL template, fill in the other fields if needed, and then click Finish to close the wizard. An SQL pattern editor opens on the design workspace.
Delete the existing code and enter the code necessary to carry out the desired action, which in this example is deleting the content of all tables whose names start with ex.

In the SQL template code, you must use the correct variable name attached to the table name parameter (__TABLE-NAME__ in this example). To display the variable name used, put your pointer in the Table Name field in the basic settings of the tELT component.

Press Ctrl+S to save the new user-defined SQL template. The next step is to add the new user-defined SQL template to the SQL template list in the tELT component.

To add the user-defined SQL template to the SQL template list: In the Component view of tELT, click the SQL Templates tab to display the SQLTemplate List.

Click the Add button and add two SQL template lines. Click in the first line to display a drop-down arrow and then click the arrow to display the SQL template list.

Select in the list the user-defined SQL template you already created.
Make sure that the SQL template in the second line is Commit.
Save your Job and press F6 to execute it.
All tables in the MySQL examples database whose names begin with ex are emptied of their content.
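
The net effect of this Job, one delete statement per table whose name starts with ex, can be sketched in Java as follows. This is a simplified illustration and not the code or SQL template used by the Studio: the class, the method and the sample table names are hypothetical, and the statements are only built as strings, not executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: iterate on table names and build one DELETE per matching table.
class EmptyTablesSketch {

    static List<String> deleteStatements(List<String> tableNames, String prefix) {
        List<String> statements = new ArrayList<>();
        for (String table : tableNames) {          // plays the role of CURRENT_TABLE
            if (table.startsWith(prefix)) {        // plays the role of the WHERE clause
                statements.add("DELETE FROM `" + table + "`");
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // Sample table names are invented for the example.
        List<String> tables = Arrays.asList("ex_orders", "customers", "ex_lines");
        for (String sql : deleteStatements(tables, "ex")) {
            System.out.println(sql);
        }
    }
}
```

In the Job itself the filtering is done by tMysqlTableList and the deletion by the user-defined SQL template run by tELT, followed by the Commit template.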

Related scenario
For tMysqlTableList related scenario, see Scenario: Iterating on a DB table and listing its column names.

tNetezzaBulkExec
tNetezzaBulkExec properties
Component family: Databases/Netezza

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, tNetezzaBulkExec offers gains in performance while carrying out the Insert operations to a Netezza database.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box when you are using the component tNetezzaConnection.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Advanced settings

Field Separator: Character, string or regular expression to separate fields.
Require quotes ("") around data files: Select this check box to use data enclosure characters.
Row Separator: String (ex: "\n" on Unix) to distinguish rows.
Escape character: Character of the row to be escaped.
Date format / Date delimiter: Use Date format to specify the way years, months and days are represented in a string. Use Date delimiter to specify the separator between date values.
Time format / Time delimiter: Use Time format to specify the way the time is represented in a string. Use Time delimiter to specify the separator between time values.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Max Errors: Enter the maximum error limit that will not stop the process.
Skip Rows: Enter the number of rows to be skipped.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation: n/a

Related scenarios
For use cases related to tNetezzaBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.

tNetezzaClose
tNetezzaClose properties
Component family: Databases/Netezza

Function: tNetezzaClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tNetezzaConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Netezza components, especially with tNetezzaConnection and tNetezzaCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.

tNetezzaCommit
tNetezzaCommit Properties
This component is closely related to tNetezzaConnection and tNetezzaRollback. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Netezza

Function: tNetezzaCommit validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tNetezzaConnection component in the list if more than one connection is planned for the current Job.
Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tNetezzaCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Netezza components, especially with tNetezzaConnection and tNetezzaRollback.

Limitation: n/a

Related scenario
This component is closely related to tNetezzaConnection and tNetezzaRollback. It usually does not make much sense to use one of these without using a tNetezzaConnection component to open a connection for the current transaction. For tNetezzaCommit related scenario, see Scenario: Inserting data in mother/daughter tables.

tNetezzaConnection
tNetezzaConnection Properties
This component is closely related to tNetezzaCommit and tNetezzaRollback. It usually does not make much sense to use one of these without using a tNetezzaConnection component to open a connection for the current transaction.
Component family: Databases/Netezza

Function: tNetezzaConnection opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Additional JDBC Parameters: Specify additional connection properties for the DB connection you are creating.
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: set or type in the shared connection name.

Usage: This component is to be used along with Netezza components, especially with tNetezzaCommit and tNetezzaRollback.

Limitation: n/a
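
The division of work between tNetezzaConnection, tNetezzaCommit and tNetezzaRollback follows the usual JDBC single-transaction pattern. The sketch below is a generic illustration under stated assumptions, not Netezza-specific code and not what the Studio generates: the class, the Work interface and the method name are hypothetical, and the connection is assumed to be opened by the caller.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch: all statements of a Job in one transaction.
class SingleTransactionSketch {

    // Stand-in for the components that use the shared connection.
    interface Work {
        void run(Connection conn) throws SQLException;
    }

    static void inOneTransaction(Connection conn, Work work) throws SQLException {
        conn.setAutoCommit(false);   // role of the connection component: open the transaction
        try {
            work.run(conn);          // inserts, updates and so on share this connection
            conn.commit();           // role of the commit component: one commit for everything
        } catch (SQLException e) {
            conn.rollback();         // role of the rollback component: undo the whole transaction
            throw e;
        }
    }
}
```

Committing once at the end, rather than on every row or batch, is what gives the performance gain described for tNetezzaCommit.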

Related scenarios
For a tNetezzaConnection related scenario, see Scenario: Inserting data in mother/daughter tables.

tNetezzaInput
tNetezzaInput properties
Component family: Databases/Netezza

Function: tNetezzaInput reads a database and extracts fields based on a query.

Purpose: tNetezzaInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
Use an existing connection: Select this check box when using a tNetezzaConnection component.
  When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.

Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name

Name of the table to be read.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column: Remove leading and trailing whitespace from defined columns.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Netezza databases.
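The interplay between the query field order, the cursor setting and the trim options can be illustrated outside the Studio. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for a Netezza connection, and the customers table, the SCHEMA list and the read_rows helper are hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical schema: the column order the next component expects.
SCHEMA = ["id", "name", "city"]

def read_rows(conn, query, schema, fetch_size=2, trim_all=True):
    """Yield rows as dicts keyed by schema column, fetching fetch_size rows at a time."""
    cur = conn.execute(query)
    # The query must return exactly the fields described by the schema, in order.
    if len(cur.description) != len(schema):
        raise ValueError("query returns a different number of fields than the schema")
    while True:
        batch = cur.fetchmany(fetch_size)  # comparable to working with a cursor row set
        if not batch:
            break
        for row in batch:
            # Comparable to "Trim all the String/Char columns".
            values = [v.strip() if trim_all and isinstance(v, str) else v for v in row]
            yield dict(zip(schema, values))

# SQLite stands in for the Netezza database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "  Ann ", "Paris"), (2, "Bob", " Lyon  "), (3, "Cy", "Nice")],
)
rows = list(read_rows(conn, "SELECT id, name, city FROM customers ORDER BY id", SCHEMA))
```

Because values are mapped to schema columns by position, a query that lists its fields in a different order than the schema would silently put data under the wrong column name, which is why the field sequence matters.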

Related scenarios
Related scenarios for tNetezzaInput are:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
Scenario: Dynamic context use in MySQL DB insert.


tNetezzaNzLoad
This component invokes Netezza's nzload utility to insert records into a Netezza database. It can be used either in standalone mode, loading from an existing data file, or connected to an input row to load data from the connected component.

tNetezzaNzLoad properties
Component family: Databases/Netezza

Function: tNetezzaNzLoad inserts data into a Netezza database table using Netezza's nzload utility.

Purpose: To bulk load data into a Netezza table from an existing data file, from an input flow, or directly from a data flow in streaming mode through a named-pipe.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the Netezza database.
Username and Password: DB user authentication data.
Table: Name of the table into which the data is to be inserted.

Action on table: On the table defined, you can perform one of the following operations before loading the data:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted before the data is loaded.
Truncate table: A truncate statement is executed prior to loading the data to clear the entire content of the table.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Data file: Full path to the data file to be used. If this component is used on its own (not connected to another component with an input flow), this is the name of an existing data file to be loaded into the database. If it is connected to another component with an input flow, this is the name of the file to be generated and written with the incoming data, which nzload then loads into the database.

Use named-pipe: Select this check box to use a named-pipe instead of a data file. This option can only be used when the component is connected with an input flow to another component. When the check box is selected, no data file is generated and the data is transferred to nzload through a named-pipe. This option greatly improves performance on both Linux and Windows.
In named-pipe mode, this component uses a JNI interface to create and write to a named-pipe on any Windows platform. The path to the associated JNI DLL must therefore be configured in the Java library path. The component comes with two DLLs, for 32-bit and 64-bit operating systems, which are automatically provided in the Studio with the component.

Named-pipe name: Specify a name for the named-pipe to be used. Ensure that the name entered is valid.

Advanced settings

Use existing control file: Select this check box to provide a control file to be used with the nzload utility instead of specifying all the options explicitly in the component. When this check box is selected, Data file and the other nzload related options no longer apply. Please refer to Netezza's nzload manual for details on creating a control file.

Control file: Enter the path to the control file to be used, between double quotation marks, or click [...] and browse to the control file. This option is passed on to the nzload utility via the -cf argument.

Field separator

Character, string or regular expression used to separate fields. This is nzload's delim argument. If you do not use the Wrap quotes around fields option, you must make sure that the delimiter does not occur in the data that is inserted into the database. The default value is \t or TAB. To improve performance, use the default value.

Wrap quotes around fields

This option is only applied to columns of String, Byte, Byte[], Char, and Object types. Select either:
None: do not wrap column values in quotation marks.
Single quote: wrap column values in single quotation marks.
Double quote: wrap column values in double quotation marks.
If you use the Single quote or Double quote option, you must use \ as the Escape char.

Advanced options

Set the nzload arguments in the corresponding table. Click [+] as many times as required to add arguments to the table. Click the Parameter field and choose among the arguments from the list. Then click the corresponding Value field and enter a value between quotation marks.

Parameter

-lf: Name of the log file to generate. The logs are appended if the log file already exists. If the parameter is not specified, the default name for the log file is '<table_name>.<db_name>.nzlog', and the file is generated under the current working directory where the Job is running.

-bf: Name of the bad file to generate. The bad file contains all the records that could not be loaded due to an internal Netezza error. The records are appended if the bad file already exists. If the parameter is not specified, the default name for the bad file is '<table_name>.<db_name>.nzbad', and the file is generated under the current working directory where the Job is running.

-outputDir: Directory path where the log file and the bad file are generated. If the parameter is not specified, the files are generated under the current directory where the Job is running.

-logFileSize: Maximum size for the log file. The value is in MB. The default value is 2000, that is, 2 GB. To save hard disk space, specify a smaller amount if your Job runs often.


-compress

Specify this option if the data file is compressed. Valid values are "TRUE" or "FALSE". The default value is "FALSE". This option is only valid if this component is used by itself and not connected to another component via an input flow.

-skipRows <n>

Number of rows to skip from the beginning of the data file. Set the value to "1" if you want to skip the header row of the data file. The default value is "0". This option should only be used if this component is used by itself and not connected to another component via an input flow.

-maxRows <n>

Maximum number of rows to load from the data file. This option should only be used if this component is used by itself and not connected to another component via an input flow.

-maxErrors

Maximum number of error records to allow before terminating the load process. The default value is "1".

-ignoreZero

Binary zero bytes in the input data will generate errors. Set this option to "NO" to generate an error or to "YES" to ignore zero bytes. The default value is "NO".

-requireQuotes

This option requires all the values to be wrapped in quotes. The default value is "FALSE". This option currently does not work with an input flow. Use this option only in standalone mode with an existing file.

-nullValue <token>

Specify the token to indicate a null value in the data file. The default value is "NULL". To improve performance slightly, you can set this value to an empty field by specifying the value as single quotes: "\'\'".

-fillRecord

Treat missing trailing input fields as null. You do not need to specify a value for this option in the value field of the table. This option is not turned on by default, therefore input fields must match exactly all the columns of the table by default. Trailing input fields must be nullable in the database.

-ctrlChar

Accept control chars in char/varchar fields (must escape NUL, CR and LF). You do not need to specify a value for this option in the value field of the table. This option is turned off by default.

-ctInString

Accept un-escaped CR in char/varchar fields (LF becomes only end of row). You do not need to specify a value for this option in the value field of the table. This option is turned off by default.


-truncString

Truncate any string value that exceeds its declared char/varchar storage. You do not need to specify a value for this option in the value field of the table. This option is turned off by default.

-dateStyle

Specify the date format in which the input data is written. Valid values are: "YMD", "Y2MD", "DMY", "DMY2", "MDY", "MDY2", "MONDY", "MONDY2". The default value is "YMD". The date format of the column in the component's schema must match the value specified here. For example, if you want to load a DATE column, specify the date format in the component schema as "yyyy-MM-dd" and the -dateStyle option as "YMD". For more information on loading date and time fields, see the section Loading DATE, TIME and TIMESTAMP columns below.

-dateDelim

Delimiter character between date parts. The default value is "-" for all date styles except for "MONDY[2]" which is " " (empty space). The date format of the column in the component's schema must match the value specified here.

-y2Base

First year expressible using two digit year (Y2) dateStyle.

-timeStyle

Specify the time format in which the input data is written. Valid values are: "24HOUR" and "12HOUR". The default value is "24HOUR". For slightly better performance, keep the default value. The time format of the column in the component's schema must match the value specified here. For example, if you want to load a TIME column, specify the date format in the component schema as "HH:mm:ss" and the -timeStyle option as "24HOUR". For more information on loading date and time fields, see the section Loading DATE, TIME and TIMESTAMP columns below.

-timeDelim

Delimiter character between time parts. The default value is ":". The time format of the column in the component's schema must match the value specified here.

-timeRoundNanos

Allow but round non-zero digits with smaller than microsecond resolution.


-boolStyle

Specify the format in which Boolean data is written in the data. The valid values are: "1_0", "T_F", "Y_N", "TRUE_FALSE", "YES_NO". The default value is "1_0". For slightly better performance, keep the default value.

-allowRelay

Allow the load to continue after one or more SPU resets or failovers. By default, this is not allowed.

-allowRelay <n>

Specify the number of allowable continuations of a load. The default value is "1".

Encoding: Select the encoding type from the list.

Specify nzload path: Select this check box to specify the full path to the nzload executable. You must select this option if the nzload path is not specified in the PATH environment variable.

Full path to nzload executable: Full path to the nzload executable on the machine in use. It is advisable to specify the nzload path in the PATH environment variable instead of selecting this option.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database. This component can be used as a standalone or an output component.
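The Parameter/Value table in Advanced options amounts to a list of command-line arguments handed to nzload. The following is only a rough sketch of how such a list could be assembled. The build_nzload_args helper and the sample database, table and file names are hypothetical. The -db, -t and -df flags are assumptions about nzload's command line that this section does not state, so check them against Netezza's nzload manual. The command is built but never executed.

```python
def build_nzload_args(database, table, data_file, delimiter="\t", options=None):
    """Return the argument list for an nzload call (the command is not executed)."""
    # -db, -t and -df are assumed flag names; -delim is the delim argument named above.
    args = ["nzload", "-db", database, "-t", table, "-df", data_file, "-delim", delimiter]
    for name, value in (options or {}).items():
        args.append(name)
        if value is not None:  # flag-only options such as -fillRecord take no value
            args.append(str(value))
    return args

# Hypothetical standalone load of an existing file with a header row.
args = build_nzload_args(
    "sales",
    "contracts",
    "/tmp/contracts.dat",
    delimiter="|",
    options={"-skipRows": 1, "-maxErrors": 10, "-nullValue": "NULL", "-fillRecord": None},
)
print(" ".join(args))
```

Options such as -skipRows and -maxRows appear here only because the sketch models standalone mode. As described above, they should not be used when the component receives an input flow.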

Loading DATE, TIME and TIMESTAMP columns


When this component is used with an input flow, the date format specified inside the component's schema must match the values specified for the -dateStyle, -dateDelim, -timeStyle, and -timeDelim options. Please refer to the following examples:

DB Type     Schema date format       -dateStyle   -dateDelim   -timeStyle   -timeDelim
DATE        "yyyy-MM-dd"             "YMD"        "-"          n/a          n/a
TIME        "HH:mm:ss"               n/a          n/a          "24HOUR"     ":"
TIMESTAMP   "yyyy-MM-dd HH:mm:ss"    "YMD"        "-"          "24HOUR"     ":"
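The correspondence between schema patterns and nzload options can be kept in a small lookup so that both sides stay in sync. This is only a sketch: the DATE_TIME_OPTIONS dictionary and the two helper functions are hypothetical, and the strftime strings are Python equivalents of the schema patterns chosen for this example, not something the component uses.

```python
from datetime import datetime

# Schema pattern -> matching nzload options, copied from the examples above.
# The "strftime" entries are Python equivalents of the schema patterns (an assumption).
DATE_TIME_OPTIONS = {
    "yyyy-MM-dd": {
        "strftime": "%Y-%m-%d",
        "-dateStyle": "YMD", "-dateDelim": "-",
    },
    "HH:mm:ss": {
        "strftime": "%H:%M:%S",
        "-timeStyle": "24HOUR", "-timeDelim": ":",
    },
    "yyyy-MM-dd HH:mm:ss": {
        "strftime": "%Y-%m-%d %H:%M:%S",
        "-dateStyle": "YMD", "-dateDelim": "-",
        "-timeStyle": "24HOUR", "-timeDelim": ":",
    },
}

def nzload_options_for(pattern):
    """Return only the nzload options that go with a schema date pattern."""
    entry = DATE_TIME_OPTIONS[pattern]
    return {name: value for name, value in entry.items() if name.startswith("-")}

def render(value, pattern):
    """Format a datetime the way the schema pattern would write it to the data file."""
    return value.strftime(DATE_TIME_OPTIONS[pattern]["strftime"])

stamp = datetime(2011, 3, 9, 14, 5, 30)
date_text = render(stamp, "yyyy-MM-dd")
timestamp_text = render(stamp, "yyyy-MM-dd HH:mm:ss")
time_options = nzload_options_for("HH:mm:ss")
```

A pattern that is missing from the lookup raises a KeyError, which is a convenient way to notice a schema format that has no matching nzload setting.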

Related scenario
For a related use case, see Scenario: Inserting data in MySQL database of tMysqlOutputBulkExec.


tNetezzaOutput
tNetezzaOutput properties
Component family Databases/Netezza

Function: tNetezzaOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tNetezzaOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the designed Job.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a tNetezzaConnection component.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.


Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
Default: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.


Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Extend Insert: Select this check box to carry out a bulk insert of a definable set of lines instead of inserting lines one by one. The gain in system performance is considerable.
Number of rows per insert: enter the number of rows to be inserted as one block. Note that too high a value decreases performance due to memory issues.
This check box is available only when you have selected the Insert option in the Action on data field.

Use batch size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, type in the number of rows to be processed per batch.
This check box is available only when you have selected the Update or the Delete option in the Action on data field.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is a double action on data.

tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage

This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Netezza database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
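To make the Update or insert action and the Commit every setting more concrete, here is a minimal sketch of the same idea written by hand. It uses Python's built-in sqlite3 module in place of a Netezza connection. The contracts table and the update_or_insert helper are hypothetical, and the SQL the component actually generates may differ.

```python
import sqlite3

def update_or_insert(conn, rows, commit_every=2):
    """Update the row matching the key; insert it when no existing row was updated."""
    pending = 0
    for row_id, value in rows:
        cur = conn.execute("UPDATE contracts SET value = ? WHERE id = ?", (value, row_id))
        if cur.rowcount == 0:  # no existing entry for this primary key
            conn.execute("INSERT INTO contracts (id, value) VALUES (?, ?)", (row_id, value))
        pending += 1
        if pending == commit_every:  # comparable to "Commit every"
            conn.commit()
            pending = 0
    conn.commit()  # commit the last, possibly incomplete, batch

# SQLite stands in for the Netezza database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO contracts VALUES (1, 100)")
conn.commit()

update_or_insert(conn, [(1, 150), (2, 200), (3, 300)])
result = conn.execute("SELECT id, value FROM contracts ORDER BY id").fetchall()
```

The key column in the WHERE clause plays the role of the primary key that the Update and Delete operations rely on. Without it, the update could not tell which existing entry an incoming row corresponds to.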

Related scenarios
For tNetezzaOutput related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tNetezzaRollback
tNetezzaRollback properties
This component is closely related to tNetezzaCommit and tNetezzaConnection. It usually does not make much sense to use these components independently in a transaction.
Component family Databases/Netezza

Function: tNetezzaRollback cancels the transaction committed in the connected DB.

Purpose: This component prevents part of a transaction from being committed involuntarily.

Basic settings

Component list: Select the tNetezzaConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Netezza components, especially with tNetezzaConnection and tNetezzaCommit.

Limitation: n/a
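The role of a rollback in a shared transaction can be sketched outside the Studio. The example below is an illustration only: it uses Python's built-in sqlite3 module rather than Netezza, and the mother and daughter tables are hypothetical names chosen to echo the related scenario.

```python
import sqlite3

# SQLite stands in for the Netezza database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)")
conn.commit()

try:
    # Both inserts belong to the same transaction on the shared connection.
    conn.execute("INSERT INTO mother VALUES (1)")
    conn.execute("INSERT INTO daughter VALUES (1, NULL)")  # violates NOT NULL
    conn.commit()
except sqlite3.IntegrityError:
    # Cancel everything done since the last commit, including the mother row.
    conn.rollback()

mother_count = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
daughter_count = conn.execute("SELECT COUNT(*) FROM daughter").fetchone()[0]
```

After the rollback, neither table contains a row, so the first insert was not committed on its own.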

Related scenarios
For tNetezzaRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.


tNetezzaRow
tNetezzaRow properties
Component family Databases/Netezza

Function: tNetezzaRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means that the component implements a flow in the Job design although it does not provide output.

Purpose: Depending on the nature of the query and the database, tNetezzaRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tNetezzaConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.


Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Enter the name of the table to be processed.

Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage

This component offers the flexibility of the DB query and covers all possible SQL queries.
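The Set PreparedStatement Parameter table pairs each ? placeholder in the query with an index, a type and a value. The sketch below imitates that idea with Python's built-in sqlite3 module, which also uses ? placeholders. It is not the Java PreparedStatement the component relies on, and the customers table and the run helper are hypothetical.

```python
import sqlite3

# SQLite stands in for the Netezza database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Paris"), (2, "Lyon"), (3, "Paris")],
)

# One query text with two placeholders, reused for every execution.
QUERY = "SELECT COUNT(*) FROM customers WHERE city = ? AND id >= ?"

def run(conn, query, parameters):
    """Bind (index, type, value) entries to the placeholders in index order."""
    ordered = [convert(value) for _, convert, value in sorted(parameters, key=lambda p: p[0])]
    return conn.execute(query, ordered).fetchone()[0]

# Each tuple mirrors one row of the parameter table: index, type, value.
counts = [
    run(conn, QUERY, [(1, str, city), (2, int, 1)])
    for city in ("Paris", "Lyon", "Nice")
]
```

Only the bound values change between executions while the query text stays the same, which is the situation in which the option is described as most useful.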

Related scenarios
For a tNetezzaRow related scenario, see Scenario 1: Removing and regenerating a MySQL table index.


tOracleBulkExec
tOracleBulkExec properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tOracleOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family Databases/Oracle

Function: tOracleBulkExec inserts, appends, replaces or truncates data in an Oracle database.

Purpose: As a dedicated component, it allows gains in performance during operations performed on data of an Oracle database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box when you are using the component tOracleConnection.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: Drop-down list of available drivers.
DB Version: Select the Oracle version in use.
Host: IP address of the database server.
Port: Listening port number of the database server.
Database: Database name.
Schema: Schema name.


Service Name (Perl only): Oracle Service Name or SID in Oracle database. In Java projects, the full database connection details are required.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if doesn't exist: The table is created if it does not exist.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You cannot roll back this operation.

Data file name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Action on data: On the data of the table defined, you can perform:
Insert: Inserts rows into an empty table. If duplicates are found, the Job stops.
Update: Updates the existing data of the table.
Append: Adds rows to the existing data of the table.
Replace: Overwrites some rows of the table.
Truncate: Drops table entries and inserts new input flow data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Advanced settings

Advanced separator (for number): Select this check box to change the separator used for the numbers.

Use existing control file: Select this check box if you use a control file (.ctl) and specify its path in the .ctl file name field.

Record format: Define the record format:
Default: format parameters are set by default.
Stream: set the Record terminator.
Fixed: set the Record length.
Variable: set the Field size of the record length.

Specify .ctl file's INTO TABLE clause manually: Select this check box to manually fill in the INTO TABLE clause of the control file.


Fields terminated by: Character, string or regular expression to separate fields:
None: no separator is used.
Whitespace: the separator used is a space.
EOF (used for loading LOBs from lobfile): the separator used is an EOF character (End Of File).
Other terminator: set another terminator in the Field terminator field.

Use fields enclosure: Select this check box if you want to use enclosing characters for the text:
Fields enclosure (left part): character delimiting the left of the field.
Field enclosure (right part): character delimiting the right of the field.

Use schema's Date Pattern to load Date field: Select this check box to use the date pattern of the schema in the date field.

Specify field condition: Select this check box to define a data loading condition.

Preserve blanks: Select this check box to preserve the blanks.

Trailing null columns: Select this check box to load null columns.

Load options: Click + to add data loading options:
Parameter: select a loading parameter from the list.
Value: enter a value for the parameter selected.

NLS Language: In the list, select the language used for the data that are not used in Unicode.

Set Parameter NLS_TERRITORY: Select this check box to modify the territory conventions used for day and week numbering. Your OS value is the default value used.

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for database data handling.

Output: Select the type of output for the standard output of the Oracle database: to console, to global variable.

Convert columns and table names to uppercase: Select this check box to uppercase the names of the columns and the name of the table.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Fields terminated by (Perl only): Character, string or regular expression to separate fields.

Fields optionally enclosed by (Perl only): Data enclosure characters.

Encoding (Perl only): Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.


Usage: This dedicated component offers performance and flexibility of Oracle DB query handling.

Scenario: Truncating and inserting file data into Oracle DB


This scenario describes how to truncate the content of an Oracle DB table and load the content of an input file into it. The related job is composed of three components that respectively create the content, output this content into a file, and load this file into the Oracle database after the DB table has been truncated.

- Drop the following components from the Palette to the design workspace: tOracleInput, tFileOutputDelimited, tOracleBulkExec.
- Connect tOracleInput to tFileOutputDelimited using a Row > Main link, and connect tOracleInput to tOracleBulkExec using an OnSubjobOk trigger link.
- Define the Oracle connection details. We recommend that you store the DB connection details in the Metadata repository in order to retrieve them easily at any time in any job.
- Define the schema, if it is not already stored in the Repository. In this example, the schema is as follows: ID_Contract, ID_Client, Contract_type, Contract_Value.
- Define the tFileOutputDelimited component parameters, including output File Name, Row separator and Fields delimiter.
- Then double-click tOracleBulkExec to define the DB feeding properties.

- In the Property Type, select Repository mode if you stored the database connection details under the Metadata node of the Repository, or select Built-in mode to define them manually. In this scenario, we use the Built-in mode.
- Set the connection parameters in the following fields: Host, Port, Database, Schema, Username, and Password.
- Fill in the name of the Table to be fed and the Action on data to be carried out, in this use case: insert.
- In the Schema field, select Built-in mode, and click the [...] button next to the Edit schema field to describe the structure of the data to be passed on to the next component.
- Click the Advanced settings view to configure the advanced settings of the component.


- Select the Use an existing control file check box if you want to use a control file (.ctl) storing the status of the physical structure of the database. Or, fill in the following fields manually: Record format, Specify .ctl file's INTO TABLE clause manually, Field terminated by, Use field enclosure, Use schema's Date Pattern to load Date field, Specify field condition, Preserve blanks, Trailing null columns, Load options, NLS Language and Set Parameter NLS_TERRITORY, according to your database.
- Define the Encoding as in preceding steps.
- For this scenario, in the Output field, select to console to output the standard output of the database to the console.
- Press F6 to run the job. The log output displays in the Run tab and the table is fed with the parameter file data.

Related topic: Scenario: Inserting data in MySQL database.
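The advanced settings listed in this scenario (Action on data, Fields terminated by, Use fields enclosure, Trailing null columns) correspond to clauses of a SQL*Loader control file. The following Python sketch only illustrates how such a file could be assembled for the scenario's schema. The helper build_control_file, the file name contracts.csv and the table name CONTRACTS are hypothetical, and the control file that the component actually produces may differ.

```python
def build_control_file(data_file, table, columns, action="TRUNCATE",
                       terminator=";", enclosure=None, trailing_nullcols=False):
    """Assemble SQL*Loader control file text (hypothetical helper, not Talend code)."""
    lines = [
        "LOAD DATA",
        f"INFILE '{data_file}'",
        action,                       # INSERT, APPEND, REPLACE or TRUNCATE
        f"INTO TABLE {table}",
    ]
    fields = f"FIELDS TERMINATED BY '{terminator}'"
    if enclosure:
        fields += f" OPTIONALLY ENCLOSED BY '{enclosure}'"
    lines.append(fields)
    if trailing_nullcols:
        lines.append("TRAILING NULLCOLS")
    lines.append("(" + ", ".join(columns) + ")")
    return "\n".join(lines) + "\n"


# Assumed file and table names, with the column names from the scenario's schema.
ctl = build_control_file(
    "contracts.csv", "CONTRACTS",
    ["ID_CONTRACT", "ID_CLIENT", "CONTRACT_TYPE", "CONTRACT_VALUE"],
    enclosure='"', trailing_nullcols=True,
)
print(ctl)
```

Keeping the load method (here TRUNCATE) as a parameter mirrors the Action on data list, so the same sketch covers Insert, Append and Replace.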


tOracleClose
tOracleClose properties
Component family: Databases/Oracle

Function: tOracleClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tOracleConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Oracle components, especially with tOracleConnection and tOracleCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tOracleCommit
tOracleCommit Properties
This component is closely related to tOracleConnection and tOracleRollback. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases/Oracle

Function: Validates the data processed through the job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tOracleConnection component in the list if more than one connection is planned for the current job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.

If you want to use a Row > Main connection to link tOracleCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Oracle components, especially with tOracleConnection and tOracleRollback components.

Limitation: n/a

Related scenario
This component is closely related to tOracleConnection and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction. For tOracleCommit related scenario, see tMysqlConnection on page 594.
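The idea of committing a global transaction in one go, rather than on every row, can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for an Oracle connection, and the table contracts and the function load_in_one_transaction are hypothetical names, not part of Talend.

```python
import sqlite3


def load_in_one_transaction(conn, rows):
    """Insert every row on the shared connection, then commit once at the end."""
    for row in rows:
        conn.execute(
            "INSERT INTO contracts (id_contract, id_client) VALUES (?, ?)", row
        )
    conn.commit()  # single commit for the whole transaction


# sqlite3 in-memory database used as a stand-in for the Oracle connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (id_contract INTEGER PRIMARY KEY, id_client INTEGER)"
)
load_in_one_transaction(conn, [(1, 10), (2, 20), (3, 30)])
count = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
print(count)
```

The connection stays open after the commit here, which corresponds to clearing the Close Connection check box.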


tOracleConnection
tOracleConnection Properties
This component is closely related to tOracleCommit and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction.

Component family: Databases/Oracle

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Connection type: Drop-down list of available drivers.

DB Version: Select the Oracle version in use.

Use tns file: Select this check box to use the metadata of a context included in a tns file. One tns file may have many contexts.
- TNS File: Enter the path to the tns file manually or browse to the file by clicking the three-dot button next to the field.
- Select a DB Connection in Tns File: Click the three-dot button to display all the contexts held in the tns file and select the desired one.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. You can set the encoding parameters through this field.


Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
- Shared DB Connection Name: set or type in the shared connection name.

Usage: This component is to be used along with Oracle components, especially with tOracleCommit and tOracleRollback components.

Limitation: n/a

Related scenario
This component is closely related to tOracleCommit and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction. For tOracleConnection related scenario, see tMysqlConnection on page 594.
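As an illustration of how the Host, Port and Database fields can combine into a connection string, the sketch below builds the two common URL forms of the Oracle thin JDBC driver. It assumes that the connection type distinguishes a SID from a service name; the function oracle_thin_url is hypothetical, and this guide does not show the URL that the component builds internally.

```python
def oracle_thin_url(host, port, database, use_service_name=False):
    """Build an Oracle thin-driver JDBC URL from the basic connection fields."""
    if use_service_name:
        # Service name form: jdbc:oracle:thin:@//host:port/service_name
        return f"jdbc:oracle:thin:@//{host}:{port}/{database}"
    # SID form: jdbc:oracle:thin:@host:port:SID
    return f"jdbc:oracle:thin:@{host}:{port}:{database}"


sid_url = oracle_thin_url("localhost", 1521, "XE")
service_url = oracle_thin_url("localhost", 1521, "XE", use_service_name=True)
print(sid_url)
print(service_url)
```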


tOracleInput
tOracleInput properties
Component family: Databases/Oracle

Function: tOracleInput reads a database and extracts fields based on a query.

Purpose: tOracleInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Connection type: Drop-down list of available drivers.

DB Version: Select the Oracle version in use.

Use an existing connection: Select this check box when using a configured tOracleConnection component.

When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Oracle schema: Oracle schema name.


Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Database table name.

Query type and Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

Usage: This component covers all possible SQL queries for Oracle databases.

Related scenarios
For related scenarios, see:
- Scenario 1: Displaying selected data from DB table.
- Scenario 2: Using StoreSQLQuery variable.
- Scenario: Dynamic context use in MySQL DB insert.
- Scenario: Writing dynamic columns from a MySQL database to an output file.
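The requirement that the query fields follow the order of the schema, together with the Trim all the String/Char columns option, can be illustrated with the sketch below. It uses Python's sqlite3 module as a stand-in for Oracle; the schema list, the table contracts and the function read_rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: field names in the order the component passes them on.
SCHEMA = ["id_contract", "contract_type"]


def read_rows(conn, query, schema, trim_all=False):
    """Run the query and map each record onto the schema, position by position."""
    for record in conn.execute(query):
        if len(record) != len(schema):
            raise ValueError("the query does not return the fields of the schema")
        values = [
            v.strip() if trim_all and isinstance(v, str) else v for v in record
        ]
        yield dict(zip(schema, values))


conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
conn.execute("CREATE TABLE contracts (id_contract INTEGER, contract_type TEXT)")
conn.execute("INSERT INTO contracts VALUES (1, '  annual ')")

# The SELECT list is written in the same order as SCHEMA.
rows = list(
    read_rows(conn, "SELECT id_contract, contract_type FROM contracts",
              SCHEMA, trim_all=True)
)
print(rows)
```

Because values are matched by position, swapping two columns in the SELECT list would silently place values under the wrong field names, which is why the field order matters.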


tOracleOutput
tOracleOutput properties
Component family: Databases/Oracle

Function: tOracleOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tOracleOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a tOracleConnection component. When you deselect it, a check box appears (selected by default and followed by a field) in the Advanced settings, Batch Size, which enables you to define the number of lines in each processed batch.

When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type (Java only): Drop-down list of available drivers.

DB Version: Select the Oracle version in use.


Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

This field is Perl only.

Oracle schema: Name of the Oracle schema.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
- None: No operation is carried out.
- Drop and create a table: The table is removed and created again.
- Create a table: The table does not exist and gets created.
- Create a table if not exists: The table is created if it does not exist.
- Drop a table if exists and create: The table is removed if it already exists and created again.
- Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
- Insert: Add new entries to the table. If duplicates are found, the job stops.
- Update: Make changes to existing entries.
- Insert or update: Add entries or update existing ones.
- Update or insert: Update existing entries or create them if they do not exist.
- Delete: Remove entries corresponding to the input flow.

It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.


Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error (Java only): This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Additional JDBC parameters (Java only): Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.

Override any existing NLS_LANG environment variable (Perl only): Select this check box to override variables already set for an NLS language environment.

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
- Name: Type in the name of the schema column to be altered or inserted as new column.
- SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
- Position: Select Before, Replace or After following the action to be performed on the reference column.
- Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options (Java only): Select this check box to customize a request, especially when there is double action on data.


Use Hint Options (Java only): Select this check box to activate the hint configuration area which helps you optimize a query's execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.

Convert columns and table to uppercase (Java only): Select this check box to set the names of columns and table in upper case.

Enable debug mode (Java only): Select this check box to display each step during processing entries in a database.

Use Batch Size (Java only): When selected, enables you to define the number of lines in each processed batch. This option is available only when you do not Use an existing connection in Basic settings.

Support null in SQL WHERE statement (Java only): Select this check box to validate null in SQL WHERE statement.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Oracle database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.

Related scenarios
For tOracleOutput related topics, see:
- tDBOutput Scenario: Displaying DB output.
- tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
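The Update or insert action, which relies on at least one key column, can be sketched as follows. This is an assumption about how such an action is commonly implemented (try the update first, insert when no row matched), not a description of the code generated by the component. Python's sqlite3 module stands in for Oracle, and the table contracts and the function update_or_insert are hypothetical.

```python
import sqlite3


def update_or_insert(conn, row):
    """Update the entry matching the key column; insert it when none was updated."""
    cur = conn.execute(
        "UPDATE contracts SET contract_value = ? WHERE id_contract = ?",
        (row["contract_value"], row["id_contract"]),
    )
    if cur.rowcount == 0:  # no existing entry has this key
        conn.execute(
            "INSERT INTO contracts (id_contract, contract_value) VALUES (?, ?)",
            (row["id_contract"], row["contract_value"]),
        )


conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
conn.execute(
    "CREATE TABLE contracts (id_contract INTEGER PRIMARY KEY, contract_value REAL)"
)
update_or_insert(conn, {"id_contract": 1, "contract_value": 100.0})  # inserted
update_or_insert(conn, {"id_contract": 1, "contract_value": 250.0})  # updated
conn.commit()
result = conn.execute("SELECT id_contract, contract_value FROM contracts").fetchall()
print(result)
```

The WHERE clause uses the key column only, which is why the Update and Delete actions need a primary key defined in the schema or in the Use field options table.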


tOracleOutputBulk
tOracleOutputBulk properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tOracleOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family: Databases/Oracle

Function: Writes a file with columns based on the defined delimiter and the Oracle standards.

Purpose: Prepares the file to be used as parameter in the INSERT query to feed the Oracle database.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append: Select this check box to add the new rows at the end of the file.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Advanced separator (for number): Select this check box to change data separators for numbers:
- Thousands separator: define separators you want to use for thousands.
- Decimal separator: define separators you want to use for decimals.


Field separator: Character, string or regular expression to separate fields.

Row separator: String (ex: "\n" on Unix) to separate rows.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is to be used along with the tOracleBulkExec component. Used together, they offer gains in performance while feeding an Oracle database.

Related scenarios
For use cases in relation with tOracleOutputBulk, see the following scenarios:
- tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
- tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
- tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.
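The effect of the Field separator, Row separator and Advanced separator (for number) settings on the generated file can be sketched in Python as follows. The functions format_number and write_bulk_file, the two-decimal formatting and the sample row are assumptions made for the illustration; the file written by the component may be formatted differently.

```python
import io


def format_number(value, thousands=".", decimal=","):
    """Format a float with custom thousands and decimal separators."""
    text = f"{value:,.2f}"  # standard form, for example 1,234,567.89
    # Swap the separators through a placeholder so they do not collide.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", thousands)


def write_bulk_file(out, rows, field_sep=";", row_sep="\n"):
    """Write each row as delimited text, one row per row separator."""
    for row in rows:
        fields = [format_number(v) if isinstance(v, float) else str(v) for v in row]
        out.write(field_sep.join(fields) + row_sep)


buffer = io.StringIO()  # in-memory stand-in for the output file
write_bulk_file(buffer, [(1, 42, "annual", 1234567.891)])
print(buffer.getvalue())
```

The separators chosen here must match the ones declared for the loading step, otherwise the loader splits the fields at the wrong positions.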


tOracleOutputBulkExec
tOracleOutputBulkExec properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tOracleOutputBulkExec component.
Component family: Databases/Oracle

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to an Oracle database.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tOracleConnection component on the Component list to reuse the connection details you already defined.

When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: List of available drivers.

DB Version: Select the Oracle version in use.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.


Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
- None: No operation is carried out.
- Drop and create the table: The table is removed and created again.
- Create a table: The table does not exist and gets created.
- Create table if doesn't exist: The table is created if it does not exist.
- Clear a table: The table content is deleted.
- Truncate table: The table content is deleted. You cannot roll back the operation.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Create directory if not exists: This check box is selected by default. It creates a directory to hold the output table if required.

Append: Select this check box to add the new rows at the end of the file.

Action on data: On the data of the table defined, you can perform:
- Insert: Add new entries to the table. If duplicates are found, the job stops.
- Update: Make changes to existing entries.
- Insert or update: Add entries or update existing ones.
- Update or insert: Update existing entries or create them if they do not exist.
- Truncate: Remove all entries from table.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Field separator: Character, string or regular expression to separate fields.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Advanced settings

Advanced separator (for number): Select this check box to change data separators for numbers:
- Thousands separator: define separators you want to use for thousands.
- Decimal separator: define separators you want to use for decimals.

Use existing control file: Select this check box and browse to the .ctl control file you want to use.


Field separator: Character, string or regular expression to separate fields.

Row separator: String (ex: "\n" on Unix) to separate rows.

Specify .ctl file's INTO TABLE clause manually: Select this check box to manually enter the INTO TABLE clause of the control file directly into the code.

Use schema's Date Pattern to load Date field: Select this check box to use the date model indicated in the schema for dates.

Specify field condition: Select this check box to define a condition for loading data.

Preserve blanks: Select this check box to preserve blank spaces.

Trailing null columns: Select this check box to load data with all empty columns.

Load options: Click + to add data loading options:
- Parameter: select a loading parameter from the list.
- Value: enter a value for the parameter selected.

NLS Language: From the drop-down list, select the language for your data if the data is not in Unicode.

Set Parameter NLS_TERRITORY: Select this check box to modify the conventions used for date and time formats. The default value is that of the operating system.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Output: Select the type of output for the standard output of the Oracle database: to console, to global variable.

Convert columns and table names to uppercase: Select this check box to put columns and table names in upper case.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation: n/a

Related scenarios
For use cases in relation with tOracleOutputBulkExec, see the following scenarios:
- tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
- tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
- tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tOracleRollback
tOracleRollback properties
This component is closely related to tOracleCommit and tOracleConnection. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases

Function: Cancels the transaction commit in the connected DB.

Purpose: Avoids committing part of a transaction involuntarily.

Basic settings

Component list: Select the tOracleConnection component in the list if more than one connection is planned for the current job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Oracle components, especially with tOracleConnection and tOracleCommit components.

Limitation: n/a

Related scenario
This component is closely related to tOracleConnection and tOracleCommit. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction. For tOracleRollback related scenario, see tMysqlRollback on page 636.
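The role of a rollback, cancelling a transaction so that it is not partly committed, can be illustrated with the sketch below. It uses Python's sqlite3 module as a stand-in for an Oracle connection; the table contracts and the function load_or_rollback are hypothetical.

```python
import sqlite3


def load_or_rollback(conn, rows):
    """Commit only if every row is inserted; otherwise cancel the whole transaction."""
    try:
        for row in rows:
            conn.execute("INSERT INTO contracts (id_contract) VALUES (?)", row)
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # nothing from this transaction is kept
        return False


conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
conn.execute("CREATE TABLE contracts (id_contract INTEGER PRIMARY KEY)")
conn.commit()

# The third row repeats key 1, so the insert fails and the transaction is cancelled.
ok = load_or_rollback(conn, [(1,), (2,), (1,)])
remaining = conn.execute("SELECT COUNT(*) FROM contracts").fetchone()[0]
print(ok, remaining)
```

The two rows inserted before the failure are discarded as well, which is the behavior that makes a shared connection with a single commit or rollback point useful.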


tOracleRow
tOracleRow properties
Component family: Databases/Oracle

Function: tOracleRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tOracleRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tOracleConnection component on the Component list to reuse the connection details you already defined.

When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: Drop-down list of available drivers.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

702

Talend Open Studio Components

Database components
tOracleRow

Schema and Edit Schema

A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository. Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type

Either Built-in or Repository. Built-in: Fill in manually the query statement or build it graphically using SQLBuilder Repository: Select the relevant query stored in the Repository. The Query field gets accordingly filled in.

Query

Enter your DB query paying particularly attention to properly sequence the fields in order to match the schema definition. This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link. Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list. Select this checkbox if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic Settings tab. Parameter Index: Enter the parameter position in the SQL instruction. Parameter Type: Enter the parameter type. Parameter Value: Enter the parameter value. This option is very useful if you need to execute the same query several times. Performance levels are increased

Die on error

Advanced settings

Propagate QUERYs recordset Use PreparedStatement

Commit every

Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and above all better performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component level. Usage This component offers the flexibility of the DB query and covers all possible SQL queries.
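The Use PreparedStatement and Commit every options can be pictured with a small sketch. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the Oracle connection, and the helper run_prepared, the table ssn and the sample values are hypothetical names that do not come from the component. The same statement text is executed several times, each ? is bound by position (the role of Parameter Index), and a commit is issued after every commit_every executions.

```python
import sqlite3


def run_prepared(conn, sql, param_rows, commit_every):
    """Execute one parameterized statement per parameter tuple.

    A commit is issued after every `commit_every` executions, plus a final
    commit for any remaining rows. Returns the number of commits issued.
    """
    cur = conn.cursor()
    commits = 0
    pending = 0
    for params in param_rows:
        cur.execute(sql, params)  # each "?" is bound by its position
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()
        commits += 1
    return commits


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ssn (id INTEGER, name TEXT)")

commits = run_prepared(
    conn,
    "INSERT INTO ssn (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di"), (5, "Ed")],
    commit_every=2,
)
row_count = conn.execute("SELECT COUNT(*) FROM ssn").fetchone()[0]
```

With five rows and a batch size of two, the sketch commits three times: two full batches and one final partial batch.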


Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tOracleSCD
tOracleSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tOracleSCD.


tOracleSCDELT
tOracleSCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tOracleSCDELT.


tOracleSP
tOracleSP Properties
Component family: Databases/Oracle

Function: tOracleSP calls the database stored procedure.

Purpose: tOracleSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings

Use an existing connection: Select this check box to use an established connection from tOracleConnection. Once you select it, the Component list field appears, allowing you to choose the tOracleConnection component to be used from those already established on the studio workspace. For more information on tOracleConnection, see section tOracleConnection.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Connection type: The type may be:
- Oracle SID
- Oracle Service Name
- Oracle OCI

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

DB Version: Select the Oracle version in use.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

SP Name: Type in the exact name of the Stored Procedure (or Function).

Is Function / Return result in: Select this check box if the stored procedure is a function and one value only is to be returned. Select from the list the schema column on which the value to be returned is based.

Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter is to be returned as value, likely after modification through the procedure (function).
RECORDSET: Input parameters are to be returned as a set of values, rather than a single value.
Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.
The Custom Type is used when a Schema Column you want to use is user-defined. Two Custom Type columns are available in the Parameters table.
In the first Custom Type column:
- Select the check box in the Custom Type column when the corresponding Schema Column you want to use is of user-defined type.
- If all listed Schema Columns in the Parameters table are of custom type, you can select the check box before Custom Type once for them all.
Select a database type from the DB Type list to map the source database type to the target database type:
- Auto-Mapping: Map the source database type to the target database type automatically (default).
- CLOB: Character large object.
- BLOB: Binary large object.
- DECIMAL: Decimal numeric object.
- NUMERIC: Character 0 to 9.
In the second Custom Type column, you can specify what the custom type is. The type may be:
- STRUCT: used for one element.
- ARRAY: used for a collection of elements.
In the Custom name column, specify the name of the custom type that you have given to this type.
When an OUT parameter uses the custom type, make sure that its corresponding Schema Column has chosen the Object type in the schema table.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

NLS Language: In the list, select the language used for the data that are not used in Unicode.

NLS Territory: Select the conventions used for date and time formats. The default value is that of the operating system.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.

Limitation: The Stored Procedures syntax should match the Database syntax.
When the parameters set in this component are of Custom Type, the tJava family components should be placed before the component in order for users to define values for the custom-type parameters, or after the component so as to read and output the Out-type custom parameters.

Scenario: Checking number format using a stored procedure


The following Job connects to an Oracle database containing Social Security Numbers and their holders' names, and calls a stored procedure that checks the SSN format against a standard ###-##-#### format. The verification results, 1 for a valid format and 0 for a wrong format, are then displayed on the execution console.


Drag and drop the following components from the Palette: tOracleConnection, tOracleInput, tOracleSP and tLogRow.
Link the tOracleConnection to the tOracleInput using a Then Run connection, as no data is handled here.
Connect the other components using a Row Main link, as rows are to be passed on as parameters to the SP component and to the console.
In the tOracleConnection, define the connection details for the relevant database. You will then be able to reuse this information in all other DB-related components.
Then select the tOracleInput and define its properties.

Select the Use an existing connection check box and select the tOracleConnection component in the list in order to reuse the connection details that you already set.
Select Repository as Property type, as the Oracle schema is defined in the DB Oracle connection entry of the Repository. If you haven't recorded the Oracle DB details in the Repository, fill in the Schema name manually.
Then select Repository as Schema, and retrieve the relevant schema corresponding to your Oracle DB table.

In this example, the SSN table has a four-column schema that includes ID, NAME, CITY and SSNUMBER.
In the Query field, type in the following Select query or select it in the list, if you stored it in the Repository.

select ID, NAME, CITY, SSNUMBER from SSN

Then select the tOracleSP and define its Basic settings.


As for the tOracleInput component, select Repository in the Property type field and select the Use an existing connection check box, then select the relevant entries in the respective lists. The schema used for the tOracleSP differs slightly from the input schema: an extra column (SSN_Valid) is added to the input schema. This column will hold the format validity status (1 or 0) produced by the procedure.

In the SP Name field, type in the exact name of the stored procedure (or function) as called in the Database. In this use case, the stored procedure name is is_ssn. The basic function used in this particular example is as follows:

CREATE OR REPLACE FUNCTION is_ssn(string_in VARCHAR2)
RETURN PLS_INTEGER
IS
-- validating ###-##-#### format
BEGIN
  IF TRANSLATE(string_in, '0123456789A', 'AAAAAAAAAAB') = 'AAA-AA-AAAA' THEN
    RETURN 1;
  END IF;
  RETURN 0;
END is_ssn;
/

As a return value is expected in this use case, the procedure acts as a function, so select the Is Function check box. The only return value expected is based on the SSN_Valid column, hence select the relevant list entry.
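To see why the TRANSLATE call in is_ssn works, the same check can be mirrored outside the database. The following is a minimal Python sketch, not part of the Job: it maps every digit to 'A' and a literal 'A' to 'B', so that letters already present in the input cannot pass for digits, and then compares the result with the mask 'AAA-AA-AAAA'. The sample rows are hypothetical values invented for the illustration; the appended SSN_Valid key mirrors the extra schema column described above.

```python
# Same character mapping as TRANSLATE(string_in, '0123456789A', 'AAAAAAAAAAB'):
# digits 0-9 become 'A', and an 'A' in the input becomes 'B'.
_SSN_TABLE = str.maketrans("0123456789A", "AAAAAAAAAAB")


def is_ssn(string_in):
    """Return 1 if string_in has the ###-##-#### format, otherwise 0."""
    if string_in is None:
        # A NULL input fails the comparison in the PL/SQL version too.
        return 0
    return 1 if string_in.translate(_SSN_TABLE) == "AAA-AA-AAAA" else 0


# Hypothetical input rows using the four-column schema of the scenario.
rows = [
    {"ID": 1, "NAME": "Ann", "CITY": "Austin", "SSNUMBER": "123-45-6789"},
    {"ID": 2, "NAME": "Bob", "CITY": "Boston", "SSNUMBER": "12-345-6789"},
]

# Each output row keeps all input columns and gains the SSN_Valid column.
checked = [dict(row, SSN_Valid=is_ssn(row["SSNUMBER"])) for row in rows]
```

Note that the input 'AAA-AA-AAAA' itself is rejected, because its letters are translated to 'B' before the comparison.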

In the Parameters area, define the input and output parameters used in the procedure. In this use case, only the SSNumber column from the schema is used in the procedure. Click the plus sign to add a line to the table and select the relevant column (SSNumber) and type (IN). Then select the tLogRow component and click Sync Column to make sure the schema is passed on from the preceding tOracleSP component.

Select the Print values in cells of a table check box to facilitate the output reading. Then save your job and press F6 to run it.

On the console, you can read the output results. All input schema columns are displayed even though they are not used as parameters in the stored procedure. The final column shows the expected return value, i.e. whether the SS Number checked is valid or not. Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.


tOracleTableList

tOracleTableList properties
Component family: Databases/Oracle

Function: tOracleTableList iterates on a set of tables through a defined Oracle connection.

Purpose: This component lists the names of specified Oracle tables using a SELECT statement based on a WHERE clause.

Basic settings

Component list: Select the tOracleConnection component in the list if more than one connection is planned for the current Job.

Where clause for table name selection: Enter the WHERE clause that will be used to identify the tables to iterate on.

Usage: This component is to be used along with other Oracle components, especially with tOracleConnection.

Limitation: n/a
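The idea of selecting table names with a WHERE clause and then iterating on them can be sketched as follows. This is an illustration only, under clearly stated assumptions: Python's sqlite3 module and its sqlite_master catalog stand in for the Oracle connection and data dictionary, and the helper list_tables and the table names are hypothetical. The query actually issued by the component is not shown in this guide.

```python
import sqlite3


def list_tables(conn, where_clause):
    """Return the table names that satisfy the user-supplied WHERE clause."""
    sql = (
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND (" + where_clause + ") ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql)]


# Hypothetical tables created only for this illustration.
conn = sqlite3.connect(":memory:")
for name in ("customer_2010", "customer_2011", "orders"):
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER)')

# Iterate on the matching tables, as a downstream component would.
row_counts = {}
for table in list_tables(conn, "name LIKE 'customer%'"):
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    row_counts[table] = count
```

Because the WHERE clause is pasted into the statement as text, it should come from a trusted source such as the Job design rather than from end-user input.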

Related scenarios
For a tOracleTableList related scenario, see Scenario: Iterating on DB tables and deleting their content using a user-defined SQL template.


tParAccelBulkExec

tParAccelBulkExec Properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tParAccelOutputBulkExec component, detailed in a different section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
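The two-step process described above can be pictured with a short sketch. It is an illustration under stated assumptions, not the components' implementation: Python's csv and sqlite3 modules stand in for the bulk file writer and the ParAccel load, and the function names, the person table and the sample rows are hypothetical. Step 1 writes a delimited file, step 2 loads that file into the table, and a transformation is applied before the load.

```python
import csv
import os
import sqlite3
import tempfile


def write_bulk_file(path, rows, delimiter=";"):
    """Step 1: write the rows to a delimited file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)


def load_bulk_file(conn, path, delimiter=";"):
    """Step 2: insert every line of the file into the target table."""
    with open(path, newline="") as handle:
        rows = [tuple(line) for line in csv.reader(handle, delimiter=delimiter)]
    conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical target table and source rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
source_rows = [(1, "ann"), (2, "bob")]

# Transformation applied before the data is loaded: upper-case the names.
transformed = [(row_id, name.upper()) for row_id, name in source_rows]

with tempfile.TemporaryDirectory() as workdir:
    bulk_path = os.path.join(workdir, "person.csv")
    write_bulk_file(bulk_path, transformed)
    loaded = load_bulk_file(conn, bulk_path)

names = [r[0] for r in conn.execute("SELECT name FROM person ORDER BY id")]
```

Keeping the file as a separate artifact between the two steps is what leaves room for inspecting or reshaping the data before it reaches the database.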
Component family: Databases/ParAccel

Function: tParAccelBulkExec performs an Insert action on the data.

Purpose: tParAccelBulkExec is a component which is specifically designed to improve performance when loading data in a ParAccel database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box if you use a configured tParAccelConnection.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Database name.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Copy mode: Select the copy mode you want to use from either:
Basic: Standard mode, without optimisation.
Parallel: Allows you to use several internal ParAccel APIs in order to optimise loading speed.

Filename: Name and path of the file to be processed.

File Type: Select the file type from the list.

Field Layout: Select the field layout from the list.

Field separator: Character, string or regular expression to separate fields.

Explicit IDs: The ID is already present in the file to be loaded or will be set by the database.

Remove Quotes: Select this check box to remove quotation marks from the file to be loaded.

Max. Errors: Type in the maximum number of errors before your Job stops.

Date Format: Type in the date format to be used.

Time/Timestamp Format: Enter the date and hour format to be used.

Additional COPY Options: Enter the specific, customized ParAccel option that you want to use.

Log file: Browse to or enter the access path to the log file in your directory.

Logging level: Select the information type you want to record in your log file.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a Row > Reject link filtering the data in error. For a usage example, see Scenario 3: Retrieve data in error with a Reject link from component tMysqlOutput.

Related scenarios
For a related scenario, see:
Scenario: Displaying DB output from tDBOutput.
Scenario 1: Adding a new column and altering data in a DB table from tMySQLOutput.


tParAccelClose
tParAccelClose properties
Component family: Databases/ParAccel

Function: tParAccelClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tParAccelConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with ParAccel components, especially with tParAccelConnection and tParAccelCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tParAccelCommit
tParAccelCommit Properties
This component is closely related to tParAccelConnection and tParAccelRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family: Databases/ParAccel

Function: Validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tParAccelConnection component in the list if more than one connection is planned for the current Job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tParAccelCommit to your Job, your data will be committed row by row. In this case, do not select the Close Connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with ParAccel components, especially with tParAccelConnection and tParAccelRollback components.

Limitation: n/a
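Committing a global transaction in one go, with a rollback when something fails, can be sketched as follows. This is a minimal illustration under stated assumptions: Python's sqlite3 module stands in for the ParAccel connection, and the function load_in_one_transaction, the account table and the sample rows are hypothetical. All rows of a batch are written on a single connection and made permanent by one commit; if any row fails, the whole batch is rolled back.

```python
import sqlite3


def load_in_one_transaction(conn, rows):
    """Insert all rows, then commit once; roll everything back on error."""
    try:
        for row in rows:
            conn.execute("INSERT INTO account (id, owner) VALUES (?, ?)", row)
        conn.commit()  # the single commit, in the role of the commit component
        return True
    except sqlite3.Error:
        conn.rollback()  # in the role of the rollback component
        return False


# Hypothetical table; the primary key makes duplicate ids fail.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT)")

first_ok = load_in_one_transaction(conn, [(1, "Ann"), (2, "Bob")])
count_after_first = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# The second batch contains a duplicate id, so none of its rows are kept.
second_ok = load_in_one_transaction(conn, [(3, "Cy"), (1, "Dup")])
count_after_second = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

The row with id 3 is valid on its own but is discarded together with the failing row, which is the all-or-nothing behaviour a single transaction provides.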

Related scenario
This component is closely related to tParAccelConnection and tParAccelRollback. It usually doesn't make much sense to use one of these without using a tParAccelConnection component to open a connection for the current transaction.
For a tParAccelCommit related scenario, see tMysqlConnection on page 594.


tParAccelConnection
tParAccelConnection Properties
This component is closely related to tParAccelCommit and tParAccelRollback. It usually doesn't make much sense to use one of these without using a tParAccelConnection component to open a connection for the current transaction.
Component family: Databases/ParAccel

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: set or type in the shared connection name.

Advanced settings

Auto commit: Select this check box to automatically commit a transaction when it is completed.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is to be used along with ParAccel components, especially with tParAccelCommit and tParAccelRollback components.

Limitation: n/a


Related scenario
This component is closely related to tParAccelCommit and tParAccelRollback. It usually doesn't make much sense to use one of these without using a tParAccelConnection component to open a connection for the current transaction.
For a tParAccelConnection related scenario, see tMysqlConnection on page 594.


tParAccelInput

tParAccelInput properties
Component family: Databases/ParAccel

Function: tParAccelInput reads a database and extracts fields based on a query.

Purpose: tParAccelInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tParAccelConnection component.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Name of the table to be read.

Query type and Query: Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Guess schema: Click the Guess schema button to retrieve the table schema.

Advanced settings

Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for ParAccel databases.
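The requirement that the query fields be sequenced to match the schema, and the effect of trimming String/Char columns, can be illustrated with a small sketch. It rests on assumptions that are not part of the component: Python's sqlite3 module stands in for the ParAccel database, and SCHEMA, read_rows, the person table and its values are hypothetical. Values are paired with schema columns purely by position, which is why the order of the SELECT list matters.

```python
import sqlite3

# Hypothetical schema: the order of these names is the order the query must follow.
SCHEMA = ["id", "name", "city"]


def read_rows(conn, query, schema, trim_all=False):
    """Run the query and pair each value with a schema column by position."""
    rows = []
    for values in conn.execute(query):
        if len(values) != len(schema):
            raise ValueError("query returns a different number of fields than the schema")
        if trim_all:
            # Strip leading and trailing whitespace from string values only.
            values = [v.strip() if isinstance(v, str) else v for v in values]
        rows.append(dict(zip(schema, values)))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, '  Ann ', 'Austin  ')")

# The SELECT list follows the schema order: id, name, city.
rows = read_rows(conn, "SELECT id, name, city FROM person", SCHEMA, trim_all=True)
untrimmed = read_rows(conn, "SELECT id, name, city FROM person", SCHEMA)
```

If the SELECT list were written as city, name, id, the values would still be assigned to id, name and city in that order, silently mislabelling the data.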

Related scenarios
For related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.


tParAccelOutput

tParAccelOutput Properties
Component family: Databases/ParAccel

Function: tParAccelOutput writes, updates, modifies or deletes the data in a database.

Purpose: tParAccelOutput executes the action defined on the table and/or on the data of a table, according to the input flow from the previous component.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box if you use a configured tParAccelConnection.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Database name.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.


Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
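The difference between the combined actions Update or insert and Insert or update, and the reason a primary key is needed, can be pictured with the following sketch. It is one possible reading written under stated assumptions, not the component's generated code: Python's sqlite3 module stands in for the ParAccel database, and the functions, the customer table and the sample rows are hypothetical. Both actions end with the same table content; they differ in which statement is attempted first.

```python
import sqlite3


def update_or_insert(conn, row):
    """Try the update first; insert only if no existing row has this key."""
    cur = conn.execute(
        "UPDATE customer SET name = ? WHERE id = ?", (row["name"], row["id"])
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (row["id"], row["name"])
        )
        return "insert"
    return "update"


def insert_or_update(conn, row):
    """Try the insert first; update if the primary key already exists."""
    try:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (row["id"], row["name"])
        )
        return "insert"
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (row["name"], row["id"])
        )
        return "update"


# Hypothetical table; id plays the role of the key column set in Edit Schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

actions = [
    update_or_insert(conn, {"id": 1, "name": "Ann"}),
    update_or_insert(conn, {"id": 1, "name": "Anna"}),
    insert_or_update(conn, {"id": 1, "name": "Annie"}),
    insert_or_update(conn, {"id": 2, "name": "Bob"}),
]
conn.commit()
final_rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

Choosing the variant whose first attempt usually succeeds avoids a wasted statement per row: Update or insert suits flows that mostly modify existing entries, Insert or update suits flows that mostly add new ones.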


Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every
Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options
Select this check box to customize a request, especially when there is double action on data.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a Row > Rejects link filtering the data in error. For a usage example, see Scenario 3: Retrieve data in error with a Reject link from component tMysqlOutput.
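The Update or insert action keyed on a primary key column can be pictured with the following minimal sketch. It is not the code generated by the component: it uses Python's built-in sqlite3 module as a stand-in for a ParAccel connection, and the customer table, its columns and the update_or_insert helper are hypothetical names chosen for illustration.

```python
import sqlite3


def update_or_insert(conn, table, row, key_columns):
    """Update the entry matching key_columns; insert the row if nothing matched.

    Assumes the row holds at least one column that is not a key column.
    """
    data_columns = [c for c in row if c not in key_columns]
    set_clause = ", ".join(f"{c} = ?" for c in data_columns)
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    params = [row[c] for c in data_columns] + [row[c] for c in key_columns]
    cursor = conn.execute(
        f"UPDATE {table} SET {set_clause} WHERE {where_clause}", params
    )
    if cursor.rowcount == 0:
        # No existing entry carries this key, so create it.
        columns = ", ".join(row)
        placeholders = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
            list(row.values()),
        )


# Hypothetical table standing in for the target table of the component.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")

incoming_flow = [{"id": 1, "name": "Anna"}, {"id": 2, "name": "Bob"}]
for row in incoming_flow:
    update_or_insert(conn, "customer", row, key_columns=["id"])
conn.commit()

result = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(result)
```

In this sketch the key column plays the role of the column flagged as primary key in Edit Schema: it decides which existing entry is updated, and a row whose key matches nothing is inserted instead.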

Related scenarios
For a related scenario, see:
Scenario: Displaying DB output from tDBOutput.
Scenario 1: Adding a new column and altering data in a DB table from tMysqlOutput.
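The behaviour obtained when the Die on error check box is cleared (rows in error are skipped and can be collected through a reject flow) can be approximated as follows. This is only a sketch under assumptions: sqlite3 replaces the ParAccel database, and the customer table and the insert_with_rejects helper are hypothetical.

```python
import sqlite3


def insert_with_rejects(conn, rows):
    """Insert rows one by one and return the rows refused by the database."""
    rejects = []
    for row in rows:
        try:
            conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError as error:
            # Keep the row and the error message, as a reject flow would.
            rejects.append((row, str(error)))
    conn.commit()
    return rejects


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

rows = [(1, "Ann"), (1, "Duplicate"), (2, "Bob")]
rejects = insert_with_rejects(conn, rows)
loaded = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(loaded)
print([row for row, _ in rejects])
```

The error-free rows are written, while the duplicate key is set aside with its error message instead of stopping the whole load.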


tParAccelOutputBulk

tParAccelOutputBulk properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tParAccelOutputBulkExec component, detailed in a different section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family
Databases/ParAccel

Function
Writes a file with columns based on the defined delimiter and the ParAccel standards.

Purpose
Prepares the file to be used as a parameter in the INSERT query to feed the ParAccel database.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name
Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append
Select this check box to add the new rows at the end of the file.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Row separator
String (ex: "\n" on Unix) to distinguish rows.

Field separator
Character, string or regular expression to separate fields.

Include header
Select this check box to include the column header.


Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component is to be used along with the tParAccelBulkExec component. Used together, they offer gains in performance while feeding a ParAccel database.
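The effect of the File Name, Append, Row separator, Field separator, Include header and Encoding settings on the generated file can be sketched as below. This is an illustration, not the component's implementation: write_bulk_file and its parameter names are hypothetical, the standard csv module only accepts a single-character field separator, and skipping the header when appending is a choice made for this sketch.

```python
import csv
import os
import tempfile


def write_bulk_file(path, columns, rows, field_separator=";", row_separator="\n",
                    include_header=False, append=False, encoding="utf-8"):
    """Write rows to a delimited file that a bulk load step can read later."""
    mode = "a" if append else "w"
    # newline="" lets the csv writer control the row separator itself.
    with open(path, mode, newline="", encoding=encoding) as handle:
        writer = csv.writer(
            handle, delimiter=field_separator, lineterminator=row_separator
        )
        if include_header and not append:
            writer.writerow(columns)
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "customers.csv")
write_bulk_file(path, ["id", "name"], [(1, "Ann")], include_header=True)
write_bulk_file(path, ["id", "name"], [(2, "Bob")], append=True)

with open(path, encoding="utf-8", newline="") as handle:
    content = handle.read()
print(content)
```

The second call adds its row at the end of the existing file, which is what the Append check box describes.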

Related scenarios
For use cases related to tParAccelOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tParAccelOutputBulkExec

tParAccelOutputBulkExec Properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in tParAccelOutputBulkExec.
Component family
Databases/ParAccel

Function
tParAccelOutputBulkExec performs an Insert action on the data.

Purpose
tParAccelOutputBulkExec is a component which is specifically designed to improve performance when loading data in ParAccel database.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host
Database server IP address.

Port
Listening port number of the DB server.

Database
Database name.

Schema
Exact name of the schema.

Username and Password
DB user authentication data.

Table
Name of the table to be written. Note that only one table can be written at a time.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Copy mode
Select the copy mode you want to use from either:
Basic: Standard mode, without optimisation.
Parallel: Allows you to use several internal ParAccel APIs in order to optimise loading speed.

Filename
Name and path of the file to be processed.

Advanced settings

File Type
Select the file type from the list.

Row separator
String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by
Character, string or regular expression to separate fields.

Append
Select this check box to add the new rows at the end of the file.

Explicit IDs
The ID is already present in the file to be loaded or will be set by the database.

Remove Quotes
Select this check box to remove quotation marks from the file to be loaded.

Max. Errors
Type in the maximum number of errors before your Job stops.

Date Format
Type in the date format to be used.

Time/Timestamp Format
Enter the date and hour format to be used.

Additional COPY Options
Enter the specific, customized ParAccel option that you want to use.

Log file
Browse to or enter the access path to the log file in your directory.

Logging level
Select the information type you want to record in your log file.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a Row > Reject link filtering the data in error. For a usage example, see Scenario 3: Retrieve data in error with a Reject link from component tMysqlOutput.

Related scenarios
For a related scenario, see:
Scenario: Displaying DB output from tDBOutput.
Scenario 1: Adding a new column and altering data in a DB table from tMysqlOutput.


tParAccelRollback
tParAccelRollback properties
This component is closely related to tParAccelCommit and tParAccelConnection. It usually doesn't make much sense to use these components independently in a transaction.

Component family
Databases

Function
Cancels the transaction commit in the connected DB.

Purpose
Avoids committing part of a transaction involuntarily.

Basic settings

Component list
Select the tParAccelConnection component in the list if more than one connection is planned for the current Job.

Close Connection
Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component is to be used along with ParAccel components, especially with tParAccelConnection and tParAccelCommit components.

Limitation
n/a

Related scenario
This component is closely related to tParAccelConnection and tParAccelCommit. It usually doesn't make much sense to use one of them without using a tParAccelConnection component to open a connection for the current transaction. For a tParAccelRollback related scenario, see tMysqlRollback on page 636.
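The way a connection, a commit and a rollback cooperate within one transaction can be illustrated with a short sketch. It relies on Python's sqlite3 module rather than on the ParAccel components themselves, and the account table and its CHECK constraint are hypothetical.

```python
import sqlite3

# One connection carries the whole transaction, as a connection component would.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance + 150 WHERE id = 2")
    # This statement violates the CHECK constraint and raises an error.
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
    conn.commit()
except sqlite3.IntegrityError:
    # Cancel everything done since the last commit, including the first update.
    conn.rollback()

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)
```

Because the rollback runs on the same connection that executed the updates, the partially applied transfer is undone and both balances keep their committed values.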


tParAccelRow
tParAccelRow Properties
Component family
Databases/ParAccel

Function
tParAccelRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.

Purpose
Depending on the nature of the query and the database, tParAccelRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection
Select this check box and click the relevant tParAccelConnection component on the Component list to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
Listening port number of DB server.

Database
Name of the database.

Schema
Exact name of the schema.

Username and Password
DB user authentication data.


Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name
Name of the table to be read.

Query type
Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Guess Query
Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement
Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, because performance is improved.

Commit every
Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics
Select this check box to collect log data at the component level.


Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
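The Use PreparedStatement option, with its table of parameter index, type and value, can be pictured with the following sketch. It is written against Python's sqlite3 module instead of JDBC, so it only mirrors the idea of a query with ? placeholders executed several times; the customer table and the bind and run helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, country TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "FR", "Ann"), (2, "DE", "Bob"), (3, "FR", "Cleo")],
)

# Each ? stands for one parameter, numbered from 1 in order of appearance.
QUERY = "SELECT name FROM customer WHERE country = ? AND id > ? ORDER BY id"


def bind(parameter_table):
    """Turn (index, type, value) entries into an ordered list of typed values."""
    ordered = sorted(parameter_table, key=lambda entry: entry[0])
    return [convert(value) for _, convert, value in ordered]


def run(parameter_table):
    return [row[0] for row in conn.execute(QUERY, bind(parameter_table))]


# The same query text is executed twice with different parameter values.
names_fr = run([(2, int, "0"), (1, str, "FR")])
names_de = run([(1, str, "DE"), (2, int, "1")])
print(names_fr, names_de)
```

The parameter index decides which ? receives the value, so the entries can be listed in any order, and only the values change between executions.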

Related scenarios
For a related scenario, see:
Scenario: Resetting a DB auto-increment from tDBSQLRow.
Scenario 1: Removing and regenerating a MySQL table index from tMysqlRow.
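The Commit every setting described in the properties above groups rows into batches that are committed together. A minimal sketch of that idea, assuming sqlite3 as a stand-in database and a hypothetical log table and insert_in_batches helper:

```python
import sqlite3


def insert_in_batches(conn, rows, commit_every):
    """Insert rows and commit once per batch of commit_every rows."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO log (id, msg) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the last, incomplete batch.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")

rows = [(i, f"row {i}") for i in range(1, 8)]
commits = insert_in_batches(conn, rows, commit_every=3)
count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(commits, count)
```

Seven rows with a batch size of three lead to three commits instead of seven, which is where the performance gain comes from. Batches that were already committed are not undone by a later failure, which matches the "but not rollback" remark.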


tParAccelSCD
tParAccelSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tParAccelSCD.


tParseRecordSet
You can find this component at the root of Databases group of the Palette of Talend Open Studio. tParseRecordSet covers needs related indirectly to the use of any database.

tParseRecordSet properties
Component family
Databases

Function
tParseRecordSet parses a set of records from a database table or DB query and possibly returns single records.

Purpose
Parses a recordset rather than individual records from a table.

Basic settings

Prev. Comp. Column list
Set the column from the database that holds the recordset.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Attribute table
Set the position value of each column for single records from the recordset.

Usage
This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.

Limitation
This component is mainly designed for use with the SP component Recordset feature.

Related Scenario
For an example of tParseRecordSet in use, see:

Scenario 2: Using PreparedStatement objects to query data


tPostgresPlusBulkExec
tPostgresPlusBulkExec properties
The tPostgresPlusOutputBulk and tPostgresPlusBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tPostgresPlusOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded in the database.

Component family
Databases/PostgresPlus

Function
tPostgresPlusBulkExec executes the Insert action on the data provided.

Purpose
As a dedicated component, tPostgresPlusBulkExec allows gains in performance during Insert operations to a PostgresPlus database.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection
Select this check box and click the relevant tPostgresPlusConnection component on the Component list to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
Listening port number of DB server.

Database
Name of the database.

Schema
Name of the DB schema.

Username and Password
DB user authentication data.


Table
Name of the table to be written. Note that only one table can be written at a time.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You cannot roll back the operation.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository, hence you can reuse it. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Action
Select the action to be carried out:
Bulk insert
Bulk update
Depending on the action selected, the required information varies.

Field terminated by
Character, string or regular expression to separate fields.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This dedicated component offers performance and flexibility of PostgresPlus query handling.

Related scenarios
For tPostgresPlusBulkExec related topics, see:
tMysqlOutputBulkExec Scenario: Inserting transformed data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresPlusClose
tPostgresPlusClose properties
Component family
Databases/Postgres

Function
tPostgresPlusClose closes the transaction committed in the connected DB.

Purpose
Close a transaction.

Basic settings

Component list
Select the tPostgresPlusConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component is to be used along with PostgresPlus components, especially with tPostgresPlusConnection and tPostgresPlusCommit.

Limitation
n/a

Related scenario
No scenario is available for this component yet.


tPostgresPlusCommit
tPostgresPlusCommit Properties
This component is closely related to tPostgresPlusConnection and tPostgresPlusRollback. It usually doesn't make much sense to use these components independently in a transaction.

Component family
Databases/PostgresPlus

Function
Validates the data processed through the Job into the connected DB.

Purpose
Using a unique connection, this component commits a global transaction in one go instead of doing so on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list
Select the tPostgresPlusConnection component in the list if more than one connection is planned for the current Job.

Close Connection
This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tPostgresPlusCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component is to be used along with PostgresPlus components, especially with the tPostgresPlusConnection and tPostgresPlusRollback components.

Limitation
n/a

Related scenario
This component is closely related to tPostgresPlusConnection and tPostgresPlusRollback. It usually doesn't make much sense to use PostgresPlus components without using the tPostgresPlusConnection component to open a connection for the current transaction. For a tPostgresPlusCommit related scenario, see tMysqlConnection on page 594.


tPostgresPlusConnection
tPostgresPlusConnection Properties
This component is closely related to tPostgresPlusCommit and tPostgresPlusRollback. It usually doesn't make much sense to use one of the PostgresPlus components without using the tPostgresPlusConnection component to open a connection for the current transaction.

Component family
Databases/PostgresPlus

Function
Opens a connection to the database for a current transaction.

Purpose
This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host
Database server IP address.

Port
Listening port number of DB server.

Database
Name of the database.

Schema
Exact name of the schema.

Username and Password
Enter your DB authentication data.

Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: set or type in the shared connection name.

Advanced settings

Auto commit
Select this check box to automatically commit a transaction when it is completed.

tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job level as well as at each component level.

Usage
This component is to be used along with PostgresPlus components, especially with the tPostgresPlusCommit and tPostgresPlusRollback components.

Limitation
n/a


Related scenario
This component is closely related to tPostgresPlusCommit and tPostgresPlusRollback. It usually doesn't make much sense to use one of the PostgresPlus components without using the tPostgresPlusConnection component to open a connection for the current transaction. For a tPostgresPlusConnection related scenario, see tMysqlConnection on page 594.


tPostgresPlusInput

tPostgresPlusInput properties
Component family
Databases/PostgresPlus

Function
tPostgresPlusInput reads a database and extracts fields based on a query.

Purpose
tPostgresPlusInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection
Select this check box when using a configured tPostgresPlusConnection component.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
Listening port number of DB server.

Database
Name of the database.

Schema
Exact name of the schema.

Username and Password
DB user authentication data.


Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name
Name of the table to be read.

Query type and Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

Use cursor
When selected, helps to decide the row set to work with at a time and thus optimize performance.

Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column
Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics
Select this check box to collect log data at the component level.

Usage
This component covers all possible SQL queries for Postgresql databases.
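Why the fields of the query must follow the order of the schema, and what trimming all String/Char columns does, can be illustrated with the sketch below. It uses sqlite3 in place of a PostgresPlus database; the person table, the schema list and the read_rows helper are hypothetical.

```python
import sqlite3

# Column order as it would be defined in the component schema.
schema = ["id", "name", "city"]


def read_rows(conn, query, schema, trim_all=False):
    """Map each result row onto the schema by position, optionally trimming strings."""
    records = []
    for values in conn.execute(query):
        if len(values) != len(schema):
            raise ValueError("the query and the schema have different field counts")
        if trim_all:
            values = [v.strip() if isinstance(v, str) else v for v in values]
        records.append(dict(zip(schema, values)))
    return records


conn = sqlite3.connect(":memory:")
# The physical column order differs from the schema on purpose.
conn.execute("CREATE TABLE person (city TEXT, name TEXT, id INTEGER)")
conn.execute("INSERT INTO person VALUES ('  Paris ', ' Ann', 1)")

# The SELECT list follows the schema order, not the table order.
records = read_rows(conn, "SELECT id, name, city FROM person", schema, trim_all=True)
print(records)
```

Values are matched to schema columns by position only, so a query such as SELECT * on this table would place the city in the id column.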

Related scenarios
For related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.


tPostgresPlusOutput

tPostgresPlusOutput properties
Component family
Databases/PostgresPlus

Function
tPostgresPlusOutput writes, updates, makes changes or suppresses entries in a database.

Purpose
tPostgresPlusOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type
Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection
Select this check box when using a configured tPostgresPlusConnection component.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
Database server IP address.

Port
Listening port number of DB server.

Database
Name of the database.

Schema
Exact name of the schema.


Username and Password
DB user authentication data.

Table
Name of the table to be written. Note that only one table can be written at a time.

Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You cannot roll back the operation.

Action on data
On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s).
For advanced use, click the Advanced settings view, where you can define separate primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.

Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

Support null in "SQL WHERE" statement: Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema.

Use batch size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Insert, the Update or the Delete option in the Action on data field.

tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a PostgresPlus database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
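The difference between the Insert or update and Update or insert actions described above can be sketched as follows. This is only an illustration of one plausible reading of the two options (which statement is attempted first), not the code the component generates. The in-memory SQLite database stands in for a PostgresPlus server so that the sketch runs anywhere, and the customer table and both helper functions are hypothetical.

```python
import sqlite3


def insert_or_update(conn, row):
    """Attempt the INSERT first; fall back to UPDATE when the key already exists."""
    try:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (row["id"], row["name"])
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # The primary key is already present, so change the existing entry.
        conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (row["name"], row["id"])
        )
        return "updated"


def update_or_insert(conn, row):
    """Attempt the UPDATE first; INSERT only when no existing entry matched the key."""
    cur = conn.execute(
        "UPDATE customer SET name = ? WHERE id = ?", (row["name"], row["id"])
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (row["id"], row["name"])
        )
        return "inserted"
    return "updated"


# Hypothetical table; "id" plays the role of the key column set in Edit Schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

results = [
    insert_or_update(conn, {"id": 1, "name": "Ann"}),
    insert_or_update(conn, {"id": 1, "name": "Anna"}),
    update_or_insert(conn, {"id": 2, "name": "Bob"}),
    update_or_insert(conn, {"id": 2, "name": "Bobby"}),
]
rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

Both helpers leave the table in the same final state; they differ in which statement runs first, which matters for performance when the incoming flow is mostly new rows or mostly existing rows.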

Related scenarios
For tPostgresPlusOutput related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tPostgresPlusOutputBulk

tPostgresPlusOutputBulk properties
The tPostgresPlusOutputBulk and tPostgresPlusBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tPostgresPlusOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into the database.
Component family: Databases/PostgresPlus

Function: Writes a file with columns based on the defined delimiter and the PostgresPlus standards.

Purpose: Prepares the file to be used as parameter in the INSERT query to feed the PostgresPlus database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Field separator: Character, string or regular expression to separate fields.

Row separator: String (ex: "\n" on Unix) to distinguish rows.

Append: Select this check box to add the new rows at the end of the file.

Include header: Select this check box to include the column header in the file.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Usage: This component is to be used along with the tPostgresPlusBulkExec component. Used together, they offer gains in performance while feeding a PostgresPlus database.
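As a rough illustration of what the File Name, Field separator, Row separator, Append, Include header and Encoding settings control, the sketch below writes a delimited file with Python's csv module. It is not the component's implementation: the function write_bulk_file, its parameters and the sample data are hypothetical, and writing the header only when the file is not appended to is an assumption made for this example.

```python
import csv
import os
import tempfile


def write_bulk_file(path, header, rows, field_sep=";", row_sep="\n",
                    append=False, include_header=True, encoding="utf-8"):
    """Write rows to a delimited file that a bulk-load step could read later."""
    mode = "a" if append else "w"
    # newline="" keeps the chosen row separator from being translated by Python.
    with open(path, mode, newline="", encoding=encoding) as handle:
        # csv.writer only accepts a one-character delimiter, which is narrower
        # than the "character, string or regular expression" the setting allows.
        writer = csv.writer(handle, delimiter=field_sep, lineterminator=row_sep)
        # Assumption: the header is written only when a new file is started.
        if include_header and not append:
            writer.writerow(header)
        writer.writerows(rows)


# Hypothetical usage: first create the file, then append a second batch.
path = os.path.join(tempfile.mkdtemp(), "customers.csv")
write_bulk_file(path, ["id", "name"], [[1, "Ann"]])
write_bulk_file(path, ["id", "name"], [[2, "Bob"]], append=True)

with open(path, newline="", encoding="utf-8") as handle:
    content = handle.read()
```

The resulting file is the kind of input that the second step of the two-step process consumes.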

Related scenarios
For use cases in relation with tPostgresPlusOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresPlusOutputBulkExec

tPostgresPlusOutputBulkExec properties
The tPostgresPlusOutputBulk and tPostgresPlusBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tPostgresPlusOutputBulkExec component.
Component family: Databases/PostgresPlus

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to a PostgresPlus database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Clear a table: The table content is deleted.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Advanced settings

Action: Select the action to be carried out:
Bulk insert
Bulk update
Depending on the action selected, the required information varies.

File type: Select the type of file being handled.

Null string: String displayed to indicate that the value is null.

Row separator: String (ex: "\n" on Unix) to distinguish rows.

Field terminated by: Character, string or regular expression to separate fields.

Text enclosure: Character used to enclose text.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Related scenarios
For use cases in relation with tPostgresPlusOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresPlusRollback
tPostgresPlusRollback properties
This component is closely related to tPostgresPlusCommit and tPostgresPlusConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/PostgresPlus

Function: tPostgresPlusRollback cancels the transaction committed in the connected DB.

Purpose: This component prevents part of a transaction from being committed involuntarily.

Basic settings

Component list: Select the tPostgresPlusConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with PostgresPlus components, especially with tPostgresPlusConnection and tPostgresPlusCommit.

Limitation: n/a
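The purpose stated above, avoiding a partially committed transaction, can be illustrated with a short sketch. It assumes a single shared connection, as tPostgresPlusConnection would provide, and uses an in-memory SQLite database as a stand-in for PostgresPlus so that it runs without a server. The mother and daughter tables are hypothetical and only echo the naming of the related scenario.

```python
import sqlite3

# One shared connection plays the role of the connection component.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)"
)
conn.commit()

try:
    # Both inserts belong to the same transaction.
    conn.execute("INSERT INTO mother (id) VALUES (1)")
    # This insert violates the NOT NULL constraint and raises an error.
    conn.execute("INSERT INTO daughter (id, mother_id) VALUES (10, NULL)")
    conn.commit()
except sqlite3.IntegrityError:
    # Roll back so that the mother row is not kept without its daughter row.
    conn.rollback()

mother_count = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
daughter_count = conn.execute("SELECT COUNT(*) FROM daughter").fetchone()[0]
```

After the rollback, neither table contains a row, which is the all-or-nothing behavior a transaction is meant to guarantee.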

Related scenarios
For a tPostgresPlusRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.


tPostgresPlusRow
tPostgresPlusRow properties
Component family: Databases/PostgresPlus

Function: tPostgresPlusRow is the specific component for the database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tPostgresPlusRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tPostgresPlusConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Table name: Name of the table to be read.

Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and above all better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
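The Use PreparedStatement option described above relies on "?" placeholders that are bound by position. The sketch below mimics the Set PreparedStatement Parameter table with a list of (index, type, value) entries. It uses Python's sqlite3 module, which also uses "?" placeholders, as a stand-in for a JDBC PreparedStatement against PostgresPlus. The person table, the bind helper and the assumption that indexes start at 1 (as they do in JDBC) are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)", [(1, "Paris"), (2, "Lyon"), (3, "Paris")]
)

# The query is written once; each "?" is filled in at execution time.
query = "SELECT id FROM person WHERE city = ? AND id > ? ORDER BY id"

# Each entry mirrors one row of the parameter table:
# (Parameter Index, Parameter Type, Parameter Value).
parameters = [(2, int, "0"), (1, str, "Paris")]


def bind(parameters):
    """Order the parameters by index and convert each value to its declared type."""
    ordered = sorted(parameters, key=lambda entry: entry[0])
    return tuple(ptype(value) for _, ptype, value in ordered)


# The same statement text is reused with different values.
paris_ids = [row[0] for row in conn.execute(query, bind(parameters))]
lyon_ids = [row[0] for row in conn.execute(query, ("Lyon", 0))]
```

Because the statement text never changes, the database driver can reuse the prepared form across executions, which is where the performance gain mentioned above comes from.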


Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tPostgresPlusSCD
tPostgresPlusSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tPostgresPlusSCD.


tPostgresPlusSCDELT
tPostgresPlusSCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tPostgresPlusSCDELT.


tPostgresqlBulkExec
tPostgresqlBulkExec properties
The tPostgresqlOutputBulk and tPostgresqlBulkExec components are used together: the first outputs a file, which the second then uses as a parameter to execute the stated SQL query. These two steps are combined in the tPostgresqlOutputBulkExec component, detailed in a separate section. The advantage of having two separate components is that transformations can be carried out before the data is loaded into the database.
Component family: Databases/Postgresql

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, tPostgresqlBulkExec offers gains in performance while carrying out the Insert operations to a Postgresql database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tPostgresqlConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You cannot roll back this operation.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

Advanced settings

Action on data: On the data of the table defined, you can perform:
Bulk Insert: Add multiple entries to the table. If duplicates are found, the Job stops.
Bulk Update: Make simultaneous changes to multiple entries.

Copy the OID for each row: Retrieve the ID item for each row.

Contains a header line with the names of each column in the file: Specify that the table contains a header.

File type: Select the type of file being handled.

Null string: String displayed to indicate that the value is null.

Fields terminated by: Character, string or regular expression to separate fields.

Escape char: Character of the row to be escaped.

Text enclosure: Character used to enclose text.

Activate standard_conforming_string: Activate the variable.

Force not null for columns: Define the nullability of the columns.
Force not null: Select the check box next to the column you want to define as not null.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with the tPostgresqlOutputBulk component. Used together, they can offer gains in performance while feeding a Postgresql database.

Limitation: n/a
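Several of the Advanced settings above have close counterparts among the options of the PostgreSQL COPY command (HEADER, NULL, DELIMITER, ESCAPE, QUOTE, FORCE_NOT_NULL). The guide does not state which statement the component generates, so the sketch below is only an assumption about how such settings could map onto a COPY ... FROM statement. The helper build_copy_statement, its parameters, the table name and the file path are all hypothetical, and the helper only builds the statement text without executing it.

```python
def build_copy_statement(table, file_name, *, file_type="csv", header=False,
                         delimiter=";", null_string="", quote='"', escape="\\",
                         force_not_null=()):
    """Assemble a COPY ... FROM statement from bulk-load style settings."""

    def literal(value):
        # Quote a value as an SQL string literal, doubling embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    options = [
        "FORMAT " + file_type,
        "DELIMITER " + literal(delimiter),
        "NULL " + literal(null_string),
    ]
    if file_type == "csv":
        # These options are only accepted by COPY in CSV format.
        options.append("HEADER " + ("true" if header else "false"))
        options.append("QUOTE " + literal(quote))
        options.append("ESCAPE " + literal(escape))
        if force_not_null:
            options.append("FORCE_NOT_NULL (" + ", ".join(force_not_null) + ")")
    return "COPY " + table + " FROM " + literal(file_name) + " WITH (" + ", ".join(options) + ")"


# Hypothetical table, file and column names.
statement = build_copy_statement(
    "customer", "/tmp/customers.csv", header=True, force_not_null=("name",)
)
```

Note that a backslash inside a quoted literal is only read as a plain character when standard_conforming_strings is on, which may be why the component exposes a setting for that variable.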

Related scenarios
For use cases in relation with tPostgresqlBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresqlCommit
tPostgresqlCommit Properties
This component is closely related to tPostgresqlConnection and tPostgresqlRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family: Databases/Postgresql

Function: Validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tPostgresqlConnection component in the list if more than one connection is planned for the current Job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tPostgresqlCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Postgresql components, especially with tPostgresqlConnection and tPostgresqlRollback components.

Limitation: n/a
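The contrast drawn above between one global commit and a commit on every batch can be sketched as follows. The example counts how many commits are issued for the same ten rows under each strategy. An in-memory SQLite database stands in for Postgresql, and the load function, the table t and the commit_every parameter are hypothetical names chosen for this illustration.

```python
import sqlite3


def load(rows, commit_every=None):
    """Insert rows and return (number of commits issued, rows stored)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    commits = 0
    for count, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO t (id) VALUES (?)", (row,))
        # Batch strategy: commit each time a full batch has been written.
        if commit_every and count % commit_every == 0:
            conn.commit()
            commits += 1
    # Final commit: covers the whole load when no batch size is set,
    # or the remaining rows after the last full batch.
    conn.commit()
    commits += 1
    total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return commits, total


batched = load(range(10), commit_every=3)  # commits after rows 3, 6, 9 and at the end
single = load(range(10))                   # one commit for the whole transaction
```

Both strategies store the same rows; the single commit does less transactional work, at the cost of losing the whole load if an error occurs before the commit.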

Related scenario
This component is closely related to tPostgresqlConnection and tPostgresqlRollback. It usually doesn't make much sense to use one of these without using a tPostgresqlConnection component to open a connection for the current transaction. For a tPostgresqlCommit related scenario, see tMysqlConnection on page 594.


tPostgresqlClose
tPostgresqlClose properties
Component family: Databases/Postgresql

Function: tPostgresqlClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tPostgresqlConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Postgresql components, especially with tPostgresqlConnection and tPostgresqlCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tPostgresqlConnection
tPostgresqlConnection Properties
This component is closely related to tPostgresqlCommit and tPostgresqlRollback. It usually doesn't make much sense to use one of these without using a tPostgresqlConnection component to open a connection for the current transaction.
Component family: Databases/Postgresql

Function: Opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: Set or type in the shared connection name.

Usage: This component is to be used along with Postgresql components, especially with tPostgresqlCommit and tPostgresqlRollback components.

Limitation: n/a

Related scenario
This component is closely related to tPostgresqlCommit and tPostgresqlRollback. It usually doesn't make much sense to use one of these without using a tPostgresqlConnection component to open a connection for the current transaction. For a tPostgresqlConnection related scenario, see tMysqlConnection on page 594.


tPostgresqlInput
tPostgresqlInput properties
Component family: Databases/PostgreSQL

Function: tPostgresqlInput reads a database and extracts fields based on a query.

Purpose: tPostgresqlInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tPostgresqlConnection component.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Table name: Name of the table to be read.

Query type and Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Advanced settings

Use cursor: When selected, helps to decide the row set to work with at a time and thus optimize performance.

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Postgresql databases.
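The Use cursor and Trim all the String/Char columns settings described above can be illustrated with a small sketch that reads a result set in fixed-size chunks and strips whitespace from text values. It is an approximation under stated assumptions: an in-memory SQLite database stands in for Postgresql, and the city table, the read_in_chunks helper and its cursor_size and trim_all parameters are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)", [(1, " Paris "), (2, "Lyon  "), (3, "  Nice")]
)


def read_in_chunks(conn, query, cursor_size, trim_all=False):
    """Yield rows while holding at most cursor_size fetched rows at a time."""
    cur = conn.execute(query)
    while True:
        chunk = cur.fetchmany(cursor_size)
        if not chunk:
            break
        for row in chunk:
            if trim_all:
                # Strip leading and trailing whitespace from text values only.
                row = tuple(v.strip() if isinstance(v, str) else v for v in row)
            yield row


# The column order in the SELECT matches the order expected downstream.
rows = list(
    read_in_chunks(
        conn, "SELECT id, name FROM city ORDER BY id", cursor_size=2, trim_all=True
    )
)
```

Fetching in chunks keeps memory use bounded for large tables, which is the optimization the Use cursor setting refers to.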

Related scenarios
For related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
Scenario: Dynamic context use in MySQL DB insert.


tPostgresqlOutput
tPostgresqlOutput properties
Component family: Databases/Postgresql

Function: tPostgresqlOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tPostgresqlOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tPostgresqlConnection component.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Schema: Exact name of the schema.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Talend Open Studio Components

767

Database components
tPostgresqlOutput

Die on error

This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link. Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution. This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, which are not insert, nor update or delete actions, or action that require particular preprocessing. Name: Type in the name of the schema column to be altered or inserted as new column SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data. Position: Select Before, Replace or After following the action to be performed on the reference column. Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Advanced settings

Commit every

Additional Columns

Use field options Enable debug mode Support null in SQL WHERE statement

Select this check box to customize a request, especially when there is double action on data. Select this check box to display each step during processing entries in a database. Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema.

tStatCatcher Statistics Select this check box to collect log data at the component level. Usage This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Postgresql database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link
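
The difference between the Insert or update and Update or insert actions lies in the order in which the two statements are attempted. The following Python sketch illustrates one plausible reading of these options. It uses the standard sqlite3 module as a stand-in for a PostgreSQL connection; the customers table, its columns and the helper names are hypothetical, and the statements the component actually generates may differ.

```python
import sqlite3


def insert_or_update(conn, row):
    """Try INSERT first; on a duplicate primary key, fall back to UPDATE."""
    try:
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        conn.execute("UPDATE customers SET name = ? WHERE id = ?", (row[1], row[0]))


def update_or_insert(conn, row):
    """Try UPDATE first; if no row matched the key, INSERT instead."""
    cur = conn.execute("UPDATE customers SET name = ? WHERE id = ?", (row[1], row[0]))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)


def run_demo():
    # Hypothetical table; the id column plays the role of the schema key column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    insert_or_update(conn, (1, "Ada"))      # inserted
    insert_or_update(conn, (1, "Ada L."))   # duplicate key, so updated
    update_or_insert(conn, (2, "Bob"))      # nothing to update, so inserted
    update_or_insert(conn, (2, "Robert"))   # updated
    conn.execute("DELETE FROM customers WHERE id = ?", (1,))  # Delete by key
    conn.commit()
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    conn.close()
    return rows
```

When most incoming rows are new, trying the insert first saves a statement per row; when most rows already exist, trying the update first is usually the cheaper order.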

Related scenarios
For tPostgresqlOutput related topics, see:
  tDBOutput Scenario: Displaying DB output.
  tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.

tPostgresqlOutputBulk
tPostgresqlOutputBulk properties
The tPostgresqlOutputBulk and tPostgresqlBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tPostgresqlOutputBulkExec component, detailed in a separate section. The advantage of having two separate steps is that the data can be transformed before it is loaded into the database.
Component family
  Databases/Postgresql

Function
  Writes a file with columns based on the defined delimiter and the PostgreSQL standards.

Purpose
  Prepares the file to be used as parameters in the INSERT query to feed the PostgreSQL database.

Basic settings

Property type
  Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name
  Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Append
  Select this check box to add the new rows at the end of the file.

Schema and Edit Schema
  A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Row separator
  String (ex: "\n" on Unix) to distinguish rows.

Field separator
  Character, string or regular expression to separate fields.

Include header
  Select this check box to include the column header in the file.

Encoding
  Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component is to be used along with the tPostgresqlBulkExec component. Used together, they offer gains in performance while feeding a PostgreSQL database.
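
The two-step process (write a delimited file, then load it) can be sketched as follows in Python. This is an illustration under stated assumptions, not the code the components generate: SQLite stands in for PostgreSQL, the people table and both function names are hypothetical, and the parameters only mirror the Field separator, Row separator, Include header, Append and Encoding settings described above.

```python
import csv
import os
import sqlite3
import tempfile


def write_bulk_file(path, rows, header, field_sep=";", row_sep="\n",
                    include_header=False, append=False, encoding="utf-8"):
    """Step 1: write the (already transformed) rows to a delimited file."""
    mode = "a" if append else "w"
    with open(path, mode, newline="", encoding=encoding) as handle:
        writer = csv.writer(handle, delimiter=field_sep, lineterminator=row_sep)
        if include_header:
            writer.writerow(header)
        writer.writerows(rows)


def load_bulk_file(conn, path, field_sep=";", has_header=False, encoding="utf-8"):
    """Step 2: read the file back and insert all rows in one batch."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=field_sep)
        if has_header:
            next(reader)  # skip the column names
        conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)", reader)
    conn.commit()


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        source_rows = [(1, "ada"), (2, "bob")]
        # The transformation happens between reading and writing the file.
        transformed = [(key, name.upper()) for key, name in source_rows]
        write_bulk_file(path, transformed, ("id", "name"), include_header=True)
        load_bulk_file(conn, path, has_header=True)
        return conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
    finally:
        conn.close()
        os.remove(path)
```

Keeping the two steps separate leaves room for any transformation between them, which is the advantage the section describes.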

Related scenarios
For use cases related to tPostgresqlOutputBulk, see the following scenarios:
  tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
  tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
  tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresqlOutputBulkExec
tPostgresqlOutputBulkExec properties
The tPostgresqlOutputBulk and tPostgresqlBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tPostgresqlOutputBulkExec component.
Component family
  Databases/Postgresql

Function
  Executes the Insert action on the data provided.

Purpose
  As a dedicated component, it allows gains in performance during Insert operations to a PostgreSQL database.

Basic settings

Property type
  Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host
  Database server IP address.

Port
  Listening port number of DB server.

Database
  Name of the database.

Schema
  Name of the schema.

Username and Password
  DB user authentication data.

Table
  Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table
  On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Drop a table if exists and create: The table is removed if it already exists and created again.
  Clear a table: The table content is deleted.

File Name
  Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Schema and Edit Schema
  A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Action on data
  On the data of the table defined, you can perform:
  Bulk Insert: Add multiple entries to the table. If duplicates are found, the Job stops.
  Bulk Update: Make simultaneous changes to multiple entries.

Copy the OID for each row
  Retrieve the ID item for each row.

Contains a header line with the names of each column in the file
  Specify that the table contains a header.

Encoding
  Select the encoding from the list or select CUSTOM and define it manually. This field is compulsory for DB data handling.

File type
  Select the type of file being handled.

Null string
  String displayed to indicate that the value is null.

Row separator
  String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by
  Character, string or regular expression to separate fields.

Escape char
  Character of the row to be escaped.

Text enclosure
  Character used to enclose text.

Activate standard_conforming_string
  Activate the variable.

Force not null for columns
  Define the nullability of the columns.
  Force not null: Select the check box next to the column you want to define as not null.

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component is mainly used when no particular transformation is required on the data to be loaded into the database.
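
To see how the file-format settings interact, the following Python sketch parses a small delimited text with the standard csv module. It only illustrates what the Fields terminated by, Text enclosure, Escape char, Null string and header settings mean for a file; the function name, the sample data and the choice of NULL as the null string are assumptions, and the component itself delegates parsing to the database.

```python
import csv
import io


def parse_bulk_text(text, field_sep=";", text_enclosure='"', escape_char="\\",
                    null_string="NULL", has_header=True):
    """Parse delimited text and map the null string to None."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=field_sep,
                        quotechar=text_enclosure, escapechar=escape_char)
    rows = list(reader)
    if has_header:
        rows = rows[1:]  # drop the line holding the column names
    # Simplification: an enclosed value equal to the null string also becomes None.
    return [[None if value == null_string else value for value in row]
            for row in rows]


SAMPLE = 'id;name;city\n1;"Smith; John";Paris\n2;NULL;"Aix"\n'
```

In the sample, the text enclosure keeps the separator inside "Smith; John" from splitting the field, and the second row yields a null name.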


Related scenarios
For use cases related to tPostgresqlOutputBulkExec, see the following scenarios:
  tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
  tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
  tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tPostgresqlRollback
tPostgresqlRollback properties
This component is closely related to tPostgresqlCommit and tPostgresqlConnection. It usually does not make much sense to use these components independently in a transaction.
Component family
  Databases

Function
  Cancels the transaction commit in the connected DB.

Purpose
  Avoids committing part of a transaction involuntarily.

Basic settings

Component list
  Select the tPostgresqlConnection component in the list if more than one connection is planned for the current Job.

Close Connection
  Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component is to be used along with Postgresql components, especially with tPostgresqlConnection and tPostgresqlCommit components.

Limitation
  n/a

Related scenario
This component is closely related to tPostgresqlConnection and tPostgresqlCommit. It usually does not make much sense to use one of them without using a tPostgresqlConnection component to open a connection for the current transaction. For tPostgresqlRollback related scenario, see tMysqlRollback on page 636.
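
The interplay between a connection, a commit and a rollback can be illustrated with a minimal Python sketch. SQLite (through the standard sqlite3 module) stands in for PostgreSQL here, and the accounts table and load_rows helper are hypothetical: the point is only that a failure in the middle of a batch undoes the whole uncommitted batch.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert all rows in one transaction; roll everything back on failure."""
    try:
        conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", rows)
    except sqlite3.Error:
        conn.rollback()   # cancel the partial transaction
        return False
    conn.commit()         # validate the whole batch at once
    return True


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.commit()
    first = load_rows(conn, [(1, 100), (2, 200)])
    # The second batch fails on the duplicate key 1, so row 3 is undone as well.
    second = load_rows(conn, [(3, 300), (1, 999)])
    count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
    conn.close()
    return first, second, count
```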


tPostgresqlRow
tPostgresqlRow properties
Component family
  Databases/Postgresql

Function
  tPostgresqlRow is the specific component for the database query. It executes the stated SQL query on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.

Purpose
  Depending on the nature of the query and the database, tPostgresqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type
  Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection
  Select this check box and click the relevant tPostgresqlConnection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host
  Database server IP address.

Port
  Listening port number of DB server.

Database
  Name of the database.

Schema
  Name of the schema.

Username and Password
  DB user authentication data.

Schema using CDC and Edit Schema
  A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type
  Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query
  Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Die on error
  This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset
  Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement
  Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
  This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every
  Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component offers the flexibility of DB queries and covers all possible SQL queries.
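
The idea behind the Use PreparedStatement option, a query with ? placeholders whose values are bound by position, can be sketched in Python. The sqlite3 module is used as a stand-in for a PostgreSQL connection, and the employees table, the query and the function names are hypothetical. Note that Parameter Index counts from 1 in the component, whereas the Python tuple below is simply read in order.

```python
import sqlite3

# One ? per parameter; the first ? is parameter index 1, the second is index 2.
QUERY = "SELECT name FROM employees WHERE dept = ? AND salary >= ? ORDER BY name"


def run_query(conn, dept, min_salary):
    """Bind the two positional parameters and return the matching names."""
    return [row[0] for row in conn.execute(QUERY, (dept, min_salary))]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Ada", "eng", 120), ("Bob", "eng", 90), ("Cy", "ops", 100)],
    )
    conn.commit()
    return conn
```

Because the SQL text stays identical between calls and only the bound values change, the statement can be reused instead of being parsed again, which is where the performance gain for repeated queries comes from.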


Related scenarios
For related topics, see:
  tDBSQLRow Scenario: Resetting a DB auto-increment.
  tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.


tPostgresqlSCD
tPostgresqlSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tPostgresqlSCD.


tPostgresqlSCDELT
tPostgresqlSCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tPostgresqlSCDELT.


tSASInput
Before you can use all the features of the SAS components, install the following three modules in the lib > java path of your Talend Open Studio directory: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar. If needed, you can later verify through the Modules view of the Studio whether the modules were installed successfully.

tSASInput properties
Component family
  Databases/SAS

Function
  tSASInput reads a database and extracts fields based on a query.

Purpose
  tSASInput executes a DB query with a strictly defined statement which must correspond to the schema definition. Then it passes on the field list to the component that follows via a Main row link.

Basic settings

Property type
  Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Host name
  SAS server IP address.

Port
  Listening port number of server.

Librefs
  Enter the directory name that holds the table to read, followed by its access path. For example: TpSas C:/SAS/TpSas

Username and Password
  DB user authentication data.

Schema and Edit Schema
  A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name
  Enter the name of the table to read, preceded by the directory name that holds it. For example: TpSas.Customers.

Query type
  The query can be built-in for a particular Job or, for a commonly used query, stored in the Repository to ease its reuse.

Query
  If your query is not stored in the Repository, type in your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition.

Advanced settings

tStatCatcher Statistics
  Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
  This component covers all possible SQL queries for databases using SAS connections.

Related scenarios
For related topics, see:
  Scenario 1: Displaying selected data from DB table.
  Scenario 2: Using StoreSQLQuery variable.
  Scenario: Dynamic context use in MySQL DB insert.
  Scenario: Writing dynamic columns from a MySQL database to an output file.


tSASOutput
Before you can use all the features of the SAS components, install the following three modules in the lib > java path of your Talend Open Studio directory: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar. If needed, you can later verify through the Modules view of the Studio whether the modules were installed successfully.

tSASOutput properties
Component family
  Databases/SAS

Function
  tSASOutput writes, updates, modifies or deletes entries in a database.

Purpose
  tSASOutput executes the action defined on the table and/or on the data contained in the table, based on the incoming flow from the preceding component in the Job.

Basic settings

Use an existing connection
  Select this check box and click the relevant tSASConnection component on the Component list to reuse the connection details you already defined.
  When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
  Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

SAS URL
  Enter the URL to connect to the desired DB.

Driver JAR
  In the drop-down list, select an available driver, or load one from a local directory by clicking the three-dot button.

Class Name
  Type in the Class name to be pointed to in the driver.

Username and Password
  DB user authentication data.

Table
  Name of the table to read.

Action on data
  On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
  It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can define separate primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Clear data in table
  Select this check box to delete data in the selected table before any operation.

Schema and Edit Schema
  A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Die on error
  This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every
  Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns
  This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
  Reference column: Type in a column of reference that tSASOutput can use to place or replace the new or altered column.

Use field options
  Select this check box to customize a request, especially when there is a double action on data.

Enable debug mode
  Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component offers the flexibility of DB queries and covers all possible SQL queries.
  This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a SAS database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
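
The Commit every setting amounts to committing after each batch of N rows rather than after every row. A minimal Python sketch of that batching logic follows; it uses the standard sqlite3 module in place of a SAS connection, and the events table and function names are hypothetical.

```python
import sqlite3


def insert_with_commit_every(conn, rows, commit_every):
    """Insert rows one by one, committing after each full batch.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO events (id, label) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the last, incomplete batch.
        conn.commit()
        commits += 1
    return commits


def run_demo(total_rows=25, commit_every=10):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, label TEXT)")
    rows = [(i, "event %d" % i) for i in range(total_rows)]
    commits = insert_with_commit_every(conn, rows, commit_every)
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return commits, count
```

With 25 rows and a batch size of 10, three commits are issued. Batches that were already committed stay in the database if a later row fails, which matches the remark that this option does not provide rollback.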

Related scenarios
For scenarios in which tSASOutput might be used, see:
  tDBOutput Scenario: Displaying DB output.
  tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.


tSQLiteClose
tSQLiteClose properties
Component family
  Databases/SQLite

Function
  tSQLiteClose closes the transaction committed in the connected DB.

Purpose
  Close a transaction.

Basic settings

Component list
  Select the tSQLiteConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component is to be used along with SQLite components, especially with tSQLiteConnection and tSQLiteCommit.

Limitation
  n/a

Related scenario
No scenario is available for this component yet.


tSQLiteCommit
tSQLiteCommit Properties
This component is closely related to tSQLiteConnection and tSQLiteRollback. It usually does not make much sense to use these components independently in a transaction.
Component family
  Databases/SQLite

Function
  tSQLiteCommit validates the data processed through the Job into the connected DB.

Purpose
  Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list
  Select the tSQLiteConnection component in the list if more than one connection is planned for the current Job.

Close Connection
  This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
  If you want to use a Row > Main connection to link tSQLiteCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics
  Select this check box to collect log data at the component level.

Usage
  This component is to be used along with SQLite components, especially with tSQLiteConnection and tSQLiteRollback.

Limitation
  n/a

Related scenario
This component is closely related to tSQLiteConnection and tSQLiteRollback. It usually does not make much sense to use one of these without using a tSQLiteConnection component to open a connection for the current transaction. For tSQLiteCommit related scenario, see Scenario: Inserting data in mother/daughter tables.


tSQLiteConnection

tSQLiteConnection properties
This component is closely related to tSQLiteCommit and tSQLiteRollback. It usually does not make much sense to use one of these without using a tSQLiteConnection to open a connection for the current transaction.
Component family
  Databases/SQLite

Function
  tSQLiteConnection opens a connection to the database for a current transaction.

Purpose
  This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type
  Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Database
  Name of the database.

Use or register a shared DB Connection
  Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: Set or type in the shared connection name.

Advanced settings

Auto commit
  Select this check box to automatically commit a transaction when it is completed.

tStatCatcher Statistics
  Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
  This component is to be used along with SQLite components, especially with tSQLiteCommit and tSQLiteRollback.

Limitation
  n/a

Related scenarios
This component is closely related to tSQLiteCommit and tSQLiteRollback. It usually does not make much sense to use one of these without using a tSQLiteConnection component to open a connection for the current transaction. For tSQLiteConnection related scenario, see tMysqlConnection on page 594.


tSQLiteInput
tSQLiteInput Properties
Component family: Databases

Function: tSQLiteInput reads a database file and extracts fields based on an SQL query. Because it embeds the SQLite engine, there is no need to connect to a database server.

Purpose: tSQLiteInput executes a DB query with a defined command which must correspond to the schema definition. It then passes rows on to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tSQLiteConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Database: Filepath to the SQLite database file.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Query type: The query can be Built-in for a particular Job or, for commonly used queries, stored in the Repository to ease reuse.

Query: If your query is not stored in the Repository, type in your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is standalone as it includes the SQLite engine. This is a startable component that can initiate a data flow processing.
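The requirement that the query fields be sequenced to match the schema can be illustrated outside the studio. The following is a minimal sketch using Python's built-in sqlite3 module, not the code that the component generates. The schema list, the download table and its columns are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema: column names in the order the next component expects.
SCHEMA = ["id", "ip", "type_os"]


def read_rows(conn, query, schema, trim_all=False):
    """Run a query and map each result row onto the schema by position."""
    rows = []
    for record in conn.execute(query):
        if len(record) != len(schema):
            raise ValueError("query returns %d fields, schema defines %d"
                             % (len(record), len(schema)))
        row = dict(zip(schema, record))
        if trim_all:
            # Similar in spirit to "Trim all the String/Char columns".
            row = {k: v.strip() if isinstance(v, str) else v
                   for k, v in row.items()}
        rows.append(row)
    return rows


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE download (id INTEGER, ip TEXT, type_os TEXT)")
    conn.execute("INSERT INTO download VALUES (1, ' 10.0.0.1 ', 'linux ')")
    # The SELECT list follows the schema order: id, ip, type_os.
    result = read_rows(conn, "SELECT id, ip, type_os FROM download",
                       SCHEMA, trim_all=True)
    conn.close()
    return result
```

Because the mapping is purely positional, swapping two fields in the SELECT list would silently put values under the wrong schema column, which is why the field order matters.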

Scenario: Filtering SQLite data


This scenario describes a rather simple Job which uses a select statement based on a filter to extract rows from a source SQLite database and feed an output SQLite table.

Drop a tSQLiteInput and a tSQLiteOutput component from the Palette to the design workspace. Connect the input to the output using a row main link. On the tSQLiteInput Basic settings, type in or browse to the SQLite Database input file.


The file contains hundreds of lines and includes an ip column on which the select statement will be based. On the tSQLiteInput Basic settings, edit the schema for it to match the table structure.

In the Query field, type in your select statement based on the ip column. On the tSQLiteOutput component Basic settings panel, select the Database filepath.


Type in the Table to be fed with the selected data. Select the Action on table and Action on data. In this use case, the action on table is Drop and create and the action on data is Insert. The schema should be synchronized with the input schema. Save the Job and run it.

The queried data is written to the defined SQLite file.
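For readers who want to check the expected result outside the studio, the scenario can be approximated with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table names access_log and filtered_log, the id column and the prefix filter on ip are hypothetical, since the guide does not give the actual query.

```python
import sqlite3


def filter_by_ip(src, dst, ip_prefix):
    """Select rows whose ip starts with ip_prefix and load them into dst."""
    # Input side: a select statement based on the ip column.
    rows = src.execute(
        "SELECT id, ip FROM access_log WHERE ip LIKE ? ORDER BY id",
        (ip_prefix + "%",),
    ).fetchall()
    # Output side: action on table "Drop and create" (IF EXISTS keeps the
    # sketch runnable on a first run), then action on data "Insert".
    dst.execute("DROP TABLE IF EXISTS filtered_log")
    dst.execute("CREATE TABLE filtered_log (id INTEGER, ip TEXT)")
    dst.executemany("INSERT INTO filtered_log (id, ip) VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)
```

The two connection arguments may point to the same database file or to two different ones, mirroring the separate Database settings of the input and output components.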


tSQLiteOutput
tSQLiteOutput Properties
Component family: Databases

Function: tSQLiteOutput writes, updates, makes changes or suppresses entries in an SQLite database. Because it embeds the SQLite engine, there is no need to connect to a database server.

Purpose: tSQLiteOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tSQLiteConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Database: Filepath to the database file.

Table: Name of the table to be written. Note that only one table can be written at a time.


Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.


Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, following the action to be performed on the reference column.
Reference column: Type in a column of reference that the component can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component must be connected to an Input component. It allows you to carry out actions on a table or on the data of a table in an SQLite database. It also allows you to create reject flows using a Row > Reject link to filter erroneous data. For an example of tSQLiteOutput in use, see Scenario 3: Retrieve data in error with a Reject link of the tMySQLOutput component.
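The difference between the Insert or update and Update or insert actions is the order in which the two statements are attempted against the primary key. The sketch below shows one plausible reading using Python's built-in sqlite3 module. How the component implements these actions internally is not stated in this guide, and the customer table with its id key is a hypothetical example.

```python
import sqlite3


def make_table():
    """Create an in-memory table whose primary key drives both actions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_or_update(conn, row):
    """Try INSERT first; fall back to UPDATE when the key already exists."""
    try:
        conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                     (row["id"], row["name"]))
        return "inserted"
    except sqlite3.IntegrityError:
        conn.execute("UPDATE customer SET name = ? WHERE id = ?",
                     (row["name"], row["id"]))
        return "updated"


def update_or_insert(conn, row):
    """Try UPDATE first; INSERT only when no existing row matched the key."""
    cur = conn.execute("UPDATE customer SET name = ? WHERE id = ?",
                       (row["name"], row["id"]))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                     (row["id"], row["name"]))
        return "inserted"
    return "updated"
```

Both functions leave the table in the same final state. Which one is cheaper depends on whether the incoming flow contains mostly new keys or mostly existing ones.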

Related Scenario
For scenarios related to tSQLiteOutput, see tPostgresqlCommit on page 761.


tSQLiteRollback
tSQLiteRollback properties
This component is closely related to tSQLiteCommit and tSQLiteConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/SQLite

Function: tSQLiteRollback cancels the transaction commit in the connected DB.

Purpose: It prevents part of a transaction from being committed involuntarily.

Basic settings

Component list: Select the tSQLiteConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with SQLite components, especially with tSQLiteConnection and tSQLiteCommit.

Limitation: n/a

Related scenarios
For a tSQLiteRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of the tMysqlRollback component.
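The interplay between a connection, a commit and a rollback can be sketched with Python's built-in sqlite3 module. This is an illustration of the transaction pattern only, not the code generated by the components. The parent table is a hypothetical example.

```python
import sqlite3


def insert_all_or_nothing(conn, rows):
    """Insert every row in one transaction; undo all of them on error."""
    try:
        conn.executemany("INSERT INTO parent (id, label) VALUES (?, ?)", rows)
    except sqlite3.Error:
        conn.rollback()  # role played by tSQLiteRollback
        return False
    conn.commit()        # role played by tSQLiteCommit
    return True
```

If the second of three rows violates the primary key, the first row is discarded as well, so the table never holds part of a failed batch.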


tSQLiteRow
tSQLiteRow Properties
Component family: Databases

Function: tSQLiteRow executes the defined query onto the specified database and uses the parameters bound with the column.

Purpose: A prepared statement uses the input flow to replace the placeholders with the values for each parameter defined. This component can be very useful for updates.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tSQLiteConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.


Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Die on error: Clear this check box to skip the row on error and complete the process for error-free rows.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
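The effect of the Commit every setting can be pictured as a counter that triggers a commit each time a batch of rows is complete. The sketch below uses Python's built-in sqlite3 module and is an illustration of the batching idea, not the code that the studio generates. The function name and its return value are assumptions made for the example.

```python
import sqlite3


def execute_with_commit_every(conn, statement, param_rows, commit_every):
    """Run one parameterized statement per row, committing in batches.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for params in param_rows:
        conn.execute(statement, params)
        pending += 1
        if pending >= commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # flush the last, partial batch
        commits += 1
    return commits
```

With five input rows and a batch size of two, the sketch commits three times. Rows in batches that were already committed stay in the database if a later row fails, which is consistent with the remark that the option does not provide rollback.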

Scenario: Updating SQLite rows


This scenario describes a job which updates an SQLite database file based on a prepared statement and using a delimited file.

Drop a tFileInputDelimited and a tSQLiteRow component from the Palette to the design workspace.


On the tFileInputDelimited Basic settings panel, browse to the input file that will be used to update rows in the database.

There is neither a Header nor a Footer. The Row separator is a carriage return and the Field separator is a semicolon. Edit the schema in case it is not stored in the Repository.

Make sure the type of each column is correct and its length is large enough. Then in the tSQLiteRow Basic settings panel, set the Database filepath to the file to be updated.

The schema is read-only as it is required to match the input schema. Type in the query or retrieve it from the Repository. In this use case, the type_os is updated for the id defined in the Input flow. The statement is as follows:

Update download set type_os=? where id=?


Then select the Use PreparedStatement check box to display the placeholders parameter table.

In the Input parameters table, add as many lines as necessary to cover all placeholders. In this scenario, type_os and id are to be defined. Set the Commit every field. Save the job and press F6 to run it. The download table from the SQLite database is thus updated with new type_os code according to the delimited input file.
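The scenario above can be approximated outside the studio with Python's built-in csv and sqlite3 modules. This is a sketch under stated assumptions: the order of the two fields in the delimited file (type_os first, then id) and the sample values are hypothetical, while the download table, the type_os and id columns and the statement come from the scenario.

```python
import csv
import io
import sqlite3

UPDATE_SQL = "UPDATE download SET type_os = ? WHERE id = ?"


def update_from_delimited(conn, text_stream):
    """Read semicolon-separated lines and run the prepared UPDATE for each."""
    reader = csv.reader(text_stream, delimiter=";")
    # Parameter index 1 binds type_os, parameter index 2 binds id.
    params = [(row[0], int(row[1])) for row in reader if row]
    conn.executemany(UPDATE_SQL, params)
    conn.commit()
    return len(params)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE download (id INTEGER PRIMARY KEY, type_os TEXT)")
    conn.executemany("INSERT INTO download VALUES (?, ?)", [(1, "x"), (2, "y")])
    # Stand-in for the delimited input file: no header, no footer.
    update_from_delimited(conn, io.StringIO("linux;1\nwin;2\n"))
    return conn.execute("SELECT id, type_os FROM download ORDER BY id").fetchall()
```

The statement text is the same for every row and only the bound values change, which is the situation in which a prepared statement is most useful.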


tSybaseBulkExec
tSybaseBulkExec Properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tSybaseOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded in the database.
Component family: Databases

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to a Sybase database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Server: Database server IP address.

Port: Listening port number of DB server.

Database: Database name.

Username and Password: DB user authentication data.


Bcp Utility: Name of the utility to be used to copy data over to the Sybase server.

Server: IP address of the database server for the Bcp utility connection.

Batch size: Number of lines in each processed batch.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.

File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository, hence it can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Action on data: On the data of the table defined, you can perform:
Bulk Insert: Add multiple entries to the table. If duplicates are found, the Job stops.
Bulk Update: Make simultaneous changes to multiple entries.

Field Terminator: Character, string or regular expression to separate fields.

Row Terminator: String (ex: \n in Unix) to separate lines.

Head row: Number of head lines to be ignored at the beginning of a file.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.


Output: Select the type of output for the standard output of the Sybase database: to console, to global variable.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.

Limitation: As opposed to the Oracle dedicated bulk component, no action on data is possible using this Sybase dedicated component.

Related scenarios
For tSybaseBulkExec related topics, see:
tMysqlOutputBulkExec Scenario: Inserting transformed data in MySQL database
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.
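To relate the settings above to the Sybase bcp utility, the sketch below assembles a bcp command line from comparable values. How the component actually invokes the utility is not documented in this guide, so the mapping is an assumption made for illustration. The function and parameter names are hypothetical. The flags used (-c, -t, -r, -S, -U, -P, -F, -b) are standard options of the Sybase bcp command.

```python
def build_bcp_command(database, table, data_file, server, user, password,
                      field_terminator=";", row_terminator="\\n",
                      head_rows=0, batch_size=None, bcp_utility="bcp"):
    """Return the argument list for a character-mode bcp import."""
    cmd = [
        bcp_utility,
        "%s..%s" % (database, table),  # database..table, default owner
        "in", data_file,
        "-c",                          # character (text) data file
        "-t", field_terminator,        # Field Terminator
        "-r", row_terminator,          # Row Terminator
        "-S", server,
        "-U", user,
        "-P", password,
    ]
    if head_rows:
        # Skip the head lines by starting at the first data row.
        cmd += ["-F", str(head_rows + 1)]
    if batch_size:
        cmd += ["-b", str(batch_size)]  # Batch size
    return cmd
```

The list can be handed to subprocess.run on a machine where the utility is installed. Building it as a list rather than a single string avoids shell quoting problems with terminators such as a semicolon.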


tSybaseClose
tSybaseClose properties
Component family: Databases/Sybase

Function: tSybaseClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tSybaseConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Sybase components, especially with tSybaseConnection and tSybaseCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tSybaseCommit
tSybaseCommit Properties
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Sybase

Function: tSybaseCommit validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of doing so on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tSybaseConnection component in the list if more than one connection is planned for the current Job.

Close Connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tSybaseCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Sybase components, especially with tSybaseConnection and tSybaseRollback.

Limitation: n/a

Related scenario
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not make much sense to use one of these without using a tSybaseConnection component to open a connection for the current transaction. For a tSybaseCommit related scenario, see Scenario: Inserting data in mother/daughter tables.


tSybaseConnection
tSybaseConnection Properties
This component is closely related to tSybaseCommit and tSybaseRollback. It usually does not make much sense to use one of these without using a tSybaseConnection component to open a connection for the current transaction.
Component family: Databases/Sybase

Function: tSybaseConnection opens a connection to the database for a current transaction.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: Set or type in the shared connection name.

Usage: This component is to be used along with Sybase components, especially with tSybaseCommit and tSybaseRollback.

Limitation: n/a

Related scenarios
For a tSybaseConnection related scenario, see Scenario: Inserting data in mother/daughter tables.


tSybaseInput
tSybaseInput Properties
Component family: Databases/Sybase

Function: tSybaseInput reads a database and extracts fields based on a query.

Purpose: tSybaseInput executes a DB query with a strictly defined order which must correspond to the schema definition. It then passes the field list on to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

Server: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Sybase Schema: Exact name of the Sybase schema.


Username and Password: DB user authentication data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the table to read.

Query type and Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Sybase databases.

Related scenarios
For related topics, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Dynamic context use in MySQL DB insert.
Scenario: Writing dynamic columns from a MySQL database to an output file.


tSybaseIQBulkExec
tSybaseIQBulkExec Properties
Component family: Databases/Sybase IQ

Function: tSybaseIQBulkExec uploads a bulk file in a Sybase IQ database.

Purpose: As a dedicated component, it allows gains in performance during Insert operations to a Sybase IQ database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

DB Version: Select the Sybase database version you are using.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Database name.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.


Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.

Local filename: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository, hence it can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Lines terminated by: Character or sequence of characters used to separate lines.

Field Terminated by: Character, string or regular expression to separate fields.

Use enclosed quotes: Select this check box to use data enclosure characters.

Use fixed length: Select this check box to set a fixed width for data lines.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This dedicated component offers performance and flexibility of Sybase IQ DB query handling.

Limitation: As opposed to the Oracle dedicated bulk component, no action on data is possible using this Sybase dedicated component.

Related scenarios
For tSybaseIQBulkExec related topics, see:
tMysqlOutputBulkExec Scenario: Inserting transformed data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.
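
The Action on table options can be read as a small set of SQL statements run before the load. The sketch below is only an illustration of that reading, not the code the component generates: it uses Python's built-in sqlite3 module as a stand-in for Sybase IQ, the function name apply_table_action is hypothetical, and the SQL text for each option is an assumption. Truncate table is left out because SQLite has no TRUNCATE statement.

```python
import sqlite3


def apply_table_action(conn, action, table, columns_ddl):
    """Run the statement(s) that an 'Action on table' choice suggests.

    Option names follow the property table; the SQL itself is an assumption.
    """
    create_sql = f"CREATE TABLE {table} ({columns_ddl})"
    if action == "None":
        return  # leave the table untouched
    if action == "Drop and create table":
        conn.execute(f"DROP TABLE {table}")
        conn.execute(create_sql)
    elif action == "Create table":
        conn.execute(create_sql)  # fails if the table already exists
    elif action == "Create table if not exists":
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns_ddl})")
    elif action == "Clear table":
        conn.execute(f"DELETE FROM {table}")  # removes rows, keeps the table
    else:
        raise ValueError(f"unsupported action: {action}")


# In-memory database used only for the demonstration.
conn = sqlite3.connect(":memory:")
ddl = "id INTEGER, name TEXT"
apply_table_action(conn, "Create table if not exists", "customers", ddl)
apply_table_action(conn, "Create table if not exists", "customers", ddl)  # no error
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
apply_table_action(conn, "Clear table", "customers", ddl)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(row_count)
```

Note the difference between Create table, which raises an error when the table is already there, and Create table if not exists, which can safely be run on every execution of the Job.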

tSybaseIQOutputBulkExec
tSybaseIQOutputBulkExec properties
Component family: Databases/Sybase IQ
Function: Executes the Insert action on the data provided.
Purpose: As a dedicated component, it improves performance during Insert operations on a Sybase IQ database.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
    Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Clear a table: The table content is deleted.
File Name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
Append the file: Select this check box to add the new rows at the end of the records.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created and stored the schema in the Repository, hence it can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings

Fields terminated by: Character, string or regular expression to separate fields.
Lines terminated by: Character or sequence of characters used to separate lines.
Use enclose quotes: Select this check box to use data enclosure characters.
Include Head: Select this check box to include the column header.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
Limitation: n/a

Related scenarios
For use cases in relation with tSybaseIQOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.

tSybaseOutput
tSybaseOutput Properties
Component family: Databases/Sybase
Function: tSybaseOutput writes, updates, makes changes or suppresses entries in a database.
Purpose: tSybaseOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the job.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
    Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Connection wizard icon: Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.
Server: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Sybase Schema: Exact name of the Sybase schema.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: On the table defined, you can perform one of the following operations:
    Default: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Drop a table if exists and create: The table is removed if it already exists and created again.
    Clear a table: The table content is deleted.
    Truncate table: The table content is deleted. You cannot roll back the operation.
Turn on identity insert: Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).
Action on data: On the data of the table defined, you can perform:
    Insert: Add new entries to the table. If duplicates are found, the job stops.
    Update: Make changes to existing entries.
    Insert or update: Add entries or update existing ones.
    Update or insert: Update existing entries or create them if they do not exist.
    Delete: Remove entries corresponding to the input flow.
    It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view, where you can simultaneously define primary keys for the Update and Delete operations. To do that, select the Use field options check box and then, in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence it can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional Columns: This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
    Name: Type in the name of the schema column to be altered or inserted as a new column.
    SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
    Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
    Reference column: Type in a reference column that tSybaseOutput can use to place or replace the new or altered column.
Use field options: Select this check box to customize a request, especially when there is a double action on data.
Enable debug mode: Select this check box to display each step during the processing of entries in a database.
Use batch size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Insert, the Update or the Delete option in the Action on data field.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Sybase database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.

Related scenarios
For use cases in relation with tSybaseOutput, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
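
The interplay between Commit every and a cleared Die on error check box can be pictured with a small loop. This is a minimal sketch under stated assumptions, not the code Talend generates: SQLite (Python's sqlite3 module) stands in for Sybase, and the function load_rows, the person table and the (row, error message) reject format are all hypothetical.

```python
import sqlite3


def load_rows(conn, rows, commit_every=2, die_on_error=False):
    """Insert rows, committing after every `commit_every` successful inserts.

    Returns (inserted_count, rejects). With die_on_error=False, a failing
    row is collected as a reject instead of stopping the load.
    """
    rejects = []
    inserted = 0
    pending = 0  # rows inserted since the last commit
    for row in rows:
        try:
            conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError as err:
            if die_on_error:
                raise  # stop the whole load on the first bad row
            rejects.append((row, str(err)))  # comparable to a Rejects flow
            continue
        inserted += 1
        pending += 1
        if pending >= commit_every:
            conn.commit()
            pending = 0
    if pending:
        conn.commit()  # commit the final, partial batch
    return inserted, rejects


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(1, "Ann"), (2, "Bob"), (2, "Duplicate"), (3, "Cy")]
inserted, rejects = load_rows(conn, rows, commit_every=2)
print(inserted, [row for row, _ in rejects])
```

A larger commit interval means fewer round trips for commits, which is where the performance gain mentioned for Commit every comes from; the trade-off is that more rows are left uncommitted if the load stops midway.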


tSybaseOutputBulk
tSybaseOutputBulk properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tSybaseOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into the database.
Component family: Databases/Sybase
Function: Writes a file with columns based on the defined delimiter and the Sybase standards.
Purpose: Prepares the file to be used as a parameter in the INSERT query that feeds the Sybase database.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
File Name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
Append: Select this check box to add the new rows at the end of the file.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created and stored the schema in the Repository, hence it can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings

Row separator: String (e.g., "\n" on Unix) to distinguish rows.
Field separator: Character, string or regular expression to separate fields.
Include header: Select this check box to include the column header in the file.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with the tSybaseBulkExec component. Used together, they offer gains in performance while feeding a Sybase database.

Related scenarios
For use cases in relation with tSybaseOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.
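
To make the file-related settings concrete, the sketch below writes a delimited file in the way the File Name, Append, Row separator, Field separator, Include header and Encoding properties suggest. It is an illustration only, written with Python's standard csv module: the function write_bulk_file and its parameter names are hypothetical, and the csv module accepts only a single-character field separator, whereas the component also allows strings.

```python
import csv
import os
import tempfile


def write_bulk_file(path, columns, rows, field_separator=";",
                    row_separator="\n", include_header=False,
                    encoding="utf-8", append=False):
    """Write rows to a delimited text file intended for a later bulk load."""
    mode = "a" if append else "w"  # "a" adds rows at the end of the file
    # newline="" lets the csv writer control the row separator itself.
    with open(path, mode, newline="", encoding=encoding) as handle:
        writer = csv.writer(handle, delimiter=field_separator,
                            lineterminator=row_separator)
        if include_header:
            writer.writerow(columns)
        writer.writerows(rows)


# Demonstration in a temporary directory.
bulk_path = os.path.join(tempfile.mkdtemp(), "customers.txt")
write_bulk_file(bulk_path, ["id", "name"], [(1, "Ann")], include_header=True)
write_bulk_file(bulk_path, ["id", "name"], [(2, "Bob")], append=True)
with open(bulk_path, newline="", encoding="utf-8") as handle:
    print(handle.read())
```

Whatever separators and encoding are chosen here have to match the settings of the component that later loads the file, otherwise the rows will not be split into the expected fields.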


tSybaseOutputBulkExec
tSybaseOutputBulkExec properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tSybaseOutputBulkExec component.
Component family: Databases/Sybase
Function: Executes the Insert action on the data provided.
Purpose: As a dedicated component, it improves performance during Insert operations on a Sybase database.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
    Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Server: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Bcp utility: Name of the utility to be used to copy data over to the Sybase server.
Batch row number: Number of lines in each processed batch.
Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Clear a table: The table content is deleted.
File Name: Name of the file to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
Append: Select this check box to add the new rows at the end of the records.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created and stored the schema in the Repository, hence it can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings

Action on data: On the data of the table defined, you can perform:
    Bulk Insert: Add multiple entries to the table. If duplicates are found, the job stops.
    Bulk Update: Make simultaneous changes to multiple entries.
Field terminator: Character, string or regular expression to separate fields.
DB Row terminator: String (e.g., "\n" on Unix) to distinguish rows in the DB.
First row: Type in the number of the file row where the action should start.
FILE Row terminator: Character, string or regular expression to separate fields in a file.
Include Head: Select this check box to include the column header.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Output: Select the type of output for the standard output of the Sybase database: to console or to global variable.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
Limitation: n/a

Related scenarios
For use cases in relation with tSybaseOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tSybaseRollback
tSybaseRollback properties
This component is closely related to tSybaseCommit and tSybaseConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Sybase
Function: tSybaseRollback cancels the current, uncommitted transaction in the connected DB.
Purpose: This component prevents part of a transaction from being committed involuntarily.

Basic settings

Component list: Select the tSybaseConnection component in the list if more than one connection is planned for the current job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Sybase components, especially with tSybaseConnection and tSybaseCommit.
Limitation: n/a

Related scenarios
For tSybaseRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.
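
The role of a rollback in a connection/commit/rollback trio can be sketched as follows. This is an illustrative example under stated assumptions rather than a Talend Job: SQLite (Python's sqlite3 module) stands in for Sybase, and the mother and daughter tables and the function insert_family are hypothetical names chosen to echo the scenario title.

```python
import sqlite3


def insert_family(conn, mother, daughters):
    """Insert one mother row and its daughter rows as a single transaction.

    Returns True if everything was committed, False if it was rolled back.
    """
    try:
        conn.execute("INSERT INTO mother (id, name) VALUES (?, ?)", mother)
        conn.executemany(
            "INSERT INTO daughter (id, mother_id, name) VALUES (?, ?, ?)",
            daughters,
        )
    except sqlite3.Error:
        conn.rollback()  # discard the mother row and any daughter rows
        return False
    conn.commit()  # make all rows permanent in one go
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER, name TEXT)"
)

# The second daughter reuses id 10, so the whole transaction is rolled back.
ok = insert_family(conn, (1, "Eva"), [(10, 1, "Lia"), (10, 1, "Mia")])
mothers = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
print(ok, mothers)
```

Because both inserts share one connection and one transaction, a failure in the daughter table also removes the mother row, so the two tables never end up half loaded.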


tSybaseRow
tSybaseRow Properties
Component family: Databases/Sybase
Function: tSybaseRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.
Purpose: Depending on the nature of the query and the database, tSybaseRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
    Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Server: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Sybase Schema: Exact name of the Sybase schema.
Username and Password: DB user authentication data.
Table Name: Name of the table to be processed.
Turn on identity insert: Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence it can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
    Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
    Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by "?" in the SQL instruction of the Query field in the Basic settings tab.
    Parameter Index: Enter the parameter position in the SQL instruction.
    Parameter Type: Enter the parameter type.
    Parameter Value: Enter the parameter value.
    This option is very useful if you need to execute the same query several times, as it improves performance.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenarios
For tSybaseRow related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.
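
The Set PreparedStatement Parameter table (Parameter Index, Parameter Type, Parameter Value) maps each "?" in the query to a typed value by position. The sketch below illustrates that mapping with Python's sqlite3 module standing in for a Sybase connection. The type labels "Int", "String" and "Double" and the helper names bind_parameters and run_prepared are assumptions made for the example, not names taken from the component.

```python
import sqlite3

# Hypothetical type labels mapped to Python conversions.
CONVERTERS = {"Int": int, "String": str, "Double": float}


def bind_parameters(parameter_table):
    """Turn (index, type, value) rows into values ordered by 1-based index."""
    ordered = sorted(parameter_table, key=lambda entry: entry[0])
    indexes = [entry[0] for entry in ordered]
    if indexes != list(range(1, len(ordered) + 1)):
        raise ValueError("parameter indexes must run 1..n without gaps")
    return tuple(CONVERTERS[kind](value) for _, kind, value in ordered)


def run_prepared(conn, query, parameter_table):
    """Execute a query whose '?' placeholders are filled from the table."""
    return conn.execute(query, bind_parameters(parameter_table)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, city TEXT, name TEXT)")
conn.execute("INSERT INTO person VALUES (7, 'Paris', 'Ann')")

# Rows may be listed in any order; the index decides which '?' they fill.
parameters = [(2, "String", "Paris"), (1, "Int", "7")]
result = run_prepared(
    conn, "SELECT name FROM person WHERE id = ? AND city = ?", parameters
)
print(result)
```

Passing values as parameters rather than concatenating them into the SQL text lets the same statement be reused with different values, which is the repeated-execution benefit the property description mentions.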


tSybaseSCD
tSybaseSCD belongs to two component families: Business Intelligence and Databases. For more information on it, see tSybaseSCD.


tSybaseSCDELT
tSybaseSCDELT belongs to two component families: Business Intelligence and Databases. For more information on it, see tSybaseSCDELT.


tSybaseSP
tSybaseSP properties
Component family: Databases/Sybase
Function: tSybaseSP calls the database stored procedure.
Purpose: tSybaseSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.

Basic settings

Property type: Either Built-in or Repository.
    Built-in: No property data is stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tSybaseConnection component on the Component list to reuse the connection details you already defined.
    When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
    Alternatively, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence it can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
SP Name: Type in the exact name of the stored procedure.
Is Function / Return result in: Select this check box if a value is to be returned. From the list, select the schema column on which the value to be returned is based.
Timeout Interval: Maximum waiting time for the results of the stored procedure.
Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure.
    Select the Type of parameter:
    IN: Input parameter.
    OUT: Output parameter/return value.
    IN OUT: Input parameter that is to be returned as a value, likely after modification through the procedure (function).
    RECORDSET: Input parameter that is to be returned as a set of values, rather than a single value.
    Check the tParseRecordSet component if you want to analyze a set of records from a database table or DB query and return single records.

Advanced settings

Use Multiple SELECT Procedure: Select this check box to use procedures which contain multiple SELECT statements.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is used as an intermediary component. It can be used as a start component, but only input parameters are then allowed.
Limitation: The stored procedure syntax should match the database syntax.

Related scenarios
For a related topic, see tMysqlSP Scenario: Finding a State Label using a stored procedure. Check the tParseRecordSet component as well if you want to analyze a set of records from a database table or DB query and return single records.


tTeradataClose
tTeradataClose properties
Component family: Databases/Teradata
Function: tTeradataClose closes the transaction committed in the connected DB.
Purpose: Closes a transaction.

Basic settings

Component list: Select the tTeradataConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Teradata components, especially with tTeradataConnection and tTeradataCommit.
Limitation: n/a

Related scenario
No scenario is available for this component yet.


tTeradataCommit
tTeradataCommit Properties
This component is closely related to tTeradataConnection and tTeradataRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family: Databases/Teradata
Function: tTeradataCommit validates the data processed through the job into the connected DB.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus improves performance.

Basic settings

Component list: Select the tTeradataConnection component in the list if more than one connection is planned for the current job.
Close connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
    If you want to use a Row > Main connection to link tTeradataCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Teradata components, especially with the tTeradataConnection and tTeradataRollback components.
Limitation: n/a

Related scenario
This component is closely related to tTeradataConnection and tTeradataRollback. It usually does not make much sense to use one of these without using a tTeradataConnection component to open a connection for the current transaction. For a tTeradataCommit related scenario, see tVerticaConnection on page 875.
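The performance argument behind a single global commit can be illustrated outside the Studio. The sketch below is not the code generated by tTeradataCommit: it uses Python's built-in sqlite3 module as a stand-in for a Teradata connection, and the customers table and load_rows helper are hypothetical names chosen for the example. It only counts how many commits each strategy issues.

```python
import sqlite3


def load_rows(conn, rows, commit_every_row=False):
    """Insert rows and return the number of commits that were issued."""
    commits = 0
    cur = conn.cursor()
    for row in rows:
        cur.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)
        if commit_every_row:
            # Comparable to linking the commit component with Row > Main.
            conn.commit()
            commits += 1
    if not commit_every_row:
        # One global commit for the whole transaction.
        conn.commit()
        commits += 1
    return commits


# sqlite3 is only a stand-in here; no Teradata driver is involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.commit()

rows = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
single = load_rows(conn, rows)
per_row = load_rows(conn, rows, commit_every_row=True)
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
conn.close()
```

Both calls store the same rows, but the row-by-row variant issues one commit per row, which is the overhead the single commit avoids.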

tTeradataConnection
tTeradataConnection Properties
This component is closely related to tTeradataCommit and tTeradataRollback. It usually does not make much sense to use one of these without using a tTeradataConnection component to open a connection for the current transaction.

Component family: Databases/Teradata
Function: tTeradataConnection opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Database: Name of the database.
Username and Password: DB user authentication data.
Additional JDBC parameters: Specify additional connection properties in the existing DB connection, to allow specific character set support. E.g.: CHARSET=KANJISJIS_OS to get support of Japanese characters. You can set the encoding parameters through this field.
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
  Shared DB Connection Name: set or type in the shared connection name.

Advanced settings
Auto commit: Select this check box to automatically commit a transaction when it is completed.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Teradata components, especially with the tTeradataCommit and tTeradataRollback components.
Limitation: n/a

Related scenario
This component is closely related to tTeradataCommit and tTeradataRollback. It usually does not make much sense to use one of these without using a tTeradataConnection component to open a connection for the current transaction. For a tTeradataConnection related scenario, see tMysqlConnection on page 594.
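As an illustration of how the Host, Database and Additional JDBC parameters fields might combine, the sketch below assembles a connection URL in the comma-separated parameter style commonly used with the Teradata JDBC driver. How the Studio actually builds the URL is an assumption here; the teradata_jdbc_url helper, the IP address and the database name are hypothetical.

```python
def teradata_jdbc_url(host, database, extra_params=""):
    """Build a JDBC-style URL string from the component fields.

    Assumption: parameters follow the host as NAME=VALUE pairs separated
    by commas. Check the JDBC driver documentation before relying on it.
    """
    params = [f"DATABASE={database}"]
    if extra_params:
        # Accept "A=1,B=2" as typed in the Additional JDBC parameters field.
        params.extend(p.strip() for p in extra_params.split(",") if p.strip())
    return f"jdbc:teradata://{host}/" + ",".join(params)


# Hypothetical host and database; CHARSET value taken from the field description.
url = teradata_jdbc_url("192.168.0.10", "sales", "CHARSET=KANJISJIS_OS")
print(url)
```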

tTeradataFastExport
tTeradataFastExport Properties
Component family: Databases/Teradata
Function: tTeradataFastExport rapidly exports large data batches from a Teradata table or view.
Purpose: tTeradataFastExport exports data batches from a Teradata table to a customer system or to a smaller database.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Execution platform: Select the Operating System type you use.
Database name: Database name.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Script generated folder: Browse your directory and select the destination of the file which will be created.
Exported file: Name and path to the file which will be created.
Field separator: Character, string or regular expression to separate fields.
Error file: Browse your directory and select the destination of the file where the error messages will be recorded.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenario
No scenario is available for this component yet.

tTeradataFastLoad
tTeradataFastLoad Properties
Component family: Databases/Teradata
Function: tTeradataFastLoad reads a database and extracts fields using queries.
Purpose: tTeradataFastLoad executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component through a Main row connection.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Host: Database server IP address.
Database: Database name.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Execute Batch every: Number of rows per batch to be loaded.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenario
No scenario is available for this component yet.

tTeradataFastLoadUtility
tTeradataFastLoadUtility Properties
Component family: Databases/Teradata
Function: tTeradataFastLoadUtility reads a database and extracts fields using queries.
Purpose: tTeradataFastLoadUtility executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component through a Main row connection.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Execution platform: Select the Operating System type you use.
Database name: Database name.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Script generated folder: Browse your directory and select the destination of the file which will be created.
Load file: Browse your directory and select the file from which you want to load data.
Field separator: Character, string or regular expression to separate fields.
Error file: Browse your directory and select the destination of the file where the error messages will be recorded.

Advanced settings
Define character set: Specify the character encoding you need to use for your system.
Check point: Enter the check point value.
Error files: Enter the file name where the error messages are stored. By default, the code ERRORFILES table_ERR1, table_ERR2 is entered, meaning that the two tables table_ERR1 and table_ERR2 are used to record the error messages.
Return fastload error: Select this check box to specify the exit code number to indicate the point at which an error message should display in the console.
ERRLIMIT: Enter the maximum number of errors allowed during the loading phase. Processing stops when the limit is reached. The default error limit value is 1000000. For more information, see the Teradata FastLoad Reference documentation.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenario
For a related topic, see Scenario: Inserting data into a Teradata database table.
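To show how the Error files, ERRLIMIT and Check point settings relate to a FastLoad script, the following sketch assembles an approximate script as text. It is not the script the component generates: the fastload_script helper, the logon values, the VARCHAR(255) column definition and the file path are all hypothetical, and the command layout should be checked against the Teradata FastLoad Reference before any real use.

```python
def fastload_script(tdpid, user, password, table, columns, load_file,
                    separator=",", errlimit=1000000, checkpoint=None):
    """Return an approximate FastLoad script as a single string."""
    # Default error tables named as in the Error files field description.
    begin = f"BEGIN LOADING {table} ERRORFILES {table}_ERR1, {table}_ERR2"
    if checkpoint is not None:
        begin += f" CHECKPOINT {checkpoint}"
    # Every input field is declared as VARCHAR(255): a simplification.
    defines = ", ".join(f"{c} (VARCHAR(255))" for c in columns)
    values = ", ".join(f":{c}" for c in columns)
    lines = [
        f"ERRLIMIT {errlimit};",
        f"LOGON {tdpid}/{user},{password};",
        begin + ";",
        f'SET RECORD VARTEXT "{separator}";',
        f"DEFINE {defines} FILE = {load_file};",
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});",
        "END LOADING;",
        "LOGOFF;",
    ]
    return "\n".join(lines)


# Hypothetical values for illustration only.
script = fastload_script("tdprod", "loader", "secret", "orders",
                         ["id", "amount"], "/tmp/orders.csv",
                         checkpoint=10000)
print(script)
```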

tTeradataInput
tTeradataInput Properties
Component family: Databases/Teradata
Function: tTeradataInput reads a database and extracts fields based on a query.
Purpose: tTeradataInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tTeradataConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Query type and Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Advanced settings
Additional JDBC parameters: Specify additional connection properties in the existing DB connection, to allow specific character set support. E.g.: CHARSET=KANJISJIS_OS to get support of Japanese characters.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column: Remove leading and trailing whitespace from defined columns.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for Teradata databases.

Related scenarios
For related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Dynamic context use in MySQL DB insert.
Scenario: Writing dynamic columns from a MySQL database to an output file.
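The requirement that the query fields follow the same order as the schema can be checked before running a Job. The sketch below is only an illustration of that rule: it uses Python's built-in sqlite3 module in place of a Teradata connection, and the person table and query_matches_schema helper are hypothetical.

```python
import sqlite3


def query_matches_schema(conn, query, schema):
    """Compare the columns returned by a query with the expected schema order."""
    cur = conn.execute(query)
    # cursor.description holds one entry per returned column; item 0 is its name.
    returned = [col[0] for col in cur.description]
    return returned == list(schema), returned


# sqlite3 stands in for the database; no Teradata driver is involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, city TEXT)")

schema = ["id", "name"]
good, good_cols = query_matches_schema(conn, "SELECT id, name FROM person", schema)
bad, bad_cols = query_matches_schema(conn, "SELECT name, id FROM person", schema)
conn.close()
```

The second query returns the right fields in the wrong sequence, which is the kind of mismatch the Query field description warns against.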

tTeradataMultiLoad
tTeradataMultiLoad Properties
Component family: Databases/Teradata
Function: tTeradataMultiLoad reads a database and extracts fields using queries.
Purpose: tTeradataMultiLoad executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component through a Main row connection.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Execution platform: Select the Operating System type you use.
Database name: Database name.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Script generated folder: Browse your directory and select the destination of the file which will be created.
Action to data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s).
Where condition in case Delete: Type in a condition, which, once verified, will delete the row.
Load file: Browse your directory and select the file from which you want to load data.
Field separator: Character, string or regular expression to separate fields.
Error file: Browse your directory and select the destination of the file where the error messages will be recorded.

Advanced settings
Define Log table: Select this check box to define a log table you want to use in place of the default one, which is the database table you defined in Basic settings. The syntax required to define the log table is databasename.logtablename.
BEGIN LOAD: This field allows you to define your BEGIN LOAD command to initiate or restart a load task. You can specify the number of sessions to use, the error limit, and any other parameters needed to execute the task. For more information, see the Teradata MultiLoad Reference documentation.
Return mload error: Select this check box to specify the exit code number to indicate the point at which an error message should display in the console.
Define character set: Specify the character encoding you need to use for your system.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenario
For a related topic, see Scenario: Inserting data into a Teradata database table.
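The databasename.logtablename syntax expected by the Define Log table option can be validated before the value is used. The sketch below is a hypothetical helper, not part of the component: it accepts a two-part name made of simple identifiers (a deliberately narrow assumption) and returns a .LOGTABLE command of the kind used in MultiLoad scripts.

```python
import re

# Assumption: both parts are plain identifiers (letters, digits, underscore).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def logtable_command(qualified_name):
    """Validate 'databasename.logtablename' and return a .LOGTABLE command."""
    parts = qualified_name.split(".")
    if len(parts) != 2 or not all(_IDENTIFIER.fullmatch(p) for p in parts):
        raise ValueError(
            f"expected databasename.logtablename, got {qualified_name!r}")
    database, log_table = parts
    return f".LOGTABLE {database}.{log_table};"


# Hypothetical database and log table names.
command = logtable_command("sales.orders_log")
print(command)
```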

tTeradataOutput
tTeradataOutput Properties
Component family: Databases/Teradata
Function: tTeradataOutput writes, updates, makes changes or suppresses entries in a database.
Purpose: tTeradataOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tTeradataConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection in the Talend Open Studio User Guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: On the table defined, you can perform one of the following operations:
  None: No operation is carried out.
  Drop and create a table: The table is removed and created again.
  Create a table: The table does not exist and gets created.
  Create a table if not exists: The table is created if it does not exist.
  Drop a table if exists and create: The table is removed if it already exists and created again.
  Clear a table: The table content is deleted.
Create: This is not visible by default, until you choose to create a table from the Action on table drop-down list. The table to be created may be:
  SET TABLE: tables which do not allow duplicate rows.
  MULTI SET TABLE: tables allowing duplicate rows.
Action on data: On the data of the table defined, you can perform:
  Insert: Add new entries to the table. If duplicates are found, the Job stops.
  Update: Make changes to existing entries.
  Insert or update: Add entries or update existing ones.
  Update or insert: Update existing entries or create them if they do not exist.
  Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. This is intended to allow specific character set support. E.g.: CHARSET=KANJISJIS_OS to get support of Japanese characters.
You can press Ctrl+Space to access a list of predefined global variables.
Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
  Name: Type in the name of the schema column to be altered or inserted as a new column.
  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
  Position: Select Before, Replace or After following the action to be performed on the reference column.
  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.
Use field options: Select this check box to customize a request, especially when there is double action on data.
Enable debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Use Batch Size: When selected, enables you to define the number of lines in each processed batch.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Teradata database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.

Related scenarios
For related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
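The effect of the Commit every setting can be pictured as a counter that triggers a commit each time a batch of rows has been written. The sketch below is an illustration only, not the component's generated code: it uses Python's built-in sqlite3 module as a stand-in for the Teradata connection, and the items table and write_rows helper are hypothetical.

```python
import sqlite3


def write_rows(conn, rows, commit_every):
    """Insert rows, committing after every `commit_every` rows.

    Returns the number of commits issued, including the final partial batch.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO items (id, label) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_every:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Flush the rows left over after the last full batch.
        conn.commit()
        commits += 1
    return commits


# sqlite3 stands in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, label TEXT)")
conn.commit()

rows = [(i, f"item-{i}") for i in range(7)]
commits = write_rows(conn, rows, commit_every=3)
stored = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
conn.close()
```

Rows in batches that were already committed stay in the table even if a later batch fails, which matches the remark that this option does not provide rollback.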

tTeradataRollback
tTeradataRollback Properties
This component is closely related to tTeradataCommit and tTeradataConnection. It usually does not make much sense to use these components independently in a transaction.

Component family: Databases/Teradata
Function: tTeradataRollback cancels the transaction commit in the connected DB.
Purpose: tTeradataRollback prevents part of a transaction from being committed involuntarily.

Basic settings
Component list: Select the tTeradataConnection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Teradata components, especially with the tTeradataConnection and tTeradataCommit components.
Limitation: n/a

Related scenario
For tTeradataRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.
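The idea of rolling back so that no part of a transaction is committed involuntarily can be sketched with two linked tables. This is an illustration only: Python's built-in sqlite3 module replaces the Teradata connection, and the mother and daughter tables and the insert_family helper are hypothetical names.

```python
import sqlite3


def insert_family(conn, mother, daughters):
    """Insert one mother row and its daughter rows as a single transaction."""
    try:
        conn.execute("INSERT INTO mother (id, name) VALUES (?, ?)", mother)
        conn.executemany(
            "INSERT INTO daughter (id, mother_id) VALUES (?, ?)", daughters)
    except sqlite3.Error:
        # Cancel everything written since the last commit.
        conn.rollback()
        return False
    conn.commit()
    return True


# sqlite3 stands in for the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE daughter (id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)")
conn.commit()

first_ok = insert_family(conn, (1, "Eve"), [(10, 1), (11, 1)])
# The duplicated daughter key makes the second transaction fail.
second_ok = insert_family(conn, (2, "Ada"), [(20, 2), (20, 2)])

mothers = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
daughters = conn.execute("SELECT COUNT(*) FROM daughter").fetchone()[0]
conn.close()
```

After the failed call, neither the second mother row nor its first daughter row remains, because the rollback cancels the whole uncommitted transaction.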

tTeradataRow
tTeradataRow Properties
Component family: Databases/Teradata
Function: tTeradataRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose: Depending on the nature of the query and the database, tTeradataRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings
Property type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Use an existing connection: Select this check box and click the relevant tTeradataConnection component on the Component list to reuse the connection details you already defined.
When a Job contains a parent Job and a child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Otherwise, you can also deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinct across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: Database server IP address.
Port: Listening port number of the DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Query type: Either Built-in or Repository.
  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and above all better performance on executions.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. This is intended to allow specific character set support. E.g.: CHARSET=KANJISJIS_OS to get support of Japanese characters.
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
  Parameter Index: Enter the parameter position in the SQL instruction.
  Parameter Type: Enter the parameter type.
  Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and above all better performance on executions.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.

Related scenarios
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.
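The Set PreparedStatement Parameter table maps each ? in the query to an index, a type and a value. The sketch below mimics that mapping with Python's built-in sqlite3 module, whose placeholders also use ?. It is not the component's implementation: the bind_parameters helper, the type names ("Int", "String", "Double") and the person table are hypothetical.

```python
import sqlite3

# Hypothetical type names mapped to Python converters.
CONVERTERS = {"Int": int, "String": str, "Double": float}


def bind_parameters(parameter_table):
    """Turn (index, type, value) entries into a tuple ordered by index."""
    ordered = sorted(parameter_table, key=lambda p: p["index"])
    return tuple(CONVERTERS[p["type"]](p["value"]) for p in ordered)


# sqlite3 stands in for the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Paris"), (2, "Lyon"), (3, "Paris")])

query = "SELECT id FROM person WHERE city = ? AND id > ?"
params = bind_parameters([
    {"index": 2, "type": "Int", "value": "1"},       # second ? in the query
    {"index": 1, "type": "String", "value": "Paris"},  # first ? in the query
])
ids = [row[0] for row in conn.execute(query, params)]
conn.close()
```

Because the statement text stays constant and only the bound values change, the same query can be executed repeatedly with different parameters.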

tTeradataTPump
tTeradataTPump Properties
Component family: Databases/Teradata

Function: tTeradataTPump reads a database and extracts fields using queries.

Purpose: tTeradataTPump executes a database query according to a strict order, which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component using a connection flow (Main row).

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Execution platform: Select the Operating System type you use.

Database name: Name of the database.

Username and Password: DB user authentication data.

Table: Name of the table to be written. Note that only one table can be written at a time.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Script generated folder: Browse your directory and select the destination of the file which will be created.


Action to data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s).

Where condition in case Delete: Type in a condition which, once verified, will delete the row.

Load file: Browse your directory and select the file from which you want to load data.

Field separator: Character, string or regular expression to separate fields.

Error file: Browse your directory and select the destination of the file where the error messages will be recorded.

Advanced settings

Define Log table: Select this check box to define a log table you want to use in place of the default one, that is, the database table you defined in Basic settings. The syntax required to define the log table is databasename.logtablename.

BEGIN LOAD: This field allows you to define your BEGIN LOAD command to initiate or restart a TPump task. You can specify the number of sessions to use, the error limit and any other parameters needed to execute the task. The default value is: SESSIONS 8 PACK 600 ARRAYSUPPORT ON CHECKPOINT 60 TENACITY 2 ERRLIMIT 1000. For more information, see Teradata Parallel Data Pump Reference documentation.

Return tpump error: Select this check box to specify the exit code number to indicate the point at which an error message should display in the console.

Define character set: Specify the character encoding you need to use for your system.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
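The default BEGIN LOAD value quoted above is a flat sequence of keyword/value pairs. As a minimal sketch in plain Python (not part of Talend or of the Teradata tooling; the helper name parse_begin_load_options is hypothetical), the string can be split into a dictionary so that settings such as the session count or the error limit can be checked before the value is pasted into the field:

```python
def parse_begin_load_options(options: str) -> dict:
    """Split a 'KEYWORD value KEYWORD value ...' string into a dict.

    Assumption: every keyword is followed by exactly one value. This holds
    for the default string quoted in this guide, but not necessarily for
    every option that TPump accepts.
    """
    tokens = options.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected keyword/value pairs")
    return {tokens[i]: tokens[i + 1] for i in range(0, len(tokens), 2)}


# Default value as given in the BEGIN LOAD description.
DEFAULT_BEGIN_LOAD = (
    "SESSIONS 8 PACK 600 ARRAYSUPPORT ON CHECKPOINT 60 TENACITY 2 ERRLIMIT 1000"
)

options = parse_begin_load_options(DEFAULT_BEGIN_LOAD)
print(options["SESSIONS"], options["ERRLIMIT"])
```

The values are kept as strings on purpose, since some of them (ARRAYSUPPORT ON) are not numeric.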


Scenario: Inserting data into a Teradata database table


In this scenario, you create a Job using tTeradataTPump to insert customer data into a Teradata database table and specify the exit code to be displayed in the event of an exception error.

Three components are used in this Job:
tRowGenerator: generates rows as required using random customer data taken from a list.
tFileOutputDelimited: outputs the customer data into a delimited file.
tTeradataTPump: inserts the customer data into the Teradata database table in the TPump mode.

Drop the required components: tRowGenerator, tFileOutputDelimited and tTeradataTPump from the Palette onto the design workspace.
Link tRowGenerator to tFileOutputDelimited using a Row > Main connection.
Link tRowGenerator to tTeradataTPump using a Trigger > On SubjobOk connection.
Double-click tRowGenerator to open the tRowGenerator Editor window.
In the tRowGenerator Editor window, define the data to be generated. For this Job, the schema is composed of two columns: ID and Name.


Enter the Number of Rows for RowGenerator to generate.
Double-click tFileOutputDelimited to define its properties in the Component view.
Next to File Name, browse to the output file or enter a name for the output file to be created.
Between double quotation marks, enter the delimiters to be used next to Row Separator and Field Separator.
Click Edit schema and check that the schema matches the input schema. If need be, click Sync Columns.
Double-click tTeradataTPump to open its Component view.
In the Basic settings tab of the Component view, define the tTeradataTPump parameters.


Enter the Database name, User name and Password in accordance with your database authentication information.
Specify the Table into which you want to insert the customer data. In this scenario, it is called mytable.
In the Script generated folder field, browse to the folder in which you want to store the script files generated.
In the Load file field, browse to the file which contains the customer data.
In the Error file field, browse to the file in which you want to log the error information.
In the Action on data field, select Insert.
Press F6 to execute the Job. The Run view console reads as follows:

Double-click the tTeradataTPump component to go back to its Component view.
On the Advanced settings tab, select the Return tpump error check box and type in the exit code number to indicate the point at which an error message should be displayed in the console. In this example, enter the number 4 and use the default values for the other parameters.

Press F6 to run the Job. The Run view console reads as follows:

An exception error occurs and TPump returned exit code 12 is displayed. If you need to view detailed information about the exception error, you can open the log file stored in the directory you specified in the Error file field in the Basic settings tab of the Component view.
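The first half of this scenario (tRowGenerator feeding tFileOutputDelimited) amounts to producing a two-column delimited file that TPump later loads. The following is only an illustrative sketch in Python, not the code Talend generates; the sample name list, the function names and the ";" separator are assumptions chosen for the example.

```python
import csv
import io
import random

NAMES = ["Adams", "Grant", "Hayes", "Polk", "Taft"]  # hypothetical sample list


def generate_rows(count, seed=0):
    """Yield (ID, Name) tuples, mirroring the two-column schema of the scenario."""
    rng = random.Random(seed)
    for row_id in range(1, count + 1):
        yield row_id, rng.choice(NAMES)


def write_delimited(rows, target, field_separator=";", row_separator="\n"):
    """Write rows to a file-like object using the given separators."""
    writer = csv.writer(
        target, delimiter=field_separator, lineterminator=row_separator
    )
    writer.writerows(rows)


buffer = io.StringIO()  # stands in for the load file on disk
write_delimited(generate_rows(5), buffer)
load_file_content = buffer.getvalue()
```

Whatever separator is chosen here has to match the Field separator set on tTeradataTPump, otherwise the load file cannot be split back into the ID and Name columns.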


tVectorWiseCommit
tVectorWiseCommit Properties
This component is closely related to tVectorWiseConnection and tVectorWiseRollback. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases/VectorWise

Function: tVectorWiseCommit validates the data processed in a Job into the connected DB.

Purpose: Using a single connection, this component commits a global transaction in one go instead of doing so on every row or every batch. This provides a gain in performance.

Basic settings

Component list: Select the tVectorWiseConnection component from the list if more than one connection is planned for the current Job.

Close connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tVectorWiseCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is generally used with other VectorWise components, notably tVectorWiseConnection and tVectorWiseRollback.

Limitation: n/a

Related scenario
This component is closely related to tVectorWiseConnection and tVectorWiseRollback. It usually doesn't make much sense to use one of these without using a tVectorWiseConnection component to open a connection for the current transaction.
For a tVectorWiseCommit related scenario, see tVerticaConnection.
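The idea of committing a global transaction in one go can be sketched outside of Talend. The example below uses Python's built-in sqlite3 module purely as a stand-in for a VectorWise connection (an assumption made so that the snippet runs anywhere); the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for VectorWise here
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

rows = [(1, "Adams"), (2, "Grant"), (3, "Hayes")]

# All inserts belong to one open transaction on the same connection...
for row in rows:
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)
pending = conn.in_transaction  # True: nothing has been committed yet

# ...and a single commit at the end validates them together.
conn.commit()

committed = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
conn.close()  # comparable to leaving the Close connection check box selected
```

Committing once after the loop, rather than inside it, is what corresponds to the performance gain described for this component.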


tVectorWiseConnection
tVectorWiseConnection Properties
This component is closely related to tVectorWiseCommit and tVectorWiseRollback. It usually doesn't make much sense to use one of these without using a tVectorWiseConnection component to open a connection for the current transaction.

Component family: Databases/VectorWise

Function: tVectorWiseConnection opens a connection to a database for a transaction to be carried out.

Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

Server: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: Authentication information of the database user.

Use or register a shared DB Connection: Select this check box to share your connection or retrieve a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
Shared DB Connection Name: set or type in the shared connection name.

Advanced settings

Auto Commit: Select this check box to commit a transaction automatically upon completion.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with VectorWise components, particularly tVectorWiseCommit and tVectorWiseRollback.

Limitation: n/a

Related scenario
This component is closely related to tVectorWiseCommit and tVectorWiseRollback. It usually doesn't make much sense to use one of these without using a tVectorWiseConnection component to open a connection for the current transaction.
For a tVectorWiseConnection related scenario, see tMysqlConnection.
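The Use or register a shared DB Connection option can be pictured as a registry of connections keyed by the shared connection name. The sketch below is a conceptual illustration only: the registry, the function names and the use of SQLite are assumptions, and it does not reproduce how Talend shares connections internally.

```python
import sqlite3

_shared_connections = {}  # hypothetical registry keyed by shared connection name


def get_shared_connection(name, database=":memory:"):
    """Return the connection registered under `name`, opening it on first use."""
    if name not in _shared_connections:
        _shared_connections[name] = sqlite3.connect(database)
    return _shared_connections[name]


def parent_job():
    conn = get_shared_connection("shared_vw")
    conn.execute("CREATE TABLE audit (msg TEXT)")
    conn.execute("INSERT INTO audit VALUES ('written by parent')")
    child_job()
    conn.commit()  # one commit covers the work done at both Job levels
    return conn


def child_job():
    # Same name, so the child reuses the parent's connection and transaction.
    conn = get_shared_connection("shared_vw")
    conn.execute("INSERT INTO audit VALUES ('written by child')")


conn = parent_job()
messages = [row[0] for row in conn.execute("SELECT msg FROM audit ORDER BY rowid")]
```

The point of the shared name is visible in child_job: it never opens a connection of its own, so its insert is committed or rolled back together with the parent's.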


tVectorWiseInput
tVectorWiseInput Properties
Component family: Databases/VectorWise

Function: tVectorWiseInput reads a database and extracts fields based on a query.

Purpose: tVectorWiseInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where Properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tVectorWiseConnection component.
When a Job contains the parent Job and the child Job, the Component list only presents the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the connection manually. In this case, ensure that the connection name is unique and distinctive throughout the two Job levels. For further information about Dynamic settings, see your studio user guide.

Server: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: Authentication information of the database user.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table name: Name of the table to be read.

Query type and Query: Enter your DB query, ensuring that the field order matches the order in the schema.

Guess Query: Click this button to generate a query that corresponds to your table schema in the Query field.

Guess schema: Click this button to retrieve the schema from the table.

Advanced settings

Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Define columns from which to remove leading and trailing whitespace.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component covers all possible SQL queries for VectorWise databases.

Related scenario
For tVectorWiseInput related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Writing dynamic columns from a MySQL database to an output file.
Scenario: Dynamic context use in MySQL DB insert.
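The two trimming options in the Advanced settings of tVectorWiseInput differ only in scope: one applies to every String/Char column, the other to the columns you list. A small Python sketch of that distinction follows; the function name trim_columns and the sample row are hypothetical, and the snippet is not Talend's implementation.

```python
def trim_columns(row, columns=None):
    """Strip leading and trailing whitespace from string values.

    columns=None trims every string column (like "Trim all the String/Char
    columns"); otherwise only the named columns are trimmed ("Trim column").
    Non-string values are returned unchanged.
    """
    return {
        key: value.strip()
        if isinstance(value, str) and (columns is None or key in columns)
        else value
        for key, value in row.items()
    }


row = {"id": 7, "name": "  Grant ", "city": " Paris  "}
trimmed_all = trim_columns(row)
trimmed_name_only = trim_columns(row, columns={"name"})
```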


tVectorWiseOutput
tVectorWiseOutput Properties
Component family: Databases/VectorWise

Function: tVectorWiseOutput writes, updates, makes changes or suppresses entries in a database.

Purpose: tVectorWiseOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.

Use an existing connection: Select this check box when using a configured tVectorWiseConnection component.
When a Job contains the parent Job and the child Job, the Component list only presents the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive throughout the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.


Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.

Action on data: On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Add entries or update existing ones.
Update or insert: Update existing entries or create them if they do not exist.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.


Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at executions.

Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be performed on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options: Select this check box to customize a request, especially when there is double action on data.

Enable debug mode: Select this check box to display each step during processing entries in a database.

Support null in "SQL WHERE" statement: Select this check box if you want to deal with the Null values contained in a DB table. Make sure the Nullable check box is selected for the corresponding columns in the schema.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component offers the flexibility of the DB query and covers all possible SQL queries. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a VectorWise database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
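The Insert or update action of the Action on data field relies on the primary key to decide whether an incoming row is new. One way to obtain the same end result is sketched below with Python's sqlite3 module standing in for the target database; this is an assumption-laden illustration (hypothetical table, data and function name), not a description of the SQL that Talend issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the target database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Adams')")


def insert_or_update(connection, row_id, name):
    """Try an insert first; on a duplicate primary key, update the row instead."""
    try:
        connection.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?)", (row_id, name)
        )
    except sqlite3.IntegrityError:
        connection.execute(
            "UPDATE customers SET name = ? WHERE id = ?", (name, row_id)
        )


# Row 1 already exists and is updated; row 2 is new and is inserted.
for incoming in [(1, "Adams-Smith"), (2, "Grant")]:
    insert_or_update(conn, *incoming)
conn.commit()

result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

With a plain Insert action, the duplicate key on row 1 would instead surface as an error, which is the situation in which the Job stops.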

Related scenario
For tVectorWiseOutput related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
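The Additional Columns option of tVectorWiseOutput combines four settings: Name, SQL expression, Position and Reference column. The sketch below shows one plausible reading of how these could shape an INSERT statement. It is an assumption for illustration only: the function names are hypothetical and the SQL that Talend actually generates may differ.

```python
def place_additional_column(columns, name, sql_expression, position, reference):
    """Return (column, expression) pairs with the additional column placed.

    Every schema column is bound to a "?" placeholder. `position` is
    "Before", "After" or "Replace", relative to the `reference` column.
    """
    if reference not in columns:
        raise ValueError(f"unknown reference column: {reference}")
    if position not in ("Before", "After", "Replace"):
        raise ValueError(f"unknown position: {position}")
    pairs = []
    for column in columns:
        if column != reference:
            pairs.append((column, "?"))
        elif position == "Before":
            pairs += [(name, sql_expression), (column, "?")]
        elif position == "After":
            pairs += [(column, "?"), (name, sql_expression)]
        else:  # Replace: the reference column gives way to the new expression
            pairs.append((name, sql_expression))
    return pairs


def build_insert(table, pairs):
    """Assemble an INSERT statement from (column, expression) pairs."""
    names = ", ".join(column for column, _ in pairs)
    values = ", ".join(expression for _, expression in pairs)
    return f"INSERT INTO {table} ({names}) VALUES ({values})"


pairs = place_additional_column(
    ["id", "name"], "created_at", "CURRENT_TIMESTAMP", "After", "name"
)
statement = build_insert("customers", pairs)
replaced = place_additional_column(
    ["id", "name"], "name", "UPPER(?)", "Replace", "name"
)
```

In the first call a new column filled by an SQL function is added after name; in the second, the value of name is passed through UPPER() instead of being written as is.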


tVectorWiseRollback
tVectorWiseRollback Properties
This component is closely related to tVectorWiseCommit and tVectorWiseConnection. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases/VectorWise

Function: tVectorWiseRollback cancels transactions committed to the connected DB.

Purpose: This component prevents involuntary commits.

Basic settings

Component list: Select the tVectorWiseConnection component from the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with VectorWise components, especially with tVectorWiseConnection and tVectorWiseCommit components.

Limitation: n/a

Related scenario
For a tVectorWiseRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.
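To make the role of a rollback concrete, the sketch below inserts into a mother table and a daughter table inside one transaction and discards both inserts when the second one fails. Python's sqlite3 module is used as a stand-in for VectorWise, and the tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for VectorWise here
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE daughter (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

try:
    conn.execute("INSERT INTO mother (id, name) VALUES (1, 'Adams')")
    # The NULL name violates the NOT NULL constraint, so this insert fails.
    conn.execute("INSERT INTO daughter (id, name) VALUES (10, NULL)")
    conn.commit()
except sqlite3.IntegrityError:
    # Discard everything done since the last commit, including the
    # successful insert into the mother table.
    conn.rollback()

mother_rows = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
daughter_rows = conn.execute("SELECT COUNT(*) FROM daughter").fetchone()[0]
```

Without the rollback, a later commit on the same connection would have persisted the mother row on its own, which is the kind of involuntary partial commit this component is meant to prevent.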


tVectorWiseRow
tVectorWiseRow Properties
Component family: Databases/VectorWise

Function: tVectorWiseRow is the specific component for this database query. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tVectorWiseRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box and click the relevant tVectorWiseConnection component on the Component list to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list only presents the connection components of the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For further information about how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive throughout the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.


Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Table Name: Name of the table to be processed.

Query type: Either Built-in or Repository.
Built-in: Fill in the query statement manually or build it graphically using the SQLBuilder.
Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Guess Query: Click this button to generate a query that corresponds to your table schema in the Query field.

Query: Enter your DB query taking care to sequence the fields properly in order to match the schema definition.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.

Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it improves performance.

Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and above all better performance on executions.

tStatCatcher Statistics: Select this check box to collect log data at the component level.


Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.
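The Use PreparedStatement option of tVectorWiseRow describes parameters written as ? in the query and filled in by position. Talend Jobs are generated as Java, where this maps to a JDBC PreparedStatement; the sketch below shows the same idea with the positional ? placeholders of Python's sqlite3 module, used here only as a convenient stand-in. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for VectorWise here
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Adams", "Paris"), (2, "Grant", "Lyon"), (3, "Hayes", "Paris")],
)

# The query text is written once; only the parameter values change per run.
QUERY = "SELECT name FROM customers WHERE city = ? AND id > ? ORDER BY id"

# Each tuple supplies the values in parameter-index order (first ?, second ?).
parameter_sets = [("Paris", 0), ("Paris", 1), ("Lyon", 0)]

results = [
    [row[0] for row in conn.execute(QUERY, params)]
    for params in parameter_sets
]
```

Because the values are bound rather than concatenated into the SQL text, the same statement can be reused across executions, which is where the performance benefit mentioned for this option comes from.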

Related scenario
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.
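The Commit every setting of tVectorWiseRow groups rows into batches and commits once per batch. A minimal sketch of that batching logic follows, again with Python's sqlite3 module as a stand-in database; the function name, table and row count are assumptions for the example.

```python
import sqlite3


def insert_with_commit_every(connection, rows, commit_every):
    """Insert rows, committing after each batch of `commit_every` rows.

    Returns the number of commits performed.
    """
    commits = 0
    pending = 0
    for row in rows:
        connection.execute("INSERT INTO events (id) VALUES (?)", row)
        pending += 1
        if pending == commit_every:
            connection.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final, partially filled batch
        connection.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
commit_count = insert_with_commit_every(
    conn, [(i,) for i in range(1, 26)], commit_every=10
)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Here 25 rows with a batch size of 10 lead to three commits instead of 25. Batches that have already been committed cannot be undone by a later rollback, which is one way to read the "(but not rollback)" remark in the option description.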


tVerticaBulkExec
tVerticaBulkExec Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tVerticaOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded in the database.

Component family: Databases/Vertica

Function: Executes the Insert action on the data provided.

Purpose: As a dedicated component, tVerticaBulkExec offers gains in performance while carrying out the Insert operations to a Vertica database.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection: Select this check box when using a configured tVerticaConnection component.
When a Job contains the parent Job and the child Job, the Component list only presents the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive throughout the two Job levels. For more information about Dynamic settings, see your studio user guide.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.


Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted. You have the possibility to roll back the operation.

Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Remote Filename: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Advanced settings

Write to ROS (Read Optimized Store): Select this check box to store the data in a physical storage area, in order to optimize the reading, as the data is compressed and pre-sorted.

Exit job if no row was loaded: The Job automatically stops if no row has been loaded.

Fields terminated by: Character, string or regular expression to separate fields.

Null string: String displayed to indicate that the value is null.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with tVerticaOutputBulk component. Used together, they can offer gains in performance while feeding a Vertica database.
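The Fields terminated by and Null string settings of tVerticaBulkExec have to agree with the way the bulk file was written in the first step. The sketch below writes such a file in Python to show how the two settings interact; it is illustrative only, the function name and sample rows are hypothetical, and it does no escaping of separators that occur inside values.

```python
import io


def to_bulk_line(values, field_terminator="|", null_string=""):
    """Render one row, writing `null_string` wherever a value is None."""
    return field_terminator.join(
        null_string if value is None else str(value) for value in values
    )


rows = [(1, "Adams", "Paris"), (2, None, "Lyon")]

buffer = io.StringIO()  # stands in for the file named in Remote Filename
for row in rows:
    buffer.write(to_bulk_line(row, field_terminator=";", null_string="NULL") + "\n")
bulk_content = buffer.getvalue()
```

If the loader were configured with a different Null string than the one used when writing, the second row's name would be loaded as the literal text NULL rather than as a null value.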

Related scenarios
For related topics, see:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
tOracleBulkExec Scenario: Truncating and inserting file data into Oracle DB.


tVerticaClose
tVerticaClose properties
Component family: Databases/Vertica

Function: tVerticaClose closes the transaction committed in the connected DB.

Purpose: Close a transaction.

Basic settings

Component list: Select the tVerticaConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Vertica components, especially with tVerticaConnection and tVerticaCommit.

Limitation: n/a

Related scenario
No scenario is available for this component yet.


tVerticaCommit
tVerticaCommit Properties
This component is closely related to tVerticaConnection and tVerticaRollback. It usually doesn't make much sense to use these components independently in a transaction.

Component family: Databases/Vertica

Function: tVerticaCommit validates the data processed through the Job into the connected DB.

Purpose: Using a unique connection, this component commits a global transaction in one go instead of doing so on every row or every batch, and thus provides a gain in performance.

Basic settings

Component list: Select the tVerticaConnection component in the list if more than one connection is planned for the current Job.

Close connection: This check box is selected by default. It allows you to close the database connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
If you want to use a Row > Main connection to link tVerticaCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box or your connection will be closed before the end of your first row commit.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Vertica components, especially with tVerticaConnection and tVerticaRollback components.

Limitation: n/a

Related scenario
This component is closely related to tVerticaConnection and tVerticaRollback. It usually does not make much sense to use one of these without using a tVerticaConnection component to open a connection for the current transaction.
For a tVerticaCommit related scenario, see tVerticaConnection.
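
The difference between committing once for a whole transaction and committing on every row can be sketched outside the Studio. The following is a minimal illustration only, not the code the Studio generates: it assumes Python's built-in sqlite3 module as a stand-in for a Vertica connection, and the customers table is hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for the connection opened by tVerticaConnection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.commit()

rows = [(1, "Ada"), (2, "Grace"), (3, "Linus")]

# All inserts go through one connection and stay in one open transaction.
for row in rows:
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", row)

# A single commit validates every row together.
conn.commit()

committed = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Closing here corresponds to leaving the Close connection check box selected.
conn.close()
```

Calling commit inside the loop instead would reproduce the row-by-row behaviour described for a Row > Main link, at the cost of one transaction per row.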


tVerticaConnection
tVerticaConnection Properties
This component is closely related to tVerticaCommit and tVerticaRollback. It usually does not make much sense to use one of these without using a tVerticaConnection component to open a connection for the current transaction.
Component family: Databases/Vertica
Function: tVerticaConnection opens a connection to the database for a current transaction.
Purpose: This component allows you to commit all of the Job data to an output database in just a single transaction, once the data has been validated.
Basic settings:
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  DB Version: Select the version of Vertica you are using from the list.
  Host: Database server IP address.
  Port: Listening port number of DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.
  Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
    Shared DB Connection Name: Set or type in the shared connection name.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Vertica components, especially with the tVerticaCommit and tVerticaRollback components.
Limitation: n/a

Related scenario
This component is closely related to tVerticaCommit and tVerticaRollback. It usually does not make much sense to use one of these without using a tVerticaConnection component to open a connection for the current transaction.
For a tVerticaConnection related scenario, see tMysqlConnection.
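
The idea behind Use or register a shared DB Connection can be pictured as a registry keyed by the shared connection name. The sketch below is an assumption-laden illustration, not the Studio's mechanism: the registry, the get_shared_connection helper and the name shared_vertica are all hypothetical, and sqlite3 stands in for Vertica.

```python
import sqlite3

# Hypothetical registry: one open connection per shared connection name.
_shared_connections = {}

def get_shared_connection(name, database=":memory:"):
    """Return the connection registered under `name`, opening it on first use."""
    if name not in _shared_connections:
        _shared_connections[name] = sqlite3.connect(database)
    return _shared_connections[name]

# The parent Job registers the connection under a name.
parent_conn = get_shared_connection("shared_vertica")
parent_conn.execute("CREATE TABLE audit (msg TEXT)")

# A child Job fetches the same connection by name instead of reconnecting.
child_conn = get_shared_connection("shared_vertica")
child_conn.execute("INSERT INTO audit VALUES ('written by child')")

# One commit covers the work done at both Job levels.
parent_conn.commit()
audit_rows = parent_conn.execute("SELECT msg FROM audit").fetchall()
```

Because both Job levels hold the same connection object, the name must be unique across the levels, which is why the guide asks for a distinctive shared connection name.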


tVerticaInput
tVerticaInput Properties
Component family: Databases/Vertica
Function: tVerticaInput reads a database and extracts fields based on a query.
Purpose: tVerticaInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings:
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file where Properties are stored. The fields that come after are pre-filled in using the fetched data.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  DB Version: Select the version of Vertica you are using from the list.
  Use an existing connection: Select this check box when using a configured tVerticaConnection component.
    Note: When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Host: Database server IP address.
  Port: Listening port number of DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.


  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Table Name: Name of the table to be read.
  Query type and Query: Enter your DB query, ensuring that the field order matches the order in the schema.
Advanced settings:
  Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
  Trim column: Remove leading and trailing whitespace from defined columns.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component covers all possible SQL queries for Vertica databases.

Related scenarios
For related scenarios, see:
Scenario 1: Displaying selected data from DB table.
Scenario 2: Using StoreSQLQuery variable.
Scenario: Dynamic context use in MySQL DB insert.
Scenario: Writing dynamic columns from a MySQL database to an output file.
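
The requirement that the query field order match the schema can be illustrated with a short sketch. It is only an example under stated assumptions: sqlite3 stands in for Vertica, and the person table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The physical column order of the table differs from the schema order below.
conn.execute("CREATE TABLE person (name TEXT, id INTEGER, city TEXT)")
conn.execute("INSERT INTO person VALUES ('Ada', 1, 'London')")

# The component schema, in the order the next component expects.
schema = ["id", "name", "city"]

# Listing the columns explicitly in schema order avoids depending on the
# table's own column order, which SELECT * would return.
query = "SELECT {} FROM person".format(", ".join(schema))

# Each fetched row can then be paired with the schema positionally.
records = [dict(zip(schema, row)) for row in conn.execute(query)]
conn.close()
```

With SELECT * the first value would be the name, and pairing it positionally with the schema would silently put it in the id field.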


tVerticaOutput
tVerticaOutput Properties
Component family: Databases/Vertica
Function: tVerticaOutput writes, updates, makes changes or suppresses entries in a database.
Purpose: tVerticaOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.
Basic settings:
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
    Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view. For more information about setting up and storing database connection parameters, see Setting up a DB connection of Talend Open Studio User Guide.
  DB Version: Select the version of Vertica you are using from the list.
  Use an existing connection: Select this check box when using a configured tVerticaConnection component.
    Note: When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Host: Database server IP address.
  Port: Listening port number of DB server.
  Database: Name of the database.


  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time.
  Action on table: On the table defined, you can perform one of the following operations:
    Default: No operation is carried out.
    Drop and create table: The table is removed and created again.
    Create table: The table does not exist and gets created.
    Create table if not exists: The table is created if it does not exist.
    Drop table if exists and create: The table is removed if it already exists and created again.
    Clear table: The table content is deleted.
  Action on data: On the data of the table defined, you can perform:
    Insert: Add new entries to the table. If duplicates are found, the Job stops.
    Update: Make changes to existing entries.
    Insert or update: Add entries or update existing ones.
    Update or insert: Update existing entries or create them if they do not exist.
    Delete: Remove entries corresponding to the input flow.
    Copy: Read data from a text file and insert tuples of entries into the WOS (Write Optimized Store) or directly into the ROS (Read Optimized Store). This option is ideal for bulk loading. For further information, see your Vertica SQL Reference Manual.
    Note: It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the Update and Delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column names you want to use as a base for the Update operation. Do the same in the Key in delete column for the Delete operation.
  Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings:
  Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  Abort on error: Select this check box to stop the Copy operation on data if a row is rejected and roll back this operation. Thus no data is loaded.
  Copy parameters: This area is available only when the Action on data is Copy. For further details about the Copy parameters, see your Vertica SQL Reference Manual.
    Maximum rejects: Type in a number to set the REJECTMAX command used by Vertica, which indicates the upper limit on the number of logical records to be rejected before a load fails. If not specified or if the value is 0, an unlimited number of rejections is allowed.
    No commit: Select this check box to prevent the current transaction from committing automatically.
    Exception file: Type in the path to, or browse to, the file in which messages are written indicating the input line number and the reason for each rejected data record.
    Exception file node: Type in the node of the exception file. If not specified, operations default to the query's initiator node.
    Rejected data file: Type in the path to, or browse to, the file in which to write rejected rows. This file can then be edited to resolve problems and reloaded.
    Rejected data file node: Type in the node of the rejected data file. If not specified, operations default to the query's initiator node.


  Use batch mode: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed. This check box is available only when you have selected the Insert, the Update, the Delete or the Copy option in the Action on data field.
  Additional Columns: This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns which are not insert, update or delete actions, or actions that require particular preprocessing.
    Name: Type in the name of the schema column to be altered or inserted as a new column.
    SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
    Position: Select Before, Replace or After following the action to be performed on the reference column.
    Reference column: Type in a column of reference that the component can use to place or replace the new or altered column.
  Use field options: Select this check box to customize a request, especially when there is double action on data.
  Enable debug mode: Select this check box to display each step during processing entries in a database.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible. This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a Vertica database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see Scenario 3: Retrieve data in error with a Reject link.
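
To show how the Copy parameters could map onto a Vertica COPY statement, the following sketch assembles one as a string. It is an illustration, not the statement tVerticaOutput actually issues: the build_copy_statement helper, the table name and the file paths are hypothetical, and the clause names and their order should be checked against your Vertica SQL Reference Manual.

```python
def build_copy_statement(table, data_file, delimiter="|", reject_max=0,
                         exceptions_file=None, rejected_file=None,
                         abort_on_error=False, no_commit=False):
    """Assemble a Vertica-style COPY statement from the Copy parameters.

    Illustrative only: verify clause names and order in the Vertica manual.
    """
    parts = ["COPY {} FROM '{}' DELIMITER '{}'".format(table, data_file, delimiter)]
    if reject_max > 0:  # 0 or unset means an unlimited number of rejections
        parts.append("REJECTMAX {}".format(reject_max))
    if exceptions_file:
        parts.append("EXCEPTIONS '{}'".format(exceptions_file))
    if rejected_file:
        parts.append("REJECTED DATA '{}'".format(rejected_file))
    if abort_on_error:
        parts.append("ABORT ON ERROR")
    if no_commit:
        parts.append("NO COMMIT")
    return " ".join(parts)

# Hypothetical values for the Copy parameters area.
statement = build_copy_statement(
    "sales", "/tmp/sales.txt", reject_max=10,
    exceptions_file="/tmp/sales.exc", rejected_file="/tmp/sales.rej",
    no_commit=True)
```

Leaving reject_max at 0 omits the REJECTMAX clause, which matches the documented behaviour that an unspecified or zero value allows unlimited rejections.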

Related scenarios
For tVerticaOutput related topics, see:
tDBOutput Scenario: Displaying DB output.
tMySQLOutput Scenario 1: Adding a new column and altering data in a DB table.
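
The interplay between Commit every, a cleared Die on error check box and a Row > Rejects link can be sketched as follows. This is a simplified illustration with assumptions: sqlite3 stands in for Vertica, the item table is hypothetical, and a plain list plays the role of the reject flow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
conn.commit()

incoming = [(1, "a"), (2, "b"), (2, "duplicate key"), (3, None), (4, "d")]
commit_every = 2   # rows per committed batch
rejects = []       # stands in for the Row > Rejects flow
pending = 0

for row in incoming:
    try:
        conn.execute("INSERT INTO item (id, label) VALUES (?, ?)", row)
        pending += 1
    except sqlite3.IntegrityError as err:
        # Die on error cleared: set the failing row aside and carry on.
        rejects.append((row, str(err)))
        continue
    if pending == commit_every:
        conn.commit()
        pending = 0

conn.commit()  # flush the last, possibly partial, batch
loaded = [r[0] for r in conn.execute("SELECT id FROM item ORDER BY id")]
conn.close()
```

Batches that were already committed stay in the table even if a later row fails, which is what the guide means by transaction quality without rollback.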


tVerticaOutputBulk
tVerticaOutputBulk Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tVerticaOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded in the database.
Component family: Databases/Vertica
Function: tVerticaOutputBulk writes a file with columns based on the defined delimiter and the Vertica standards.
Purpose: tVerticaOutputBulk prepares the file to be used as a parameter in the INSERT query to feed the Vertica database.
Basic settings:
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
  Append: Select this check box to add the new rows at the end of the file.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Advanced settings:
  Row separator: String (ex: "\n" on Unix) to distinguish rows.
  Field separator: Character, string or regular expression to separate fields.
  Include header: Select this check box to include the column header in the file.


  Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with tVerticaBulkExec. Used together, they offer gains in performance while feeding a Vertica database.

Related scenarios
For use cases in relation with tVerticaOutputBulk, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
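
The first step of the two-step process, generating the delimited file, can be sketched as below. It is an assumption-based illustration rather than the component's code: the write_bulk_file helper and the file name are hypothetical, and its parameters simply mirror the File Name, Append, Row separator, Field separator, Include header and Encoding settings.

```python
import os
import tempfile

def write_bulk_file(path, columns, rows, field_sep="|", row_sep="\n",
                    include_header=False, append=False, encoding="utf-8"):
    """Write rows as delimited text, mirroring the settings described above."""
    mode = "a" if append else "w"
    # newline="" keeps row_sep exactly as given on every platform.
    with open(path, mode, encoding=encoding, newline="") as handle:
        if include_header:
            handle.write(field_sep.join(columns) + row_sep)
        for row in rows:
            handle.write(field_sep.join(str(value) for value in row) + row_sep)

path = os.path.join(tempfile.mkdtemp(), "vertica_bulk.txt")

# First run creates the file with a header, second run appends to it.
write_bulk_file(path, ["id", "name"], [(1, "Ada")], include_header=True)
write_bulk_file(path, ["id", "name"], [(2, "Grace")], append=True)

with open(path, encoding="utf-8", newline="") as handle:
    content = handle.read()
```

Keeping file generation separate from loading is what allows the data to be transformed between the two steps.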


tVerticaOutputBulkExec
tVerticaOutputBulkExec Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tVerticaOutputBulkExec component.
Component family: Databases/Vertica
Function: tVerticaOutputBulkExec executes the Insert action on the data provided.
Purpose: As a dedicated component, it allows gains in performance during Insert operations to a Vertica database.
Basic settings:
  Property Type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  DB Version: Select the version of Vertica you are using from the list.
  Host: Database server IP address.
  Port: Listening port number of DB server.
  DB Name: Name of the database.
  Username and Password: DB user authentication data.
  Table: Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
  Action on table: On the table defined, you can perform one of the following operations:
    None: No operation is carried out.
    Drop and create a table: The table is removed and created again.
    Create a table: The table does not exist and gets created.
    Create a table if not exists: The table is created if it does not exist.
    Clear a table: The table content is deleted.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
  Append: Select this check box to add the new rows at the end of the file.
Advanced settings:
  Write to ROS (Read Optimized Store): Select this check box to store the data in a physical storage area, in order to optimize the reading, as the data is compressed and pre-sorted.
  Exit job if no row was loaded: The Job automatically stops if no row has been loaded.
  Field Separator: Character, string or regular expression to separate fields.
  Null string: String displayed to indicate that the value is null.
  Include header: Select this check box to include the column header in the file.
  Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
Limitation: n/a

Related scenarios
For use cases in relation with tVerticaOutputBulkExec, see the following scenarios:
tMysqlOutputBulk Scenario: Inserting transformed data in MySQL database.
tMysqlOutputBulkExec Scenario: Inserting data in MySQL database.
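
The roles of the Field Separator, Null string and Exit job if no row was loaded settings can be pictured with a client-side sketch. It is only an analogy under stated assumptions: the real component hands the file to Vertica for loading, whereas here sqlite3 stands in for the database, the lines are parsed in Python, and the \N marker and person table are hypothetical.

```python
import sqlite3

NULL_STRING = "\\N"   # hypothetical null marker chosen for this example
FIELD_SEP = "|"

# Content of the generated bulk file, one string per row.
lines = ["1|Ada", "2|\\N", "3|Linus"]

def parse_line(line):
    """Split a line on the field separator and map the null string to None."""
    return tuple(None if field == NULL_STRING else field
                 for field in line.split(FIELD_SEP))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [parse_line(line) for line in lines])
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
nulls = conn.execute(
    "SELECT COUNT(*) FROM person WHERE name IS NULL").fetchone()[0]

if loaded == 0:
    # Mirrors the Exit job if no row was loaded behaviour.
    raise SystemExit("no row was loaded")
```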


tVerticaRollback
tVerticaRollback Properties
This component is closely related to tVerticaCommit and tVerticaConnection. It usually does not make much sense to use these components independently in a transaction.
Component family: Databases/Vertica
Function: tVerticaRollback cancels the transaction commit in the connected DB.
Purpose: tVerticaRollback prevents part of a transaction from being committed involuntarily.
Basic settings:
  Component list: Select the tVerticaConnection component in the list if more than one connection is planned for the current Job.
  Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is to be used along with Vertica components, especially with the tVerticaConnection and tVerticaCommit components.
Limitation: n/a

Related scenario
For tVerticaRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables.
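
The effect of a rollback on a partly completed transaction can be sketched with two related tables. This is a minimal example under assumptions: sqlite3 stands in for Vertica, and the mother and daughter tables are hypothetical stand-ins for the tables named in the scenario title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE daughter "
             "(id INTEGER PRIMARY KEY, mother_id INTEGER NOT NULL)")
conn.commit()

try:
    conn.execute("INSERT INTO mother (id) VALUES (1)")
    # This insert violates NOT NULL, so the transaction must not be kept.
    conn.execute("INSERT INTO daughter (id, mother_id) VALUES (10, NULL)")
    conn.commit()
except sqlite3.IntegrityError:
    # Cancel everything since the last commit, including the mother row.
    conn.rollback()

mother_count = conn.execute("SELECT COUNT(*) FROM mother").fetchone()[0]
conn.close()
```

Without the rollback, the mother row would remain pending on the connection and could be committed later by mistake, which is the involuntary partial commit the component is meant to prevent.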


tVerticaRow
tVerticaRow Properties
Component family: Databases/Vertica
Function: tVerticaRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose: Depending on the nature of the query and the database, tVerticaRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings:
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
  DB Version: Select the version of Vertica you are using from the list.
  Use an existing connection: Select this check box and click the relevant tVerticaConnection component on the Component list to reuse the connection details you already defined.
    Note: When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  Port: Listening port number of DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.


  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Table name: Name of the table to be processed.
  Query type: Either Built-in or Repository.
    Built-in: Fill in the query statement manually or build it graphically using the SQLBuilder.
    Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.
  Query: Enter your DB query, taking care to sequence the fields properly in order to match the schema definition.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings:
  Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
  Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
    Parameter Index: Enter the parameter position in the SQL instruction.
    Parameter Type: Enter the parameter type.
    Parameter Value: Enter the parameter value.
    This option is very useful if you need to execute the same query several times, as it improves performance.
  Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component offers the flexibility of the DB query and covers all possible SQL queries.


Related scenario
For related topics, see:
tDBSQLRow Scenario: Resetting a DB auto-increment.
tMySQLRow Scenario 1: Removing and regenerating a MySQL table index.
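
The way indexed parameters fill the ? markers of a prepared query can be sketched as follows. This example is illustrative and rests on assumptions: sqlite3 placeholders stand in for a JDBC PreparedStatement, and the orders table and parameter values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "open", 10.0), (2, "open", 25.5), (3, "closed", 7.0)])

# Each ? is a parameter, numbered from 1 in order of appearance.
query = "SELECT id FROM orders WHERE status = ? AND amount > ?"

# Parameter Index mapped to Parameter Value, as in the parameter table.
parameters = {1: "open", 2: 20.0}

# Order the values by index so they line up with the ? markers.
values = [parameters[index] for index in sorted(parameters)]
matching = [row[0] for row in conn.execute(query, values)]

# The same statement text can be reused with other values.
closed = [row[0] for row in conn.execute(query, ["closed", 0])]
conn.close()
```

Reusing one statement text with different values is what makes this option attractive when the same query runs several times.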


ELT components
This chapter details the main components that you can find in the ELT family of the Talend Open Studio Palette.
The ELT family groups together the most popular database connectors and processing components, all dedicated to the ELT mode where the target DBMS becomes the transformation engine. This mode supports all of the most popular databases including Teradata, Oracle, Vertica, Netezza, Sybase, etc.


tELTJDBCInput
tELTJDBCInput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT
Function: Provides the table schema to be used for the SQL statement to execute.
Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings:
  Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The Schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement.
    Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
    Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Default Table Name: Type in the default table name.
  Default Schema Name: Type in the default schema name.
Advanced settings:
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage: tELTJDBCInput is to be used along with the tELTJDBCMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name.
  Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTJDBCInput, see tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering.
Scenario 2: ELT using an Alias table.


tELTJDBCMap
tELTJDBCMap properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT
Function: Helps to graphically build the SQL statement using the table provided as input.
Purpose: Uses the tables provided as input to feed the parameter in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings:
  Use an existing connection: Select this check box and select the appropriate Connection component from the Component list if you want to re-use connection parameters that you have already defined.
    Note: When a Job contains a parent Job and a child Job, Component list presents only the connection components at the same Job level. To use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
  ELT JDBC Map Editor: The ELT Map editor allows you to define the output schema and make a graphical build of the SQL statement to be executed.
  Style link: Select the way in which links are displayed.
    Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
    Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
    Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.
  Property type: Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.
  Host: Database server IP address.
  Port: Listening port number of DB server.
  Database: Name of the database.
  Username and Password: DB user authentication data.
Advanced settings:
  Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage: tELTJDBCMap is used along with a tELTJDBCInput and tELTJDBCOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name.
  Note that the ELT components do not handle actual data flow but only schema information.

Related scenario:
For related scenarios, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTJDBCOutput
tELTJDBCOutput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements on the JDBC database.

Basic settings

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter the default table name, between double quotation marks.

Default Schema Name: Enter the default schema name, between double quotation marks.

Use different table name: Select this check box to define a different output table name, between double quotation marks, in the Table name field which appears.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.


Usage

tELTJDBCOutput is to be used along with the tELTJDBCMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.
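
As an illustration of the Where clauses setting, the following Python sketch uses the standard sqlite3 module as a stand-in for a JDBC database. The customers table, its columns and the clause text are hypothetical, and the statements are written by hand to show the effect of a filter clause; they are not the SQL produced by the component.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, city, active) VALUES (?, ?, ?)",
    [(1, "Augusta", 1), (2, "Boston", 1), (3, "Augusta", 1)],
)

# Hypothetical filter, similar to what could be typed in the Where clauses field.
where_clause = "city = 'Augusta'"

# Update: only the rows matching the clause are modified.
updated = conn.execute("UPDATE customers SET active = 0 WHERE " + where_clause).rowcount

# Delete: only the rows matching the clause are removed.
deleted = conn.execute("DELETE FROM customers WHERE " + where_clause).rowcount

remaining = conn.execute("SELECT id, city, active FROM customers").fetchall()
print(updated, deleted, remaining)
```

Without the clause, both statements would apply to every row of the table, which is why the filter matters for the Update and Delete actions.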

Related scenarios
For use cases in relation with tELTJDBCOutput, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTMSSqlInput
tELTMSSqlInput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema to be used for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The Schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Default Table Name: Type in the default table name.

Default Schema Name: Type in the default schema name.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage

tELTMSSqlInput is to be used along with the tELTMSSqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTMSSqlInput, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTMSSqlMap
tELTMSSqlMap properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Helps you to build the SQL statement graphically, using the tables provided as input.

Purpose: Uses the tables provided as input to feed the parameters in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.

Basic settings

Use an existing connection: Select this check box and select the appropriate Connection component from the Component list if you want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT MSSql Map Editor: The ELT Map editor allows you to define the output schema and to build graphically the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.

Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage

tELTMSSqlMap is used along with tELTMSSqlInput and tELTMSSqlOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenario:
For related scenarios, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTMSSqlOutput
tELTMSSqlOutput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements on the MSSql database.

Basic settings

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter the default table name, between double quotation marks.

Default Schema Name: Enter the default schema name, between double quotation marks.

Use different table name: Select this check box to define a different output table name, between double quotation marks, in the Table name field which appears.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.


Usage

tELTMSSqlOutput is to be used along with the tELTMSSqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTMSSqlOutput, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTMysqlInput
tELTMysqlInput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema to be used for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The Schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Default Table Name: Enter the default table name, between double quotation marks.

Usage

tELTMysqlInput is to be used along with the tELTMysqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTMysqlInput, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTMysqlMap
tELTMysqlMap properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Helps you build the SQL statement graphically, using the tables provided as input.

Purpose: Uses the tables provided as input to feed the parameters in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.

Basic settings

Use an existing connection: Select this check box and select the appropriate Connection component from the Component list if you want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level. If you need to use an existing connection from the other level, make sure that the available connection components share the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Otherwise, you can deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT Mysql Map editor: The ELT Map editor allows you to define the output schema as well as build graphically the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.

Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.

Usage

tELTMysqlMap is used along with tELTMysqlInput and tELTMysqlOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. The ELT components do not handle actual data flow but only schema information.

Connecting ELT components


The ELT components do not handle any data as such but table schema information that will be used to build the SQL query to execute. Therefore the only connection required to connect these components together is a simple link.
The output name you give to this link when creating it should always be the exact name of the table to be accessed as this parameter will be used in the SQL statement generated.

Related topic: Link connection of Talend Open Studio User Guide.
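
The reason the link name must match the table name can be sketched in Python, with the standard sqlite3 module standing in for MySQL. The build_select helper is hypothetical and only imitates the idea that link names are copied verbatim into the FROM clause; it is not the code generated by the Studio.

```python
import sqlite3

def build_select(link_names, columns):
    """Compose a SELECT whose FROM clause reuses the link names verbatim."""
    return "SELECT " + ", ".join(columns) + " FROM " + ", ".join(link_names)

# In-memory database holding a single table named owners.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (ID_Owner INTEGER, Name TEXT)")
conn.execute("INSERT INTO owners VALUES (1, 'Smith')")

# Link named after the real table: the statement runs.
good_sql = build_select(["owners"], ["owners.Name"])
rows = conn.execute(good_sql).fetchall()

# Link left with an arbitrary name: the database rejects the statement.
bad_sql = build_select(["row1"], ["row1.Name"])
try:
    conn.execute(bad_sql)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(good_sql, rows, error)
```

In this sketch, the second statement fails because no table called row1 exists, which mirrors what happens when a link is not named after the table it represents.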

Mapping and joining tables


In the ELT Mapper, you can select specific columns from input schemas and include them in the output schema. As you would in the regular Map editor, simply drag and drop the content from the input schema to the output table defined. Use the Ctrl and Shift keys to select multiple contiguous or non-contiguous table columns.
You can implement explicit joins to retrieve various data from different tables. Select the Explicit join check box for the relevant column, and select a type of join from the Join list. Possible joins include: Inner Join, Left Outer Join, Right Outer Join, Full Outer Join and Cross Join. By default, Inner Join is selected.
You can also create Alias tables to retrieve various data from the same table. In the Input area, click the plus [+] button to create an Alias. Define the table to base the alias on. Type in a new name for the alias table, preferably not the same as the main table.
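
To make the join types and alias tables more concrete, here is a minimal Python sketch that turns a choice from the Join list into an SQL join clause. The JOIN_KEYWORDS mapping and the join_clause helper are hypothetical names introduced for illustration, and the exact text produced by the ELT Mapper may differ.

```python
# Join list entries (as named in the text above) mapped to SQL keywords.
JOIN_KEYWORDS = {
    "Inner Join": "INNER JOIN",
    "Left Outer Join": "LEFT OUTER JOIN",
    "Right Outer Join": "RIGHT OUTER JOIN",
    "Full Outer Join": "FULL OUTER JOIN",
    "Cross Join": "CROSS JOIN",
}

def join_clause(join_type, table, on=None, alias=None):
    """Return one JOIN clause; a Cross Join takes no ON condition."""
    target = table if alias is None else table + " " + alias
    clause = JOIN_KEYWORDS[join_type] + " " + target
    if join_type != "Cross Join" and on:
        clause += " ON (" + on + ")"
    return clause

# Join between two different tables.
print(join_clause("Inner Join", "cars", on="cars.ID_Owner = owners.ID_Owner"))

# Join on an alias of a table, here employees seen as Managers.
print(join_clause("Left Outer Join", "employees",
                  on="Managers.ID = employees.ManagerID", alias="Managers"))
```

An alias lets the same physical table appear twice in one statement under different names, which is what makes a join between a table and itself possible.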


Adding where clauses


You can also restrict the Select statement based on a Where clause. Click on the Add filter row button at the top of the output table and type in the relevant restriction to be applied. Make sure that all input components are linked correctly to the ELT Map component to be able to implement all inclusions, joins and clauses.

Generating the SQL statement


Mapping elements from the input schemas to the output schemas instantly creates the corresponding Select statement. The clauses are also included automatically.

Scenario 1: Aggregating table columns and filtering


This scenario describes a Job that gathers several input DB table schemas and implements a clause to filter the output using an SQL statement.

Drop the following components from the Palette onto the design workspace: three tELTMysqlInput components, a tELTMysqlMap, and a tELTMysqlOutput. Label these components to best describe their functionality. Double-click the first tELTMysqlInput component to display its Basic settings view.

Select Repository from the Schema list, click the three dot button preceding Edit schema, and select your DB connection and the desired schema from the [Repository Content] dialog box. The selected schema name appears in the Default Table Name field automatically. In this use case, the DB connection is Talend_MySQL and the schema for the first input component is owners. Set the second and third tELTMysqlInput components in the same way but select cars and resellers respectively as their schema names.
In this use case, all the involved schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information concerning metadata, see How to centralize the Metadata items in the Talend Open Studio User Guide. You can also select the three input components by dropping the relevant schemas from the Metadata area onto the design workspace and double-clicking tELTMysqlInput from the [Components] dialog box. Doing so allows you to skip the steps of labeling the input components and defining their schemas manually.

Connect the three tELTMysqlInput components to the tELTMysqlMap component using links named following strictly the actual DB table names: owners, cars and resellers. Connect the tELTMysqlMap component to the tELTMysqlOutput component and name the link agg_result, which is the name of the database table you will save the aggregation result to. Click the tELTMysqlMap component to display its Basic settings view.

Select Repository from the Property Type list, and select the same DB connection that you use for the input components. All the database details are automatically retrieved. Leave all the other settings as they are.


Double-click the tELTMysqlMap component to launch the ELT Map editor to set up joins between the input tables and define the output flow.

Add the input tables by clicking the green plus button at the upper left corner of the ELT Map editor and selecting the relevant table names in the [Add a new alias] dialog box.
Drop the ID_Owner column from the owners table to the corresponding column of the cars table.
In the cars table, select the Explicit join check box in front of the ID_Owner column. As the default join type, INNER JOIN is displayed on the Join list.
Drop the ID_Reseller column from the cars table to the corresponding column of the resellers table to set up the second join, and define the join as an inner join in the same way.
Select the columns to be aggregated into the output table, agg_result.
Drop the ID_Owner, Name, and ID_Insurance columns from the owners table to the output table.
Drop the Registration, Make, and Color columns from the cars table to the output table.
Drop the Name_Reseller and City columns from the resellers table to the output table. With the relevant columns selected, the mappings are displayed in yellow and the joins are displayed in dark violet.
Set up a filter in the output table. Click the Add filter row button on top of the output table to display the Additional clauses expression field, drop the City column from the resellers table to the expression field, and complete a WHERE clause that reads resellers.City = 'Augusta'.


Click the Generated SQL Select query tab to display the corresponding SQL statement.

Click OK to save the ELT Map settings. Double-click the tELTMysqlOutput component to display its Basic settings view.

Select an action from the Action on data list as needed.


Select Repository as the schema type, and define the output schema in the same way as you defined the input schemas. In this use case, select agg_result as the output schema, which is the name of the database table used to store the mapping result.
You can also use a built-in output schema and retrieve the schema structure from the preceding component; however, make sure that you specify an existing target table having the same data structure in your database.

Leave all the other settings as they are. Save your Job and press F6 to launch it. All selected data is inserted in the agg_result table as specified in the SQL statement.
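
For readers who want to try the logic of this scenario outside the Studio, the following Python sketch reproduces it with the standard sqlite3 module as a stand-in for MySQL. The table and column names come from the scenario; the column types, the sample rows and the exact layout of the statement are assumptions, so the SQL shown on the Generated SQL Select query tab may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Input and output tables; types and sample rows are made up for this sketch.
conn.executescript("""
CREATE TABLE owners (ID_Owner INTEGER, Name TEXT, ID_Insurance INTEGER);
CREATE TABLE cars (Registration TEXT, Make TEXT, Color TEXT,
                   ID_Owner INTEGER, ID_Reseller INTEGER);
CREATE TABLE resellers (ID_Reseller INTEGER, Name_Reseller TEXT, City TEXT);
CREATE TABLE agg_result (ID_Owner INTEGER, Name TEXT, ID_Insurance INTEGER,
                         Registration TEXT, Make TEXT, Color TEXT,
                         Name_Reseller TEXT, City TEXT);
INSERT INTO owners VALUES (1, 'Smith', 10), (2, 'Jones', 20);
INSERT INTO cars VALUES ('AAA111', 'Ford', 'Red', 1, 100),
                        ('BBB222', 'Fiat', 'Blue', 2, 200);
INSERT INTO resellers VALUES (100, 'Auto One', 'Augusta'),
                             (200, 'Car Two', 'Boston');
""")

# Two inner joins plus the WHERE clause defined in the output table.
conn.execute("""
INSERT INTO agg_result
    (ID_Owner, Name, ID_Insurance, Registration, Make, Color, Name_Reseller, City)
SELECT owners.ID_Owner, owners.Name, owners.ID_Insurance,
       cars.Registration, cars.Make, cars.Color,
       resellers.Name_Reseller, resellers.City
FROM owners
INNER JOIN cars ON (cars.ID_Owner = owners.ID_Owner)
INNER JOIN resellers ON (resellers.ID_Reseller = cars.ID_Reseller)
WHERE resellers.City = 'Augusta'
""")

rows = conn.execute("SELECT * FROM agg_result").fetchall()
print(rows)
```

Only the car sold by the reseller located in Augusta reaches agg_result; the whole operation runs inside the database, with no rows passing through the client.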

Scenario 2: ELT using an Alias table


This scenario describes a Job that maps information from two input tables and an alias table, serving as a virtual input table, to an output table. The employees table contains employee IDs, their department numbers, their names, and the IDs of their respective managers. The managers are also considered as employees and hence included in the employees table. The dept table contains the department information. The alias table retrieves the names of the managers from the employees table.

Drop two tELTMysqlInput components, a tELTMysqlMap component, and a tELTMysqlOutput component to the design workspace, and label them to best describe their functionality. Double-click the first tELTMysqlInput component to display its Basic settings view.


Select Repository from the Schema list, and define the DB connection and schema by clicking the three dot button preceding Edit schema. The DB connection is Talend_MySQL and the schema for the first input component is employees.
In this use case, all the involved schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information concerning metadata, see How to centralize the Metadata items in the Talend Open Studio User Guide.

Set the second tELTMysqlInput component in the same way but select dept as its schema. Double-click the tELTMysqlOutput component to display its Basic settings view.

Select an action from the Action on data list as needed, Insert in this use case.
Select Repository as the schema type, and define the output schema in the same way as you defined the input schemas. In this use case, select result as the output schema, which is the name of the database table used to store the mapping result. The output schema contains all the columns of the input schemas plus a ManagerName column.
Leave all the other parameters as they are.
Connect the two tELTMysqlInput components to the tELTMysqlMap component using Link connections named strictly after the actual input table names, employees and dept in this use case.
Connect the tELTMysqlMap component to the tELTMysqlOutput component using a Link connection. When prompted, click Yes to allow the ELT Mapper to retrieve the output table structure from the output schema.
Click the tELTMysqlMap component and select the Component tab to display its Basic settings view.


Select Repository from the Property Type list, and select the same DB connection that you use for the input components. All the DB connection details are automatically retrieved.
Leave all the other parameters as they are.
Click the three dot button next to ELT Mysql Map Editor or double-click the tELTMysqlMap component on the design workspace to launch the ELT Map editor. With the tELTMysqlMap component connected to the output component, the output table is displayed in the output area.
Add the input tables, employees and dept, in the input area by clicking the green plus button and selecting the relevant table names in the [Add a new alias] dialog box.
Create an alias table based on the employees table by selecting employees from the Select the table to use list and typing in Managers in the Type in a valid alias field in the [Add a new alias] dialog box.

Drop the DeptNo column from the employees table to the dept table. Select the Explicit join check box in front of the DeptNo column of the dept table to set up an inner join. Drop the ManagerID column from the employees table to the ID column of the Managers table. Select the Explicit join check box in front of the ID column of the Managers table and select LEFT OUTER JOIN from the Join list to allow the output rows to contain Null values.


Drop all the columns from the employees table to the corresponding columns of the output table. Drop the DeptName and Location columns from the dept table to the corresponding columns of the output table. Drop the Name column from the Managers table to the ManagerName column of the output table.

Click on the Generated SQL Select query tab to display the SQL query statement to be executed.


Save your Job and press F6 to run it. The output database table result contains all the information about the employees, including the names of their respective managers.
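
The alias mechanism of this scenario can be sketched in Python with the standard sqlite3 module as a stand-in for MySQL. Table names, the Managers alias and the join types follow the scenario; the column order, the column types and the sample rows are assumptions, and the statement is written by hand rather than copied from the Generated SQL Select query tab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Alice has no manager (NULL), Bob reports to Alice; both rows are made up.
conn.executescript("""
CREATE TABLE employees (ID INTEGER, DeptNo INTEGER, Name TEXT, ManagerID INTEGER);
CREATE TABLE dept (DeptNo INTEGER, DeptName TEXT, Location TEXT);
CREATE TABLE result (ID INTEGER, DeptNo INTEGER, Name TEXT, ManagerID INTEGER,
                     DeptName TEXT, Location TEXT, ManagerName TEXT);
INSERT INTO employees VALUES (1, 10, 'Alice', NULL), (2, 10, 'Bob', 1);
INSERT INTO dept VALUES (10, 'Sales', 'Paris');
""")

# employees is read twice: once directly, once through the Managers alias.
conn.execute("""
INSERT INTO result (ID, DeptNo, Name, ManagerID, DeptName, Location, ManagerName)
SELECT employees.ID, employees.DeptNo, employees.Name, employees.ManagerID,
       dept.DeptName, dept.Location, Managers.Name
FROM employees
INNER JOIN dept ON (dept.DeptNo = employees.DeptNo)
LEFT OUTER JOIN employees Managers ON (Managers.ID = employees.ManagerID)
""")

rows = conn.execute("SELECT * FROM result ORDER BY ID").fetchall()
print(rows)
```

The left outer join keeps the employee who has no manager, with a Null ManagerName, whereas an inner join on the alias would drop that row.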


tELTMysqlOutput
tELTMysqlOutput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements on the Mysql database.

Basic settings

Note: In Java, use tCreateTable as a substitute for this function.

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter the default table name, between double quotation marks.

Use different table name: Select this check box to define a different output table name, between double quotation marks, in the Table name field which appears.

Usage

tELTMysqlOutput is to be used along with the tELTMysqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.


Related scenarios
For use cases in relation with tELTMysqlOutput, see the tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907.
Scenario 2: ELT using an Alias table on page 911.


tELTOracleInput
tELTOracleInput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components should be used to handle Oracle DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema to be used for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The Schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically switches to Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Default Table Name: Enter the default table name, between double quotation marks.

Default Schema Name (Java only): Enter the default schema name, between double quotation marks.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage

tELTOracleInput is to be used along with the tELTOracleMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTOracleInput, see tELTOracleMap Scenario: Updating Oracle DB entries on page 922.

tELTOracleMap
tELTOracleMap properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components should be used to handle Oracle DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Helps to graphically build the SQL statement using the table provided as input.

Purpose: Uses the tables provided as input to feed the parameters in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.

Basic settings

Use an existing connection (Java only): Select this check box and select the appropriate tOracleConnection component from the Component list if you want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT Oracle Map Editor: The ELT Map editor allows you to define the output schema and make a graphical build of the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.

Connection type (Java only): Drop-down list of the available drivers.
DB Version: Select the Oracle version you are using.
Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.
Mapping: Automatically set mapping parameter.

Advanced settings

Additional JDBC Parameters (Java only): Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Use Hint Options (Java only): Select this check box to activate the hint configuration area to help you optimize a query's execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTOracleMap is used along with a tELTOracleInput and tELTOracleOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.
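As a rough illustration of the HINT parameter described under Use Hint Options, the following Python sketch places an Oracle-style hint comment immediately after the leading keyword of a statement. The function name apply_hint, the sample query, and the FULL hint are illustrative assumptions; this is not the code that the Studio generates.

```python
def apply_hint(sql: str, hint: str) -> str:
    """Insert an Oracle-style hint comment right after the first keyword.

    Hypothetical helper for illustration only; the Studio builds the
    statement itself from the HINT, POSITION and SQL STMT parameters.
    """
    keyword, _, rest = sql.strip().partition(" ")
    # Oracle reads a hint from a comment of the form /*+ ... */
    return f"{keyword} /*+ {hint} */ {rest}"


if __name__ == "__main__":
    print(apply_hint("SELECT ID_OWNER, MAKE FROM agg_result", "FULL(agg_result)"))
```

The hint text itself is passed through unchanged, so it is up to you to enter a hint that Oracle recognizes.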

Connecting ELT components


For detailed information regarding ELT component connections, see Connecting ELT components on page 906. Related topic: Link connection of Talend Open Studio User Guide.

Mapping and joining tables


In the ELT Mapper, you can select specific columns from input schemas and include them in the output schema. For detailed information regarding the table schema mapping and joining, see Mapping and joining tables on page 921.

When you need to join many tables, or to join tables on multiple join conditions with outer joins, it is recommended to use the LEFT OUTER JOIN (+) and the RIGHT OUTER JOIN (+) options, which allow you to use the Oracle-specific (+) join keywords. For further information about these keywords, see: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries006.htm
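To make the (+) notation concrete, here is a small Python sketch that renders the same left outer join in ANSI syntax and in Oracle's (+) syntax. The helper names, and the assumption that the owners and cars tables share an ID_OWNER column, are made up for illustration; the ELT Map editor generates its own SQL.

```python
def ansi_left_join(left: str, right: str, key: str, cols: str = "*") -> str:
    """Left outer join written with ANSI JOIN syntax."""
    return (f"SELECT {cols} FROM {left} "
            f"LEFT OUTER JOIN {right} ON {left}.{key} = {right}.{key}")


def oracle_plus_left_join(left: str, right: str, key: str, cols: str = "*") -> str:
    """Same join written with Oracle's (+) operator.

    The (+) marker goes on the column of the table whose rows are optional
    (here the right-hand table), so every row of `left` is kept.
    """
    return (f"SELECT {cols} FROM {left}, {right} "
            f"WHERE {left}.{key} = {right}.{key} (+)")


if __name__ == "__main__":
    # Hypothetical sample tables and join column
    print(ansi_left_join("owners", "cars", "ID_OWNER"))
    print(oracle_plus_left_join("owners", "cars", "ID_OWNER"))
```

With the (+) form, each additional join condition is simply one more predicate in the WHERE clause, which is why it can be easier to manage when many tables are involved.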

Adding where clauses


For details regarding the clause handling, see Adding where clauses on page 922.

Generating the SQL statement


Mapping elements from the input schemas to the output schemas instantly creates the corresponding Select statement. The clauses defined in the ELT Mapper are also included automatically.
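The idea can be pictured with a short Python sketch: each output column is fed by an input expression, and the clauses entered in the mapper are appended to the query. The function build_select and the sample mapping are hypothetical, and the exact text of the generated statement (for example, whether aliases are written) may differ; the real statement is the one shown in the Generated SQL Select query tab.

```python
from typing import Dict, List


def build_select(mapping: Dict[str, str], tables: List[str],
                 where: List[str]) -> str:
    """Assemble a SELECT statement from an output-column mapping.

    mapping: output column name -> input expression (e.g. "cars.MAKE")
    tables:  input tables listed in the FROM clause
    where:   filter or join conditions, combined with AND
    """
    select_list = ", ".join(f"{expr} AS {out}" for out, expr in mapping.items())
    sql = f"SELECT {select_list} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


if __name__ == "__main__":
    # Hypothetical mapping with one output column and one filter
    print(build_select({"MAKE": "cars.MAKE"}, ["cars"], ["cars.MAKE = 'Audi'"]))
```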

Scenario: Updating Oracle DB entries


This scenario is based on the data aggregation scenario, Scenario 1: Aggregating table columns and filtering on page 907. As the data update action is available in Oracle DB, this scenario describes a Job that updates particular data in the agg_result table.

As described in Scenario 1: Aggregating table columns and filtering on page 907, set up a Job for data aggregation using the corresponding ELT components for Oracle DB, tELTOracleInput, tELTOracleMap, and tELTOracleOutput, and execute the Job to save the aggregation result in a database table named agg_result.
When defining filters in the ELT Map editor, note that strings are case sensitive in Oracle DB.

Launch the ELT Map editor and add a new output table named update_data.

Add a filter row to the update_data table to set up a relationship between input and output tables: owners.ID_OWNER = agg_result.ID_OWNER.
Drop the MAKE column from the cars table to the update_data table.
Drop the NAME_RESELLER column from the resellers table to the update_data table.
Add a model enclosed in single quotation marks, A8 in this use case, to the MAKE column from the cars table, preceded by a double pipe (||).
Add Sold by enclosed in single quotation marks in front of the NAME_RESELLER column from the resellers table, with a double pipe (||) in between.

Check the Generated SQL Select query tab to review the query to be executed.
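The double pipe used in these expressions is the SQL string concatenation operator. The sketch below evaluates the two expressions with Python's built-in sqlite3 module, used here only as a convenient stand-in for Oracle; the function name concat_preview and the sample values are made up.

```python
import sqlite3


def concat_preview(make: str, reseller: str):
    """Evaluate the two scenario expressions on sample values.

    SQLite stands in for Oracle here: both use || to concatenate strings.
    """
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT ? || 'A8', 'Sold by' || ?", (make, reseller)
        ).fetchone()
    finally:
        conn.close()
    return row


if __name__ == "__main__":
    # Hypothetical sample values for MAKE and NAME_RESELLER
    print(concat_preview("Audi", "Garage Martin"))
```

Note that || adds no separator of its own, so any space you want in the result must be typed inside the quoted literal.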

Click OK to validate the changes in the ELT Mapper.
Deactivate the tELTOracleOutput component labeled Agg_Result by right-clicking it and selecting Deactivate Agg_Result from the contextual menu.
Drop a new tELTOracleOutput component from the Palette to the design workspace, and label it Update_Data to better identify its functionality.
Connect the tELTOracleMap component to the new tELTOracleOutput component using the link corresponding to the new output table defined in the ELT Mapper, update_data in this use case.
Double-click the new tELTOracleOutput component to display its Basic settings view.

From the Action on data list, select Update.
Check the schema, and click Sync columns to retrieve the schema structure from the preceding component if necessary.
In the WHERE clauses area, add a clause that reads agg_result.MAKE = 'Audi' to update data relating to the make of Audi in the database table agg_result.
Fill the Default Table Name field with the name of the output link, update_data in this use case.
Select the Use different table name check box, and fill the Table name field with the name of the database table to be updated, agg_result in this use case.
Leave the other parameters as they are.
Save your Job and press F6 to run it.
The relevant data in the database table is updated as defined.
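The effect of the WHERE clause can be previewed with a simplified sketch. It uses Python's sqlite3 module as a stand-in for Oracle, a cut-down agg_result table with invented rows, and a hand-written UPDATE statement; the statement that the Job actually generates will differ. The function name run_update_demo is hypothetical.

```python
import sqlite3


def run_update_demo():
    """Update only the rows whose MAKE equals 'Audi' exactly."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agg_result (ID_OWNER INTEGER, MAKE TEXT)")
    conn.executemany(
        "INSERT INTO agg_result VALUES (?, ?)",
        [(1, "Audi"), (2, "audi"), (3, "BMW")],  # invented sample rows
    )
    # Hand-written stand-in for the generated statement: the WHERE clause
    # restricts the update, and the string comparison is case sensitive.
    conn.execute(
        "UPDATE agg_result SET MAKE = MAKE || 'A8' WHERE agg_result.MAKE = 'Audi'"
    )
    rows = conn.execute(
        "SELECT ID_OWNER, MAKE FROM agg_result ORDER BY ID_OWNER"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_update_demo())
```

Only the row whose MAKE is spelled exactly Audi is modified; the lowercase audi row is left untouched, which mirrors the earlier remark that strings are case sensitive in filters.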

tELTOracleOutput
tELTOracleOutput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components should be used to handle Oracle DB schemas to generate Insert, Update or Delete statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the specified table and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements on the Oracle database.

Basic settings

Action on data (the MERGE option is available in Java only): On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
MERGE: Updates or adds data to the table.
The options available for the MERGE operation are different from those available for the Insert, Update or Delete operations.

Action on table (Perl only): On the table defined, you can perform one of the following operations:
Default: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Clear table: The table content is deleted.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Use Merge Update (for MERGE) (Java only): Select this check box to update the data in the output table.
Column: Lists the columns in the entry flow.
Update: Select the check box which corresponds to the name of the column you want to update.
Use Merge Update Where Clause: Select this check box and enter the WHERE clause required to filter the data to be updated, if necessary.
Use Merge Update Delete Clause: Select this check box and enter the WHERE clause required to filter the data to be deleted and updated, if necessary.

Use Merge Insert (for MERGE) (Java only): Select this check box to insert the data in the table.
Column: Lists the entry flow columns.
Check All: Select the check box corresponding to the name of the column you want to insert.
Use Merge Insert Where Clause: Select this check box and enter the WHERE clause required to filter the data to be inserted.

Default Table Name: Enter a default name for the table, between double quotation marks.

Default Schema Name (Java only): Enter a name for the default Oracle schema, between double quotation marks.

Use different table name: Select this check box to define a different output table name, between double quotation marks, in the Table name field which appears.

Advanced settings

Use Hint Options (Java only): Select this check box to activate the hint configuration area when you want to use a hint to optimize a query's execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTOracleOutput is to be used along with the tELTOracleInput and tELTOracleMap components. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Scenario: Using the Oracle MERGE function to update and add data simultaneously
This scenario describes a Job that allows you to add new customer information and update existing customer information in a database table using the Oracle MERGE command.

Drop the following components from the Palette to the design workspace: tELTOracleInput, tELTOracleMap, and tELTOracleOutput, and label them to identify their functionality.
Double-click the tELTOracleInput component to display its Basic settings view.

Select Repository from the Schema list, click the three dot button preceding Edit schema, and select your DB connection and the desired schema from the [Repository Content] dialog box. The selected schema name appears in the Default Table Name field automatically. In this use case, the DB connection is Talend_Oracle and the schema is new_customers.
In this use case, the input schema is stored in the Metadata node of the Repository tree view for easy retrieval. For further information concerning metadata, see How to centralize the Metadata items in the Talend Open Studio User Guide. You can also select the input component by dropping the relevant schema from the Metadata area onto the design workspace and double-clicking tELTOracleInput from the [Components] dialog box. Doing so allows you to skip the steps of labeling the input component and defining its schema manually.

Connect the tELTOracleInput component to the tELTOracleMap component using the link named strictly after the actual DB table name, new_customers in this use case.
Connect the tELTOracleMap component to the tELTOracleOutput component and name the link customers_merge, which is the name of the database table you will save the merge result to.
Click the tELTOracleMap component to display its Basic settings view.

Select Repository from the Property Type list, and select the same DB connection that you use for the input components. All the database details are automatically retrieved.
Leave the other settings as they are.
Double-click the tELTOracleMap component to launch the ELT Map editor to set up the data transformation flow.
Display the input table by clicking the green plus button at the upper left corner of the ELT Map editor and selecting the relevant table name in the [Add a new alias] dialog box. In this use case, the only input table is new_customers.

Select all the columns in the input table and drop them to the output table.

Click the Generated SQL Select query tab to display the query statement to be executed.

Click OK to validate the ELT Map settings and close the ELT Map editor.
In the design workspace, double-click the tELTOracleOutput component to display its Basic settings view.
From the Action on data list, select MERGE.
Click the Sync columns button to retrieve the schema from the preceding component.
Select the Use Merge Update check box to update the data using Oracle's MERGE function.
In the table that appears, select the check boxes for the columns you want to update. In this use case, we want to update all the data according to the customer ID. Therefore, select all the check boxes except the one for the ID column.

The columns defined as the primary key CANNOT and MUST NOT be made subject to updates.

Select the Use Merge Insert check box to insert new data while updating the existing data by leveraging Oracle's MERGE function.
In the table that appears, select the check boxes for the columns into which you want to insert new data. In this use case, we want to insert all the new customer data. Therefore, select all the check boxes by clicking the Check All check box.
Fill the Default Table Name field with the name of the target table already existing in your database. In this example, fill in customers_merge.
Leave the other parameters as they are.

Save your Job and press F6 to run it. The data is updated and inserted in the database. The query used is displayed on the console.
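For orientation, the general shape of an Oracle MERGE statement that updates matching rows and inserts the others can be sketched as follows. The helper build_merge, the t and s aliases, and the NAME and CITY columns are assumptions made for this example; the exact query that the Job runs is the one displayed on the console.

```python
from typing import List


def build_merge(target: str, source: str, key: str,
                update_cols: List[str], insert_cols: List[str]) -> str:
    """Compose an Oracle-style MERGE statement as a string.

    update_cols must not contain the key column: Oracle rejects updates
    to columns referenced in the ON clause.
    """
    if key in update_cols:
        raise ValueError("the key column cannot be updated in a MERGE")
    set_list = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    col_list = ", ".join(insert_cols)
    val_list = ", ".join(f"s.{c}" for c in insert_cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON (t.{key} = s.{key}) "
        f"WHEN MATCHED THEN UPDATE SET {set_list} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )


if __name__ == "__main__":
    # Hypothetical columns; ID plays the role of the customer ID
    print(build_merge("customers_merge", "new_customers", "ID",
                      ["NAME", "CITY"], ["ID", "NAME", "CITY"]))
```

The check on the key column reflects the same constraint as the scenario: the ID column is left out of the update list but is included in the insert list.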

tELTPostgresqlInput
tELTPostgresqlInput properties
The three ELT Postgresql components are closely related, in terms of their operating conditions. These components should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema to be used for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The Schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement.
Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Default Table Name: Enter the default table name, between double quotation marks.

Default Schema Name: Enter the default schema name, between double quotation marks.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTPostgresqlInput is to be used along with the tELTPostgresqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTPostgresqlInput, see tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907
Scenario 2: ELT using an Alias table on page 911

tELTPostgresqlMap
tELTPostgresqlMap properties
The three ELT Postgresql components are closely related, in terms of their operating conditions. These components should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Helps to build the SQL statement graphically, using the table provided as input.

Purpose: Uses the tables provided as input to feed the parameters in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.

Basic settings

Use an existing connection: Select this check box and select the appropriate Connection component from the Component list if you want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT Postgresql Map Editor: The ELT Map editor allows you to define the output schema and make a graphical build of the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.

Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.

Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTPostgresqlMap is used along with a tELTPostgresqlInput and tELTPostgresqlOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For related scenarios, see tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907
Scenario 2: ELT using an Alias table on page 911

tELTPostgresqlOutput
tELTPostgresqlOutput properties
The three ELT Postgresql components are closely related in terms of their operating conditions. These components should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the specified table and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements on the Postgresql database.

Basic settings

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter the default table name between double quotation marks.

Default Schema Name: Enter the default schema name between double quotation marks.

Use different table name: Select this check box to enter a different output table name, between double quotation marks, in the Table name field which appears.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTPostgresqlOutput is to be used along with the tELTPostgresqlMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTPostgresqlOutput, see tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907
Scenario 2: ELT using an Alias table on page 911

tELTSybaseInput
tELTSybaseInput properties
The three ELT Sybase components are closely related, in terms of their operating conditions. These components should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required, for Insert statements which can be complex.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the number and nature of the fields to be processed. The schema is either built-in (local) or stored remotely in the Repository. The Schema defined is then passed on to the ELT Mapper for inclusion in the Insert SQL statement.
Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically becomes built-in.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository. Hence, it can be re-used for other projects and Jobs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Default Table Name: Enter a default name for the table, between double quotation marks.

Default Schema Name: Enter a default name for the Sybase schema, between double quotation marks.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage

tELTSybaseInput is intended for use with tELTSybaseMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. ELT components only handle schema information. They do not handle actual data flow.

Related scenarios
For scenarios in which tELTSybaseInput may be used, see tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907
Scenario 2: ELT using an Alias table on page 911

tELTSybaseMap
tELTSybaseMap properties
The three ELT Sybase components are closely related in terms of their operating conditions. These components should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Allows you to construct a graphical build of the SQL statement using the table provided as input.

Purpose: Uses the tables provided as input to feed the parameters required to execute the SQL statement. The statement can include inner or outer joins to be implemented between tables or between a table and its aliases.

Basic settings

Use an existing connection: Select this check box and select the appropriate tSybaseConnection component from the Component list if you want to re-use connection parameters that you have already defined.
When a Job contains a parent Job and a child Job, the Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315.
Otherwise, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT Sybase Map Editor: The ELT Map editor allows you to define the output schema and make a graphical build of the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service parameters are in the form of curves.
Line: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.

Property type: Can be either Built-in or Repository.
Built-in: No property data is stored centrally.
Repository: Select the Repository file where the component properties are stored. The following fields are pre-filled using collected data.

Host: Database server IP address.
Port: Listening port number of DB server.
Database: Name of the database.
Username and Password: DB user authentication data.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at component level.

Usage

tELTSybaseMap is intended for use with tELTSybaseInput and tELTSybaseOutput. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. The ELT components only handle schema information. They do not handle actual data flow.

Related scenarios
For scenarios in which tELTSybaseMap may be used, see the following tELTMysqlMap scenarios:
Scenario 1: Aggregating table columns and filtering on page 907
Scenario 2: ELT using an Alias table on page 911

tELTSybaseOutput
tELTSybaseOutput properties
The three ELT Sybase components are closely related in terms of their operating conditions. These components should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements in the Sybase database.

Basic settings

In Java, use tCreateTable as a substitute for this function.

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number and nature of the fields to be processed and passed on to the next component. The schema is either built-in (local) or stored remotely in the Repository. The schema defined is then passed on to the ELT Mapper for inclusion in the Insert SQL statement. Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically becomes built-in.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository. Hence, it can be re-used for other projects and Jobs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter a default name for the table, between double quotation marks.

Default Schema Name: Enter a default name for the Sybase schema, between double quotation marks.

Use different table name: Select this check box to enter a different output table name, between double quotation marks, in the Table name field which appears.


Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at the component level.

Usage: tELTSybaseOutput is intended for use with the tELTSybaseInput and tELTSybaseMap components. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. ELT components only handle schema information. They do not handle actual data flow.

Related scenarios
For scenarios in which tELTSybaseOutput may be used, see the following tELTMysqlMap scenarios: Scenario 1: Aggregating table columns and filtering on page 907. Scenario 2: ELT using an Alias table on page 911.
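
As an illustration of the Action on data options described for tELTSybaseOutput, the following minimal sketch runs the three kinds of statement directly in a database. It is an illustration under stated assumptions, not the SQL that the component actually generates: Python's built-in sqlite3 module stands in for Sybase, and the table names (staging_customers, customers), columns and Where clauses are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the target DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customers (id INTEGER, name TEXT, state TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    INSERT INTO staging_customers VALUES
        (1, 'Ann', 'NY'), (2, 'Bob', 'CA'), (3, 'Cal', 'NY');
""")

# Insert: add new entries to the output table, selected from the input table.
insert_sql = (
    "INSERT INTO customers (id, name, state) "
    "SELECT id, name, state FROM staging_customers"
)
conn.execute(insert_sql)

# Running the same Insert again produces duplicate keys, so the statement fails.
# This is the kind of situation in which a Job would stop.
try:
    conn.execute(insert_sql)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Update with a Where clause: only the rows matching the clause are changed.
conn.execute("UPDATE customers SET state = 'TX' WHERE id = 2")

# Delete with a Where clause: only the rows matching the clause are removed.
conn.execute("DELETE FROM customers WHERE state = 'NY'")

remaining = conn.execute(
    "SELECT id, name, state FROM customers ORDER BY id"
).fetchall()
print(duplicate_rejected, remaining)
```

In each case the data never leaves the database: the statement text is all that is sent, which is consistent with the note that ELT components handle schema information rather than actual data flow.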


tELTTeradataInput
tELTTeradataInput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Provides the table schema to be used for the SQL statement to execute.

Purpose: Allows you to add as many Input tables as required for the most complicated Insert statement.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed. The schema is either built-in or remotely stored in the Repository. The schema defined is then passed on to the ELT Mapper to be included in the Insert SQL statement. Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Default Table Name: Enter a default name for the table, between double quotation marks.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at the component level.

Usage: tELTTeradataInput is to be used along with the tELTTeradataMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTTeradataInput, see tELTMysqlMap scenarios: Scenario 1: Aggregating table columns and filtering on page 907 Scenario 2: ELT using an Alias table on page 911

tELTTeradataMap
tELTTeradataMap properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Helps to graphically build the SQL statement using the table provided as input.

Purpose: Uses the tables provided as input to feed the parameters in the built statement. The statement can include inner or outer joins to be implemented between tables or between one table and its aliases.

Basic settings

Use an existing connection: Select this check box and select the appropriate tTeradataConnection component from the Component list if you want to re-use connection parameters that you have already defined. When a Job contains a parent Job and a child Job, the Component list presents only the connection components at the same Job level, so if you need to use an existing connection from the other level, make sure that the available connection components are sharing the intended connection. For more information on how to share a DB connection across Job levels, see Use or register a shared DB connection in any database connection component corresponding to the database you are using, in Database components on page 315. Alternatively, you can deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.

ELT Teradata Map editor: The ELT Map editor allows you to define the output schema as well as graphically build the SQL statement to be executed.

Style link: Select the way in which links are displayed.
Auto: By default, the links between the input and output schemas are in the form of curves.
Bezier curve: Links between the schemas are in the form of curves.
Line: Links between the schemas are in the form of straight lines. This option slightly optimizes performance.

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.

Host: Database server IP address.

Port: Listening port number of DB server.

Database: Name of the database.

Username and Password: DB user authentication data.

Usage: tELTTeradataMap is used along with tELTTeradataInput and tELTTeradataOutput. Note that the Output link to be used with these components must faithfully reflect the name of the tables. The ELT components do not handle actual data flow but only schema information.

Connecting ELT components


For detailed information regarding ELT component connections, see Connecting ELT components on page 906. Related topic: Link connection of Talend Open Studio User Guide.

Mapping and joining tables


In the ELT Mapper, you can select specific columns from input schemas and include them in the output schema. For detailed information regarding the table schema mapping and joining, see Mapping and joining tables on page 921.

Adding WHERE clauses


For details regarding the clause handling, see Adding where clauses on page 922.

Generating the SQL statement


Mapping elements from the input schemas to the output schemas instantly creates the corresponding Select statement. The clauses defined internally in the ELT Mapper are also included automatically.
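
To illustrate how a column mapping, a join and a clause can combine into a single statement, here is a minimal sketch. It is not the ELT Mapper's own generator: the helper build_insert_select, the tables (customers, states, customer_report) and the column names are all hypothetical, and Python's built-in sqlite3 module stands in for Teradata. Identifiers are interpolated as plain text, so the helper is only suitable for trusted, hand-written names.

```python
import sqlite3

def build_insert_select(target, mapping, from_clause, where=None):
    """Build an INSERT ... SELECT statement from an
    {output column: input expression} mapping (hypothetical helper)."""
    out_cols = ", ".join(mapping.keys())
    exprs = ", ".join(mapping.values())
    sql = f"INSERT INTO {target} ({out_cols}) SELECT {exprs} FROM {from_clause}"
    if where:
        sql += f" WHERE {where}"
    return sql

# Output columns on the left, input expressions on the right.
mapping = {"customer_name": "c.name", "state_label": "s.label"}
sql = build_insert_select(
    "customer_report",
    mapping,
    "customers c INNER JOIN states s ON c.id_state = s.id",
    where="s.label <> 'Unknown'",
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, id_state INTEGER);
    CREATE TABLE states (id INTEGER, label TEXT);
    CREATE TABLE customer_report (customer_name TEXT, state_label TEXT);
    INSERT INTO customers VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cal', 30);
    INSERT INTO states VALUES (10, 'Ohio'), (20, 'Unknown');
""")

# The whole transformation runs inside the database as one statement.
conn.execute(sql)
report = conn.execute(
    "SELECT customer_name, state_label FROM customer_report"
).fetchall()
print(sql)
print(report)
```

Only the row for Ann reaches the report table: Bob is excluded by the clause and Cal has no matching state, so the inner join drops that row.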

Related scenarios
For use cases in relation with tELTTeradataMap, see tELTMysqlMap scenarios: Scenario 1: Aggregating table columns and filtering on page 907 Scenario 2: ELT using an Alias table on page 911


tELTTeradataOutput
tELTTeradataOutput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be executed in the DB output table defined.
Component family: ELT

Function: Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.

Purpose: Executes the SQL Insert, Update and Delete statements in the Teradata database.

Basic settings

In Java, use tCreateTable as a substitute for this function.

Action on data: On the data of the table defined, you can perform one of the following operations:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where clauses (for UPDATE and DELETE only): Enter a clause to filter the data to be updated or deleted during the update or delete operations.

Default Table Name: Enter a default name for the table, between double quotation marks.

Use different table name: Select this check box to enter a different output table name, between double quotation marks, in the Table name field which appears.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at the component level.

Usage: tELTTeradataOutput is to be used along with the tELTTeradataMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name. Note that the ELT components do not handle actual data flow but only schema information.

Related scenarios
For use cases in relation with tELTTeradataOutput, see tELTMysqlMap scenarios: Scenario 1: Aggregating table columns and filtering on page 907 Scenario 2: ELT using an Alias table on page 911


tSQLTemplateAggregate
tSQLTemplateAggregate properties
Component family: ELT/SQLTemplate

Function: tSQLTemplateAggregate collects data values from one or more columns with the intent to manage the collection as a single unit. This component has real-time capabilities since it runs the data transform on the DBMS itself.

Purpose: Helps to provide a set of metrics based on values or calculations.

Basic settings

Database Type: Select the database type you want to connect to from the list.

Component List: Select the relevant DB connection component in the list if you use more than one connection in the current Job.

Database name: Name of the database.

Source table name: Name of the table holding the data you want to collect values from.

Target table name: Name of the table you want to write the collected and transformed data in.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Operations: Select the type of operation along with the value to use for the calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select any of the following operations to perform on data: count, min, max, avg, sum, and count (distinct).
Input column position: Select the input column from which you want to collect the values to be aggregated.

Group by: Define the aggregation sets, the values of which will be used for calculations.
Output Column: Select the column label in the list offered according to the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations.
Input Column position: Match the input column label with your output columns, in case the output label of the aggregation set needs to be different.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

SQL Template

SQL Template List: To add a default system SQL template: Click the Add button to add the default system SQL template(s) in the SQL Template List. Click in the SQL template field and then click the arrow to display the system SQL template list. Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
-Select a system template from the SQL Template list and click on its code in the code box. You will be prompted by the system to create a new template.
-Click Yes to open the SQL template wizard.
-Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
-Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is used as an intermediate component with other relevant DB components, especially the DB connection and commit components.

Limitation: n/a
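
The relationship between the Operations and Group by panels and an aggregate query can be sketched as follows. This is a hedged illustration, not the SQL template shipped with the component: the helper build_aggregate_select, the orders table and its columns are hypothetical, and Python's built-in sqlite3 module stands in for the DBMS. Only the six function names come from the Function list described above.

```python
import sqlite3

# Function names from the Operations panel mapped to SQL expressions.
FUNCTIONS = {
    "count": "COUNT({col})",
    "min": "MIN({col})",
    "max": "MAX({col})",
    "avg": "AVG({col})",
    "sum": "SUM({col})",
    "count (distinct)": "COUNT(DISTINCT {col})",
}

def build_aggregate_select(source, operations, group_by):
    """Build a grouped SELECT (hypothetical helper).

    operations: list of (output_column, function, input_column)
    group_by:   list of (output_column, input_column)
    """
    parts = [f"{inp} AS {out}" for out, inp in group_by]
    parts += [
        f"{FUNCTIONS[fn].format(col=inp)} AS {out}" for out, fn, inp in operations
    ]
    keys = ", ".join(inp for _, inp in group_by)
    return f"SELECT {', '.join(parts)} FROM {source} GROUP BY {keys}"

sql = build_aggregate_select(
    "orders",
    operations=[("order_count", "count", "id"), ("total_amount", "sum", "amount")],
    group_by=[("region", "region")],
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'east', 10.0), (2, 'east', 5.0), (3, 'west', 7.5);
""")

# Sort in Python because GROUP BY alone does not guarantee row order.
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Each Group by line contributes a grouping key and each Operations line contributes one aggregate expression, so adding lines to either panel widens the result by one column.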

Scenario: Filtering and aggregating table columns directly on the DBMS


The following Java scenario creates a Job that opens a connection to a MySQL database and:
-instantiates the schemas from a database table whose rows match the column names specified in the filter,
-filters a column in the same database table to have only the data that matches a WHERE clause,
-collects data grouped by specific value(s) from the filtered column and writes aggregated data in a target database table.

To filter and aggregate database table columns:

Drop the following components from the Palette onto the design workspace: tMysqlConnection, tSQLTemplateFilterColumns, tSQLTemplateFilterRows, tSQLTemplateAggregate, tSQLTemplateCommit, and tSQLTemplateRollback.

Connect the first five components using OnComponentOk links.

Connect tSQLTemplateAggregate to tSQLTemplateRollback using an OnComponentError link.

In the design workspace, select tMysqlConnection and click the Component tab to define the basic settings for tMysqlConnection. In the Basic settings view, set the database connection details manually or select Repository from the Property Type list and select your DB connection if it has already been defined and stored in the Metadata area of the Repository tree view. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide.

In the design workspace, select tSQLTemplateFilterColumns and click the Component tab to define its basic settings.


On the Database type list, select the relevant database.

On the Component list, select the relevant database connection component if more than one connection is used.

Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables.
When you define the data structure for the source table, column names automatically appear in the Column list in the Column filters panel.

In this scenario, the source table has five columns: id, First_Name, Last_Name, Address, and id_State.

In the Column filters panel, set the column filter by selecting the check boxes of the columns you want to write in the target table. In this scenario, the tSQLTemplateFilterColumns component instantiates only three columns from the source table: id, First_Name, and id_State.
In the Component view, you can click the SQL Template tab and add system SQL templates or create your own and use them within your Job to carry out the coded operation. For more information, see tSQLTemplateFilterColumns Properties on page 957.

In the design workspace, select tSQLTemplateFilterRows and click the Component tab to define its basic settings.

On the Database type list, select the relevant database.


On the Component list, select the relevant database connection component if more than one connection is used.

Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables. In this scenario, the source table has the three initially instantiated columns: id, First_Name, and id_State, and the target table has the same three-column schema.

In the Where condition field, enter a WHERE clause to extract only those records that fulfill the specified criterion. In this scenario, the tSQLTemplateFilterRows component filters the First_Name column in the source table to extract only the first names that contain the letter "a".

In the design workspace, select tSQLTemplateAggregate and click the Component tab to define its basic settings.

On the Database type list, select the relevant database.

On the Component list, select the relevant database connection component if more than one connection is used.

Enter the names for the database, source table, and target table in the corresponding fields and click the three-dot buttons next to Edit schema to define the data structure in the source and target tables. The schema for the source table consists of the three columns: id, First_Name, and id_State. The schema for the target table consists of two columns: customers_status and customers_number.

In this scenario, we want to group customers by their marital status and count the number of customers in each marital group. To do that, we define the Operations and Group by panels accordingly.


In the Operations panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the counted data. Click in the Function line and select the operation to be carried out.

In the Group by panel, click the plus button to add one or more lines and then click in the Output column line to select the output column that will hold the aggregated data.

In the design workspace, select tSQLTemplateCommit and click the Component tab to define its basic settings. On the Database type list, select the relevant database. On the Component list, select the relevant database connection component if more than one connection is used. Do the same for tSQLTemplateRollback.

Save your Job and press F6 to execute it.

A two-column table aggregate_customers is created in the database. It groups customers according to their marital status and counts the number of customers in each marital group.
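
The three SQL steps of this scenario can be approximated outside the Studio with the sketch below. It rests on several assumptions: Python's built-in sqlite3 module stands in for MySQL, the sample rows and the intermediate table names (filtered_columns, filtered_rows) are hypothetical, the id_State column is taken as the grouping value written to customers_status, and the statements are hand-written rather than produced by the system SQL templates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER, First_Name TEXT, Last_Name TEXT, Address TEXT, id_State INTEGER
    );
    CREATE TABLE filtered_columns (id INTEGER, First_Name TEXT, id_State INTEGER);
    CREATE TABLE filtered_rows (id INTEGER, First_Name TEXT, id_State INTEGER);
    CREATE TABLE aggregate_customers (
        customers_status INTEGER, customers_number INTEGER
    );
    INSERT INTO customers VALUES
        (1, 'Maria', 'Lopez', '1 Main St', 1),
        (2, 'Tom', 'Reed', '2 Oak Ave', 1),
        (3, 'Sarah', 'Kim', '3 Elm Rd', 2),
        (4, 'Jack', 'Burns', '4 Pine Ln', 1),
        (5, 'Bob', 'Stone', '5 Lake Dr', 2);
""")

steps = [
    # Step 1 (filter columns): keep three of the five source columns.
    "INSERT INTO filtered_columns "
    "SELECT id, First_Name, id_State FROM customers",
    # Step 2 (filter rows): keep first names containing the letter a.
    "INSERT INTO filtered_rows "
    "SELECT id, First_Name, id_State FROM filtered_columns "
    "WHERE First_Name LIKE '%a%'",
    # Step 3 (aggregate): count customers per grouping value.
    "INSERT INTO aggregate_customers "
    "SELECT id_State, COUNT(id) FROM filtered_rows GROUP BY id_State",
]

# Run the steps in order, as the OnComponentOk links do, then commit once.
for statement in steps:
    conn.execute(statement)
conn.commit()

result = conn.execute(
    "SELECT customers_status, customers_number "
    "FROM aggregate_customers ORDER BY customers_status"
).fetchall()
print(result)
```

With these sample rows, three first names contain the letter a (Maria, Sarah and Jack), so the target table holds one row per grouping value with counts of 2 and 1.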


tSQLTemplateCommit
tSQLTemplateCommit properties
This component is closely related to tSQLTemplateRollback and to the ELT connection component for the database you work with. tSQLTemplateCommit, tSQLTemplateRollback and the ELT database connection component are usually used together in a transaction.
Component family: ELT/SQLTemplate

Function: tSQLTemplateCommit validates the data processed in a Job in a specified database.

Purpose: Using a single connection, this component commits a global action in one go instead of doing so for every row or every batch of rows separately. This provides a gain in performance.

Basic settings

Database Type: Select the database type you want to connect to from the list.

Component List: Select the ELT database connection component in the list if more than one connection is required for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

SQL Template

SQL Template List: To add a default system SQL template: Click the Add button to add the default system SQL template(s) in the SQL Template List. Click in the SQL template field and then click the arrow to display the system SQL template list. Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
-Select a system template from the SQL Template list and click on its code in the code box. You will be prompted by the system to create a new template.
-Click Yes to open the SQL template wizard.
-Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
-Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is to be used with ELT components, especially with tSQLTemplateRollback and the relevant database connection component.

Limitation: n/a

Related scenario
This component is closely related to tSQLTemplateRollback and to the ELT connection component depending on the database you are working with. It usually does not make much sense to use ELT components without using the relevant ELT database connection component as its purpose is to open a connection for a transaction. For more information on tSQLTemplateCommit, see Scenario: Filtering and aggregating table columns directly on the DBMS on page 949.
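
The idea of committing a global action in one go, with a rollback as the error path, can be sketched as follows. This is an illustration only: Python's built-in sqlite3 module stands in for the database, and the audit table and the run_batch helper are hypothetical names, not part of Talend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (id INTEGER PRIMARY KEY, note TEXT)")

def run_batch(conn, rows):
    """Insert all rows, then commit once; undo the whole batch on any error."""
    try:
        conn.executemany("INSERT INTO audit (id, note) VALUES (?, ?)", rows)
        conn.commit()    # one commit for the whole batch, not one per row
        return True
    except sqlite3.Error:
        conn.rollback()  # error path, the role a rollback component plays
        return False

first_ok = run_batch(conn, [(1, "a"), (2, "b"), (3, "c")])

# The second batch reuses id 3, so it fails and none of its rows are kept,
# including the valid row with id 4 that was inserted before the error.
second_ok = run_batch(conn, [(4, "d"), (3, "duplicate"), (5, "e")])

ids = [row[0] for row in conn.execute("SELECT id FROM audit ORDER BY id")]
print(first_ok, second_ok, ids)
```

Because both operations act on the same open connection, the commit and the rollback need the connection component that started the transaction, which is why these components are described as being used together.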


tSQLTemplateFilterColumns
tSQLTemplateFilterColumns Properties
Component family: ELT/SQLTemplate

Function: tSQLTemplateFilterColumns makes specified changes to the defined schema of the database table based on column name mapping. This component has real-time capabilities since it runs the data filtering on the DBMS itself.

Purpose: Helps homogenize schemas by reorganizing, deleting or adding new columns.

Basic settings

Database Type: Select the type of database you want to work on from the drop-down list.

Component List: Select the relevant DB connection component in the list if you use more than one connection in the current Job.

Database name: Name of the database.

Source table name: Name of the table holding the data you want to filter.

Target table name: Name of the table you want to write the filtered data in.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Column Filters: In the table, click the Filter check box to filter all of the columns. To select specific columns for filtering, select the check box(es) which correspond(s) to the column name(s).

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

SQL Template

SQL Template List: To add a default system SQL template: Click the Add button to add the default system SQL template(s) in the SQL Template List. Click in the SQL template field and then click the arrow to display the system SQL template list. Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
-Select a system template from the SQL Template list and click on its code in the code box. You will be prompted by the system to create a new template.
-Click Yes to open the SQL template wizard.
-Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
-Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is used as an intermediary component with other relevant DB components, especially DB connection components.

Limitation: n/a

Related Scenario
For a related scenario, see Scenario: Filtering and aggregating table columns directly on the DBMS on page 949.


tSQLTemplateFilterRows
tSQLTemplateFilterRows Properties
Component family: ELT/SQLTemplate

Function: tSQLTemplateFilterRows allows you to define a row filter on one table. This component has real-time capabilities since it runs the data filtering on the DBMS itself.

Purpose: Helps to set row filters for any given data source, based on a WHERE clause.

Basic settings

Database Type: Select the type of database you want to work on from the drop-down list.

Component List: Select the relevant DB connection component in the list if you are using more than one connection in the current Job.

Database name: Name of the database.

Source table name: Name of the table holding the data you want to filter.

Target table name: Name of the table you want to write the filtered data in.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Where condition: Use a WHERE clause to set the criteria that you want the rows to meet. You can use the WHERE clause to select specific rows from the table that match specified criteria or conditions.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

SQL Template

SQL Template List: To add a default system SQL template: Click the Add button to add the default system SQL template(s) in the SQL Template List. Click in the SQL template field and then click the arrow to display the system SQL template list. Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
-Select a system template from the SQL Template list and click on its code in the code box. You will be prompted by the system to create a new template.
-Click Yes to open the SQL template wizard.
-Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
-Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is used as an intermediary component with other DB components, particularly DB connection components.

Limitation: n/a

Related Scenario
For a related scenario, see Scenario: Filtering and aggregating table columns directly on the DBMS on page 949.


tSQLTemplateMerge
tSQLTemplateMerge properties
Component family: ELT/SQLTemplate
Function: This component creates an SQL MERGE statement to merge data into a database table.
Purpose: This component is used to merge data into a database table directly on the DBMS by creating and executing a MERGE statement.

Basic settings
Database Type: Select the type of database you want to work on from the drop-down list.
Component list: Select the relevant DB connection component from the list if you use more than one connection in the current Job.
Source table name: Name of the database table holding the data you want to merge into the target table.
Target table name: Name of the table you want to merge data into.
Schema and Edit schema: This component involves two schemas: source schema and target schema. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository. Click Edit Schema to modify the schema. Note that if you make the modification, the schema switches automatically to the Built-in mode.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Merge ON: Specify the target and source columns you want to use as the primary keys.
Use UPDATE (WHEN MATCHED): Select this check box to update existing records. With the check box selected, the UPDATE Columns table appears, allowing you to define the columns in which records are to be updated.
Specify additional output columns: Select this check box to update records in additional columns other than those listed in the UPDATE Columns table. With this check box selected, the Additional UPDATE Columns table appears, allowing you to specify additional columns.
Specify UPDATE WHERE clause: Select this check box and type in a WHERE clause in the WHERE clause field to filter data during the update operation. This option may not work with certain database versions, including Oracle 9i.
Use INSERT (WHEN NOT MATCHED): Select this check box to insert new records. With the check box selected, the INSERT Columns table appears, allowing you to specify the columns to be involved in the insert operation.
Specify additional output columns: Select this check box to insert records to additional columns other than those listed in the INSERT Columns table. With this check box selected, the Additional INSERT Columns table appears, allowing you to specify additional columns.
Specify INSERT WHERE clause: Select this check box and type in a WHERE clause in the WHERE clause field to filter data during the insert operation. This option may not work with certain database versions, including Oracle 9i.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at component level.

SQL Template
SQL Template List: To add a default system SQL template:
- Click the Add button to add the default system SQL template(s) in the SQL Template List.
- Click in the SQL template field and then click the arrow to display the system SQL template list.
- Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
- Select a system template from the SQL Template list and click its code in the code box. You will be prompted to create a new template.
- Click Yes to open the SQL template wizard.
- Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
- Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is used as an intermediate component with other relevant DB components, especially the DB connection and commit components.


Scenario: Merging data directly on the DBMS


This scenario describes a simple Job that opens a connection to a MySQL database, merges data from a source table into a target table according to customer IDs, and displays the contents of the target table before and after the merge action. A WHERE clause is used to filter data during the merge operation.

Drop a tMysqlConnection component, a tSQLTemplateMerge component, two tMysqlInput components and two tLogRow components from the Palette onto the design workspace.
Connect the tMysqlConnection component to the first tMysqlInput component using a Trigger > OnSubjobOK connection.
Connect the first tMysqlInput component to the first tLogRow component using a Row > Main connection. This row will display the initial contents of the target table on the console.
Connect the first tMysqlInput component to the tSQLTemplateMerge component, and the tSQLTemplateMerge component to the second tMysqlInput component using Trigger > OnSubjobOK connections.
Connect the second tMysqlInput component to the second tLogRow component using a Row > Main connection. This row will display the merge result on the console.
Double-click the tMysqlConnection component to display its Basic settings view.


Set the database connection details manually or select Repository from the Property Type list and select your DB connection if it has already been defined and stored in the Metadata area of the Repository tree view. For more information about Metadata, see How to centralize the Metadata items of Talend Open Studio User Guide.
Double-click the first tMysqlInput component to display its Basic settings view.

Select the Use an existing connection check box. If you are using more than one DB connection component in your Job, select the component you want to use from the Component List.
Click the three-dot button next to Edit schema and define the data structure of the target table, or select Repository from the Schema list and select the target table if the schema has already been defined and stored in the Metadata area of the Repository tree view. In this scenario, we use built-in schemas.


Define the columns as shown above, and then click OK to propagate the schema structure to the output component and close the schema dialog box.
Fill the Table Name field with the name of the target table, customer_info_merge in this scenario.
Click the Guess Query button, or type in SELECT * FROM customer_info_merge in the Query area, to retrieve all the table columns.
Define the properties of the second tMysqlInput component, using exactly the same settings as for the first tMysqlInput component.
In the Basic settings view of each tLogRow component, select the Table option in the Mode area so that the contents will be displayed in table cells on the console.

Double-click the tSQLTemplateMerge component to display its Basic settings view.


Type in the names of the source table and the target table in the relevant fields. In this scenario, the source table is new_customer_info, which contains eight records; the target table is customer_info_merge, which contains five records, and both tables have the same data structure.
The source table and the target table may have different schema structures. In this case, however, make sure that the source column and target column specified in each line of the Merge ON table, the UPDATE Columns table, and the INSERT Columns table are identical in data type and the target column length allows the insertion of the data from the corresponding source column.

Define the source schema manually, or select Repository from the Schema list and select the relevant table if the schema has already been defined and stored in the Metadata area of the Repository tree view. In this scenario, we use built-in schemas.

Define the columns as shown above and click OK to close the schema dialog box, and do the same for the target schema. Click the green plus button beneath the Merge ON table to add a line, and select the ID column as the primary key.


Select the Use UPDATE check box to update existing data during the merge operation, and define the columns to be updated by clicking the green plus button and selecting the desired columns. In this scenario, we want to update all the columns according to the customer IDs. Therefore, we select all the columns except the ID column.
Note: Columns defined as the primary key cannot, and must not, be updated.

Select the Specify UPDATE WHERE clause check box and type in customer_info_merge.ID >= 4 within double quotation marks in the WHERE clause field so that only those existing records with an ID equal to or greater than 4 will be updated.

Select the Use INSERT check box and define the columns to take data from and insert data to in the INSERT Columns table. In this example, we want to insert all the records that do not exist in the target table.


Select the SQL Template view to display and add the SQL templates to be used. By default, the tSQLTemplateMerge component uses two system SQL templates: MergeUpdate and MergeInsert.
In the SQL Template tab, you can add system SQL templates or create your own and use them within your Job to carry out the coded operation. For more information, see tSQLTemplateFilterColumns Properties on page 957.

Click the Add button to add a line and select Commit from the template list to commit the merge result to your database. Alternatively, you can connect the tSQLTemplateMerge component to a tSQLTemplateCommit or tMysqlCommit component using a Trigger > OnSubjobOK connection to commit the merge result to your database.
Save your Job and press F6 to run it.
Both the original contents of the target table and the merge result are displayed on the console. In the target table, records No. 4 and No. 5 contain the updated information, and records No. 6 through No. 8 contain the inserted information.
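The outcome of this scenario can be pictured with a small in-memory sketch. It does not reproduce the SQL generated by the MergeUpdate and MergeInsert templates; it only mimics the expected result under stated assumptions. The class name MergeSketch, the placeholder record values, and the use of a map keyed by customer ID are hypothetical. Only the table sizes (five target records, eight source records) and the ID >= 4 update filter come from the scenario.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the merge configured in this scenario.
// It is not the SQL produced by the system SQL templates.
class MergeSketch {

    // Merges source into target on the ID key.
    // Matched rows are updated only when ID >= minUpdateId (the UPDATE WHERE clause);
    // unmatched rows are always inserted.
    static Map<Integer, String> merge(Map<Integer, String> target,
                                      Map<Integer, String> source,
                                      int minUpdateId) {
        Map<Integer, String> result = new LinkedHashMap<>(target);
        for (Map.Entry<Integer, String> row : source.entrySet()) {
            int id = row.getKey();
            if (result.containsKey(id)) {
                if (id >= minUpdateId) {
                    result.put(id, row.getValue()); // matched and passes the filter: update
                }
            } else {
                result.put(id, row.getValue());     // not matched: insert
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> target = new LinkedHashMap<>();
        Map<Integer, String> source = new LinkedHashMap<>();
        for (int id = 1; id <= 5; id++) target.put(id, "old-" + id); // five target records
        for (int id = 1; id <= 8; id++) source.put(id, "new-" + id); // eight source records

        Map<Integer, String> merged = merge(target, source, 4);
        merged.forEach((id, value) -> System.out.println(id + " | " + value));
    }
}
```

With these placeholder values, records 1 to 3 keep their original content, records 4 and 5 take the source values, and records 6 to 8 are new, which matches the result described for the scenario.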


tSQLTemplateRollback
tSQLTemplateRollback properties
This component is closely related to tSQLTemplateCommit and to the ELT connection component relative to the database you work with. tSQLTemplateRollback, tSQLTemplateCommit and the ELT database connection component are usually used together in a transaction.
Component family: ELT/SQLTemplate
Function: tSQLTemplateRollback cancels the transaction committed in the database you connect to.
Purpose: To avoid committing transactions accidentally.

Basic settings
Database Type: Select the database type you want to connect to from the list.
Component List: Select the ELT database connection component in the list if more than one connection is planned for the current Job.
Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

SQL Template
SQL Template List: To add a default system SQL template:
- Click the Add button to add the default system SQL template(s) in the SQL Template List.
- Click in the SQL template field and then click the arrow to display the system SQL template list.
- Select the desired system SQL template provided by Talend.
Note: You can create your own SQL templates and add them to the SQL Template List.
To create a user-defined SQL template:
- Select a system template from the SQL Template list and click its code in the code box. You will be prompted to create a new template.
- Click Yes to open the SQL template wizard.
- Define your new SQL template in the corresponding fields and click Finish to close the wizard. An SQL template editor opens where you can enter the template code.
- Click the Add button to add the newly created template to the SQL Template list.
For more information, see How to use the SQL Templates of Talend Open Studio User Guide.

Usage: This component is to be used with ELT components, especially with tSQLTemplateCommit and the relevant database connection component.
Limitation: n/a


Related scenarios
For a tSQLTemplateRollback related scenario, see Scenario: Filtering and aggregating table columns directly on the DBMS on page 949.


ESB components
This chapter details the main components that you can find in the ESB family of the Talend Open Studio Palette.


tESBConsumer
tESBConsumer Properties
Component family: ESB
Function: Calls the defined method from the invoked Web service, and returns the class as defined, based on the given parameters.
Purpose: Invokes a Method through a Web service.

Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the Repository file where properties are stored. The fields that come after are pre-filled in using fetched data.
Service Configuration: Description of Web service bindings and configuration.
Mapping display link as:
Auto: By default, the links between the input and output schemas and the Web service parameters are in the form of curves.
Curves: Links between the schema and the Web service parameters are in the form of curves.
Lines: Links between the schema and the Web service parameters are in the form of straight lines. This option slightly optimizes performance.
Connection time out: Set a value in seconds for Web service connection time out.
Receive time out: Set a value in seconds for server answer.
Input schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Sync columns: This will automatically take the columns from the previous component.
Response schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Fault Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Use NTLM/Domain and Host: Select this check box if you want to use the NTLM authentication protocol. Domain: Name of the client domain. Host: Client IP address.
Need Authentication/User name and Password: Select this check box and enter a username and a password in the corresponding fields if this is necessary to access the service.
Use HTTP Proxy/Proxy host, Proxy port, Proxy user, and Proxy password: Select this check box if you are using a proxy server and fill in the necessary information.
Trust Server with SSL: Select this check box to validate the server certificate to the client via an SSL protocol and fill in the corresponding fields:
TrustStore file: enter the path (including filename) to the certificate TrustStore file that contains the list of certificates that the client trusts.
TrustStore password: enter the password used to check the integrity of the TrustStore data.
Die on error: Select this check box if you want the Job to die on error.

Advanced Setting
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component can be used as an intermediate component. It must be linked to an output component.
Limitation: A JDK is required for this component to operate.


Scenario: Return valid email


This Java scenario describes a Job that uses a tESBConsumer component to call a Web service that checks whether an email address is valid.
To create the Job:
Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a tESBConsumer, two tJavaFlex, and two tLogRow, and connect them as shown in the diagram.

Then set the properties of the components.
In the Job Design, double-click the tFixedFlowInput_1 component to display its Component view and set its Basic settings.
Set the Schema type to Built-in, click the three-dot button next to Edit Schema, and define the schema as follows:

Click the plus button to add a new line of string type and name it payloadString.


Click OK.

In the Number of rows field, set the number of rows to 1.
In the Mode area, select Use Single Table and enter the following request, within quotation marks, in the Value field:
    <web:IsValidEmail xmlns:web='http://www.webservicex.net'><web:Email>nomatter@gmail.com</web:Email></web:IsValidEmail>
In the Job Design, double-click the tJavaFlex_1 component to display its Component view and set its Basic settings.


Add the following code in the Main code field on the Basic properties tab:
    // Parse the incoming string into a dom4j document
    org.dom4j.Document doc = org.dom4j.DocumentHelper.parseText(row1.payloadString);
    // Wrap it in a Talend Document and pass it to the output row
    Document talendDoc = new Document();
    talendDoc.setDocument(doc);
    row2.payload = talendDoc;
Set the Schema type to Built-in, click the three-dot button next to Edit Schema, and define the schema as follows:

In the Job Design, double-click the tESBConsumer_1 component to display its Component view and set its Basic settings.

Click the three-dot button next to the Service Configuration to open the editor.


In the WSDL field, type in: http://www.webservicex.net/ValidateEmail.asmx?WSDL
Click Refresh to retrieve the port name and operation name.
In the Port Name list, select the port you want to use, ValidateEmailSoap in this example.
Click OK.
In the Basic settings of the tESBConsumer, set the Input Schema as follows:

Set the Response Schema as follows:

Set the Fault Schema as follows:


In the Job Design, double-click the tJavaFlex_2 component to display its Component view and set its Basic settings.

Add the following code in the Main code field on the Basic properties tab:
    // Convert the fault detail to a string only when a fault was returned
    if (null != row4.faultDetail) {
        row6.faultDetailString = row4.faultDetail.getDocument().asXML();
    }
Set the Schema type to Built-in, click the three-dot button next to Edit Schema, and define the schema as follows:

In the Job Design, double-click the tLogRow_1 component to display its Component view and set its Basic settings


Set the Schema type to Built-in, click the three-dot button next to Edit Schema, and define the schema as follows:

In the Job Design, double-click the tLogRow_2 component to display its Component view and set its Basic settings.

Set the Schema type to Built-in, click the three-dot button next to Edit Schema, and define the schema as follows:

Save your Job and press F6 to execute it.


In the execution log you will see:
    Starting job consumer4 at 15:02 21/04/2011.
    [statistics] connecting to socket on port 4057
    [statistics] connected
    ValidateEmail ValidateEmailSoap | {http://www.webservicex.net}ValidateEmail {http://www.webservicex.net}ValidateEmailSoap IsValidEmail
    [tLogRow_2] payload: <?xml version="1.0" encoding="UTF-8"?>
    <IsValidEmailResponse xmlns="http://www.webservicex.net"><IsValidEmailResult>false</IsValidEmailResult></IsValidEmailResponse>
    [statistics] disconnected
    Job consumer4 ended at 15:03 21/04/2011. [exit code=0]
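The response payload in this log carries a single IsValidEmailResult element. As a rough illustration of how such a payload could be read outside the Studio, the sketch below parses it with the JDK's built-in DOM parser rather than dom4j, so it runs without extra libraries. The class name ValidEmailResponseSketch and its helper methods are hypothetical; they are not part of Talend or of the Web service, and the Document type used here is org.w3c.dom.Document, not the Talend Document class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the boolean result from an IsValidEmailResponse payload.
class ValidEmailResponseSketch {

    static final String NS = "http://www.webservicex.net";

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Payload is not well-formed XML", e);
        }
    }

    // Returns the value of the first IsValidEmailResult element in the service namespace.
    static boolean isValid(String responseXml) {
        NodeList results = parse(responseXml).getElementsByTagNameNS(NS, "IsValidEmailResult");
        if (results.getLength() == 0) {
            throw new IllegalArgumentException("No IsValidEmailResult element found");
        }
        return Boolean.parseBoolean(results.item(0).getTextContent().trim());
    }

    public static void main(String[] args) {
        String response = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<IsValidEmailResponse xmlns=\"http://www.webservicex.net\">"
                + "<IsValidEmailResult>false</IsValidEmailResult>"
                + "</IsValidEmailResponse>";
        System.out.println("valid = " + isValid(response));
    }
}
```

Inside a Job, the same check would more naturally be written against the dom4j document returned by the payload column, as in the tJavaFlex code of this scenario.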


tESBProviderFault
tESBProviderFault properties
Component family: ESB
Function: Generates a Fault message at the end of a Talend Job cycle from the Web service.
Purpose: Returns a Fault message of the Web Service response at the end of a Talend Job cycle.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Sync columns: This will automatically take the columns from the previous component.
ESB service settings
Fault title: Value of the faultString in the Fault message.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component should only be used along with the tESBProviderRequest component.
Limitation: A JDK is required for this component to operate.

Scenario: Return Fault message


This Java scenario describes how to use a tESBProviderFault component to return a Fault message. You need to create two Jobs for this scenario: a Provider Job and a Consumer Job.

Setting up a Provider Job


This section shows how to set up a Provider Job.


Drop the following components from the Palette onto the design workspace: a tESBProviderRequest, three tLogRow, two tJavaFlex, and a tESBProviderFault.
Connect the components as shown in the diagram.
Set properties for the tESBProviderRequest_1 component.

Edit the schema of the tESBProviderRequest_1 component.

Set properties for the tLogRow_1 component.


Edit the Schema of the tLogRow_1 component.

Set properties for the tJavaFlex_1 component. Add the following code in the Main code field on the Basic properties tab:
    // Unwrap the dom4j document from the incoming Talend Document
    Document requestTalendDoc = row2.payload;
    org.dom4j.Document requestDoc = requestTalendDoc.getDocument();
    // Read the text of the root element and pass it on
    org.dom4j.Element rootElement = requestDoc.getRootElement();
    String text = rootElement.getTextTrim();
    System.out.println("### " + text);
    row3.content = text;


Edit the schema of the tJavaFlex_1 component.

Set properties for the tLogRow_2 component

Edit the schema for the tLogRow_2 component.


Set properties for the tJavaFlex_2 component. Add the following code in the Main code field on the Basic properties tab:
    String fMessage = row4.content;
    String faultText = "Fault message text: " + fMessage + "!";
    // Build a document whose root is a namespaced <response> element holding the fault text
    org.dom4j.Document faultDoc = org.dom4j.DocumentHelper.createDocument();
    faultDoc.addElement(new org.dom4j.QName("response",
        org.dom4j.Namespace.get("http://talend.org/esb/service/job"))
    ).addText(faultText);
    Document faultTalendDoc = new Document();
    faultTalendDoc.setDocument(faultDoc);
    row5.payload = faultTalendDoc;

Edit the schema of the tJavaFlex_2 component.

Set properties for the tLogRow_3 component.


Edit the schema of the tLogRow_3 component.

Set Properties for the tESBProviderFault_1 component.

Edit the schema of the tESBProviderFault_1 component.

The Job can be run without errors.

Setting up a Consumer Job


This section shows how to set up a Consumer Job as shown below.


Drop the following components from the Palette onto the design workspace: a tESBConsumer, two tLogRow, a tJavaFlex, and a tFixedFlowInput.
Connect the components as shown in the diagram.
Edit the schema of the tFixedFlowInput_1 component.

Set properties for the tFixedFlowInput_1 component.


Set properties for the tJavaFlex_1 component. Add the following code in the Main code field on the Basic properties tab:
    org.dom4j.Document doc = org.dom4j.DocumentHelper.parseText(row1.payloadString);
    Document talendDoc = new Document();
    talendDoc.setDocument(doc);
    row2.payload = talendDoc;

Edit the schema of the tJavaFlex_1 component.

Start the Provider Job. In the execution log you can see:
    ...
    web service [endpoint: http://127.0.0.1:8088/esb/provider] published
    ...
Open Service Configuration and enter the path to the WSDL: http://127.0.0.1:8088/esb/provider?WSDL


Click the Refresh icon to retrieve the port name and operation name.


Edit the input schema, response schema, and fault schema of the tESBConsumer_1 component.


Stop the Provider Job. Set properties for the tLogRow_1 component.

Edit schema for the tLogRow_1 component.

Set the properties for the tLogRow_2 component.

Edit the Schema for the tLogRow_2 component.



The Job can be run without errors.

Run the Scenario


Run the scenario to view the Fault message.
Run the Provider Job. In the execution log you will see:
    ...
    2011-04-19 15:38:33.486:INFO::jetty-7.2.2.v20101205
    2011-04-19 15:38:33.721:INFO::Started SelectChannelConnector@127.0.0.1:8088
    web service [endpoint: http://127.0.0.1:8088/esb/provider] published
Run the Consumer Job. In the execution log of the Job you will see:
    Starting job consumer at 15:39 19/04/2011.
    [statistics] connecting to socket on port 3850
    [statistics] connected
    LOCAL_provider LOCAL_providerSoapBinding | {http://talend.org/esb/service/job}LOCAL_provider {http://talend.org/esb/service/job}LOCAL_providerSoapBinding invoke
    [tLogRow_1] faultString: TestFaultTitle [tESBProviderFault_1] faultDetail: <?xml version="1.0" encoding="UTF-8"?>
    <response xmlns="http://talend.org/esb/service/job">Fault message text: Test error!</response>
    [statistics] disconnected
    Job consumer ended at 15:39 19/04/2011. [exit code=0]
In the Provider's log you will see the exception trace:
    ...
    WARNING: Application {http://talend.org/esb/service/job}LOCAL_provider#{http://talend.org/esb/service/job}invoke has thrown exception, unwinding now
    org.apache.cxf.binding.soap.SoapFault: TestFaultTitle [tESBProviderFault_1]
    ...
This is expected because the Fault message is generated.
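The faultDetail shown in the Consumer log is the document assembled by the tJavaFlex_2 code of the Provider Job: a single response element in the http://talend.org/esb/service/job namespace that wraps the fault text. The sketch below rebuilds the same shape with the JDK's DOM API instead of dom4j so that it can be run on its own. The class name FaultPayloadSketch and the method buildFault are hypothetical, and the input "Test error" is assumed from the logged fault text.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-alone equivalent of the fault document built in tJavaFlex_2.
class FaultPayloadSketch {

    static final String NS = "http://talend.org/esb/service/job";

    // Builds <response xmlns="...">Fault message text: MESSAGE!</response> as a DOM document.
    static Document buildFault(String message) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element response = doc.createElementNS(NS, "response");
            response.setTextContent("Fault message text: " + message + "!");
            doc.appendChild(response);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XML document", e);
        }
    }

    public static void main(String[] args) {
        Document fault = buildFault("Test error");
        Element root = fault.getDocumentElement();
        System.out.println("{" + root.getNamespaceURI() + "}" + root.getLocalName());
        System.out.println(root.getTextContent());
    }
}
```

Keeping the element in the service namespace matters because the Consumer's fault schema receives the detail as an XML document, and the namespace is part of what identifies the response element.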


tESBProviderRequest
The tESBProviderRequest component should be used with the tESBProviderResponse component, to provide a Job result as response, in case of a request-response communication style.

tESBProviderRequest properties
Component family: ESB
Function: Wraps a Talend Job as a web service.
Purpose: Waits for a request message from a consumer and passes it to the next component.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.
ESB Service Settings
Endpoint: Specifies the URL location where the web service will be accessible for requests.
Keep listening: Select this check box when you want to ensure that the provider (and therefore the Talend Job) will continue listening for requests after processing the first incoming request.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component covers the possibility that a Talend Job can be wrapped as a service, with the ability to input a request to a service into a Job, and return the Job result as a service response.
Limitation: A JDK is required for this component to operate.

Scenario: Service sends a message without expecting a response


This project consists of two Jobs: a Consumer Job and a Provider Job.

Setting up a Provider Job


This section shows how to set up a Provider Job as shown below.


Drop the following components from the Palette onto the design workspace: a tESBProviderRequest, three tLogRow, and two tJavaFlex.
Connect the components as shown in the diagram.
Set properties for the tESBProviderRequest_1 component.

Edit the schema of the tESBProviderRequest_1 component.

Set properties for the tLogRow_1 component.


Edit the Schema of the tLogRow_1 component.

Set properties for the tJavaFlex_1 component. Add the following code in the Main code field on the Basic properties tab:
    // Unwrap the dom4j document from the incoming Talend Document
    Document requestTalendDoc = row2.payload;
    org.dom4j.Document requestDoc = requestTalendDoc.getDocument();
    // Read the text of the root element and pass it on
    org.dom4j.Element rootElement = requestDoc.getRootElement();
    String text = rootElement.getTextTrim();
    System.out.println("### " + text);
    row3.content = text;


Edit the schema of the tJavaFlex_1 component.

Set properties for the tLogRow_2 component

Edit the schema for the tLogRow_2 component.


Set properties for the tJavaFlex_2 component. Add the following code in the Main code field on the Basic properties tab:
    String name = row5.content;
    String faultText = "Hello, " + name + "!";
    // Build a document whose root is a namespaced <response> element holding the text
    org.dom4j.Document faultDoc = org.dom4j.DocumentHelper.createDocument();
    faultDoc.addElement(
        new org.dom4j.QName("response",
            org.dom4j.Namespace.get("http://talend.org/esb/service/job"))
    ).addText(faultText);
    Document faultTalendDoc = new Document();
    faultTalendDoc.setDocument(faultDoc);
    row6.payload = faultTalendDoc;

Edit the schema of the tJavaFlex_2 component.

Set properties for the tLogRow_3 component.


Edit the schema of the tLogRow_3 component.

Save the Job.

Setting up a Consumer Job


This section shows how to set up a Consumer Job as shown below.

Drop the following components from the Palette onto the design workspace: a tESBConsumer, three tLogRow, two tJavaFlex, and a tFixedFlowInput.
Connect the components as shown in the diagram.
Edit the schema of the tFixedFlowInput_1 component.


Set properties for the tFixedFlowInput_1 component.

Set properties for the tLogRow_2 component.

Edit the schema of the tLogRow_2 component.


Set properties for the tJavaFlex_1 component. Add the following code in the Main code field on the Basic properties tab:
    org.dom4j.Document doc = org.dom4j.DocumentHelper.parseText(row5.payloadString);
    Document talendDoc = new Document();
    talendDoc.setDocument(doc);
    row4.payload = talendDoc;

Edit the schema of the tJavaFlex_1 component.

Start the Provider Job. In the execution log you can see:
    ...
    web service [endpoint: http://127.0.0.1:8088/esb/provider] published
    ...
Open the tESBConsumer_1 component.


Open Service Configuration and enter the path to the WSDL: http://127.0.0.1:8088/esb/provider?WSDL


Press the icon to retrieve the port name and operation name.

Edit the input schema, response schema, and fault schema of the tESBConsumer_1 component.


Set properties for the tJavaFlex_2 component. Add the following code in the Main code field at the Basic Properties tab.
if (null != row6.faultDetail) {
    row7.payload = row6.faultDetail;
}

Edit schema for the tJavaFlex_2 component.

Set properties for the tLogRow_3 component.

Edit schema for the tLogRow_3 component.


Set the properties for the tLogRow_1 component.

Edit the Schema for the tLogRow_1 component.

Save the Job.

Run the Scenario


Run the Provider Job. In the execution log you will see:
INFO: Setting the server's publish address to be http://127.0.0.1:8088/esb/provider
2011-04-21 14:14:36.793:INFO::jetty-7.2.2.v20101205
2011-04-21 14:14:37.856:INFO::Started SelectChannelConnector@127.0.0.1:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published


Run the Consumer Job. In the execution log of the Job you will see:
Starting job CallProvider at 14:15 21/04/2011.
[statistics] connecting to socket on port 3942
[statistics] connected
TEST_ESBProvider2 TEST_ESBProvider2SoapBingding |
[tLogRow_2] payloadString: <request>world</request>
{http://talend.org/esb/service/job}TEST_ESBProvider2 {http://talend.org/esb/service/job}TEST_ESBProvider2SoapBinding invoke
[tLogRow_1] payload: null
[statistics] disconnected
Job CallProvider2 ended at 14:16 21/04/2011. [exit code=0]

In the Provider's log you will see the trace log:
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>
<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ESBProvider2 ended at 14:16 21/04/2011. [exit code=0]


tESBProviderResponse
The tESBProviderResponse component should only be used with the tESBProviderRequest component. It returns a Job result as the response of a web service provider when the communication style is request-response.

tESBProviderResponse properties
Component family: ESB

Function: Serves a Talend Job cycle result as a response message and a Talend Job cycle result as a Fault message of the Web service.

Purpose: Acts as a service provider response builder at the end of each Talend Job cycle.

Basic settings
Schema and Edit schema: A schema is a row description, i.e., it defines the nature and number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component works as a helper for the tESBProviderRequest component.

Limitation: A JDK is required for this component to operate.

Scenario: Return Hello world response


This project consists of two Jobs: a Consumer Job and a Provider Job.

Setting up a Provider Job


This section shows how to set up a Provider Job as shown below.


Drop the following components from the Palette onto the design workspace: a tESBProviderRequest, three tLogRow, two tJavaFlex, and a tESBProviderResponse. Connect the components as shown in the diagram. Set properties for the tESBProviderRequest_1 component.

Edit the schema of the tESBProviderRequest_1 component.

Set properties for the tLogRow_1 component.


Edit the Schema of the tLogRow_1 component.

Set properties for the tJavaFlex_1 component. Add the following code in the Main code field at the Basic properties tab as shown below.
Document requestTalendDoc = row2.payload;
org.dom4j.Document requestDoc = requestTalendDoc.getDocument();
org.dom4j.Element rootElement = requestDoc.getRootElement();
String text = rootElement.getTextTrim();
System.out.println("### " + text);
row3.content = text;


Edit the schema of the tJavaFlex_1 component.

Set properties for the tLogRow_2 component.

Edit the schema for the tLogRow_2 component.


Set properties for the tJavaFlex_2 component. Add the following code in the Main code field at the Basic properties tab as shown below.
String name = row5.content;
String faultText = "Hello, " + name + "!";
org.dom4j.Document faultDoc = org.dom4j.DocumentHelper.createDocument();
faultDoc.addElement(new org.dom4j.QName("response", org.dom4j.Namespace.get("http://talend.org/esb/service/job"))).addText(faultText);
Document faultTalendDoc = new Document();
faultTalendDoc.setDocument(faultDoc);
row6.payload = faultTalendDoc;

Edit the schema of the tJavaFlex_2 component.

Set properties for the tLogRow_3 component.


Edit the schema of the tLogRow_3 component.

Set Properties for the tESBProviderResponse_1 component.

Edit the schema of the tESBProviderResponse_1 component.

Save the Job.
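The two tJavaFlex snippets of the Provider Job can be tried outside the Studio with a small standalone sketch. It is only an approximation: it swaps dom4j and the Talend Document wrapper for the XML API bundled with the JDK, and the class and method names (ProviderSketch, extractName, buildResponse) are hypothetical. The namespace is the one shown in the scenario logs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the Provider Job logic; uses the JDK DOM API, not dom4j.
public class ProviderSketch {
    // Namespace taken from the response shown in the scenario logs.
    static final String NS = "http://talend.org/esb/service/job";

    // Mirrors tJavaFlex_1: read the trimmed text of the request root element.
    static String extractName(String requestXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document request = builder.parse(new InputSource(new StringReader(requestXml)));
        return request.getDocumentElement().getTextContent().trim();
    }

    // Mirrors tJavaFlex_2: wrap the greeting in a namespaced <response> element.
    static Document buildResponse(String name) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document response = factory.newDocumentBuilder().newDocument();
        Element root = response.createElementNS(NS, "response");
        root.setTextContent("Hello, " + name + "!");
        response.appendChild(root);
        return response;
    }

    public static void main(String[] args) throws Exception {
        String name = extractName("<request>world</request>");
        Element root = buildResponse(name).getDocumentElement();
        System.out.println(root.getLocalName() + ": " + root.getTextContent());
    }
}
```

Run as is, the sketch prints response: Hello, world!, which matches the greeting carried by the response payload in the scenario.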

Setting up a Consumer Job


This section shows how to set up a Consumer Job as shown below.


Drop the following components from the Palette onto the design workspace: a tESBConsumer, three tLogRow, a tJavaFlex, and a tFixedFlowInput. Connect the components as shown in the diagram. Edit the schema of the tFixedFlowInput_1 component.

Set properties for the tFixedFlowInput_1 component.


Set properties for the tJavaFlex_1 component. Add the following code in the Main code field at the Basic Properties tab.
org.dom4j.Document doc = org.dom4j.DocumentHelper.parseText(row1.payloadString);
Document talendDoc = new Document();
talendDoc.setDocument(doc);
row2.payload = talendDoc;

Edit the schema of the tJavaFlex_1 component.

Start the Provider Job. In the execution log you can see:
...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...
Open the tESBConsumer_1 component.


Open Service Configuration and enter the path to the WSDL: http://127.0.0.1:8088/esb/provider?WSDL


Press the icon to retrieve the port name and operation name.

Edit the input schema, response schema, and fault schema of the tESBConsumer_1 component.


Set properties for the tJavaFlex_3 component. Add the following code in the Main code field at the Basic Properties tab.
if (null != row4.faultDetail) {
    row6.faultDetailString = row4.faultDetail.getDocument().asXML();
}

Edit schema for the tJavaFlex_3 component.

Set properties for the tLogRow_2 component.


Edit schema for the tLogRow_2 component.

Set the properties for the tLogRow_3 component.

Edit the Schema for the tLogRow_3 component.

Save the Job.
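The Consumer Job code does two things: it parses the request string into an XML document, and it turns a fault detail into a string only when a fault is present. The sketch below illustrates both steps with the JDK XML API instead of dom4j and the Talend Document class. The class name, method names and the <fault>demo</fault> sample are hypothetical and used for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the Consumer Job logic; uses the JDK DOM API, not dom4j.
public class ConsumerSketch {
    // Mirrors tJavaFlex_1: turn the payload string into an XML document.
    static Document parsePayload(String payloadString) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(payloadString)));
    }

    // Mirrors tJavaFlex_3: serialize the fault detail only when one is present.
    static String faultDetailToString(Document faultDetail) throws Exception {
        if (faultDetail == null) {
            return null; // no fault was returned, so there is nothing to log
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(faultDetail), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document request = parsePayload("<request>world</request>");
        System.out.println(request.getDocumentElement().getTagName());
        System.out.println(faultDetailToString(null));
        // Illustrative fault document, not taken from the scenario.
        System.out.println(faultDetailToString(parsePayload("<fault>demo</fault>")));
    }
}
```

The null check matters because the fault column is empty whenever the call succeeds, which is the case in the Hello world scenario.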


Run the Scenario


Run the Provider Job. In the execution log you will see:
2011-04-21 15:28:26.874:INFO::jetty-7.2.2.v20101205
2011-04-21 15:28:27.108:INFO::Started
web service [endpoint: http://127.0.0.1:8088/esb/provider] published

Run the Consumer Job. In the execution log of the Job you will see:
Starting job ConsumerJob at 15:29 21/04/2011.
[statistics] connecting to socket on port 3690
[statistics] connected
TEST_ProviderJob LOCAL_ProviderJobSoapBinding |
{http://talend.org/esb/service/job}TEST_ProviderJob {http://talend.org/esb/service/job}TEST_ProviderJobSoapBinding invoke
[tLogRow_2] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
[statistics] disconnected
Job ConsumerJob ended at 15:29 21/04/2011. [exit code=0]

In the Provider's log you will see the trace log:
[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>
<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ProviderJob ended at 15:29 21/04/2011. [exit code=0]


File components
This chapter details the main components that you can find in the File family of the Talend Open Studio Palette. The File family groups together components that read and write data in all types of files, from the most popular to the most specific formats (in the Input and Output subfamilies). In addition, the Management subfamily groups together File-dedicated components that perform various tasks on files, including unarchiving, deleting, copying, comparing files and so on.


tAdvancedFileOutputXML
tAdvancedFileOutputXML belongs to two component families: File and XML. For more information on tAdvancedFileOutputXML, see tAdvancedFileOutputXML on page 1568.


tApacheLogInput
tApacheLogInput properties
Component family: File/Input

Function: tApacheLogInput reads the access-log file for an Apache HTTP server.

Purpose: tApacheLogInput helps to effectively manage the Apache HTTP Server. It is necessary to get feedback about the activity and performance of the server as well as any problems that may be occurring.

Basic settings
Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the context of tApacheLogInput usage, the schema is read-only.
Built-in: You can create the schema and store it locally for this component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
File Name: Name of the file and/or the variable to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.

Advanced settings
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: tApacheLogInput can be used with other components or as a standalone component. It allows you to create a data flow using a Row > Main connection, or to create a reject flow to filter specified data using a Row > Reject connection. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.


Limitation: n/a

Scenario: Reading an Apache access-log file


The following scenario creates a two-component Job, which aims at reading the access-log file for an Apache HTTP server and displaying the output in the Run log console. Drop a tApacheLogInput component and a tLogRow component from the Palette onto the design workspace. Right-click on the tApacheLogInput component and connect it to the tLogRow component using a Main Row link.

In the design workspace, select tApacheLogInput. Click the Component tab to define the basic settings for tApacheLogInput.

Set Property Type and Schema to Built-In. If desired, click the Edit schema button to see the read-only columns. In the File Name field, enter the file path or browse to the access-log file you want to read. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Press F6 to execute the Job.


The log lines of the defined file are displayed on the console.
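To give an idea of what reading an access log involves, here is a minimal sketch that splits one log line into fields. It assumes the Apache Common Log Format (host, identity, user, timestamp, request, status, size). The columns of the component's read-only schema are not listed in this guide, so the field order, the class name AccessLogSketch and the sample line are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parses one line in the Apache Common Log Format.
public class AccessLogSketch {
    // Common Log Format: host ident authuser [date] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // Returns the seven fields of one log line, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Illustrative sample line, not taken from the scenario.
        String sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(String.join("|", parse(sample)));
    }
}
```

A line that does not match returns null, which is roughly the kind of row a Row > Reject link would collect when Die on error is cleared.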


tCreateTemporaryFile
tCreateTemporaryFile properties
Component family: File/Management

Function: tCreateTemporaryFile creates and manages temporary files.

Purpose: tCreateTemporaryFile helps to create a temporary file and puts it in a defined directory. This component allows you to either keep the temporary file or delete it after job execution.

Basic settings
Remove file when execution is over: Select this check box to delete the temporary file after job execution.
Use default temporary system directory: Select this check box to create the file in the system's default temporary directory.
Directory: Enter or browse to the directory where the temporary file will be created.
Template: Enter a name for the temporary file respecting the template.
Suffix: Enter the filename extension to indicate the file format you want to give to the temporary file.

Usage: tCreateTemporaryFile provides the possibility to manage temporary files so that the memory can be freed for other ends, which optimizes system performance.

Global Variables
Filepath: Retrieves the path to where the file was created. This is available as an After variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another): Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a


Scenario: Creating a temporary file and writing data in it


The following scenario describes a simple Job that creates an empty temporary file in a defined directory, writes data in it and deletes it after job execution. Drop the following components from the Palette onto the design workspace: tCreateTemporaryFile, tRowGenerator, tFileOutputDelimited, tFileInputDelimited and tLogRow. Connect tCreateTemporaryFile to tRowGenerator using a SubjobOk link. Connect tRowGenerator to tFileOutputDelimited using a Row Main link. Connect tRowGenerator to tFileInputDelimited using a SubjobOk link. Connect tFileInputDelimited to tLogRow using a Row Main link.

In the design workspace, select tCreateTemporaryFile. Click the Component tab to define the basic settings for tCreateTemporaryFile.

Select the Remove file when execution is over check box to delete the created temporary file when job execution is over. Click the three-dot button next to the Directory field to browse to the directory where temporary files will be stored, or enter the path manually.


In the Template field, enter a name for the temporary file respecting the template format. In the Suffix field, enter a filename extension to indicate the file format you want to give to the temporary file. In the design workspace, select tRowGenerator and click the Component tab to define its basic settings.

Set the Schema to Built-In. Click the Edit schema three-dot button to define the data to pass on to the tFileOutputDelimited component, one column in this scenario, value.

Click OK to close the dialog box. Click the RowGenerator Editor three-dot button to open the editor dialog box.

In the Number of Rows for RowGenerator field, enter 5 to generate five rows and click OK to close the dialog box. In the design workspace, select tFileOutputDelimited and click the Component tab to define its basic settings.


Set Property Type to Built-In. Click in the File Name field and use the Ctrl+Space bar combination to access the variable completion list. To output data in the created temporary file, select tCreateTemporaryFile_1.FILEPATH on the global variable list. Set the row and field separators in their corresponding fields as needed. Set Schema to Built-In and click Sync columns to synchronize input and output columns. Note that the row connection feeds automatically the output schema. For more information about schema types, see How to set a built-in schema and How to set a repository schema of Talend Open Studio User Guide. In the design workspace, select the tFileInputDelimited component. Click the Component tab to define the basic settings of tFileInputDelimited.

Set Property Type to Built-In. Click in the File Name field and use the Ctrl+Space bar combination to access the variable completion list. To read data in the created temporary file, select tCreateTemporaryFile_1.FILEPATH on the global variable list. Set the row and field separators in their corresponding fields as needed. Set Schema to Built-In and click Edit schema to define the data to pass on to the tLogRow component. The schema consists of one column here, value. Save the Job and press F6 to execute the Job.


The temporary file is created in the defined directory during job execution and the five generated rows are written in it. The temporary file is deleted when job execution is over.
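The same create, write, read and delete cycle can be sketched in plain Java with java.nio.file. This is not how the component is implemented; it is a rough equivalent under stated assumptions: the class name TempFileSketch, the talend_ prefix, the .csv suffix and the generated row values are hypothetical, and the file is deleted right after reading rather than at the end of a Job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the scenario: create, write, read back, delete.
public class TempFileSketch {
    // Creates a temporary file, writes rowCount rows, reads them back, then deletes the file.
    static List<String> roundTrip(Path directory, String prefix, String suffix, int rowCount)
            throws IOException {
        Path temp = Files.createTempFile(directory, prefix, suffix);
        try {
            List<String> rows = new ArrayList<>();
            for (int i = 1; i <= rowCount; i++) {
                rows.add("value" + i); // stand-in for the tRowGenerator output
            }
            Files.write(temp, rows, StandardCharsets.UTF_8);
            return Files.readAllLines(temp, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(temp); // same effect as "Remove file when execution is over"
        }
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("talend_demo");
        roundTrip(directory, "talend_", ".csv", 5).forEach(System.out::println);
        Files.delete(directory); // succeeds only because the temporary file is already gone
    }
}
```

The path returned by Files.createTempFile plays the role of the tCreateTemporaryFile_1.FILEPATH variable that the output and input components share in the Job.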


tChangeFileEncoding

tChangeFileEncoding Properties
Component family: File/Management

Function: tChangeFileEncoding changes the encoding of a given file.

Purpose: tChangeFileEncoding transforms the character encoding of a given file and generates a new file with the transformed character encoding.

Basic settings
Use Custom Input Encoding: Select this check box to customize the input encoding type. When it is selected, a list of input encoding types appears, allowing you to select an input encoding type or specify an input encoding type by selecting CUSTOM.
Encoding: From this list of character encoding types, you can select one of the offered options or customize the character encoding by selecting CUSTOM and specifying a character encoding type.
Input File Name: Path of the input file.
Output File Name: Path of the output file.

Usage: This component can be used as a standalone component.

Limitation: n/a

Scenario: Transforming the character encoding of a file


This Java scenario describes a very simple Job that transforms the character encoding of a text file and generates a new file with the new character encoding. Drop a tChangeFileEncoding component onto the design workspace.

Double-click the tChangeFileEncoding component to display its Basic settings view.


Select the Use Custom Input Encoding check box. Set the Encoding type to GB2312. In the Input File Name field, enter the file path or browse to the input file. In the Output File Name field, enter the file path or browse to the output file. Select CUSTOM from the second Encoding list and enter UTF-16 in the text field. Press F6 to execute the Job.

The encoding type of the file in.txt is transformed and out.txt is generated with the UTF-16 encoding type.
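Changing a file's encoding amounts to decoding its bytes with the input charset and encoding the resulting text with the output charset. The sketch below shows that idea with standard Java classes. It is an illustration, not the component's implementation: the class name ChangeEncodingSketch is hypothetical, and the demo falls back to ISO-8859-1 as the input charset if the running JDK does not provide GB2312.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: decode with one charset, encode with another.
public class ChangeEncodingSketch {
    // Decodes the input file with one charset and writes a new file with another.
    static void transcode(Path input, Charset from, Path output, Charset to) throws IOException {
        String text = new String(Files.readAllBytes(input), from);
        Files.write(output, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        // GB2312 is the input encoding of the scenario; fall back if this JDK lacks it.
        Charset source = Charset.isSupported("GB2312")
                ? Charset.forName("GB2312") : StandardCharsets.ISO_8859_1;
        Path in = Files.createTempFile("in", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        Files.write(in, "hello".getBytes(source));
        transcode(in, source, out, StandardCharsets.UTF_16);
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_16));
        Files.delete(in);
        Files.delete(out);
    }
}
```

The sketch reads the whole file into memory, which is fine for small text files; a streaming reader and writer would suit large files better.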

tFileArchive

tFileArchive properties
Component family: File/Management

Function: The tFileArchive zips one or several files according to the parameters defined and places the archive created in the directory selected.

Purpose: This component zips one or several files for processing.

Basic settings
Directory: Path to the directory that contains the files to be zipped.
Subdirectories (in Perl, this field is Include subdirectories): Select this check box if the selected directory contains subfolders.
Archive file: Destination path and name of the archive file.
Compress level: Select the compression level you want to apply.
Best: the compression quality will be optimum, but the compression time will be long.
Normal: compression quality and time will be average.
Fast: compression will be fast, but quality will be lower.
All files: Select this check box if you want all files in the directory to be zipped. Clear it to specify the file(s) you want to zip in the Files table.
Filemask: type in a file name or a file mask using a special character or a regular expression.
Create directory if not exists (Java only): This check box is selected by default. It creates the destination folder if it does not already exist.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Overwrite Existing Archive (Java only): This check box is selected by default. This allows you to save an archive by replacing the existing one. If you clear the check box, an error is reported, the replacement fails and the new archive cannot be saved. When the replacement fails, the Job still runs.
Encrypt files (Java only): Select this check box if you want your archive to be password protected. The Enter Password text box appears to let you enter your password.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.


Usage: This component must be used as a standalone component.

Global Variables
Archive File Path: Retrieves the path to the archive file. This is available as an After variable. Returns a string.
Archive File Name: Retrieves the name of the archive file. This is available as an After variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another): Row: Main; Reject; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Zip files using a tFileArchive


This scenario creates a Job with a unique component. It aims at zipping files and recording them in the selected directory. Drop the tFileArchive component from the Palette onto the workspace. Double-click it to display its Component view.

In the Directory field, click the [...] button, browse your directory and select the directory or the file you want to compress.


Select the Subdirectories check box if you want to include the subfolders and their files in the archive. Then, set the Archive file field, by filling the destination path and the name of your archive file. Select the Create directory if not exists check box if you do not have a destination directory yet and you want to create it. In the Compress level list, select the compression level you want to apply to your archive. In this example, we use the normal level. Clear the All Files check box if you only want to zip specific files.

Add a row in the table by clicking the [+] button and click the name which appears. Between two star symbols (for example, *RG*), type part of the name of the file that you want to compress. Press F6 to execute your Job. The tFileArchive has compressed the selected file(s) and created the folder in the selected directory.
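For readers who want to see the zipping logic spelled out, the following sketch archives the files of a directory that match a mask such as *RG*, using java.util.zip. It is a simplified approximation, not the component's code: the class name ArchiveSketch and the sample file names are hypothetical, subdirectories are skipped, and passing a java.util.zip.Deflater level for the Best, Normal and Fast options is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: zip the files of one directory that match a glob mask.
public class ArchiveSketch {
    // Zips matching regular files of sourceDir into archive and returns how many were added.
    static int zipMatching(Path sourceDir, String mask, Path archive, int level)
            throws IOException {
        if (archive.getParent() != null) {
            Files.createDirectories(archive.getParent()); // "Create directory if not exists"
        }
        int count = 0;
        try (OutputStream raw = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(raw);
             DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir, mask)) {
            zip.setLevel(level);
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // subdirectories are skipped in this sketch
                }
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("archive_demo");
        Files.write(dir.resolve("TalendRG.txt"), "guide".getBytes());
        Files.write(dir.resolve("notes.txt"), "notes".getBytes());
        Path archive = dir.resolve("out").resolve("demo.zip");
        System.out.println(zipMatching(dir, "*RG*", archive, Deflater.DEFAULT_COMPRESSION));
    }
}
```

With the two sample files above, only TalendRG.txt matches the *RG* mask, so the sketch prints 1. Deflater.BEST_COMPRESSION and Deflater.BEST_SPEED can be passed instead of the default level.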


tFileCompare
tFileCompare properties
Component family: File/Management

Function: Compares two files and provides comparison data (based on a read-only schema).

Purpose: Helps to control the data quality of files being processed.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository but in this case, the schema is read-only.
File to compare: Filepath to the file to be checked.
Reference file: Filepath to the file on which the comparison is based.
If differences are detected, display / If no difference detected, display: Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console: Select this check box to display the message.

Advanced settings
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as a standalone component but it is usually linked to an output component to gather the log data.

Global Variables
Difference: Checks whether two files are identical or not. This is available as a Flow variable. Returns a boolean value:
- true if the two files are identical.
- false if there is a difference between them.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.


Connections
Outgoing links (from one component to another): Row: Main. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Comparing unzipped files


This scenario describes a Job that unarchives a file and compares it to a reference file to make sure it has not changed. The output of the comparison is stored in a delimited file and a message is displayed in the console.

Drag and drop the following components: tFileUnarchive, tFileCompare, and tFileOutputDelimited. Link the tFileUnarchive to the tFileCompare with an Iterate connection. Connect the tFileCompare to the output component, using a Main row link. In the tFileUnarchive component Basic settings, fill in the path to the archive to unzip. In the Extraction Directory field, fill in the destination folder for the unarchived file. In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space bar to display the list of global variables. Select $_globals{tFileUnarchive_1}{CURRENT_FILEPATH} if you work in Perl, or ((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH")) if you work in Java, to fetch the file path from the tFileUnarchive component.


Then set the Reference file on which to base the comparison. In the message fields, set the messages you want to see in case the files differ or in case the files are identical, for example: '[job '.$_globals{job_name}.'] Files differ' if you work in Perl or "[job " + jobName + "] Files differ" if you work in Java. Select the Print to Console check box for the defined message to be displayed at the end of the execution. The schema is read-only and contains standard information data. Click Edit schema to have a look at it.

Then set the output component as usual with semi-colon as data separators. Save your Job and press F6 to run it.

The message set is displayed to the console and the output shows the schema information data.
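The comparison itself can be pictured as a byte-by-byte equality check followed by the choice of a message. The sketch below is a minimal illustration in plain Java, not the component's implementation. The class name CompareSketch and the "Files are identical" text are hypothetical; only the "[job ...] Files differ" pattern comes from the scenario.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: compare a file with a reference file and pick a message.
public class CompareSketch {
    // Returns true when both files hold exactly the same bytes.
    static boolean identical(Path fileToCompare, Path referenceFile) throws IOException {
        return Arrays.equals(Files.readAllBytes(fileToCompare),
                             Files.readAllBytes(referenceFile));
    }

    // Picks the message to print, in the way the two display fields are used in the scenario.
    static String message(String jobName, boolean identical) {
        return "[job " + jobName + "] " + (identical ? "Files are identical" : "Files differ");
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.write(a, "same".getBytes());
        Files.write(b, "changed".getBytes());
        System.out.println(message("demo", identical(a, b)));
        Files.delete(a);
        Files.delete(b);
    }
}
```

Both files are loaded fully into memory here, which keeps the sketch short but is only reasonable for small files.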


tFileCopy
tFileCopy Properties
Component family: File/Management

Function: Copies a source file into a target directory and can remove the source file if required.

Purpose: Helps to streamline processes by automating recurrent and tedious tasks such as copy.

Basic settings
File Name: Path to the file to be copied or moved.
Destination: Path to the directory where the file is copied/moved to.
Remove source file: Select this check box to move the file to the destination.
Replace existing file: Select this check box to overwrite any existing file with the newly copied file.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as a standalone component.

Global Variables
Destination File Name: Retrieves the name of the destination file. This is available as an After variable. Returns a string.
Destination File Path: Retrieves the path to the destination file. This is available as an After variable. Returns a string.
Source Directory: Retrieves the path to the source directory. This is available as an After variable. Returns a string.
Destination Directory: Retrieves the path to the destination directory. This is available as an After variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.


Connections
Outgoing links (from one component to another): Row: Main. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Restoring files from bin


This scenario describes a Job that iterates on a list of files, copies each file from the defined source directory to a target directory. It then removes the copied files from the source directory.

Drop a tFileList and a tFileCopy from the Palette to the design workspace. Link both components using an Iterate link. In the tFileList Basic settings, set the directory for the iteration loop.

Set the Filemask to *.txt to catch all files with this extension. For this use case, the case is not sensitive. Then select the tFileCopy to set its Basic settings.

In the File Name field, press Ctrl+Space bar to access the list of variables.
Select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) if you work in Java, or $_globals{tFileList_1}{CURRENT_FILEPATH} if you work in Perl. This way, all files from the source directory can be processed.
Select the Remove Source file check box to remove the files that have been copied.
Select the Replace existing file check box to overwrite any file already present in the destination directory.
Save your Job and press F6.
The files are copied to the destination folder and removed from the source folder.
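For readers who want to see what this Job amounts to outside the Studio, the following is a minimal plain-Java sketch of the same behavior. It is not the code that Talend generates: the class name MoveTextFiles, its methods and the temporary directories are hypothetical, and the case-insensitive *.txt check only approximates the tFileList filemask.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; this is not code generated by Talend.
final class MoveTextFiles {

    // Case-insensitive stand-in for the *.txt filemask of tFileList.
    static boolean matchesMask(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    // Copies each matching file into targetDir, overwriting existing files
    // (Replace existing file), then deletes the original (Remove source file).
    static List<String> moveAll(Path sourceDir, Path targetDir) throws IOException {
        // Collect the matches first so the directory is not modified while listed.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && matchesMask(entry.getFileName().toString())) {
                    matches.add(entry);
                }
            }
        }
        Files.createDirectories(targetDir);
        List<String> moved = new ArrayList<>();
        for (Path source : matches) {
            Path target = targetDir.resolve(source.getFileName());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(source);
            moved.add(source.getFileName().toString());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories keep the sketch self-contained.
        Path sourceDir = Files.createTempDirectory("bin");
        Path targetDir = Files.createTempDirectory("restored");
        Files.write(sourceDir.resolve("notes.TXT"), "hello".getBytes());
        Files.write(sourceDir.resolve("image.png"), new byte[0]);
        System.out.println("Moved: " + moveAll(sourceDir, targetDir));
    }
}
```

The copy-then-delete order mirrors the two check boxes of the component: the source file is only removed once the copy has succeeded.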

tFileDelete
tFileDelete Properties
Component family: File/Management
Function: Deletes a file from a defined directory.
Purpose: Helps to streamline processes by automating recurrent and tedious tasks such as deleting files.

Basic settings
File Name: Path to the file to be deleted.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
This component can be used as a standalone component.

Global Variables
Delete path: Returns the path to the location from which the item was deleted. This is available as an After variable. Returns a string.
Current Status: Indicates whether an item has been deleted or not. This is available as a Flow variable. Returns a string and the delete command label.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation
n/a

Scenario: Deleting files


This very simple scenario describes a Job deleting files from a given directory.

Drop the following components from the Palette to the design workspace: tFileList, tFileDelete, and tJava.
In the tFileList Basic settings, set the directory to loop on in the Directory field.

Set the filemask to *.txt. No case check is carried out.
In the tFileDelete Basic settings panel, set the File Name field so that the file currently selected by the tFileList component is deleted. This deletes all files contained in the directory specified earlier.

Press Ctrl+Space bar to access the list of global variables. In Java, the relevant variable to collect the current file is: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
Then, in the tJava component, define the message to be displayed in the standard output (Run console). In this Java use case, type the following script in the Code field:
System.out.println( ((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " has been deleted!" );
Then save your Job and press F6 to run it.

The message set in the tJava component is displayed in the log for each file deleted by the tFileDelete component.
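As a rough illustration of what the three components do together, here is a plain-Java sketch of the same loop. It is an approximation under stated assumptions rather than Talend-generated code: the class DeleteTextFiles and its methods are hypothetical names, and the directory is a temporary one created for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; this is not code generated by Talend.
final class DeleteTextFiles {

    // Builds the message that the tJava component prints in the scenario.
    static String deletedMessage(String fileName) {
        return fileName + " has been deleted!";
    }

    // Deletes every *.txt file (no case check) in the directory and
    // prints one message per deleted file. Returns the number of deletions.
    static int deleteAll(Path directory) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (Files.isRegularFile(entry) && name.endsWith(".txt")) {
                    matches.add(entry);
                }
            }
        }
        for (Path file : matches) {
            Files.delete(file);
            System.out.println(deletedMessage(file.getFileName().toString()));
        }
        return matches.size();
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory keeps the sketch self-contained.
        Path directory = Files.createTempDirectory("to-clean");
        Files.write(directory.resolve("a.txt"), new byte[0]);
        Files.write(directory.resolve("b.txt"), new byte[0]);
        System.out.println(deleteAll(directory) + " file(s) deleted");
    }
}
```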

tFileExist
tFileExist Properties
Component family: File/Management
Function: tFileExist checks if a file exists or not.
Purpose: tFileExist helps to streamline processes by automating recurrent and tedious tasks such as checking if a file exists.

Basic settings
File Name: Path to the file whose existence you want to check.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
This component can be used as a standalone component.

Global Variables
Exists: Indicates whether a specified file exists or not. This is available as a Flow variable. Returns a boolean value:
- true if the file exists.
- false if the file does not exist.
File Name: Retrieves the name and path to a file. This is available as an After variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another):
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation
n/a

Scenario: Checking for the presence of a file and creating it if it does not exist
This scenario describes a simple Job that:
- checks if a given file exists,
- displays a graphical message to confirm that the file does not exist,
- reads the input data in another given file and writes it in an output delimited file.

Drop the following components from the Palette onto the design workspace: tFileExist, tFileInputDelimited, tFileOutputDelimited, and tMsgBox.
Connect tFileExist to tFileInputDelimited using an OnSubjobOk link and to tMsgBox using a Run If link.

Connect tFileInputDelimited to tFileOutputDelimited using a Row Main link.
In the design workspace, select tFileExist and click the Component tab to define its basic settings.

In the File name field, enter the file path or browse to the file whose existence you want to check.
In the design workspace, select tFileInputDelimited and click the Component tab to define its basic settings.

Browse to the input file you want to read to fill out the File Name field.
Set the row and field separators in their corresponding fields.
Set the header, footer and number of processed rows as needed. In this scenario, the table has one header row.
Set Schema to Built-in and click the Edit schema button to define the data to pass on to the tFileOutputDelimited component. Define the data present in the file to read, file2 in this scenario. For more information about schema types, see How to set a built-in schema and How to set a repository schema in the Talend Open Studio User Guide.

The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
In the design workspace, select the tFileOutputDelimited component.
Click the Component tab to define the basic settings of tFileOutputDelimited.

Set Property type to Built-in.
In the File name field, press Ctrl+Space to access the variable list and select the global variable FILENAME.
Set the row and field separators in their corresponding fields.
Select the Include Header check box, as file2 in this scenario includes a header.
Set Schema to Built-in and click Sync columns to synchronize the output file schema (file1) with the input file schema (file2).

In the design workspace, select the tMsgBox component. Click the Component tab to define the basic settings of tMsgBox.

Click the If link to display its properties in the Basic settings view.

In the Condition panel, press Ctrl+Space to access the variable list and select the global variable EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.

Save your Job and press F6 to execute it.

A dialog box appears to confirm that the file does not exist. Click OK to close the dialog box and continue the Job execution process. The missing file, file1 in this scenario, is written as a delimited file to the defined location.
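The Run If condition used in this scenario can be pictured with the following minimal Java sketch. It imitates the globalMap with an ordinary HashMap; the keys tFileExist_1_EXISTS and tFileExist_1_FILENAME are assumed here by analogy with the tFileList_1_CURRENT_FILEPATH naming shown elsewhere in this guide, and the class FileExistCheck is a hypothetical name, not Talend-generated code.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tFileExist component and its Run If link.
final class FileExistCheck {

    // Stores the two global variables described for tFileExist.
    // The key names are assumptions based on the <component>_<VARIABLE> pattern.
    static Map<String, Object> check(String fileName) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileExist_1_EXISTS", Files.exists(Paths.get(fileName)));
        globalMap.put("tFileExist_1_FILENAME", fileName);
        return globalMap;
    }

    // Mirrors the negated condition typed in the Condition panel:
    // the exclamation mark makes the link fire when the file is missing.
    static boolean runIfMissing(Map<String, Object> globalMap) {
        return !((Boolean) globalMap.get("tFileExist_1_EXISTS"));
    }

    public static void main(String[] args) {
        // "file1.csv" is a placeholder path for the file being checked.
        Map<String, Object> globalMap = check("file1.csv");
        if (runIfMissing(globalMap)) {
            System.out.println("File " + globalMap.get("tFileExist_1_FILENAME") + " does not exist");
        }
    }
}
```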

tFileInputARFF

tFileInputARFF properties
Component family: File/Input
Function: tFileInputARFF reads an ARFF file row by row, with simple separated fields.
Purpose: This component opens a file and reads it row by row, in order to divide it into fields and to send these fields to the next component, as defined in the schema, through a Row connection.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a connection wizard and store the file connection parameters you set in the component's Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema in the Talend Open Studio User Guide.
File Name: Name and path of the ARFF file and/or variable to be processed. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage

Use this component to read a file and separate the fields with the specified separator.

Scenario: Displaying the content of an ARFF file


This scenario describes a two-component Job in which the rows of an ARFF file are read, the delimited data is selected and the output is displayed in the Run view. An ARFF file looks like the following:

It is generally made of two parts. The first part describes the data structure, that is to say the rows that begin with @attribute. The second part comprises the raw data, which follows the @data expression.
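To make the two-part layout concrete, the following Java sketch splits a small ARFF sample into attribute names and data rows. It is only an illustration of the file structure, not the tFileInputARFF implementation: the class ArffSketch and the sample content are hypothetical, and quoted values or sparse data are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical minimal reader for the two parts of an ARFF file.
final class ArffSketch {
    final List<String> attributes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(List<String> lines) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Blank lines and % comment lines carry no data.
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData && lower.startsWith("@attribute")) {
                // First part: "@attribute <name> <type>" describes one column.
                String[] parts = line.split("\\s+", 3);
                if (parts.length >= 2) {
                    result.attributes.add(parts[1]);
                }
            } else if (lower.startsWith("@data")) {
                // Everything after @data is raw, comma-separated data.
                inData = true;
            } else if (inData) {
                result.rows.add(line.split("\\s*,\\s*"));
            }
            // Other header lines such as @relation are ignored in this sketch.
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample content invented for the example.
        List<String> sample = Arrays.asList(
                "@relation employees",
                "@attribute id numeric",
                "@attribute name string",
                "@data",
                "1,Alice",
                "2,Bob");
        ArffSketch parsed = parse(sample);
        System.out.println(String.join("|", parsed.attributes));
        for (String[] row : parsed.rows) {
            // Vertical bar used as the display separator, as in the scenario output.
            System.out.println(String.join("|", row));
        }
    }
}
```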

Drop the tFileInputARFF component from the Palette onto the workspace.
In the same way, drop the tLogRow component.
Right-click the tFileInputARFF and select Row > Main in the menu. Then, drag the link to the tLogRow, and click it. The link is created.
Double-click the tFileInputARFF.
In the Component view, in the File Name field, browse your directory in order to select your .arff file.
In the Schema field, select Built-In.

Click the [...] button next to Edit schema to add column descriptions corresponding to the file to be read. Click on the button as many times as required to create the number of columns required, according to the source file. Name the columns as follows.

For every column, the String check box is selected by default. Leave the check boxes selected, for all of the columns. Click OK. In the workspace, double-click the tLogRow to display its Component view. Click the [...] button of the Edit schema field to check that the schema has been propagated. If not, click the Sync columns button.

Use the default settings. Press F6 to execute your Job.

The console displays the data contained in the ARFF file, delimited using a vertical line (the default separator).

tFileInputDelimited
tFileInputDelimited properties
Component family: File/Input
Function: tFileInputDelimited reads a given file row by row with simple separated fields.
Purpose: Opens a file and reads it row by row to split the rows up into fields, then sends the fields as defined in the Schema to the next Job component, via a Row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
File Name/Stream:
- File name: Name and path of the file to be processed.
- Stream: The data flow to be processed. The data must be added to the flow in order to be fetched by the tFileInputDelimited via the INPUT_STREAM variable in the auto-completion list (Ctrl+Space).
Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
Row separator: String (ex: "\n" on Unix) to distinguish rows.
Field separator: Character, string or regular expression to separate fields.
CSV options: Select this check box to include CSV specific parameters such as Escape char and Text enclosure.
Header: Number of rows to be skipped at the beginning of the file.
Footer: Number of rows to be skipped at the end of the file.
Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.

- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Skip empty rows: Select this check box to skip empty rows.
Uncompress as zip file: Select this check box to uncompress the input file.
Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.

Advanced settings
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
- Thousands separator: define separators for thousands.
- Decimal separator: define separators for decimals.
Extract lines at random: Select this check box to set the number of lines to be extracted randomly.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Trim all column: Select this check box to remove leading and trailing whitespace from all columns.
Check each row structure against schema: Select this check box to synchronize every row against the input schema.
Check columns to trim: Select the check box next to the name of each column from which you want to remove leading and trailing whitespace.
Split row before field: Select this check box to split rows before splitting fields.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage
Use this component to read a file and separate the fields contained in this file using a defined separator. It allows you to create a data flow using a Row > Main link, or a Row > Reject link, in which case the flow carries the data that does not correspond to the defined type. For further information, please see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.
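The effect of the Advanced separator (for numbers) option described above can be sketched in a few lines of Java. This is an illustrative assumption about the behavior, not the component's actual code: the class NumberSeparators and its parse method are hypothetical names.

```java
// Hypothetical helper showing how custom thousands and decimal separators
// can be normalized before a number is parsed.
final class NumberSeparators {

    static double parse(String text, char thousandsSeparator, char decimalSeparator) {
        // Drop the thousands separators first, then switch the decimal
        // separator to the dot that Double.parseDouble expects.
        String normalized = text
                .replace(String.valueOf(thousandsSeparator), "")
                .replace(decimalSeparator, '.');
        return Double.parseDouble(normalized);
    }

    public static void main(String[] args) {
        // European-style input: '.' for thousands and ',' for decimals.
        System.out.println(parse("1.234,56", '.', ','));
        // US-style input: ',' for thousands and '.' for decimals.
        System.out.println(parse("1,234.56", ',', '.'));
    }
}
```

Removing the thousands separator before converting the decimal separator matters: doing it the other way round would also strip the newly inserted decimal point when the thousands separator is a dot.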

Scenario: Delimited file content display


The following scenario creates a two-component Job, which aims at reading each row of a file, selecting delimited data and displaying the output in the Run log console.

Drop a tFileInputDelimited component from the Palette to the design workspace.

Drop a tLogRow component the same way. Right-click on the tFileInputDelimited component and select Row > Main. Then drag it onto the tLogRow component and release when the plug symbol shows up. Select the tFileInputDelimited component again, and define its Basic settings:

Fill in a path to the file in the File Name field. This field is mandatory.
Define the Row separator used to identify the end of a row. Then define the Field separator used to delimit fields in a row.
In this scenario, the header and footer limits are not set, and the Limit number of processed rows is set to 50.
Set the Schema as either local (Built-in) or remotely managed (Repository) to define the data to pass on to the tLogRow component.
You can load and/or edit the schema via the Edit Schema function. Related topics: How to set a built-in schema and How to set a repository schema in the Talend Open Studio User Guide.
As selected, the empty rows will be ignored.
Enter the encoding standard the input file is encoded in. This setting is meant to ensure encoding consistency throughout all input and output files.
Select the tLogRow and define the Field separator to use for the output display. Related topic: tLogRow on page 1305.
Select the Print schema column name in front of each value check box to retrieve the column labels in the output displayed.
Go to the Run tab, and click Run to execute the Job.
The file is read row by row and the extracted fields are displayed on the Run log as defined in both components' Basic settings.

The Log sums up all parameters in a header followed by the result of the Job.
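The way the Header, Footer, Limit and Skip empty rows settings interact can be sketched as follows. This Java example is an approximation for illustration only: the class DelimitedReaderSketch is hypothetical, the field separator is treated as a literal string rather than a regular expression, and passing Integer.MAX_VALUE as the limit is an assumption used here to mean "no limit".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the row selection performed before fields are split.
final class DelimitedReaderSketch {

    static List<String[]> read(List<String> lines, String fieldSeparator,
                               int header, int footer, int limit, boolean skipEmptyRows) {
        List<String[]> rows = new ArrayList<>();
        // The last 'footer' lines are never read.
        int end = lines.size() - footer;
        // The first 'header' lines are skipped; reading stops once 'limit' rows are kept.
        for (int i = header; i < end && rows.size() < limit; i++) {
            String line = lines.get(i);
            if (skipEmptyRows && line.isEmpty()) {
                continue;
            }
            // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
            rows.add(line.split(Pattern.quote(fieldSeparator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample content invented for the example.
        List<String> lines = Arrays.asList("id;name", "1;Alice", "", "2;Bob", "total;2");
        for (String[] row : read(lines, ";", 1, 1, 50, true)) {
            System.out.println(String.join("|", row));
        }
    }
}
```

With a limit of 0 the loop body never runs, which matches the documented behavior that no row is read or processed.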

Scenario 2: Reading data from a remote file in streaming mode


This scenario describes a four-component Job used to fetch data from a voluminous file almost as soon as it has been read. The data is displayed in the Run view. The advantage of this technique is that you do not have to wait for the entire file to be downloaded before viewing the data.

Drop the following components onto the workspace: tFileFetch, tSleep, tFileInputDelimited, and tLogRow. Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk link and connect tFileInputDelimited to tLogRow using a Row > Main link. Double-click tFileFetch to display the Basic settings tab in the Component view and set the properties.

From the Protocol list, select the appropriate protocol to access the server on which your data is stored.

In the URI field, enter the URI required to access the server on which your file is stored.
Select the Use cache to save the resource check box to add your file data to the cache memory. This option allows you to use the streaming mode to transfer the data.
In the workspace, click tSleep to display the Basic settings tab in the Component view and set the properties. By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second Job in order to give the first Job, containing tFileFetch, the time to read the file data.
In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the Component view and set the properties.

In the File name/Stream field:
- Delete the default content.
- Press Ctrl+Space to view the variables available for this component.
- Select tFileFetch_1_INPUT_STREAM from the auto-completion list, to add the following variable to the Filename field: ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")).
From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the structure of the file that you want to fetch. The US_Employees file is composed of six columns: ID, Employee, Age, Address, State, EntryDate.

Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.

In the workspace, double-click tLogRow to display its Basic settings in the Component view and set its properties.
Click Edit schema and ensure that the schema has been fetched from the preceding component. If it has not, click Sync Columns to fetch it.
Click the Job tab and then the Extra view.

Select the Multi thread execution check box in order to run the two Jobs at the same time. Bear in mind that the second Job has a one-second delay according to the properties set in tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component.
Save the Job and press F6 to run it.

The data is displayed in the console almost as soon as it is read.
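The idea behind the streaming mode, one thread producing data while another consumes rows as they arrive, can be sketched in plain Java as follows. This is an analogy under stated assumptions, not the code of tFileFetch or tFileInputDelimited: the class StreamingRows, the piped streams that stand in for the remote file, and the generated sample rows are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical sketch: rows are handed to the consumer as soon as each line is read.
final class StreamingRows {

    static void readRows(InputStream in, String fieldSeparator, Consumer<String[]> sink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine blocks until the producer has written a full line.
            while ((line = reader.readLine()) != null) {
                sink.accept(line.split(Pattern.quote(fieldSeparator), -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        // The producer thread plays the role of the download subjob.
        Thread producer = new Thread(() -> {
            try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                for (int id = 1; id <= 3; id++) {
                    writer.write(id + ";Employee" + id + "\n");
                    writer.flush();
                    Thread.sleep(200); // simulates slow network transfer
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        producer.start();
        // The main thread plays the role of the reading subjob.
        readRows(in, ";", row -> System.out.println(String.join("|", row)));
        producer.join();
    }
}
```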

tFileInputEBCDIC

This component requires a Sun JDK to function.

tFileInputEBCDIC properties
Component family: File/Input
Function: tFileInputEBCDIC reads an EBCDIC file and extracts data depending on the selected schema.
Purpose: tFileInputEBCDIC opens a file and reads it in order to separate the data, based on the file structure description (schemas), and to send the file data and metadata to the next Job component(s), via a Row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Schema(s): Add the various schemas to output to the next Job component(s).
Data file: Select the EBCDIC file containing the data to be processed.
Xc2j file: Select the xc2j file, transforming the EBCDIC schema(s) into an intermediary XML file.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage
Use this component to read an EBCDIC file and to output the data separately depending on the schemas identified in the file.
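As background on what "reading an EBCDIC file" involves at the byte level, the following Java sketch decodes a few EBCDIC-encoded bytes into text. It is not how tFileInputEBCDIC works internally: it assumes the IBM037 code page (one of several EBCDIC variants) is available in the running JRE, the class EbcdicDecode is a hypothetical name, and real mainframe records usually also contain packed or binary fields described by the Copybook, which this sketch does not handle.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: decoding plain-text EBCDIC bytes with a JDK charset.
final class EbcdicDecode {

    // Assumption: the data uses code page IBM037 and the JRE ships that charset.
    static String decode(byte[] record) {
        return new String(record, Charset.forName("IBM037"));
    }

    public static void main(String[] args) {
        // The bytes 0xC8 0xC5 0xD3 0xD3 0xD6 spell HELLO in code page 037.
        byte[] record = {(byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6};
        System.out.println(decode(record));
    }
}
```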

Scenario: Extracting data from an EBCDIC file and populating a database


This scenario uses the [Copybook Connection] wizard that guides users through the different steps necessary to create a Copybook connection and to retrieve the EBCDIC schemas. This wizard is available only for Talend Integration Suite users. If you are using Talend Open Studio or Talend On Demand, you need to set the basic settings for the tFileInputEBCDIC component manually.

The following scenario is a four-component Job that aims at: reading an EBCDIC file which contains information concerning clients and their financial transactions, extracting and transforming this data, and finally creating two tables in a database, based on the two schemas, clients and transactions, extracted from the original EBCDIC file.

This Java scenario uses the EBCDIC Connection wizard to set up a connection to the Copybook file and to generate an xc2j file, which allows the retrieval and transformation of the different file schemas. Create a connection to the Copybook file, which describes the structure of your EBCDIC file. In this scenario, the Copybook connection is called EBCDIC. Retrieve the file schemas. Once the Copybook connection has been created and the schemas retrieved, using the EBCDIC and Schema wizards, the new schemas appear under the node Metadata > Copybook. They are called Schema01, Schema04 and Schema05.

In order to retrieve the different file structures and to use them in Talend Open Studio: Drop schema 01 from the Repository tree view to the design workspace. This automatically creates the tFileInputEBCDIC input component. Drop the tMysqlOutput component from the Palette to the design workspace. Double-click tFileInputEBCDIC to display the Basic settings view, then define the component properties:

The metadata is automatically defined in the Property Type, Schema(s), Data file and Xc2j file fields.
The Property Type field shows which metadata has been used for the component.
The Schema field shows which schema will be transmitted to the following component.
The Data file field shows the path to the file that holds the EBCDIC data.
The Xc2j file field shows the path to the file used to extract the schema describing the EBCDIC file structure.
If you are in Built-In mode, you have to fill these fields manually.
In the design workspace, right-click tFileInputEBCDIC, select Row > row_Schema01_1 from the menu, then click tMysqlOutput to connect the components together.
Double-click tMysqlOutput to display the Basic settings view, then define the component properties.

In the Property Type list, select Repository and click the button [...]. Select the database connection you want to use, which is centralized in the metadata of the Repository. The Host, Port, Database, Username and Password fields are automatically filled. If you are in Built-In mode, you have to fill these fields manually. In the Table field, enter the name of the table to be created, which will contain the data extracted from the EBCDIC file.

In the Action on table field, select the option Create table. At this stage, the Job retrieves the schema Schema01 from the EBCDIC file and transfers it, as well as the corresponding data, to the database. We now need to retrieve, from the EBCDIC file, the schema 04 and its data, then transform and transmit the data to the same database. To do this: Drop the tMap and tMysqlOutputBulkExec components to the design workspace. Double-click the tFileInputEBCDIC to display the Basic settings view, then define the component properties.

In the Schema(s) field, click the plus button to add a line.
Click in this line and then click the three-dot button that displays to open a dialog box.
Select the Create schema from repository button to retrieve the schema defined in the EBCDIC metadata, then select Schema04 from the drop-down list. Click OK to close the dialog box.
If you did not retrieve the schema from the Repository tree view, select Create schema for built-in and manually enter the name and description of your schema.
The two schemas Schema01 and Schema04 appear in the Schema(s) field of the tFileInputEBCDIC component.
In order to connect these two components, right-click tFileInputEBCDIC, select Row > row_Schema04_1 in the menu and click the tMap component.
Then right-click tMap, drag a link over to tMysqlOutputBulkExec and release the right-click button. In the dialog box that opens up, fill in the name of the ebcdic_04 output file.
Double-click tMap to open up the tMap Editor.

Select all the columns from the row_Schema04_1 table and drag them towards the ebcdic_04 table.
In the table ebcdic_04, located in the Schema editor area at the bottom of the editor, click the plus button to add a column to the schema. Name this column SUM_AG_NUMBER.
In the table row_Schema04_1, to the left of the editor, press Ctrl and select the CC01404_L_11_MENAG_1_1 and CC01404_AG_CAM_1_1 columns. Drag them to the new column SUM_AG_NUMBER in table ebcdic_04.
Add the + sign between the two columns so that you have: row_04_1.CC01404_L_11_MENAG_1_1 + row_04_1.CC01404_AG_CAM_1_1.
Click OK to validate your changes and close the editor.
In the design workspace, double-click tMysqlOutputBulkExec to display the Basic settings view, then define the component properties:

In the Property Type list, select Repository and click the three-dot button to display a dialog box where you can select the database connection you want to use, which is centralized in the Metadata folder of the Repository tree view. The Host, Port, Database, Username and Password fields are automatically filled. If you are in Built-In mode, you have to fill these fields manually.
In the Table field, enter the name of the table to be created, which will contain the data extracted from the EBCDIC file.
In the Action on table field, select the option Create table.
Press Ctrl+S to save your Job and click the Run view.
Select the Statistics and Exec time check boxes, then click Run to execute the Job.
The two tables are created in the database. They contain the structure, as well as the clients and transaction data, from the original EBCDIC file.

tFileInputExcel
tFileInputExcel properties
Component family: File/Input
Function: tFileInputExcel reads an Excel file (.xls or .xlsx) and extracts data line by line.
Purpose: tFileInputExcel opens a file and reads it row by row to split data up into fields using regular expressions. It then sends the fields as defined in the schema to the next component in the Job via a Row link.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema in the Talend Open Studio User Guide.
File Name/Stream:
- File name: Name of the file and/or the variable to be processed.
- Stream: Data flow to be processed. The data must be added to the flow in order to be collected by tFileInputExcel via the INPUT_STREAM variable in the auto-completion list (Ctrl+Space).
Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.
All sheets: Select this check box to process all sheets of the Excel file.
Sheet list: Click the plus button to add as many lines as needed to the list of the Excel sheets to be processed:
- Sheet (name or position): enter the name or position of the Excel sheet to be processed.
- Use Regex: select this check box if you want to use a regular expression to filter the sheets to process.
Header: Number of records to be skipped at the beginning of the file.
Footer: Number of records to be skipped at the end of the file.
Limit: Maximum number of lines to be processed.
Affect each sheet (header&footer): Select this check box if you want to apply the parameters set in the Header and Footer fields to all Excel sheets to be processed.

Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.
First column and Last column: Define the range of the columns to be processed by setting the first and last columns in the First column and Last column fields respectively.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Advanced settings
Advanced separator: Select this check box to change the data separators used.
Trim all columns: Select this check box to remove the leading and trailing whitespace from all columns. When this check box is cleared, the Check column to trim table is displayed, which lets you select particular columns to trim.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Read real values for numbers: Select this check box to read numbers as real values.
Stop to read on empty rows: Select this check box to ignore empty lines.
Don't validate the cells: Select this check box to skip data validation.
Ignore the warning: Select this check box to ignore all warnings generated to indicate errors in the Excel file.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Talend Open Studio Components

1067

File components
tFileInputExcel

Usage

Use this component to read an Excel file and to output the data separately depending on the schemas identified in the file. You can use a Row > Reject link to filter the data which doesnt correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.

Related scenarios
No scenario is available for this component yet.
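
The Sheet list and Use Regex options described in the properties can be pictured with a short Java sketch. It only filters a list of sheet names and does not open an Excel file. The class and method names are hypothetical, and the choice of a full-string regex match is an assumption, since the guide does not say how the component applies the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SheetFilterSketch {
    /**
     * Returns the sheet names to process: every sheet when allSheets is set,
     * otherwise the sheets whose name equals the selector or, with useRegex,
     * matches it entirely (assumption: full match, not a partial find).
     */
    static List<String> select(List<String> sheetNames, boolean allSheets,
                               String selector, boolean useRegex) {
        List<String> selected = new ArrayList<>();
        for (String name : sheetNames) {
            boolean keep = allSheets
                    || (useRegex ? Pattern.matches(selector, name) : name.equals(selector));
            if (keep) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

With hypothetical sheets named Sales2010, Sales2011 and Notes, the selector Sales\d+ with Use Regex enabled would keep only the two Sales sheets.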


tFileInputFullRow
tFileInputFullRow properties
Component family
    File/Input
Function
    tFileInputFullRow reads a given file row by row.
Purpose
    tFileInputFullRow opens a file, reads it row by row, and sends complete rows as defined in the schema to the next Job component via a Row link.

Basic settings

Schema and Edit Schema
    A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
    Click Sync columns to retrieve the schema from the previous component connected to tFileInputFullRow.
File Name
    Name of the file and/or the variable to be processed.
    Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Row separator
    String (ex: "\n" on Unix) to separate rows.
Header
    Number of rows to be skipped at the beginning of a file.
Footer
    Number of rows to be skipped at the end of a file.
Limit
    Maximum number of rows to be processed. If Limit = 0, no row is read or processed.
Skip empty rows
    Select this check box to skip empty rows.
Die on error
    Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding
    Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Extract lines at random
    Select this check box to set the number of lines to be extracted randomly.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    Use this component to read full rows in delimited files that can get very large. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.


Scenario: Reading full rows in a delimited file


The following scenario creates a two-component Job that reads complete rows in a file and displays the output in the Run log console. Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace. Right-click the tFileInputFullRow component and connect it to tLogRow using a Row > Main link.

In the design workspace, select tFileInputFullRow. Click the Component tab to define the basic settings for tFileInputFullRow.

In the Basic settings view, set Schema to Built-In. Click the three-dot [...] button next to the Edit schema field to see the data to pass on to the tLogRow component. Note that the schema is read-only and it consists of one column, line.

Fill in a path to the file to process in the File Name field, or click the three-dot [...] button. This field is mandatory. In this scenario, the file to read is test5. It holds three rows where each row consists of two fields separated by a semicolon.


Define the Row separator used to identify the end of a row. Set the Header to 1. In this scenario, the footer and the number of processed rows are not set. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Save your Job and press F6 to execute it.

tFileInputFullRow reads the three rows one by one ignoring field separators, and the complete rows are displayed on the Run console.
To extract only fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields, and tExtractRegexFields. For more information, see tExtractDelimitedFields on page 1413, tExtractPositionalFields on page 1418 and tExtractRegexFields on page 1420.
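
The behaviour in this scenario, where whole rows are kept and Header, Footer and Limit decide which rows are passed on, can be approximated in plain Java. This is a minimal sketch and not the code that the Studio generates. The class name, the use of -1 for "Limit not set", and the order in which the three settings are applied are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FullRowSketch {
    /**
     * Splits the content on the row separator and returns complete rows.
     * Field separators inside a row are left untouched.
     * header and footer are the numbers of rows dropped at the start and end;
     * limit caps the number of rows returned, and -1 means "not set" (assumption).
     */
    static List<String> readRows(String content, String rowSeparator,
                                 int header, int footer, int limit) {
        List<String> rows = Arrays.asList(content.split(Pattern.quote(rowSeparator)));
        int from = Math.min(header, rows.size());
        int to = Math.max(from, rows.size() - footer);
        List<String> kept = new ArrayList<>(rows.subList(from, to));
        if (limit >= 0 && kept.size() > limit) {
            kept = new ArrayList<>(kept.subList(0, limit));
        }
        return kept;
    }
}
```

Called with a hypothetical content of one title line followed by three "id;name" lines, a "\n" row separator and a header of 1, the method returns the three data lines with their semicolons intact. A limit of 0 returns no rows, which matches the Limit description in the properties table.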


tFileInputJSON
tFileInputJSON properties
Component family
    File
Function
    tFileInputJSON reads a JSON file and extracts data according to the selected schema.
Purpose
    This component opens a file and reads it in order to isolate data according to the schemas which describe the file structure, and to send the data and schemas to the next component(s) via a Row connection.

Basic settings

Property type
    Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Schema and Edit Schema
    A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Use URL
    Select this check box to retrieve data directly from the Web.
    URL: type in the URL path from which you will retrieve data.
Filename
    Name of the file from which you will retrieve data.
Mapping
    Column: shows the schema as defined in the Schema editor.
    JSONPath Query: Type in the fields to extract from the JSON input structure.

Advanced settings

Advanced separator (for numbers)
    Select this check box to modify the separators used for numbers:
    Thousands separator: define separators for thousands.
    Decimal separator: define separators for decimals.
Encoding
    Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    Use this component to read a JSON file and separate data according to the identified schemas in this file.
Limitation
    n/a

Scenario: Extracting data from the fields of a JSON format file


This two-component scenario reads a JSON file and extracts its data.

Drag and drop a tFileInputJSON component from the File family and a tLogRow from the Logs & Errors family from the Palette onto the Job designer. Link the components using a Row > Main connection. Double-click the tFileInputJSON component to set its properties in the Basic settings, in the Component view:

If your schema is already stored under the Db Connections node in the Repository, select Repository in the Schema Type field, and choose the metadata from the list. If you have not defined a schema yet, select the Built-in mode, type in manually the connection details, and the data structure of a schema.


Click the [...] button of the Edit schema field to open a dialog box in which you will define the output schema to be displayed. Click OK to close the dialog box. In the Mapping table, the items in the Column field are automatically filled in according to the schema you just defined. In this example, the schema is made of four columns: FirstName, LastName, Address and City. In the Filename field, fill in the path to the JSON file from which you want to retrieve data. If your data is stored on the Internet, select the Use URL check box, and then, in the same way, fill in the access URL to the file to be processed. In this example, the processed file is presented as follows:

In the Mapping table, the rows in the Column field are already filled in. For each of them, type in the JSONPath query field the level of the tree from which to retrieve the data. In the Job designer, double-click the tLogRow to set its properties in the Basic settings tab, in the Component view.

Click Sync Columns button to retrieve the schema of the previous component. Save your Job and press F6 to execute it.


The Job returns the customer information according to the parameters selected in the schema.
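
To give an idea of what a Mapping row does when it pairs a schema column with a path into the JSON tree, here is a deliberately small Java sketch. It is not the JSONPath engine used by the component. It parses JSON with a hand-written parser and resolves only simple dotted paths with optional array indexes. The class name, the path syntax subset and the sample document used in the note after the code are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonPathSketch {
    private final String src;
    private int pos = 0;

    private JsonPathSketch(String src) { this.src = src; }

    /** Parses JSON text into Map, List, String, Double, Boolean or null. */
    static Object parse(String json) {
        JsonPathSketch parser = new JsonPathSketch(json);
        Object value = parser.readValue();
        parser.skipSpaces();
        if (parser.pos != json.length()) {
            throw new IllegalArgumentException("Trailing data at " + parser.pos);
        }
        return value;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private Object readValue() {
        skipSpaces();
        char c = src.charAt(pos);
        if (c == '{') return readObject();
        if (c == '[') return readArray();
        if (c == '"') return readString();
        if (src.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (src.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (src.startsWith("null", pos)) { pos += 4; return null; }
        int start = pos;
        while (pos < src.length() && "+-.eE0123456789".indexOf(src.charAt(pos)) >= 0) pos++;
        return Double.valueOf(src.substring(start, pos));
    }

    private Map<String, Object> readObject() {
        Map<String, Object> map = new LinkedHashMap<>();
        pos++; // consume '{'
        skipSpaces();
        if (src.charAt(pos) == '}') { pos++; return map; }
        while (true) {
            skipSpaces();
            String key = readString();
            skipSpaces();
            expect(':');
            map.put(key, readValue());
            skipSpaces();
            if (src.charAt(pos) == ',') { pos++; continue; }
            expect('}');
            return map;
        }
    }

    private List<Object> readArray() {
        List<Object> list = new ArrayList<>();
        pos++; // consume '['
        skipSpaces();
        if (src.charAt(pos) == ']') { pos++; return list; }
        while (true) {
            list.add(readValue());
            skipSpaces();
            if (src.charAt(pos) == ',') { pos++; continue; }
            expect(']');
            return list;
        }
    }

    private String readString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (src.charAt(pos) != '"') {
            char c = src.charAt(pos++);
            if (c == '\\') c = src.charAt(pos++); // simplified: keeps the escaped character as-is
            sb.append(c);
        }
        pos++; // consume closing quote
        return sb.toString();
    }

    private void expect(char c) {
        if (src.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected " + c + " at " + pos);
        }
        pos++;
    }

    /** Resolves a path such as "$.customers[0].City" against a parsed document. */
    static Object query(Object root, String path) {
        Object current = root;
        for (String step : path.replace("[", ".[").split("\\.")) {
            if (step.isEmpty() || step.equals("$")) continue;
            if (step.startsWith("[")) {
                int index = Integer.parseInt(step.substring(1, step.length() - 1));
                current = ((List<?>) current).get(index);
            } else {
                current = ((Map<?, ?>) current).get(step);
            }
        }
        return current;
    }
}
```

For a hypothetical document holding a customers array whose first element has FirstName and City keys, the path $.customers[0].City returns the city of the first customer. In the scenario, each of the four schema columns would carry one such path.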


tFileInputLDIF
tFileInputLDIF Properties
Component family
    File/Input
Function
    tFileInputLDIF reads a given LDIF file row by row.
Purpose
    tFileInputLDIF opens a file, reads it row by row, and passes the full rows to the next component as defined in the schema, using a Row connection.

Basic settings

Property type
    Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
File Name
    Name of the file and/or variable to be processed.
    Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
add operation as prefix when the entry is modify type
    Select this check box to display the operation mode.
Value separator
    Type in the separator required for parsing data in the given file. By default, the separator used is ",".
Die on error
    Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.
Schema and Edit schema
    A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Click Edit Schema to modify the schema. Note that if you make changes, the schema automatically becomes built-in.
    Click Sync columns to retrieve the schema from the previous component connected in the Job.

Advanced settings

Encoding
    Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    Use this component to read full rows in a voluminous LDIF file. This component enables you to create a data flow, using a Row > Main link, and to create a reject flow with a Row > Reject link that filters out the data whose type does not match the defined type. For an example of usage, see Scenario 2: Extracting erroneous XML data via a reject flow on page 1595 from tFileInputXML.


Related scenario
For a related scenario, see Scenario: Writing DB data into an LDIF-type file on page 1131.
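
As a rough picture of how LDIF entries become rows and how the Value separator joins several values of one attribute, consider the following Java sketch. It is an illustration under stated assumptions, not the component's implementation: the class name is hypothetical, and folded continuation lines, base64 values and change records are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LdifSketch {
    /**
     * Parses LDIF text into one map per entry. Entries are separated by
     * blank lines. When an attribute occurs several times in an entry,
     * its values are joined with valueSeparator.
     */
    static List<Map<String, String>> parse(String ldif, String valueSeparator) {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : ldif.split("\r?\n", -1)) {
            if (line.isEmpty()) {                 // a blank line closes the current entry
                if (!current.isEmpty()) {
                    entries.add(current);
                }
                current = new LinkedHashMap<>();
                continue;
            }
            if (line.startsWith("#")) {
                continue;                         // comment line
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                         // not "attribute: value", ignored in this sketch
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (name.equals("version")) {
                continue;                         // file-level version line, not an attribute
            }
            current.merge(name, value, (a, b) -> a + valueSeparator + b);
        }
        if (!current.isEmpty()) {
            entries.add(current);
        }
        return entries;
    }
}
```

With the default separator, an entry that lists two mail attributes yields a single mail field holding both addresses separated by a comma.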


tFileInputMail
tFileInputMail properties
Component family
    File/Input
Function
    tFileInputMail reads the header and content parts of a defined email file.
Purpose
    This component helps to extract standard key data from emails.

Basic settings

File name
    Browse to the source email file.
Schema and Edit Schema
    A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
    Click Sync columns to retrieve the schema from the previous component connected in the Job.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Attachment export directory
    Enter the path to the directory where you want to export email attachments.
Mail parts
    Column: This field is automatically populated with the columns defined in the schema that you propagated.
    Mail part: Type in the label of the header part or body to be displayed on the output.
    Multi value: Select the check box next to the name of the column that is made up of fields of multiple values.
    Field separator: Enter a value separator for the field of multiple values.
Die on error
    Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    This component handles a flow of data and therefore requires an output. It is defined as an intermediary step.
Limitation
    n/a

Scenario: Extracting key fields from an email


This Java scenario describes a two-component Job that extracts some key standard fields and displays the values on the Run console.

Drop a tFileInputMail and a tLogRow component from the Palette to the design workspace. Connect the two components together using a Row > Main link. Double-click tFileInputMail to display its Basic settings view and define the component properties.

Click the three-dot button next to the File Name field and browse to the mail file to be processed. Set schema type to Built-in and click the three-dot button next to Edit schema to open a dialog box where you can define the schema including all columns you want to retrieve on your output. Click the plus button in the dialog box to add as many columns as you want to include in the output flow. In this example, the schema has four columns: Date, Author, Object and Status. Once the schema is defined, click OK to close the dialog box and propagate the schema into the Mail parts table. Click the three-dot button next to Attachment export directory and browse to the directory in which you want to export email attachments, if any.


In the Mail part column of the Mail parts table, type in the actual header or body standard keys that will be used to retrieve the values to be displayed. Select the Multi Value check box next to any of the standard keys if more than one value for the relative standard key is present in the input file. If needed, define a separator for the different values of the relative standard key in the Separator field. Double-click tLogRow to display its Basic settings view and define the component properties in order for the values to be separated by a carriage return. On Windows OS, type in \n between double quotes. Save your Job and press F6 to execute it and display the output flow on the console.

The header key values are extracted as defined in the Mail parts table. Mail reception date, author, subject and status are displayed on the console.
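
The idea behind the Mail parts table, where each column is fed by one header key and a multi-value key is joined with a separator, can be sketched in plain Java. The sketch below reads only the header block of a raw message and does not use a mail library. The class name is hypothetical, header names are compared case-sensitively for simplicity, and the keys that the scenario's four columns actually map to are not stated in this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MailHeaderSketch {
    /**
     * Collects the headers found before the first blank line of a raw message.
     * A header key that appears several times is joined with the separator.
     * Folded lines (starting with a space or tab) extend the previous header.
     */
    static Map<String, String> headers(String rawMail, String separator) {
        Map<String, String> result = new LinkedHashMap<>();
        String lastKey = null;
        for (String line : rawMail.split("\r?\n")) {
            if (line.isEmpty()) {
                break;                            // the body starts after the first blank line
            }
            boolean folded = line.startsWith(" ") || line.startsWith("\t");
            if (folded && lastKey != null) {
                result.put(lastKey, result.get(lastKey) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;                         // not a "Key: value" line
            }
            lastKey = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            result.merge(lastKey, value, (a, b) -> a + separator + b);
        }
        return result;
    }
}
```

A column mapped to a key such as Subject would then read its value from the returned map, and a key that occurs more than once would come back as one string with the chosen separator between the values.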


tFileInputMSDelimited
tFileInputMSDelimited properties
Component family
    File/Input
Function
    tFileInputMSDelimited reads a complex multi-structured delimited file.
Purpose
    tFileInputMSDelimited opens a complex multi-structured file, reads its data structures (schemas) and then uses Row links to send fields as defined in the different schemas to the next Job components.

Basic settings

Multi Schema Editor
    The [Multi Schema Editor] helps to build and configure the data flow in a multi-structure delimited file to associate one schema per output. For more information, see The Multi Schema Editor on page 1081.
Output
    Lists all the schemas you define in the [Multi Schema Editor], along with the related record type and the field separator that corresponds to every schema, if different field separators are used.
Die on error
    Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

Trim all column
    Select this check box to remove leading and trailing whitespaces from defined columns.
Advanced separator (for numbers)
    Select this check box to modify the separators used for numbers:
    Thousands separator: define separators for thousands.
    Decimal separator: define separators for decimals.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    Use this component to read multi-structured delimited files and separate fields contained in these files using a defined separator.

The Multi Schema Editor


The [Multi Schema Editor] enables you to:
-set the path to the source file,
-define the source file properties,
-define the data structure for each of the output schemas.
When you define data structure for each of the output schemas in the [Multi Schema Editor], column names in the different data structures automatically appear in the input schema lists of the components that come after tFileInputMSDelimited. However, you can still define data structures directly in the Basic settings view of each of these components.


The [Multi Schema Editor] also helps to declare the schema that should act as the source schema (primary key) from the incoming data to ensure its uniqueness. The editor uses this mapping to associate all schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, that usually holds the record type indicator, selected by default. However, once the editor is open, you can select the check box of any of the schema columns to define it as a primary key.

The below figure illustrates an example of the [Multi Schema Editor].

For detailed information about the usage of the Multi Schema Editor, see Scenario: Reading a multi structure delimited file on page 1083.


Scenario: Reading a multi structure delimited file


The following scenario creates a Java Job which aims at reading three schemas in a delimited file and displaying their data structure on the Run Job console. The delimited file processed in this example looks like the following:

Drop a tFileInputMSDelimited component and three tLogRow components from the Palette onto the design workspace. Double-click tFileInputMSDelimited to open the Multi Schema Editor.

Click Browse... next to the File name field to locate the multi schema delimited file you need to process. In the File Settings area:
-Select from the list the encoding type the source file is encoded in. This setting is meant to ensure encoding consistency throughout all input and output files.
-Select the field and row separators used in the source file.
Select the Use Multiple Separator check box and define the fields that follow accordingly if different field separators are used to separate schemas in the source file.

A preview of the source file data displays automatically in the Preview panel.


Column 0 that usually holds the record type indicator is selected by default. However, you can select the check box of any of the other columns to define it as a primary key.

Click Fetch Codes to the right of the Preview panel to list the type of schema and records you have in the source file. In this scenario, the source file has three schema types (A, B, C). Click each schema type in the Fetch Codes panel to display its data structure below the Preview panel. Click in the name cells and set column names for each of the selected schemas. In this scenario, the column names are as follows:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,
-Schema C: Type, LibraryName.

You now need to set the primary key from the incoming data to ensure its uniqueness (DiscName in this scenario). To do that: In the Fetch Codes panel, select the schema holding the column you want to set as the primary key (schema A in this scenario) to display its data structure. Click in the Key cell that corresponds to the DiscName column and select the check box that displays.

Click anywhere in the editor and the false in the Key cell will become true. You need now to declare the parent schema by which you want to group the other children schemas (DiscName in this scenario). To do that: In the Fetch Codes panel, select schema B and click the right arrow button to move it to the right. Do the same with schema C.

The Cardinality field is not compulsory. It helps you to define the number (or range) of fields in children schemas attached to the parent schema. However, if you set the wrong number or range and try to execute the Job, an error message will display.

In the [Multi Schema Editor], click OK to validate all the changes you did and close the editor. The three defined schemas along with the corresponding record types and field separators display automatically in the Basic settings view of tFileInputMSDelimited.


In the design workspace, right-click tFileInputMSDelimited and connect it to tLogRow1, tLogRow2, and tLogRow3 using the row_A_1, row_B_1, and row_C_1 links respectively.

The three schemas you defined in the [Multi Schema Editor] are automatically passed to the three tLogRow components. If needed, click the Edit schema button in the Basic settings view of each of the tLogRow components to view the input and output data structures you defined in the Multi Schema Editor or to modify them.


Save your Job and press F6 to execute it. The multi schema delimited file is read row by row and the extracted fields are displayed on the Run Job console as defined in the [Multi Schema Editor].
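
The routing performed in this scenario, where the record type held in column 0 decides which output flow receives a row, can be approximated with a few lines of Java. This is a sketch under assumptions, not the generated Job code: the class name is hypothetical, a single field separator is used for all schemas, and the sample values in the note after the code are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class MultiSchemaDelimitedSketch {
    /**
     * Splits the content into rows, splits each row into fields, and groups
     * the rows by the record type found in the first field (column 0).
     */
    static Map<String, List<String[]>> route(String content, String rowSeparator,
                                             String fieldSeparator) {
        Map<String, List<String[]>> flows = new LinkedHashMap<>();
        for (String row : content.split(Pattern.quote(rowSeparator))) {
            if (row.isEmpty()) {
                continue;                         // skip empty rows
            }
            // -1 keeps trailing empty fields so the column count stays stable
            String[] fields = row.split(Pattern.quote(fieldSeparator), -1);
            flows.computeIfAbsent(fields[0], type -> new ArrayList<>()).add(fields);
        }
        return flows;
    }
}
```

With rows of type A, B and C, the returned map holds one list per type, which corresponds to the row_A_1, row_B_1 and row_C_1 links of the Job. For a type A row, index 1 of the field array would be the DiscName column.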


tFileInputMSPositional
tFileInputMSPositional properties
Component family
    File/Input
Function
    tFileInputMSPositional reads multiple schemas from a positional file.
Purpose
    tFileInputMSPositional opens a complex multi-structured file, reads its data structures (schemas) and then uses Row links to send fields as defined in the different schemas to the next Job components.

Basic settings

Property type
    Either Built-in or Repository.
    Built-in: No property data stored centrally.
    Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
File Name
    Name of the file and/or the variable to be processed.
    Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Row separator
    String (ex: "\n" on Unix) to distinguish rows.
Schema identifier Field Position
    Character, string or regular expression to separate fields.
Records
    Schema: define as many schemas as needed.
    Schema identifier value: value of the string which identifies the different schemas. Type in the column name.
    Pattern: string which represents the length of each column of the schema, separated by commas. Make sure the values defined in this field are consistent with the defined schema.
    Reject incorrect row size: select the check boxes of the schemas for which rows of incorrect size should be rejected.
    Parent key column: Type in the parent key column name.
    Key column: Type in the key column name.
Skip from header
    Number of rows to be skipped at the beginning of the file.
Skip from footer
    Number of rows to be skipped at the end of the file.
Limit
    Maximum number of rows to be processed. If Limit = 0, no row is read or processed.
Die on parse error
    Let the component die if a parsing error occurs.
Die on unknown header type
    Length values separated by commas, interpreted as a string between quotes. Make sure the values entered in this field are consistent with the schema defined.

Advanced settings

Process long rows (needed for processing rows longer than 100,000 characters wide)
    Select this check box to process long rows (this is necessary to process rows longer than 100,000 characters).
Trim all column
    Select this check box to remove leading and trailing whitespaces from defined columns.
Advanced separator (for numbers)
    Select this check box to modify the separators used for numbers:
    Thousands separator: define separators for thousands.
    Decimal separator: define separators for decimals.
Encoding
    Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage
    Use this component to read a multi-schema positional file and separate fields using a position separator value. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.

Related scenario
For a related use case, see the tFileInputMSDelimited Scenario: Reading a multi structure delimited file on page 1083.
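
The Pattern setting, a comma-separated list of column lengths, can be illustrated with a small Java sketch that cuts one fixed-width row into fields. The class name and the sample row are hypothetical, and the sketch covers only numeric lengths. It does not reproduce the schema identifier lookup or the reject handling of the component.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionalSketch {
    /**
     * Cuts a fixed-width row into fields using a pattern such as "1,10,4".
     * A row shorter than the pattern yields shortened or empty trailing fields.
     */
    static List<String> split(String row, String pattern, boolean trim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (String part : pattern.split(",")) {
            int length = Integer.parseInt(part.trim());
            int end = Math.min(row.length(), start + length);
            String field = start < row.length() ? row.substring(start, end) : "";
            fields.add(trim ? field.trim() : field);   // trim mimics the Trim all column option
            start += length;
        }
        return fields;
    }
}
```

For the hypothetical row "AGreatest  2010" and the pattern "1,10,4", the method returns the record type A, the padded name with its trailing spaces removed when trimming is on, and the year.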


tFileInputMSXML
tFileInputMSXML Properties
Component family
    XML or File/Input
Function
    tFileInputMSXML reads and outputs multiple schemas within an XML structured file.
Purpose
    tFileInputMSXML opens a complex multi-structured file, reads its data structures (schemas) and then uses Row links to send fields as defined in the different schemas to the next Job components.

Basic settings

File Name
    Name of the file and/or the variable to be processed.
    Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Root XPath query
    The root of the XML tree, which the query is based on.
Enable XPath in column Schema XPath loop But lose the order
    Select this check box if you want to define an XPath path in the Schema XPath loop field of the Outputs array.
    This option is only available with the dom4j generation mode. Make sure this mode is selected in the Generation mode list, in the Advanced settings tab of your component.
    If you use this option, the data will not be returned in order.
Outputs
    Schema: define as many schemas as needed.
    Schema XPath loop: node of the XML tree or XPath path which the loop is based on. If you want to use an XPath path in the Schema XPath loop field, you must select the Enable XPath in column Schema XPath loop but lose the order check box.
    XPath Queries: Enter the fields to be extracted from the structured input.
    Create empty row: select the check boxes of the schemas where you want to create empty rows.
Die on error
    Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

Trim all column
    Select this check box to remove leading and trailing whitespaces from defined columns.
Generation mode
    Select the generation mode from the list.
Encoding
    Select the encoding type from the list or select CUSTOM and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
    Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Limitation
    n/a

Scenario: Reading a multi structure XML file


The following scenario creates a Java Job which aims at reading a multi schema XML file and displaying data structures on the Run Job console. The XML file processed in this example looks like the following:

Drop a tFileInputMSXML and two tLogRow components from the Palette onto the design workspace. Double-click tFileInputMSXML to open the component Basic settings view.


Browse to the XML file you want to process. In the Root XPath query field, enter the root of the XML tree, which the query will be based on. Select the Enable XPath in column Schema XPath loop but lose the order check box if you want to define an XPath path in the Schema XPath loop field, in the Outputs array. In this scenario, we do not use this option. Click the plus button to add lines in the Outputs table where you can define the output schemas, two lines in this scenario: record and book. In the Outputs table, click in the Schema cell and then click the three-dot button to display a dialog box where you can define the schema name.

Enter a name for the output schema and click OK to close the dialog box. The tFileInputMSXML schema editor displays. Define the structure of the schema you just named in the Outputs table. Do the same for all the output schemas you want to define. In the design workspace, right-click tFileInputMSXML and connect it to tLogRow1 and tLogRow2 using the record and book links respectively.

In the Basic settings view and in the Schema XPath loop cell, enter the node of the XML tree, which the loop is based on. In the XPath Queries cell, enter the fields to be extracted from the structured XML input. Select the check boxes next to the schema names where you want to create empty rows. Save your Job and press F6 to execute it. The defined schemas are extracted from the multi schema XML structured file and displayed on the console.

The multi schema XML file is read row by row and the extracted fields are displayed on the Run Job console as defined.
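
The relationship between a Schema XPath loop and its XPath Queries can be illustrated with the XPath API that ships with the JDK. The sketch below is an approximation and not the code generated for the component. The class name and the sample XML are hypothetical, and the loop expression is written here as a full path from the document root, which is an assumption about how the root and loop settings combine.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MultiSchemaXmlSketch {
    /**
     * Produces one row per node matched by loopXPath. Each query is
     * evaluated relative to that node and yields one field of the row.
     */
    static List<List<String>> extract(String xml, String loopXPath, String... queries) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);
            List<List<String>> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                List<String> row = new ArrayList<>();
                for (String query : queries) {
                    row.add(xpath.evaluate(query, node));   // string value of the first match
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the XML input", e);
        }
    }
}
```

Calling the method once with a loop on the record nodes and once with a loop on the book nodes gives two independent lists of rows, which mirrors the two output links named record and book in the scenario.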


tFileInputPositional
tFileInputPositional properties
Component family: File/Input

Function: tFileInputPositional reads a given file row by row and extracts fields based on a pattern.

Purpose: This component opens a file, reads it row by row, splits each row into fields, and then sends the fields as defined in the schema to the next Job component via a Row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name/Stream:
File name: Path and name of the file to be processed and/or the variable to be used.
Stream: Data flow to be processed. The data must be added to the flow so that it can be collected by tFileInputPositional via the INPUT_STREAM variable in the autocompletion list (Ctrl+Space).
Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Row separator: String (for example, "\n" on Unix) used to distinguish rows.

Use byte length as the cardinality: Select this check box to enable support for double-byte characters in this component. JDK 1.6 is required for this feature.

Customize: Select this check box to customize the data format of the positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in, between inverted commas, the padding character to be removed from the field. A space by default.
Alignment: Select the appropriate alignment parameter.

Pattern: Length values separated by commas, interpreted as a string between quotes. Make sure the values entered in this field are consistent with the schema defined.

Skip empty rows: Select this check box to skip empty rows.

Uncompress as zip file: Select this check box to uncompress the input file.

Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.

Header: Number of rows to be skipped at the beginning of the file.

Footer: Number of rows to be skipped at the end of the file.

Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Needed to process rows longer than 100 000 characters: Select this check box if the rows to be processed in the input file are longer than 100 000 characters.

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Trim all column: Select this check box to remove leading and trailing whitespaces from defined columns.

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to read a file and separate fields using a position separator value. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.
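The combined effect of the Header, Footer and Limit settings can be sketched as a simple window over the rows of the file. The Java below is a minimal illustration under stated assumptions, not the component's generated code: the class and method names are hypothetical, and an unset Limit is represented here by null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of Header / Footer / Limit row selection. */
class RowWindowSketch {

    /**
     * Skips header rows at the start and footer rows at the end, then keeps
     * at most limit rows. A null limit stands for "not set" (an assumption of
     * this sketch); a limit of 0 keeps no row, as stated for the Limit field.
     */
    static List<String> select(List<String> rows, int header, int footer, Integer limit) {
        int from = Math.min(header, rows.size());
        int to = Math.max(from, rows.size() - footer);
        List<String> kept = new ArrayList<>(rows.subList(from, to));
        if (limit != null && limit < kept.size()) {
            kept = new ArrayList<>(kept.subList(0, Math.max(0, limit)));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("title line", "row 1", "row 2", "row 3", "total line");
        System.out.println(select(rows, 1, 1, null));
        System.out.println(select(rows, 1, 1, 2));
    }
}
```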

Scenario: From Positional to XML file


The following scenario creates a two-component Job, which aims at reading data of an Input file and outputting selected data (according to the data position) into an XML file.

Drop a tFileInputPositional component from the Palette to the design workspace. The input file contains raw data: in this case, contract numbers, customer references and insurance numbers. Drop a tFileOutputXML component as well. The output file is meant to receive the references in a structured way. Right-click the tFileInputPositional component and select Row > Main. Then drag the link onto the tFileOutputXML component and release when the plug symbol shows up. Select the tFileInputPositional component again, and define its properties. The Job properties are built-in for this scenario. As opposed to Repository, this means that the properties are set for this component only.

Fill in a path to the file in the File Name field. This field is mandatory. Define the Row separator identifying the end of a row, by default a carriage return. If required, select the Use byte length as the cardinality check box to enable support for double-byte characters. Then define the Pattern to delimit fields in a row. The pattern is a series of length values corresponding to the field widths of your input file. The values should be entered between quotes and separated by commas. Make sure the values you enter match the schema defined. In this scenario, the Header, Footer and Limit fields are not set, but depending on the input file structure, you may need to define them. Next to Schema, select either Repository or Built-In to define the data to pass on to the tFileOutputXML component. You can load and/or edit the schema via the Edit Schema function. For this schema, define three columns, Contracts, CustomerRef and InsuranceNr, matching the three length values defined.
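The way a Pattern of comma-separated lengths cuts a row into fields can be sketched as follows. This is a minimal Java illustration, not the code generated by tFileInputPositional: the class name, the sample row and the lengths "5,4,6" are assumptions chosen for the example, and only right-hand padding is removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: splits one fixed-width row using a pattern such as "5,4,6". */
class PositionalSplitSketch {

    /** Parses a comma-separated pattern of field lengths, e.g. "5,4,6". */
    static int[] parsePattern(String pattern) {
        String[] parts = pattern.split(",");
        int[] lengths = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            lengths[i] = Integer.parseInt(parts[i].trim());
        }
        return lengths;
    }

    /** Cuts the row at the given lengths and strips trailing padding characters. */
    static List<String> split(String row, String pattern, char padding) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int length : parsePattern(pattern)) {
            int end = Math.min(start + length, row.length());
            // A row shorter than the pattern yields empty trailing fields.
            String field = start < row.length() ? row.substring(start, end) : "";
            int last = field.length();
            while (last > 0 && field.charAt(last - 1) == padding) {
                last--;
            }
            fields.add(field.substring(0, last));
            start += length;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical row: contract (5 chars), customer reference (4), insurance number (6).
        System.out.println(split("C001 AB12INS789", "5,4,6", ' '));
    }
}
```

The number of lengths in the pattern has to equal the number of schema columns, which is why the guide asks you to keep the two consistent.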

Then define the second component Basic settings: Enter the XML output file path.

Enter one or more root tags to wrap the XML output structure, in this case ContractsList. Define the row tag that will wrap the data of each line, in this case ContractRef. Select the Column name as tag name check box to reuse the column labels from the input schema as tag labels. By default, the tag field is used for each column value. Enter the encoding standard of the input file. Note that, for the time being, the encoding consistency verification is not supported. Select the Schema type. If the row connection is already implemented, the schema is automatically synchronized with the input file schema. Otherwise, click Sync columns. Go to the Run tab, and click Run to execute the Job. The file is read row by row and split up into fields based on the length values defined in the Pattern field. You can open the resulting XML file using any standard XML editor.
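The resulting structure, a root tag wrapping one row tag per line with one element per column, can be sketched in Java as below. This is an assumption-laden illustration of the layout described above, not the exact output of tFileOutputXML: the class name, the indentation and the minimal escaping are choices made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: wraps rows in a root tag and a row tag, using column names as tags. */
class XmlRowsSketch {

    /** Escapes the characters that would break element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toXml(String rootTag, String rowTag, List<String> columns, List<List<String>> rows) {
        StringBuilder xml = new StringBuilder();
        xml.append("<").append(rootTag).append(">\n");
        for (List<String> row : rows) {
            xml.append("  <").append(rowTag).append(">\n");
            for (int i = 0; i < columns.size(); i++) {
                String tag = columns.get(i); // "Column name as tag name" behaviour
                xml.append("    <").append(tag).append(">")
                   .append(escape(row.get(i)))
                   .append("</").append(tag).append(">\n");
            }
            xml.append("  </").append(rowTag).append(">\n");
        }
        xml.append("</").append(rootTag).append(">\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("Contracts", "CustomerRef", "InsuranceNr");
        List<List<String>> rows = Arrays.asList(Arrays.asList("C001", "AB12", "INS789"));
        System.out.print(toXml("ContractsList", "ContractRef", columns, rows));
    }
}
```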

tFileInputProperties
tFileInputProperties properties
Component family: File/Input

Function: tFileInputProperties reads a text file row by row and extracts the fields.

Purpose: tFileInputProperties opens a text file, reads it row by row, and separates the fields according to the model key = value.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository, but for this component the schema is read-only. It is made of two columns, Key and Value, corresponding to the parameter name and the parameter value to be copied.

File format: Select your file format from the list, either .properties or .ini.
.properties: data in the configuration file is written in two lines and structured as follows: key = value.
.ini: data in the configuration file is written in two lines, structured as follows: key = value, and regrouped in sections.
Section Name: enter the section name on which the iteration is based.

File Name: Name or path to the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Advanced settings

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to read a text file and separate data according to the structure key = value.

Scenario: Reading and matching the keys and the values of different .properties files and outputting the results in a glossary
This four-component Java Job reads two .properties files, one in French and the other in English. The data in the two input files is mapped to output a glossary matching the English and French terms. The two input files used in this scenario hold localization strings for the tMysqlInput component in Talend Open Studio.

Drop the following components from the Palette onto the design workspace: tFileInputProperties (x2), tMap, and tLogRow. Connect the components together using Row > Main links. The second properties file, FR, is used as a lookup flow.

Double-click the first tFileInputProperties component to open its Basic settings view and define its properties.

In the File Format field, select your file format. In the File Name field, click the three-dot button and browse to the input .properties file you want to use. Do the same with the second tFileInputProperties and browse to the French properties file this time.

Double-click the tMap component to open the tMap editor.

Select all columns from the English_terms table and drop them to the output table. Select the key column from the English_terms table and drop it to the key column in the French_terms table. In the glossary table in the lower right corner of the tMap editor, rename the value field as EN because it will hold the values of the English file. Click the plus button to add a line to the glossary table and rename it as FR.

In the Length field, set the maximum length to 255. In the upper left corner of the tMap editor, select the value column in the French_terms table and drop it to the FR column in the glossary table. Click OK to validate your changes and close the editor. In the design workspace, double-click tLogRow to display its Basic settings and define the component properties. Click Sync Columns to retrieve the schema from the preceding component. Save your Job and press F6 to execute it.

The glossary displays on the console listing three columns holding: the key name in the first column, the English term in the second, and the corresponding French term in the third.
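The key-based lookup performed in this scenario can be sketched with java.util.Properties, which parses the same key = value layout. This is only an illustration, not the code generated by the Job: the class name and the sample keys and terms are hypothetical, and keys missing from the French file are kept with a null French term, which is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Illustrative sketch: joins two key = value files on their keys to build a glossary. */
class GlossarySketch {

    /** Parses key = value lines from a string (a file reader would work the same way). */
    static Properties load(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns key -> {EN, FR}, sorted by key; FR is null when the lookup has no match. */
    static Map<String, String[]> glossary(Properties english, Properties french) {
        Map<String, String[]> result = new TreeMap<>();
        for (String key : english.stringPropertyNames()) {
            result.put(key, new String[] { english.getProperty(key), french.getProperty(key) });
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical localization strings.
        Properties en = load("USER.NAME = Username\nPASS.NAME = Password\n");
        Properties fr = load("USER.NAME = Utilisateur\nPASS.NAME = Mot de passe\n");
        for (Map.Entry<String, String[]> e : glossary(en, fr).entrySet()) {
            System.out.println(e.getKey() + "|" + e.getValue()[0] + "|" + e.getValue()[1]);
        }
    }
}
```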

tFileInputRegex
tFileInputRegex properties
Component family: File/Input

Function: Powerful feature which can replace a number of other components of the File family. Requires some advanced knowledge of regular expression syntax.

Purpose: Opens a file and reads it row by row to split each row into fields using regular expressions. Then sends the fields as defined in the schema to the next Job component via a Row link.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name/Stream:
File name: Name of the file and/or the variable to be processed.
Stream: Data flow to be processed. The data must be added to the flow so that it can be collected by tFileInputRegex via the INPUT_STREAM variable in the autocompletion list (Ctrl+Space).
Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Row separator: String (for example, "\n" on Unix) used to distinguish rows.

Regex: This field is Perl or Java compatible and can contain multiple lines. Type in your regular expressions, including the subpatterns matching the fields to be extracted.
Note: In Java, backslashes need to be doubled in regular expressions.
Regex syntax is different in Java and Perl, and requires double and single quotes respectively.

Header: Number of rows to be skipped at the beginning of the file.

Footer: Number of rows to be skipped at the end of the file.

Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Skip empty rows: Select this check box to skip empty rows.

Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to read a file and separate the fields contained in this file according to the defined Regex. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.

Limitation: n/a

Scenario: Regex to Positional file


The following scenario creates a two-component Job that reads data from an input file using a regular expression and outputs the extracted fields into a positional file.

Drop a tFileInputRegex component from the Palette to the design workspace. Drop a tFileOutputPositional component the same way. Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link onto the tFileOutputPositional component and release when the plug symbol displays. Select the tFileInputRegex again so the Component view shows up, and define the properties:

The Job properties are built-in for this scenario. Hence, the properties are set for this component only. Fill in a path to the file in the File Name field. This field is mandatory. Define the Row separator identifying the end of a row. Then define the Regular expression used to delimit the fields of a row, which are to be passed on to the next component. You can type in a regular expression using Java or Perl code, and on multiple lines if needed.
Make sure to use the correct Regex syntax for the generation language in use, as the syntax differs between Java and Perl, and enclose the regex in double quotes (Java) or single quotes (Perl).
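For a Java Job, the two rules above (double quotes, doubled backslashes) and the role of subpatterns can be sketched with java.util.regex. The expression, the sample row and the class name below are hypothetical; the sketch simply returns null for a row that does not match and does not reproduce how the component itself handles such rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: each capturing group (subpattern) becomes one field of the row. */
class RegexFieldsSketch {

    // In Java source, every regex backslash is doubled: \d is written "\\d".
    static final Pattern ROW =
            Pattern.compile("^(\\d+);([A-Za-z ]+);(\\d{4}-\\d{2}-\\d{2})$");

    /** Returns the captured fields, or null when the row does not match the expression. */
    static List<String> extract(String row) {
        Matcher matcher = ROW.matcher(row);
        if (!matcher.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int group = 1; group <= matcher.groupCount(); group++) {
            fields.add(matcher.group(group));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("42;Jane Doe;2011-03-15"));
        System.out.println(extract("not a valid row"));
    }
}
```

The number of capturing groups should equal the number of columns in the schema, since each group feeds one column.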

In this expression, make sure you include all subpatterns matching the fields to be extracted. In this scenario, ignore the header, footer and limit fields. Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional component. You can load or create the schema through the Edit Schema function. Then define the second component properties:

Enter the Positional file output path. Enter the encoding standard of the output file. Note that, for the time being, the encoding consistency verification is not supported. Select the Schema type. Click Sync columns to automatically synchronize the schema with the input file schema. Now go to the Run tab, and click Run to execute the Job. The file is read row by row and split up into fields based on the Regular Expression definition. You can open the output file using any standard file editor.

tFileInputXML
tFileInputXML belongs to two component families: File and XML. For more information on tFileInputXML, see tFileInputXML on page 1592.

tFileList
tFileList properties
Component family: File/Management

Function: tFileList iterates on files or folders of a set directory.

Purpose: tFileList retrieves a set of files or folders based on a filemask pattern and iterates on each item.

Basic settings

Directory: Path to the directory where the files are stored.

FileList Type: Select the type of input you want to iterate on from the list: Files if the input is a set of files, Directories if the input is a set of directories, Both if the input is a set of the above two types.

Include subdirectories: Select this check box if the selected input source type includes sub-directories.

Case Sensitive: Set the case mode from the list to either create or not create a case sensitive filter on file names.

Generate Error if no file found: Select this check box to generate an error message if no files or directories are found.

Use Glob Expressions as Filemask (Unchecked means Perl5 Regex Expressions): This check box is selected by default. It filters the results using a Global Expression (Glob Expressions). Clear this check box to filter results using a Regex Expression of the type Perl5.

Files: Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a file name or a filemask using special characters or regular expressions.

Order by: The folders are listed first of all, then the files. You can choose to prioritise the folder and file order either:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent to most recent.
If ordering by file name, in the event of identical file names then modified date takes precedence. If ordering by file size, in the event of identical file sizes then file name takes precedence. If ordering by modified date, in the event of identical dates then file name takes precedence.

Order action: Either:
ASC: alphabetical order / smallest to largest / least recent to most recent.
DESC: reverse alphabetical order / largest to smallest / most recent to least recent.

Usage: tFileList provides a list of files or folders from a defined directory on which it iterates.

Global Variables:
Current File Name: Indicates the current file name. This is available as a Flow variable. Returns a string.
Current File Name with Path: Indicates the current file name as well as the path to the file. This is available as a Flow variable. Returns a string.
Current File Extension: Indicates the extension of the current file. This is available as a Flow variable. Returns a string.
Current File Directory: Indicates the access path to the folder or subfolder in which the current file is stored. This is available as a Flow variable. Returns a string.
Number of files: Indicates the number of files iterated upon so far. This is available as a Flow variable. Returns an integer.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections:
Outgoing links (from one component to another):
Row: Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another):
Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Iterating on a file directory


The following scenario creates a three-component Job, which aims at listing files from a defined directory, reading each file by iteration, selecting delimited data and displaying the output in the Run log console.

Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited, and tLogRow. Right-click on the tFileList component, and pull an Iterate connection to the tFileInputDelimited component. Then pull a Main row from the tFileInputDelimited to the tLogRow component. Double-click tFileList to display its Basic settings view and define its properties.

Browse to the Directory that holds the files you want to process. To display the path on the Job itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the Directory field. Type this label in the Label Format field, which you can find by clicking the View tab in the Basic settings view.

In the Basic settings view, from the FileList Type list, select the source type you want to process, Files in this example. In the Case sensitive list, select a case mode, Yes in this example, to create a case sensitive filter on file names. Keep the Use Glob Expressions as Filemask check box selected if you want to use glob expressions to filter files. In the Filemask field, define a file mask, using special characters if need be. Double-click tFileInputDelimited to display its Basic settings view and set its properties.

Fill in the File Name field using a variable containing the path of the current file, based on the directory you set in the Basic settings of tFileList. Press Ctrl+Space bar to access the autocomplete list of variables. Select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) if you work in Java, or $_globals{tFileList_1}{CURRENT_FILEPATH} if you work in Perl. This way, all files in the input directory can be processed. Fill in all other fields as detailed in the tFileInputDelimited section. Related topic: tFileInputDelimited properties on page 1054. Select the last component, tLogRow, to display its Basic settings view and fill in the separator to be used to distinguish field content displayed on the console. Related topic: tLogRow on page 1305.

The Job iterates on the defined directory, and reads all included files. Then delimited data is passed on to the last component which displays it on the console. For other scenarios using tFileList, see tFileCopy on page 1039.
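The iteration described in this scenario can be sketched in Java as a glob filter followed by a loop that publishes the current file path under the key used above. This is an illustration under stated assumptions, not Talend's generated code: globalMap is a plain HashMap standing in for the Job's variable map, the class, directory and file names are hypothetical, and glob case sensitivity follows the default file system of the platform.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: filter file names with a glob filemask, then iterate on them. */
class FileListSketch {

    /** Keeps the names matching the glob (e.g. "*.csv"), in alphabetical order. */
    static List<String> filter(List<String> fileNames, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        Collections.sort(kept);
        return kept;
    }

    /** Simulates the Iterate link: one pass per file, path shared through a map. */
    static List<String> iterate(String directory, List<String> fileNames, String glob) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for the Job's map
        List<String> processed = new ArrayList<>();
        for (String name : filter(fileNames, glob)) {
            globalMap.put("tFileList_1_CURRENT_FILEPATH", directory + "/" + name);
            // A downstream component would read the path exactly like this.
            processed.add((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"));
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b.csv", "a.csv", "notes.txt");
        System.out.println(iterate("/data/in", names, "*.csv"));
    }
}
```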

tFileOutputARFF
tFileOutputARFF properties
Component family: File/Output

Function: tFileOutputARFF outputs data to an ARFF file.

Purpose: This component writes an ARFF file that holds data organized according to the defined schema.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component Basic settings view. For more information about setting up and storing file connection parameters, see Setting up an XML file schema of Talend Open Studio User Guide.

File name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Attribute Define: Displays the schema you defined in the [Edit schema] dialog box.
Column: Name of the column.
Type: Data type.
Pattern: Enter the data model (pattern), if necessary.

Relation: Enter the name of the relation.

Append: Select this check box to add the new rows at the end of the file.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You can create the schema and store it locally for this component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Create directory if not exists: This check box is selected by default. It creates a directory to hold the output table if it does not exist.

Advanced settings

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component along with a Row link to collect data from another component and to re-write the data to an ARFF file.

Global Variables: The Global variables can be used as parameters in most of the fields found in the component properties view. To view these variables, place the cursor in the field and press Ctrl + Space. Double-click the variable to populate the field. The main global variable associated with tFileOutputARFF is:
Number of lines: Indicates the number of lines processed. This is available as an After variable.

Connections:
Outgoing links (from one component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if.
Incoming links (from one component to another):
Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Related scenarios
For a scenario related to tFileOutputARFF, see Scenario: Display the content of an ARFF file on page 1051.
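As a reminder of how the Relation and Attribute Define settings map onto the file, the general ARFF layout (a relation line, one attribute line per column, then the data rows) can be sketched in Java as follows. This is only an illustration of the format, not the component's generated code: the class name, the sample relation and the attribute names are hypothetical, the exact spacing written by tFileOutputARFF is not confirmed by this guide, and values are assumed to need no quoting.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the ARFF layout: @relation, @attribute lines, @data rows. */
class ArffWriterSketch {

    static String write(String relation, List<String> names, List<String> types,
                        List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int i = 0; i < names.size(); i++) {
            // One attribute declaration per schema column: name followed by type.
            out.append("@attribute ").append(names.get(i))
               .append(' ').append(types.get(i)).append('\n');
        }
        out.append("\n@data\n");
        for (List<String> row : rows) {
            out.append(String.join(",", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(write("customers",
                Arrays.asList("id", "name"),
                Arrays.asList("numeric", "string"),
                Arrays.asList(Arrays.asList("1", "Ann"), Arrays.asList("2", "Bob"))));
    }
}
```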

tFileOutputDelimited
tFileOutputDelimited properties
Component family: File/Output

Function: tFileOutputDelimited outputs data to a delimited file.

Purpose: This component writes a delimited file that holds data organized according to the defined schema.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

File name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Row Separator: String (for example, "\n" on Unix) to distinguish rows in the output file.

Field Separator: Character, string or regular expression to separate fields of the output file.

Append: Select this check box to add the new rows at the end of the file.

Include Header: Select this check box to include the column header in the file.

Compress as zip file: Select this check box to compress the output file in zip format.

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: You can create the schema and store it locally for this component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the output component.

Advanced settings

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options: Select this check box to take into account all parameters specific to CSV files, in particular Escape char and Text enclosure parameters.

Create directory if not exists: This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.

Split output in several files: In case of very big output files, select this check box to divide the output delimited file into several files.
Rows in each output file: set the number of lines in each of the output files.

Custom the flush buffer size: Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode: Writes in row mode.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write a delimited file and separate fields using a field separator value.

Limitation: n/a

Scenario: Writing data in a delimited file


This Java scenario describes a three-component Job that extracts certain data from a file holding information about customers, and then writes the extracted data to a delimited file. In the following example, we have already stored the input schema under the Metadata node in the Repository tree view. For more information about storing schema metadata in the Repository, see Setting up a File Delimited schema.

In the Repository tree view, expand Metadata and File delimited in succession and then browse to your input schema, customers, and drop it on the design workspace. A dialog box displays where you can select the component type you want to use.

Click tFileInputDelimited and then OK to close the dialog box. A tFileInputDelimited component holding the name of your input schema displays on the design workspace. Drop a tMap and a tFileOutputDelimited component from the Palette to the design workspace. Link the components together using Main Row links. Double-click tFileInputDelimited to open its Basic settings view. All its property fields are automatically filled in because you defined your input file locally.

If you do not define your input file locally in the Repository tree view, fill in the details manually after selecting Built-in in the Property type list. Click the three-dot button next to the File Name field and browse to the input file, customer.csv in this example. In the Row Separators and Field Separators fields, enter respectively "\n" and ";" as line and field separators. If needed, set the number of lines used as header and the number of lines used as footer in the corresponding fields and then set a limit for the number of processed rows. In this example, Header is set to 6 while Footer and Limit are not set. In the Schema field, schema is automatically set to Repository and your schema is already defined since you have stored your input file locally for this example. Otherwise, select Built-in and click Edit Schema to open a dialog box where you can define the input schema.

Click OK to close the dialog box. In the design workspace, double-click tMap to open its editor.


In the tMap editor, click the button on top of the panel to the right to open the [Add a new output table] dialog box.

Enter a name for the table you want to create, row2 in this example. Click OK to validate your changes and close the dialog box. In the table to the left, row1, select the first three lines (Id, CustomerName and CustomerAddress) and drop them to the table to the right. In the Schema editor view situated in the lower left corner of the tMap editor, change the type of RegisterTime to String in the table to the right.

Click OK to save your changes and close the editor. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and define the component properties.

In the Property Type field, set the type to Built-in and fill in the fields that follow manually. Click the three-dot button next to the File Name field and browse to the output file you want to write data in, customerselection.txt in this example. In the Row Separator and Field Separator fields, set "\n" and ";" respectively as row and field separators. Select the Include Header check box if you want to output column headers as well.

Click Edit schema to open the schema dialog box and verify that the retrieved schema corresponds to the input schema. If not, click Sync Columns to retrieve the schema from the preceding component. Save your Job and press F6 to execute it.

The three specified columns Id, CustomerName and CustomerAddress are output in the defined output file.
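The data flow of this scenario can be sketched in plain Java. This is not the code that Talend Open Studio generates; it is a minimal illustration, assuming semicolon-separated rows and a hypothetical class and method name, of skipping the header lines and keeping only the first three fields of each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: mimics the header skip and
// the three-column selection performed by the Job in this scenario.
class DelimitedSelectionSketch {

    // Skips `header` leading lines, keeps the first `keep` fields of each
    // remaining row, and joins them again with the same field separator.
    static List<String> selectColumns(List<String> lines, int header, String separator, int keep) {
        List<String> out = new ArrayList<>();
        for (int i = header; i < lines.size(); i++) {
            // The separator is quoted so it is not interpreted as a regex;
            // the -1 limit preserves trailing empty fields.
            String[] fields = lines.get(i).split(Pattern.quote(separator), -1);
            int n = Math.min(keep, fields.length);
            out.add(String.join(separator, Arrays.copyOf(fields, n)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data is invented for illustration only.
        List<String> input = Arrays.asList(
                "Id;CustomerName;CustomerAddress;RegisterTime",
                "1;Acme;12 Main St;2011-01-05",
                "2;Globex;9 Oak Ave;2011-02-17");
        for (String row : selectColumns(input, 1, ";", 3)) {
            System.out.println(row);
        }
    }
}
```

In the Job itself, the header count is set in the tFileInputDelimited component and the column selection is done in the tMap editor.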


tFileOutputEBCDIC

This component requires a Sun JDK to be functional.

tFileOutputEBCDIC properties
Component family: File/Output

Function: tFileOutputEBCDIC writes an EBCDIC file based on various source data files, each of them with a different schema.

Purpose: This component writes an EBCDIC file with data extracted from files based on their schemas.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Access path and name of the EBCDIC output file and/or the variable to be used. For further information, see How to centralize contexts and variables in Talend Open Studio.

Xc2j file: Select the xc2j transformation file.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write an EBCDIC file and to output the data separately depending on the schemas identified in the incoming file.

Scenario: Creating an EBCDIC file using two delimited files


This scenario uses the [Copybook Connection] wizard that guides users through the different steps to create a Copybook connection and to retrieve the EBCDIC schemas. This wizard is available only for Talend Integration Suite users. If you are using Talend Open Studio or Talend On Demand, you need to set the basic settings for the tFileInputEBCDIC component manually.

The following scenario is a three-component Job that aims at writing an EBCDIC-format file using two delimited files with different schemas.


This Java scenario uses the EBCDIC Connection wizard to set up a connection to the Copybook file and to generate an xc2j file, which allows the retrieval and transformation of the different file schemas.
Create a connection to the Copybook file, which describes the structure of your EBCDIC file. In this scenario, the Copybook connection is called EBCDIC.
Retrieve the file schemas.
Once the Copybook connection has been created and the schemas retrieved, using the EBCDIC and Schema wizards, the new schemas appear under the node Metadata > Copybook. They are called 01, 04 and 05.
To create an EBCDIC file based on two delimited files in Talend Open Studio:
Drop the following components from the Palette to the design workspace: tFileInputDelimited (x2) and tFileOutputEBCDIC.
To connect them together, right-click each tFileInputDelimited component, select Row > Main in the contextual menu and click the tFileOutputEBCDIC component.
Double-click the first tFileInputDelimited component to display the Basic settings view and set the component properties.

In the File Name field, browse to the delimited file via the three-dot button [...]. In the Schema field, select Repository, then click the three-dot button and, when prompted, select the schema corresponding to your file, under the Copybook node.


In the Header field, set the number of lines that are used as headers, 1 in this example. Set the properties for the second tFileInputDelimited component the same way as for the first component. Double-click the tFileOutputEBCDIC component to display the Basic settings view and set the component properties:

In the Data file field, enter or browse to the directory path and the name of the EBCDIC file that is to be created based on both delimited files. In the Xc2j file field, enter or browse to the path of the file used to extract the schema that describes the structure of the EBCDIC file. Save your Job via Ctrl+S, open the Run view, select the Statistics and Exec time check boxes, then click Run to execute the Job.
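To give an idea of what writing EBCDIC means at the byte level, the following sketch encodes a short string with an EBCDIC code page using the standard Java charset API. It is only an illustration: the code page IBM037 is an assumption (the code page actually needed depends on the target system), the class name is hypothetical, and the sketch does not cover the copybook-driven record layout that the xc2j file describes.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Talend: shows character-level EBCDIC
// encoding only, without any copybook or xc2j handling.
class EbcdicEncodingSketch {

    // Encodes text with an EBCDIC code page. "IBM037" is an assumption;
    // it must be available in the JDK that runs this code.
    static byte[] toEbcdic(String text) {
        return text.getBytes(Charset.forName("IBM037"));
    }

    // Renders bytes as upper-case hexadecimal for inspection.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In code page 037, 'A' is 0xC1, '1' is 0xF1 and a space is 0x40.
        System.out.println(toHex(toEbcdic("A1 ")));
    }
}
```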


tFileOutputExcel
tFileOutputExcel Properties
Component family: File/Output

Function: tFileOutputExcel outputs data to an MS Excel type of file.

Purpose: tFileOutputExcel writes an MS Excel file with separated data values according to a defined schema.

Basic settings

File name: Name or path to the output file. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Sheet name: Name of the xls sheet.

Include header: Select this check box to include a header row in the output file.

Append existing file: Select this check box to add the new lines at the end of the file.
Append existing sheet: Select this check box to add the new lines at the end of the Excel sheet.

Is absolute Y pos.: Select this check box to add information in specified cells:
First cell X: cell position on the X-axis (X-coordinate or abscissa).
First cell Y: cell position on the Y-axis (Y-coordinate).
Keep existing cell format: select this check box to retain the original layout and format of the cell you want to write into.

Font: Select in the list the font you want to use.

Define all columns auto size: Select this check box if you want the size of all your columns to be defined automatically. Otherwise, select the Auto size check boxes next to the names of the columns whose size you want to be defined automatically.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Advanced settings

Create directory if not exists: This check box is selected by default. This option creates the directory that will hold the output files if it does not already exist.

Advanced separator (for numbers): Select this check box to modify the separators you want to use for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write an MS Excel file with data passed on from other components using a Row link.

Limitation: n/a

Related scenario
For tFileOutputExcel related scenario, see tSugarCRMInput on page 119.


tFileOutputJSON
tFileOutputJSON properties
Component family: File

Function: tFileOutputJSON writes data to a JSON structured output file.

Purpose: tFileOutputJSON receives data and rewrites it in a JSON structured data block in an output file.

Basic settings

File Name: Name and path of the output file.

Name of data block: Enter a name for the data block to be written, between double quotation marks.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the Output component.

Advanced settings

Create directory if not exists: This check box is selected by default. This option creates the directory that will hold the output files if it does not already exist.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to rewrite received data in a JSON structured output file.

Limitation: n/a

Scenario: Writing a JSON structured file


This is a two-component scenario in which a tRowGenerator component generates random data which a tFileOutputJSON component then writes to a JSON structured output file.

Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the Palette. Link the components using a Row > Main connection. Double-click tRowGenerator to define its Basic Settings properties in the Component view.

If the schema you require is already stored under the Db Connections node in the Repository, select Repository in the Schema field and choose the metadata from the list.
Otherwise, click [...] next to Edit Schema to display the corresponding dialog box and define the schema.

Click [+] to add the number of columns desired. Under Columns type in the column names. Under Type, select the data type from the list. Click OK to close the dialog box. Click [+] next to RowGenerator Editor to open the corresponding dialog box.


Under Functions, select pre-defined functions for the columns, if required, or select [...] to set customized function parameters in the Function parameters tab. Enter the number of rows to be generated in the corresponding field. Click OK to close the dialog box. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

Click [...] to browse to where you want the output JSON file to be generated and enter the file name. Enter a name for the data block to be generated in the corresponding field, between double quotation marks. Select Built-In as the Schema type. Click Sync Columns to retrieve the schema from the preceding component. Press F6 to run the Job.


The data from the input schema is written in a JSON structured data block in the output file.
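As a rough illustration of a named JSON data block, the sketch below serializes a list of rows under a block name. The exact layout that tFileOutputJSON produces is not shown in this guide, so the structure used here (an object with one array of row objects) is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Talend: hand-rolled JSON output for rows.
class JsonBlockSketch {

    // Wraps a string in double quotes and escapes characters that must
    // not appear raw inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Renders rows as {"blockName":[{...},{...}]}. Numbers are written bare,
    // null as null, and every other value as a quoted string.
    static String toJson(String blockName, List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("{").append(quote(blockName)).append(":[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, Object> e : rows.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append(quote(e.getKey())).append(':');
                Object v = e.getValue();
                if (v == null) sb.append("null");
                else if (v instanceof Number) sb.append(v);
                else sb.append(quote(v.toString()));
            }
            sb.append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Sample row invented for illustration only.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "Ann");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(toJson("data", rows));
    }
}
```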


tFileOutputLDIF
tFileOutputLDIF Properties
Component family: File/Output

Function: tFileOutputLDIF outputs data to an LDIF type of file which can then be loaded into an LDAP directory.

Purpose: tFileOutputLDIF writes or modifies an LDIF file with data separated in respective entries based on the schema defined, or else deletes content from an LDIF file.

Basic settings

File name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Wrap: Wraps the file content, every defined number of characters.

Change type: Select Add, Modify or Delete to respectively create an LDIF file, modify or remove an existing LDIF file. In case of modification, set the type of attribute changes to be made.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the Output component.

Append: Select this check box to add the new rows at the end of the file.

Advanced settings

Create directory if not exists: This check box is selected by default. It creates the directory that holds the output file, if it does not already exist.

Custom the flush buffer size: Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write an LDIF file with data passed on from other components using a Row link.

Limitation: n/a

Scenario: Writing DB data into an LDIF-type file


This scenario describes a two-component Job which aims at extracting data from a database table and writing this data into a new output LDIF file.

Drop a tMysqlInput and a tFileOutputLDIF component from the Palette to the design area. Connect them together using a Row > Main link. Select the tMysqlInput component, go to the Component panel, then select the Basic settings tab. If you stored the DB connection details in a Metadata entry in the Repository, set the Property type as well as the Schema type to Repository and select the relevant metadata entry. All other fields are filled in automatically from the parameters stored in the metadata.


Alternatively, select Built-in as the Property type and Schema type and fill in the DB connection and schema fields manually. Then double-click tFileOutputLDIF and define the Basic settings. Browse to the folder where you store the output file. In this use case, a new LDIF file is to be created, so type in the name of this new file. In the Wrap field, enter the number of characters held on one line. The text coming afterwards is wrapped onto the next line.

Select Add as Change Type, as the newly created file is by definition empty. If you select the Modify change type, you'll need to define the nature of the modification you want to make to the file. As the Schema type, select Built-in and use the Sync Columns button to retrieve the input schema definition. Press F6 to run the Job.


The LDIF file created contains the data from the DB table and the type of change made to the file, in this use case, addition.
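The shape of such an entry can be sketched as follows. This is a minimal illustration based on the general LDIF format, where a change record carries a changetype line and long lines are folded onto continuation lines that start with a single space. The class name, the sample DN and the attribute names are hypothetical, and the sketch does not base64-encode values that would require it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Talend: builds one LDIF "add" record.
class LdifEntrySketch {

    // Folds a line every `wrap` characters. Continuation lines begin with
    // one space, which counts toward the line width.
    static String fold(String line, int wrap) {
        if (wrap < 2 || line.length() <= wrap) {
            return line;
        }
        StringBuilder sb = new StringBuilder(line.substring(0, wrap));
        int pos = wrap;
        while (pos < line.length()) {
            int end = Math.min(line.length(), pos + wrap - 1);
            sb.append("\n ").append(line, pos, end);
            pos = end;
        }
        return sb.toString();
    }

    // Builds a record with changetype "add"; a blank line ends the entry.
    static String addEntry(String dn, Map<String, String> attributes, int wrap) {
        StringBuilder sb = new StringBuilder();
        sb.append(fold("dn: " + dn, wrap)).append('\n');
        sb.append("changetype: add\n");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(fold(e.getKey() + ": " + e.getValue(), wrap)).append('\n');
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Sample values invented for illustration only.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("cn", "Ann");
        attrs.put("mail", "ann@example.com");
        System.out.print(addEntry("cn=Ann,dc=example,dc=com", attrs, 78));
    }
}
```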


tFileOutputMSDelimited
tFileOutputMSDelimited properties
Component family: File/Output

Function: tFileOutputMSDelimited writes multiple schemas in a delimited file.

Purpose: tFileOutputMSDelimited creates a complex multi-structured delimited file, using data structures (schemas) coming from several incoming Row flows.

Basic settings

File Name: Name and path to the file to be created and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Row Separator: String (ex: "\n" on Unix) to distinguish rows.

Field Separator: Character, string or regular expression to separate fields.

Use Multi Field Separators: Select this check box to set a different field separator for each of the schemas using the Field separator field in the Schemas area.

Schemas: The table gets automatically populated by schemas coming from the various incoming rows connected to tFileOutputMSDelimited. Fill out the dependency between the various schemas:
Parent row: Type in the parent flow name (based on the Row name transferring the data).
Parent key column: Type in the key column of the parent row.
Key column: Type in the key column for the selected row.

Advanced settings

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options: Select this check box to take into account all parameters specific to CSV files, in particular Escape char and Text enclosure parameters.

Create directory if not exists: This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write a multi-schema delimited file and separate fields using a field separator value.
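The effect of the Thousands separator and Decimal separator options can be illustrated with the standard java.text API. How the component applies these settings internally is not described in this guide, so the sketch below is only an assumption of the intended result; the class name and the number pattern are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of Talend: formats a number with custom
// thousands and decimal separators.
class NumberSeparatorSketch {

    static String format(double value, char thousands, char decimal) {
        // Start from locale-neutral symbols, then override both separators.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousands);
        symbols.setDecimalSeparator(decimal);
        // Groups of three digits and exactly two decimals.
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.891, '.', ',')); // prints 1.234.567,89
    }
}
```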


Related scenarios
No scenario is available for this component yet.


tFileOutputMSPositional
tFileOutputMSPositional properties
Component family: File/Output

Function: tFileOutputMSPositional writes multiple schemas in a positional file.

Purpose: tFileOutputMSPositional creates a complex multi-structured file, using data structures (schemas) coming from several incoming Row flows.

Basic settings

File Name: Name and path to the file to be created and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Row separator: String (ex: "\n" on Unix) to distinguish rows.

Schemas: The table gets automatically populated by schemas coming from the various incoming rows connected to tFileOutputMSPositional. Fill out the dependency between the various schemas:
Parent row: Type in the parent flow name (based on the Row name transferring the data).
Parent key column: Type in the key column of the parent row.
Key column: Type in the key column for the selected row.
Pattern: Type in the pattern that positions the fields separator for each incoming row.
Padding char: Type in the padding character to be used.
Alignment: Select the relevant alignment parameter.

Advanced settings

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Create directory if not exists: This check box is selected by default. It creates the directory that holds the output file, if it does not already exist.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write a multi-schema positional file and separate fields using a position separator value.

Related scenario
No scenario is available for this component yet.


tFileOutputMSXML
tFileOutputMSXML Properties
Component family: File/Output

Function: tFileOutputMSXML writes multiple schemas within an XML structured file.

Purpose: tFileOutputMSXML creates a complex multi-structured XML file, using data structures (schemas) coming from several incoming Row flows.

Basic settings

File Name: Name and path to the file to be created and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Configure XML tree: Opens the dedicated interface to help you set the XML mapping. For details about the interface, see Defining the MultiSchema XML tree on page 1137.

Advanced settings

Create directory only if not exists: This check box is selected by default. It creates the directory that holds the output file, if it does not already exist.

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Limitation: n/a

Defining the MultiSchema XML tree


Double-click on the tFileOutputMSXML component to open the dedicated interface or click on the three-dot button on the Basic settings vertical tab of the Component tab.


To the left of the mapping interface, under Linker source, the drop-down list includes all the input schemas that should be added to the multi-schema output XML file (on the condition that more than one input flow is connected to the tFileOutputMSXML component). Under Schema List, all the columns retrieved from the selected input data flow are listed.
To the right of the interface, define all the XML structures you want to create in the output XML file. You can create the XML structures manually or import them. Then map the input schema columns onto each element of the XML tree, for each of the input schemas selected under Linker source.

Importing the XML tree

The easiest and most common way to fill out the XML tree panel is to import a well-formed XML file. Rename the root tag that displays by default on the XML tree panel, by clicking on it once. Right-click the root tag to display the contextual menu. On the menu, select Import XML tree. Browse to the file to import and click OK.


The XML Tree column is then automatically filled out with the correct elements. You can remove and insert elements or sub-elements from and to the tree: Select the relevant element of the tree. Right-click to display the contextual menu. Select Delete to remove the selection from the tree, or select the relevant option among Add sub-element, Add attribute and Add namespace to enrich the tree.


Creating the XML tree manually

If you don't have any XML structure already defined, you can create it manually. Rename the root tag that displays by default on the XML tree panel, by clicking on it once. Right-click the root tag to display the contextual menu. On the menu, select Add sub-element to create the first element of the structure. You can also add an attribute or a child element to any element of the tree or remove any element from the tree. Select the relevant element on the tree you just created. Right-click to the left of the element name to display the contextual menu. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace or Delete.

Mapping XML data from multiple schema sources


Once your XML tree is ready, select the first input schema that you want to map. You can map each input column with the relevant XML tree element or sub-element to fill out the Related Column: Click one of the Schema column names. Drag it onto the relevant sub-element to the right. Release the mouse button to implement the actual mapping.

A light blue link displays that illustrates this mapping. If available, use the Auto-Map button, located to the bottom left of the interface, to carry out this operation automatically. You can disconnect any mapping on any element of the XML tree: Select the element of the XML tree, that should be disconnected from its respective schema column. Right-click to the left of the element name to display the contextual menu.


Select Disconnect link. The light blue link disappears.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop element for each of the selected sources and, if required, the group element.

Loop element

The loop element allows you to define the iterating object. Generally the loop element is also the row generator. To define an element as loop element: Select the relevant element on the XML tree. Right-click to the left of the element name to display the contextual menu. Select Set as Loop Element.

The Node Status column shows the newly added status.


There can only be one loop element at a time.

Group element

The group element is optional. It represents a constant element on which the Groupby operation can be performed. A group element can be defined on the condition that a loop element was defined before. When using a group element, the rows should be sorted, in order to be able to group by the selected node. To define an element as group element:


Select the relevant element on the XML tree. Right-click to the left of the element name to display the contextual menu. Select Set as Group Element.

The Node Status column shows the newly added status, and any other group status required is defined automatically. Click OK once the mapping is complete to validate the definition for this source and perform the same operation for the other input flow sources.
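The relationship between group and loop elements, and the reason why rows should be sorted, can be sketched as follows. This is a simplified illustration and not the component's implementation: the element names root, group and item are hypothetical, each row is reduced to a group key and a value, and no XML escaping is performed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend: one group element per run of
// identical keys, one loop element per row.
class GroupLoopSketch {

    // Each row is {groupKey, value}. A new group element is opened whenever
    // the key changes, so unsorted input yields repeated groups.
    static String toXml(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<root>");
        String current = null;
        for (String[] row : rows) {
            if (!row[0].equals(current)) {
                if (current != null) {
                    sb.append("</group>");
                }
                sb.append("<group key=\"").append(row[0]).append("\">");
                current = row[0];
            }
            // The loop element is emitted once per incoming row.
            sb.append("<item>").append(row[1]).append("</item>");
        }
        if (current != null) {
            sb.append("</group>");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) {
        // Sample rows invented for illustration only, sorted by key.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"A", "x"});
        rows.add(new String[] {"A", "y"});
        rows.add(new String[] {"B", "z"});
        System.out.println(toXml(rows));
    }
}
```

With rows sorted by key, each key produces a single group element; if the same rows arrived in the order A, B, A, the key A would open two separate group elements.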

Related scenario
No scenario is available for this component yet.


tFileOutputPositional
tFileOutputPositional Properties
Component family: File/Output

Function: tFileOutputPositional writes a file row by row according to the length and the format of the fields or columns in a row.

Purpose: It writes a file row by row, according to the data structure (schema) coming from the input flow.

Basic settings

Property type: Either Built-in or Repository.
Built-in: No property data stored centrally.
Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.

File Name: Name or path to the file to be processed and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Row separator: String (ex: "\n" on Unix) to distinguish rows in the output file.

Append: Select this check box to add the new rows at the end of the file.

Include header: Select this check box to include the column header in the file.

Compress as zip file: Select this check box to compress the output file in zip format.

Formats: Customize the positional file data format and fill in the columns in the Formats table.
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in, between quotes, the padding character used. A space by default.
Alignment: Select the appropriate alignment parameter.
Keep: If the data in the column or in the field are too long, select the part you want to keep.

Advanced settings

Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Use byte length as the cardinality: Select this check box to add support of double-byte characters to this component. JDK 1.6 is required for this feature.

Create directory if not exists: This check box is selected by default. It creates a directory to hold the output file if it does not exist.

Custom the flush buffer size: Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode: Writes in row mode.

Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select this check box if you do not want to generate empty files.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write a file whose fields are positioned according to the specified format.
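The interplay of the Size, Padding char, Alignment and Keep settings can be illustrated with a small helper. This is a sketch under assumptions: it covers only left or right alignment and keeping the left or right part of an overlong value, which may not match the full list of options offered by the component, and the class and parameter names are hypothetical.

```java
// Hypothetical helper, not part of Talend: fits one value into a
// fixed-width positional field.
class PositionalFieldSketch {

    // Returns a string of exactly `size` characters.
    // alignRight: pad on the left so the value sits at the right edge.
    // keepLeft: when the value is too long, keep its leading part;
    //           otherwise keep its trailing part.
    static String fit(String value, int size, char pad, boolean alignRight, boolean keepLeft) {
        if (value.length() > size) {
            return keepLeft
                    ? value.substring(0, size)
                    : value.substring(value.length() - size);
        }
        StringBuilder padding = new StringBuilder();
        for (int i = value.length(); i < size; i++) {
            padding.append(pad);
        }
        return alignRight ? padding + value : value + padding;
    }

    public static void main(String[] args) {
        // Sample values invented for illustration only.
        String row = fit("42", 5, '0', true, true)
                + fit("Ann", 8, ' ', false, true)
                + fit("Alexander", 4, ' ', false, true);
        System.out.println("[" + row + "]");
    }
}
```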

Related scenario
For a related scenario, see: Scenario: From Positional to XML file on page 1095.


tFileOutputProperties
tFileOutputProperties properties
Component family: File/Output

Function: tFileOutputProperties writes a configuration file of the type .ini or .properties.

Purpose: tFileOutputProperties writes a configuration file containing text data organized according to the model key = value.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository but for this component, the schema is read-only. It is made of two columns, Key and Value, corresponding to the parameter name and the parameter value to be copied.

File format: Select the file format from the list: either .properties or .ini.
.properties: data in the configuration file is written in two lines and structured according to the following way: key = value.
.ini: data in the configuration file is written in two lines and structured according to the following way: key = value and re-grouped in sections.
Section Name: enter the section name on which the iteration is based.

File Name: Name or path to the file to be processed and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.

Advanced settings

Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: Use this component to write files where data is organized according to the structure key = value.

Related scenarios
For a related scenario, see Scenario: Reading and matching the keys and the values of different .properties files and outputting the results in a glossary on page 1099 of the tFileInputProperties component.
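
The key = value layout described in the properties above can be illustrated with a short Java sketch. It is not the code generated by the component: the class name KeyValueFileSketch and its two methods are hypothetical, and the sketch assumes that each row carries one Key and one Value and that no character escaping is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class KeyValueFileSketch {

    // .properties layout: one "key = value" line per (Key, Value) row.
    static String toPropertiesText(Map<String, String> rows) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            out.append(row.getKey()).append(" = ").append(row.getValue()).append('\n');
        }
        return out.toString();
    }

    // .ini layout: the same lines, grouped under a [section] header.
    static String toIniText(String sectionName, Map<String, String> rows) {
        return "[" + sectionName + "]\n" + toPropertiesText(rows);
    }

    public static void main(String[] args) {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("host", "localhost");
        rows.put("port", "8080");
        System.out.print(toIniText("server", rows));
    }
}
```

A LinkedHashMap is used so that the rows are written in the order in which they arrive.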


tFileOutputXML
tFileOutputXML belongs to two component families: File and XML. For more information on tFileOutputXML, see tFileOutputXML on page 1599.


tFileProperties
tFileProperties Properties
Component family: File/Management

Function: tFileProperties creates a single row flow that displays the properties of the processed file.
Purpose: tFileProperties obtains information about the main properties of a defined file.

Basic settings
Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Edit schema: The number of the read-only lines is different between Java and Perl.
File: Name or path to the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Calculate MD5 Hash: Select this check box to check the MD5 of the downloaded file.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as a standalone component.

Connections
Outgoing links (from one component to another): Row: Main; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Displaying the properties of a processed file


This Java scenario describes a very simple Job that displays the properties of the specified file. Drop a tFileProperties component and a tLogRow component from the Palette onto the design workspace. Right-click on tFileProperties and connect it to tLogRow using a Main Row link.

In the design workspace, select tFileProperties. Click the Component tab to define the basic settings of tFileProperties.

Set Schema type to Built-In. If desired, click the Edit schema button to see the read-only columns. In the File field, enter the file path or browse to the file you want to display the properties for.


In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Press F6 to execute the Job.

The properties of the defined file are displayed on the console.
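
As a rough illustration of the kind of information such a Job prints, the following Java sketch gathers a few file properties and an MD5 hash into a single row. The class FilePropertiesSketch, the choice of properties and the "|" separator are assumptions made for this example; they are not the component's actual read-only schema.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class FilePropertiesSketch {

    // Returns the MD5 digest of the given bytes as a lowercase hexadecimal string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Builds one row: absolute path, file name, size in bytes, last modification time, MD5.
    static String describe(Path file) throws IOException {
        return file.toAbsolutePath() + "|" + file.getFileName() + "|" + Files.size(file)
                + "|" + Files.getLastModifiedTime(file).toMillis()
                + "|" + md5Hex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(tmp));
        Files.delete(tmp);
    }
}
```

Reading the whole file into memory keeps the sketch short; for large files a streaming digest would be preferable.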


tFileRowCount
tFileRowCount properties
Component family: File/Management

Function: tFileRowCount counts the number of rows in a file.
Purpose: tFileRowCount opens a file and reads it row by row in order to determine the number of rows inside.

Basic settings
File Name: Name and path of the file to be processed and/or the variable to be used. See also: How to define variables from the Component view in the Talend Open Studio User Guide.
Row separator: String (ex: "\n" on Unix) to distinguish rows in the file.
Ignore empty rows: Select this check box to ignore the empty rows while the component is counting the rows in the file.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: tFileRowCount is a standalone component; it must be used with an OnSubjobOk connection to tJava.

Global Variables
Number of counted lines: Returns the number of rows in a file. This is available as a Flow variable. Returns an integer.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another): Row: Main; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a


Related scenario
No scenario is available for this component yet.
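
In the absence of a scenario, the counting rule can be sketched in plain Java. The class RowCountSketch and its countRows method are hypothetical names. The sketch works on text already held in memory rather than reading a file row by row, and it assumes that the text following a final row separator does not count as an extra row.

```java
import java.util.regex.Pattern;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class RowCountSketch {

    // Counts the rows of `content` delimited by `rowSeparator`.
    // When ignoreEmptyRows is true, rows with no characters are not counted.
    static int countRows(String content, String rowSeparator, boolean ignoreEmptyRows) {
        if (content.isEmpty()) {
            return 0;
        }
        // Limit -1 keeps trailing empty strings so they can be handled explicitly.
        String[] rows = content.split(Pattern.quote(rowSeparator), -1);
        int count = 0;
        for (int i = 0; i < rows.length; i++) {
            boolean afterFinalSeparator = (i == rows.length - 1) && rows[i].isEmpty();
            if (afterFinalSeparator) {
                continue; // nothing follows the last separator, so this is not a row
            }
            if (ignoreEmptyRows && rows[i].isEmpty()) {
                continue;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRows("a\n\nb\n", "\n", true));
    }
}
```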


tFileTouch
tFileTouch properties
Component family: File/Management

Function: tFileTouch creates an empty file.
Purpose: This component creates an empty file, and creates the destination directory if it does not exist.

Basic settings
File Name: Path and name of the file to be created and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Create directory if not exists: This check box is selected by default. It creates a directory to hold the output table if it does not exist.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: This component can be used as a standalone component.

Connections
Outgoing links (from one component to another): Row: Main. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Related scenario
No scenario is available for this component yet.
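
The behavior described in the properties above (create an empty file and, optionally, its missing directory) can be approximated with the standard java.nio.file API. This is a minimal sketch: FileTouchSketch and touch are hypothetical names, and leaving an existing file untouched is an assumption of the example, not documented behavior of the component.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class FileTouchSketch {

    // Creates an empty file. When createDirs is true, missing parent directories
    // are created first. Returns true if a new file was created, false if the
    // file already existed (assumption: an existing file is left as it is).
    static boolean touch(Path file, boolean createDirs) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (createDirs && parent != null) {
            Files.createDirectories(parent);
        }
        if (Files.exists(file)) {
            return false;
        }
        Files.createFile(file);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("touch");
        System.out.println(touch(dir.resolve("out").resolve("empty.txt"), true));
    }
}
```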


tFileUnarchive
tFileUnarchive Properties
Component family: File/Management

Function: Decompresses the archive file provided as parameter and puts it in the extraction directory.
Purpose: Unarchives a file of any format (zip, rar...) that is most likely to be processed.

Basic settings
Archive file: File path to the archive.
Extraction Directory: Folder where the unzipped file(s) will be put.
Use archive name as root directory (Java only): Select this check box to create a folder named as the archive, if it does not exist, under the specified directory and extract the zipped file(s) to that folder.
Use Command line tools (Perl only): Select this check box to use another unarchiving tool than the one provided by default in the Perl package.
Check the integrity before unzip (Java only): Select this check box to run an integrity check before unzipping the archive.
Extract file paths (Java only): Select this check box to reproduce the file path structure zipped in the archive.
Need a password (Java only): Select this check box and provide the correct password if the archive to be unzipped is password protected. Note that the encrypted archive must be one created by the tFileArchive component; otherwise you will see error messages or get nothing extracted even if no error message is displayed.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: This component can be used as a standalone component but it can also be used within a Job as a Start component using an Iterate link.

Global Variables
Current File: Retrieves the name of the decompressed archive file. This is available as a Flow variable. Returns a string.
Current File Path: Retrieves the path to the decompressed archive file. This is available as a Flow variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Iterate. Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Related scenario
For tFileUnarchive related scenario, see tFileCompare on page 1036.
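
To make the effect of the Use archive name as root directory and Extract file paths options more concrete, here is a minimal Java sketch based on java.util.zip. It handles unencrypted zip archives only, and the class UnarchiveSketch, the unzip method and its parameters are hypothetical names chosen for this example; the component's own implementation may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class UnarchiveSketch {

    // Extracts a zip archive into targetDir and returns the folder actually used.
    // useArchiveNameAsRoot: extract into a subfolder named after the archive.
    // extractPaths: reproduce the folder structure stored in the archive;
    //               when false, all files are written directly into the root.
    static Path unzip(Path archive, Path targetDir, boolean useArchiveNameAsRoot,
                      boolean extractPaths) throws IOException {
        Path root = targetDir;
        if (useArchiveNameAsRoot) {
            String name = archive.getFileName().toString();
            int dot = name.lastIndexOf('.');
            root = targetDir.resolve(dot > 0 ? name.substring(0, dot) : name);
        }
        root = root.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String entryName = entry.getName();
                String relative = extractPaths
                        ? entryName
                        : entryName.substring(entryName.lastIndexOf('/') + 1);
                Path dest = root.resolve(relative).normalize();
                // Refuse entries such as "../x" that would land outside the root.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes the extraction directory: " + entryName);
                }
                Files.createDirectories(dest.getParent());
                Files.copy(zip, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return root;
    }
}
```

For example, calling unzip on an archive named sample.zip with useArchiveNameAsRoot set to true extracts the files to a sample folder under the target directory.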


tGPGDecrypt
tGPGDecrypt Properties
Component family: File/Management

Function: Decrypts a GnuPG-encrypted file and saves the decrypted file in the specified target directory.
Purpose: This component calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the decrypted file in the specified directory.

Basic settings
Input encrypted file: File path to the encrypted file.
Output decrypted file: File path to the output decrypted file.
GPG binary path: File path to the GPG command.
Secret key (Perl only): Enter your secret key.
Passphrase: Enter the passphrase used in encrypting the specified input file.
No TTY Terminal (Java only): Select this check box to specify that no TTY terminal is used by adding the --no-tty option to the decryption command.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: This component can be used as a standalone component.
Limitation: n/a

Scenario: Decrypt a GnuPG-encrypted file and display its content


The following scenario describes a three-component Job that decrypts a GnuPG-encrypted file and displays the content of the decrypted file on the Run console.


Drop a tGPGDecrypt component, a tFileInputDelimited component, and a tLogRow component from the Palette to the design workspace. Connect the tGPGDecrypt component to the tFileInputDelimited component using a Trigger > OnSubjobOk link, and connect the tFileInputDelimited component to the tLogRow component using a Row > Main link. Double-click the tGPGDecrypt to open its Component view and set its properties:

In the Input encrypted file field, browse to the file to be decrypted. In the Output decrypted file field, enter the path to the decrypted file. In the GPG binary path field, browse to the GPG command file. In the Passphrase field, enter the passphrase used when encrypting the input file. Double-click the tFileInputDelimited component to open its Component view and set its properties:


Use the Built-In property type for this scenario.
In the File name/Stream field, define the path to the decrypted file, which is the output path you have defined in the tGPGDecrypt component.
In the Header, Footer and Limit fields, define respectively the number of rows to be skipped at the beginning of the file, the number of rows to be skipped at the end of the file, and the number of rows to be processed.
Use a Built-In schema. This means that it is available for this Job only.
Click Edit schema and edit the schema for the component. Click the [+] button twice to add two columns that you will call idState and labelState.
Click OK to validate your changes and close the editor.

Double-click the tLogRow component and set its properties:


Use a Built-In schema for this scenario. In the Mode area, define the console display mode according to your preference. In this scenario, select Table (print values in cells of a table). Save your Job and press F6 to run it.

The specified file is decrypted and the defined number of rows of the decrypted file are printed on the Run console.
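
The component properties state that decryption relies on the gpg -d command, optionally with --no-tty. The sketch below shows one way such a command line could be assembled in Java. How the component actually builds its command is not documented here, so the argument order and the use of the --batch, --passphrase and --output options are assumptions; GpgCommandSketch and buildDecryptCommand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class GpgCommandSketch {

    // Assembles the argument list for a gpg decryption call.
    // Assumption: the passphrase is passed with --passphrase in batch mode.
    static List<String> buildDecryptCommand(String gpgPath, String passphrase,
                                            String inputFile, String outputFile,
                                            boolean noTty) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gpgPath);
        if (noTty) {
            cmd.add("--no-tty"); // corresponds to the No TTY Terminal check box
        }
        cmd.add("--batch");
        cmd.add("--passphrase");
        cmd.add(passphrase);
        cmd.add("--output");
        cmd.add(outputFile);
        cmd.add("-d");
        cmd.add(inputFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Prints the command instead of running it; the paths are placeholders.
        System.out.println(String.join(" ",
                buildDecryptCommand("gpg", "secret", "in.gpg", "out.txt", true)));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder to run the command. Note that a passphrase given on the command line may be visible to other users of the machine, and that recent GnuPG versions may require additional options such as --pinentry-mode loopback.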


tPivotToColumnsDelimited
tPivotToColumnsDelimited Properties
Component family: File/Output

Function: tPivotToColumnsDelimited outputs data based on an aggregation operation carried out on a pivot column.
Purpose: tPivotToColumnsDelimited is used to fine-tune the selection of data to output.

Basic settings
Pivot column: Select the column from the incoming flow that will be used as pivot for the aggregation operation.
Aggregation column: Select the column from the incoming flow that contains the data to be aggregated.
Aggregation function: Select the function to be used in case several values are available for the pivot column.
Group by: Define the aggregation sets, the values of which will be used for calculations. Input Column: Match the input column label with your output columns, in case the output label of the aggregation set needs to be different.
File Name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Field separator: Character, string or regular expression to separate fields of the output file.
Row separator: String (ex: "\n" on Unix) to distinguish rows in the output file.

Usage: This component requires an input flow.
Limitation: n/a

Scenario: Using a pivot column to aggregate data


The following scenario describes a Job that aggregates data from a delimited input file, using a defined pivot column.


Drop the following components from the Palette to the design workspace: tFileInputDelimited, tPivotToColumnsDelimited.
The file to use as input is made of three columns: ID, Question and the corresponding Answer.

On your design workspace, select the tFileInputDelimited component.
Define the basic settings on the Component view.

Browse to the input file to fill out the File Name field.
Define the Row and Field separators, in this example a carriage return and a semicolon respectively.
As the file contains a header line, define it also.
Set the schema describing the three columns: ID, Questions, Answers.
Then select the tPivotToColumnsDelimited and set its properties on the Basic Settings tab of the Component view.


In the Pivot column field, select the pivot column from the input schema. This is often the column presenting most duplicates (pivot aggregation values).
In the Aggregation column field, select the column from the input schema that should be aggregated.
In the Aggregation function field, select the function to be used in case duplicates are found.
In the Group by table, add an Input column that will be used to group by the aggregation column.
In the File Name field, browse to the output file path. In the Row and Field separator fields, set the separators for the aggregated output rows and data.
Then press F6 to execute the Job. The output file shows the newly aggregated data.
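
The pivot operation configured in this scenario can be pictured with the following Java sketch. It is only an approximation: the class PivotSketch is hypothetical, each input row is assumed to hold the group-by value (ID), the pivot value (Question) and the aggregated value (Answer) in that order, the aggregation function is assumed to keep the last value seen, and writing a header line is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class PivotSketch {

    // rows: each element is {groupValue, pivotValue, aggregatedValue}.
    // Returns one output line per group, with one column per distinct pivot value.
    static List<String> pivot(List<String[]> rows, String groupHeader, String fieldSeparator) {
        Set<String> pivotValues = new LinkedHashSet<>();
        Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            pivotValues.add(row[1]);
            // "last" aggregation: a later value for the same cell replaces the earlier one.
            grouped.computeIfAbsent(row[0], k -> new HashMap<>()).put(row[1], row[2]);
        }
        List<String> lines = new ArrayList<>();
        lines.add(groupHeader + fieldSeparator + String.join(fieldSeparator, pivotValues));
        for (Map.Entry<String, Map<String, String>> group : grouped.entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey());
            for (String pivotValue : pivotValues) {
                // Cells with no value for this group are left empty.
                line.append(fieldSeparator).append(group.getValue().getOrDefault(pivotValue, ""));
            }
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With the rows (1, Q1, yes), (1, Q2, no) and (2, Q1, maybe) and a semicolon as field separator, the sketch produces the lines ID;Q1;Q2, then 1;yes;no, then 2;maybe; with an empty last cell.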


Internet components
This chapter details the main components which belong to the Internet family in the Talend Open Studio Palette. The Internet family comprises all of the components which help you to access information via the Internet, through various means including Web services, RSS flows, SCP, MOM, Emails, FTP etc.


tFileFetch
tFileFetch properties
Component family: Internet

Function: tFileFetch retrieves a file via a defined protocol.
Purpose: tFileFetch allows you to retrieve file data according to the protocol which is in place.

Basic settings
Protocol: Select the protocol you want to use from the list and fill in the corresponding fields: http, https, ftp, smb. The properties differ slightly depending on the type of protocol selected. The additional fields are defined in this table, after the basic settings.
URI: Type in the URI of the site from which the file is to be fetched.
Use cache to save resource: Select this check box to save the data in the cache. This option allows you to process the file data flow (in streaming mode) without saving it on your drive. This is faster and improves performance.
Domain (smb): Enter the Microsoft server domain name.
Username and Password (smb): Enter the authentication information required to access the server.
Destination Directory: Browse to the destination folder where the file fetched is to be placed.
Destination Filename: Enter a new name for the file fetched.
Create full path according to URI (http, https, ftp): This check box is selected by default. It allows you to reproduce the URI directory path. To save the file at the root of your destination directory, clear the check box.
Add header (http, https): Select this check box if you want to add one or more HTTP request headers as fetch conditions. In the Headers table, enter the name(s) of the HTTP header parameter(s) in the Headers field and the corresponding value(s) in the Value field.
POST method (http, https): This check box is selected by default. It allows you to use the POST method. In the Parameters table, enter the name of the variable(s) in the Name field and the corresponding value in the Value field. Clear the check box if you want to use the GET method.
Die on error (http, https, ftp): Clear this check box to skip the rows in error and to complete the process for the error-free rows.
Read Cookie (http, https, ftp): Select this check box for tFileFetch to load a web authentication cookie.
Save Cookie (http, https, ftp): Select this check box to save the web page authentication cookie. This means you will not have to log on to the same web site in the future.
Cookie directory (http, https, ftp): Click [...] and browse to where you want to save the cookie in your directory, or to where the cookie is already saved.

Advanced settings
tStatCatcher Statistics: Select this check box to collect the log data at each component level.
Timeout (http, https): Enter the number of seconds after which the protocol connection should close.
Print response to console (http, https): Select this check box to print the server response in the console.
Upload file (http, https): Select this check box to upload one or more files to the server. In the Name field, enter the name of the file you want to upload and in the File field, indicate the path.
Enable proxy server (http, https, ftp): Select this check box if you are connecting via a proxy and complete the fields which follow with the relevant information.
Enable NTLM Credentials (http, https): Select this check box if you are using an NTLM authentication protocol. Domain: The client domain name. Host: The client's IP address.
Need authentication (http, https): Select this check box and enter the username and password in the relevant fields, if they are required to access the protocol.
Support redirection (http, https): Select this check box to repeat the redirection request until redirection is successful and the file can be retrieved.

Usage: This component is generally used as a start component to feed the input flow of a Job and is often connected to the Job using an OnSubjobOk or OnComponentOk link, depending on the context.
Limitation: n/a

Scenario 1: Fetching data through HTTP


This scenario describes a three-component Job which retrieves data from an HTTP website and selects data that will be stored in a delimited file.


Drop a tFileFetch, a tFileInputRegex and a tFileOutputDelimited onto your design workspace and connect them in that order.
In the tFileFetch Basic settings panel, select the protocol you want to use from the list. Here, use the HTTP protocol.
Type in the URI where the file to be fetched can be retrieved from.
In the Destination directory field, browse to the folder where the fetched file is to be stored.
In the Filename field, type in a new name for the file if you want it to be changed. In this example, filefetch.txt.
If needed, select the Add header check box and define one or more HTTP request headers as fetch conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October 29, 1994, fill in the Name and Value fields with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31 GMT" respectively in the Headers table. For details about HTTP request header definitions, see Header Field Definitions.
Select the tFileInputRegex and set the File name so that it corresponds to the file fetched earlier.
Using a regular expression, in the Regex field, select the relevant data from the fetched file. In this example: <td(?: class="leftalign")?> \s* (t\w+) \s* </td>
Ensure that you use the correct Regex syntax for the generation language in use, as the syntax is different in Java and Perl. Enter the Regex between double or single quotes accordingly.

Define the header, footer and limit if need be. In this case, ignore these fields. Define also the schema describing the flow to be passed on to the final output. The schema should be automatically propagated to the final output, but to be sure, check the schema in the Basic settings panel of the tFileOutputDelimited component. Then press F6 to run the Job.
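
The regular expression used in this scenario can be tried out on its own before it is entered in the Regex field. The Java sketch below is not part of the Job: the class RegexExtractSketch and the sample HTML are made up for illustration, and the spaces printed around \s* in the example expression are assumed to be layout spacing, so they are left out of the compiled pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class RegexExtractSketch {

    // Matches a table cell, with or without class="leftalign", whose content is a
    // word starting with "t"; the word is captured in group 1.
    static final Pattern CELL =
            Pattern.compile("<td(?: class=\"leftalign\")?>\\s*(t\\w+)\\s*</td>");

    // Returns every captured word found in the given HTML text.
    static List<String> extract(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<tr><td class=\"leftalign\"> tFileFetch </td><td>tLogRow</td><td>other</td></tr>";
        System.out.println(extract(html));
    }
}
```

In Java source code the backslashes of the expression are doubled and the double quotes are escaped, which is one of the syntax differences to keep in mind when switching between Java and Perl.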

Scenario 2: Reusing stored cookie to fetch files through HTTP


This scenario describes a two-component Job which logs in to a given HTTP website and then, using a cookie stored in a user-defined local directory, fetches data from this website.


Drop two tFileFetch onto your design workspace and connect them using the OnSubjobOk link. Double click tFileFetch_1 to open its component view.


In the Protocol field, select the protocol you want to use from the list. Here, we use the HTTP protocol.
In the URI field, type in the URI through which you can log in to the website and fetch the web page accordingly. In this example, the URI is http://www.codeproject.com/script/Membership/LogOn.aspx?rp=http%3a%2f%2fwww.codeproject.com%2fKB%2fcross-platform%2fjavacsharp.aspx&download=true.
In the Destination directory field, browse to the folder where the fetched file is to be stored. This folder will be created on the fly if it does not exist. In this example, type in C:/Logpage.
In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, webpage.html.
Under the Parameters table, click the plus button to add two rows.
In the Name column of the Parameters table, type in a name for each of the two rows. In this example, they are Email and Password, which are required by the website you are logging in to.
In the Value column, type in the authentication information.
Select the Save cookie check box to activate the Cookie directory field.
In the Cookie directory field, browse to the folder where you want to store the cookie file and type in a name for the cookie to be saved. This folder must exist already. In this example, the directory is C:/temp/Cookie.
Double-click tFileFetch_2 to open its Component view.

In the Protocol list, select http.
In the URI field, type in the address from which you fetch the files of interest. In this example, the address is http://www.codeproject.com/KB/java/RemoteShell/RemoteShell.zip.


In the Destination directory field, type in the directory or browse to the folder where you want to store the fetched files. This folder can be automatically created if it does not exist yet during the execution process. In this example, type in C:/JavaProject. In the Destination Filename field, type in a new name for the file if you want it to be changed. In this example, RemoteShell.zip. Clear the Post method check box to deactivate the Parameter table. Select the Read cookie check box to activate the Cookie directory field. In the Cookie directory field, type in the directory or browse to the cookie file you have saved and need to use. In this example, the directory is C:/temp/Cookie. Then press F6 to run the Job. Check each folder you have used to store the fetched files.
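
The save-then-reuse principle behind the Save cookie and Read cookie options can be sketched with the standard java.net.HttpCookie class. The on-disk format used by tFileFetch is not described in this guide, so the one name=value pair per line format below is purely an assumption, and CookieReuseSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpCookie;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (hypothetical class, not Talend-generated code).
class CookieReuseSketch {

    // First request: keep the name=value pair of every cookie received in the
    // Set-Cookie response headers, one pair per line (assumed file format).
    static void saveCookies(List<String> setCookieHeaders, Path cookieFile) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                lines.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        Files.write(cookieFile, lines, StandardCharsets.UTF_8);
    }

    // Later request: rebuild the value of the Cookie request header from the file.
    static String readCookieHeader(Path cookieFile) throws IOException {
        return String.join("; ", Files.readAllLines(cookieFile, StandardCharsets.UTF_8));
    }
}
```

The string returned by readCookieHeader could then be sent with a later request, for example through HttpURLConnection.setRequestProperty("Cookie", value), so that the second fetch is treated as authenticated.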

Related scenario: Reading the data from a remote file in streaming mode
For an example of transferring data in streaming mode, see Related scenario: Reading the data from a remote file in streaming mode on page 1169


tFileInputJSON
tFileInputJSON belongs to two different component families: Internet and File. For further information, see tFileInputJSON on page 1170.


tFTPConnection
tFTPConnection properties
Component family: Internet/FTP

Function: tFTPConnection opens an FTP connection in order that a transaction may be carried out.
Purpose: tFTPConnection allows you to open an FTP connection to transfer files in a single transaction.

Basic settings
Property type: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.
Host: The FTP server IP address.
Port: The FTP server listening port number.
Username and Password: FTP user authentication data.
SFTP Support: When you select this check box, the Authentication method appears. It offers two means of authentication: Public key: Enter the access path to the public key. Password: Enter the password.
FTPS Support: Select this check box to connect to an FTP server via an FTPS connection. Two fields appear: Keystore file: Enter the access path to the keystore file (password protected file containing several keys and certificates). Keystore Password: Enter your keystore password.
Connect mode: Select the mode: Active or Passive.

Usage: This component is typically used as a single-component sub-job. It is used along with other FTP components.
Limitation: n/a


Related scenarios
For a related scenario, see Scenario: Putting files on a remote FTP server on page 1187. For a related scenario, see Scenario: Iterating on a remote directory on page 1178. For a related scenario using a different protocol, see Scenario: Getting files from a remote SCP server on page 1243.


tFTPDelete
tFTPDelete properties
Component family: Internet/FTP

Function: This component deletes specified files via an FTP connection.
Purpose: tFTPDelete deletes files on a remote FTP server.

Basic settings
Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.
Host: FTP IP address.
Port: The FTP server listening port number.
Username and Password: FTP user authentication data.
Remote directory: Source directory where the files to be deleted are located.
SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.
Files: File name or path to the files to be deleted.

Usage: This component is typically used as a single-component sub-job but can also be used as an output or end object.
Limitation: n/a


Related scenario
For tFTPDelete related scenario, see Scenario: Putting files on a remote FTP server on page 1187. For tFTPDelete related scenario using a different protocol, see Scenario: Getting files from a remote SCP server on page 1243.


tFTPFileExist
tFTPFileExist properties
Component family: Internet/FTP

Function: tFTPFileExist checks if a file exists on an FTP server.
Purpose: tFTPFileExist allows you to check if a file exists on an FTP server.

Basic settings
Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.
Use an existing connection/Component List: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. In this case, make sure that the connection name is unique and distinctive across the two Job levels. For more information about Dynamic settings, see your studio user guide.
Host: FTP IP address.
Port: The FTP server listening port number.
Username and Password (or Private key): User authentication information.
Remote directory: Path to the remote directory.
File Name: Name of the file whose existence you want to check.
SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.
Connection Mode: Select the SFTP connection mode you want to use: Active: You determine the connection port to use to allow data transfer. Passive: the FTP server determines the connection port to use to allow data transfer.
Encoding Type: Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Advanced settings
Use Socks Proxy: Select this check box if you want to use a proxy. Then, set the Host, Port, User and Password proxy fields.
Ignore Failure At Quit (FTP): Select this check box to ignore library closing errors or FTP closing errors.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Related scenario
For a tFTPFileExist related scenario, see Scenario: Putting files on a remote FTP server on page 1187. For a tFTPFileExist related scenario using a different protocol, see Scenario: Getting files from a remote SCP server on page 1243.


tFTPFileList
tFTPFileList properties
Component family Internet/FTP

Function: tFTPFileList iterates on files and/or folders of a given directory on a remote host.

Objective: tFTPFileList retrieves files and/or folders based on a defined filemask pattern and iterates on each of them by connecting to a remote directory via an FTP protocol.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Use an existing connection/Component List: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. For more information about Dynamic settings, see your studio user guide.

Host: FTP IP address.

Port: Listening port number of the FTP server.

Username and Password (or Private key): User authentication information.

Remote directory: Path to the remote directory.

File detail: Select this check box if you want to display the details of each of the files or folders on the remote host. These informative details include: type of rights on the file/folder, name of the author, name of the group of users that have read-write rights, file size and date of last modification.

SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.

Files: Click the plus button to add the lines you want to use as filters: Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Connect Mode: Select the SFTP connection mode you want to use: Active: You determine the connection port to be used to allow data transfer. Passive: the FTP server determines the connection port to use to allow data transfer.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.
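The Filemask filter described above can be pictured with a short Java sketch. It only covers the * wildcard, not full regular expressions, and the class and method names (FilemaskSketch, toPattern, filter) are hypothetical helpers invented for this illustration; they are not part of Talend or of the code the Studio generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: shows how a "*" filemask can select file names.
class FilemaskSketch {

    // Convert a filemask such as "*.csv" into a regular expression.
    // "*" matches any run of characters; every other character is matched literally.
    static Pattern toPattern(String filemask) {
        StringBuilder regex = new StringBuilder();
        for (char c : filemask.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep only the names that match the filemask as a whole.
    static List<String> filter(List<String> names, String filemask) {
        Pattern pattern = toPattern(filemask);
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With a remote listing of a.csv, b.txt and c.csv, the filemask *.csv would keep a.csv and c.csv under these assumptions.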

Scenario: Iterating on a remote directory


The following Java scenario describes a three-component Job that connects to an FTP server, lists files held in a remote directory based on a filemask and finally retrieves and saves the files in a defined local directory.

Drop the following components from the Palette to the design workspace: tFTPConnection, tFTPFileList and tFTPGet.

Link tFTPConnection to tFTPFileList using an OnSubjobOk connection and then tFTPFileList to tFTPGet using an Iterate connection.

Double-click tFTPConnection to display its Basic settings view and define the component properties.

In the Host field, enter the IP address of the FTP server.
In the Port field, enter the listening port number.
In the Username and Password fields, enter your authentication information for the FTP server.
In the Connect Mode list, select the FTP connection mode you want to use, Passive in this example.

Double-click tFTPFileList to open its Basic settings view and define the component properties.

Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.
In the Remote directory field, enter the relative path of the directory that holds the files to be listed.
In the Filemask field, click the plus button to add one line and then define a file mask to filter the data to be retrieved. You can use special characters if need be. In this example, we want to retrieve only delimited files (*.csv).

In the Connect Mode list, select the FTP server connection mode you want to use, Active in this example.

Double-click tFTPGet to display its Basic settings view and define the component properties.

Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.
In the Local directory field, enter the relative path for the output local directory where you want to write the retrieved files.
In the Remote directory field, enter the relative path of the remote directory that holds the files to be retrieved.
In the Transfer Mode list, select the FTP transfer mode you want to use, ascii in this example.
In the Overwrite file field, select the option you want to use for the transferred files.
In the Files area, click the plus button to add a line in the Filemask list, then click in the added line and press Ctrl+Space to access the variable list. In the list, select the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) to process all files in the remote directory.
In the Connect Mode list, select the connection mode to the FTP server you want to use.
Save your Job and press F6 to execute it.

All .csv files held in the remote directory on the FTP server are listed, as defined by the filemask. The files are then retrieved and saved in the defined local output directory.
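The role of the global variable in this scenario can be sketched in plain Java. The sketch assumes that, on each pass of the Iterate connection, the listing component stores the current file path under the key tFTPFileList_1_CURRENT_FILEPATH and that the downstream component reads it back with the expression used in the Filemask column. The class IterateSketch, its run method and the local globalMap stand-in are hypothetical; the code that the Studio actually generates is not shown in this guide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an Iterate link driven by a shared map.
class IterateSketch {

    // Stand-in for the Job-wide globalMap referenced in the scenario.
    static final Map<String, Object> globalMap = new HashMap<>();

    // For each listed path: publish it under the scenario's key, then read it
    // back exactly as the Filemask expression does, once per iteration.
    static List<String> run(List<String> listedPaths) {
        List<String> fetched = new ArrayList<>();
        for (String path : listedPaths) {
            globalMap.put("tFTPFileList_1_CURRENT_FILEPATH", path);
            String current = ((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH"));
            fetched.add(current); // a real Job would download this file here
        }
        return fetched;
    }
}
```

Because the key holds only the current path, the downstream component sees one file per iteration, which is why a single Filemask line is enough to process every listed file.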


tFTPFileProperties
tFTPFileProperties Properties
Component family Internet

Function: tFTPFileProperties iterates on files and/or folders of a given directory on a remote host.

Purpose: tFTPFileProperties retrieves files and/or folders based on a defined filemask pattern and iterates on each of them by connecting to a remote directory via an FTP protocol.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Edit schema: The number of the read-only lines is different between Java and Perl.

Host: FTP IP address.

Port: Listening port number of the FTP server.

Username: FTP user name.

Password: FTP password.

Remote directory: Path to the source directory where the files can be fetched.

File: Name or path to the file to be processed. Related topic: How to define variables from the Component view, in the Talend Open Studio User Guide.

SFTP Support and Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required. If you do not select the check box, choose the connection mode you want to use: Active: You determine the connection port to use to allow data transfer. Passive: the FTP server determines the connection port to use to allow data transfer.

Encoding: Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Calculate MD5 Hash: Select this check box to check the MD5 of the downloaded files.

Advanced settings

Use Socks Proxy: Select this check box if you want to use a proxy. Then, set the Host, Port, User and Password proxy fields.

Ignore Failure At Quit (FTP): Select this check box to ignore library closing errors or FTP closing errors.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as a standalone component.

Limitation: n/a
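As an illustration of what the Calculate MD5 Hash option refers to, the following Java sketch computes the MD5 digest of a file's content with the standard java.security.MessageDigest class and renders it as lowercase hexadecimal. The class name Md5Sketch and the method md5Hex are hypothetical, and the output format used by the component itself is an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration: MD5 digest of some downloaded content.
class Md5Sketch {

    // Return the MD5 digest of the given bytes as 32 lowercase hex characters.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Comparing this value with a checksum published by the file's provider is one way to confirm that a transfer did not alter the file. MD5 is suitable for detecting accidental corruption, not for protecting against deliberate tampering.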

Related scenario
For a related scenario, see Scenario: Displaying the properties of a processed file on page 1148


tFTPGet
tFTPGet properties
Component family Internet/FTP

Function: This component retrieves specified files via an FTP connection.

Purpose: tFTPGet retrieves selected files from a defined remote FTP directory and copies them to a local directory.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Use an existing connection/Component List: Select this check box and then choose the appropriate connection component from the Component list to reuse its connection parameters.

Host: FTP IP address.

Port: Listening port number of the FTP server.

Username: FTP user name.

Password: FTP password.

Local directory: Path to where the file is to be saved locally.

Remote directory: Path to source directory where the files can be fetched.

Transfer mode: Different FTP transfer modes.

Overwrite file: List of file transfer options. Append: Select this check box to append the data at the end of the file in order to avoid overwriting data.

SFTP Support: When you select this check box, the Overwrite file and Authentication method appear. Overwrite file: Offers three options: Overwrite: Overwrite the existing file. Resume: Resume downloading the file from the point of interruption. Append: Add data to the end of the file without overwriting data. Authentication: Offers two means of authentication: Public key: Enter the access path to the public key. Password: Enter the password.

FTPS Support: Select this check box to connect to an FTP server via an FTPS connection. Two fields appear: Keystore file: Enter the access path to the keystore file (password protected file containing several keys and certificates). Keystore Password: Enter your keystore password.

Files: File names or paths to the files to be transferred.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Print message: Select this check box to display in the Console the list of files downloaded.

Usage: This component is typically used as a single-component sub-job but can also be used as output or end object.

Limitation: n/a
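The difference between the Overwrite, Resume and Append options can be illustrated with local files only. The Java sketch below is an assumption about the semantics those three descriptions imply, not the component's implementation: the remote file is simulated by a byte array, and the names OverwriteOptionsSketch, Mode, save, read and demo are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration of the three file transfer options.
class OverwriteOptionsSketch {

    enum Mode { OVERWRITE, RESUME, APPEND }

    // Store the simulated remote content in the local file according to the mode.
    static void save(Path local, byte[] remote, Mode mode) throws IOException {
        switch (mode) {
            case OVERWRITE: // replace whatever is already there
                Files.write(local, remote, StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
                break;
            case RESUME: { // keep the bytes already received, add only the rest
                long have = Files.exists(local) ? Files.size(local) : 0L;
                int from = (int) Math.min(have, remote.length);
                byte[] rest = Arrays.copyOfRange(remote, from, remote.length);
                Files.write(local, rest, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                break;
            }
            case APPEND: // add the whole content after the existing data
                Files.write(local, remote, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                break;
        }
    }

    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Runs the three modes in sequence on a temporary file and reports each result.
    static String[] demo() {
        try {
            Path local = Files.createTempFile("ftpget-sketch", ".txt");
            byte[] remote = "abcdef".getBytes(StandardCharsets.UTF_8);
            Files.write(local, "abc".getBytes(StandardCharsets.UTF_8)); // interrupted download
            save(local, remote, Mode.RESUME);
            String afterResume = read(local);
            save(local, remote, Mode.APPEND);
            String afterAppend = read(local);
            save(local, remote, Mode.OVERWRITE);
            String afterOverwrite = read(local);
            Files.delete(local);
            return new String[] { afterResume, afterAppend, afterOverwrite };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In demo(), the local file starts with the first three bytes of a six-byte remote file: Resume completes it, Append then adds a second full copy after it, and Overwrite restores a single copy.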

Related scenario
For a tFTPGet related scenario, see Scenario: Putting files on a remote FTP server on page 1187. For a tFTPGet related scenario, see Scenario: Iterating on a remote directory on page 1178. For a tFTPGet related scenario using a different protocol, see Scenario: Getting files from a remote SCP server on page 1243.


tFTPPut
tFTPPut properties
Component family Internet/FTP

Function: This component copies selected files via an FTP connection.

Purpose: tFTPPut copies selected files from a defined local directory to a destination remote FTP directory.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Use an existing connection/Component List: A connection needs to be open to allow the loop to check for FTP data on the defined DB. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. For more information about Dynamic settings, see your studio user guide.

Host: FTP IP address.

Port: FTP server listening port number.

Username: FTP user name.

Password: FTP password.

Local directory: Path to the source location of the file(s).

Remote directory: Path to the destination directory of the file(s).

Transfer mode: Different FTP transfer modes.

Overwrite file or Append: List of available options for the transferred file.

SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.

Files: Click the [+] button to add a new line, then fill in the columns. Filemask: file names or path to the files to be transferred. New name: name to give the FTP file after the transfer.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is typically used as a single-component sub-job but can also be used as output component.

Limitation: n/a

Scenario: Putting files on a remote FTP server


This two-component Job allows you to open a connection to a remote FTP server in order to put specific files on the remote server in one transaction.

Drop tFTPConnection and tFTPPut from the Palette onto the design workspace. tFTPConnection allows you to perform all operations in one transaction.
Connect the two components together using an OnSubJobOK link.

Double-click tFTPConnection to display its Basic settings view and define its properties.

In the Host field, enter the server IP address.
In the Port field, enter the listening port number.
In the Username and Password fields, enter your login and password for the remote server.
From the Connect Mode list, select the FTP connection mode you want to use, Active in this example.

In the design workspace, double-click tFTPPut to display its Basic settings view and define its properties.

Select the Use an existing connection check box and then select tFTPConnection_1 from the Component List. The connection information is automatically filled in.
In the Local directory field, enter the path to the local directory containing the files, if all your files are in the same directory. If the files are in different directories, enter the path for each file in the Filemask column of the Files table.
In the Remote directory field, enter the path to the destination directory on the remote server.
From the Transfer mode list, select the transfer mode to be used.
From the Overwrite file list, select an option for the transferred file.
In the Files table, click the plus button twice to add two lines to the Filemask column and then fill in the filemasks of all files to be copied onto the remote directory.
Save your Job and press F6 to execute it.

The files specified in the Filemask column are copied to the remote server.


tFTPRename
tFTPRename Properties
Component Family Internet/FTP

Function: tFTPRename renames the selected files via an FTP connection.

Purpose: tFTPRename renames files selected from a local directory towards a distant FTP directory.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Use an existing connection/Component List: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. For more information about Dynamic settings, see your studio user guide.

Host: FTP IP address.

Port: FTP server listening port number.

Username: Connection login to the FTP server.

Password: Connection password to the FTP server.

Remote directory: Path to the remote directory.

Overwrite file: List of available options for the transferred file. Append: Select this check box to write the data at the end of the record, so that existing data is not deleted.

SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.

Files: Click the [+] button to add the lines you want to use as filters: Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions. New name: name to give to the FTP file after the transfer.

Connection Mode: Select the SFTP connection mode you want to use: Active: You determine the connection port to use to allow data transfer. Passive: the FTP server determines the connection port to use to allow data transfer.

Encoding type: Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Die on error: This check box is selected by default. Clear the check box to skip the row in error and complete the process for error-free rows.

Advanced settings

Use Socks Proxy: Select this check box if you want to use a proxy. Then, set the Host, Port, User and Password proxy fields.

Ignore Failure At Quit (FTP): Select this check box to ignore library closing errors or FTP closing errors.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is generally used as a subjob with one component, but it can also be used as an output or end component.

Limitation: n/a

Related scenario
For a related scenario, see Scenario: Putting files on a remote FTP server on page 1187.


tFTPTruncate
tFTPTruncate properties
Component family Internet/FTP

Function: tFTPTruncate truncates the selected files via an FTP connection.

Objective: tFTPTruncate truncates the selected files of a defined local directory via a distant FTP directory.

Basic settings

Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Use an existing connection/Component List: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined. When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can use Dynamic settings to share the intended connection. For more information about Dynamic settings, see your studio user guide.

Host: FTP IP address.

Port: Listening port number of the FTP server.

Username and Password (or Private key): User authentication information.

Remote directory: Path to the remote directory.

SFTP Support/Authentication method: Select this check box and then in the Authentication method list, select the SFTP authentication method: Password: Type in the password required in the relevant field. Public key: Type in the private key or click the three dot button next to the Private key field to browse to it. If you select Public Key as the SFTP authentication method, make sure that the key is added to the agent or that no passphrase (secret phrase) is required.

Files: Click the plus button to add the lines you want to use as filters: Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Connection Mode: Select the SFTP connection mode you want to use: Active: You determine the connection port to use to allow data transfer. Passive: the FTP server determines the connection port to use to allow data transfer.

Encoding type: Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.

Advanced settings

Use Socks Proxy: Select this check box if you want to use a proxy. Then, set the Host, Port, User and Password proxy fields.

Ignore Failure At Quit (FTP): Select this check box to ignore library closing errors or FTP closing errors.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Related scenario
For a related scenario, see Scenario: Putting files on a remote FTP server on page 1187.


tHttpRequest
tHttpRequest properties
Component family Internet

Function: This component sends an HTTP request to the server end and gets the corresponding response information from the server end.

Purpose: The tHttpRequest component allows you to send an HTTP request to the server and output the response information locally.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema, in the Talend Open Studio User Guide. Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: How to set a repository schema, in the Talend Open Studio User Guide.

Sync columns: Click this button to retrieve the schema from the preceding component.

URI: Type in the Uniform Resource Identifier (URI) that identifies the data resource on the server. A URI is similar to a URL, but more general.

Method: Select an HTTP method to define the action to be performed: Post: Sends data (e.g. HTML form data) to the server end. Get: Retrieves data from the server end.

Write response content to file: Select this check box to save the HTTP response to a local file. You can either type in the file path in the input field or click the three-dot button to browse to the file path.

Headers: Type in the name-value pair(s) for HTTP headers to define the parameters of the requested HTTP operation. Key: Fill in the name of the header field of an HTTP header. Value: Fill in the content of the header field of an HTTP header. For more information about definition of HTTP headers, please refer to: en.wikipedia.org/wiki/List_of_HTTP_headers.

Need authentication: Select this check box to fill in a user name and a password in the corresponding fields if authentication is needed: user: Fill in the user name for the authentication. password: Fill in the password for the authentication.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level and at each component level.

Usage: This component can be used to send HTTP requests to a server and save the response information. This component can be used as a standalone component.

Limitation: N/A

Scenario: Sending a HTTP request to the server and saving the response information to a local file
This Java scenario describes a two-component Job that uses the GET method to retrieve information from the server end and writes the response to a local file as well as to the console.

Drop the following components from the Palette onto the design workspace: tHttpRequest and tLogRow.
Connect the tHttpRequest component to the tLogRow component using a Row > Main connection.
Double-click the tHttpRequest component to open its Basic settings view and define the component properties.

Fill in the URI field with http://192.168.0.63:8081/testHttpRequest/build.xml. Note that this URI is for demonstration purposes only and is not a live address.
Select GET from the Method list.
Select the Write response content to file check box and type the file path in the input field on the right, D:/test.txt for this use case.
Select the Need authentication check box and fill in the user and password, both tomcat in this use case.
Double-click the tLogRow component to open its Basic settings view and select Table in the Mode area.
Save your Job and press F6 to execute it.

The response information from the server is then saved to the file and displayed on the console.
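What this scenario configures can be approximated in plain Java with the standard java.net.HttpURLConnection class. The sketch below is not the code generated by the component. It assumes that Need authentication corresponds to HTTP Basic authentication, which this guide does not state, and the names HttpRequestSketch, headers and getToFile are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a GET request whose response is saved to a file.
class HttpRequestSketch {

    // Build the request headers (assumption: Basic authentication).
    static Map<String, String> headers(String user, String password) {
        String credentials = user + ":" + password;
        String token = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        Map<String, String> result = new LinkedHashMap<>();
        result.put("Authorization", "Basic " + token);
        return result;
    }

    // Send a GET request, write the response body to the target file,
    // and return the body so that it can also be printed.
    static String getToFile(String uri, Map<String, String> headers, Path target)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(uri).openConnection();
        connection.setRequestMethod("GET");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            connection.setRequestProperty(header.getKey(), header.getValue());
        }
        try (InputStream body = connection.getInputStream()) {
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            connection.disconnect();
        }
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }
}
```

With the scenario's values, a call would look like getToFile("http://192.168.0.63:8081/testHttpRequest/build.xml", headers("tomcat", "tomcat"), Paths.get("D:/test.txt")); since that address is not live, the call is shown for shape only.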


tJMSInput
tJMSInput properties
Component Family Internet

Function: tJMSInput creates an interface between a Java application and a Message-Oriented Middleware system.

Purpose: Using a JMS server, tJMSInput makes it possible to have loosely coupled, reliable, and asynchronous communication between different components in a distributed application.

Basic settings

Module List: Select the library to be used from the list.

Context Provider: Type in the context URL, for example "com.tibco.tibjms.naming.TibjmsInitialContextFactory". However, be careful, the syntax can vary according to the JMS server used.

Server URL: Type in the server URL, respecting the syntax, for example "tibjmsnaming://localhost:7222".

Connection Factory JDNI Name: Type in the JNDI name.

Use Specified User Identity: If you have to log in, select the check box and type in your login and password.

Message Type: Select the message type, either: Topic or Queue.

Message From: Type in the message source, exactly as expected by the server; this must include the type and name of the source, e.g.: queue/A or topic/testtopic. Note that the field is case-sensitive.

Timeout for Next Message (in sec): Type in the number of seconds before passing to the next message.

Maximum Messages: Type in the maximum number of messages to be processed.

Message Selector Expression: Set your filter.

Processing Mode: Select the processing mode for the messages: Raw Message or Message Content.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The tJMSInput schema is read-only. It is made of only one column: Message.

Advanced settings

Properties: Click the plus button underneath the table to add lines that contain the username and password required for user authentication.

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is generally used as an input component. It must be linked to an output component.

Limitation: Make sure the JMS server is launched.
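How the Context Provider and Server URL values are typically consumed can be sketched with the standard javax.naming API. The sketch assumes that the Context Provider value is passed as the JNDI initial context factory and the Server URL as the provider URL; this mapping is an assumption suggested by the example values, not something this guide confirms. The class JmsLookupSketch and its methods are hypothetical. No InitialContext is created here, because that would require the JMS vendor library selected in Module List.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical illustration of the settings as JNDI environment entries.
class JmsLookupSketch {

    // Assemble the JNDI environment from the two Basic settings values.
    static Hashtable<String, String> jndiEnvironment(String contextProvider, String serverUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, contextProvider);
        env.put(Context.PROVIDER_URL, serverUrl);
        return env;
    }

    // Split a Message From value such as "queue/A" into its type and its name.
    // The value is case-sensitive, so nothing is normalized here.
    static String[] splitSource(String messageFrom) {
        int slash = messageFrom.indexOf('/');
        if (slash <= 0 || slash == messageFrom.length() - 1) {
            throw new IllegalArgumentException("Expected type/name, got: " + messageFrom);
        }
        return new String[] {
            messageFrom.substring(0, slash),
            messageFrom.substring(slash + 1)
        };
    }
}
```

With the vendor library on the classpath, the environment would usually be handed to new InitialContext(env), and the connection factory would then be looked up under the name entered in Connection Factory JDNI Name.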

Related scenario
For a related scenario, see Scenario: asynchronous communication via a MOM server on page 1208.


tJMSOutput
tJMSOutput properties
Component family: Internet
Function: tJMSOutput creates an interface between a Java application and a Message-Oriented Middleware (MOM) system.
Purpose: Using a JMS server, tJMSOutput makes it possible to have loosely coupled, reliable, and asynchronous communication between different components in a distributed application.

Basic settings
  Module List: Select the library to be used from the list.
  Context Provider: Type in the context URL, for example "com.tibco.tibjms.naming.TibjmsInitialContextFactory". Be careful: the syntax can vary according to the JMS server used.
  Server URL: Type in the server URL, respecting the syntax, for example "tibjmsnaming://localhost:7222".
  Connection Factory JDNI Name: Type in the JNDI name.
  Use Specified User Identity: If you have to log in, select the check box and type in your login and password.
  Message Type: Select the message type, either Topic or Queue.
  To: Type in the message target, as expected by the server.
  Processing Mode: Select the processing mode for the messages: Raw Message or Message Content.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The tJMSOutput schema is read-only. It is made of one column: Message.

Advanced settings
  Delivery Mode: Select a delivery mode from this list to ensure the quality of data delivery:
    Not Persistent: This mode allows data loss during the data exchange.
    Persistent: This mode ensures the integrity of message delivery.
  Properties: Click the plus button underneath the table to add lines that contain the username and password required for user authentication.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is generally used as an output component. It must be linked to an input component.


Limitation: Make sure the JMS server is launched.

Related scenario
For a related scenario, see Scenario: asynchronous communication via a MOM server on page 1208.
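The Context Provider and Server URL values described in the tJMSOutput properties look like standard JNDI settings. The following is a minimal sketch, assuming (this guide does not confirm it) that they correspond to the JNDI environment properties Context.INITIAL_CONTEXT_FACTORY and Context.PROVIDER_URL. The class name JmsJndiSettings and the method buildEnvironment are hypothetical names used for illustration only; the two example values are the ones quoted in the properties table.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Sketch only: shows how the Context Provider and Server URL fields could map
// onto standard JNDI environment properties. This mapping is an assumption,
// not the code that Talend generates.
final class JmsJndiSettings {

    // Builds a JNDI environment from the two connection values.
    static Hashtable<String, String> buildEnvironment(String contextProvider, String serverUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, contextProvider);
        env.put(Context.PROVIDER_URL, serverUrl);
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = buildEnvironment(
                "com.tibco.tibjms.naming.TibjmsInitialContextFactory",
                "tibjmsnaming://localhost:7222");
        // A real client would pass env to new javax.naming.InitialContext(env)
        // and look up the connection factory by its JNDI name. That step needs
        // the JMS provider's client library on the classpath, so it is left out.
        System.out.println(env.get(Context.INITIAL_CONTEXT_FACTORY));
        System.out.println(env.get(Context.PROVIDER_URL));
    }
}
```

Because the syntax of both values depends on the JMS server, treat the strings above as examples for one server only.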


tMicrosoftMQInput
tMicrosoftMQInput Properties
Component family: Internet/MOM and JMS
Function: This component retrieves the first message in a given Microsoft message queue (only the String type is supported).
Purpose: This component allows you to fetch messages one by one, in the order of their IDs, from the Microsoft message queue. Each execution retrieves only one message.

Basic settings
  Property: Either Built-in or Repository.
    Built-in: No property data is stored centrally. Enter properties manually.
    Repository: Select the repository file where properties are stored. The fields that follow are pre-filled using the fetched data.
  Host: Type in the Host name or IP address of the host server.
  Queue: Enter the name of the queue you want to retrieve messages from.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is generally used as a start component of a Job or Subjob. It must be linked to an output component.
Connections:
  Outgoing links (from one component to another):
    Row: Main, Iterate.
    Trigger: Run if, On Subjob Ok, On Component Ok, On Component Error.
  Incoming links (from one component to another):
    Row: Iterate.
    Trigger: Run if, On Subjob Ok, On Component Ok, On Component Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: This component supports only the String type.


Scenario: Writing and fetching queuing messages from Microsoft message queue
This scenario is made of two Jobs. The first Job posts messages on a Microsoft message queue and the second Job fetches the message from the server. In the first Job, a string message is created using a tRowGenerator and put on a Microsoft message queue using a tMicrosoftMQOutput. An intermediary tLogRow component displays the flow being passed.

Drop the three components required for the first Job from the Palette onto the design workspace. Right-click tRowGenerator to open its contextual menu. In this menu, select Row > Main to connect this component to tLogRow using a Main row link. Do the same to connect tLogRow to tMicrosoftMQOutput. Double-click tRowGenerator to open its editor.

In this editor, click the plus button to add three rows into the schema table. In the Column column, type in a new name for each row to rename it. Here, we type in ID, Name and Address. In the Type column, select Integer for the ID row from the drop-down list and leave the other rows as String. In the Functions column, select random for the ID row, getFirstName for the Name row and getUsCity for the Address row. In the Number of Rows for RowGenerator field on the right end of the toolbar, type in 12 to limit the number of rows to be generated.

Click Ok to validate this editing.


In a real-world case, you can use an input component to load the data of interest instead of the tRowGenerator component.

Double-click the tMicrosoftMQOutput component to open its Component view.

In the Host field, type in the host address. In this example, it is localhost. In the Queue field, type in the name of the queue you want to write the message to. In this example, name it AddressQueue. In the Message column (String Type) field, select Address from the drop-down list to determine the message body to be written. Press F6 to run this Job.

You can see that this queue has been created automatically and that the messages have been written.


Then set the second Job in order to fetch the first queuing message from the message queue.

Drop tMicrosoftMQInput and tLogRow from the Palette to the design workspace. Connect these two components using a Row > Main link. Double-click the tMicrosoftMQInput to open its Component view.

In the Host field, type in the host name or address. Here, we type in localhost. In the Queue field, type in the queue name from which you want to fetch the message. In this example, it is AddressQueue. Press F6 to run this Job.

The message body Atlanta fetched from the queue is displayed on the console.


tMicrosoftMQOutput
tMicrosoftMQOutput Properties
Component family: Internet/MOM and JMS
Function: This component writes a defined column of the incoming data flow to a Microsoft message queue (only the String type is supported).
Purpose: This component makes it possible to write messages to a Microsoft message queue.

Basic settings
  Property: Either Built-in or Repository.
    Built-in: No property data is stored centrally. Enter properties manually.
    Repository: Select the repository file where properties are stored. The fields that follow are pre-filled using the fetched data.
  Host: Type in the Host name or the IP address of the host server.
  Queue: Type in the name of the queue to which you want to write a given message. This queue is created automatically on the fly if it does not exist yet.
  Message column: Select the column to be written as the message to the Microsoft message queue. The selected column must be of String type.

Usage: This component must be linked to an input or intermediary component.
Connections:
  Outgoing links (from one component to another):
    Row: Main, Iterate.
    Trigger: Run if, On Component Ok, On Component Error.
  Incoming links (from one component to another):
    Row: Main, Reject, Iterate.
    Trigger: Run if, On Subjob Ok, On Subjob Error, On Component Ok, On Component Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: The message to be output cannot be null.

Related scenario
For a related scenario, see Scenario: Writing and fetching queuing messages from Microsoft message queue on page 1202.


tMomCommit
tMomCommit Properties
This component is closely related to tMomRollback. It usually does not make much sense to use these components independently in a transaction.

Component family: Internet
Function: tMomCommit commits data in the MQ Server.
Purpose: Using a unique connection, this component commits a global transaction in one go instead of committing on every row or every batch, and thus improves performance.

Basic settings
  Component list: Select the Connection component used in your Job.
  MQ Server: Select the MOM server to be used from the list.
  Close Connection: This check box is selected by default. It allows you to close the connection once the commit is done. Clear this check box to continue to use the selected connection once the component has performed its task.
    If you use a Row > Main connection to link tMomCommit to your Job, your data will be committed row by row. In this case, do not select the Close connection check box, or your connection will be closed before the end of your first row commit.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Mom components, especially with the tMomRollback component.
Limitation: n/a

Related scenario
For a scenario related to tMomCommit, see tMysqlConnection on page 559.


tMomInput
tMomInput Properties
Component family: Internet
Function: Fetches a message from a queue on a Message-Oriented Middleware (MOM) system and passes it on to the next component.
Purpose: tMomInput makes it possible to set up asynchronous communications via a MOM server.

Basic settings
  Keep listening: Select this check box to keep the MOM server listening for and fetching new messages.
    For JBoss Messaging server, with this check box selected, the Sleeping time (in sec) field appears.
    For ActiveMQ server, with this check box selected, the Sleeping time (in sec) field disappears.
  Sleeping time (in sec): Set the frequency by typing in a number of seconds. This field is not available if the MQ Server you selected is WebSphere MQ.
  MQ Server: Select the MOM server to be used from the list. According to the server selected, the parameters required differ slightly.
  Host/Port: Fill in the Host name or IP address of the MOM server and the Port.
  Username: Connection login to the server you select in the MQ Server list.
  Password: Connection password to the server you select in the MQ Server list.
  Message From: Type in the message source, exactly as expected by the server; this must include the type and name of the source, e.g.: queue/A or topic/testtopic. Note that the field is case-sensitive. This field is not available if the MQ Server you selected is WebSphere MQ.
  Message Type: Select the message type, either Topic or Queue. This list is not available if the MQ Server you selected is WebSphere MQ.
  Message Body Type: Select the message body type: Text, Bytes, or Map.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. In the context of tMomInput usage, the schema is read-only. It is composed of two columns: From and Message.

Websphere MQ only
  Channel: The default value is DC.SVRCONN.
  Queue Manager: Fill in the server driver details.
  Message Queue: Fill in the source of the message.
  Is using message id to fetch: Select this check box to fetch messages according to their IDs.
  Commit (delete message after reading from the queue): Select this check box to force a commit after reading each message from the queue.

ActiveMQ only
  Receive number of messages: Select this check box to set the number of messages that you will receive on the console. When you limit the number of messages to receive, the time limit becomes inactive and the Keep listening/Sleeping time (in sec) fields disappear.
  Start Server: Select this check box to

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.

Usage: This component is generally used as a start component. It must be linked to an output component.
Limitation: Make sure the relevant ActiveMQ, JBoss or Websphere server is launched.

Scenario: asynchronous communication via a MOM server


This scenario is made of two Jobs. The first Job posts messages on a JBoss server queue and the second Job fetches the message from the server. In the first Job, a string message is created using a tRowGenerator and put on a JBoss server using a tMomOutput. An intermediary tLogRow component displays the flow being passed.

Drop the three components required for the first Job from the Palette onto the design workspace and right-click to connect them using a Main row link.


Double-click on tRowGenerator to set the schema to be randomly generated.

Set just one column called message. This is the message to be put on the MOM queue. This column is of String type and is nullable. To produce the data, use a preset function that concatenates randomly chosen ASCII characters to form a 6-character string. This function is getAsciiRandomString (Java version). Click the Preview button to view a random sample of the generated data. Set the Number of rows to be generated to 10. Click OK to validate. The tLogRow is only used to display an intermediary state of the data to be handled. In this example, it does not require any specific configuration. Then select the tMomOutput component.

In this case, the MQ server to be used is JBoss. In the Host and Port fields, fill in the relevant connection information. Select the Message type from the list. The message can be of Queue or Topic type. In this example, select the Queue type from the list. In the To field, type in the message source information strictly respecting the syntax expected by the server. This should match the Message Type you selected, such as: queue/A.
The message name is case-sensitive, therefore queue/A and Queue/A are different.

Then click Sync Columns to pass on the schema from the preceding component. The schema being read-only, it cannot be changed. The data posted onto the MQ comes from the first schema column encountered. Press F6 to execute the Job and view the data flow being passed on in the console, thanks to the tLogRow component.


Then set the second Job in order to fetch the queuing messages from the MOM server.

Drop the tMomInput component and a tLogRow from the Palette to the design workspace. Select the tMomInput to set the parameters.

Select the MQ server from the list. In this example, a JBoss messaging server is used. Set the server Host and Port information. Set the Message From and the Message Type to match the source and type expected by the messaging server. The Schema is read-only and is made of two columns: From and Message. Select the Keep listening check box and set the verification frequency to 5 seconds.
When using the Keep listening option, you will need to kill the Job to end it.

There is no need to change any default setting of the tLogRow component. Save the Job and run it (when launching for the first time or if you killed it on a previous run).


The messages fetched on the server are displayed on the console.
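The To and Message From values used in this scenario must include both the type and the name of the destination (for example queue/A), and they are case-sensitive. The following is a minimal sketch of that naming rule in plain Java. The class MomDestination and its methods are hypothetical helpers, not part of Talend or of any MOM client library, and the check against the selected Message Type assumes that the prefix is the lower-case form of the type, as in the queue/A example.

```java
import java.util.Locale;

// Sketch only: models a destination string such as "queue/A" or "topic/testtopic".
final class MomDestination {
    final String type; // the part before the slash, e.g. "queue"
    final String name; // the part after the slash, e.g. "A"

    private MomDestination(String type, String name) {
        this.type = type;
        this.name = name;
    }

    // Splits "<type>/<name>" and rejects values that lack either part.
    static MomDestination parse(String value) {
        int slash = value.indexOf('/');
        if (slash <= 0 || slash == value.length() - 1) {
            throw new IllegalArgumentException("Expected <type>/<name>, got: " + value);
        }
        return new MomDestination(value.substring(0, slash), value.substring(slash + 1));
    }

    // Destinations are compared exactly, so queue/A and Queue/A differ.
    static boolean sameDestination(String first, String second) {
        return first.equals(second);
    }

    // Assumption: the prefix is the lower-case form of the selected Message Type.
    static boolean matchesType(String value, String selectedMessageType) {
        return parse(value).type.equals(selectedMessageType.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        MomDestination destination = parse("queue/A");
        System.out.println(destination.type + " " + destination.name);
        System.out.println(sameDestination("queue/A", "Queue/A"));
        System.out.println(matchesType("queue/A", "Queue"));
    }
}
```

The exact syntax remains server-specific, so check the value against what your MOM server expects.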


tMomMessageIdList
tMomMessageIdList Properties
Component family: Internet
Function: tMomMessageIdList fetches a message ID list from a queue on a Message-Oriented Middleware (MOM) system and passes it to the next component.
Purpose: tMomMessageIdList makes it possible to iterate on certain message IDs. It is usually used with tMomInput. For more information, see tMomInput Properties on page 1207.

Basic settings
  MQ Server: Select the MOM server to be used from the list. According to the server selected, the parameters required differ slightly.
  Host/Port: Fill in the Host name or IP address of the MOM server and the Port.

Websphere
  Channel: Channel on the queue.
  Queue Manager: Fill in the server driver details.
  Message Queue: Source of the message.

Usage: This component is generally used as an input component.
Limitation: Make sure the relevant Websphere server is launched.

Related scenario
For a related scenario, see tMomInput on page 1207.


tMomOutput
tMomOutput Properties
Component family: Internet
Function: Adds a message to a Message-Oriented Middleware (MOM) system queue in order for it to be fetched asynchronously.
Purpose: tMomOutput makes it possible to set up asynchronous communications via a MOM server.

Basic settings
  MQ Server: Select the MOM server to be used from the list. According to the server selected, the parameters required differ slightly.
  Host/Port: Fill in the Host name or IP address of the MOM server and the Port.
  Username: Connection login to the server.
  Password: Connection password to the server.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. In the context of tMomOutput usage, the schema is read-only but will change according to the incoming schema. The server expects a one-column schema that contains the messages.

Websphere
  Channel: Value by default is Channel.
  Message Body Type: Select the message body type: Text, Bytes, or Map.
  Queue Manager: Fill in the server driver details.
  Message Queue: Destination of the message.
  Is using message id to set: Select this check box to set messages according to their IDs.

JBoss Messaging
  To: Type in the message destination, respecting the syntax required by the server; this must include the type and name of the target folder, e.g.: queue/A or topic/testtopic. Note that the field is case-sensitive.
  Message Type: Select the message type, either topic or queue.
  Message Body Type: Select the message body type: Text, Bytes, or Map.

ActiveMQ
  To: Type in the message destination, respecting the syntax required by the server; this must include the type and name of the target folder, e.g.: queue/A or topic/testtopic. Note that the field is case-sensitive.
  Message Type: Select the message type, either topic or queue.
  Message Body Type: Select the message body type: Text, Bytes, or Map.

Usage: This component must be linked to an input or intermediary component.
Limitation: Make sure the relevant Websphere, JBoss or ActiveMQ server is launched.

Related scenario
For a related scenario, see tMomInput on page 1207.


tMomRollback
tMomRollback properties
This component is closely related to the tMomCommit component. It usually does not make much sense to use these components independently in a transaction.

Component family: Internet
Function: tMomRollback rolls back data from the MQ Server.
Purpose: Avoids involuntary commitment of part of a transaction.

Basic settings
  Component list: Select the Connection component used in your Job.
  Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.

Advanced settings
  tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is to be used along with Mom components, especially with tMomCommit.
Limitation: n/a

Related scenario
For tMomRollback related scenario, see Scenario: Rollback from inserting data in mother/daughter tables of the tMysqlRollback.


tPOP
tPOP properties
Component family: Internet
Function: The tPOP component fetches one or more email messages from a server using the POP3 or IMAP protocol.
Purpose: The tPOP component uses the POP or IMAP protocol to connect to a specific email server. Then it fetches one or more email messages and writes the recovered information in specified files. Parameters in the Advanced settings view allow you to use filters on your selection.

Basic settings
  Host: IP address of the email server you want to connect to.
  Port: Port number of the email server.
  Username and Password: User authentication data for the email server.
    Username: enter the username you use to access your email box.
    Password: enter the password you use to access your email box.
  Output directory (Java-only field): Enter the path to the directory in which you want to store the email messages you retrieve from the email server, or click the three-dot button next to the field to browse to it.
  Filename pattern: Define the syntax of the names of the files that will hold each of the email messages retrieved from the email server, or press Ctrl+Space to display the list of predefined patterns.
  Retrieve all emails?: By default, all email messages present on the specified server are retrieved. To retrieve only a limited number of these email messages, clear this check box and, in the Number of emails to retrieve field, enter the number of messages you want to retrieve. Email messages are retrieved starting from the most recent.
  Delete emails from server: Select this check box if you do not want to keep the retrieved email messages on the server. For Gmail servers, this option does not work with the pop3 protocol. Select the imap protocol and ensure that the Gmail account is configured to use imap.
  Choose the protocol (Java-only field): From the list, select the protocol to be used to retrieve the email messages from the server. This protocol is the one used by the email server. If you choose the imap protocol, you will be able to select the folder from which you want to retrieve your emails.
  Use SSL (Java-only field): Select this check box if your email server uses this protocol for authentication and communication confidentiality. This option is mandatory for Gmail users.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
  Filter (Java-only field): Click the plus button to add as many lines as needed to filter email messages and retrieve only a specific selection.
    Filter item: select one of the following filter types from the list:
      From: email messages are filtered according to the sender email address.
      To: email messages are filtered according to the recipient email address.
      Subject: email messages are filtered according to the message subject.
      Before date: email messages are filtered by the sending or receiving date. All messages before the set date are retrieved.
      After date: email messages are filtered by the sending or receiving date. All messages after the set date are retrieved.
    Pattern: press Ctrl+Space to display the list of available values. Select the value to use for each filter.
  Filter condition relation (Java-only field): Select the type of logical relation you want to use to combine the specified filters:
    and: the conditions set by the filters are combined, so the search is more restrictive.
    or: the conditions set by the filters are independent, so the search is broader.

Usage: This component does not handle data flow; it can be used alone.
Limitation: n/a
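The difference between the and and or values of Filter condition relation can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code behind tPOP: the Mail class is a hypothetical stand-in for an email header, and matching by substring is an assumption, since this guide does not say how filter patterns are compared.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: combines email filters with an "and" or an "or" relation.
final class MailFilterSketch {

    // Hypothetical stand-in for the header fields a filter can inspect.
    static final class Mail {
        final String from;
        final String subject;

        Mail(String from, String subject) {
            this.from = from;
            this.subject = subject;
        }
    }

    // Assumption: a filter matches when the field contains the given text.
    static Predicate<Mail> fromContains(String text) {
        return mail -> mail.from.contains(text);
    }

    static Predicate<Mail> subjectContains(String text) {
        return mail -> mail.subject.contains(text);
    }

    // "and": every filter must match, so fewer messages pass.
    static Predicate<Mail> allOf(List<Predicate<Mail>> filters) {
        Predicate<Mail> combined = mail -> true;
        for (Predicate<Mail> filter : filters) {
            combined = combined.and(filter);
        }
        return combined;
    }

    // "or": one matching filter is enough, so more messages pass.
    static Predicate<Mail> anyOf(List<Predicate<Mail>> filters) {
        Predicate<Mail> combined = mail -> false;
        for (Predicate<Mail> filter : filters) {
            combined = combined.or(filter);
        }
        return combined;
    }

    public static void main(String[] args) {
        Mail mail = new Mail("alice@example.com", "Invoice");
        List<Predicate<Mail>> filters =
                Arrays.asList(fromContains("alice"), subjectContains("Report"));
        System.out.println(allOf(filters).test(mail)); // false: the subject filter fails
        System.out.println(anyOf(filters).test(mail)); // true: the sender filter matches
    }
}
```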

Scenario: Retrieving a selection of email messages from an email server


This Java scenario is a one-component Job that retrieves a predefined number of email messages from an email server. Drop the tPOP component from the Palette to the design workspace. Double-click tPOP to display the Basic settings view and define the component properties. Enter the email server IP address and port number in the corresponding fields. Enter the username and password for your email account in the corresponding fields. In this example, the email server is called Free.

In the Output directory field, enter the path to the output directory manually, or click the three-dot button next to the field and browse to the output directory where the email messages retrieved from the email server are to be stored. In the Filename pattern field, define the syntax you want to use to name the output files that will hold the messages retrieved from the email server, or press Ctrl+Space to display a list of predefined patterns. The syntax used in this example is the following: TalendDate.getDate("yyyyMMdd-hhmmss") + "_" + (counter_tPOP_1 + 1) + ".txt" The output files will be stored as .txt files and are defined by date, time and arrival chronological order. Clear the Retrieve all emails? field and in the Number of emails to retrieve field, enter the number of email messages you want to retrieve, 10 in this example. Select the Delete emails from server check box to delete the email messages from the email server once they are retrieved and stored locally. In the Choose the protocol field, select the protocol type you want to use. This depends on the protocol used by the email server. Certain email suppliers, like Gmail, use both protocols. In this example, the protocol used is pop3. Save your Job and press F6 to execute it.


The tPOP component retrieves the 10 most recent messages from the specified email server. In the local tPOP directory, a .txt file is created for each retrieved message. Each file holds the metadata from the email message headers (sender's address, recipient's address, subject) in addition to the message content.
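The Filename pattern used in this scenario can be approximated in plain Java as shown below. This is a sketch that assumes TalendDate.getDate accepts a java.text.SimpleDateFormat-style pattern; the class PopFileName, its build method and the counter parameter are hypothetical names that stand in for the Talend routine and for the counter_tPOP_1 variable. In SimpleDateFormat, hh is the 12-hour clock and HH is the 24-hour clock, so the second call shows a variant that keeps morning and afternoon messages apart in the timestamp.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Sketch only: rebuilds the scenario's file name from a date and a counter.
final class PopFileName {

    // counter is zero-based, like counter_tPOP_1 in the scenario's expression.
    static String build(Date when, int counter, String pattern) {
        String timestamp = new SimpleDateFormat(pattern, Locale.ROOT).format(when);
        return timestamp + "_" + (counter + 1) + ".txt";
    }

    public static void main(String[] args) {
        Date now = new Date();
        // Pattern from the scenario: hh is the 12-hour clock.
        System.out.println(build(now, 0, "yyyyMMdd-hhmmss"));
        // Variant with HH, the 24-hour clock.
        System.out.println(build(now, 0, "yyyyMMdd-HHmmss"));
    }
}
```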


tREST
tREST properties
Component family: Internet
Function: The tREST component sends HTTP requests to a REpresentational State Transfer (REST) Web service provider and gets the corresponding responses.
Purpose: The tREST component serves as a REST Web service client that sends HTTP requests to a REST Web service provider and gets the responses.

Basic settings
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. This component always uses a built-in, read-only schema that contains two columns:
    - Body: stores the result from the server end.
    - ERROR_CODE: stores the HTTP status code from the server end when an error occurs during the invocation process. The specific meanings of the error codes are subject to the definitions of your Web service provider. For reference information, visit en.wikipedia.org/wiki/List_of_HTTP_status_codes.
    Click Edit Schema to view the schema structure. Changing the schema type may result in loss of the schema structure and therefore failure of the component.
  URL: Type in the URL address of the REST Web server to be invoked.
  HTTP Method: From this list, select an HTTP method that describes the desired action. The specific meanings of the HTTP methods are subject to the definitions of your Web service provider. Listed below are the generally accepted HTTP method definitions:
    - GET: retrieves data from the server end based on the given parameters.
    - POST: creates and uploads data based on the given parameters.
    - PUT: updates data based on the given parameters, or if the data does not exist, creates it.
    - DELETE: removes data based on the given parameters.
  HTTP Headers: Type in the name-value pair(s) for HTTP headers to define the parameters of the requested HTTP operation. For the specific definitions of HTTP headers, consult your REST Web service provider. For reference information, visit en.wikipedia.org/wiki/List_of_HTTP_headers.
  HTTP Body: Type in the payload to be uploaded to the server end when the POST or PUT action is selected.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage: Use this component as a REST Web service client to communicate with a REST Web service provider. It must be linked to an output component.
Limitation: JRE 1.6 must be running for this component to work properly.

Scenario: Creating and retrieving data by invoking REST Web service


This scenario describes a simple Job that invokes a REST Web service to create a new customer record on the server end and then retrieve the customer information. When executed, the Job displays relevant information on the Run console. Drop the following components from the Palette onto the design workspace: two tREST components and two tLogRow components, and label the two tREST components to best describe the actions to perform. Connect each tREST to one tLogRow using a Row > Main connection. Connect the first tREST to the second tREST using a Trigger > OnSubjobOK connection.

Double-click the first tREST component to open its Basic settings view.

Fill the URL field with the URL of the Web service you are going to invoke. Note that the URL provided in this use case is for demonstration purposes only and is not a live address. From the HTTP Method list, select POST to send an HTTP request for creating a new record. Click the plus button to add a line in the HTTP Headers table, and type in the appropriate name-value pair, which is subject to the definition of your service provider, to indicate the media type of the payload to send to the server end. In this use case, type in Content-Type and application/xml. For reference information about Internet media types, visit www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7. Fill the HTTP Body field with the payload to be uploaded to the server end. In this use case, type in <Customer><name>Steven</name></Customer> to create a record for a new customer named Steven.
If you want to include double quotation marks in your payload, be sure to use a backslash escape character before each of the quotation marks. In this use case, for example, type in <Customer><name>\"Steven\"</name></Customer> if you want to enclose the name Steven in a pair of double quotation marks.

Double-click the second tREST component to open its Basic settings view. Fill the URL field with the same URL. From the HTTP Method list, select GET to send an HTTP request for retrieving the existing records. In the Basic settings view of each tLogRow, select the Print component unique name in front of each output row and Print schema column name in front of each value check boxes for better identification of the output flows.


Save your Job and press F6 to launch it. The console shows that the first tREST component sends an HTTP request to the server end to create a new customer named Steven, and the second tREST component successfully reads data from the server end, which includes the information of the new customer you just created.
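For readers who want to see what the two requests in this scenario amount to, here is a minimal sketch in plain Java using java.net.HttpURLConnection. It is not the code that tREST generates. The class RestRequestSketch and the URL http://localhost:8080/customers are placeholders chosen for illustration, since the scenario's own URL is not a live address. The PAYLOAD constant also shows the backslash escaping of double quotation marks described above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Sketch only: prepares the POST and GET requests used in the scenario.
final class RestRequestSketch {

    // The escaped quotes become literal double quotes around the name.
    static final String PAYLOAD = "<Customer><name>\"Steven\"</name></Customer>";

    // Prepares a request without sending it. Nothing goes over the network
    // until the caller writes the body or reads the response.
    static HttpURLConnection prepare(String url, String method, String contentType) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod(method);
            if (contentType != null) {
                connection.setRequestProperty("Content-Type", contentType);
                connection.setDoOutput(true); // a body will be written (POST or PUT)
            }
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the body and returns the HTTP status code (needs a live server).
    static int send(HttpURLConnection connection, String body) throws IOException {
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/customers"; // placeholder, not a live address
        HttpURLConnection create = prepare(url, "POST", "application/xml");
        HttpURLConnection read = prepare(url, "GET", null);
        System.out.println(create.getRequestMethod() + " " + PAYLOAD);
        System.out.println(read.getRequestMethod());
        // send(create, PAYLOAD) would perform the actual call against a real service.
    }
}
```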


tRSSInput
tRSSInput Properties
Component family: Internet
Function: tRSSInput reads RSS feeds using URLs.
Purpose: tRSSInput makes it possible to keep track of blog entries on websites to gather and organize information for quick and easy access.

Basic settings
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the context of tRSSInput usage, the schema is made of four columns: TITLE, DESCRIPTION, PUBDATE, and LINK. The parameter titles are read-only while their type and length are not.
  RSS URL: Enter the URL for the RSS feed to read.
  Read articles from: If selected, tRSSInput reads articles on the RSS feed from the date set through the three-dot [...] button next to the date time field.
  Max number of articles: If selected, tRSSInput reads as many articles as the number entered in the max amount field.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows.

Usage: This component is generally used as an input component. It requires an output component.
Limitation: n/a

Scenario: Fetching frequently updated blog entries.


This two-component Java scenario retrieves frequently updated blog entries from a Talend local news RSS feed using the tRSSInput component. Drop the following components from the Palette onto the design workspace: tRSSInput and tLogRow. Right-click to connect them using a Row > Main link.


In the design workspace, select tRSSInput. Click the Component tab to define the basic settings for tRSSInput.

Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to change the type and length of the schema parameters if necessary. Click OK to close the dialog box.

The schema for tRSSInput is made up of four columns, TITLE, DESCRIPTION, PUBDATE, and LINK, and it is read-only apart from the type and length of parameters.

In the Basic settings view of tRSSInput, enter the URL of the RSS feed to access. In this scenario, tRSSInput links to the Talend RSS feed: http://feeds.feedburner.com/Talend. Select or clear the other check boxes as required. In this scenario, we want to display the information about two articles dated from July 20, 2008.


In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow properties, see tLogRow properties on page 1305. Save the Job and press F6 to execute it.

The tRSSInput component accessed the RSS feed of the Talend website on your behalf and organized the information for you. Two blog entries are displayed on the console. Each entry has its own title, description, publication date, and the corresponding RSS feed URL address. Blogs show the last entry first, and you can scroll down to read earlier entries.
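The behaviour described in this scenario can be approximated in plain Java. The following is a minimal sketch, not the code that tRSSInput generates: it assumes an RSS 2.0 document whose pubDate values use the usual RFC 822 form, and the class name RssReadSketch, its Row type, and the read method are hypothetical names introduced for illustration. It returns the four schema columns for each article, optionally keeping only articles published on or after a given date and stopping at a maximum number of articles.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class RssReadSketch {

    // One output row: the four columns of the tRSSInput schema.
    static final class Row {
        final String title;
        final String description;
        final String pubDate;
        final String link;

        Row(String title, String description, String pubDate, String link) {
            this.title = title;
            this.description = description;
            this.pubDate = pubDate;
            this.link = link;
        }
    }

    // Text of the first child element with the given tag, or "" if absent.
    static String text(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // fromRfc822 may be null (no date filter); maxArticles caps the number of rows.
    static List<Row> read(String rssXml, String fromRfc822, int maxArticles) {
        try {
            SimpleDateFormat rfc822 =
                    new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.ENGLISH);
            Date from = fromRfc822 == null ? null : rfc822.parse(fromRfc822);
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<Row> rows = new ArrayList<>();
            for (int i = 0; i < items.getLength() && rows.size() < maxArticles; i++) {
                Element item = (Element) items.item(i);
                String pubDate = text(item, "pubDate");
                // "Read articles from": skip articles published before the given date.
                if (from != null && rfc822.parse(pubDate).before(from)) {
                    continue;
                }
                rows.add(new Row(text(item, "title"), text(item, "description"),
                        pubDate, text(item, "link")));
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the RSS document", e);
        }
    }
}
```

To read a live feed, the XML text would first have to be downloaded from the feed URL; that step is left out here so that the sketch stays self-contained.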


tRSSOutput
tRSSOutput Properties
Component family: Internet

Function: tRSSOutput writes RSS feed or Atom feed XML files.

Purpose: tRSSOutput makes it possible to create XML files that hold RSS or Atom feeds.

Basic settings

File name: Name or path to the output XML file. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Encoding: Select an encoding type from the list, or select Custom and define it manually. This field is compulsory for DB data handling.
Append: Select this check box to add the new rows to the end of the file.
Mode: Select between RSS or ATOM according to the feed you want to generate.
Channel (in RSS mode): The information to be typed in here concerns your entire input data, site, etc., rather than a particular item.
- Title: Enter a meaningful title.
- Description: Enter a description that you think will describe your content.
- Publication date: Enter the relevant date.
- Link: Enter the relevant URL.
Feed (in ATOM mode):
- Title: Enter a meaningful title.
- Link: Enter the relevant URL.
- Id: Enter the valid URL corresponding to the Link.
- Update date: Enter the relevant date.
- Author name: Enter the relevant name.
Optional Channel Elements: Click the [+] button below the table to add new lines and enter the information relative to the RSS flow metadata:
- Element Name: name of the metadata.
- Element Value: content of the metadata.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the context of tRSSInput usage, the schema is made of four columns: TITLE, DESCRIPTION, PUBDATE, and LINK. The parameter titles are read-only while their type and length are not.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component must be linked to an input or intermediary component.

Limitation: n/a

Scenario 1: Creating an RSS flow and storing files on an FTP server


In this Java scenario, we create an RSS flow for files that you would like to share with other people, and store the complete files on an FTP server. This scenario writes an RSS feed XML file about a MySQL table holding information about books. It adds links to the files stored on an FTP server in case users want to have access to the complete files. Drop the following components from the Palette onto the design workspace: tMysqlInput, tRSSOutput, and tFTPPut. Right-click tMysqlInput and connect it to tRSSOutput using a Row Main link. Right-click tMysqlInput and connect it to tFTPPut using an OnSubjobOk link.

In the design workspace, select tMysqlInput. Click the Component tab to define the basic settings for tMysqlInput.


Set the Property type to Repository and click the three-dot button [...] to select the relevant DB entry from the list. The connection details along with the schema get filled in automatically. In the Table Name field, either type your table name or click the three-dot button [...] and select your table name from the list. In this scenario, the MySQL input table is called rss_talend and the schema is made up of four columns: TITLE, DESCRIPTION, PUBDATE, and LINK. In the Query field, enter your DB query, taking care to list the fields in the same order as the schema definition, or click Guess Query. In the design workspace, select tRSSOutput. Click the Component view to define the basic settings for tRSSOutput.


In the File name field, use the default file name and path, or browse to set your own for the output XML file. Select the encoding type on the Encoding Type list. In the Mode area, select RSS. In the Channel panel, enter a title, a description, a publication date, and a link to define your input data as a whole. Select your schema type on the Schema Type list and click Edit Schema to modify the schema if necessary.
You can click Sync Column to retrieve the generated schema from the preceding component.

Save your Job and press F6 to execute this first part.


The tRSSOutput component created an output RSS flow in an XML format for the defined files. To store the complete files on an FTP server: In the design workspace, select tFTPPut. Click the Component tab to define the basic settings for tFTPPut.


Enter the host name and the port number in their corresponding fields. Enter your connection details in the corresponding Username and Password fields. Browse to the local directory, or enter it manually in the Local directory field. Enter the details of the remote server directory. Select the transfer mode from the Transfer mode list. On the Files panel, click the plus button to add new lines and fill in the filemasks of all files to be copied to the remote directory. In this scenario, the files to be saved on the FTP server are all text files. Save your Job and press F6 to execute it. The files matching the filemasks are copied to the remote server.
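As an illustration of how a filemask narrows down the files to transfer, the sketch below selects file names with Java's standard glob matcher. It assumes that the mask behaves like a glob pattern such as *.txt; whether the filemask syntax of tFTPPut matches Java glob syntax exactly is an assumption, and the class name FilemaskSketch and its select method are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class FilemaskSketch {

    // Keeps the file names that match the given mask, e.g. "*.txt" for all text files.
    static List<String> select(List<String> fileNames, String filemask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + filemask);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

With the mask *.txt, only the text files of the local directory would be selected for the transfer.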

Scenario 2: Creating an RSS flow that contains metadata


This Java scenario describes a two-component Job that creates an RSS flow that holds metadata and then redirects the obtained information in an XML file of the output RSS flow. Drop tRSSInput and tRSSOutput from the Palette to the design workspace. Connect the two components together using a Row Main link.

Click tRSSInput to open its Basic settings view and define the component properties.

Set Schema type to Built-in.


If you have already stored your schema locally in the Repository, set schema type to Repository and simply click the three-dot button next to the field to display a dialog box where you can select the appropriate metadata. For more information about metadata, see Managing Metadata in Talend Open Studio User Guide.

If needed, click the three-dot button next to Edit Schema to open a dialog box where you can check the schema.


The read-only schema of tRSSInput is composed of four columns: TITLE, DESCRIPTION, PUBDATE, and LINK.

Click OK to close the dialog box. In the design workspace, click tRSSOutput to display its Basic settings view and define the component properties.

In the File name field, use the default file name and path, or browse to set your own for the output XML file. Select the encoding type on the Encoding Type list. In the Mode area, select RSS.

In the Channel panel, enter a title, a description, a publication date, and a link to define your input data as a whole. In the Optional Channel Elements table, define the RSS flow metadata. In this example, the flow has two metadata elements: copyright, whose value is tos, and language, whose value is en_us. Select your schema type on the Schema Type list and click Edit Schema to modify the schema if necessary.
You can click Sync Column to retrieve the generated schema from the preceding component.

Save your Job and press F6 to execute it.

The defined files are copied to the output XML file, and the metadata is displayed under the <channel> node, above the information about the RSS flow.
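The structure of such an output file can be sketched in a few lines of Java. This is an illustrative sketch only, not the code generated by tRSSOutput: the class RssWriteSketch and its methods are hypothetical, and placing the optional elements first under the channel node simply follows the description above. Each input row is assumed to carry the four schema columns in the order TITLE, DESCRIPTION, PUBDATE, LINK.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class RssWriteSketch {

    // Escapes the characters that are not allowed in XML text content.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Appends one <name>value</name> line at the given indentation.
    static void tag(StringBuilder sb, String indent, String name, String value) {
        sb.append(indent).append('<').append(name).append('>')
          .append(esc(value))
          .append("</").append(name).append(">\n");
    }

    // channel: title, description, pubDate, link of the whole flow.
    // optional: extra channel metadata, e.g. copyright and language.
    // rows: one String[] {TITLE, DESCRIPTION, PUBDATE, LINK} per item.
    static String write(Map<String, String> channel, Map<String, String> optional,
                        List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<rss version=\"2.0\">\n  <channel>\n");
        for (Map.Entry<String, String> e : optional.entrySet()) {
            tag(sb, "    ", e.getKey(), e.getValue());
        }
        for (Map.Entry<String, String> e : channel.entrySet()) {
            tag(sb, "    ", e.getKey(), e.getValue());
        }
        for (String[] row : rows) {
            sb.append("    <item>\n");
            tag(sb, "      ", "title", row[0]);
            tag(sb, "      ", "description", row[1]);
            tag(sb, "      ", "pubDate", row[2]);
            tag(sb, "      ", "link", row[3]);
            sb.append("    </item>\n");
        }
        sb.append("  </channel>\n</rss>\n");
        return sb.toString();
    }
}
```

Passing the optional map with the entries copyright=tos and language=en_us reproduces the two metadata elements of this scenario.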

Scenario 3: Creating an ATOM feed XML file


This Java scenario describes a two-component Job that generates data and writes it to an ATOM feed XML file.


Drop the following components from the Palette onto the design workspace: tFixedFlowInput of the Misc component group and tRSSOutput of the Internet component group. Right-click tFixedFlowInput and connect it to tRSSOutput using a Row Main link. A pop-up window asks whether you want to pass on the schema of tRSSOutput to tFixedFlowInput; click Yes.

In the design workspace, double-click tFixedFlowInput to display its corresponding Component view and define its basic settings.

Leave the Schema list to Built-in. Click the [...] button next to the Edit schema field to display the schema imported from the output component. In the Number of rows field, leave the default setting to 1 to only generate one line of data. In the Mode area, leave the Use Single Table option selected and fill in the Values table. Note that the Column field of the Values table is filled in by the columns of the schema defined in the component. In the Value field of the Values table, type in the data you want to be sent to the following component. In the design workspace, double-click tRSSOutput to display its corresponding Component view and define its basic settings.


Click the [...] button next to the File Name field to set the output XML file directory and name. In the Mode area, select ATOM to generate an ATOM feed XML file. In the Feed area, enter a title, a link, an id, an update date, and an author name to define your input data as a whole. Select your schema type on the Schema Type list and click Edit Schema to display and modify it if necessary.


As the ATOM feed format is strict, some default information is required to create the XML file. For this reason, tRSSOutput contains default columns that hold this information. These default columns are greyed out to indicate that they must not be modified. If you choose to modify the schema of the component, the ATOM XML file created will not be valid.

Save your Job and press F6 to execute it.

The tRSSOutput component creates an output ATOM flow in an XML format.
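To show the kind of required information an ATOM feed carries, here is a minimal Java sketch that writes a feed with the standard StAX writer. It is not the code generated by tRSSOutput: the class AtomWriteSketch is hypothetical, and the entry elements chosen here (title, id, updated) are the ones the Atom format requires, not necessarily the default columns of the component.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, for illustration only.
class AtomWriteSketch {

    // Writes <name>value</name>.
    static void element(XMLStreamWriter w, String name, String value)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    // Feed-level values correspond to the Feed area: title, link, id, update date, author name.
    static String write(String title, String link, String id, String updated,
                        String authorName, String entryTitle, String entryId) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("feed");
            w.writeDefaultNamespace("http://www.w3.org/2005/Atom");
            element(w, "title", title);
            w.writeEmptyElement("link");
            w.writeAttribute("href", link);
            element(w, "id", id);
            element(w, "updated", updated);
            w.writeStartElement("author");
            element(w, "name", authorName);
            w.writeEndElement();
            // A single entry, like the one line of data generated in this scenario.
            w.writeStartElement("entry");
            element(w, "title", entryTitle);
            element(w, "id", entryId);
            element(w, "updated", updated);
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write the ATOM feed", e);
        }
        return out.toString();
    }
}
```

The updated value is expected in the date-time form used by Atom, for example 2011-01-01T00:00:00Z.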


tSCPClose
tSCPClose Properties
Component family: Internet/SCP

Function: tSCPClose closes a connection to a fully encrypted channel.

Purpose: This component closes a connection opened over the SCP protocol.

Basic settings

Component list: If there is more than one connection in the current Job, select tSCPConnection from the list.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: tSCPClose is generally used as a start component. It requires an output component.

Limitation: n/a

Related scenario
This component is closely related to tSCPConnection and tSCPRollback. It is generally used with tSCPConnection as it allows you to close a connection for the transaction which is underway. For a related scenario, see tMysqlConnection on page 594.


tSCPConnection
tSCPConnection properties
Component family: Internet/SCP

Function: tSCPConnection opens an SCP connection for the current transaction.

Purpose: tSCPConnection allows you to open an SCP connection to transfer files in one transaction.

Basic settings

Host: IP address of the SCP server.
Port: Number of listening port of the SCP server.
Username: User name for the SCP server.
Authentication method: SCP authentication method.
Password: User password for the SCP server.

Usage: This component is typically used as a single-component sub-job. It is used along with other SCP components.

Limitation: n/a

Related scenarios
For a related scenario, see Scenario: Getting files from a remote SCP server on page 1243. For a related scenario using a different protocol, see Scenario: Putting files on a remote FTP server on page 1187.


tSCPDelete
tSCPDelete properties
Component family: Internet/SCP

Function: This component deletes files from remote hosts over a fully encrypted channel.

Purpose: tSCPDelete allows you to remove a file from the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Filelist: File name or path to the files to be deleted.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For tSCPDelete related scenario, see Scenario: Getting files from a remote SCP server on page 1243. For tSCPDelete related scenario using a different protocol, see Scenario: Putting files on a remote FTP server on page 1187.


tSCPFileExists
tSCPFileExists properties
Component family: Internet/SCP

Function: This component checks, over a fully encrypted channel, if a file exists on a remote host.

Purpose: tSCPFileExists allows you to verify the existence of a file on the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Remote directory: File path on the remote directory.
Filename: Name of the file to check.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For tSCPFileExists related scenario, see Scenario: Getting files from a remote SCP server on page 1243. For tSCPFileExists related scenario using a different protocol, see Scenario: Putting files on a remote FTP server on page 1187.


tSCPFileList
tSCPFileList properties
Component family: Internet/SCP

Function: This component iterates, over a fully encrypted channel, on files of a given directory on a remote host.

Purpose: tSCPFileList allows you to list files from the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Command separator: The character used to separate multiple commands.
Filelist: Directory name or path to the directory holding the files to list.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For tSCPFileList related scenario, see Scenario: Getting files from a remote SCP server on page 1243. For tSCPFileList related scenario using a different protocol, see Scenario: Putting files on a remote FTP server on page 1187.


tSCPGet
tSCPGet properties
Component family: Internet/SCP

Function: This component transfers defined files via an SCP connection over a fully encrypted channel.

Purpose: tSCPGet allows you to copy files from the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Local directory: Path to the destination folder.
Overwrite or Append: List of available options for the transferred files.
Filelist: File name or path to the file(s) to copy.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Scenario: Getting files from a remote SCP server


This Java scenario creates a single-component Job which gets the defined file from a remote SCP server. Drop a tSCPGet component from the Palette onto the design workspace. In the design workspace, select tSCPGet and click the Component tab to define its basic settings.


Fill in the Host IP address, the listening Port number, and the user name in the corresponding fields. On the Authentication method list, select the appropriate authentication method. Note that the field that follows changes according to the selected authentication method. The authentication method used in this scenario is password. Fill in the local directory details where you want to copy the fetched file. On the Overwrite or Append list, select the action to be carried out. In the Filelist area, click the plus button to add a line in the Source list and fill in the path to the given file on the remote SCP server. In this scenario, the file to copy from the remote SCP server to the local disk is backport. Save the Job and press F6 to execute it. The given file on the remote server is copied to the local disk.
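For comparison, the same transfer can be expressed as a call to a command-line scp client. The sketch below only assembles the argument list; it assumes an OpenSSH-style scp client, which uses -P for the port, and is unrelated to how tSCPGet performs the transfer internally. The class ScpGetCommandSketch, the host address, the user name, and the local directory are hypothetical values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class ScpGetCommandSketch {

    // Builds: scp -P <port> <user>@<host>:<remoteFile> <localDir>
    static List<String> command(String host, int port, String user,
                                String remoteFile, String localDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("scp");
        cmd.add("-P");
        cmd.add(Integer.toString(port));
        cmd.add(user + "@" + host + ":" + remoteFile);
        cmd.add(localDir);
        return cmd;
    }

    public static void main(String[] args) {
        // Example values only: fetch the file "backport" into /tmp.
        System.out.println(String.join(" ",
                command("192.168.0.10", 22, "talend", "backport", "/tmp")));
    }
}
```

The resulting list could be handed to a ProcessBuilder to run the copy, provided that an scp client is installed and that authentication is handled, for example with a key.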


tSCPPut
tSCPPut properties
Component family: Internet/SCP

Function: This component copies defined files to a remote SCP server over a fully encrypted channel.

Purpose: tSCPPut allows you to copy files to the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Remote directory: Path to the destination folder.
Filelist: File name or path to the file(s) to copy.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For a tSCPPut related scenario, see Scenario: Getting files from a remote SCP server on page 1243. For a tSCPPut related scenario using a different protocol, see Scenario: Putting files on a remote FTP server on page 1187.


tSCPRename
tSCPRename properties
Component family: Internet/SCP

Function: This component renames files on a remote SCP server.

Purpose: tSCPRename allows you to rename file(s) on the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
File to rename: Enter the name or path to the file you want to rename.
Rename to: Enter the new name of the file.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For tSCPRename related scenario, see Scenario: Getting files from a remote SCP server on page 1243.


tSCPTruncate
tSCPTruncate properties
Component family: Internet/SCP

Function: This component removes all the data from a file via an SCP connection.

Purpose: tSCPTruncate allows you to remove data from file(s) on the defined SCP server.

Basic settings

Host: SCP IP address.
Port: Listening port number of the SCP server.
Username: SCP user name.
Authentication method: SCP authentication method.
Password: SCP password.
Remote directory: Path to the destination file.
Filelist: File name or path to the file(s) to truncate.

Usage: This component is typically used as a single-component sub-job but can also be used with other components.

Limitation: n/a

Related scenario
For tSCPTruncate related scenario, see Scenario: Getting files from a remote SCP server on page 1243.


tSendMail
tSendMail Properties
Component family: Internet

Function: tSendMail sends emails and attachments to defined recipients.

Purpose: The purpose of tSendMail is to notify recipients about a particular state of a Job or possible errors.

Basic settings

To: Main recipient email address.
From: Sending server email address.
Show sender's name: Select this check box if you want the sender name to show in the messages.
Cc: Email addresses of secondary recipients of the email message directed to another.
Bcc: Email addresses of secondary recipients of the email message. Recipients listed in the Bcc field receive a copy of the message but are not shown on any other recipient's copy.
Subject: Heading of the mail.
Message: Body message of the email. Press Ctrl+Space to display the list of available variables.
Die if the attachment file doesn't exist: This check box is selected by default. Clear this check box if you want the message to be sent even if there are no attachments.
Attachments: Click the plus button to add as many lines as needed where you can put Filemask or path to the file to be sent along with the mail, if any.
Other Headers: Click the plus button to add as many lines as needed where you can type the Key and the corresponding Value of any header information that does not belong to the standard header.
SMTP Host and Port: IP address of SMTP server used to send emails.
SSL Support: Select this check box to authenticate the server at the client side via an SSL protocol.
STARTTLS Support: Select this check box to authenticate the server at the client side via a STARTTLS protocol.
Importance: Select in the list the priority level of your messages.
Need authentication / Username and Password: Select this check box and enter a username and a password in the corresponding fields if this is necessary to access the service.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

MIME subtype from the text MIME type: Select in the list the structural form for the text of the message.
Encoding type: Select the encoding from the list or select Custom and define it manually.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is typically used as one sub-job but can also be used as output or end object. It can be connected to other components with either Row or Iterate links.

Limitation: Note that two different Perl modules are required depending on whether the e-mails have attachments.

Scenario: Email on error


This scenario creates a three-component Job which sends an email to defined recipients when an error occurs.

Drop the following components from your Palette to the design workspace: tFileInputDelimited, tFileOutputXML, tSendMail. Define the tFileInputDelimited properties. Related topic: tFileInputDelimited on page 1054. Right-click the tFileInputDelimited component and select Row > Main. Then drag it onto the tFileOutputXML component and release when the plug symbol shows up. Define the tFileOutputXML properties. Drag a Run on Error link from tFileInputDelimited to the tSendMail component. Define the tSendMail component properties:


Enter the recipient and sender email addresses, as well as the email subject. Enter a message containing the error code produced using the corresponding global variable. Access the list of variables by pressing Ctrl+Space. Add attachments and extra header information if any. Type in the SMTP information.

In this scenario, the file containing data to be transferred to XML output cannot be found. tSendMail runs on this error and sends a notification email to the defined recipient.
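The logic of this Job, processing a file and preparing a notification only when an error occurs, can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the generated Job code: the class MailOnErrorSketch, the subject text, and the addresses passed to it are hypothetical, and the actual delivery through an SMTP server is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class MailOnErrorSketch {

    // Returns null when the input file can be opened; otherwise returns the
    // fields of the notification that would be sent (To, From, Subject, ...).
    static Map<String, String> notifyOnError(String inputFile, String to, String from) {
        try (InputStream in = Files.newInputStream(Paths.get(inputFile))) {
            return null; // no error: nothing to send
        } catch (IOException e) {
            Map<String, String> mail = new LinkedHashMap<>();
            mail.put("To", to);
            mail.put("From", from);
            mail.put("Subject", "Job error");
            mail.put("Importance", "high");
            // The message carries the error details, as the Message field would.
            mail.put("Message", "Input file could not be read: " + e);
            return mail;
        }
    }
}
```

A missing input file yields a populated notification, which mirrors the Run on Error link firing when tFileInputDelimited fails.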


tSetKeystore
tSetKeystore properties
Component family: Internet

Function: tSetKeystore submits authentication data of a truststore, with or without a keystore, for validation of the SSL connection.

Purpose: This component allows you to set the authentication data type between PKCS 12 and JKS.

Basic settings

TrustStore type: Select the type of the TrustStore to be used. It may be PKCS 12 or JKS.
TrustStore file: Type in the path, or browse to the certificate TrustStore file (including filename) that contains the list of certificates that the client trusts.
TrustStore password: Type in the password used to check the integrity of the TrustStore data.
Need Client authentication: Select this check box to validate the keystore data. Once you have done so, you need to complete three fields:
- KeyStore type: select the type of the keystore to be used. It may be PKCS 12 or JKS.
- KeyStore file: type in the path, or browse to the file (including filename) containing the keystore data.
- KeyStore password: type in the password for this keystore.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is used standalone.

Connections:
Outgoing links (from one component to another): Trigger: Run if, On Subjob Ok, On Subjob Error, On Component Ok, On Component Error.
Incoming links (from one component to another): Trigger: Run if, On Subjob Ok, On Component Ok, On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Extracting customer information from a private WSDL file


This scenario describes a three-component Job that connects to a private WSDL file in order to extract customer information.

The WSDL file used in this Job accesses the corresponding web service under the SSL protocol. For this purpose, the most relevant code in this file reads as follows:

    <wsdl:port name="CustomerServiceHttpSoap11Endpoint" binding="ns:CustomerServiceSoap11Binding">
        <soap:address location="https://192.168.0.22:8443/axis2/services/CustomerService.CustomerServiceHttpSoap11Endpoint/"/>
    </wsdl:port>

Accordingly, we enter the following code in the server.xml file of Tomcat:

    <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
        maxThreads="150" scheme="https" secure="true"
        clientAuth="true" sslProtocol="TLS"
        keystoreFile="D:/server.keystore" keystorePass="password" keystoreType="JKS"
        truststoreFile="D:/server.p12" truststorePass="password" truststoreType="PKCS12" />

So we need keystore files to connect to this WSDL file. To replicate this Job, proceed as follows:

Drop the following components from the Palette onto the design workspace: tSetKeystore, tWebService, and tLogRow.

Right-click tSetKeystore to open its contextual menu. In this menu, select Trigger > On Subjob Ok to connect this component to tWebService. Right-click tWebService to open its contextual menu. In this menu, select Row > Main to connect this component to tLogRow. Double-click tSetKeystore to open its Basic settings view and define the component properties.


In the TrustStore type field, select PKCS12 from the drop-down list. In the TrustStore file field, browse to the corresponding truststore file. Here, it is server.p12. In the TrustStore password field, type in the password for this truststore file. In this example, it is password. Select the Need Client authentication check box to activate the keystore configuration fields. In the KeyStore type field, select JKS from the drop-down list. In the KeyStore file field, browse to the corresponding keystore file. Here, it is server.keystore. Double-click tWebService to open the component editor, or select the component in the design workspace and in the Basic settings view, click the three-dot button next to Service configuration.


In the WSDL field, browse to the private WSDL file to be used. In this example, it is CustomerService.wsdl. Click the refresh button next to the WSDL field to retrieve the WSDL description and display it in the fields that follow. In the Port Name list, select the port you want to use, CustomerServiceHttpSoap11Endpoint in this example. In the Operation list, select the service you want to use. In this example the selected service is getCustomer(parameters):Customer. Click Next to open a new view in the editor.


In the panel to the right of the Input mapping view, the input parameter of the service displays automatically. However, you can add other parameters if you select [+] parameters and then click the plus button on top to display the [Parameter Tree] dialog box where you can select any of the listed parameters. The Web service in this example has only one input parameter, ID. In the Expression column of the parameters.ID row, type in the customer ID of your interest between quotation marks. In this example, it is A00001. Click Next to open a new view in the editor.

In the Element list to the left of the view, the output parameter of the web service displays automatically. However, you can add other parameters if you select [+] parameters and then click the plus button on top to display the [Parameter Tree] dialog box where you can select any of the parameters listed. The Web service in this example has four output parameters: return.address, return.email, return.name and return.phone. You now need to create a connection between the output parameters of the defined Web service and the schema of the output component. To do so: In the panel to the right of the view, click the three-dot button next to Edit Schema to open a dialog box in which you can define the output schema.


In the schema editing dialog box, click the plus button to add four columns to the output schema. Click in each column and type in the new names, Name, Phone, Email and Address in this example. This will retrieve the customer information of your interest. Click OK to validate your changes and to close the schema editing dialog box. In the Element list to the right of the editor, drag each parameter to the field that corresponds to the column you have defined in the schema editing dialog box.
If available, use the Auto map! button, located at the bottom left of the interface, to carry out the mapping operation automatically.

Click OK to validate your changes and to close the editor. In the design workspace, double-click tLogRow to open its Basic settings view and define its properties. Click Sync columns to retrieve the schema from the preceding component. Save your Job and press F6 to execute it. The information of the customer with ID A00001 is returned and displayed in the console of Talend Open Studio.
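In a plain Java program, truststore and keystore settings of this kind are commonly supplied through the standard JSSE system properties. The sketch below shows that approach with the values of this scenario; whether tSetKeystore sets exactly these properties internally is an assumption, and the class KeystoreSetupSketch and its configure method are hypothetical names.

```java
// Hypothetical helper, for illustration only.
class KeystoreSetupSketch {

    // Sets the standard JSSE system properties. Pass keyFile = null when no
    // client authentication is needed (truststore only).
    static void configure(String trustType, String trustFile, String trustPassword,
                          String keyType, String keyFile, String keyPassword) {
        System.setProperty("javax.net.ssl.trustStoreType", trustType);
        System.setProperty("javax.net.ssl.trustStore", trustFile);
        System.setProperty("javax.net.ssl.trustStorePassword", trustPassword);
        if (keyFile != null) {
            System.setProperty("javax.net.ssl.keyStoreType", keyType);
            System.setProperty("javax.net.ssl.keyStore", keyFile);
            System.setProperty("javax.net.ssl.keyStorePassword", keyPassword);
        }
    }

    public static void main(String[] args) {
        // Values taken from the scenario: a PKCS12 truststore and a JKS keystore.
        configure("PKCS12", "D:/server.p12", "password",
                  "JKS", "D:/server.keystore", "password");
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

These properties are read when the default SSL context is first created, so they need to be set before the first SSL connection is opened. This is consistent with tSetKeystore running before tWebService through the On Subjob Ok trigger.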


tSocketInput
tSocketInput properties
Component family: Internet

Function: The tSocketInput component opens the socket port and listens for the incoming data.
Purpose: tSocketInput is a listening component that allows data to be passed via a defined port.

JAVA Basic settings
Host name: Name or IP address of the Host server.
Port: Listening port to open.
Timeout: Number of seconds for the port to listen before closing.
Uncompress: Select this check box to unzip the data if relevant.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Field separator: Character, string or regular expression to separate fields.
Row separator: String (ex: "\n" on Unix) to distinguish rows.
Escape Char: Character of the row to be escaped.
Text enclosure: Character used to enclose text.
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Encoding type: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.


JAVA Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component opens a point of access to a workstation or server. This component starts a Job and only stops once the timeout has elapsed.
Limitation: n/a

The Perl properties being slightly different from the Java properties, they are described in a separate table below.
PERL Basic settings
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Host name: Name or IP address of the Host server.
Port: Listening port to open.
Field separator: Character, string or regular expression to separate fields.
End of Line separator: String (ex: "\n" on Unix) to distinguish rows.
End of data: Character, string or regular expression that points out the end of the data section.
Opening message / Message / Acknowledge message / Closing message: Description of the message if relevant.

PERL Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component opens a point of access to a workstation or server. This component starts a Job and only stops after it receives a closing message.


Scenario: Passing on data to the listening port (Java)


The following scenario describes a pair of Jobs that pass data via a listening port. Another application for the Socket components would be to allow controlled communication between servers which cannot communicate directly.

Create two Jobs: the first Job (SocketInput) opens the listening port and waits for the data to be sent over. The second Job (SocketOutput) passes delimited data from a file to a defined port number corresponding to the listening port. In the first Job, drop the following components from the Palette to the design workspace: tSocketInput and tLogRow. In the second Job, drop the following components from the Palette to the design workspace: tFileInputDelimited and tSocketOutput. Set the parameters of the second Job first. Select tFileInputDelimited and, on the Basic settings tab of the Component view, set the access parameters to the input file.

In File Name, browse to the file. Define the Row and Field separators, as well as the Header.


Describe the Schema of the data to be passed on to the tSocketOutput component. Select the tSocketOutput component and set the parameters on the Basic settings tab of the Component view.

Define the Host IP address and the Port number to which the data will be passed. Set the number of retries in the Retry field and the amount of time (in seconds) after which the Job will time out. Define the remaining elements if need be. The schema should be propagated from the preceding component. Now, in the other Job (SocketInput), define the parameters of the tSocketInput component.

Define the Host IP address and the listening Port number to which the data is passed. Set the amount of time (in seconds) after which the Job will time out. Define the remaining elements if need be. Edit the schema and set it to reflect the whole or part of the other Job's schema.


The tLogRow does not require any particular setting for this Job. Press F6 to execute this Job (SocketInput) first, in order to open the listening port and prepare it to receive the passed data. Before the time-out, launch the other Job (SocketOutput) to pass on the data. The result displays on the Run view, along with the opening socket information.
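The two-Job exchange described above can be approximated outside the Studio with plain Java sockets. The following is only a sketch under stated assumptions, not the code that the Socket components generate: the class and method names (SocketScenarioSketch, receiveRows, sendRows, demo), the sample rows and the ";" separator are hypothetical, and the field separator is treated as a literal string rather than a regular expression. The listener is opened first, as in the scenario, and gives up if no sender connects before the timeout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SocketScenarioSketch {

    // Listener side (role of the SocketInput Job): wait for one connection,
    // then split every received line into fields.
    public static List<String[]> receiveRows(ServerSocket server, int timeoutSeconds,
                                             String fieldSeparator) throws IOException {
        server.setSoTimeout(timeoutSeconds * 1000); // accept() gives up after the timeout
        List<String[]> rows = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // The separator is quoted, so it is matched literally.
                rows.add(line.split(Pattern.quote(fieldSeparator), -1));
            }
        }
        return rows;
    }

    // Sender side (role of the SocketOutput Job): write one delimited row per line.
    public static void sendRows(String host, int port, List<String> rows) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String row : rows) {
                out.print(row);
                out.print('\n'); // row separator
            }
        }
    }

    // Runs both sides in one process on a free local port (hypothetical sample data).
    public static List<String[]> demo() {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    sendRows("localhost", port, Arrays.asList("1;Chicago", "2;Boston"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            List<String[]> rows = receiveRows(server, 10, ";");
            sender.join();
            return rows;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : demo()) {
            System.out.println(String.join("|", row));
        }
    }
}
```

In the sketch the listening socket is bound before the sender starts, which mirrors the instruction to execute the SocketInput Job first.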


tSocketOutput
tSocketOutput properties
Component family: Internet

Function: The tSocketOutput component writes data to a listening port.
Purpose: tSocketOutput sends the data from the incoming flow to a listening socket port.

Basic settings
Host name: Name or IP address of the Host server.
Port: Listening port to open.
Compress: Select this check box to zip the data if relevant.
Retry times: Number of retries before the Job fails.
Timeout: Number of seconds for the port to listen before closing.
Die on error: Clear this check box to skip the row on error and complete the process for error-free rows.
Field separator: Character, string or regular expression to separate fields.
Row separator: String (ex: "\n" on Unix) to distinguish rows.
Escape Char: Character of the row to be escaped.
Text enclosure: Character used to enclose text.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
- Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Encoding type: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Usage: This component opens a point of access to a workstation or server. This component starts a Job and only stops once the timeout has elapsed.
Limitation: n/a


Related Scenario
For use cases in relation with tSocketOutput, see Scenario: Passing on data to the listening port (Java) on page 1259.


tSOAP
tSOAP properties
Component family: Internet

Function: tSOAP sends the defined SOAP message with the given parameters to the invoked Web service and returns the value as defined, based on the given parameters.
Purpose: This component calls a method via a Web service in order to retrieve the values of the parameters defined in the component editor.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. This component always uses a built-in, read-only schema that contains three columns:
- Header: stores the SOAP message header of the response from the server end.
- Body: stores the SOAP message body of the response from the server end.
- Fault: stores the error information when an error occurs during the SOAP message processing.
Click Edit Schema to view the schema structure. Changing the schema type may result in loss of the schema structure and therefore failure of the component.
Use NTLM: Select this check box if you want to use the NTLM authentication protocol. Domain: Name of the client domain.
Need authentication: Select this check box and enter a user name and a password in the corresponding fields if this is necessary to access the service.
Use http proxy: Select this check box if you are using a proxy server and fill in the necessary information.
Trust server with SSL: Select this check box to validate the server certificate to the client via an SSL protocol and fill in the corresponding fields:
- TrustStore file: enter the path (including filename) to the certificate TrustStore file that contains the list of certificates that the client trusts.
- TrustStore password: enter the password used to check the integrity of the TrustStore data.
ENDPOINT: Type in the URL address of the invoked Web server.
SOAP Action: Type in the URL address of the SOAPAction HTTP header field to be used to identify the intent of the SOAP HTTP request.


SOAP version: Select the version of the SOAP system you are using. The required SOAP Envelope varies among versions.
SOAP message: Type in the SOAP message to be sent to the invoked Web service. The global and context variables can be used when you write a SOAP message. For further information about the context variables, see Variables of the Talend Open Studio User Guide.

Advanced settings
Temporary folder (for wsdl2java): Set or browse to a temporary folder that you configured in order to store the WSDL files.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as an input or as an intermediate component.
Connections:
- Outgoing links (from one component to another): Row: Main; Iterate. Trigger: Run if; On Component Ok; On Component Error.
- Incoming links (from one component to another): Row: Main; Iterate. Trigger: Run if; On Component Ok; On Component Error.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: N/A

Scenario: Extracting the weather information using a Web service


This Java scenario describes a two-component Job that uses a Web service to retrieve the weather information of a given American city. The Web service to be used is http://www.deeptraining.com/webservices/weather.asmx. Drop the following components from the Palette onto the design workspace: tSOAP and tLogRow.

Right-click tSOAP to open the contextual menu. Select Row > Main.


Click tLogRow to connect the components together using a Main Row link. Double-click tSOAP to open its Basic settings view and define the component properties.

In the ENDPOINT field, type in or copy-paste the URL address of the Web service to be used, between the quotation marks: http://www.deeptraining.com/webservices/weather.asmx. In the SOAP Action field, type in or copy-paste the URL address of the SOAPAction HTTP header field that indicates that you want to retrieve the weather information: http://litwinconsulting.com/webservices/GetWeather.
You can see this address by looking at the WSDL for the Web service you are calling. For the Web service of this example, in a web browser, append ?wsdl to the end of the URL of the Web service used in the ENDPOINT field, open the corresponding web page, and then see the SOAPAction defined under the operation node:
<wsdl:operation name="GetWeather">
  <soap:operation soapAction="http://litwinconsulting.com/webservices/GetWeather" style="document"/>

In the SOAP version field, select the version of the SOAP system being used. In this scenario, the version is SOAP 1.1.


In the SOAP message field, enter the XML-format message used to retrieve the weather information from the invoked Web service. In this example, the weather information of Chicago is needed, so the message is:
"<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:web=\"http://litwinconsulting.com/webservices/\">
  <soapenv:Header/>
  <soapenv:Body>
    <web:GetWeather>
      <web:City>Chicago</web:City>
    </web:GetWeather>
  </soapenv:Body>
</soapenv:Envelope>"
Save your Job and press F6 to execute it. The weather of Chicago is returned and displayed in the console of the Run view.
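For readers who want to see how the ENDPOINT, SOAP Action and SOAP message settings fit together in a SOAP 1.1 call, the following sketch assembles the equivalent HTTP request with the standard java.net.http API (Java 11 or later). It is an illustration under stated assumptions, not the code generated by tSOAP: the class and method names (SoapRequestSketch, buildEnvelope, buildRequest, escapeXml) are hypothetical, and the request is only built, not sent, because the availability of the example service cannot be assumed.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SoapRequestSketch {

    // Values taken from the scenario above.
    static final String ENDPOINT = "http://www.deeptraining.com/webservices/weather.asmx";
    static final String SOAP_ACTION = "http://litwinconsulting.com/webservices/GetWeather";

    // Builds the SOAP 1.1 envelope of the scenario for the given city.
    public static String buildEnvelope(String city) {
        return "<soapenv:Envelope"
                + " xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:web=\"http://litwinconsulting.com/webservices/\">"
                + "<soapenv:Header/>"
                + "<soapenv:Body>"
                + "<web:GetWeather><web:City>" + escapeXml(city) + "</web:City></web:GetWeather>"
                + "</soapenv:Body>"
                + "</soapenv:Envelope>";
    }

    // Minimal escaping so that the city name cannot break the XML.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // SOAP 1.1 over HTTP: POST, text/xml content type, quoted SOAPAction header.
    public static HttpRequest buildRequest(String city) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"" + SOAP_ACTION + "\"")
                .POST(HttpRequest.BodyPublishers.ofString(buildEnvelope(city), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("Chicago");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildEnvelope("Chicago"));
    }
}
```

To actually send the request, it could be passed to java.net.http.HttpClient.send with a string body handler; the response body would then hold the SOAP envelope from which the Body column of the component schema is filled.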


tWebServiceInput
tWebServiceInput Properties
Component family: Internet

Function: Calls the defined method from the invoked Web service, and returns the class as defined, based on the given parameters.
Purpose: Invokes a Method through a Web service. To handle complex hierarchical data, use the advanced features of tWebServiceInput and provide Java code directly in the Code field of the Advanced settings view.

Basic settings
Property type: Either Built-in or Repository.
- Built-in: No property data stored centrally.
- Repository: Select the Repository file where the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a WSDL schema wizard and store your WSDL connection in the Repository tree view. For more information about setting up and storing database connection parameters, see Setting up a simple schema on page 347 of Talend Open Studio User Guide.
Encoding type (Perl only field): Select the encoding type from the list or select Custom and define the type manually. This field is obligatory for manipulating data in a database.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component in the Job.
- Built-in: You create the schema and store it locally for the relevant component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
- Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
End Point URI (Perl only field): Resource identifier of the Web service.


WSDL (Java only field): Description of Web service bindings and configuration.
Need authentication / Username and Password: Select this check box and:
- enter a username and a password in the corresponding fields if this is necessary to access the service, or
- select the Windows authentication check box and enter the Windows domain in the corresponding field if this is necessary to access the service.
Use http proxy (Java only field): Select this check box if you are using a proxy server and fill in the necessary information.
Trust server with SSL (Java only field): Select this check box to validate the server certificate to the client via an SSL protocol and fill in the corresponding fields:
- TrustStore file: enter the path (including filename) to the certificate TrustStore file that contains the list of certificates that the client trusts.
- TrustStore password: enter the password used to check the integrity of the TrustStore data.
Time out (second) (Java only field): Set a value in seconds for Web service connection time out.
Method Name: Enter the exact name of the Method to be invoked. The Method name MUST match the corresponding method described in the Web Service. The Method name is also case-sensitive.
Parameters: Enter the parameters expected and the sought values to be returned. Make sure that the parameters entered fully match the names and the case of the parameters described in the method.

Advanced settings
Advanced Use (Java only field): Select this check box to display the fields dedicated for the advanced use of tWebServiceInput:
- WSDL2java: click the three-dot button to generate Talend routines that hold the Java code necessary to connect and query the Web service.
- Code: replace the generated model Java code with the code necessary to connect and query the specified Web service using the code in the generated Talend routines.
- Match Brackets: select the number of brackets to be used to close the for loop based on the number of open brackets.
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is generally used as a Start component. It must be linked to an output component.
Limitation: n/a


Scenario 1: Extracting images through a Web service


This scenario describes a two-component Job which calls a Web service method and displays the output on the Run console view. The method takes a full URL as an input string and returns a string array of the images found on the given Web page.

Drop a tWebServiceInput component and a tLogRow component from the Palette onto the design workspace. On the Component view of the tWebServiceInput component, define the WSDL specifications, such as End Point URI, WSDL and SOAPAction URI where required. If the Web service you invoke requires authentication details, select the Need authentication check box and provide the relevant authentication information.

If you are using a proxy server, select the Use http proxy check box and enter the necessary connection information. In the Method Name field, enter the method name as defined in the Web Service description. The name and the case of the method entered must match the corresponding Web service method exactly.


In the Parameters area, click the plus [+] button to add a row to the table, then enter the exact name of the parameters which correspond to the method. In the Value column, type in the URL of the Website that the images are to be extracted from. Link the tWebServiceInput component to the standard output component, tLogRow. Then save your Job and press F6 to execute it.

All of the images extracted from the Web site are returned as a list of URLs on the Run view.

Scenario 2: Reading the data published on a Web service using the tWebServiceInput advanced features (Java only)
This Java scenario describes a two-component Job that retrieves a list of funds published by a financial Web service (distributed by www.xignite.com) and displays the output on the standard console (the Run view). This scenario is designed for advanced users with basic knowledge of Java. Since the aim of this Job is to retrieve complex hierarchical data, you need to code the necessary functions in Java.

Drop the following components from the Palette onto the design workspace: tWebServiceInput and tLogRow. Link the two components together using a Row Main connection. Double-click tWebServiceInput to show the Component view and set the component properties:


In the Basic settings view: In the Property Type list, select Built-in and complete the fields that follow manually. In the Schema Type list, select Built-in and click the [...] button to configure the data structure (schema) manually, as shown in the figure below:

Click OK to validate the schema and close the window. A dialog box opens and asks you if you want to propagate the modifications. Click Yes. In the WSDL field, enter the URL from which to get the WSDL. In the Time out field, enter the desired duration of the Web Service connection.


Click the Advanced settings tab to display the corresponding view where you can set the tWebServiceInput advanced features:

Select the check box next to Advanced Use to display the advanced configuration fields. Click the [...] button next to the WSDL2Java field in order to generate routines from the WSDL Web service.


The generated routines are displayed automatically under Code > Routines in the Repository tree view. These routines can then be called in the code to build the function required to fetch complex hierarchical data from the Web service. Enter the relevant function in the Code field. By default, two examples of code are provided in the Code field. The first example returns one piece of data, and the second example returns several. In this scenario, several pieces of data are to be returned. Therefore, remove the first example of code and use the second example to build the function. Replace the pieces of code provided as examples with the relevant routines that have been automatically generated from the WSDL:
- Change TalendJob_PortType to the routine name ending with _PortType, such as: XigniteFundHoldingsSoap_PortType.
- Replace the various instances of TalendJob with a more relevant name, such as the name of the method in use. In this use case: fundHoldings.
- Replace TalendJobServiceLocator with the name of the routine ending with Locator, such as: XigniteFundHoldingsLocator.
- Replace both instances of TalendJobSoapBindingStub with the routine name ending with BindingStub, such as: XigniteFundHoldingsSoap_BindingStub.
- Within the brackets corresponding to the pieces of code stub.setUsername and stub.setPassword, enter your username and password respectively, between quotes. For the sake of confidentiality or maintenance, you can store your username and password in context variables.
The list of funds provided by the Xignite Web service is identified using so-called symbols, which are of string type. In this example, we intend to fetch the list of funds whose symbol is between I and J. To do so, define the following statements: String startSymbol="I" and String endSymbol="J".
Then enter the piece of code that creates the result table holding the list of funds (listFunds), using the statements defined earlier on: routines.Fund[] result = fundHoldings.listFunds(startSymbol, endSymbol); Run a loop on the fund list to fetch the funds ranging from I to J: for(int i = 0; i < result.length; i++) {. Define the results to return, for example: fetch the CIK data from the Security schema using the code getSecurity().getCIK(), then pass it on to the CIK output schema column. The function that operates the Web service should read as follows (the closing bracket of the for loop is not typed here, because it is added through the Match Brackets field):

// Connect to the Web service through the routines generated from the WSDL.
routines.XigniteFundHoldingsSoap_PortType fundHoldings = new routines.XigniteFundHoldingsLocator().getXigniteFundHoldingsSoap();
routines.XigniteFundHoldingsSoap_BindingStub stub = (routines.XigniteFundHoldingsSoap_BindingStub)fundHoldings;
// Replace the placeholders with your own credentials.
stub.setUsername("username");
stub.setPassword("password");
// Range of fund symbols to fetch.
String startSymbol="I";
String endSymbol="J";
routines.Fund[] result = fundHoldings.listFunds(startSymbol, endSymbol);
// Map each fund to the columns of the output schema.
for(int i = 0; i < result.length; i++) {
    output_row.CIK = (result[i]).getSecurity().getCIK();
    output_row.cusip = (result[i]).getSecurity().getCusip();
    output_row.symbol = (result[i]).getSecurity().getSymbol();
    output_row.ISIN = (result[i]).getSecurity().getISIN();
    output_row.valoren = (result[i]).getSecurity().getValoren();
    output_row.name = (result[i]).getSecurity().getName();
    output_row.market = (result[i]).getSecurity().getMarket();
    output_row.category = (result[i]).getSecurity().getCategoryOrIndustry();
    output_row.asOfDate = (result[i]).getAsOfDate();

The outputs defined in the Java function (output_row.<column>) must match the columns defined in the component schema exactly. The case used must also match in order for the data to be retrieved.

In the Match Brackets field, select the number of brackets to use to end the For loop, based on the number of open brackets. For this scenario, select one bracket only as only one bracket has been opened in the function. Double-click the tLogRow component to display the Component view and set its parameters. Click the [...] button next to the Edit Schema field in order to check that the preceding component schema was properly propagated to the output component. If needed, click the Sync Columns button to retrieve the schema. Save your Job and press F6 to run it.

The funds whose symbols fall between I and J are returned and displayed in the Talend Open Studio console.


tXMLRPCInput
tXMLRPCInput Properties
Component family: Internet

Function: Calls the defined method from the invoked RPC service, and returns the class as defined, based on the given parameters.
Purpose: Invokes a Method through a Web service and for the described purpose.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job. In the RPC context, the schema corresponds to the output parameters. If two parameters are meant to be returned, then the schema should contain two columns.
Server URL: URL of the RPC service to be accessed.
Need authentication / Username and Password: Select this check box and fill in a username and password if required to access the service.
Method Name: Enter the exact name of the Method to be invoked. The Method name MUST match the corresponding method described in the RPC Service. The Method name is also case-sensitive.
Return class: Select the type of data to be returned by the method. Make sure it fully matches the one defined in the method.
Parameters: Enter the parameters expected by the method as input parameters.

Usage: This component is generally used as a Start component. It must be linked to an output component.
Limitation: n/a

Scenario: Guessing the State name from an XMLRPC


This scenario describes a two-component Job that calls an RPC method and displays the output on the console view.


Drop a tXMLRPCInput component and a tLogRow component from the Palette to the design workspace. Set the tXMLRPCInput basic settings.

Define the Schema type as Built-in for this use case. Set a single-column schema, as the called method returns only one parameter: StateName.

Then set the Server URL. For this demo, use: http://phpxmlrpc.sourceforge.net/server.php. No authentication details are required in this use case. The Method to be called is: examples.getStateName. The return class is not compulsory for this method but might be strictly required for another. Leave the default setting for this use case. Then set the input Parameters required by the method called. The Name field is not used in the code, but the value should follow the syntax expected by the method. In this example, the Name used is State Nr and the value, chosen at random, is 42.


The class has little impact with this demo method but could matter with another method, so leave the default setting. On the Component view of the tLogRow component, select the Print schema column name in front of each value check box. Then save the Job and press F6 to execute it.

South Dakota is the state name returned by the examples.getStateName RPC method. It corresponds to the 42nd state, as defined by the input parameter.
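To make the roles of the Method Name and Parameters settings more concrete, the following sketch builds the XML-RPC request body for examples.getStateName with the integer 42 and reads the first value out of a response document. It is a hedged illustration only: the class and method names (XmlRpcCallSketch, buildMethodCall, firstReturnValue) are hypothetical, the sample response in main is hand-written for the example rather than captured from the demo server, and nothing here is claimed to match the code generated by tXMLRPCInput.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlRpcCallSketch {

    // Builds an XML-RPC methodCall document with a single integer parameter.
    public static String buildMethodCall(String methodName, int parameter) {
        return "<?xml version=\"1.0\"?>"
                + "<methodCall>"
                + "<methodName>" + methodName + "</methodName>"
                + "<params><param><value><int>" + parameter + "</int></value></param></params>"
                + "</methodCall>";
    }

    // Returns the text of the first <value> element of a response, or null if there is none.
    // This is enough for a single scalar result; structs, arrays and faults are not handled.
    public static String firstReturnValue(String responseXml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList values = document.getElementsByTagName("value");
            return values.getLength() == 0 ? null : values.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML-RPC response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildMethodCall("examples.getStateName", 42));

        // Hand-written sample response, for illustration only.
        String sampleResponse = "<?xml version=\"1.0\"?>"
                + "<methodResponse><params><param>"
                + "<value><string>South Dakota</string></value>"
                + "</param></params></methodResponse>";
        System.out.println(firstReturnValue(sampleResponse));
    }
}
```

The request body would be sent with an HTTP POST to the Server URL; the single value read from the response corresponds to the single-column schema (StateName) defined in the scenario.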


Logs & Errors components


This chapter details the main components that you can find in the Logs & Errors family of the Talend Open Studio Palette. The Logs & Errors family groups together the components which are dedicated to log information catching and job error handling.


tAssert

tAssert Properties
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution. It produces a boolean result based on an assertive statement related to the execution and feeds the result to tAssertCatcher for proper Job status presentation.
Component family: Logs & Errors

Function: Provides the job status messages to tAssertCatcher.
Purpose: Generates the boolean evaluation of the job execution status. The status includes:
- Ok: the job execution succeeds.
- Fail: the job execution fails. The tested Job's result does not match the expectation or an execution error occurred at runtime.

Basic settings
Description: Type in your descriptive message to help identify the assertion of a tAssert.
Expression: Type in the assertive statement you base the evaluation on.

Usage: This component follows the action the assertive condition is directly related to. It can be the intermediate or end component of the main job, or the start, intermediate or end component of the secondary job.

Limitation: The evaluation of tAssert is captured only by tAssertCatcher.

Scenario: Setting up the assertive condition for a Job execution


This scenario describes how to set up an assertive condition in tAssert in order to evaluate whether a Job execution succeeds or not. It also shows how the two different evaluation results are displayed and how to read them. Apart from tAssert, the scenario uses the following components:
- tFileInputDelimited and tFileOutputDelimited. These two components compose the main Job whose execution status is evaluated. For detailed information on the two components, see component tFileInputDelimited on page 1054 and component tFileOutputDelimited on page 1115.
- tFileCompare. It compares the output file of the main Job with a standard reference file. The comparison result is evaluated by tAssert against the assertive condition set up in its settings. For more detailed information on tFileCompare, see component tFileCompare on page 1036.
- tAssertCatcher. It captures the evaluation generated by tAssert. For more information on tAssertCatcher, see component tAssertCatcher on page 1286.
- tLogRow. It allows you to read the captured evaluation. For more information on tLogRow, see component tLogRow on page 1305.


First proceed as follows to design the main Job: Prepare a delimited .csv file as the source file read by your main Job. Edit two rows in the delimited file. The contents you edit are not important, so feel free to simplify them. Name it source.csv. In Talend Open Studio, create a new job JobAssertion. Place tFileInputDelimited and tFileOutputDelimited on the workspace. Connect them with a Row Main link to create the main job.

Double-click tFileInputDelimited to open its Component view. In the File Name field of the Component view, fill in the path or browse to source.csv.

Still in the Component view, set Property Type to Built-In and click the [...] button next to Edit schema to define the data to pass on to tFileOutputDelimited. In this scenario, define the data present in the source.csv file you created. For more information about schema types, see How to set a built-in schema and How to set a repository schema of Talend Open Studio User Guide. Define the other parameters in the corresponding fields according to the source.csv file you created. Double-click tFileOutputDelimited to open its Component view. In the File Name field of the Component view, fill in or browse to the path of the output file, leaving the other fields at their default values.


Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited and outputs a delimited file, out.csv. Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job. Rename out.csv as reference.csv. This file is used as the expected result the main Job should output. Place tFileCompare, tAssert and tLogRow on the workspace. Connect them with Row Main links. Connect tFileInputDelimited to tFileCompare with an OnSubjobOk link.

Double-click tFileCompare to open its Component view. In the Component view, fill in the corresponding file paths in the File to compare field and the Reference file field, leaving the other fields as default.


For more information on the tFileCompare component, see component tFileCompare on page 1036. Then click tAssert and click the Component tab on the lower side of the workspace.

In the Component view, enter the assertion row2.differ==0 in the expression field and the descriptive message of the assertion in the description field. In the expression, row2 is the data flow transmitted from tFileCompare to tAssert, differ is one of the columns of the tFileCompare schema and indicates whether the compared files are identical, and 0 means that tFileCompare detects no difference between out.csv and reference.csv. Hence, when the compared files are identical, the assertive condition is fulfilled and tAssert concludes that the main Job succeeds; otherwise, it concludes that the main Job fails.
The differ column is in the read-only tFileCompare schema. For more information on its schema, see component tFileCompare on page 1036.
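
The logic of this assertion can be sketched in plain Java. This is only an illustrative sketch, not the code that Talend Open Studio generates: the class AssertionSketch and its methods are hypothetical names, the files are compared byte by byte (which may differ from how tFileCompare compares them), and differ is assumed to be 0 for identical files and 1 otherwise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class AssertionSketch {

    // Hypothetical stand-in for the differ column: 0 when both files hold the same bytes, 1 otherwise.
    public static int differ(Path fileToCompare, Path referenceFile) throws IOException {
        byte[] actual = Files.readAllBytes(fileToCompare);
        byte[] expected = Files.readAllBytes(referenceFile);
        return Arrays.equals(actual, expected) ? 0 : 1;
    }

    // Mirrors the assertion row2.differ==0: "Ok" when it holds, "Failed" otherwise.
    public static String evaluate(int differ) {
        return differ == 0 ? "Ok" : "Failed";
    }

    public static void main(String[] args) throws IOException {
        // Temporary files play the roles of out.csv and reference.csv; their contents are made up.
        Path out = Files.createTempFile("out", ".csv");
        Path reference = Files.createTempFile("reference", ".csv");
        Files.write(out, "1;Smith\n2;Jones\n".getBytes(StandardCharsets.UTF_8));
        Files.write(reference, "1;Smith\n2;Jones\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(evaluate(differ(out, reference)));

        // Removing a row from the reference file makes the assertion fail.
        Files.write(reference, "1;Smith\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(evaluate(differ(out, reference)));
    }
}
```

The second comparison corresponds to the failure case of this scenario, in which a row is deleted from reference.csv.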

Press F6 to execute the Job. Check the result presented in the Run view.


The console shows the comparison result of tFileCompare: Files are identical. However, the evaluation result of tAssert does not appear anywhere, so you need tAssertCatcher to capture the evaluation. Place tAssertCatcher and tLogRow on the workspace. Connect them with a Row Main link.

Use the default configuration in the Component view of tAssertCatcher.


Press F6 to execute the Job. Check the result presented in the Run view. You will see that the Job status information has been added:
2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--| The output file should be identical with the reference file.

The descriptive information on JobAssertion in the console is organized according to the tAssertCatcher schema. This schema includes, in the following order, the execution time, the process ID, the project name, the Job name, the code language, the evaluation origin, the evaluation result, detailed information on the evaluation, and the descriptive message of the assertion. For more information on the schema of tAssertCatcher, see component tAssertCatcher on page 1286.

The console indicates that the execution status of Job JobAssertion is Ok. In addition to the evaluation, you can see other descriptive information about JobAssertion, including the descriptive message you edited in the Basic settings of tAssert.

Next, make the main Job fail to generate the expected file. To do so, proceed as follows in the same Job you have executed:
- Delete a row in reference.csv.
- Press F6 to execute the Job again.
- Check the result presented in the Run view.
2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1|Failed|Test logically failed|The output file should be identical with the reference file.

The console shows that the execution status of the main Job is Failed. The detailed explanation for this status immediately follows it, reading Test logically failed. You can thus get a basic idea of your present Job status: it fails to generate the expected file because of a logical failure. This logical failure could come from a logical mistake made during the Job design. The status and its explanatory information are presented respectively in the status and the substatus columns of the tAssertCatcher schema. For more information on the columns, see component tAssertCatcher on page 1286.
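
If you need to read such console lines in a program, the following Java sketch splits one pipe-delimited line into named fields. It is an illustration only: the class AssertLogLineSketch is hypothetical, the lower-case column names are labels chosen here to follow the schema order described above, and the sketch assumes that all nine columns are present in the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AssertLogLineSketch {

    // Labels chosen to follow the tAssertCatcher schema order described in this scenario.
    static final String[] COLUMNS = {
        "moment", "pid", "project", "job", "language",
        "origin", "status", "substatus", "description"
    };

    // Splits one pipe-delimited console line into named, trimmed fields.
    public static Map<String, String> parse(String line) {
        String[] values = line.split("\\|", -1);
        if (values.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                    "Expected " + COLUMNS.length + " fields but found " + values.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            fields.put(COLUMNS[i], values[i].trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|"
                + " The output file should be identical with the reference file.";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("origin") + " -> " + fields.get("status"));
    }
}
```

The limit argument -1 keeps trailing empty fields, so a line whose last column is empty is still counted correctly.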

tAssertCatcher
tAssertCatcher Properties
Component family: Logs & Errors

Function: Based on its pre-defined schema, fetches the execution status information from the repository, the Job execution and tAssert.

Purpose: Generates a data flow consolidating the status information of a Job execution and transfers the data into the defined output files.

Basic settings:
- Schema type: A schema is a row description, i.e., it defines the fields to be processed and passed on to the next component. In this particular case, the schema is read-only, as this component gathers standard log information including:
  - Moment: Processing time and date.
  - Pid: Process ID.
  - Project: Project which the Job belongs to.
  - Job: Job name.
  - Language: Language used by the Job. It may be Java or Perl.
  - Origin: Status evaluation origin. The origin may be different tAssert components.
  - Status: Evaluation fetched from tAssert. It may be:
    - Ok: if the assertive statement of tAssert is evaluated as true at runtime.
    - Failed: if the assertive statement of tAssert is evaluated as false or an execution error occurs at runtime. The tested Job's result does not match the expectation or an execution error occurred at runtime.
  - Substatus: Detailed explanation for a failed execution. The explanation can be:
    - Test logically failed: the investigated Job does not produce the expected result.
    - Execution error: an execution error occurred at runtime.
  - Description: Descriptive message you typed in the Basic settings of tAssert.
- Catch Java Exception: Select this check box to capture Java exception errors.
- Catch tAssert: Select this check box to capture the evaluations of tAssert.

Usage: This component is the start component of a secondary Job which fetches the execution status information from several sources. It generates a data flow to transfer the information to the component that follows.


Limitation: This component must be used together with tAssert.
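
The relation between the Status and Substatus columns can be sketched in Java as follows. This is a hedged illustration rather than the component's implementation: the class AssertStatusSketch and the classify method are hypothetical, and the "--" placeholder used for a successful evaluation is an assumption based on the sample console output of the tAssert scenario.

```java
import java.util.function.BooleanSupplier;

public class AssertStatusSketch {

    // Returns {status, substatus} for one assertion, following the column descriptions above.
    public static String[] classify(BooleanSupplier assertion) {
        try {
            if (assertion.getAsBoolean()) {
                // Assumption: "--" stands for an empty substatus, as in the sample console output.
                return new String[] {"Ok", "--"};
            }
            return new String[] {"Failed", "Test logically failed"};
        } catch (RuntimeException e) {
            // The assertion could not be evaluated at all.
            return new String[] {"Failed", "Execution error"};
        }
    }

    public static void main(String[] args) {
        int differ = 1; // hypothetical value: the compared files are not identical
        String[] result = classify(() -> differ == 0);
        System.out.println(result[0] + " | " + result[1]);
    }
}
```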

Related scenarios
For a use case related to tAssertCatcher, see the tAssert scenario Scenario: Setting up the assertive condition for a Job execution on page 1280.


tChronometerStart
tChronometerStart Properties
Component family: Logs & Errors

Function: Starts measuring the time a subjob takes to be executed.

Purpose: Operates as a chronometer device that starts calculating the processing time of one or more subjobs in the main Job, or that starts calculating the processing time of part of your subjob.

Usage: You can use tChronometerStart as a start or middle component. It can precede one or more processing tasks in the subjob. It can precede one or more subjobs in the main Job.

Limitation: n/a

Related scenario
For a related scenario, see Scenario: Measuring the processing time of a subjob and part of a subjob on page 1289.


tChronometerStop
tChronometerStop Properties
Component family: Logs & Errors

Function: Measures the time a subjob takes to be executed.

Purpose: Operates as a chronometer device that stops calculating the processing time of one or more subjobs in the main Job, or that stops calculating the processing time of part of your subjob. In Perl, tChronometerStop displays the total execution time, number of runs, number of rows processed per second, and the average, minimum and maximum processing time of a row. In Java, tChronometerStop displays the total execution time.

Basic settings:
- Since options: Select either check box to select the measurement starting point:
  - Since the beginning: stops time measurement launched at the beginning of a subjob.
  - Since a tChronometerStart: stops time measurement launched at one of the tChronometerStart components used on the data flow of the subjob.
- Display duration in console: When selected, it displays subjob execution information on the console.
- Display component name: When selected, it displays the name of the component on the console.
- Caption: Enter the desired text, to identify your subjob for example.
- Display human readable duration: When selected, it displays subjob execution information in readable time units.

Usage: Cannot be used as a start component.

Limitation: n/a

Scenario: Measuring the processing time of a subjob and part of a subjob


This scenario is a subjob that does the following in sequence:
- generates 1 000 000 rows of first and last names,
- gathers first names with their corresponding last names,
- stores the output data in a delimited file,
- measures the duration of the subjob as a whole,
- measures the duration of the name replacement operation,
- displays the gathered information about the processing time on the Run log console.

To measure the processing time of the subjob:
- Drop the following components from the Palette onto the design workspace: tRowGenerator, tMap, tFileOutputDelimited, and tChronometerStop.
- Connect the first three components using Main Row links.
When connecting tMap to tFileOutputDelimited, you will be prompted to name the output table. The name used in this example is new_order.

Connect tFileOutputDelimited to tChronometerStop using an OnComponentOk link. Select tRowGenerator and click the Component tab to display the component view. In the component view, click Basic settings. The Component tab opens on the Basic settings view by default.

Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is composed of two columns: First_Name and Last_Name, so click the [+] button twice to add two columns and rename them. Click the RowGenerator Editor three-dot button to open the editor and define the data to be generated.


In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows for RowGenerator field and click OK. The RowGenerator Editor closes. You will be prompted to propagate changes. Click Yes in the popup message. Double-click on the tMap component to open the Map editor. The Map editor opens displaying the input metadata of the tRowGenerator component.

In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and define them. In the Map editor, drag the First_Name row from the input table to the Last_Name row in the output table and drag the Last_Name row from the input table to the First_Name row in the output table. Click Apply to save changes. You will be prompted to propagate changes. Click Yes in the popup message. Click OK to close the editor.


Select tFileOutputDelimited and click the Component tab to display the component view. In the Basic settings view, set tFileOutputDelimited properties as needed.

Select tChronometerStop and click the Component tab to display the component view. In the Since options panel of the Basic settings view, select Since the beginning option to measure the duration of the subjob as a whole.

Select/clear the other check boxes as needed. In this scenario, we want to display the subjob duration on the console preceded by the component name. If needed, enter a text in the Caption field. Save your Job and press F6 to execute it.

You can measure the duration of the subjob the same way by placing tChronometerStop below tRowGenerator, and connecting the latter to tChronometerStop using an OnSubjobOk link.
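
The two measurement starting points can be illustrated with a small Java sketch. It is not the code generated for tChronometerStop: the class ChronometerSketch is hypothetical, the generated names are placeholders, and the file output step of the scenario is left out. The first timestamp plays the role of Since the beginning, the second one the role of Since a tChronometerStart placed just before the name swap.

```java
import java.util.ArrayList;
import java.util.List;

public class ChronometerSketch {

    // Swaps the two name columns, like the tMap mapping in this scenario.
    public static String[] swap(String[] row) {
        return new String[] {row[1], row[0]};
    }

    // Generates rows, swaps the names, and returns {whole duration, swap-only duration} in nanoseconds.
    public static long[] run(int numberOfRows) {
        long subjobStart = System.nanoTime();        // "Since the beginning"

        List<String[]> rows = new ArrayList<>(numberOfRows);
        for (int i = 0; i < numberOfRows; i++) {
            rows.add(new String[] {"First" + i, "Last" + i});
        }

        long chronometerStart = System.nanoTime();   // "Since a tChronometerStart"
        List<String[]> swapped = new ArrayList<>(numberOfRows);
        for (String[] row : rows) {
            swapped.add(swap(row));
        }
        long stop = System.nanoTime();

        return new long[] {stop - subjobStart, stop - chronometerStart};
    }

    public static void main(String[] args) {
        long[] durations = run(1_000_000);
        System.out.println("whole subjob: " + durations[0] / 1_000_000 + " milliseconds");
        System.out.println("name swap only: " + durations[1] / 1_000_000 + " milliseconds");
    }
}
```

Because both durations end at the same stop time, the duration of the whole run is never shorter than the duration of the swap alone.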


tDie
tDie properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher, so that the collected log data is encapsulated and passed on to the defined output.

Component family: Logs & Errors

Function: Kills the current Job. Generally used with a tCatch for log purposes.

Purpose: Triggers the tLogCatcher component for an exhaustive log before killing the Job.

Basic settings:
- Die message: Enter the message to be displayed before the Job is killed.
- Error code: Enter the error code if need be, as an integer.
- Priority: Set the level of priority, as an integer.

Usage: Cannot be used as a start component.

Limitation: n/a

Related scenarios
For use cases related to tDie, see the tLogCatcher scenarios:
- Scenario 1: warning & log on entries on page 1301
- Scenario 2: Log & kill a Job on page 1303


tFlowMeter
tFlowMeter Properties
Component family: Logs & Errors

Function: Counts the number of rows processed in the defined flow.

Purpose: The number of rows is then meant to be caught by the tFlowMeterCatcher for logging purposes.

Basic settings:
- Use input connection name as label: Select this check box to reuse the name given to the input main row flow as the label in the logged data.
- Mode: Select the type of values for the data measured:
  - Absolute: the actual number of rows is logged.
  - Relative: a ratio (%) of the number of rows is logged. When this option is selected, a Connections List shows to let you select a reference connection.
- Thresholds: Adds a threshold to watch proportions in the volumes measured. You can decide that the normal flow has to be between the low and top ends of a row number range, and that if the flow is under this low end, there is a bottleneck.

Usage: Cannot be used as a start component as it requires an input flow to operate.

Limitation: n/a

If you need logs, statistics and other measurements of your data flows, see How to automate the use of statistics & logs in Talend Open Studio User Guide.

Related scenario
For a related scenario, see Scenario: Catching flow metrics from a Job on page 1297.


tFlowMeterCatcher
tFlowMeterCatcher Properties
Component family: Logs & Errors

Function: Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the tFlowMeter component and passes them on to the output component.

Purpose: Operates as a log function triggered by the use of a tFlowMeter component in the Job.

Basic settings:
- Schema type: A schema is a row description, i.e., it defines the fields to be processed and passed on to the next component. In this particular case, the schema is read-only, as this component gathers standard log information including:
  - Moment: Processing time and date.
  - Pid: Process ID.
  - Father_pid: Process ID of the father Job if applicable. If not applicable, Pid is duplicated.
  - Root_pid: Process ID of the root Job if applicable. If not applicable, the pid of the current Job is duplicated.
  - System_pid: Process ID generated by the system.
  - Project: Name of the project the Job belongs to.
  - Job: Name of the current Job.
  - Job_repository_id: ID generated by the application.
  - Job_version: Version number of the current Job.
  - Context: Name of the current context.
  - Origin: Name of the component, if any.
  - Label: Label of the row connection preceding the tFlowMeter component in the Job, which will be analyzed for volumetrics.
  - Count: Actual number of rows being processed.
  - Reference: Name of the reference row as defined in the tFlowMeter component for the relative counting mode.
  - Thresholds: Only used when the relative mode is selected in the tFlowMeter component.

Usage: This component is the start component of a secondary Job which triggers automatically at the end of the main Job.

Limitation: The use of this component cannot be separated from the use of the tFlowMeter. For more information, see tFlowMeter on page 1295.


Scenario: Catching flow metrics from a Job


The following basic Job aims at catching the number of rows being passed in the flow processed. The measures are taken twice, once after the input component, that is, before the filtering step and once right after the filtering step, that is, before the output component.

- Drop the following components from the Palette to the design workspace: tMysqlInput, tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.
- Link the components using row main connections and click the label of each connection to give it a consistent name throughout the Job, such as US_States for the flow from the input component and filtered_states for the output from the tMap component.
- Link the tFlowMeterCatcher to the tFileOutputDelimited component using a row main link as well, since data is passed between them.
- On the tMysqlInput Component view, set the Property type to Repository if the table metadata are stored in the Repository. Otherwise, set the type to Built-in and manually configure the connection and schema details for this Job.

The 50 States of the USA are recorded in the table states. In order for all 50 entries of the table to get selected, the query to run on the MySQL database is as follows: select * from states. Select the relevant encoding type on the Advanced settings vertical tab.


Then select the following component which is a tFlowMeter and set its properties.

Select the check box Use input connection name as label, in order to reuse the label you chose in the log output file (tFileOutputDelimited). The mode is Absolute as there is no reference flow to meter against, also no Threshold is to be set for this example. Then launch the tMap editor to set the filtering properties. For this use case, drag and drop the ID and State columns from the Input area of the tMap towards the Output area. No variable is used in this example.

On the Output flow area (labelled filtered_states in this example), click the arrow & plus button to activate the expression filter field. Drag the State column from the Input area (row2) towards the expression filter field and type in the rest of the expression in order to filter the state labels starting with the letter M. The final expression looks like:
row2.State.startsWith("M")
Click OK to validate the setting. Then select the second tFlowMeter component and set its properties.
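
The expression filter used in the tMap is an ordinary Java boolean expression. The following sketch applies the same test to a short list of state names outside of any Job; the class StateFilterSketch and the sample list are hypothetical and only serve to show the effect of startsWith.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StateFilterSketch {

    // Keeps only the labels for which the expression filter would be true.
    public static List<String> filter(List<String> states) {
        List<String> filtered = new ArrayList<>();
        for (String state : states) {
            if (state.startsWith("M")) {   // same test as row2.State.startsWith("M")
                filtered.add(state);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("Alabama", "Maine", "Montana", "Texas");
        System.out.println(filter(sample));
    }
}
```

Note that startsWith is case-sensitive, and that calling it on a null value raises a NullPointerException, so a State column that may hold null values would need an extra null check in the expression.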


Select the check box Use input connection name as label. Select Relative as Mode and in the Reference connections list, select US_States as the reference to be measured against. Once again, no threshold is used for this use case. No particular setting is required in the tLogRow, nor in the tFlowMeterCatcher, as this component's properties are limited to a preset schema which includes typical log information. Finally, set the log output component (tFileOutputDelimited).

Select the Append check box in order to log all tFlowMeter measures. Then save your Job and press F6 to execute it.


The Run view shows the filtered state labels as defined in the Job.

In the delimited csv file, the number of rows shown in the count column differs between tFlowMeter1 and tFlowMeter2, because the filtering is carried out between the two measures. The reference column also shows this difference.
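
The difference between the Absolute and Relative modes can be sketched as follows. The formula for the relative value is an assumption (the metered row count expressed as a percentage of the reference row count); the exact computation performed by tFlowMeter is not described here, and the class FlowMeterSketch is hypothetical.

```java
public class FlowMeterSketch {

    // Absolute mode: the logged value is the row count itself.
    public static long absolute(long rowCount) {
        return rowCount;
    }

    // Relative mode (assumed formula): the row count as a percentage of the reference flow.
    public static double relativePercent(long rowCount, long referenceCount) {
        if (referenceCount == 0) {
            throw new IllegalArgumentException("The reference flow carried no rows");
        }
        return 100.0 * rowCount / referenceCount;
    }

    public static void main(String[] args) {
        long usStates = 50;       // rows on the US_States connection, as in this scenario
        long filteredStates = 8;  // eight US state names start with the letter M
        System.out.println("absolute: " + absolute(usStates));
        System.out.println("relative: " + relativePercent(filteredStates, usStates) + " %");
    }
}
```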


tLogCatcher
tLogCatcher properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher, so that the collected log data is encapsulated and passed on to the defined output.

Component family: Logs & Errors

Function: Fetches set fields and messages from Java Exception/PerlDie, tDie and/or tWarn and passes them on to the next component.

Purpose: Operates as a log function triggered by one of the three: Java exception/PerlDie, tDie or tWarn, to collect and transfer log data.

Basic settings:
- Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
  - Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  - Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
- Catch PerlDie / Catch Java Exception: Select this check box to trigger the tCatch function when a PerlDie/Java Exception occurs in the Job.
- Catch tDie: Select this check box to trigger the tCatch function when a tDie is called in a Job.
- Catch tWarn: Select this check box to trigger the tCatch function when a tWarn is called in a Job.

Usage: This component is the start component of a secondary Job which automatically triggers at the end of the main Job.

Limitation: n/a

Scenario 1: warning & log on entries


In this basic scenario made of three components, a tRowGenerator creates random entries (id to be incremented). The input hits a tWarn component which triggers the tLogCatcher subjob. This subjob fetches the warning message as well as standard predefined information and passes them on to the tLogRow for a quick display of the log data.


Drop a tRowGenerator, a tWarn, a tLogCatcher and a tLogRow from the Palette onto your design workspace. Connect the tRowGenerator to the tWarn component. Connect the tLogCatcher to the tLogRow separately. In the tRowGenerator editor, set the creation of random entries using a basic function.

On the tWarn Component view, set your warning message, the code and the priority level. In this case, the message is this is a warning. For this scenario, we will concatenate a function to the message above, in order to collect the first value from the input table.

On the Basic settings view of tLogCatcher, select the tWarn check box in order for the message from the latter to be collected by the subjob. Click Edit Schema to view the schema used as log output. Notice that the log is comprehensive.


Press F6 to execute the Job. Notice that the Log produced is exhaustive.

Scenario 2: Log & kill a Job


This scenario uses a tLogCatcher and a tDie component. A tRowGenerator is connected to a tFileOutputDelimited using a Row link. On error, the tDie triggers the catcher subjob which displays the log data content on the Run console.

Drop all required components from various folders of the Palette to the design workspace: tRowGenerator, tFileOutputDelimited, tDie, tLogCatcher, tLogRow. On the tRowGenerator Component view, define the setting of the input entries to be handled.

Edit the schema and define the following columns as random input examples: id, name, quantity, flag and creation. Set the Number of rows to 0. This will constitute the error which the Die operation is based on. In the Values table, define the functions to feed the input flow.

Define the tFileOutputDelimited to hold the possible output data. The row connection from the tRowGenerator automatically feeds the output schema. The separator is a simple semi-colon. Connect this output component to the tDie using a Trigger > If connection. Double-click the newly created connection to define the If condition:
((Integer)globalMap.get("tRowGenerator_1_NB_LINE")) <=0
Then double-click the tDie component to select it and define its Basic settings.
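
The If condition is a plain Java expression evaluated against the Job's globalMap. The sketch below reproduces it outside of a Job, with an ordinary HashMap standing in for globalMap; the class IfTriggerSketch and the shouldDie method are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IfTriggerSketch {

    // Same test as the If connection: true when the generator produced no rows.
    public static boolean shouldDie(Map<String, Object> globalMap) {
        return ((Integer) globalMap.get("tRowGenerator_1_NB_LINE")) <= 0;
    }

    public static void main(String[] args) {
        // Stand-in for the Job's globalMap, filled by hand for this sketch.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRowGenerator_1_NB_LINE", 0);
        System.out.println(shouldDie(globalMap));
    }
}
```

If the key is missing from the map, the cast yields null and the comparison raises a NullPointerException when the value is unboxed, so the expression relies on the row count having been stored before the condition is evaluated.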

Enter your Die message to be transmitted to the tLogCatcher before the actual kill-job operation happens. Next to the Job but not physically connected to it, drop a tLogCatcher from the Palette to the design workspace and connect it to a tLogRow component. Define the tLogCatcher Basic settings. Make sure the tDie box is selected in order to add the Die message to the Log information transmitted to the final component.

Press F6 to run the Job and notice that the log contains a black message and a red one. The black log data come from the tDie and are transmitted by the tLogCatcher. In addition, the normal Java Exception message displays in red because the Job died abnormally.


tLogRow
tLogRow properties
Component family: Logs & Errors

Function: Displays data or results in the Run console.

Purpose: tLogRow helps you monitor the data processed.

Basic settings:
- Print values in table cells: The output flow displays in table cells.
- Separator: Enter the separator which will delimit data on the Log display.
- Print component unique name in front of each output row: Select this check box in case several LogRow components are used. It allows you to differentiate the outputs.
- Print schema column name in front of each value: Select this check box to retrieve column labels from the output schema.
- Use fixed length for values: Select this check box to set a fixed width for the value display.

Usage: This component can be used as an intermediate step in a data flow or as an end object in the job flowchart.

Limitation: n/a
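
To make the effect of these options more concrete, the following Java sketch formats one row with a separator, an optional component name prefix and optional column names. The exact console layout produced by tLogRow is not specified here, so the layout below (square brackets around the component name, a colon after each column name) is purely an assumption, and the class LogRowSketch is hypothetical.

```java
import java.util.StringJoiner;

public class LogRowSketch {

    // Formats one row for the console; the layout is an assumption, not the real tLogRow output.
    public static String format(String componentName, String[] columns, String[] values,
                                String separator, boolean printComponentName,
                                boolean printColumnNames) {
        StringJoiner joiner = new StringJoiner(separator);
        for (int i = 0; i < values.length; i++) {
            joiner.add(printColumnNames ? columns[i] + ": " + values[i] : values[i]);
        }
        String prefix = printComponentName ? "[" + componentName + "] " : "";
        return prefix + joiner;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "name"};
        String[] values = {"1", "Smith"};
        System.out.println(format("tLogRow_1", columns, values, "|", false, false));
        System.out.println(format("tLogRow_1", columns, values, "|", true, true));
    }
}
```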

Scenario: Delimited file content display


Related topics using a tLogRow component:
- tFileInputDelimited Scenario: Delimited file content display on page 1055.
- tContextLoad Scenario: Dynamic context use in MySQL DB insert on page 1331.
- tWarn, tDie, tLogCatcher Scenario 1: warning & log on entries on page 1301 and Scenario 2: Log & kill a Job on page 1303.


tStatCatcher
tStatCatcher Properties
Component family: Logs & Errors

Function: Based on a defined schema, gathers the Job processing metadata at the Job level as well as at each component level.

Purpose: Operates as a log function triggered by the tStatCatcher Statistics check box of individual components, and collects and transfers this log data to the defined output.

Basic settings:
- Schema type: A schema is a row description, i.e., it defines the fields to be processed and passed on to the next component. In this particular case, the schema is read-only, as this component gathers standard log information including:
  - Moment: Processing time and date.
  - Pid: Process ID.
  - Father_pid: Process ID of the father Job if applicable. If not applicable, Pid is duplicated.
  - Root_pid: Process ID of the root Job if applicable. If not applicable, the pid of the current Job is duplicated.
  - Project: Name of the project the Job belongs to.
  - Job: Name of the current Job.
  - Context: Name of the current context.
  - Origin: Name of the component, if any.
  - Message: Begin or End.

Usage: This component is the start component of a secondary Job which triggers automatically at the end of the main Job. The processing time is also displayed at the end of the log.

Limitation: n/a

Scenario: Displaying job stats log


This scenario describes a four-component Job, aiming at displaying on the Run console the statistics log fetched from the file generation through the tStatCatcher component.


Drop the required components: tRowGenerator, tFileOutputDelimited, tStatCatcher and tLogRow from the Palette to the design workspace. In the Basic settings panel of tRowGenerator, define the data to be generated. For this Job, the schema is composed of three columns: ID_Owners, Name_Customer and ID_Insurance, generated using Perl script.

The number of rows can be restricted to 100. Click on the Main tab of the Component view.

And select the tStatCatcher Statistics check box to enable the statistics fetching operation. Then, define the output components properties. In the tFileOutputDelimited Component view, browse to the output file or enter a name for the output file to be created. Define the delimiters, such as semi-colon, and the encoding. Click on Edit schema and make sure the schema is recollected from the input schema. If need be, click on Sync Columns. Then click on the Basic settings tab of the Component view, and select here as well the tStatCatcher Statistics check box to enable the processing data gathering.


In the secondary Job, double-click the tStatCatcher component. Note that the Properties are provided for information only, as the schema representing the processing data to be gathered and aggregated in statistics is pre-defined and read-only.

Then define the tLogRow to set the delimiter to be displayed on the console. Finally, press F6 to run the Job and display the Job result.

The log shows the Begin and End information for the Job itself and for each of the components used in the Job.
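
As an illustration of how Begin and End rows can be paired, the sketch below computes the elapsed time per origin from a list of events. Everything in it is hypothetical: the class StatLogSketch, the three-value event layout (origin, message, moment in milliseconds) and the sample values are assumptions made for this example, not the format produced by tStatCatcher.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatLogSketch {

    // Pairs the "begin" and "end" rows of each origin and returns the elapsed milliseconds per origin.
    // Each event is {origin, message, moment in milliseconds}; this layout is hypothetical.
    public static Map<String, Long> durations(String[][] events) {
        Map<String, Long> begins = new LinkedHashMap<>();
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (String[] event : events) {
            String origin = event[0];
            long moment = Long.parseLong(event[2]);
            if ("begin".equalsIgnoreCase(event[1])) {
                begins.put(origin, moment);
            } else if ("end".equalsIgnoreCase(event[1]) && begins.containsKey(origin)) {
                elapsed.put(origin, moment - begins.get(origin));
            }
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"tRowGenerator_1", "begin", "1000"},
            {"tFileOutputDelimited_1", "begin", "1005"},
            {"tRowGenerator_1", "end", "1250"},
            {"tFileOutputDelimited_1", "end", "1300"}
        };
        System.out.println(durations(events));
    }
}
```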


tWarn
tWarn Properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher, so that the collected log data is encapsulated and passed on to the defined output.

Component family: Logs & Errors

Function: Provides a priority-rated message to the next component.

Purpose: Triggers a warning often caught by the tLogCatcher component for an exhaustive log.

Basic settings:
- Warn message: Type in your warning message.
- Code: Define the code level.
- Priority: Enter the priority level as an integer.

Usage: Cannot be used as a start component. If an output component is connected to it, an input component should precede it.

Limitation: n/a

Related scenarios
For use cases related to tWarn, see the tLogCatcher scenarios:
- Scenario 1: warning & log on entries on page 1301
- Scenario 2: Log & kill a Job on page 1303


Misc group components


This chapter details the main components that you can find in Misc family of the Talend Open Studio Palette. The Misc family gathers miscellaneous components covering needs such as the creation of sets of dummy data rows, buffering data or loading context variables.


tAddLocationFromIP
tAddLocationFromIP Properties
Component family: Misc

Function: tAddLocationFromIP replaces IP addresses with geographical locations.

Purpose: tAddLocationFromIP helps you to geolocate visitors through their IP addresses. It identifies the visitors' geographical locations (country, region, city, latitude, longitude, ZIP code, etc.) using an IP address lookup database file.

Basic settings:
- Schema type and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  - Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  - Repository: Select the Repository file where Properties are stored. When selected, the fields that follow are pre-defined using fetched data.
- Database Filepath: The path to the IP address lookup database file.
- Input parameters:
  - Input column: Select the input column from which the input values are to be taken.
  - input value is a hostname: Check if the input column holds hostnames.
  - input value is an IP address: Check if the input column holds IP addresses.
- Location type:
  - Country code: Check to replace the IP with the country code.
  - Country name: Check to replace the IP with the country name.

Usage: This component is an intermediary step in the data flow that replaces IP addresses with geolocation information. It cannot be a start component as it requires an input flow. It also requires an output component.

Limitation: n/a


Scenario: Identifying a real-world geographic location of an IP


The following Java scenario creates a three-component Job that associates an IP address with a geographical location. It obtains a site visitor's geographical location based on the visitor's IP address. Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tAddLocationFromIP, and tLogRow. Connect the three components using Row Main links.

In the design workspace, select tFixedFlowInput. Click the Component tab to define the basic settings for tFixedFlowInput. Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to define the data you want to use as input. In this scenario, the schema is made of one column that holds an IP address.

Click OK to close the dialog box, and accept propagating the changes when prompted by the system. The defined column displays in the Values panel of the Basic settings view. Click in the Value cell and set the value for the IP address.

In the Number of rows field, enter the number of rows to be generated.


- In the design workspace, select tAddLocationFromIP.
- Click the Component tab to define the basic settings for tAddLocationFromIP.
- Click the Sync columns button to synchronize the schema with the input schema set with tFixedFlowInput.
- Browse to the GeoIP.dat file to set its path in the Database filepath field.
  Make sure to download the latest version of the IP address lookup database file from the relevant site as indicated in the Basic settings view of tAddLocationFromIP.
- In the Input parameters panel, set your input parameters as needed. In this scenario, the input column is the ip column defined earlier that holds an IP address.
- In the Location type panel, set the location type as needed. In this scenario, we want to display the country name.
- In the design workspace, select tLogRow.
- Click the Component tab and define the basic settings for tLogRow as needed. In this scenario, we want to display values in cells of a table.
- Save your Job and press F6 to execute it.

The Job generates a single row that displays the country name associated with the given IP address.
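To make the idea of an IP address lookup concrete, here is a minimal Java sketch of a range-based lookup. It is not the code of tAddLocationFromIP and does not read GeoIP.dat: the class name, the in-memory range table, the address blocks, and the country names "Country A" and "Country B" are all assumptions invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy IPv4-to-country lookup. All ranges and country names are invented placeholders. */
class IpCountryLookupSketch {

    /** A contiguous block of addresses mapped to one country name. */
    private static final class Range {
        final long last;
        final String country;

        Range(long last, String country) {
            this.last = last;
            this.country = country;
        }
    }

    /** Ranges keyed by their first address, so floorEntry() finds the candidate range. */
    private final TreeMap<Long, Range> ranges = new TreeMap<>();

    void addRange(String firstIp, String lastIp, String country) {
        ranges.put(toLong(firstIp), new Range(toLong(lastIp), country));
    }

    /** Converts a dotted IPv4 address such as "192.0.2.10" to its numeric value. */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Not an IPv4 address: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns the country name for the address, or "unknown" when no range contains it. */
    String countryFor(String ip) {
        long address = toLong(ip);
        Map.Entry<Long, Range> candidate = ranges.floorEntry(address);
        if (candidate == null || address > candidate.getValue().last) {
            return "unknown";
        }
        return candidate.getValue().country;
    }

    /** Builds a lookup over documentation-only address blocks (placeholder data). */
    static IpCountryLookupSketch sample() {
        IpCountryLookupSketch lookup = new IpCountryLookupSketch();
        lookup.addRange("192.0.2.0", "192.0.2.255", "Country A");
        lookup.addRange("198.51.100.0", "198.51.100.255", "Country B");
        return lookup;
    }

    public static void main(String[] args) {
        IpCountryLookupSketch lookup = sample();
        System.out.println(lookup.countryFor("192.0.2.10"));
        System.out.println(lookup.countryFor("203.0.113.5"));
    }
}
```

A real lookup database holds far more ranges and more fields per range; the sketch only shows why the component needs both an input column and a database file.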


tBufferInput
tBufferInput properties
Component family: Misc

Function: This component retrieves bufferized data in order to process it in a second subjob.

Purpose: The tBufferInput component retrieves data bufferized via a tBufferOutput component, for example, to process it in another subjob.

Basic settings

Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the case of tBufferInput, the column position is more important than the column label, as it is the position that is taken into account.
Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository, hence it can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Usage: This component is the start component of a secondary Job which is triggered automatically at the end of the main Job.

Scenario: Retrieving bufferized data (Java)


This scenario describes a Job that retrieves bufferized data from a subjob and displays it on the console.


- Drop the following components from the Palette onto the design workspace: tFileInputDelimited and tBufferOutput.
- Select tFileInputDelimited and, on the Basic settings tab of the Component view, set the access parameters to the input file.
- In the File Name field, browse to the delimited file holding the data to be bufferized.
- Define the Row and Field separators, as well as the Header.
- Click [...] next to the Schema type field to describe the structure of the file.
- Describe the Schema of the data to be passed on to the tBufferOutput component.
- Select the tBufferOutput component and set the parameters on the Basic settings tab of the Component view.


- Generally speaking, the schema is propagated from the input component and automatically fed into the tBufferOutput schema, but you can also choose to bufferize only part of the schema.
- Drop the tBufferInput and tLogRow components from the Palette onto the design workspace below the subjob you just created.
- Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link, and connect tBufferInput and tLogRow via a Row > Main link.
- Double-click tBufferInput to set its Basic settings in the Component view.
- In the Basic settings view, click [...] next to the Edit Schema field to describe the structure of the file.
- Use the schema defined for the tFileInputDelimited component and click OK.
- The schema of the tBufferInput component is automatically propagated to tLogRow. Otherwise, double-click tLogRow to display the Component view and click Sync columns.
- Save your Job and press F6 to execute it.

The standard console returns the data retrieved from the buffer memory.
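The remark that column position matters more than column label can be illustrated with a small Java sketch. It is an assumption-laden model, not the code generated by the Studio: the class name, the use of a list of String arrays as the buffer, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Models a buffer whose rows carry values only; labels exist solely on the reader's side. */
class PositionalBufferSketch {

    private final List<String[]> buffer = new ArrayList<>();

    /** Writer side: stores the values of one row in schema order, without any labels. */
    void write(String... values) {
        buffer.add(values.clone());
    }

    /** Reader side: returns every value found at the given column position. */
    List<String> readColumn(int position) {
        List<String> column = new ArrayList<>();
        for (String[] row : buffer) {
            column.add(row[position]);
        }
        return column;
    }

    public static void main(String[] args) {
        PositionalBufferSketch sketch = new PositionalBufferSketch();
        // The writer's schema might call these columns "id" and "city" (hypothetical labels).
        sketch.write("1", "Paris");
        sketch.write("2", "Nantes");
        // The reader may label position 1 differently; it still receives the same values.
        System.out.println(sketch.readColumn(1));
    }
}
```

Because only positions are stored, a reader schema with renamed columns still works, whereas a reader schema with reordered columns would silently read the wrong values.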


tBufferOutput
tBufferOutput properties
Component family: Misc

Function: This component collects data in a buffer in order to access it later, via a Webservice for example.

Purpose: This component allows a Webservice to access data. It was designed to be exported as a Webservice in order to access data on the Web application server directly. For more information, see How to export Jobs as Webservice in Talend Open Studio User Guide.

Basic settings

Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the case of tBufferOutput, the column position is more important than the column label, as it is the position that is taken into account.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence it can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Usage: This component is not startable (green background) and it requires an output component.

Scenario 1: Buffering data (Java)


This scenario describes an intentionally basic Job that bufferizes data in a child job while a parent Job simply displays the bufferized data onto the standard output console. For an example of how to use tBufferOutput to access output data directly on the Web application server, see Scenario 2: Buffering output data on the webapp server on page 1321.


- Create two Jobs: the first Job (BufferFatherJob) runs the second Job and displays its content on the Run console. The second Job (BufferChildJob) stores the defined data in a buffer memory.
- In the first Job, drop the following components from the Palette onto the design workspace: tRunJob and tLogRow.
- In the second Job, drop the following components the same way: tFileInputDelimited and tBufferOutput.
- Let's set the parameters of the second Job first: select tFileInputDelimited and, on the Basic settings tab of the Component view, set the access parameters to the input file.
- In File Name, browse to the delimited file whose data are to be bufferized.
- Define the Row and Field separators, as well as the Header.


- Describe the Schema of the data to be passed on to the tBufferOutput component.
- Select the tBufferOutput component and set the parameters on the Basic settings tab of the Component view.
- Generally, the schema is propagated from the input component and automatically fed into the tBufferOutput schema, but you can also choose to bufferize only part of the schema.
- Now, in the design of the other Job (BufferFatherJob), define the parameters of the tRunJob component.
- Edit the Schema if relevant and select the column to be displayed. The schema can be identical to the bufferized schema or different.
- You can also define context parameters to be used for this particular execution. To keep it simple, the default context with no particular setting is used for this use case.
- Press F6 to execute the parent Job. The tRunJob component executes the child Job and returns the data to the standard console.
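The parent/child relationship in this scenario can be pictured with the following Java sketch. It is only a model under stated assumptions: the two class names, the semicolon separator, and the sample rows are hypothetical, and the runJob method is written here by hand to show a child returning its buffered rows as a two-dimensional String array to its caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the child Job: it fills a buffer and returns it to the caller. */
class BufferChildJobSketch {

    // Placeholder for the content of the delimited input file (invented sample rows).
    private static final String[] INPUT_LINES = { "1;Smith", "2;Jones" };

    public String[][] runJob(String[] args) {
        List<String[]> buffer = new ArrayList<>();
        for (String line : INPUT_LINES) {
            // One buffered row per input line, one cell per field.
            buffer.add(line.split(";"));
        }
        return buffer.toArray(new String[0][]);
    }
}

/** Hypothetical stand-in for the parent Job: it runs the child and prints what it returns. */
class BufferFatherJobSketch {

    /** Formats the rows one per line, with cells separated by a pipe character. */
    static String format(String[][] rows) {
        StringBuilder out = new StringBuilder();
        for (String[] row : rows) {
            out.append(String.join("|", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] rows = new BufferChildJobSketch().runJob(new String[0]);
        System.out.print(format(rows));
    }
}
```

The parent never touches the input file; everything it displays comes from the rows the child placed in its buffer.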


Scenario 2: Buffering output data on the webapp server


This scenario describes a Job that is called as a Webservice and stores the output data in a buffer directly on the server of the Web application. This scenario first creates a Webservice-oriented Job with context variables, and then exports the Job as a Webservice.

Creating a Webservice-oriented Job with context variables:
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput and tBufferOutput.
- Connect tFixedFlowInput to tBufferOutput using a Row Main link.
- In the design workspace, select tFixedFlowInput.
- Click the Component tab to define the basic settings for tFixedFlowInput.
- Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to describe the data structure you want to create from internal variables. In this scenario, the schema is made of three columns: now, firstname, and lastname.
- Click the plus button to add the three parameter lines and define your variables.
- Click OK to close the dialog box and accept propagating the changes when prompted by the system. The three defined columns display in the Values panel of the Basic settings view of tFixedFlowInput.


- Click in the Value cell of each of the first two defined columns and press Ctrl+Space to access the global variable list.
- From the global variable list, select TalendDate.getCurrentDate() and TalendDataGenerator.getFirstName() for the now and firstname columns respectively.

For this scenario, we want to define two context variables: nb_lines and lastname. The first sets the number of lines to be generated, and the second sets the last name to display in the output list. The tFixedFlowInput component will generate the number of lines set in the context variable with the three columns: now, firstname and lastname. For more information about how to create and use context variables, see How to use variables in the Contexts view of the Talend Open Studio User Guide.

To define the two context variables:
- Select tFixedFlowInput and click the Contexts tab.
- In the Variables view, click the plus button to add two parameter lines and define them.
- Click the Values as table tab and define the first parameter to set the number of lines to be generated and the second to set the last name to be displayed.


- Click the Component tab to go back to the Basic settings view of tFixedFlowInput.
- Click in the Value cell of the lastname column and press Ctrl+Space to access the global variable list.
- From the global variable list, select context.lastname, the context variable you created for the lastname column.
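As a rough picture of what this configuration produces, the Java sketch below generates a fixed number of rows with the three columns now, firstname, and lastname. It is a simplified model, not Studio-generated code: the class name, the pickFirstName helper, and its list of names are hypothetical stand-ins for the first-name routine, and the two method parameters play the role of the nb_lines and lastname context variables.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Models a fixed flow: the same column expressions are evaluated once per generated row. */
class FixedFlowSketch {

    // Hypothetical stand-in for a routine that returns a first name.
    private static final String[] FIRST_NAMES = { "Ada", "Grace", "Alan" };

    static String pickFirstName(int rowIndex) {
        return FIRST_NAMES[rowIndex % FIRST_NAMES.length];
    }

    /** Generates nbLines rows, each holding {now, firstname, lastname}. */
    static List<Object[]> generate(int nbLines, String lastname) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < nbLines; i++) {
            rows.add(new Object[] { new Date(), pickFirstName(i), lastname });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Values that would come from the context: nb_lines = 3, lastname = "Ford".
        for (Object[] row : generate(3, "Ford")) {
            System.out.println(row[0] + "|" + row[1] + "|" + row[2]);
        }
    }
}
```

Changing only the two arguments changes the number of rows and the last name, which is the point of driving the component from context variables.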

Exporting your Job as a Webservice:
Before exporting your Job as a Webservice, see Importing/exporting items or Jobs in Talend Open Studio User Guide for more information.
- In the Repository tree view, right-click the Job created above and select Export Job Scripts. The [Export Job Scripts] dialog box displays.


- Click the Browse... button to select a directory to archive your Job in.
- In the Export type panel, select the export type you want to use in the Tomcat webapp directory (WAR in this example) and click Finish. The [Export Job Scripts] dialog box disappears.
- Copy the WAR folder and paste it in a Tomcat webapp directory.

Scenario 3: Calling a Job with context variables from a browser


This scenario describes how to call the Job you created in scenario 2 from your browser, with or without modifying the values of the context variables.
- Type the following URL into your browser: http://localhost:8080/export_job/services/export_job3?method=runJob where export_job is the name of the webapp directory deployed in Tomcat and export_job3 is the name of the Job.


- Press Enter to execute your Job from your browser.

The Job uses the default values of the context variables nb_lines and lastname: it generates three lines with the current date, a first name, and Ford as the last name. You can modify the values of the context variables directly from your browser.
- To call the Job from your browser and modify the values of the two context variables, type the following URL: http://localhost:8080/export_job/services/export_job3?method=runJob&arg1=--context_param%20lastname=MASSY&arg2=--context_param%20nb_lines=2
  %20 stands for a blank space in a URL. In the first argument, arg1, you set the value of the lastname context variable to display MASSY as the last name. In the second argument, arg2, you set the value of the nb_lines context variable to 2 to generate only two lines.
- Press Enter to execute your Job from your browser.


The Job generates two lines with MASSY as last name.
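The URL used above can also be assembled programmatically. The following Java sketch only builds the string; it does not call the server. The class and method names are hypothetical, and the sketch escapes blank spaces only, which is enough for the simple values of this scenario but would not be sufficient for values containing other reserved characters.

```java
/** Builds a runJob URL with numbered arguments, escaping blank spaces as %20. */
class JobUrlSketch {

    static String buildUrl(String serviceUrl, String... jobArgs) {
        StringBuilder url = new StringBuilder(serviceUrl).append("?method=runJob");
        for (int i = 0; i < jobArgs.length; i++) {
            url.append("&arg").append(i + 1).append('=')
               // Only spaces are escaped here; this is a deliberate simplification.
               .append(jobArgs[i].replace(" ", "%20"));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String service = "http://localhost:8080/export_job/services/export_job3";
        System.out.println(buildUrl(service));
        System.out.println(buildUrl(service,
                "--context_param lastname=MASSY",
                "--context_param nb_lines=2"));
    }
}
```

Called without arguments, the method yields the first URL of this scenario, so the Job runs with its default context values.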

Scenario 4: Calling a Job exported as Webservice in another Job


This scenario describes a Job that calls another Job exported as a Webservice, using the tWebServiceInput component. This scenario calls the Job created in scenario 2.
- Drop the following components from the Palette onto the design workspace: tWebServiceInput and tLogRow.
- Connect tWebServiceInput to tLogRow using a Row Main link.

In the design workspace, select tWebServiceInput. Click the Component tab to define the basic settings for tWebServiceInput.

Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to describe the data structure you want to call from the exported Job. In this scenario, the schema is made of three columns, now, firstname, and lastname.

- Click the plus button to add the three parameter lines and define your variables.
- Click OK to close the dialog box.
- In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL http://localhost:8080/export_job/services/export_job3?WSDL where export_job is the name of the webapp directory where the Job to call is stored and export_job3 is the name of the Job itself.
- In the Method name field, enter runJob.
- In the Parameters panel, click the plus button to add two parameter lines to define your context variables.
- Click in the first Value cell to enter the parameter that sets the number of generated lines, using the following syntax: --context_param nb_lines=3.
- Click in the second Value cell to enter the parameter that sets the last name to display, using the following syntax: --context_param lastname=Ford.
- Select tLogRow and click the Component tab to display the Component view.

- Set the Basic settings for the tLogRow component to display the output data in a tabular mode. For more information, see tLogRow on page 1305.
- Save your Job and press F6 to execute it.

The Job generates three rows of three columns (the current date, first name, and last name) and displays them on the log console in a tabular mode.
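The --context_param syntax used in the Parameters panel is a plain key=value pair behind a fixed prefix. The Java sketch below shows one way such arguments could be turned into context overrides; it is an illustration under assumptions, not the parsing code of an exported Job, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Extracts key=value overrides from arguments of the form "--context_param key=value". */
class ContextParamSketch {

    static final String PREFIX = "--context_param";

    static Map<String, String> parse(String... args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg == null || !arg.startsWith(PREFIX)) {
                continue; // not a context override, ignored by this sketch
            }
            String keyValue = arg.substring(PREFIX.length()).trim();
            int equalsAt = keyValue.indexOf('=');
            if (equalsAt > 0) {
                overrides.put(keyValue.substring(0, equalsAt), keyValue.substring(equalsAt + 1));
            }
        }
        return overrides;
    }

    public static void main(String[] args) {
        Map<String, String> overrides =
                parse("--context_param nb_lines=3", "--context_param lastname=Ford");
        System.out.println(overrides);
    }
}
```

A misspelled variable name in the argument produces an override for a variable the Job does not use, so the names must match the context variables exactly.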


tContextDump
tContextDump properties
Component family: Misc

Function: tContextDump makes a dump copy of the values of the active job context.

Purpose: tContextDump can be used to transform the current context parameters into a flow that can then be used in a tContextLoad. This feature makes it possible to define the context only once and reuse it in numerous Jobs via tContextLoad.

Basic settings

Schema type and Edit schema: In tContextDump, the schema is read-only and made of two columns, Key and Value, corresponding to the parameter name and the parameter value to be copied. A schema is a row description, i.e., it defines the fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence it can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Print operations: Select this check box to display the context parameters set in the Run view.

Usage: This component creates a data flow from the current context values, therefore it must be connected to an output component.

Limitation: tContextDump does not create any non-defined context variable.

Related Scenario
No scenario is available for this component yet.
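In the absence of a scenario, the Key/Value flow described in the properties can be pictured with a short Java sketch. The class name, the use of a map as the context, and the sample parameter names and values are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Turns a context (name/value pairs) into a flow of two-column rows: Key, Value. */
class ContextDumpSketch {

    static List<String[]> dump(Map<String, String> context) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> parameter : context.entrySet()) {
            // One row per defined parameter; nothing is invented for undefined ones.
            rows.add(new String[] { parameter.getKey(), parameter.getValue() });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical context parameters.
        Map<String, String> context = new LinkedHashMap<>();
        context.put("host", "localhost");
        context.put("port", "3306");
        for (String[] row : dump(context)) {
            System.out.println(row[0] + ";" + row[1]);
        }
    }
}
```

Written to a delimited file, such rows have exactly the two-field shape that a later Job can read back and pass to tContextLoad.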


tContextLoad
tContextLoad properties
Component family: Misc

Function: tContextLoad dynamically modifies the values of the active context.

Purpose: tContextLoad can be used to load a context from a flow. This component also performs two checks: it warns when parameters defined in the incoming flow are not defined in the context and, conversely, when a context value is not initialized in the incoming flow. Note that these warnings do not block the processing.

Basic settings

Schema type and Edit schema: In tContextLoad, the schema must be made of two columns, holding the parameter name and the parameter value to be loaded. A schema is a row description, i.e., it defines the fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence it can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
If a variable loaded, but not in the context: If a variable is loaded but does not appear in the context, select how the notification must be displayed: as an error, a warning, or an information (info).
If a variable in the context, but not loaded: If a variable appears in the context but is not loaded, select how the notification must be displayed: as an error, a warning, or an information (info).
Print operations: Select this check box to display the context parameters set in the Run view.
Disable errors: Select this check box to prevent the error from displaying.
Disable warnings: Select this check box to prevent the warning from displaying.
Disable infos: Select this check box to prevent the information from displaying.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component relies on the data flow to load the context values to be used, therefore it requires a preceding input component and thus cannot be a start component.

Limitation: tContextLoad does not create any non-defined variable in the default context.

Scenario: Dynamic context use in MySQL DB insert


This scenario is made of two subjobs. The first subjob dynamically loads the context parameters, and the second subjob uses the loaded context to display the content of a DB table.

- For the first subjob, drop a tFileList, a tFileInputDelimited, and a tContextLoad from the Palette to the design workspace.
- Drop a tMysqlInput and a tLogRow the same way for the second subjob.
- Connect all the components together.
- Create as many delimited files as there are different contexts and store them in a specific directory, named Contexts. In this scenario, test.txt contains the local database connection details for testing purposes, and prod.txt holds the actual production database details.
- Each file is made of two fields, containing the parameter name and the corresponding value, according to the context.


- In the tFileList component Basic settings panel, select the directory where both context files, test and prod, are held.
- In the tFileInputDelimited component Basic settings panel, press Ctrl+Space to access the global variable list. Select tFileList_1.CURRENT_FILEPATH to loop on the context files directory.
- Define the schema manually (Built-in). It contains two columns defined as: Key and Value.
- Accept the defined schema to be propagated to the next component (tContextLoad).
- For this scenario, select the Print operations check box so that the context parameters in use are displayed on the Run panel.
- Then double-click tMysqlInput to open its Basic settings.
- For each of the field values stored in a context file, press F5 and define the user-defined context parameter. For example, the Host field takes the value context.host, as the parameter name is host in the context file. Its actual value is talend-dbms.


- Then fill in the Schema information. If you stored the schema in the Repository Metadata, you can retrieve it by selecting Repository and the relevant entry in the list.
- In the Query field, type in the SQL query to be executed on the specified DB table. In this case, it is a simple SELECT of the columns of the table, which will be displayed on the Run tab through the tLogRow component.
- Finally, press F6 to run the Job.

The context parameters as well as the selected values from the DB table are all displayed on the Run view.
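The loading step and its two checks can be sketched in Java as follows. This is a simplified model rather than the component's code: the class name, the semicolon field separator, the warning wording, and the sample parameter names are assumptions. In line with the Limitation above, the sketch does not create variables that are not already defined in the context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Loads "key;value" lines into an existing context and reports both kinds of mismatch. */
class ContextLoadSketch {

    static List<String> load(Map<String, String> context, List<String> lines) {
        List<String> warnings = new ArrayList<>();
        Set<String> loadedKeys = new LinkedHashSet<>();
        for (String line : lines) {
            int separator = line.indexOf(';');
            if (separator < 0) {
                continue; // not a key;value row
            }
            String key = line.substring(0, separator).trim();
            String value = line.substring(separator + 1).trim();
            loadedKeys.add(key);
            if (context.containsKey(key)) {
                context.put(key, value); // overwrite the value of a defined parameter
            } else {
                warnings.add("Loaded but not in the context: " + key);
            }
        }
        for (String key : context.keySet()) {
            if (!loadedKeys.contains(key)) {
                warnings.add("In the context but not loaded: " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("host", "");
        context.put("port", "");
        // Hypothetical content of a context file such as test.txt.
        List<String> warnings = load(context, Arrays.asList("host;talend-dbms", "user;demo"));
        System.out.println(context);
        System.out.println(warnings);
    }
}
```

Neither warning stops the run, which mirrors the statement that these checks do not block the processing.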


tFixedFlowInput
tFixedFlowInput properties
Component family: Misc

Function: tFixedFlowInput generates as many lines and columns as you want using the context variables.

Purpose: tFixedFlowInput allows you to generate a fixed flow from internal variables.

Basic settings

Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository, hence it can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Mode: From the three options, select the mode that you want to use.
Use Single Table: Enter the data that you want to generate in the relevant value field.
Use Inline Table: Add the row(s) that you want to generate.
Use Inline Content: Enter the data that you want to generate, separated by the separators that you have already defined in the Row and Field Separator fields.
Number of rows: Enter the number of lines to be generated.
Values: Between inverted commas, enter the values corresponding to the columns you defined in the schema dialog box via the Edit schema button.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as a start or intermediate component and thus requires an output component.

Related scenarios
For related scenarios, see:
- Scenario 2: Buffering output data on the webapp server on page 1321.
- Scenario: Iterating on a DB table and listing its column names on page 589.


tMemorizeRows
tMemorizeRows properties
Component family: Misc

Function: tMemorizeRows temporarily memorizes an array of incoming data in a row by row sequence and instantiates this array by indexing each of the memorized rows from 0. The maximum number of rows to be memorized at any given time is defined in the Basic settings view.

Purpose: tMemorizeRows memorizes a sequence of rows that pass this component and then allows its following component(s) to perform operations of your interest on the memorized rows.

Basic settings

Schema type and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
- Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
- Click Sync columns to retrieve the schema from the previous component connected in the Job.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository, hence it can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Row count to memorize: Define the row count to be memorized.
Columns to memorize: Select the columns to be memorized from the incoming data schema.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as an intermediate step in a data flow or as the last step before beginning a subjob.
Note: You can use the global variable NB_LINE_ROWS to retrieve the value of the Row count to memorize field of the tMemorizeRows component.


Connections

Outgoing links (from one component to another):
Row: Main.
Trigger: Run if; On Component Ok; On Component Error.

Incoming links (from one component to another):
Row: Main.

For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Scenario: Counting the occurrences of different ages


This scenario counts how many different ages there are within a group of 12 customers. In this scenario, the customer data is generated at random.

This Job uses five components:
- tRowGenerator: it generates 12 rows of customer data containing the IDs, names and ages of the 12 customers.
- tSortRow: it sorts the 12 rows according to the age data.
- tMemorizeRows: it temporarily memorizes a specific number of incoming data rows at any given time and indexes the memorized data rows.
- tJavaFlex: it compares the age values of the data memorized by the preceding component, counts the occurrences of different ages and displays these ages in the Run view.
- tJava: it displays the number of occurrences of different ages.

To replicate this scenario, proceed as follows:
- Drop tRowGenerator, tSortRow, tMemorizeRows, tJavaFlex and tJava on the design workspace.
- Right-click tRowGenerator.
- In the contextual menu, select the Row > Main link.
- Click tSortRow to link these two components.


- Do the same to link together tSortRow, tMemorizeRows and tJavaFlex using the Row > Main link.
- Right-click tRowGenerator.
- In the contextual menu, select the Trigger > On Subjob Ok link.
- Click tJava to link these two components.
- Double-click the tRowGenerator component to open its editor.
- In this editor, click the plus button three times to add three columns and name them id, name, and age.
- In the Type column, select Integer for id and age.
- In the Length column, enter 50 for name.
- In the Functions column, select random for id and age, then select getFirstName for name.
- In the Number of Rows for RowGenerator field, type in 12.
- In the Column column, click age to open its corresponding Function parameters view in the lower part of this editor.


- In the Value column of the Function parameters view, type in the minimum age and maximum age that will be generated for the 12 customers. In this example, they are 10 and 25.
- Click OK to save the configuration.
- In the dialog box that pops up, click OK to propagate the change to the other components.
- Double-click tSortRow to open its Component view.
- In the Criteria table, click the plus button to add one row.


- In the column named Schema column, select the data column you want to base the sort on. In this example, select age, as it is the ages that are to be compared and counted.
- In the Sort num or alpha column, select the type of the sort. In this example, as age is an integer, select num (numerical) for this sort.
- In the Order asc or desc column, select desc as the sort order for this scenario.
- Double-click tMemorizeRows to open its Component view.
- In the Row count to memorize field, type in the maximum number of rows to be memorized at any given time. As you need to compare the ages of two customers each time, enter 2. This component thus memorizes at most two rows at any given moment, and always indexes the newly incoming row as 0 and the previously incoming row as 1.
- In the Memorize column of the Columns to memorize table, select the check box(es) to determine the column(s) to be memorized. In this example, select the check box corresponding to age.
- Double-click tJavaFlex to open its Component view.


- In the Start code area, enter the Java code that will be called during the initialization phase. In this example, type in int count=0; in order to declare a variable count and assign the value 0 to it.
- In the Main code area, enter the Java code to be applied to each row in the data flow. In this scenario, type in:
  if(age_tMemorizeRows_1[1]!=age_tMemorizeRows_1[0]) { count++; }
  System.out.println(age_tMemorizeRows_1[0]);
  This code compares the two ages memorized by tMemorizeRows each time and counts one change every time the ages are found to be different. It then displays the age that has been indexed as 0 by tMemorizeRows.
- In the End code area, enter the Java code that will be called during the closing phase. In this example, type in globalMap.put("count", count); to output the count result.
- Double-click tJava to open its Component view.
- In the Code area, type in the code System.out.println("Different ages : "+globalMap.get("count")); to retrieve the count result.
- Press F6 to run the Job. The result then displays in the console of the Run view.

In the console of this example run, you can read that there are 10 different ages within the group of 12 customers. As the data is generated at random, the result varies from one run to another.
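The counting logic of this Job can be reproduced outside the Studio with the following Java sketch. It is a simplified model under assumptions: the class name and the sample ages are hypothetical, a two-cell array stands in for the rows memorized by tMemorizeRows, and Objects.equals is used for the comparison so that the empty cell seen on the first row is handled explicitly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Objects;

/** Counts distinct ages by sorting them and comparing each row with the previous one. */
class MemorizeRowsSketch {

    static int countDifferentAges(Integer[] ages) {
        Integer[] sorted = ages.clone();
        Arrays.sort(sorted, Collections.reverseOrder()); // descending, as in the scenario

        Integer[] memorized = new Integer[2]; // index 0 = newest row, index 1 = previous row
        int count = 0;
        for (Integer age : sorted) {
            memorized[1] = memorized[0]; // the previous newest row moves to index 1
            memorized[0] = age;          // the incoming row is always indexed as 0
            if (!Objects.equals(memorized[1], memorized[0])) {
                count++; // a change of age, including the very first row
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Integer[] ages = { 25, 10, 25, 17, 10 }; // invented sample data
        System.out.println("Different ages : " + countDifferentAges(ages));
    }
}
```

The sort is what makes a window of two rows sufficient: once equal ages are adjacent, every change between neighbouring rows marks a new distinct age.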


tMsgBox
tMsgBox properties
Component family: Misc

Function: Opens a dialog box with an OK button requiring action from the user.

Purpose: tMsgBox is a graphical break in the job execution progress.

Basic settings

Title: Text entered shows on the title bar of the dialog box created.
Buttons: Listbox of buttons you want to include in the dialog box. The button combinations are restricted and cannot be changed.
Icon: Icon shows on the title bar of the dialog box.
Message: Free text to display as message on the dialog box. Text can be dynamic (for example: retrieve and show a file name).

Usage: This component can be used as an intermediate step in a data flow or as a start or end object in the job flowchart. It can be connected to the next/previous component using either a Row or Iterate link.

Limitation: For Perl users: Make sure the relevant package is installed.

Scenario: Hello world! type test


The following scenario creates a single-component Job, where tMsgBox is used to display the pid (process id) in place of the traditional Hello World! message. Drop a tMsgBox component from the Palette to the design workspace. Define the dialog box display properties:

Title is the message box title; it can be any variable.


In the Message field, enter "Current date is: " between double quotation marks. Then press Ctrl+Space to display the autocompletion list and select the following system routine: TalendDate.getCurrentDate. Put brackets around this routine. Switch to the Run tab to execute the Job defined. The message box displays the message and requires the user to click OK to go to the next component or end the Job.

After the user clicks OK, the Run log is updated accordingly. Related topic: How to run a Job in the Talend Open Studio User Guide.
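The Message field of this scenario combines a fixed label with the value returned by a routine. The following is a minimal sketch of such a concatenation in plain Java. It assumes that the routine returns a java.util.Date and passes a Date as a parameter instead of calling the routine, so that the example runs outside the Studio. The class and method names are hypothetical.

```java
import java.util.Date;

class MsgBoxMessageSketch {
    // Builds the text shown in the dialog box: a fixed label followed by a date.
    // The Date parameter stands in for the value returned by the system routine.
    static String buildMessage(Date now) {
        return "Current date is: " + now;
    }

    public static void main(String[] args) {
        System.out.println(buildMessage(new Date()));
    }
}
```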


tRowGenerator
tRowGenerator properties
Component family: Misc
Function: tRowGenerator generates as many rows and fields as are required using random values taken from a list.
Purpose: Can be used to create an input flow in a Job for testing purposes, in particular for boundary test sets.
Basic settings:
  Schema type and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or stored remotely in the Repository.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: Select the Repository file where the properties are stored. When selected, the fields that follow are filled in automatically using fetched data.
  RowGenerator editor: The editor allows you to define the columns and the nature of data to be generated. You can use predefined routines or type in the function to be used to generate the data specified.
Usage: The tRowGenerator Editor's ease of use allows users without any Perl or Java knowledge to generate random data for test purposes.
Limitation: n/a

The tRowGenerator Editor opens in a separate window made of two parts: a Schema definition panel at the top of the window and a Function definition and preview panel at the bottom.

Defining the schema


First you need to define the structure of the data to be generated. Add as many columns to your schema as needed, using the plus (+) button. Type in the names of the columns to be created in the Columns area and select the Key check box if required. Then define the nature of the data contained in each column by selecting the Type in the list. The list of Functions offered differs according to the type you select. This information is therefore compulsory.


Some extra information, although not required, might be useful, such as Length, Precision or Comment. You can also hide these columns by clicking the Columns drop-down button next to the toolbar and unchecking the relevant entries on the list. In the Function area, you can select a predefined routine/function if one of them corresponds to your needs. You can also add to this list any routine you stored in the Routine area of the Repository, or you can type in the function you want to use in the Function definition panel. Related topic: Defining the function on page 1345. Click Refresh to have a preview of the data generated. Type in the number of rows to be generated. The more rows to be generated, the longer the generation operation takes.
The list of functions differs between Perl and Java.

Defining the function


Select the [...] under Function in the Schema definition panel in order to customize the function parameters. Select the Function parameters tab. The Parameter area displays Customized parameter as function name (read-only).

In the Value area, type in the Perl or Java function to be used to generate the data specified. Click on the Preview tab and click Preview to check out a sample of the data generated.


Scenario: Generating random Java data


The following scenario creates a two-component Job made in Java, generating 50 rows structured as follows: a randomly picked ID in the 1-to-3 range, randomly generated ASCII First Name and Last Name values, and a random date taken from a defined range.

Drop a tRowGenerator and a tLogRow component from the Palette to the design workspace. Right-click tRowGenerator and select Row > Main. Drag this main row link onto the tLogRow component and release when the plug symbol displays. Double click tRowGenerator to open the Editor. Define the fields to be generated.

The random ID column is of integer type, the First and Last names are of string type and the Date is of date type. In the Function list, select the relevant function or click the three-dot button to define a custom function. On the Function parameters tab, define the Values to be randomly picked.

First_Name and Last_Name columns are to be generated using the getAsciiRandomString function that is predefined in the system routines. By default the length defined is 6 characters long. You can change this if need be. The Date column calls the predefined getRandomDate function. You can edit the parameter values in the Function parameters tab. Set the Number of Rows to be generated to 50. Click OK to validate the setting. Double click tLogRow to view the Basic settings. The default setting is retained for this Job. Press F6 to run the Job.


The 50 rows are generated following the setting defined in the tRowGenerator editor and the output is displayed in the Run console.
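The kind of data produced in this scenario can be approximated in plain Java. The following is a minimal sketch under stated assumptions: it does not call the predefined routines (getAsciiRandomString and getRandomDate are replaced by hypothetical local helpers that use lowercase letters only), the date range 2007-01-01 to 2008-12-31 is an arbitrary choice for illustration, and all class and method names are invented for the example.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RowGeneratorSketch {
    // One generated row: ID, First_Name, Last_Name and Date columns.
    static final class Row {
        final int id;
        final String firstName;
        final String lastName;
        final LocalDate date;

        Row(int id, String firstName, String lastName, LocalDate date) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.date = date;
        }
    }

    // Stand-in for a random ASCII string routine: lowercase letters only.
    static String randomAscii(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Stand-in for a random date routine: a date between min and max, both inclusive.
    static LocalDate randomDate(Random rnd, LocalDate min, LocalDate max) {
        int span = (int) ChronoUnit.DAYS.between(min, max) + 1;
        return min.plusDays(rnd.nextInt(span));
    }

    // Generates the requested number of rows; the seed makes runs repeatable.
    static List<Row> generate(int numberOfRows, long seed) {
        Random rnd = new Random(seed);
        LocalDate min = LocalDate.of(2007, 1, 1);   // assumed lower bound
        LocalDate max = LocalDate.of(2008, 12, 31); // assumed upper bound
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < numberOfRows; i++) {
            int id = 1 + rnd.nextInt(3); // ID picked in the 1-to-3 range
            rows.add(new Row(id, randomAscii(rnd, 6), randomAscii(rnd, 6),
                    randomDate(rnd, min, max)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row r : generate(50, 42L)) {
            System.out.println(r.id + "|" + r.firstName + "|" + r.lastName + "|" + r.date);
        }
    }
}
```

A fixed seed is used so that a test data set can be regenerated identically; drop it if different values are wanted at each run.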


Orchestration components
This chapter details the main components that you can find in the Orchestration family of the Talend Open Studio Palette. The Orchestration family groups together components that help you to sequence or orchestrate tasks or processing in your Jobs or subjobs.


tFileList
tFileList belongs to two component families: File and Orchestration. For more information on tFileList, see tFileList on page 1108.


tFlowToIterate
tFlowToIterate Properties
Component family: Orchestration
Function: tFlowToIterate transforms a data flow into a list.
Purpose: Allows you to transform a processable flow into non processable data.
Basic settings:
  Use the default (key, value) in global variables: When selected, the system uses the default value of the global variable in the current Job.
  Customize:
    key: Type in a name for the new global variable. Press Ctrl+Space to access all available variables, either global or user-defined.
    value: Click in the cell to access a list of the columns attached to the defined global variable.
Usage: You cannot use this component as a start component. tFlowToIterate requires an output component.
Global Variables:
  Number of Lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Connections:
  Outgoing links (from one component to another): Row: Iterate. Trigger: Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Main.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a

Scenario: Transforming data flow to a list


The following scenario describes a Job that reads a list of files from a defined input file, iterates on each of the files, selects input data and displays the output on the Run log console. Drop the following components from the Palette onto the design workspace: tFileInputDelimited (x2), tFlowToIterate, and tLogRow.


Via a right-click on each of the components, connect the first tFileInputDelimited to tFlowToIterate using a Row Main link, tFlowToIterate to the second tFileInputDelimited using an Iterate link, and the second tFileInputDelimited to tLogRow using a Row Main link.

In the design workspace, select the first tFileInputDelimited. Click the Component tab to display the relevant view where you can define the basic settings for tFileInputDelimited. In the Basic settings view, click the three-dot [...] button next to the File Name field to select the path to the input file.
The File Name field is mandatory.

The input file used in this scenario is called Customers. It is a text file that holds three other simple text files: Name, E-mail and Address. The first text file, Name, is made of one column holding customers' names. The second text file, E-mail, is made of one column holding customers' e-mail addresses. The third text file, Address, is made of one column holding customers' postal addresses. Fill in all other fields as needed. For more information, see tFileInputDelimited properties on page 1054. In this scenario, the header and the footer are not set and there is no limit for the number of processed rows. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of one column, FileName.


In the design workspace, select tFlowToIterate. Click the Component tab to define the basic settings for tFlowToIterate.

If needed, select the Use the default (key, value) in global variables check box to use the default value of the global variable. Click the plus button to add new parameter lines and define your variables. Click in the key cell to modify the variable name as desired.
You can press Ctrl+Space in the key cell to access the list of global and user-specific variables.

In the design workspace, select the second tFileInputDelimited. Click the Component tab to define the basic settings for the second tFileInputDelimited.


In the File Name field, enter the file name using the variable containing the name of the file. You must use the correct syntax according to the language used, Perl or Java. In Perl, the relevant syntax is .$_globals{tFlowToIterate}{Name_of_File}. In Java, the relevant syntax is +globalMap.get(file). Fill in all other fields as needed. For more information, see tFileInputDelimited properties on page 1054. In the design workspace, select the last component, tLogRow. Click the Component tab to define the basic settings for tLogRow.

Define your settings as needed. For more information, see tLogRow properties on page 1305. Save your Job and press F6 to execute it.

Customers' names, customers' e-mails, and customers' postal addresses display on the console, preceded by the schema column name.
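The mechanism used in this scenario, where each incoming row is stored under a key in a shared map and then read back by the component placed after the Iterate link, can be sketched in plain Java. This is a simplified illustration, not the code generated by the Studio: the local HashMap stands in for globalMap, the key name "file" is a hypothetical value, and the second input component is reduced to a string that records which file would be read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlowToIterateSketch {
    // Runs one iteration per incoming row: the row value is stored in a shared map
    // under the given key, then the body of the iteration reads it back.
    static List<String> iterate(List<String> fileNames, String key) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for the Job's global map
        List<String> log = new ArrayList<>();
        for (String fileName : fileNames) {
            globalMap.put(key, fileName);               // done once per row of the flow
            String current = (String) globalMap.get(key); // done by the iterated component
            log.add("reading " + current);
        }
        return log;
    }

    public static void main(String[] args) {
        // The three file names listed in the Customers file of the scenario.
        List<String> files = Arrays.asList("Name", "E-mail", "Address");
        for (String line : iterate(files, "file")) {
            System.out.println(line);
        }
    }
}
```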


tForeach
tForeach Properties
Component family: Orchestration
Function: tForeach creates a loop on a list for an iterate link.
Purpose: tForeach allows you to create a loop on a list for an iterate link.
Basic settings:
  Values: Use the [+] button to add rows to the Values table. Then click on the fields to enter the list values to be iterated upon, between double quotation marks.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect the log data at a component level.
Usage: tForeach is an input component and requires an Iterate link to connect it to another component.
Limitation: n/a

Scenario: Iterating on a list and retrieving the values


This scenario describes a two-component Job in which a list is created and iterated upon in a tForeach component. The values are then retrieved in a tJava component. Drop a tForeach and a tJava component onto the design workspace:

Link tForeach to tJava using a Row > Iterate connection. Double-click tForeach to open its Basic settings view:


Click the [+] button to add as many rows to the Values list as required.

Click on the Value fields to enter the list values, between double quotation marks. Double-click tJava to open its Basic settings view:

Enter the following Java code in the Code area:
System.out.println(globalMap.get("tForeach_1_CURRENT_VALUE") +"_out");
Save the Job and press F6 to run it. The tJava run view displays the list values retrieved from tForeach, each one suffixed with _out.
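To see what the tJava statement prints for a given list, the iteration can be reproduced in plain Java. The following is a minimal sketch, assuming that the component stores each list value under the key tForeach_1_CURRENT_VALUE before the tJava code runs. The local map replaces globalMap, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ForeachSketch {
    // Iterates on the list values and applies the tJava statement to each of them.
    static List<String> run(List<String> values) {
        Map<String, Object> globalMap = new HashMap<>(); // stand-in for the Job's global map
        List<String> printed = new ArrayList<>();
        for (String value : values) {
            globalMap.put("tForeach_1_CURRENT_VALUE", value);
            String line = globalMap.get("tForeach_1_CURRENT_VALUE") + "_out";
            System.out.println(line);
            printed.add(line);
        }
        return printed;
    }

    public static void main(String[] args) {
        run(Arrays.asList("a", "b", "c")); // prints a_out, b_out and c_out on three lines
    }
}
```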


tInfiniteLoop
tInfiniteLoop Properties
Component family: Orchestration
Function: tInfiniteLoop runs an infinite loop on a task.
Purpose: tInfiniteLoop allows you to execute a task or a Job automatically, based on a loop.
Basic settings:
  Wait at each iteration (in milliseconds): Enter the time delay between iterations.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect the log data at a component level.
Usage: tInfiniteLoop is an input component and requires an Iterate link to connect it to the following component.
Global Variables:
  Current iteration: Indicates the current iteration number. This is available as a Flow variable. Returns an integer.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Connections:
  Outgoing links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a

Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see Scenario: Job execution in a loop on page 1363, regarding the tLoop component.


tIterateToFlow
tIterateToFlow Properties
Component family: Orchestration
Function: tIterateToFlow transforms a list into a data flow that can be processed.
Purpose: Allows you to transform non processable data into a processable flow.
Basic settings:
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In the case of tIterateToFlow, the schema is to be defined.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
  Mapping:
    Column: Enter a name for the column to be created.
    Value: Press Ctrl+Space to access all of the available variables, be they global or user-defined.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect the log data at a component level.
Usage: This component is not startable (green background) and it requires an output component.
Connections:
  Outgoing links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Iterate.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.


Scenario: Transforming a list of files as data flow


The following scenario describes a Job that iterates on a list of files, picks up the filename and the current date, and transforms them into a flow that is displayed on the console.

Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette to the design workspace. Connect the tFileList to the tIterateToFlow using an iterate link and connect the Job to the tLogRow using a Row main connection. In the tFileList Component view, set the directory where the list of files is stored.

In this example, the files are three simple .txt files held in one directory: Countries. The letter case does not matter here, so clear the Case sensitive check box. Leave the Include Subdirectories check box unchecked. Then select the tIterateToFlow component and click Edit Schema to set the new schema.

Add two new columns: Filename of String type and Date of date type. Make sure you define the correct pattern in Java. Click OK to validate. Notice that the newly created schema shows on the Mapping table.


In each cell of the Value field, press Ctrl+Space to access the list of global and user-specific variables. For the Filename column, use the global variable: tFileList_1_CURRENT_FILEPATH. It retrieves the current filepath in order to catch the name of each file the Job iterates on. For the Date column, use the Talend routine: Date.GetDate (in Perl) or TalendDate.getCurrentDate() (in Java). Then on the tLogRow component view, select the Print values in cells of a table check box. Save your Job and press F6 to execute it.

The filepath displays on the Filename column and the current date displays on the Date column.
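The transformation carried out in this scenario, one output row per iteration with the columns defined in the Mapping table, can be sketched in plain Java. This is an illustration only: the list of paths replaces the iterations produced by tFileList, the date is passed in as a parameter instead of being read from a routine, and the class names and sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

class IterateToFlowSketch {
    // One row of the output flow: the Filename and Date columns of the schema.
    static final class Row {
        final String filename;
        final Date date;

        Row(String filename, Date date) {
            this.filename = filename;
            this.date = date;
        }
    }

    // Builds one row per iteration: the current file path plus the given date.
    static List<Row> toFlow(List<String> filePaths, Date now) {
        List<Row> flow = new ArrayList<>();
        for (String path : filePaths) {
            flow.add(new Row(path, now));
        }
        return flow;
    }

    public static void main(String[] args) {
        // Hypothetical paths standing in for the files of the Countries directory.
        List<String> paths = Arrays.asList("Countries/file1.txt", "Countries/file2.txt",
                "Countries/file3.txt");
        for (Row row : toFlow(paths, new Date())) {
            System.out.println(row.filename + "|" + row.date);
        }
    }
}
```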


tLoop
tLoop Properties
Component family: Orchestration
Function: tLoop iterates on a task execution.
Purpose: tLoop allows you to execute a task or a Job automatically, based on a loop.
Basic settings:
  Loop Type: Select a type of loop to be carried out: either For or While.
    For: The task or Job is carried out for the defined number of iterations.
    While: The task or Job is carried out until the condition is met.
  For:
    From: Type in the first instance number which the loop should start from. A start instance number of 2 with a step of 2 means the loop takes on every even number instance.
    To: Type in the last instance number which the loop should finish with.
    Step: Type in the step by which the loop should be incremented. A step of 2 means every second instance.
  While:
    Declaration: Type in an expression initiating the loop.
    Condition: Type in the condition that should be met for the loop to end.
    Iteration: Type in the expression showing the operation to be performed at each loop.
Usage: tLoop is to be used as a start component and can only be used with an iterate connection to the next component.
Global Variables:
  Current value: Indicates the current value. This is available as a Flow variable. Returns an integer.
  Current iteration: Indicates the number of the current iteration. This is available as a Flow variable. Returns an integer.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Connections:
  Outgoing links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a
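The From, To and Step fields of the For type, and the Declaration, Condition and Iteration fields of the While type, map naturally onto the parts of a Java loop. The following is a minimal sketch under stated assumptions: the To value is treated as inclusive, the step is assumed to be positive, and the While fields are placed in the three slots of a Java for statement, which keeps running while its condition holds. How the Studio-generated code evaluates these fields is not confirmed here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LoopTypesSketch {
    // For type: visits from, from + step, ... up to and including 'to' (assumed inclusive).
    static List<Integer> forLoop(int from, int to, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive in this sketch");
        }
        List<Integer> visited = new ArrayList<>();
        for (int i = from; i <= to; i += step) {
            visited.add(i);
        }
        return visited;
    }

    // While type, written with three parts:
    // Declaration: int i = 0   Condition: i < limit   Iteration: i++
    static int whileLoop(int limit) {
        int iterations = 0;
        for (int i = 0; i < limit; i++) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // A start of 2 with a step of 2 takes on every even number instance.
        System.out.println(forLoop(2, 10, 2)); // prints [2, 4, 6, 8, 10]
        System.out.println(whileLoop(3));      // prints 3
    }
}
```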

Scenario: Job execution in a loop


This scenario describes a Job composed of a parent Job and a child Job. The parent Job implements a loop which executes n times a child Job, with a pause between each execution.

In the parent Job, drop a tLoop, a tRunJob and a tSleep component from the Palette to the design workspace. Connect the tLoop to the tRunJob using an Iterate connection. Then connect the tRunJob to the tSleep component using a Row connection. In the child Job, drop the following components the same way: tPOP, tFileInputMail and tLogRow. On the Basic settings panel of the tLoop component, type in the instance number to start from (1), the instance number to finish with (5) and the step (1).


On the Basic settings panel of the tRunJob component, select the child Job in the list of stored Jobs offered. In this example: popinputmail. Select the context if relevant. In this use case, the context is default with no variables stored. In the tSleep Basic settings panel, type in the time-off value in seconds. In this example, type in 3 in the Pause field. Then in the child Job, define the connection parameters to the POP server on the Basic settings panel. In the tFileInputMail Basic settings panel, select a global variable as File Name, to collect the current file in the directory defined in the tPOP component. Press Ctrl+Space to access the variable list. In this example, the variable to be used is:
$_globals{tPOP_1}{CURRENT_FILEPATH} (for Perl)
((String)globalMap.get("tPOP_1_CURRENT_FILEPATH")) (for Java)
Define the Schema so that it includes the mail elements to be processed, such as author, topic, delivery date and number of lines. In the Mail Parts table, type in the corresponding Mail part for each column defined in the schema. For example, author comes from the From part of the email file. Then connect the tFileInputMail to a tLogRow to check the execution result on the Run view. Press F6 to run the Job.
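The behavior of the parent Job, which runs the child Job once per loop instance and pauses after each run, can be sketched in plain Java. This is a simplified illustration: the child Job is represented by a Runnable, the pause is expressed in milliseconds so that the example can be run quickly, and the class and method names are hypothetical.

```java
class LoopedJobSketch {
    // Runs the child task for each instance from 'from' to 'to' (inclusive),
    // pausing after each run. Returns the number of executions performed.
    static int runInLoop(int from, int to, int step, long pauseMillis, Runnable childJob) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive in this sketch");
        }
        int executions = 0;
        for (int i = from; i <= to; i += step) {
            childJob.run();
            executions++;
            try {
                Thread.sleep(pauseMillis); // the pause between two executions
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop looping
                break;
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        // Settings of the scenario: from 1 to 5 with a step of 1, and a 3-second pause.
        int runs = runInLoop(1, 5, 1, 3000L, () -> System.out.println("child Job executed"));
        System.out.println(runs + " executions");
    }
}
```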


tPostjob
tPostjob Properties
Component family: Orchestration
Function: tPostjob starts the execution of a postjob.
Purpose: tPostjob triggers a task required after the execution of a Job.
Usage: tPostjob is a start component and can only be used with an iterate connection to the next component.
Connections:
  Outgoing links (from one component to another): Trigger: On Component Ok.
  Incoming links (from one component to another): Trigger: Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a

For more information about the tPostjob component, see How to use the tPrejob and tPostjob components in the Talend Open Studio User Guide.

Related scenario
No scenario is available for this component yet.


tPrejob
tPrejob Properties
Component family: Orchestration
Function: tPrejob starts the execution of a prejob.
Purpose: tPrejob triggers a task required for the execution of a Job.
Usage: tPrejob is a start component and can only be used with an iterate connection to the next component.
Connections:
  Outgoing links (from one component to another): Trigger: On Component Ok.
  Incoming links (from one component to another): Trigger: Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a

For more information about the tPrejob component, see How to use the tPrejob and tPostjob components in the Talend Open Studio User Guide.

Related scenario
No scenario is available for this component yet.


tReplicate
tReplicate Properties
Component family: Orchestration
Function: Duplicates the incoming schema into two identical output flows.
Purpose: Allows you to perform different operations on the same schema.
Basic settings:
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Usage: This component is not startable (green background); it requires an input component and an output component.
Connections:
  Outgoing links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Main; Reject.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Related scenario
For a use case showing this component in use, see tReplaceList on page 299.


tRunJob
tRunJob belongs to two component families: System and Orchestration. For more information on tRunJob, see tRunJob on page 1494.


tSleep
tSleep Properties
Component family: Orchestration
Function: tSleep implements a time off in a job execution.
Purpose: Allows you to identify possible bottlenecks using a time break in the Job for testing or tracking purposes. In production, it can be used for any needed pause in the Job, to feed an input flow for example.
Basic settings:
  Pause (in second): Time in seconds the job execution is stopped for.
Usage: The tSleep component is generally used as a middle component to make a break/pause in the Job, before resuming the Job.
Connections:
  Outgoing links (from one component to another): Row: Main; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a

Related scenarios
For a use case related to tSleep, see the tLoop Scenario: Job execution in a loop on page 1363.


tUnite
tUnite Properties
Component family: Orchestration
Function: Merges data from various sources, based on a common schema.
Purpose: Centralizes data from various and heterogeneous sources.
Basic settings:
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Advanced settings:
  tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage: This component is not startable and requires one or several input components and an output component.
Global Variables:
  Number of lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.
Connections:
  Outgoing links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Main; Reject.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.
Limitation: n/a


Scenario: Iterate on files and merge the content


The following Job iterates on a list of files, then merges their content and displays the final 2-column content on the console.

Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and tLogRow. Connect the tFileList to the tFileInputDelimited using an Iterate connection and connect the other components using Row Main links. In the tFileList Basic settings view, browse to the directory where the files to merge are stored.

In the Case Sensitive field, select Yes to consider the letter case. The files are pretty basic and contain a list of countries and their respective score.


Select the tFileInputDelimited component, and display this component's Basic settings view. In this use case, the input files' connection properties are not centrally stored in the Repository, therefore select Built-In as Property type and set every single field manually.

To fill in the File Name field, use the Ctrl+Space bar combination to access the variable completion list. To process all files from the directory defined in the tFileList, select tFileList.CURRENT_FILEPATH on the global variable list. Keep the default setting for the Row and Field separators as well as the other fields. Click the Edit Schema button and set manually the 2-column schema to reflect the input files content.

For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type. Click OK to validate the setting and accept to propagate the schema throughout the Job.


Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only. In the tLogRow Component view, select the Print values in cells of the table check box to display properly the output values. Save the Job and press F6 to execute it.

The console shows the data from the various files, merged into one single table. This uniformized output can then be aggregated to set
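The merge performed in this scenario, several inputs sharing the same 2-column schema appended into a single table, can be sketched in plain Java. This is an illustration under stated assumptions: each input is given as a list of text lines rather than read from a file, the field separator is assumed to be a semicolon, an empty Points field is mapped to null because the column is nullable, and the class names and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UniteSketch {
    // One row of the common schema: Country (String) and Points (Integer, nullable).
    static final class Row {
        final String country;
        final Integer points;

        Row(String country, Integer points) {
            this.country = country;
            this.points = points;
        }
    }

    // Parses lines of the form "Country;Points" into rows (assumed separator: ';').
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(";", -1);
            String country = parts[0].trim();
            Integer points = (parts.length > 1 && !parts[1].trim().isEmpty())
                    ? Integer.valueOf(parts[1].trim())
                    : null;
            rows.add(new Row(country, points));
        }
        return rows;
    }

    // Appends the rows of every source, in order, into one single list.
    static List<Row> unite(List<List<Row>> sources) {
        List<Row> merged = new ArrayList<>();
        for (List<Row> source : sources) {
            merged.addAll(source);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical content of two input files.
        List<Row> first = parse(Arrays.asList("France;10", "Spain;8"));
        List<Row> second = parse(Arrays.asList("Italy;7", "Norway;"));
        for (Row row : unite(Arrays.asList(first, second))) {
            System.out.println(row.country + "|" + row.points);
        }
    }
}
```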


tWaitForFile
tWaitForFile properties
Component family: Orchestration

Function: tWaitForFile component iterates on a given folder for file insertion or deletion, then triggers a subjob to be executed when the condition is met.

Purpose: This component allows a subjob to be triggered given a condition linked to file presence or removal.

Basic settings
Wait at each iteration (in seconds): Set the time interval in seconds between each check for the file.
Max. iterations (infinite if empty): Number of checks for the file before the Job times out.
Directory to scan: Name of the folder to be checked for insertion or removal.
File mask: Mask of the file to be searched for insertion or removal.
Include subdirectories: Select this check box to include the sub-folders.
Case sensitive: Select this check box to activate case sensitivity.
Include present file: Select this check box to include the file in use.
Trigger action when: Select the condition to be met for the action to be carried out: A file is created; A file is deleted; A file is updated; A file is created or updated or deleted.
Then: Select the action to be carried out: either stop the iterations when the condition is met (exit loop) or continue the loop until the end of the max iteration number (continue loop).
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.


Advanced settings
Wait for file to be released: Select this check box to delay the Job execution until the file is closed.

Usage: This component plays the role of the start (or trigger) component of the subjob which gets executed under the condition described. Therefore this component requires a subjob to be connected to via an Iterate link.

Global Variables
Current iteration: Indicates the number of the current iteration. This is available as a Flow variable. Returns an integer.
Present File: Indicates the name of the current file in the iteration which activated the trigger. This is available as a Flow variable. Returns a string.
Deleted File: Indicates the path and name of the deleted file, which activated the trigger. This is available as a Flow variable. Returns a string.
Created File Name: Indicates the name and path to a newly created file which activated the trigger. This is available as a Flow variable. Returns a string.
Updated File: Indicates the name and path to a file which has been updated, thereby activating the trigger. This is available as a Flow variable. Returns a string.
File Name: Indicates the name of a file which has been created, deleted or updated, thereby activating the trigger. This is available as a Flow variable. Returns a string.
Not Updated File Name: Indicates the names of files which have not been updated, thereby activating the trigger. This is available as a Flow variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.


Connections
Outgoing links (from one component to another): Row: Main; Iterate. Trigger: On Subjob Ok; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Waiting for a file to be removed


This scenario describes a Job scanning a directory and waiting for a file to be removed from this directory, in order for a subjob to be executed. When the condition of file removal is met, then the subjob simply displays a message box showing the file being removed.

This use case only requires two components from the Palette: tWaitForFile and tMsgBox. Click and place these components on the design workspace and connect them using an Iterate link to implement the loop. Then select the tWaitForFile component, and on the Basic settings view of the Component tab, set the condition and loop properties:


In the Time (in seconds) between iteration field, set the time in seconds you want to wait before the next iteration starts. In this example, the directory will be scanned every 5 seconds.
In the Max. number of iterations (infinite loop if empty) field, fill out the maximum number of iterations you want to have before the whole Job is forced to end. In this example, the directory will be scanned a maximum of 5 times.
In the Directory to scan field, type in the path to the folder to scan.
In the Trigger action when field, select the condition to be met for the subjob to be triggered. In this use case, the condition is a file is deleted (or moved) from the directory.
In the Then field, select the action to be carried out when the condition is met before the defined number of iterations is reached. In this use case, as soon as the condition is met, the loop should be ended.
Then set the subjob to be executed when the condition set is met. In this use case, the subjob simply displays a message box.
Select the tMsgBox component, and on the Basic settings view of the Component tab, set the message to be displayed. Fill out the Title and Message fields. Select the type of Buttons and the Icon.

In the Message field, you can write any type of message you want to display and use global variables available in the auto-completion list via Ctrl+Space combination.
For example, in Perl, the message used for this use case is:
"Deleted File: $_globals{tWaitForFile_1}{DELETED_FILE}, on Iteration : $_globals{tWaitForFile_1}{CURRENT_ITERATION}\n"
The equivalent Java message is:
"Deleted file: "+((String)globalMap.get("tWaitForFile_1_DELETED_FILE"))+" on iteration Nr:"+((Integer)globalMap.get("tWaitForFile_1_CURRENT_ITERATION"))
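As a minimal sketch of how the Java message of this scenario resolves at run time, the snippet below replaces the Job's globalMap with a plain HashMap. The class name, the sample path and the iteration value are illustrative assumptions; only the two keys are taken from the message expression of this use case.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in: in a real Job, globalMap is supplied by the generated code.
class DeletedFileMessageSketch {

    // Builds the message box text with the same expression as the scenario.
    static String buildMessage(Map<String, Object> globalMap) {
        return "Deleted file: " + ((String) globalMap.get("tWaitForFile_1_DELETED_FILE"))
                + " on iteration Nr:" + ((Integer) globalMap.get("tWaitForFile_1_CURRENT_ITERATION"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tWaitForFile_1_DELETED_FILE", "/tmp/in/orders.csv"); // hypothetical path
        globalMap.put("tWaitForFile_1_CURRENT_ITERATION", 3);               // hypothetical value
        System.out.println(buildMessage(globalMap));
    }
}
```

The casts matter because the map stores plain Object values: the file name is read back as a String and the iteration counter as an Integer.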


Then execute the Job via the F6 key. While the loop is executing, remove a file from the location defined. The message pops up and shows the defined message.
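The loop configured in this scenario can be approximated in plain Java as follows. This is only a sketch of the general polling technique (snapshot the directory, wait, compare), not the code generated by tWaitForFile; the class and method names are hypothetical, and the 5-second wait and 5 iterations mirror the values used above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WaitForDeletionSketch {

    // Takes a snapshot of the file names currently present in the directory.
    static Set<String> snapshot(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                          .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Names present in the previous snapshot but missing from the current one.
    static Set<String> deletedBetween(Set<String> previous, Set<String> current) {
        Set<String> deleted = new TreeSet<>(previous);
        deleted.removeAll(current);
        return deleted;
    }

    // Returns the iteration at which a deletion was seen, or -1 if the loop ran out.
    static int waitForDeletion(Path dir, long waitMillis, int maxIterations)
            throws IOException, InterruptedException {
        Set<String> previous = snapshot(dir);
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            Thread.sleep(waitMillis);
            Set<String> current = snapshot(dir);
            Set<String> deleted = deletedBetween(previous, current);
            if (!deleted.isEmpty()) {
                System.out.println("Deleted file: " + deleted + " on iteration Nr:" + iteration);
                return iteration; // corresponds to the "exit loop" choice
            }
            previous = current;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("Usage: java WaitForDeletionSketch <directory to scan>");
            return;
        }
        // Scan every 5 seconds, at most 5 times, as in the scenario.
        int result = waitForDeletion(Paths.get(args[0]), 5000L, 5);
        System.out.println(result < 0 ? "No deletion detected" : "Stopped at iteration " + result);
    }
}
```

Comparing successive snapshots also shows why a moved file counts as deleted: it is simply no longer listed in the scanned directory.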


tWaitForSocket
tWaitForSocket properties
Component family: Orchestration

Function: tWaitForSocket component makes a loop on a defined port, to look for data, and triggers a subjob when the condition is met.

Purpose: This component triggers a Job based on a defined condition.

Basic settings
Port: DB server listening port.
End of line separator: Enter the end of line separator to be used.
Then: Select the action to be carried out: keep on listening or close socket.
Print client/server data: Select this check box to display the client or server data.

Advanced settings
tStatCatcher Statistics: Select this check box to collect the log data at a component level.

Usage: This is an input, trigger component for the subjob executed depending on the condition set. Hence, it needs to be connected to a subjob via an Iterate link.

Global Variables
Client input data: Returns the data transmitted by the client. This is available as a Flow variable. Returns a string.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
Outgoing links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
Incoming links (from one component to another): Row: Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a
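The listening behaviour described for tWaitForSocket can be pictured with the standard java.net classes. The sketch below is an assumption-laden illustration, not the component's generated code: it listens on a free local port, accepts one client, reads one line terminated by "\n" (standing in for the End of line separator setting) and then closes the socket. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class WaitForSocketSketch {

    // Blocks until one client connects, then returns the first line it sends.
    static String readOneLine(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // Demo helper: a client thread sends the payload to a listener on the loopback interface.
    static String roundTrip(String payload) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(5000); // do not wait forever if no client shows up
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
                    out.write(payload + "\n");
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            String received = readOneLine(server); // the "client input data"
            sender.join();
            return received;                        // leaving the try block closes the socket
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Client input data: " + roundTrip("start subjob"));
    }
}
```

A "keep on listening" variant would call readOneLine in a loop on the same ServerSocket instead of closing it after the first client.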


Related scenario
No scenario is available for this component yet.


tWaitForSqlData
tWaitForSqlData properties
Component family: Orchestration

Function: tWaitForSqlData component iterates on a given connection for insertion or deletion of rows and triggers a subjob to be executed when the condition is met.

Purpose: This component allows a subjob to be triggered given a condition linked to sql data presence.

Basic settings
Wait at each iteration (in seconds): Set the time interval in seconds between each check for the sql data.
Max. iterations (infinite if empty): Number of checks for sql data before the Job times out.
Use an existing connection/Component List: A connection needs to be open to allow the loop to check for sql data on the defined DB.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can:
- from the available database connection component in the level where the current component is, select the Use or register a shared DB connection check box. For more information about this check box, see the section for the connection components in Database components on page 315 according to the database you are using,
- otherwise, still in the level of the current component, deactivate the connection components and use Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive throughout the two Job levels. For more information about Dynamic settings, see your studio user guide.
Table to scan: Name of the table to be checked for insertion or deletion.
Trigger action when rowcount is: Select the condition to be met for the action to be carried out: Equal to; Not Equal to; Greater than; Lower than; Greater or equal to; Lower or equal to.
Value: Define the value to take into account.
Then: Select the action to be carried out: either stop the iterations when the condition is met (exit loop) or continue the loop until the end of the max iteration number (continue loop).

Usage: Although this component requires a Connection component to open the DB access, it also plays the role of the start (or trigger) component of the subjob which gets executed under the condition described. Therefore this component requires a subjob to be connected to via an Iterate link.

Global Variables
Current iteration: Returns the number of the current iteration. This is available as a Flow variable. Returns an integer.
Row count: Indicates the number of records detected in the table. This is available as a Flow variable. Returns an integer.
For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Waiting for insertion of rows in a table


This scenario describes a Job reading a DB table and waiting for data to be put in this table in order for a subjob to be executed. When the condition of the data insertion in the table is met, the subjob performs a SELECT * on the table and simply displays the content of the inserted data onto the standard console. This use case is presented in Perl, but there is no difference in the settings if you implement it in Java.

Drop the following components from the Palette onto the design workspace: tMysqlConnection, tWaitForSqlData, tMysqlInput, tLogRow.
Connect the tMysqlConnection component to the tWaitForSqlData using an OnSubjobOK link, available on the right-click menu.
Then connect the tWaitForSqlData component to the subjob using an Iterate link, as no actual data is transferred in this part. Indeed, tWaitForSqlData simply implements a loop until the condition is met.
On the subjob to be executed if the condition is met, a tMysqlInput is connected to the standard console component, tLogRow. As the connection passes on data, use a Row main link.


Now, set the connection to the table to check at regular intervals. On the Basic settings view of the tMysqlConnection Component tab, set the DB connection properties.

Fill out the Host, Port, Database, Username, Password fields to open the connection to the Database table. Select the relevant Encoding if needed.
Then select the tWaitForSqlData component, and on the Basic settings view of the Component tab, set its properties.
In the Wait at each iteration field, set the time in seconds you want to wait before the next iteration starts.

In the Max iterations field, fill out the maximum number of iterations you want to have before the whole Job is forced to end.
The tWaitForSqlData component requires a connection to be open in order to loop on the defined number of iterations. Select the relevant connection (if several) in the Component List combo box.
In the Table to scan field, type in the name of the table in the DB to scan. In this example: test_datatypes.
In the Trigger action when rowcount is and Value fields, select the condition to be met for the subjob to be triggered. In this use case, the number of rows in the scanned table should be greater or equal to 1.
In the Then field, select the action to be carried out when the condition is met before the defined number of iterations is reached. In this use case, as soon as the condition is met, the loop should be ended.
Then set the subjob to be executed when the condition set is met. In this use case, the subjob simply selects the data from the scanned table and displays it on the console.


Select the tMysqlInput component, and on the Basic settings view of the Component tab, set the connection to the table.

If the connection is set in the Repository, select the relevant entry on the list. Alternatively, select the Use an existing connection check box and select the relevant connection component on the list.
In this use case, the schema corresponding to the table structure is stored in the Repository.
Fill out the Table Name field with the table the data is extracted from, test_datatypes.
Then in the Query field, type in the Select statement to extract the content from the table.
No particular setting is required in the tLogRow component for this use case.
Before executing the Job, make sure the table to scan (test_datatypes) is empty, so that the condition (greater or equal to 1) is not already met when the Job starts.
Then execute the Job by pressing the F6 key on your keyboard. Before the end of the iterating loop, feed the test_datatypes table with one or more rows in order to meet the condition.

The Job ends when this table insert is detected during the loop, and the table content is thus displayed on the console.
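The polling logic of this scenario (check the row count, compare it with the Value using the selected operator, stop on the first match or after the maximum number of iterations) can be sketched as below. This is an illustration under assumptions, not the generated component code: the row counter is an IntSupplier so the example runs without a database, whereas in a real Job it would be a count query on the scanned table. All names are hypothetical.

```java
import java.util.function.IntSupplier;

class WaitForRowsSketch {

    // The six operators offered by "Trigger action when rowcount is".
    enum Condition { EQUAL, NOT_EQUAL, GREATER, LOWER, GREATER_OR_EQUAL, LOWER_OR_EQUAL }

    static boolean isMet(Condition condition, int rowCount, int value) {
        switch (condition) {
            case EQUAL:            return rowCount == value;
            case NOT_EQUAL:        return rowCount != value;
            case GREATER:          return rowCount > value;
            case LOWER:            return rowCount < value;
            case GREATER_OR_EQUAL: return rowCount >= value;
            case LOWER_OR_EQUAL:   return rowCount <= value;
            default: throw new IllegalArgumentException("Unknown condition: " + condition);
        }
    }

    // Returns the iteration at which the condition was met, or -1 if the loop ran out.
    static int waitForRows(IntSupplier rowCounter, Condition condition, int value,
                           long waitMillis, int maxIterations) {
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            if (isMet(condition, rowCounter.getAsInt(), value)) {
                return iteration; // corresponds to the "exit loop" choice
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated counts: the table is empty twice, then holds two rows.
        int[] simulatedCounts = {0, 0, 2};
        int[] call = {0};
        IntSupplier counter = () -> simulatedCounts[Math.min(call[0]++, simulatedCounts.length - 1)];
        int iteration = waitForRows(counter, Condition.GREATER_OR_EQUAL, 1, 10L, 5);
        System.out.println("Condition met on iteration " + iteration);
    }
}
```

With the simulated counts above, the condition greater or equal to 1 is first satisfied on the third check, which is why the table must start empty for the wait to be observable.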


Processing components
This chapter details the main components that you can find in the Processing family of the Talend Open Studio Palette. The Processing family gathers together components that help you to perform all types of processing tasks on data flows, including aggregation, mapping, transformation, denormalizing, filtering and so on.


tAggregateRow
tAggregateRow properties
Component family: Processing

Function: tAggregateRow receives a flow and aggregates it based on one or more columns. Each output line provides the aggregation key and the relevant result of set operations (min, max, sum...).

Purpose: Helps to provide a set of metrics based on values or calculations.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Group by: Define the aggregation sets, the values of which will be used for calculations.
Output Column: Select the column label in the list offered based on the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations. Ex: Select Country to calculate an average of values for each country of a list or select Country and Region if you want to compare one country's regions with another country's regions.
Input Column: Match the input column label with your output columns, in case the output label of the aggregation set needs to be different.
Operations: Select the type of operation along with the value to use for the calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select the operator among: count, min, max, avg, sum, first, last, list, list(objects), count(distinct), standard deviation.
Input column: Select the input column from which the values are taken to be aggregated.


Ignore null values: Select the check boxes corresponding to the names of the columns for which you want the NULL value to be ignored.

Advanced settings
Delimiter (only for list operation): Enter the delimiter you want to use to separate the different operations.
Use financial precision, this is the max precision for sum and avg operations, checked option heaps more memory and slower than unchecked: Select this check box to use a financial precision. This is a max precision but consumes more memory and slows the processing. We advise you to use the BigDecimal type for the output in order to obtain precise results.
Check type overflow (slower): Checks the type of data to ensure that the Job doesn't crash.
Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented correctly, only float and double types. (slower): Select this check box to ensure the most precise results possible for the Float and Double types.
tStatCatcher Statistics: Check this box to collect the log data at component level.

Usage: This component handles flow of data therefore it requires input and output, hence is defined as an intermediary step. Usually the use of tAggregateRow is combined with the tSortRow component.

Limitation: n/a
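The advice to use the BigDecimal type with financial precision can be illustrated with a small stand-alone comparison. The sketch below is not Talend code; it only shows, with hypothetical names, why a sum held in a double can drift while the same sum held in a BigDecimal stays exact.

```java
import java.math.BigDecimal;

class FinancialPrecisionSketch {

    // Sums the values using binary floating point.
    static double sumAsDouble(String[] values) {
        double sum = 0.0;
        for (String v : values) {
            sum += Double.parseDouble(v);
        }
        return sum;
    }

    // Sums the same values using exact decimal arithmetic.
    static BigDecimal sumAsBigDecimal(String[] values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String v : values) {
            sum = sum.add(new BigDecimal(v));
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] tenTimesOneTenth = new String[10];
        java.util.Arrays.fill(tenTimesOneTenth, "0.1");
        System.out.println("double sum:     " + sumAsDouble(tenTimesOneTenth));
        System.out.println("BigDecimal sum: " + sumAsBigDecimal(tenTimesOneTenth));
    }
}
```

Adding 0.1 ten times does not give exactly 1.0 in double arithmetic, whereas the BigDecimal total compares equal to 1. The price is the extra memory and processing time mentioned for this option.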

Scenario: Aggregating values and sorting data


The following scenario describes a four-component Job. The input component reads a CSV file that contains countries and notation values to be sorted by best average value. This component is connected to a tAggregateRow operator, in charge of the average calculation, then to a tSortRow component for the ascending sort. The output flow goes to a new csv file.

From the File folder in the Palette, drop a tFileInputDelimited component to the design workspace.

Click the label and rename it as Countries. Or rename it from the View tab panel.
In the Basic settings tab panel of this component, define the filepath and the delimitation criteria. Or select the metadata file in the repository if it exists.
Click Edit schema... and set the columns: Countries and Points to match the file structure. If your file description is stored in the Metadata area of the Repository, the schema is automatically uploaded when you click Repository in Schema type field.
Then from the Processing folder in the Palette, drop a tAggregateRow component to the design workspace. Rename it as Calculation.
Connect Countries to Calculation via a right-click and select Row > Main.
Double-click Calculation (tAggregateRow component) to set the properties.
Click Edit schema and define the output schema. You can add as many columns as you need to hold the set operations results in the output flow.

In this example, we'll calculate the average notation value per country and we will display the max and the min notation for each country, given that each country holds several notations. Click OK when the schema is complete.
To carry out the various set operations, back in the Basic settings panel, define the sets holding the operations in the Group By area. In this example, select Country as group by column. Note that the output column needs to be defined as a key field in the schema.
The first column mentioned as output column in the Group By table is the main set of calculation. All other output sets will be secondary by order of display.
Select the input column which the values will be taken from.
Then fill in the various operations to be carried out. The functions are average, min, max for this use case.
Select the input columns where the values are taken from and select the check boxes in the Ignore null values list as needed.


Drop a tSortRow component from the Palette onto the design workspace. For more information regarding this component, see tSortRow properties on page 1483. Connect the tAggregateRow to this new component using a row main link. On the Component view of the tSortRow component, define the column the sorting is based on, the sorting type and order.

In this case, the column to be sorted by is Country, the sort type is alphabetical and the order is ascending. Drop a tFileOutputDelimited from the Palette to the design workspace and define it to set the output flow. Connect the tSortRow component to this output component.


In the Component view, enter the output filepath. Edit the schema if need be. In this case the delimited file is of csv type. Select the Include Header check box to reuse the schema column labels in your output flow. Press F6 to execute the Job. The csv file thus created contains the aggregation result.
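The aggregation carried out in this scenario (average, min and max of the notation values per country, sorted by country in ascending order) can be sketched in plain Java as follows. The sample rows and all names are hypothetical, and the code only illustrates the grouping technique; it is not what tAggregateRow and tSortRow generate.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

class AggregateByCountrySketch {

    // Groups rows of {country, points} by country; the TreeMap keeps countries in ascending order.
    static Map<String, IntSummaryStatistics> aggregate(String[][] rows) {
        Map<String, IntSummaryStatistics> byCountry = new TreeMap<>();
        for (String[] row : rows) {
            byCountry.computeIfAbsent(row[0], country -> new IntSummaryStatistics())
                     .accept(Integer.parseInt(row[1]));
        }
        return byCountry;
    }

    public static void main(String[] args) {
        String[][] rows = {            // hypothetical input file content
            {"France", "10"}, {"Italy", "4"}, {"France", "20"}, {"Italy", "8"}
        };
        // Header line, as with the Include Header option.
        System.out.println("Country;Average;Min;Max");
        for (Map.Entry<String, IntSummaryStatistics> e : aggregate(rows).entrySet()) {
            IntSummaryStatistics s = e.getValue();
            System.out.println(e.getKey() + ";" + s.getAverage() + ";" + s.getMin() + ";" + s.getMax());
        }
    }
}
```

The group-by column plays the role of the map key, and each operation (avg, min, max) is read from the statistics accumulated for that key.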


tAggregateSortedRow
tAggregateSortedRow properties
Component family: Processing

Function: tAggregateSortedRow receives a sorted flow and aggregates it based on one or more columns. Each output line provides the aggregation key and the relevant result of set operations (min, max, sum...).

Purpose: Helps to provide a set of metrics based on values or calculations. As the input flow is meant to be sorted already, performance is greatly optimized.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Input rows count: Specify the number of rows that are sent to the tAggregateSortedRow component. If you specified a Limit for the number of rows to be processed in the input component, you will have to use that same limit in the Input rows count field.
Group by: Define the aggregation sets, the values of which will be used for calculations.
Output Column: Select the column label in the list offered based on the schema structure you defined. You can add as many output columns as you wish to make more precise aggregations. Ex: Select Country to calculate an average of values for each country of a list or select Country and Region if you want to compare one country's regions with another country's regions.
Input Column: Match the input column label with your output columns, in case the output label of the aggregation set needs to be different.
Operations: Select the type of operation along with the value to use for the calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select the operator among: count, min, max, avg, first, last.
Input column: Select the input column from which the values are taken to be aggregated.
Ignore null values: Select the check boxes corresponding to the names of the columns for which you want the NULL value to be ignored.

Usage: This component handles flow of data therefore it requires input and output, hence is defined as an intermediary step.

Limitation: n/a

Related scenario
For a related use case, see the tAggregateRow Scenario: Aggregating values and sorting data on page 1387.
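Why a pre-sorted flow helps can be shown with a single-pass aggregation: when rows arrive grouped by key, only the current group has to be kept in memory and a result can be emitted each time the key changes. The sketch below illustrates that general technique with hypothetical names and data; it is not the code generated by tAggregateSortedRow.

```java
import java.util.ArrayList;
import java.util.List;

class SortedAggregationSketch {

    // Expects rows of {key, value} already sorted by key; emits one line per group.
    static List<String> aggregateSorted(String[][] sortedRows) {
        List<String> output = new ArrayList<>();
        String currentKey = null;
        int count = 0;
        long sum = 0;
        for (String[] row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row[0])) {
                // The key changed: the previous group is complete and can be emitted.
                output.add(currentKey + " count=" + count + " avg=" + ((double) sum / count));
                count = 0;
                sum = 0;
            }
            currentKey = row[0];
            count++;
            sum += Long.parseLong(row[1]);
        }
        if (currentKey != null) {
            output.add(currentKey + " count=" + count + " avg=" + ((double) sum / count));
        }
        return output;
    }

    public static void main(String[] args) {
        String[][] sortedRows = { {"France", "10"}, {"France", "20"}, {"Italy", "4"} };
        for (String line : aggregateSorted(sortedRows)) {
            System.out.println(line);
        }
    }
}
```

If the input were not sorted, the same key could reappear later and would wrongly start a second group, which is why this approach relies on the sort being done upstream.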


tConvertType
tConvertType properties
Component family: Processing

Function: tConvertType allows specific conversions at run time from one Talend java type to another.

Purpose: Helps to automatically convert one Talend java type to another and thus avoid compiling errors.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
Built-in: You create and store the schema locally for only the current component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository, and thus you can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Auto Cast: This check box is selected by default. It performs an automatic java type conversion.
Manual Cast: This mode is not visible if the Auto Cast check box is selected. It allows you to specify manually the columns where a java type conversion is needed.
Set empty values to Null before converting: Select this check box to set the empty values of String or Object type to null for the input data.
Die on error: Select this check box to kill the Job when an error occurs.

Usage: This component cannot be used as a start component as it requires an input flow to operate.

Limitation: n/a


Scenario: Converting java types


This Java scenario describes a four-component Job where the tConvertType component is used to convert Java types in three columns, and a tMap is used to adapt the schema and output the first of the three columns and the sum of the two others after conversion.
In this scenario, the input schemas for the input delimited file are stored in the repository, so you can simply drag and drop the relevant file node from Repository Metadata - File delimited onto the design workspace to automatically retrieve the tFileInputDelimited component's settings. For more information, see How to drop components from the Metadata node in Talend Open Studio User Guide.

Drop the following components from the Palette onto the design workspace: tConvertType, tMap, and tLogRow.
In the Repository tree view, expand Metadata and from File delimited drag the relevant node, JavaTypes in this scenario, to the design workspace. The [Components] dialog box displays.
From the component list, select tFileInputDelimited and click Ok. A tFileInputDelimited component called JavaTypes displays in the design workspace.
Connect the components using Row Main links.

In the design workspace, select tFileInputDelimited and click the Component tab to define its basic settings. In the Basic settings view, set Property Type to Repository since the file details are stored in the repository. The fields to follow are pre-defined using the fetched data.

The input file used in this scenario is called input. It is a text file that holds string, integer, and float java types.


In the Basic settings view, fill in all other fields as needed. For more information, see tFileInputDelimited properties on page 1054. In this scenario, the header and the footer are not set and there is no limit for the number of processed rows.
Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of three columns, StringToInteger, IntegerField, and FloatToInteger.

Click Ok to close the dialog box.
In the design workspace, select tConvertType and click the Component tab to define its basic settings.

Set Schema Type to Built-in, and click Sync columns to automatically retrieve the columns from the tFileInputDelimited component. If needed, click Edit schema to describe manually the data structure of this processing component.


In this scenario, we want to convert a string type data into an integer type and a float type data into an integer type. Click OK to close the [Schema of tConvertType] dialog box. In the design workspace, double-click tMap to open the Map editor. The Map editor opens displaying the input metadata of the tFileInputDelimited component.

In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and name them as StringToInteger and Sum. In the Map editor, drag the StringToInteger row from the input table to the StringToInteger row in the output table. In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input table to the Sum row in the output table and click OK to close the Map editor.


In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The string type data is converted into an integer type and displayed in the StringToInteger column on the console. The float type data is converted into an integer and added to the IntegerField value to give the addition result in the Sum column on the console.
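The effect of this Job on one row can be sketched in plain Java. The code below is only an illustration with hypothetical names and values; in particular, it assumes that the float to integer conversion truncates the decimal part (a narrowing cast), which may differ from the conversion actually applied by tConvertType.

```java
class ConvertTypeSketch {

    // String to integer conversion, as for the StringToInteger column.
    static int stringToInt(String value) {
        return Integer.parseInt(value.trim());
    }

    // Float to integer conversion; ASSUMPTION: the decimal part is truncated.
    static int floatToInt(float value) {
        return (int) value;
    }

    // Returns {StringToInteger, Sum}, mirroring the two output columns of the tMap.
    static int[] convertRow(String stringValue, int integerField, float floatValue) {
        int first = stringToInt(stringValue);
        int sum = integerField + floatToInt(floatValue);
        return new int[] { first, sum };
    }

    public static void main(String[] args) {
        int[] out = convertRow("12", 7, 3.9f); // hypothetical input row
        System.out.println("StringToInteger=" + out[0] + " Sum=" + out[1]);
    }
}
```

Once both columns are integers, the addition in the Sum column needs no further cast, which is the compile-time benefit the component aims at.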


tDenormalize
tDenormalize Properties
Component family: Processing/Fields

Function: Denormalizes the input flow based on one column.

Purpose: tDenormalize helps synthesize the input flow.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In this component, the schema is read-only.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
To denormalize (Java only): In this table, define the parameters used to denormalize your columns.
Column: Select the column to denormalize.
Delimiter: Type in the separator you want to use to denormalize your data between double quotes.
Merge same value: Select this check box to merge identical values.
Column to denormalize (Perl only): Select the column from the input flow which the normalization is based on (included in key).
Group by (Java only): Select one or several columns to be grouped. We recommend that you remove unused columns from the schema before processing.
Item Separator (Perl only): Enter the separator which will delimit data in the denormalized flow.

Advanced settings
Deduplicate items (Perl only): Removes duplicates when concatenating denormalized values.
tStatCatcher Statistics: Select this check box to collect the log data at component level.

Usage: This component can be used as an intermediate step in a data flow.

Limitation: Note that this component may change the order of the input flow data in Java, and in Perl, when the Deduplicate check box is selected.

Scenario 1: Denormalizing on one column in Perl


This scenario illustrates a Job denormalizing one column in a delimited file.


Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to the design workspace. Connect the components using Row main connections. On the tFileInputDelimited Component view, set the filepath to the file to be denormalized.

Define the Header, Row Separator and Field Separator parameters. The input file schema is made of two columns, Fathers and Children.

In the Basic settings of tDenormalize, define the column that contains multiple values to be grouped. In this use case, the column to denormalize is Children.


Set the Delimiter to separate the grouped values. Note that only one column can be denormalized. Select the Merge same value check box if you know that some of the values to be grouped are strictly identical. Save your Job and press F6 to execute it.

All values from the column Children (set as column to denormalize) are grouped by their Fathers column. Values are separated by a comma.
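The grouping performed in this scenario can be sketched in plain Java. This is only an illustration of the denormalization logic, not the code that Talend generates: the class name, the method name and the sample rows are hypothetical, and each row is assumed to be a (Fathers, Children) pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not the code generated by tDenormalize.
final class DenormalizeOneColumn {

    // Groups the second field of each row by the first field and joins
    // the grouped values with the given delimiter.
    static Map<String, String> denormalize(List<String[]> rows, String delimiter) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            result.put(group.getKey(), String.join(delimiter, group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: {Fathers, Children}.
        List<String[]> rows = Arrays.asList(
                new String[] {"Paul", "Anna"},
                new String[] {"John", "Eve"},
                new String[] {"Paul", "Marc"});
        System.out.println(denormalize(rows, ","));
    }
}
```

The sketch keeps groups in first-seen order by using a LinkedHashMap. That is a choice made for readability here; the component itself may reorder the flow, as stated in its Usage note.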

Scenario 2: Denormalizing on multiple columns


This scenario illustrates a Job denormalizing two columns from a delimited file.

Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to the design workspace. Connect all components using a Row main connection. On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized.

Define the Row and Field separators, the Header and other information if required. The file schema is made of four columns including: Name, FirstName, HomeTown, WorkTown.

In the tDenormalize component Basic settings, select the columns that contain the repetition. These are the columns which are meant to occur multiple times in the document. In this use case, FirstName, HomeCity and WorkCity are the columns against which the denormalization is performed. Add as many lines to the table as you need using the plus button. Then select the relevant columns in the drop-down list.


In the Delimiter column, define the separator between double quotes, to split concatenated values. For the FirstName column, type in "#"; for HomeCity, type in ","; and for WorkCity, type in ".". Save your Job and press F6 to execute it.

The result shows the denormalized values concatenated using a comma. Back in the Basic settings of the tDenormalize component, in the To denormalize table, select the Merge same value check box to remove the duplicate occurrences. Save your Job again and press F6 to execute it.

This time, the console shows the results with no duplicate instances.
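The effect of Merge same value on several denormalized columns can be sketched as follows. This is a simplified illustration, not Talend's generated code: the class, the method and the sample rows are hypothetical, field 0 of each row is assumed to be the grouping key, and a single flag stands in for the per-column check box.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not the code generated by tDenormalize.
final class DenormalizeMultiColumn {

    // rows: field 0 is the grouping key, fields 1..n are denormalized.
    // delimiters[i] is the separator applied to field i + 1.
    static Map<String, String[]> denormalize(
            List<String[]> rows, String[] delimiters, boolean mergeSameValue) {
        Map<String, List<Collection<String>>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<Collection<String>> columns = groups.get(row[0]);
            if (columns == null) {
                columns = new ArrayList<>();
                for (int i = 0; i < delimiters.length; i++) {
                    // A set drops repeated values; a list keeps them all.
                    Collection<String> values = mergeSameValue
                            ? new LinkedHashSet<String>()
                            : new ArrayList<String>();
                    columns.add(values);
                }
                groups.put(row[0], columns);
            }
            for (int i = 0; i < delimiters.length; i++) {
                columns.get(i).add(row[i + 1]);
            }
        }
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Collection<String>>> group : groups.entrySet()) {
            String[] joined = new String[delimiters.length];
            for (int i = 0; i < delimiters.length; i++) {
                joined[i] = String.join(delimiters[i], group.getValue().get(i));
            }
            result.put(group.getKey(), joined);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: {Name, FirstName, HomeTown, WorkTown}.
        List<String[]> rows = Arrays.asList(
                new String[] {"Smith", "John", "Paris", "Lyon"},
                new String[] {"Smith", "Anna", "Paris", "Lyon"});
        String[] delimiters = {"#", ",", "."};
        System.out.println(Arrays.toString(denormalize(rows, delimiters, false).get("Smith")));
        System.out.println(Arrays.toString(denormalize(rows, delimiters, true).get("Smith")));
    }
}
```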


tDenormalizeSortedRow
tDenormalizeSortedRow properties
Component family: Processing/Fields
Function: tDenormalizeSortedRow combines all sorted input rows into a group. Distinct values of the denormalized sorted row are joined with item separators.
Purpose: tDenormalizeSortedRow helps synthesize a sorted input flow to save memory.
Basic settings:
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component in the Job.
    Built-in: You create the schema and store it locally for the relevant component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Input rows count: Enter the number of input rows.
  To denormalize: Enter the name of the column to denormalize.
Usage: This component handles flows of data; it therefore requires input and output components.
Limitation: n/a

Scenario: Regrouping sorted rows


This Java scenario describes a four-component Job. It aims at reading a given delimited file row by row, sorting input data by sort type and order, denormalizing all input sorted rows and displaying the output on the Run log console. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tSortRow, tDenormalizeSortedRow, and tLogRow. Connect the four components using Row Main links.


In the design workspace, select tFileInputDelimited. Click the Component tab to define the basic settings for tFileInputDelimited.

Set Property Type to Built-In. Fill in a path to the processed file in the File Name field. The name_list file used in this example holds two columns, id and first name.

If needed, define row and field separators, header and footer, and the number of processed rows. Set Schema to Built-In and click the three-dot button next to Edit Schema to define the data to pass on to the next component. The schema in this example consists of two columns, id and name.


In the design workspace, select tSortRow. Click the Component tab to define the basic settings for tSortRow.

Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tFileInputDelimited component. In the Criteria panel, use the plus button to add a line and set the sorting parameters for the schema column to be processed. In this example we want to sort the id column in ascending order. In the design workspace, select tDenormalizeSortedRow. Click the Component tab to define the basic settings for tDenormalizeSortedRow.


Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow component. In the Input rows count field, enter the number of input rows to be processed, or press Ctrl+Space to access the context variable list and select the variable tDenormalizeSortedRow_1.NB_LINE. In the To denormalize panel, use the plus button to add a line and set the parameters for the column to be denormalized. In this example we want to denormalize the name column. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The result displayed on the console shows how the name column was denormalized.
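Because the rows arrive already sorted, the grouping can be done in a single pass that holds only the current group in memory. The following Java sketch illustrates that idea under stated assumptions: the class and method names, the "|" output layout and the sample rows are hypothetical, and this is not the code generated by tDenormalizeSortedRow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: single-pass grouping over rows sorted by id.
final class DenormalizeSortedRows {

    // Expects {id, name} rows already sorted by id. Emits one "id|names"
    // entry per run of equal ids, joining distinct names with the separator.
    static List<String> denormalize(List<String[]> sortedRows, String separator) {
        List<String> out = new ArrayList<>();
        String currentId = null;
        Set<String> names = new LinkedHashSet<>();
        for (String[] row : sortedRows) {
            if (currentId != null && !currentId.equals(row[0])) {
                // The id changed: the previous group is complete.
                out.add(currentId + "|" + String.join(separator, names));
                names.clear();
            }
            names.add(row[1]);
            currentId = row[0];
        }
        if (currentId != null) {
            out.add(currentId + "|" + String.join(separator, names));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, sorted by id in ascending order.
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Anna"},
                new String[] {"1", "Bob"},
                new String[] {"2", "Carl"});
        for (String line : denormalize(rows, ",")) {
            System.out.println(line);
        }
    }
}
```

If the input were not sorted, equal ids could end up in separate groups, which is why the scenario places tSortRow before tDenormalizeSortedRow.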


tEmptyToNull
tEmptyToNull properties
Component family: Processing
Function: tEmptyToNull transforms empty fields in a file or a table to NULL fields in a database table.
Purpose: tEmptyToNull replaces empty fields with undefined fields that yield the NULL value (unknown value) in the output component.
Basic settings: This component does not need any configuration. It executes automatically.
Advanced settings:
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage: This component is generally used as an intermediate step in a data flow. It therefore needs an input component and an output component. The output component should be a database output component.
Limitation: n/a

Scenario: Replacing empty fields by NULL fields (fields of unknown value)


This Perl scenario describes a three-component Job that makes it possible to replace input fields without character strings by undefined fields in order to generate Null values (unknown values) in the output fields. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tEmptyToNull and tMysqlOutput. Connect the three components using Row Main links.

In the design workspace, double-click tFileInputDelimited to display its Basic settings view where you can define the component properties.


From the Property Type list, select Repository if you have already stored the metadata of your input file in the Repository (the fields that follow are then filled in automatically with the stored information), or select Built-In and fill in the fields that follow manually. For this example, we use the Built-In mode. Click the three-dot button next to the File Name field and browse to the input file. In this example, our source file is name_list and it holds four columns: id, first name, last name and login.

In the Basic settings view of tFileInputDelimited, define in the corresponding fields the row and field separators used in the source file. If needed, set Header, Footer and Limit. In this example, set Header to 1 since the first row, which holds the column names, is to be ignored. Footer and Limit for the number of processed rows are not set. In the Schema field, set the schema to Built-In, then click the three-dot button next to the Edit Schema field to define the data to be passed to the following component. In this example, the source schema consists of four columns: id, first_name, last_name and login.


In the design workspace, double-click tMysqlOutput to open its Basic settings view where you can define the component properties.

Click Sync columns to retrieve the schema of the preceding component.


You can click the three-dot button next to Edit schema to check the retrieved schema.

From the Property Type list, select Repository if you have already stored the metadata of your database connection in the Repository (the fields that follow are then filled in automatically with the stored information), or select Built-In and fill in the connection information manually. For more information about tMysqlOutput properties, see tMysqlOutput on page 609. In the Table field, enter the name of the table that will hold the data extracted from the source delimited file. From the Action on table list, select the operation you want to carry out on the defined table. In this example, we select Create table to create the defined table.


From the Action on data list, select the operation you want to carry out on the data. In this example, we select Insert. Save your Job and press F6 to execute it.

Using your data explorer, you can check that the output database table namelist has been created with the defined columns and that the empty fields in the source file have been replaced by NULL values (unknown values).
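The substitution itself is simple and can be sketched in Java as below. The scenario above is described for Perl; this snippet only illustrates the same rule, and the class name, method name, semicolon separator and sample line are assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch only: maps empty fields to null before a database write.
final class EmptyToNull {

    // Returns a copy of the row in which every empty string is replaced by null.
    static String[] emptyToNull(String[] row) {
        String[] out = new String[row.length];
        for (int i = 0; i < row.length; i++) {
            String field = row[i];
            out[i] = (field == null || field.isEmpty()) ? null : field;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input line: id;first_name;last_name;login.
        // The limit -1 keeps trailing empty fields instead of dropping them.
        String[] fields = "3;;Smith;".split(";", -1);
        System.out.println(Arrays.toString(emptyToNull(fields)));
        // prints [3, null, Smith, null]
    }
}
```

A database output would then write such a null as SQL NULL rather than as an empty string, which is the difference the data explorer check above makes visible.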


tExternalSortRow
tExternalSortRow properties
Component family: Processing
Function: Uses an external sort application to sort input data based on one or several columns, by sort type and order.
Purpose: Helps create metrics and classification tables.
Basic settings:
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  File Name: Name of the file to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
  Field separator: Character, string or regular expression to separate fields.
  External command sort path: Enter the path to the external file containing the sorting algorithm to use.
  Criteria: Click the plus button to add as many lines as required for the sort to be complete. By default the first column defined in your schema is selected.
    Schema column: Select the column label from your schema, which the sort will be based on. Note that the order is essential as it determines the sorting priority.
    Sort type: Numerical and Alphabetical order are proposed. More sorting types to come.
    Order: Ascending or descending order.


Advanced settings:
  Maximum memory: Type in the size of physical memory you want to allocate to sort processing.
  Temporary directory: Specify the temporary directory to process the sorting command.
  Set temporary input file directory: Select the check box to activate the field in which you can specify the directory to handle your temporary input file.
  Add a dummy EOF line: Select this check box when using the tAggregateSortedRow component.
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage: This component handles a flow of data; it therefore requires input and output components and is defined as an intermediary step.
Limitation: n/a

Related scenario
For a related use case, see tSortRow Scenario: Sorting entries on page 1484.
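The Criteria table in the properties of this component applies its lines in priority order: a later line only matters when all earlier lines compare equal. The Java sketch below illustrates that ordering rule in memory. It is not how tExternalSortRow works internally (the component delegates to an external sort application), and the class, the two criteria chosen and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: shows how ordered sort criteria break ties.
final class SortCriteriaSketch {

    // Criterion 1: column 0, numerical, ascending.
    // Criterion 2: column 1, alphabetical, descending (used only on ties).
    static int compare(String[] a, String[] b) {
        int first = Long.compare(Long.parseLong(a[0]), Long.parseLong(b[0]));
        if (first != 0) {
            return first;
        }
        return b[1].compareTo(a[1]);
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String[]> sorted(List<String[]> rows) {
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort(SortCriteriaSketch::compare);
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"10", "b"},
                new String[] {"2", "a"},
                new String[] {"10", "c"});
        for (String[] row : sorted(rows)) {
            System.out.println(String.join(";", row));
        }
    }
}
```

Note that a numerical sort places 2 before 10, whereas an alphabetical sort on the same column would place "10" before "2". This is why the Sort type must match the data in the column.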


tExtractDelimitedFields
tExtractDelimitedFields properties
Component family: Processing/Fields
Function: tExtractDelimitedFields generates multiple columns from a given column in a delimited file.
Purpose: tExtractDelimitedFields helps to extract fields from within a string, for example to write them elsewhere.
Basic settings:
  Field to split: Select from the list the field to split.
  Field separator: Set the field separator. Since this component uses regex to split a field and the regex syntax uses special characters as operators, make sure to precede the regex operator you use as a field separator by a double backslash. For example, you have to use "\\|" instead of "|".
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected to tExtractDelimitedFields.
    Built-in: You create the schema and store it locally for the component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can use it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Advanced settings:
  Advanced separator (for number): Select this check box to modify the separators used for numbers.
  Trim column: Select this check box to remove leading and trailing whitespace from all columns.


  Check each row structure against schema: Select this check box to synchronize every row against the input schema.
  tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage: This component handles a flow of data; it therefore requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow filtering out data whose type does not match the defined type.
Limitation: n/a

Scenario: Extracting fields from a comma-delimited file


This Java scenario describes a three-component Job where the tExtractDelimitedFields component is used to extract two columns from a comma-delimited file. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tExtractDelimitedFields, and tLogRow. Right-click each of the three components in turn and connect them using Row Main links.

In the design workspace, select tFileInputDelimited. Click the Component tab to define the basic settings for tFileInputDelimited. In the Basic settings view, set Property Type to Built-In. Click the three-dot [...] button next to the File Name field to select the path to the input file.
The File Name field is mandatory.


The input file used in this scenario is called test5. It is a text file that holds comma-delimited data.

In the Basic settings view, fill in all other fields as needed. For more information, see tFileInputDelimited properties on page 1054. In this scenario, the header and the footer are not set and there is no limit for the number of processed rows. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of one column, name.

In the design workspace, select tExtractDelimitedFields. Click the Component tab to define the basic settings for tExtractDelimitedFields.


From the Field to split list, select the column to split, name in this scenario. In the Field separator field, enter the corresponding separator. Click Edit schema to describe the data structure of this processing component. In the output panel of the [Schema of tExtractDelimitedFields] dialog box, click the plus button to add two columns for the output schema, firstname and lastname.

In this scenario, we want to split the name column into two columns in the output flow, firstname and lastname. Click OK to close the [Schema of tExtractDelimitedFields] dialog box. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Save your Job and press F6 to execute it.


First names and last names are extracted and displayed in the corresponding defined columns on the console.
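The separator is interpreted as a regular expression, which is why regex operators must be escaped. The following Java sketch shows the same behaviour with String.split. The class name and the sample values are hypothetical, and the snippet is only an illustration, not the code generated by tExtractDelimitedFields.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only: splitting one field on a regex separator.
final class SplitDelimitedField {

    // The separator is a regular expression. The limit -1 keeps trailing
    // empty parts so that the number of output columns stays stable.
    static String[] split(String field, String separatorRegex) {
        return field.split(separatorRegex, -1);
    }

    public static void main(String[] args) {
        // A comma has no special meaning in a regex and needs no escaping.
        System.out.println(Arrays.toString(split("Andrew,Smith", ",")));

        // The pipe is the regex alternation operator, so it is escaped
        // with a double backslash in Java source.
        System.out.println(Arrays.toString(split("Andrew|Smith", "\\|")));

        // Pattern.quote escapes any separator without manual backslashes.
        System.out.println(Arrays.toString(split("Andrew|Smith", Pattern.quote("|"))));
    }
}
```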


tExtractPositionalFields
tExtractPositionalFields properties
Component family: Processing/Fields
Function: tExtractPositionalFields generates multiple columns from one column using positional fields.
Purpose: tExtractPositionalFields uses a positional pattern to extract data from a formatted string.
Basic settings:
  Field: Select from the list the field to extract from.
  Customize: Select this check box to customize the data format of the positional file and define the table columns:
    Column: Select the column you want to customize.
    Size: Enter the column size.
    Padding char: Type in, between inverted commas, the padding character used, so that it is removed from the field. A space by default.
    Alignment: Select the appropriate alignment parameter.
  Pattern: Enter the pattern to use as the basis for the extraction. A pattern is a list of length values separated by commas, interpreted as a string between quotes. Make sure the values entered in this field are consistent with the schema defined.
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected to tExtractPositionalFields.
    Built-in: You create the schema and store it locally for the component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.


    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Advanced settings:
  Advanced separator (for number): Select this check box to modify the separators used for numbers.
  Trim Column: Select this check box to remove leading and trailing whitespace from all columns.
  Check each row structure against schema: Select this check box to synchronize every row against the input schema.
  tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage: This component handles flow of data therefore it requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow filtering data which type does not match the defined type.
Limitation: n/a

Related scenario
For a related use case, see tExtractRegexFields on page 1420.
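As a rough illustration of the Pattern setting described in the properties above, the following Java sketch cuts a string into fields of fixed lengths and strips the padding character. It is a simplified, hypothetical example: the class and method names and the sample line are made up, padding is removed from both ends regardless of the Alignment setting, and this is not the code generated by tExtractPositionalFields.

```java
import java.util.Arrays;

// Illustrative sketch only: fixed-width extraction driven by a length pattern.
final class PositionalExtract {

    // pattern: comma-separated field lengths, for example "5,5,5".
    static String[] extract(String line, String pattern, char paddingChar) {
        String[] sizes = pattern.split(",");
        String[] out = new String[sizes.length];
        int pos = 0;
        for (int i = 0; i < sizes.length; i++) {
            int size = Integer.parseInt(sizes[i].trim());
            // Clamp to the line length so short lines yield empty fields.
            int end = Math.min(pos + size, line.length());
            String raw = pos < line.length() ? line.substring(pos, end) : "";
            out[i] = stripPadding(raw, paddingChar);
            pos += size;
        }
        return out;
    }

    // Removes the padding character from both ends of the field.
    static String stripPadding(String s, char pad) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == pad) {
            start++;
        }
        while (end > start && s.charAt(end - 1) == pad) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical 15-character line holding three 5-character fields.
        System.out.println(Arrays.toString(extract("John Doe  42   ", "5,5,5", ' ')));
    }
}
```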


tExtractRegexFields
tExtractRegexFields properties
Component family: Processing/Fields
Function: tExtractRegexFields generates multiple columns from a given column using regex matching.
Purpose: tExtractRegexFields uses regular expressions to extract data from a formatted string.
Basic settings:
  Field to split: Select from the list the field (column) to split.
  Regex: Enter a regular expression according to the programming language you are using.
  Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected to tExtractRegexFields.
    Built-in: You create and store the schema locally for the component. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: You have already created and stored the schema in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Advanced settings:
  Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
  Check each row structure against schema: Select this check box to synchronize every row against the input schema.
  tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage: This component handles a flow of data; it therefore requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow filtering out data whose type does not match the defined type.
Limitation: n/a


Scenario: Extracting name, domain and TLD from e-mail addresses


This Java scenario describes a three-component Job where tExtractRegexFields is used to specify a regular expression that corresponds to one column in the input data, email. The tExtractRegexFields component is used to perform the actual regular expression matching. This regular expression includes field identifiers for the user name, domain name and Top-Level Domain name portions of each e-mail address. If the given e-mail address is valid, the name, domain and TLD are extracted and displayed on the console in three separate columns. Data in the other two input columns, id and age, are extracted and routed to the destination as well. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tExtractRegexFields, and tLogRow. Connect the three components using Row Main links.

In the design workspace, select tFileInputDelimited. Click the Component tab to define the basic settings for tFileInputDelimited. In the Basic settings view, set Property Type to Built-In. Click the three-dot [...] button next to the File Name field to select the path to the input file.
The File Name field is mandatory.

The input file used in this scenario is called test4. It is a text file that holds three columns: id, email, and age.


Fill in all other fields as needed. For more information, see tFileInputDelimited properties on page 1054. In this scenario, the header and the footer are not set and there is no limit for the number of processed rows. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of three columns: id, email and age. In the design workspace, select tExtractRegexFields. Click the Component tab to define the basic settings for tExtractRegexFields. From the Field to split list, select the column to split, email in this scenario. In the Regex panel, enter the regular expression you want to use to perform data matching, a Java regular expression in this scenario.

Click Edit schema to describe the data structure of this processing component. In the output panel of the [Schema of tExtractRegexFields] dialog box, click the plus button to add five columns for the output schema.

In this scenario, we want to split the input email column into three columns in the output flow, name, domain, and tld. The two other input columns will be extracted as they are. Click OK to close the [Schema of tExtractRegexFields] dialog box.


In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The tExtractRegexFields component matches each e-mail address against the defined regular expression, extracts the name, domain and TLD, and displays them on the console in three separate columns. The two other columns, id and age, are passed through unchanged.
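The kind of expression this scenario relies on can be sketched in Java with three capturing groups, one per output column. The pattern below is an assumption written for this example and is not the expression used in the original Job. It is deliberately simple and does not validate every legal e-mail address. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: one capturing group per extracted column.
final class EmailFieldsSketch {

    // Assumed pattern: group 1 = name, group 2 = domain, group 3 = TLD.
    static final Pattern EMAIL =
            Pattern.compile("^([\\w.+-]+)@([\\w.-]+)\\.([A-Za-z]+)$");

    // Returns {name, domain, tld}, or null when the address does not match.
    static String[] extract(String email) {
        Matcher matcher = EMAIL.matcher(email);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(Arrays.toString(extract("jane.doe@example.com")));
        System.out.println(Arrays.toString(extract("not-an-email")));
    }
}
```

Each capturing group maps to one column of the output schema in order, so the number of groups should equal the number of columns extracted from the split field.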


tExtractXMLFields
tExtractXMLFields belongs to two component families: Processing and XML. For more information on tExtractXMLFields, see tExtractXMLField on page 1585.


tFilterColumns
tFilterColumns Properties
Component family: Processing
Function: Makes specified changes to the schema defined, based on column name mapping.
Purpose: Helps homogenize schemas, either by reordering columns, by removing unwanted columns or by adding new columns.
Basic settings:
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
Usage: This component is not startable (green background) and it requires an output component.

Related Scenario
For more information regarding the tFilterColumns component in use, see tReplace Scenario: multiple replacements and column filtering on page 1476.


tFilterRow
tFilterRow Properties
Component family: Processing
Function: tFilterRow filters input rows by setting conditions on the selected columns.
Purpose: tFilterRow helps parametrize filters on the source data.
Basic settings:
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. The schema is read-only.
    Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
    Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
  Logical operator used to combine conditions: If you want to combine simple filtering and advanced mode, select the operator to combine both modes.
  Conditions: Click the plus button to add as many conditions as needed. The conditions are performed one after the other for each row.
    Input column: Select the column of the schema the function is to be operated on.
    Function: Select the function from the list.
    Operator: Select the operator to bind the input column with the value.
    Value: Type in the filtered value, between quotes if need be.
  Use advanced mode: Select this check box when the operation you want to perform cannot be carried out through the standard functions offered. In the text field, type in the expression as required.
Advanced settings:
  tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage: This component is not startable (green background) and it requires an output component.


Scenario: Filtering and searching a list of names


The following scenario is a Java Job that uses a simple condition and a regular expression to filter a list of records. This scenario will output two tables: the first will list all Italian records where first names are shorter than six characters; the second will list all rejected records. An error message for each rejected record will display in the same table to explain why such a record has been rejected.

Drop tFixedFlowInput, tFilterRow and tLogRow from the Palette onto the design workspace. Connect the tFixedFlowInput to the tFilterRow, using a Row > Main link. Then, connect the tFilterRow to the tLogRow, using a Row > Filter link. Drop tLogRow from the Palette onto the design workspace and rename it as reject. Then, connect the tFilterRow to the reject, using a Row > Reject link. Double-click tFixedFlowInput to display its Basic settings view and define its properties. Select the Use Inline Content(delimited file) option in the Mode area to define the input mode.

Set the row and field separators in the corresponding fields. The row separator is a carriage return and the field separator is a semicolon. From the Schema list, select Built-in. The properties and schema are Built-in for this Job. This means that the schema is not stored in the Repository.


Click the three-dot button next to Edit schema to define the schema for the input file. In this example, the schema is made of the following four columns: firstname, gender, language and frequency. In the Type column, select String for the first three rows and select Integer for frequency.

Click OK to validate and close the editor. A dialog box opens and asks you if you want to propagate the schema. Click Yes. Type in content in the Content multiline textframe according to the setting in the schema. Double-click tFilterRow to display its Basic settings view and define its properties.

In the Conditions table, fill in the filtering parameters based on the firstname column. In InputColumn, select firstname; in Function, select Length; in Operator, select Lower than. In the Value column, type in 6 to keep only first names shorter than six characters.
In the Value column, you must type in your values between double quotes for all data types, except for the Integer type, which does not need quotes.

Then, to restrict the search to names whose language is Italian, select the Use advanced mode check box and type in the following expression, which includes the name of the column to be searched: input_row.language.equals("italian"). To combine both conditions (simple and advanced), select And as the logical operator for this example.
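The combination of the simple condition and the advanced condition can be sketched in plain Java as follows. This is an illustrative sketch only, not the code generated by the Studio: the class FilterRowSketch and the method accepts are hypothetical names, and a null first name or language is assumed to be rejected.

```java
// Hypothetical sketch of the two tFilterRow conditions combined with And.
class FilterRowSketch {

    // Returns true when a record goes to the Filter flow,
    // false when it goes to the Reject flow.
    static boolean accepts(String firstname, String language) {
        // Simple condition: Length of firstname is lower than 6.
        boolean shortName = firstname != null && firstname.length() < 6;
        // Advanced condition: input_row.language.equals("italian"),
        // written null-safe here (an assumption of this sketch).
        boolean italian = "italian".equals(language);
        // Logical operator And: both conditions must hold.
        return shortName && italian;
    }

    public static void main(String[] args) {
        System.out.println(accepts("Paolo", "italian"));
        System.out.println(accepts("Giovanni", "italian"));
        System.out.println(accepts("Marc", "french"));
    }
}
```

Selecting Or instead of And would correspond to replacing && with || in the return statement.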


In the Basic settings of tLogRow components, select Table (print values in cells of a table) in the Mode area. Save your Job and press F6 to execute it.

Thus, the first table lists the records that have Italian names made up of fewer than six characters, and the second table lists all the rejected records, that is, those that do not match the filter conditions. Each rejected record has a corresponding error message that explains the reason for the rejection.


tJoin
tJoin properties
Component family: Processing

Function: tJoin joins two tables by doing an exact match on several columns. It compares columns from the main flow with reference columns from the lookup flow and outputs the main flow data and/or the rejected data.

Purpose: Helps to ensure the data quality of any source data against a reference data source.

Basic settings

Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
Repository: You have already created and stored the schema in the Repository. You can reuse it in other projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.

Include lookup columns in output: Select this check box to include the lookup columns you define in the output flow.

Key definition
Input key attribute: Select the column(s) from the main flow that need to be checked against the reference (lookup) key column.
Lookup key attribute: Select the lookup key columns that you will use as a reference against which to compare the columns from the input flow.

Inner join (with reject output): Select this check box to join the two tables first and gather the rejected data from the main flow.

Advanced settings

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Usage: This component is not startable and it requires two input components and one or more output components.

Limitation/prerequisite: n/a

Scenario: Doing an exact match on two columns and outputting the main and rejected data
This Java scenario describes a five-component Job that carries out an exact match of the firstnameClient column of an input file against the data of the reference input file, and of the lastnameClient column against the data of the reference input file. The outputs of this exact match are written in two separate files: exact data are written in an Excel file, and inaccurate data are written in a delimited file.

In this scenario, we have already stored the input schemas of the input and reference files in the Repository. For more information about storing schema metadata in the Repository tree view, see Setting up a File Delimited schema and How to drop components from the Metadata node in Talend Open Studio User Guide.

In the Repository tree view, expand Metadata and the file node where you have stored the input schemas and drop the relevant file onto the design workspace. The [Components] dialog box displays.

Select tFileInputDelimited from the list and click OK to close the dialog box. The tFileInputDelimited component displays in the workspace. The input file used in this scenario is called ClientSample. It holds four columns including the two columns firstnameClient and lastnameClient we want to do the exact match on. Do the same for the second input file you want to use as a reference, ClientSample_Update in this scenario. Drop the following components from the Palette onto the design workspace: tJoin, tFileOutputExcel, and tFileOutputDelimited.


Connect the main and reference input files to tJoin using Main links. The link between the reference input file and tJoin displays as a Lookup link on the design workspace. Connect tJoin to tFileOutputExcel using the Main link and tJoin to tFileOutputDelimited using the Inner join reject link. If needed, double-click the main and reference input files to display their Basic settings views. All their property fields are automatically filled in. If you have not defined your input files in the Repository, select Built-in in the Property Type field and fill in the details manually. Double-click tJoin to display its Basic settings view and define its properties.

Click the Edit schema button to open a dialog box that displays the data structure of the input files and then define the data you want to pass to the output components, three columns in this scenario, idClient, firstnameClient and lastnameClient.


Click OK to close the dialog box. In the Key definition area of the Basic settings view of tJoin, click the plus button to add two columns to the list, and then select the input columns and the lookup columns you want to do the exact matching on from the Input key attribute and Lookup key attribute lists respectively, firstnameClient and lastnameClient in this example. Select the Inner join (with reject output) check box to define one of the outputs as the inner join reject table. Double-click tFileOutputExcel to display its Basic settings view and define its properties.

Set the destination file name as well as the Sheet name and select the Include header check box. Double-click tFileOutputDelimited to display its Basic settings view and define its properties.

Set the destination file name as well as the row and field separators in the corresponding fields and select the Include header check box. Save your Job and press F6 to execute it.

The output of the exact match on the firstnameClient and lastnameClient columns is written in the defined Excel file.

The inaccurate data are written in the defined delimited file.
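The exact match and the reject routing performed in this scenario can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not the code generated by the Studio: the class ExactMatchJoinSketch, its Result holder and its method names are hypothetical, and each row is assumed to be an array holding idClient, firstnameClient and lastnameClient in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an exact match on two key columns with a reject output.
class ExactMatchJoinSketch {

    // Holds the two outputs: the main flow and the inner join reject flow.
    static class Result {
        final List<String[]> matched = new ArrayList<String[]>();
        final List<String[]> rejected = new ArrayList<String[]>();
    }

    // Builds the join key from firstnameClient (index 1) and lastnameClient (index 2).
    static List<String> key(String[] row) {
        return Arrays.asList(row[1], row[2]);
    }

    static Result join(List<String[]> mainRows, List<String[]> lookupRows) {
        // Index the reference rows by their key columns.
        Set<List<String>> lookupKeys = new HashSet<List<String>>();
        for (String[] row : lookupRows) {
            lookupKeys.add(key(row));
        }
        Result result = new Result();
        for (String[] row : mainRows) {
            if (lookupKeys.contains(key(row))) {
                result.matched.add(row);   // would go to the Excel file
            } else {
                result.rejected.add(row);  // would go to the delimited file
            }
        }
        return result;
    }
}
```

A main row is kept only when both key columns match the same reference row; a match on the first name alone is not enough.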


tMap
tMap properties
Component family: Processing

Function: tMap is an advanced component, which integrates itself as a plugin to Talend Open Studio.

Purpose: tMap transforms and routes data from single or multiple sources to single or multiple destinations.

Basic settings

Preview: The preview is an instant shot of the Mapper data. It becomes available when Mapper properties have been filled in with data. The preview synchronization takes effect only after saving changes.

Mapping links display as:
Auto: the default setting is curved links.
Curves: the mapping displays as curves.
Lines: the mapping displays as straight lines. This last option slightly enhances performance.

Map editor: It allows you to define the tMap routing and transformation properties. If you do not want to handle execution errors, you can click the Property Settings button at the top of the input area and select the Die on error check box (selected by default) in the [Property Settings] dialog box. It will kill the Job if there is an error.

Usage: Possible uses range from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing transformation, concatenation, inversion, filtering and more.

Limitation: The use of tMap supposes minimum Perl or Java knowledge in order to fully exploit its functionalities. This component is a junction step, and for this reason cannot be a start nor end component in the Job.

For further information, see How to map data flows in Talend Open Studio User Guide.

Scenario 1: Mapping data using a filter and a simple explicit join


The Java Job described below reads data from a CSV file whose schema is stored in the Repository, looks up a reference file whose schema is also stored in the Repository, and then, based on a defined filter, extracts data from these two files to an output file and to reject files.


Click File in the Palette of components, select tFileInputDelimited and drop it onto the design workspace. Rename the component Cars, either by double-clicking the label in the design workspace or via the View tab of the Component view. Repeat this operation, and rename this second input component Owners. Click Processing in the Palette of components, select tMap and drop it onto the design workspace. Connect the two input components to the mapping component using Row > Main connections and label the connections Cars_data and Owners_data respectively. Double-click the tFileInputDelimited component labelled Cars to display its Basic settings view.

Select Repository from the Property type list and select the component's schema, cars in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically. Double-click the component labelled Owners and repeat the setting operation. Select the appropriate metadata entry, owners in this scenario.
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see How to centralize the Metadata items in Talend Open Studio User Guide.


Double-click the tMap component to open the Map Editor. Note that the input area is already filled with the defined input tables and that the top table is the main input table, and the respective row connection labels are displayed on the top bar of the table. Create a join between the two tables on the ID_Owner column by simply dropping the ID_Owner column from the Cars_data table onto the ID_Owner column in the Owners_data table. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.

Click the [+] button on the output area of the Map Editor to add three output tables: Insured, Reject_NoInsur, Reject_OwnerID. Drag all the columns of the Cars_data table to the Insured table. Drag the ID_Owner, Registration, and ID_Reseller columns of the Cars_data table and the Name column of the Owners_data table to the Reject_NoInsur table. Drag all the columns of the Cars_data table to the Reject_OwnerID table. For more information regarding data mapping, see How to map data flows in Talend Open Studio User Guide. Click the plus arrow button at the top of the Insured table to add a filter row. Drag the ID_Insurance column of the Owners_data table to the filter condition area and enter the formula meaning not undefined: Owners_data.ID_Insurance != null. With this filter, the Insured table will gather all the records that include an insurance ID.


Click the tMap settings button at the top of the Reject_NoInsur table and set Catch output reject to true to define the table as a standard reject output flow to gather the records that do not include an insurance ID.

Click the tMap settings button at the top of the Reject_OwnerID table and set Catch lookup inner join reject to true so that this output table will gather the records from the Cars_data flow with missing or unmatched owner IDs.

Click OK to validate the mappings and close the Map Editor.
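The routing configured in the Map Editor for this scenario can be sketched in plain Java as follows. This is an illustrative sketch, not the code generated by tMap: the class Scenario1RoutingSketch and the method route are hypothetical names, and the Owners_data lookup is assumed to be reduced to a map from ID_Owner to ID_Insurance, where a null value means that the owner has no insurance ID.

```java
import java.util.Map;

// Hypothetical sketch of the three output tables of Scenario 1.
class Scenario1RoutingSketch {

    // Returns the name of the output table that receives the car record.
    static String route(String carIdOwner, Map<String, String> idInsuranceByIdOwner) {
        // Inner join on ID_Owner: no matching owner means an inner join reject.
        if (!idInsuranceByIdOwner.containsKey(carIdOwner)) {
            return "Reject_OwnerID";   // Catch lookup inner join reject = true
        }
        String idInsurance = idInsuranceByIdOwner.get(carIdOwner);
        // Filter of the Insured table: Owners_data.ID_Insurance != null
        if (idInsurance != null) {
            return "Insured";
        }
        // The join succeeded but the filter failed: standard output reject.
        return "Reject_NoInsur";       // Catch output reject = true
    }
}
```

The order of the tests reflects the design choice described above: the join is evaluated first, and the filter applies only to the rows for which the join succeeded.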



Add three tFileOutputDelimited components to the design workspace and connect the tMap component to the three output components using the relevant Row connections. Relabel the three output components accordingly. Double-click each of the output components, one after the other, to define their properties. If you want a new file to be created, browse to the destination output folder, and type in a file name including the extension. Select the Include header check box to reuse the column labels from the schema as header row in the output file.

Save your Job and press F6 to run it. The output files are created, which contain the relevant data as defined.

Scenario 2: Mapping data using inner join rejections


This scenario, based on scenario 1, adds one input file containing details about resellers and extra fields in the main output table. Two filters on inner joins are added to gather specific rejections.


Click File in the Palette of Components, and drop a tFileInputDelimited component to the design workspace, and label the component Resellers. Connect it to the Mapper using a Row > Main connection, and label the connection Resellers_data. Double-click the Resellers component to display its Basic settings view.

Select Repository from the Property type list and select the component's schema, resellers in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically.
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see How to centralize the Metadata items in Talend Open Studio User Guide.

Double-click the tMap component to open the Map Editor. Note that the schema of the new input component is already added in the Input area. Create a join between the main input flow and the new input flow by dropping the ID_Reseller column of the Cars_data table to the ID_Reseller column of the Resellers_data table.


Click the tMap settings button at the top of the Resellers_data table and set Join Model to Inner Join.

Drag all the columns except ID_Reseller of the Resellers_data table to the main output table, Insured.


When two inner joins are defined, you need to define two different inner join reject tables if you want to differentiate the two rejections. If there is only one inner join reject output, both inner join rejections will be stored in the same output.

Click the [+] button at the top of the output area to add a new output table, and name this new output table Reject_ResellerID. Drag all the columns of the Cars_data table to the Reject_ResellerID table. Click the tMap settings button and set Catch lookup inner join reject to true to define this new output table as an inner join reject output. If the defined inner join cannot be established, the information about the relevant cars will be gathered through this output flow.

Now apply filters on the two Inner Join reject outputs in order to distinguish the two types of rejection. In the first Inner Join output table, Reject_OwnerID, click the plus arrow button to add a filter line and fill it with the following formula to gather only the rejections related to the owner ID: Owners_data.ID_Owner==null. In the second Inner Join output table, Reject_ResellerID, repeat the same operation using the following formula: Resellers_data.ID_Reseller==null.
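The way the two filters separate the inner join rejections can be sketched in plain Java as follows. This is an illustrative sketch, not the code generated by tMap: the class InnerJoinRejectSketch and the method rejectTables are hypothetical names, the lookups are reduced to sets of known IDs, and both lookups are assumed to be evaluated independently for every main row, with an unmatched lookup leaving its columns null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of two filtered inner join reject tables.
class InnerJoinRejectSketch {

    // Returns the reject tables that receive the car record (empty if both joins succeed).
    static List<String> rejectTables(String idOwner, String idReseller,
                                     Set<String> ownerIds, Set<String> resellerIds) {
        // Assumption: a failed inner join leaves the lookup columns null.
        String ownersDataIdOwner = ownerIds.contains(idOwner) ? idOwner : null;
        String resellersDataIdReseller = resellerIds.contains(idReseller) ? idReseller : null;

        List<String> tables = new ArrayList<String>();
        // Filter on Reject_OwnerID: Owners_data.ID_Owner==null
        if (ownersDataIdOwner == null) {
            tables.add("Reject_OwnerID");
        }
        // Filter on Reject_ResellerID: Resellers_data.ID_Reseller==null
        if (resellersDataIdReseller == null) {
            tables.add("Reject_ResellerID");
        }
        return tables;
    }
}
```

Without the two filters, every inner join rejection would reach both reject tables, whichever join had failed.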


Click OK to validate the map settings and close the Map Editor. Drop a new tFileOutputDelimited component from the Palette to the design workspace, and label the component No_Reseller_ID. Define the properties of the new tFileOutputDelimited component. In this use case, simply specify the output file path and select the Include Header check box, and leave the other parameters as they are.

Connect the tMap component to the new tFileOutputDelimited component by using the Row connection named Reject_ResellerID. To demonstrate the work of the Mapper, in this example, remove reseller IDs 5 and 8 from the input file Resellers.csv. Save your Job and press F6 to run it. The four output files are all created in the specified folder, containing information as defined. The output file No_Reseller_ID.csv contains the cars information related to reseller IDs 5 and 8, which are missing in the input file Resellers.csv.

Scenario 3: Cascading join mapping


As a third advanced scenario, based on Scenario 2, add a new input table containing, for example, insurance details. Set up an Inner Join between two lookup input tables (Owners and Insurance) in the Mapper to create a cascade lookup and hence retrieve the insurance details via the Owners table data.

Scenario 4: Advanced mapping using filters, explicit joins and rejections


This scenario introduces a Java Job that allows you to find BMW owners who have two to six children (inclusive), for example for sales promotion purposes.

Drop three tFileInputDelimited components, a tMap component, and two tFileOutputDelimited components from the Palette onto the design workspace, and label them to best describe their functions. Connect the input components to the tMap using Row > Main connections. Pay attention to the file you connect first as it will automatically be set as Main flow, and all the other connections will be Lookup flows. In this example, the connection for the input component Owners is the Main flow.

Define the properties of each input component in its Basic settings view. Start with the properties of Owners.

Select Repository from the Property type list and select the component's schema, owners in this scenario, from the [Repository Content] dialog box. The remaining fields are filled in automatically.
In this scenario, the input schemas are stored in the Metadata node of the Repository tree view for easy retrieval. For further information regarding metadata creation in the Repository, see How to centralize the Metadata items in Talend Open Studio User Guide.

In the same way, set the properties of the other input components: Cars and Resellers. These two Lookup flows will fill in secondary (lookup) tables in the input area of the Map Editor. Then double-click the tMap component to launch the Map Editor and define the mappings and filters. Set an explicit join between the Main flow Owners and the Lookup flow Cars by dropping the ID_Owner column of the Owners table to the ID_Owner column of the Cars table. The explicit join is displayed along with a hash key.


In the Expr. Key field of the Make column, type in a filter. In this use case, simply type in BMW as the search is focused on the owners of this particular make. Implement a cascading join between the two lookup tables Cars and Resellers on the ID_Reseller column in order to retrieve resellers information.

As you want to reject the null values into a separate table and exclude them from the standard output, click the tMap settings button and set Join Model to Inner Join in each of the Lookup tables.


In the tMap settings, you can set Match Model to Unique match, First match, or All matches. In this use case, the All matches option is selected. Thus if several matches are found in the Inner Join, i.e. rows matching the explicit join as well as the filter, all of them will be added to the output flow (either in rejection or the regular output).
The Unique match option functions as a Last match. The First match and All matches options function as named.
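The effect of the three Match Model options on the rows found for one main flow row can be sketched in plain Java as follows. This is a minimal sketch, not the code generated by tMap: the class MatchModelSketch and the method select are hypothetical names, and the matching lookup rows are assumed to be given in the order in which they were loaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Match Model options of a lookup table.
class MatchModelSketch {

    // Returns the lookup rows kept for one main row, given all the rows that match the join.
    static List<String> select(List<String> matches, String matchModel) {
        List<String> kept = new ArrayList<String>();
        if (matches.isEmpty()) {
            return kept; // no match: handled by the Join Model, not by the Match Model
        }
        if ("First match".equals(matchModel)) {
            kept.add(matches.get(0));
        } else if ("Unique match".equals(matchModel)) {
            // Unique match functions as a Last match.
            kept.add(matches.get(matches.size() - 1));
        } else {
            // All matches: every matching row is passed on.
            kept.addAll(matches);
        }
        return kept;
    }
}
```

With All matches, one main flow row can therefore produce several output rows, one per matching lookup row.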

On the output area of the Map Editor, click the plus button to add two tables, one for the full matches and the other for the rejections. Drag all the columns of the Owners table, the Registration, Make and Color columns of the Cars table, and the ID_Reseller and Name_Reseller columns of the Resellers table to the main output table. Drag all the columns of the Owners table to the reject output table. Click the Filter button at the top of the main output table to display the Filter expression area. Type in a filter statement to narrow down the number of rows loaded in the main output flow. In this use case, the statement reads: Owners.Children_Nr >= 2 && Owners.Children_Nr <= 6. In the reject output table, click the tMap settings button and set the reject types.


Set Catch output reject to true to collect data about BMW car owners who have less than two or more than six children. Set Catch lookup inner join reject to true to collect data about owners of other car makes and owners for whom the reseller information is not found.

Click OK to validate the mappings and close the Map Editor. On the design workspace, right-click the tMap and pull the respective output link to the relevant output components. Define the properties of the output components in their respective Basic settings view. In this use case, simply specify the output file paths and select the Include Header check box, and leave the other parameters as they are.


Save your Job and press F6 to run it. The main output file contains the information related to BMW owners who have two to six children, and the reject output file contains the information about the rest of the car owners.

Scenario 5: Advanced mapping with filters and different rejections


This scenario is a modified version of the preceding scenario. It describes a Job that applies filters to limit the search to BMW and Mercedes owners who have two to six children and divides unmatched data into different reject output flows.


Take the same Job as in Scenario 4: Advanced mapping using filters, explicit joins and rejections. Drop a new tFileOutputDelimited component from the Palette on the design workspace, and name it Rejects_BMW_Mercedes to reflect its functionality. Connect the tMap component to the new output component using a Row connection and label the connection according to the functionality of the output component. This connection label will appear as the name of the new output table in the Map Editor. Relabel the existing output connections and output components to reflect their functionality. The existing output tables in the Map Editor will be automatically renamed according to the connection labels. In this example, relabel the existing output connections BMW_Mercedes_withChildren and Owners_Other_Makes respectively. Double-click the tMap component to launch the Map Editor to change the mappings and the filters. Note that the output area contains a new, empty output table named Rejects_BMW_Mercedes. You can adjust the position of the table by selecting it and clicking the Up or Down arrow button at the top of the output area. Remove the Expr. key filter (BMW) from the Cars table in the input area. Click the Filters button to display the Filter field, and type in a new filter to limit the search to BMW or Mercedes car makes. The statement reads as follows: Cars.Make.equals("BMW") || Cars.Make.equals("Mercedes")


Select all the columns of the main output table and drop them down to the new output table. Alternatively, you can also drag the corresponding columns from the relevant input tables to the new output table. Click the tMap settings button at the top of the new output table and set Catch output reject to true to collect data about BMW and Mercedes owners who have less than two or more than six children. In the Owners_Other_Makes table, set Catch lookup inner join reject to true to collect data about owners of other car makes and owners for whom the reseller information is not found.
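The three-way routing set up in this scenario can be sketched in plain Java as follows. This is an illustrative sketch, not the code generated by tMap: the class Scenario5RoutingSketch and the method route are hypothetical names, a null make is assumed to mean that no car was found for the owner, and the reseller lookup is reduced to a boolean.

```java
// Hypothetical sketch of the three output tables of Scenario 5.
class Scenario5RoutingSketch {

    // Returns the name of the output table that receives the owner record.
    static String route(int childrenNr, String make, boolean resellerFound) {
        // Filter on the Cars lookup table:
        // Cars.Make.equals("BMW") || Cars.Make.equals("Mercedes")
        boolean makeMatches = make != null
                && (make.equals("BMW") || make.equals("Mercedes"));
        // A failed inner join (car or reseller) is caught by the inner join reject table.
        if (!makeMatches || !resellerFound) {
            return "Owners_Other_Makes";       // Catch lookup inner join reject = true
        }
        // Filter of the main output table:
        // Owners.Children_Nr >= 2 && Owners.Children_Nr <= 6
        if (childrenNr >= 2 && childrenNr <= 6) {
            return "BMW_Mercedes_withChildren";
        }
        // Joins succeeded but the output filter failed.
        return "Rejects_BMW_Mercedes";         // Catch output reject = true
    }
}
```

Compared with Scenario 4, the only structural change is that the two kinds of rejection now go to two different tables instead of one.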


Click OK to validate the mappings and close the Map Editor. Define the properties of the output components in their respective Basic settings view. In this use case, simply specify the output file paths and select the Include Header check box, and leave the other parameters as they are.


Save the Job and press F6 to run it. The output files are created, and the content of the main output flow shows that the filtered rows have been passed on correctly.

Scenario 6: Advanced mapping with lookup reload at each row (Java)


The following scenario describes a Job that retrieves people details from a lookup database, based on a join on the age. The main flow source data is read from a MySQL database table called people_age that contains people details such as a numeric id, an alphanumeric first name and last name, and a numeric age. The age of each person is either 40 or 60. The number of records in this table is intentionally restricted.

The reference or lookup information is also stored in a MySQL database table, called large_data_volume. This lookup table contains a number of records, including the city that the people from the main flow have been to. For the sake of clarity, the number of records is restricted but, in normal use, the feature described in the example below is most useful for very large volumes of reference data.

To optimize performance, a database connection component is used at the beginning of the Job to open the connection to the lookup database table, so that the connection is not reopened every time a row is loaded from the lookup table.

An Expression Filter is applied to this lookup source flow in order to select only the data of people whose age is equal to 60 or 40. This way, only the relevant rows from the lookup database table are loaded for each row from the main flow. This Job therefore shows how, from a limited number of main flow rows, the lookup join can be optimized to load only the results matching the expression key.
Generally speaking, as the lookup loading is performed for each main flow row, this option is mainly of interest when a limited number of rows is processed in the main flow while a large number of reference rows are to be looked up.

The join is solved on the age field. Then, using the relevant loading option in the tMap component editor, the lookup database information is loaded for each main flow incoming row.


For this Job, the metadata has been prepared for the source and connection components. For more information on how to set up the DB connection schema metadata, see the relevant section in the Talend Open Studio User Guide. This Java Job is formed with five components, four database components and a mapping component. Drop the DB Connection under the Metadata node of the Repository to the design workspace. In this example, the source table is called people_age. Select tMysqlInput from the list that pops up when dropping the component.


Drop the lookup DB connection table from the Metadata node onto the design workspace, selecting tMysqlInput from the list that pops up. In this Job, the lookup is called large_data_volume. In the same way, drop the DB connection from the Metadata node onto the design workspace, selecting tMysqlConnection from the list that pops up. This component creates a permanent connection to the lookup database table, so that the connection is not reopened every time a row is loaded from the lookup table. Then pick the tMap component from the Processing family, and the tMysqlOutput and tMysqlCommit components from the Database family, in the Palette to the right-hand side of the editor. Now connect all the components together. To do so, right-click the tMysqlInput component corresponding to the people table and drag the link towards tMap. Release the link over the tMap component; the main row flow is automatically set up. Rename the Main row link to people, to identify the main flow data more easily. Perform the same operation to connect the lookup table (large_data_volume) to the tMap component and the tMap to the tMysqlOutput component. A dialog box prompts for a name for the output link. In this example, the output flow is named people_mixandmatch. Also rename the lookup row connection link to large_volume, to help identify the reference data flow. Connect tMysqlConnection to tMysqlInput using the trigger link OnSubjobOk. Connect the tMysqlInput component to the tMysqlCommit component using the trigger link OnSubjobOk. Then double-click the tMap component to open the graphical mapping editor.


The Output table (which was created automatically when you linked the tMap to the tMysqlOutput) will be formed by the matching rows from the lookup flow (large_data_volume) and the main flow (people_age). Select the main flow rows that are to be passed on to the output and drag them over to paste them in the Output table (to the right-hand side of the mapping editor). In this example, the selection from the main flow includes the following fields: id, first_name, last_Name and age. From the lookup table, the following column is selected: city. Drop the selected columns from the input tables (people and large_volume) to the output table. Now set up the join between the main and lookup flows. Select the age column of the main flow table (on top) and drag it towards the age column of the lookup flow table (large_volume in this example). A key icon appears next to the linked expression on the lookup table. The join is now established. Click the tMap settings button, click the three-dot button corresponding to Lookup Model, and select the Reload at each row option from the [Options] dialog box in order to reload the lookup for each row being processed.

In the same way, set Match Model to All matches in the Lookup table, in order to gather all instances of age matches in the output flow. Now implement the filtering, based on the age column, in the Lookup table. The GlobalMapKey field is automatically created when you selected the Reload at each row option. Indeed you can use this expression to dynamically filter the reference data in order to load only the relevant information when joining with the main flow. As mentioned in the introduction of the scenario, the main flow data contains only people whose age is either 40 or 60. To avoid the pain of loading all lookup rows, including ages that are different from 40 and 60, you can use the main flow age as global variable to feed the lookup filtering.


Drop the Age column from the main flow table to the Expr. field of the lookup table. Then, in the globalMap Key field, put in the variable name, using the expression. In this example, it reads: people.Age. Click OK to save the mapping setting and go back to the design workspace. To finalize the implementation of the dynamic filtering of the lookup flow, you now need to add a WHERE clause in the query of the database input.

At the end of the Query field, following the Select statement, type in the following WHERE clause: WHERE AGE ='"+((Integer)globalMap.get("people.Age"))+"'"

Make sure that the cast type corresponds to the column used as variable. In this use case, Age is of Integer type. Use the variable name exactly as you set it in the globalMap Key field of the map editor. Double-click the tMySQLOutput component to define its properties.

Select the Use an existing connection check box to leverage the created DB connection. Define the target table name and relevant DB actions. Click the Run tab at the bottom of the design workspace, to display the Job execution tab. From the Debug Run view, click the Traces Debug button to view the data processing progress. For more comfort, you can maximize the Job design view while executing by simply double-clicking on the Job name tab.

The lookup data is reloaded for each of the main flow rows, according to the age constraint. All age matches are retrieved from the lookup rows and grouped together in the output flow. Therefore, if you check the data contained in the newly created people_mixandmatch table, you will find all the age duplicates corresponding to different individuals whose age equals 60 or 40, along with the city where they have been.
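The behaviour described above can be sketched in plain Java as follows. This is only an illustration of the Reload at each row and All matches semantics, not the code that the Studio generates; the sample rows and the loadLookup helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReloadAtEachRowSketch {

    // Stand-in for the filtered database query: keeps only the lookup rows
    // ({age, city}) whose age equals the age of the current main-flow row.
    static List<String[]> loadLookup(List<String[]> lookupTable, int age) {
        List<String[]> filtered = new ArrayList<>();
        for (String[] row : lookupTable) {
            if (Integer.parseInt(row[0]) == age) {
                filtered.add(row);
            }
        }
        return filtered;
    }

    // For each main row ({id, first_name, age}), reloads the lookup and
    // emits one output line per matching lookup row ("All matches").
    static List<String> join(List<String[]> mainFlow, List<String[]> lookupTable) {
        List<String> out = new ArrayList<>();
        for (String[] person : mainFlow) {
            int age = Integer.parseInt(person[2]);
            for (String[] match : loadLookup(lookupTable, age)) {
                out.add(person[0] + "|" + person[1] + "|" + age + "|" + match[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the scenario files.
        List<String[]> mainFlow = Arrays.asList(
                new String[] {"1", "Ann", "40"},
                new String[] {"2", "Bob", "60"});
        List<String[]> lookup = Arrays.asList(
                new String[] {"40", "Paris"},
                new String[] {"40", "Lyon"},
                new String[] {"60", "Nice"},
                new String[] {"35", "Lille"});
        for (String line : join(mainFlow, lookup)) {
            System.out.println(line);
        }
    }
}
```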


Scenario 7: Mapping with join output tables


The following scenario describes a Job that processes reject flows without separating them from the main flow.

In the Repository tree view, click Metadata > File delimited. Drag and drop the customers metadata onto the workspace. The customers metadata contains information about customers, such as their ID, their name or their address, etc. For more information about centralizing metadata, see How to centralize the Metadata items in the Talend Open Studio User Guide.

In the dialog box that asks you to choose which component type you want to use, select tFileInputDelimited and click OK. Drop the states metadata onto the design workspace. Select the same component in the dialog box and click OK. The states metadata contains the ID of the state, and its name. Drop a tMap and two tLogRow components from the Palette onto the design workspace. Connect the customers component to the tMap, using a Row > Main connection. Connect the states component to the tMap, using a Row > Main connection. This flow will automatically be defined as Lookup. Double-click the tMap component to open the Map Editor. Drop the idState column from the main input table to the idState column of the lookup table to create a join. Click the tMap settings button and set Join Model to Inner Join. Click the Property Settings button at the top of the input area to open the [Property Settings] dialog box, and clear the Die on error check box in order to handle the execution errors. The ErrorReject table is automatically created.

Select the id, idState, RegTime and RegisterTime in the input table and drag them to the ErrorReject table.


Click the [+] button at the top right of the editor to add an output table. In the dialog box that opens, select New output. In the field next to it, type in the name of the table, out1. Click OK. Drag the following columns from the input tables to the out1 table: id, CustomerName, idState, and LabelState. Add two columns, RegTime and RegisterTime, to the end of the out1 table and set their date formats: "dd/MM/yyyy HH:mm" and "yyyy-MM-dd HH:mm:ss.SSS" respectively. Click in the Expression field for the RegTime column, and press Ctrl+Space to display the auto-completion list. Find and double-click TalendDate.parseDate. Change the pattern to ("dd/MM/yyyy HH:mm",row1.RegTime). Do the same thing for the RegisterTime column, but change the pattern to ("yyyy-MM-dd HH:mm:ss.SSS",row1.RegisterTime).

Click the [+] button at the top of the output area to add an output table. In the dialog box that opens, select Create join table from, choose Out1, and name it rejectInner. Click OK. Click the tMap settings button and set Catch lookup inner join reject to true in order to handle rejects. Drag the id, CustomerName, and idState columns from the input tables to the corresponding columns of the rejectInner table. Click in the Expression field for the LabelState column, and type in UNKNOWN. Click in the Expression field for the RegTime column, press Ctrl+Space, and select TalendDate.parseDate. Change the pattern to ("dd/MM/yyyy HH:mm",row1.RegTime).


Click in the Expression field for the RegisterTime column, press Ctrl+Space, and select TalendDate.parseDate, but change the pattern to ("yyyy-MM-dd HH:mm:ss.SSS",row1.RegisterTime). If the data from row1 has a wrong pattern, it will be returned by the ErrorReject flow.

Click OK to validate the changes and close the editor. Double-click the first tLogRow component to display its Component view. Click Sync columns to retrieve the schema structure from the mapper if needed. In the Mode area, select Table. Do the same thing with the second tLogRow. Save your Job and press F6 to execute it. The Run console displays the main output flow and the ErrorReject flow. The main output flow unites both valid data and inner join rejects, while the ErrorReject flow contains the error information about rows with unparseable date formats.
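As an illustration of why some rows end up in the ErrorReject flow, here is a minimal Java sketch that applies the two date patterns of this scenario. It uses java.text.SimpleDateFormat as a stand-in for the TalendDate.parseDate routine, and strict (non-lenient) parsing is an assumption made here; the route method and the sample values are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateRejectSketch {

    // Parses a value with the given pattern; throws ParseException on mismatch.
    static Date parse(String pattern, String value) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // assumption: strict parsing
        return format.parse(value);
    }

    // Returns the name of the flow a row would be sent to in this sketch.
    static String route(String regTime, String registerTime) {
        try {
            parse("dd/MM/yyyy HH:mm", regTime);
            parse("yyyy-MM-dd HH:mm:ss.SSS", registerTime);
            return "out1";
        } catch (ParseException e) {
            // A value that does not follow its pattern cannot be parsed.
            return "ErrorReject";
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(route("03/11/2010 14:05", "2010-11-03 14:05:30.123"));
        System.out.println(route("2010-11-03", "2010-11-03 14:05:30.123"));
    }
}
```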


tNormalize
tNormalize Properties
Component family: Processing/Fields

Function: Normalizes the input flow following SQL standard.

Purpose: tNormalize helps improve data quality and thus eases the data update.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. In this component, the schema is read-only.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Get rid of duplicated rows from output (Java only): Select this check box to deduplicate rows in the data of the output flow.
Use CSV parameters (Java only): Select this check box to include CSV specific parameters such as escape mode and enclosure character.
Column to normalize: Select the column from the input flow which the normalization is based on.
Item separator: Enter the separator which delimits data in the input flow.

Usage: This component can be used as an intermediate step in a data flow.

Limitation: n/a

Scenario: Normalizing data (in Perl)


This simple scenario illustrates a Job that normalizes a list of tags for Web forum topics and outputs them into a table in the standard output console (Run tab).

Drop the following components from the Palette to the design workspace: tFileInputDelimited, tNormalize, tLogRow.

In the tFileInputDelimited Basic settings, set the input file to be normalized.

The file schema is stored in the repository for ease of use. It is made of one column, called Tags, containing rows with one or more keywords. Set the Row Separator and the Field Separator.

On the tNormalize Basic settings panel, define the column the normalization operation is based on. In this use case, the column to normalize is Tags.

The Item separator is the comma, surrounded here by single quotes as the Job is done in Perl. In the tLogRow component, select the Print values in the cells of table check box. Save the Job and press F6 to execute it.


The values are normalized and displayed in a table cell on the console.
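The normalization itself can be pictured with the short Java sketch below: each cell of the Tags column is split on the item separator and every item becomes its own output row. The class, method and sample tags are hypothetical, and options such as deduplication or CSV parameters are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NormalizeSketch {

    // Splits every cell of the column on the separator: one output row per item.
    static List<String> normalize(List<String> tagsColumn, String itemSeparator) {
        List<String> out = new ArrayList<>();
        for (String cell : tagsColumn) {
            // Pattern.quote treats the separator as a literal, not as a regex.
            for (String item : cell.split(Pattern.quote(itemSeparator))) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input rows of the Tags column.
        List<String> tags = Arrays.asList("ldap,db2,jdbc", "xml", "perl,java");
        for (String row : normalize(tags, ",")) {
            System.out.println(row);
        }
    }
}
```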


tPerl
tPerl properties
Component family: Processing

Function: tPerl transforms any data entered as argument of Perl commands.

Purpose: tPerl is a (Perl) editor that is a very flexible tool within a job.

Basic settings

Code: Type in the Perl code based on the command and task you need to perform. For further information about Perl functions syntax, see Talend Open Studio online Help (under Talend Open Studio User Guide > Perl).

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at the job level as well as at each component level.

Usage: Typically used for debugging but can also be used to display a variable content.

Limitation: This component requires an advanced Perl user level and is not meant to be used with a Row connection, as it is meant for single use.

Scenario: Displaying a number of processed lines


This scenario is a three-component Job that writes the processed rows to an Excel file and shows the number of processed rows in the log.

Drop three components from the Palette onto the design workspace: tFileInputDelimited, tFileOutputExcel and tPerl. Right-click tFileInputDelimited and connect it to tFileOutputExcel using a main Row. Right-click tFileInputDelimited again and link it to tPerl using a Trigger > OnSubjobOk link. This link means that, following the arrow direction, the first component (tFileInputDelimited) will run before the second component (tPerl).


Click once on tFileInputDelimited and select the Basic settings tab to define the component properties.

The Properties are not reused from or for another job stored in the repository, but instead are used for this job only. Therefore select Built-In in the drop-down list. Enter a path or browse to the file containing the data to be processed. In this example, the text file holds a list of names alongside the relevant email addresses. Define the Row and Field separators. In this scenario, there is one name and the matching email per row, and the fields are separated by a semi-colon. The first row of the file contains the labels of the columns and should be ignored in the job, so the Header field value is 1. There is no footer or limit value to be defined for this scenario. The Schema type is also built-in in this case. Click Edit Schema and describe the content of the input file. In this scenario, there are two columns labelled Name and Email, of type String and with no length defined, the Key field being Email. Select the tFileOutputExcel component and define it accordingly. Select the output file path and Sheet, and synchronize the schema. Then define the tPerl sub-job in order to retrieve the number of rows read from tFileInputDelimited.

Enter the Perl command print to get the variable containing the number of rows read in tFileInputDelimited. To access the list of available variables, press Ctrl+Space then select the relevant variable in the list. For a better readability in the Run Job log, add equal signs before and after the commands. Note also that commands, strings and variables are colored differently.

Then switch to the Run Job tab and execute the job. The job runs smoothly and creates an output Excel file following the two-field schema defined: Name and Email.

The Perl command result is shown in the job log.


tPivotToRows
tPivotToRows properties
Component family: Processing

Function: tPivotToRows transforms multiple columns into multiple key/value lines.

Purpose: tPivotToRows makes it possible to choose a list of columns from the input flow and transforms it into lines in the output flow.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component in the Job.
Built-in: You create the schema and store it locally for the relevant component. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Row keys: Select the list of the columns of the input schema which you want to display as a unique output line. The non-selected columns will constitute the pivot. Click the plus button to add as many lines as there are columns to concatenate. Click in each of the lines of the Input column list and select the name of the column.
Row key concatenate delimiter: Set separators for concatenated values.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at the job level as well as at each component level.

Usage: This component is generally used as an intermediate step in a data flow. It therefore needs input and output components.


Scenario: Concatenating a list of columns in a table by using the other table columns as pivot
This Perl scenario describes a four-component Job that concatenates into a single line the information distributed across a list of columns, using the other columns in the table as pivot. Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tPivotToRows and tLogRow (x2). Connect the four components using Row Main links.

In the design workspace, double-click tFileInputDelimited to display its Basic settings view where you can define its properties.

From the Property Type list, select Repository if you have already stored the metadata of your input file in the Repository (the fields that follow are then filled in automatically with the stored information), or select Built-In and fill in the fields that follow manually. For this example, we use the Built-In mode. Click the three-dot button next to the File Name field and browse to the input file. In this example, our source file is use_case_tunpivotrow and it holds eight columns. The five columns id, CustomerName, CustomerAddress, id2 and RegisterTime are to be concatenated, while the other three columns Sum1, Sum2 and Sum3 are to be used as pivot. In the Basic settings view of tFileInputDelimited, define in the corresponding fields the row and field separators used in the source file. If needed, set Header, Footer and Limit. In this example, set Header to 1 since the first row, which holds the column names, is to be ignored. Footer and Limit for the number of processed rows are not set.


In the Schema field, set the schema to Built-in, then click the three-dot button next to the Edit Schema field to define the data to be passed to the following component. In this example, the source schema consists of the eight columns of the source file use_case_tunpivotrow.

In the design workspace, double-click tPivotToRows to display its Basic settings view where you can define the component properties. Click Sync columns to retrieve the schema from the preceding component.
You can click the three-dot button next to Edit schema to check the retrieved schema.

Click the plus button to add in the Row keys area as many lines as the columns to concatenate. In this example, we add five lines.

Click in each of the lines of the Input column list and select the name of the column to be concatenated. The input schema columns that are not selected will be used as pivot. In the Row key concatenate delimiter field, define a character to separate the data of the various columns once the concatenation is completed. Double-click the first tLogRow component to display its Basic settings view and define its properties.


In the Mode area, select Table so that the source file and the tPivotToRows results are displayed together and can be compared. Do the same for the second tLogRow component. Save the Job and press F6 to execute it.

The console shows the results of the two tLogRow components. Table tLogRow_1 gives an outline of the source file and table tLogRow_2 shows the concatenation of the columns id, CustomerName, CustomerAddress, id2 and RegisterTime as well as the transformation of the columns Sum1, Sum2 and Sum3 as pivot.
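The Java sketch below gives one plausible reading of this transformation: the selected row-key columns are concatenated with the delimiter, and each remaining (pivot) column yields one key/value line. The three-field output layout (concatenated key, pivot column name, pivot value) is an assumption made for illustration, as are the class name and the sample row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PivotToRowsSketch {

    // Returns one line per pivot column: {concatenated row keys, pivot name, pivot value}.
    static List<String[]> pivotToRows(String[] columns, String[] row,
                                      List<String> rowKeys, String delimiter) {
        StringBuilder key = new StringBuilder();
        boolean first = true;
        List<String[]> pivots = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            if (rowKeys.contains(columns[i])) {
                // Selected column: append its value to the concatenated key.
                if (!first) {
                    key.append(delimiter);
                }
                key.append(row[i]);
                first = false;
            } else {
                // Non-selected column: it is part of the pivot.
                pivots.add(new String[] {columns[i], row[i]});
            }
        }
        List<String[]> out = new ArrayList<>();
        for (String[] pivot : pivots) {
            out.add(new String[] {key.toString(), pivot[0], pivot[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row using some of the column names of this scenario.
        String[] columns = {"id", "CustomerName", "Sum1", "Sum2", "Sum3"};
        String[] row = {"1", "Griffith Paving", "10", "20", "30"};
        List<String> keys = Arrays.asList("id", "CustomerName");
        for (String[] line : pivotToRows(columns, row, keys, ";")) {
            System.out.println(String.join(" | ", line));
        }
    }
}
```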


tReplace
tReplace Properties
Component family: Processing

Function: Carries out a Search & Replace operation in the input columns defined.

Purpose: Helps to cleanse all files before further processing.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Two read-only columns, Value and Match, are added to the output schema automatically.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Simple Mode Search / Replace: Click Plus to add as many conditions as needed. The conditions are performed one after the other for each row.
Input column: Select the column of the schema the search & replace is to be operated on.
Search: Type in the value to search in the input column.
Replace with: Type in the substitution value.
Whole word: Select this check box if the searched value is to be considered as a whole word.
Case sensitive: Select this check box to take the case into account.
Note that you cannot use regular expressions in these columns.
Use advanced mode: Select this check box when the operation you want to perform cannot be carried out through the simple mode. In the text field, type in the regular expression as required.

Usage: This component is not startable as it requires an input flow. It also requires an output component.


Scenario: multiple replacements and column filtering


The following Job (made in Perl) searches and replaces various typos and defects in a csv file, then filters the columns before producing a new csv file with the final output.

Drop the following components from the Palette: tFileInputDelimited, tReplace, tFilterColumn and tFileOutputDelimited. Connect the components using Main Row connections by right-clicking each component. Select the tFileInputDelimited component and set the input flow parameters.

The Property type for this scenario is Built-in. Therefore the following fields are to be set manually, unlike the Properties stored centrally in the repository, which are retrieved automatically. The File is a simple csv file stored locally. The Row Separator is a carriage return and the Field Separator is a semi-colon. In this example no Header, no Footer and no Limit are to be set. The file contains characters such as \t, |||, [d] or *d, which should not be interpreted as special characters or wildcards.


The schema for this file is also built-in and made of four columns of various types (string or int). Now select the tReplace component to set the search & replace parameters.

The schema can be synchronized with the incoming flow. Select the Simple mode check box, as the search parameters can be easily set without requiring the use of regexp. Click the plus sign to add some lines to the parameters table. On the first parameter line, select amount as input column. In the Search field, look for the decimal dot separator and replace it with a comma, in between single quotes. On the second parameter line, select str as input column. In the Search field, look for stret or streat or stre. Note that these values are separated by a pipe, which means or in Perl language. Replace them with Street. Select the Whole word check box. On the third parameter line, select str again as input column and search for the pipe character, using a backslash in front of it to differentiate it from the or in Perl language. Replace it with nothing between single quotes (''). On the fourth parameter line, select firstname as input column. In the Search field, look for the following characters: [, ], +, *. Note that these values are separated by a pipe, which means or in Perl language. Replace them with nothing between single quotes ('').


On the fifth parameter line, select amount as input column. In the Search field, type in the dollar sign between single quotes, and in the Replace field, type in the Euro sign. On the last parameter line, select firstname as input column. Search for the string \t. To differentiate it from the tabulation, add as many backslashes in front of it as there are parsing passes: two backslashes are used to avoid misinterpretation, and two extra backslashes constitute part of the character being looked for. In total, four backslashes, including the one in the character itself, are being searched for. Replace them with nothing between single quotes (''), and select the Whole word check box. The advanced mode is not used in this scenario. Select the next component in the Job, tFilterColumn.

The tFilterColumn component holds a schema editor that lets you build the output schema based on the column names of the input schema. In this use case, change the order of the input schema columns and add three new columns, to obtain the following schema: empty_field, firstname, name, str, amount, filler1, filler2. Click OK to validate.


Set the tFileOutputDelimited properties manually. The schema is built-in for this scenario, and comes from the preceding component in the Job. Save the Job and press F6 to execute it.

The first column is empty and the rest of the columns have been cleaned of the unwanted characters. The street column has been moved, and the decimal delimiter has been changed from a dot to a comma, along with the currency sign.
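The escaping rules used in this scenario (the pipe as alternation, a backslash in front of special characters, and the doubled backslashes for a literal \t) can be tried out with the following Java sketch. Java's regular expressions are used here as a stand-in for the Perl ones of the Job, and the method names and sample values are hypothetical.

```java
public class ReplaceSketch {

    // Rules on the amount column: dot becomes comma, dollar becomes Euro sign.
    // String.replace works on literals, so no escaping is needed here.
    static String cleanAmount(String amount) {
        return amount.replace(".", ",").replace("$", "\u20AC");
    }

    // Rules on the str column: fix the typos as whole words, then drop pipes.
    static String cleanStreet(String str) {
        return str
                // Unescaped pipes mean "or"; word boundaries mimic Whole word.
                .replaceAll("\\b(?:stret|streat|stre)\\b", "Street")
                // An escaped pipe is the literal character.
                .replaceAll("\\|", "");
    }

    // Rules on the firstname column: drop [, ], + and *, then the literal
    // two-character sequence backslash followed by t (not a tabulation).
    static String cleanFirstName(String firstname) {
        return firstname
                .replaceAll("\\[|\\]|\\+|\\*", "")
                // Four backslashes in the Java literal give two in the regex,
                // which match one literal backslash in the data.
                .replaceAll("\\\\t", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        System.out.println(cleanAmount("$12.50"));
        System.out.println(cleanStreet("12 main stret|"));
        System.out.println(cleanFirstName("[Jo]hn*\\t+"));
    }
}
```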


tSampleRow
tSampleRow properties
Component family: Processing

Function: tSampleRow filters rows according to line numbers.

Purpose: tSampleRow helps to select rows according to a list of single lines and/or a list of groups of lines.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component in the Job.
Built-in: You create the schema and store it locally for the relevant component. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Range: Enter a range using the relevant syntax to choose a list of single lines and/or a list of groups of lines.

Usage: This component handles flows of data, therefore it requires input and output components.

Limitation: n/a

Scenario: Filtering rows and groups of rows


This Java scenario describes a three-component Job. A tRowGenerator is used to create random entries which are directly sent to a tSampleRow where they will be filtered according to a defined range. In this scenario, we suppose the input flow contains names of salespersons along with their respective number of sold products and their years of presence in the enterprise. The result of the filtering operation is displayed on the Run console. Drop the following components from the Palette onto the design workspace: tRowGenerator, tSampleRow, and tLogRow. Connect the three components using Row Main links.


In the design workspace, select tRowGenerator. Click the Component tab to define the basic settings for tRowGenerator. In the Basic settings view, set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to define the data you want to use as input. In this scenario, the schema is made of five columns.

In the Basic settings view, click RowGenerator Editor to define the data to be generated. In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows for RowGenerator field and click OK. The RowGenerator Editor closes.

In the design workspace, select tSampleRow.


Click the Component tab to define the basic settings for tSampleRow.

In the Basic settings view, set the Schema to Built-In and click Sync columns to retrieve the schema from the tRowGenerator component. In the Range panel, set the filter to select your rows using the correct syntax as explained. In this scenario, we want to select the first and fifth lines along with the group of lines between 9 and 12. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more information about tLogRow, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The filtering result displayed on the console shows the first and fifth rows and the group of rows between 9 and 12.
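The selection logic can be sketched in Java as shown below. The range string "1,5,9..12" (single line numbers separated by commas, groups written with two dots) is an assumption about the syntax, which this section does not spell out; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SampleRowSketch {

    // Expands a range such as "1,5,9..12" into the set of wanted line numbers.
    static Set<Integer> parseRange(String range) {
        Set<Integer> lines = new TreeSet<>();
        for (String part : range.split(",")) {
            String item = part.trim();
            int dots = item.indexOf("..");
            if (dots >= 0) {
                // Group of lines: both bounds are included.
                int from = Integer.parseInt(item.substring(0, dots).trim());
                int to = Integer.parseInt(item.substring(dots + 2).trim());
                for (int n = from; n <= to; n++) {
                    lines.add(n);
                }
            } else {
                // Single line.
                lines.add(Integer.parseInt(item));
            }
        }
        return lines;
    }

    // Keeps the rows whose 1-based position is listed in the range.
    static <T> List<T> sample(List<T> rows, String range) {
        Set<Integer> wanted = parseRange(range);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (wanted.contains(i + 1)) {
                out.add(rows.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 15; i++) {
            rows.add("row " + i); // hypothetical generated entries
        }
        System.out.println(sample(rows, "1,5,9..12"));
    }
}
```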


tSortRow
tSortRow properties
Component family: Processing

Function: Sorts input data based on one or several columns, by sort type and order.

Purpose: Helps create metrics and classification tables.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the previous component connected in the Job.
Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
Criteria: Click + to add as many lines as required for the sort to be complete. By default the first column defined in your schema is selected.
Schema column: Select the column label from your schema, which the sort will be based on. Note that the order is essential as it determines the sorting priority.
Sort type: Numerical and Alphabetical order are proposed. More sorting types to come.
Order: Ascending or descending order.

Advanced settings

Sort on disk: Customize the memory used to temporarily store output data.
Temp data directory path: Set the location where the temporary files should be stored.
Create temp data directory if not exists: Select this check box to create the directory if it does not exist.
Buffer size of external sort: Type in the size of physical memory you want to allocate to sort processing.


tStatCatcher Statistics: Select this check box to gather the job processing metadata at the job level as well as at each component level.

Usage: This component handles flows of data, therefore it requires input and output components and is defined as an intermediary step.

Limitation: n/a

Scenario: Sorting entries


This scenario describes a three-component Job. A tRowGenerator is used to create random entries which are directly sent to a tSortRow to be ordered following a defined value entry. In this scenario, we suppose the input flow contains names of salespersons along with their respective sales and their years of presence in the company. The result of the sorting operation is displayed on the Run console.

Drop the three components required for this use case: tRowGenerator, tSortRow and tLogRow from the Palette to the design workspace. Connect them together using Row main links. On the tRowGenerator editor, define the values to be randomly used in the Sort component. For more information regarding the use of this particular component, see tRowGenerator on page 1344.

In this scenario, we want to rank each salesperson according to their Sales value and their number of years in the company. Double-click tSortRow to display the Basic settings tab panel. Set the sort priority on the Sales value and, as secondary criterion, set the number of years in the company.


Use the plus button to add the number of rows required. Set the type of sorting: in this case, both criteria being integers, the sort is numerical. Finally, given that the output wanted is a rank classification, set the order as descending. Display the Advanced Settings tab and select the Sort on disk check box to modify the temporary memory parameters. In the Temp data directory path field, type the path to the directory where you want to store the temporary data. In the Buffer size of external sort field, set the maximum buffer value you want to allocate to the processing.
The default buffer value is 1000000 but the more rows and/or columns you process, the higher the value needs to be to prevent the Job from automatically stopping. In that event, an out of memory error message displays.

Make sure you connected this flow to the output component, tLogRow, to display the result in the job console. Press F6 to run the Job. The ranking is based first on the Sales value and then on the number of years of experience.
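The two-criteria ranking of this scenario corresponds to a chained comparator, as in the Java sketch below. It only illustrates the sort priority (Sales first, years second, both numerical and descending); the Salesperson class and the sample values are hypothetical, and the Sort on disk option is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortRowSketch {

    // Hypothetical row type standing in for the generated entries.
    static class Salesperson {
        final String name;
        final int sales;
        final int years;

        Salesperson(String name, int sales, int years) {
            this.name = name;
            this.sales = sales;
            this.years = years;
        }
    }

    // First criterion: Sales, descending. Second criterion: years, descending.
    static List<Salesperson> rank(List<Salesperson> input) {
        List<Salesperson> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt((Salesperson s) -> s.sales).reversed()
                .thenComparing(Comparator.comparingInt((Salesperson s) -> s.years).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Salesperson> rows = Arrays.asList(
                new Salesperson("Ann", 100, 2),
                new Salesperson("Bob", 250, 1),
                new Salesperson("Cid", 100, 7));
        for (Salesperson s : rank(rows)) {
            System.out.println(s.name + " | " + s.sales + " | " + s.years);
        }
    }
}
```

The order of the criteria matters: the second comparator is only consulted when two rows have the same Sales value.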


tXMLMap
tXMLMap properties
Component family: Processing

Function: tXMLMap is an advanced component fine-tuned for transforming and routing XML data flow (data of the document type).

Purpose: tXMLMap transforms and routes data from single or multiple sources to single or multiple destinations.

Basic settings

Map editor: It allows you to define the tXMLMap routing and transformation properties. If you do not want to handle execution errors, you can select the Die on error check box (selected by default) on the top toolbar in the output area. It will kill the Job if there is an error.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the job processing metadata at the job level as well as at each component level.

Usage: Possible uses range from a simple reorganization of fields to the most complex jobs of data multiplexing or demultiplexing transformation, concatenation, inversion, filtering and so on. It is used as an intermediate component and is well suited to processes requiring many XML data sources, such as the ESB request-response processes.

Limitation: The use of tXMLMap supposes minimum Java and XML knowledge in order to fully exploit its functionalities. This component is a junction step, and for this reason cannot be a start nor end component in the Job. One and only one loop element is required for each XML data flow involved. For further information, see Mapping data flows in the Talend Open Studio User Guide.

Scenario: Mapping and transforming XML data


In this scenario, a three-component Job is run to map and transform data from one XML source, customer.xml, and generate an XML output flow which could be reused for various purposes in the future, such as an ESB request. These three components are:
tFileInputXML: this component is used to provide input data to tXMLMap.
tXMLMap: this component maps and transforms the received XML data flows into one single XML data flow.
tLogRow: this component is used to display the output data.

To replicate this scenario, proceed as follows: From the Palette, drop tFileInputXML, tXMLMap and tLogRow into the Design workspace.
A component used in the workspace can be labelled the way you need. In this scenario, the input component tFileInputXML is labelled Customers. For further information about how to label a component, see section View tab in the Talend Open Studio User Guide.

Right-click the tFileInputXML component labelled Customers to open its contextual menu. From this menu, select Row > Main link to connect this component to tXMLMap. Repeat this operation to connect tXMLMap to tLogRow using the Row > *New output* (Main) link. A dialog box pops up to prompt you to name this output link. In this scenario, name it Customer_States.

Double-click the tFileInputXML component labelled Customers to display its Basic settings view.


Next to Edit schema, click the three-dot button to open the schema editor.

In the schema editor, click the plus button to add one row. In the Column column, type in a new name for this row. In this scenario, it is Customer. In the Type column, select the data type of this row. In this scenario, it is Document. The document data type is essential for making full use of tXMLMap. For further information about this data type, see section Using the document type in the Talend Open Studio User Guide. Click OK to validate this editing and accept the propagation prompted by the popup dialog box. One row is added automatically to the Mapping table. In the File name / Stream field, browse to, or type in the path to the XML source that provides the customer data. In the Loop XPath query field, type in / to replace the default one. This means the source data is queried from the root. In the XPath query column of the Mapping table, type in the XPath. In this scenario, type in ., meaning that all of the data from source are queried. In the Get Nodes column of the Mapping table, select the check box.
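The effect of the Loop XPath query / combined with the XPath query . can be illustrated outside the Studio. The following is a minimal sketch using the standard javax.xml.xpath API, not the code that tFileInputXML generates; the class name, the method name and the sample XML passed to it are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class LoopXPathSketch {

    // Evaluates "/" as the loop expression, then "." relative to the loop node,
    // and returns the name of the root element of the selected document.
    static String rootElementViaLoop(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // "/" selects the document node itself, so the loop runs once.
            Node loopNode = (Node) xpath.evaluate("/", doc, XPathConstants.NODE);

            // "." selects the context node, that is, the whole document again.
            Node selected = (Node) xpath.evaluate(".", loopNode, XPathConstants.NODE);

            return ((Document) selected).getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Because both expressions resolve to the document node, the whole XML tree is handed over as a single value, which is what the Document column expects.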

Double-click the tXMLMap component to open the Map Editor. Note that the input area is already filled with the defined input tables and that the top table is the main input table. In the left table, right-click Customer to open the contextual menu.

From this contextual menu, select Import From File and in the popup dialog box, browse to the corresponding source file in order to import from it the XML structure used by the data to be received by tXMLMap. In this scenario, the source file is Customer.xml, which is also the file that tFileInputXML (Customers) connects to. In the imported XML tree, right-click the Customer node and select As loop element to set it as the loop element.

On the lower part of this map editor, click the schema editor tab to display the corresponding view. On the right side of this view, click the plus button to add one row to the Customer table and rename this row as Customer.

In the Type column of this Customer_States row, select Document as the data type. The corresponding XML root is added automatically to the top table on the right side which represents the output flow.


On the right side in the top table labelled Customer, import the XML data structure that you need to use from the corresponding XML source file. In this scenario, it is Customer_State.xml.

Right-click the customer node and select As loop element from the contextual menu. Then you can begin to map the input flow to the output flow. In the top table on the input side (left) of the map editor, click the id node and drop it to the Expression column in the row corresponding to the output row you need to map. In this scenario, it is the @id node.

Do the same to map CustomerName to CustomerName, CustomerAddress to CustomerAddress and idState to idState from the left side to the right side. Click OK to validate the mappings and close the Map Editor. Press F6 to run this Job.


System components
This chapter details the main components that you can find in the System family of the Talend Open Studio Palette. The System family groups together components that help you to interact with the operating system.


tRunJob
tRunJob Properties
Component family: System

Function: tRunJob executes the Job called in the component properties, in the frame of the context defined.

Purpose: tRunJob helps mastering complex job systems which need to execute one Job after another.

Basic settings
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
  Use dynamic job: Select this check box to allow multiple Jobs to be called and processed. When this option is enabled, only the latest version of the Jobs can be called and processed. An independent process will be used to run the subjob. The Context and the Use an independent process to run subjob options disappear.
  Context job: This field is visible only when the Use dynamic job option is selected. Enter the name of the Job that you want to call from the list of Jobs selected.
  Copy Child Job Schema: Click to fetch the child job schema.
  Job: Select the Job to be called in and processed. Make sure you have already executed the called Job once beforehand, in order to ensure a smooth run through tRunJob.
  Version: Select the child Job version that you want to use.
  Context: If you defined contexts and variables for the Job to be run by the tRunJob, select the applicable context entry on the list.
  Use an independent process to run subjob: Select this check box to use an independent process to run the subjob. This helps in solving issues related to memory limits.
  Die on child error: Clear this check box to execute the parent Job even though there is an error when executing the child Job.
  Transmit whole context: Select this check box to get all the context variables from the parent Job. Deselect it to get all the context variables from the child Job.
  Context Param: You can change the selected context parameters. Click the plus button to add the parameters as defined in the Context of the child Job. For more information on context parameters, see Context settings in the Talend Open Studio User Guide.

Advanced settings
  Print Parameters: Select this check box to display the internal and external parameters in the Console.
  tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: This component can be used as a standalone Job or can help clarify a complex Job by avoiding having too many subjobs all together in one Job.

Global Variables
  Child return code: Indicates the Java return code of the child Job. This is available as an After variable. Returns an integer:
    - if no errors > the code value is 0.
    - if errors > an exception message shows.
  Child exception stack trace: Returns a Java stack trace from a child job. This is available as an After variable. Returns a string.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
  Outgoing links (from one component to another): Row: Main. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.
  Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Executing a child Job


This scenario describes a single-component Job that calls and executes another Job. The Job to be executed reads a basic delimited file and simply displays its content on the Run log console. What is particular about this scenario is that the child Job is executed from a separate parent Job and uses a context variable to set the input file to be processed.

Create the first Job reading the delimited file. Drop a tFileInputDelimited and a tLogRow from the Palette to the design workspace. Connect the two components together using a Row Main link. Double-click tFileInputDelimited to open its Basic settings view and define its properties.


Set the Property type to Built-In for this Job. Click in the File Name field and then press F5 to open the [New Context Parameter] dialog box and configure the context variable.

In the Name field, enter a name for this new context variable, File in this example. In this example, there is no need to either select the Prompt for value check box or to set a prompt message, as the default parameter value can be used. Click Finish to validate the modification and press Enter on your keyboard to make sure the new context variable is stored in the File Name field. In the Basic settings view, type in the field and row separators used in the input file. If needed, set Header, Footer, and Limit. In this example, no header or footer is used and no limit for the number of processed rows is set.


Set Schema type to Built-in for this example. Click the three-dot button next to the field name to open the schema dialog box where you can configure the schema manually. In the dialog box, click the plus button to add two columns and name them following the first and second column names of your input file, username and age in this example.
If you store your schema in the Repository tree view, you only need to select the relevant metadata entry corresponding to your input file structure.

Double-click tLogRow to display its Basic settings view and define its properties. Click Sync columns to retrieve the schema of the input component and then set other options according to your needs. Save your Job and press F6 to make sure that it executes without error. Create the second Job that will be the parent Job. Drop a tFileList and a tRunJob from the Palette to the design workspace. Connect the two components together using an Iterate link. Double-click tFileList to open its Basic settings view and define its properties.

In the Directory field, set the path to the directory that holds the files to be processed, or click the three-dot button next to the field to browse to the directory. In this example, the directory is called tRunJob and it holds three delimited files. In the FileList Type list, select Files. Select the Use Glob Expressions as Filemask check box to be able to use glob expressions (wildcards such as *) in your file masks. In the Files area, click the plus button to add a line where you can set the filter to apply. In this example, we want to retrieve only delimited files, so the filemask is *.csv. Double-click tRunJob to display its Basic settings view and define its properties.
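To see what a glob filemask such as *.csv selects, and how it differs from a regular expression, the following minimal sketch uses the glob matcher of the standard java.nio.file API. It is an illustration only and does not claim to be how tFileList evaluates its filemasks; the class name, method names and file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class GlobMaskSketch {

    // Returns true when the file name matches the glob mask, for example "*.csv".
    static boolean matchesGlob(String mask, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return matcher.matches(Paths.get(fileName));
    }

    // The same filter written as a regular expression needs ".*" and an escaped dot.
    static boolean matchesRegex(String regex, String fileName) {
        return fileName.matches(regex);
    }
}
```

For example, matchesGlob("*.csv", "customers.csv") and matchesRegex(".*\\.csv", "customers.csv") both accept the file, while neither accepts notes.txt.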


Click the three-dot button next to the Job field to open the [Find a Job] dialog box.

Select the child Job you want to execute and click OK to close the dialog box. The name of the selected Job displays in the Job field in the Basic settings view of tRunJob. Click Copy Child Job Schema to retrieve the schema from the child Job. In the Context Param area, click the plus button to add a line and define the context parameter. Click in the Values cell and then press Ctrl+Space on your keyboard to access the list of context variables. In the list, select tFileList_1.CURRENT_FILEPATH. The corresponding context variable displays in the Values cell: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")). For more information on context variables, see Context settings in the Talend Open Studio User Guide.
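The expression placed in the Values cell is an ordinary Java lookup in a map shared by the components of the Job. The following minimal sketch imitates that lookup with a plain java.util.Map. The key name follows the componentName_VARIABLE convention and is an assumption here, as are the class name, the helper methods and the sample path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map shared by the components of a Job.
final class GlobalMapSketch {

    // Simulates what the file list iteration would publish for the current file.
    static Map<String, Object> iterationState(String currentFilePath) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILEPATH", currentFilePath); // assumed key name
        return globalMap;
    }

    // Same shape as the expression in the Values cell: a get() followed by a cast.
    static String currentFilePath(Map<String, Object> globalMap) {
        return (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    }
}
```

The cast is needed because the map stores values as Object. If the key is missing, the lookup returns null rather than failing.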


Save your Job and press F6 to execute it.

The called-in Job reads the data contained in the input file, as defined by the input schema, and the result of this Job is displayed directly in the Run console. Related topic: tLoop on page 1362, and Scenario 1: Buffering data (Java) on page 1318 of the tBufferOutput component.


tSetEnv
tSetEnv Properties
Component family: System

Function: tSetEnv adds variables temporarily to system environment during the execution of a Job.

Purpose: tSetEnv allows to create variables and execute a job script through communicating the information about the newly created variables between subjobs. After job execution, the newly created variables are deleted.

Basic settings
  Parameters: Click the plus button to add the variables needed for the job.
    name: Enter the syntax for the new variable.
    value: Enter a parameter value according to the context.
    append: Select this check box to add the new variable at the end.

Usage: tSetEnv can be used as a start or an intermediate component.

Limitation: n/a

Scenario: Modifying the Date variable during the execution of a Job


The following scenario is a Perl Job that reads a column in an Oracle DB, retrieves the current date from the column using a DB query, creates a new variable using tSetEnv to modify the date format, and outputs the date, in the modified format, on the console. To modify the date format using a new variable created by tSetEnv: Drop the following components from the Palette onto the design workspace: tSetEnv, tOracleInput, and tLogRow. Connect tSetEnv to tOracleInput using an OnSubjobOk link. Connect tOracleInput to tLogRow using a Row Main link.


Select tSetEnv and click the Component tab to display the component view. In the Basic settings panel, click the plus button to add a new parameter line and define your new variable. Click in the name cell and enter the syntax for your date variable. In this example, we use NLS_DATE_FORMAT. Click in the value cell and enter the desired value for your new date variable.

In this example, we want to modify the date format DD-MMM-YY predefined in the system to display as YYYY-MM-DD. Select tOracleInput and click the Component tab to display the component view. Set the Basic settings for the tOracleInput component. For more information, see tOracleInput on page 689.


In this example, we query an Oracle database to extract the data held in the now column from the setenv table. Select tLogRow and click the Component tab to display the component view. Set the Basic settings for the tLogRow component as needed. For more information, see tLogRow on page 1305. Save your Job and press F6 to execute it.

The date displays on the console in the format YYYY-MM-DD set with the tSetEnv component. To display the date in the above Job in the format pre-defined in the system: In the design workspace, right-click the tSetEnv component and select Deactivate tSetEnv_1 from the drop-down list. Press F6 to execute your Job.


The date displays on the console in the format DD-MMM-YY pre-defined in the system.
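To compare the two display formats on the same date, the following minimal sketch formats one date with both patterns using the standard java.time API. It is an illustration only and does not go through Oracle or NLS_DATE_FORMAT. The class name and the sample date are hypothetical, and the patterns are written in Java syntax, which differs from the database notation used above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class DateFormatSketch {

    // Formats the given date with a Java pattern.
    // Java uses lower-case letters for year and day: yyyy-MM-dd, dd-MMM-yy.
    // In Java, upper-case YYYY and DD mean week-based year and day of year.
    static String format(int year, int month, int day, String javaPattern) {
        LocalDate date = LocalDate.of(year, month, day);
        return date.format(DateTimeFormatter.ofPattern(javaPattern, Locale.ENGLISH));
    }
}
```

With the sample date 9 March 2011, the pattern dd-MMM-yy yields 09-Mar-11 and the pattern yyyy-MM-dd yields 2011-03-09.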


tSSH
tSSH Properties
Component family: System

Function: Returns data from a remote computer, based on the secure shell command defined.

Purpose: Allows you to establish communication with a distant server and securely return sensitive information.

Basic settings
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the preceding component in the Job.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
  Host: IP address.
  Port: Listening port number.
  User: User authentication information.
  Authentication method (Public Key/Key Passphrase/Private Key): Select the relevant option. In case of Public Key, type in the passphrase, if required, in the Key Passphrase field and then in the Private key field, type in the private key or click the three-dot button next to the Private key field to browse to it.
  Authentication method (Password/Password): Select the relevant option. In case of Password, type in the required password in the Password field.
  Authentication method (Keyboard Interactive/Password): Select the relevant option. In case of Keyboard Interactive, type in the required password in the Password field.
  Pseudo terminal: Select this check box to call the interactive shell that performs the terminal operations.
  Command separator: Type in the command separator required. Once the Pseudo terminal check box is selected, this field becomes unavailable.
  Commands: Type in the command for the relevant information to be returned from the remote computer. When you select the Pseudo terminal check box, this table becomes a terminal emulator and each row in this table is a single command.
  Use timeout/timeout in seconds: Define the timeout time period. A timeout message will be generated if the actual response time exceeds this expected processing time.
  Standard Output: Select the destination to which the standard output is returned. The output may be returned to:
    - to console: the output is displayed in the console of the Run view.
    - to global variable: the output is indicated by the corresponding global variable.
    - both to console and global variable: the output is indicated by both means.
    - normal: the output is a standard ssh output.
  Error Output: Select the destination to which the error output is returned. The output may be returned to:
    - to console: the output is displayed in the console of the Run view.
    - to global variable: the output is indicated by the corresponding global variable.
    - both to console and global variable: the output is indicated by both means.
    - normal: the output is a standard ssh output.

Usage: This component can be used as a standalone component.

Global variables
  Standard Output: Indicates the standard execution output of the remote command. It is available as an After variable. Returns a String.
  Error output: Indicates the error execution output of the remote command. It is available as an After variable. Returns a String.
  Exit value: Indicates the exit status of the remote command. It is available as an After variable. Returns an Integer.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
  Outgoing links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
  Incoming links (from one component to another): Row: Main; Iterate. Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: The component use is optimized for Unix-like systems.

Scenario: Remote system information display via SSH


The following use case describes a basic Job that uses an SSH command to display the hostname of the distant server being connected to, and the current date on this remote system. The tSSH component is sufficient for this Job. Drop it from the Palette to the design workspace. Double-click the tSSH component and select the Basic settings view tab.


Type in the name of the Host to be accessed through SSH as well as the Port number. Fill in the User identification name on the remote machine. Select the Authentication method on the list. For this use case, the authentication method used is the public key, so fill in the corresponding Private key. In the Command field, type in the command to run. For this use case, type in hostname; date between single quotes (as the Job is generated in Perl). Select the Use timeout check box and set the time before falling in error to 5 seconds.

The remote machine returns the host name and the current date and time as defined on its system.
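The command used in this scenario is two shell commands joined by a separator. The following minimal sketch shows how a list of commands and a separator combine into one command line; it only builds the string and does not open an SSH connection. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class CommandLineSketch {

    // Joins individual commands into one line, for example "hostname; date".
    static String join(List<String> commands, String separator) {
        return String.join(separator + " ", commands);
    }
}
```

With the commands hostname and date and the separator ; the result is hostname; date, which the remote shell runs as two commands in sequence.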


tSystem
tSystem Properties
Component family: System

Function: tSystem executes one or more system commands.

Purpose: tSystem can call other processing commands, already up and running in a larger Job.

Basic settings
  Use home directory: Select this check box to change the name and path of a dedicated directory.
  Command: Enter the system command. Note that the syntax is not checked. In Windows, the MS-DOS commands do not allow you to pass directly from the current folder to the folder containing the file to be launched. To launch a file, you must therefore use an initial command to change the current folder, then a second one to launch the file.
  Standard Output and Error Output: Select the type of output for the processed data to be transferred to.
    to console: data is passed on to be viewed in the Run view.
    to global variable: data is passed on to an output variable linked to the tSystem component.
    to console and to global variable: data is passed on to the Run view and to an output variable linked to the tSystem component.
    normal: data is passed on to the component that comes next.
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to make changes to the schema. Note that if you make changes, the schema automatically becomes built-in. Click Sync columns to retrieve the schema from the preceding component in the Job.
    Built-in: You create and store the schema locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job flowcharts. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
  Environment variables: Click the [+] button to add as many global variables as needed.
    name: Enter the syntax of the new variable.
    value: Enter a value for this variable according to the context.

Usage: This component can typically be used by companies that have already implemented other applications and want to integrate them into their processing flow through Talend.

Global Variables
  Standard Output: Returns the standard output from a process. This is available as an After variable. Returns a string.
  Error Output: Returns the erroneous output from a process. This is available as an After variable. Returns a string.
  Exit Value: Returns an exit code. This is available as an After variable. Returns an integer:
    - if there are no errors > the exit code is 0.
    - if there are errors > the exit code is 1.
  For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.

Connections
  Outgoing links (from one component to another): Row: Main. Trigger: On Subjob Ok; On Subjob Error; Run if.
  Incoming links (from one component to another): Row: Main; Reject; Iterate. Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error; Synchronize; Parallelize.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a

Scenario: Echo Hello World!


This Java scenario is a two-component Job that executes a system command and shows the results in the Run view console.


Drop a tJava and a tSystem component from the Palette to the design workspace. Connect the two components using a Trigger > OnSubjobOk link.

When executing the Job, the first component triggers the second one. Double-click tSystem to open the Basic settings view and display the component properties.

In the Command field, enter the echo command followed by the string to display, Hello World! in this example. In the Standard Output field, select to global variable to send the output to a global variable. Keep the default parameters in the other fields. Double-click tJava to open the Basic settings view and define the component properties.


Enter the following Java command to display the tSystem output variable in the console: System.out.println("Hello World!"); Save your Job and press F6 to execute it.

The Job executes an echo command and shows the output in the Console of the Run view using a Println command in the tJava component.
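What the two components do together, running a system command, keeping its standard output and exit value, then printing the output, can be sketched in plain Java. The following is a minimal sketch using the standard ProcessBuilder API, not the code generated for tSystem. The class and field names are hypothetical, and it assumes a system where echo is available as an executable, as on Unix-like systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class SystemCommandSketch {

    // Holds the two values that the component exposes as After variables.
    static final class Result {
        final String standardOutput;
        final int exitValue;

        Result(String standardOutput, int exitValue) {
            this.standardOutput = standardOutput;
            this.exitValue = exitValue;
        }
    }

    // Runs the command, captures its standard output and waits for the exit code.
    static Result run(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // error output is ignored in this sketch
                    .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitValue = process.waitFor();
            return new Result(output.trim(), exitValue);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the command", e);
        }
    }
}
```

Calling run("echo", "Hello World!") and printing the standardOutput field gives the same console line as the Job. In a Job, the captured output would instead be read from the tSystem global variable, for example with a globalMap lookup on a key such as tSystem_1_OUTPUT, which is an assumed name here.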


Talend MDM components


This chapter details the main components that you can find in the Talend MDM family of the Talend Open Studio Palette. The Talend MDM family groups together connectors that read and write master data in the MDM hub (XML repository).


tMDMBulkLoad
tMDMBulkLoad properties
Component family: Talend MDM

Function: tMDMBulkLoad writes XML structured master data into the MDM hub in bulk mode.

Purpose: This component uses bulk mode to write data so that big batches of data or data of high complexity can be quickly uploaded onto the MDM server.

Basic settings
  Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to modify the schema. If you modify the schema, it automatically becomes built-in. Click Sync columns to collect the schema from the previous component.
    Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
    Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.
  XML field: Select the name of the column in which you want to write the XML data.
  URL: Type in the URL required to access the MDM server.
  Username and Password: Type in the user authentication data for the MDM server.
  Version: Type in the name of the Version of master data you want to connect to, for which you have the required user rights. Leave this field empty if you want to display the default Version of master data.
  Data model: Type in the name of the data model against which the data to be written is validated.
  Data Container: Type in the name of the data container where you want to write the master data.
  Entity: Type in the name of the entity that holds the data record(s) you want to write.
  Validate: Select this check box to validate the data you want to write onto the MDM server against validation rules defined for the current data model. For more information on how to set the validation rules, see Data Models in your Talend Master Data Management Administrator Guide. If you need faster loading performance, do not select this check box.
  Generate ID: Select this check box to generate an ID number for all of the data written. If you need faster loading performance, do not select this check box.
  Commit size: Type in the row count of each batch to be written onto the MDM server.

Advanced settings
  tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Connections
  Outgoing links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
  Incoming links (from one component to another): Row: Main. Trigger: Run if; On Component Ok; On Component Error; On Subjob Ok; On Subjob Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Usage: This component always needs an incoming link that provides XML structured data. If your data is not yet in the XML structure, you need to use components like tWriteXMLField to transform this data into the XML structure. For further information about tWriteXMLField, see tWriteXMLField on page 1601.

Enhancing the MDM bulk data load


The information below concerns only MDM used with eXist. As XML parsing is a CPU and memory consuming process, it is not really compatible with large datasets. The Scenario: Loading records into a business entity, given in a following section as an example for the tMDMBulkLoad component, has some limitations because it cannot work with large datasets, for the time being at least. An alternative scenario in which you process the dataset file per bulk load iterations can be designed as follows:


In such a scenario, the tMDMBulkLoad component waits for XML data as an input. You must manually format this incoming data to match the entity schema defined in the MDM Studio. Most of the time, the data you want to import is in a flat format, and you have to transform it into XML. As XML parsing is memory consuming, you can work around this problem by splitting your source file into several files using the tAdvancedFileOutputXML component. To do this, you select the Split output in several files option in the Advanced settings view of the component and then set the number of rows in each output file through a context variable (context.chunkSize), for example.
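The effect of splitting the source rows by a chunk size can be sketched as follows. This is a minimal illustration in plain Java, not the logic of tAdvancedFileOutputXML; the class and method names are hypothetical, and the chunkSize parameter stands in for the context.chunkSize variable mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ChunkSketch {

    // Splits the rows into consecutive groups of at most chunkSize rows.
    // Each group corresponds to one output file in the scenario.
    static <T> List<List<T>> split(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }
}
```

Five rows with a chunk size of 2 give three groups, the last one holding a single row, so the last file is usually smaller than the others.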

The XML schema you must define in the XML editor of this component should be an exact match of the business entity defined in the MDM Studio. The XML schema in the editor must represent a single <root> element which contains all the other elements, so that you can loop on each of the elements. The path of the file should be defined in a temporary folder. Use a tFileList component to read all the XML files that have just been created. This component enables you to parallelize the process. Connect it to a tFileInputXML component using the Iterate link.

For the Iterate link, it is recommended that you set as many threads as there are physical cores on the computer. You can obtain the number of processors available to the JVM using Runtime.getRuntime().availableProcessors().
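As a minimal sketch of the recommendation above, the following standalone Java class reads the processor count that the JVM reports. The class and method names (IterateThreads, recommendedThreads) are hypothetical and are not part of Talend Open Studio.

```java
// Illustrative helper, not a Talend API: derives a thread count for the Iterate link.
class IterateThreads {

    // Returns the number of processors the JVM reports, never less than 1.
    static int recommendedThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("Threads for the Iterate link: " + recommendedThreads());
    }
}
```

Note that availableProcessors() counts the processors available to the JVM, which can include logical (hyper-threaded) cores, so the value may be higher than the number of physical cores.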

The tFileInputXML component will read the data from the XML files you have created, by defining a loop on the elements, and getting all the nodes that are already formatted as XML. You must then select the Get Nodes check box.

Finally, you must set up the tMDMBulkLoad component as follows:

Make sure that you set the commit size to the same value as the one you defined in tAdvancedFileOutputXML, that is, the context.chunkSize context variable.

In such a scenario, the tFileDelete component deletes all the temporary data at the end of the Job.
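The chunking idea behind this alternative scenario can be sketched in plain Java as follows. This is only an in-memory illustration under stated assumptions: the class ChunkSplitter and its split method are hypothetical, and in the actual Job the splitting is done by tAdvancedFileOutputXML, with context.chunkSize reused as the commit size.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits rows into chunks the way a chunkSize setting would.
class ChunkSplitter {

    // Splits rows into consecutive chunks of at most chunkSize rows each.
    static <T> List<List<T>> split(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("row" + i);
        }
        // Each printed chunk stands for one temporary XML file and one commit.
        for (List<String> chunk : split(rows, 4)) {
            System.out.println(chunk);
        }
    }
}
```

Keeping the chunk size and the commit size identical means that each temporary file corresponds to exactly one commit.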


Scenario: Loading records into a business entity


This scenario describes a Job that loads records into the ProductFamily business entity defined by a specific data model in the MDM hub.

Prerequisites of this Job:
- The Product data container: this data container is used to separate the product master data domain from the other master data domains.
- The Product data model: this data model is used to define the attributes, validation rules, user access rights and relationships of the entities of interest. It therefore defines the attributes of the ProductFamily business entity.
- The ProductFamily business entity: this business entity contains Id and Name, both defined by the Product data model.

For further information about how to create a data container, a data model, and a business entity along with its attributes, see the Talend Master Data Management Administrator Guide.

The Job in this scenario uses three components:
- tFixedFlowInput: this component generates the records to be loaded into the ProductFamily business entity. In a real-world case, the records to be loaded are often voluminous and stored in a specific file. To make this scenario easier to replicate, this Job uses tFixedFlowInput to generate four sample records.
- tWriteXMLField: this component transforms the incoming data into an XML structure.
- tMDMBulkLoad: this component writes the incoming data into the ProductFamily business entity in bulk mode, generating an ID value for each record.

For the time being, tWriteXMLField has some limitations when used with very large datasets. Another scenario can be used to enhance the MDM bulk data load. For further information, see Enhancing the MDM bulk data load on page 1515.

To replicate this scenario, proceed as follows:
- Drop tFixedFlowInput, tWriteXMLField and tMDMBulkLoad onto the design workspace.
- Right-click tFixedFlowInput to open its contextual menu.
- Select Row > Main to connect tFixedFlowInput to the following component using a Main link.
- Do the same to link the other components.
- Double-click tFixedFlowInput to open its Basic settings view.


Click the three-dot button next to Edit schema to open the schema editor.

- In the schema editor, click the plus button to add one row.
- Click the new row and type in the new name: family.
- Click OK.
- In the Mode area of the Basic settings view, select the Use inline table option.
- Under the inline table, click the plus button four times to add four rows to the table.
- In the inline table, click each of the added rows and type in their names between the quotation marks: Shirts, Hats, Pets, Mugs.
- Double-click tWriteXMLField to open its Basic settings view.


Click the three-dot button next to the Edit schema field to open the schema editor where you can add a row by clicking the plus button.

- Click the newly added row in the right view of the schema editor and type in the name of the output column where you want to write the XML content. In this example, type in xmlRecord.
- Click OK to validate this output schema and close the schema editor.
- In the dialog box that pops up, click OK to propagate this schema to the following component.
- On the Basic settings view, click the three-dot button next to Configure Xml Tree to open the interface that helps to create the XML structure.

In the Link Target area, click rootTag and rename it as ProductFamily, which is the name of the business entity used in this scenario.

- In the Linker source area, drop family onto ProductFamily in the Link target area. A dialog box displays asking what type of operation you want to do.
- Select Create as sub-element of target node to create a sub-element of the ProductFamily node. The family element then appears under the ProductFamily node.
- In the Link target area, click the family node and rename it as Name, which is one of the attributes of the ProductFamily business entity.
- Right-click the Name node and select Set As Loop Element from the contextual menu.
- Click OK to validate the XML structure you defined.
- Double-click tMDMBulkLoad to open its Basic settings view.

- Click the XML Field list and select xmlRecord.
- In the URL field, enter the MDM server URL, between quotes: for example, "http://localhost:8080/talend/TalendPort".
- In the Username and Password fields, enter your login and password to connect to the MDM server.
- In the Data Model and the Data Container fields, enter the names corresponding to the data model and the data container you need to use. Both are Product for this scenario.
- In the Entity field, enter the name of the business entity into which the records are to be loaded. In this example, type in ProductFamily.
- Select the Generate ID check box in order to generate ID values for the records to be loaded.
- In the Commit size field, type in the batch size to be written into the MDM hub in bulk mode.
- Press F6 to run the Job.
- Log into your Talend MDM Web User Interface to check the newly added records for the ProductFamily business entity.
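To make the XML structure defined in this scenario more concrete, the following Java sketch builds one XML record per sample family name, with ProductFamily as the root tag and Name as its sub-element. The class ProductFamilyXml and its methods are hypothetical. The exact output of tWriteXMLField (for example an XML declaration or whitespace) may differ, and the Id value generated by the Generate ID option is not shown.

```java
// Illustrative only: mimics the shape of the xmlRecord column built in this scenario.
class ProductFamilyXml {

    // Escapes the characters that are special in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps one family name in the ProductFamily/Name structure.
    static String toXmlRecord(String family) {
        return "<ProductFamily><Name>" + escape(family) + "</Name></ProductFamily>";
    }

    public static void main(String[] args) {
        // The four sample values typed into the inline table of tFixedFlowInput.
        for (String family : new String[] {"Shirts", "Hats", "Pets", "Mugs"}) {
            System.out.println(toXmlRecord(family));
        }
    }
}
```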


tMDMDelete
tMDMDelete properties
Component family: Talend MDM

Function: tMDMDelete deletes data records from specific entities in the MDM hub (XML repository).

Purpose: This component deletes master data in an MDM hub.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to modify the schema. If you modify the schema, it automatically becomes built-in. Click Sync columns to collect the schema from the previous component.
  Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

URL: Type in the URL required to access the MDM server.

Username and Password: Type in the user authentication data for the MDM server.

Version: Type in the name of the Version of master data you want to connect to, for which you have the required user rights. Leave this field empty if you want to display the default Version of master data.

Entity: Type in the name of the entity that holds the data record(s) you want to delete.

Data Container: Type in the name of the data container that holds the data record(s) you want to delete.

Keys: Specify the field(s), in sequence order, composing the key when the entity has a multiple key.

Logical delete: Select this check box to send the master data to the Recycle bin and fill in the Recycle bin path. Once in the Recycle bin, the master data can be permanently deleted or restored. If you leave this check box clear, the master data will be permanently deleted.

Die on error: Select this check box to skip the row in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: Use this component to write a file and separate the fields using a specific separator.

Scenario: Deleting master data from an MDM hub


This scenario describes a two-component Job that deletes the specified data record from the MDM XML repository.

Drop tMDMInput and tMDMDelete of the Talend MDM family from the Palette onto the design workspace. Connect the two components together using a Row > Main link. Double-click tMDMInput to display its Basic settings view and define the component properties.

In the Property Type list, select Built-in to complete the fields manually. If you have stored the MDM connection information in the repository metadata, select Repository from the list and the fields will be completed automatically.


In the Schema Type list, select Built-in and click [...] next to Edit schema to open a dialog box. Here you can define the structure of the master data you want to read in the MDM hub.

The master data is collected in a four-column schema of the type String: Id, Name, City and State.
- Click OK to close the dialog box and proceed to the next step.
- In the URL field, enter the MDM server URL, between quotes: for example, "http://localhost:8080/talend/TalendPort".
- In the Username and Password fields, enter your login and password to connect to the MDM server.
- In the Version field, enter between quotes the name of the master data Version you want to access. Leave this field empty to access the default master data Version.
- In the Entity field, enter between quotes the name of the business entity that holds the data record(s) you want to read. Here, we want to access the Agency entity.
- In the Data Container field, enter between quotes the name of the data container that holds the master data you want to read, the DStar container in this example.

The Use multiple conditions check box is selected by default.

In the Operations table, define the conditions to filter the master data you want to delete as follows:
- Click the plus button to add a new line.
- In the Xpath column, enter between quotes the Xpath and the tag of the XML node on which you want to apply the filter. In this example, we work with the Agency entity, so enter Agency/Id.
- In the Function column, select the function you want to use. In this scenario, we use the Starts With function.
- In the Value column, enter the value of your filter. Here, we want to filter the master data whose Id starts with TA.

In the Component view, click Advanced settings to set the advanced parameters.


In the Loop XPath query field, enter between quotes the structure and the name of the XML node on which the loop is to be carried out. In the Mapping table and in the XPath query column, enter between quotes the name of the XML tag in which you want to collect the master data, next to the corresponding output column name. In the design workspace, double-click the tMDMDelete component to display the Basic settings view and set the component properties.

In the Schema list, select Built-in and click the three-dot button next to the Edit Schema field to describe the structure of the master data in the MDM hub.


- Click the plus button to the right to add one column of the type String. Set the name of this column to xmlOutput.
- Click OK to close the dialog box and proceed to the next step.
- In the URL field, enter the URL required to connect to the MDM server, for example: "http://localhost:8080/talend/TalendPort".
- In the Username and Password fields, enter the authentication information required to connect to the server.
- In the Version field, enter between inverted commas the name of the master data Version you want to access. Leave the field empty if you want to access the default Version.
- In the Entity field, enter the name of the business entity that holds the master data you want to delete, the Agency entity in this example.
- In the Data Container field, enter the name of the data container that holds the data to be deleted, DStar in this example.
- In the Keys table, click the plus button to add a new line.
- In the Keys column, select the column that holds the key of the Agency entity. Here, the key of the Agency entity is set on the Id field.

If the entity has multiple keys, add as many lines as required for the keys and select them in sequential order.

- Select the Logical delete check box if you do not want to delete the master data permanently. This will send the deleted data to the Recycle bin. Once in the Recycle bin, the master data can be restored or permanently deleted. If you leave this check box clear, the master data will be permanently deleted.
- Fill in the Recycle bin path field. Here, we left the default path, but if your Recycle bin is in a different location, specify its path.
- Press Ctrl+S to save your Job and F6 to execute it.

The master data records whose Id starts with TA are deleted and sent to the MDM Recycle bin.
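The effect of the Starts With condition used in this scenario can be illustrated with a short Java sketch. It is only an in-memory analogy: the class StartsWithFilter, its method and the sample Id values are hypothetical, and in the Job the filtering is performed by tMDMInput on the Agency/Id Xpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: selects the Ids that a Starts With condition would match.
class StartsWithFilter {

    // Keeps the Ids that begin with the given prefix, preserving their order.
    static List<String> idsToDelete(List<String> ids, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String id : ids) {
            if (id.startsWith(prefix)) {
                matches.add(id);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical Agency Ids; only those starting with TA would be passed to tMDMDelete.
        List<String> ids = Arrays.asList("TA01", "NY02", "TA15", "LA07");
        System.out.println(idsToDelete(ids, "TA"));
    }
}
```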


tMDMInput
tMDMInput properties
Component family: Talend MDM

Function: tMDMInput reads master data in the MDM hub (XML repository).

Purpose: This component reads master data in an MDM hub and thus makes it possible to process this data.

Basic settings

Property Type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file where properties are stored. The fields that follow are completed automatically using the fetched data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to modify the schema. If you modify the schema, it automatically becomes built-in.
  Built-in: The schema will be created and stored for this component only. Related Topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the repository. You can reuse it in various projects and jobs. Related Topic: How to set a repository schema in the Talend Open Studio User Guide.

URL: Type in the URL to access the MDM server.

Username and Password: Type in user authentication data for the MDM server.

Version: Type in the name of the master data Version you want to connect to and to which you have access rights. Leave this field empty if you want to display the default Version.

Entity: Type in the name of the business entity that holds the master data you want to read.

Data Container: Type in the name of the data container that holds the master data you want to read.

Use multiple conditions: Select this check box to filter the master data using certain conditions.
  Xpath: Enter between quotes the path and the XML node to which you want to apply the condition.
  Function: Select the condition to be used from the list.
  Value: Enter between inverted commas the value you want to use.
  Predicate: Select a predicate if you use more than one condition.
  If you clear this check box, you have the option of selecting particular IDs to be displayed in the ID value column of the IDS table.
  If you clear the Use multiple conditions check box, the Batch Size option in the Advanced Settings tab will no longer be available.

Skip rows: Enter the number of lines to be ignored.

Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Die on error: Select this check box to skip the row in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.

Advanced settings

Batch Size: Number of lines in each processed batch. This option is not displayed if you have cleared the Use multiple conditions check box in the Basic settings view.

Loop XPath query: The XML structure node on which the loop is based.

Mapping:
  Column: reflects the schema as defined in the Edit schema editor.
  XPath query: Type in the name of the fields to extract from the input XML structure.
  Get Nodes: Select this check box to retrieve the Xml node together with the data.

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: Use this component as a start component. It needs an output flow.

Scenario: Reading master data in an MDM hub


This scenario describes a two-component Job that reads master data on an MDM server. The master data is fetched and displayed in the log console.

From the Palette, drop tMDMInput and tLogRow onto the design workspace.

Connect the two components together using a Row > Main link. Double-click tMDMInput to open the Basic settings view and define the component properties.

In the Property Type list, select Built-In to complete the fields manually. If you have stored the MDM connection information in the repository metadata, select Repository from the list and the fields will be completed automatically. In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box. Here you can define the structure of the master data you want to read on the MDM server.

The master data is collected in a three column schema of the type String: ISO2Code, Name and Currency. Click OK to close the dialog box and proceed to the next step. In the URL field, enter between inverted commas the URL of the MDM server. In the Username and Password fields, enter your login and password to connect to the MDM server.


In the Version field, enter between inverted commas the name of the master data Version you want to access. Leave this field empty to display the default Version. In the Entity field, enter between inverted commas the name of the business entity that holds the master data you want to read. In the Data Container field, enter between inverted commas the name of the data container that holds the master data you want to read. In the Component view, click Advanced settings to set the advanced parameters.

In the Loop XPath query field, enter between inverted commas the structure and the name of the XML node on which the loop is to be carried out. In the Mapping table and in the XPath query column, enter between inverted commas the name of the XML tag in which you want to collect the master data, next to the corresponding output column name. In the design workspace, click on the tLogRow component to display the Basic settings in the Component view and set the properties. Click on Edit Schema and ensure that the schema has been collected from the previous component. If not, click Sync Columns to fetch the schema from the previous component. Save the Job and press F6 to run it.


The list of different countries along with their codes and currencies is displayed on the console of the Run view.
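The interplay between the Loop XPath query and the Mapping table can be sketched with the standard javax.xml.xpath API: the loop expression selects the repeating node, and each mapped XPath query is evaluated relative to that node to fill one output column. This is an illustration under assumptions, not the component's implementation. The class LoopAndMappingSketch, the sample XML and the Countries wrapper element are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: one output row per node matched by the loop XPath.
class LoopAndMappingSketch {

    // Evaluates loopXPath on the XML, then each column XPath relative to every looped node.
    static List<String[]> extract(String xml, String loopXPath, String[] columnXPaths) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    loopXPath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String[]> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String[] row = new String[columnXPaths.length];
                for (int c = 0; c < columnXPaths.length; c++) {
                    row[c] = xpath.evaluate(columnXPaths[c], nodes.item(i));
                }
                rows.add(row);
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression or XML input", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data using the three columns of this scenario.
        String xml = "<Countries>"
                + "<Country><ISO2Code>FR</ISO2Code><Name>France</Name><Currency>EUR</Currency></Country>"
                + "<Country><ISO2Code>JP</ISO2Code><Name>Japan</Name><Currency>JPY</Currency></Country>"
                + "</Countries>";
        String[] mapping = {"ISO2Code", "Name", "Currency"};
        for (String[] row : extract(xml, "/Countries/Country", mapping)) {
            System.out.println(String.join("|", row));
        }
    }
}
```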


tMDMOutput
tMDMOutput properties
Component family: Talend MDM

Function: tMDMOutput writes master data in an MDM hub (XML repository).

Purpose: This component writes master data on the MDM server.

Basic settings

Property Type: Either Built-in or Repository.
  Built-in: No property data stored centrally.
  Repository: Select the repository file where the properties are stored. The fields which follow are filled in automatically using the fetched data.

Schema and Edit schema: An input schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically becomes built-in. Click Sync columns to collect the schema from the previous component.
  Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

XML Field: Lists the name of the xml output column that will hold the XML data.

URL: Type in the URL of the MDM server.

Username and Password: Type in the user authentication data for the MDM server. This user should have the right role in MDM, i.e. can connect through a Job or any other web service call. For further information, see Security in Talend Master Data Management Administrator Guide.

Version: Type in the name of the master data management Version you want to connect to, for which you have the user rights required. Leave this field empty if you want to display the default perspective.

Data Model: Type in the name of the data model against which the data to be written is validated.

Data Container: Type in the name of the data container where you want to write the master data.

Return Keys: Columns corresponding to IDs in order: in sequential order, set the output columns that will store the return key values (primary keys) of the item(s) that will be created.

Is Update: Select this check box to update the modified fields. If you leave this check box unchecked, all fields will be replaced by the modified ones.

Fire Create/Update event: Select this check box to add the actions carried out to a modification report.
  Source Name: Between quotes, enter the name of the application to be used to carry out the modifications.
  Enable verification by before saving process: Select this check box to verify the commit that has just been added, prior to saving.

Use partial update: Select this check box if you need to update multi-occurrence elements (attributes) of an existing item (entity) from the content of a source XML stream. Once selected, you need to set the parameters presented below:

- Pivot: type in the xpath to the multi-occurrence sub-element where data needs to be added or replaced in the item of interest.
  For example, if you need to add a child sub-element to the existing item below:

  <Person>
    <Id>1</Id> <!-- record key is mandatory -->
    <Children>
      <Child>[1234]</Child> <!-- FK to a Person Entity -->
    </Children>
  </Person>

  then the Xpath you enter in the Pivot field must read Person/Children/Child, with the Overwrite check box cleared.
  And, if you need to replace a child sub-element in an existing item:

  <Person>
    <Id>1</Id>
    <Addresses>
      <Address>
        <Type>office</Type>
        (...address elements are here....)
      </Address>
      <Address>
        <Type>home</Type>
        (...address elements are here....)
      </Address>
    </Addresses>
  </Person>

  then the Xpath you enter in the Pivot field must read Person/Addresses/Address, with the Overwrite check box selected and the Key field set to Person/Addresses/Address/Type. In this example, assuming the item in MDM only has an office address, the office address will be replaced, and the home address will be added.
- Overwrite: select this check box if you need to replace or update the original sub-elements with the input sub-elements. Leave it cleared if you want to add a sub-element.
- Key: type in the xpath relative to the pivot that will help match a sub-element of the source XML with a sub-element of the item. If a key is not supplied, all sub-elements of an item with an XPath matching that of the sub-element of the source XML will be replaced.
- Position: type in a number to indicate the position after which the new elements (those that do not match the key) will be added. If you do not provide a value in this field, the new elements will be added at the end.

Die on error: Select this check box to skip the row in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.
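The Overwrite and Key rules described for the partial update can be modelled with a small in-memory sketch. This is a simplified illustration, not the MDM implementation: the class PartialUpdateSketch is hypothetical, each sub-element is reduced to a pair {key value, content}, and the Position parameter is left out, so unmatched elements are always added at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: each sub-element is modelled as {keyValue, content}.
class PartialUpdateSketch {

    // Merges incoming sub-elements into the existing ones following the Overwrite and Key rules.
    static List<String[]> merge(List<String[]> existing, List<String[]> incoming,
                                boolean overwrite, boolean useKey) {
        if (!overwrite) {
            // Overwrite cleared: incoming sub-elements are simply added.
            List<String[]> result = new ArrayList<>(existing);
            result.addAll(incoming);
            return result;
        }
        if (!useKey) {
            // Overwrite selected without a key: all existing sub-elements are replaced.
            return new ArrayList<>(incoming);
        }
        List<String[]> result = new ArrayList<>(existing);
        for (String[] candidate : incoming) {
            boolean replaced = false;
            for (int i = 0; i < result.size(); i++) {
                if (result.get(i)[0].equals(candidate[0])) {
                    result.set(i, candidate); // key matches: replace in place
                    replaced = true;
                    break;
                }
            }
            if (!replaced) {
                result.add(candidate); // no key match: add at the end
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> existing = new ArrayList<>();
        existing.add(new String[] {"office", "old office address"});
        List<String[]> incoming = new ArrayList<>();
        incoming.add(new String[] {"office", "new office address"});
        incoming.add(new String[] {"home", "home address"});
        for (String[] address : merge(existing, incoming, true, true)) {
            System.out.println(address[0] + ": " + address[1]);
        }
    }
}
```

With the Type element as the key, the sketch reproduces the example above: the office address is replaced and the home address is added.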


Advanced settings

Extended Output: Select this check box to commit master data in batches. You can specify the number of lines per batch in the Rows to commit field.

Configure Xml Tree: Opens the interface which helps create the XML structure of the master data you want to write.

Group by: Select the column to be used to regroup the master data.

Create empty element if needed: This check box is selected by default. If the content of the interface's Related Column, which enables creation of the XML structure, is null, or if no column is associated with the XML node, this option creates an opening and closing tag at the required places.

Advanced separator (for number): Select this check box to change the separators used for numbers by default.
  - Thousands separator: enter between inverted commas the separator for thousands.
  - Decimal separator: enter between inverted commas the decimal separator.

Generation mode: Select the generation mode you want to use, according to the available memory.
  - Fast but memory consuming (Dom4J)
  - Slow with no memory consumed

Encoding: Select the encoding type from the list or else select Custom and define it manually. This is an obligatory field for the manipulation of data on the server.

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: Use this component to write a data record and separate the fields using a specific separator.

Scenario: Writing master data in an MDM hub


This scenario describes a two-component Job that generates a data record, transforms it into XML and loads it into the defined business entity in the MDM server.

In this example, we want to load a new agency in the Agency business entity. This new agency should have an id, a name and a city. From the Palette, drop tFixedFlowInput and tMDMOutput onto the design workspace. Connect the components using a Row > Main link. Double-click tFixedFlowInput to view its Basic settings in the Component tab and set the component properties.


In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box in which you can define the structure of the master data you want to write on the MDM server.

Click the plus button and add three columns of the type String. Name the columns: Id, Name and City. Click OK to validate your changes and proceed to the next step. In the Number of rows field, enter the number of rows you want to generate. In the Mode area, select the Use Single Table option to generate just one table. In the Value fields, enter between inverted commas the values which correspond to each of the schema columns. In the design workspace, click tMDMOutput to open its Basic settings view and set the component properties.


In the Property Type list, select Built-In and complete the fields manually. If you have saved the MDM connection information under Metadata in the repository, select Repository from the list and the fields which follow will be completed automatically. In the Schema list, select Built-In and, if required, click on the three dot button next to the Edit Schema field to see the structure of the master data you want to load on the MDM server.

The tMDMOutput component basically generates an XML document, writes it in an output field, and then sends it to the MDM server, so the output schema always has a read-only xml column.


Click OK to proceed to the next step. The XML Field list in the Basic settings view is automatically filled in with the output xml column. In the URL field, enter the URL of the MDM server. In the Username and Password fields, enter the authentication information required to connect to the MDM server. In the Version field, enter between inverted commas the name of the master data Version you want to access, if more than one exists on the server. Leave the field blank to access the default Version. In the Data Model field, enter between inverted commas the name of the data model against which you want to validate the master data you want to write. In the Data Container, enter between inverted commas the name of the data container into which you want to write the master data. In the Component view, click Advanced settings to set the advanced parameters for the tMDMOutput component.

Select the Extended Output check box if you want to commit master data in batches. You can specify the number of lines per batch in the Rows to commit field. Click the three-dot button next to Configure Xml Tree to open the tMDMOutput editor.


In the Link target area to the right, click in the Xml Tree field and then replace rootTag with the name of the business entity in which you want to insert the data record, Agency in this example. In the Linker source area, select your three schema columns and drop them on the Agency node. The [Selection] dialog box displays.

Select the Create as sub-element of target node option so that the three columns are linked to the three XML sub-elements of the Agency node and then click OK to close the dialog box.

Right-click the element in the Link Target area you want to set as a loop element and select Set as Loop Element from the contextual menu. In this example, we want City to be the iterating object.

Click OK to validate your changes and close the dialog box. Save your Job and press F6 to run it. The new data record is inserted in the Agency business entity in the DStar data container on the MDM server. This data record holds, as you defined in the schema, the agency id, the agency name and the agency city.


tMDMReceive
tMDMReceive properties

Component family: Talend MDM

Function: tMDMReceive receives an MDM record in XML from MDM triggers or MDM processes.

Purpose: This component decodes a context parameter holding MDM XML data and transforms it into a flat schema.

Basic settings

Property Type: Either Built-in or Repository.
  Built-in: No property data is stored centrally.
  Repository: Select the repository file where the properties are stored. The fields that follow are completed automatically using the fetched data.

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Click Edit Schema to modify the schema. If you modify the schema, it automatically becomes built-in.
  Built-in: The schema will be created and stored for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository. You can reuse it in various projects and Jobs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

XML Record: Enter the context parameter used to retrieve the last changes made to the MDM server. For more information about creating and using a context parameter, see How to centralize contexts and variables in the Talend Open Studio User Guide.

XPath Prefix: If required, select from the list the prefix of the looping XPath expression. The looping expression is the concatenation of the prefix and the loop XPath.
  /item: select this XPath prefix when the component receives the record from a process, because processes encapsulate the record within an item element only.
  /exchange/item: select this XPath prefix when the component receives the record from a trigger, because triggers encapsulate the record within an item element which is within an exchange element.

Loop XPath query: Set the XML structure node on which the loop is based.


Mapping:
  Column: reflects the schema as defined in the Edit schema editor.
  XPath query: Type in the name of the fields to extract from the input XML structure.
  Get Nodes: Select this check box to retrieve the XML node together with the data.

Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: Use this component as a start component. It needs an output flow.
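
As an illustration of how the XPath prefix, the loop XPath and the Mapping table combine, the following Python sketch flattens an MDM record outside of the Studio. It is not the code generated by tMDMReceive: the sample payloads, the flatten_record function and the column names are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads: a trigger wraps the record in exchange/item,
# a process wraps it in item only (as described for the XPath Prefix field).
TRIGGER_XML = (
    "<exchange><item><Product>"
    "<Id>231035938</Id><Name>Talend Mug</Name>"
    "</Product></item></exchange>"
)
PROCESS_XML = (
    "<item><Product>"
    "<Id>231035938</Id><Name>Talend Mug</Name>"
    "</Product></item>"
)

# Mapping table: output column -> XPath query relative to the loop node.
MAPPING = {"id": "Id", "name": "Name"}


def flatten_record(xml_text, prefix, loop_xpath, mapping):
    """Return one flat row (dict) per node matched by prefix + loop_xpath."""
    root = ET.fromstring(xml_text)
    steps = (prefix + loop_xpath).strip("/").split("/")
    if steps[0] != root.tag:
        # The chosen prefix does not match the envelope of this payload.
        return []
    relative = "/".join(steps[1:])
    nodes = root.findall(relative) if relative else [root]
    return [
        {column: node.findtext(query) for column, query in mapping.items()}
        for node in nodes
    ]


rows_from_trigger = flatten_record(TRIGGER_XML, "/exchange/item", "/Product", MAPPING)
rows_from_process = flatten_record(PROCESS_XML, "/item", "/Product", MAPPING)
```

Choosing the wrong prefix for the payload (for example /exchange/item for a record sent by a process) yields no row in this sketch, which is why the prefix must match the caller.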

Related scenario
No scenario is available for this component yet.

tMDMRouteRecord
tMDMRouteRecord properties

Component family: Talend MDM

Function: tMDMRouteRecord submits the primary key of a record stored in your MDM hub (repository) to Event Manager in order for Event Manager to trigger the due process(es) against some specific conditions that you can define in the process or trigger pages of the MDM Studio. For more information on Event Manager and on an MDM process, see section Important terms in Talend MDM Studio in the Talend Master Data Management Administrator Guide.

Purpose: This component helps Event Manager identify the changes which you have made on your data so that correlative actions can be triggered.

Basic settings

URL: Type in the URL of the MDM server.

Username and Password: Type in the user authentication data for the MDM server.

Version: Type in the name of the master data management Version you want to connect to, for which you have the user rights required. Leave this field empty if you want to display the default perspective.

Data Container: Type in the name of the data container that holds the record you want Event Manager to read.

Entity Name: Type in the name of the business entity that holds the record you want Event Manager to read.

IDS: Specify the primary key(s) of the record(s) you want Event Manager to read.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Global Variables: Number of Lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer. For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.


Connections:
  Outgoing links (from one component to another):
    Row: Iterate.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  Incoming links (from one component to another):
    Row: Iterate.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Usage: Use this component as a start component. It needs an output flow.

Scenario: Routing a record to Event Manager


In this scenario, the tMDMRouteRecord component is used to submit the primary key of a record noting an update to Event Manager in order for this element to trigger a process that informs the user of this update.
Talend MDM is case-sensitive, so pay attention to uppercase and lowercase characters when replicating the scenario.

Scenario prerequisites
The following prerequisites must be met in order to replicate this scenario:
A data container stores several records using a specific model. In this scenario, the container is named Product, and a record in the container is entered against the model named Product:

This figure shows one of the stored product records with all of its viewable attributes.


For further information about how to create a data container or a data model, see your Talend Master Data Management Administrator Guide. For further information about how to create a record and access its viewable attributes, see your Talend MDM Web User Interface User Guide.
A Job that informs the user of the update is already deployed on the MDM server. In this scenario, the Job is called message and uses only the tMsgBox component. Double-click the component to display and configure its Basic settings:

In the Title field, type in Talend MDM.
In the Message field, type in the message to be popped up: A record is updated.
For further information about the tMsgBox component, see section tMsgBox. For further information about how to deploy a Job onto the MDM server, see section Deploying Jobs automatically on the MDM server in the Talend Master Data Management Administrator Guide.

Routing a record to trigger the corresponding process


This section shows you how to replicate the whole scenario using tMDMRouteRecord to trigger a process.
Log onto your Talend MDM Web UI and click Browse Records. For further details about how to log onto the Talend MDM Web UI and open the Browse Records view, see Logging in to the Web User Interface in the Talend MDM Web User Interface User Guide.
In the upper right corner of the web page, click the button to show the Actions panel.

On the Actions panel on the right, select the data container and the data model that hold the record to be updated. In this scenario, the data container and the data model are both Product.
Click Save to save the selected data container and data model.
In the Browse Records view, select the entity of interest. In this example, it is Product.


Click Search to open the record list on the lower part of the Web page.

Double-click one of the product records to display its viewable attributes in a new view dedicated to this product. For example, open the product Talend Mug with unique Id 231035938.


In this view, modify one of the attribute values. You can, for example, update this product and make it available by selecting the Availability check box.
Click Save to validate this update.
Open your Talend MDM studio and access the MDM hub (MDM Server). For further information about how to launch the Talend MDM studio and connect it to the MDM hub, see the section Launching Talend MDM Studio in the Talend Master Data Management Administrator Guide.

Under the Job Repository node of the MDM Server tree view, right-click the message Job.
In the contextual menu, select Generate Talend Job Caller Process. The process used to call this Job is generated and displays in the directory Event Management > Process.


Under the Event Management node, right-click Trigger.
In the contextual menu, select New.
In the pop-up New Trigger wizard, name the trigger, for example, TriggerMessage.

Click OK to open the new trigger's view in the workspace of your studio.
In the trigger's view, configure the trigger so that it launches the process that calls the message Job once an update is done.


In the Description field, enter, for example, Trigger that calls the Talend Job: message_0.1.war to describe the trigger being created.
In the Entity field, select or type in the business entity you want to trigger the process on. In this example, it is exactly Update.
In the Service JNDI Name field, select callprocess from the drop-down list.
In the Service Parameters field, complete the parameter definition by giving the value: CallJob_message_0.1.war. This value is the name of the process to be called, which you can find in the directory Event Management > Process in the MDM server tree view.
In the Trigger xPath Expressions area, click the button under the table to add a new XPath line.

In the newly added line, click the three-dot button to open a dialog box where you can select the entity or element on which you want to define conditions. In this example, it is Update/OperationType.


In the Value column, enter a value for this line. In this example, it is exactly UPDATE.
In the Condition Id column, enter a unique identifier for the condition you want to set, for example, C1.
In the Conditions area, enter the query you want to undertake on the data record using the condition ID C1 you set earlier.
Press Ctrl+S to save the trigger.
In the MDM server tree view, double-click Data container > system > UpdateReport to open the Data Container Browser UpdateReport view. An Update Report is a complete track of all create, update or delete actions on any master data.


Next to the Entity field of this view, click the button to search all the action records in the UpdateReport. Note that the Update entity does not necessarily mean that the recorded action is an update: Update is just the entity name defined by the data model of UpdateReport, and it may record different actions including create, delete and update.
The last record corresponds to what was done on the product record at the beginning of the scenario. The primary key of this record is genericUI.1283244014172, and this is the record that will be routed to Event Manager.
On the menu bar of the studio, click Window > Perspective > Data Integration to design the Job routing a record.
On the Data Integration perspective, create a Job and name it RouteRecord. To do so, right-click Job Designs in the Repository tree view. In the contextual menu, select Create Job. A wizard opens. In the Name field, type in RouteRecord, and click Finish.
Drop the tMDMRouteRecord component from the Palette onto the design workspace.
Double-click this component to open its Component view.

In the URL field, enter the address of your MDM server. This example uses http://localhost:8080/talend/TalendPort.
In the Username and the Password fields, type in the connection parameters.
In the Data Container field, enter the name of the data container that stores the record you want to route. It is UpdateReport in this example.
In the Entity Name field, enter the name of the entity that the record you want to route belongs to. In this example, the entity name is Update.
In the IDS area, click the plus button under the table to add a new line.
In the newly added line, fill in the primary key of the record to be routed to Event Manager, that is, genericUI.1283244014172, as was read earlier from the Data Container Browser UpdateReport.
Press F6 to run this Job. Event Manager calls the process that executes the message Job and generates the dialog box informing the user that this record has been updated.


This component submits the primary key of the record noting the update to Event Manager. When Event Manager checks this record and finds that it meets the conditions you have defined in the configuration view of the trigger TriggerMessage, it calls the process that launches the message Job to pop up the dialog box informing the user of this update.
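
The check that Event Manager performs on the routed record can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the layout of the Update record, the condition_matches function and the use of a plain equality test are hypothetical and do not describe the actual Event Manager implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical Update Report record; only the Update entity and its
# OperationType element are taken from the scenario.
UPDATE_RECORD = (
    "<Update>"
    "<OperationType>UPDATE</OperationType>"
    "<Key>231035938</Key>"
    "</Update>"
)


def condition_matches(record_xml, xpath, expected_value):
    """Evaluate a trigger condition such as Update/OperationType = UPDATE."""
    root = ET.fromstring(record_xml)
    entity, _, element_path = xpath.partition("/")
    if root.tag != entity:
        # The record does not belong to the entity the trigger watches.
        return False
    # The comparison is case-sensitive, like Talend MDM itself.
    return root.findtext(element_path) == expected_value


c1 = condition_matches(UPDATE_RECORD, "Update/OperationType", "UPDATE")
```

In this sketch, a value typed as update instead of UPDATE would not match, which mirrors the case-sensitivity warning given at the start of the scenario.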


tMDMSP
tMDMSP Properties

Component family: Talend MDM

Function: tMDMSP calls the MDM hub stored procedure.

Purpose: tMDMSP offers a convenient way to centralize multiple or complex queries in an MDM hub and call them easily.

Basic settings

Schema and Edit Schema: In SP principle, the schema is an input parameter. A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository.
  Built-in: The schema is created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

URL: Type in the URL of the MDM server.

Username and Password: Type in the user authentication data for the MDM server.

Version: Type in the name of the master data management Version you want to connect to, for which you have the user rights required. Leave this field empty if you want to display the default perspective.

Data Container: Type in the name of the data container that stores the procedure you want to call.

SP Name: Type in the exact name of the stored procedure.

Parameters (in order): Click the plus button and select the various Input Columns that will be required by the procedure. The SP schema can hold more columns than there are parameters used in the procedure.


Connections:
  Outgoing links (from one component to another):
    Row: Main.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  Incoming links (from one component to another):
    Row: Main, Iterate.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Usage: This component is used as an intermediary component. It can be used as a start component, but only if the procedure to be called needs no input parameters.

Limitation: N/A

Scenario: Executing a stored procedure in the MDM hub


The following Job calculates the total price of each kind of product recorded on your MDM Web UI.

This Job generates the parameters used to execute a stored procedure in the MDM hub, then extracts the desired data from the returned XML-format result and presents the extracted data in the studio. The products whose prices are to be processed are listed on your MDM Web UI.


The stored procedure to be executed can be found under the Stored Procedure node of the MDM server's tree view and reads as follows:


For more information on stored procedures in the MDM server, see section Stored Procedures in the Talend Master Data Management Administrator Guide.
To create this Job, proceed as follows:
Drag and drop the following components used in this example: tFixedFlowInput, tMDMSP, tExtractXMLField, tLogRow.
Connect the components using the Row Main link.
The tFixedFlowInput component is used to generate the price range of interest for this calculation. In this example, define 10 as the minimum and 17 as the maximum in order to cover all of the products.
Double-click tFixedFlowInput to open its Component view.
On the Component view, click the [...] button next to Edit schema to open the schema editor of this component.
In the schema editor, add the two parameters min and max that are used to define the price range.

Click OK to validate the changes.
On the Values table in the Mode area of the Component view, the two parameters min and max that you have defined in the schema editor of this component display.
In the Value column of the Values table, enter 10 for the min parameter and 17 for the max parameter.


Double-click on tMDMSP to open its Component view.

In the URL field of the Component view, type in the MDM server address, in this example, http://localhost:8080/talend/TalendPort.
In Username and Password, enter the authentication information, in this example, admin and talend.
In Data Container and Procedure Name, enter the exact names of the data container Product and of the stored procedure PriceAddition.
Under the Parameters (in order) table, click the plus button twice to add two rows in this table.
In the Parameters (in order) table, click each of the two rows you have added and, from the drop-down list, select the min parameter for one and the max parameter for the other.
Double-click tExtractXMLField to open its Component view.


On the Component view, click the [...] button next to Edit schema to open the schema editor of this component.
In the schema editor, add two columns to define the structure of the output data. These two columns are name and sum. They represent respectively the name and the total price of each kind of product recorded in the MDM Web UI.

Click OK to validate the configuration. The two columns display in the Mapping table of the Component view.


In the Loop XPath query field, type in the node of the XML tree on which the loop is based. In this example, the node is /result, as you can read in the procedure code: return <result><Name>{$d}</Name><Sum>{sum($product/Price)}</Sum></result>.
In the XPath query column of the Mapping table, enter the exact names of the nodes to extract. They are /result/Name, used to extract the product names, and /result/Sum, used to extract the total prices.
Finally, double-click tLogRow to open its Component view.

Synchronize the schema with the preceding component, and select the Print values in cells of a table check box for reading convenience.
Then press F6 to execute the Job.
See the output data in the console of the Run view.

The output lists the four kinds of products recorded in the MDM Web UI and the total price for each of them.
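
What tExtractXMLField does with the loop node /result and the two XPath queries can be approximated by the following Python sketch. The sample fragment and the extract_fields function are assumptions made for illustration; only the element names result, Name and Sum come from the return clause of the stored procedure.

```python
import xml.etree.ElementTree as ET

LOOP_XPATH = "/result"
# Mapping table of the scenario: output column -> XPath query.
MAPPING = {"name": "/result/Name", "sum": "/result/Sum"}

# Hypothetical fragment shaped like the return clause of the procedure.
SAMPLE_FRAGMENT = "<result><Name>Talend Mug</Name><Sum>30</Sum></result>"


def extract_fields(xml_fragment):
    """Return one row for a fragment whose root matches the loop node."""
    node = ET.fromstring(xml_fragment)
    if "/" + node.tag != LOOP_XPATH:
        return None
    row = {}
    for column, query in MAPPING.items():
        # Make the query relative to the loop node: "/result/Name" -> "Name".
        relative = query[len(LOOP_XPATH) + 1:]
        row[column] = node.findtext(relative)
    return row


row = extract_fields(SAMPLE_FRAGMENT)
```

Each fragment returned by the procedure produces one row with the columns name and sum, which is the structure defined in the schema editor of tExtractXMLField.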


tMDMViewSearch
tMDMViewSearch properties

Component family: Talend MDM

Function: tMDMViewSearch selects records from an MDM hub (XML repository) by applying filtering criteria you have created in a specific view. The resulting data is in XML structure. For more information on a view on which you can define filtering criteria, see section Views in the Talend Master Data Management Administrator Guide.

Purpose: This component allows you to retrieve the MDM records from an MDM hub.

Basic settings

Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either Built-in or remote in the Repository. Click Edit Schema to modify the schema. Note that if you modify the schema, it automatically becomes built-in. Click Sync columns to collect the schema from the previous component.
  Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

XML Field: Select the name of the column in which you want to write the XML data.

URL: Type in the URL of the MDM server.

Username and Password: Type in the user authentication data for the MDM server.

Version: Type in the name of the master data management Version you want to connect to, for which you have the user rights required. Leave this field empty if you want to display the default perspective.

Data Container: Type in the name of the data container that holds the master data you want to read.

View Name: Type in the name of the view whose filters will be applied to process the records.


Operations: Complete this table to create the WHERE clause. The parameters to be set are:
  - XPath: define the path expression to select the XML node at which point the filtering is operated.
  - Functions: select an operator from the drop-down list, like Contains, Starts with, Equals, etc.
  - Value: type in the value you want to retrieve.
  - Predicate: select the predicate to combine the filtering conditions in different manners. The predicate may be none, or, and, exactly, etc.
  The parameters are case sensitive.

Order (One Row): Complete this table to decide the presentation order of the retrieved records. The parameters to be set are:
  - XPath: define the path expression to select the XML node at which point the sorting operation is performed.
  - Order: select the presentation order that may be asc (ascending) or desc (descending).
  The parameters are case sensitive. For the time being, only the first row created in the Order table is valid.

Spell Threshold: Set it to -1 to deactivate this threshold. This threshold is used to decide the spell checking level.

Skip Rows: Type in the count of rows to be ignored to specify from which row the process should begin. For example, if you type 8 in the field, the process will begin from the 9th row.

Max Rows: Type in the maximum number of rows to be processed. If Limit = 0, no row is read or processed. By default, the limit is -1, meaning that no limit is set.

Advanced settings

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.

Usage: Use this component to retrieve specific records.

Global Variables: Number of Lines: Indicates the number of lines processed. This is available as an After variable. Returns an integer. For further information about variables, see How to use variables in a Job in the Talend Open Studio User Guide.


Connections:
  Outgoing links (from one component to another):
    Row: Iterate.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  Incoming links (from one component to another):
    Row: Iterate.
    Trigger: Run if, On Component Ok, On Component Error, On Subjob Ok, On Subjob Error.
  For further information regarding connections, see Connection types in the Talend Open Studio User Guide.

Limitation: n/a
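
The combined effect of the Operations, Order (One Row), Skip Rows and Max Rows settings can be sketched in Python as follows. This is a simplified model written for this guide, not the MDM query engine: the view_search function, the in-memory product list and the restriction to a single Contains condition are all assumptions.

```python
# Hypothetical records, flattened to dictionaries for the example.
PRODUCTS = [
    {"Id": "3", "Name": "Tshirt blue", "Price": "15"},
    {"Id": "1", "Name": "Tshirt red", "Price": "12"},
    {"Id": "2", "Name": "Talend Mug", "Price": "10"},
    {"Id": "4", "Name": "tshirt green", "Price": "15"},
]


def view_search(records, field, contains, order_field,
                order="asc", skip_rows=0, max_rows=-1):
    """Filter with a Contains condition, sort, then apply skip and limit."""
    # Operations: a case-sensitive Contains test on one field.
    hits = [r for r in records if contains in r[field]]
    # Order (One Row): only one sort criterion is taken into account.
    hits.sort(key=lambda r: r[order_field], reverse=(order == "desc"))
    # Skip Rows: a value of 8 would start from the 9th row.
    hits = hits[skip_rows:]
    # Max Rows: -1 means no limit, 0 means no row at all.
    if max_rows >= 0:
        hits = hits[:max_rows]
    return hits


tshirts = view_search(PRODUCTS, "Name", "Tshirt", "Id")
ids = [r["Id"] for r in tshirts]
```

Because the filter is case-sensitive, the record named tshirt green is not returned in this sketch.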

Scenario: Retrieving records from an MDM hub via an existing view


This scenario describes a two-component Job that retrieves a data record in XML structure.

In this example, you will select the T-shirt information from the Product entity via the Browse_items_Product view created from Talend Open Studio. Each record in the entity contains the details defined as filtering criteria: Id, Name, Description and Price.
From the Palette, drop tMDMViewSearch and tLogRow onto the design workspace.
Connect the components using a Row Main link.
Double-click tMDMViewSearch to view its Basic settings in the Component tab and set the component properties.


In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box in which you can define the structure of the XML data you want to write.

Click the plus button and add one column of the type String. Name the column Tshirt.
Click OK to validate your creation and proceed to the next step.
In the XML Field field, select Tshirt as the column in which to write the retrieved data.
Use your MDM server address in the URL field and type in the corresponding connection data in the Username and the Password fields. In this example, use the default URL, then enter admin as both username and password.


In the Data Container field, type in the container name: Product.
In the View Name field, type in the view name: Browse_items_Product.
Below the Operations table, click the plus button to add one row in this table.
In the Operations table, define the XPath as Product/Name, meaning that the filtering operation will be performed at the Name node, then select Contains in the Function column and type in Tshirt in the Value column.
Below the Order (One Row) table, click the plus button to add one row in this table.
In the Order (One Row) table, define the XPath as Product/Id and select the asc order for the Order column.
In the design workspace, click tLogRow to open its Basic settings view and set the properties.

Next to the three-dot button used for editing the schema, click Sync columns to acquire the schema from the preceding component.
Press F6 to execute the Job.


In the console docked in the Run view, you can read the retrieved Tshirt records in XML structure, sorted in ascending order.


XML components
This chapter details the main components that you can find in the XML family of the Talend Open Studio Palette. The XML family groups together the components dedicated to XML related tasks such as parsing, validation, XML structure creation and so on.


tAdvancedFileOutputXML
tAdvancedFileOutputXML properties

Component family: XML or File/Output

Function: tAdvancedFileOutputXML outputs data to an XML type of file and offers an interface to deal with loop and group by elements if needed.

Purpose: tAdvancedFileOutputXML writes an XML file with separated data values according to an XML tree structure.

Basic settings

File name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view in the Talend Open Studio User Guide.

Configure XML tree: Opens the dedicated interface to help you set the XML mapping. For details about the interface, see Defining the XML tree.

Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema in the Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: How to set a repository schema in the Talend Open Studio User Guide.

Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the Output component.

Append the source xml file: Select this check box to add the new lines at the end of your source XML file.

Generate compact file: Select this check box to generate a file that does not have any empty space or line separators. All elements are then presented on a single line, which considerably reduces the file size.

Include DTD or XSL: Select this check box to add the DOCTYPE declaration, indicating the root element, the access path and the DTD file, or to add the processing instruction, indicating the type of stylesheet used (such as XSL types), along with the access path and file name.


Advanced settings

Split output in several files: If the XML file output is big, you can split the file every certain number of rows.

Create directory only if not exists: This check box is selected by default. It creates a directory to hold the output XML files if required.

Create empty element if needed: This check box is selected by default. If no column is associated to an XML node, this option will create an open/close tag in place of the expected tag.

Create attribute even if its value is NULL: Select this check box to generate the XML tag attribute for the associated input column whose value is null.

Create attribute even if it is unmapped: Select this check box to generate the XML tag attribute for the associated input column that is unmapped.

Create associated XSD file: If one of the XML elements is defined as a Namespace element, this option will create the corresponding XSD file. To use this option, you must select Dom4J as the generation mode.

Advanced separator (for number): Select this check box to change the expected data separator.
  Thousands separator: define the thousands separator, between inverted commas.
  Decimal separator: define the decimal separator, between inverted commas.

Generation mode: Select the appropriate generation mode according to your memory availability:
  - Fast and memory-consuming (Dom4j)
  - Slow with no memory consumed

Encoding type: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Don't generate empty file: Select the check box to avoid the generation of an empty file.

tStatCatcher Statistics: Select the check box to collect the log data at the Job level as well as at each component level.

Usage: Use this component to write an XML file with data passed on from other components using a Row link.

Limitation: n/a
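
To picture the effect of the Advanced separator (for number) option, the following Python sketch formats a number with custom thousands and decimal separators. The format_number function and the choice of two decimal places are assumptions made for this illustration; they do not describe how the component formats values internally.

```python
def format_number(value, thousands=",", decimal="."):
    """Format a number with the given thousands and decimal separators."""
    # Start from the conventional 1,234,567.50 layout.
    text = f"{value:,.2f}"
    # Swap the separators through a placeholder so that the two
    # replacements do not interfere with each other.
    text = text.replace(",", "\0").replace(".", decimal)
    return text.replace("\0", thousands)


european_style = format_number(1234567.5, thousands=".", decimal=",")
default_style = format_number(1234567.5)
```

A placeholder character is used in this sketch because replacing the comma and the period one after the other would otherwise overwrite the first substitution.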

Defining the XML tree


Double-click on the tAdvancedFileOutputXML component to open the dedicated interface or click on the three-dot button on the Basic settings vertical tab of the Component Settings tab.


To the left of the mapping interface, under Schema List, all of the columns retrieved from the incoming data flow are listed (on the condition that an input flow is connected to the tAdvancedFileOutputXML component).
To the right of the interface, define the XML structure you want to obtain as output. You can easily import the XML structure or create it manually, then map the input schema columns onto each corresponding element of the XML tree.

Importing the XML tree
The easiest and most common way to fill out the XML tree panel is to import a well-formed XML file.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
Right-click on the root tag to display the contextual menu.
On the menu, select Import XML tree.
Browse to the file to import and click OK.
You can import an XML tree from files in XML, XSD and DTD formats.


The XML Tree column is hence automatically filled out with the correct elements. You can remove and insert elements or sub-elements from and to the tree:
Select the relevant element of the tree.
Right-click to display the contextual menu.
Select Delete to remove the selection from the tree or select the relevant option among: Add sub-element, Add attribute, Add namespace to enrich the tree.

Creating the XML tree manually
If you don't have any XML structure defined yet, you can create it manually.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
Right-click on the root tag to display the contextual menu.
On the menu, select Add sub-element to create the first element of the structure.
You can also add an attribute or a child element to any element of the tree or remove any element from the tree.
Select the relevant element on the tree you just created.
Right-click to the left of the element name to display the contextual menu.
On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace or Delete.

Mapping XML data


Once your XML tree is ready, you can map each input column with the relevant XML tree element or sub-element to fill out the Related Column:
Click on one of the Schema column names.
Drag it onto the relevant sub-element to the right.
Release to implement the actual mapping.


A light blue link displays that illustrates this mapping. If available, use the Auto-Map button, located to the bottom left of the interface, to carry out this operation automatically.
You can disconnect any mapping on any element of the XML tree:
Select the element of the XML tree that should be disconnected from its respective schema column.
Right-click to the left of the element name to display the contextual menu.
Select Disconnect linker.
The light blue link disappears.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop element and, if required, the group element.
Loop element
The loop element allows you to define the iterating object. Generally the loop element is also the row generator.
To define an element as loop element:
Select the relevant element on the XML tree.
Right-click to the left of the element name to display the contextual menu.
Select Set as Loop Element.


The Node Status column shows the newly added status.


There can only be one loop element at a time.

Group element
The group element is optional. It represents a constant element on which the group-by operation can be performed. A group element can be defined only if a loop element has been defined before. When using a group element, the rows should be sorted so that they can be grouped by the selected node.
To define an element as group element:
Select the relevant element on the XML tree.
Right-click to the left of the element name to display the contextual menu.
Select Set as Group Element.


The Node Status column shows the newly added status, and any required group statuses are defined automatically. Click OK once the mapping is complete to validate the definition and continue the Job configuration where needed.

Scenario: Creating an XML file using a loop


The following scenario describes the creation of an XML file from a sorted flat file gathering a video collection.

Drop a tFileInputDelimited and a tAdvancedFileOutputXML from the Palette onto the design workspace.
Alternatively, if you configured a description for the input delimited file in the Metadata area of the Repository, you can drag & drop the metadata entry directly onto the editor to set up the input flow automatically.
Right-click on the input component and drag a Row > Main link towards the tAdvancedFileOutputXML component to implement a connection.
Select the tFileInputDelimited component and display the Component settings tab located in the tab system at the bottom of the Studio.


Select the Property type according to whether you stored the file description in the Repository or not. If you dragged & dropped the component directly from the Metadata, no changes to the settings should be needed.
If you didn't set up the file description in the Repository, select Built-in and manually fill out the fields displayed on the Basic settings vertical tab.
The input file contains the following columns, separated by semicolons: id, name, category, year, language, director and cast.

In this simple use case, the Cast field gathers different values and the id increments when changing movie. If needed, define the tFileInputDelimited schema according to the file structure.


Once you have checked that the schema of the input file meets your expectations, click OK to validate.
Then select the tAdvancedFileOutputXML component and click the Component settings tab to configure the basic settings as well as the mapping. Note that double-clicking the component opens the mapping interface directly.

In the File Name field, browse to the file to be written if it exists, or type in the path and file name to be created for the output.
By default, the schema (file description) is automatically propagated from the input flow, but you can edit it if needed.
Then click the three-dot button or double-click the tAdvancedFileOutputXML component on the design workspace to open the dedicated mapping editor.
The columns from the input file description are listed to the left of the interface.
To the right of the interface, set the XML tree panel to reflect the expected XML output structure.
You can create the structure node by node. For more information about the manual creation of an XML tree, see Defining the XML tree on page 1569.


In this example, an XML template is used to populate the XML tree automatically.
Right-click on the root tag displayed by default and select Import XML tree at the end of the contextual menu options.
Browse to the XML file to be imported and click OK to validate the import operation.
You can import the XML structure from XML, XSD and DTD files.

Then drag & drop each column name from the Schema List to the matching (or relevant) XML tree elements as described in Mapping XML data on page 1571. The mapping is shown as blue links between the left and right panels.

Finally, define the node status where the loop should take place. In this use case, Cast is the changing element on which the iteration should operate, so this element will be the loop element.
Right-click on the Cast element on the XML tree, and select Set as loop element.
To group by movie, this use case also needs a group element to be defined.
Right-click on the Movie parent node of the XML tree, and select Set as group element.
The newly defined node statuses show on the corresponding element lines.
Click OK to validate the configuration.
Press F6 to execute the Job.


The output XML file shows the structure as defined.
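The interplay of the group element and the loop element can be sketched outside the Studio. The following Java example is only an illustration, not the code generated by tAdvancedFileOutputXML: the class name, the `movies`, `movie` and `cast` tag names and the sample rows are assumptions, and values are assumed to contain no XML special characters. It shows why the input must be sorted: a new group element is opened each time the id changes, and one loop element is written per row.

```java
import java.util.Arrays;
import java.util.List;

final class LoopGroupSketch {
    // Each row is {id, name, cast}. Rows must be sorted by id; otherwise the
    // same movie would be opened as a group several times.
    static String toXml(List<String[]> sortedRows) {
        StringBuilder xml = new StringBuilder("<movies>\n");
        String currentId = null;
        for (String[] row : sortedRows) {
            if (!row[0].equals(currentId)) {
                // The id changed: close the previous group and open a new one.
                if (currentId != null) {
                    xml.append("  </movie>\n");
                }
                xml.append("  <movie id=\"").append(row[0])
                   .append("\" name=\"").append(row[1]).append("\">\n");
                currentId = row[0];
            }
            // Loop element: written once for every input row.
            xml.append("    <cast>").append(row[2]).append("</cast>\n");
        }
        if (currentId != null) {
            xml.append("  </movie>\n");
        }
        return xml.append("</movies>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data, sorted by id.
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Movie A", "Actor One"},
                new String[] {"1", "Movie A", "Actor Two"},
                new String[] {"2", "Movie B", "Actor Three"});
        System.out.print(toXml(rows));
    }
}
```

With unsorted input, the same sketch would emit a second `movie` element for an id it has already closed, which is the situation the sorting requirement avoids.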


tDTDValidator
tDTDValidator Properties
Component family: XML
Function: Validates the XML input file against a DTD file and sends the validation log to the defined output.
Purpose: Helps control the data and structure quality of the file to be processed.

Basic settings
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository, but in this case the schema is read-only. It contains standard information regarding the file validation.
DTD file: Filepath to the reference DTD file.
XML file: Filepath to the XML file to be validated.
If XML is valid, display / If XML is not valid detected, display: Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console: Select this check box to display the validation message.

Usage: This component can be used as a standalone component, but it is usually linked to an output component to gather the log data.
Limitation: n/a
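As a rough illustration of what DTD validation involves, the sketch below uses the standard JAXP parser with validation turned on. It is not the component's implementation. The class and method names are assumptions, and, unlike the component (which takes the DTD file as a separate setting), the sketch relies on the DOCTYPE declaration carried by the document itself. Documents that are not well-formed are reported as errors too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {
    // Returns the validation errors; an empty list means the document is valid.
    static List<String> validate(String xmlWithDoctype) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlWithDoctype)));
        } catch (Exception e) {
            // Not well-formed or unreadable input is treated as invalid as well.
            errors.add(String.valueOf(e.getMessage()));
        }
        return errors;
    }

    public static void main(String[] args) {
        // Hypothetical DTD and documents.
        String doctype = "<!DOCTYPE movie [<!ELEMENT movie (title)><!ELEMENT title (#PCDATA)>]>";
        System.out.println(validate(doctype + "<movie><title>A</title></movie>").isEmpty());
        System.out.println(validate(doctype + "<movie><year>1995</year></movie>").isEmpty());
    }
}
```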

Scenario: Validating XML files


This Java scenario describes a Job that validates the specified type of files from a folder, displays the validation result on the Run tab console, and outputs the log information for the invalid files into a delimited file.

Drop the following components from the Palette to the design workspace: tFileList, tDTDValidator, tMap, tFileOutputDelimited.
Connect the tFileList to the tDTDValidator with an Iterate link and the remaining components using a Row > Main link.
Set the tFileList component properties to fetch the XML files from a folder.


Click the plus button to add a filemask line and enter the filemask: *.xml. Mind the quotes depending on the Java or Perl version you are using.
Set the path of the XML files to be verified.
Select No from the Case Sensitive drop-down list.
In the tDTDValidator Component view, the schema is read-only as it contains standard log information related to the validation process.

In the DTD file field, browse to the DTD file to be used as reference.
Click in the XML file field, press Ctrl+Space to access the variable list, and double-click the current filepath global variable: tFileList.CURRENT_FILEPATH.
In the various messages to display in the Run tab console, use the jobName variable to recall the job name tag. Recall the filename using the relevant global variable: ((String)globalMap.get("tFileList_1_CURRENT_FILE")). Mind the Java or Perl operators, such as the plus or dot symbol, to build your message.
Select the Print to Console check box.
In the tMap component, drag and drop the information data from the standard schema that you want to pass on to the output file.


Once the Output schema is defined as required, add a filter condition to select the log information data only when the XML file is invalid. Follow the best practice of typing the expected value first, then the operator appropriate to the type of data filtered, then the variable that should meet the requirement. In this case: 0 == row1.validate.
Then connect (if not already done) the tMap to the tFileOutputDelimited component using a Row > Main connection. Name it as relevant, in this example: log_errorsOnly.
In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and the encoding.
Save your Job and press F6 to run it.

On the Run console the messages defined display for each of the files. At the same time the output file is filled with the log data for invalid files.
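The two Java expressions used in this scenario, the message built with the plus operator and the tMap filter 0 == row1.validate, can be tried outside the Studio. In the sketch below, globalMap is simulated with a plain HashMap, and the class name, method names, message wording and sample values are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ValidationLogSketch {
    // Concatenates literals and variables with +, as a Java expression typed
    // in the component's message fields would.
    static String message(String jobName, Map<String, Object> globalMap, boolean valid) {
        String file = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        return "[" + jobName + "] " + file + (valid ? " is valid" : " is not valid");
    }

    // Keeps only the files whose validate value is 0, mirroring the
    // filter condition 0 == row1.validate.
    static List<String> errorsOnly(Map<String, Integer> validateByFile) {
        List<String> invalid = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : validateByFile.entrySet()) {
            if (0 == entry.getValue()) {
                invalid.add(entry.getKey());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "orders.xml"); // hypothetical file name
        System.out.println(message("myJob", globalMap, false));

        Map<String, Integer> log = new LinkedHashMap<>();
        log.put("orders.xml", 0);
        log.put("customers.xml", 1);
        System.out.println(errorsOnly(log));
    }
}
```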


tEDIFACTtoXML
tEDIFACTtoXML Properties
Component family: XML / EDIFACT
Function: This component reads a United Nations/Electronic Data Interchange For Administration, Commerce and Transport (UN/EDIFACT) message and transforms it into the XML format according to the EDIFACT version and the EDIFACT family.
Purpose: This component is used to transform an EDIFACT message file into the XML format for better readability to users and compatibility with processing tools.

Basic settings
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema of this component is fixed and read-only, with only one column: document.
EDI filename: Filepath to the EDIFACT message file to be transformed.
EDI version: Select the EDIFACT version of the input file.
Ignore new line: Select this check box to skip carriage returns in the input file.
Die on error: Select this check box to stop Job execution when an error is encountered. By default, this check box is cleared, and therefore illegal rows are skipped and the process is completed for the error free rows.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is usually linked to an output component to gather the transformation result.
Limitation: n/a

Scenario: From EDIFACT to XML


This scenario describes a simple Job that reads a UN/EDIFACT Customs Cargo (CUSCAR) message file and saves it as an XML file.


Drop the tEDIFACTtoXML component and the tFileOutputXML component from the Palette to the design workspace. Connect the tEDIFACTtoXML component and the tFileOutputXML component using a Row > Main connection. Double-click the tEDIFACTtoXML component to show its Basic settings view.

Fill the EDI filename field with the full path to the input EDIFACT message file. In this use case, the input file is 99a_cuscar.edi.
From the EDI version list, select the EDIFACT version of the input file, D99A in this use case.
Select the Ignore new line check box to skip the carriage return characters in the input file during the transformation.
Leave the other parameters as they are.
Double-click the tFileOutputXML component to show its Basic settings view.

Fill the File Name field with the full path to the output XML file you want to generate. In this use case, the output XML is 99a_cuscar.xml. Leave the other parameters as they are. Save your Job and press F6 to run it. The input EDIFACT CUSCAR message file is transformed into the XML format and the output XML file is generated as defined.
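To give a feel for the transformation, the following Java sketch converts a tiny EDIFACT fragment into XML. It is a deliberately simplified illustration and does not reproduce the output of tEDIFACTtoXML, whose XML structure depends on the selected EDIFACT version. The class name, the `edifact`, `segment`, `element` and `component` tag names and the sample message are assumptions. The sketch only relies on the default EDIFACT separators (apostrophe between segments, plus between data elements, colon between components) and ignores the release character and the UNA service string.

```java
final class EdifactSketch {
    static String toXml(String message, boolean ignoreNewLine) {
        // Optionally drop line breaks, similar in spirit to the Ignore new line option.
        String text = ignoreNewLine ? message.replace("\r", "").replace("\n", "") : message;
        StringBuilder xml = new StringBuilder("<edifact>");
        for (String segment : text.split("'")) {
            if (segment.trim().isEmpty()) {
                continue;
            }
            String[] elements = segment.split("\\+", -1);
            xml.append("<segment tag=\"").append(escape(elements[0])).append("\">");
            for (int i = 1; i < elements.length; i++) {
                xml.append("<element>");
                for (String component : elements[i].split(":", -1)) {
                    xml.append("<component>").append(escape(component)).append("</component>");
                }
                xml.append("</element>");
            }
            xml.append("</segment>");
        }
        return xml.append("</edifact>").toString();
    }

    // Minimal escaping of XML special characters.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Hypothetical two-segment fragment, one segment per line.
        String message = "UNH+1+CUSCAR:D:99A:UN'\nUNT+2+1'";
        System.out.println(toXml(message, true));
    }
}
```

When the line breaks are not removed, the newline in this sample ends up at the start of the second segment tag, which is the kind of issue the Ignore new line check box is meant to avoid.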


tExtractXMLField
tExtractXMLField properties
Component family: XML
Function: tExtractXMLField reads an input XML field of a file or a database table and extracts desired data.
Purpose: tExtractXMLField opens an input XML field, reads the XML structured data directly without having first to write it out to a temporary file, and finally sends data as defined in the schema to the following component via a Row link.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data is stored centrally. Repository: Properties are stored in a repository file. When this file is selected, the fields that follow are pre-filled in using fetched data.
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: You already created the schema and stored it in the Repository, hence it can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
XML field: Name of the XML field to be processed. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Loop XPath query: Node of the XML tree, which the loop is based on.
Mapping: Column: reflects the schema as defined by the Schema type field. XPath Query: Enter the fields to be extracted from the structured input. Get nodes: Select this check box to retrieve the XML content of all current nodes specified in the XPath query list, or select the check box next to specific XML nodes to retrieve only the content of the selected nodes.
Limit: Maximum number of rows to be processed. If Limit is 0, no rows are read or processed.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.

Advanced settings
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component is an intermediate component. It needs an input and an output component.
Limitation: n/a

Scenario 1: Extracting XML data from a field in a database table


This three-component Java scenario reads the XML structure included in the fields of a database table and then extracts the data.
Drop the following components from the Palette onto the design workspace: tMysqlInput, tExtractXMLField, and tFileOutputDelimited.

Connect the three components using Main links. Double-click tMysqlInput to display its Basic settings view and define its properties.

If you have already stored the input schema in the Repository tree view, select Repository first from the Property Type list and then from the Schema list to display the [Repository Content] dialog box where you can select the relevant metadata. For more information about storing schema metadata in the Repository tree view, see Setting up a DB connection and How to drop components from the Metadata node in Talend Open Studio User Guide.

If you have not stored the input schema in the Repository, select Built-in in the Property Type and Schema fields and enter the database connection and the data structure information manually. For more information about tMysqlInput properties, see tMysqlInput on page 598.
In the Table Name field, enter the name of the table holding the XML data, customerdetails in this example.
Click Guess Query to display the query corresponding to your schema.
Double-click tExtractXMLField to display its Basic settings view and define its properties.

In the Property type list, select Repository if you have already stored the description of your file in the Repository tree view. The fields that follow are filled in automatically with the stored data. If not, select Built-in and fill in the fields that follow manually.
Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button next to Edit schema to view or modify the schema. The Column in the Mapping table is automatically populated with the defined schema.
In the XML field list, select the column from which you want to extract the XML data. In this example, the field holding the XML data is called CustomerDetails.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the XPath query column, enter between inverted commas the node of the XML field holding the data you want to extract, CustomerName in this example.
Double-click tFileOutputDelimited to display its Basic settings view and define its properties.


In the File Name field, define or browse to the path of the output file you want to write the extracted data in.
Click Sync columns to retrieve the schema from the preceding component. If needed, click the three-dot button next to Edit schema to view the schema.
Save your Job and press F6 to execute it.

tExtractXMLField reads and extracts the client names under the node CustomerName of the CustomerDetails field of the defined database table.
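The roles of the Loop XPath query and of the XPath query column can be sketched with the standard javax.xml.xpath API. This is an illustration only, not the component's generated code: the class name, the XML layout of the field and the loop path /CustomerDetails/Customer are assumptions, since the scenario does not show the actual field content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlFieldExtractSketch {
    // loopXPath selects the nodes to iterate on; fieldXPath is evaluated
    // relative to each of these nodes, producing one value per output row.
    static List<String> extract(String xmlField, String loopXPath, String fieldXPath) {
        List<String> values = new ArrayList<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(loopXPath,
                    new InputSource(new StringReader(xmlField)), XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(xpath.evaluate(fieldXPath, nodes.item(i)));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot extract data from the XML field", e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical content of the CustomerDetails field.
        String field = "<CustomerDetails>"
                + "<Customer><CustomerName>Ann</CustomerName></Customer>"
                + "<Customer><CustomerName>Bob</CustomerName></Customer>"
                + "</CustomerDetails>";
        System.out.println(extract(field, "/CustomerDetails/Customer", "CustomerName"));
    }
}
```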

Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file
This Java scenario describes a four-component Job that reads an XML structure from a delimited file, outputs the main data and rejects the erroneous data.

Drop the following components from the Palette to the design workspace: tFileInputDelimited, tExtractXMLField, tFileOutputDelimited and tLogRow.


Connect the first three components using Row Main links. Connect tExtractXMLField to tLogRow using a Row Reject link. Double-click tFileInputDelimited to open its Basic settings view and define the component properties.

Select Built-in in the Schema list and fill in the file metadata manually in the corresponding fields. Click the three-dot button next to Edit schema to display a dialog box where you can define the structure of your data. Click the plus button to add as many columns as needed to your data structure. In this example, we have one column in the schema: xmlStr. Click OK to validate your changes and close the dialog box.
If you have already stored the schema in the Metadata folder under File delimited, select Repository from the Schema list and click the three-dot button next to the field to display the [Repository Content] dialog box where you can select the relevant schema from the list. Click OK to close the dialog box and have the fields automatically filled in with the schema metadata. For more information about storing schema metadata in the Repository tree view, see Setting up a File Delimited schema and How to drop components from the Metadata node in Talend Open Studio User Guide.

In the Property type list, select Repository if you have already stored the metadata of your input file in the Repository (the fields that follow are then automatically filled in with the stored information), or select Built-in and fill in the fields that follow manually. For this example, we use the Built-in mode.
In the File Name field, click the three-dot button and browse to the input delimited file you want to process, CustomerDetails_Error in this example. This delimited file holds a number of simple XML lines separated by double carriage returns.

Set the row and field separators used in the input file in the corresponding fields, double carriage return for the first and nothing for the second in this example. If needed, set Header, Footer and Limit. None is used in this example. In the design workspace, double-click tExtractXMLField to display its Basic settings view and define the component properties.

In the Property type list, select Repository if you have already stored the metadata of your file in the Repository (the fields that follow are then automatically filled in with the stored information), or select Built-in and fill in the fields that follow manually. For this example, we use the Built-in mode.
Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button next to Edit schema to view or modify the schema. The Column in the Mapping table is automatically populated with the defined schema.
In the XML field list, select the column from which you want to extract the XML data. In this example, the field holding the XML data is called xmlStr.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and display the component properties.

Set Property Type to Built-in.


In the File Name field, define or browse to the output file you want to write the correct data in, CustomerNames_right.csv in this example. Click Sync columns to retrieve the schema of the preceding component. You can click the three-dot button next to Edit schema to view/modify the schema. In the design workspace, double-click tLogRow to display its Basic settings view and define the component properties. Click Sync Columns to retrieve the schema of the preceding component. For more information on this component, see tLogRow on page 1305. Save your Job and press F6 to execute it.

tExtractXMLField reads the client information for which the XML structure is correct and writes it to the output delimited file, CustomerNames_right, and it displays the erroneous data on the console of the Run view.
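The split between the main flow and the reject flow can be sketched as follows. This is an assumption-laden illustration rather than the component's behavior in detail: the class name and sample content are hypothetical, and a row is routed to the reject list simply when it cannot be parsed as well-formed XML. Rows are separated by a double line break, as in the scenario's input file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class RejectFlowSketch {
    // Returns two lists: index 0 holds the rows for the main flow,
    // index 1 the rows for the reject flow.
    static List<List<String>> route(String fileContent) {
        List<String> main = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        for (String row : fileContent.split("\\r?\\n\\r?\\n")) {
            if (row.trim().isEmpty()) {
                continue;
            }
            try {
                // Parsing fails with an exception when the XML is not well-formed.
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(row)));
                main.add(row);
            } catch (Exception e) {
                reject.add(row);
            }
        }
        return Arrays.asList(main, reject);
    }

    public static void main(String[] args) {
        // Hypothetical file content: the second row has an unclosed tag.
        String content = "<c><name>Ann</name></c>\n\n<c><name>Bob</c>";
        List<List<String>> flows = route(content);
        System.out.println("main:   " + flows.get(0));
        System.out.println("reject: " + flows.get(1));
    }
}
```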


tFileInputXML
tFileInputXML Properties
Component family: XML or File/Input
Function: tFileInputXML reads an XML structured file and extracts data row by row.
Purpose: Opens an XML structured file and reads it row by row to split the rows up into fields, then sends the fields as defined in the Schema to the next component via a Row link.

Basic settings
Property type: Either Built-in or Repository. Built-in: No property data stored centrally. Repository: Select the Repository file where Properties are stored. The following fields are pre-filled in using fetched data.
Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository. Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide. Repository: The schema already exists and is stored in the Repository, hence it can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
File Name/Input Stream: File name: Name of the file to be processed and the access path. Stream: The data flow to be processed. If the data has been used in a flow in the past, then it can be fetched via the variable INPUT_STREAM. Press Ctrl + Space to select the data flow from the auto-completion list. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
Loop XPath query: Node of the tree, which the loop is based on.
Mapping: Column: Columns to map. They reflect the schema as defined in the Schema type field. XPath Query: Enter the fields to be extracted from the structured input. Get nodes: Select this check box to retrieve the XML content of all current nodes specified in the XPath query list, or select the check box next to specific XML nodes to retrieve only the content of the selected nodes. The Get Nodes option functions in the DOM4j and SAX modes, although in SAX mode namespaces are not supported. For further information concerning the DOM4j and SAX modes, see the properties noted in the Generation mode list of the Advanced settings tab.
Limit: Maximum number of rows to be processed. If Limit = 0, no row is read or processed. If -1, all rows are read or processed.
Die on error: This check box is selected by default and stops the job in the event of error. Clear the check box to skip erroneous rows. The process will still be completed for error-free rows. If needed, you can retrieve the erroneous rows using a Row > Reject link.

Advanced settings
Advanced separator (for number): Select this check box to change the data separators for numbers. Thousands separator: define the separator to use for thousands. Decimal separator: define the separator to use for decimals.
Ignore the namespaces: Select this check box to ignore namespaces. Generate a temporary file: click the three-dot button to browse to the XML temporary file and set its path in the field.
Use Separator for mode Xerces: Select this check box if you want to separate concatenated children node values. This field can only be used if the selected Generation mode is Xerces. The following field displays: Field separator: Define the delimiter to be used to separate the children node values.
Encoding Type: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Generation mode: From the drop-down list, select the generation mode for the XML file, according to the memory available and the desired speed: Fast and memory-consuming (Dom4j), Memory-consuming (Xerces), or Less memory consumed (SAX).
tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: tFileInputXML is for use as an entry component. It allows you to create a flow of XML data using a Row > Main link. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file on page 1588.
Limitation: n/a
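The effect of the Advanced separator (for number) option can be illustrated with the standard java.text classes. The sketch below is not the component's implementation; the class name and sample value are assumptions. It parses a number written with a custom thousands separator and a custom decimal separator.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

final class NumberSeparatorSketch {
    // Parses text such as "1.234,56" given the thousands and decimal separators.
    static double parse(String text, char thousandsSeparator, char decimalSeparator) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousandsSeparator);
        symbols.setDecimalSeparator(decimalSeparator);
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Thousands separator '.', decimal separator ','.
        System.out.println(parse("1.234,56", '.', ','));
    }
}
```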

Scenario 1: Reading and extracting data from an XML structure


This Java scenario describes a basic Job that reads a defined XML directory and extracts specific information and outputs it on the Run console via a tLogRow component.

Drop tFileInputXML and tLogRow from the Palette to the design workspace. Connect both components together using a Main Row link. Double-click tFileInputXML to open its Basic settings view and define the component properties.


As the street dir file used as input file has been previously defined in the Metadata area, select Repository as Property type. This way, the properties are automatically leveraged and the rest of the properties fields are filled in (apart from Schema). For more information regarding the metadata creation wizards, see How to centralize the Metadata items in Talend Open Studio User Guide.
Select the relevant schema in the Repository metadata list in the same way. Edit the schema if you want to make any change to the schema loaded.
The Filename field shows the structured file to be used as input.
In Loop XPath query, change if needed the node of the structure on which the loop is based.
On the Mapping table, fill in the fields to be extracted and displayed in the output.
If the file is large, fill in a Limit of rows to be read.
Enter the encoding if needed, then double-click tLogRow to define the separator character.
Save your Job and press F6 to execute it.

The fields defined in the input properties are extracted from the XML structure and displayed on the console.

Scenario 2: Extracting erroneous XML data via a reject flow


This Java scenario describes a three-component Job that reads an XML file and, first, returns the correct XML data in an output XML file and, second, displays on the console the erroneous XML data whose type does not correspond to the one defined in the schema.

Drop the following components from the Palette to the design workspace: tFileInputXML, tFileOutputXML and tLogRow.

Right-click tFileInputXML and select Row > Main in the contextual menu and then click tFileOutputXML to connect the components together. Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow to connect the components together using a reject link. Double-click tFileInputXML to display the Basic settings view and define the component properties.

In the Property Type list, select Repository and click the three-dot button next to the field to display the [Repository Content] dialog box where you can select the metadata relative to the input file if you have already stored it in the File xml node under the Metadata folder of the Repository tree view. The fields that follow are automatically filled with the fetched data. If not, select Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the Repository tree view, see How to drop components from the Metadata node in Talend Open Studio User Guide.
In the Schema Type list, select Repository and click the three-dot button to open the dialog box where you can select the schema that describes the structure of the input file if you have already stored it in the Repository tree view. If not, select Built-in and click the three-dot button next to Edit schema to open a dialog box where you can define the schema manually.


The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState and id2.
Click the three-dot button next to the Filename field and browse to the XML file you want to process.
In the Loop XPath query, enter between inverted commas the path of the XML node on which to loop in order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
In the XPath query column, enter between inverted commas the node of the XML file that holds the data you want to extract for the corresponding column.
In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
Double-click tFileOutputXML to display its Basic settings view and define the component properties.

Click the three-dot button next to the File Name field and browse to the output XML file you want to collect data in, customer_data.xml in this example.
In the Row tag field, enter between inverted commas the name you want to give to the tag that will hold the retrieved data.
Click Edit schema to display the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema from the preceding component.


Double-click tLogRow to display its Basic settings view and define the component properties. Click Edit schema to open the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema of the preceding component. In the Mode area, select the Vertical option. Save your Job and press F6 to execute it.

The output file customer_data.xml holding the correct XML data is created in the defined path and erroneous XML data is displayed on the console of the Run view.
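
The Loop XPath query, Mapping table and Limit settings used in this scenario can be pictured with plain JDK XPath calls. The following is a minimal sketch, not the code that tFileInputXML generates: the class name, the sample XML, its node names and the choice of an id attribute are all hypothetical, and only two of the five schema columns are mapped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of "loop XPath + per-column XPath + limit".
class LoopXPathSketch {

    // Selects the loop nodes, then evaluates each mapping XPath relative to
    // the current loop node. Stops once `limit` rows have been produced.
    static List<Map<String, String>> extract(String xml, String loopXPath,
            Map<String, String> mapping, int limit) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList loopNodes =
                    (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> rows = new ArrayList<>();
            for (int i = 0; i < loopNodes.getLength() && rows.size() < limit; i++) {
                Node loopNode = loopNodes.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> column : mapping.entrySet()) {
                    // Column name -> value found by the column's XPath query.
                    row.put(column.getKey(), xpath.evaluate(column.getValue(), loopNode));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read the XML input", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; the real input file is not shown in this guide.
        String xml = "<customers>"
                + "<customer id=\"1\"><CustomerName>Name One</CustomerName></customer>"
                + "<customer id=\"2\"><CustomerName>Name Two</CustomerName></customer>"
                + "</customers>";
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("id", "@id");
        mapping.put("CustomerName", "CustomerName");
        for (Map<String, String> row : extract(xml, "/customers/customer", mapping, 10)) {
            System.out.println(row);
        }
    }
}
```

Evaluating the column queries relative to each loop node is what lets the same short XPath expression be reused for every row.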


tFileOutputXML
tFileOutputXML properties
Component family: XML or File/Output

Function: tFileOutputXML outputs data to an XML type of file.

Purpose: tFileOutputXML writes an XML file with separated data values according to a defined schema.

Basic settings:
- File name: Name or path to the output file and/or the variable to be used. Related topic: How to define variables from the Component view of Talend Open Studio User Guide.
- Root tag: Wraps the whole output file structure and data.
- Row tag: Wraps data and structure per row.
- Column name as tag name: Select this check box to use the column labels from the input schema as data wrapping tags.
- Split output in files: If the XML output file is big, you can split the file every certain number of rows.
- Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
  Built-in: The schema will be created and stored locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and job designs. Related topic: How to set a repository schema of Talend Open Studio User Guide.
- Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the Output component.
- Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

Usage: Use this component to write an XML file with data passed on from other components using a Row link.

Limitation: n/a
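
To make the roles of Root tag, Row tag and Column name as tag name concrete, here is a minimal sketch that writes rows with the JDK StAX writer. It is an illustration under assumptions, not the component's generated code: the class and method names are hypothetical, the sample row is invented, and the sketch always uses column names as tag names.

```java
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration: one root element, one element per row,
// one child element per column (column name used as tag name).
class RowTagWriterSketch {

    static String write(String rootTag, String rowTag, List<Map<String, String>> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement(rootTag);            // wraps the whole output
            for (Map<String, String> row : rows) {
                writer.writeStartElement(rowTag);         // wraps one row
                for (Map.Entry<String, String> column : row.entrySet()) {
                    writer.writeStartElement(column.getKey());
                    writer.writeCharacters(column.getValue()); // escapes &, < and >
                    writer.writeEndElement();
                }
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write the XML output", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "1");
        row.put("CustomerName", "Name One"); // assumed sample value
        System.out.println(write("customers", "customer", Collections.singletonList(row)));
    }
}
```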


Scenario: From Positional to XML file


For a scenario that uses the tFileOutputXML component, see the tFileInputPositional scenario on page 1094.


tWriteXMLField
tWriteXMLField properties
Component family: XML

Function: tWriteXMLField outputs data to defined fields of the output XML file.

Purpose: tWriteXMLField reads an input XML file and extracts the structure to insert it in defined fields of the output file.

Basic settings:
- Output Column: Select the destination field in the output component where you want to write the XML structure.
- Configure Xml Tree: Opens the interface that supports the creation of the XML structure you want to write in a field. For more information about the interface, see Defining the XML tree on page 1569.
- Schema type and Edit Schema: A schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
  Built-in: You create the schema and store it locally for this component only. Related topic: How to set a built-in schema of Talend Open Studio User Guide.
  Repository: You already created the schema and stored it in the Repository, hence it can be reused in various projects and job flowcharts. Related topic: How to set a repository schema of Talend Open Studio User Guide.
- Sync columns: Click to synchronize the output file schema with the input file schema. The Sync function only displays once the Row connection is linked with the output component.
- Group by: Define the aggregation set, the columns you want to use to regroup the data.

Advanced settings:
- Remove the xml declaration: Select this check box if you do not want to include the XML header.
- Create empty element if needed: This check box is selected by default. If the Related Column in the interface that supports the creation of the XML structure has null values, or if no column is associated with the XML node, this option creates an open/close tag in the expected place.
- Create associated XSD file: If one of the XML elements is defined as a Namespace element, this option will create the corresponding XSD file. To use this option, you must select the Dom4J generation mode.
- Advanced separator (for number): Select this check box if you want to modify the separators used by default for numbers.
  Thousands separator: enter between brackets the separators to use for thousands.
  Decimal separator: enter between brackets the separators to use for decimals.
- Generation mode: Select in the list the faster or slower generation mode according to the available memory in your system: Fast but memory-consuming - Dom4J, Slow with no memory consumed.
- Encoding Type: Select the encoding type in the list or select Custom and define it manually. This field is compulsory when working with databases.
- tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: This component can be used as intermediate step in a data flow.

Limitation: n/a
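
As an illustration of what the thousands and decimal separators of the Advanced separator (for number) option control, the following minimal sketch formats a number with custom separators using the JDK. How the component applies the separators internally is not described in this guide, so the class name, the fixed two-decimal pattern and the sample values are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical illustration of custom thousands and decimal separators.
class NumberSeparatorSketch {

    static String format(double value, char thousandsSeparator, char decimalSeparator) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousandsSeparator);
        symbols.setDecimalSeparator(decimalSeparator);
        // Assumed pattern: groups of three digits, exactly two decimals.
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.891, '.', ',')); // prints 1.234.567,89
        System.out.println(format(1234567.891, ',', '.')); // prints 1,234,567.89
    }
}
```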

Scenario: Extracting the structure of an XML file and inserting it into the fields of a database table
This three-component Java scenario reads an XML file, extracts its XML structure, and writes that structure to the fields of a database table.
- Drop the following components from the Palette onto the design workspace: tFileInputXML, tWriteXMLField, and tMysqlOutput.
- Connect the three components using Main links.
- Double-click tFileInputXML to open its Basic settings view and define its properties.


- If you have already stored the input schema in the Repository tree view, select Repository first from the Property Type list and then from the Schema list to display the [Repository Content] dialog box where you can select the relevant metadata. For more information about storing schema metadata in the Repository tree view, see Setting up an XML file schema and How to drop components from the Metadata node in Talend Open Studio User Guide.
- If you have not stored the input schema in the Repository, select Built-in in the Property Type and Schema fields and fill in the fields that follow manually. For more information about tFileInputXML properties, see tFileInputXML on page 1592.
- If you have selected Built-in, click the three-dot button next to the Edit schema field to open a dialog box where you can manually define the structure of your file.
- In the Loop XPath query field, enter the node of the structure on which the loop is based. In this example, the loop is based on the customer node.
- Column in the Mapping table is automatically populated with the defined file content. In the XPath query column, enter between inverted commas the node of the XML file that holds the data corresponding to each of the Column fields.
- In the design workspace, click tWriteXMLField and then, in the Component view, click Basic settings to open the view where you can define the component properties.

Click the three-dot button next to the Edit schema field to open a dialog box where you can add a line by clicking the plus button.


- Click in the line and enter the name of the output column where you want to write the XML content, CustomerDetails in this example.
- Define the type and length in the corresponding fields, String and 255 in this example.
- Click Ok to validate your output schema and close the dialog box.
- In the Basic settings view, from the Output Column list, select the column you just defined, where you want to write the XML content.
- Click the three-dot button next to Configure Xml Tree to open the interface that helps to create the XML structure.

- In the Link Target area, click rootTag and rename it as CustomerDetails.
- In the Linker source area, drop CustomerName and CustomerAddress to CustomerDetails. A dialog box displays asking what type of operation you want to do.
- Select Create as sub-element of target node to create a sub-element of the CustomerDetails node.
- Right-click CustomerName and select Set As Loop Element from the contextual menu.
- Click OK to validate the XML structure you defined.
- Double-click tMysqlOutput to open its Basic settings view and define its properties.


- If you have already stored the schema in the DB Connection node in the Repository tree view, select Repository from the Schema list to display the [Repository Content] dialog box where you can select the relevant metadata. For more information about storing schema metadata in the Repository tree view, see Setting up a DB connection and How to drop components from the Metadata node in Talend Open Studio User Guide.
- If you have not stored the schema in the Repository, select Built-in in the Property Type and Schema fields and enter the database connection and data structure information manually. For more information about tMysqlOutput properties, see tMysqlOutput on page 609.
- In the Table field, enter the name of the database table to be created, where you want to write the extracted XML data.
- From the Action on table list, select Create table to create the defined table.
- From the Action on data list, select Insert to write the data.
- Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button next to Edit schema to view the schema.
- Save your Job and press F6 to execute it.

tWriteXMLField fills every field of the CustomerDetails column with the XML structure of the input file: the XML declaration <?xml version="1.0" encoding="ISO-8859-15"?>, the first node that separates each client, <CustomerDetails>, and finally the customer information, <CustomerAddress> and <CustomerName>.
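
The value written to each CustomerDetails field can be pictured as a small XML document serialized into a string. The sketch below is a hedged illustration of that idea with the JDK StAX writer, not the code generated by tWriteXMLField: the class and method names are hypothetical, the element order is assumed, and the two boolean flags only approximate the Remove the xml declaration and Create empty element if needed options (in particular, skipping a null column when the second flag is off is an assumption).

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration: one input row becomes one XML string that can
// be stored in a single String column such as CustomerDetails.
class XmlFieldSketch {

    private static final String DECLARATION =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>";

    static String toXmlField(String customerName, String customerAddress,
            boolean removeDeclaration, boolean createEmptyElement) {
        StringWriter out = new StringWriter();
        if (!removeDeclaration) {
            out.write(DECLARATION);
        }
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("CustomerDetails");
            writeColumn(writer, "CustomerName", customerName, createEmptyElement);
            writeColumn(writer, "CustomerAddress", customerAddress, createEmptyElement);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not build the XML field", e);
        }
        return out.toString();
    }

    // Writes <tag>value</tag>. A null value yields an element without content
    // when createEmptyElement is true, and is skipped otherwise (assumption).
    private static void writeColumn(XMLStreamWriter writer, String tag, String value,
            boolean createEmptyElement) throws XMLStreamException {
        if (value == null && !createEmptyElement) {
            return;
        }
        writer.writeStartElement(tag);
        if (value != null) {
            writer.writeCharacters(value);
        }
        writer.writeEndElement();
    }

    public static void main(String[] args) {
        // Assumed sample values; the scenario's real data is not listed here.
        System.out.println(toXmlField("Name One", "1 Sample Street", false, true));
    }
}
```

A string built this way fits the String column of length 255 defined in the scenario only as long as the serialized XML stays within that length.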


tXSDValidator
tXSDValidator Properties
Component family: XML

Function: Validates an input XML file or an input XML flow against an XSD file and sends the validation log to the defined output.

Purpose: Helps control the data and structure quality of the file or flow to be processed.

Basic settings:
- Mode: From this dropdown list, select:
  - File, to validate an input file
  - Flow, to validate an input flow
- Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component. The schema is either built-in or remotely stored in the Repository but in this case, the schema is read-only. It contains standard information regarding the file validation.
- XSD file (File mode only): Filepath to the reference XSD file.
- XML file (File mode only): Filepath to the XML file to be validated.
- If XML is valid, display / If XML is invalid detected, display (File mode only): Type in a message to be displayed in the Run console based on the result of the comparison.
- Print to console (File mode only): Select this check box to display the validation message.
- Allocate (Flow mode only): Specify the column or columns to be validated and the path to the reference XSD file.

Advanced settings:
- tStatCatcher Statistics: Select this check box to gather the job processing metadata at a job level as well as at each component level.

Usage: When used in File mode, this component can be used as a standalone component, but it is usually linked to an output component to gather the log data.

Limitation: n/a

Scenario: Validating data flows against an XSD file


This scenario describes a Java Job that validates an XML column in an input file against a reference XSD file and outputs the log information for the invalid rows of the column into a delimited file. For the tXSDValidator use case that validates an XML file, see Scenario: Validating XML files on page 1579.

Drop a tFileInputDelimited component, a tXSDValidator component, and two tFileOutputDelimited components from the Palette to the design workspace.

Double-click the tFileInputDelimited to open its Component view and set its properties:

- Use the Built-In property type for this scenario.
- Browse to the input file, and define the number of rows to be skipped at the beginning of the file.
- Use a Built-In schema for this scenario. This means that it is available for this Job only.
- Click Edit schema and edit the schema according to the input file. In this scenario, the input file has only two columns: ID and ShipmentInfo. The ShipmentInfo column is an XML column and needs to be validated.


On your design workspace, connect the tFileInputDelimited component to the tXSDValidator component using a Row > Main link. Double-click the tXSDValidator component, and set its properties:

- From the Mode dropdown list, select Flow Mode.
- Use a Built-In schema for this scenario. Click Sync columns to retrieve the schema from the preceding component. To view or modify the schema, click the three-dot button next to Edit schema.
- Add a line in the Allocate table by clicking the plus button. The name of the first column of the input file automatically appears in the Input Column field. Click in the field and select the column you want to validate.
- In the XSD File field, fill in the path to your reference XSD file.
- On your design workspace, connect the tXSDValidator component to one tFileOutputDelimited component using a Row > Main link to output the information about valid XML rows.
- Connect the tXSDValidator component to the other tFileOutputDelimited component using a Row > Rejects link to output the information about invalid XML rows.
- Double-click each of the two tFileOutputDelimited components and configure the component properties.
- In the Property Type field, select Built-In.
- In the File Name field, enter the output file path or, if you want to use an existing output file, browse to it.
- Select Built-In from the Schema list and click Sync columns to retrieve the schema from the preceding component.


Save your Job and press F6 to run it.

The output files contain the validation information about the valid and invalid XML rows of the specified column respectively.
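
The routing performed in this scenario, valid rows to the Main output and invalid rows to the Rejects output, can be approximated with the JDK validation API. This is a minimal sketch under assumptions, not the component's implementation: the class name, the sample XSD and the shipment/weight element names are hypothetical, and each row is modelled as a two-element array holding ID and ShipmentInfo.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration of validating one XML column per row against an XSD.
class XsdFlowValidationSketch {

    // Assumed reference schema: a shipment with a decimal weight.
    static final String SAMPLE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"shipment\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"weight\" type=\"xs:decimal\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    static Schema loadSchema(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD", e);
        }
    }

    // Returns true when the XML is well formed and conforms to the schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // row[0] = ID, row[1] = ShipmentInfo (the XML column to validate).
    static void route(Schema schema, List<String[]> rows,
            List<String[]> mainFlow, List<String[]> rejects) {
        for (String[] row : rows) {
            (isValid(schema, row[1]) ? mainFlow : rejects).add(row);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "<shipment><weight>12.5</weight></shipment>"},
                new String[] {"2", "<shipment><weight>heavy</weight></shipment>"});
        List<String[]> mainFlow = new ArrayList<>();
        List<String[]> rejects = new ArrayList<>();
        route(loadSchema(SAMPLE_XSD), rows, mainFlow, rejects);
        System.out.println("valid rows: " + mainFlow.size()
                + ", rejected rows: " + rejects.size());
    }
}
```

Compiling the schema once and creating a validator per row keeps the per-row cost low, since schema compilation is the expensive step.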


tXSLT

tXSLT Properties
Component family: XML

Function: Refers to an XSL stylesheet to transform an XML source file into a defined output file.

Purpose: Helps to transform a data structure into another structure.

Basic settings:
- XML file: File path to the XML file to be transformed.
- XSL file: File path to the reference XSL transformation file.
- Output file: File path to the output file. If the file does not exist, it will be created. The output file can be any structured or unstructured file such as html, xml, txt or even pdf or edifact, depending on your xsl.
- Parameters: Click the plus button to add new lines in the Parameters list and define the transformation parameters of the XSLT file. Click in each line and enter the key in the name list and its associated value in the value list.

Usage: This component can be used as a standalone component.

Limitation: n/a

Scenario: Transforming XML to html using an XSL stylesheet


This Java scenario describes a two-component Job that converts xml data into an html document using an xsl stylesheet. It also defines a transformation parameter of the xsl stylesheet to change the background color of the header of the created html document. Drop the tXSLT and tMsgBox components from the Palette to the design workspace.

Double-click tXSLT to open its Basic settings view where you can define the component properties.


In the XML file field, set the path or browse to the xml file to be transformed. In this example, the xml file holds a list of MP3 song titles and related information including artist names, company etc.

In the XSL file field in the Basic settings view, set the path or browse to the relevant xsl file. In the Output file field, set the path or browse to the output html file. In this example, we want to convert the xml data into an html file holding a table heading followed by a table listing artists names next to song titles.


In the Parameters area of the Basic settings view, click the plus button to add a line where you can define the name and value of the transformation parameter of the xsl file. In this example, the name of the transformation parameter we want to use is bgcolor and the value is green. Double-click the tMsgBox to display its Basic settings view and define its display properties as needed.

Save the Job and press F6 to execute it. The message box displays confirming that the output html file is created and stored in the defined path.


Click OK to close the message box. You can now open the output html file to check the transformation of the xml data and that of the background color of the table heading.
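
The effect of the bgcolor transformation parameter can be reproduced outside the Studio with the JDK XSLT API. The following is a minimal sketch under assumptions, not the stylesheet or data used in this scenario: the class name, the sample xml, the songs/song/artist/title element names and the sample stylesheet are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of an XSL transformation with a parameter.
class XsltParameterSketch {

    // Assumed input: a short list of songs.
    static final String SAMPLE_XML =
            "<songs><song><artist>Artist One</artist><title>Song One</title></song></songs>";

    // Assumed stylesheet: the bgcolor parameter colors the table heading row.
    static final String SAMPLE_XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\"/>"
            + "<xsl:param name=\"bgcolor\" select=\"'white'\"/>"
            + "<xsl:template match=\"/songs\">"
            + "<table><tr bgcolor=\"{$bgcolor}\"><th>Artist</th><th>Title</th></tr>"
            + "<xsl:for-each select=\"song\">"
            + "<tr><td><xsl:value-of select=\"artist\"/></td>"
            + "<td><xsl:value-of select=\"title\"/></td></tr>"
            + "</xsl:for-each></table>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xsl, Map<String, String> parameters) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            for (Map.Entry<String, String> parameter : parameters.entrySet()) {
                // Same role as one name/value line of the Parameters table.
                transformer.setParameter(parameter.getKey(), parameter.getValue());
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSL,
                Collections.singletonMap("bgcolor", "green")));
    }
}
```

If the parameter is not supplied, the default declared in the stylesheet (white in this sketch) is used instead.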


A Alias ........................................................... 906 B Business tAlfrescoOutput ....................................... 2 tBonitaDeploy ........................................ 13 tBonitaInstantiateProcess ....................... 15 tCentricCRMInput ................................. 21 tCentricCRMOutput ............................... 22 tHL7Input ............................................... 23 tHL7Output ............................................ 27 tMarketoInput ........................................ 28 tMarketoOutput ...................................... 31 tMicrosoftCRMInput ............................. 38 tMicrosoftCRMOutput ........................... 46 tMSAXInput .......................................... 48 tMSAXOutput ........................................ 49 tOpenbravoERPInput ............................. 56 tOpenBravoERPOutput ......................... 58 tSageX3Input ......................................... 59 tSageX3Output ....................................... 64 tSalesforceBulkExec .............................. 69 tSalesforceConnection ........................... 71 tSalesforceGetDeleted ........................... 72 tSalesforceGetServerTimestamp ........... 76 tSalesforceGetUpdated .......................... 78 tSalesforceInput ..................................... 80 tSalesforceOutput ................................... 86 tSalesforceOutputBulk ........................... 90 tSalesforceOutputBulkExec ................... 95 tSAPCommit ........................................ 100 tSAPConnection ................................... 101 tSAPInput ............................................. 102 tSAPOutput .......................................... 116 tSAPRollback ....................................... 118 tSugarCRMInput .................................. 119 tSugarCRMOutput ............................... 122 tVtigerCRMInput ................................. 123 tVtigerCRMOutput .............................. 
125 Business Intelligence tGreenplumSCD ................................... 140 tInformixSCD ...................................... 142 tIngresSCD ........................................... 144 tMondrianInput .................................... 153

tMSSqlSCD ..........................................157 tMysqlSCD ...........................................159 tOracleSCD ...........................................174 tPaloCheckElements .............................179 tPaloConnection ...................................181 tPaloCube .............................................182 tPaloCubeList .......................................186 tPaloDatabase .......................................190 tPaloDatabaseList .................................193 tPaloDimension ....................................196 tPaloDimensionList ..............................205 tPaloInputMulti .....................................209 tPaloOutput ...........................................215 tPaloOutputMulti ..................................217 tPaloRule ..............................................226 tPaloRuleList ........................................230 tParAccelSCD .......................................234 tPostgresPlusSCD .................................236 tPostgresqlSCD .....................................241 tSPSSInput ............................................246 tSPSSOutput .........................................249 tSPSSProperties ....................................252 tSPSSStructure .....................................253 tSybaseSCD ..........................................254 BusinessIntelligence tBarChart ..............................................128 tDB2SCD ..............................................135 tDB2SCDELT ......................................137 tLineChart .............................................146 tOracleSCDELT ...................................176 tPostgresPlusSCDELT .........................238 tPostgresqlSCDELT .............................243 tSybaseSCDELT ...................................256 C Component, Composant ...........................1513 Context .....................................................1496 Custom Code tJava 
......................................................263 tJavaFlex ...............................................267 tSetGlobalVar .......................................277 CustomCode tGroovy .................................................260 tGroovyFile ...........................................261 tJavaRow ..............................................273

Talend Open Studio Components

tLibraryLoad ........................................ 274 D Data Quality tAddCRCRow ...................................... 280 tChangeFileEncoding ........................... 283 tFileList ................................................ 284 tFuzzyMatch ........................................ 285 tIntervalMatch ...................................... 290 tParseAddress ....................................... 295 tParseName .......................................... 297 tReplaceList ......................................... 299 tSchemaComplianceCheck .................. 304 tUniqRow ............................................. 310 Data quality tFuzzyMatch ...................................... 1430 Database PostgresPlusOutput .............................. 744 tAccessBulkExec ................................. 316 tAccessCommit .................................... 318 tAccessConnection ............................... 319 tAccessInput ......................................... 323 tAccessOutput ...................................... 325 tAccessOutputBulk .............................. 329 tAccessOutputBulkExec ...................... 331 tAccessRollback ................................... 334 tAccessRow ......................................... 335 tAS400Close ........................................ 338 tAS400Commit .................................... 339 tAS400Connection ............................... 340 tAS400Input ......................................... 342 tAS400LastInsertId .............................. 344 tAS400Output ...................................... 345 tAS400Rollback ................................... 349 tAS400Row .......................................... 350 tCreateTable ......................................... 353 tDB2BulkExec ..................................... 358 tDB2CDC ............................................. 361 tDB2Close ............................................ 362 tDB2Commit ........................................ 
363 tDB2Connection .................................. 364 tDB2Input ............................................ 366 tDB2Output .......................................... 368 tDB2Rollback ...................................... 372 tDB2Row ............................................. 373

tDB2SCD ..............................................376 tDB2SCDELT ......................................377 tDB2SP .................................................378 tDBInput ...............................................380 tDBOutput ............................................384 tDBSQLRow ........................................389 tELTJDBCInput ...................................892 tELTJDBCMap .....................................894 tELTJDBCOutput .................................896 tELTMSSqlInput ..................................898 tELTMSSqlMap ...................................900 tELTMSSqlOutput ...............................902 tELTMysqlInput ...................................904 tELTMysqlMap ....................................905 tELTMysqlOutput ................................916 tELTOracleInput ...................................918 tELTOracleMap ....................................920 tELTOracleOutput ................................925 tELTPostgresqlInput .............................931 tELTPostgresqlMap ..............................933 tELTPostgresqlOutput ..........................935 tELTTeradataInput ...............................943 tELTTeradataMap ................................944 tELTTeradataOutput .............................946 tEXAInput ............................................392 tEXAOutput ..........................................394 tEXARow .............................................397 tEXistConnection .................................399 tFirebirdClose .......................................414 tFirebirdCommit ...................................415 tFirebirdConnection ..............................416 tFirebirdInput ........................................418 tFirebirdOutput .....................................420 tFirebirdRollback ..................................423 tFirebirdRow .........................................424 tFixedFlowInput .................................1334 tGreenplumBulkExec 
...........................427 tGreenplumClose ..................................430 tGreenplumCommit ..............................431 tGreenplumConnection .........................432 tGreenplumInput ...................................434 tGreenplumOutput ................................436 tGreenplumOutputBulk ........................439 tGreenplumOutputBulkExec ................441 tGreenplumRollback .............................443

ii

Talend Open Studio Components

tGreenplumRow ................................... 444 tGreenplumSCD ................................... 447 tHiveClose ........................................... 448 tHiveConnection .................................. 449 tHiveRow ............................................. 450 tHSQLDbInput ..................................... 452 tHSQLDbOutput .................................. 455 tHSQLDbRow ..................................... 459 tInformixBulkExec .............................. 462 tInformixClose ..................................... 465 tInformixCommit ................................. 466 tInformixConnection ............................ 467 tInformixInput ...................................... 469 tInformixOutput ................................... 471 tInformixOutputBulk ........................... 474 tInformixOutputBulkExec ................... 476 tInformixRollback ................................ 479 tInformixRow ....................................... 480 tInformixSCD ...................................... 483 tInformixSP .......................................... 484 tIngresClose ......................................... 487 tIngresCommit ..................................... 488 tIngresConnection ................................ 489 tIngresInput .......................................... 490 tIngresOutput ....................................... 492 tIngresRollback .................................... 495 tIngresRow ........................................... 496 tIngresSCD ........................................... 498 tInterbaseClose ..................................... 499 tInterbaseCommit ................................. 500 tInterbaseConnection ........................... 501 tInterbaseInput ..................................... 503 tInterbaseOutput ................................... 505 tInterbaseRollback ............................... 508 tInterbaseRow ...................................... 
509 tJavaDBInput ....................................... 512 tJavaDBOutput ..................................... 514 tJavaDBRow ........................................ 517 tJDBCClose ......................................... 520 tJDBCColumnList ............................... 519 tJDBCCommit ..................................... 521 tJDBCConnection ................................ 522 tJDBCInput .......................................... 524 tJDBCOutput ....................................... 526 tJDBCRollback .................................... 529

tJDBCRow ............................................530 tJDBCSP ...............................................533 tJDBCTableList ....................................535 tLDAPAttributesInput ..........................536 tLDAPInput ..........................................539 tLDAPOutput .......................................543 tLDAPRenameEntry .............................547 tMaxDBInput ........................................549 tMaxDBOutput .....................................551 tMaxDBRow .........................................554 tMSSqlBulkExec ..................................556 tMSSqlCDC ..........................................559 tMSSqlClose .........................................561 tMSSqlColumnList ...............................560 tMSSqlCommit .....................................562 tMSSqlConnection ...............................563 tMSSqlInput .........................................565 tMSSqlLastInsertId ...............................567 tMSSqlOutput .......................................568 tMSSqlOutputBulk .......................572, 883 tMSSqlOutputBulkExec .......................574 tMSSqlRollback ...................................577 tMSSqlRow ..........................................578 tMSSqlSCD ..........................................581 tMSSqlSP .............................................582 tMSSqlTableList ...................................584 tMysqlBulkExec ...........................585, 658 tMysqlClose ..........................................588 tMysqlColumnList ................................589 tMysqlCommit ......................................593 tMysqlConnection ........................319, 594 tMysqlInput ..........................................598 tMysqlLastInsertId ...............................604 tMysqlOutput ........................................609 tMysqlOutputBulk ................................627 tMysqlOutputBulkExec ........................632 tMysqlRollback 
....................................636 tMysqlRow ...........................................638 tMysqlSCD ...........................................647 tMysqlSCDELT ............................171, 648 tMySqlSP ..............................................649 tMysqlTableList ...................................653 tNetezzaBulkExec ................................658 tNetezzaClose .......................................660 tNetezzaCommit ...................................661

Talend Open Studio Components

iii

    tNetezzaCommit_Prop ........ 661
    tNetezzaConnection ........ 662
    tNetezzaInput ........ 663
    tNetezzaNzLoad ........ 665
    tNetezzaOutput ........ 671
    tNetezzaRollback ........ 675
    tNetezzaRow ........ 676
    tOracleBulkExec ........ 679
    tOracleClose ........ 685
    tOracleCommit ........ 686
    tOracleConnection ........ 687
    tOracleInput ........ 689
    tOracleOutput ........ 691
    tOracleOutputBulk ........ 695
    tOracleOutputBulkExec ........ 697
    tOracleRollBack ........ 701
    tOracleRow ........ 702
    tOracleSCD ........ 705
    tOracleSCDELT ........ 706
    tOracleSP ........ 707
    tOracleTableList ........ 713
    tParAccelBulkExec ........ 714
    tParAccelClose ........ 717
    tParAccelCommit ........ 718
    tParAccelConnection ........ 719
    tParAccelInput ........ 721
    tParAccelOutBulk ........ 726
    tParAccelOutput ........ 723
    tParAccelOutputBulkExec ........ 728
    tParAccelRollBack ........ 730
    tParAccelRow ........ 731
    tParAccelSCD ........ 734
    tParseRecordSet ........ 735
    tPostgresOutputBulkExec ........ 750
    tPostgresPlusBulkExec ........ 736
    tPostgresPlusClose ........ 738
    tPostgresPlusCommit ........ 739
    tPostgresPlusConnection ........ 740
    tPostgresPlusInput ........ 742
    tPostgresPlusOutBulk ........ 748
    tPostgresPlusRollback ........ 752
    tPostgresPlusRow ........ 753
    tPostgresPlusSCD ........ 756
    tPostgresPlusSCDELT ........ 757
    tPostgresqlBulkExec ........ 758
    tPostgresqlClose ........ 762
    tPostgresqlConnection ........ 763
    tPostgresqlInput ........ 764
    tPostgresqlOutput ........ 766
    tPostgresqlOutputBulk ........ 769
    tPostgresqlOutputBulkExec ........ 771
    tPostgresqlRollBack ........ 774
    tPostgresqlRow ........ 775
    tPostgresqlSCD ........ 778
    tPostgresqlSCDELT ........ 779
    tSASInput ........ 780
    tSASOutput ........ 782
    tSQLiteClose ........ 785
    tSQLiteCommit ........ 786
    tSQLiteConnection ........ 787
    tSQLiteInput ........ 788
    tSQLiteOutput ........ 792
    tSQLiteRollback ........ 795
    tSQLiteRow ........ 796
    tSybaseBulkExec ........ 800
    tSybaseClose ........ 803
    tSybaseCommit ........ 804
    tSybaseConnection ........ 805
    tSybaseInput ........ 806
    tSybaseIQBulkExec ........ 808
    tSybaseIQOutputBulkExec ........ 810
    tSybaseOutput ........ 813
    tSybaseOutputBulk ........ 817
    tSybaseOutputBulkExec ........ 819
    tSybaseRollback ........ 822
    tSybaseRow ........ 823
    tSybaseSCD ........ 826
    tSybaseSCDELT ........ 827
    tSybaseSP ........ 828
    tTeradataClose ........ 830
    tTeradataCommit ........ 831
    tTeradataConnection ........ 832
    tTeradataFastExport ........ 834
    tTeradataFastLoad ........ 836
    tTeradataFastLoadUtility ........ 838
    tTeradataInput ........ 840
    tTeradataMultiLoad ........ 842
    tTeradataOutput ........ 844
    tTeradataRollback ........ 848
    tTeradataRow ........ 849


    tTeradataTPump ........ 852
    tVectorWiseCommit ........ 858
    tVectorWiseConnection ........ 859
    tVectorWiseInput ........ 861
    tVectorWiseOutput ........ 863
    tVectorWiseRollback ........ 866
    tVectorWiseRow ........ 867
    tVerticaBulkExec ........ 870
    tVerticaClose ........ 873
    tVerticaCommit ........ 874
    tVerticaConnection ........ 875
    tVerticaInput ........ 877
    tVerticaOutput ........ 879
    tVerticaOutputBulk ........ 883
    tVerticaOutputBulkExec ........ 885
    tVerticaRollback ........ 887
    tVerticaRow ........ 888
Databases
    tEXistDelete ........ 400
    tEXistGet ........ 402
    tEXistList ........ 406
    tEXistPut ........ 408
    tEXistXQuery ........ 410
    tEXistXUpdate ........ 412

E
ELT
    tSQLTemplateAggregate ........ 948
    tSQLTemplateCommit ........ 955
    tSQLTemplateFilterColumns ........ 957
    tSQLTemplateFilterRows ........ 959
    tSQLTemplateMerge ........ 961
    tSQLTemplateRollback ........ 970
ESB
    tESBConsumer ........ 974
EST
    tESBProviderFault ........ 983
    tESBProviderRequest ........ 995
    tESBProviderResponse ........ 1008
Explicit
    Join ........ 906

F
File
    tAdvancedFileOutputXML ........ 1022
    tApacheLogInput ........ 1023

    tChangeFileEncoding ........ 1031
    tCreateTemporaryFile ........ 1026
    tExtractDelimitedFields ........ 1413
    tExtractPositionalFields ........ 1418
    tExtractRegexFields ........ 1420
    tFileArchive ........ 1033
    tFileCompare ........ 1036
    tFileCopy ........ 1039
    tFileDelete ........ 1042
    tFileExist ........ 1045
    tFileInputARFF ........ 1050
    tFileInputDelimited ........ 1054
    tFileInputExcel ........ 1066
    tFileInputFullRow ........ 1069
    tFileInputJSON ........ 1072
    tFileInputLDIF ........ 1076
    tFileInputMail ........ 1078
    tFileInputMSDelimited ........ 1081
    tFileInputMSPositional ........ 1088
    tFileInputMSXML ........ 1090
    tFileInputPositional ........ 1094
    tFileInputProperties ........ 1099
    tFileInputRegex ........ 1103
    tFileInputXML ........ 1107
    tFileList ........ 283, 284, 1108
    tFileOutputARFF ........ 1113
    tFileOutputDelimited ........ 1115
    tFileOutputExcel ........ 1124
    tFileOutputJSON ........ 1126
    tFileOutputLDIF ........ 1130
    tFileOutputMSDelimited ........ 1134
    tFileOutputMSPositional ........ 1136
    tFileOutputMSXML ........ 1137
    tFileOutputPositional ........ 1143
    tFileOutputProperties ........ 1145
    tFileOutputXML ........ 1146
    tFileProperties ........ 1147
    tFileRowCount ........ 1150
    tFileTouch ........ 1152
    tFileUnarchive ........ 1153
    tGPGDecrypt ........ 1155
    tPivotToColumnsDelimited ........ 1159

I
Internet


    tFileFetch ........ 1164
    tFileInputJSON ........ 1170
    tFTPConnection ........ 1171
    tFTPDelete ........ 1173
    tFTPFileExist ........ 1175
    tFTPFileList ........ 1177
    tFTPFileProperties ........ 1182
    tFTPGet ........ 1184
    tFTPPut ........ 1186
    tFTPRename ........ 1190
    tFTPTruncate ........ 1192
    tHttpRequest ........ 1194
    tJMSInput ........ 1197
    tJMSOutput ........ 1199
    tMicrosoftMQInput ........ 1201
    tMicrosoftMQOutput ........ 1205
    tMomCommit ........ 1206
    tMomInput ........ 1207
    tMomMessageIdList ........ 1212
    tMomOutput ........ 1213
    tMomRollback ........ 1215
    tPOP ........ 1216
    tREST ........ 1220
    tRSSInput ........ 1224
    tRSSOutput ........ 1227
    tSCPClose ........ 1238
    tSCPConnection ........ 1239
    tSCPDelete ........ 1240
    tSCPFileExists ........ 1241
    tSCPFileList ........ 1242
    tSCPGet ........ 1243
    tSCPPut ........ 1245
    tSCPRename ........ 1246
    tSCPTruncate ........ 1247
    tSendMail ........ 1248
    tSetKeyStore ........ 1251
    tSOAP ........ 1264
    tSocketInput ........ 1257
    tSocketOutput ........ 1262
    tWebServiceInput ........ 1268
    tXMLRPCInput ........ 1276

J
Join
    Explicit ........ 906

Joining tables
    Oracle private keys ........ 922
    outer joins (+) ........ 922

L
Logs&Errors
    tAssert ........ 1280
    tAssertCatcher ........ 1286
    tChronometerStart ........ 1288
    tChronometerStop ........ 1289
    tDie ........ 1294
    tFlowMeter ........ 1295
    tFlowMeterCatcher ........ 1296
    tLogCatcher ........ 1301
    tLogRow ........ 1305
    tStatCatcher ........ 1306
    tWarn ........ 1309

M
Misc
    tAddLocationFromIP ........ 1312
    tBufferInput ........ 1315
    tBufferOutput ........ 1318
    tContextDump ........ 1329
    tContextLoad ........ 1330
    tMemorizeRows ........ 1336
    tMsgBox ........ 1342
    tRowGenerator ........ 1344

O
Orchestration
    tFileList ........ 1350
    tFlowToIterate ........ 1351
    tForeach ........ 1355
    tInfiniteLoop ........ 1358
    tIterateToFlow ........ 1359
    tLoop ........ 1362
    tPostjob ........ 1365
    tPrejob ........ 1366
    tReplicate ........ 1367
    tRunJob ........ 1368
    tSleep ........ 1369
    tUnite ........ 1370
    tWaitForFile ........ 1374
    tWaitForSqlData ........ 1381


P
Processing
    tAggregateRow ........ 1386
    tAggregateSortedRow ........ 1391
    tConvertType ........ 1393
    tDenormalize ........ 1398
    tDenormalizeSortedRow ........ 1403
    tEmptyToNull ........ 1407
    tExternalSortedRow ........ 1411
    tExtractXMLFields ........ 1424
    tFilterColumns ........ 1425
    tFilterRow ........ 1426
    tJoin ........ 1430
    tMap ........ 1436
    tNormalize ........ 1465
    tPerl ........ 1468
    tPivotToRows ........ 1471
    tReplace ........ 1475
    tSampleRow ........ 1480
    tSortRow ........ 1483
    tXMLMap ........ 1487

S
System
    tRunJob ........ 1494
    tSetEnv ........ 1501
    tSSH ........ 1505
    tSystem ........ 1509

T
Table
    Alias ........ 906
tAccessInput
    Advanced settings ........ 324
tAccessOutput
    Advanced settings ........ 327
Talend MDM
    tMDMBulkLoad ........ 1514
    tMDMDelete ........ 1523
    tMDMInput ........ 1528
    tMDMOutput ........ 1533
    tMDMReceive ........ 1542
    tMDMRouteRecord ........ 1544
    tMDMSP ........ 1554
    tMDMViewSearch ........ 1561

tAlfrescoOutput
    Advanced settings ........ 3
tAS400Connection
    Advanced settings ........ 340
tAS400Input
    Advanced settings ........ 343, 550, 555
tAS400Output
    Advanced settings ........ 347
tDB2Input
    Advanced settings ........ 367, 1331, 1334
tDB2Output
    Advanced settings ........ 139, 240, 245, 359, 370
tDBInput
    Advanced settings ........ 381
tDBOutput
    Advanced settings ........ 385
tFileInputEBCDIC ........ 1060
tFileInputExcel
    Advanced settings ........ 1067
tFileOutputEBCDIC ........ 1121
tFirebirdInput
    Advanced settings ........ 393, 398, 435
tFirebirdOutput
    Advanced settings ........ 395, 421
tHSQLDbInput
    Advanced settings ........ 453
tHSQLDBOutput
    Advanced settings ........ 438, 457, 552
tHSQLDbOutput
    Advanced settings ........ 438, 552
tInformixOutput
    Advanced settings ........ 464, 472
tIngresInput
    Advanced settings ........ 490
tIngresOutput
    Advanced settings ........ 493
tInterbaseOutput
    Advanced settings ........ 506
tJavaDBInput
    Advanced settings ........ 512
tJavaDBOutput
    Advanced settings ........ 515, 528
tJDBCInput
    Advanced settings ........ 525
tJDBCOutput


    Advanced settings ........ 528
tMSSqlInput
    Advanced settings ........ 566
tMSSqlOutput
    Advanced settings ........ 570
tMSSqlSCD
    Advanced settings ........ 158
tMysqlInput
    Advanced settings ........ 599
tMysqlOutput
    Advanced settings ........ 464, 472, 611, 673
tNetezzaNzLoad
    Advanced settings ........ 666
tOracleInput
    Advanced settings ........ 690
tOracleOutput
    Advanced settings ........ 693
tPostegresqlInput
    Advanced settings ........ 722, 743, 765
tPostegresqlOutput
    Advanced settings ........ 715, 725, 746, 768
tPostegrsqlOutput
    Advanced settings ........ 715, 725, 746, 768
tPostegrsqlOutputBulk
    Advanced settings ........ 769
tSASInput
    Advanced settings ........ 781
tSASOutput
    Advanced settings ........ 783

tSQLiteInput
    Advanced settings ........ 789
tSQLiteOutput
    Advanced settings ........ 794
tSybaseInput
    Advanced settings ........ 807
tSybaseOutput
    Advanced settings ........ 815
tTeradataInput
    Advanced settings ........ 834, 836, 838, 843, 853
tTeradataOutput
    Advanced settings ........ 832, 841, 846, 850, 862, 865, 868
tVectorWiseCommit ........ 858

V
Variable ........ 1496

X
XML
    tAdvancedFileOutputXML ........ 1568
    tDTDValidator ........ 1579, 1582
    tEDIFACTtoXML ........ 1582
    tExtractXMLField ........ 1585
    tFileInputXML ........ 1592
    tFileOutputXML ........ 1599
    tWriteXMLField ........ 1601
    tXSDValidator ........ 1607
    tXSLT ........ 1611

