Pentaho Data Integration (Kettle)

Pentaho Data Integration with Kettle


DESCRIPTION

A course on Pentaho Data Integration with Kettle. Another interesting course, on Talend, is available at http://www.slideshare.net/melphi_/talend-open-studio-data-integration


Page 1: Pentaho Data Integration with Kettle

Pentaho Data Integration (Kettle)


www.robertomarchetto.com

PDI Overview (Kettle)

● An entry-level tool for data manipulation (ETL)
● PDI (Kettle) reads procedures stored in XML format
● Spoon is a graphical tool used to develop those procedures
● Procedures are designed by linking components
● Many data sources can be used: JDBC, files, web services
● JavaScript and Java support for complex routines
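Because a procedure is plain XML, its structure can be inspected with any XML parser. The Python sketch below assumes the usual layout of a transformation (.ktr) file: a <transformation> root, <step> elements carrying <name> and <type>, and <hop> elements with <from>, <to> and <enabled> inside <order>. The sample is trimmed and hypothetical. The function sorts the steps along the enabled hops (the "links" between components). PDI actually runs all steps of a transformation concurrently, streaming rows between them, so this order only reflects the direction of the data flow, not an execution sequence.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical, heavily trimmed transformation; a real .ktr saved by Spoon
# also contains connections, field metadata and GUI layout information.
KTR_SAMPLE = """<transformation>
  <info><name>users_dimension</name></info>
  <order>
    <hop><from>Table input</from><to>Select values</to><enabled>Y</enabled></hop>
    <hop><from>Select values</from><to>Table output</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Table output</name><type>TableOutput</type></step>
  <step><name>Table input</name><type>TableInput</type></step>
  <step><name>Select values</name><type>SelectValues</type></step>
</transformation>"""


def step_order(ktr_xml: str) -> list[tuple[str, str]]:
    """Return (step name, step type) pairs ordered along the enabled hops."""
    root = ET.fromstring(ktr_xml)
    types = {s.findtext("name"): s.findtext("type") for s in root.iter("step")}
    successors: dict[str, list[str]] = {name: [] for name in types}
    indegree = {name: 0 for name in types}
    for hop in root.iter("hop"):
        if hop.findtext("enabled", "Y") != "Y":
            continue  # a disabled hop carries no rows
        src, dst = hop.findtext("from"), hop.findtext("to")
        successors[src].append(dst)
        indegree[dst] += 1
    # Kahn's topological sort over the step graph
    ready = deque(name for name in types if indegree[name] == 0)
    ordered = []
    while ready:
        name = ready.popleft()
        ordered.append((name, types[name]))
        for nxt in successors[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return ordered
```

For the sample, step_order(KTR_SAMPLE) lists Table input, then Select values, then Table output, even though the <step> elements appear in a different order in the file.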


Development environment


Example: source database


Example: destination database


Schema comparison


Procedure users_dimension

Query users:

SELECT u.id, CONCAT(u.first_name, ' ', u.last_name) AS fullname, u.title
FROM users u
WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
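The users query can be tried outside Spoon. The Python sketch below uses an in-memory sqlite3 database as a stand-in for the source and destination databases, and assumes the transformation simply reads the query result and writes it to a users_dimension table; the sample rows and the destination columns are hypothetical. SQLite lacks MySQL's CONCAT() in older versions, so the || operator builds the full name instead, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def load_users_dimension(conn: sqlite3.Connection) -> list[tuple]:
    """Copy users that have both names into a hypothetical users_dimension table."""
    rows = conn.execute(
        "SELECT u.id, u.first_name || ' ' || u.last_name AS fullname, u.title "
        "FROM users u "
        "WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL "
        "ORDER BY u.id"
    ).fetchall()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users_dimension "
        "(id TEXT PRIMARY KEY, fullname TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO users_dimension VALUES (?, ?, ?)", rows)
    conn.commit()
    return rows


# Hypothetical sample data standing in for the CRM users table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, title TEXT);
INSERT INTO users VALUES ('u1', 'Ada', 'Lovelace', 'Analyst');
INSERT INTO users VALUES ('u2', NULL, 'Turing', 'Engineer');
INSERT INTO users VALUES ('u3', 'Grace', 'Hopper', NULL);
""")
loaded = load_users_dimension(conn)
```

User u2 is skipped because first_name is NULL; a NULL title, as for u3, is copied through unchanged.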


Testing


Procedure accounts_dimension

Query accounts:

SELECT a.id, a.name, a.industry, a.billing_address_postalcode, a.billing_address_city, a.billing_address_country
FROM accounts a
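The slide does not show how accounts_dimension is written. One common choice, sketched here under that assumption with sqlite3 standing in for the real databases, is to make reruns idempotent by replacing any row whose id already exists, similar in spirit to PDI's Insert / Update step. The destination column names and the sample account are hypothetical.

```python
import sqlite3


def load_accounts_dimension(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy accounts into accounts_dimension, replacing rows whose id already exists."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS accounts_dimension ("
        "id TEXT PRIMARY KEY, name TEXT, industry TEXT, "
        "postalcode TEXT, city TEXT, country TEXT)"
    )
    rows = src.execute(
        "SELECT a.id, a.name, a.industry, a.billing_address_postalcode, "
        "a.billing_address_city, a.billing_address_country FROM accounts a"
    ).fetchall()
    # INSERT OR REPLACE deletes the old row with the same primary key, then inserts.
    dst.executemany("INSERT OR REPLACE INTO accounts_dimension VALUES (?, ?, ?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)


# Hypothetical source account.
src = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE accounts (id TEXT, name TEXT, industry TEXT, billing_address_postalcode TEXT, "
    "billing_address_city TEXT, billing_address_country TEXT)"
)
src.execute("INSERT INTO accounts VALUES ('a1', 'Acme', 'Retail', '20100', 'Milan', 'Italy')")
dst = sqlite3.connect(":memory:")
load_accounts_dimension(src, dst)
src.execute("UPDATE accounts SET billing_address_city = 'Rome' WHERE id = 'a1'")
load_accounts_dimension(src, dst)  # rerun: a1 is replaced, not duplicated
```

Running the load twice leaves a single row for a1 carrying the updated city.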


Procedure opportunities_fact

Query opportunities:

SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id, o.sales_stage, o.name, o.amount FROM opportunities o WHERE o.sales_stage in ('Closed Won', 'Closed Lost') ORDER BY o.id
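A fact table usually stores keys into its dimensions rather than raw values. The Python sketch below runs the opportunities query against a hypothetical sqlite3 sample and turns each closed opportunity into a fact row. Three choices are assumptions, not taken from the slides: dates become integer YYYYMMDD keys, a won flag is derived from sales_stage, and the opportunity name is dropped.

```python
import sqlite3
from datetime import date


def date_key(value: str) -> int:
    """Hypothetical date-dimension key: '2011-02-01' -> 20110201."""
    d = date.fromisoformat(value[:10])  # drop any time part of a DATETIME value
    return d.year * 10000 + d.month * 100 + d.day


def opportunity_fact_rows(conn: sqlite3.Connection) -> list[tuple]:
    rows = conn.execute(
        "SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id, "
        "o.sales_stage, o.name, o.amount FROM opportunities o "
        "WHERE o.sales_stage IN ('Closed Won', 'Closed Lost') ORDER BY o.id"
    ).fetchall()
    facts = []
    for opp_id, entered, closed, user_id, stage, _name, amount in rows:
        won = 1 if stage == "Closed Won" else 0
        facts.append((opp_id, date_key(entered), date_key(closed), user_id, won, amount))
    return facts


# Hypothetical sample opportunities; o3 is still open and is filtered out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE opportunities (id TEXT, date_entered TEXT, date_closed TEXT,
  assigned_user_id TEXT, sales_stage TEXT, name TEXT, amount REAL);
INSERT INTO opportunities VALUES ('o1', '2011-01-10 09:00:00', '2011-02-01', 'u1', 'Closed Won', 'Deal A', 1000.0);
INSERT INTO opportunities VALUES ('o2', '2011-01-12 10:30:00', '2011-03-05', 'u3', 'Closed Lost', 'Deal B', 500.0);
INSERT INTO opportunities VALUES ('o3', '2011-02-01 08:00:00', '2011-04-01', 'u1', 'Prospecting', 'Deal C', 700.0);
""")
facts = opportunity_fact_rows(conn)
```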


Procedure dates_dimension
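The slide gives no detail for dates_dimension. A date dimension is typically pre-populated with one row per calendar day, and the Python sketch below generates such rows. The column set and the integer YYYYMMDD key are assumptions, not the author's schema.

```python
from datetime import date, timedelta
from typing import Iterator


def dates_dimension(start: date, end: date) -> Iterator[dict]:
    """Yield one row per calendar day from start to end, both inclusive."""
    day = start
    while day <= end:
        yield {
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20111230
            "date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day": day.day,
            "weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        }
        day += timedelta(days=1)


rows = list(dates_dimension(date(2011, 12, 30), date(2012, 1, 2)))
```

Generating the rows in code rather than deriving them from the opportunities keeps days without any activity in the dimension.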


Collect procedures in a job
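A job chains procedures with hops, and a hop can be followed only when the previous entry succeeds. The Python sketch below models that behaviour with stub functions in place of real transformations. The run order shown (dimensions before the fact table) is an assumption about the job on the slide, and run_job is a hypothetical helper, not a PDI API.

```python
from typing import Callable, Optional

JobEntry = tuple[str, Callable[[], bool]]


def run_job(entries: list[JobEntry]) -> tuple[list[str], Optional[str]]:
    """Run entries in sequence; return (completed entries, failed entry or None)."""
    completed: list[str] = []
    for name, entry in entries:
        if not entry():
            return completed, name  # no success hop to follow: stop the job here
        completed.append(name)
    return completed, None


def ok() -> bool:
    """Stub standing in for a transformation that finishes without errors."""
    return True


job = [("dates_dimension", ok), ("users_dimension", ok),
       ("accounts_dimension", ok), ("opportunities_fact", ok)]
result = run_job(job)
```

Loading the dimensions first means the fact procedure only runs once the rows it refers to are already in place.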


Using JNDI

● Edit the JNDI configuration in /simple-jndi/jdbc.properties or C:/Documents and Settings/<user>/.pentaho/simple-jndi/default.properties
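Entries in these files follow a name/attribute=value pattern, where the part before the slash is the JNDI name a connection refers to. The Python sketch below groups such lines into one dictionary per data source. The "crm" MySQL data source is hypothetical, and the parser is deliberately simpler than a full Java properties reader: it ignores ':' separators, escapes and continuation lines.

```python
JDBC_PROPERTIES = """\
# Hypothetical data source for the CRM database
crm/type=javax.sql.DataSource
crm/driver=com.mysql.jdbc.Driver
crm/url=jdbc:mysql://localhost:3306/sugarcrm
crm/user=etl
crm/password=secret
"""


def parse_jndi_properties(text: str) -> dict[str, dict[str, str]]:
    """Group name/attribute=value lines into {name: {attribute: value}}."""
    sources: dict[str, dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")  # split on the first '=' only
        name, slash, attr = key.partition("/")
        if not sep or not slash:
            continue  # not a name/attribute=value entry
        sources.setdefault(name.strip(), {})[attr.strip()] = value.strip()
    return sources


sources = parse_jndi_properties(JDBC_PROPERTIES)
```

Splitting on the first '=' keeps URLs that contain their own '=' characters intact.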


Running procedures

● Directly from Spoon
● From the Pentaho BI Suite
● Using the command line (Kitchen runs jobs, Pan runs transformations)

kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic

● In a clustered environment
● Using a web service (Carte)
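When a job is launched from another program, the Kitchen command line can be built and run from code. The Python sketch below assumes kitchen.bat (Windows) or kitchen.sh (elsewhere) is on the PATH, and uses the /option:value style on Windows and -option=value elsewhere. kitchen_command and run_kitchen are hypothetical helpers; only the exit code is inspected, with 0 meaning success.

```python
import subprocess
import sys


def kitchen_command(job_file: str, level: str = "Basic") -> list[str]:
    """Build the Kitchen argument list for the current platform."""
    if sys.platform.startswith("win"):
        return ["kitchen.bat", f"/file:{job_file}", f"/level:{level}"]
    return ["kitchen.sh", f"-file={job_file}", f"-level={level}"]


def run_kitchen(job_file: str, level: str = "Basic") -> int:
    """Run the job and return Kitchen's exit code (0 means success)."""
    return subprocess.run(kitchen_command(job_file, level)).returncode
```

On Windows, kitchen_command(r"D:\Jobs\jobname.kjb") yields the same arguments as the command shown above.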


Publishing on Pentaho


Running from Pentaho


Scheduling

● Using Pentaho's scheduler
● Using an external scheduler (cron)