roberto-marchetto
A course on Pentaho Data Integration with Kettle. A related course on Talend is available at http://www.slideshare.net/melphi_/talend-open-studio-data-integration
Pentaho Data Integration (Kettle)
www.robertomarchetto.com
PDI Overview (Kettle)
● An entry-level tool for data manipulation (ETL)
● PDI (Kettle) reads procedures stored in XML format
● Spoon is the graphical tool used to develop those procedures
● Procedures are designed by linking components
● Many data sources can be used: JDBC, files, web services
● JavaScript and Java support for complex routines
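Because procedures are plain XML, they can be inspected outside Spoon. The sketch below reads a heavily trimmed, hypothetical transformation and lists its steps and the hops that link them. The sample document and the step names are assumptions made for illustration; real .ktr files contain many more elements per step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed transformation file; real .ktr files
# carry many more elements per step.
SAMPLE_KTR = """<transformation>
  <info><name>users_dimension</name></info>
  <step><name>Query users</name><type>TableInput</type></step>
  <step><name>Load dimension</name><type>TableOutput</type></step>
  <order>
    <hop><from>Query users</from><to>Load dimension</to><enabled>Y</enabled></hop>
  </order>
</transformation>"""


def describe_transformation(xml_text):
    """Return (name, steps, hops) read from a transformation document."""
    root = ET.fromstring(xml_text)
    name = root.findtext("info/name")
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


name, steps, hops = describe_transformation(SAMPLE_KTR)
print(name)
for step_name, step_type in steps:
    print(f"  step: {step_name} ({step_type})")
for source, target in hops:
    print(f"  hop: {source} -> {target}")
```

Each hop corresponds to a link drawn between two components in Spoon.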
Procedure users_dimension
Query users:
SELECT u.id, CONCAT(u.first_name, ' ', u.last_name) AS fullname, u.title FROM users u WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
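The NOT NULL filter matters because, in MySQL, CONCAT returns NULL when any argument is NULL, so incomplete names would otherwise produce empty fullname values. The sketch below reproduces the effect of the query on an in-memory SQLite database. The sample rows are invented, and the `||` operator is swapped in for CONCAT as an assumption to keep the example portable across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, first_name TEXT, last_name TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Rossi", "Sales Manager"),
        (2, None, "Bianchi", "Analyst"),   # dropped: first_name is NULL
        (3, "Luca", None, "Director"),     # dropped: last_name is NULL
    ],
)

# Same filter as the slide; '||' replaces CONCAT for SQLite portability.
rows = conn.execute(
    """
    SELECT u.id, u.first_name || ' ' || u.last_name AS fullname, u.title
    FROM users u
    WHERE u.first_name IS NOT NULL AND u.last_name IS NOT NULL
    """
).fetchall()
print(rows)
```

Only the user with both name parts present reaches the dimension.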
Procedure accounts_dimension
Query accounts:
SELECT a.id, a.name, a.industry, a.billing_address_postalcode, a.billing_address_city, a.billing_address_country FROM accounts a
Procedure opportunities_fact
Query opportunities:
SELECT o.id, o.date_entered, o.date_closed, o.assigned_user_id, o.sales_stage, o.name, o.amount FROM opportunities o WHERE o.sales_stage in ('Closed Won', 'Closed Lost') ORDER BY o.id
Using JNDI
● Edit the JNDI configuration in /simple-jndi/jdbc.properties or C:/Documents and Settings/<user>/.pentaho/simple-jndi/default.properties
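As a rough illustration, a data source entry in the simple-jndi properties format might look like the fragment below. The data source name `crm`, the MySQL driver, the URL, and the credentials are all placeholders assumed for this sketch; they do not come from the slides.

```properties
# Hypothetical data source named "crm"; every value below is a placeholder.
crm/type=javax.sql.DataSource
crm/driver=com.mysql.jdbc.Driver
crm/url=jdbc:mysql://localhost:3306/sugarcrm
crm/user=etl_user
crm/password=changeme
```

A connection defined in Spoon with access type JNDI would then refer to the data source by its name (`crm` here).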
Running procedures
● Directly from Spoon
● From Pentaho BI Suite
● Using the command line (Kitchen, Pan)
    kitchen.bat /file:D:\Jobs\jobname.kjb /level:Basic
● In a clustered environment
● Using a web service (Carte)
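The command-line option is the easiest one to automate. The sketch below wraps the Kitchen call from the slide in a small Python helper. The helper names `build_kitchen_command` and `run_job` are hypothetical, and the sketch assumes `kitchen.bat` is on the PATH and that an exit status of 0 means the job succeeded.

```python
import subprocess


def build_kitchen_command(job_file, level="Basic", executable="kitchen.bat"):
    """Build the Kitchen argument list in the Windows /option:value style."""
    return [executable, f"/file:{job_file}", f"/level:{level}"]


def run_job(job_file, level="Basic", executable="kitchen.bat"):
    """Run the job and report whether Kitchen exited with status 0."""
    result = subprocess.run(build_kitchen_command(job_file, level, executable))
    return result.returncode == 0


# Only the command is built here; call run_job(...) on a machine with PDI installed.
command = build_kitchen_command(r"D:\Jobs\jobname.kjb")
print(command)
```

Keeping the command construction separate from the execution makes it possible to log or check the exact arguments before anything runs.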
Scheduling
● Using Pentaho's scheduler
● Using an external scheduler (cron)
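For the external-scheduler option, a crontab entry along the following lines could run a job nightly through Kitchen on Linux. The installation path, job path, log path, and schedule are assumptions chosen for illustration.

```
# Hypothetical crontab entry: run the job every day at 02:00.
# The install path, job path, and log path are placeholders.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/opt/jobs/jobname.kjb -level=Basic >> /var/log/pdi/jobname.log 2>&1
```

Redirecting both output streams to a log file keeps a record of each run, since cron jobs have no console.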