Open Source ETL using Talend Open Studio
Luís [email protected]
February 14, 2013
Luís Santos [email protected] Open Source ETL February 14, 2013 1
Overview
1 Who am I?
2 What is ETL?
3 ETL Software Suites
4 Talend Open Studio for Data Integration
5 Hands on
6 Conclusion
Warning!!!
This presentation was created using LaTeX.
Why?
Because I can!
Who am I?
Software Engineer and Mathematics Student
Open Source addict
PHP and Java Developer
What is ETL?
In computing, Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that involves:
Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database, more specifically, operational data store, data mart or data warehouse)
(Wikipedia, 2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
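The three steps in the definition above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Talend implements it: the CSV content, the `sales` table, and the "amount is required" quality rule are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an "outside source" (a CSV export).
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, drop rows failing the quality rule."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # assumed quality rule: a row without an amount is rejected
            continue
        clean.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the end target (an SQLite table here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Chain the three steps: extract, then transform, then load."""
    load(transform(extract(SOURCE_CSV)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall())
```

An ETL suite gives you the same three stages as graphical components, plus connectors for many more sources and targets.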
ETL Software Suites
Pentaho Data Integration (Kettle)
SQL Server Integration Services
Talend Open Studio for Data Integration
etc...
Talend Open Studio for Data Integration
Talend Open Studio is a set of tools for developing, testing, and deploying application integration projects.
Talend Open Studio for Big Data
Bonita Open Solution (BPM)
Talend Open Studio for Data Integration
Talend Open Studio for Data Quality
Talend ESB
Talend Open Studio for MDM
Datasource(rer)s
Datasources (Extract and Load)
MySQL, MSSQL, Oracle, SQLite, Firebird, XLS, CSV, XML, SOAP, REST, HTTP, FTP, SSH, IMAP
Transformers
Transformers (Transform)
Sort data
Convert data
Cross data between datasources
Filter data
Fuzzy search
Normalize and Denormalize data
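To make the transformations listed above concrete, here is a rough plain-Python sketch of each one. It does not use Talend components; the `customers` and `orders` rows and all function names are invented for the example, and the denormalize step is just one simple interpretation (one row per order collapsed into one entry per customer).

```python
import difflib

# Hypothetical rows coming from two different datasources.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
orders = [
    {"customer_id": 1, "total": "30.00"},
    {"customer_id": 3, "total": "12.50"},
    {"customer_id": 1, "total": "5.00"},
]


def convert(rows):
    """Convert data: parse the text totals into floats."""
    return [{**row, "total": float(row["total"])} for row in rows]


def join(order_rows, customer_rows):
    """Cross data between datasources: inner join of orders and customers."""
    by_id = {c["id"]: c for c in customer_rows}
    return [
        {**o, "name": by_id[o["customer_id"]]["name"]}
        for o in order_rows
        if o["customer_id"] in by_id
    ]


def filter_rows(rows, minimum):
    """Filter data: keep orders with a total of at least `minimum`."""
    return [row for row in rows if row["total"] >= minimum]


def sort_rows(rows):
    """Sort data: largest total first."""
    return sorted(rows, key=lambda row: row["total"], reverse=True)


def fuzzy_find(name, candidates):
    """Fuzzy search: best match for a possibly misspelled name, if any."""
    return difflib.get_close_matches(name, candidates, n=1, cutoff=0.6)


def denormalize(rows):
    """Denormalize: collapse one row per order into one entry per customer."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["name"], []).append(row["total"])
    return grouped


if __name__ == "__main__":
    joined = join(convert(orders), customers)
    print(sort_rows(filter_rows(joined, 10.0)))
    print(fuzzy_find("Alise", [c["name"] for c in customers]))
    print(denormalize(joined))
```

In a Talend job each of these functions would be a component on the canvas, connected by row links instead of function calls.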
Where and how?
Where?
Multi-platform (Linux, macOS, *BSD, even Windows)
You just need a JVM (Java Virtual Machine)
How?
Execute it from your favorite programming language using system calls
Command line
From your JVM-based application (Java, Groovy, JRuby)
Web services running on top of a Java application server (Tomcat, GlassFish)
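As a sketch of the "system call" option, the snippet below launches an exported job as a child process from Python. It assumes the job was exported with a launcher script and that the script accepts `--context_param key=value` arguments; check both against your own export. The path `./export_customers/export_customers_run.sh` and the `output_dir` parameter are hypothetical.

```python
import subprocess
import sys


def build_job_command(launcher, context_params):
    """Build the argument list for a job launcher script.

    Assumption: the launcher accepts `--context_param key=value` pairs.
    """
    cmd = [launcher]
    for key, value in sorted(context_params.items()):
        cmd.extend(["--context_param", f"{key}={value}"])
    return cmd


def run_job(cmd, timeout=600):
    """Run the command and return (exit code, captured standard output)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical launcher path and context parameter.
    job = build_job_command(
        "./export_customers/export_customers_run.sh",
        {"output_dir": "/tmp/out"},
    )
    print(job)

    # Stand-in command so the sketch runs without a real exported job.
    code, output = run_job([sys.executable, "-c", "print('job finished')"])
    print(code, output.strip())
```

Passing the command as a list rather than a single shell string avoids quoting problems when a parameter value contains spaces. The same pattern applies from any language that can spawn a process, PHP included.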
Hands on
Querying data
Joining data from multiple datasources
Filtering and sorting data
Exporting data
Deploying your job
Calling it from PHP
Database Schema
Example
"With great power comes great responsibility." (Voltaire)
The End
email: [email protected]
twitter: @santosluis87
linkedin: https://www.linkedin.com/in/luissantos87