14/10/2011
Using Hadoop with Talend
Mark Chapman
Imad Rahman
© Talend 2011 2
Agenda
Talend Introduction
MapReduce and Hadoop
Talend Integration Suite MPx
Hadoop Features and TIS Components
How to use Talend to simplify Hadoop
Demo!
Questions & Answers
Talend across the world…
Global leader in open source integration
Venture-backed
Global operations
Corporate Headquarters
San Francisco (Los Altos)
Paris (Suresnes)
Operations
Orange County (Irvine)
Boston (Burlington)
New York (Tarrytown)
London (Maidenhead)
Utrecht
Nuremberg
Bonn
Munich
Milan (Bergamo)
Tokyo
Beijing
Customers By Industry
Systems Integrators
Public Sector & Education
Retail and Manufacturing
Media & Telco
Finance & Insurance
Software
Services & Others
Market Positioning
Data Quality: data profiling, data cleansing
Data Integration: analytics (ETL), operational data integration
Master Data Management: model and master any data or domain
Application Integration: connect applications & services
Talend Unified Platform
Deployment
Monitoring
Execution
Repository
Studio
Complete unified environment supports all integration approaches – data & application
Uses consistent technology & leverages open standards
Comprehensive Eclipse-based user interface
Consolidated metadata & project information
Web-based deployment & scheduling
Same containers for batch processing, message routing & services
Single web-based monitoring console
Background: MapReduce and Hadoop
MapReduce: Parallel Programming Model
“Divide and Conquer”
Many possible implementations
Hadoop: Open Source Java MapReduce
Simplified framework
Cloud: flexible infrastructure
e.g. Amazon Elastic MapReduce
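The "divide and conquer" model above can be sketched in a few lines of plain Python. This is only an in-process illustration of the map, shuffle and reduce phases, not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, run_word_count) and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for each word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Hypothetical reducer: total the counts for one word."""
    return sum(values)


def run_word_count(lines):
    """Chain the three phases over an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)


if __name__ == "__main__":
    print(run_word_count(["Hadoop maps", "hadoop reduces"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the sketch keeps everything in one process so the data flow is easy to follow.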
Talend Integration Suite MPx for Big Data
Right-Time
Batch ETL
High Volume (ELT)
Big Data · Hadoop · Filescale
• One platform
• All sources
• All modes
• All scales
Talend’s Big Data Partnerships
Partnering with Enterprise Big Data Leaders
Cloudera: Enterprise Hadoop
Talend: Open Source Cloudera Connect Partner for Data Integration
Greenplum: Hadoop-Powered Analytics
Big Data-scale Relational DB
Talend supports Greenplum for Hadoop and ELT
Talend Integration Suite MPx
Filescale Features
• Use case: process structured flat files (e.g. logs)
• Uses MapReduce techniques
• Performance optimized for this use case
• Native code, no Java
Hadoop Features
• Hadoop components for easy job design
• HDFS: store, retrieve data
• Cloudera Sqoop: Bulk ETL
• Hive: Relational DB layer
• Pig: In-Hadoop transformations
Talend Components for Hadoop Features
HDFS (Hadoop File System) utilities – for loading/unloading files
Sqoop – utility for RDBMS extract to HDFS (Cloudera only)
Data Warehousing on Hadoop using Hive – SQL-like language to query and transform data
Transforming Data in Hadoop using Pig – transform, normalize, clean HDFS data – very flexible
Talend Integration Suite MPx Hadoop Support
Components for HDFS and Sqoop loading/unloading
Components for defining Pig and Hive jobs
Integrate with any of Talend’s supported sources!
Applying Talend Big Data in Enterprise
Landing data from operational systems
Transforming it before loading DW
Performing additional analytics directly in Hadoop
Keeping historical data online for queries
[Diagram: Hadoop (HDFS, Hive, Pig) connected via Sqoop and Hive to DW and BI]
Today’s Demo Scenario
View sample log data from an online game source
Load log data into Hive
Aggregate the data into 2 aggregate tables
Load aggregated data into RDBMS
Additional processing using Pig
Show Time!
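The demo itself runs in Talend against Hive and an RDBMS and is not reproduced in these slides. As a rough stand-in, the same steps (view the log, aggregate into two tables, load the aggregates into a relational store) can be sketched in plain Python. Everything specific here is an assumption: the log layout (timestamp, player, event, score), the sample rows, the table names, and the use of an in-memory SQLite database in place of the RDBMS.

```python
import csv
import sqlite3
from collections import Counter, defaultdict
from io import StringIO

# Hypothetical game log: timestamp, player, event, score (not the demo's real data).
SAMPLE_LOG = """\
2011-10-14T10:00:00,alice,login,0
2011-10-14T10:05:00,alice,level_complete,150
2011-10-14T10:06:00,bob,login,0
2011-10-14T10:09:00,bob,level_complete,90
2011-10-14T10:12:00,alice,level_complete,200
"""


def parse_log(text):
    """Parse CSV log text into (timestamp, player, event, score) tuples."""
    rows = []
    for row in csv.reader(StringIO(text)):
        if not row:  # skip blank lines
            continue
        timestamp, player, event, score = row
        rows.append((timestamp, player, event, int(score)))
    return rows


def aggregate(rows):
    """Build two aggregates: events per player and total score per event type."""
    events_per_player = Counter(player for _, player, _, _ in rows)
    score_per_event = defaultdict(int)
    for _, _, event, score in rows:
        score_per_event[event] += score
    return dict(events_per_player), dict(score_per_event)


def load_to_rdbms(conn, events_per_player, score_per_event):
    """Write both aggregates into two tables (SQLite stands in for the RDBMS)."""
    conn.execute("CREATE TABLE player_activity (player TEXT PRIMARY KEY, event_count INTEGER)")
    conn.execute("CREATE TABLE event_scores (event TEXT PRIMARY KEY, total_score INTEGER)")
    conn.executemany("INSERT INTO player_activity VALUES (?, ?)", list(events_per_player.items()))
    conn.executemany("INSERT INTO event_scores VALUES (?, ?)", list(score_per_event.items()))
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    per_player, per_event = aggregate(parse_log(SAMPLE_LOG))
    load_to_rdbms(connection, per_player, per_event)
    print(connection.execute("SELECT * FROM player_activity ORDER BY player").fetchall())
```

In the Talend job the aggregation would be expressed as Hive queries over data stored in Hadoop rather than in-memory Python loops; the sketch only mirrors the order of the steps.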
Wrap-up
Talend Integration Suite MPx…
delivers MapReduce technologies as part of a comprehensive data management solution
makes using Hadoop like other data integration activities
…is available for you to try
Free 2-month license to Talend Integration Suite MPx
Visit http://info.talend.com/hugoffer.html
Questions and Answers
Mark Chapman
Technical Manager
Skype: mchapman68
Imad Rahman
Technical Presales Consultant
Skype: imadrahman.talend
Thank You!