
Talend Hadoop data integration



14/10/2011


Using Hadoop with Talend

Mark Chapman

Imad Rahman

© Talend 2011 2

Agenda

Talend Introduction

MapReduce and Hadoop

Talend Integration Suite MPx

Hadoop Features and TIS Components

How to use Talend to simplify Hadoop

Demo!

Questions & Answers


Talend across the world…

Global leader in open source integration

• Venture-backed
• Global operations

Corporate Headquarters
• San Francisco (Los Altos)
• Paris (Suresnes)

Operations
• Orange County (Irvine)
• Boston (Burlington)
• New York (Tarrytown)
• London (Maidenhead)
• Utrecht
• Nuremberg
• Bonn
• Munich
• Milan (Bergame)
• Tokyo
• Beijing


Customers By Industry

Systems Integrators

Public Sector & Education

Retail and Manufacturing

Media & Telco

Finance & Insurance

Software

Services & Others


Market Positioning

• Data Quality: data profiling, data cleansing
• Data Integration: analytics (ETL), operational data integration
• Master Data Management: model and master any data or domain
• Application Integration: connect applications & services


Talend Unified Platform

Complete unified environment supports all integration approaches (data & application)

Uses consistent technology & leverages open standards

• Studio: comprehensive Eclipse-based user interface
• Repository: consolidated metadata & project information
• Deployment: web-based deployment & scheduling
• Execution: same containers for batch processing, message routing & services
• Monitoring: single web-based monitoring console


Background: MapReduce and Hadoop

MapReduce: Parallel Programming Model
• “Divide and Conquer”
• Many possible implementations

Hadoop: Open Source Java MapReduce
• Simplified framework

Cloud: flexible infrastructure
• e.g. Amazon Elastic MapReduce
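The "divide and conquer" idea can be sketched in a few lines of plain Python. This is an illustration of the programming model only, not Hadoop's API: the `map_reduce`, `count_words` and `total` names and the sample lines are assumptions made for the example, and everything runs in a single process, whereas Hadoop distributes the same phases across a cluster.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a single-process imitation of the map, shuffle and reduce phases."""
    # Map + shuffle: each record yields (key, value) pairs, grouped by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: collapse the list of values for each key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

def count_words(line):
    """Mapper (hypothetical): emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def total(key, values):
    """Reducer (hypothetical): sum the counts collected for one word."""
    return sum(values)

# Made-up input, standing in for lines of a file stored on a cluster.
lines = ["hadoop stores data", "talend moves data", "hadoop processes data"]
word_counts = map_reduce(lines, count_words, total)
print(word_counts)
```

Because each mapper call sees one record and each reducer call sees one key, both phases can be split across machines without changing the functions themselves.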


Talend Integration Suite MPx for Big Data

• One platform
• All sources
• All modes
• All scales

Right-Time

Batch ETL

High Volume (ELT)

Big Data: Hadoop, Filescale


Talend’s Big Data Partnerships

Partnering with Enterprise Big Data Leaders

Cloudera: Enterprise Hadoop
• Talend: Open Source Cloudera Connect Partner for Data Integration

Greenplum: Hadoop-Powered Analytics
• Big Data-scale Relational DB
• Talend supports Greenplum for Hadoop and ELT


Talend Integration Suite MPx

Filescale Features
• Use case: process structured flat files (e.g. logs)
• Uses MapReduce techniques
• Performance optimized for this use case
• Native code, no Java

Hadoop Features
• Hadoop components for easy job design
• HDFS: store, retrieve data
• Cloudera Sqoop: Bulk ETL
• Hive: Relational DB layer
• Pig: In-Hadoop transformations


Talend Components for Hadoop Features

• HDFS (Hadoop Distributed File System) utilities: loading and unloading files
• Sqoop: utility for RDBMS extract to HDFS (Cloudera only)
• Hive: data warehousing on Hadoop, using a SQL-like language to query and transform data
• Pig: transforming data in Hadoop (transform, normalize, clean HDFS data); very flexible

Talend Integration Suite MPx Hadoop Support
• Components for HDFS and Sqoop loading/unloading
• Components for defining Pig and Hive jobs
• Integrate with any of Talend’s supported sources!


Applying Talend Big Data in Enterprise

Landing data from operational systems

Transforming it before loading DW

Performing additional analytics directly in Hadoop

Keeping historical data online for queries

[Diagram labels: Hadoop (HDFS, Hive, Pig), Sqoop, DW, BI]


Today’s Demo Scenario

View sample log data from an online game source

Load log data into Hive

Aggregate the data into 2 aggregate tables

Load aggregated data into RDBMS

Additional processing using Pig

Show Time!
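The slides do not include the demo job itself, so the following is only a rough Python sketch of the same flow: read game log lines, build two aggregates, and load them into a relational database. The log layout (`timestamp,player,event,points`), the table names and the sample rows are all assumptions made for illustration. An in-memory SQLite database stands in for the RDBMS, and plain Python stands in for the Hive aggregation; none of this is Talend-generated code.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical log layout: timestamp,player,event,points (made-up sample rows).
SAMPLE_LOG = """\
2011-10-14T10:00:00,alice,login,0
2011-10-14T10:01:00,alice,score,10
2011-10-14T10:02:00,bob,login,0
2011-10-14T10:03:00,bob,score,10
2011-10-14T10:04:00,alice,score,10
"""

def aggregate(log_text):
    """Build the two aggregates: points per player and occurrences per event."""
    points_by_player = Counter()
    count_by_event = Counter()
    for _timestamp, player, event, points in csv.reader(io.StringIO(log_text)):
        points_by_player[player] += int(points)
        count_by_event[event] += 1
    return points_by_player, count_by_event

def load(conn, points_by_player, count_by_event):
    """Load both aggregates into relational tables (names are hypothetical)."""
    conn.execute("CREATE TABLE player_points (player TEXT PRIMARY KEY, points INTEGER)")
    conn.execute("CREATE TABLE event_counts (event TEXT PRIMARY KEY, occurrences INTEGER)")
    conn.executemany("INSERT INTO player_points VALUES (?, ?)", points_by_player.items())
    conn.executemany("INSERT INTO event_counts VALUES (?, ?)", count_by_event.items())
    conn.commit()

points_by_player, count_by_event = aggregate(SAMPLE_LOG)
conn = sqlite3.connect(":memory:")  # stand-in for the target RDBMS
load(conn, points_by_player, count_by_event)
rows = conn.execute("SELECT player, points FROM player_points ORDER BY player").fetchall()
print(rows)
```

In the demo scenario the aggregation step would run inside Hadoop (Hive) rather than in the client process; the sketch only shows the shape of the data at each stage.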


Wrap-up

Talend Integration Suite MPx…

• …delivers MapReduce technologies as part of a comprehensive data management solution
• …makes using Hadoop like other data integration activities
• …is available for you to try

Free 2-month license to Talend Integration Suite MPx

Visit http://info.talend.com/hugoffer.html


Questions and Answers

Mark Chapman

Technical Manager

[email protected]

Skype: mchapman68

Imad Rahman

Technical Presales Consultant

[email protected]

Skype: imadrahman.talend

Thank You!