Talend Real-Time Big Data Sandbox Big Data Insights Cookbook


Overview of Real-time Big Data Sandbox

Pre-requisites to Run Sandbox

Sandbox Setup & Configuration

Obtaining a Talend License

Demo (Scenario)


What is the Talend Cookbook?


Using the Talend Real-Time Big Data Platform, this Cookbook provides step-by-step instructions to build and run an end-to-end integration scenario.

The demo is built on a real-world use case in the retail industry. It demonstrates how Talend, Spark, NoSQL and real-time messaging can easily be used together to provide real-time “offers” as part of an online shopping experience.

Whether your integration is batch, streaming or real-time, learn how Talend can address your big data challenges and move you into and beyond the sandbox stage.


About Talend


What does Talend offer?

At Talend, it’s our mission to connect the data-driven enterprise, so our customers can operate in real-time with new insight about their customers, markets and business.

• Talend helps companies meet their big data challenges with the most advanced big data integration platform, used by businesses to deliver timely and easy access to all their data.

• Talend provides the industry’s first data integration platform with native support for Apache Spark, Spark Streaming and Hadoop.

• Talend delivers unmatched data processing speed and enables any company to convert streaming big data or IoT sensor information into immediately actionable insights.


About Talend Big Data


Visually develop jobs that run 100% on Spark:

• 5X faster in independent benchmarks

• 10X developer productivity gained over hand-coding Spark

• 100X faster with in-memory processing

Over 100 new drag-and-drop Spark components:

• HDFS, RDBMS, NoSQL, Cloud Storage, Transformation, Messaging, In-memory analytics & machine learning recommendations, and much more

• In-memory data caching & “windowed” computations

• Click to enable Spark Streaming for real-time data processing

Convert Talend MapReduce jobs to Spark with the click of a button, future-proofing your investment

1st Data Integration Platform on Apache Spark


What is the Big Data Sandbox?


The Talend Real-Time Big Data Sandbox is a virtual environment that combines the Talend Real-Time Big Data Platform with some sample scenarios that are pre-built and ready to run.

See how Talend can turn data into real-time decisions through Sandbox examples that integrate Apache Kafka, Spark, Spark Streaming, Hadoop and NoSQL.

Figure: Sandbox examples. A virtual environment combining the Talend Real-Time Big Data Platform with sample scenarios (pre-built and ready to run) turns data into real-time decisions.


What pre-requisites are required to run the Sandbox?


Talend Platform for Big Data includes a graphical IDE (Talend Studio), teamwork management, data quality, and advanced big data features.

To see a full list of features please visit Talend’s Website: http://www.talend.com/products/platform-for-big-data

You will need a virtual machine player such as VMware Player, which can be downloaded from the VMware Player site. Follow the installation instructions from the provider.

The recommended host machine configuration is:

Memory: 8GB

Disk space: 20GB (10GB is for the image download)


How do I set up & configure the Sandbox?


First, download the Sandbox Virtual Machine file at www.talend.com/talend-big-data-sandbox. You will receive an email with a license key attachment and a second email with a list of support resources and videos.

Follow the steps below to install and configure your Big Data Sandbox:

1. Open VMware Player.

2. Click “Open a Virtual Machine”.

3. Find the .ova file that you downloaded. Select it and click Open.

4. Select where you would like the disk to be stored on your local host machine, e.g. C:/vmware/sandbox.

5. Click “Import”.

Note: The username (also the sudo username) is talend, and the password is talend.

Having trouble with the Sandbox configuration settings? Refer to the Sandbox troubleshooting guide.


How do I set up & configure the Sandbox? (cont.)

6. Edit the settings if needed:

a) Right-click the NAT icon in the upper-right corner and select Settings. Check that the memory and processor allocations are not too high for your host machine.

b) We recommend allocating 8GB or more to the Sandbox VM. It runs very well with 10GB if your host machine can spare the memory.

7. The “NAT” network adapter should already be configured for your VM. If it is not, add it as follows:

a) Click “Add”.

b) Select Network Adapter: “NAT” and click “Next”.

c) Click Finish to return to the main Player home page.

8. Start the VM.


How do I set up & configure the Sandbox? (cont.)


Follow the steps below to install and configure your Big Data Sandbox (Cont.):

1. Click “Play Virtual Machine”.

2. The virtual machine starts loading.


How do I set up & configure the Sandbox? (cont.)


Follow the steps below to install and configure your Big Data Sandbox (Cont.):

1. Once the virtual machine has finished loading, the login screen appears. Enter the password “talend” to continue.


How do I set up the Talend license on the virtual machine?


You should have been provided a license file by your Talend representative or by an automatic email from the Talend Real-time Big Data Sandbox program.

If you did not receive a license key, you can obtain one at: https://info.talend.com/prodevaltpbdrealtimesandboxdrive.html


How do I set up the Talend license on the virtual machine?


To get the license file on the VM:

1. Click the Download button on the license key document and choose Save As to save it on your laptop in a place where you can easily find it.

2. In the virtual machine, click Files.

3. Double-click the “Documents” folder.

4. Locate the license key document on your laptop and drag and drop it into the Documents folder in the virtual machine.


For VirtualBox users, there is a known issue with drag-and-drop functionality. The easiest way to get the Talend license file onto the VM is to save it to a cloud storage site such as Dropbox.com, or send it to a web-based email account you have access to (such as Gmail, Yahoo or Hotmail). Then navigate to that location from the web browser inside the virtual machine and download the file.

Important note: This license file is required to open Talend Studio and must reside within the VM.


Real-time Recommendation Demo


The Real-time Recommendation Demo is designed to illustrate the simplicity and flexibility Talend brings to using Spark in your big data architecture. It will help you see the value Talend can bring to your big data projects.

Figure: Real-time recommendation architecture. Components: customer channels (Email, Website, Store); internal systems (POS, Clickstream, …); streaming; Spark engine (recommendation); NoSQL; shopping cart (recommendations, window updates).

In this demo you will see a simple version of turning your website into an intelligent application. You will experience:

• Building a Spark recommendation model

• Setting up a new Kafka topic to simulate live web traffic from users browsing a retail web store

• Most importantly, seeing first-hand how Talend can take streaming data and turn it into real-time recommendations that help improve shopping cart sales


Real-time Recommendation Demo


In this demo, you will see how you can:

• Create a Kafka topic to produce and consume real-time streaming data

• Create a Spark recommendation model based on specific user actions

• Stream live recommendations to a Cassandra NoSQL database for “Fast Data” access by a WebUI

Figure: Pipeline stages: create a Kafka topic, create a recommendation model, stream live recommendations.
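The demo treats the Kafka topic as a black box. A rough mental model: a topic (strictly, each partition of a topic) is an append-only log. Producers append records, and each consumer tracks its own read offset, so several consumers can read the same stream independently. The sketch below is an in-memory toy built on that assumption. It ignores brokers, multiple partitions, retention and replication, and the record format is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicPartition:
    """Toy in-memory stand-in for one partition of a Kafka topic.

    Real Kafka stores records on brokers; this only models the append-only
    log and the per-consumer offsets that let consumers read independently.
    """
    records: list = field(default_factory=list)

    def produce(self, value):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def consume(self, offset, max_records=10):
        """Read up to max_records starting at offset; return (batch, next_offset)."""
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)


# Hypothetical clickstream records in a "user:action:product" format.
topic = TopicPartition()
for click in ["user1:view:p42", "user2:view:p7", "user1:add_to_cart:p42"]:
    topic.produce(click)

# Two independent consumers, each keeping its own offset.
engine_batch, engine_offset = topic.consume(0)
audit_batch, audit_offset = topic.consume(0, max_records=1)
```

A consumer that has caught up simply receives an empty batch until the producer appends more records.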


Real-time Recommendation Demo REQUIRED


Running a shell script:

1. From the Desktop, double-click the “Start_Kafka” icon. If prompted for a password, enter talend.

2. You can stop Kafka at any time by double-clicking on “Stop_Kafka”. If prompted for a password, enter talend.


Real-time Recommendation Demo REQUIRED


Starting Talend Studio: The first time you start Talend Studio, you have to browse for the license file.

1. To begin, click “Talend-Studio”.

2. Click “My product license is on the local file system”, then click “Browse”.

3. Navigate to the “Documents” folder and click the license file you downloaded.

4. Click “OK”, then click “Next”.

5. When the Talend Real-Time Big Data Platform window pops up, let it load, and when it is complete click “Finish”.


Real-time Recommendation Demo


If you are familiar with the ALS model, you can update the ALS parameters to enhance the model or just leave the default values.

To execute the Real-time Recommendation Demo:

First, a Kafka topic must be created. You can do this by executing the following job:

1. Navigate to the “job designs” folder.

2. Click Standard Jobs > Realtime_Recommendation_Demo.

3. Double-click OneTime_Create_Clickstream_Kafka_Topic 0.1. This opens the job in the designer window.

4. From the Run tab, click Run to execute the job.

Now you can generate the recommendation model by loading the product ratings data into the Alternating Least Squares (ALS) algorithm. Rather than hand-coding a complex algorithm in Scala, you use a single Spark component available in Talend Studio, which simplifies model creation. The resulting model can be stored in HDFS or, as in this case, locally.
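In the demo, the Talend Spark component does this work, so you never write the algorithm yourself. For intuition only, here is a minimal pure-Python sketch of the alternating least squares idea. It alternately fixes the item factors and solves a small regularized least-squares system for each user, then does the reverse.

The rank, iteration count and regularization weight play the role of the ALS parameters mentioned earlier. Their values, the sample ratings and every function name are hypothetical. Spark MLlib's implementation differs in details, such as how regularization is scaled and how data is partitioned.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed, ratings_by_key, rank, lam):
    """Re-solve every factor vector on one side while the other side stays fixed.

    For each key k: (sum_j v_j v_j^T + lam * I) x_k = sum_j r_kj v_j,
    where v_j are the fixed factors of whatever k has rated (or been rated by).
    """
    new = {}
    for key, pairs in ratings_by_key.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        new[key] = solve(a, b)
    return new


def train_als(ratings, rank=2, iterations=10, lam=0.1, seed=0):
    """ratings: list of (user, item, rating). Returns (user_factors, item_factors)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    users = {u: [rng.uniform(0.1, 1.0) for _ in range(rank)] for u in by_user}
    items = {i: [rng.uniform(0.1, 1.0) for _ in range(rank)] for i in by_item}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # fix items, solve users
        items = update(users, by_item, rank, lam)  # fix users, solve items
    return users, items


def recommend(user, users, items, rated, n=2):
    """Top-n items the user has not rated, ranked by predicted rating (dot product)."""
    candidates = [i for i in items if i not in rated]
    return sorted(candidates, key=lambda i: dot(users[user], items[i]), reverse=True)[:n]


# Hypothetical product ratings (user, product, rating).
ratings = [("alice", "p1", 5.0), ("alice", "p2", 3.0), ("bob", "p1", 4.0),
           ("bob", "p3", 2.0), ("carol", "p2", 4.0), ("carol", "p4", 5.0)]
user_factors, item_factors = train_als(ratings)
print(recommend("alice", user_factors, item_factors, rated={"p1", "p2"}))
```

Each half-step solves its subproblem exactly, so the regularized squared error can only stay the same or decrease from one iteration to the next. That property is what makes ALS attractive at scale: every per-user and per-item solve is independent and can run in parallel.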


Real-time Recommendation Demo

Run the job to generate the recommendation model:

1. Navigate to the “job designs” folder.

2. Click Big Data Batch > Realtime_Recommendations_Demo.

3. Double-click Build_Recommendation_Model_with_Spark. This opens the job in the designer window.

4. From the Run tab, click Run to execute the job.

With the recommendation model created, your lookup tables populated and your Kafka topic ready to consume data, you can now stream your clickstream data into your recommendation model and write the results to your Cassandra tables, where a WebUI can reference them.


Real-time Recommendation Demo

First, let's take a quick look at the Push_Clickstream_To_Kafka job:

1. Navigate to the “job designs” folder.

2. Click Standard Jobs > Realtime_Recommendations_Demo.

3. Double-click Push_Clickstream_To_Kafka 0.1. This opens the job in the designer window.

This job is set up to simulate real-time streaming of web traffic and clickstream data into a Kafka topic. The recommendation engine then consumes that topic to produce recommendations.


Note: We are only reviewing this job now. It will be executed in the next few steps.
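The cookbook does not show the contents of Push_Clickstream_To_Kafka. The sketch below is a hypothetical stand-in for what such a traffic simulator does: generate synthetic click events, serialize each one, and hand it to a send function at a chosen rate.

The event fields (user_id, product_id, action, ts), the action weights and the `send` callable are all assumptions. In practice `send` would be a small wrapper around whichever Kafka producer client you use; here a plain list stands in so the sketch runs without a broker.

```python
import json
import random
import time

ACTIONS = ["view", "add_to_cart", "purchase"]  # hypothetical action types


def clickstream(users, products, seed=42):
    """Yield an endless stream of synthetic click events (hypothetical schema)."""
    rng = random.Random(seed)
    while True:
        yield {
            "user_id": rng.choice(users),
            "product_id": rng.choice(products),
            # Most traffic is browsing; a few clicks become cart adds or purchases.
            "action": rng.choices(ACTIONS, weights=[0.8, 0.15, 0.05])[0],
            "ts": time.time(),
        }


def push(events, send, count, delay_s=0.0):
    """Serialize `count` events as UTF-8 JSON and pass each one to `send`."""
    for _, event in zip(range(count), events):
        send(json.dumps(event).encode("utf-8"))
        if delay_s:
            time.sleep(delay_s)  # throttle to mimic live web traffic


# Collect messages in a list instead of sending them to a real Kafka topic.
sent_messages = []
push(clickstream(["u1", "u2"], ["p1", "p2", "p3"]), sent_messages.append, count=5)
```

Keeping the generator separate from the sender means the same event source can feed a test list, a file, or a real producer without changes.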


Real-time Recommendation Demo

1. Navigate to the “job designs” folder.

2. Click Big Data Streaming > Realtime_Recommendation_Demo.

3. Double-click Realtime_Recommendation_Engine_Pipeline 0.1. This opens the job in the designer window.


Real-time Recommendation Demo

Next, take a look at the Realtime_Recommendation_Engine_Pipeline job.

• In this job, the input is a Kafka consumer that reads your clickstream data.

• The data is fed into your recommendation engine to produce real-time “offers” based on the current user's activity.

• Using the tWindow component, you can control how often recommendations are sent.

• Your recommendations are sent to three outputs: the execution window for viewing, a flat file for later processing in your big data analytics environment, and a Cassandra table that your WebUI uses as its “Fast Data” layer.

Click “Run” to start the recommendation engine.
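tWindow and the outputs are configured graphically in Talend Studio, not in code. As an illustration only, the sketch below mimics the pipeline's shape in plain Python:

• Time-ordered events are grouped into fixed, non-overlapping (tumbling) windows.
• Each window yields one recommendation row per active user.
• Every row is fanned out to several sinks, standing in for the execution window, the flat file and the Cassandra table.

The event fields, the window length and the `recommend` callable are assumptions. Spark Streaming's own windowing, defined by a window length plus a slide interval, is more general than this tumbling-window version.

```python
from collections import defaultdict


def tumbling_windows(events, window_s):
    """Group time-ordered events into windows aligned to multiples of window_s seconds."""
    current, current_key = [], None
    for event in events:
        key = int(event["ts"] // window_s)
        if current_key is not None and key != current_key:
            yield current  # the previous window is complete
            current = []
        current_key = key
        current.append(event)
    if current:
        yield current


def run_pipeline(events, recommend, sinks, window_s=5.0, n=3):
    """Per window: collect each user's recent products, recommend, fan out to every sink."""
    for window in tumbling_windows(events, window_s):
        active = defaultdict(set)  # user_id -> products seen in this window
        for e in window:
            active[e["user_id"]].add(e["product_id"])
        for user, seen in active.items():
            row = {"user_id": user, "recommended": recommend(user, seen, n)}
            for sink in sinks:
                sink(row)


# Usage: three hypothetical sinks (console, an archive list, a latest-per-user store).
archive, latest = [], {}


def to_store(row):
    latest[row["user_id"]] = row["recommended"]


sample = [{"ts": 0, "user_id": "u1", "product_id": "p1"},
          {"ts": 6, "user_id": "u1", "product_id": "p2"}]
run_pipeline(sample, lambda user, seen, n: sorted(seen)[:n],
             [print, archive.append, to_store])
```

The window length is the knob that trades freshness for batching. Shorter windows push offers sooner, while longer windows see more of each user's session before recommending.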


Real-time Recommendation Demo

With your recommendation engine running, you can start sending data to your Kafka topic.

1. Navigate back to the Push_Clickstream_To_Kafka job.

2. Click “Run” on the Run tab to execute it.

3. Once this job starts, switch back to the Recommendation Engine job.

4. Watch the execution output window. You will now see your real-time data coming through with recommended products based on your recommendation model.

5. Once you have seen the results, you can “kill” the Recommendation Engine job to stop the streaming recommendations.


Your recommendations are also written to a Cassandra database so they can be referenced by a WebUI to offer, for instance, last minute product suggestions when a customer is about to check-out.
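To make the “Fast Data” lookup concrete, here is a hypothetical in-memory stand-in for a recommendations table keyed by user. It relies on one real property of Cassandra: a CQL INSERT with an existing primary key behaves as an upsert, so the newest row replaces the old one. The class, method names and the optional staleness check are assumptions for illustration, not the demo's actual schema.

```python
import time


class RecommendationStore:
    """In-memory stand-in for a table keyed by user_id.

    As with an upsert, writing a row for an existing user_id replaces the
    old row, so a reader always sees the most recent recommendations.
    """

    def __init__(self):
        self._rows = {}

    def upsert(self, user_id, products, written_at=None):
        """Insert or overwrite the user's latest recommendations."""
        self._rows[user_id] = {
            "products": list(products),
            "written_at": time.time() if written_at is None else written_at,
        }

    def checkout_suggestions(self, user_id, in_cart, max_age_s=None, now=None):
        """Return stored suggestions not already in the cart; [] if none or too old."""
        row = self._rows.get(user_id)
        if row is None:
            return []
        now = time.time() if now is None else now
        if max_age_s is not None and now - row["written_at"] > max_age_s:
            return []  # stale: the shopper's session has likely moved on
        return [p for p in row["products"] if p not in in_cart]


store = RecommendationStore()
store.upsert("u1", ["p1", "p2"], written_at=100.0)
store.upsert("u1", ["p3", "p4"], written_at=200.0)  # overwrites the earlier row
```

At checkout, the WebUI would call something like `checkout_suggestions("u1", in_cart={"p4"})` and show only products the customer has not already chosen.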


Conclusion

Product recommendations have evolved…

• ETL: it would take weeks to gather and process the required data.

• MapReduce: you can now process even more data than before, in hours rather than days or weeks.

• Spark: you can now process even more, in minutes or even seconds.

What are your next steps?

Now that you understand how you can address your big data opportunities using Talend, the next step is to discuss your specific requirements with your Talend sales representative and how Talend can help “jumpstart” your big data project into production.


The good news is that with Talend, making this type of transformation a reality now takes just a few clicks.

Let's take one final look at how Talend will help you.


Conclusion

How will Talend help you?

First, Talend vastly simplifies big data integration. In-house resources can use Talend's rich graphical tools, which generate big data code (Spark, MapReduce, Pig, Java) for you. Talend is based on standards such as Eclipse, Java, and SQL, and is backed by a large collaborative community, so you can upskill existing resources instead of finding new ones.

Second, Talend is built for batch and real-time big data. Unlike other solutions that “map” to big data or support only a few components, Talend is the first data integration platform built on Spark, with over 100 Spark components. Whether you are integrating batch (MapReduce, Spark), streaming (Spark), NoSQL, or real-time data, Talend provides a single tool for all your integration needs. Talend's native Hadoop data quality solution delivers clean and consistent data at infinite scale.

Third, Talend lowers operations costs. Talend's zero-footprint solution takes the complexity out of integration deployment, management, and maintenance. A usage-based subscription model provides a fast return on investment without large upfront costs.
