Apache Flink Introduction By: Ahmed Nader

Page 1: Apache flink

Apache Flink Introduction

By: Ahmed Nader

Page 2: Apache flink

Agenda

• What’s Apache Flink?

• Deeper into Flink

• Quick Start and Configuration

• Get your hands dirty

• Tips and some useful links

• References

Page 3: Apache flink

What’s Apache Flink?

• Open-source platform for distributed stream and batch processing.

• Large-scale data processing engine.

• A true streaming engine: it does not cut the stream into batches.

• Flink has two APIs: DataStream and DataSet.

Page 4: Apache flink

DataStream API

• Represents a continuous stream of data of a certain type.

• Operations are applied to each element of the stream or to windows of elements.

[Diagram: Source → Data Stream → Operation → Data Stream → Sink]

Page 5: Apache flink

DataStream API Example

Live stock feed, with events such as Apple 235, Google 516 and Microsoft 124:

• Alert if Microsoft > 120

• Sum every 10 seconds

• Alert if sum > 10000

• Write event to database
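The stock feed logic above can be sketched in plain Java to show what the two kinds of operation do. This is only an illustration with the JDK collections, not Flink's DataStream API: the class `StockFeedSketch`, the `Tick` type, its timestamp field, and the method names are all hypothetical, and the windows are assumed to be fixed, non-overlapping 10-second buckets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StockFeedSketch {
    /** One feed event. The timestamp is an assumption; the slide shows only symbol and price. */
    static final class Tick {
        final String symbol;
        final double price;
        final long timestampMs;

        Tick(String symbol, double price, long timestampMs) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMs = timestampMs;
        }
    }

    /** Per-element operation: one alert for every tick of the symbol above the threshold. */
    static List<String> priceAlerts(List<Tick> feed, String symbol, double threshold) {
        List<String> alerts = new ArrayList<>();
        for (Tick t : feed) {
            if (t.symbol.equals(symbol) && t.price > threshold) {
                alerts.add(symbol + " at " + t.price);
            }
        }
        return alerts;
    }

    /** Window operation: sum of prices per fixed window, keyed by window start time. */
    static Map<Long, Double> sumPerWindow(List<Tick> feed, long windowMs) {
        Map<Long, Double> sums = new TreeMap<>();
        for (Tick t : feed) {
            long windowStart = t.timestampMs - (t.timestampMs % windowMs);
            sums.merge(windowStart, t.price, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Prices are taken from the slide; the timestamps are made up for the example.
        List<Tick> feed = Arrays.asList(
                new Tick("Apple", 235, 1_000),
                new Tick("Microsoft", 124, 4_000),
                new Tick("Google", 516, 12_000));

        System.out.println("Price alerts: " + priceAlerts(feed, "Microsoft", 120));
        sumPerWindow(feed, 10_000).forEach((start, sum) ->
                System.out.println("window@" + start + " sum=" + sum
                        + (sum > 10_000 ? " ALERT" : "")));
    }
}
```

In a real Flink job the feed would be unbounded and the window results would be emitted as each window closes; the sketch works on a finite list only so that it can run on its own.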

Page 6: Apache flink

DataSet API

• Used for batch processing.

• Batch is a special case of stream processing: finite data sources are just streams that happen to end.

• Offers a dedicated API with machine learning and graph processing libraries.

[Diagram: Source → Data Set → Operation → Data Set → Sink]

Page 7: Apache flink

DataSet API Example

Map/Reduce paradigm:

[Diagram: Map → Reduce]
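As a rough illustration of the Map/Reduce paradigm, here is a word count in plain Java. The slide does not say which example its diagram showed, so word count is an assumption, and the class and method names (`MapReduceSketch`, `mapPhase`, `reducePhase`) are hypothetical. Nothing here uses Flink's DataSet API; it only shows the two phases on a finite input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceSketch {
    /** Map phase: emit a (word, 1) pair for every word of every line. */
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    /** Reduce phase: group the pairs by key and sum the values of each group. */
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        System.out.println(reducePhase(mapPhase(lines)));
    }
}
```

In a distributed engine the map phase runs in parallel over partitions of the input and the pairs are shuffled by key before the reduce phase; the sketch runs both phases sequentially in one process.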

Page 8: Apache flink

Flink Stack

Page 9: Apache flink

Analyzing the Flink stack

• A streaming dataflow runtime that interprets every program as a dataflow graph.

• Some libraries on top of the DataStream and DataSet APIs:

• Table: enables SQL-like queries.

• Gelly: graph processing to transform and traverse graphs in a distributed fashion.

• ML: offers a couple of machine learning algorithms, but is still quite basic.

• CEP: makes it easy to detect complex events in a data stream, which helps you get hold of what’s really important in your data.

Page 10: Apache flink

Deeper into Flink

Data Sources

From an input file

From a socket

From a collection

Page 11: Apache flink

Deeper into Flink

Data Sinks

Write to a CSV File

Write to a socket

Print on the terminal

Page 12: Apache flink

Deeper into Flink

Data transformations (for the DataStream API):

• Map: takes one element and produces one element.

• FlatMap: takes one element and produces zero or more elements.

• Filter: evaluates a boolean function for each element and retains those for which it returns true.

• KeyBy: partitions a stream into disjoint partitions, each containing the elements with the same key.

• Window: groups stream events according to some characteristic, e.g. data that arrived in the last 5 seconds.

• Union, Join, Split, Select…
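The semantics of these transformations can be tried out on a finite list with `java.util.stream`, as in the sketch below. These are JDK calls, not Flink operators, and the class and method names are hypothetical; `groupingBy` merely stands in for KeyBy and for windowing, and the window here is assumed to be a fixed, non-overlapping time bucket rather than a sliding "last 5 seconds".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TransformationSemantics {
    // Map: exactly one output per input (here a word becomes its length).
    static List<Integer> mapLengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // FlatMap: zero or more outputs per input (a blank line yields nothing).
    static List<String> flatMapWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.trim().isEmpty()
                        ? Stream.<String>empty()
                        : Arrays.stream(line.trim().split("\\s+")))
                .collect(Collectors.toList());
    }

    // Filter: keep only the elements for which the predicate returns true.
    static List<String> filterLongerThan(List<String> words, int minLength) {
        return words.stream().filter(w -> w.length() > minLength).collect(Collectors.toList());
    }

    // KeyBy analogue: disjoint groups, each holding the elements with the same key.
    // Assumes non-empty words, since the key is the first character.
    static Map<Character, List<String>> keyByFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.toList()));
    }

    // Window analogue: group event timestamps into fixed buckets of sizeMs milliseconds.
    static Map<Long, List<Long>> windowBy(List<Long> timestampsMs, long sizeMs) {
        return timestampsMs.stream()
                .collect(Collectors.groupingBy(t -> t / sizeMs * sizeMs, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = flatMapWords(Arrays.asList("apache flink", "", "api"));
        System.out.println(words);
        System.out.println(mapLengths(words));
        System.out.println(filterLongerThan(words, 3));
        System.out.println(keyByFirstLetter(words));
        System.out.println(windowBy(Arrays.asList(1_000L, 4_999L, 5_000L), 5_000L));
    }
}
```

The main difference from a streaming engine is that these calls see the whole input at once, whereas a DataStream operator handles elements one by one as they arrive.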

Page 13: Apache flink

Deeper into Flink

Interesting use cases:

• Processing a Twitter feed; one good application is collecting statistics on that feed.

see: http://blog.brakmic.com/stream-processing-with-apache-flink/

• Identifying popular locations where people arrive by taxi, by applying filter and map functions to a DataStream of taxi ride records and then finding the most popular places over, for example, the last 15 minutes.

see: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink

Page 14: Apache flink

Setup

Prerequisites:

• Java 7.x or higher.

• Maven 3.0.4 or higher.

Start a new Flink project using Maven by running the following command in the terminal:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.0.1

OR add Flink to an existing project:

see: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html

Page 15: Apache flink

Get your hands dirty:

Page 16: Apache flink

Get your hands dirty:

Page 17: Apache flink

Get your hands dirty:

Execution

• Local/debugging

• Cluster

• Command Line Interface

• Web interface

See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html

Page 18: Apache flink

Tips and some useful links:

• Subscribe to the mailing list by sending an empty email to [email protected].

• Clone the Flink project on GitHub for more examples.

• There’s a free course by DataArtisans, see: http://dataartisans.github.io/flink-training/index.html

Here are some other useful links too:

• http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink

• https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html

• https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html

Page 19: Apache flink

References

http://blog.brakmic.com/stream-processing-with-apache-flink/

http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink

https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink

https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html

http://dataartisans.github.io/flink-training/index.html

https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html

Page 20: Apache flink

Thanks! Any questions?