Apache Flink: Real-World Use Cases for Streaming Analytics. Slim Baltagi (@SlimBaltagi). Brazil - Sao Paulo Apache Flink Meetup, March 17th, 2016

Page 1: Apache Flink: Real-World Use Cases for Streaming Analytics

Apache Flink: Real-World Use Cases for Streaming Analytics

Slim Baltagi @SlimBaltagi

Brazil - Sao Paulo Apache Flink Meetup, March 17th, 2016

Page 2: Apache Flink: Real-World Use Cases for Streaming Analytics

Agenda

I. What is Apache Flink Stack?
II. Movement from Batch Analytics to Streaming Analytics
III. Key Differentiators of Apache Flink for Streaming Analytics
IV. Real-World Use Cases with Flink for Streaming Analytics
V. Who is using Flink?
VI. Where do you go from here?


Page 3: Apache Flink: Real-World Use Cases for Streaming Analytics

I. What is Apache Flink stack?

[Diagram: the Apache Flink stack, by layer]

APIs & LIBRARIES
• Libraries and integrations: Gelly, Gelly-Stream, Table, FlinkML, FlinkCEP, Hadoop M/R, SAMOA, Apache Beam, MRQL, Cascading, Storm, Zeppelin
• DataSet (Java/Scala/Python): Batch Processing
• DataStream (Java/Scala): Stream Processing

SYSTEM
• Batch Optimizer, Stream Builder
• Runtime: Distributed Streaming Dataflow

DEPLOY
• Local: Single JVM, Embedded, Docker
• Cluster: Standalone, YARN, Mesos (WIP)
• Cloud: Google’s GCE, Amazon’s EC2, IBM Docker Cloud, …

STORAGE
• Files: Local, HDFS, S3, Azure Storage, Tachyon
• Databases: MongoDB, HBase, SQL, …
• Streams: Flume, Kafka, RabbitMQ, …

Page 4: Apache Flink: Real-World Use Cases for Streaming Analytics

I. What is Apache Flink stack?

See the first Apache Flink meetup in South America, which I gave as a webinar on February 24th, 2016. It is titled "Introduction to Apache Flink: What, How, Why, Who, Where?": https://www.youtube.com/watch?v=YAKdD1rHCxs (Part 1)

See the similar talk that I gave earlier, on February 2nd, 2016, at the New York City Apache Flink Meetup, now the world’s largest Flink meetup:
• Slides: http://www.slideshare.net/sbaltagi/apacheflinkwhathowwhywhowherebyslimbaltagi-57825047
• Video recording: https://www.youtube.com/watch?v=G77m6Ou_kFA

Flink Knowledge Base: all resources related to Flink http://sparkbigdata.com/component/tags/tag/27-flink

Page 6: Apache Flink: Real-World Use Cases for Streaming Analytics

II. Movement from Batch Analytics to Streaming Analytics

Batch                   Streaming
High-latency apps       Low-latency apps
Static files            Event streams
Process-after-store     Sense-and-respond
Batch processors        Stream processors


Page 7: Apache Flink: Real-World Use Cases for Streaming Analytics

What is batch processing?

Many big data sources represent series of events that are continuously produced. Examples: tweets, web logs, user transactions, system logs, sensor networks, …

Batch processing: these events are collected together, based on the number of records or a certain period of time (a day, for example), and stored somewhere to be processed as a finite data set.

What is the problem with the ‘process-after-store’ model?
• Unnecessary latency between data generation and the analysis of, and actions on, the data.
• An implicit assumption that the data is complete after a given period of time and can be used, for example, to make accurate predictions.
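The difference between process-after-store and per-event processing can be sketched in a few lines. This is a minimal plain-Python illustration, not Flink code; the function names and the sample values are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_total(events: Iterable[int]) -> int:
    """Process-after-store: collect every event first, then compute once."""
    stored: List[int] = list(events)   # no result exists until collection ends
    return sum(stored)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update the result as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total                    # an up-to-date answer after every event
```

For the hypothetical amounts `[5, 3, 7]`, `batch_total` returns a single `15` once all events are stored, while `streaming_totals` yields `5`, `8`, `15` as the events arrive.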


Page 8: Apache Flink: Real-World Use Cases for Streaming Analytics

What is stream processing?

Most data is available as series of events (click streams, mobile apps data, …) continuously produced by a variety of applications and systems in the enterprise.

Data sources are no longer only typical enterprise sources; they now include new ones such as social media data, sensor data, …

Data from disparate systems (internal and external) can be integrated in a central hub and:
• Made available as low-latency data streams required for real-time stream processing.
• Loaded into your data warehouse for offline analysis.


Page 9: Apache Flink: Real-World Use Cases for Streaming Analytics

Factors behind the movement from Batch Analytics to Streaming Analytics

There is a movement in Big Data processing from Batch Analytics to Streaming Analytics, driven by many factors:
• Data streams: sensor networks, mobile apps data, …
• Technology: rapidly growing open source streaming analytics tools, vendors innovating in this space, more mobile devices than human beings, cloud services for real-time stream processing, …
• Business: organizations are increasingly embracing streaming analytics for faster time to insight and competitive advantage.
• Customers: customers increasingly demand instant responses, as they are used to on social networks: Twitter, Facebook, LinkedIn, …

Page 11: Apache Flink: Real-World Use Cases for Streaming Analytics

III. Key Differentiators of Apache Flink for Streaming Analytics

The 8 Requirements of Real-Time Stream Processing, Stonebraker et al. 2005
• Original paper: http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• A short summary: http://blog.acolyer.org/2014/12/03/the-8-requirements-of-real-time-stream-processing/

Apache Flink fulfills all these requirements and more!
• http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
• http://data-artisans.com/flink-0-10-a-significant-step-forward-in-open-source-stream-processing/
• http://data-artisans.com/flink-1-0-0/
• https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
• https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit
• http://www.slideshare.net/robertmetzger1/january-2016-flink-community-update-roadmap-2016/9


Page 12: Apache Flink: Real-World Use Cases for Streaming Analytics

III. Key Differentiators of Apache Flink for Streaming Analytics

• True low-latency streaming engine: fast results in milliseconds
• High throughput: handles large amounts of data (millions of events per second)
  – http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
• Exactly-once guarantees: correct results, also in failure cases
  – http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
• Programmability: higher-level, intuitive and easy-to-use APIs
• Backpressure handling: backpressure is the situation where a system receives data at a higher rate than it can process, for example during a temporary load spike
  – http://data-artisans.com/how-flink-handles-backpressure/
• Event time and out-of-order stream processing
  – http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
• Stateful stream processing and versioning state
  – http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/
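To make the event-time differentiator concrete, here is a minimal plain-Python sketch of counting events per tumbling window using the timestamp each event carries, so that out-of-order arrival does not change the result. It does not use the Flink API; the 10-second window size, the function name and the sample events are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_MS = 10_000  # hypothetical 10-second tumbling windows


def count_by_event_time(events: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count events per tumbling window, keyed by window start in milliseconds.

    Each event is (event_time_ms, payload). The window is chosen from the
    timestamp carried by the event, not from the order of arrival.
    """
    counts: Dict[int, int] = defaultdict(int)
    for event_time_ms, _payload in events:
        window_start = event_time_ms - (event_time_ms % WINDOW_MS)
        counts[window_start] += 1
    return dict(counts)
```

With arrivals `[(1_000, "a"), (12_000, "b"), (4_000, "c"), (19_999, "d")]`, the late event at 4,000 ms is still counted in the first window, giving `{0: 2, 10000: 2}`. A real engine also has to decide when a window can be closed despite late events; this sketch leaves that out.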


Page 14: Apache Flink: Real-World Use Cases for Streaming Analytics

IV. Real-World Use Cases with Flink for Streaming Analytics

Stonebraker et al. made the case in 2005 that stream processing was going to become increasingly important, not just for the usual finance, fraud, and command-and-control use cases, but also:

“… as the ‘sea change’ caused by cheap micro-sensor technology takes hold, we expect to see everything of material significance on the planet get ‘sensor-tagged’ and report its state or location in real time. This sensorization of the real world will lead to a ‘green field’ of novel monitoring and control applications with high-volume and low-latency processing requirements.”

Reference: http://blog.acolyer.org/2014/12/03/the-8-requirements-of-real-time-stream-processing/


Page 15: Apache Flink: Real-World Use Cases for Streaming Analytics

Shift from Reactive approach to proactive approach

Capturing new data and providing the ability to process streams of this data is allowing organizations to shift:
• From: taking a REACTIVE, post-transaction approach
• To: more of a PROACTIVE, pre-decision approach to interactions with their customers, suppliers and employees.

Again, no matter the vertical, this transition is happening.


Page 16: Apache Flink: Real-World Use Cases for Streaming Analytics

Shift from Reactive approach to proactive approach

Big Data Analytics Frameworks enable shifting the business from Reactive to Proactive:
• A shift in Advertising: from mass branding … to 1x1 targeting
• A shift in Financial Services: from educated investing … to automated algorithms
• A shift in Healthcare: from mass treatment … to designer medicine
• A shift in Retail: from static branding … to real-time personalization
• A shift in Manufacturing: from break then fix … to repair before break


Page 17: Apache Flink: Real-World Use Cases for Streaming Analytics

Real-Time Monitoring of Customer Activity Events


Page 18: Apache Flink: Real-World Use Cases for Streaming Analytics

Generic Streaming Analytics Architectural pattern

Event Producers → Event Collector → Event Broker → Event Processor → Indexer → Visualizer/Search

• Event Producers: Apps, Devices, Sensors
• Event Collector: Flume, Spring XD, Logstash, NiFi, Fluentd
• Event Broker: Kafka, RabbitMQ, JMS
• Event Processor: Flink, Spark, Storm, Samza
• Indexer: Elasticsearch, Solr, Cassandra, NoSQL DB
• Visualizer/Search: Kibana, Custom GUI
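The flow through these stages can be sketched with in-process stand-ins. This is only an illustration of the pattern in plain Python: a `queue.Queue` plays the broker and a dictionary plays the indexer, and every function name and sample event is hypothetical rather than taken from any of the tools listed.

```python
import queue
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def produce() -> Iterator[Dict[str, str]]:
    """Event producers: apps, devices, sensors (hypothetical sample events)."""
    yield {"source": "app", "action": "click"}
    yield {"source": "sensor", "action": "reading"}
    yield {"source": "app", "action": "click"}


def collect(events: Iterable[Dict[str, str]], broker: "queue.Queue") -> None:
    """Event collector: forwards raw events to the broker."""
    for event in events:
        broker.put(event)
    broker.put(None)  # sentinel marking the end of this finite demo stream


def process(broker: "queue.Queue") -> Iterator[Tuple[str, int]]:
    """Event processor: keeps a running count per action."""
    counts: Counter = Counter()
    while True:
        event = broker.get()
        if event is None:
            break
        counts[event["action"]] += 1
        yield event["action"], counts[event["action"]]


def run_pipeline() -> Dict[str, int]:
    """Wire the stages together and return the indexed results."""
    broker: "queue.Queue" = queue.Queue()   # stands in for the event broker
    collect(produce(), broker)
    index: Dict[str, int] = {}              # stands in for the indexer
    for action, count in process(broker):
        index[action] = count               # latest count per action
    return index                            # what a visualizer would query
```

Calling `run_pipeline()` returns `{"click": 2, "reading": 1}`. In a real deployment each stage runs as a separate, continuously running system, and the stream has no end-of-stream sentinel.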


Page 19: Apache Flink: Real-World Use Cases for Streaming Analytics

IV. Real-World Use Cases with Flink for Streaming Analytics

Below is a list of several use cases, taken from real industrial situations:

Financial Services
• Real-time fraud detection.
• Real-time mobile notifications.

Healthcare
• Smart hospitals: collect data and readings from hospital devices (vitals, IVs, MRI, etc.) and analyze and alert in real time.
• Biometrics: collect and analyze data from patient devices that collect vitals while outside of care facilities.

Ad Tech
• Real-time user targeting based on segment and preferences.

Oil & Gas
• Real-time monitoring of pumps/rigs.
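As one illustration of what a use case like real-time fraud detection can look like in code, the sketch below flags a card that makes too many transactions within a sliding time window. The rule, the threshold, the window length and all names are hypothetical, chosen only to show per-key state being updated as events arrive; this is plain Python, not Flink code and not a production rule.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

MAX_TXNS = 3          # hypothetical threshold
WINDOW_SECONDS = 60   # hypothetical sliding window length


def flag_bursts(txns: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, card_id) whenever a card exceeds MAX_TXNS
    transactions within the last WINDOW_SECONDS.

    Input items are (timestamp_seconds, card_id), assumed to be in time order.
    """
    recent: Dict[str, Deque[int]] = defaultdict(deque)  # per-card state
    for ts, card in txns:
        window = recent[card]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_TXNS:
            yield ts, card
```

For the input `[(0, "A"), (10, "A"), (20, "B"), (30, "A"), (40, "A"), (200, "A")]`, only `(40, "A")` is flagged: it is card A's fourth transaction within 60 seconds, and the alert is available as soon as that event arrives.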

Page 20: Apache Flink: Real-World Use Cases for Streaming Analytics

IV. Real-World Use Cases with Flink for Streaming Analytics

Retail
• Build an intelligent supply chain by placing sensors or RFID tags on items to alert if items aren’t in the right place, or proactively order more if supply is low.
• Smart logistics with real-time end-to-end tracking of delivery trucks.

Telecommunications
• Real-time antenna optimization based on user location data.
• Real-time charging and billing based on customer usage, with the ability to populate up-to-date usage dashboards for users.
• Mobile offers.
• Optimized advertising for video/audio content based on what users are consuming.


Page 23: Apache Flink: Real-World Use Cases for Streaming Analytics

V. Who is using Flink?

Twitter held its Hack Week, and the winner was a Flink-based streaming project! December 18, 2015
• Extending the Yahoo! Streaming Benchmark and Winning Twitter Hack-Week with Apache Flink. Posted on February 2, 2016 by Jamie Grier: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

Yahoo! ran benchmarks comparing the performance of their use case implemented on Apache Storm against Spark Streaming and Flink. Results posted on December 18, 2015:
• http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
• http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
• https://github.com/dataArtisans/yahoo-streaming-benchmark


Page 25: Apache Flink: Real-World Use Cases for Streaming Analytics

VI. Where do you go from here?

A few resources for you:

• Flink at the Apache Software Foundation: flink.apache.org/

• Free ebook from MapR: Streaming Architecture: New Designs Using Apache Kafka and MapR Streams https://www.mapr.com/streaming-architecture-using-apache-kafka-mapr-streams

• Free Apache Flink training from data Artisans: http://dataartisans.github.io/flink-training/ (still based on version 0.10.1, not the latest 1.0)

• Flink Knowledge Base: One-Stop for everything related to Apache Flink http://sparkbigdata.com/component/tags/tag/27-flink

• Apache Flink in Action is probably the first book on Apache Flink! It will be published by Manning, and I am co-authoring it. Please stay tuned for the MEAP (Manning Early Access Program)!

Page 26: Apache Flink: Real-World Use Cases for Streaming Analytics

VI. Where do you go from here?

A few takeaways:
• Organizations are more and more embracing streaming analytics for:
  – Use cases requiring lower latency: monitoring, alerting, …
  – Faster time to insight
  – Competitive advantages
• By leveraging streaming analytics, new startups are challenging established companies. Example: Pay-As-You-Go insurance or Usage-Based Auto Insurance
• Speed is said to have become the new currency of business.


Page 27: Apache Flink: Real-World Use Cases for Streaming Analytics

Thanks!
To all of you for attending!
Let’s keep in touch!

• [email protected]
• @SlimBaltagi
• https://www.linkedin.com/in/slimbaltagi

Any questions?
