STATIC VS DYNAMIC STREAM PROCESSING
Christian Kreutzfeldt (@mnxfst)
1. Introduction
2. Stream Processing - First Encounter
3. Increasing number of Use Cases
4. Arising Implementation Issues
5. Requirements for Stream Processing Framework
6. Way to SPQR (+ short demo)
7. Way to Apache Flink (extension points + short demo)
8. Future (hope to come)
9. Q&A
Christian Kreutzfeldt (@mnxfst)
Senior Software Developer & Architect at Otto Group Business Intelligence Department
Tech Lead “Real-Time Stream Processing”
Computer Science at University of Luebeck
Multichannel Retail: w/ catalogue business, e-commerce and over-the-counter retail
Services: covering the entire portfolio of retail services across the value-added chain
Financial Services: providing retail-related financial services across the value-added chain
World’s Second-Largest Online Retailer in End-Consumer Business
Europe’s Largest Online Retailer in End-Consumer Fashion & Lifestyle Business
Otto Group Business Intelligence Department: driven by data, inspired by our customers
BI Strategy: definition of business intelligence strategy
Consulting: talent recruitment & training, networking & consulting
Business Development: evaluation & implementation of data-driven business models
Data Pool: maintaining & providing data pools
SaaS Products: software-as-a-service solutions
Otto Group Business Intelligence Department: dedicated to open source
SPQR: stream processing framework
Schedoscope: scheduling framework for pain-free agile development of your datahub
Palladium: framework for developing real-world machine learning solutions
follow us on github.com/ottogroup
Stream Processing: first steps w/ unified tracking
Unified Tracking
Stream Processing: prevent quality problems
Unified Tracking
Tagging Template (×4)
EventStream
Event Validator (akka-based)
real stream processing
customer sessions
search sessions
user-agent identification
dynamic profile selection
dynamic stream queries
Stream Processing: developing project ideas
Umberto Salvagnin https://www.flickr.com/photos/kaibara/4688161016 (cc by 2.0)
Stream Processing: software development issues
resource-intensive use-case implementation
ops support required for topology deployment and monitoring
static rather than highly flexible implementations
highly time-consuming
Static Topologies (Queries)
Dynamic Data
Highly Flexible Context
Stream Processing: requirements to ease the pain
unified runtime environment
operations support
support for multiple sources and sinks
real stream processing
easy-to-extend
steep learning curve (quick to pick up)
Stream Processing: working w/ data the business way
no-code topology definition (the SQL way)
self dependent, immediate deployments
consistent monitoring (behavior / result retrieval)
adjustment through re-deployments
Dynamic Topologies (Queries)
Dynamic Data
Highly Flexible Context
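The idea of a dynamic query over dynamic data can be illustrated with a small sketch. It is plain Java and not taken from SPQR or Apache Flink; the class `DynamicQuerySketch` and its methods `redeploy` and `onEvent` are hypothetical names chosen for this example. Events keep flowing while the query (here a simple predicate) is exchanged at runtime, which is the "adjustment through re-deployments" idea in miniature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

public class DynamicQuerySketch {

    // The currently deployed query; initially it accepts every event.
    private final AtomicReference<Predicate<String>> query =
            new AtomicReference<>(event -> true);

    // Events that matched the query that was active when they arrived.
    private final List<String> results = new ArrayList<>();

    /** Replaces the running query without stopping the event flow. */
    void redeploy(Predicate<String> newQuery) {
        query.set(newQuery);
    }

    /** Called for every incoming event of the stream. */
    void onEvent(String event) {
        if (query.get().test(event)) {
            results.add(event);
        }
    }

    List<String> results() {
        return results;
    }

    public static void main(String[] args) {
        DynamicQuerySketch stream = new DynamicQuerySketch();
        stream.onEvent("search q=shoes");                     // matches the initial query
        stream.redeploy(e -> e.startsWith("checkout"));       // adjust the query on the fly
        stream.onEvent("search q=hats");                      // no longer matches
        stream.onEvent("checkout basket=3");                  // matches the new query
        System.out.println(stream.results());
    }
}
```

A static topology would correspond to fixing the predicate at compile time, so every change requires a new build and an ops-supported deployment.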
Stream Processing: framework decision
unified runtime environment
operations support
support for multiple sources and sinks
real stream processing
easy-to-extend
steep learning curve
SPQR (spooker)
no-code topology definition
self dependent deployments
consistent monitoring
immediate deployments
short feedback circuit
SPQR: concepts
library deployment: independent library deployments into node repositories for later use
zero-code topologies: configuration-based pipeline descriptions
ad hoc queries: support for ad hoc queries, immediate adjustments and short feedback circuits
https://github.com/ottogroup/spqr
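A minimal sketch of what a configuration-based ("zero-code") pipeline description could look like. This is not SPQR's actual configuration format or API: the `REPOSITORY` map stands in for a node repository of pre-deployed operators, and the `name:setting` step syntax is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class ZeroCodeTopologySketch {

    /** One pipeline step: transforms an event or drops it (empty result). */
    interface Operator {
        Optional<String> apply(String event);
    }

    /** Hypothetical node repository: operator name -> factory taking one setting. */
    static final Map<String, Function<String, Operator>> REPOSITORY = new LinkedHashMap<>();
    static {
        REPOSITORY.put("filter", needle -> e -> Optional.of(e).filter(x -> x.contains(needle)));
        REPOSITORY.put("uppercase", unused -> e -> Optional.of(e.toUpperCase(Locale.ROOT)));
        REPOSITORY.put("prefix", prefix -> e -> Optional.of(prefix + e));
    }

    /** Builds a pipeline from step descriptions of the form "name" or "name:setting". */
    static Operator build(List<String> description) {
        List<Operator> steps = new ArrayList<>();
        for (String line : description) {
            String[] parts = line.split(":", 2);
            Function<String, Operator> factory = REPOSITORY.get(parts[0]);
            if (factory == null) {
                throw new IllegalArgumentException("unknown operator: " + parts[0]);
            }
            steps.add(factory.apply(parts.length > 1 ? parts[1] : ""));
        }
        // Chain the steps; a dropped event short-circuits the rest of the pipeline.
        return event -> {
            Optional<String> current = Optional.of(event);
            for (Operator step : steps) {
                current = current.flatMap(step::apply);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // The "topology" is data, not code: it could come from a file or an HTTP request.
        Operator pipeline = build(Arrays.asList("filter:checkout", "uppercase", "prefix:order/"));
        System.out.println(pipeline.apply("checkout basket=3")); // prints Optional[order/CHECKOUT BASKET=3]
        System.out.println(pipeline.apply("search q=shoes"));    // prints Optional.empty
    }
}
```

Because the description is plain data, a new or adjusted pipeline only needs a new description, as long as the operators it names are already present in the repository.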
SPQR: architecture
DEMO
Dynamic Stream Processing: importance for (business) acceptance
no-code topology definition
self dependent deployments
consistent monitoring
immediate deployments
short feedback circuit
steep learning curve, focus on functionality instead of implementation, better representation
no or less ops support, shorter time-to-execution, independence from tech teams, easier to use
short feedback circuit, easier to adjust
supports people in trying out new ideas, gets more people to work with data streams
uses the representation defined by the topology author as the foundation for monitoring, giving topology author and ops team a common understanding
Dynamic Stream Processing: from SPQR to Apache Flink - it’s all there
Martin Grandjean - http://www.martingrandjean.ch/wp-content/uploads/2013/10/Graphe3.png (cc by-sa 3.0)
akka
Dynamic Stream Processing: variety of ways to interact with Apache Flink
variety of message types (request/response) available to interact with the job manager / cluster:
● RequestNumberRegisteredTaskManager
● RequestTotalNumberOfSlots
● SubmitJob
● CancelJob
● RequestPartitionState
● RequestJobStatus
● RequestRunningJobs
● RequestRunningJobsStatus
● RequestJob
● RequestRegisteredTaskManagers
● RequestStackTrace
● RequestJobManagerStatus
● AccumulatorMessage (RequestAccumulatorResultsStringified, ...)
● ...
Apache Flink: short feedback circuit & consistent monitoring (impl)
FlinkMetricsCollector spawns RunningJobsManager
RunningJobsManager queries JobManager
RunningJobsManager spawns a JobMetricsCollector for each job
JobMetricsCollector queries JobManager
RunningJobsManager

    // Ask the remote JobManager for the status of all running jobs every 5 seconds.
    public void preStart() throws Exception {
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),                       // initial delay
            FiniteDuration.apply(5, TimeUnit.SECONDS),   // polling interval
            this.remoteJobManagerRef,                    // receiver
            JobManagerMessages.getRequestRunningJobsStatus(),
            context().dispatcher(),
            getSelf()                                    // sender, so replies come back to this actor
        );
    }

    on reply:
    1. receive RunningJobsStatus
    2. extract job identifier
    3. start job metrics collector

JobMetricsCollector

    // Ask the remote JobManager for the accumulator results of one job every 5 seconds.
    public void preStart() throws Exception {
        context().system().scheduler().schedule(
            FiniteDuration.Zero(),
            FiniteDuration.apply(5, TimeUnit.SECONDS),
            this.remoteJobManagerRef,
            new RequestAccumulatorResults(this.jobId),
            context().dispatcher(),
            getSelf()
        );
    }

    on reply: AccumulatorResultsFound
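The polling steps above (receive the running jobs, extract the job identifiers, start one metrics collector per job) can also be sketched without Akka or Flink. The following is a plain-Java illustration under stated assumptions, not the implementation shown in the talk: the `Supplier` stands in for the request to the JobManager, `JobCollector` is a hypothetical placeholder for the per-job actor, and dropping collectors of finished jobs is an addition of this sketch that the slides do not mention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class RunningJobsPollerSketch {

    /** Hypothetical stand-in for a per-job metrics collector; it only remembers its job id. */
    static final class JobCollector {
        final String jobId;
        JobCollector(String jobId) { this.jobId = jobId; }
    }

    // Stand-in for "ask the JobManager which jobs are running" (assumption of this sketch).
    private final Supplier<List<String>> runningJobs;
    private final Map<String, JobCollector> collectors = new LinkedHashMap<>();

    RunningJobsPollerSketch(Supplier<List<String>> runningJobs) {
        this.runningJobs = runningJobs;
    }

    /** One polling round: start collectors for new jobs, drop collectors of finished jobs. */
    synchronized void poll() {
        List<String> ids = runningJobs.get();
        for (String id : ids) {
            collectors.computeIfAbsent(id, JobCollector::new);
        }
        collectors.keySet().retainAll(ids);
    }

    /** Job ids that currently have a collector, in the order they were first seen. */
    synchronized List<String> collectedJobs() {
        return new ArrayList<>(collectors.keySet());
    }

    public static void main(String[] args) throws InterruptedException {
        RunningJobsPollerSketch poller =
                new RunningJobsPollerSketch(() -> Arrays.asList("job-a", "job-b"));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Same rhythm as on the slide: no initial delay, then every 5 seconds.
        scheduler.scheduleAtFixedRate(poller::poll, 0, 5, TimeUnit.SECONDS);
        Thread.sleep(200);
        scheduler.shutdownNow();
        System.out.println(poller.collectedJobs());
    }
}
```

Keeping the polling round in its own `poll()` method separates the scheduling from the bookkeeping, so the logic can be exercised without waiting for a timer.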
Apache Flink: metrics retrieval through accumulators
DEMO
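To make the accumulator idea concrete, here is a simplified sketch in plain Java. It does not use Flink's accumulator classes; `LongCounter` and `mergeAll` are stand-ins written for this example, and the accumulator names `events` and `invalid` are invented. The point is only the principle: each parallel subtask adds to its local accumulators, and the per-subtask values are merged by name into one result that a monitoring component can retrieve.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AccumulatorSketch {

    /** Simplified counter: values are added locally and merged across subtasks. */
    static final class LongCounter {
        private long value;

        void add(long delta) { value += delta; }

        long getLocalValue() { return value; }

        void merge(LongCounter other) { value += other.value; }
    }

    /** Merges the named accumulators reported by several subtasks into one result map. */
    static Map<String, Long> mergeAll(List<Map<String, LongCounter>> perSubtask) {
        Map<String, LongCounter> merged = new TreeMap<>();
        for (Map<String, LongCounter> subtask : perSubtask) {
            subtask.forEach((name, counter) ->
                    merged.computeIfAbsent(name, n -> new LongCounter()).merge(counter));
        }
        Map<String, Long> result = new TreeMap<>();
        merged.forEach((name, counter) -> result.put(name, counter.getLocalValue()));
        return result;
    }

    public static void main(String[] args) {
        // Two parallel subtasks report their local accumulator values.
        Map<String, LongCounter> first = new TreeMap<>();
        first.computeIfAbsent("events", n -> new LongCounter()).add(3);
        first.computeIfAbsent("invalid", n -> new LongCounter()).add(1);

        Map<String, LongCounter> second = new TreeMap<>();
        second.computeIfAbsent("events", n -> new LongCounter()).add(5);

        System.out.println(mergeAll(Arrays.asList(first, second))); // prints {events=8, invalid=1}
    }
}
```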
https://nifi.apache.org/
Apache Flink: how to move on
deploy metrics
under construction
Apache Flink: topology definition & deployments (integration points)
no-code topology definition
self dependent deployments
immediate deployments
expects code
requires far too many framework modifications
the place to be
https://nifi.apache.org/
metrics
deploy
Apache Flink: relevance
Static Data, Static Queries
Static Data, Dynamic Queries
Dynamic Data, Static Queries
Dynamic Data, Dynamic Queries
SQL
https://nifi.apache.org/
metrics
deploy
Apache Flink: Apache Zeppelin points in the right direction
Static Data, Static Queries
Static Data, Dynamic Queries
Dynamic Data, Static Queries
Dynamic Data, Dynamic Queries
SQL
http://www.ottogroup.com/en/karriere/
We are hiring!