Upload
till-rohrmann
View
585
Download
0
Embed Size (px)
Citation preview
2
Original creators of Apache
Flink®
Providers of
dA Platform 2, including
open source Apache Flink +
dA Application Manager
Overview
Apache Flink 1.3 – Previously on Apache
Flink
Apache Flink 1.4 – What’s happening now?
Apache Flink 1.5+ – Next on Apache Flink
3
Previously on Apache Flink
Apache Flink 1.3
Apache Flink 1.3 in Numbers
141 contributors (no deduplication)
1400 commits
>= 680 resolved JIRA issues
+261813 / -65646 LOC
5
Evolution of Flink’s API
6
Flink 1.0.0
State API (ValueState
ReducingState, ListState)
Flink 1.1.0
Session Windows
Late arriving events
Flink 1.2.0
ProcessFunction (access
to state, timers, events)
Flink 1.3.0
Side outputs
Access to per-window state
Side Outputs
Additional outputs for a stream
Late events
Corrupted input data
More expressive APIs
FLINK-4460
7
Process
Function
Main output
Side output
Side Outputs: Example
8
DataStream<Integer> input = ...;final OutputTag<String> outputTag = new OutputTag<String>("side-output"){};
SingleOutputStreamOperator<Integer> mainDataStream = input.process(new ProcessFunction<Integer, Integer>() {
@Override public void processElement(Integer value,Context ctx,Collector<Integer> out) throws Exception {
// emit data to regular outputout.collect(value);
// emit data to side outputctx.output(outputTag, "sideout-" + String.valueOf(value));
}});
DataStream<String> sideOutputStream = mainDataStream.getSideOutput(outputTag);
Evolution of Large State Handling
9
Flink 1.0.0
RocksDB for out-of-core
state support
Flink 1.1.0
Fully async RocksDB
snapshots
Flink 1.2.0
Rescalable keyed and
non-partitioned state
Flink 1.3.0
Incremental checkpoints
Fine-grained recovery
GHCD
Full Checkpoints
10Checkpoint 1 Checkpoint 2 Checkpoint 3
IE
ABCD
ABCD
AFCDE
@t1 @t2 @t3
AFC
DE
GHC
DIE
GHCD
Incremental Checkpoints
11Checkpoint 1 Checkpoint 2 Checkpoint 3
IE
ABCD
ABCD
AFCDE
EF
GHI
@t1 @t2 @t3
Incremental Checkpoints
12
Checkpoint 1 Checkpoint 2 Checkpoint 3 Checkpoint 4
C1 C3C1 C1
Chunk
1
Chunk
2
Chunk
3Chunk
4Storage
C2 C4C3
Incremental Checkpointing Contd.
Currently supported for RocksDBstate backend
FLINK-5053
Faster and smaller checkpoints
13
Full checkpoint Incremental checkpoint
Size 60 GB 1 – 30 GB
Time 180 s 3 – 30 s
“A Look at Flink’s Internal
Data Structures and
Algorithms for Efficient
Checkpointing” by Stefan
Richter, Tomorrow @
12:20 pm Maschinenhaus
Evolution of High Level APIs
14
Flink 1.0.0
CEP library added
Table API v1
Flink 1.1.0
Table API overhaul
Integration with Apache Calcite
Flink 1.2.0
Tumbling, sliding and session
group-windows for Table API
Flink 1.3.0
Rescalable CEP operators
Retractions in Table API/SQL
Enriched CEP Language
Support for quantifiers (+, *, ?)
FLINK-3318
Iterative conditions
FLINK-6197
Not operator
FLINK-3320
15
“Complex Event Processing With
Flink: The State of FlinkCEP” by
Kostas Kloudas, Today @ 2:30
pm Maschinenhaus
CEP: Detect Dipping Stocks
16
DataStream<Stock> stocks = …;
Pattern<Stock, ?> pattern = Pattern.<Stock>begin("rising").where(new IterativeCondition<Stock>() {
@Overridepublic boolean filter(Stock stock, Context<Stock> ctx) throws Exception {
// calculate the average pricedouble sum = 0.0; int count = 0;for (Stock previousStock : ctx.getEventsForPattern("rising")) {
sum += previousStock.getPrice(); count++;}// only accept if the price is higher or equal than the average pricereturn stock.getPrice() >= sum / count;
}).oneOrMore().next("falling");
PatternStream<Stock> dippingStocks = new PatternStream<>(stocks.keyBy("name"), pattern);DataStream<String> namesOfDippingStocks = dippingStocks.select(…);
What’s Happening Now?
Apache Flink 1.4
Event Driven I/O
18
Rework of Flink’s network stack
Event driven network I/O
Use full available capacity
Near perfect latency behaviour
TCP
Buffer
capacity left
flush
Flow Control
Flow control for TaskManager communication
Single channel no longer stalls other multiplexed channels
Fine-grained backpressure control
Improves checkpoint alignments
19
“Building a Network Stack
for Optimal Throughput /
Low-Latency Trade-Offs”
by Nico Kruber, Today @
2:00 pm Palais Atelier
Receiver
Sender #1
Sender #2
Give credit
Send
credited data
New Deployment Model
Rework of Flink’s distributed architecture
Ready for multitude of deployment scenarios
Support for dynamic scaling
20
“Flink in Containerland” by
Patrick Lucas, Tomorrow
@ 3:20 pm Maschinenhaus
Producing Exactly Once with Kafka 0.11
Support for Kafka 0.11
First Kafka producer with exactly once processingguarantees
21
“Hit Me, Baby, Just One Time
– Building End-to-End Exactly
Once Applications With Flink”
by Piotr Nowojski, Today @
3:20 pm Palais Atelier
Consuming Producing
End-to-End exactly once processing
StreamSQL and Table API
Support for retractions
Extended aggregation support
Support for external table catalogs
Window joins
22
“Unified Stream and Batch
Processing With Apache
Flink’s Relational APIs” by
Fabian Hüske, Tomorrow
@ 11:00 am Kesselhaus
“From Streams to Tables
and Back Again: A Demo
of Flink’s Table & SQL
API” by Timo Walther,
Tomorrow @ 11:50 am
Kesselhaus
Operational Robustness
Drop Java 7
Support Scala 2.12
Avoid dependency hell
Child first class loading
Relocation of dependencies
De-Hadoopification
23
Next on Apache Flink
Apache Flink 1.5+
Side Inputs
Additional input for operator
Join with static data set
Feeding of externally trained ML model
Window joins
Flip-17 design document: https://goo.gl/W4yMEu
25
Process
Function
Main input
Side input
State Management & Evolution
Eager state declaration
State type, serializer and name known at pre-flight time
Flip-22 design document: https://goo.gl/trFiSi
Evolving existing state
Schema updates
Serializer upgrades
26
“Managing State in
Apache Flink” by
Tzu-Li Tai, Today @
4:30 pm Kesselhaus
State Replication
Replicate state between
TaskManagers
Faster recovery in
case of failures
High throughput
queryable state
27
TaskManager
TaskManager
Change log stream
Input
State
Programmatic Job Control
Improve client to give better job control
Run concurrent jobs from the same
program
Trigger savepoints programmatically
Better testing facilities
28
JobClient & ClusterClient
29
StreamExecutionEnvironment env = ...;// define program
JobClient jobClient = env.execute();
CompletableFuture<Acknowledge> savepointFuture = jobClient.takeSavepoint(savepointPath);
// wait for the savepoint completionsavepointFuture.get();
CompletableFuture<JobExecutionResult> resultFuture = jobClient.getResultFuture();
// cancel the jobjobClient.cancelJob();
// get the execution result --> should be canceledJobExecutionResult result = resultFuture.get();
// get list of all still running jobs on the clusterClusterClient clusterClient = jobClient.getClusterClient();CompletableFuture<List<JobInfo>> jobInfosFuture = clusterClient.getJobInfos();List<JobInfo> jobInfos = jobInfosFuture.get();
TL;DL
Apache Flink one of the most innovative open source stream processing platforms
Stay tuned what’s happening next
Visit the in depths talks to learn more about Flink’s internals
30
31
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
We are hiring!
data-artisans.com/careers
32