
Timo Walther - Table & SQL API - unified APIs for batch and stream processing



Page 1: Table & SQL API - unified APIs for batch and stream processing

Timo Walther, Apache Flink PMC
@twalthr

With slides from Fabian Hueske

Flink Meetup @ Amsterdam, March 2nd, 2017

Page 2

Original creators of Apache Flink®

Providers of the dA Platform, a supported Flink distribution

Page 3: Motivation

Page 4: DataStream API is not for Everyone

§ Writing DataStream programs is not easy
  • Stream processing technology spreads rapidly

§ Requires Knowledge & Skill
  • Stream processing concepts (time, state, windows, ...)
  • Programming experience (Java / Scala)

§ Program logic goes into UDFs
  • great for expressiveness
  • bad for optimization - need for manual tuning

Page 5: Why not a Relational API?

§ Relational APIs are declarative
  • User says what is needed
  • System decides how to compute it

§ Users do not specify implementation

§ Queries are efficiently executed

§ “Everybody” knows SQL!

Page 6: Goals

§ Flink is a platform for distributed stream and batch data processing

§ Relational APIs as a unifying layer
  • Queries on batch tables terminate and produce a finite result
  • Queries on streaming tables run continuously and produce a result stream

§ Same syntax & semantics for both kinds of queries

Page 7: Table API & SQL

Page 8: Table API & SQL

§ Flink features two relational APIs
  • Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
  • SQL: Standard SQL (since Flink 1.1.0)

§ Equivalent feature set (at the moment)
  • Table API and SQL can be mixed (see the sketch below)

§ Both are tightly integrated with Flink’s core APIs
  • DataStream
  • DataSet
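To make the integration concrete, here is a minimal sketch (assuming a Flink 1.2-era setup; package names and the exact conversion methods differ in later releases) that converts a DataStream into a Table, queries it with SQL, refines the result with the Table API, and converts it back into a DataStream. The readings stream and field names are invented for illustration.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{Table, TableEnvironment}
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

object MixedApiExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = TableEnvironment.getTableEnvironment(env)

    // hypothetical input stream of (location, temperature in °F) readings
    val readings: DataStream[(String, Double)] =
      env.fromElements(("room1", 72.0), ("hall", 68.0), ("room2", 75.5))

    // DataStream -> Table, then register it for SQL
    val readingsTable: Table = readings.toTable(tableEnv, 'location, 'tempF)
    tableEnv.registerTable("readings", readingsTable)

    // SQL query on the registered table ...
    val rooms: Table = tableEnv.sql(
      "SELECT location, tempF FROM readings WHERE location LIKE 'room%'")

    // ... refined further with the Table API (both APIs can be mixed)
    val roomsCelsius: Table = rooms.select('location, (('tempF - 32) * 0.556) as 'tempC)

    // Table -> DataStream (append-only result) and print
    roomsCelsius.toDataStream[Row].print()
    env.execute("mixed Table API and SQL example")
  }
}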

Page 9: Table API Example

val sensorData: DataStream[(String, Long, Double)] = ???

// convert DataStream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'time, 'tempF)

// define query on Table
val avgTempCTable: Table = sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")

Page 10: SQL Example

val sensorData: DataStream[(String, Long, Double)] = ???

// register DataStream
tableEnv.registerDataStream("sensorData", sensorData, 'location, 'time, 'tempF)

// query registered Table
val avgTempCTable: Table = tableEnv.sql("""
  SELECT FLOOR(rowtime() TO DAY) AS day,
         location,
         AVG((tempF - 32) * 0.556) AS avgTempC
  FROM sensorData
  WHERE location LIKE 'room%'
  GROUP BY location, FLOOR(rowtime() TO DAY)""")

Page 11: Architecture

2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet] = 4 different translation paths?

Page 12: Architecture

Page 13: Architecture

§ Table API and SQL queries are translated into a common logical plan representation.

§ Logical plans are translated and optimized depending on the execution backend.

§ Plans are transformed into DataSet or DataStream programs (see the explain() sketch below).
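As a hedged sketch of how to observe this translation yourself (assuming the Flink 1.2-era Scala API; the table, field names and query are invented): TableEnvironment.explain prints the abstract syntax tree, the optimized logical plan, and the physical execution plan of a given Table.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{Table, TableEnvironment}
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)

// hypothetical sensor readings: (location, timestamp, temperature in °F)
val sensorData: DataStream[(String, Long, Double)] =
  env.fromElements(("room1", 1L, 72.0), ("hall", 2L, 68.0))

val sensorTable: Table = sensorData.toTable(tableEnv, 'location, 'time, 'tempF)

val query: Table = sensorTable
  .where('location like "room%")
  .select('location, 'tempF)

// prints the AST, the optimized logical plan, and the resulting DataStream program
println(tableEnv.explain(query))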

Page 14: Translation to Logical Plan

sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")

Page 15: Translation to Optimized Plan

Page 16: Translation to Flink Program

Page 17: Current State (in master)

§ Batch SQL & Table API support
  • Selection, Projection, Sort, Inner & Outer Joins, Set operations
  • Windows for Slide, Tumble, Session

§ Streaming Table API support
  • Selection, Projection, Union
  • Windows for Slide, Tumble, Session

§ Streaming SQL
  • Selection, Projection, Union, Tumble, but …

Page 18: Use Cases for Streaming SQL

§ Continuous ETL & Data Import

§ Live Dashboards & Reports

§ Ad-hoc Analytics & Exploration

Page 19: Outlook: Dynamic Tables

Page 20: Dynamic Tables

§ Dynamic tables change over time

§ Dynamic tables are treated like static batch tables
  • Dynamic tables are queried with standard SQL
  • A query returns another dynamic table

§ Stream ←→ Dynamic Table conversions without information loss
  • “Stream / Table Duality”

Page 21: Stream to Dynamic Tables

§ Append: each stream record is appended to the dynamic table as a new row

§ Replace by key: a new record replaces the previous row with the same key (see the toy example below)
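The two conversion modes can be illustrated with a plain-Scala toy example (not Flink API; the click records are invented): append keeps every record as a new row, while replace-by-key keeps only the latest row per key.

object StreamToTableModes {
  // hypothetical stream of (user, url) click records
  val stream = Seq(("Mary", "./home"), ("Bob", "./cart"), ("Mary", "./prod?id=1"))

  // Append: the table grows by one row per incoming record
  val appendTable: Seq[(String, String)] =
    stream.foldLeft(Vector.empty[(String, String)])(_ :+ _)

  // Replace by key: a new record overwrites the previous row with the same key (user)
  val upsertTable: Map[String, String] =
    stream.foldLeft(Map.empty[String, String]) { case (table, (user, url)) =>
      table + (user -> url)
    }

  def main(args: Array[String]): Unit = {
    println(appendTable) // Vector((Mary,./home), (Bob,./cart), (Mary,./prod?id=1))
    println(upsertTable) // Map(Mary -> ./prod?id=1, Bob -> ./cart)
  }
}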

Page 22: Querying Dynamic Tables

§ Dynamic tables change over time
  • A[t]: Table A at time t

§ Dynamic tables are queried with regular SQL
  • Result of a query changes as the input table changes
  • q(A[t]): Evaluate query q on table A at time t

§ Query result is continuously updated as t progresses
  • Similar to maintaining a materialized view (see the sketch below)
  • t is the current event time
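A plain-Scala toy sketch (not Flink API; the click data is invented) of what q(A[t]) means for a simple count query: evaluating the same query over the table contents at successive times behaves like an incrementally maintained materialized view.

object DynamicTableQuery {
  // hypothetical click stream; the Long is the event-time timestamp
  val clicks = Seq(("Mary", 1L), ("Bob", 2L), ("Mary", 3L), ("Liz", 4L))

  // q: "SELECT user, COUNT(*) FROM clicks GROUP BY user" over a table snapshot
  def q(tableAtT: Seq[(String, Long)]): List[(String, Int)] =
    tableAtT.groupBy(_._1).map { case (user, rows) => (user, rows.size) }.toList.sortBy(_._1)

  def main(args: Array[String]): Unit = {
    for (t <- 1L to 4L) {
      val tableAtT = clicks.filter(_._2 <= t) // A[t]: all rows up to time t
      println(s"t=$t -> ${q(tableAtT)}")      // q(A[t])
    }
    // t=1 -> List((Mary,1))
    // t=2 -> List((Bob,1), (Mary,1))
    // t=3 -> List((Bob,1), (Mary,2))
    // t=4 -> List((Bob,1), (Liz,1), (Mary,2))
  }
}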

Page 23: Querying Dynamic Tables

Page 24: Querying Dynamic Tables

§ Can we run any query on Dynamic Tables? No!

§ State may not grow infinitely as more data arrives
  • Set a clean-up timeout or key constraints

§ Input may only trigger partial re-computation

§ Queries with possibly unbounded state or computation are rejected (illustrated below)
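As a hedged illustration (the table and field names are invented, and the exact set of supported queries depends on the Flink version): a plain join of two unbounded streams would have to buffer both inputs forever and would therefore be rejected, whereas an aggregation grouped by a time attribute, as in the SQL example earlier, only needs to keep state for the groups that can still change.

object BoundedVsUnboundedState {
  // would require unbounded state: every row of both inputs must be kept indefinitely
  val unboundedJoin: String =
    "SELECT * FROM clicks c JOIN impressions i ON c.adId = i.adId"

  // state is bounded per group and day: completed days can eventually be cleaned up
  val boundedAggregate: String =
    """SELECT location, FLOOR(rowtime() TO DAY) AS day, AVG(tempF) AS avgTempF
      |FROM sensorData
      |GROUP BY location, FLOOR(rowtime() TO DAY)""".stripMargin
}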

Page 25: Dynamic Tables to Stream

§ Update: emit each changed row as a new record that replaces the previous record with the same key

Page 26: Dynamic Tables to Stream

§ Add/Retract: emit an add message for every new row and a retract message for every removed or replaced row (see the sketch below)
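A plain-Scala toy sketch (not Flink API; the data is invented) of the add/retract encoding: each change to the table is emitted as a flagged record, and an update becomes a retraction of the old row followed by an addition of the new one. Replaying the change stream reconstructs the table.

object AddRetractSketch {
  // (true = add, false = retract), row = (user, count)
  type Change = (Boolean, (String, Int))

  def main(args: Array[String]): Unit = {
    // change stream of a per-user count after the clicks: Mary, Bob, Mary
    val changes: Seq[Change] = Seq(
      (true,  ("Mary", 1)), // Mary appears
      (true,  ("Bob", 1)),  // Bob appears
      (false, ("Mary", 1)), // Mary's old count is retracted ...
      (true,  ("Mary", 2))  // ... and replaced by the new count
    )

    // replaying the change stream reconstructs the current table
    val table = changes.foldLeft(Map.empty[String, Int]) {
      case (t, (true, (user, cnt))) => t + (user -> cnt)
      case (t, (false, (user, _)))  => t - user
    }
    println(table) // Map(Bob -> 1, Mary -> 2)
  }
}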

Page 27: Result computation & refinement

Page 28: Contributions welcome!

§ Huge interest and many contributors
  • Adding more window operators
  • Introducing dynamic tables

§ And there is a lot more to do
  • New operators and features for streaming and batch
  • Performance improvements
  • Tooling and integration

§ Try it out, give feedback, and start contributing!

Page 29: Flink Forward San Francisco

One day of hands-on Flink training
One day of conference
Tickets are on sale

Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward

Page 30

We are hiring! data-artisans.com/careers

Page 31

Thank you!
@twalthr
@ApacheFlink
@dataArtisans