10
Structured Streaming Using Spark 2.1 BY YASWANTH YADLAPALLI

Structured Streaming Using Spark 2.1

  • Upload
    sigmoid

  • View
    283

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Structured Streaming Using Spark 2.1

Structured Streaming Using Spark 2.1BY YASWANTH YADLAPALLI

Page 2: Structured Streaming Using Spark 2.1

What is Structured Streaming?

Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine.

Express streaming computation the same way as a batch computation on static data.

The Spark SQL engine takes care of running it incrementally and continuously and updating the final result as streaming data continues to arrive.

Use the Dataset/DataFrame API to express streaming aggregations, event-time windows, stream-to-batch joins, etc.

Structured Streaming is still ALPHA in Spark 2.1 and the APIs are still experimental

Page 3: Structured Streaming Using Spark 2.1

Programing Model

Page 4: Structured Streaming Using Spark 2.1

Output modes

There are 3 output modes in Spark Structured streaming

Append mode - Only the new rows added to the Result Table since the last trigger will be outputted to the sink. This is supported for only those queries where rows added to the Result Table is never going to change

Complete mode - The whole Result Table will be outputted to the sink after every trigger. This is supported for aggregation queries.

Update mode - Only the rows in the Result Table that were updated since the last trigger will be outputted to the sink.

Page 5: Structured Streaming Using Spark 2.1

Windowing

Page 6: Structured Streaming Using Spark 2.1

Windowing

import spark.implicits._

val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }

// Group the data by window and word and compute the count of each group

val windowedCounts = words.groupBy( window($"timestamp", "10 minutes", "5 minutes"), $"word" ).count()

Page 7: Structured Streaming Using Spark 2.1

Delayed events

Page 8: Structured Streaming Using Spark 2.1

Watermarking

Page 9: Structured Streaming Using Spark 2.1

Watermarking and output mode

Page 10: Structured Streaming Using Spark 2.1

Watermarking and output mode