Kafka and Spark Streaming

Presenter: Prajod Vettiyattil, Architect, Open Source, Big Data
Location: Spark Meetup, APIJEE, Bangalore
Date: 1 August 2015

Page 2: Kafka and Spark Streaming

Agenda

Kafka

Concepts

How to use

When to use

Spark Streaming

Micro batches

Time window API

Reliability

Page 3: Kafka and Spark Streaming

Kafka

Page 4: Kafka and Spark Streaming

Kafka concepts

Message Oriented Middleware

Brokers

Topics

Producers

Consumers

Page 5: Kafka and Spark Streaming

Kafka broker

Receive messages from producers

Persist messages to log files

Retrieve messages for consumers

Zookeeper

Coordination

State management

Cluster configuration management

Also used by Storm and HBase

Page 6: Kafka and Spark Streaming

Topics

Topics and Queues

Consumers and Consumer groups

Consumer offset

Partitions

One log for each partition
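The relationship between partitions, logs, consumer groups and offsets can be sketched with a small in-memory model. This is only a toy illustration, not the Kafka API: `ToyTopic`, `append`, `poll` and the group names are hypothetical, and real Kafka persists each partition log on disk at the broker.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: one append-only log per partition."""

    def __init__(self, num_partitions):
        self.logs = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked per (consumer group, partition).
        self.committed = defaultdict(int)

    def append(self, partition, message):
        """Append to one partition's log and return the message's offset."""
        self.logs[partition].append(message)
        return len(self.logs[partition]) - 1

    def poll(self, group, partition, max_messages=10):
        """Return the group's unread messages and advance its offset."""
        start = self.committed[(group, partition)]
        batch = self.logs[partition][start:start + max_messages]
        self.committed[(group, partition)] = start + len(batch)
        return batch


topic = ToyTopic(num_partitions=2)
topic.append(0, "a")
topic.append(0, "b")
topic.append(1, "c")

# Different groups each see every message (topic-style delivery).
billing = topic.poll("billing", 0)
audit = topic.poll("audit", 0)
# Within one group a message is handed out only once (queue-style delivery).
billing_again = topic.poll("billing", 0)
```

Keeping the offset per group rather than per message is what lets the same log serve both delivery styles.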

Page 7: Kafka and Spark Streaming

Producers

Send message to topics

Specify the partition, use a custom partitioner, or send to a random partition

Zookeeper reference is needed

Messages are sent to the leader broker for the target partition in the Kafka cluster
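The three ways of picking a partition listed above can be sketched as one selection function. This is a simplified model rather than the Kafka producer API: `choose_partition` and `crc_partitioner` are hypothetical names, and the CRC-based hash is just one possible custom partitioner.

```python
import random
import zlib


def crc_partitioner(key, num_partitions):
    """Hypothetical custom partitioner: a given key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def choose_partition(num_partitions, partition=None, key=None, partitioner=None):
    """Pick a partition: explicit choice first, then custom partitioner, else random."""
    if partition is not None:
        return partition
    if partitioner is not None and key is not None:
        return partitioner(key, num_partitions)
    return random.randrange(num_partitions)


explicit = choose_partition(4, partition=2)
by_key = choose_partition(4, key="user-1", partitioner=crc_partitioner)
anywhere = choose_partition(4)
```

Routing by key keeps all messages for one key in a single partition, which preserves their relative order for the consumer of that partition.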

Page 8: Kafka and Spark Streaming

Consumers

Consume messages from Kafka brokers

Simple API

High Level API

Read from any position

Offset: position of message in a partition

Consumers are mapped to partitions
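As a rough illustration of the last three points, the sketch below maps partitions to the consumers of one group and reads a log from an arbitrary offset. It assumes a simple round-robin mapping for clarity; Kafka's actual assignment strategies may differ, and `assign_partitions` and `read_from` are hypothetical helpers.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition is owned by exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def read_from(log, offset, max_messages=10):
    """Read from any position in a partition log; returns (messages, next_offset)."""
    batch = log[offset:offset + max_messages]
    return batch, offset + len(batch)


two_consumers = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
# With more consumers than partitions, the extra consumer stays idle.
three_consumers = assign_partitions([0, 1], ["c1", "c2", "c3"])

log = ["a", "b", "c", "d"]
messages, next_offset = read_from(log, offset=2)
```

Because the offset is just an index into the partition log, replaying old messages only requires reading again from a smaller offset.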

Page 9: Kafka and Spark Streaming

Spark streaming

Page 10: Kafka and Spark Streaming

Micro batches

Micro batch: A set of messages in a time window

Discretized streams

Receiver thread

Processing thread

Access to Spark platform API (GraphX, MLlib)
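The idea of a micro batch as "a set of messages in a time window" can be sketched in plain Python. This is a toy model of discretization, not Spark code: `micro_batches` is a hypothetical helper, timestamps are assumed to be in seconds, and unlike Spark Streaming it does not emit empty batches for intervals with no data.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, message) pairs into fixed-length batch intervals.

    Returns a list of (batch_start, [messages]) sorted by batch_start.
    """
    buckets = defaultdict(list)
    for timestamp, message in events:
        # Round the timestamp down to the start of its batch interval.
        batch_start = (timestamp // batch_interval) * batch_interval
        buckets[batch_start].append(message)
    return sorted(buckets.items())


events = [(0, "a"), (1, "b"), (2, "c"), (5, "d")]
batches = micro_batches(events, batch_interval=2)
```

Each resulting batch can then be processed as an ordinary, finite collection, which is what makes the rest of the batch-oriented platform API usable on a stream.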

Page 11: Kafka and Spark Streaming

API

Receivers: Kafka, Flume, Kinesis, Twitter, Socket, ZeroMQ

Time window API: the function runs once per batch

Receiver API

Direct API
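A time window over micro batches can be sketched as follows. This is a simplified model in plain Python, not the Spark API: `windowed` is a hypothetical helper, and the window length and slide interval are expressed as a number of batches rather than as durations.

```python
def windowed(batches, window_length, slide_interval):
    """Combine the last `window_length` batches every `slide_interval` batches."""
    windows = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        # Flatten the batches that fall inside this window.
        window = [item for batch in batches[start:end] for item in batch]
        windows.append(window)
    return windows


batches = [[1, 2], [3], [4, 5], [6]]
windows = windowed(batches, window_length=3, slide_interval=2)
# Apply a function once per window, here a simple sum.
totals = [sum(window) for window in windows]
```

When the window is longer than the slide interval, consecutive windows overlap, so the same message contributes to more than one result.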

Page 12: Kafka and Spark Streaming

Reliability

Exactly-once semantics with HDFS

At-least-once semantics with receivers

Checkpointing
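Why checkpointing gives at-least-once delivery can be shown with a small simulation. This is a sketch under stated assumptions, not Spark's checkpoint mechanism: `run`, `CrashBeforeCommit` and the dictionary checkpoint are hypothetical, and the crash is injected deliberately after a message is processed but before its offset is saved.

```python
class CrashBeforeCommit(Exception):
    """Simulated failure between processing a message and saving the offset."""


def run(log, checkpoint, sink, crash_at=None):
    """Process the log from the checkpointed offset, committing after each message."""
    offset = checkpoint["offset"]
    while offset < len(log):
        sink.append((offset, log[offset]))  # the side effect happens first
        if offset == crash_at:
            raise CrashBeforeCommit(offset)  # offset for this message is never saved
        offset += 1
        checkpoint["offset"] = offset  # commit only after processing succeeded


log = ["a", "b", "c"]
checkpoint = {"offset": 0}
sink = []

try:
    run(log, checkpoint, sink, crash_at=1)
except CrashBeforeCommit:
    pass

# Restart from the last checkpoint: message "b" is processed a second time.
run(log, checkpoint, sink)

# Assumption, not from the slides: a sink keyed by offset absorbs the duplicate.
idempotent_view = dict(sink)
```

The replay guarantees that no message is lost, at the cost of possible duplicates; a sink that overwrites by key is one common way to make those duplicates harmless.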

Page 13: Kafka and Spark Streaming

References

Kafka

http://kafka.apache.org/documentation.html

Install and run an example: http://kafka.apache.org/documentation.html#quickstart

https://github.com/abhioncbr/Kafka-Message-Server/wiki/Apache-of-Kafka-Architecture-(As-per-Apache-Kafka-0.8.0-Dcoumentation)

Spark Streaming

http://spark.apache.org/docs/latest/streaming-programming-guide.html

Other references

http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617

http://www.slideshare.net/prajods/apache-spark-the-next-gen-toolset-for-big-data-processing
