Kafka Streaming Data Platform

Traditional Messaging System
• Queue
• Topic
• Messages are removed once consumed
• Out-of-order messaging

What is Kafka
• Messaging system
• Polyglot consumers / producers
• Topics and partitions
• Scalable
• Configurable message retention
• Guaranteed order (within a partition)
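The topic, partition, retention, and ordering bullets above can be illustrated with a small in-memory sketch. This is not Kafka code: `ToyTopic`, its `retention` count, and the CRC32-based partition choice are assumptions made for illustration (Kafka's own partitioner and its time/size-based retention work differently). It only shows that records with the same key land in one partition in order, and that reading does not remove them.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic with append-only partitions."""

    def __init__(self, num_partitions, retention=1000):
        self.partitions = [[] for _ in range(num_partitions)]
        # Offset of the oldest record still retained in each partition.
        self.base_offsets = [0] * num_partitions
        # Assumed retention policy: keep at most this many records per partition.
        self.retention = retention

    def partition_for(self, key):
        # Deterministic hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append(value)
        offset = self.base_offsets[p] + len(log) - 1
        if len(log) > self.retention:
            # Retention, not consumption, is what removes old records.
            del log[0]
            self.base_offsets[p] += 1
        return p, offset

    def read(self, partition, offset):
        """Return retained records from `offset` onward; nothing is removed."""
        start = max(offset - self.base_offsets[partition], 0)
        return list(self.partitions[partition][start:])


topic = ToyTopic(num_partitions=3)
for i in range(5):
    topic.send("user-42", f"event-{i}")

p = topic.partition_for("user-42")
first_read = topic.read(p, 0)
second_read = topic.read(p, 0)   # a second consumer sees the same records
tail = topic.read(p, 3)          # a consumer resuming from offset 3
```

Because each consumer tracks its own offset, two consumers can read the same partition independently, which is the main contrast with a queue that deletes a message once it is consumed.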

Topic

Use Cases
• Ordered messaging
• Log aggregation
• Metrics
• Web activity tracking
• Stream processing

Kafka Brokers – Clusters and Replication
• Topics can be replicated
• Data stored across various nodes
• Each broker in a cluster requires a unique broker.id (e.g. broker.id=0 for the first broker)
• ZooKeeper tracks
  • Offsets
  • Topic names
  • Partitions

Demo – Local Kafka
• Start ZooKeeper
  • bin/zookeeper-server-start.sh config/zookeeper.properties
• Start Kafka
  • bin/kafka-server-start.sh config/server.properties

Demo – Command Line Tools
• bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
• bin/kafka-topics.sh --list --zookeeper localhost:2181
• bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

Example Producer
• <CODE>

Example Consumer
• <CODE>

Deployment Options
• Standalone deployment
• Confluent.io
• Hortonworks
• AWS

Hortonworks Data Platform on AWS

Big data in a one-stop shop

Determine Cluster Sizing
• Implement a producer and consumer
• Use your data structures
• 3 ZooKeeper nodes and 3 Kafka nodes
• Java heap = 2 GB
• Network saturation (1 gigabit / 10 gigabit)
• Avro data serialization
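The network saturation point above comes down to simple arithmetic: a link's bit rate divided by the serialized message size gives an upper bound on messages per second. The sketch below works that out for the two link speeds on the slide. The 1,000-byte message size is an assumed figure for illustration, and the calculation ignores protocol overhead, batching, compression, and replication traffic, so real throughput will be lower.

```python
def max_messages_per_second(link_bits_per_second, message_bytes):
    """Upper bound on messages/s for one network link, ignoring all overhead."""
    link_bytes_per_second = link_bits_per_second // 8
    return link_bytes_per_second // message_bytes


GIGABIT = 1_000_000_000
MESSAGE_BYTES = 1_000  # assumed serialized message size, not from the slides

one_gig = max_messages_per_second(1 * GIGABIT, MESSAGE_BYTES)
ten_gig = max_messages_per_second(10 * GIGABIT, MESSAGE_BYTES)

print(one_gig)   # 125000
print(ten_gig)   # 1250000
```

Measuring with your own data structures, as the slide suggests, replaces the assumed message size with a real one.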

Producer for testing throughput
• <CODE>
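The code from this slide is not available here. As a stand-in, the following is a minimal sketch of how a throughput test could be structured; it is not the presenter's code. `measure_throughput` and the list-based `sink` are hypothetical names, and the `send` argument is assumed to be any callable that accepts one payload, such as a wrapper around a Kafka client's send method.

```python
import time


def measure_throughput(send, payload, count):
    """Call send(payload) `count` times and report the observed rates."""
    start = time.perf_counter()
    for _ in range(count):
        send(payload)
    elapsed = time.perf_counter() - start
    total_bytes = len(payload) * count
    # Guard against a zero reading on very coarse timers.
    rate = (lambda n: n / elapsed) if elapsed > 0 else (lambda n: float("inf"))
    return {
        "messages": count,
        "bytes": total_bytes,
        "seconds": elapsed,
        "messages_per_second": rate(count),
        "megabytes_per_second": rate(total_bytes) / 1e6,
    }


# Stand-in sink so the sketch runs without a broker; swap in a real producer call.
sink = []
result = measure_throughput(sink.append, b"x" * 1000, 10_000)
print(result["messages"], result["bytes"])
```

When the real producer sends asynchronously, the timed section should also include a flush, otherwise the measurement reflects only how fast records were buffered.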

Architectural Possibilities
• Streaming data platform
• Common interface
• High throughput

WARNING
• Kafka 0.8.x has a major bug…deletes data
• Make sure to use 0.9.0.x

Question & Answer
bryancjacobs@gmail.com
